
Video Blog: Developing and Editing Julia Packages

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/video-blog-developing-editing-julia-packages/

Google Summer of Code is starting up, so I thought it would be a good time to share my workflow for developing my own Julia packages, as well as my workflow for contributing to other Julia packages. This does not assume familiarity with command-line Git; instead, it shows you how to use a GUI (GitKraken) to make branches and PRs, as well as to review and merge code. You can think of it as an update to my old blog post on package development in Julia. However, this is not only updated but also improved, since I am now able to walk through the “non-code” parts of package development (such as setting up AppVeyor and code coverage).

Enjoy! (I quite like this video blog format: it was a lot less work)

The post Video Blog: Developing and Editing Julia Packages appeared first on Stochastic Lifestyle.

Using pipes while running external programs in Julia

By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=340

Recently I was using Julia to run ffprobe to get the length of a video file. The trouble was that ffprobe was dumping its output to stderr, and I wanted to take that output and run it through grep. From a bash shell one would typically run:

ffprobe somefile.mkv 2>&1 |grep Duration

This would result in an output like

 Duration: 00:04:44.94, start: 0.000000, bitrate: 128 kb/s

This works because 2>&1 redirects stderr to stdout, which is in turn piped to grep.

If you were to try to run this in Julia

julia> run(`ffprobe somefile.mkv 2>&1 |grep Duration`)

you will get errors. Julia does not like pipes (|) inside a backtick command (for very sensible reasons: it runs the program directly instead of handing the string to a shell). Instead, you should be using Julia’s pipeline function. The redirection 2>&1 will not work either. So the best thing to use is an instance of Pipe. This was not in the manual; I stumbled upon it in an issue discussion on GitHub. So a good way to do what I am after is to run:

julia> p=Pipe()
Pipe(uninit => uninit, 0 bytes waiting)
 
julia> run(pipeline(`ffprobe -i  somefile.mkv`,stderr=p))

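This creates a pipe object p, and the stderr keyword tells pipeline to send the command’s stderr into it. Next we need to close the input end of the pipe (p.in, the end that gets written to). Otherwise Julia itself still holds that end open, and anything reading from p would wait forever for more data instead of reaching end-of-file.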

julia> close(p.in)

Finally we can use the pipe with grep to filter the output.

julia> readstring(pipeline(p,`grep Duration`))
"  Duration: 00:04:44.94, start: 0.000000, bitrate: 128 kb/s\n"
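The pipeline(p, `grep Duration`) call is how Julia spells the shell’s |. The same function also chains two ordinary commands, each kept in its own backticks. A minimal sketch, assuming a Unix-like system with printf and grep on the PATH; the input lines are invented to resemble ffprobe output.

```julia
# Each command stays in its own backticks; pipeline connects the
# stdout of the first command to the stdin of the second.
producer = `printf 'video stream\nDuration: 00:04:44.94\naudio stream\n'`
filtered = pipeline(producer, `grep Duration`)

# read(cmd, String) runs the whole chain and returns its stdout as a String.
result = read(filtered, String)
print(result)
```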

We can then do a little regex magic to get the duration we are after.

julia> matchall(r"(\d{2}:\d{2}:\d{2}\.\d{2})",ans)[1]
"00:04:44.94"
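readstring and matchall date from Julia 0.6 and are gone in Julia 1.0, where read(x, String) and match/eachmatch take their place. The sketch below redoes the whole recipe for Julia 1.x with two hypothetical helpers, capture_stderr and duration_seconds, and does the grep step in Julia. A small sh command that writes an invented Duration line to stderr stands in for ffprobe, so the example runs without a video file (it assumes a Unix-like system with sh).

```julia
# Hypothetical helper: run `cmd` and return everything it wrote to stderr.
function capture_stderr(cmd::Cmd)
    err = Pipe()
    # Send the command's stderr into the pipe; stdout is left untouched.
    proc = run(pipeline(cmd, stderr = err); wait = false)
    # Close our copy of the write end so reading sees EOF once the child exits.
    close(err.in)
    text = read(err, String)   # drain the pipe before waiting on the process
    wait(proc)
    return text
end

# Hypothetical helper: convert "Duration: HH:MM:SS.ss" into seconds.
# The dot is escaped so it only matches a literal decimal point.
function duration_seconds(text::AbstractString)
    m = match(r"Duration: (\d{2}):(\d{2}):(\d{2}\.\d{2})", text)
    m === nothing && return nothing
    hours, minutes, secs = m.captures
    return 3600 * parse(Int, hours) + 60 * parse(Int, minutes) + parse(Float64, secs)
end

# Stand-in for ffprobe: a shell command that prints a made-up report to stderr.
fake_probe = `sh -c 'echo "  Duration: 00:04:44.94, start: 0.000000, bitrate: 128 kb/s" 1>&2'`
report = capture_stderr(fake_probe)

# The equivalent of `grep Duration`, done in Julia.
duration_lines = filter(line -> occursin("Duration", line), split(report, '\n'))
total_seconds = duration_seconds(report)
```

Reading the pipe before calling wait matters for chattier commands: if nothing drains the pipe, a program that fills the pipe buffer blocks, and waiting on it first would hang.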

HDF5 in Julia

By: Metals, Magnets, and Miscellaneous Materials

Re-posted from: http://albi3ro.github.io/M4/programming/HDF5.html

So, last summer, my program was producing three dimensional data, and I needed a way to export and save that data from my C++ program. Simple ASCII files, my default method, no longer covered my needs. Of course, I wasn’t the first person to encounter this problem, so I discovered the HDF5 standard.

Instead of storing data in a human readable format like ASCII, the Hierarchical Data Format, HDF, stores data in binary format. This preserves the shape of the data in the computer and keeps it at its minimum size. WOHOO!!

Sadly, the syntax for HDF5 in C++ and Fortran is just as bad as FFTW or OpenBLAS. But happily, just like FFTW and OpenBLAS, HDF5 has wonderful syntax in Julia and Python, among others.

So how does it work?

We don’t just print a single variable. Each HDF5 file is like its own file system. In my home directory, I have my documents folder, my programming folder, my pictures, configuration files,… and inside each folder I can have subfolders or files.

The same is true for an HDF5 file. We have the root, and then we have groups and subgroups. A group is like a folder. Then we can have datasets: the objects that actually hold data, like files.

Installing the Package

While running Pkg.add("HDF5") should hopefully install the HDF5 library, additional steps may be required. I remember having a horrible time with the HDF5 installation when using C++ a year ago. If at all possible, just use a package manager, and do not try to install it from source! See the HDF5.jl or HDF Group pages for details.

using HDF5;

Hello World

First, let's open a file and then write some data to it.

We can open a file in three ways:

Mode   Meaning
"w"    Write. Will overwrite anything already there.
"r"    Read-only.
"r+"   Read-write, preserving existing contents.

If we open a file this way, we must always remember to close it with close(fid).

fid=h5open("test.h5","w")

fid["string"]="Hello World"

close(fid)
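Rather than remembering close by hand, HDF5.jl also accepts a do-block after h5open, which closes the file when the block finishes, even if an error is thrown. A minimal sketch, assuming a recent HDF5.jl install; the scratch path built with mktempdir is only there so the example leaves test.h5 alone.

```julia
using HDF5

# Hypothetical scratch location so the example does not touch test.h5.
path = joinpath(mktempdir(), "hello.h5")

# The file is closed automatically when the do-block ends.
h5open(path, "w") do fid
    fid["string"] = "Hello World"
end

# h5open returns whatever the block returns, here the string we read back.
greeting = h5open(path, "r") do fid
    read(fid, "string")
end
println(greeting)
```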

Now let's see if we were successful by reading. Instead of reading the dataset, we are going to check out the structure of the file first.

names(fid) tells us what is inside the location fid.

dump(fid) is much more in depth, exploring everything below fid. If we had a bunch of groups and subgroups, it would go down each one to see what was there.

Both these functions help you find your way around a file.

fid=h5open("test.h5","r")

println("names \n",names(fid))

println("\n dump")
dump(fid)

close(fid)
names
Union{ASCIIString,UTF8String}["string"]

 dump
HDF5.HDF5File len 1
  string: HDF5Dataset () : Hello World

Reading Data

When reading data, we need to know the difference between a dataset and the data the dataset contains.

Look at the example below:

fid=h5open("test.h5","r")

dset=fid["string"]
println("the dataset: \t", typeof(dset))

data=read(dset)
println("the string: \t", typeof(data),"\t",data)

data2=read(fid,"string")
println("read another way: \t", typeof(data2),"\t",data2)

close(fid)
the dataset: 	HDF5.HDF5Dataset
the string: 	ASCIIString	Hello World
read another way: 	ASCIIString	Hello World

A dataset is like the filename “fairytale.txt”, so we then need to read the file to get “Once upon a time …”.
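Because a dataset is only a handle, you can ask it for its size, or read just part of it, before (or instead of) loading all of the data. A sketch assuming a recent HDF5.jl, where indexing a dataset with ranges reads only that block from disk; the file name and matrix are invented for illustration.

```julia
using HDF5

path = joinpath(mktempdir(), "dataset.h5")   # hypothetical scratch file

h5open(path, "w") do fid
    # A 3x4 matrix holding 1.0 to 12.0 in column-major order.
    fid["matrix"] = reshape(collect(1.0:12.0), 3, 4)
end

dims, corner = h5open(path, "r") do fid
    dset = fid["matrix"]            # a handle to the dataset; no values read yet
    (size(dset), dset[1:2, 1:2])    # shape from metadata, then a 2x2 block
end
println(dims)
println(corner)
```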

Groups

I’ve talked about groups, but we haven’t done anything with them yet. Let’s make some!

Here we use g_create to create two groups, one inside the other. The subgroup’s parent is g, so we have to create it at location g. Just like in a filesystem, its name/path is nested within its parent’s path.

fid=h5open("test.h5","w")

g=g_create(fid,"mygroup");
h=g_create(g,"mysubgroup");

dump(fid)

println("\n path of h:  ",name(h))

close(fid)
HDF5.HDF5File len 1
  mygroup: HDF5.HDF5Group len 1
    mysubgroup: HDF5.HDF5Group len 0

 path of h:  /mygroup/mysubgroup
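In recent HDF5.jl releases, g_create is called create_group and keys takes over from names (worth checking against your installed version). The sketch below builds the same two groups plus a small dataset, then uses a hypothetical recursive helper, object_paths, to list every path in the file: roughly what dump showed above, but returned as data you can work with.

```julia
using HDF5

# Hypothetical helper: list the path of every object below `obj`,
# descending into groups the way dump does.
function object_paths(obj, prefix = "")
    paths = String[]
    for k in keys(obj)
        p = prefix * "/" * k
        push!(paths, p)
        child = obj[k]
        if child isa HDF5.Group
            append!(paths, object_paths(child, p))
        end
        close(child)
    end
    return paths
end

path = joinpath(mktempdir(), "groups.h5")   # hypothetical scratch file

found = h5open(path, "w") do fid
    g = create_group(fid, "mygroup")   # newer name for g_create
    create_group(g, "mysubgroup")
    g["values"] = [1, 2, 3]            # a dataset inside the group
    object_paths(fid)
end
println(found)
```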

Attributes

Say in a file I want to include the information that I ran the simulation with 100 sites, at 1 Kelvin, for 100,000 timesteps. Instead of creating new datasets for each of these individual numbers, I can create attributes and tie them to either a group or a dataset.

fid=h5open("test.h5","w")

fid["data"]=randn(3,3);

attrs(fid["data"])["Temp"]="1";
attrs(fid["data"])["N Sites"]="100";

close(fid)

fid=h5open("test.h5","r")

dset=fid["data"];

println("typeof attrs: \t", typeof(attrs(dset)))
println("Temp: \t",read( attrs(dset),"Temp"  ))
println("N Sites: \t",read(  attrs(dset),"N Sites"  ))

close(fid)
typeof attrs: 	HDF5.HDF5Attributes
Temp: 	1
N Sites: 	100
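The example above stores Temp and N Sites as strings. Attributes can also hold numbers directly, which saves parsing them later. A sketch assuming a recent HDF5.jl, where write_attribute and read_attribute are the function-style attribute calls; the Timesteps attribute mirrors the 100,000 timesteps mentioned earlier.

```julia
using HDF5

path = joinpath(mktempdir(), "attrs.h5")   # hypothetical scratch file

h5open(path, "w") do fid
    fid["data"] = randn(3, 3)
    dset = fid["data"]
    # Store the run parameters as numbers rather than strings.
    write_attribute(dset, "Temp", 1.0)
    write_attribute(dset, "N Sites", 100)
    write_attribute(dset, "Timesteps", 100_000)
end

temp, nsites, steps = h5open(path, "r") do fid
    dset = fid["data"]
    (read_attribute(dset, "Temp"),
     read_attribute(dset, "N Sites"),
     read_attribute(dset, "Timesteps"))
end
println("Temp: ", temp, "  N Sites: ", nsites, "  Timesteps: ", steps)
```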

Final Tips

Before diving in, think about whether you need HDF5 or not. How large and complex is your data? Is it worth the time to learn? While the syntax might be relatively simple in Julia, ASCII files are still much easier to deal with.

If you are going to play around with or use this format, I recommend getting an HDF viewer, like HDFView. While code gives you much more control, sometimes it is just that much simpler to check that everything is working with a GUI.

For more information, check out the HDF5.jl package page or the HDF Group website.

I’ve shown some of the basic functionality in simple test cases. If you want more control, you might just have to work a bit for it.