A lot of Julia’s functionality is implemented as add-on packages (or “modules”). An extensive (though possibly not exhaustive) list of available packages can be found at http://pkg.julialang.org/. If you browse through that list, I can guarantee that you will find a number of packages that pique your curiosity. How to install them? Read on.
Package management is handled via Pkg. Pkg.dir() will tell you where the installed packages are stored on your file system. Before installing any new packages, always call Pkg.update() to update your local metadata and repository (it will also update any installed packages to their most recent versions).
Adding a Package
Installing a new package is done with Pkg.add(). Any dependencies are handled automatically during the install process.
julia> Pkg.add("VennEuler")
INFO: Cloning cache of VennEuler from git://github.com/HarlanH/VennEuler.jl.git
INFO: Installing VennEuler v0.0.1
INFO: Building NLopt
INFO: Building Cairo
INFO: Package database updated
Pkg.available() generates a complete list of all available packages while Pkg.installed() or Pkg.status() can be used to find the versions of installed packages.
Pkg.pin() will fix a package at a specific version (no updates will be applied). Pkg.free() releases the effects of Pkg.pin().
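The commands above are from an early Julia release. Here is a rough sketch of how the same queries look in Julia 1.x. There, Pkg is a standard library that has to be loaded with using Pkg. Pkg.add, Pkg.update, Pkg.pin and Pkg.free keep their names. Pkg.installed() was deprecated in Julia 1.4 in favour of the (officially experimental) Pkg.dependencies(). The loop below is only an illustration; it does not install anything, so it runs offline.

```julia
# Sketch for Julia 1.x: Pkg must be loaded explicitly.
using Pkg

Pkg.status()   # prints the packages (and their versions) in the active environment

# Pkg.dependencies() returns a Dict mapping each package UUID to a PackageInfo record.
deps = Pkg.dependencies()
for (uuid, info) in deps
    if info.is_direct_dep   # skip indirect dependencies pulled in by other packages
        println(info.name, " ", info.version, info.is_pinned ? " (pinned)" : "")
    end
end
```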
Package Contents
The using directive loads the functions exported by a package into the global namespace. You can get a view of the capabilities of a package by typing its name followed by a period and then hitting the Tab key. Alternatively, names() will give a list of symbols exported by a package.
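As a small illustration, here is names() applied to a package. This assumes Julia 1.x and uses the Dates standard library only because it ships with Julia. It also shows that, after using, exported functions can be called without the package prefix.

```julia
# Dates is used here purely as a convenient example package.
using Dates

exported = names(Dates)   # Vector{Symbol} of the names exported by Dates
println(length(exported), " exported names, e.g. ", exported[1:5])

# Thanks to `using`, exported functions are available without the Dates. prefix.
today_date = today()
println(today_date isa Date)
```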
The package manager provides a host of other functionality, which you can read about in the package manager documentation. From tomorrow I’ll start looking at specific packages. To get yourself prepared for that, why not go ahead and install the following packages: Cpp, PyCall, DataArrays, DataFrames and RCall.
Unlike many other languages, where parallel computing is bolted on as an afterthought, Julia was designed from the start with parallel computing in mind. It has a number of native features which lend themselves to efficient implementation of parallel algorithms. It also has packages which facilitate cluster computing (using MPI, for example). We won’t be looking at those; instead we’ll focus on coroutines, generic parallel processing and parallel loops.
Coroutines
Coroutines are not strictly parallel processing (in the sense of “many tasks running at the same time”) but they provide a lightweight mechanism for having multiple tasks defined (if not active) at once. According to Donald Knuth, coroutines are generalised subroutines (with which we are probably all familiar).
Under these conditions each module may be made into a coroutine; that is, it may be coded as an autonomous program which communicates with adjacent modules as if they were input or output subroutines. Thus, coroutines are subroutines all at the same level, each acting as if it were the master program when in fact there is no master program. There is no bound placed by this definition on the number of inputs and outputs a coroutine may have.
Conway, Design of a Separable Transition-Diagram Compiler, 1963.
Coroutines are implemented using produce() and consume(). In a moment you’ll see why those names are appropriate. To illustrate we’ll define a function which generates elements from the Lucas sequence. For reference, the first few terms in the sequence are 2, 1, 3, 4, 7, … If you know about Python’s generators then you’ll find the code below rather familiar.
julia> function lucas_producer(n)
           a, b = (2, 1)
           for i = 1:n
               produce(a)
               a, b = (b, a + b)
           end
       end
lucas_producer (generic function with 1 method)
This function is then wrapped in a Task, for example with lucas_task = Task(() -> lucas_producer(10)). A newly created Task has state :runnable.
Now we’re ready to start consuming data from the Task. Data elements can be retrieved individually or via a loop (in which case the Task acts like an iterable object and no consume() is required).
julia> consume(lucas_task)
2
julia> consume(lucas_task)
1
julia> consume(lucas_task)
3
julia> for n in lucas_task
           println(n)
       end
4
7
11
18
29
47
76
Between invocations the Task is effectively asleep. The task temporarily springs to life every time data is requested, before becoming dormant once more.
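produce() and consume() were removed in Julia 1.0. What follows is a sketch of the same producer on Julia 1.x, where a Channel plays their role. Channel(f) runs f in its own task and closes the channel when f returns. Because the channel is unbuffered, put! waits until a consumer calls take!, so the producer sleeps between requests just as described above. Taking three values corresponds to the three consume() calls, and iterating drains the rest, like the for loop.

```julia
# Julia 1.x sketch: a Channel replaces produce()/consume().
function lucas_producer(ch::Channel, n)
    a, b = 2, 1
    for i = 1:n
        put!(ch, a)        # blocks until a consumer takes the value
        a, b = b, a + b
    end
end                        # returning closes the channel bound to this task

lucas = Channel(ch -> lucas_producer(ch, 10))   # unbuffered (size 0) by default

first_three = [take!(lucas) for _ in 1:3]   # like calling consume() three times
remaining = collect(lucas)                  # iteration stops once the channel closes
println(first_three, " then ", remaining)
```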
It’s possible to simultaneously set up an arbitrary number of coroutine tasks.
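To make that concrete, here is a small sketch (again assuming Julia 1.x Channels) with two independent producers consumed alternately. Each producer keeps its own state while the other one runs. The helper seq_producer is hypothetical. It generates any sequence with the Lucas/Fibonacci recurrence from the given starting pair.

```julia
# Hypothetical helper: emits n terms of the recurrence a, b -> b, a + b.
function seq_producer(ch::Channel, a, b, n)
    for _ in 1:n
        put!(ch, a)
        a, b = b, a + b
    end
end

lucas_ch = Channel(ch -> seq_producer(ch, 2, 1, 5))   # Lucas: 2, 1, 3, ...
fib_ch   = Channel(ch -> seq_producer(ch, 0, 1, 5))   # Fibonacci: 0, 1, 1, ...

# Alternate between the two tasks; each resumes exactly where it left off.
interleaved = [(take!(lucas_ch), take!(fib_ch)) for _ in 1:5]
println(interleaved)
```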
Parallel Processing
Coroutines don’t really feel like “parallel” processing because they are not working simultaneously. However, it’s straightforward to get Julia to metaphorically juggle many balls at once. The first thing you’ll need to do is launch the interpreter with multiple worker processes.
$ julia -p 4
There’s always one more process than specified on the command line (we specified the number of worker processes; add one for the master process).
julia> nprocs()
5
julia> workers() # Identifiers for the worker processes.
4-element Array{Int64,1}:
2
3
4
5
We can launch a job on one of the workers using remotecall().
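Below is a hedged sketch of remotecall() on Julia 1.x. There, the parallel primitives live in the Distributed standard library, and remotecall takes the function first and the worker id second (early releases used the opposite order). addprocs() adds workers from inside a running session, as an alternative to julia -p.

```julia
using Distributed
nprocs() == 1 && addprocs(2)    # in-session alternative to starting with julia -p 2

w = first(workers())
fut = remotecall(rand, w, 3)    # schedule rand(3) on worker w; returns a Future immediately
v = fetch(fut)                  # block until the worker's result arrives
println("worker ", w, " returned ", v)

# remotecall_fetch combines the call and the fetch in one step.
id = remotecall_fetch(myid, w)
```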
@spawn and @spawnat are macros which launch jobs on workers: @spawn lets Julia pick the worker, while @spawnat runs the job on the worker you specify. The @everywhere macro executes code on all processes (including the master).
julia> @everywhere p = 5
julia> @everywhere println(@sprintf("ID %d: %f %d", myid(), rand(), p))
ID 1: 0.686332 5
From worker 4: ID 4: 0.107924 5
From worker 5: ID 5: 0.136019 5
From worker 2: ID 2: 0.145561 5
From worker 3: ID 3: 0.670885 5
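The next sketch shows the macros on Julia 1.x, under the same Distributed setup as above. Note that the plain @spawn form has been deprecated there in favour of @spawnat :any.

```julia
using Distributed
nprocs() == 1 && addprocs(2)

@everywhere p = 5                  # define the global p on the master and every worker

w = last(workers())
fut = @spawnat w (myid(), p)       # evaluate the tuple on worker w
who, val = fetch(fut)
println("worker ", who, " sees p = ", val)

# Let Julia choose the process (the role @spawn used to play).
any_id = fetch(@spawnat :any myid())
```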
Parallel Loop and Map
To illustrate how easy it is to set up parallel loops, let’s first consider a simple serial implementation of a Monte Carlo technique to estimate π.
julia> function findpi(n)
           inside = 0
           for i = 1:n
               x, y = rand(2)
               if (x^2 + y^2 <= 1)
                   inside += 1
               end
           end
           4 * inside / n
       end
findpi (generic function with 1 method)
The quality of the result as well as the execution time (and memory consumption!) depend directly on the number of samples.
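Part of the memory cost comes from rand(2), which allocates a new two-element array on every iteration. This sketch uses a hypothetical variant, findpi_scalar, that draws two scalars instead. It prints the estimate and its error for growing sample sizes, so you can see the accuracy improve as n increases.

```julia
using Random

# Hypothetical variant of findpi: two scalar draws, no per-iteration array.
function findpi_scalar(n)
    inside = 0
    for i = 1:n
        x, y = rand(), rand()
        if x^2 + y^2 <= 1
            inside += 1
        end
    end
    4 * inside / n
end

Random.seed!(42)   # fixed seed so repeated runs give the same estimates
for n in (10^2, 10^4, 10^6)
    est = findpi_scalar(n)
    println(n, " samples: ", est, " (error ", abs(est - π), ")")
end
```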
The parallel version is implemented using the @parallel macro, which takes a reduction operator (in this case +) as its first argument.
julia> function parallel_findpi(n)
           inside = @parallel (+) for i = 1:n
               x, y = rand(2)
               x^2 + y^2 <= 1 ? 1 : 0
           end
           4 * inside / n
       end
parallel_findpi (generic function with 1 method)
There is significant overhead associated with setting up the parallel jobs, so the parallel version actually performs worse for a small number of samples. With enough samples, however, the speedup becomes readily apparent.
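In Julia 1.x the @parallel macro was renamed @distributed and moved into the Distributed standard library. Here is a sketch of the same reduction under that name, assuming a couple of workers have been added. To see the overhead and speedup described above on your own machine, you could compare @time distributed_findpi(n) against the serial version for small and large n.

```julia
using Distributed
nprocs() == 1 && addprocs(2)

function distributed_findpi(n)
    # (+) combines the 0/1 results from every iteration across all workers.
    inside = @distributed (+) for i = 1:n
        x, y = rand(), rand()
        x^2 + y^2 <= 1 ? 1 : 0
    end
    4 * inside / n
end

println(distributed_findpi(10^6))
```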
For reference, these results were achieved with 4 worker processes on a DELL laptop with the following CPU:
root@propane:~# lshw | grep product | head -n 1
product: Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz
More information on parallel computing facilities in Julia can be found in the documentation. As usual, the code for today’s Julia journey can be found on GitHub.
Metaprogramming in Julia is a big topic and it’s covered extensively both in the official documentation and in the Introducing Julia wikibook. The idea behind metaprogramming is to write code which itself will either generate or change other code. There are two main features of the language which support this idea:
code representation (expressions and symbols) and
macros.
Code Representation
A symbol (data type Symbol) represents a name, such as a variable name, as it appears in unevaluated code. Symbols are therefore a means to refer to a variable itself rather than the value it contains.
julia> n = 5 # Assign to variable n.
5
julia> n # Refer to contents of variable n.
5
julia> typeof(n)
Int64
julia> :n # Refer to variable n itself using quote operator.
:n
julia> typeof(:n)
Symbol
julia> eval(:n)
5
julia> E = :(2x + y) # Unevaluated expression (an Expr, not a Symbol).
:(2x + y)
julia> typeof(E)
Expr
The quote operator, :, prevents the evaluation of its argument.
Expressions are made up of three parts: the operation (head), the arguments to that operation (args) and finally the return type from the expression (typ).
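The following sketch looks at these parts directly. On Julia 1.x an Expr has only the head and args fields, since the typ field was removed in Julia 0.7. The expression 2x + y is a call to + whose second argument is itself a nested call to *.

```julia
E = :(2x + y)

println(E.head)        # the kind of expression: a function call
println(E.args)        # the function (+) followed by its arguments
dump(E)                # prints the whole tree, including the nested 2x call

inner = E.args[2]      # the subexpression 2x, itself an Expr
println(inner.head, " ", inner.args)
```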
We can evaluate an expression using eval(). Not only does eval() return the result of the evaluated expression but it also applies any side effects from the expression (for example, variable assignment).
julia> x = 3; y = 5; eval(E)
11
julia> eval(:(x = 4))
4
julia> eval(E)
13
No real surprises there. But the true potential of all this lies in the fact that the code itself has an internal representation which can be manipulated. For example, we could change the arguments of the expression created above.
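Here is one way such a change could look, as a sketch on Julia 1.x with the same x = 4 and y = 5 as above. Because args is an ordinary mutable array, we can swap the operator and then edit the coefficient inside the nested 2x call.

```julia
x = 4; y = 5
E = :(2x + y)

E.args[1] = :-          # replace the operator: the expression is now 2x - y
println(E, " = ", eval(E))

E.args[2].args[2] = 3   # change the coefficient inside the nested call: 3x - y
println(E, " = ", eval(E))
```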
That still seems a little tame. What about manipulating a function?
julia> F = :(x -> x^2)
:(x->begin # none, line 1:
x^2
end)
julia> eval(F)(2) # Evaluate x -> x^2 for x = 2
4
julia> F.args[2].args[2].args[3] = 3 # Change function to x -> x^3
3
julia> eval(F)(2) # Evaluate x -> x^3 for x = 2
8
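Reaching into F.args by position depends on exactly how the parser lays out the tree. An alternative sketch splices the exponent into a quoted template with $ interpolation. The helper make_power_expr is hypothetical, and the code assumes Julia 1.x.

```julia
# Hypothetical helper: returns the expression x -> x^k with k spliced in.
make_power_expr(k) = :(x -> x^$k)

G = make_power_expr(3)
cube = eval(G)
println(G, " evaluated at 2 gives ", cube(2))

# A whole family of functions generated from one template.
powers = [eval(make_power_expr(k)) for k in 1:4]
```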
Macros
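Macros are a little like functions in that they accept arguments and return a result. However, they differ in that they are expanded when the code is parsed, before it runs. They receive their arguments as unevaluated expressions and return an unevaluated expression, which is then compiled in place of the macro call.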
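As an illustration, here is a minimal sketch of a hypothetical @square macro on Julia 1.x. It splices its argument expression in twice. The esc() calls make the caller's variables keep referring to the caller's scope (macro hygiene), and @macroexpand is the macro form of macroexpand().

```julia
# Hypothetical macro: builds the product of its argument expression with itself.
macro square(ex)
    return :($(esc(ex)) * $(esc(ex)))
end

x = 3
result = @square(x + 1)   # the expansion computes (x + 1) * (x + 1)
println(result)

println(@macroexpand @square(x + 1))   # show the code the macro generated
```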
macroexpand() is used to look at the code generated by the macro. Note that parentheses were automatically inserted to ensure the correct order of operations.
Julia has a plethora of predefined macros which do things like report the execution time of an expression (@time), apply an assertion (@assert), test approximate equality (@test_approx_eq) and execute code only in a UNIX environment (@unix_only).
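@test_approx_eq and @unix_only no longer exist in Julia 1.x. The sketch below uses their usual replacements: @test with the ≈ (isapprox) operator from the Test standard library, and @static combined with Sys.isunix(). It also shows that @time prints timing information but returns the value of the expression.

```julia
using Test

# @time prints elapsed time and allocations, then returns the expression's value.
total = @time sum(1:1_000_000)

@assert total == 500000500000   # throws an AssertionError if the condition is false

# Approximate equality: replacement for @test_approx_eq.
@test 0.1 + 0.2 ≈ 0.3

# Code included only on UNIX-like systems: replacement for @unix_only.
@static if Sys.isunix()
    println("running on a UNIX-like system")
end
```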
The fact that one can use code to build and edit other code made me start thinking about self-replicating machines, self-reconfiguring modular robots, grey goo and utility fog. If we can do it in software, why not in hardware too? More evidence of my tinkering with metaprogramming in Julia can be found on GitHub. No self-reconfiguring modular robots though, I’m afraid.