Author Archives: A Technical Blog -- julia

Thread Parallelism in Julia

Julia has 3 kinds of parallelism.
The well known, safe, slowish and easyish, distributed parallelism, via pmap, @spawn and remotecall.
The wellish known, very safe, very easy, not-actually-parallelism, asynchronous parallelism via @async.
And the more obscure, less documented, experimental, really unsafe, shared memory parallelism via @threads.
It is the last we are going to talk about today.
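
As a rough side-by-side, here is a minimal sketch of the three styles doing the same trivial job (squaring four numbers). It assumes a recent Julia 1.x, where pmap lives in the Distributed standard library; the variable names are made up for illustration.

```julia
using Distributed  # standard library in Julia 1.x; provides pmap

# Distributed: pmap farms the calls out to worker processes.
# With no workers added, it simply runs on the current process.
squares_distributed = pmap(x -> x^2, 1:4)

# Asynchronous: tasks take turns rather than running simultaneously;
# @sync waits for all of them to finish.
squares_async = zeros(Int, 4)
@sync for i in 1:4
    @async squares_async[i] = i^2
end

# Threaded: iterations are split across threads that share memory.
# Each iteration writes only to its own slot, so there is no data race here.
squares_threaded = zeros(Int, 4)
Threads.@threads for i in 1:4
    squares_threaded[i] = i^2
end
```

The threaded loop is only safe because no two iterations touch the same element; anything that shares mutable state between iterations needs locks or atomics.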

I’m not sure if I can actually teach someone how to write threaded code.
Let alone efficient threaded code.
But this is me giving it a shot.
The example here is going to be fairly complex.
For a much simpler example of use,
on a problem that is more easily parallelizable,
see my recent Stack Overflow post on parallelizing sorting.

(Spoilers: in the end I don’t manage to extract any serious performance gains from parallelizing this prime search. Unlike parallelizing that sorting. Parallelizing sorting worked out great.)


Lazy Sequences in Julia

I wanted to talk about using coroutines for lazy sequences in Julia,
because I am rewriting CorpusLoaders.jl to do so in a non-deprecated way.

This basically corresponds to C#’s yield return and Python’s yield statements.
(Many other languages also have this, but I think these are the best known.)

The goal of using lazy sequences is to be able to iterate through something
without having to load it all into memory,
since you are only going to be processing it a single element at a time.
Or potentially, for some kind of moving average or for acausal language modelling,
a single window of elements at a time.
Point is, at no point do I ever want to load all 20GB of Wikipedia into my program,
nor all 100GB of Amazon product reviews.

And I especially do not want to load $\infty$ bytes of every prime number.
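
To make the idea concrete, here is a minimal sketch of a lazy, never-ending sequence of primes built on a Channel. It assumes Julia 1.3 or later (for the `Channel{T}(f)` constructor); `lazy_primes` is a hypothetical helper written for this illustration, not something from CorpusLoaders.jl, and it uses naive trial division.

```julia
# Hypothetical helper: lazily yields primes, one at a time, forever.
function lazy_primes()
    Channel{Int}() do ch
        known = Int[]        # primes found so far
        candidate = 2
        while true
            # Trial division against every prime found so far.
            if all(candidate % p != 0 for p in known)
                push!(known, candidate)
                put!(ch, candidate)  # blocks until a consumer takes the value
            end
            candidate += 1
        end
    end
end

# Only as many primes as are asked for ever get computed.
first_five = collect(Iterators.take(lazy_primes(), 5))
```

Because the channel is unbuffered, the producer is suspended at `put!` until the consumer asks for the next element, which plays the role of `yield` in Python.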


Using julia -L startupfile.jl, rather than machinefiles for starting workers.

If one wants to have full control over the worker processes, the method to use is addprocs, together with the -L startupfile.jl command-line argument when you start Julia.
See the documentation for addprocs.

The simplest way to add worker processes to Julia is to invoke it with julia -p 4.
The -p 4 argument says to start 4 worker processes on the local machine.
For more control, one uses julia --machinefile ~/machines,
where ~/machines is a file listing the hosts.
The machinefile is often just a list of hostnames/IP-addresses,
but sometimes is more detailed.
Julia will connect to each host and start a number of workers on each equal to the number of cores.

Even the most detailed machinefile doesn’t give full control.
For example, you cannot specify the topology, or the location of the julia executable.

For full control, one should invoke addprocs directly,
and to do so, one should use julia -L startupfile.jl.
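
A minimal sketch of what such a startup file might contain, assuming Julia 1.x, where addprocs lives in the Distributed standard library. The worker count and topology are illustrative choices only, and the remote variant in the comment uses a made-up host and path.

```julia
# startupfile.jl (hypothetical contents), loaded with: julia -L startupfile.jl
using Distributed

# Start two workers on the local machine. With :master_worker,
# workers connect only to the master process, not to each other.
addprocs(2; topology=:master_worker)

# For remote machines, addprocs also accepts a list of hosts, plus keywords
# such as exename for the location of the julia executable, e.g.
#   addprocs(["user@example-host"]; exename="/path/to/julia", topology=:master_worker)
# (host and path above are placeholders)
```

Running `julia -L startupfile.jl` then drops you into a session with the workers already attached, which you can confirm with `nworkers()`.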
