Parallel Processing in Julia: Bootstrapping the MLE

By: Bradley Setzler


* Scripts that reproduce the results of this tutorial in Julia are located here and here.

To review our extended example, we first simulated the OLS DGP, solved for the MLE of OLS, bootstrapped the MLE, and computed various non-parametric p-values.

The only part of our previous work that may have slowed down your computer was forming the 1,000 bootstrap samples (samples). In this post, we will see how to speed up the bootstrapping by using more than one processor on our computers.
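As a quick sketch of the idea (assuming a four-core machine), worker processes are added with addprocs and counted with nprocs:

```julia
nprocs()      # 1: only the original process, no parallelism yet
addprocs(4)   # launch four worker processes
nprocs()      # 5: the original process plus the four workers
```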

The Estimator that We Wish to Bootstrap

We are interested in the following MLE estimator, adapted from our previous work. The only difference is that I will include more independent variables in X by setting K=10 instead of 4, knowing that this will slow down our computers. I have to adjust trueParams and some other parts of the code to make room for the additional independent variables. First, we import the two needed packages and generate the data:

using Distributions
using Optim

N = 1000   # sample size (assumed here; set it as in the previous post)
K = 10     # number of independent variables, up from 4

genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [-K/2:K/2]*.02
Y = X*trueParams + epsilon

Next, we define the likelihood of OLS:

function loglike(rho,x,y)
   beta = rho[1:K+1]
   sigma2 = exp(rho[K+2])
   residual = y-x*beta
   dist = Normal(0, sqrt(sigma2))   # Normal takes the standard deviation, not the variance
   contributions = logpdf(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood
end

Finally, we define the bootstrap function, which includes within it a wrapper for the likelihood:

M = 500                 # bootstrap subsample size, M < N (value assumed; set it as in the previous post)
params0 = zeros(K+2)    # starting values for the optimizer

function bootstrapSamples(B)
   println("hi")
   samples = zeros(B,K+2)
   for b=1:B
      theIndex = randperm(N)[1:M]
      x = X[theIndex,:]
      y = Y[theIndex,:]
      function wrapLoglike(rho)
         return loglike(rho,x,y)
      end
      samples[b,:] = optimize(wrapLoglike,params0,method=:cg).minimum
   end
   samples[:,K+2] = exp(samples[:,K+2])
   println("bye")
   return samples
end

The reason for the cute println statements will become apparent later. Save all of this in a file called “bootstrapFunction.jl”.

Parallel Bootstrapping

We now have the function, bootstrapSamples, saved in the file “bootstrapFunction.jl”, that we wish to run in parallel. The idea is that, instead of running the MLE B times on one processor, we run it B/4 times on each of four processors, then collect the results back on the first processor. Unfortunately, Julia makes it difficult to parallelize functions within the same file in which they are defined; the user must explicitly tell Julia about every parameter and function that is needed on each processor, which is a frustrating process of copy-pasting the @everywhere command all around your code.
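To see why, here is a sketch of what the single-file approach would require (hypothetical code, in the same older Julia dialect as the rest of this post):

```julia
addprocs(4)

# Every definition the workers need must be marked @everywhere:
@everywhere using Distributions
@everywhere using Optim
@everywhere N = 1000
@everywhere K = 10
# ...and likewise for X, Y, loglike, bootstrapSamples, etc.
```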

Instead, we separate our work into two files, the first one you have already created. Create a second file in the same location on your computer as “bootstrapFunction.jl”. In the second file, write the following commands:

addprocs(4)
require("bootstrapFunction.jl")

b = 250
samples_pmap = pmap(bootstrapSamples,[b,b,b,b])
samples = vcat(samples_pmap[1],samples_pmap[2],samples_pmap[3],samples_pmap[4])

The command addprocs(4) adds four worker processes to your Julia session. The command require is the key to getting this right: it ensures that all of the functions and parameters contained in “bootstrapFunction.jl” are available to each processor. pmap is the function that actually applies bootstrapSamples to each value in the list [b,b,b,b]. I set the list to [b,b,b,b], where b=250, so that bootstrapSamples collects 250 bootstrap samples on each of the four processors, for 1,000 in total. The resulting matrices from all four processors are stored in the length-4 vector called samples_pmap. Finally, we vertically stack (vcat) the four matrices into one matrix, samples, which is exactly the matrix of bootstrap samples we have seen previously.
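As a minimal illustration of how pmap distributes work (a toy example, not part of the bootstrap code):

```julia
# pmap hands each element of the list to whichever worker is free,
# and returns the results in the original order.
squares = pmap(x -> x^2, [1, 2, 3, 4])
# squares is [1, 4, 9, 16]
```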

Tip: For parallel processing in Julia, separate your code into two files, one file containing the functions and parameters that need to be run in parallel, and the other file managing the processing and collecting results. Use the require command in the managing file to import the functions and parameters to all processors.


Once you have run the second file, you can do speed tests to compare the two ways of getting 1,000 bootstrap samples. We use the @elapsed command to time each approach. Here are the results:

julia> @elapsed pmap(bootstrapSamples,[b,b,b,b])
From worker 2: hi
From worker 3: hi
From worker 4: hi
From worker 5: hi
From worker 5: bye
From worker 2: bye
From worker 4: bye
From worker 3: bye

Each of the processors tells us “hi” and “bye” when it begins and finishes its job, respectively. The total time was just under 9 seconds. By comparison, the following code will compute 1,000 bootstrap samples on only one processor, as in our previous work:

julia> @elapsed bootstrapSamples(B)

So, in the absence of parallel processing, the same result would require just over 21 seconds. Thus, parallel processing on a four-core personal computer allowed us to reduce the time required to bootstrap our estimator by a factor of more than 2. The speedup falls short of the full factor of 4 because of the overhead of launching the workers and communicating data between them.

Bradley J. Setzler