Very often when I answer questions on Stack Overflow I learn something
new. Recently, when discussing random number generation in this post,
I gave an answer using a practice I knew worked from experience,
but it turned out that I did not really understand why (thanks to
rafak for a great comment).

Let us start with the conclusion from the discussion and then I will expand on it:

Always explicitly pass a random number generator to the rand function in
performance-critical code.

Let us first see a simple example of this rule at work and next try to
understand the reason for this recommendation.

Estimating \(\pi\) using Monte Carlo simulation

Let us write a simple function that approximates \(\pi\) using Monte Carlo
simulation and uses the default global pseudo-random number generator.

function pi_global(n::Int)
    s = 0
    for _ in 1:n
        # count the points that fall inside the quarter of the unit disk
        s += rand()^2 + rand()^2 < 1
    end
    return 4 * s / n
end

The code takes advantage of the well-known fact that if we sample a point
\((x,y)\) uniformly from the \([0,1]^2\) square, the probability that \(x^2+y^2\) is
less than \(1\) is equal to \(\pi/4\).
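
To spell out where \(\pi/4\) comes from (a short derivation added for completeness; \(D\) is just a label introduced here for the quarter of the unit disk lying in the square):

\[
P(x^2 + y^2 < 1) = \frac{\operatorname{area}(D)}{\operatorname{area}([0,1]^2)} = \frac{\pi \cdot 1^2 / 4}{1} = \frac{\pi}{4},
\qquad D = \{(x, y) \in [0,1]^2 : x^2 + y^2 < 1\}.
\]

Multiplying the fraction of sampled points that land in \(D\) by \(4\) therefore gives an estimate of \(\pi\), which is what the function returns.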

We check the runtime of this code:

julia> @time pi_global(10^9)
  6.998930 seconds (19 allocations: 20.188 KiB)

julia> @time pi_global(10^9)
  7.002321 seconds

As you can see, on my laptop it takes around 7 seconds.

Now let us write a function that takes a MersenneTwister generator as an
argument (this is the default pseudo-random number generator in Julia).

using Random

function pi_local(n::Int, rng::MersenneTwister)
    s = 0
    for _ in 1:n
        # same test as before, but drawing from the generator passed in
        s += rand(rng)^2 + rand(rng)^2 < 1
    end
    return 4 * s / n
end

Here is its timing:

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  2.723634 seconds

julia> @time pi_local(10^9, mt)
  2.734530 seconds

Wow! I would not have expected this.

Now let me reveal that I am on Julia 1.5.3. Interestingly, I built my
habits of working with rand back in the Julia 1.0 days. Let us run the same code on
Julia 1.0.5 (which will soon stop being supported). Here are the results:

julia> function pi_global(n::Int)
           s = 0
           for _ in 1:n
               s += rand()^2 + rand()^2 < 1
           end
           return 4 * s / n
       end
pi_global (generic function with 1 method)

julia> @time pi_global(10^9)
  2.939260 seconds (44.35 k allocations: 2.366 MiB)

julia> @time pi_global(10^9)
  2.891349 seconds (6 allocations: 192 bytes)

julia> using Random

julia> function pi_local(n::Int, rng::MersenneTwister)
           s = 0
           for _ in 1:n
               s += rand(rng)^2 + rand(rng)^2 < 1
           end
           return 4 * s / n
       end
pi_local (generic function with 1 method)

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  3.129134 seconds (30.73 k allocations: 1.574 MiB)

julia> @time pi_local(10^9, mt)
  3.115317 seconds (6 allocations: 192 bytes)

We see that there is a huge regression in the performance of rand() between
the two versions of Julia, while passing the generator explicitly did not get
slower. Let us find out the reason for this.

Digging into the rand() implementation

We switch back to Julia 1.5.3 and will stick to it till the end of this post.

First we do a quick benchmark (I am using the same Julia 1.5.3 session as
above):

julia> using BenchmarkTools

julia> @btime rand()
  4.784 ns (0 allocations: 0 bytes)

julia> @btime rand($mt)
  2.890 ns (0 allocations: 0 bytes)

There is indeed a significant difference in performance. So what does rand()
do that costs so much? Let us look at the definition of the relevant rand method
(it is easy to find by writing @edit rand()):

rand(rng::AbstractRNG=default_rng(), ::Type{X}=Float64) where {X} =
    rand(rng, Sampler(rng, X, Val(1)))

We can see that the only difference between rand() and rand(mt) is that the
former calls the default_rng() function (it is defined in the Random module).
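
A quick way to convince yourself of this is to check that rand() and rand(Random.default_rng()) draw from the same stream. The sketch below is only an illustration; the seed 1234 and the names a and b are arbitrary choices made here, not part of the original discussion:

```julia
using Random

# Seed the global generator, then draw with the zero-argument form.
Random.seed!(1234)
a = rand()

# Re-seed identically, then draw by passing the default generator explicitly.
Random.seed!(1234)
b = rand(Random.default_rng())

# Both calls consume the same stream, so the two values should be equal.
println(a == b)
```

If the two values match, the zero-argument call is doing nothing more than looking up the default generator before sampling.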

In a similar way as above we dig down to the relevant definition:

const THREAD_RNGs = MersenneTwister[]
@inline default_rng() = default_rng(Threads.threadid())
@noinline function default_rng(tid::Int)
    0 < tid <= length(THREAD_RNGs) || _rng_length_assert()
    if @inbounds isassigned(THREAD_RNGs, tid)
        @inbounds MT = THREAD_RNGs[tid]
    else
        MT = MersenneTwister()
        @inbounds THREAD_RNGs[tid] = MT
    end
    return MT
end
@noinline _rng_length_assert() =  @assert false "0 < tid <= length(THREAD_RNGs)"

function __init__()
    resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end

And now we see the reason. In Julia 1.5.3 rand() is thread-safe (it was not in
Julia 1.0, and that is the reason for the difference in performance between
versions). Ensuring thread safety must cost something. In this case, even
though the code that fetches the appropriate MersenneTwister instance
from the THREAD_RNGs vector is simple, it has a noticeable cost (the reason is
that random number generation itself is extremely well optimized and fast).


Going back to the beginning of this post: remember not to use rand() without an
explicit generator argument in your performance-critical code.
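
One way to follow this advice without forcing every caller to construct a generator is to look the default generator up once and reuse it inside the loop. This is a minimal sketch under a few assumptions: pi_hoisted is a hypothetical name introduced here, the function is assumed to run on a single thread for the whole call, and I have not benchmarked it, so measure it yourself before relying on it:

```julia
using Random

# Hypothetical variant: the generator is resolved once, when the function is
# called, instead of on every rand() call inside the loop.
function pi_hoisted(n::Int, rng::AbstractRNG = Random.default_rng())
    s = 0
    for _ in 1:n
        # count the points that fall inside the quarter of the unit disk
        s += rand(rng)^2 + rand(rng)^2 < 1
    end
    return 4 * s / n
end

# Uses the default generator of the calling thread.
println(pi_hoisted(10^6))

# Uses an explicitly constructed generator (the seed is an arbitrary choice).
println(pi_hoisted(10^6, MersenneTwister(1234)))
```

The default argument keeps the convenient call pi_hoisted(n), while callers who want reproducibility or their own generator can still pass one.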

Also, I think this example shows very nicely how much we benefit from the
fact that Julia is mostly written in Julia: it took only a few keystrokes
to identify the root cause of the performance puzzle.