Should I use mutable or immutable containers for agent based models in Julia?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2020/10/16/gctime.html

Introduction

It is a well-known performance recommendation in the Julia language that
avoiding allocations matters and that using immutable objects is good practice.
In this post, in the context of creating agent based models, I want to show two
toy code examples that highlight the key aspects of this issue.

All the examples are tested under Julia 1.5.2. The Setfield.jl package version
used is 0.7.

Avoiding allocations

Consider the following code:

using Statistics

struct AgentI
    loc::Int
end

mutable struct AgentM
    loc::Int
end

fI(n) = mean(x -> x.loc, [AgentI(i) for i in 1:n])
fM(n) = mean(x -> x.loc, [AgentM(i) for i in 1:n])

The only difference between functions fI and fM is that they respectively
work on immutable and mutable objects. I use mean for aggregation to make sure
that the compiler does not optimize out too much.

Let us benchmark these functions:

julia> fI(1); fM(1); GC.gc(); # force compilation and collect garbage

julia> @time fI(10^8)
  0.385592 seconds (2 allocations: 762.940 MiB, 0.73% gc time)
5.00000005e7

julia> @time GC.gc()
  0.091111 seconds, 100.00% gc time

julia> @time fM(10^8)
  3.498990 seconds (100.00 M allocations: 2.235 GiB, 60.79% gc time)
5.00000005e7

julia> @time GC.gc()
  0.295949 seconds, 100.00% gc time

We see that working with mutable objects was roughly nine times slower, and it
also led to a higher garbage collection cost after fM terminated.
In particular, note that over 60% of the run time of fM was spent on garbage
collection.

The reason is the following. Allocating objects has three costs:

  • cost of actual allocation
  • cost of having to store and use object references instead of objects directly
    in the container
  • cost of cleaning-up (i.e. running garbage collector that frees unused memory)
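
The second cost in the list above can be made visible with a minimal sketch.
The type names InlineAgent and BoxedAgent are hypothetical and used only for
this illustration. The sketch checks whether each type is a plain-data
("isbits") type, and compares how many bytes are reachable from a vector of
each kind.

```julia
# Hypothetical types used only for this illustration.
struct InlineAgent
    loc::Int
end

mutable struct BoxedAgent
    loc::Int
end

# An immutable struct whose fields are all plain data is an "isbits" type,
# so a Vector can store its values inline, without per-element allocation.
inline_ok = isbitstype(InlineAgent)   # true
boxed_ok = isbitstype(BoxedAgent)     # false

n = 1_000
vI = [InlineAgent(i) for i in 1:n]
vM = [BoxedAgent(i) for i in 1:n]

# Base.summarysize counts the bytes reachable from each vector,
# including elements that were allocated separately.
size_I = Base.summarysize(vI)
size_M = Base.summarysize(vM)
println((inline_ok, boxed_ok, size_I, size_M))
```

The vector of mutable objects should report a larger size, because it holds
one reference per element in addition to the separately allocated objects.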

The cost of data movement

A huge advantage of mutable objects is, well, that they are mutable. This makes
it simple to update only selected fields. One might wonder whether the cost
of creating an immutable object anew on each update is significant in such
cases. Let us investigate. First we set up our experiment:

using Statistics
using Setfield

struct Agent2I
    loc::Int
    junk::NTuple{100, Int}
end

mutable struct Agent2M
    loc::Int
    junk::NTuple{100, Int}
end

const REF_TUP = ntuple(i -> 0, 100)

function gI(n)
    x = [Agent2I(i, REF_TUP) for i in 1:n]
    for i in 1:n
        xi = x[i]
        x[i] = @set xi.loc += 1
    end
    return mean(x -> x.loc, x)
end

function gM(n)
    x = [Agent2M(i, REF_TUP) for i in 1:n]
    for i in 1:n
        x[i].loc += 1
    end
    return mean(x -> x.loc, x)
end

Note that we used REF_TUP to give both the Agent2I and Agent2M objects a larger
memory footprint. The size of agent state I used is usually more than enough in
practice. As you can see in the code, the Setfield.jl package makes it easy to
update only a selected field of an immutable object using the @set macro.
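
To make the data movement explicit, here is a minimal sketch in plain Julia
of what such an update amounts to: a new object is built from the old one,
with one field replaced, and then stored back into the container. The type
Agent3I and the helper with_loc are hypothetical names introduced for this
illustration; this is not the code that @set actually generates.

```julia
# Hypothetical type, smaller than Agent2I to keep the example short.
struct Agent3I
    loc::Int
    junk::NTuple{4, Int}
end

# Hypothetical helper: build a new agent that copies every field
# except loc, which gets the new value.
with_loc(a::Agent3I, loc::Int) = Agent3I(loc, a.junk)

agents = [Agent3I(i, (0, 0, 0, 0)) for i in 1:3]

for i in eachindex(agents)
    a = agents[i]                       # read the whole value
    agents[i] = with_loc(a, a.loc + 1)  # write a whole new value back
end

locs = [a.loc for a in agents]
println(locs)  # [2, 3, 4]
```

Every field of the old value, including junk, is carried over into the new
one, which is why the update cost can grow with the size of the object.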

It is time to start benchmarking:

julia> gI(1); gM(1); GC.gc();

julia> @time gI(10^7)
  2.714677 seconds (2 allocations: 7.525 GiB, 0.10% gc time)
5.0000015e6

julia> @time GC.gc()
  0.277477 seconds, 100.00% gc time

julia> @time gM(10^7)
  7.287978 seconds (10.00 M allocations: 7.674 GiB, 59.53% gc time)
5.0000015e6

julia> @time GC.gc()
  0.965113 seconds, 100.00% gc time

So as you can see, the time difference has narrowed (gM now takes about 2.7
times as long as gI, compared with about nine times for fM versus fI), but the
immutable code is still faster.
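
As a sanity check, the memory figures that @time reports for the immutable
versions can be recovered from the element sizes alone. The sketch below
assumes a 64-bit system, where Int takes 8 bytes. WideAgent is a hypothetical
stand-in with the same layout as Agent2I.

```julia
# Hypothetical stand-in with the same fields as Agent2I.
struct WideAgent
    loc::Int
    junk::NTuple{100, Int}
end

# One loc field plus 100 tuple entries, all stored inline.
bytes_per_agent = sizeof(WideAgent)   # 101 * sizeof(Int)

# Storage for a vector of 10^7 such agents, in GiB.
# About 7.525 on a 64-bit system.
wide_gib = 10^7 * bytes_per_agent / 2^30

# Storage for a vector of 10^8 single-Int agents, in MiB.
# About 762.94 on a 64-bit system.
narrow_mib = 10^8 * sizeof(Int) / 2^20

println((bytes_per_agent, wide_gib, narrow_mib))
```

These values agree with the 7.525 GiB reported for gI and the 762.940 MiB
reported for fI, which is consistent with each of those functions allocating
essentially just the one vector.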

Conclusions

In general, working with mutable objects is easier than working with immutable
ones. However, if you are working with performance critical code, avoiding
mutable objects is one of the first recommendations to consider when trying to
optimize it. Also, Setfield.jl (and the upcoming Accessors.jl) make working with
immutable objects relatively smooth.