Optimize Flight Simulations with Improved Type-Stability

By: Great Lakes Consulting

Re-posted from: https://blog.glcs.io/sim-performance-type-stability

This post was written by Steven Whitaker.

In this Julia for Devs post, we will discuss improving the performance of simulation code written in Julia by eliminating sources of type-instability.

We wrote another post detailing what type-stability is and how type-instabilities can degrade performance. We also showed how SnoopCompile.jl and Cthulhu.jl can be used to pinpoint causes of type-instability.

This post will cover some of the type-instabilities we helped one of our clients overcome.

Our client is a technology innovator. They are building a first-of-its-kind logistics system focused on autonomous electric delivery to reduce traffic and air pollution. Their aim is to provide efficient delivery services for life-saving products to people in both urban and rural areas. Julia is helping them power critical Guidance, Navigation, and Control (GNC) systems.

With this client, we:

  • Eliminated slowdown-related failures on the most important simulation scenario.
  • Decreased the compilation time of the scenario by 30%.
  • Improved the slowest time steps from 300 ms to 10 ms (30x speedup), enabling 2x real-time performance.

We will not share client-specific code, but we will provide similar examples to illustrate root-cause issues and suggested resolutions.

Here are the root-cause issues and resolutions we will focus on:

  • Help type-inference by unrolling recursion.
  • Standardize the output of different branches.
  • Avoid loops over Tuples.
    • Use SnoopCompile.jl to reveal dynamic dispatches.
    • Investigate functions with Cthulhu.jl.
  • Avoid dictionaries that map to functions.

Let’s dive in!

Help Type-Inference by Unrolling Recursion

One of the interesting problems we saw was a part of the client’s code that SnoopCompile.jl reported was triggering calls to inference, but when we inspected the code with Cthulhu.jl, it looked perfectly type-stable.

This code consisted of a set of functions that recursively called each other, traversing the model tree to grab data from all the submodels.

As it turns out, recursion can pose difficulties for Julia’s type-inference. Basically, if type-inference detects recursion but cannot prove it terminates (based only on the types of inputs; remember that type-inference occurs before runtime), inference gives up, resulting in code that runs as if it were type-unstable. (See this Discourse post and the comments and links therein for more information.)

The solution we implemented was to use a @generated function to unroll the recursion at compile time, resulting in a flat implementation that could be correctly inferred.

Here’s an example that illustrates the essence of the recursive code:

# Grab all the data in the entire model tree beginning at `model`.
get_data(model::NamedTuple) =
    (; data = model.data, submodels = get_submodel_data(model.submodels))

# This generated function is necessary for type-stability.
# It calls `get_data` on each of the fields of `submodels`
# and returns a `NamedTuple` of the results.
# (This is not the generated function implemented in the solution.)
@generated function get_submodel_data(submodels::NamedTuple)
    assignments = map(fieldnames(submodels)) do field
        Expr(:kw, field, :(get_data(submodels.$field)))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end

get_submodel_data(::@NamedTuple{}) = (;)

Note that in this example get_data calls get_submodel_data, which in turn calls get_data on the submodels.
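To see the traversal in action, here is a self-contained toy run of the snippet above. The model tree (with hypothetical imu and gps submodels) is our own illustration, not the client’s model:

```julia
# Each "model" is a NamedTuple with `data` and `submodels` fields,
# matching the pattern from the snippet above.
get_data(model::NamedTuple) =
    (; data = model.data, submodels = get_submodel_data(model.submodels))

@generated function get_submodel_data(submodels::NamedTuple)
    assignments = map(fieldnames(submodels)) do field
        Expr(:kw, field, :(get_data(submodels.$field)))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end

get_submodel_data(::@NamedTuple{}) = (;)

# A hypothetical two-submodel tree.
model = (; data = 1,
           submodels = (; imu = (; data = 2, submodels = (;)),
                          gps = (; data = 3, submodels = (;))))

result = get_data(model)
# result.data == 1; result.submodels.imu.data == 2; result.submodels.gps.data == 3
```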

Here’s the code for unrolling the recursion:

function _gen_get_data(T, path)
    subT = _typeof_field(T, :submodels)
    subpath = :($path.submodels)
    return quote
        (;
            data = $path.data,
            submodels = $(_gen_get_submodel_data(subT, subpath)),
        )
    end
end

# This function determines the type of `model.field`
# given just the type of `model` (so we can't just call `typeof(model.field)`).
# This function is necessary because we need to unroll the recursion
# using a generated function, which means we have to work in the type domain
# (because the generated function is generated before runtime).
function _typeof_field(::Type{NamedTuple{names, T}}, field::Symbol) where {names, T}
    i = findfirst(n -> n === field, names)
    return T.parameters[i]
end

function _gen_get_submodel_data(::Type{@NamedTuple{}}, subpath)
    return quote
        (;)
    end
end

function _gen_get_submodel_data(subT, subpath)
    assignments = map(fieldnames(subT)) do field
        T = _typeof_field(subT, field)
        path = :($subpath.$field)
        Expr(:kw, field, _gen_get_data(T, path))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end

@generated get_data_generated(model::NamedTuple) = _gen_get_data(model, :model)

Unfortunately, this example doesn’t reproduce the issue our client had, but it does show how to use a @generated function to unroll recursion. Note that there is still recursion: _gen_get_data and _gen_get_submodel_data call each other. The key, though, is that this recursion happens before inference, which means that when get_data_generated is inferred, the recursion has already taken place, resulting in unrolled code without any recursion that might cause inference issues.

When we implemented this solution for our client, we saw the total memory utilization of their simulation decrease by ~35%. This was enough to allow them to disable garbage collection during the simulation, speeding it up to faster than real-time, the first time this simulation had ever run faster than real-time!

Standardize Output of Different Branches

The client had different parts of their model update at different frequencies. As a result, at any particular time step only a subset of all the submodels actually needed to update. Here’s an example of what this might look like:

function get_output(model, t)
    if should_update(model, t)
        out = update_model(model, t)
    else
        out = get_previous_output(model)
    end
    return out
end

Unfortunately, update_model and get_previous_output returned values of different types, resulting in type-instability: the output type of get_output depended on the runtime result of should_update.

Furthermore, this function was called at every time point on every submodel (and every sub-submodel, etc.), so the type-instability in this function affected the whole simulation.

The issue was that update_model typically returned the minimal subset of information actually needed for the specific model, whereas get_previous_output was generic and returned a wider set of information. For example, maybe update_model would return a NamedTuple like (x = [1, 2], xdot = [0, 0]), while get_previous_output would return something like (x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false).

To fix this issue, rather than manually updating the return values of all the methods of update_model for all the submodels in the system, we created a function standardize_output that took whatever NamedTuple update_model returned and added the missing fields that get_previous_output included. Then, the only change needed in get_output was to call standardize_output:

out = update_model(model, t) |> standardize_output
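As a minimal sketch of what standardize_output might look like (the field names p and stop_sim and their defaults come from the example above; we also assume update_model never itself returns those fields):

```julia
# Hypothetical sketch: pad a partial output NamedTuple with default values
# for the extra fields that `get_previous_output` would include.
# Assumes `update_model` only ever returns fields other than `p` and `stop_sim`.
const DEFAULT_FIELDS = (; p = nothing, stop_sim = false)

standardize_output(out::NamedTuple) = merge(out, DEFAULT_FIELDS)

standardize_output((x = [1, 2], xdot = [0, 0]))
# (x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false)
```

Because `merge` appends the default fields in a fixed order, every branch produces a NamedTuple of the same type, which is what makes get_output type-stable.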

The result of making this change was a 30% decrease in compilation time for their simulation!

Avoid Loops over Tuples

The client stored submodels of a parent model as a Tuple or NamedTuple. This made sense for type-stability because each submodel was of a unique type, so storing them in this way preserved the type information when accessing the submodels. In contrast, storing the submodels as a Vector{Any} would lose the type information of the submodels.

However, type-stability problems arise when looping over Tuples of different types of objects. The problem is that the compiler needs to compile code for the body of the loop, but the body of the loop needs to be able to handle all types included in the Tuple. As a result, the compiler must resort to dynamic dispatch in the loop body (but see the note on union-splitting further below).

[Cartoon: President waiting for receptionist to look up his office]

Here’s an example of the issue:

module TupleLoop

function tupleloop(t::Tuple)
    for val in t
        do_something(val)
    end
end

do_something(val::Number) = val + 1
do_something(val::String) = val * "!"
do_something(val::Vector{T}) where {T} = isempty(val) ? zero(T) : val[1]
do_something(val::Dict{String,Int}) = get(val, "hello", 0)

end

Using SnoopCompile.jl reveals dynamic dispatches to do_something:

julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.019444/0.020361 on Core.Compiler.Timings.ROOT() with 4 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3 (3 callees from 1 callers)

Looking at tupleloop with Cthulhu.jl:

julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
     do_something(::String)
 >     tupleloop(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}) at /path/to/TupleLoop.jl

tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3
3 function tupleloop(t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}::Tuple)::Core.Const(nothing)
5     for val::Any in t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}
6         do_something(val::Any)
7     end
9 end

And we see the problem! Even though the Tuple is inferred, the loop variable val is inferred as Any, which means that calling do_something(val) must be a dynamic dispatch.

Note that in some cases Julia can perform union-splitting automatically to remove the dynamic dispatch caused by this type-instability. In this example, union-splitting occurs when the Tuple contains 4 (by default) or fewer unique types. However, it’s not a general solution.
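To see union-splitting in action, consider a loop over a small element Union (this illustrative function is ours, not the client’s):

```julia
# With a small Union element type, Julia union-splits the loop body:
# it compiles a static branch for String and another for Vector{Int},
# avoiding dynamic dispatch.
function total_length(v::Vector{Union{String, Vector{Int}}})
    total = 0
    for x in v  # x::Union{String, Vector{Int}}: union-split, not Any
        total += length(x)
    end
    return total
end

total_length(Union{String, Vector{Int}}["hi", [1, 2, 3]])  # 5
```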

One way to remove the dynamic dispatch without relying on union-splitting is to eliminate the loop:

do_something(t[1])
do_something(t[2])

But we can quickly see that writing this code is not at all generic; we have to hard-code the number of calls to do_something, which means the code will only work with Tuples of a particular length. Fortunately, there’s a way around this issue. We can write a @generated function to have the compiler unroll the loop for us in a generic way:

@generated function tupleloop_generated(t::Tuple)
    body = [:(do_something(t[$i])) for i in fieldnames(t)]
    return quote
        $(body...)
        return nothing
    end
end

(Note that this code would also work if we specified t::NamedTuple in the method signature.)

Due to the way @generated functions work, SnoopCompile.jl still detects dynamic dispatches, but note that tupleloop_generated does not have any dynamic dispatches reported:

julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop_generated((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.022208/0.050369 on Core.Compiler.Timings.ROOT() with 5 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
3-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 (g::Core.GeneratedFunctionStub)(world::UInt64, source::LineNumberNode, args...) @ Core boot.jl:705 (1 callees from 1 callers)
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 var"#s1#1"(::Any, t) @ Main.TupleLoop none:0 (3 callees from 1 callers)

And we can verify with Cthulhu.jl that there are no more dynamic dispatches in tupleloop_generated:

julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
 >   tupleloop_generated(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}})
       eval at ./boot.jl:430 => eval_user_input(::Any, ::REPL.REPLBackend, ::Module) at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-1

tupleloop_generated(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:11
Variables
  #self#::Core.Const(Main.TupleLoop.tupleloop_generated)
  t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}

Body::Core.Const(nothing)
    @ /path/to/TupleLoop.jl:11 within `tupleloop_generated`
    @ /path/to/TupleLoop.jl:15 within `macro expansion`
1   %1  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
    %2  = Base.getindex(t, 1)::Int64
          (%1)(%2)
    %4  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
    %5  = Base.getindex(t, 2)::Float64
          (%4)(%5)
    %7  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
    %8  = Base.getindex(t, 3)::String
          (%7)(%8)
    %10 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
    %11 = Base.getindex(t, 4)::Vector{Float64}
          (%10)(%11)
    %13 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
    %14 = Base.getindex(t, 5)::Dict{String, Int64}
          (%13)(%14)
    @ /path/to/TupleLoop.jl:16 within `macro expansion`
          return Main.TupleLoop.nothing

Here we have to examine the so-called “Typed” code (since the source code was generated via metaprogramming), but we see that there is no loop in this code. As a result, each call to do_something is a static dispatch with a concretely inferred input. Hooray!

Avoid Dictionaries that Map to Functions

The client registered functions for updating their simulation visualization via a dictionary that mapped from a String key to the appropriate update function.

Sometimes it can be convenient to have a dictionary of functions, for example:

using LinearAlgebra: norm  # `norm` is not in `Base`

d = Dict{String, Function}(
    "sum" => sum,
    "norm" => norm,
    # etc.
)

x = [1.0, 2.0, 3.0]
d["sum"](x)  # Compute the sum of the elements of `x`
d["norm"](x) # Compute the norm of `x`

This allows you to write generic code that can call the appropriate intermediate function based on a key supplied by the caller.

You could use multiple dispatch to achieve similar results, but it requires a bit more thought to organize the code in such a way that ensures the caller has access to the types to dispatch on.
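As a hypothetical sketch of that approach, the String keys could be replaced with singleton types that the caller dispatches on (the names SumOp, NormOp, and apply are ours, for illustration only):

```julia
using LinearAlgebra: norm

# Hypothetical singleton types standing in for the String keys.
struct SumOp end
struct NormOp end

apply(::SumOp, x) = sum(x)
apply(::NormOp, x) = norm(x)

x = [3.0, 4.0]
apply(SumOp(), x)   # 7.0
apply(NormOp(), x)  # 5.0
```

Because the operation is now encoded in a concrete type, Julia can statically dispatch each `apply` call and infer its return type.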

As another alternative, you could also have the caller just pass in the function to call. But again, it takes a bit more effort to organize the code to make it work.

Unfortunately, using a dictionary in this way is type-unstable: Julia can’t figure out what function will be called until runtime, when the precise dictionary key is known. And since the function is unknown, the type of the result of the function is also unknown.

One partial solution is to use type annotations:

d[func_key](x)::Float64

Then at least the output of the function can be used in a type-stable way. However, this only works if all the functions in the dictionary return values of the same type given the same input.

A slightly less stringent alternative is to explicitly convert the result to a common type, but this requires conversion to be possible.
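For instance, if every registered function is known to return some kind of real number, the call site could convert (reusing the hypothetical dictionary from above):

```julia
using LinearAlgebra: norm

d = Dict{String, Function}("sum" => sum, "norm" => norm)
x = [1.0, 2.0, 3.0]

# `convert` gives the compiler a concrete result type, provided each
# registered function's return value is convertible to Float64.
result = convert(Float64, d["sum"](x))  # 6.0
```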

Our client updated a dictionary using the output of the registered function, so the full solution we implemented was to remove the dictionary and instead have explicit branches in the code. That is, instead of

updates[key] = d[key](updates[key])

we had

if key == "k1"
    updates[key] = f1(updates[key]::OUTPUT_TYPE_F1)
elseif key == "k2"
    updates[key] = f2(updates[key]::OUTPUT_TYPE_F2)
# Additional branches as needed
end

Note that we needed the type annotations OUTPUT_TYPE_F1 and OUTPUT_TYPE_F2 because updates had an abstractly typed value type. The key that makes this solution work is recognizing that in the first branch updates[key] is the output of f1 from the previous time step in the simulation (and similarly for the other branches). Therefore, in each branch we know what the type of updates[key] is, so we can give the compiler that type information.

Also note that the previously mentioned ideas of using multiple dispatch or just passing in the functions to use don’t work in this situation without removing the updates dictionary (and refactoring the affected code).

Making the above change completely removed type-instabilities in that part of the client’s code.

Summary

In this post, we explored a few problems related to type-stability that we helped our client resolve. We were able to diagnose issues using SnoopCompile.jl and Cthulhu.jl and make code improvements that enabled our client’s most important simulation scenario to pass tests for the first time. This was possible because our solutions enabled the scenario to run faster than real-time and reduced compilation time by 30%.

Do you have type-instabilities that plague your Julia code? Contact us, and we can help you out!

Additional Links

The cover image background of a person at the start of a race was found at freepik.com.

The cartoon about looping over Tuples was generated with AI.
