My first macro in Julia

By: Mosè Giordano

Re-posted from: http://giordano.github.io/blog/2022-06-18-first-macro/

This post isn’t about the first macro I wrote in the Julia programming language, but it can
be about your first macro.

Frequently Asked Questions

Question: What are macros in Julia?

Answer: Macros are a sort of function which takes as input unevaluated expressions (Expr)
and returns as output another expression, whose code is then evaluated regularly at
runtime. This post isn’t a substitute for reading the section about macros in the Julia
documentation; it’s more of a complement to it, a light tutorial for newcomers, so I warmly
suggest reading the manual to learn more about them.
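To see what a macro actually receives, note that an unevaluated expression is just an
ordinary Expr object with fields you can inspect:

```julia
# Quoting produces an Expr, a regular Julia object with fields.
ex = :(a + b)

ex.head    # :call
ex.args    # Any[:+, :a, :b]
```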

Calls to macros are different from calls to regular functions because you need to prepend
the name of the macro with the at-sign @: @view A[4, 1:5] is a call to the @view macro,
while view(A, 4, 1:5) is a call to the view function.
There are a couple of important reasons why macro calls are visually distinct from function
calls (which sometimes upsets Lisp purists):

  • the arguments of function calls are always evaluated before entering the body of the
    function: in f(g(2, 5.6)), we know that g(2, 5.6) will always be evaluated before
    calling f on its result. Instead, the expression given as input to a macro can in
    principle be discarded and never taken into account, and can be rewritten into something
    completely different: in @f g(2, 5.6), the expression g(2, 5.6) is taken unevaluated
    and rewritten into something else. The function g may never actually be called, or it
    may be called with different arguments, or whatever else the macro @f has been designed
    to do with the given expression;
  • nested macro calls are evaluated left-to-right, while nested function calls are evaluated
    right-to-left: in the expression f(g()), the function g() is called first and its result
    is fed into f, while in @f @g x the macro @f will first rewrite the unevaluated
    expression @g x, and then, if it is still there after the expression rewrite operated by
    @f (remember the previous point?), the macro @g is expanded.
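The first point can be made concrete with a minimal sketch (the macro name @ignore is made
up for illustration): a macro that throws away its input expression entirely, so the
function in the argument is never called.

```julia
# Hypothetical macro for illustration: it discards its input expression,
# so whatever is passed to it is never evaluated.
macro ignore(ex)
    return nothing
end

g() = error("this is never called")

@ignore g()   # expands to `nothing`; g() is never evaluated
```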

Writing a macro can also be an unpleasant experience the first few times you try, because a
macro operates on a new level (expressions, which you want to turn into other expressions)
and you need to get familiar with new concepts, like hygiene (more on this below), which
can be tricky to get right initially. However, it may help to remember that the Expr
objects macros operate on are regular Julia objects, which you can access and modify like
any other Julia structure. So you can think of a macro as a regular function which operates
on Expr objects and returns a new Expr. This isn’t a simplification: many macros are
actually defined to only call a regular function which does all the expression-rewriting
business.
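As a sketch of this pattern (the names _twice_impl and @twice are made up for
illustration), the macro body just forwards the expression to an ordinary function:

```julia
# The expression rewriting lives in a regular function...
_twice_impl(ex) = :( 2 * $(ex) )

# ...and the macro only calls it (escaping the result so variables from
# the calling scope are used as-is; see the discussion of hygiene below).
macro twice(ex)
    return esc(_twice_impl(ex))
end

@twice 3   # expands to 2 * 3
```

A nice consequence is that _twice_impl can be tested like any other function, without
involving macro expansion at all.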

Q: Should I write a macro?

A: If you’re asking yourself this question, the answer is likely “no”. Steven G. Johnson
gave an interesting keynote speech at JuliaCon 2019 about metaprogramming (not just in
Julia), explaining when to use it, and more importantly when not to use it.

Also, macros don’t compose very well: remember that any macro can rewrite an expression in a
completely arbitrary way, so nesting macros can sometimes have unexpected results, if the
outermost macro doesn’t anticipate the possibility that the expressions it operates on may
contain another macro which expects a specific expression. In practice, this is less of a
problem than it may sound, but it can definitely happen if you overuse many complicated
macros. This is one more reason why you should not write a macro in your code unless it
really simplifies the code substantially.

Q: So, what’s the deal with macros?

A: Macros are useful to programmatically generate code which would be tedious, or very
complicated, to type manually. Keep in mind that the goal of a macro is to eventually get a
new expression, which is run when the macro is called, so the generated code will be
executed regularly, with no shortcuts: it’s a common misconception that macros somehow skip
this step. There is very little that macros can do that plain (i.e. not using macros) Julia
code cannot.

Q: Why should I keep reading this post?

A: Writing a simple macro can still be a useful exercise to learn how macros work, and to
understand when they can be useful.

Our first macro: @stable

Globals in Julia are usually a major performance hit when their type is not constant,
because the compiler has no idea what their actual type is. When you don’t need to reassign
a global variable, you can mark it as constant with the const keyword, which greatly
improves the performance of accessing the variable, because the compiler will know its type
and can reason about it.
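For example (the names here are made up for illustration), a function reading a const
global can be compiled to efficient code because the type of the global is known:

```julia
# Non-constant global: its type could change at any time, so code
# reading it cannot be specialised.
scale = 2.5

# Constant global: the compiler knows it will always be a Float64.
const SCALE = 2.5

times_scale(v) = SCALE * v   # the compiler can infer the result type
```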

Julia v1.8 introduces a new way to have non-horribly-slow global variables: you can annotate
the type of a global variable, to say that its type won’t change:

julia> x::Float64 = 3.14
3.14

julia> x = -4.2
-4.2

julia> x = 10
10

julia> x
10.0

julia> x = nothing
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  ...

This is different from a const variable, because you can reassign a type-annotated global
variable to another value with the same type (or convertible to the annotated type), while
reassigning a const variable isn’t possible (nor recommended). But the type would still be
constant, which helps the compiler optimise code which accesses this global variable. Note
that x = 10 returns 10 (an Int) but the actual value of x is 10.0 (a Float64): assignment
returns the right-hand side, but the value 10 is converted to Float64 before being assigned
to x.

Wait, there is no macro so far! Right, we’ll get there soon. The problem with
type-annotated globals is that in the expression x::Float64 = 3.14 it’s easy to predict the
type we want to attach to x, but if you want to make x = f() a type-annotated global
variable and the type of f() is much more involved than Float64, perhaps a type with a few
parameters, then doing the type annotation can be annoying. Mind, not impossible, just
tedious. So, that’s where a macro could come in handy!

The idea is to have a macro, which we’ll call @stable, which operates like this:

@stable x = 2

will automatically run something like

x::typeof(2) = 2

so that we can automatically infer the type of x from the expression on the right-hand
side, without having to type it ourselves. A useful tool when dealing with macros is
Meta.@dump (another macro, yay!).

julia> Meta.@dump x = 2
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol x
    2: Int64 2

This tells us how the expression x = 2 is parsed into an Expr when it’s fed into a macro.
So, this means that our @stable x = 2 macro will see an Expr whose ex.args[1] field is the
name of the variable we want to create and ex.args[2] is the value we want to assign to it,
which means the expression we want to generate will be something like
ex.args[1]::typeof(ex.args[2]) = ex.args[2], but remember that you need to interpolate
variables inside a quoted expression:

julia> macro stable(ex::Expr)
           return :( $(ex.args[1])::typeof($(ex.args[2])) = $(ex.args[2]) )
       end
@stable (macro with 1 method)

julia> @stable x = 2
2

Well, that was easy, it worked on the first try! Now we can use our brand-new type-stable
x! Let’s do it!

julia> x
ERROR: UndefVarError: x not defined

Waaat! What happened to our x? We just defined it above! Didn’t we? Well, let’s use yet
another macro, @macroexpand, to see what’s going on:

julia> @macroexpand @stable x = 2
:(var"#2#x"::Main.typeof(2) = 2)

Uhm, that looks weird, we were expecting the expression to be x::typeof(2), what’s that
var"#2#x"? Let’s see:

julia> var"#2#x"
ERROR: UndefVarError: #2#x not defined

Another undefined variable, I’m more and more confused. What if that 2 in there is a
global counter? Maybe we need to try with 1:

julia> var"#1#x"
2

julia> var"#1#x" = 5
5

julia> var"#1#x"
5

julia> var"#1#x" = nothing
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  ...

Hey, here is our variable, and it’s working as we expected! But this isn’t as convenient as
calling the variable x as we wanted. I don’t like that. What’s happening?
Alright, we’re now running into hygiene, which we mentioned above: this isn’t about washing
your hands, but about the fact that macros need to make sure the variables in the returned
expression don’t accidentally clash with variables in the scope they expand to. This is
achieved by using the gensym function to automatically generate unique identifiers (in the
current module) to avoid clashes with local variables.
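You can call gensym yourself to see what these identifiers look like (the exact format of
the generated name is an implementation detail):

```julia
# Each call to gensym returns a fresh, unique symbol, even when the
# same base name is requested.
s1 = gensym(:x)
s2 = gensym(:x)

s1 == s2   # false: the two symbols are distinct
```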

What happened above is that our macro generated a variable with a gensym-ed name, instead
of the name we used in the expression, because macros in Julia are hygienic by default. To
opt out of this mechanism, we can use the esc function. A rule of thumb is that you should
apply esc to input arguments if they contain variables or identifiers from the scope of the
calling site that you need to use as they are, but for more details do read the section
about hygiene in the Julia manual. Note also that the pattern var"#N#x", with increasing N
at every macro call, in the gensym-ed variable name is an implementation detail which may
change in future versions of Julia, so don’t rely on it.

Now we should know how to fix the @stable macro:

julia> macro stable(ex::Expr)
           return :( $(esc(ex.args[1]))::typeof($(esc(ex.args[2]))) = $(esc(ex.args[2])) )
       end
@stable (macro with 1 method)

julia> @stable x = 2
2

julia> x
2

julia> x = 4.0
4.0

julia> x
4

julia> x = "hello world"
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  ...

julia> @macroexpand @stable x = 2
:(x::Main.typeof(2) = 2)

Cool, this is all working as expected! Are we done now? Yes, we’re heading in the right
direction, but no, we aren’t quite done yet. Let’s consider a more sophisticated example,
where the right-hand side is a function call and not a simple literal number, which is why
we started all of this. For example, let’s define a new type-stable variable with a rand()
value, and let’s print it with one more macro, @show, just to be sure:

julia> @stable y = @show(rand())
rand() = 0.19171602949009747
rand() = 0.5007039099074341
0.5007039099074341

julia> y
0.5007039099074341

Ugh, that doesn’t look good. We’re calling rand() twice and getting two different values?
Let’s ask our friend @macroexpand again what’s going on (no need to use @show this time):

julia> @macroexpand @stable y = rand()
:(y::Main.typeof(rand()) = rand())

Oh, I think I see it now: the way we defined the macro, the same expression, rand(), is
used twice: once inside typeof, and then on the right-hand side of the assignment, but
this means we’re actually calling that function twice, even though the expression is the
same.
Correct! And this isn’t good for at least two reasons:

  • the expression on the right-hand side of the assignment can be expensive to run, and
    calling it twice wouldn’t be a good outcome: we wanted to create a macro to simplify
    things, not to spend twice as much time;
  • the expression on the right-hand side of the assignment can have side effects, which is
    precisely the case of the rand() function: every time you call rand() you advance the
    mutable state of the random number generator, but if you call it twice instead of once,
    you’re doing something unexpected. By simply looking at the code @stable y = rand(),
    someone would expect that rand() is called exactly once; it’d be bad if users of your
    macro experienced undesired side effects, which can make for hard-to-debug issues.

In order to avoid double evaluation of the expression, we can assign it to another temporary
variable, and then use its value in the assignment expression:

julia> macro stable(ex::Expr)
           quote
               tmp = $(esc(ex.args[2]))
               $(esc(ex.args[1]))::typeof(tmp) = tmp
           end
       end
@stable (macro with 1 method)

julia> @stable y = @show(rand())
rand() = 0.5954734423582769
0.5954734423582769

This time rand() was called only once! That’s good, isn’t it? It is indeed, but I think
we can still improve the macro a little bit. For example, let’s look at the list of all
names defined in the current module with
names. Can you spot anything
strange?

julia> names(@__MODULE__; all=true)
13-element Vector{Symbol}:
 Symbol("##meta#58")
 Symbol("#1#2")
 Symbol("#1#x")
 Symbol("#5#tmp")
 Symbol("#@stable")
 Symbol("@stable")
 :Base
 :Core
 :InteractiveUtils
 :Main
 :ans
 :x
 :y

I have a bad feeling about that Symbol("#5#tmp"). Are we leaking the temporary variable in
the returned expression?
Correct! Admittedly, this isn’t too big of a deal: the variable is gensym-ed and so it
won’t clash with any other local variables thanks to hygiene, and many people would just
ignore this minor issue, but I believe it’d still be nice to avoid leaking it in the first
place, if possible. We can do that by sticking the local keyword in front of the temporary
variable:

julia> macro stable(ex::Expr)
           quote
               local tmp = $(esc(ex.args[2]))
               $(esc(ex.args[1]))::typeof(tmp) = tmp
           end
       end
@stable (macro with 1 method)

julia> @stable y = rand()
0.7029553059625194

julia> @stable y = rand()
0.04552255224129409

julia> names(@__MODULE__; all=true)
13-element Vector{Symbol}:
 Symbol("##meta#58")
 Symbol("#1#2")
 Symbol("#1#x")
 Symbol("#5#tmp")
 Symbol("#@stable")
 Symbol("@stable")
 :Base
 :Core
 :InteractiveUtils
 :Main
 :ans
 :x
 :y

Yay, no other leaked temporary variables! Are we done now? Not yet, we can still make it
more robust. At the moment we’re assuming that the expression fed into our @stable macro
is an assignment, but what if it isn’t the case?

julia> @stable x * 12
4

julia> x
4

Uhm, it doesn’t look like anything happened, x is still 4, maybe we can ignore this case
too and move on.
Not so fast:

julia> 1 * 2
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?

Aaargh! What does that even mean?!? Let’s ask our dear friends Meta.@dump and
@macroexpand:

julia> Meta.@dump x * 2
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol *
    2: Symbol x
    3: Int64 2

julia> @macroexpand @stable x * 12
quote
    var"#5#tmp" = x
    (*)::Main.typeof(var"#5#tmp") = var"#5#tmp"
end

julia> *
4

Let me see if I follow: with @stable x * 12 we’re assigning x (which is now ex.args[2])
to the temporary variable, and then the assignment is basically * = 4, because ex.args[1]
is now *. Ooops.
Brilliant! In particular, we’re shadowing * in the current scope (for example the Main
module, if you’re following along in the REPL) with the number 4, so the expression 1 * 2
is actually equivalent to *(1, 2), and since * is 4:

julia> 4(1, 2)
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?

Gotcha! So we should validate the input? Indeed, we should make sure the expression passed
to the macro is what we expect, that is an assignment. We’ve already seen before that this
means ex.head should be the symbol =. We should also make sure the left-hand side is only a
variable name, we don’t want to mess with indexing expressions like A[1] = 2:

julia> Meta.@dump A[1] = 2
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Expr
      head: Symbol ref
      args: Array{Any}((2,))
        1: Symbol A
        2: Int64 1
    2: Int64 2

Right, so ex.head should be only = and ex.args[1] should only be another symbol. In
the other cases we should throw a useful error message.
You’re getting the hang of it!

julia> macro stable(ex::Expr)
           (ex.head === :(=) && ex.args[1] isa Symbol) || throw(ArgumentError("@stable: `$(ex)` is not an assignment expression."))
           quote
               local tmp = $(esc(ex.args[2]))
               $(esc(ex.args[1]))::typeof(tmp) = tmp
           end
       end
@stable (macro with 1 method)

julia> @stable x * 12
ERROR: LoadError: ArgumentError: @stable: `x * 12` is not an assignment expression.

julia> @stable A[1] = 2
ERROR: LoadError: ArgumentError: @stable: `A[1] = 2` is not an assignment expression.

Awesome, I think I’m now happy with my first macro! Love it! Yes, now it works pretty well
and it also has good error handling! Nice job!

Conclusions

I hope this post was instructive to learn how to write a very basic macro in Julia. In the
end, the macro we wrote is quite short and not very complicated, but we ran into many
pitfalls along the way: hygiene, thinking about corner cases of expressions, avoiding
repeated undesired evaluations and introducing extra variables in the scope of the macro’s
call-site. This also shows the purpose of macros: rewriting expressions into other ones, to
simplify writing more complicated expressions or programmatically write more code.

This post is inspired by a macro which I wrote some months ago for the Seven Lines of Julia
thread on JuliaLang Discourse.

This was cool! Where can I learn more about macros? Good to hear! I hope now you aren’t
going to abuse macros though! But if you do want to learn something more about macros, in
addition to the official documentation, some useful resources are:




Mythbusting Julia speed

By: Mosè Giordano

Re-posted from: http://giordano.github.io/blog/2021-10-14-mythbusting-julia/

Question: I read a lot on the Internet about this programming language called Julia being
very fast, but I’m very skeptical. What do you say?

Answer: It’s good to be skeptical, but you can try it yourself!

Q: Some time ago I actually tried and I found it to be very slow. See, I wrote this
small script, but it took almost a couple of seconds to do very simple things…

A: Right. There are some things to keep in mind:

  1. Julia has a non-negligible start-up time. It isn’t that bad, it’s generally on the order
    of 0.4 seconds or so, but if your workflow consists of running several very small scripts
    many times, you’re going to pay this cost every time you start Julia. If this is a
    problem, you can either change your workflow to run a single long-running script or just
    not use Julia. If you want to repeat very small tasks multiple times driving them from
    the shell, maybe Julia isn’t what you need to use, and that’s OK. However, keep in mind
    that Julia is a perfectly fine language for scripting, so you could also consider using
    Julia to drive your scripts, which would also allow you not to start Julia multiple
    times.
  2. Julia is a Just-In-Time (JIT)
    compiler, or maybe a Just-Ahead-Of-Time one. This means that in each Julia session, the
    first time you run a function (or a method of a function, to be more precise) it needs to
    be compiled, unless its compiled version was saved in the
    sysimage. Again, compilation time isn’t
    that bad, usually in the order of ~0.1 seconds for most simple functions, but more
    complicated functions can take longer. The problem however is that this time adds up,
    and it adds to the start-up time above.
  3. It’s very easy to make mistakes that very… I mean very negatively affect the
    performance of Julia code. The most common gotcha is using global variables. Global
    variables are usually considered bad practice in many compiled languages; in Julia,
    non-constant global variables completely destroy performance. Avoid them and reorganise
    the code to use functions, or make the global variable constant, if it’s really a
    constant that has to be accessed by multiple functions. See Julia’s performance tips
    for more gotchas. While it seems daunting to remember all of them, it takes little
    practice to become familiar with them. Also, you can learn them gradually, you don’t
    need to get everything right from the beginning, you can learn them the hard way 😉
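The global-variable gotcha from point 3 can be sketched like this (the names here are made
up for illustration): the two functions compute the same thing, but only the one reading
the const global can be fully specialised by the compiler:

```julia
# Non-constant global: the compiler cannot assume its type is stable.
factor = 2.0

# Constant global: the type is fixed, so functions using it specialise.
const FACTOR = 2.0

slow_scale(v) = factor * v   # type-unstable: `factor` could be rebound to anything
fast_scale(v) = FACTOR * v   # type-stable: `FACTOR` is known to be a Float64
```

Running `@code_warntype slow_scale(1.0)` would show the type instability in the first
function, while the second one infers cleanly.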

To summarise, Julia gives its best in long-running sessions, so that start-up and JIT
compilation costs are paid as few times as possible. There are some tools that make it
easier to restart Julia less frequently, or decrease the JIT cost:

  • Revise.jl automatically recompiles methods that
    are modified in the source code of packages or scripts. When developing packages, this
    tool lets you seamlessly hot-reload the code and not restart Julia all the time. Thanks
    to Julia’s introspection and reflection capabilities, this works granularly and recompiles
    only the needed methods. Note that when you do restart Julia, the package you edited will
    be entirely recompiled anyways. One more reason to avoid restarting Julia multiple times
    🙂
  • If you’re an Emacs user, the need to restart a program as little as possible may sound
    familiar. In that case a solution is to start an Emacs server and connect new clients to
    it. It isn’t a surprise there is a similar tool also for Julia:
    DaemonMode.jl. This package is very useful
    if you insist on a workflow based on starting Julia multiple times to run small scripts:
    with the server-clients model, you pay the cost of startup time only once.
  • With PackageCompiler.jl you can
    create custom sysimages. A common use-case is to fully compile a stable package that
    you’re planning to use very frequently: JIT cost will be drastically reduced, the drawback
    is an increased size of the sysimage on disk.

Q: I’ve seen the micro-benchmarks on Julia’s website. They seem too good to be real! And
the set of benchmarks is very arbitrary, chosen to make Julia look good…
A: Let’s make it clear. Benchmarking is hard, languages benchmarks even more so: language
specifications can allow better or worse optimisations, but to get real numbers to compare
you usually use specific implementations (for C/C++ think of GCC vs LLVM vs Intel compilers
vs you-name-it; there are many Python implementations with different focus on speed), which
can have very different performance on different hardware. At that point, what are you
actually benchmarking? Language specification? Particular implementation? Vendor
optimisations? Hardware performance?

There is no reason to think Julia’s micro-benchmarks are somehow fake: the source code is
available, and anyone can check, improve, and run it. Julia is in the same ballpark as
other compiled languages, like C, LuaJIT, Rust, Go and Fortran; there is nothing special
about it. Just like all other compiled languages, Julia timings don’t include compilation
time, to show the pure runtime performance of compiled code. I agree the set of these
benchmarks is arbitrary, but any set would be, as it’s impossible to cover all possible
applications! Actually, these benchmarks were chosen early in Julia’s development to guide
optimisations to do in the language. In a sense, Julia was optimised to make these
benchmarks look good.

Q: Well, plotting is so slow in Julia!

A: That’s a fair point. Honestly, plotting doesn’t have to be slow in Julia. Some simple
or low-level plotting packages are actually fairly fast. However, one of the most popular
packages for plotting, Plots.jl tends to feel slow to someone used to plotting in R or
Python. This happens because Plots.jl has an interesting interface with which you can use
the same code to produce plots using different backends, but this means that some code paths
are lazily compiled on-the-fly when a new backend is enabled, including the default one.
This problem used to be very annoying, 2-3 years ago the first time you’d do a plot in a
Julia session (loading the Plots.jl package and running the plot command) it’d take up
to ~20 seconds for the plot to appear, leading to the so-called “time-to-first-plot”
problem. The situation today is much improved, with time-to-first-plot for Plots.jl in
the order of ~5 seconds on recent computers. Definitely not as fast as you’d like, but much
more bearable than before, with the hope to see more small improvements along the way in the
future. As already said, the issue can be made less tedious by restarting Julia as little
as possible, or using a custom sysimage which includes Plots.jl.

Q: I’ve heard claims of people getting some ridiculously large speed-ups by moving to
Julia, like 200x, is that true? Is that even possible?

A: To be fair, probably the original code was not so performant, but speeding it up would
have been complicated for the developer. And maybe with Julia it was easier to use hardware
accelerators like GPUs. Does that make the claim misleading? Maybe so, because that’s not
a 1-to-1 comparison, but again, 1-to-1 comparisons are hard to make in a way that is fair
for all languages. Does that make the claim false? The fact is that Julia allowed them to
more easily enable optimisations that’d have been harder to achieve otherwise with the same
amount of effort. Isn’t that the point of a language that aims to make you more productive?

Q: Then should I expect a similar speed up by moving to Julia?

A: Not really! A more accurate answer would be that it depends. If your C/C++/Fortran code
is already heavily optimised, maybe even with some assembly to squeeze out the last bits of
performance, or if you have a Python+NumPy code which follows best performance practices
(vectorised functions, no for loops, etc…), don’t expect any appreciable speedup by
translating it to Julia, if any speedup at all. In this case moving to Julia may not even
be worth the effort, just to set the right expectations. Actually, a naive literal
translation is likely to have fairly bad performance, because what’s idiomatic in one
language may not be so in a different language with different properties (for example, they
may have different memory layouts). Julia isn’t a magic wand, it’s a compiler which
generates machine code, and you can likely achieve the same machine code with many other
languages. The difference usually lies in the amount of effort you need to put in with the
same experience in each language, starting a project from scratch: my totally biased claim
is that this tends to favour Julia 🙂 On the other hand, if the problem you want to solve
isn’t amenable to vectorisation, note that there are no performance penalties for using
for loops in Julia, so in this case moving from a for-allergic language or package to
Julia can give you a nice boost.

If you’re willing to move your code to Julia, the improvements you should expect are not
necessarily about speed: you will work in a portable language, and get access to a wide,
composable ecosystem of packages for numerical computing, with possibility to switch to use
hardware accelerators (like GPUs) with minimal effort.




What you can learn from accessing an element of an array in Julia

By: Mosè Giordano

Re-posted from: http://giordano.github.io/blog/2021-03-25-access-array/

This innocent question from a newcomer in the official Slack workspace of the Julia
programming language

How can I extract the Float64 element from a 1×1 Array{Float64, 1}?

spurred a series of more or less serious answers:

  • use the only function: only(a)
  • get the element with index 1: a[1]
  • omit all indices: a[]
  • use the first function: first(a)
  • get the element with index 1, 1: a[1, 1]
  • use the last function: last(a)
  • get the element with index 1, 1, 1, 1, 1, …: a[ones(Int, 100)...] (the ...
    operator is called splatting)
  • get the last element with the special end index (end here is automatically
    lowered to lastindex(a)): a[end]
  • get the element in the first row and last column: a[begin, end]
  • reduce the array with the sum function: reduce(sum, a)
  • get the element with index floor(Int, -real(exp(π * im))):
    a[floor(Int, -real(exp(π * im)))], or the variant x[floor(Int, -real(cispi(1)))]
  • dereference the pointer of a (remember?): unsafe_load(pointer(a))
  • use the StarWarsArrays.jl package and get the fourth element (mentioned in the post
    about random indexing): StarWarsArray(a)[4]
  • use the getindex function (a[n, ...] is syntactic sugar for getindex(a, n, ...)):
    getindex(a, firstindex(a))
  • get a random element from the array: rand(a)

Benchmarking

We can compare the performance of all these versions. To do this, we can use
the BenchmarkTools.jl package
(benchmarks kindly provided by @Seelengrab):

using BenchmarkTools # to install it: ]add BenchmarkTools
using StarWarsArrays # to install it: ]add https://github.com/giordano/StarWarsArrays.jl
using Random # it's a standard library

only_bench(x)         = only(x)
one_idx_bench(x)      = x[1]
omit_bench(x)         = x[]
first_bench(x)        = first(x)
two_ones_idx_bench(x) = x[1, 1]
splat_bench(x)        = x[ones(Int, 100)...]
end_bench(x)          = x[end]
begin_end_bench(x)    = x[begin, end]
reduce_sum_bench(x)   = reduce(sum, x)
floor_bench(x)        = x[floor(Int, -real(exp(π * im)))]
cispi_bench(x)        = x[floor(Int, -real(cispi(1)))]
dereference_bench(x)  = unsafe_load(pointer(x))
getindex_bench(x)     = getindex(x, firstindex(x))
starwars_bench(x)     = StarWarsArray(x)[4] # ಠ_ಠ
rand_bench(x)         = rand(x)

for f in [
    only_bench,
    one_idx_bench,
    omit_bench,
    first_bench,
    two_ones_idx_bench,
    splat_bench,
    end_bench,
    begin_end_bench,
    reduce_sum_bench,
    floor_bench,
    cispi_bench,
    dereference_bench,
    getindex_bench,
    starwars_bench,
    rand_bench,
]
    println(f, ':')
    @btime $f(x) setup=(x=rand(1,1))
end

On my computer I get

only_bench:
  2.800 ns (0 allocations: 0 bytes)
one_idx_bench:
  1.631 ns (0 allocations: 0 bytes)
omit_bench:
  2.796 ns (0 allocations: 0 bytes)
first_bench:
  1.630 ns (0 allocations: 0 bytes)
two_ones_idx_bench:
  1.685 ns (0 allocations: 0 bytes)
splat_bench:
  4.524 μs (5 allocations: 1.75 KiB)
end_bench:
  1.378 ns (0 allocations: 0 bytes)
begin_end_bench:
  1.644 ns (0 allocations: 0 bytes)
reduce_sum_bench:
  1.901 ns (0 allocations: 0 bytes)
floor_bench:
  1.630 ns (0 allocations: 0 bytes)
cispi_bench:
  1.631 ns (0 allocations: 0 bytes)
dereference_bench:
  1.369 ns (0 allocations: 0 bytes)
getindex_bench:
  1.371 ns (0 allocations: 0 bytes)
starwars_bench:
  1.371 ns (0 allocations: 0 bytes)
rand_bench:
  9.599 ns (0 allocations: 0 bytes)

Unsurprisingly, the fastest solution is dereferencing a pointer, which compiles
down to the following code

julia> x = rand(1,1);

julia> @code_native debuginfo=:none dereference_bench(x)
        .text
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0                   # xmm0 = mem[0],zero
        retq
        nopl    (%rax,%rax)

This is pretty much what you’d expect from the corresponding C code.

What’s remarkable is that the
StarWarsArrays.jl package
achieves about the same performance: accessing an element of a custom data
structure using a funny custom indexing, and written without having performance
in mind, is just as efficient as dereferencing a pointer.

However, subnanosecond differences aren’t credible: most functions that clocked
in the range 1.3–1.6 nanoseconds have basically the same performance. If you
run the benchmarks again you’ll find that their performance gets more or less
close to that of dereferencing. The difference with dereferencing is that all
the other functions do bounds
checking
, which on average leads
to slightly longer runtimes. However, in high-performance computing you usually
want to make sure to be within bounds at a higher level and skip bounds checks
when accessing individual elements. If you’re 100% sure that the element of the
array you want to access is within its bounds, you can use the macro
@inbounds to
skip runtime bounds checks.
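As a minimal sketch (the function name one_idx_inbounds is hypothetical, not
one of the benchmarks above), this is what eliding the bounds check for a
single access looks like:

```julia
# Hypothetical variant of one_idx_bench with the bounds check elided.
# @inbounds is a promise to the compiler that the index is valid:
# using it with an out-of-bounds index is undefined behaviour,
# so only use it when the index is provably in range.
one_idx_inbounds(x) = @inbounds x[1]

x = rand(1, 1)
one_idx_inbounds(x) == x[1]  # same result, without the runtime check
```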

To see the performance when skipping bounds checks in the above functions, you
can either use @inbounds in their definitions (e.g., one_idx_bench(x) =
@inbounds x[1]
, etc…), or start Julia with the --check-bounds=no flag (this
flag is not recommended for general use, as it will skip all bounds checks,
not only where you know it’s safe to do so). With --check-bounds=no I get

only_bench:
  2.725 ns (0 allocations: 0 bytes)
one_idx_bench:
  1.369 ns (0 allocations: 0 bytes)
omit_bench:
  1.369 ns (0 allocations: 0 bytes)
first_bench:
  1.369 ns (0 allocations: 0 bytes)
two_ones_idx_bench:
  1.369 ns (0 allocations: 0 bytes)
splat_bench:
  4.510 μs (5 allocations: 1.75 KiB)
end_bench:
  1.369 ns (0 allocations: 0 bytes)
begin_end_bench:
  1.630 ns (0 allocations: 0 bytes)
reduce_sum_bench:
  1.960 ns (0 allocations: 0 bytes)
floor_bench:
  1.408 ns (0 allocations: 0 bytes)
cispi_bench:
  1.369 ns (0 allocations: 0 bytes)
dereference_bench:
  1.369 ns (0 allocations: 0 bytes)
getindex_bench:
  1.368 ns (0 allocations: 0 bytes)
starwars_bench:
  1.369 ns (0 allocations: 0 bytes)
rand_bench:
  9.606 ns (0 allocations: 0 bytes)

As expected, most functions now consistently perform as well as
dereferencing. This is confirmed by the fact that they are all compiled down to
the same efficient code, even the most bizarre ones like floor_bench and
starwars_bench:

julia> @code_native debuginfo=:none omit_bench(x)
        .text
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0                   # xmm0 = mem[0],zero
        retq
        nopl    (%rax,%rax)

julia> @code_native debuginfo=:none floor_bench(x)
        .text
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0                   # xmm0 = mem[0],zero
        retq
        nopl    (%rax,%rax)

julia> @code_native debuginfo=:none getindex_bench(x)
        .text
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0                   # xmm0 = mem[0],zero
        retq
        nopl    (%rax,%rax)

julia> @code_native debuginfo=:none starwars_bench(x)
        .text
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0                   # xmm0 = mem[0],zero
        retq
        nopl    (%rax,%rax)

Lessons learned

  • To seriously answer the question that led to this digression, I think the
    first answers are all good ways to access the only element of a 1×1 array:
    only(a) (which, however, always does a runtime check to make sure that’s the
    only element of the array, incurring a small performance hit), a[1],
    a[begin], a[1, 1], a[], first(a)
  • Custom Julia data types are efficient! The fact that a custom data type with
    custom indexing like
    StarWarsArray has basically
    the same performance as dereferencing a pointer (exactly the same native code
    when skipping all bounds checks) is a testament to the power and expressiveness
    of the language: you can write your code in a high-level form, and have it
    compiled down to very efficient code
  • Splatting a long array can be bad for performance, both at compile- and
    run-time. Splatting is a convenient syntax, but use it with great care,
    especially when you have many elements
  • Reduction has relatively little overhead
  • rand(a) performed quite badly, but actually most of the time is spent
    accessing the global default random number generator (RNG): if you care about
    performance (and reproducibility, too) you should always pass the RNG as an
    argument (see also The need for rand
    speed
    by Bogumił
    Kamiński):

    julia> @btime rand(x) setup=(x=rand(1,1))
      9.596 ns (0 allocations: 0 bytes)
    0.08218288442687438

    julia> @btime rand($(Random.default_rng()), x) setup=(x=rand(1,1))
      4.770 ns (0 allocations: 0 bytes)
    0.8112313537560083
    

    In the second benchmark we explicitly passed the default RNG to rand and can
    see that avoiding the access to the global RNG saves more than 50% of the
    runtime. Performance is still far from ideal, though. As an aside, drawing a
    random element from a short tuple (<= 4 elements) is optimised, and we can
    again get the same performance as dereferencing a pointer for a 1-element
    tuple:

    julia> @btime rand($(Random.default_rng()), Ref(x)[]) setup=(x=(rand(),))
      1.369 ns (0 allocations: 0 bytes)
    0.4308805082102649
    
    julia> @code_native debuginfo=:none rand(Random.default_rng(), (rand(),))
            .text
            vmovsd  (%rsi), %xmm0                   # xmm0 = mem[0],zero
            retq
            nopw    %cs:(%rax,%rax)
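To summarise, the access patterns from the first point above can be sketched as
follows; all of them return the same (and only) element of the matrix:

```julia
a = rand(1, 1)

only(a)    # errors if `a` had more than one element
a[1]       # linear indexing
a[begin]   # first linear index
a[1, 1]    # Cartesian indexing
a[]        # empty indexing works for arrays with a single element
first(a)   # first element in iteration order
```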