Static and Ahead of Time (AOT) compiled Julia

On running Julia code without a JIT


Julia Computing carried out this work under contract from the Johns Hopkins University Applied Physics Laboratory (JHU APL) for the Federal Aviation Administration (FAA) to support its Traffic-Alert and Collision Avoidance System (TCAS) program. JuliaCon 2015 had a very interesting talk by Robert Moss on this topic. Part of this work was also sponsored by BlackRock, Inc.

When I tell someone about Julia, I’m often asked: “What makes it fast?” and “Why can’t <insert favorite dynamic language> do the same?” Those aren’t easy questions, since the answers have many parts, many of them nuanced and sometimes specific to a particular application – or even developer. Being fast is one benefit, but exploring the answers also reveals some other applications: static compilation (i.e. removing the JIT dependency entirely), theorem proving, static memory allocation, and more! Answering these questions requires an understanding of the traditional definitions of static vs. dynamic languages, and how Julia fits into that spectrum.

Many languages, including Julia, support templated code, macros, or other forms of source code generation. In traditional static languages, these have often been written in their own language dialect. This makes the distinction between the application and the generator functions immediately clear to the reader. But it also means the user must actually learn two dialects – and how they interact – to be fully proficient in the one language. This templating language may be simple (such as C preprocessor macros), but may also be a full Turing-complete interpreter (such as C++ templates).

Dynamic languages, by contrast, have commonly exposed similar functionality by providing an eval function. The reasoning is that since all code is being dynamically interpreted, there is no disadvantage for some of this code not being available to the compiler until “just-in-time” for the code to be executed. The distinction between application and generator is still fairly clear: the generator function ends with a call to eval.
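As a rough illustration of that pattern, here is a minimal sketch in Julia of a generator that ends with a call to eval. The function names `add_two` and `times_two` are made up for this example.

```julia
# Generator: build one method definition per (name, operator) pair and
# hand each resulting expression to eval.
for (name, op) in ((:add_two, :+), (:times_two, :*))
    eval(:( $name(x) = $op(x, 2) ))
end

# Application: ordinary calls to the functions generated above.
add_two(5)     # 7
times_two(5)   # 10
```

Everything inside the loop is generator code; the two calls at the bottom are the application, and they contain no eval of their own.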


It’s easy to blur the line between these two camps, however. For example, if a C++ program links against libclang (for example, the cling project), it is possible to program in the dynamic style. Or if a program written in a dynamic language doesn’t use eval, then it can be transpiled to avoid the runtime interpreter[1]. Julia embraces this hybridization. But discussing the possibility of static compilation requires an understanding of the distinction between these two phases in the life cycle of the execution of code: generating the program, and running it.

Julia follows in the Lisp tradition and provides tools for manipulating the language using the language itself. This can make it non-obvious to the reader which parts of the code are generators for application logic, and which parts of the program are the actual application logic. But this is also what complicates attempts to answer the initial question of whether Julia programs can be statically compiled – and what that question really means. If compilation is defined as finding the most efficient mapping of the source code onto the primitive instructions understood by the machine, then accurate static analysis is a prerequisite for the compiler to be able to optimize this translation. If the entire program can be statically transformed, then it is possible to generate compiled binaries and remove the runtime dependency for a parser / interpreter / compiler. And while compiler instruction selection is probably the most common static analysis, it is far from the only possible static analysis pass. For example, theorem proving, automated testing, and race detection are all active research areas.

A user’s first encounter with the Julia language is usually at the interactive REPL prompt, and then by writing script files in a similar style. At this top-level scope, all forms of dynamic evaluation are permitted: new types can be defined; functions created; variables can be modified and introspected via reflection; and modules can be defined and imported. However, once the user defines a local scope (for example, a function definition, let block, for loop), only static constructs can be used, with three exceptions provided for user flexibility: eval, macros, and generated functions. This is important, because it means that if the programmer avoids using these three dynamic constructs for the application logic, it is possible to statically analyze and compile the program generated as a result of running the user’s code file.

Let’s take a closer look at each of these cases:

  1. A call to eval can be used as an escape hatch to invoke top-level expressions and the compiler from inside a function. This is akin to its purpose in a typical dynamic language (with the exception that it cannot introspect or modify a local variable, which many other languages do allow). Julia provides many constructs intended to help the user avoid needing this functionality, including closure (nested) functions, dynamic dispatch, type parameters, and macros.

  2. Macro calls are demarcated by @ to distinguish them from regular function calls. They are functionally equivalent to the code templating features of many traditional static languages (albeit more ergonomic since they are implemented in Julia itself, in the style of Lisp). They are run after parsing, but before the code is executed. Indeed, there is no mechanism for invoking them at runtime so the existence of their definitions in a program does not cause problems.

    Aside: If you’ve ever encountered the error: “unsupported or misplaced expression $”, this is specifically the runtime-macro behavior that is “unsupported”. Indeed, the syntax for a runtime invocation of a macro would be:

    $(quote @macrocall $(args...) end)
    

  3. Generated (aka staged) functions cannot, in general, be statically compiled. These functions are equivalent to calling eval on a new anonymous function computed as a function of the input types (a JIT-parsed lambda, if you will), and optionally memoizing the result. Because the results are memoized, it is possible to statically compile the memoization cache, that is, the specializations that have already been generated. This makes them, in this regard, superior to an unadorned eval call. But in the general case, generated functions are black boxes to the compiler and thus cannot be analyzed statically.
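The three exceptions above can be sketched side by side. This is only an illustration, assuming a Julia 1.x session; the names `eval_sees_globals`, `@twice`, and `field_count` are invented for the example.

```julia
# 1. eval runs its argument at top level, so it sees globals, not locals.
counter = 10
function eval_sees_globals()
    counter = 1                    # a local variable, invisible to eval
    return eval(:(counter + 1))    # evaluated in global scope
end

# 2. A macro maps expressions to expressions. It runs once, when the call
#    site is expanded, before the surrounding code executes.
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

# 3. A generated function receives the argument *types* and returns the
#    expression that will be compiled for them.
@generated function field_count(x)
    n = fieldcount(x)    # inside the generator, `x` is bound to a type
    return :( $n )
end
```

Calling `eval_sees_globals()` returns 11 rather than 2, because the expression is evaluated against the global `counter`. Writing `@twice hits[] += 1` with `hits = Ref(0)` leaves `hits[]` at 2, and `field_count((1, 2.0, "a"))` returns 3.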

So there you have it. If you avoid eval and generated functions, any language – including Julia – can be statically compiled.

But that still leaves all of the important questions unanswered, such as: (a) why does this matter? (b) how can we use it? (c) what makes Julia special?

Julia is special because it was designed from the start as a dynamic language of the ilk described above, but one in which the programmer often can describe to the compiler the extent to which those features are used by a particular function. The built-in library of functionality (aka Base) was developed to provide this information and take advantage of these principles, which continues to influence authors of extension modules (aka packages) to also follow these principles. For example, the Julia community seems to have coined the term “type-stability” to describe a concept that static / compiled languages have historically enforced and dynamic / scripting languages have historically disregarded. These considerations are what allow Julia to claim both flexibility and speed, and they can be very difficult to retrofit onto a legacy codebase. Put another way, the speed potential of a language consists almost entirely of the properties that the compiler is able to prove ahead-of-time so that they don’t need to be checked at runtime. And the flexibility comes from being able to get those runtime checks automatically whenever they are needed. Type-checking (and unboxing) is one aspect of these checks, but there are many other properties that can be computed, such as stack allocation, statically-determined memory lifetimes, constant propagation, and call de-virtualization. (For a more complete discussion of these properties, see Oscar Blumberg’s Green Fairy Analysis.)
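To make “type-stability” concrete, here is a small sketch. The two functions are invented for this example, and `Base.return_types` is a reflection helper used only to ask the compiler what it was able to infer.

```julia
# Type-unstable: the return type depends on the *value* of x
# (an Int on one branch, a Float64 on the other).
unstable_sign(x) = x > 0 ? 1 : -1.0

# Type-stable: both branches return a Float64.
stable_sign(x) = x > 0 ? 1.0 : -1.0

# Ask inference for the return type given a Float64 argument.
unstable_inferred = Base.return_types(unstable_sign, (Float64,))[1]  # a Union of Int and Float64
stable_inferred   = Base.return_types(stable_sign, (Float64,))[1]    # Float64

isconcretetype(unstable_inferred)  # false: callers need a runtime check
isconcretetype(stable_inferred)    # true: callers can use the result unboxed
```

In the stable case the compiler has proven the result type ahead of time. In the unstable case that proof is unavailable, so the check moves to runtime.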

This also means turning Julia code into Julia binaries requires no tricks, complicated incantations, or obscure limitations. In fact, the Julia runtime / compiler is already silently doing this for you on a regular basis. Sorry, if you were hoping for something really spectacular here – but that’s also not quite the end of the story, since we can exercise some direct control over it.

So now let’s pull back the covers on some of the options for the Julia binary. You may have glossed over this long list at some point (abridged):

~$ julia --help

julia [switches] -- [programfile] [args...]

-v, --version         	Display version information

-h, --help            	Print this message

-J, --sysimage <file> 	Start up with the given system image file

--precompiled={yes|no}	Use precompiled code from system image if available

--compilecache={yes|no}   Enable/disable incremental precompilation of modules

--startup-file={yes|no}   Load ~/.juliarc.jl

-e, --eval <expr>     	Evaluate <expr>

-E, --print <expr>    	Evaluate and show <expr>

-P, --post-boot <expr>	Evaluate <expr>, but don't disable interactive mode (deprecated, use -i -e instead)

-L, --load <file>     	Load <file> immediately on all processors

--compile={yes|no|all}	Enable or disable compiler, or request exhaustive compilation

--output-o name       	Generate an object file (including system image data)

--output-ji name      	Generate a system image data file (.ji)

--output-bc name      	Generate LLVM bitcode (.bc)

--output-incremental=no   Generate an incremental output file (rather than complete)

What you may not have been as readily aware of is that many of these options are used internally to handle various modes of operation. For instance, -p n (or addprocs(n)) will launch extra copies of julia on the indicated hosts with --worker.

The --output-*, --compile, and --sysimage options are the ones that will be of primary interest for investigating the static compilation abilities of Julia.

When building the Julia language runtime from the .jl source files in base, the Julia runtime library code is run with a flag that tells it where to save the resulting application – code and variable declarations – after executing the input commands. The first call to ./julia during the source compilation evaluates the coreimg.jl file and writes a bytecode representation of the resulting Julia Inference analysis code to inference.ji:

./julia --output-ji inference.ji coreimg.jl

Then the system builds upon that image, to compile the entire Base system, by evaluating sysimg.jl in the runtime environment previously defined and saved to the inference.ji file:

./julia --output-o sys.o --sysimage inference.ji --startup-file=no \
	sysimg.jl

And since it compiled some of those functions to native code (due to directives in the precompile.jl file or other heuristics), that native code can be linked into a dynamic library for fast startup:

cc -shared -o sys.so sys.o -ljulia

In normal usage, the Julia runtime only invokes the compiler when a function is called. This is an intentional trade-off that incurs higher memory usage and longer compile times (aka JIT warm-up), with the expectation that the additional information available once the argument types are known will enable the compiler to generate simpler code with fewer runtime operations – resulting in a net time savings. This behavior also rests on another assumption: that the compiler will be available at runtime.
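The warm-up cost is easy to observe from the REPL. The following is a minimal sketch; `sum_of_squares` is an invented example function and the measured times will vary from machine to machine.

```julia
sum_of_squares(v) = sum(abs2, v)
data = rand(1_000)

# The first call with a Vector{Float64} triggers compilation of that
# specialization; the second call reuses the compiled code.
first_call  = @elapsed sum_of_squares(data)
second_call = @elapsed sum_of_squares(data)
println("first call: ", first_call, " s, second call: ", second_call, " s")

# precompile requests compilation for a signature without running the
# function, which is the kind of directive a precompile.jl file contains.
precompile(sum_of_squares, (Vector{Int},))
```

The first measurement typically includes the compile time and is noticeably larger than the second, though the exact ratio depends on the function and the machine.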

There are cases, however, where the user may want to or need to avoid running the compiler at runtime. The --compile=<yes|no|all> flag makes this possible (the default is yes). When Julia is run with the --compile=all flag, the compiler is invoked for all functions in the system image, so that the resulting sys-all.so binary contains native code for all functions defined in Base:

./julia --output-o sys-all.o --sysimage sys.so --startup-file=no \
	--compile=all --eval nothing
cc -shared -o sys-all.so sys-all.o -ljulia

This dynamic library no longer requires the compiler, which can be demonstrated by disabling the compiler by command line argument:

./julia --compile=no --sysimage sys-all.so
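For reference, the commands shown so far can be collected into one small driver script. This is only a sketch: it assumes it is run from the directory used in the examples above, it reuses exactly the flags and file names from those commands, and the `build` helper with its `dry_run` keyword is invented here.

```julia
julia = "./julia"   # path to the julia executable, as in the commands above

steps = [
    `$julia --output-ji inference.ji coreimg.jl`,
    `$julia --output-o sys.o --sysimage inference.ji --startup-file=no sysimg.jl`,
    `cc -shared -o sys.so sys.o -ljulia`,
    `$julia --output-o sys-all.o --sysimage sys.so --startup-file=no --compile=all --eval nothing`,
    `cc -shared -o sys-all.so sys-all.o -ljulia`,
    `$julia --compile=no --sysimage sys-all.so`,
]

# Dry run by default: print each command instead of executing it.
function build(steps; dry_run::Bool = true)
    for cmd in steps
        dry_run ? println(cmd) : run(cmd)
    end
    return length(steps)
end

build(steps)
```

Passing `dry_run = false` executes the steps in order and stops at the first failing command, since `run` throws on a non-zero exit status.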

Or the compiler can be removed from the library entirely:

make JULIACODEGEN=none

resulting in a much smaller libjulia.so file, but one that will throw an error if the system needs to use any methods at runtime that haven’t been pre-compiled to binary code.


I think it is worth mentioning here that the ./julia binary itself is actually just a very small utility wrapper for parsing the command line arguments and loading the actual Julia runtime from the sys.so dynamic library. This allows the same binary file to be used as both a dynamic library file and an executable, instead of needing to create two different output files. But the linker could instead be invoked slightly differently to embed the compiled code directly into the executable[2]:

cc -o julia-app sys.o repl.c -ljulia

Package code is handled similarly to the system image, so these same principles apply also to modules loaded with Base.__precompile__(true) / Base.compilecache("Package"). These commands invoke the ./julia program in compiler-mode like the examples above, plus the addition of an incremental flag. This extra flag tells it that the output file should only include the delta of the code and definitions that are part of that package:

./julia --output-ji pkg.ji --sysimage sys.so --output-incremental=yes \
	pkg.jl

I think that about covers the current capabilities of Julia’s static compilation engine. Over time, I’m sure that I, and the rest of the team at Julia Computing Inc., will be adding many more under-the-hood features and optimizations to expand further on these powerful capabilities. This will allow Julia to be used on a broad variety of resource-constrained compute devices, many of which disallow JIT compilation or simply aren’t powerful enough for it to be beneficial – web-browsers (e.g. emscripten), smartphones, IoT devices (e.g. the Raspberry Pi), etc.

One other application of static analysis that I hadn’t yet touched on is the ability to convert Julia code to another language, such as C. In my next post, I plan to dive further into this capability and show how that can be done.


Supplemental Tools

Since Julia users often come to be aware of, and sometimes even fluent in, the esoterica of such tools as code_llvm and code_native, I feel it would be remiss of me if I didn’t point out that there are several standalone tools for analyzing the static files generated above. For complete documentation, refer to the LLVM webpage for these tools.

  • To use most of these tools, you will need to start by re-running the command of interest above, and specifying --output-bc instead of (or in addition to) --output-o

  • llvm-dis : converts the .bc (LLVM bitcode) binary file to .ll (LLVM assembly text)
    • roughly equivalent to code_llvm
  • llc : compiles a .bc or .ll file to .o (equivalent to the file from --output-o)

  • llvm-objdump : disassembles a .o file to .S
    • roughly equivalent to code_native
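For comparison, the in-session counterparts of these tools can be invoked as follows. This is a small sketch assuming Julia 1.x, where `code_llvm` and `code_native` live in the InteractiveUtils standard library (the REPL loads it automatically); `increment` is an invented example function.

```julia
using InteractiveUtils   # provides code_llvm and code_native in Julia 1.x

increment(x::Int) = x + 1

# LLVM IR for the Int specialization, comparable to llvm-dis output.
code_llvm(stdout, increment, (Int,))

# Native assembly for the same specialization, comparable to llvm-objdump output.
code_native(stdout, increment, (Int,))
```

The difference is scope: these functions show one method specialization at a time, while the standalone tools operate on the whole file produced by --output-bc or --output-o.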


[1]: This observation forms the basis of the JIT compilers for many popular languages such as JavaScript.


[2]: There is near infinite variety in the flags that can be passed to cc to compile and link files. I’ve neglected to mention paths and a few flags that are frequently essential such as -L, -I, and -Wl,-rpath,$(pwd). I’m assuming here that the reader already has a toolchain configured for their purpose, so I’ve opted for trying to show a simple example clearly rather than trying to teach all of the nuances, which could fill a whole blog post of its own.