Transpiling Julia to C – The LLVM-CBackend

static-julia-logo

Julia Computing carried out this work under contract from the Johns Hopkins University Applied Physics Laboratory (JHU APL) for the Federal Aviation Administration (FAA) to support its Airborne Collision Avoidance System X (ACAS X) program. JuliaCon 2015 had a very interesting talk by Robert Moss on this topic.

In my last blog post on the requirements for statically compiling julia, I promised to describe in more detail the ability to transpile Julia code into C output. I would now like to announce the open-source release of an updated LLVM CBackend that is able to handle the broad set of intrinsic operations used by Julia.

Julia already has a backend for compiling to the LLVM IR. The Static and Ahead of Time (AOT) compiled Julia work improved this backend to be able to handle any valid Julia method directly, instead of only those with sufficient type specialization. The updated LLVM-CBE project is then able to map the LLVM IR to the appropriate C construct, using only a limited subset of the C language (at this time, the output is nearly C89 compliant). The final executable then could be compiled by some other C compiler and would not have any dependency on the llvm project.

There are a few caveats to be aware of:

  • The LLVM IR is more expressive than C, so some parts of the output are not compatible with the JIT code. This means that you should not try to use the system image compiled by a C compiler with the JIT enabled. Additionally, C has much more undefined behavior than LLVM or Julia, so some places in the code may be more verbose than you might have expected to force C to give the intended behavior.

  • This is not a true cross-compiler. This is not the fault of the backend, but of the Julia program generator phase which encodes many platform-specific assumptions as it generates function definitions. This is the same situation you would encounter if you were to, for example, run a C file through cpp then try to compile the the output using a cross compiler. Some examples include:
    • OS-specializations
    • Endianness assumptions
    • Word (pointer) size
    • I am hopeful that the first two will be addressable in the future. Indeed, the coreimg.jl / inference.ji file is already able to avoid making any platform-specific assumption other than sizeof(uintptr_t) and can be used cross-compiled in this manner.
  • The resulting binary includes code for every method that was added to the system image, with no tree-shaking. The main reasons for this is that regardless of the contents of the program, the semantics of Julia require that the __init__ functions of each module will be called (with all of their side-effects). And beyond trivial programs, it is computationally infeasible to statically determine which methods might be called without solving the halting problem. It may be possible to statically bound the answer, but I expect that it will be more beneficial to better modularize the way code for Base is loaded.

Notwithstanding those limitations, the Julia compiler backend was previously not capable of handling all constructs. The backend has always been capable of compiling fully type-specialized method definitions to native code. And this turns out to be sufficient for compiling an unspecialized versions of most functions as well, by specifying a less specialized type signature for the lambda to the compiler. This is true since the semantics of a Julia program (notably dispatch) are defined as operating on the runtime types of the arguments. The ability to infer the types of variables and arguments ahead-of-time to unbox them and devirtualize the calls is important as an optimization, but neither optimization impacts the correctness of the code.

There were a few cases where the backend couldn’t handle the generic method signature without additional machinery: static parameters, intrinsics, and comprehensions.

Static method parameters represent a side channel of additional information, computed as a function of the method type signature and the runtime call types. So while f{T}(x::T) = T is a very simple method definition, it required some extra care to compile it correctly for any x. Compiling this generically requires the ability to pass T as an argument. However, the callee knows x and not T so it can’t readily provide this extra information. Usually, the Julia backend knows the method signature type exactly, which allows it to fill in T as a constant during compilation and avoids the issue. For the static compilation case, the method dispatch system needs to instead insert this extra parameter into the call. It does this by adding a hidden argument to the call signature, and then marking the function pointer as needing to be called with this extra argument.

A second challenge is intrinsics. These calls are function-like, but do not have a dynamic behavioral model behind them. The code-generator simply fails if it was not able to statically predict one of these types (it was the job of generic_unbox and auto_unbox to try to avoid reaching this failure situation. In practice, this isn’t usually an issue since the method signatures for these functions in Base have been strongly statically typed. However, to guarantee this can’t happen required an implementation of a generic version of all intrinsic functions that can be called if the static type is not sufficiently well-known. The updated LLVM CBackend even makes direct use of some of these dynamic copies of LLVM’s normal intrinsics to implement support for emulation of i128 operations when they are not natively supported by the compiler (for example, when using MSVC or when not using x64).

Another case is comprehensions. Currently this is an open issue. This could be worked around using the same mechanism that is used to add static parameters. But since this issue affects correctness of more than just statically compiled code, the general fix is being actively worked for the v0.5 release.


CBackend usage

These instructions assume you have built Julia from source. If you want to use it independent of Julia, please see the instructions in the repo README instead.

Overview:

  1. Build julia from source
  2. Install llvm-cbe
  3. Generate julia output .bc file with --compile=all
  4. Convert julia output .bc -> .c and compile

Start by defining the directory of the julia src root folder where you have previously built an (unmodified) version of julia master (>=v0.5-) as an environment variable:

JULIA_ROOT=`pwd`/julia

Install / compile llvm-cbe

# Build LLVM 3.7.1, for example, here we build it in-tree
make -C $JULIA_ROOT/deps compile-llvm LLVM_VER=3.7.1

# installs llvm-cbe to $JULIA_ROOT/deps/build/llvm-3.7.1/build_Release/Release/bin
git clone git@github.com:JuliaComputing/llvm-cbe.git $JULIA_ROOT/deps/srccache/llvm-3.7.1/projects/llvm-cbe
make -C $JULIA_ROOT/deps/build/llvm-3.7.1/build_Release/projects

Generate the statically-compiled Julia object file

Follow the steps in my previous blog post to create a .bc file with the desired content.

Convert to C source file (.c) and build

Running the llvm-cbe binary on this LLVM bitcode file converts it into C program:

$JULIA_ROOT/deps/build/llvm-3.7.0/build_Release/Release/bin/llvm-cbe 
    sys-plus.bc -o sys-plus.cbe.c

This output file then can be integrated into your normal toolchain and should work with your compiler. For gcc and clang, I have tested with the following flags to suppress uninteresting lint flags while demonstrating compliance to the standard:

WARN='-std=c99 -pedantic -Wall -Wextra -Wno-unused-variable -Wno-unused-function -Wno-unused-parameter -Wno-sign-compare -Wno-unused-but-set-variable -Wno-long-long -Wno-invalid-noreturn'

The resulting output than can be used in the place of the default system image (with JIT compilation disabled):

./julia -J <file>.so --compile=no <ARGS>

or it could be linked into a larger embedded application executable and initialized with the embedding api:

jl_options.compile_enabled = JL_OPTIONS_COMPILE_OFF;
jl_options.image_file = argv[0];
julia_init(JL_IMAGE_CWD);