Bugs in Julia with AFL and C-Reduce

By: Tim Besard

Re-posted from: https://blog.maleadt.net/2018/11/16/julia_bugs/

Recently, I’ve been looking at some existing tools to efficiently find bugs in
the Julia language implementation: American Fuzzy
Lop
(AFL) for fuzzing the compiler, and
C-Reduce for reducing test cases. In this
post, I’ll detail the set-up and use of these tools. Skip to the bottom
of this post for a demonstration.

American Fuzzy Lop

Fuzzing Julia used to be impractical due to the significant start-up time of the
Julia VM, rendering typical fuzzing approaches inefficient and calling for
specialized tools such as
TypeFuzz.jl. However, start-up
latency has been improving lately, and AFL now has some features to cope with
slow binaries.

Quick start

Working with AFL 2.52b, the naive approach of using afl-gcc and afl-g++ to
compile Julia by setting the CC and CXX build variables, results in a
functioning julia binary. This only instruments those parts of the compiler
that have been written in C/C++; code written in Julia is not covered.

To trap interesting errors, I built with LLVM_ASSERTIONS=1 and applied the
following patch to at least detect errors in the Julia parts of the compiler:

--- a/julia/src/gf.c
+++ b/julia/src/gf.c
@@ -280,10 +280,11 @@ jl_code_info_t *jl_type_infer(...)
 jl_printf(JL_STDERR, "Internal error: encountered unexpected error in runtime:\n");
 jl_static_show(JL_STDERR, jl_current_exception());
 jl_printf(JL_STDERR, "\n");
 jlbacktrace(); // written to STDERR_FILENO
 linfo_src = NULL;
+abort();

Running the instrumented binary with afl-fuzz crashes the AFL forkserver. This
is due to an incompatibility with the LD_BIND_NOW linker option that AFL uses.
It can be disabled by specifying LD_BIND_LAZY=1 before running the fuzzer.
Furthermore, Julia preallocates quite a bit of memory, and interesting
test-cases might take a while due to infer and compile, so I specify -m 2048
to use up to 2G per process and -t 5000 to let them run for up to 5 seconds:

$ LD_BIND_LAZY=1 afl-fuzz -i inputs/ -o outputs/
                          -m 2048 -t 5000 -- julia @@
afl-fuzz 2.52b by <lcamtuf@google.com>
[+] You have 16 CPU cores and 2 runnable tasks (utilization: 12%).
[+] Try parallel jobs - see docs/parallel_fuzzing.txt.
[*] Checking CPU core loadout...
[+] Found a free CPU core, binding to #4.
[*] Checking core_pattern...
[*] Checking CPU scaling governor...
[*] Setting up output directories...
[*] Scanning 'inputs'...
[+] No auto-generated dictionary tokens to reuse.
[*] Creating hard links for all input files...
[*] Validating target binary...
[+] Deferred forkserver binary detected.
[*] Attempting dry run with 'id:000000,orig:binary_trees.jl'...
[*] Spinning up the fork server...
[+] All right - fork server is up.
    len = 1324, map size = 12396, exec speed = 249839 us
[+] All test cases processed.
[+] All set and ready to roll!

Sadly, performance is pretty low: between 0.2 and 2 executions per second,
respectively executions that error (not crash) and succeed.

Let’s try and optimize that.

Only instrument the compiler

A first trivial optimization is to only instrument code that matters.
Specifically, compiling dependencies (such as LLVM and OpenBLAS) with the system
compiler bumps the speed up a notch:

julia$ make -C deps
julia$ make CC=afl/afl-gcc CXX=afl/afl-g++

Fail hard and fast

Fuzzing code results in lots of invalid sources getting processed by the Julia
compiler. Each of these result in a nice exception, which relies on expensive
stack unwinding to provide back traces. We can disable this functionality by
building with DISABLE_LIBUNWIND=1.

Another costly element of errors is displaying them, as this code is not part of
the precompiled system image. This functionality, courtesy of the
display_error function in julia/base/client.jl, is easily disabled by
preloading a file that redefines the relevant methods:

$ cat fuzz.jl
Base.display_error(er, bt) = println("ERROR")
$ julia -L fuzz.jl -e "error()"
ERROR

The above also applies to deprecation warnings, but those can be disabled more
easily by passing --depwarn=no to Julia.

Deferred fork server

Much of the time launching Julia is spent setting-up libraries, initializing
code generation, etc. This recurring cost can be avoided by using a deferred
fork server, where AFL starts the binary at a manually-specified point during
execution. In the case of Julia, this would be right before loading the input
file:

--- a/julia/ui/repl.c
+++ b/julia/ui/repl.c
@@ -89,6 +89,10 @@ static NOINLINE int true_main(int argc, char *argv[])
     jl_function_t *start_client = jl_base_module ?
         (jl_function_t*)jl_get_global(jl_base_module, jl_symbol("_start")) : NULL;
 
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+  __AFL_INIT();
+#endif
+
     if (start_client) {

This approach relies on the binary being compiled with AFL’s LLVM instrumenter,
available as afl-clang-fast and afl-clang-fast++ for compiling respectively
C and C++ code, again specified using the CC and CXX build variables. Note
that this instrumentation seems incompatible with OpenBLAS, so if you were to
instrument dependencies you would need to compile that library using a regular
compiler.

Use of the deferred fork server necessitated several changes. For one, the
symbols AFL defines need to be exported globally:

--- a/julia/src/julia.expmap
+++ b/julia/src/julia.expmap
@@ -1,5 +1,6 @@
 {
   global:
+    __afl*;

Furthermore, despite the __afl_manual_init symbol that enables the deferred
fork server being part of the Julia binary, AFL does not pick it up and defaults
to regular forking. I did not debug this issue, and instead patched AFL to
unconditionally enter deferred mode:

--- a/afl/afl-fuzz.c
+++ b/afl/afl-fuzz.c
@@ -6950,7 +6950,7 @@ EXP_ST void check_binary(u8* fname) {
-  if (memmem(f_data, f_len, DEFER_SIG, strlen(DEFER_SIG) + 1)) {
+  if (1) {
 
     OKF(cPIN "Deferred forkserver binary detected.")

You should now see the following in the afl-fuzz output:

[+] Deferred forkserver binary detected.

Note that the AFL documentation suggests putting the __AFL_INIT call before
the initialization of important threads, as they don’t survive the fork. The
patch above does not do that, as threads are spawned by restore_signals as
well as different module initializers. To be safe, I built with
JULIA_THREADS=0, configured LLVM with ENABLE_THREADS unset and executed with
OPENBLAS_NUM_THREADS=1 to reduce possible issues. Still, your mileage might
vary and you might have to move the initialization point further upwards.

Multithreaded fuzzing

With the above fixes, we go from 0.2-2 executions/s to 3-30; still considered
slow but a significant improvement nonetheless. Luckily, AFL scales pretty well
and can be easily executed across all threads of a system. This brings the final
execution speed to around 500 per second.

Now let’s focus on improving fuzzing quality.

Instrumenting Julia code

AFL can more accurately measure the impact of changes to inputs if all relevant
code is instrumented. Nowadays, part of the Julia compiler is written in Julia,
and compiled during initial bootstrap of the julia binary. Julia code is
compiled with LLVM, so we can re-use the LLVM-based instrumentation passes that
are part of AFL. However, Julia uses its own (heavily patched) version of LLVM,
so we need to recompile AFL and link against Julia’s LLVM library:

afl/llvm_mode$ PATH=julia/usr/tools:$PATH make
# building the tests will fail

In order to use this pass, which is now linked against Julia’s LLVM, we need a
compatible build of Clang that can load the instrumentation pass. We can do so
by rebuilding Julia’s copy of LLVM with the BUILD_LLVM_CLANG variable set:

julia$ make -C deps BUILD_LLVM_CLANG=1

# we can now run AFL's tests
afl/llvm_mode$ PATH=julia/usr/tools:$PATH make

AFL needs to be modified to expose its instrumentation pass:

--- a/afl/llvm_mode/afl-llvm-pass.so.cc
+++ b/afl/llvm_mode/afl-llvm-pass.so.cc
@@ -177,6 +177,10 @@
+namespace llvm {
+Pass *createAFLCoveragePass() {
+  return new AFLCoverage();
+}
+};

The following changes make Julia use these instrumentation passes:

--- a/julia/src/Makefile
+++ b/julia/src/Makefile
@@ -95,6 +95,8 @@
 endif # USE_LLVM_SHLIB == 1
 endif

+LLVMLINK += afl/afl-llvm-pass.so
--- a/julia/src/jitlayers.cpp
+++ b/julia/src/jitlayers.cpp
@@ -32,6 +32,7 @@
 namespace llvm {
     extern Pass *createLowerSimdLoopPass();
+    extern Pass *createAFLCoveragePass();
 }

@@ -230,6 +231,8 @@ void addOptimizationPasses(...)
+    PM->add(createAFLCoveragePass());
 }

Now build Julia while pointing AFL to the compatible clang binary:

julia$ export AFL_CC=julia/usr/tools/clang
julia$ export AFL_CXX=julia/usr/tools/clang++
julia$ make CC=afl/afl-clang-fast CXX=afl/afl-clang-fast++

For some reason, compiled Julia code uses AFL’s afl_prev_loc TLS variable
differently then how it is defined in the contained binary (instead using
__emutls_v.__afl_prev_loc). I worked around this issue by changing the
definition to a regular variable, since we’re not using threads anyway:

--- a/afl/llvm_mode/afl-llvm-rt.o.c
+++ b/afl/llvm_mode/afl-llvm-rt.o.c
@@ -52,7 +52,7 @@
-__thread u32 __afl_prev_loc;
+u32 __afl_prev_loc;
--- a/afl/llvm_mode/afl-llvm-pass.so.cc
+++ b/afl/llvm_mode/afl-llvm-pass.so.cc
@@ -101,8 +101,7 @@ bool AFLCoverage::runOnModule(Module &M) {
 GlobalVariable *AFLPrevLoc = new GlobalVariable(
-M, Int32Ty, false, GlobalValue::ExternalLinkage, 0, "__afl_prev_loc",
-0, GlobalVariable::GeneralDynamicTLSModel, 0, false);
+M, Int32Ty, false, GlobalValue::ExternalLinkage, 0, "__afl_prev_loc");

Make sure to rebuild all of Julia after applying this patch.

We can now compare the generated code with and without instrumentation:

--- julia-original     -e "@code_llvm sin(1.)"
+++ julia-instrumented -e "@code_llvm sin(1.)"
 L4:                                               ; preds = %top
 ; Location: special/trig.jl:32
 ; Function <; {
 ; Location: float.jl:452
-  %4 = fcmp uge double %2, 0x3E50000000000000
+  %10 = load i32, i32* @__afl_prev_loc
+  %11 = load i8*, i8** @__afl_area_ptr
+  %12 = xor i32 %10, 56933
+  %13 = getelementptr i8, i8* %11, i32 %12
+  %14 = load i8, i8* %13
+  %15 = add i8 %14, 1
+  store i8 %15, i8* %13
+  store i32 28466, i32* @__afl_prev_loc
+  %16 = fcmp uge double %8, 0x3E50000000000000
 ;}

Dictionary for Julia code

A dictionary makes it easier for AFL to explore verbose grammars such as
programming languages. I created a dictionary for
Julia
, with
most keywords, important operators, and common syntax such as values and
definitions. You can use this dictionary by passing -x julia.dict to
afl-fuzz.

Test-case reduction

Although AFL provides a test-case minimizer (afl-tmin), I decided to look into
a independent tool. C-Reduce is such a
tool, blindly reducing test cases based on a user-specified condition. As such,
it is more flexible, and independent from the complicated set-up from above.

Using C-Reduce is fairly simple: provide a script whose return code indicates
the state of the current test case, and run it over some files. I’ve created
some scripts that improve integration with Julia and its code loading mechanism:
maleadt/creduce_julia. For more
information, refer to the repository’s README, or have a look at this Discourse
thread
.

Demo

After running AFL for about a day on a 16-core machine, afl-whatsup reported
the following:

$ afl-whatsup -s outputs/
status check tool for afl-fuzz by <lcamtuf@google.com>

Summary stats
=============

       Fuzzers alive : 16
      Total run time : 13 days, 9 hours
         Total execs : 3 million
    Cumulative speed : 42 execs/sec
       Pending paths : 2023 faves, 62710 total
  Pending per fuzzer : 126 faves, 3919 total (on average)
       Crashes found : 1 locally unique

We found a crash! The actual input is written in one of the crashes
subdirectories, and contained the following code:

abstract type BTree end

mutable struct Empty <: BTree
end

mutable struct Node <: BTree
  info
  left::BTree
  right::BTree
end

function make(val, d)
  if d == 0
    Node(val, Empty(), Empty())
  else
    nval = val * 2
    Node(val, make(nval-1, d-1), make(nval, d-1))
  end
end

check(t::Empty) = 0
check(t::Node) = t.info + check(t.left) - check(t.right)

function loop_depths(d, min_depth, max_depth)
  for i = 0:div(max_depth - d, 2)
    niter = 1 << (max_depth - d + min_depth)
    c = 0
    for j = 1:niter
      c += check(make(i, d)) + check(make(-i, d))
    end,#@printf("%i\t trees of depth %i\t check: %i\n", 2*niter, d, c)
    d += 2
  end
end

const N=10
min_depth = 4
max_depth = N
stretch_depth = max_depth + 1

let c = check(make(0, stretch_depth))
  #@printf("stretch tree of depth %i\t check: %i\n", stretch_depth, c)
end

long_lived_tree = make(0, max_depth)

loop_depths(min_depth, min_depth, max_depth)

As you may notice, I used the BaseBenchmark’s shootout
benchmarks

as inputs to AFL. Pasting the reduced code in a Julia REPL indeed crashes the
compiler:

Internal error: encountered unexpected error in runtime:
BoundsError(a=Array{Core.Compiler.NewNode, (0,)}[], i=(3,))

signal (4): Illegal instruction

To reduce this code, I check-out
maleadt/creduce_julia and use the
crashing code as the initial input. We need to make sure that the run script
successfully catches the error condition; in this case it is sufficient to
grep for encountered unexpected error in runtime:

--- a/creduce_julia/run
+++ b/creduce_julia/run
@@ -30,4 +30,4 @@ JULIA_LOAD_PATH=$TEMP \
-julia main.jl |& grep "example error message"
+julia main.jl |& grep "encountered unexpected error in runtime"
creduce_julia$ ./run
Internal error: encountered unexpected error in runtime:

creduce_julia$ echo $?
0

Finally, we run the reduce script to kick-off the process. It will
automatically work in parallel on all available cores:

creduce_julia$ ./reduce
===< 30926 >===
running 8 interestingness tests in parallel
...
===================== done ====================

pass statistics:
  method pass_balanced :: parens-inside worked 1 times and failed 2 times
  method pass_indent :: final worked 1 times and failed 0 times
  method pass_clex :: rm-toks-3 worked 1 times and failed 61 times
  method pass_lines :: 4 worked 2 times and failed 50 times
  method pass_lines :: 1 worked 2 times and failed 50 times
  method pass_clex :: rename-toks worked 2 times and failed 3 times
  method pass_blank :: 0 worked 2 times and failed 0 times
  method pass_indent :: regular worked 2 times and failed 0 times
  method pass_lines :: 8 worked 2 times and failed 50 times
  method pass_lines :: 10 worked 2 times and failed 50 times
  method pass_lines :: 6 worked 2 times and failed 50 times
  method pass_lines :: 3 worked 2 times and failed 50 times
  method pass_clex :: rm-toks-2 worked 2 times and failed 64 times
  method pass_lines :: 2 worked 2 times and failed 50 times
  method pass_balanced :: parens-to-zero worked 2 times and failed 6 times
  method pass_clex :: rm-toks-4 worked 3 times and failed 49 times
  method pass_clex :: rm-toks-13 worked 3 times and failed 18 times
  method pass_lines :: 0 worked 15 times and failed 78 times
  method pass_clex :: rm-toks-1 worked 16 times and failed 68 times

The above took about 2 minutes on an 4-core workstation. Doing some manual
clean-up, we end up with the issue as reported in
JuliaLang/julia#30062:

for a = 1 end, b += 2

Future work

Although a tool like AFL isn’t ideal
for fuzzing a compiler as done in this blogpost, the technique and its set-up
can be used to fuzz other libraries and tools that are written in Julia (eg.
HTTP.jl or Images.jl). That would of course require productionisation of this
proof-of-concept and its many workarounds.

A hypothetical AFL.jl package would:

Fuzzing the compiler itself could also be improved:

  • deferring the fork server even more: there’s significant initialization done
    by Julia code, and the manual initialization point could move right before the
    input eval loop
  • persistent fuzzing: having AFL input new test cases without restarting the
    process. Although Julia is far from stateless, AFL could be modified to verify
    crashes in a fresh process.