Author Archives: John Myles White

My Experience at JuliaCon

By: John Myles White

Re-posted from: http://www.johnmyleswhite.com/notebook/2014/06/30/my-experience-at-juliacon/

Introduction

I just got home from JuliaCon, the first conference dedicated entirely to Julia. It was a great pleasure to spend two full days listening to talks about a language that I started advocating for just a little more than two years ago.

What follows is a very brief review of the talks that excited me the most. It’s not in any way exhaustive: there were a bunch of other good talks that I saw as well as a few talks I missed so that I could visit the Data Science for Social Good fellows.

Optimization

The optimization community seems to be the academic field that’s been most ready to adopt Julia. Two talks about using Julia for optimization stood out: Iain Dunning and Joey Huchette’s talk about JuMP.jl, and Madeleine Udell’s talk about CVX.jl.

JuMP implements a DSL that allows users to describe an optimization problem in purely mathematical terms. This problem encoding can be then passed to one of many backend solvers to determine a solution. By abstracting across solvers, JuMP makes it easier for people like me to get access to well-established tools like GLPK.

CVX is quite similar to JuMP, but it implements a symbolic computation system that’s especially focused on allowing users to encode convex optimization problems. One of the things that’s most appealing about CVX is that it automatically confirms whether the problem you’re encoding is convex or not. Until I saw Madeleine’s talk, I hadn’t realized how much progress had been made on CVX.jl. Now that I’ve seen CVX.jl in action, I’m hoping to start using it for some of my work. I’ll probably also write a blog post about it in the future.

Statistics

I really enjoyed the statistics talks given by Doug Bates, Simon Byrne and Dan Wlasiuk. I was especially glad to hear Doug Bates remind the audience that, years ago, he’d attended a small meeting about R that was similar in size to this first iteration of JuliaCon. Over the course of the intervening decades, he noted that the R community has grown from dozens to millions of users.

Language-Level Issues

Given that Julia is still something of a language nerd’s language, it’s no surprise that some of the best talks focused on language-level issues.

Arch Robison gave a really interesting talk about the tools used in Julia 0.3 to automatically vectorize code so that it can take advantage of SIMD instructions. For those coming from languages like R or Python, you should be aware that vectorization means almost the exact opposite thing to compiler writers that it means to high-level language users: vectorization involves the transformation of certain kinds of iterative code into the thread-free parallelized instructions that modern CPU’s provide for performing a single operation on multiple data chunks simultaneously. I’ve come to love this kind of compiler design discussion and the invariance properties the compiler needs to prove before it can perform program transformations safely. For example, Arch noted that SIMD instructions can be safely used when working on many integers, but cannot be used on floating point numbers because of failures of associativity.

After Arch spoke, Jeff Bezanson gave a nice description of the process by which Julia code is transformed from raw text users enter into the REPL into the final compiled form that gets executed by CPU’s. For those interested in understanding how Julia works under the hood, this talk is likely to be the best place to start.

In addition, Leah Hanson and Keno Fischer both gave good talks about improved tools for debugging Julia code. Leah spoke about TypeCheck.jl, a system for automatically warning about potential code problems. Keno demoed a very rough draft of a Julia debugger built on top of LLDB. As an added plus, Keno also demoed a new C++ FFI for Julia that I’m really looking forward to. I’m hopeful that the new FFI will make it much easier to wrap C++ libraries for use from Julia.

Deploying Julia in Production

Both Avik Sengupta and Michael Bean described their experiences using Julia in production systems. Knowing that Julia was being used in production anywhere was inspiring.

Graphics and Audio

Daniel C. Jones and Spencer Russell both gave great talks about the developments taking place in graphics and audio support. Daniel C. Jones’s demo of a theremin built using Shashi Gowda’s React.jl and Spencer Russell’s AudioIO.jl was especially impressive.

Take Aways

The Julia community really is a community now. It was big enough to sell out a small conference and to field a large variety of discussion topics. I’m really excited to see how the next JuliaCon will turn out.

The Relationship between Vectorized and Devectorized Code

By: John Myles White

Re-posted from: http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/

Introduction

Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should consider devectorizing code, we’re not trying to beat R’s speed — our vectorized code does that already. Instead, we’re trying to match C’s speed.

As the examples below indicate, a little bit of devectorization goes a long way towards this loftier goal. In the specific examples I show, I find that:

  • Julia’s vectorized code is 2x faster than R’s vectorized code
  • Julia’s devectorized code is 140x faster than R’s vectorized code
  • Julia’s devectorized code is 1350x faster than R’s devectorized code

Examples of Vectorized and Devectorized Code in R

Let’s start by contrasting two pieces of R code: a vectorized and a devectorized implementation of a trivial snippet of code that does repeated vector addition.

First, we consider an example of idiomatic, vectorized R code:

vectorized <- function()
{
    a <- c(1, 1)
    b <- c(2, 2)
    x <- c(NaN, NaN)

    for (i in 1:1000000)
    {
        x <- a + b
    }

    return()
}

time <- function (N)
{
    timings <- rep(NA, N)

    for (itr in 1:N)
    {
        start <- Sys.time()
        vectorized()
        end <- Sys.time()
        timings[itr] <- end - start
    }

    return(timings)
}

mean(time(10))

This code takes, on average, 0.49 seconds per iteration to compute 1,000,000 vector additions.

Having considered the vectorized implementation, we can then consider an unidiomatic devectorized implementation of the same operation in R:

devectorized <- function()
{
    a <- c(1, 1)
    b <- c(2, 2)
    x <- c(NaN, NaN)

    for (i in 1:1000000)
    {
        for (index in 1:2)
        {
            x[index] <- a[index] + b[index]
        }
    }

    return()
}

time <- function (N)
{
    timings <- rep(NA, N)

    for (itr in 1:N)
    {
        start <- Sys.time()
        devectorized()
        end <- Sys.time()
        timings[itr] <- end - start
    }

    return(timings)
}

mean(time(10))

This takes, on average, 4.72 seconds per iteration to compute 1,000,000 vector additions.

Examples of Vectorized and Devectorized Code in Julia

Let’s now consider two Julia implementations of this same snippet of code. We’ll start with a vectorized implementation:

function vectorized()
    a = [1.0, 1.0]
    b = [2.0, 2.0]
    x = [NaN, NaN]

    for i in 1:1000000
        x = a + b
    end

    return
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    vectorized()

    for itr in 1:N
        timings[itr] = @elapsed vectorized()
    end

    return timings
end

mean(time(10))

This takes, on average, 0.236 seconds per iteration to compute 1,000,000 vector additions.

Next, let’s consider a devectorized implementation of this same snippet:

function devectorized()
    a = [1.0, 1.0]
    b = [2.0, 2.0]
    x = [NaN, NaN]

    for i in 1:1000000
        for index in 1:2
            x[index] = a[index] + b[index]
        end
    end

    return
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    devectorized()

    for itr in 1:N
        timings[itr] = @elapsed devectorized()
    end

    return timings
end

mean(time(10))

This takes, on average, 0.0035 seconds per iteration to compute 1,000,000 vector additions.

Comparing Performance in R and Julia

We can summarize the results of the four examples above in a single table:

Approach Language Average Time
Vectorized R 0.49
Devectorized R 4.72
Vectorized Julia 0.24
Devectorized Julia 0.0035

All of these examples were timed on my 2.9 GHz Intel Core i7 MacBook Pro. The results are quite striking: Julia is uniformly faster than R. And a very small bit of devectorization produces huge performance improvements. Of course, it would be nice if Julia’s compiler could optimize vectorized code as well as it optimizes devectorized code. But doing so requires a substantial amount of work.

Why is Optimizing Vectorized Code Hard?

What makes automatic devectorization tricky to get right is that even minor variants of the snippet shown above have profoundly different optimization strategies. Consider, for example, the following two snippets of code:

function vectorized2()
    a = [1.0, 1.0]
    b = [2.0, 2.0]

    res = {}

    for i in 1:1000000
        x = [rand(), rand()]
        x += a + b
        push!(res, x)
    end

    return res
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    vectorized2()

    for itr in 1:N
        timings[itr] = @elapsed vectorized2()
    end

    return timings
end

mean(time(10))

This first snippet takes 1.29 seconds on average.

function devectorized2()
    a = [1.0, 1.0]
    b = [2.0, 2.0]

    res = {}

    for i in 1:1000000
        x = [rand(), rand()]
        for dim in 1:2
            x[dim] += a[dim] + b[dim]
        end
        push!(res, x)
    end

    return res
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    devectorized2()

    for itr in 1:N
        timings[itr] = @elapsed devectorized2()
    end

    return timings
end

mean(time(10))

This second snippet takes, on average, 0.27 seconds.

The gap between vectorized and devectorized code is much smaller here because this second set of code snippets uses memory in a very different way than our original snippets did. In the first set of snippets, it was possible to entirely avoid allocating any memory for storing changes to x. The devectorized code for the first set of snippets explicitly made clear to the compiler that no memory needed to be allocated. The vectorized code did not make this clear. Making it clear that no memory needed to be allocated led to a 75x speedup. Explicitly telling the compiler what it can avoid spending time on goes a long way.

In contrast, in the second set of snippets, a new chunk of memory has to be allocated for every x vector that gets created. And the result is that even the devectorized variant of our second snippet cannot offer much of a performance boost over its vectorized analogue. The devectorized variant is slightly faster because it avoids allocating any memory during the steps in which x has a and b added to it, but this makes less of a difference when there is still a lot of other work being done that cannot be avoided by devectorizing operations.

This reflects a more general statement: the vectorization/devectorization contrast is only correlated, not causally related, with the actual performance characteristics of code. What matters for computations that take place on modern computers is the efficient utilization of processor cycles and memory. In many real examples of vectorized code, it is memory management, rather than vectorization per se, that is the core causal factor responsible for performance.

The Reversed Role of Vectorization in R and Julia

Part of what makes it difficult to have a straightforward discussion about vectorization is that vectorization in R conflates issues that are logically unrelated. In R, vectorization is often done for both (a) readability and (b) performance. In Julia, vectorization is only used for readability; it is devectorization that offers superior performance.

This confuses some people who are not familiar with the internals of R. It is therefore worth noting how one improves the speed of R code. The process of performance improvement is quite simple: one starts with devectorized R code, then replaces it with vectorized R code and then finally implements this vectorized R code in devectorized C code. This last step is unfortunately invisible to many R users, who therefore think of vectorization per se as a mechanism for increasing performance. Vectorization per se does not help make code faster. What makes vectorization in R effective is that it provides a mechanism for moving computations into C, where a hidden layer of devectorization can do its mgic.

In other words, R is doing exactly what Julia is doing to get better performance. R’s vectorized code is simply a thin wrapper around completely devectorized C code. If you don’t believe me, go read the C code for something like R’s distance function, which involves calls to functions like the following:

static double R_euclidean(double *x, int nr, int nc, int i1, int i2)
{
    double dev, dist;
    int count, j;

    count= 0;
    dist = 0;
    for(j = 0 ; j < nc ; j++) {
    if(both_non_NA(x[i1], x[i2])) {
        dev = (x[i1] - x[i2]);
        if(!ISNAN(dev)) {
        dist += dev * dev;
        count++;
        }
    }
    i1 += nr;
    i2 += nr;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= ((double)count/nc);
    return sqrt(dist);
}

It is important to keep this sort of thing in mind: the term vectorization in R actually refers to a step in which you write devectorized code in C. Vectorization, per se, is a red herring when reasoning about performance.

To finish this last point, let’s summarize the performance hierarchy for R and Julia code in a simple table:

Worst Case Typical Case Best Case
Julia Vectorized Code Julia Devectorized Code
R Devectorized Code R Vectorized Code C Devectorized Code

It is the complete absence of one column for Julia that makes it difficult to compare vectorization across the two languages. Nothing in Julia is as bad as R’s devectorized code. On the other end of the spectrum, the performance of Julia’s devectorized code simply has no point of comparison in pure R: it is more similar to the C code used to power R behind the scenes.

Conclusion

Julia aims to (and typically does) provide vectorized code that is efficient as the vectorized code available in other high-level languages. What sets Julia apart is the possibility of writing, in pure Julia, high performance code that uses CPU and memory resources as effectively as can be done in C.

In particular, vectorization and devectorization stand in the opposite relationship to one another in Julia as they do in R. In R, devectorization makes code unusably slow: R code must be vectorized to perform at an acceptable level. In contrast, Julia programmers view vectorized code as a convenient prototype that can be modified with some clever devectorization to produce production-performance code. Of course, we would like prototype code to perform better. But no popular language offers that kind of functionality. What Julia offers isn’t the requirement for devectorization, but the possibility of doing it in Julia itself, rather than in C.