Newsletter October 2017

By: Julia Computing, Inc.

Re-posted from:

We wanted to thank all Julia users and well wishers for the continued use of and support for Julia, and share some of the latest developments from Julia Computing and the Julia community.

  1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran
  2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips
  3. Julia Computing and Julia Featured in Forbes
  4. Julia Meetups in Your City
  5. Julia Computing at Open Data Science Conference
  6. Julia at a Conference Near You
  7. Julia Goes to Hollywood
  8. Recent Julia Events in New York
  9. Contact Us

  1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran: Julia contributor Christopher Rackauckas from UC Irvine wrote a seminal blog post comparing the capabilities of differential equation solvers in Julia and other languages. Follow the detailed discussion on Hacker News.

  2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips

    Mike Innes: Flux: Machine Learning with Julia

    Jonathan Malmaud: Modern Machine Learning in Julia with TensorFlow.jl

    Deniz Yuret: Machine Learning in 5 Slides

    Deniz Yuret: What is Knet.jl and Why Use Julia for Deep Learning

    JuliaCon 2017

  3. Julia Computing and Julia Featured in Forbes: Forbes published a major feature about Julia Computing and Julia titled “How a New Programming Language Created by Four Scientists Is Now Used by the World’s Biggest Companies” by Suparna Dutt D’Cunha. The article describes the origin of Julia and Julia Computing and how Julia is being used today. The article has more than 75 thousand page views.

  4. Julia Meetups in Your City: There are dozens of Julia meetup groups with more than 13,000 members around the globe. Julia Computing is looking to support Julia meetups by providing training materials, Webcasts and other support. Please email us if you would like to organize a meetup group or partner with Julia Computing on a meetup event. Upcoming events include:

  5. Julia Computing at Open Data Science Conference: Julia Computing will be presenting at the Open Data Science Conference in London Oct 12-14 and in San Francisco Nov 2-4.

  6. Julia at a Conference Near You: Julia Computing will be present at several upcoming conferences and talks including:

  7. Julia Goes to Hollywood: Julia has been featured in two Hollywood television programs: The 100 and Casual. In The 100, Julia is used to represent the language of artificial intelligence in the year 2150. In Casual, Julia is described as a language for the latest generation of programmers.

  8. Recent Julia Events in New York: Julia co-creators Viral Shah and Stefan Karpinski presented an introduction to Julia at the Data Driven NYC meetup on September 25. Their presentation is available here. They also presented Julia and Spark, Better Together at the Strata Data conference. Try out the Julia bindings for Spark with Spark.jl, with a special thanks to Andrei Zhabinski.

  9. Contact Us: Please contact us if you wish to:

    • Purchase or obtain license information for Julia products such as JuliaPro, JuliaRun, JuliaDB, JuliaFin or JuliaBox
    • Obtain pricing for Julia consulting projects for your enterprise
    • Schedule Julia training for your organization
    • Share information about exciting new Julia case studies or use cases
    • Partner with Julia Computing to organize a Julia meetup group, hackathon, workshop, training or other event in your city

    About Julia and Julia Computing

    Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.

    Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:

    • JuliaPro for data science professionals and researchers to install and run Julia with more than one hundred carefully curated popular Julia packages on a laptop or desktop computer.
    • JuliaRun for deploying Julia at scale on dozens, hundreds or thousands of nodes in the public or private cloud, including AWS and Microsoft Azure.
    • JuliaFin for financial modeling, algorithmic trading and risk analysis including Bloomberg and Excel integration, Miletus for designing and executing trading strategies and advanced time-series analytics.
    • JuliaDB for in-database in-memory analytics and advanced time-series analysis.
    • JuliaBox for students or new Julia users to experience Julia in a Jupyter notebook right from a Web browser with no download or installation required.

    To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.

    Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.

Branch prediction: yet another example

By: Tamás K. Papp

Re-posted from:

Tomas Lycken linked a very nice discussion on StackOverflow about branch prediction as a comment on the previous post. It has an intuitive explanation (read it if you like good metaphors) and some Java benchmarks. I was curious about how it looks in Julia.

The exercise is to sum elements in a vector only if they are greater than or equal to 128.

# Sum the elements of x that are >= 128, using an explicit branch.
function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt >= 128
            s += elt
        end
    end
    s
end

This calculation naturally has a branch in it, while the branchless
version, using ifelse, does not:

# Same sum, but ifelse evaluates both arguments and selects one without branching.
function sumabove_ifelse(x)
    s = zero(eltype(x))
    for elt in x
        s += ifelse(elt >= 128, elt, zero(eltype(x)))
    end
    s
end

The actual example has something different: using tricky bit-twiddling
to calculate the same value. I generally like to leave this sort of
thing up to the compiler, because it is much, much better at it than I
am, and I make mistakes all the time; worse, I don’t know what I
actually did when I reread the code 6 months later. But I included it
here for comparison:

# Same sum via a bit mask: the mask is all ones when elt >= 128, all zeros otherwise.
function sumabove_tricky(x::Vector{Int64})
    s = Int64(0)
    for elt in x
        s += ~((elt - 128) >> 63) & elt
    end
    s
end
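
To see why the bit-twiddling expression selects the right elements, here is a minimal sketch that splits it into steps. The helper names `mask` and `keep` are hypothetical, introduced only for illustration, and the sketch assumes `Int64` input, for which `>>` is an arithmetic shift that copies the sign bit.

```julia
# Hypothetical helpers that split the expression from sumabove_tricky into steps.
# For Int64, >> is an arithmetic shift, so shifting by 63 smears the sign bit
# across the whole word: the result is -1 (all bits set) for negative input, else 0.
mask(elt::Int64) = ~((elt - 128) >> 63)

# elt < 128:  elt - 128 is negative -> shift gives -1 -> ~(-1) == 0  (all bits clear)
# elt >= 128: elt - 128 is >= 0     -> shift gives 0  -> ~0 == -1    (all bits set)
keep(elt::Int64) = mask(elt) & elt   # either 0 or elt itself

below = keep(Int64(127))   # masked out
above = keep(Int64(128))   # passes through unchanged
```

The trick relies on `elt - 128` not overflowing, which holds for the small values used in this exercise.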

Following the original example on StackOverflow, we sum 2^15 random integers in 1:256. For this, we don’t need to worry about overflow. We also sum the sorted vector: this will facilitate branch prediction, since the various branches will be contiguous.
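
As a rough sketch of the setup described above, the test data and the overflow bound could look like this. The seed and the variable names are assumptions chosen for illustration, and `Random.seed!` assumes Julia 1.x.

```julia
using Random

Random.seed!(42)             # arbitrary seed, only for reproducibility
n = 2^15
x = rand(1:256, n)           # random integers in 1:256
x_sorted = sort(x)           # sorted copy, so equal branch outcomes are contiguous

# Worst case: every element equals 256, so the sum is at most 2^15 * 2^8 = 2^23,
# which is far below the largest Int64.
worst_case = n * 256
overflow_possible = worst_case > typemax(Int64)
```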

I also benchmark a simple version using generators:

sumabove_generator(x) = sum(y for y in x if y >= 128)

Benchmarks (μs)

             random   sorted
if              139       28
ifelse           21       21
if & sort        96      n/a
tricky           27       27
generator       219      168

Benchmarks are in the table above. Note that

  1. for the version with if, working on sorted vectors is dramatically faster (about 5x).

  2. the non-branching ifelse version beats them hands down, and naturally it does not care about sorting.

  3. if you have to use if, then you are better off sorting, even if you take the time of that into account.

  4. generators are surprisingly bad.

  5. the tricky bit-twiddling version is actually worse than ifelse (which reinforces my aversion to it).
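
For readers who want to reproduce the flavor of these measurements without extra packages, the following is a crude timing sketch, not the author's benchmark code. It uses only `@elapsed` from Base, the helper `best_time_us` is hypothetical, and it assumes Julia 1.x. A dedicated tool such as BenchmarkTools will give more reliable numbers, and the timings will differ by machine and Julia version.

```julia
using Random

function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt >= 128
            s += elt
        end
    end
    return s
end

function sumabove_ifelse(x)
    s = zero(eltype(x))
    for elt in x
        s += ifelse(elt >= 128, elt, zero(eltype(x)))
    end
    return s
end

# Hypothetical helper: best-of-n wall-clock time of f(x), in microseconds.
function best_time_us(f, x; n = 20)
    sink = Ref{Any}(f(x))       # warm-up call, so compilation is not timed
    best = Inf
    for _ in 1:n
        t = @elapsed begin
            sink[] = f(x)       # keep the result, so the call has a visible use
        end
        best = min(best, t)
    end
    return best * 1e6
end

Random.seed!(1)                 # arbitrary seed, for reproducibility only
x = rand(1:256, 2^15)
x_sorted = sort(x)

timings = Dict(
    "if, random"     => best_time_us(sumabove_if, x),
    "if, sorted"     => best_time_us(sumabove_if, x_sorted),
    "ifelse, random" => best_time_us(sumabove_ifelse, x),
    "ifelse, sorted" => best_time_us(sumabove_ifelse, x_sorted),
    # "if & sort": the sort is timed together with the summation
    "if & sort"      => best_time_us(v -> sumabove_if(sort(v)), x),
)

for label in sort(collect(keys(timings)))
    println(rpad(label, 16), round(timings[label]; digits = 1), " μs")
end
```

Taking the minimum over several runs is a deliberate choice here: it filters out one-off delays such as garbage collection or scheduling noise.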

Self-contained code for everything is available below.

download code as code.jl

CPU pipelines: when more is less

By: Tamás K. Papp

Re-posted from:

I have been working on micro-optimizations for some simulation
code, and was reminded of a counter-intuitive artifact of modern CPU
architecture, which is worth a short post.

Consider (just for the sake of example) a very simple (if not
particularly meaningful) function,

\[
f(x) = \begin{cases}
(x+2)^2 & \text{if } x \ge 0,\\
1-x & \text{otherwise}
\end{cases}
\]

with implementations

f1(x) = ifelse(x >= 0, abs2(x+2), 1-x)
f2(x) = x >= 0 ? abs2(x+2) : 1-x

f1 calculates both possibilities before choosing between them with
ifelse, while f2 only calculates the value it actually needs. So, intuitively, f2 should be faster.
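
The difference in evaluation can be made visible with a small sketch. The tracing helper `traced` and the `evaluated` log are hypothetical names used only for this illustration; `ifelse` is an ordinary function, so its arguments are evaluated before the call, whereas the ternary operator evaluates only the selected branch.

```julia
f1(x) = ifelse(x >= 0, abs2(x + 2), 1 - x)
f2(x) = x >= 0 ? abs2(x + 2) : 1 - x

# Hypothetical tracing helper: records a label, then returns the value unchanged.
evaluated = String[]
traced(label, value) = (push!(evaluated, label); value)

ifelse(true, traced("then", 1), traced("else", 2))
eager = copy(evaluated)      # both arguments were evaluated

empty!(evaluated)
true ? traced("then", 1) : traced("else", 2)
lazy = copy(evaluated)       # only the selected branch was evaluated

same_result = f1(1.5) == f2(1.5) && f1(-2.0) == f2(-2.0)
```

Since f2 skips the unused branch, one would expect it to be the faster of the two.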

But it isn’t…

julia> x = randn(1_000_000);

julia> using BenchmarkTools

julia> @btime f1.($x);
  664.228 μs (2 allocations: 7.63 MiB)

julia> @btime f2.($x);
  6.519 ms (2 allocations: 7.63 MiB)

…it is about 10x slower.

This can be understood as an artifact of the instruction
pipeline: your x86 CPU likes to perform similar operations in a
staggered manner, and it does not like branches (jumps) because they
break the flow.

Comparing the native code reveals that while f1 is jump-free, the if in f2 results in a jump (jae):

julia> @code_native f1(1.0)
Filename: REPL[2]
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $139862498743472, %rax  # imm = 0x7F34468E14B0
        movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
Source line: 1
        addsd   %xmm0, %xmm2
        mulsd   %xmm2, %xmm2
        movabsq $139862498743480, %rax  # imm = 0x7F34468E14B8
        movsd   (%rax), %xmm3           # xmm3 = mem[0],zero
        subsd   %xmm0, %xmm3
        xorps   %xmm1, %xmm1
        cmpnlesd        %xmm0, %xmm1
        andpd   %xmm1, %xmm3
        andnpd  %xmm2, %xmm1
        orpd    %xmm3, %xmm1
        movapd  %xmm1, %xmm0
        popq    %rbp
        nopw    %cs:(%rax,%rax)

julia> @code_native f2(1.0)
Filename: REPL[3]
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 1
        xorps   %xmm1, %xmm1
        ucomisd %xmm1, %xmm0
        jae     L37
        movabsq $139862498680736, %rax  # imm = 0x7F34468D1FA0
        movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
        subsd   %xmm0, %xmm1
        movapd  %xmm1, %xmm0
        popq    %rbp
        movabsq $139862498680728, %rax  # imm = 0x7F34468D1F98
        addsd   (%rax), %xmm0
        mulsd   %xmm0, %xmm0
        popq    %rbp
        nopl    (%rax)

In my application the speed gain was more modest, but still
sizeable. Benchmarking a non-branching version of your code is
sometimes worth it, especially if the change is simple and both
branches of the conditional can be run error-free. If, for example, we
had to calculate

g(x) = x >= 0 ? √(x+2) : 1-x

then we could not use ifelse without restricting the domain, since
√(x+2) would fail whenever x < -2.
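
The domain problem can be demonstrated with a short sketch. The names `g_ifelse` and `g_clamped` are hypothetical, and clamping the argument of the square root is only one possible workaround, assumed here for illustration rather than taken from the post.

```julia
g(x) = x >= 0 ? sqrt(x + 2) : 1 - x

# Naive branch-free version: sqrt(x + 2) is evaluated even when x < 0.
g_ifelse(x) = ifelse(x >= 0, sqrt(x + 2), 1 - x)

# The branching version is fine for x < -2, the naive ifelse version is not.
threw = try
    g_ifelse(-3.0)
    false
catch err
    err isa DomainError
end

# One possible workaround (an assumption, not from the post): clamp the argument
# so the unused square root is always computable. The clamped value is discarded
# whenever x < 0, so the result is unchanged.
g_clamped(x) = ifelse(x >= 0, sqrt(max(x + 2, zero(x))), 1 - x)
```

Whether the extra `max` pays off compared to the branch is again something to benchmark rather than assume.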

Julia Base contains many optimizations like this: for a particularly
nice example see functions that use Base.null_safe_op.