We want to thank all Julia users and well-wishers for their continued use of and support for Julia, and to share some of the latest developments from Julia Computing and the Julia community.

1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran
2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips
3. Julia Computing and Julia Featured in Forbes
4. Julia Meetups in Your City
5. Julia Computing at Open Data Science Conference
6. Julia at a Conference Near You
7. Julia Goes to Hollywood
8. Recent Julia Events in New York

1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran: Julia contributor Christopher Rackauckas from UC Irvine wrote an in-depth blog post comparing the capabilities of differential equation solvers in Julia and other languages. Follow the detailed discussion on Hacker News.

2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips

Mike Innes: Flux: Machine Learning with Julia

Jonathan Malmaud: Modern Machine Learning in Julia with TensorFlow.jl

Deniz Yuret: Machine Learning in 5 Slides

Deniz Yuret: What is Knet.jl and Why Use Julia for Deep Learning

JuliaCon 2017

3. Julia Computing and Julia Featured in Forbes: Forbes published a major feature about Julia Computing and Julia titled “How a New Programming Language Created by Four Scientists Is Now Used by the World’s Biggest Companies” by Suparna Dutt D’Cunha. The article describes the origin of Julia and Julia Computing and how Julia is being used today. The article has received more than 75,000 page views.

4. Julia Meetups in Your City: There are dozens of Julia meetup groups with more than 13,000 members around the globe. Julia Computing is looking to support Julia meetups by providing training materials, webcasts and other support. Please email us if you would like to organize a meetup group or partner with Julia Computing on a meetup event.

5. Julia Computing at Open Data Science Conference: Julia Computing will be presenting at the Open Data Science Conference in London Oct 12-14 and in San Francisco Nov 2-4.

6. Julia at a Conference Near You: Julia Computing will be present at several upcoming conferences and talks including:

7. Julia Goes to Hollywood: Julia has been featured in two Hollywood television programs: The 100 and Casual. In The 100, Julia is used to represent the language of artificial intelligence in the year 2150. In Casual, Julia is described as a language for the latest generation of programmers.

8. Recent Julia Events in New York: Julia co-creators Viral Shah and Stefan Karpinski presented an introduction to Julia at the Data Driven NYC meetup on September 25. Their presentation is available here. They also presented Julia and Spark, Better Together at the Strata Data conference. Try out the Julia bindings for Spark with Spark.jl, with a special thanks to Andrei Zhabinski.

Please contact Julia Computing if you would like to:

• Purchase or obtain license information for Julia products such as JuliaPro, JuliaRun, JuliaDB, JuliaFin or JuliaBox
• Obtain pricing for Julia consulting projects for your enterprise
• Schedule Julia training for your organization
• Share information about exciting new Julia case studies or use cases
• Partner with Julia Computing to organize a Julia meetup group, hackathon, workshop, training or other event in your city

Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.

Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:

• JuliaPro for data science professionals and researchers to install and run Julia with more than one hundred carefully curated popular Julia packages on a laptop or desktop computer.
• JuliaRun for deploying Julia at scale on dozens, hundreds or thousands of nodes in the public or private cloud, including AWS and Microsoft Azure.
• JuliaFin for financial modeling, algorithmic trading and risk analysis including Bloomberg and Excel integration, Miletus for designing and executing trading strategies and advanced time-series analytics.
• JuliaBox for students or new Julia users to experience Julia in a Jupyter notebook right from a Web browser with no download or installation required.

To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.

Branch prediction: yet another example

Re-posted from: https://tpapp.github.io/post/branch_prediction2/

Tomas Lycken linked a very nice discussion on StackOverflow about branch prediction as a comment on the previous post. It has an intuitive explanation (read it if you like good metaphors) and some Java benchmarks. I was curious about how it looks in Julia.

The exercise is to sum elements in a vector only if they are greater than or equal to 128.

function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt ≥ 128
            s += elt
        end
    end
    s
end

This calculation naturally has a branch in it, while the branchless
version, using ifelse, does not:

function sumabove_ifelse(x)
    s = zero(eltype(x))
    for elt in x
        s += ifelse(elt ≥ 128, elt, zero(eltype(x)))
    end
    s
end

The original example does something different: it uses tricky
bit-twiddling to calculate the same value. I generally like to leave
this sort of thing to the compiler, because it is much, much better at
it than I am, and I make mistakes all the time; worse, I don’t know
what I actually did when I reread the code 6 months later. But I
included it here for comparison:

function sumabove_tricky(x::Vector{Int64})
    s = Int64(0)
    for elt in x
        s += ~((elt - 128) >> 63) & elt
    end
    s
end
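To see why the bit-twiddling works: for Int64, an arithmetic right shift by 63 smears the sign bit across the whole word, so the shifted value is either all ones or all zeros, and the rest of the expression uses that as a mask. A small check of this logic (mask is just a name I introduce here for the loop body, not part of the original code):

```julia
# For elt < 128, (elt - 128) is negative, so (elt - 128) >> 63 is -1
# (all ones); ~(-1) == 0, and 0 & elt == 0 — the element is dropped.
# For elt ≥ 128, (elt - 128) >> 63 is 0; ~0 == -1 (all ones), and
# -1 & elt == elt — the element passes through unchanged.
mask(elt::Int64) = ~((elt - 128) >> 63) & elt

mask(127)  # 0
mask(128)  # 128
mask(200)  # 200
```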

Following the original example on StackOverflow, we sum 2^15 random integers in 1:256. For this, we don’t need to worry about overflow. We also sum the sorted vector: this will facilitate branch prediction, since the various branches will be contiguous.

I also benchmark a simple version using generators:

sumabove_generator(x) = sum(y for y in x if y ≥ 128)
Benchmarks (μs)

            random   sorted
if             139       28
ifelse          21       21
if & sort       96      n/a
tricky          27       27
generator      219      168

Benchmarks are in the table above. Note that

1. for the version with if, working on sorted vectors is dramatically faster (about 5x).

2. the non-branching ifelse version beats them hands down, and naturally it does not care about sorting.

3. if you have to use if, then you are better off sorting first, even when the time spent sorting is taken into account.

4. the tricky bit-twiddling version is actually worse than ifelse (which reinforces my aversion to it).

Self-contained code for everything is available below.
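A standalone driver for these benchmarks might look like the following (my reconstruction, repeating the definitions above so it runs on its own and using only the standard library; BenchmarkTools’ @btime gives far more reliable numbers than this crude timing loop):

```julia
using Random

# The four variants from the post:
function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt ≥ 128
            s += elt
        end
    end
    s
end

function sumabove_ifelse(x)
    s = zero(eltype(x))
    for elt in x
        s += ifelse(elt ≥ 128, elt, zero(eltype(x)))
    end
    s
end

function sumabove_tricky(x::Vector{Int64})
    s = Int64(0)
    for elt in x
        s += ~((elt - 128) >> 63) & elt
    end
    s
end

sumabove_generator(x) = sum(y for y in x if y ≥ 128)

Random.seed!(1)
x = rand(1:256, 2^15)   # the setup from the post
xs = sort(x)            # sorted copy: helps branch prediction

# Sanity check: all variants agree, sorted or not.
@assert sumabove_if(x) == sumabove_ifelse(x) == sumabove_tricky(x) == sumabove_generator(x) == sumabove_if(xs)

# Crude timings (minimum over repetitions, reported in μs):
for (name, f, v) in (("if, random", sumabove_if, x),
                     ("if, sorted", sumabove_if, xs),
                     ("ifelse", sumabove_ifelse, x),
                     ("tricky", sumabove_tricky, x),
                     ("generator", sumabove_generator, x))
    f(v)  # warm up (force compilation)
    t = minimum(@elapsed(f(v)) for _ in 1:1000)
    println(rpad(name, 12), round(t * 1e6; digits = 1), " μs")
end
```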

CPU pipelines: when more is less

Re-posted from: https://tpapp.github.io/post/branch_prediction/

I have been working on micro-optimizations for some simulation
code, and was reminded of a counter-intuitive artifact of modern CPU
architecture, which is worth a short post.

Consider (just for the sake of example) a very simple (if not
particularly meaningful) function,

$f(x) = \begin{cases} (x+2)^2 & \text{if } x \ge 0,\\ 1-x & \text{otherwise} \end{cases}$

with implementations

f1(x) = ifelse(x ≥ 0, abs2(x+2), 1-x)
f2(x) = x ≥ 0 ? abs2(x+2) : 1-x

f1 calculates both possibilities before choosing between them with
ifelse, while f2 only calculates the branch that is taken. So, intuitively, f2 should be faster.

But it isn’t…

julia> x = randn(1_000_000);

julia> using BenchmarkTools

julia> @btime f1.($x);
  664.228 μs (2 allocations: 7.63 MiB)

julia> @btime f2.($x);
  6.519 ms (2 allocations: 7.63 MiB)

This can be understood as an artifact of the instruction
pipeline
: your
x86 CPU likes to perform similar operations in staggered manner, and
it does not like branches (jumps) because they break the flow.

Comparing the native code reveals that while f1 is jump-free, the if in f2 results in a jump (jae):

julia> @code_native f1(1.0)
.text
Filename: REPL[2]
pushq   %rbp
movq    %rsp, %rbp
movabsq $139862498743472, %rax  # imm = 0x7F34468E14B0
movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
Source line: 1
addsd   %xmm0, %xmm2
mulsd   %xmm2, %xmm2
movabsq $139862498743480, %rax  # imm = 0x7F34468E14B8
movsd   (%rax), %xmm3           # xmm3 = mem[0],zero
subsd   %xmm0, %xmm3
xorps   %xmm1, %xmm1
cmpnlesd        %xmm0, %xmm1
andpd   %xmm1, %xmm3
andnpd  %xmm2, %xmm1
orpd    %xmm3, %xmm1
movapd  %xmm1, %xmm0
popq    %rbp
retq
nopw    %cs:(%rax,%rax)

julia> @code_native f2(1.0)
.text
Filename: REPL[3]
pushq   %rbp
movq    %rsp, %rbp
Source line: 1
xorps   %xmm1, %xmm1
ucomisd %xmm1, %xmm0
jae     L37
movabsq $139862498680736, %rax  # imm = 0x7F34468D1FA0
movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
subsd   %xmm0, %xmm1
movapd  %xmm1, %xmm0
popq    %rbp
retq
L37:
movabsq $139862498680728, %rax  # imm = 0x7F34468D1F98
mulsd   %xmm0, %xmm0
popq    %rbp
retq
nopl    (%rax)

In my application the speed gain was more modest, but still
sizeable. Benchmarking a non-branching version of your code is
sometimes worth it, especially if the change is simple and both
branches of the conditional can be run error-free. If, for example, we
define

g(x) = x ≥ 0 ? √(x+2) : 1-x

then we could not use ifelse without restricting the domain, since
√(x+2) would fail whenever x < -2.
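The reason is that ifelse is an ordinary function: both value arguments are evaluated before the selection happens, so the √ would be applied to negative numbers regardless of the condition. A quick illustration (g_ifelse is a hypothetical variant I am introducing, not from the post):

```julia
g(x) = x ≥ 0 ? √(x + 2) : 1 - x               # only the taken branch runs
g_ifelse(x) = ifelse(x ≥ 0, √(x + 2), 1 - x)  # BOTH arms are evaluated

g(-3.0)  # 4.0 — the √ arm is never evaluated

try
    g_ifelse(-3.0)
catch e
    # √(-1.0) was evaluated even though the condition selects the other arm
    println("g_ifelse(-3.0) throws ", typeof(e))
end
```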
Julia Base contains many optimizations like this: for a particularly
nice example see functions that use Base.null_safe_op.