By: Tamás K. Papp
Reposted from: https://tpapp.github.io/post/branch_prediction2/
Tomas Lycken linked a very nice discussion on StackOverflow about branch prediction as a comment on the previous post. It has an intuitive explanation (read it if you like good metaphors) and some Java benchmarks. I was curious about how it looks in Julia.
The exercise is to sum elements in a vector only if they are greater than or equal to 128.
function sumabove_if(x)
s = zero(eltype(x))
for elt in x
if elt ≥ 128
s += elt
end
end
s
end
This calculation naturally has a branch in it, while the branchless
version, using ifelse
, does not:
function sumabove_ifelse(x)
s = zero(eltype(x))
for elt in x
s += ifelse(elt ≥ 128, elt, zero(eltype(x)))
end
s
end
The actual example has something different: using tricky bittwiddling
to calculate the same value. I generally like to leave this sort of
thing up to the compiler, because it is much, much better at it than I
am, and I make mistakes all the time; worse, I don’t know what I
actually did when I reread the code 6 months later. But I included it
here for comparison:
function sumabove_tricky(x::Vector{Int64})
s = Int64(0)
for elt in x
s += ~((elt  128) >> 63) & elt
end
s
end
Following the original example on StackOverflow, we sum 2^15
random integers in 1:256
. For this, we don’t need to worry about overflow. We also sum the sorted vector: this will facilitate branch predicion, since the various branches will be contiguous.
I also benchmark a simple version using generators:
sumabove_generator(x) = sum(y for y in x if y ≥ 128)
random  sorted  

if 
139  28 
ifelse 
21  21 
if & sort 
96  n/a 
tricky  27  27 
generator  219  168 
Benchmarks are in the table above. Note that

for the version with
if
, working on sorted vectors is dramatically faster (about 5x). 
the nonbranching
ifelse
version beats them hands down, and naturally it does not care about sorting. 
if you have to use
if
, then you are better off sorting, even if you take the time of that into account. 
generators are susprisingly bad.

the tricky bittwiddling version is actually worse than
ifelse
(which reinforces my aversion to it).
Selfcontained code for everything is available below.
download code as code.jl