One thousand and one stories

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/11/12/so.html

Introduction

I have just hit 1000 answers for the [julia] tag on Stack Overflow so I felt
like writing about it. In order to have a complete one thousand and one stories
collection today I thought of writing about a feature of Julia that will show
you how challenging the design decisions that have to be made when designing
functions are. We will work with one of the most fundamental functions, the
sum.

Before I go to the technical details let me go back to my recent post that I wrote about a model that Alan Edelman prepared for one of
classes during his studies. Recently I had an opportunity to discuss with him
about the exact context of creation of the model. Here you can read the summary
which I was lacking when I was writing the original post:

In 1980, Alan Edelman was a 17 year old freshman at Yale University where he
met his social science distribution requirement by taking Psychology 101
with Dr. Kenji Hakuta. There he learned about the famous
Kitty Genovese murder and the concept of diffusion of responsibility.
Having to write a paper for this freshman class, and being a “math person” he
figured why not take the idea of diffusion literally and write a paper about
that. He does not recall if he still has a copy of that paper, but it may be
in a box in the attic. He thinks he got an A on that paper.

The post was written under Julia 1.6.3.

Are you sure you understand how the sum function works?

I will test your knowledge by example. The first task is to perform row-wise
summation of the following matrix:

julia> mat = fill(Int8(100), 5, 10)
5×10 Matrix{Int8}:
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100

julia> sum(eachcol(mat))
5-element Vector{Int8}:
 -24
 -24
 -24
 -24
 -24

julia> sum.(eachrow(mat))
5-element Vector{Int64}:
 1000
 1000
 1000
 1000
 1000

If you are surprised by the result let me explain the situation. In the first
case sum operates on whole vectors. In the second case sum operates on
scalars. Why would this make the difference? The reason is that sum does not
use + for aggregation, but it employs the Base.add_sum as the reduction
operator which is defined as follows:

add_sum(x, y) = x + y
add_sum(x::SmallSigned, y::SmallSigned) = Int(x) + Int(y)
add_sum(x::SmallUnsigned, y::SmallUnsigned) = UInt(x) + UInt(y)
add_sum(x::Real, y::Real)::Real = x + y

As you can see the add_sum will promote the result to Int or UInt only if
it is passed scalar integers. Therefore if we pass it vectors of integers no
promotion happens.

Now for sure you know what will be the sum of the following vector:

julia> v = Integer[0x64; fill(Int8(100), 9)]
10-element Vector{Integer}:
 0x64
  100
  100
  100
  100
  100
  100
  100
  100
  100

Let us check:

julia> sum(v)
0xe8

Could you have guessed it? If yes, you probably assume that sum is using
foldl and you have noticed that Base.add_sum does not perform promotion
to Int or UInt when you mix signed and unsigned integers.

Unfortunately, if this was your guess, you are wrong in general. Consider this
scenario:

julia> sum(Integer[0x64; fill(Int8(100), 9999)])
937536

The result we get is quite surprising. We could have expected:

julia> foldl(+, Integer[0x64; fill(Int8(100), 9999)])
0x40

as we know that Base.add_sum will always fall back to + in this case. This
would be consistent with the previous result.

What is the reason of the difference? Actually sum does not use foldl but
reduce, and reduce does not have a guaranteed order of summation. This
means that in the latter case we must have made some summation of two Int8
(100)
values using Base.add_sum which promoted the result to Int.

Maybe above you thought of calling foldl with Base.add_sum like ths?:

julia> foldl(Base.add_sum, Integer[0x64; fill(Int8(100), 9999)])
0x00000000000f4240

0x00000000000f4240 is just 1000000 (which is a correct result if we were
widening types always when doing the summation), but why do we get such a weird
value? The reason is that foldl differs from sum in the way it initializes
the summation. It uses the first element of the collection(0x64 in our case)
and promotes it to the type that woud be produced if this element were added
using Base.add_sum to itself and we know UInt8 to UInt8 invokes promotion
to UInt.

What else could go wrong? Try this:

julia> using Random

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> x = rand(1000)
1000-element Vector{Float64}:
 0.5908446386657102
 0.7667970365022592
 0.5662374165061859
 0.4600853424625171
 0.7940257103317943
 0.8541465903790502
 0.20058603493384108
 0.2986142783434118
 ⋮
 0.5762976355934157
 0.08831200391130656
 0.8994769043886504
 0.8232831225471882
 0.37869007913520947
 0.7812366659068535
 0.4651012221417914

julia> sum(x)
496.84883432553806

julia> sum(x, init=0.0)
496.84883432553846

As you can see the results are not identical (they differ at the second least
significant digit). What have messed up this time? By specifying init keyword
argument, although 0.0 is a neutral in summation, we forced sum to
use a different summation order again and for floats order of summation is
known to affect the result.

Wait – have I said that 0.0 is a neutral in summation? I have lied. See:

julia> isequal(sum([-0.0, -0.0]), sum([-0.0, -0.0], init=0.0))
false

because:

julia> sum([-0.0, -0.0])
-0.0

julia> sum([-0.0, -0.0], init=0.0)
0.0

At this point I start asking myself why in my math classes I was taught to use
real numbers and not IEEE 754 standard which seems to be at play much
more often in practice (at least if you are using computers). I will have to
pose this question Alan Edelman who is a professor of both mathematics and
computer science the next time I have the privilege of talking with him.

Conclusions

In this blog and on Stack Overflow I try to show users that Julia is a nice
language to work with (of which I am really convinced).

However, having written these 1000 stories that end good one wants to show at
least one story when the dark side shows up (of course the problems I have
discussed are not Julia specific, but their particular manifestation is a
consequence of design decisions that Julia developers made).

Can we do anything about the problems I have shown? There is one remedy and one
warning to keep in mind.

The remedy is the following: when working with collections in Julia always take
care to make sure they have homogeneous type of elements and this type is
chosen appropriately to the operation you want to perform. Except for very rare
situations there is little sense in mixing numbers of different types in one
collection so just do not do it.

The warning is the IEEE floating point arithmetic standard consequence: if you
work with floats better treat the result of operations on them as only
approximate.