Tag Archives: julialang

Broadcasting in Julia: the Good, the Bad, and the Ugly

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/06/24/broadcasting.html

Introduction

Recently I have written a post about @view and @views macros.
In what followed I got a feedback, that the topic I touched there was
indeed useful. Therefore for today I decided to write about another macro.
This time I will share my thoughts on the @. macro.

This post was written under Julia 1.7.2.

Understanding what @. macro does

The @. macro is used to inject broadcasting into every function call
in an expression passed to it.

Let us have a look at its docstring:

Convert every function call or operator in expr into a “dot call”
(e.g. convert f(x) to f.(x)), and convert every
assignment in expr to a “dot assignment” (e.g. convert += to .+=).

If you want to avoid adding dots for selected function calls in expr, splice
those function calls in with $. For example,
@. sqrt(abs($sort(x))) is equivalent to sqrt.(abs.(sort(x))) (no dot for sort).

The @. marco is quite useful when we work with long expressions that
involve several operations that we want to be broadcasted together. Here
is a minimal example:

julia> using Statistics

julia> x = 1:10
1:10

julia> @. sin(x)^2 + cos(x)^2
10-element Vector{Float64}:
 1.0
 1.0
 0.9999999999999999
 1.0
 0.9999999999999999
 0.9999999999999999
 0.9999999999999999
 1.0
 0.9999999999999999
 1.0

Writing @. sin(x)^2 + cos(x)^2 is much more convenient than writing
sin.(x).^2 .+ cos.(x).^2.

However, what if we wanted to compute variance of x? A direct approach
would be to write this operation as:

julia> sum((x .- mean(x)) .^ 2) / (length(x) - 1)
9.166666666666666

As you can see I use broadcasting in only two places, while most of the
operations are not broadcasted. If we wanted to use @. macro we would
need to use $ escaping and write something like:

julia> @. $/($sum((x - $mean(x)) ^ 2), $-($length(x), 1))
9.166666666666666

which is equivalent and ugly. You can check it by using @macroexpand:

julia> @macroexpand @. $/($sum((x - $mean(x))^2), $-($length(x), 1))
:(sum((^).((-).(x, mean(x)), 2)) / (length(x) - 1))

In this case also correct (but not equivalent) and simpler way would be:

julia> @. $sum((x - $mean(x))^2) / ($length(x) - 1)
9.166666666666666

However, in this case you need to know and be sure that by not escaping-out the
/ and - function calls in the second part of the expression you will not
affect the correctness of your calculation.

In summary, I do not use @. in complex expressions as it is usually hard to
reason about it.

Let us now switch to some special cases of using @..

Be careful with broadcasted assignment

Other common source of bugs when using @. is broadcasted assignment.

Let us analyze the following code:

julia> x = ["a", "b"]
2-element Vector{String}:
 "a"
 "b"

julia> x = @. length(x)
2-element Vector{Int64}:
 1
 1

julia> x
2-element Vector{Int64}:
 1
 1

julia> y = ["a", "b"]
2-element Vector{String}:
 "a"
 "b"

julia> @. y = length(y)
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type String

In the case of x variable the @. macro is on the right hand side of the
assignment. In this case we get a fresh binding of value to variable x.

In the case of y the @. encompasses the left hand side of the assignment.
In this case the operation is in-place. Therefore in this case we get an error,
because you cannot store integers in a vector of strings.

Things, of course, can be silently wrong, as in the following example of
Vector{Char}, as Char supports conversion from integer:

julia> z = ['a', 'b']
2-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

julia> @. z = length(z)
2-element Vector{Char}:
 '\x01': ASCII/Unicode U+0001 (category Cc: Other, control)
 '\x01': ASCII/Unicode U+0001 (category Cc: Other, control)

julia> z
2-element Vector{Char}:
 '\x01': ASCII/Unicode U+0001 (category Cc: Other, control)
 '\x01': ASCII/Unicode U+0001 (category Cc: Other, control)

Incorrect handling of named tuples

Consider the following code:

julia> v = 1:3
1:3

julia> [x in (a=1, c=3) for x in v]
3-element Vector{Bool}:
 1
 0
 1

We can rewrite it using broadcasting as:

julia> in.(v, Ref((a=1, c=3)))
3-element BitVector:
 1
 0
 1

Now we think we could use the @. here as follows:

julia> @. in(v, $Ref((a=1, c=3)))
ERROR: UndefVarError: c not defined

However, this fails as @. macro incorrectly handles = inside NamedTuple
definition. We have to write:

julia> @. in(v, $Ref((; a=1, c=3)))
3-element BitVector:
 1
 0
 1

The ; at the beginning of NamedTuple definition gives an equivalent object
but changes how the code expression is transformed by the Julia compiler
and it works. Here is how we can check the difference in the representation
of (a=1, c=3) and (; a=1, c=3):

julia> dump(:(a=1, c=3))
Expr
  head: Symbol tuple
  args: Array{Any}((2,))
    1: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol a
        2: Int64 1
    2: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol c
        2: Int64 3

julia> dump(:(; a=1, c=3))
Expr
  head: Symbol tuple
  args: Array{Any}((1,))
    1: Expr
      head: Symbol parameters
      args: Array{Any}((2,))
        1: Expr
          head: Symbol kw
          args: Array{Any}((2,))
            1: Symbol a
            2: Int64 1
        2: Expr
          head: Symbol kw
          args: Array{Any}((2,))
            1: Symbol c
            2: Int64 3

Fortunately the case of NamedTuple is not likely to be problematic in practice
as it is extremely rare.

Conclusions

The @. macro can be very convenient. However, in my experience, you
need to be careful when you use it as it is easy to get surprising results
if you work with complex expressions. Out of the possible problematic situations
I have covered in my post a most common one is forgetting to add $ to avoid
broadcasting of some function calls in a complex expression.

I hope that you will find this post useful and it will help you to avoid
bugs in your Julia code using broadcasting!

Breaking: Julia ranks in the top 5 most loved programming languages for 2022

By: Logan Kilpatrick

Re-posted from: https://blog.devgenius.io/breaking-julia-ranks-in-the-top-5-most-loved-programming-languages-for-2022-6cb7740240e1?source=rss-2c8aac9051d3------2

It should come as no surprise to those following the growth and expansion of the Julia Programming Language ecosystem that in this year’s…

The Zen of Missing in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/06/17/missing.html

Introduction

Some time ago I have written a post about
ABC of handling missing values in Julia. Its objective was to give
an introduction to the topic for the newcomers. However, occasionally
users complain that working with missing values in Julia is less convenient
than in e.g. Python or R.

Such opinions are always debatable, so recently I decided to run a small
pool on Julia Discourse about the skipmissing function.
The question was if we want to shorten the skipmissing name into something
that is more convenient to use in interactive work.
To my surprise, a vast majority of voters preferred a verbose and explicit
operation name. This preference regarding handling of missings, to my surprise,
reminded me of several passages from The Zen of Python:

  • Explicit is better than implicit.
  • Readability counts.
  • In the face of ambiguity, refuse the temptation to guess.

So given this preference, how should Julia users retain convenience. Let me
share some of my thoughts on this topic that did not make into my
previous post.

This post was written under Julia 1.7.2, Missings.jl 1.0.2,
MissingsAsFalse.jl 0.1, and StatsBase.jl 0.33.16.

Verbosity

Indeed writing missing, skipmissing, and passmissing (the last one is
defined in Missings.jl), in places where they are needed, might seem verbose.
However, the good news is that most of the time you do not have to type them
fully thanks to completions so in your editor/REPL:

  • instead of missing write mis<tab>;
  • instead of skipmissing write skipm<tab>;
  • instead of passmissing write pas<tab>.

Let us compare. In R:

sum(c(1, NA), na.rm=T)

vs Julia:

sum(skipmissing([1, missing]))

seems longer. However, if you take into account the amount of typing you need to
do (number of keystrokes) it is the same.

Missing values in logical conditions

As I have written in this post I personally strongly recommend using
coalesce to handle missing values in logical conditions. This allows you,
to follow the Explicit is better than implicit. principle by explicitly
showing in the code if missing should be treated as true or as false.

Here is a short example:

julia> c = missing
missing

julia> c ? "true" : "false"
ERROR: TypeError: non-boolean (Missing) used in boolean context

julia> coalesce(c, false) ? "true" : "false"
"false"

However, some users find it more convenient to use @mfalse macro from
the MissingsAsFalse.jl package:

julia> using MissingsAsFalse

julia> @mfalse c ? "true" : "false"
"false"

Correlation matrix with missing values

A common, and relatively complex case of handling missing values, is computing
of correlation matrix of data that contains missings. Let us check what Julia
offers here. We will use the pairwise function from StatsBase.jl.

julia> using Random

julia> using StatsBase

julia> using Statistics

julia> Random.seed!(1234);

julia> x = rand([1:10; missing], 16, 4)
16×4 Matrix{Union{Missing, Int64}}:
  4          missing   2         10
  7         8           missing   8
  3          missing   7          5
 10         9          8         10
  4         2          7          6
  5         6          1           missing
   missing  7          8          9
  9         3          2          6
  6         7         10           missing
  9         3          4          6
  7         2          9          8
  9         7          3          3
  1         8          6          5
  3         9          4          7
  5         7          3          2
  8         5           missing   4

julia> pairwise(cor, eachcol(x))
4×4 Matrix{Union{Missing, Float64}}:
 1.0        missing   missing   missing
  missing  1.0        missing   missing
  missing   missing  1.0        missing
  missing   missing   missing  1.0

julia> pairwise(cor, eachcol(x), skipmissing=:pairwise)
4×4 Matrix{Float64}:
  1.0        -0.218849    -0.0177704   0.0922413
 -0.218849    1.0          0.00973122  0.102969
 -0.0177704   0.00973122   1.0         0.364821
  0.0922413   0.102969     0.364821    1.0

julia> pairwise(cor, eachcol(x), skipmissing=:listwise)
4×4 Matrix{Float64}:
  1.0       -0.229568   -0.100028   0.247283
 -0.229568   1.0        -0.127153  -0.0420058
 -0.100028  -0.127153    1.0        0.67068
  0.247283  -0.0420058   0.67068    1.0

By default pairwise for cor returns missing when at least one of the
columns contains missing values. Use :pairwise value of skipmissing keyword
argument to skip entries with a missing value in either of the two vectors
passed to cor and use :listwise to skip entries with a missing value in
any of the vectors passed to pairwise.

In this example we see that since there are several ways how missing values
should be handled by cor in pairwise computations, instead of using the
skipmissing function, a keyword argument is used allowing to specify what
the data scientist wants exactly.

A similar situation is in the subset and subset! functions from
DataFrames.jl, that also take a skipmissing keyword argument, to simplify
handling of logical conditions specifying which rows from a source data frame
should be kept.

Conclusions

In Julia the approach is that handling of missing values is explicit. This
choice is guided by the fact that then in the code it is explicitly visible what
was the developer’s decision about how they should be treated. This choice makes
code more verbose. Today I have tried to show that, especially with a good
editor/REPL support, in practice it does not introduce a large overhead.