Author Archives: Blog by Bogumił Kamiński

The @view and @views macros: are you sure you know how they work?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/06/10/view.html

Introduction

Macros in Julia are a very nice element of the language. You can easily visually
identify macros as they are called using the @ prefix. I most often use the
@assert, @show, @spawn, @edit, and @time macros. However, one needs to
understand how macros work to confidently use them.

Today I want to write about typical situations when using
the @view and @views macros to create views can lead to surprising results.

The post was tested under Julia 1.7.2.

The @view macro and expression range

Assume we have a vector representing 24 months of data and we want to compute
correlation between the first 12 observations and the last 12 observations.

Here is a simple way to do it:

julia> using Random

julia> using Statistics

julia> Random.seed!(1234);

julia> x = randn(24);

julia> cor(x[1:12], x[13:24])
-0.018264675865149734

However, this operation copies data. If we want to avoid this we can use
views. To start let us check the view function:

julia> cor(view(x, 1:12), view(x, 13:24))
-0.018264675865149734

Let us check if indeed this reduces allocations first (using the @allocated
macro):

julia> @allocated cor(x[1:12], x[13:24])
336

julia> @allocated cor(view(x, 1:12), view(x, 13:24))
112

Indeed it does. Of course in our toy example the benefit is minimal.

Now we are ready to get to the main point of my post. Suppose we have
cor(x[1:12], x[13:24]) and want to do use views. As you can see in
codes above turning indexing into view call requires re-writing of the
expressions. This is where the @view macro comes handy.

julia> cor(@view x[1:12], @view x[13:24])
ERROR: LoadError: ArgumentError: Invalid use of @view macro: argument must be a reference expression A[...].

Or does it? We have some problem when using the macro. Let us check the
Julia Manual:

Macros are invoked with the following general syntax:

@name expr1 expr2 ... or @name(expr1, expr2, ...)

Since we invoked @view using the first style Julia eagerly considers
everything that follows it as a single expression. In this case the whole
x[1:12], @view x[13:24] part of code is passed to @view as a single
expression and we get an error.

We need to use the second macro invocation style in this case:

julia> cor(@view(x[1:12]), @view(x[13:24]))
-0.018264675865149734

Now all worked as expected. The only downside is that it feels a bit
inconvenient to write @view(...). Fortunately, Julia’s creators have thought
about it and designed a @views macro which turns every array slicing
(i.e., array[...]) operation in a passed expression to a view. In our case
this would be:

julia> @views cor(x[1:12], x[13:24])
-0.018264675865149734

The @views macro surprises

Let us check using the @macroexpand macro if indeed @views works as I promised:

julia> @macroexpand @views cor(x[1:12], x[13:24])
:(cor((Base.maybeview)(x, 1:12), (Base.maybeview)(x, 13:24)))

What is this strange Base.maybeview function? Is like getindex, but returns
a view for array slicing operations, while remaining equivalent to getindex
for scalar indices and non-array types. So it is almost the same as view.

Let us see the difference between using @view and @views:

julia> @view x[1]
0-dimensional view(::Vector{Float64}, 1) with eltype Float64:
0.9706563288552144

julia> @views x[1]
0.9706563288552144

Most of the time using a 0-dimensional view or a scalar will not make a
difference, however, sometimes it does. Let us have a look:

julia> similar(@view x[1])
0-dimensional Array{Float64, 0}:
1.40721121e-315

julia> similar(@views x[1])
ERROR: MethodError: no method matching similar(::Float64)

So things are, unfortunately, not as simple as you might expect, especially if
you are writing generic code and do not know upfront if you will use a scalar
index or not.

Having learned what I have written above you might think that the following code
will not work:

julia> x, y, z = [1, 2, 3], [2, 3, 4], [4, 5, 6]
([1, 2, 3], [2, 3, 4], [4, 5, 6])

julia> @views x[1] = y[1] + z[1]
6

julia> x
3-element Vector{Int64}:
 6
 2
 3

However, it produces a correct result. What is the reason? Let us check:

julia> @macroexpand @views x[1] = y[1] + z[1]
:(x[1] = (Base.maybeview)(y, 1) + (Base.maybeview)(z, 1))

As we can see @views is smart enough not to apply the view transformation
to the left hand side of the assignment. This feature is not documented in its
docstring, but can be found (and is even commented about) in the source code of
the @views macro (in the _views function to be precise).

Interestingly, this rule is in play even in the example given in
the docsting of the @views macro. Let us check it:

julia> A = zeros(3, 3);

julia> @macroexpand @views for row in 1:3
                    b = A[row, :]
                    b[:] .= row
                end
:(for row = 1:3
      #= REPL[67]:2 =#
      b = (Base.maybeview)(A, row, :)
      #= REPL[67]:3 =#
      b[:] .= row
  end)

We can see that since b[:] is on left hand side of the assignment it is not
touched by @views.

Conclusions

What are the lessons learned?

  1. Macros in Julia are powerful. Well designed macros can make developer’s
    life easier and code more readable.
  2. Macros can be tricky. The most common problem when using macros is the
    @name expr1 style of invocation, which can process “too much” of your code
    unexpectedly.
  3. @views is not equivalent to multiple invocations of @view. There are
    subtle differences between them. Fortunately these differences matter
    mostly in generic code and thus package developers need to be aware of them.

The @view and @views macros: are you sure you know how they work?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/06/10/view.html

Introduction

Macros in Julia are a very nice element of the language. You can easily visually
identify macros as they are called using the @ prefix. I most often use the
@assert, @show, @spawn, @edit, and @time macros. However, one needs to
understand how macros work to confidently use them.

Today I want to write about typical situations when using
the @view and @views macros to create views can lead to surprising results.

The post was tested under Julia 1.7.2.

The @view macro and expression range

Assume we have a vector representing 24 months of data and we want to compute
correlation between the first 12 observations and the last 12 observations.

Here is a simple way to do it:

julia> using Random

julia> using Statistics

julia> Random.seed!(1234);

julia> x = randn(24);

julia> cor(x[1:12], x[13:24])
-0.018264675865149734

However, this operation copies data. If we want to avoid this we can use
views. To start let us check the view function:

julia> cor(view(x, 1:12), view(x, 13:24))
-0.018264675865149734

Let us check if indeed this reduces allocations first (using the @allocated
macro):

julia> @allocated cor(x[1:12], x[13:24])
336

julia> @allocated cor(view(x, 1:12), view(x, 13:24))
112

Indeed it does. Of course in our toy example the benefit is minimal.

Now we are ready to get to the main point of my post. Suppose we have
cor(x[1:12], x[13:24]) and want to do use views. As you can see in
codes above turning indexing into view call requires re-writing of the
expressions. This is where the @view macro comes handy.

julia> cor(@view x[1:12], @view x[13:24])
ERROR: LoadError: ArgumentError: Invalid use of @view macro: argument must be a reference expression A[...].

Or does it? We have some problem when using the macro. Let us check the
Julia Manual:

Macros are invoked with the following general syntax:

@name expr1 expr2 ... or @name(expr1, expr2, ...)

Since we invoked @view using the first style Julia eagerly considers
everything that follows it as a single expression. In this case the whole
x[1:12], @view x[13:24] part of code is passed to @view as a single
expression and we get an error.

We need to use the second macro invocation style in this case:

julia> cor(@view(x[1:12]), @view(x[13:24]))
-0.018264675865149734

Now all worked as expected. The only downside is that it feels a bit
inconvenient to write @view(...). Fortunately, Julia’s creators have thought
about it and designed a @views macro which turns every array slicing
(i.e., array[...]) operation in a passed expression to a view. In our case
this would be:

julia> @views cor(x[1:12], x[13:24])
-0.018264675865149734

The @views macro surprises

Let us check using the @macroexpand macro if indeed @views works as I promised:

julia> @macroexpand @views cor(x[1:12], x[13:24])
:(cor((Base.maybeview)(x, 1:12), (Base.maybeview)(x, 13:24)))

What is this strange Base.maybeview function? Is like getindex, but returns
a view for array slicing operations, while remaining equivalent to getindex
for scalar indices and non-array types. So it is almost the same as view.

Let us see the difference between using @view and @views:

julia> @view x[1]
0-dimensional view(::Vector{Float64}, 1) with eltype Float64:
0.9706563288552144

julia> @views x[1]
0.9706563288552144

Most of the time using a 0-dimensional view or a scalar will not make a
difference, however, sometimes it does. Let us have a look:

julia> similar(@view x[1])
0-dimensional Array{Float64, 0}:
1.40721121e-315

julia> similar(@views x[1])
ERROR: MethodError: no method matching similar(::Float64)

So things are, unfortunately, not as simple as you might expect, especially if
you are writing generic code and do not know upfront if you will use a scalar
index or not.

Having learned what I have written above you might think that the following code
will not work:

julia> x, y, z = [1, 2, 3], [2, 3, 4], [4, 5, 6]
([1, 2, 3], [2, 3, 4], [4, 5, 6])

julia> @views x[1] = y[1] + z[1]
6

julia> x
3-element Vector{Int64}:
 6
 2
 3

However, it produces a correct result. What is the reason? Let us check:

julia> @macroexpand @views x[1] = y[1] + z[1]
:(x[1] = (Base.maybeview)(y, 1) + (Base.maybeview)(z, 1))

As we can see @views is smart enough not to apply the view transformation
to the left hand side of the assignment. This feature is not documented in its
docstring, but can be found (and is even commented about) in the source code of
the @views macro (in the _views function to be precise).

Interestingly, this rule is in play even in the example given in
the docsting of the @views macro. Let us check it:

julia> A = zeros(3, 3);

julia> @macroexpand @views for row in 1:3
                    b = A[row, :]
                    b[:] .= row
                end
:(for row = 1:3
      #= REPL[67]:2 =#
      b = (Base.maybeview)(A, row, :)
      #= REPL[67]:3 =#
      b[:] .= row
  end)

We can see that since b[:] is on left hand side of the assignment it is not
touched by @views.

Conclusions

What are the lessons learned?

  1. Macros in Julia are powerful. Well designed macros can make developer’s
    life easier and code more readable.
  2. Macros can be tricky. The most common problem when using macros is the
    @name expr1 style of invocation, which can process “too much” of your code
    unexpectedly.
  3. @views is not equivalent to multiple invocations of @view. There are
    subtle differences between them. Fortunately these differences matter
    mostly in generic code and thus package developers need to be aware of them.

DataFrames.jl for work and pleasure

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/06/03/selectors.html

Introduction

This week I have read a post on Why Vim is better than VSCode.
In it the author discusses a lot the operator – text object – motion
pattern in Vim. The post argues that it is not only efficient but fun to
learn and use.

It reminded me of the structure of the operation specification language
we have in DataFrames.jl that follows the pattern:

input columns => transformation => output column names

I have already written two posts about this topic that you can find
here and here. Therefore, today I decided to take the
fun part of using the minilanguage.

The post is written under Julia 1.7.2 and DataFrames.jl 1.3.4.

The challenge

The user has some data frame and wants to drop a :col column from it,
but the user is not sure if this column is present in the data frame.

Let us first create two test data frames on which we will test our solutions:

julia> using DataFrames

julia> df1 = DataFrame(a=1:2, b=3:4)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> df2 = DataFrame(a=1:2, col=["drop", "me"], b=3:4)
2×3 DataFrame
 Row │ a      col     b
     │ Int64  String  Int64
─────┼──────────────────────
   1 │     1  drop        3
   2 │     2  me          4

A basic approach

A natural thing to try is using the Not selector for this task. Let us
check it:

julia> select(df1, Not(:col))
ERROR: ArgumentError: column name :col not found in the data frame

julia> select(df2, Not(:col))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

The operation worked on df2, but failed on df1.

You might ask why Not selector is so restrictive? The reason is to avoid bugs.
You could accidentally mistype column name and then, if such operation worked,
instead of erroring, your incorrect result would propagate.

An intermediate solution

A first solution that comes to mind is to drop the column only if it is present
in a data frame so you might write something like this:

julia> select(df1, names(df1) .!= "col")
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> select(df2, names(df2) .!= "col")
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

This works, but you need to write the name of the source data frame twice,
so the solution feels a bit heavy.

The fun part

What is the way I find nice to do this operation then? Here is the approach:

julia> select(df1, Cols(!=("col")))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> select(df2, Cols(!=("col")))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

We are using a combo of a bit advanced features here.

First !=("col") creates a function that compares its argument to "col" using
!=. This is a very nice feature of Base Julia that it allows partial function
application for the != operator.

Next the Cols function accepts a predicate, in our case !=("col"). Then it
selects all columns of a data frame for which this predicate returns true.

Conclusions

The beauty of Julia is that it not only does the job you want done, but also is
quite fun to code with. At the same time, its design often helps you with
catching common possible bugs in code (like the Not behavior I have described
in this post).

Enjoy!