Author Archives: Blog by Bogumił Kamiński

The curious case of subset condition

Re-posted from: https://bkamins.github.io/julialang/2022/01/28/subset.html

Introduction

Recently, someone on Julia Slack asked how to use the subset function
to drop whole groups from a GroupedDataFrame in DataFrames.jl.
I think this case is tricky enough to be worth a post.

The examples were tested under Julia 1.7.0 and DataFrames.jl 1.3.2.

Standard use cases of the `subset` function

Let us start with creating some sample data:

julia> using DataFrames

julia> df = DataFrame(id=[1, 1, 1, 1, 2, 2], x=1:6)
6×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4
   5 │     2      5
   6 │     2      6

julia> gdf = groupby(df, :id)
GroupedDataFrame with 2 groups based on key: id
First Group (4 rows): id = 1
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4
⋮
Last Group (2 rows): id = 2
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     2      5
   2 │     2      6

Assume we want to keep the rows of df in which the value of :x is less than
the mean of this column. This can be achieved with:

julia> using Statistics

julia> subset(df, :x => x -> x .< mean(x))
3×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3

The same operation can easily be done groupwise. Now we keep the rows in which
:x is less than the mean of :x within the group defined by :id:

julia> subset(gdf, :x => x -> x .< mean(x))
3×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     2      5

The limitation of the `subset` contract

The subset function requires that the passed condition returns a vector.
Therefore, the following operation fails:

julia> subset(df, :x => x -> true)
ERROR: ArgumentError: functions passed to `subset` must return an AbstractVector.

although we might expect the scalar result of the function to be broadcast,
so that all rows would be kept. For reference, select, for example, performs
such broadcasting automatically:

julia> select(df, All(), :x => x -> true)
6×3 DataFrame
 Row │ id     x      x_function
     │ Int64  Int64  Bool
─────┼──────────────────────────
   1 │     1      1        true
   2 │     1      2        true
   3 │     1      3        true
   4 │     1      4        true
   5 │     2      5        true
   6 │     2      6        true

You might wonder why this restriction exists. Initially we allowed non-vector
return values, but they turned out to be confusing for users, so we disallowed
them.

Let me give an example. If you want to keep all rows for which the :id
column is equal to 1, you should write:

julia> subset(df, :id => ByRow(==(1)))
4×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4

However, it turned out that users frequently forgot to add the ByRow
wrapper and instead wrote:

julia> subset(df, :id => ==(1))
ERROR: ArgumentError: functions passed to `subset` must return an AbstractVector.

Now this throws an error, but if we had not required a vector to be returned,
the call would effectively behave like:

julia> subset(df, :id => x -> fill(x == 1, length(x)))
0×2 DataFrame

because the whole :id column (a vector) would be compared to the scalar 1.
This comparison returns a single false, so every row would be dropped.
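The difference between the two behaviors comes down to == versus the broadcasted .==. The sketch below reproduces it on a plain vector holding the same values as the :id column; the variable names are illustrative.

```julia
id = [1, 1, 1, 1, 2, 2]   # same values as the :id column of df

# `==` compares the whole vector with the scalar 1; a vector is never
# equal to a number, so the result is a single `false`
whole = id == 1

# Expanding that scalar to one value per row marks every row as `false`,
# which is why all rows would be dropped
expanded = fill(whole, length(id))

# Broadcasting compares row by row; this gives the same Bool values
# as applying ByRow(==(1)) to the column
rowwise = id .== 1

println(whole, " ", count(expanded), " ", count(rowwise))
```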

Dropping whole groups from a `GroupedDataFrame`

The requirement that the condition must return a vector was added for safety
reasons. However, there is one case in which it is a bit problematic.

Assume we want to keep all groups of the gdf GroupedDataFrame for which
the mean of the :x column is less than 3. The problem is that the following
condition fails:

julia> subset(gdf, :x => x -> mean(x) < 3)
ERROR: ArgumentError: functions passed to `subset` must return an AbstractVector.

since comparing the mean of the :x column to 3 produces a scalar Bool
value.

The solution is to manually expand the result of the condition to match the
number of rows in the group:

julia> subset(gdf, :x => x -> fill(mean(x) < 3, length(x)))
4×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4

This is unfortunately a bit inconvenient.
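If you need this pattern often, the expansion can be wrapped in a small helper. The wholegroup function below is hypothetical (it is not part of DataFrames.jl); it is a sketch, assuming DataFrames.jl 1.3 and the Statistics standard library, of how a predicate that maps a whole group to one Bool can be adapted to the subset contract.

```julia
using DataFrames, Statistics

# Hypothetical helper, not part of DataFrames.jl: wraps a predicate that maps
# a whole group to one Bool so that it returns a vector of the group's length
wholegroup(pred) = x -> fill(pred(x), length(x))

df = DataFrame(id=[1, 1, 1, 1, 2, 2], x=1:6)
gdf = groupby(df, :id)

# Keep only the groups in which the mean of :x is below 3
kept = subset(gdf, :x => wholegroup(x -> mean(x) < 3))
println(kept)
```

Because the helper only changes the shape of the returned value, the same pattern should work for any predicate that reduces a group to a single Bool.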

An alternative approach is to use the filter function, which, when applied
to a GroupedDataFrame, always works on whole groups:

julia> filter(:x => x -> mean(x) < 3, gdf) |> DataFrame
4×2 DataFrame
 Row │ id     x
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4

(We had to pass the result of filter to the DataFrame constructor, as otherwise
we would get a filtered GroupedDataFrame.)
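Yet another option, sketched here as an assumption rather than taken from the original post, relies on the fact that select and transform broadcast scalar results. It stores the group-level condition in a temporary column (the name :keep is arbitrary), passes that column, which is already a vector, to subset, and finally drops it.

```julia
using DataFrames, Statistics

df = DataFrame(id=[1, 1, 1, 1, 2, 2], x=1:6)
gdf = groupby(df, :id)

# transform broadcasts the scalar returned for each group to all of the
# group's rows, so :keep holds the group-level condition on every row
tmp = transform(gdf, :x => (x -> mean(x) < 3) => :keep)

# :keep is already a Bool vector, so `identity` satisfies the subset contract;
# the temporary column is dropped afterwards
kept = select!(subset(tmp, :keep => identity), Not(:keep))
println(kept)
```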

Conclusions

The design of subset I discussed in this post shows one of the challenges we
face when defining APIs in DataFrames.jl. There is often a tension between
developer convenience and safety. In this example, allowing only vectors as
results of conditions in the subset function is safer, since it makes it
possible to catch some common bugs in users' code. The cost is that in some
cases (most notably dropping whole groups from a GroupedDataFrame) it is a bit
inconvenient.

Categorical vs pooled arrays

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/01/21/categorical.html

Introduction

When working with data in Julia, you have most likely encountered two packages:
PooledArrays.jl and CategoricalArrays.jl. Both provide
custom implementations of AbstractArray that are useful when you have
low-cardinality data.

In this post I want to highlight major similarities and differences between
these two packages.

The examples were tested under Julia 1.7.0, DataFrames.jl 1.3.1,
PooledArrays.jl 1.4.0, CategoricalArrays.jl 0.10.2, FreqTables.jl 0.4.5,
and BenchmarkTools.jl 1.2.2.

PooledArrays.jl

As I explained in my recent post, you can use PooledArrays.jl when
you have large vectors with few unique values. The simplest way to think
about the PooledArray type is as a way to compress data.

Here is an example:

julia> using PooledArrays

julia> v = repeat(["a", "b", "c", "d"], 10^6)
4000000-element Vector{String}:
 "a"
 "b"
 "c"
 "d"
 ⋮
 "a"
 "b"
 "c"
 "d"

julia> Base.summarysize(v)
32000076

julia> pv = PooledArray(v)
4000000-element PooledVector{String, UInt32, Vector{UInt32}}:
 "a"
 "b"
 "c"
 "d"
 ⋮
 "a"
 "b"
 "c"
 "d"

julia> Base.summarysize(pv)
16000580

As you can see, even for short strings we have achieved significant compression
(the size was roughly halved). You can get even better compression by passing
the compress=true keyword argument:

julia> pv2 = PooledArray(v, compress=true)
4000000-element PooledVector{String, UInt8, Vector{UInt8}}:
 "a"
 "b"
 "c"
 "d"
 ⋮
 "b"
 "c"
 "d"

julia> Base.summarysize(PooledArray(v, compress=true))
4000532
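These numbers are consistent with a back-of-the-envelope calculation: on a 64-bit machine a Vector{String} stores one 8-byte pointer per element (4 × 10^6 × 8 = 32 × 10^6 bytes), while a pooled vector stores one integer code per element, i.e. 4 bytes for UInt32 (16 × 10^6 bytes) or 1 byte for UInt8 (4 × 10^6 bytes), plus a small pool of distinct values. The pool_encode and pool_decode functions below are a hypothetical, simplified sketch of this idea in plain Julia; they are not how PooledArrays.jl is actually implemented.

```julia
# Hypothetical, simplified sketch of pooling (not PooledArrays.jl's code):
# each distinct value is stored once in `pool`, and the data is replaced by
# integer codes `refs` of type R that point into `pool`.
function pool_encode(v::AbstractVector, ::Type{R}=UInt32) where {R<:Unsigned}
    pool = eltype(v)[]                 # distinct values, in order of appearance
    invpool = Dict{eltype(v),R}()      # value => code
    refs = Vector{R}(undef, length(v))
    for (i, x) in enumerate(v)
        refs[i] = get!(invpool, x) do
            push!(pool, x)
            R(length(pool))            # throws InexactError if R is too narrow
        end
    end
    return pool, refs
end

# Recover the original values from the pool and the codes
pool_decode(pool, refs) = [pool[r] for r in refs]

v = repeat(["a", "b", "c", "d"], 10^6)
pool, refs = pool_encode(v)            # 4-byte codes, like pv
_, refs8 = pool_encode(v, UInt8)       # 1-byte codes, like pv2

# Bytes taken by the per-element data (pointers vs. codes), ignoring the pool
println(sizeof(v), " ", sizeof(refs), " ", sizeof(refs8))
```

Passing UInt8 mimics the effect of compress=true, which picks a narrower reference type when the number of distinct values allows it.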

There is one more benefit of using PooledArray with DataFrames.jl: you can
expect grouping and join operations to be faster. Here are
some examples:

julia> using DataFrames

julia> using BenchmarkTools

julia> df = DataFrame(;v, pv, pv2)
4000000×3 DataFrame
     Row │ v       pv      pv2
         │ String  String  String
─────────┼────────────────────────
       1 │ a       a       a
       2 │ b       b       b
       3 │ c       c       c
    ⋮    │   ⋮       ⋮       ⋮
 3999998 │ b       b       b
 3999999 │ c       c       c
 4000000 │ d       d       d
              3999994 rows omitted

julia> @benchmark groupby($df, :v)
BenchmarkTools.Trial: 61 samples with 1 evaluation.
 Range (min … max):  66.280 ms … 170.877 ms  ┊ GC (min … max): 0.68% … 58.51%
 Time  (median):     79.254 ms               ┊ GC (median):    0.62%
 Time  (mean ± σ):   82.754 ms ±  17.021 ms  ┊ GC (mean ± σ):  5.53% ± 10.36%

     █          ▇
  ▃▃▃█▅▁▄█▅▄▁▁▃▄█▆▃▁▁▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  66.3 ms         Histogram: frequency by time          154 ms <

 Memory estimate: 125.04 MiB, allocs estimate: 37.

julia> @benchmark groupby($df, :pv)
BenchmarkTools.Trial: 678 samples with 1 evaluation.
 Range (min … max):  3.920 ms … 20.284 ms  ┊ GC (min … max): 0.00% …  4.13%
 Time  (median):     4.835 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.373 ms ±  4.111 ms  ┊ GC (mean ± σ):  5.02% ± 10.96%

  ▃█▇▃▂    ▁▅                       ▁ ▄▁▃▂
  ██████▇▆▇██▇▆▆▆▅▆▇▆█▆▄▅▁▁▆▁▁▆▄▇▅▅▆█▄█████▇▄▅▄▄▄▆▅▅▅▅▁▄▁▅▄▅ ▇
  3.92 ms      Histogram: log(frequency) by time     18.6 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 61.

julia> @benchmark groupby($df, :pv2)
BenchmarkTools.Trial: 710 samples with 1 evaluation.
 Range (min … max):  3.462 ms … 19.995 ms  ┊ GC (min … max): 0.00% …  8.23%
 Time  (median):     4.356 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.035 ms ±  4.189 ms  ┊ GC (mean ± σ):  5.63% ± 11.95%

  ▂▄█▃▁   ▁  ▄▁                       ▃  ▃
  ██████▇▆█▇▇██▇▇▇▆▅▆▇▆▄▅▁▁▁▄▄▆▄▄▇▄▁▄▇██████▇▅▇▅▄▅▄▅▄▁▄▄▅▅▅▇ ▇
  3.46 ms      Histogram: log(frequency) by time       18 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 61.

julia> df2 = df[1:4, :]
4×3 DataFrame
 Row │ v       pv      pv2
     │ String  String  String
─────┼────────────────────────
   1 │ a       a       a
   2 │ b       b       b
   3 │ c       c       c
   4 │ d       d       d

julia> @benchmark innerjoin($df, $df2, on=:v, makeunique=true)
BenchmarkTools.Trial: 26 samples with 1 evaluation.
 Range (min … max):  153.954 ms … 281.650 ms  ┊ GC (min … max): 1.63% … 34.70%
 Time  (median):     189.137 ms               ┊ GC (median):    1.12%
 Time  (mean ± σ):   193.827 ms ±  26.318 ms  ┊ GC (mean ± σ):  5.15% ±  8.92%

           ▁   ▁ ▁█  ▄ ▁
  ▆▁▁▆▁▁▁▁▁█▆▆▁█▁██▆▁█▁█▆▆▆▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▆ ▁
  154 ms           Histogram: frequency by time          282 ms <

 Memory estimate: 143.01 MiB, allocs estimate: 262.

julia> @benchmark innerjoin($df, $df2, on=:pv, makeunique=true)
BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  124.786 ms … 258.973 ms  ┊ GC (min … max): 2.46% … 42.85%
 Time  (median):     152.645 ms               ┊ GC (median):    2.56%
 Time  (mean ± σ):   157.173 ms ±  28.157 ms  ┊ GC (mean ± σ):  7.05% ±  9.68%

  ▄        ██▁ █▁ ▄▄
  █▁▆▁▁▆▁▁▁███▆██▁██▁▁▆▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▆ ▁
  125 ms           Histogram: frequency by time          259 ms <

 Memory estimate: 158.27 MiB, allocs estimate: 274.

julia> @benchmark innerjoin($df, $df2, on=:pv2, makeunique=true)
BenchmarkTools.Trial: 34 samples with 1 evaluation.
 Range (min … max):  119.717 ms … 252.698 ms  ┊ GC (min … max): 1.92% … 44.18%
 Time  (median):     143.639 ms               ┊ GC (median):    1.02%
 Time  (mean ± σ):   147.490 ms ±  26.220 ms  ┊ GC (mean ± σ):  6.05% ± 10.18%

  ▁     █   ▃▆▁  ▁
  █▁▁▁▁▄█▇▄▁███▁▁█▇▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄ ▁
  120 ms           Histogram: frequency by time          253 ms <

 Memory estimate: 169.72 MiB, allocs estimate: 274.

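In these runs, the median groupby time dropped from about 79 ms for the String
column to under 5 ms for the pooled columns, while the gain for innerjoin was
more modest (a median of about 189 ms versus about 144 to 153 ms). Apart from
these performance differences, working with pooled vectors should not differ
in any user-visible way from working with normal vectors.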
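A plausible intuition for the faster grouping (a simplified sketch, not the code path DataFrames.jl actually uses) is that once every value is replaced by an integer code in 1:k, rows can be bucketed with a plain array indexed by the code, whereas grouping arbitrary values needs a hash-table lookup per row. Both functions below, group_by_codes and group_by_hash, are hypothetical names used only for illustration.

```julia
# Hypothetical sketch: group row indices by integer codes in 1:k.
# The code itself is the bucket index, so no hashing is needed.
function group_by_codes(refs::AbstractVector{<:Integer}, k::Integer)
    groups = [Int[] for _ in 1:k]
    for (row, code) in enumerate(refs)
        push!(groups[code], row)
    end
    return groups
end

# For comparison: grouping arbitrary values needs a hash-table lookup per row
function group_by_hash(v::AbstractVector)
    groups = Dict{eltype(v),Vector{Int}}()
    for (row, x) in enumerate(v)
        push!(get!(() -> Int[], groups, x), row)
    end
    return groups
end

v = repeat(["a", "b", "c", "d"], 3)
refs = repeat(UInt8[1, 2, 3, 4], 3)    # codes as a pooled vector might store them
by_code = group_by_codes(refs, 4)
by_hash = group_by_hash(v)
println(by_code)
```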

CategoricalArrays.jl

The CategoricalArray type also compresses data, but its main use case is
representing data that is categorical in the statistical sense (either ordered
or unordered).

However, let me start by showing you the compression effect and the grouping
and joining speedups, as they are the same as for PooledArrays.jl.

julia> using CategoricalArrays

julia> cv = categorical(v)
4000000-element CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "c"
 "d"
 "a"
 ⋮
 "a"
 "b"
 "c"
 "d"

julia> Base.summarysize(cv)
16000612

julia> cv2 = categorical(v, compress=true)
4000000-element CategoricalArray{String,1,UInt8}:
 "a"
 "b"
 "c"
 "d"
 "a"
 ⋮
 "a"
 "b"
 "c"
 "d"

julia> Base.summarysize(cv2)
4000564

julia> df = DataFrame(;v, cv, cv2)
4000000×3 DataFrame
     Row │ v       cv    cv2
         │ String  Cat…  Cat…
─────────┼────────────────────
       1 │ a       a     a
       2 │ b       b     b
       3 │ c       c     c
    ⋮    │   ⋮      ⋮     ⋮
 3999998 │ b       b     b
 3999999 │ c       c     c
 4000000 │ d       d     d
              3999994 rows omitted

julia> @benchmark groupby($df, :v)
BenchmarkTools.Trial: 54 samples with 1 evaluation.
 Range (min … max):  72.757 ms … 109.888 ms  ┊ GC (min … max): 0.00% … 4.32%
 Time  (median):     92.208 ms               ┊ GC (median):    3.27%
 Time  (mean ± σ):   92.917 ms ±  11.122 ms  ┊ GC (mean ± σ):  3.53% ± 2.48%

                        ▆           ▂                      ▆ █
  ▄▄▁▁▁▁▁▄▁▄▄▆▆▁▁▄▄▁▄▁▄▆█▆▁▁▁▁▁▆▁▄▆██▄▁▁▁▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▆█▁█ ▁
  72.8 ms         Histogram: frequency by time          108 ms <

 Memory estimate: 125.04 MiB, allocs estimate: 33.

julia> @benchmark groupby($df, :cv)
BenchmarkTools.Trial: 714 samples with 1 evaluation.
 Range (min … max):  3.930 ms … 15.026 ms  ┊ GC (min … max): 0.00% …  4.58%
 Time  (median):     4.474 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.007 ms ±  3.824 ms  ┊ GC (mean ± σ):  5.99% ± 12.26%

  ▃ █▇              ▄▁                               ▁ ▅ ▂ ▄
  █▇██▆▇▄▁▄▁▁▁▁▄▆▅▄███▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▅▁▆▆▁▁▅▅▁█▁█▇█▇█ ▇
  3.93 ms      Histogram: log(frequency) by time     14.1 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 57.

julia> @benchmark groupby($df, :cv2)
BenchmarkTools.Trial: 634 samples with 1 evaluation.
 Range (min … max):  3.506 ms … 22.074 ms  ┊ GC (min … max): 0.00% …  0.00%
 Time  (median):     5.651 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.881 ms ±  4.817 ms  ┊ GC (mean ± σ):  6.49% ± 13.01%

  ▁█▆▂▂ ▃▃▂ ▄▂      ▁           ▄ ▄▁
  ████████████▆▇▅▆█▆█▇▅▁▁▅▅▆▄▇▅▆█████▇▄▅▅▆▅▄▆▅▅▁▇▄▅▆▆▇▅▇▆▆▄▄ ▇
  3.51 ms      Histogram: log(frequency) by time     21.2 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 57.

julia> df2 = df[1:4, :]
4×3 DataFrame
 Row │ v       cv    cv2
     │ String  Cat…  Cat…
─────┼────────────────────
   1 │ a       a     a
   2 │ b       b     b
   3 │ c       c     c
   4 │ d       d     d

julia> @benchmark innerjoin($df, $df2, on=:v, makeunique=true)
BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min … max):  158.062 ms … 193.518 ms  ┊ GC (min … max): 0.46% … 1.62%
 Time  (median):     187.180 ms               ┊ GC (median):    1.66%
 Time  (mean ± σ):   186.249 ms ±   7.073 ms  ┊ GC (mean ± σ):  1.89% ± 0.88%

                                                   █▂
  ▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▄▄▆▄▄▄▄▁▁▆ ▁
  158 ms           Histogram: frequency by time          194 ms <

 Memory estimate: 143.02 MiB, allocs estimate: 275.

julia> @benchmark innerjoin($df, $df2, on=:cv, makeunique=true)
BenchmarkTools.Trial: 33 samples with 1 evaluation.
 Range (min … max):  133.837 ms … 253.376 ms  ┊ GC (min … max): 0.54% … 43.65%
 Time  (median):     152.783 ms               ┊ GC (median):    1.83%
 Time  (mean ± σ):   154.100 ms ±  21.196 ms  ┊ GC (mean ± σ):  5.68% ±  8.33%

    █▂     ▅ ▅▂
  ▅▁██▅██▅▅████▁▅▁▁▁▁▁▁▁▁▁▅▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  134 ms           Histogram: frequency by time          253 ms <

 Memory estimate: 158.27 MiB, allocs estimate: 284.

julia> @benchmark innerjoin($df, $df2, on=:cv2, makeunique=true)
BenchmarkTools.Trial: 31 samples with 1 evaluation.
 Range (min … max):  118.869 ms … 292.135 ms  ┊ GC (min … max): 0.63% … 39.85%
 Time  (median):     149.992 ms               ┊ GC (median):    2.45%
 Time  (mean ± σ):   161.881 ms ±  35.070 ms  ┊ GC (mean ± σ):  8.69% ± 10.37%

            █
  ▃▁▁▁▁▃▃▁▁▃█▆▃▃▃▃▁▃▄▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃ ▁
  119 ms           Histogram: frequency by time          292 ms <

 Memory estimate: 169.72 MiB, allocs estimate: 284.

So what is the difference?

First, only a limited set of element types can be stored in a categorical array:

julia> categorical([nothing])
ERROR: ArgumentError: CategoricalArray only supports AbstractString, AbstractChar and Number element types (got element type Nothing)

Second, when you retrieve values from categorical arrays, they are wrapped in
a custom type:

julia> cv[1]
CategoricalValue{String, UInt32} "a"

You can extract the underlying value using the unwrap function:

julia> unwrap(cv[1])
"a"

Why do we have this behavior? The reason is that values stored in the
categorical array are treated as categories, and the wrapped value is just a
label for some category.

Let us first check the order of levels in the cv vector and try to compare two
of its elements:

julia> levels(cv)
4-element Vector{String}:
 "a"
 "b"
 "c"
 "d"

julia> cv[1] < cv[2]
ERROR: ArgumentError: Unordered CategoricalValue objects cannot be tested for order using <. Use isless instead, or call the ordered! function on the parent array to change this
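The error message suggests isless as an alternative. A short sketch, assuming CategoricalArrays.jl 0.10 and a small freshly created vector (cv_small, an illustrative name) instead of the large cv, shows that for unordered values isless still compares level positions, which can be inspected with levelcode:

```julia
using CategoricalArrays

cv_small = categorical(["a", "b", "c", "d"])  # unordered; levels a, b, c, d

# `cv_small[1] < cv_small[2]` would throw for unordered values,
# but `isless` compares the positions of the values' levels
println(isless(cv_small[1], cv_small[2]))

# `levelcode` returns the position of a value's level in levels(cv_small)
println(levelcode(cv_small[1]), " ", levelcode(cv_small[2]))
```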

The order of levels is reflected, for example, when you use the freqtable
function from FreqTables.jl on such a vector:

julia> using FreqTables

julia> freqtable(cv)
4-element Named Vector{Int64}
Dim1  │
──────┼────────
"a"   │ 1000000
"b"   │ 1000000
"c"   │ 1000000
"d"   │ 1000000

Let us make the cv vector ordered and change the order of levels:

julia> ordered!(cv, true);

julia> levels!(cv, ["d", "b", "a", "c"]);

Now let us repeat the same operations:

julia> levels(cv)
4-element Vector{String}:
 "d"
 "b"
 "a"
 "c"

julia> cv[1] < cv[2]
false

julia> cv[1], cv[2]
(CategoricalValue{String, UInt32} "a" (3/4), CategoricalValue{String, UInt32} "b" (2/4))

julia> freqtable(cv)
4-element Named Vector{Int64}
Dim1  │
──────┼────────
"d"   │ 1000000
"b"   │ 1000000
"a"   │ 1000000
"c"   │ 1000000

As you can see, freqtable respects the ordering of levels. The ordering is also
respected when we compare cv[1] with cv[2]: since level "a" is in position 3
and level "b" is in position 2, the comparison returns false.
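Instead of calling ordered! and levels! after the fact, the levels and their ordering can also be given when the array is created, through the levels and ordered keyword arguments of categorical. Below is a small sketch under that assumption (CategoricalArrays.jl 0.10; the sizes data is made up for illustration), showing that comparison and sorting both follow the declared level order rather than alphabetical order:

```julia
using CategoricalArrays

# Levels and their ordering are declared when the array is created
sizes = categorical(["medium", "small", "large", "small"];
                    levels=["small", "medium", "large"], ordered=true)

# "small" comes before "medium" in the declared levels
println(sizes[2] < sizes[1])

# Sorting follows the level order, not alphabetical order
println(unwrap.(sort(sizes)))
```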

Conclusions

In summary, if you only need to save memory when working with large data,
PooledArrays.jl is most likely what you need. However, if you want to process
data that is categorical in the statistical sense, use CategoricalArrays.jl.
Working with categorical values carries some mental overhead, as there is a
special CategoricalValue type that you need to learn how to work with. The
benefit is that the information about levels and their ordering is retained
and can be used when you operate on elements of a categorical vector.

Pitfalls of macro invocation in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/01/14/macros.html

Introduction

Macros in Julia are super useful for defining domain-specific languages,
and many packages take advantage of this, like JuMP.jl, StatsModels.jl,
DataFramesMeta.jl, DataFrameMacros.jl, …

This post was prompted by the discussion in this issue and aims to
highlight how macros should be invoked properly.

The examples were tested under Julia 1.7.0.

Preliminaries

A big advantage of macros is that they do not require parentheses when they are
called, e.g.:

julia> @time sin(1)
  0.000001 seconds
0.8414709848078965

This avoids the visual noise of the alternative syntax:

julia> @time(sin(1))
  0.000001 seconds
0.8414709848078965

The rules of both types of invocation are explained in the Julia Manual:

Macros are invoked with the following general syntax:
@name expr1 expr2 ...
@name(expr1, expr2, ...)
Note the distinguishing @ before the macro name and the lack of commas between
the argument expressions in the first form, and the lack of whitespace after
@name in the second form.

The explanation seems clear. However, sometimes it is tricky to tell what Julia
considers to be an expression. Let me give some examples.

Examples of non-obvious expression handling

I think the issue is best explained with this basic macro:

julia> macro m(args...)
           show(args)
       end
@m (macro with 1 method)

julia> @m 1 + 1
(:(1 + 1),)
julia> @m 1+1
(:(1 + 1),)
julia> @m 1 +1
(1, 1)

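As you can see above, when you write 1 + 1 or 1+1, Julia treats it
as a single expression. However, if you write 1 +1, Julia considers it
to be two expressions, 1 and +1: in a macro call without parentheses,
whitespace separates arguments, and a + followed directly by 1 is read as
the start of a new expression.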
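You can check how Julia splits a macro call into arguments without defining any macro, by parsing the source text with Meta.parse. In the resulting :macrocall expression, the first element of args is the macro name and the second is a line number node, so the arguments the macro would receive start at the third position. The macroargs helper is a hypothetical name used only for this sketch:

```julia
# Arguments a macro call would receive, obtained by parsing the source text.
# In a :macrocall expression, args[1] is the macro name and args[2] a line
# number node, so the actual macro arguments start at args[3].
macroargs(code::AbstractString) = Meta.parse(code).args[3:end]

println(macroargs("@m 1 + 1"))   # a single argument: the expression 1 + 1
println(macroargs("@m 1+1"))     # a single argument: the expression 1 + 1
println(macroargs("@m 1 +1"))    # two separate arguments
```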

The issue is especially tricky with tuples:

julia> @m(1, 1)
(1, 1)
julia> @m (1, 1)
(:((1, 1)),)
julia> @m 1, 1
(:((1, 1)),)
julia> @m 1 1
(1, 1)

In the first case, the parenthesized style of macro call was used, and the
@m macro received two arguments. In @m (1, 1), since we put a space
after @m, (1, 1) is considered to be a tuple that is passed to the macro as a
single argument. Writing @m 1, 1 is interpreted in the same way, since
parentheses can be omitted when defining a tuple. Finally, @m 1 1 is again
interpreted as passing two arguments to @m, because the first and the second
1 are separate expressions.
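The four tuple cases can also be reproduced with a macro that does nothing but count its arguments. In the sketch below, @nargs is a hypothetical macro, and each call is parsed and evaluated from a string, so the parser sees exactly what you would type at the REPL:

```julia
# Hypothetical macro that only reports how many arguments it received
macro nargs(args...)
    return length(args)
end

# Parse and evaluate each call from a string, as if typed at the REPL
calls = ["@nargs(1, 1)", "@nargs (1, 1)", "@nargs 1, 1", "@nargs 1 1"]
counts = Dict(code => eval(Meta.parse(code)) for code in calls)

for code in calls
    println(rpad(code, 14), "=> ", counts[code], " argument(s)")
end
```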

Conclusions

When writing macro calls, always make sure you understand where the
boundaries of the expressions passed to the macro are, or use the invocation
style with parentheses.

Let me give one final example. If you want to get the time in minutes that some
operation took, do not write:

julia> @elapsed sleep(1) / 60
ERROR: MethodError: no method matching /(::Nothing, ::Int64)

as sleep(1) / 60 gets interpreted as a single expression passed to @elapsed,
and since sleep returns nothing, the division fails. Instead, write one of:

julia> (@elapsed sleep(1)) / 60
0.01670782215

julia> @elapsed(sleep(1)) / 60
0.016707725083333333
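Looking at the parsed expressions makes the difference visible without running anything (so without waiting for sleep). The sketch below only calls Meta.parse; the variable names are illustrative.

```julia
# Parse the three variants without running them to see what @elapsed receives
bad  = Meta.parse("@elapsed sleep(1) / 60")
good = Meta.parse("(@elapsed sleep(1)) / 60")
alt  = Meta.parse("@elapsed(sleep(1)) / 60")

# In `bad`, the macro call is the outermost expression and its only argument
# is the whole division, so @elapsed would time `sleep(1) / 60`
println(bad.head, " ", bad.args[3])

# In `good` and `alt`, the outermost expression is a call to `/` whose first
# operand is the macro call, so only `sleep(1)` is timed
println(good.head, " ", good.args[1], " ", good.args[2].head)
println(alt.head, " ", alt.args[1], " ", alt.args[2].head)
```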

juliabloggers.com

A Julia Language Blog Aggregator
