Author Archives: Blog by Bogumił Kamiński

How do loops in Julia handle local variables?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/02/19/binding.html

Introduction

Today I decided to write about a feature of Julia that is well known to people
working with it a lot, but which often triggers questions from experienced
people switching to Julia from other languages.

The topic of this post is why this code fails:

julia> function f()
       for i in 0:3
           if i == 0
               v = 0
           else
               v += 1
           end
       end
       end
f (generic function with 1 method)

julia> f()
ERROR: UndefVarError: v not defined

and what to do to fix the problem.

The post was written under Julia 1.5.3.

The root cause

The reason why people ask this question is that e.g. in Python the following code
runs without any problem:

>>> def f():
...     for i in range(4):
...         if i == 0:
...             v = 0
...         else:
...             v += 1
...     return v
...
>>> f()
3

So what is the root cause of this difference? The reason is that in Julia a
for loop creates a new scope for variables that are not present in the
enclosing scope (i.e. variables local to the for loop do not leak out).

Moreover, as explained in the Julia manual, such variables
get a new binding in each iteration of the loop. So in our example although we
set v = 0 in the first iteration this value is not retained in the following
iterations of the loop.
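A minimal sketch of this rule, using only Base Julia (the function name fresh_binding_demo is made up for illustration): it records, at the start of each iteration, whether the loop-local v already holds a value. Under the semantics described above every recorded entry should be false, even though v is assigned at the end of each iteration.

```julia
function fresh_binding_demo()
    defined_at_start = Bool[]
    for i in 1:3
        # `v` is local to the loop body, so each iteration starts with a
        # new binding that has not been assigned yet.
        push!(defined_at_start, @isdefined(v))
        v = i
    end
    return defined_at_start
end

println(fresh_binding_demo())
```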

Fixing the problem

Fortunately it is easy to fix the problem if you want v to keep its value
across iterations. Just define a local variable in the enclosing scope like this:

julia> function f()
       local v
       for i in 0:3
           if i == 0
               v = 0
           else
               v += 1
           end
       end
       return v
       end
f (generic function with 1 method)

julia> f()
3
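Another way to get the same effect, shown here as a small sketch (f_initialized is a hypothetical name, not from the original post), is to assign v before the loop. The assignment makes v a variable of the function scope, so the loop reuses it instead of creating a loop-local one.

```julia
function f_initialized()
    v = 0            # `v` now belongs to the function scope
    for i in 1:3
        v += 1       # updates the outer `v`; no new binding is created
    end
    return v
end

println(f_initialized())   # 3
```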

Why do we need a fresh binding of loop-local variables in each iteration?

If you thought that what Julia did in the topmost example was strange then
consider which of the following examples you find surprising (I use
comprehensions this time).

This is Python:

>>> l = [lambda : i for i in range(4)]
>>> for i in range(4):
...     print(l[i]())
...
3
3
3
3

And this is Julia:

julia> l = [() -> i for i in 1:4];

julia> for i in 1:4
           println(l[i]())
       end
1
2
3
4
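The same holds for an ordinary for loop, not only for comprehensions. Here is a short sketch in Base Julia (the function name is made up): each closure captures the binding of i that belongs to its own iteration.

```julia
function closures_in_for_loop()
    fs = Function[]
    for i in 1:4
        # `i` gets a fresh binding in every iteration, so each closure
        # sees the value from the iteration in which it was created.
        push!(fs, () -> i)
    end
    return [f() for f in fs]
end

println(closures_in_for_loop())   # [1, 2, 3, 4]
```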

Sometimes things can get nasty

And what if we update a local variable that is defined outside the loop?

Here are two examples:

julia> function g1()
       l = []
       j = 0
       for i in 0:3
           push!(l, () -> j)
           j += 1
       end
       return l
       end
g1 (generic function with 1 method)

julia> l1 = g1();

julia> for i in 1:4
           println(l1[i]())
       end
4
4
4
4

julia> function g2()
       l = []
       j = 0
       for i in 0:3
           push!(l, () -> j)
           j = i + 1
       end
       return l
       end
g2 (generic function with 1 method)

julia> l2 = g2();

julia> for i in 1:4
           println(l2[i]())
       end
4
4
4
4

So far we see what we would expect. Variable j does not get a new binding
inside the loop as it is defined outside of it, so we have just reproduced the
behavior seen in Python.

However, how would you then explain this:

julia> function g3()
       x = []
       local j
       for i in 0:3
           j = i
           push!(x, () -> j)
       end
       return x
       end
g3 (generic function with 1 method)

julia> l3 = g3();

julia> for i in 1:4
           println(l3[i]())
       end
0
1
2
3

The reason is that in g1 and g2 Julia is boxing j, while it does not in
g3. Here are the consequences.

Consequence 1: impact on performance

Have a look at this test:

julia> function agg(fun)
       s = 0
       for i in 1:10^6
           s += fun()
       end
       return s
       end
agg (generic function with 1 method)

julia> agg(l1[1])
4000000

julia> @time agg(l1[1])
  0.038266 seconds (999.87 k allocations: 15.257 MiB)
4000000

julia> agg(l3[1])
0

julia> @time agg(l3[1])
  0.000007 seconds
0

And we see that closures created by g1 have very bad performance, while
g3 gives us super fast closures.
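If you need the structure of g1 (a variable that is updated after the closure is created) but want each closure to remember the value from its own iteration, one common pattern is to capture the value through a let block. This is only a sketch of that pattern, not something the post above uses, and g1_let is a hypothetical name. The let block introduces a new binding that is assigned exactly once, which should allow the compiler to avoid boxing it.

```julia
function g1_let()
    l = []
    j = 0
    for i in 0:3
        # `let j = j` creates a new binding holding the current value of
        # the outer `j`; the closure captures that new binding.
        let j = j
            push!(l, () -> j)
        end
        j += 1
    end
    return l
end

println([fun() for fun in g1_let()])   # [0, 1, 2, 3]
```

Note that this changes the behavior compared to g1: the closures no longer share one j, so they return different values.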

Consequence 2: crazy things you can do

Since Julia is boxing j in the case of g1 and g2 you can do the following:

julia> for i in 1:4
           println(l1[i]())
       end
4
4
4
4

julia> l1[1].j.contents = 100
100

julia> for i in 1:4
           println(l1[i]())
       end
100
100
100
100

Of course I do not recommend doing such things. With this example I just want
to highlight that j is indeed boxed in this case.

Let us check:

julia> l1[1].j # boxed value
Core.Box(100)

julia> l3[1].j # just an Int
0
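The same check can be done on types instead of values. The sketch below uses two tiny hypothetical functions (make_boxed and make_unboxed are not from the post) and asks for the declared type of the captured field j with fieldtype. Like the l1[1].j access above, it relies on the internal layout of closures, so treat it as a diagnostic trick rather than a stable API.

```julia
function make_boxed()
    j = 0
    f = () -> j
    j += 1           # `j` is reassigned after being captured
    return f
end

function make_unboxed()
    j = 0            # `j` is assigned exactly once
    return () -> j
end

println(fieldtype(typeof(make_boxed()), :j))     # Core.Box
println(fieldtype(typeof(make_unboxed()), :j))   # Int64 on a 64-bit machine
```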

How could we have learned about this? You can use @code_warntype to see what
is going on:

julia> @code_warntype g1()
Variables
  #self#::Core.Compiler.Const(g1, false)
  l::Array{Any,1}
  j@_3::Core.Box
  @_4::Union{Nothing, Tuple{Int64,Int64}}
  i::Int64
  #15::var"#15#16"
  j@_7::Union{}

Body::Array{Any,1}
1 ─       (j@_3 = Core.Box())
│         (l = Base.vect())
│         Core.setfield!(j@_3, :contents, 0)
│   %4  = (0:3)::Core.Compiler.Const(0:3, false)
│         (@_4 = Base.iterate(%4))
│   %6  = (@_4::Core.Compiler.Const((0, 0), false) === nothing)::Core.Compiler.Const(false, false)
│   %7  = Base.not_int(%6)::Core.Compiler.Const(true, false)
└──       goto #7 if not %7
2 ┄ %9  = @_4::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│         (i = Core.getfield(%9, 1))
│   %11 = Core.getfield(%9, 2)::Int64
│   %12 = l::Array{Any,1}
│         (#15 = %new(Main.:(var"#15#16"), j@_3))
│   %14 = #15::var"#15#16"
│         Main.push!(%12, %14)
│   %16 = Core.isdefined(j@_3, :contents)::Bool
└──       goto #4 if not %16
3 ─       goto #5
4 ─       Core.NewvarNode(:(j@_7))
└──       j@_7
5 ┄ %21 = Core.getfield(j@_3, :contents)::Any
│   %22 = (%21 + 1)::Any
│         Core.setfield!(j@_3, :contents, %22)
│         (@_4 = Base.iterate(%4, %11))
│   %25 = (@_4 === nothing)::Bool
│   %26 = Base.not_int(%25)::Bool
└──       goto #7 if not %26
6 ─       goto #2
7 ┄       return l

julia> @code_warntype g3()
Variables
  #self#::Core.Compiler.Const(g3, false)
  j::Int64
  x::Array{Any,1}
  @_4::Union{Nothing, Tuple{Int64,Int64}}
  i::Int64
  #19::var"#19#20"{Int64}

Body::Array{Any,1}
1 ─       Core.NewvarNode(:(j))
│         (x = Base.vect())
│   %3  = (0:3)::Core.Compiler.Const(0:3, false)
│         (@_4 = Base.iterate(%3))
│   %5  = (@_4::Core.Compiler.Const((0, 0), false) === nothing)::Core.Compiler.Const(false, false)
│   %6  = Base.not_int(%5)::Core.Compiler.Const(true, false)
└──       goto #4 if not %6
2 ┄ %8  = @_4::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│         (i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│         (j = i)
│   %12 = x::Array{Any,1}
│   %13 = Main.:(var"#19#20")::Core.Compiler.Const(var"#19#20", false)
│   %14 = Core.typeof(j)::Core.Compiler.Const(Int64, false)
│   %15 = Core.apply_type(%13, %14)::Core.Compiler.Const(var"#19#20"{Int64}, false)
│         (#19 = %new(%15, j))
│   %17 = #19::var"#19#20"{Int64}
│         Main.push!(%12, %17)
│         (@_4 = Base.iterate(%3, %10))
│   %20 = (@_4 === nothing)::Bool
│   %21 = Base.not_int(%20)::Bool
└──       goto #4 if not %21
3 ─       goto #2
4 ┄       return x

Conclusions

The post started off easy, but ended with some surprising behavior.
I hope you found it useful to better understand how Julia works and how to
diagnose things.

The major takeaways are:

  • basic: Julia creates new bindings for loop-local variables on each iteration;
  • not-basic: if you are creating closures using local variables and need them to
    be fast (and you probably do if you use Julia), always check whether the Julia
    compiler was able to prove that it does not have to do boxing, as boxing
    affects both the behavior and the performance.

Reducing compilation cost in DataFrames.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/02/12/fun-compile.html

Introduction

DataFrames.jl is a general purpose package. This means that it is maximally
flexible, at the cost of having to handle a very wide variety of use-cases.

This situation leads to quite high complexity of the internal code of DataFrames.jl.
In turn, this complexity causes relatively high time-to-first-result latency.
Fortunately Milan Bouchet-Valat in this PR made it much better in the
0.22 release by taking advantage of precompilation, and the hints from Tim Holy
in this PR and this PR are going to be included in the
1.0 release when we finish changing the core functionality of the package (so
expect an even more compiler-friendly DataFrames.jl soon).

The things listed above are on the DataFrames.jl developer side. However, there is
one simple rule that can help you reduce compilation time on the user side and at
the same time make your code more readable (at least this is my preference).

The rule is: avoid using anonymous functions in top-level code as they trigger
compilation each time they are defined (even if the body of the function is not
changed).

Let me jump straight to the examples.

The code was run under Julia 1.5.3 and DataFrames.jl 0.22.5.

Anonymous function compilation latency

Consider the following code:

julia> using DataFrames

julia> df = DataFrame(x=[1,2])
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> @time transform(df, :x => (x -> x) => :x2)
  0.649323 seconds (1.79 M allocations: 94.619 MiB, 5.09% gc time)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

julia> @time transform(df, :x => (x -> x) => :x2)
  0.096049 seconds (182.88 k allocations: 9.707 MiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

julia> @time transform(df, :x => (x -> x) => :x2)
  0.092431 seconds (182.88 k allocations: 9.708 MiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

As you can see compilation gets triggered each time the transform function is
executed (on the first run more compilation has to be done, but in consecutive
runs this cost is still visible). Let us compare it to the following code (I use
a fresh Julia session):

julia> using DataFrames

julia> df = DataFrame(x=[1,2])
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> f(x) = x
f (generic function with 1 method)

julia> @time transform(df, :x => f => :x2)
  0.640971 seconds (1.78 M allocations: 94.119 MiB, 5.26% gc time)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

julia> @time transform(df, :x => f => :x2)
  0.000077 seconds (96 allocations: 5.641 KiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

julia> @time transform(df, :x => f => :x2)
  0.000083 seconds (96 allocations: 5.641 KiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2

As you can see Julia needs to compile the code only once.
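One way to see why this happens is to look at types: every time the text of an anonymous function is evaluated at top level, Julia creates a function of a new type, and functions like transform are compiled separately for each type of function they receive. The sketch below uses only Base Julia. The combination of eval and Meta.parse is just a stand-in for typing the same line at the REPL twice, and all names in it are made up.

```julia
# Evaluate the same source text twice, as if it were typed twice at the REPL.
first_def  = eval(Meta.parse("x -> x"))
second_def = eval(Meta.parse("x -> x"))

# Each evaluation produced a function with its own, new type ...
println(typeof(first_def) === typeof(second_def))   # false

# ... while a named function is defined once and keeps a single type.
named(x) = x
println(typeof(named) === typeof(named))            # true
```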

I have mentioned that I consider the approach with a pre-defined transformation
function more readable. The reason is that, unless the function is very
simple, the code becomes hard to read when an anonymous function is
passed in-line to e.g. transform. Also, giving the transformation function an
informative name usually helps when you go back to the code after some time and
try to understand what it does.

When the anonymous function is not a problem

There are cases when defining a fresh anonymous function is not a problem.

The first, and quite common one, is when we use a type-stable functor
that is passed a predefined function. Here is a basic example (fresh Julia
session again):

julia> using DataFrames

julia> df = DataFrame(x=[1,2])
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> @time transform(df, :x => ByRow(>(1.5)) => :x2)
  0.747826 seconds (2.18 M allocations: 112.703 MiB, 2.84% gc time)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

julia> @time transform(df, :x => ByRow(>(1.5)) => :x2)
  0.000081 seconds (99 allocations: 5.734 KiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

julia> @time transform(df, :x => ByRow(>(1.5)) => :x2)
  0.000083 seconds (99 allocations: 5.734 KiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

In this example both >(1.5) and ByRow are parametric structs. So the expression
ByRow(>(1.5)) has a constant type across a single Julia session.
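The part of this claim that concerns >(1.5) can be checked in Base Julia without loading DataFrames.jl. A small sketch (variable names are made up): calling >(1.5) twice gives two objects of the same parametric type, Base.Fix2, so no new type appears when the expression is written again.

```julia
gt_a = >(1.5)
gt_b = >(1.5)

# Both objects are instances of the same parametric struct type.
println(typeof(gt_a) === typeof(gt_b))               # true
println(gt_a isa Base.Fix2{typeof(>), Float64})      # true

# The object behaves like the function x -> x > 1.5.
println(gt_a.([1, 2]))                               # Bool[0, 1]
```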

Of course the function being the argument of the functor must not be anonymous,
as then all is bad again (fresh Julia session):

julia> using DataFrames

julia> df = DataFrame(x=[1,2])
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> @time transform(df, :x => ByRow(x -> x > 1.5) => :x2)
  0.752139 seconds (2.18 M allocations: 112.890 MiB, 4.51% gc time)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

julia> @time transform(df, :x => ByRow(x -> x > 1.5) => :x2)
  0.184907 seconds (538.13 k allocations: 26.211 MiB, 5.16% gc time)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

julia> @time transform(df, :x => ByRow(x -> x > 1.5) => :x2)
  0.169902 seconds (538.12 k allocations: 26.213 MiB)
2×2 DataFrame
 Row │ x      x2
     │ Int64  Bool
─────┼──────────────
   1 │     1  false
   2 │     2   true

The second case when having an anonymous function is not a problem is when we
introduce local scope in the code via a loop (this is a pattern that I see quite
often; fresh Julia session):

julia> using DataFrames

julia> df = DataFrame(x=[1,2])
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> for i in 1:3
           @time transform(df, :x => ByRow(x -> x > 1.5) => :x2)
       end
  0.759848 seconds (2.18 M allocations: 112.888 MiB, 4.29% gc time)
  0.000131 seconds (102 allocations: 5.844 KiB)
  0.000102 seconds (102 allocations: 5.844 KiB)

As you can see this time x -> x > 1.5 is defined only once within the body of
the loop so transform gets compiled only once.
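Here is a sketch of the same effect without DataFrames.jl (the function name is hypothetical, and I use a function body instead of a top-level loop to keep the example self-contained). The anonymous function expression is evaluated three times, but every evaluation yields an object of the same type, so a function receiving it would need to be compiled only once.

```julia
function closure_types_in_loop()
    seen = DataType[]
    for i in 1:3
        # The same `x -> x > 1.5` expression is evaluated on every iteration.
        push!(seen, typeof(x -> x > 1.5))
    end
    return seen
end

# Number of distinct closure types observed across the iterations.
println(length(unique(closure_types_in_loop())))   # 1
```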

Conclusions

I hope this post will help you reduce the latency of your DataFrames.jl code.

I would like to highlight that these recommendations are not just DataFrames.jl
specific. They apply to any function that takes a function as one of its
arguments, e.g. map, filter, sum, …

Another point to note is that, as I have mentioned above, DataFrames.jl is a
general purpose package that supports the Tables.jl table interface.
If you have a specific use case it might be better to go for a specialized
package that is tailored to your needs. See e.g. a recent discussion
about the design of the TimeSeries.jl package. Still I hope that users
will find DataFrames.jl a good one-stop shop for the majority of their data
wrangling needs.

Column selectors in DataFrames.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/02/06/colsel.html

Introduction

In my last post I discussed row selector rules for data frames
in DataFrames.jl.

This time I will cover available column selector options.

This post was written under Julia 1.5.3 and DataFrames 0.22.5.

First we set up the environment:

julia> using DataFrames

julia> df = DataFrame(a=1, b=2, x1=3, x2=4, y1=5, y2=6)
1×6 DataFrame
 Row │ a      b      x1     x2     y1     y2
     │ Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────
   1 │     1      2      3      4      5      6

Column selection in indexing syntax

A starting point of column selection syntax is indexing. I will go through
all available options. In later sections of this post I discuss other functions
that support column indexing as they build upon the indexing syntax.

Selecting a single column

You can use either: an integer, a Symbol or a string. Also begin and end
are supported, just like for arrays. Here are some examples:

julia> df[:, 1]
1-element Array{Int64,1}:
 1

julia> df[:, :a]
1-element Array{Int64,1}:
 1

julia> df[:, "a"]
1-element Array{Int64,1}:
 1

julia> df[:, begin]
1-element Array{Int64,1}:
 1

julia> df[:, end]
1-element Array{Int64,1}:
 6

Note that when indexing selects a single column, it is extracted from the
data frame as a vector. In all other column selector options described
below you always get a data frame as a result of the operation.
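A short sketch of this difference, assuming DataFrames.jl is installed and using the same df as above: a single-column selector returns a vector, while a one-element vector of column names returns a data frame with one column.

```julia
using DataFrames

df = DataFrame(a=1, b=2, x1=3, x2=4, y1=5, y2=6)

col = df[:, :a]      # single column selector: a vector
sub = df[:, [:a]]    # vector selector with one element: a data frame

println(col isa Vector{Int})   # true
println(sub isa DataFrame)     # true
println(size(sub))             # (1, 1)
```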

Selecting multiple columns using vectors

You can use vector selectors whose elements are all integers, all Bool, all
Symbol, or all strings (mixing styles is not allowed). Note that you can also use
begin and end to define a vector (e.g. in ranges). Here are some examples:

julia> df[:, [1, 2]]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df[:, [trues(2); falses(4)]]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df[:, [:a, :b]]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df[:, ["a", "b"]]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df[:, begin:2]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df[:, 2:end]
1×5 DataFrame
 Row │ b      x1     x2     y1     y2
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     2      3      4      5      6

Special selectors

There are the following special selectors available:

  • : and All() allowing you to select all columns:

julia> df[:, :]
1×6 DataFrame
 Row │ a      b      x1     x2     y1     y2
     │ Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────
   1 │     1      2      3      4      5      6

julia> df[:, All()]
1×6 DataFrame
 Row │ a      b      x1     x2     y1     y2
     │ Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────
   1 │     1      2      3      4      5      6

  • a regular expression picking columns whose name matches it:

julia> df[:, r"1"]
1×2 DataFrame
 Row │ x1     y1
     │ Int64  Int64
─────┼──────────────
   1 │     3      5

julia> df[:, r"x"]
1×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     3      4

  • Between selector allowing you to specify a range of columns (you can specify
    the start and stop column using any of the single column selector syntaxes):

julia> df[:, Between(3, :y1)]
1×3 DataFrame
 Row │ x1     x2     y1
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     3      4      5

julia> df[:, Between(begin, "x1")]
1×3 DataFrame
 Row │ a      b      x1
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

  • Not selector allowing you to specify the columns you want to exclude from
    the resulting data frame. You can put any other valid column selector
    inside Not:

julia> df[:, Not("a")]
1×5 DataFrame
 Row │ b      x1     x2     y1     y2
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     2      3      4      5      6

julia> df[:, Not(r"x")]
1×4 DataFrame
 Row │ a      b      y1     y2
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      5      6

julia> df[:, Not(Between(:x1, end))]
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

  • Cols selector picking a union of other selectors passed as its arguments:

julia> df[:, Cols()]
0×0 DataFrame

julia> df[:, Cols(:x1, 1)]
1×2 DataFrame
 Row │ x1     a
     │ Int64  Int64
─────┼──────────────
   1 │     3      1

julia> df[:, Cols(r"1", r"x")]
1×3 DataFrame
 Row │ x1     y1     x2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     3      5      4

Note that the order of columns in the produced data frame reflects the order
in which they are selected by consecutive selectors passed inside Cols.
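To make the ordering rule concrete, here is a small sketch (assuming DataFrames.jl is installed, with the same df as above) that swaps the two regular expressions from the last example. Each column appears once, at the position where the first selector that matches it picks it up.

```julia
using DataFrames

df = DataFrame(a=1, b=2, x1=3, x2=4, y1=5, y2=6)

# r"1" first: x1 and y1 come first, then the remaining match of r"x".
println(names(df[:, Cols(r"1", r"x")]))   # ["x1", "y1", "x2"]

# r"x" first: x1 and x2 come first, then the remaining match of r"1".
println(names(df[:, Cols(r"x", r"1")]))   # ["x1", "x2", "y1"]
```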

Selecting just column names

Sometimes one only wants to pick names of columns meeting some condition
without actually creating a new data frame. This can be achieved using the
names function.

If you just write names(df) you get a list of all column names of a data frame
as a vector of strings:

julia> names(df)
6-element Array{String,1}:
 "a"
 "b"
 "x1"
 "x2"
 "y1"
 "y2"

You can also pass any column selector expression we described above as a second
argument to names to get a subset of the column names, e.g.:

julia> names(df, r"x")
2-element Array{String,1}:
 "x1"
 "x2"

julia> names(df, Not(Between("b", "x2")))
3-element Array{String,1}:
 "a"
 "y1"
 "y2"

Additionally you can pass a type as a second argument to the names function,
and in this case you will get a vector of column names whose element type is
a subtype of passed argument. Here is an example:

julia> df2 = DataFrame([[1, 2], [1.0, 2.0], ["1", "2"], [1, missing]], :auto)
2×4 DataFrame
 Row │ x1     x2       x3      x4
     │ Int64  Float64  String  Int64?
─────┼─────────────────────────────────
   1 │     1      1.0  1             1
   2 │     2      2.0  2       missing

julia> names(df2, Int)
1-element Array{String,1}:
 "x1"

julia> names(df2, Union{Int, Missing})
2-element Array{String,1}:
 "x1"
 "x4"

julia> names(df2, Real)
2-element Array{String,1}:
 "x1"
 "x2"

There is also an upcoming feature that will soon be available in the DataFrames.jl 1.0
release. You are going to be able to pass as a second argument to the names
function a predicate taking the column name as a string and returning true for
columns that should be kept. Here are some examples (note that this code snippet
will only work if you check out the main branch of DataFrames.jl from its GitHub
repository):

julia> df3 = DataFrame(x1=1, x2=2, y=3)
1×3 DataFrame
 Row │ x1     x2     y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> names(df3, startswith('x'))
2-element Array{String,1}:
 "x1"
 "x2"

julia> names(df3, x -> length(x) == 1)
1-element Array{String,1}:
 "y"

The names function is quite often used if you want to perform multiple
transformations of columns using select, transform or combine, e.g.:

julia> select(df, r"x", names(df, r"x") .=> ByRow(sqrt))
1×4 DataFrame
 Row │ x1     x2     x1_sqrt  x2_sqrt
     │ Int64  Int64  Float64  Float64
─────┼────────────────────────────────
   1 │     3      4  1.73205      2.0
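It may help to see what the broadcasted expression in this call produces before it reaches select. The sketch below (assuming DataFrames.jl is installed; ops is a made-up variable name) builds the same vector of pairs separately and then passes it to select.

```julia
using DataFrames

df = DataFrame(a=1, b=2, x1=3, x2=4, y1=5, y2=6)

# Broadcasting `=>` over the selected names gives one pair per column,
# each mapping a column name to the same ByRow(sqrt) transformation.
ops = names(df, r"x") .=> ByRow(sqrt)
println(first.(ops))                      # ["x1", "x2"]

# Passing the vector of pairs is equivalent to writing the pairs out by hand.
println(names(select(df, r"x", ops)))     # ["x1", "x2", "x1_sqrt", "x2_sqrt"]
```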

Other functions that support passing column selectors

Here is a list of functions that accept the column selectors we discussed in the
indexing section above:

  • describe takes cols keyword argument allowing you to choose for which
    columns descriptive statistics should be computed;
  • combine, select, select!, transform, transform!, allowing you to
    just keep the passed columns;
  • groupby, which takes a column selector as its second positional argument;
  • flatten, taking a column selector as a second positional argument
    indicating which columns should be flattened;
  • stack allowing you to specify measure variables and id variables using
    the column selectors;
  • unstack where you can pass columns that specify row keys for unstacking;
  • issorted, sortperm, sort and sort! where the second positional argument
    is a column selector (they also allow some additional syntax to specify how
    sorting should be performed but this is a separate topic);
  • unique, unique!, and nonunique where the second positional argument is a
    column selector signaling which columns should be checked for uniqueness;
  • allowmissing, allowmissing!, disallowmissing, disallowmissing!,
    completecases, dropmissing, dropmissing! which all take as a second
    argument a column selector indicating which columns should be taken into
    account in the transformation performed.
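As a brief illustration of two entries from this list (a sketch assuming DataFrames.jl is installed, with the same df as in the earlier sections), the selectors from the indexing section can be passed unchanged to select and to the cols keyword argument of describe:

```julia
using DataFrames

df = DataFrame(a=1, b=2, x1=3, x2=4, y1=5, y2=6)

# Keep every column whose name does not contain "x".
println(names(select(df, Not(r"x"))))       # ["a", "b", "y1", "y2"]

# Compute descriptive statistics only for columns whose name contains "y".
println(describe(df, cols=r"y").variable)   # [:y1, :y2]
```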

Conclusions

As you can see in this post DataFrames.jl provides a very flexible system of
column selectors. Also, a good understanding of column selector semantics
will boost your efficiency when working with DataFrames.jl as we consistently
support the same patterns across the whole package.