
My approach to named tuples in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/06/30/namedtuple.html

Introduction

Named tuples are one of the most commonly used data types in Julia.
In this post I want to discuss how I think of this type and use it in practice.

This post was written under Julia 1.9.1.

The basics

The Julia Manual in the section Named Tuples specifies:

The components of tuples can optionally be named, in which case a named tuple is constructed (…)
Named tuples are very similar to tuples, except that fields can additionally
be accessed by name using dot syntax (x.a) in addition to the regular indexing syntax (x[1]).

And gives the following example:

julia> x = (a=2, b=1+2)
(a = 2, b = 3)

julia> x[1]
2

julia> x.a
2

This definition highlights that NamedTuple is similar to a Tuple except that its
components can have names.

However, I find it more natural to think of a NamedTuple as an anonymous struct,
that optionally can be integer-indexed to get access to its components.

Let me give a few reasons why I think it is a better mental model.

Subtyping behavior

Let us have a look at the Tuple type first. Construct a basic tuple:

julia> t = (1, "a")
(1, "a")

julia> typeof(t)
Tuple{Int64, String}

Note that the type of t is a subtype of e.g. Tuple{Integer, AbstractString}:

julia> t isa Tuple{Integer, AbstractString}
true

This subtyping behavior is called covariant.

Moreover Tuple{Integer, AbstractString} is not concrete:

julia> isconcretetype(Tuple{Integer, AbstractString})
false

This means that you cannot create a tuple that has this type.
If you try the following call:

julia> t2 = Tuple{Integer, AbstractString}(t)
(1, "a")

julia> typeof(t2)
Tuple{Int64, String}

Nothing changes. The resulting t2 tuple has concrete parameters Int64 and String.

You might ask how many of these features a NamedTuple shares. The answer is none.
Let us check:

julia> nt = (a=1, b="a")
(a = 1, b = "a")

julia> typeof(nt)
NamedTuple{(:a, :b), Tuple{Int64, String}}

The type of nt is not a subtype of e.g. NamedTuple{(:a, :b), Tuple{Integer, AbstractString}}:

julia> nt isa NamedTuple{(:a, :b), Tuple{Integer, AbstractString}}
false

However, it is a subtype of the UnionAll type NamedTuple{(:a, :b), <:Tuple{Integer, AbstractString}}:

julia> nt isa NamedTuple{(:a, :b), <:Tuple{Integer, AbstractString}}
true

This subtyping behavior is called invariant and is shared with struct types.

Moreover, you can construct a value that has NamedTuple{(:a, :b), Tuple{Integer, AbstractString}} type (so it is concrete):

julia> isconcretetype(NamedTuple{(:a, :b), Tuple{Integer, AbstractString}})
true

julia> nt2 = NamedTuple{(:a, :b), Tuple{Integer, AbstractString}}((1, "a"))
NamedTuple{(:a, :b), Tuple{Integer, AbstractString}}((1, "a"))

julia> typeof(nt2)
NamedTuple{(:a, :b), Tuple{Integer, AbstractString}}

Again, this behavior is the same as for struct types.
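
The contrast between covariant tuples and invariant named tuples can be checked side by side with an ordinary parametric struct. Below is a minimal sketch; the struct name PairLike is hypothetical and introduced only for this illustration.

```julia
# PairLike is a hypothetical parametric struct, used only for illustration.
struct PairLike{A,B}
    a::A
    b::B
end

t = (1, "a")
nt = (a=1, b="a")
p = PairLike(1, "a")

# Tuples are covariant: concrete parameters match abstract ones.
tuple_covariant = t isa Tuple{Integer,AbstractString}

# Named tuples and parametric structs are invariant: the parameters must
# match exactly unless we explicitly ask for subtypes with <:.
nt_exact = nt isa NamedTuple{(:a, :b),Tuple{Integer,AbstractString}}
nt_loose = nt isa NamedTuple{(:a, :b),<:Tuple{Integer,AbstractString}}
p_exact = p isa PairLike{Integer,AbstractString}
p_loose = p isa PairLike{<:Integer,<:AbstractString}

# A struct with abstract parameters is still a concrete type we can instantiate.
p_abstract = PairLike{Integer,AbstractString}(1, "a")

println((tuple_covariant, nt_exact, nt_loose, p_exact, p_loose))
println(typeof(p_abstract))
```

The named tuple and the struct give the same answers in every check, while the plain tuple is the odd one out.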

The fact that you can flexibly set parameters of a named tuple is mostly useful
if you have heterogeneous data. Here is an example.

Permissive code:

julia> [(a=1, b=missing), (a=missing, b="x")]
2-element Vector{NamedTuple{(:a, :b)}}:
 (a = 1, b = missing)
 (a = missing, b = "x")

julia> typeof.([(a=1, b=missing), (a=missing, b="x")])
2-element Vector{DataType}:
 NamedTuple{(:a, :b), Tuple{Int64, Missing}}
 NamedTuple{(:a, :b), Tuple{Missing, String}}

Here we created a vector with elements having different types.
The code using it would not be type stable. However, we could make the types the same
(which would be more compiler friendly; this would be relevant if you processed really large data):

julia> @NamedTuple{a::Union{Int,Missing}, b::Union{String,Missing}}.([(a=1, b=missing), (a=missing, b="x")])
2-element Vector{NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}}:
 NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}((1, missing))
 NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}((missing, "x"))

The difference is that now the compiler will know the type of every field of our named tuple
(and Julia gracefully handles small Unions).
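
One way to see the difference between the two vectors is to ask whether their element type is concrete. The sketch below assumes a hypothetical alias Row for the named tuple type; it is not part of the original example.

```julia
# Row is a hypothetical alias introduced for this sketch.
const Row = @NamedTuple{a::Union{Int,Missing}, b::Union{String,Missing}}

loose = [(a=1, b=missing), (a=missing, b="x")]
tight = Row.(loose)

# The element type of `loose` leaves the field types unspecified, while the
# element type of `tight` is a single concrete type.
loose_concrete = isconcretetype(eltype(loose))
tight_concrete = isconcretetype(eltype(tight))

# The declared field types can be read back from the type itself.
field_types = fieldtypes(Row)

println((loose_concrete, tight_concrete))
println(field_types)
```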

Note that I used the @NamedTuple{a::Union{Int,Missing}, b::Union{String,Missing}} macro call
as a convenient way to specify a NamedTuple type. Let us check it (with one more way to write it):

julia> @NamedTuple{a::Union{Int,Missing}, b::Union{String,Missing}}
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}

julia> @NamedTuple begin
           a::Union{Int,Missing}
           b::Union{String,Missing}
       end
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}

Now, even on the syntax level, the similarity to struct definition is apparent.

Let us define two types:

julia> const S1 = @NamedTuple begin
           a::Union{Int,Missing}
           b::Union{String,Missing}
       end
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}

julia> struct S2
           a::Union{Int,Missing}
           b::Union{String,Missing}
       end

The S1 and S2 types are almost indistinguishable. The only differences are:

  • S1 is a subtype of NamedTuple UnionAll, while S2 is just a subtype of Any;
  • S1 accepts a tuple as a single argument, while S2 accepts two arguments;
  • S1 is indexable, while S2 is not;
  • adding methods to foreign functions for S1 is type piracy.

How named tuples are different from structs

Let us create instances of S1 and S2 and discuss the differences mentioned above.

First the difference in the constructor:

julia> x1 = S1((1, "a"))
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}((1, "a"))

julia> x2 = S2(1, "a")
S2(1, "a")

An extra pair of parentheses is needed for S1.
It would be tempting to define:

julia> S1(x, y) = S1((x, y))
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}

julia> S1(1, "a")
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}}((1, "a"))

This would work. But if you are writing library code you should never do this.
The reason is that you have just added a method for
NamedTuple{(:a, :b), Tuple{Union{Missing, Int64}, Union{Missing, String}}} type
and this type is part of Base Julia. Such behavior is called type piracy
and potentially could break some third-party code (in this case the risk is minimal,
but it is worth having this habit).

The difference is that with your custom type S2 you are free to add methods to
any functions you like (even third party), since you own S2. Therefore
(provided you make a correct method definition) you will not break any code by doing so.

Let us give an example. We have mentioned that one benefit of a NamedTuple over
a struct is that it is indexable:

julia> x1[1]
1

julia> x2[1]
ERROR: MethodError: no method matching getindex(::S2, ::Int64)

However, you can easily fix this:

julia> Base.getindex(x::S2, i::Int) = getfield(x, i)

julia> x2[1]
1

What we did by adding a method to Base.getindex is not type piracy since
S2 is a type that we own.
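
For S1, one piracy-free way to get the two-argument convenience is a separately named function that you own. The following is only a sketch; make_s1 is a hypothetical helper name.

```julia
const S1 = @NamedTuple begin
    a::Union{Int,Missing}
    b::Union{String,Missing}
end

# make_s1 is a hypothetical helper name. Because we own this function,
# defining it adds no method to anything that belongs to Base.
make_s1(a, b) = S1((a, b))

x = make_s1(1, "a")
y = make_s1(missing, "b")

println(x)
println(typeof(y) === S1)
```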

The last difference between S1 and S2 is that, as we mentioned,
S1 is a subtype of NamedTuple, so it will get accepted by functions
that work with named tuples, as opposed to S2. Here is a small example:

julia> empty(x1)
NamedTuple()

julia> empty(x2)
ERROR: MethodError: no method matching empty(::S2)

Again – we could easily add a method for Base.empty for S2 if we wanted, just as we did
with Base.getindex above.

Conclusions

Given these examples, a named tuple shares some features with tuples, and some with structs.
My experience is that it is more similar to a struct in practice. There are two reasons:

  1. I rarely index or iterate named tuples; most often I access their fields by name.
  2. In dispatch a named tuple behaves like a struct (this is quite relevant when writing your code).

Still, I like to think of a named tuple as an anonymous struct. Although we could give it an alias
like we did with S1 definition, this is rarely done. Typically the concrete type of a named tuple
is left dynamic, and I just define my functions for NamedTuple UnionAll.
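
Here is a minimal sketch of what defining functions for the NamedTuple UnionAll can look like. The function names field_summary and has_ab are hypothetical and used only for illustration.

```julia
# Generic method: accepts any named tuple, whatever its field names or types.
field_summary(nt::NamedTuple) = join(("$k=$v" for (k, v) in pairs(nt)), ", ")

# Dispatch on the field names only, leaving the field types unconstrained.
has_ab(::NamedTuple{(:a, :b)}) = true
has_ab(::NamedTuple) = false

s = field_summary((a=1, b="x"))
println(s)
println(has_ab((a=1, b=2)), " ", has_ab((b=1, a=2)), " ", has_ab((a=1,)))
```

Note that the order of field names matters for dispatch, just as the order of fields matters for a struct.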

Happy hacking!

The return of the graphs (and an interesting puzzle)

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/06/23/graphspuzzle.html

Introduction

This week I come back to graphs. The reason is that I have participated
in an inspiring Social Networks & Complex Systems Workshop.

During this workshop Paweł Prałat posed the following puzzle:

You are given eight batteries, out of which four are good and four
are depleted, but visually they do not differ.
You have a flashlight that needs two good batteries to work.
You want your flashlight to work. What is the minimum number of times
you need to put two batteries into your flashlight to be sure it works
in the worst case?

In this post I want to discuss how it can be solved.
We will start with an analytical solution and then give a computational
one. You can judge for yourself which is harder.

The post was written under Julia 1.9.1,
Combinatorics v1.0.2, DataFrames v1.5.0,
SimpleGraphs v0.8.4, and SimpleGraphAlgorithms v0.4.21.

Analytical solution

Before we begin we need to make one observation. If I say that I want to use
a given number of trials in the worst case I can provide them upfront (before
making any tests). The reason is that if I put some batteries into a
flashlight it will work (and then we are done) or not work (and we need to continue).
So the situation is simpler than in many other puzzles, where you might want
to adjust your strategy conditional on the answers you see to your previous queries.

What we want to show is that we need 7 trials. We start by showing
that 7 tests are enough. To see this, notice that we have 4 good batteries.
Therefore, if we split our 8 batteries into three groups, at least one
group will contain at least two good batteries (by the pigeonhole principle).
What we need to do is to show that using 7
comparisons we are sure to find a pair of good batteries within one of these
three groups.

Here is how you can do it.
Assume we number the batteries from 1 to 8. Split them into three groups:
{1, 2, 3}, {4, 5, 6}, and {7, 8}. Now we assume that we make all
possible comparisons within groups. We can see that in the first two groups
there are three comparisons possible, and in the last group only one. Seven
in total. This finishes the proof that 7 comparisons are enough.
We can visualize this solution as follows (line represents the comparison
we make):

  1      4    7
 / \    / \   |
2---3  5---6  8
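
The claim that these seven tests always succeed can also be checked mechanically by enumerating every placement of the four good batteries. This is a small sketch using only Base Julia; the helper name works is hypothetical.

```julia
# The seven tests from the diagram: all pairs within each group.
tests = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (7, 8)]

# All ways to choose which 4 of the 8 batteries are good, generated from
# 8-bit masks with exactly four bits set.
good_sets = [[i for i in 1:8 if isodd(mask >> (i - 1))]
             for mask in 0:255 if count_ones(mask) == 4]

# A plan works if, for every placement, some test uses two good batteries.
works(plan) = all(good -> any(((i, j),) -> i in good && j in good, plan), good_sets)

println(length(good_sets))
println(works(tests))
println(works(tests[1:6]))  # the same plan without the (7, 8) test
```

Dropping the last test breaks the plan, because the good batteries could then be, for example, 1, 4, 7, and 8.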

We are left with showing that 6 comparisons are not enough. To see
this note the following. In the plot above we have a graph on 8 nodes
with 7 edges. We could claim that we have a solution because in
any four-element subset of its nodes there are at least two nodes connected
by an edge (within one group).

So to show that 6 comparisons are not enough we must show that no matter
how we make them there will always be 4 nodes that are mutually not
connected. We will prove that by contradiction. Assume we have some assignment
of edges under which at most three nodes are mutually not connected. Without loss
of generality assume that these are nodes 1, 2, and 3. But this means that each
of the nodes 4, 5, 6, 7, and 8 must be connected to one of the nodes 1, 2, or 3
by at least one edge. So we must use up 5 edges to make these connections.
We are left with one edge (recall we assume that we can use 6 edges in total).
If this last edge is incident to node 1, 2, or 3, then nodes 4, 5, 6, 7, and 8
are not connected directly and we have just found a 5-element set with no edge inside it.
So assume that this edge connects a pair of nodes from the set {4, 5, 6, 7, 8}.
However, since we only have one edge we will still have 4 nodes that are not connected.
E.g. if 4 and 5 are connected then the nodes in the set {4, 6, 7, 8} are not connected,
so we have a 4-element set of nodes that are mutually not connected. This contradicts the assumption
that at most three nodes are mutually not connected. In conclusion, 6 comparisons
are not enough.
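
The lower bound can also be double-checked by exhaustive search using only Base Julia. The sketch below is an illustration and not the approach used in the computational section of this post. It encodes tests and placements as bit masks and searches over all sets of at most k tests; the names cover_mask and plan_exists are hypothetical.

```julia
# Encode every possible test (a pair of batteries) as an 8-bit mask.
test_masks = [(1 << (i - 1)) | (1 << (j - 1)) for i in 1:7 for j in i+1:8]

# Every way to choose the 4 good batteries, again as bit masks.
good_masks = [m for m in 0:255 if count_ones(m) == 4]

# Bit k of the result is set when `test` succeeds under placement number k.
function cover_mask(test, good_masks)
    mask = UInt128(0)
    for (k, good) in enumerate(good_masks)
        if good & test == test  # both batteries used in the test are good
            mask |= UInt128(1) << (k - 1)
        end
    end
    return mask
end

cover = [cover_mask(t, good_masks) for t in test_masks]
target = (UInt128(1) << length(good_masks)) - 1

# Depth-first search: can at most `k` more tests, chosen from index `start`
# onward, cover every placement that is still uncovered?
function plan_exists(k, start, covered, cover, target)
    covered == target && return true
    k == 0 && return false
    for e in start:length(cover)
        plan_exists(k - 1, e + 1, covered | cover[e], cover, target) && return true
    end
    return false
end

six_enough = plan_exists(6, 1, UInt128(0), cover, target)
seven_enough = plan_exists(7, 1, UInt128(0), cover, target)
println((six_enough, seven_enough))
```

The search visits at most a few million nodes, so it finishes quickly.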

(If you would like to see an alternative proof that uses the probabilistic method
you can reach out to Paweł Prałat who has shown it to me.)

Computational solution

Now let us move to the brute force and computation (and in the process hopefully learn some Julia tricks).

First load the required packages and do some setup:

julia> using Combinatorics

julia> using DataFrames

julia> using SimpleGraphs

julia> using SimpleGraphAlgorithms

julia> use_Cbc()
[ Info: Solver Cbc verbose is set to false

As we saw in the analytical solution we can represent our queries as graphs on 8 nodes
and some edges.

The problem is that there are potentially many such graphs. Therefore we will want to limit
our search to the graphs that are not isomorphic. Two graphs are isomorphic if you can
get one from the other by re-labelling the nodes. Clearly two such graphs are indistinguishable
for our purposes.

What we want to show is that all graphs on 8 nodes and 6 edges contain at least 4 nodes that
are not connected by an edge. And that there exist graphs on 8 nodes and 7 edges in which
the maximum number of unconnected nodes is 3.

So how do we create a list of all non-isomorphic graphs having 6 and 7 edges respectively?

Let us start with a simpler case of graphs on 8 nodes and 3 edges and list non-isomorphic graphs:

julia> function create_graph(es)
           g = IntGraph(8)
           for e in es
               add!(g, e...)
           end
           return g
       end
create_graph (generic function with 1 method)

julia> g3 = map(create_graph, combinations([(i, j) for i in 1:7 for j in i+1:8], 3));

julia> hg3 = uhash.(g3);

julia> g3df = DataFrame(hg=hg3, g=g3);

julia> g3gdf = groupby(g3df, :hg);

julia> redirect_stdout(devnull) do
           for sdf in g3gdf
               for i in 2:nrow(sdf)
                   @assert is_iso(sdf.g[1], sdf.g[i])
               end
           end
       end

julia> noniso3 = combine(g3gdf, first).g;

julia> elist.(noniso3)
5-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 2), (1, 3), (1, 4)]
 [(1, 2), (1, 3), (2, 3)]
 [(1, 2), (1, 3), (2, 4)]
 [(1, 2), (1, 3), (4, 5)]
 [(1, 2), (3, 4), (5, 6)]

Let us explain what we do step by step. The g3 object contains all
graphs on three edges. Let us check how many of them we have:

julia> length(g3)
3276

There are a lot of graphs. But most of them are isomorphic. How do we prune them?
Using the uhash function we compute the hash of each graph.
uhash guarantees us that graphs having a different hash are not isomorphic.
The g3gdf is a GroupedDataFrame that keeps these graphs grouped by their
hash value. We have 5 such groups as can be checked in:

julia> length(g3gdf)
5

However, it could be the case that we have non-isomorphic graphs that
have the same hash value (this is unlikely but possible). We check it with
the is_iso function. If they were not isomorphic, @assert would throw an error.
Since it does not, we are good. Note that I use the redirect_stdout(devnull)
trick to avoid printing any output that is_iso produces. The reason
is that it calls the CBC solver, which prints its status to the screen (and since
we do over 3000 calls the screen would be flooded by not very useful output).

With elist.(noniso3) we can see the edges of the five non-isomorphic
graphs that exist on 3 edges.
(And since we have only 5 graphs you can probably convince yourself using pen
and paper that we have found all possible options.)
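
To illustrate the idea of grouping graphs by an isomorphism invariant without any packages, here is a sketch that uses the sorted degree sequence in place of a graph hash. This is an assumption made for illustration only: it is a much weaker invariant than uhash and is not how uhash is computed.

```julia
# All 28 possible edges on 8 nodes.
edges = [(i, j) for i in 1:7 for j in i+1:8]

# All 3-edge graphs, enumerated with plain loops instead of Combinatorics.
graphs3 = [(edges[a], edges[b], edges[c])
           for a in 1:26 for b in a+1:27 for c in b+1:28]

# Sorted degree sequence: relabelling nodes only permutes the degrees, so
# graphs with different signatures cannot be isomorphic.
function degree_signature(es)
    deg = zeros(Int, 8)
    for (i, j) in es
        deg[i] += 1
        deg[j] += 1
    end
    return sort!(deg; rev=true)
end

signatures = unique(degree_signature.(graphs3))
println(length(graphs3))
println(length(signatures))
```

For 3 edges this invariant happens to separate all five classes. It stops being sufficient quickly: with 4 edges, two disjoint 2-edge paths and a 3-edge path plus a separate edge share the degree sequence 2, 2, 1, 1, 1, 1, which is why an explicit isomorphism check within each group is still needed.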

How do we repeat this process for a larger number of edges?
The same approach would work, but would be much more time consuming (there are
over 1 million graphs on 7 edges). So we use the following trick: we take the
non-isomorphic graphs with 3 edges and add one edge to them. We now get graphs
on 4 edges. Some of them are isomorphic. But we already know how to prune them
to be left only with non-isomorphic ones.

The procedure that iteratively performs this task up to 7 edges is as follows:

julia> function add_possible_edges(g::T) where T
           res = T[]
           for i in 1:7, j in i+1:8
               if !has(g, i, j)
                   newg = deepcopy(g)
                   add!(newg, i, j)
                   @assert NE(newg) == NE(g) + 1
                   push!(res, newg)
               end
           end
           return res
       end
add_possible_edges (generic function with 1 method)

julia> function grow_graphs(noniso)
           g = reduce(vcat, add_possible_edges.(noniso))
           hg = uhash.(g)
           gdf = DataFrame(; hg, g)
           ggdf = groupby(gdf, :hg)
           redirect_stdout(devnull) do
               for sdf in ggdf
                   for i in 2:nrow(sdf)
                       @assert is_iso(sdf.g[1], sdf.g[i])
                   end
               end
           end
           return combine(ggdf, first).g
       end
grow_graphs (generic function with 1 method)

julia> noniso4 = grow_graphs(noniso3)
11-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=4)
 ⋮
 UndirectedGraph{Int64} (n=8, m=4)

julia> noniso5 = grow_graphs(noniso4)
24-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=5)
 ⋮
 UndirectedGraph{Int64} (n=8, m=5)

julia> noniso6 = grow_graphs(noniso5)
56-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=6)
 ⋮
 UndirectedGraph{Int64} (n=8, m=6)

julia> noniso7 = grow_graphs(noniso6)
115-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=7)
 ⋮
 UndirectedGraph{Int64} (n=8, m=7)

In the process we learn that there are 11 non-isomorphic graphs having 4 edges,
and respectively 24 for 5 edges, 56 for 6 edges and 115 for 7 edges.

Now for each of these graphs let us find the maximum number of nodes that are
mutually not connected. This can be done using the max_indep_set function.
Again we use the devnull trick to avoid printing of the output:

julia> mis6 = redirect_stdout(devnull) do
           return max_indep_set.(noniso6)
       end
56-element Vector{Set{Int64}}:
 Set([5, 4, 6, 7, 2, 8, 3])
 ⋮
 Set([4, 7, 8, 3])

julia> minimum(length.(mis6))
4

So we first see that for graphs on 6 edges we indeed have at least 4 nodes in the
independent set.

Let us now check the 7 edge case:

julia> mis7 = redirect_stdout(devnull) do
           return max_indep_set.(noniso7)
       end
115-element Vector{Set{Int64}}:
 Set([5, 4, 6, 7, 2, 8, 3])
 ⋮
 Set([7, 2, 8, 3])

julia> minimum(length.(mis7))
3

julia> elist.(noniso7[length.(mis7) .== 3])
1-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (7, 8)]

Here we see that there exists only one graph (up to isomorphism)
that has a property that at most three nodes are independent.
And looking at its edges it is the same graph that we have drawn
in our analytical solution.
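
For graphs this small, the size of the largest independent set can also be cross-checked by brute force over all 256 node subsets, without a solver. This is a sketch independent of SimpleGraphAlgorithms; the function name max_independent_size is hypothetical.

```julia
# Size of the largest set of nodes with no edge between any two of them,
# found by trying every subset of the n nodes (encoded as a bit mask).
function max_independent_size(es; n=8)
    best = 0
    for mask in 0:(1 << n) - 1
        inset(v) = isodd(mask >> (v - 1))
        # Keep the subset only if no edge has both endpoints inside it.
        if !any(((i, j),) -> inset(i) && inset(j), es)
            best = max(best, count_ones(mask))
        end
    end
    return best
end

extremal = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (7, 8)]
star = [(1, 2), (1, 3), (1, 4)]
println(max_independent_size(extremal))
println(max_independent_size(star))
```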

Conclusions

So is the analytical or computational solution more interesting?
For me both have their value and were fun.

If you like such puzzles, and do plan ahead, please consider joining
us next year. From June 3 to 7, 2024 we are going to host
the WAW2024: 19th Workshop on Modelling and Mining Networks
at SGH Warsaw School of Economics. We invite all enthusiasts of graphs:
both theoreticians and practitioners.

Homographs in DataFrames.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/06/16/homographs.html

Introduction

Last week I posted about graphs, so I thought to post about homographs today.

From your English lessons you probably remember that homographs are words that share
the same written form but have a different meaning.

Starting with the Julia 1.9 release we have a homograph in DataFrames.jl. This is the
stack function and I will cover it today.

The post was written under Julia 1.9.1 and DataFrames.jl 1.5.0.

Homographs and multiple dispatch

Julia supports multiple dispatch. This means that you can define specialized methods
for the same function depending on the types of its arguments.

For example, if you write 1 + 2 and 1.0 + 2.0, internally different methods of the
+ function are invoked, one working with integers, and the other working with floats.

However, there is one important rule that should be followed. If you add methods
to some function they should perform conceptually similar operations. For example,
1 + 2 produces 3 and 1.0 + 2.0 produces 3.0. In both cases an addition was done.

The reason for this rule is that otherwise when you see code like f(x) you would not
know what the function f does until you know the type of x.
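
A minimal sketch of methods that follow this rule is given below. The names Circle, Rect, and area are hypothetical and serve only as an illustration: the two methods use different formulas, but both compute an area.

```julia
# Circle, Rect, and area are hypothetical names used only for illustration.
struct Circle
    r::Float64
end

struct Rect
    w::Float64
    h::Float64
end

# Two methods of one function: different implementations, same concept.
area(c::Circle) = pi * c.r^2
area(r::Rect) = r.w * r.h

shapes = [Circle(1.0), Rect(2.0, 3.0)]
areas = area.(shapes)
println(areas)
```

A reader who sees area(x) knows what is computed without knowing the type of x.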

Unfortunately, sometimes such situations happen. Let me give you a story of the stack
function. It has been defined in DataFrames.jl for many years and is used to perform
wide to long conversion of data frames. Recently Julia maintainers decided that
it makes sense to add the stack function to Base module and make it combine
a collection of arrays into one larger array.

In such a situation, as DataFrames.jl maintainers we had two options:

  1. keep Base.stack and DataFrames.stack functions separate;
  2. make stack from DataFrames.jl add a method to Base.stack.

In general the first option is preferable. Base.stack and DataFrames.stack
do different things so they should be separate functions. However, there is one
problem with this approach. All legacy code that used stack from DataFrames.jl
would stop working and users would need to write DataFrames.stack instead.
This is something we did not want so we decided to go for option 2, that is,
add a method for Base.stack that would handle data frames. The reason we
chose this option is that there is a very low risk of confusion, since stack from
DataFrames.jl always requires an AbstractDataFrame as its first argument.
You can see it here:

julia> using DataFrames

julia> methods(stack)
# 6 methods for generic function "stack" from Base:
 [1] stack(df::AbstractDataFrame)
     @ DataFrames ~\.julia\packages\DataFrames\LteEl\src\abstractdataframe\reshape.jl:136
 [2] stack(df::AbstractDataFrame, measure_vars)
     @ DataFrames ~\.julia\packages\DataFrames\LteEl\src\abstractdataframe\reshape.jl:136
 [3] stack(df::AbstractDataFrame, measure_vars, id_vars; variable_name, value_name, view, variable_eltype)
     @ DataFrames ~\.julia\packages\DataFrames\LteEl\src\abstractdataframe\reshape.jl:136
 [4] stack(iter; dims)
     @ abstractarray.jl:2743
 [5] stack(f, iter; dims)
     @ abstractarray.jl:2772
 [6] stack(f, xs, yzs...; dims)
     @ abstractarray.jl:2773

At the same time standard Base.stack does not work with data frames at all.
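
The mechanism can be sketched with a toy type that stands in for a data frame. WideTable is a hypothetical type introduced only for this illustration, and the code assumes Julia 1.9 or later, where Base.stack exists. It is not how DataFrames.jl implements its method.

```julia
# WideTable is a hypothetical toy type; it only mimics the idea of a wide table.
struct WideTable
    ids::Vector{Int}
    columns::Vector{Pair{Symbol,Vector{Int}}}
end

# Adding a method to Base.stack for a type we own. Dispatch on the first
# argument keeps it separate from the array-combining methods in Base.
function Base.stack(t::WideTable)
    return [(id=id, variable=name, value=v)
            for (name, vals) in t.columns for (id, v) in zip(t.ids, vals)]
end

wide = WideTable([2020, 2021], [:Spring => [1, 2], :Summer => [3, 4]])
long = stack(wide)         # our method: wide to long
matrix = stack([1:2, 3:4]) # Base method: combine vectors into a matrix

foreach(println, long)
println(matrix)
```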

OK, enough theory. Let us have a look at stack from Base and from DataFrames.jl
in action.

Combining collections of arrays

I will concentrate here on the simplest (and most often needed) scenarios.
Assume you have a vector of vectors of equal length:

julia> x = [1:2, 3:4, 5:6]
3-element Vector{UnitRange{Int64}}:
 1:2
 3:4
 5:6

We might want to turn it into a matrix. There are two ways you could want to do it.
The first is to put these vectors as columns in the produced matrix, like this:

julia> stack(x)
2×3 Matrix{Int64}:
 1  3  5
 2  4  6

The second is to put them as rows (this is often needed and in the past I always
used permutedims to get this, which was a bit cumbersome):

julia> stack(x, dims=1)
3×2 Matrix{Int64}:
 1  2
 3  4
 5  6
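
The two calls can be related to older idioms as follows. This is a small sketch assuming Julia 1.9 or later; the variable names are illustrative.

```julia
x = [1:2, 3:4, 5:6]

as_columns = stack(x)       # each vector becomes a column
as_rows = stack(x; dims=1)  # each vector becomes a row

# The older ways of getting the same results.
old_columns = reduce(hcat, x)
old_rows = permutedims(old_columns)

println(as_columns == old_columns)
println(as_rows == old_rows)
```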

The third commonly used pattern is when you want to apply a function to a vector of
values and this function returns another vector or e.g. a tuple. Let us have a look
at an example:

julia> using Statistics

julia> f(x) = [sum(x), mean(x)]
f (generic function with 1 method)

julia> f.(x)
3-element Vector{Vector{Float64}}:
 [3.0, 1.5]
 [7.0, 3.5]
 [11.0, 5.5]

This is the traditional way, using broadcasting, to apply the function to such a vector.
However, often you want the result in a flat matrix. Now you can do:

julia> stack(f.(x))
2×3 Matrix{Float64}:
 3.0  7.0  11.0
 1.5  3.5   5.5

which can be done even simpler by just writing:

julia> stack(f, x)
2×3 Matrix{Float64}:
 3.0  7.0  11.0
 1.5  3.5   5.5
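
The same pattern works when the function returns a tuple instead of a vector. As a brief sketch (assuming Julia 1.9 or later), here the Base function extrema returns a (minimum, maximum) tuple for each range:

```julia
x = [1:2, 3:4, 5:6]

# extrema returns a tuple (minimum, maximum) for each element of x;
# stack places each tuple in its own column.
extrema_matrix = stack(extrema, x)

# The two-step version gives the same matrix.
two_step = stack(extrema.(x))

println(extrema_matrix)
println(extrema_matrix == two_step)
```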

In summary, Base.stack is a super nice little utility that comes in handy very often
if you work with arrays.

Transforming data from wide to long format

In DataFrames.jl the stack function transforms data from wide to long format.
Assume you have the following input data frame:

julia> df = DataFrame(year=[2020, 2021], Spring=1:2, Summer=3:4, Autumn=5:6, Winter=7:8)
2×5 DataFrame
 Row │ year   Spring  Summer  Autumn  Winter
     │ Int64  Int64   Int64   Int64   Int64
─────┼───────────────────────────────────────
   1 │  2020       1       3       5       7
   2 │  2021       2       4       6       8

It is in wide format. For each year you have four columns, one for each season, holding some values.
Assume we instead want a narrow data frame with year-season combinations and one column with values.
With DataFrames.stack it is enough to pass a data frame and specify which columns hold the values:

julia> stack(df, Not(:year))
8×3 DataFrame
 Row │ year   variable  value
     │ Int64  String    Int64
─────┼────────────────────────
   1 │  2020  Spring        1
   2 │  2021  Spring        2
   3 │  2020  Summer        3
   4 │  2021  Summer        4
   5 │  2020  Autumn        5
   6 │  2021  Autumn        6
   7 │  2020  Winter        7
   8 │  2021  Winter        8

Or, if you want to be more fancy you can e.g. change the generated column names:

julia> stack(df, Not(:year), variable_name="season", value_name="number")
8×3 DataFrame
 Row │ year   season  number
     │ Int64  String  Int64
─────┼───────────────────────
   1 │  2020  Spring       1
   2 │  2021  Spring       2
   3 │  2020  Summer       3
   4 │  2021  Summer       4
   5 │  2020  Autumn       5
   6 │  2021  Autumn       6
   7 │  2020  Winter       7
   8 │  2021  Winter       8

Conclusions

In this post I have given the basic examples of Base.stack and DataFrames.stack usage.
I recommend that you have a look at their documentation for more complete information.
However, the point is that both functions are quite useful in daily data wrangling so it is
useful to know them.

Additionally, I wanted to highlight some general considerations of package design in Julia
and the challenges that maintainers face. In the specific example of stack we decided to break
the "all methods of a function should do a similar thing" rule in favor of user convenience and making
sure that we keep the legacy DataFrames.jl code working.