Pivot tables in DataFrames.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/05/19/pivot.html

Introduction

Creating pivot tables is a common operation in exploratory data analysis.
Today I want to show you one example of how this can be done in DataFrames.jl.
It was prompted by a recent discussion on Julia Slack.

The post was written under Julia 1.9.0, Chain.jl 0.5.0, DataFrames.jl 1.5.0, and RDatasets.jl 0.7.7.

The data

We will work with the classical diamonds dataset. Let us load it first:

julia> using DataFrames

julia> using Chain

julia> using RDatasets

julia> diamonds = RDatasets.dataset("ggplot2", "diamonds")
53940×10 DataFrame
   Row │ Carat    Cut        Color  Clarity  Depth    Table    Price  X        Y        Z
       │ Float64  Cat…       Cat…   Cat…     Float64  Float64  Int32  Float64  Float64  Float64
───────┼────────────────────────────────────────────────────────────────────────────────────────
     1 │    0.23  Ideal      E      SI2         61.5     55.0    326     3.95     3.98     2.43
     2 │    0.21  Premium    E      SI1         59.8     61.0    326     3.89     3.84     2.31
     3 │    0.23  Good       E      VS1         56.9     65.0    327     4.05     4.07     2.31
     4 │    0.29  Premium    I      VS2         62.4     58.0    334     4.2      4.23     2.63
     5 │    0.31  Good       J      SI2         63.3     58.0    335     4.34     4.35     2.75
     6 │    0.24  Very Good  J      VVS2        62.8     57.0    336     3.94     3.96     2.48
     7 │    0.24  Very Good  I      VVS1        62.3     57.0    336     3.95     3.98     2.47
     8 │    0.26  Very Good  H      SI1         61.9     55.0    337     4.07     4.11     2.53
     9 │    0.22  Fair       E      VS2         65.1     61.0    337     3.87     3.78     2.49
    10 │    0.23  Very Good  H      VS1         59.4     61.0    338     4.0      4.05     2.39
    11 │    0.3   Good       J      SI1         64.0     55.0    339     4.25     4.28     2.73
    12 │    0.23  Ideal      J      VS1         62.8     56.0    340     3.93     3.9      2.46
    13 │    0.22  Premium    F      SI1         60.4     61.0    342     3.88     3.84     2.33
    14 │    0.31  Ideal      J      SI2         62.2     54.0    344     4.35     4.37     2.71
   ⋮   │    ⋮         ⋮        ⋮       ⋮        ⋮        ⋮       ⋮       ⋮        ⋮        ⋮
 53928 │    0.79  Good       F      SI1         58.1     59.0   2756     6.06     6.13     3.54
 53929 │    0.79  Premium    E      SI2         61.4     58.0   2756     6.03     5.96     3.68
 53930 │    0.71  Ideal      G      VS1         61.4     56.0   2756     5.76     5.73     3.53
 53931 │    0.71  Premium    E      SI1         60.5     55.0   2756     5.79     5.74     3.49
 53932 │    0.71  Premium    F      SI1         59.8     62.0   2756     5.74     5.73     3.43
 53933 │    0.7   Very Good  E      VS2         60.5     59.0   2757     5.71     5.76     3.47
 53934 │    0.7   Very Good  E      VS2         61.2     59.0   2757     5.69     5.72     3.49
 53935 │    0.72  Premium    D      SI1         62.7     59.0   2757     5.69     5.73     3.58
 53936 │    0.72  Ideal      D      SI1         60.8     57.0   2757     5.75     5.76     3.5
 53937 │    0.72  Good       D      SI1         63.1     55.0   2757     5.69     5.75     3.61
 53938 │    0.7   Very Good  D      SI1         62.8     60.0   2757     5.66     5.68     3.56
 53939 │    0.86  Premium    H      SI2         61.0     58.0   2757     6.15     6.12     3.74
 53940 │    0.75  Ideal      D      SI2         62.2     55.0   2757     5.83     5.87     3.64
                                                                              53913 rows omitted

The task is to analyze the distribution of the :Cut column by :Color.

Note that these columns are categorical (as indicated by the Cat… information above).
This allows us to check the levels of :Cut and :Color to verify their ordering:

julia> levels(diamonds.Cut)
5-element Vector{String}:
 "Fair"
 "Good"
 "Very Good"
 "Premium"
 "Ideal"

julia> levels(diamonds.Color)
7-element Vector{String}:
 "D"
 "E"
 "F"
 "G"
 "H"
 "I"
 "J"

Now we are ready to start analyzing the data.

Simple pivot table

A simple pivot table would show the number of observations for each
combination of :Cut and :Color. You can compute it as follows:

julia> unstack(diamonds, :Cut, :Color, :Cut, combine=length)
5×8 DataFrame
 Row │ Cut        E       I       J       H       F       G       D
     │ Cat…       Int64?  Int64?  Int64?  Int64?  Int64?  Int64?  Int64?
─────┼───────────────────────────────────────────────────────────────────
   1 │ Ideal        3903    2093     896    3115    3826    4884    2834
   2 │ Premium      2337    1428     808    2360    2331    2924    1603
   3 │ Good          933     522     307     702     909     871     662
   4 │ Very Good    2400    1204     678    1824    2164    2299    1513
   5 │ Fair          224     175     119     303     312     314     163

The second positional argument of unstack (:Cut in our case) determines the rows,
and the third (:Color) determines the columns. The fourth positional argument is the column
whose values fill the cells of the pivot table. Since we want the number of observations
(combine=length), it does not matter which column we pass, so I used :Cut.
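
To make the roles of the positional arguments concrete, here is a minimal sketch on a made-up four-row data frame. The column names shape and tint and their values are assumptions used only for illustration; they are not part of the diamonds data. The sketch also passes the fill keyword argument of unstack so that a combination that never occurs gets 0 instead of missing:

```julia
using DataFrames

# Toy data (hypothetical): the combination ("b", "y") never occurs.
toy = DataFrame(shape=["a", "b", "a", "a"],
                tint=["x", "x", "y", "x"])

# Rows come from :shape, columns from :tint, and each cell holds the number
# of source rows with that combination; fill=0 is used for empty cells.
toy_pivot = unstack(toy, :shape, :tint, :shape, combine=length, fill=0)
```

Here toy_pivot has rows "a" and "b" and columns x and y, in order of appearance, with the cell for ("b", "y") equal to 0.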

Fixing the order

The table looks nice, but there is one problem with it: the rows and columns are not ordered nicely.
The reason is that currently unstack in DataFrames.jl orders them by their order of appearance
in the source data frame.

We can fix it by sorting. The order of columns is set by pre-sorting the source data frame,
and the order of rows is set by post-sorting the data frame returned by unstack.
Note that from now on I use the @chain macro from Chain.jl to keep the code clear:

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
       end
5×8 DataFrame
 Row │ Cut        D       E       F       G       H       I       J
     │ Cat…       Int64?  Int64?  Int64?  Int64?  Int64?  Int64?  Int64?
─────┼───────────────────────────────────────────────────────────────────
   1 │ Fair          163     224     312     314     303     175     119
   2 │ Good          662     933     909     871     702     522     307
   3 │ Very Good    1513    2400    2164    2299    1824    1204     678
   4 │ Premium      1603    2337    2331    2924    2360    1428     808
   5 │ Ideal        2834    3903    3826    4884    3115    2093     896

Now the table is nicely ordered. Notice that both the sort and sort! functions are aware of
the categorical nature of the data and sort it properly.
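
The following sketch illustrates this behavior on made-up data. It assumes the CategoricalArrays.jl package is available (it provides the categorical column type shown as Cat… above); the column names and the size labels are hypothetical:

```julia
using DataFrames
using CategoricalArrays

# Hypothetical labels whose intended order differs from alphabetical order.
sizes = categorical(["small", "large", "medium"],
                    levels=["small", "medium", "large"], ordered=true)
toy = DataFrame(size=sizes, n=[1, 2, 3])

# Sorting follows the order of the levels, not the alphabet.
sorted = sort(toy, :size)
```

After sorting, the rows are ordered small, medium, large, although alphabetically "large" would come first.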

We almost have what we wanted. The problem is that raw counts do not allow us to easily
compare the distributions across diamond colors. This can be fixed by transforming the columns
of our data frame.

Getting proportions

Let us turn the data from counts to proportions. We can do it using the transform! function:

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
           transform!(Not(:Cut) .=> x -> x / sum(x), renamecols=false)
       end
5×8 DataFrame
 Row │ Cut        D          E          F          G          H          I          J
     │ Cat…       Float64    Float64    Float64    Float64    Float64    Float64    Float64
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ Fair       0.024059   0.0228641  0.0326975  0.0278073  0.0364884  0.0322759  0.0423789
   2 │ Good       0.0977122  0.0952332  0.095263   0.0771343  0.0845376  0.0962744  0.10933
   3 │ Very Good  0.223321   0.244973   0.226787   0.203595   0.219653   0.222058   0.241453
   4 │ Premium    0.236605   0.238542   0.244288   0.258944   0.2842     0.263371   0.287749
   5 │ Ideal      0.418303   0.398387   0.400964   0.432519   0.37512    0.38602    0.319088

J diamonds seem to have the worst :Cut (the lowest share of Ideal and the highest share of Fair),
and G diamonds seem to have the best (the highest share of Ideal).
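
A quick way to check that such a transformation normalizes each column separately is to run it on a tiny table of made-up counts and verify that every resulting column sums to 1. The data frame below and its column names are assumptions for illustration only:

```julia
using DataFrames

# Hypothetical counts: one key column and two count columns.
counts = DataFrame(grade=["low", "mid", "high"],
                   a=[1, 3, 6],
                   b=[2, 2, 4])

# Divide every column except :grade by its own sum, keeping column names.
props = transform(counts, Not(:grade) .=> (x -> x / sum(x)), renamecols=false)

# Each transformed column should now sum to 1 (up to floating point rounding).
col_sums = [sum(col) for col in eachcol(props[:, Not(:grade)])]
```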

Digging deeper into the data

To formally assess the order of columns by :Cut quality let us turn the data from distribution
to a cumulative distribution first:

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
           transform!(Not(:Cut) .=> x -> cumsum(x / sum(x)), renamecols=false)
       end
5×8 DataFrame
 Row │ Cut        D         E          F          G          H          I          J
     │ Cat…       Float64   Float64    Float64    Float64    Float64    Float64    Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────
   1 │ Fair       0.024059  0.0228641  0.0326975  0.0278073  0.0364884  0.0322759  0.0423789
   2 │ Good       0.121771  0.118097   0.127961   0.104942   0.121026   0.12855    0.151709
   3 │ Very Good  0.345092  0.36307    0.354747   0.308537   0.340679   0.350609   0.393162
   4 │ Premium    0.581697  0.601613   0.599036   0.567481   0.62488    0.61398    0.680912
   5 │ Ideal      1.0       1.0        1.0        1.0        1.0        1.0        1.0

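We would like to order columns by the first-order stochastic dominance relation.
Here one color dominates another if its cumulative share is not larger at any :Cut level,
that is, if it puts less weight on the worse cuts.
Since DataFrames.jl makes it easier to sort rows, let us permute the dimensions of our data frame first: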

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
           transform!(Not(:Cut) .=> x -> cumsum(x / sum(x)), renamecols=false)
           permutedims(:Cut, :Color)
       end
7×6 DataFrame
 Row │ Color   Fair       Good      Very Good  Premium   Ideal
     │ String  Float64    Float64   Float64    Float64   Float64
─────┼───────────────────────────────────────────────────────────
   1 │ D       0.024059   0.121771   0.345092  0.581697      1.0
   2 │ E       0.0228641  0.118097   0.36307   0.601613      1.0
   3 │ F       0.0326975  0.127961   0.354747  0.599036      1.0
   4 │ G       0.0278073  0.104942   0.308537  0.567481      1.0
   5 │ H       0.0364884  0.121026   0.340679  0.62488       1.0
   6 │ I       0.0322759  0.12855    0.350609  0.61398       1.0
   7 │ J       0.0423789  0.151709   0.393162  0.680912      1.0
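
As a small illustration of what the two arguments of permutedims do, and of how sorting by several columns behaves, here is a sketch on a made-up two-row data frame. The column names grade and series and all the values are hypothetical:

```julia
using DataFrames

# Hypothetical wide table: one key column and two value columns.
wide = DataFrame(grade=["low", "high"],
                 a=[0.3, 1.0],
                 b=[0.2, 1.0])

# Values of :grade become the new column names; the old column names
# ("a" and "b") are stored in a new column called :series.
flipped = permutedims(wide, :grade, :series)

# Sorting by all columns except :series is lexicographic:
# first by :low, and ties are broken by :high.
sorted = sort(flipped, Not(:series))
```

Note that a lexicographic sort like this is driven mainly by the first column, so it can only reveal a dominance order when one actually exists.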

Now we are ready to sort our data frame by all columns except :Color:

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
           transform!(Not(:Cut) .=> x -> cumsum(x / sum(x)), renamecols=false)
           permutedims(:Cut, :Color)
           sort!(Not(:Color))
       end
7×6 DataFrame
 Row │ Color   Fair       Good      Very Good  Premium   Ideal
     │ String  Float64    Float64   Float64    Float64   Float64
─────┼───────────────────────────────────────────────────────────
   1 │ E       0.0228641  0.118097   0.36307   0.601613      1.0
   2 │ D       0.024059   0.121771   0.345092  0.581697      1.0
   3 │ G       0.0278073  0.104942   0.308537  0.567481      1.0
   4 │ I       0.0322759  0.12855    0.350609  0.61398       1.0
   5 │ F       0.0326975  0.127961   0.354747  0.599036      1.0
   6 │ H       0.0364884  0.121026   0.340679  0.62488       1.0
   7 │ J       0.0423789  0.151709   0.393162  0.680912      1.0

The sorting exercise did not fully work this time.
First-order stochastic dominance is only a partial order, so it does not have to produce a full ranking of the options.
Note that, for example, options E and G do not dominate each other (E is lower for Fair, but G is lower for the remaining levels).
However, we see that option J is clearly the worst, as it has the maximum value at every level.
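
The pairwise comparisons can also be checked directly. Below is a minimal sketch in plain Julia; the helper name dominates is hypothetical, and the numbers are the cumulative shares for the Fair, Good, Very Good, and Premium levels copied from the output above:

```julia
# Cumulative shares for Fair, Good, Very Good, Premium (copied from the table).
cdf = Dict(
    "E" => [0.0228641, 0.118097, 0.36307, 0.601613],
    "G" => [0.0278073, 0.104942, 0.308537, 0.567481],
    "J" => [0.0423789, 0.151709, 0.393162, 0.680912],
)

# Hypothetical helper: a dominates b if its cumulative share is never larger
# and is strictly smaller at least once (less weight on the worse cuts).
dominates(a, b) = all(a .<= b) && any(a .< b)

dominates(cdf["E"], cdf["G"])  # false
dominates(cdf["G"], cdf["E"])  # false
dominates(cdf["E"], cdf["J"])  # true
dominates(cdf["G"], cdf["J"])  # true
```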

To see this more clearly let us do min-max scaling of all columns except :Ideal (as it is constant):

julia> scale(x) = (x .- minimum(x)) / (maximum(x) - minimum(x))
scale (generic function with 1 method)

julia> @chain diamonds begin
           sort(:Color)
           unstack(:Cut, :Color, :Cut, combine=length)
           sort!(:Cut)
           transform!(Not(:Cut) .=> x -> cumsum(x / sum(x)), renamecols=false)
           permutedims(:Cut, :Color)
           sort!(Not(:Color))
           transform!(Not([:Color, :Ideal]) .=> scale .=> identity)
       end
7×6 DataFrame
 Row │ Color   Fair       Good      Very Good  Premium   Ideal
     │ String  Float64    Float64   Float64    Float64   Float64
─────┼───────────────────────────────────────────────────────────
   1 │ E       0.0        0.281301   0.644408  0.300901      1.0
   2 │ D       0.0612305  0.359855   0.431965  0.125328      1.0
   3 │ G       0.253303   0.0        0.0       0.0           1.0
   4 │ I       0.482289   0.504808   0.497151  0.409932      1.0
   5 │ F       0.503895   0.492198   0.546059  0.278184      1.0
   6 │ H       0.698153   0.343921   0.379817  0.506022      1.0
   7 │ J       1.0        1.0        1.0       1.0           1.0

Indeed, the J color has 1.0 in all columns.

We can also now see more clearly that, if we ignore the :Fair level,
the G color dominates all other colors.

Note that in the last step I have shown an alternative to renamecols=false
for keeping the column names unchanged under a transformation.
You can pass the identity function as the target column name.

The reason is that if you pass a function as the target column name, then this function is applied
to the source column name (and identity keeps things as they were).
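
The naming rules can be compared side by side on a toy data frame. This is only a sketch: the data frame and its column names are made up, and scale is the same helper as defined earlier in the post:

```julia
using DataFrames

# Same min-max scaling helper as in the post.
scale(x) = (x .- minimum(x)) / (maximum(x) - minimum(x))

# Hypothetical data.
toy = DataFrame(id=1:3, a=[1.0, 2.0, 3.0])

default_names  = names(transform(toy, :a => scale))                    # ["id", "a", "a_scale"]
identity_names = names(transform(toy, :a => scale => identity))        # ["id", "a"]
keyword_names  = names(transform(toy, :a => scale, renamecols=false))  # ["id", "a"]
upper_names    = names(transform(toy, :a => scale => uppercase))       # ["id", "a", "A"]
```

The last call shows that any function mapping the source column name to a new name can be used, not only identity.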

Conclusions

I hope you found this post useful for exploring some of the functionalities
of DataFrames.jl.

I also tried to show how nicely @chain can be used to gradually build
an analysis. This is especially convenient in the Julia REPL: when you
go back in the command history, a single press of the up arrow key brings
the whole last command (and not just a single line, like in some other REPLs)
back to the prompt, ready for editing.

Julia 1.9.0 lives up to its promise

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/05/12/julia190.html

Introduction

Julia 1.9.0 was released this week.
This release was much awaited, as it brings many significant improvements.
You can find a summary of the most important changes in the Julia 1.9 Highlights post.

Of all the additions, the most user-visible change is probably caching of native code.
In simple words, it means that the first time you run a function from a package,
it should execute faster than in previous
Julia releases.

The time of first execution was indeed a big pain point for many Julia users so I am
really excited by this functionality. However, to see the benefits of caching of native code,
packages you use must be designed in a way that takes this fact into account.

So the practical question is whether, given the current state of the Julia package ecosystem,
we indeed see these benefits. I decided to answer this question by running a
realistic data processing workflow on both Julia 1.8.5 and 1.9.0 to see the differences.

For the test I used code from the demo I prepared for ODSC-EUROPE-2021.
The reason is that it covers all standard operations: reading and writing data to disk,
aggregation, group by, joining, sorting, and reshaping.

The tests require the following packages: CSV.jl v0.10.10,
Chain.jl v0.5.0, DataFrames.jl v1.5.0, HTTP.jl v1.9.1.

The test

When presenting the results I will show the code snippet and, below it, the timings under
Julia 1.8.5 and 1.9.0 (without showing the output, to save space). All the tests were started in a fresh
Julia session.

First let us look at package load time:

@time begin
    using Chain
    using CSV
    using DataFrames
    using HTTP
    using Statistics
end

Timings:

Julia 1.8.5:
4.041740 seconds (8.09 M allocations: 536.512 MiB, 4.93% gc time, 33.25% compilation time: 87% of which was recompilation)

Julia 1.9.0:
1.721003 seconds (1.73 M allocations: 109.727 MiB, 4.92% gc time, 5.46% compilation time: 81% of which was recompilation)

As you can see, the package load time is visibly improved. However, a significant part of the compilation time is still
spent on recompilation, which means that it should be possible to improve things in the future with better design of
the internals of the packages.

Next, let us check the time to read and write a CSV file:

input = "https://raw.githubusercontent.com/Rdatatable\
         /data.table/master/vignettes/flights14.csv";
flights_bin = HTTP.get(input).body;
@time flights = CSV.read(flights_bin, DataFrame);
@time CSV.write("flights14.csv", flights);

Timings:

Julia 1.8.5:
  11.247200 seconds (3.19 M allocations: 202.963 MiB, 0.58% gc time, 98.33% compilation time)
  2.625601 seconds (15.55 M allocations: 469.223 MiB, 4.82% gc time, 53.75% compilation time)

Julia 1.9.0:
  1.023744 seconds (703.11 k allocations: 71.524 MiB, 2.36% gc time, 70.20% compilation time)
  1.147991 seconds (14.12 M allocations: 421.457 MiB, 6.22% gc time, 55.25% compilation time)

Here we see a really big gain. It is especially visible in CSV reading time.

The next test is dropping some rows from a data frame. I do it in four different ways:

@time flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :];
@time filter(row -> row.origin == "EWR" && row.dest == "PHL", flights);
@time subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL");
@time subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")));

Timings:

Julia 1.8.5:
  0.842566 seconds (581.75 k allocations: 27.749 MiB, 99.43% compilation time)
  0.523396 seconds (2.07 M allocations: 69.233 MiB, 4.39% gc time, 85.76% compilation time)
  1.745602 seconds (1.78 M allocations: 90.815 MiB, 1.30% gc time, 99.59% compilation time: 2% of which was recompilation)
  0.565263 seconds (1.19 M allocations: 62.720 MiB, 3.93% gc time, 98.65% compilation time)

Julia 1.9.0:
  0.234892 seconds (255.22 k allocations: 17.176 MiB, 3.79% gc time, 98.27% compilation time)
  0.326977 seconds (1.47 M allocations: 46.091 MiB, 2.78% gc time, 83.76% compilation time)
  0.685728 seconds (560.88 k allocations: 37.067 MiB, 1.94% gc time, 99.13% compilation time)
  0.499620 seconds (534.75 k allocations: 37.376 MiB, 98.60% compilation time)

Again, in all cases we see a drop in time to first execution. You might ask why we still see a lot of compilation.
The major reason is that in the examples we define new functions or data structures that cause compilation.
For example, x -> x .== "EWR" is an anonymous function that we have just created, so it was impossible to precompile it.
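
The point about anonymous functions can be seen without any packages. In the sketch below (the variable names are made up for illustration) two anonymous functions with identical source text still have different types, so compiled code for one cannot be reused for the other, and neither type exists when a package is precompiled:

```julia
# Two anonymous functions with identical source text...
f1 = x -> x .== "EWR"
f2 = x -> x .== "EWR"

# ...are still distinct types.
same_type = typeof(f1) == typeof(f2)  # false

origins = ["EWR", "JFK", "EWR"]
first_call = @elapsed f1(origins)   # includes compiling f1 for Vector{String}
second_call = @elapsed f1(origins)  # reuses the already compiled code
```

In a fresh session first_call is typically much larger than second_call, but the exact numbers depend on the machine and the Julia version.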

Let us now perform a groupby and group selection by key-values:

@time flights_idx = groupby(flights, [:origin, :dest]);
@time flights_idx[("EWR", "PHL")];

Timings:

Julia 1.8.5:
  1.436151 seconds (2.02 M allocations: 104.135 MiB, 1.86% gc time, 99.47% compilation time)
  0.569703 seconds (458.67 k allocations: 26.118 MiB, 4.46% gc time, 99.60% compilation time)

Julia 1.9.0:
  1.216488 seconds (1.16 M allocations: 80.253 MiB, 3.13% gc time, 98.56% compilation time)
  0.281608 seconds (214.53 k allocations: 16.605 MiB, 4.83% gc time, 99.62% compilation time)

We still see the benefits. Yet, you might ask why we see so much compilation even under Julia 1.9.0.
The reason is that, for example, groupby by two columns is not very common, so it was left out of
precompilation in DataFrames.jl. For this reason, when you run groupby(flights, [:origin, :dest]),
native code for this scenario is not cached. This is a hard design decision for package
maintainers. You could add more and more precompilation statements to improve the coverage of
cached native code, but this comes at a cost, as it affects both package installation time and
package load time.
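
To give a feel for what a precompilation statement is, here is a minimal sketch that uses the precompile function from Base on a made-up helper. It only illustrates the mechanism; it is an assumption that it resembles what any particular package does, and the name count_max is hypothetical:

```julia
# Hypothetical helper: how many elements of v are equal to its maximum.
count_max(v) = count(==(maximum(v)), v)

# Ask Julia to compile count_max for one concrete argument type before the first call.
# Every signature listed this way adds compilation work up front, and calls with
# other argument types (for example Vector{Float64}) still compile on first use.
compiled = precompile(count_max, (Vector{Int},))

count_max([1, 3, 3])  # 2
```

This shows the trade-off in miniature: each additional signature improves coverage for one scenario, but someone has to pay for compiling and storing it.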

Our next test is an aggregation operation. Again I chose it to be non-trivial and associated with
creation of an anonymous function:

@time combine(flights_idx) do sdf
    max_air_time = maximum(sdf.air_time)
    return count(sdf.air_time .== max_air_time)
end;

Timings:

Julia 1.8.5:
  1.748435 seconds (1.40 M allocations: 76.052 MiB, 1.69% gc time, 99.53% compilation time)

Julia 1.9.0:
  0.953018 seconds (921.98 k allocations: 58.030 MiB, 1.55% gc time, 99.54% compilation time)

Again we see a significant timing improvement.
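
For readers who want to see what this aggregation returns, here is the same do-block pattern on a tiny made-up data frame (the values are hypothetical; only the column names follow the flights data). Since the function returns a scalar and no target name is given, the result is stored in an automatically named column x1:

```julia
using DataFrames

# Hypothetical data: two origins with a few air times each.
toy = DataFrame(origin=["A", "A", "B", "B", "B"],
                air_time=[10, 10, 5, 7, 6])

# For each group, count the rows whose air_time equals the group maximum.
res = combine(groupby(toy, :origin)) do sdf
    max_air_time = maximum(sdf.air_time)
    return count(sdf.air_time .== max_air_time)
end
```

Group "A" has two rows with the maximum value 10, and group "B" has one row with the maximum value 7.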

It is time for a multi-step operation involving grouping, aggregation, and sorting:

@time @chain flights begin
    groupby(:month)
    combine(nrow, :dep_delay => mean)
    sort(:dep_delay_mean)
end;

Timings:

Julia 1.8.5:
  1.648610 seconds (778.81 k allocations: 43.905 MiB, 99.57% compilation time: 10% of which was recompilation)

Julia 1.9.0:
  0.560781 seconds (687.27 k allocations: 40.013 MiB, 2.39% gc time, 98.64% compilation time: 39% of which was recompilation)

Another big win. However, we see that we triggered recompilation, which means that there is room to improve the internal design
of the package ecosystem here.

We are now ready for the all-time favorite operation of all data scientists, that is, a join:

@time months = DataFrame(month=1:10,
                         month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                     "Jun", "Jul", "Aug", "Sep", "Oct"]);
@time leftjoin(flights, months, on=:month);

Timings:

Julia 1.8.5:
  0.033462 seconds (351 allocations: 16.797 KiB, 99.64% compilation time)
  4.870511 seconds (2.89 M allocations: 176.266 MiB, 1.21% gc time, 99.72% compilation time)

Julia 1.9.0:
  0.000090 seconds (26 allocations: 2.016 KiB)
  0.847289 seconds (343.69 k allocations: 48.378 MiB, 2.30% gc time, 94.35% compilation time)

We see another big win here. You might ask why we still see a lot of compilation in leftjoin
although no function is passed to it. The reason is that different data frames can have
different column types. Again, we cannot precompile code for all possible column types
that a user might use, so only the most common ones are covered by cached native code.
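
A toy join shows how column types enter the picture. The two data frames below are made up for illustration (only the column names follow the example above). Note that the month_name column changes its element type in the result, because rows without a match get missing:

```julia
using DataFrames

# Hypothetical data: month 3 has no match in the lookup table.
flights_toy = DataFrame(month=[1, 2, 2, 3], dep_delay=[5, 10, 0, 7])
months_toy = DataFrame(month=1:2, month_name=["Jan", "Feb"])

joined = leftjoin(flights_toy, months_toy, on=:month)

eltype(months_toy.month_name)  # String
eltype(joined.month_name)      # Union{Missing, String}
```

Every combination of key and value column types that a user brings in this way may need its own compiled code.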

The last benchmark is reshaping of a data frame, which is another commonly done operation.
In the example I use a non-trivial reshape that involves creating a pivot table:

@time unstack(flights, :month, :carrier, :carrier, combine=length);

Timings:

Julia 1.8.5:
  2.059941 seconds (2.96 M allocations: 147.041 MiB, 98.75% compilation time)

Julia 1.9.0:
  1.081823 seconds (1.56 M allocations: 95.952 MiB, 1.92% gc time, 98.65% compilation time)

The last test also shows noticeable improvements, so we are indeed happy in all cases.

It is natural to ask what the total running time of our whole analysis is. Here is the
code (leaving out package loading and data downloading). The test is run in a fresh
Julia session.

function test()
    flights = CSV.read(flights_bin, DataFrame)
    CSV.write("flights14.csv", flights)
    flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :]
    filter(row -> row.origin == "EWR" && row.dest == "PHL", flights)
    subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL")
    subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")))
    flights_idx = groupby(flights, [:origin, :dest])
    flights_idx[("EWR", "PHL")]
    combine(flights_idx) do sdf
        max_air_time = maximum(sdf.air_time)
        return count(sdf.air_time .== max_air_time)
    end
    @chain flights begin
        groupby(:month)
        combine(nrow, :dep_delay => mean)
        sort(:dep_delay_mean)
    end
    months = DataFrame(month=1:10,
                       month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                   "Jun", "Jul", "Aug", "Sep", "Oct"])
    leftjoin(flights, months, on=:month)
    unstack(flights, :month, :carrier, :carrier, combine=length)
end;
@time test();
@time test();

This time I ran the code twice to show how much time is saved when we do not need
to compile things:

Julia 1.8.5:
 19.994426 seconds (27.73 M allocations: 1.083 GiB, 1.79% gc time, 93.94% compilation time: 1% of which was recompilation)
  1.016076 seconds (13.86 M allocations: 403.160 MiB, 5.66% gc time)

Julia 1.9.0:
  7.453665 seconds (20.80 M allocations: 856.780 MiB, 3.52% gc time, 86.36% compilation time: 3% of which was recompilation)
  0.958529 seconds (13.86 M allocations: 403.147 MiB, 6.47% gc time)

As you can see, the first run is about 2.7 times faster on Julia 1.9.0 than on Julia 1.8.5 (7.45 versus 19.99 seconds).
The second run takes about one second on both versions (as expected), since it does not involve compilation.

Conclusions

There are three conclusions from our tests:

  • Caching of native code significantly improves the time of the first run of functions defined in packages. Julia 1.9.0 indeed lives up to its promise.
  • Still, to see these benefits package maintainers need to prepare their packages appropriately. From the tests we see that there is still room for improvement in this area.
  • Even with the best preparation of the packages you will still see run-time compilation.

The major reason why you will still see compilation is user-defined functions and data structures (they are not known during package precompilation, so native code handling them cannot be cached).
For some packages this is probably a minor limitation. However, for packages such as DataFrames.jl this is a challenge, as most analyses involve custom non-standard data transformations
and potentially non-standard column types. This means that in DataFrames.jl we really need to think hard about what to put into precompilation directives to ensure the best
user experience.

I hope you will enjoy using Julia 1.9.0!