Tag Archives: julialang

Is Makie.jl up to speed?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/12/01/plot.html

Introduction

Makie is a plotting ecosystem for the Julia language that is extremely feature-packed and actively developed.
Recently its core package Makie.jl reached version 0.20. Its core developer Simon told me that
the package now loads much faster than it was the case in the past.

The “time to first plot” issue is often raised by new users of the Julia ecosystem as important.
Therefore a lot of work was put both by Julia core developers and by package maintainers to reduce it.

In this post I thought that it would be interesting to check how CairoMakie.jl compares to Plots.jl.
The Plots.jl package is another great plotting ecosystem for Julia. It is more lightweight, so in the past it was seen
as faster, but less feature rich. Let us see how the situation stands currently.
From the Makie ecosystem I have chosen CairoMakie.jl as I typically need 2-D production-quality plots.

The code in this post was tested under Julia Version 1.10.0-rc1, CairoMakie.jl 0.11.2 and Plots.jl 1.39.0.

Installing the package

This, and the following tests, are done in separate Julia sessions in separate project environments
for both plotting ecosystems.

The first timing we make is package installation. The results are the following:

  • CairoMakie.jl 0.11.2: 241 dependencies successfully precompiled in 227 seconds
  • Plots.jl v1.39.0: 62 dependencies successfully precompiled in 78 seconds

CairoMakie.jl takes three times more time to install and has four times more dependencies.
Installation is one-time cost, however, there are two considerations to keep in mind:

  • New users are first faced with installation time so it is noticeable (this is especially relevant with Pluto.jl).
  • Since CairoMakie.jl has many more dependencies it is more likely that it will require recompilation when any of them gets updated.

Loading the package

After installing packages we can check how long it takes to load them:

julia> @time using CairoMakie
  3.580779 seconds (2.75 M allocations: 181.590 MiB, 4.39% gc time, 1.71% compilation time: 49% of which was recompilation)

vs

julia> @time using Plots
  1.296026 seconds (1.05 M allocations: 73.541 MiB, 6.00% gc time, 2.19% compilation time)

CairoMakie.jl takes around three times more time to load. This difference is noticeable, but I think it is not a show-stopper in most cases.
Having to wait 3.5 seconds for a package to load should be acceptable unless someone expects to run a really short-lived Julia session.

Simple plotting

Now comes the time to compare plotting time. Start with CairoMakie.jl:

julia> x = range(0, 10, length=100);

julia> y = sin.(x);

julia> @time lines(x, y)
  0.559800 seconds (476.77 k allocations: 32.349 MiB, 3.13% gc time, 94.66% compilation time)

julia> @time lines(x, y)
  0.012473 seconds (32.40 k allocations: 2.128 MiB)

vs Plots.jl:

julia> x = range(0, 10, length=100);

julia> y = sin.(x);

julia> @time plot(x, y)
  0.082866 seconds (9.16 k allocations: 648.188 KiB, 97.32% compilation time)

julia> @time plot(x, y)
  0.000508 seconds (484 allocations: 45.992 KiB)

The situation repeats. CairoMakie.jl is visibly slower, but having to wait 0.5 second for a first plot to be sent to the plotting engine is I think acceptable.
Note that the consecutive plots are much faster as they do not require compilation.

Conclusions

Given the timings I have gotten my judgment is as follows:

  • CairoMakie.jl is still visibly slower than Plots.jl.
  • Yet, CairoMakie.jl is in my opinion currently fast enough not to annoy users by requiring them to wait excessively long for a plot.

I think Makie maintainers, in combination with core Julia developers,
have done a fantastic job with improving time-to-first-plot in this ecosystem.

I can say that I decided to switch to Makie as my default plotting tool for larger projects.
However, I will probably for now still use Plots.jl in scenarios when I just want to start Julia and do a single quick plot
(especially on a machine where it has to be yet installed).

Boost your time to market with Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/24/pe104.html

Introduction

Today my post was inspired by an interesting write-up by Christopher Rackauckas:
ChatGPT performs better on Julia than Python (and R) for Large Language Model (LLM) Code Generation. Why?.
The general take away of this text is that writing Julia code with LLM support is efficient.

A thing that I think is relevant here is that when you write code you typically want to get your job done as fast as possible.
Getting the job done typically involves three steps (possibly repeated):

  • Thinking about the algorithm.
  • Implementing it.
  • Running the code.

Each of these steps takes time. And you, usually, want to get a solution in a minimum possible total time.

My experience is that Julia is really fast to code with once you learn it (and as Chris explained using LLMs
boosts this time even further).

Since Julia is compiled it also offers fast code execution times.

So we are left with algorithm design time. And this is the point that lead me to write this post.
The reason is that my experience is that I often decide to use non-optimal algorithms, because
with Julia I know my code will be reasonably fast anyway. This is especially the case if I can
see a simple solution that uses basic functionalities in-built into Julia and do not require too much thought.

To test-drive this assertion today I chose a Project Euler problem that I have not solved before and decided
to write down my thought process when solving it. My choice was problem 104, because it was a first relatively easy
(so I assumed I would not need to spend too much time on it) problem I have not solved yet.

The code was written under Julia 1.9.2.

The problem

The Project Euler problem 104 is stated as follows (abbreviated):

Find the number of the smallest Fibonacci number for which
its first nine digits contain all the digits 1 to 9 (not necessarily in order)
and the same condition holds for its last nine digits.

Thinking about the algorithm

The observations I made are the following:

  • I assumed that the problem is not trivial – I need to do some optimizations to solve it in a reasonable time.
  • The last nine digits are easier; I can just track values modulo 10^9 and easily fit the data into Int type.
  • The number formed by last nine digits must be divisible by 9 as the sum of numbers from 1 to 9 is 45.
  • The first nine digits are harder. I could get them by using the Binet’s Formula, but this would require analysis of round-off error of floating-point operations.
    This would take thinking time, so maybe it is enough to just use the BigInt values and check them only if the Int based test passes.
  • The good thing is tha Julia ships with the digits function so I can easily get a vector of digits of a given number.
  • I need to combine Int and BigInt calculations to cut down the processing time. Most of the time it should be enough to analyze just Int values.

As you can see, I decided not to invest too much into the thinking time,
hoping that the implementation of the above algorithm will be easy and its run time will be acceptable.

Implementing it

The following code contains the implementation of the algorithm following the observations I have described above:

function pe104()
    big1, big2 = big(1), big(1)
    small1, small2 = 1, 1
    k = 2
    while true
        k += 1
        big1, big2 = big1 + big2, big1
        small1, small2 = (small1 + small2) % 10^9, small1

        small1 % 9 == 0 &&
        sort!(digits(small1)) == 1:9 &&
        sort!(last(digits(big1), 9)) == 1:9 &&
        return k
    end
end

Note that we perform computations both on Int values (small1 and small2) and on BigInt values (big1 and big2).
In the code we do three tests to decide if a given k is good. The tests are ordered in the increasing order of computational cost.
Note that since the Int values are computed modulo 10^9 I do not need to do any trimming of vector of digits returned by digits(small1).

Running the code

Let us check how much time it takes to find the solution:

julia> @time pe104();
 16.634432 seconds (696.82 k allocations: 4.441 GiB, 3.01% gc time)

Note that I put ; at the end of the line to suppress printing of the solution (in hope to encourage you to run the code yourself).
The point is that the run time of the code is just a few seconds. The code is clearly not optimal and could be significantly faster.
However, I am sure that speeding it up would take me more time than I would save in run time. Thus, I am happy with what I got.

Conclusions

In general Julia allows you to write efficient code. There are many applications where it is crucial and it is worth to spend a lot of time
on optimization of run-time. This is especially true if you write your code once and run it millions of times.

However, in many cases, like the one presented today, Julia is fast enough so even code that is not fully performant is good enough.
Often such simpler code takes: less time to design, less time to implement, and less time to prove its correctness.

Is vector pooling considered harmful?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/17/pooling.html

Introduction

PooledArrays.jl is a package providing a custom array type that can be used for compression of your data if it has a few unique elements.
In this post I want to explain the design of PooledArray object and discuss how it affects its performance.

The post was written under Julia 1.9.2, PooledArrays.jl 1.4.3, and DataFrames.jl 1.6.1.

How pooling works

The definition of the PooledArray type is the following:

mutable struct PooledArray{T, R<:Integer, N, RA} <: AbstractArray{T, N}
    refs::RA
    pool::Vector{T}
    invpool::Dict{T,R}
    # refcount[] is 1 if only one PooledArray holds a reference to pool and invpool
    refcount::Threads.Atomic{Int}
end

It represents a N-dimensional array with element type T.
The internal representation of data is that each unique value in a PooledArray gets an integer
representation (reference) of type R. Then for each element of the array the refs field is an array
that is AbstractArray{R, N} and keeps track of what reference value is stored for each entry of an array.

To be able to efficiently work with these reference numbers PooledArray stores two fields:

  • pool that gives information how reference numbers are mapped to actual values of type T stored in the array;
  • invpool that gives an inverse information on what is a reference number of some value of type T.

Since many operations on pooled arrays do not change pool and invpool the PooledArray has an extra optimization
that automatically ensures that two arrays that can be proven to have the same pool and invpool share them.
The refcount field is used to keep track how many such arrays exist. This is used in case when we modify
pool and invpool of some array and want to make sure that we do not modify another PooledArray by accident.

Let us check these properties in practice.

Test driving pooled arrays

Let us create a simple PooledArray first:

julia> using PooledArrays

julia> pa1 = PooledArray(["a", "b", "a", "b", "a", "b"])
6-element PooledVector{String, UInt32, Vector{UInt32}}:
 "a"
 "b"
 "a"
 "b"
 "a"
 "b"

julia> pa1.refs
6-element Vector{UInt32}:
 0x00000001
 0x00000002
 0x00000001
 0x00000002
 0x00000001
 0x00000002

julia> pa1.pool
2-element Vector{String}:
 "a"
 "b"

julia> pa1.invpool
Dict{String, UInt32} with 2 entries:
  "b" => 0x00000002
  "a" => 0x00000001

julia> pa1.refcount
Base.Threads.Atomic{Int64}(1)

We can see that, by default, each entry is recorded as 4-byte UInt32.
Additionally the pa1.refcount tells us that only this pooled array
uses the pool and refpool objects that it references to.

Let us first check what happens when we operate on this array:

julia> pa2 = pa1[[1, 2, 3]]
3-element PooledVector{String, UInt32, Vector{UInt32}}:
 "a"
 "b"
 "a"

julia> pa2.refcount
Base.Threads.Atomic{Int64}(2)

julia> pa1.refcount
Base.Threads.Atomic{Int64}(2)

julia> pa2.pool === pa1.pool
true

julia> pa2.invpool === pa1.invpool
true

As you can see, since pa2 subsets pa1 we knew that they can share their pool and invpool.
The refcount field tells us that two objects reuse them.

Let us now modify the pool of pa2:

julia> pa2[1] = "c"
"c"

julia> pa2.refcount
Base.Threads.Atomic{Int64}(1)

julia> pa1.refcount
Base.Threads.Atomic{Int64}(1)

julia> pa2.pool
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> pa1.pool
2-element Vector{String}:
 "a"
 "b"

As you can see the pools got automatically decoupled and refcount is adjusted accordingly.

In summary, the benefit of pool-sharing is that we can very fast subset PooledArrays without having to re-create pool and invpool.
This makes working with PooledArray fast as long as we do not change the set of values we store in them.

The second important design aspect of PooledArray is the R type. As I have said, by default it is UInt32. However,
for small pools this is inefficient. Therefore you can write:

julia> pa3 = PooledArray(pa1, compress=true)
6-element PooledVector{String, UInt8, Vector{UInt8}}:
 "a"
 "b"
 "a"
 "b"
 "a"
 "b"

julia> Base.summarysize(pa1)
570

julia> Base.summarysize(pa3)
504

As you can see, you can use the compress=true keyword argument to automatically pick the minimal size of R type that is able to keep the pool at hand.
In our case it is UInt8, which would save a lot of memory in case of large arrays.
Why do we use UInt32 by default then?
The reason is that this type is typically efficient enough memory-wise and at the same time it ensures a pool that is large enough in most scenarios.
For example, the limitation of the UInt8 pool is that it can store up to 255 values only:

julia> pa4 = PooledArray(1:255, compress=true);

julia> pa4[1]=256
ERROR: You're using a PooledArray with ref type UInt8, which can only hold 255 values,
and you just tried to add the 256th reference.  Please change the ref type
to a larger int type, or use the default ref type (UInt32).

So you have a tradeoff here. If you are sure you will not change your pool then compress=true is a safe option.
If you know you might need to change the pool you need to pick the R type more carefully.

What are the benefits of pooled arrays?

There are two types of benefits of PooledArray. The first is memory footprint, the second is performance.
Let me explain them by example.

First create two large vectors storing strings:

julia> v1 = ["x$(isodd(i))" for i in 1:10^6];

julia> v2 = PooledArray(v1, compress=true);

julia> Base.summarysize(v1)
21500040

julia> Base.summarysize(v2)
1000507

As you can see there is a significant compression gain in our example by using a PooledArray.

The second benefit is performance, especially in combination with DataFrames.jl:

julia> using DataFrames

julia> df = DataFrame(; v1, v2)
1000000×2 DataFrame
     Row │ v1      v2
         │ String  String
─────────┼────────────────
       1 │ xtrue   xtrue
       2 │ xfalse  xfalse
       3 │ xtrue   xtrue
    ⋮    │   ⋮       ⋮
  999998 │ xfalse  xfalse
  999999 │ xtrue   xtrue
 1000000 │ xfalse  xfalse
       999994 rows omitted

Now let us perform two example operations (I am measuring the second timing to avoid counting compilation time).

The first is aggregation:

julia> combine(groupby(df, :v1), nrow);

julia> @time combine(groupby(df, :v1), nrow);
  0.025122 seconds (201 allocations: 31.271 MiB)

julia> combine(groupby(df, :v2), nrow);

julia> @time combine(groupby(df, :v2), nrow);
  0.002766 seconds (227 allocations: 7.643 MiB)

As you can see doing aggregation when grouping by PooledArray is much faster.

The second example is innerjoin:

julia> df_ref = df[1:2, :]
2×2 DataFrame
 Row │ v1      v2
     │ String  String
─────┼────────────────
   1 │ xtrue   xtrue
   2 │ xfalse  xfalse

julia> df_ref.val = 1:2
1:2

julia> innerjoin(df, df_ref, on=:v1, makeunique=true);

julia> @time innerjoin(df, df_ref, on=:v1, makeunique=true);
  0.057885 seconds (248 allocations: 36.741 MiB, 23.00% gc time)

julia> innerjoin(df, df_ref, on=:v2, makeunique=true);

julia> @time innerjoin(df, df_ref, on=:v2, makeunique=true);
  0.024692 seconds (265 allocations: 43.416 MiB)

And again we see that joining on :v2 is faster than on :v1.

When using pooled arrays might not be a good idea?

There are three cases when you might not see benefits from using PooledArray.

The first is when you have many unique values in your data. Then you have to pay the price
of storing refs, pool, and invpool objects and all of them will be large.

The second is if you have a value that you store that has a small memory footprint, e.g. Bool
and you did not use compress=true. In such a case refs will take more memory than original data would.

The third case is when you create multiple copies of a PooledArray and modify its pool. In such a case
the cost of copying of the pool and invpool fields might be non-negligible. Let me show you a practical
example of the third situation:

julia> df2 = DataFrame(id=1:10^6, v=PooledArray(repeat(["x$i" for i in 1:1000], 1000)))
1000000×2 DataFrame
     Row │ id       v
         │ Int64    String
─────────┼─────────────────
       1 │       1  x1
       2 │       2  x2
       3 │       3  x3
    ⋮    │    ⋮       ⋮
  999998 │  999998  x998
  999999 │  999999  x999
 1000000 │ 1000000  x1000
        999994 rows omitted

julia> df3 = DataFrame(id=1:10^6, v=repeat(["x$i" for i in 1:1000], 1000));

Note that df2 and df3 are identical except that in df2 the :v column is pooled and in df3 it is not.

Now let us test outerjoin on this data:

julia> outerjoin(df2, df2, on=:id, makeunique=true);

julia> @time outerjoin(df2, df2, on=:id, makeunique=true);
  0.065559 seconds (326 allocations: 30.951 MiB)

julia> outerjoin(df3, df3, on=:id, makeunique=true);

julia> @time outerjoin(df3, df3, on=:id, makeunique=true);
  0.036927 seconds (274 allocations: 38.400 MiB)

Note that working with non-pooled data is faster. If we check innerjoin this is not the case:

julia> innerjoin(df2, df2, on=:id, makeunique=true);

julia> @time innerjoin(df2, df2, on=:id, makeunique=true);
  0.018188 seconds (210 allocations: 30.528 MiB)

julia> innerjoin(df3, df3, on=:id, makeunique=true);

julia> @time innerjoin(df3, df3, on=:id, makeunique=true);
  0.029364 seconds (206 allocations: 38.157 MiB)

What is going on here? Let us look at the output:

julia> outerjoin(df2, df2, on=:id, makeunique=true)
1000000×3 DataFrame
     Row │ id       v        v_1
         │ Int64    String?  String?
─────────┼───────────────────────────
       1 │       1  x1       x1
       2 │       2  x2       x2
       3 │       3  x3       x3
    ⋮    │    ⋮        ⋮        ⋮
  999998 │  999998  x998     x998
  999999 │  999999  x999     x999
 1000000 │ 1000000  x1000    x1000
                  999994 rows omitted

julia> innerjoin(df2, df2, on=:id, makeunique=true)
1000000×3 DataFrame
     Row │ id       v       v_1
         │ Int64    String  String
─────────┼─────────────────────────
       1 │       1  x1      x1
       2 │       2  x2      x2
       3 │       3  x3      x3
    ⋮    │    ⋮       ⋮       ⋮
  999998 │  999998  x998    x998
  999999 │  999999  x999    x999
 1000000 │ 1000000  x1000   x1000
                999994 rows omitted

Note that outerjoin changes the element type of v and v_1 columns, so pool and invpool need to be re-created twice,
which takes time and memory.
In innerjoin the element type is not changed so pool and invpool in the output are reused from the source data frame.

Conclusions

In summary:

  • PooledArray is useful if you have data that has many duplicates.
  • The benefits of using PooledArray are lower memory footprint and the fact that some operations on it can be faster (e.g. groupby in DataFrames.jl).
  • The biggest benefit of PooledArray is when you do not change its pool of values. In such a case the pool and invpool objects are created only once
    and are reused in arrays derived from the source array.
  • Remember to carefully choose the type of the reference used in PooledArray. By default it is UInt32, but you can pick a smaller type to get even better compression
    at the expense of smaller number of unique values that your pooled array can store.

I hope you find the tips shared in this post useful in your data analysis processes.