Tag Archives: julialang

Partial function application in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/02/23/fix.html

Introduction

Some functions provided in Base Julia support partial application.
I often find this functionality useful.
Therefore in this post I want to give you its explanation and a summary which functions have this property.

The post was tested with Julia Version 1.12.0-DEV.53.

Explaining partial function application

We will focus on partial application of functions having two positional arguments.
Let us work by example.

Consider the in function. You can call it to check if some item is in a collection.
Here is an example:

julia> in('a', "Abracadabra")
true

julia> in('x', "Abracadabra")
false

A common pattern you might need is to perform a repeated check if various items are contained in the same collection.
For example assume you have a vector of characters and you want to filer it to keep only the elements contained in a reference collection.
You can do it like this:

julia> v = 'a':'z'
'a':1:'z'

julia> filter(x -> in(x, "Abracadabra"), v)
5-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)

This pattern is so commonly needed that there is a shorthand for x -> in(x, "Abracadabra").
Instead of creating this anonymous function you can just write in("Abracadabra").
The value returned by this function call behaves in the same way as x -> in(x, "Abracadabra").
Let us check:

julia> filter(in("Abracadabra"), v)
5-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)

You can think of this operation as if we partially applied the in function by fixing its second argument
(the collection) and leaving the first (the item we check) to be specified later.

In other words the following two operations are equivalent:

julia> in('a', "Abracadabra")
true

julia> in("Abracadabra")('a')
true

Fixing of the second argument is most common. However, sometimes it is useful to fix the first argument.
This is exactly the case of the filter function we have just used.

What if you wanted to perform the filter(in("Abracadabra"), v) test for multiple different values of v but with a fixed predicate function?
Here is an example:

julia> vv = ['a'+i:'z' for i in 0:4]
5-element Vector{StepRange{Char, Int64}}:
 'a':1:'z'
 'b':1:'z'
 'c':1:'z'
 'd':1:'z'
 'e':1:'z'

julia> map(v -> filter(in("Abracadabra"), v), vv)
5-element Vector{Vector{Char}}:
 ['a', 'b', 'c', 'd', 'r']
 ['b', 'c', 'd', 'r']
 ['c', 'd', 'r']
 ['d', 'r']
 ['r']

You probably see, where I am getting at. Instead of v -> filter(in("Abracadabra"), v) we can write filter(in("Abracadabra")) and fix
the first positional argument of filter, leaving the second to be specified later.
Let us check if this works:

julia> map(filter(in("Abracadabra")), vv)
5-element Vector{Vector{Char}}:
 ['a', 'b', 'c', 'd', 'r']
 ['b', 'c', 'd', 'r']
 ['c', 'd', 'r']
 ['d', 'r']
 ['r']

Indeed, we get what we expected. Again, for a reference note that the following two operations are equivalent:

julia> filter(in("Abracadabra"), v)
5-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)

julia> filter(in("Abracadabra"))(v)
5-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)

Before I finish this section let me note that if you do not like writing that many parentheses you could use the |> operator.
In our example we could write:

julia> map("Abracadabra" |> in |> filter, vv)
5-element Vector{Vector{Char}}:
 ['a', 'b', 'c', 'd', 'r']
 ['b', 'c', 'd', 'r']
 ['c', 'd', 'r']
 ['d', 'r']
 ['r']

Which style you use is a matter of preference.

Catalogue of function supporting partial applications

We saw that some functions taking two arguments support partial application.
Below I give you a list of all of them that are currently supported (and this is the reason why the post is written under Julia nightly,
as there were recent changes in this list).

There is only one function in Base Julia that supports fixing its first argument and this function is filter.

However, there are many functions supporting fixing of their second argument. Here is their list:

  • comparisons: isequal, ==, !=, >=, <=, >, <;
  • inclusion testing: in, , , , ;
  • string checking: contains, occursin, endswith, startswith;
  • set operations (supported since Julia 1.11; not released yet): issubset,, , , , , , isdisjoint, issetequal.

Conclusions

After reading this post you know how to use partial function application in Julia and which functions from Base support it.
I hope you will find this functionality useful in your code.

Changes in random number generation performance in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/02/16/rng.html

Introduction

Today I want to present a small benchmark of random number generation
performance improvements between current Julia release 1.10.1 and
current LTS version 1.6.7.

The idea for the benchmark follows a discussion with a friend who needed
to run some compute intensive Julia code on its LTS version.

The post was written under Julia 1.10.1 and Julia 1.6.7.

The benchmark

Let us start with presenting the benchmark functions

function test_rand1()
    s = 0
    for i in 1:10^9
        s += rand(1:1_000_000)
    end
    return s
end

function test_rand2()
    s = 0.0
    for i in 1:10^9
        s += rand()
    end
    return s
end

They are relatively simple. I wanted to compare the performance of:
(1) integer generation from some range and (2) generation of floating point numbers from [0, 1) interval
as these are two most common scenarios in practice.

Let us see the results. First comes Julia 1.6.7:

julia> @time test_rand1()
  4.949335 seconds (13 allocations: 35.406 KiB)
499993991047124

julia> @time test_rand1()
  4.663646 seconds
499998112691460

julia> @time test_rand2()
  2.175424 seconds
5.000141761909688e8

julia> @time test_rand2()
  2.238839 seconds
4.9999424544883996e8

And now we have Julia 1.10.1:

julia> @time test_rand1()
  2.355028 seconds
500001818410630

julia> @time test_rand1()
  2.287840 seconds
499998082399284

julia> @time test_rand2()
  1.123886 seconds
5.000026226340503e8

julia> @time test_rand2()
  1.117811 seconds
4.9999201274214923e8

So we see that things run roughly two times faster.

Some additional remarks

What is the reason for this difference?
The major point is that between Julia 1.6.7 and Julia 1.10.1
a default random number generator was changed. Let us see
(below I use copy to ensure explicit instantiation of the random number generator object under Julia 1.10.1).

Again, first we test Julia 1.6.7:

julia> using Random

julia> copy(Random.default_rng())
MersenneTwister(0x2fe644ceb724000ca5e5b4409dc3c6ea, (0, 4502994048, 4502993046, 986, 2502992778, 986))

and next we check Julia 1.10.1:

julia> using Random

julia> copy(Random.default_rng())
Xoshiro(0x1273707731737276, 0x187b3d2e82fb1d48, 0x13f9fd1a82642acb, 0xa7dcba727da742e6, 0x3ed2b4d410aa4b31)

So indeed, we see that MersenneTwister was replaced by Xoshiro generator (to be exact Xoshiro256++).

This has one important consequence, apart from random number generation speed that is related to seeding
of the generator. Let us check, Julia 1.6.7:

julia> Random.seed!(1)
MersenneTwister(1)

julia> rand()
0.23603334566204692

vs Julia 1.10.1:

julia> Random.seed!(1)
TaskLocalRNG()

julia> rand()
0.07336635446929285

This means that when you use the default random number generator you should not expect reproducibility of results between these two Julia versions.
This lack of stability is documented as not ensured across Julia versions.

If you need to ensure such reproducibility you can use e.g. the StableRNGs.jl package.

Conclusions

The topic of changes in random number generation in Julia is probably well known to people doing compute intensive simulations.
However, I thought it is worth to present these results for new users, who might be using different versions of Julia to execute the same code
and wonder why the performance or the results themselves are different across them.

One puzzle, two solutions

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/02/09/pe116.html

Introduction

Today, I wanted to switch back to a lighter subject.
Therefore I decided to have a look at my favorite Project Euler website.

I picked the problem 116 as I have not tried to solve it yet.
Interestingly, it turned out that there are two ways to approach this puzzle,
so I thought to share them here.

The post was written under Julia 1.10.0.

The puzzle

The Project Euler puzzle 116 can be briefly stated as follows:

Given a row of 50 grey squares is to have a number of its tiles replaced with
coloured oblong tiles chosen from red (length two),
green (length three), or blue (length four).
How many different ways can the grey tiles be replaced if colours
cannot be mixed and at least one coloured tile must be used?

(If you want to see some visual examples of valid tilings, I encourage you to
visit the puzzle 116 page.)

The first approach

When we think of this problem, it is natural to generalize it. By C(n, d) we can
define the number of ways that n gray squares can be replaced with tiles of length d.
Then the solution to our problem is C(n, 2) + C(n, 3) + C(n, 4).
So let us focus on computing C(n, d) (assuming d is positive).

The first approach is to ask how many tiles of length d can be put. There must be at least
1, and we cannot put more than n ÷ d (here I use the ÷ notation taken from Julia that
denotes integer division; in other words the integer part of n / d).

So now assume that we want to put i blocks of length d (assuming i is valid). In
how many ways can we do it. Well, we put i long blocks and we are left with n - d*i gray blocks.
In total we have i + (n - d*i) blocks. You can then think of it as you have that many slots
from which you need pick i slots to put the long blocks. The number of ways you can do it is
given by the value of binomial coefficient. In Julia notation it is:
binomial(BigInt(i + (n - d*i)), BigInt(i)).

Now you might ask why I put the BigInt wrapper around the passed numbers? The reason is
that binomial coefficient gets large pretty quickly, so I want to make sure I will not
have issues with integer overflow.

Given these considerations the first function that produces C(n, d) can be defined as:

function C1(n::Integer, d::Integer)
    @assert d > 0 && n >= 0
    return sum(i -> binomial(BigInt(i + (n - d*i)), BigInt(i)), 1:n ÷ d; init=big"0")
end

Note that I use the init=big"0" initialization statement in the sum to ensure the
correct handling of the cases when n < d when we are given an empty collection to sum over.

The second approach

However, there is a different way how we can think of computing C(n, d).

Assume we know the values of C(n, d) for values of n smaller than the requested one.

We look at the last tile in our row.

If it is empty, then we are down to n-1 tiles to be filled.
This can be done in C(n-1, d) ways (remember that this value takes care
of the fact that at least one block of length d has to be used).

But what if the last tile in our row is filled with a block of length d?
Then we have two options. Either all other blocks are left gray (which gives us 1 combination)
or we are left with n-d tiles that are filled with at least one block of length d. The
second value is exactly C(n-d, d).

In summary we get that C(n, d) = C(n-1, d) + C(n-d, d) + 1.

This formula assumes n is at least d. But clearly for n < d
we have 0 ways to arrange the blocks.

Let us write down the code that performs the required computation:

function C2(n::Integer, d::Integer)
    @assert d > 0 && n >= 0
    npos = Dict{Int,BigInt}(i => 0 for i in 0:d-1)
    for j in d:n
        npos[j] = npos[j-1] + npos[j-d] + 1
    end
    return npos[n]
end

Note in the code that I used the npos dictionary to flexibly allow
for any potential integer values of n. The dictionary has
Dict{Int,BigInt} type, again, to ensure that the results of the computations
are stored correctly even if they are large.

Testing

Now we have two functions C1 and C2 that look completely differently.
Do they produce the same results. Let us check:

julia> using Test

julia> @testset "test C1 and C2 equality" begin
           for n in 0:200, d in 1:20
               @test C1(n, d) == C2(n, d)
           end
       end;
Test Summary:           | Pass  Total  Time
test C1 and C2 equality | 4020   4020  0.9s

Indeed we see that both C1 and C2 functions produce the same results.

To convince ourselves that using arbitrary precision integers was indeed needed
let us check some example values of the functions:

julia> C1(200, 2)
453973694165307953197296969697410619233825

julia> C2(200, 2)
453973694165307953197296969697410619233825

julia> typemax(Int)
9223372036854775807

Indeed, if we were not careful, we would have an integer overflow issue.

Conclusions

As usual I will not show the value of the solution to the problem to encourage you
to run the code yourself. You can get it by executing either
sum(d -> C1(50, d), 2:4) or sum(d -> C2(50, d), 2:4).
(We have just checked that the value produced in both cases is the same).