MRI Scan Simulation Made Easy with BlochSim.jl

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/mri-scan-simulation-made-easy-with-blochsimjl

MRI scans manipulate magnetization vectors to produce an observable signal.

Often it can be useful to simulate what signal an MRI scan will generate. Example scenarios include simulating MRI signals for educational purposes, for developing a new MRI scan, or for fitting model parameters to an acquired signal.

In this post, we will learn how to use the Julia programming language, a free and open source programming language that excels especially in scientific computing, to simulate the signal generated by an MRI scan. Specifically, we will use the BlochSim.jl package to simulate a balanced steady-state free precession (bSSFP) scan.

Note that this post assumes a basic understanding of MRI pulse sequences and Bloch simulations. See our previous posts for an overview of the Bloch equations and an overview of BlochSim.jl.

Prerequisites

To run the code in this post, both Julia and BlochSim.jl need to be installed. See our previous post for installation instructions.

bSSFP

First, we need to review bSSFP so we know how to simulate it.

Here is a pulse sequence diagram of bSSFP (source).

For this post, we are not simulating signal localization, so we will ignore the imaging gradients.

Thus, for each TR we will simulate

  1. excitation,

  2. free precession until the echo time, and

  3. free precession until the end of the TR.

This, by itself, would be essentially the same simulation as done in our previous post. However, two aspects of bSSFP make simulating it a bit more interesting.

  1. bSSFP, as the name states, is a steady-state scan. This means we don’t simulate just one excitation followed by free precession to the echo time. Instead, we need to simulate multiple TRs until we get to steady-state. We can do this either by manually simulating multiple TRs or by mathematically computing the steady-state magnetization. More on this later.

  2. bSSFP utilizes phase-cycling, where the phase of the RF pulse is incremented after each TR.

Therefore, our goal is to write a Julia function (that we will call bssfp) that computes the steady-state signal given by a bSSFP scan that uses phase-cycling.

Throughout this post, we will be editing a text file called bssfp.jl that will load BlochSim.jl and define our bssfp function.

Function Outline

We will begin with an outline of what our bssfp function needs to do.

# bssfp.jlusing BlochSim# Step 0: Specify the inputs to the function.function bssfp(inputs)    # Step 1: Create a `Spin` object.    # Step 2: Create the RF pulse.    # Step 3: Compute the steady-state signal.end

First, we need to specify what inputs bssfp needs. We need values for M0, T1, T2, and off-resonance frequency \( \Delta f \) to create a Spin object, a flip angle \( \alpha \) to create the RF pulse, and the TR, TE, and phase-cycling factor \( \Delta\phi \) to use. We will also add an input that specifies how to compute the steady-state magnetization.

# bssfp.jlusing BlochSimfunction bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)    # Step 1: Create a `Spin` object.    # Step 2: Create the RF pulse.    # Step 3: Compute the steady-state signal.end

Here we see that Julia allows using Greek letters in variable names, which helps the code look more like the math, enhancing readability. (To type, e.g., , type \alpha and then press TAB, and similarly for other Greek letters.)

We also made manual_ss an optional argument. If the user does not specify manual_ss, it will default to false.

Now we can focus on writing code for each step outlined above.

Step 1: Create a Spin Object

As discussed in our previous post, a Spin object is created as follows.

spin = Spin(M0, T1, T2, f)

That’s it!

So let’s update bssfp.

# bssfp.jlusing BlochSimfunction bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)    # Step 1: Create a `Spin` object.    spin = Spin(M0, T1, T2, f)    # Step 2: Create the RF pulse.    # Step 3: Compute the steady-state signal.end

Step 2: Create the RF Pulse

In this simulation, we will assume the excitation pulse is very short relative to relaxation and off-resonance effects. In this case, we will use an InstantaneousRF object for the RF pulse that will simulate excitation by just rotating the magnetization vector.

rf = InstantaneousRF()

Let’s add that to bssfp.

# bssfp.jlusing BlochSimfunction bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)    # Step 1: Create a `Spin` object.    spin = Spin(M0, T1, T2, f)    # Step 2: Create the RF pulse.    rf = InstantaneousRF()    # Step 3: Compute the steady-state signal.end

Step 3: Compute the Steady-State Signal

Now we are ready to tackle the core of the simulation. As mentioned earlier, there are two ways to go about computing the steady-state signal:

  1. we can manually simulate multiple TRs, or

  2. we can mathematically compute the steady-state magnetization.

We will discuss how to do both.

Version 1: Simulate Multiple TRs

The idea behind simulating multiple TRs is to simulate for long enough so that the magnetization reaches a steady-state. A good rule of thumb is to simulate for a few T1’s worth of time. Thus, the number of TRs we will simulate is 3T1 / TR, rounded up so we get an integer value.

nTR = ceil(Int, 3 * T1 / TR)

(Of course, the longer we simulate the closer to the true steady-state we will get.)

Then for each TR, we have excitation followed by free precession. But we also need to remember to increment the phase of the RF pulse. That means we will need to create a new InstantaneousRF object for each excitation. In this case, we will pass in the flip angle as we did previously, but we will also pass in the phase of the RF pulse (which will be the phase of the previous pulse plus the phase-cycling increment).

for i = 1:nTR    excite!(spin, rf)    freeprecess!(spin, TR)    rf = InstantaneousRF(, rf. + )end

Now we have the magnetization at the end of the TR, but we want the signal at the echo time. So we need to excite one more time, and then free precess until the echo time.

excite!(spin, rf)freeprecess!(spin, TE)

Then the steady-state signal can be obtained. One detail to remember, however, is that when doing phase-cycling, the phase of the receiver coil is matched to the phase of the RF pulse. To simulate this, we will multiply our signal by a phase factor.

sig = signal(spin) * cis(-rf.)

(cis() is an efficient way to compute \( e^{i \theta} \) .)

And there we have our steady-state signal!

Let’s update bssfp.

# bssfp.jlusing BlochSimfunction bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)    # Step 1: Create a `Spin` object.    spin = Spin(M0, T1, T2, f)    # Step 2: Create the RF pulse.    rf = InstantaneousRF()    # Step 3: Compute the steady-state signal.    if manual_ss        # Version 1: Simulate multiple TRs.        nTR = ceil(Int, 3 * T1 / TR)        for i = 1:nTR            excite!(spin, rf)            freeprecess!(spin, TR)            rf = InstantaneousRF(, rf. + )        end        excite!(spin, rf)        freeprecess!(spin, TE)        sig = signal(spin) * cis(-rf.)    else        # Version 2: Mathematically compute the steady-state.        # This version is used by default if `manual_ss` is not provided.    end    return sigend

Now let’s see how to compute the steady-state signal a different way.

Version 2: Mathematically Compute the Steady-State

In steady-state, a magnetization vector immediately after excitation (call it \( \mathbf{M}^{+} \) ) is the same as the one immediately after the previous excitation (call it \( \mathbf{M} \) ). Let \( \mathbf{A} \) and \( \mathbf{B} \) be such that \( \mathbf{A} \mathbf{M} + \mathbf{B} \) applies free precession to the magnetization vector, and let \( \mathbf{R} \) be the rotation matrix that represents the instantaneous excitation. Then in steady-state we have

\[\mathbf{M}^{+}=\mathbf{R}(\mathbf{A}\mathbf{M}+\mathbf{B})=\mathbf{M}.\]

Solving for \( \mathbf{M} \) gives us

\[\mathbf{M}=(\mathbf{I}\mathbf{R}\mathbf{A})^{1}\mathbf{R}\mathbf{B},\]

where \( \mathbf{I} \) is the identity matrix.

This derivation assumes a constant RF phase, so we need to account for phase-cycling somehow. Phase-cycling in bSSFP shifts the off-resonance profile, so it will suffice to use a Spin object with a suitably chosen off-resonance frequency to simulate phase-cycling.

Let’s use this derivation in our code.

spin_pc = Spin(M0, T1, T2, f - ( / 2 / (TR / 1000)))(R,) = excite(spin_pc, rf)(A, B) = freeprecess(spin_pc, TR)M = (I - R * A) \ (R * B)

Here, I comes from the LinearAlgebra standard library module, so we will need to make sure to load it.

Notice we didn’t modify spin, so we need to make sure to copy the computed steady-state magnetization to spin.

copyto!(spin.M, M)

The above gives us the steady-state magnetization immediately after excitation, so now we need to simulate free precession until the echo time.

freeprecess!(spin, TE)

Then we can get the steady-state signal.

sig = signal(spin)

Updating bssfp gives us the following.

# bssfp.jlusing BlochSim, LinearAlgebrafunction bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)    # Step 1: Create a `Spin` object.    spin = Spin(M0, T1, T2, f)    # Step 2: Create the RF pulse.    rf = InstantaneousRF()    # Step 3: Compute the steady-state signal.    if manual_ss        # Version 1: Simulate multiple TRs.        nTR = ceil(Int, 3 * T1 / TR)        for i = 1:nTR            excite!(spin, rf)            freeprecess!(spin, TR)            rf = InstantaneousRF(, rf. + )        end        excite!(spin, rf)        freeprecess!(spin, TE)        sig = signal(spin) * cis(-rf.)    else        # Version 2: Mathematically compute the steady-state.        # This version is used by default if `manual_ss` is not provided.        spin_pc = Spin(M0, T1, T2, f - ( / 2 / (TR / 1000)))        (R,) = excite(spin_pc, rf)        (A, B) = freeprecess(spin_pc, TR)        M = (I - R * A) \ (R * B)        copyto!(spin.M, M)        freeprecess!(spin, TE)        sig = signal(spin)    end    return sigend

Now bssfp is complete and ready to use!

Using bssfp

To use bssfp, we need to run the bssfp.jl file that loads BlochSim.jl and defines bssfp.

julia> include("bssfp.jl")bssfp (generic function with 2 methods)

First, let’s make sure the two methods for computing the steady-state magnetization give approximately the same result.

julia> (M0, T1, T2, f, , TR, TE, ) = (1, 1000, 80, 0, /6, 5, 2.5, /2)(1, 1000, 80, 0, 0.5235987755982988, 5, 2.5, 1.5707963267948966)julia> sig_manual = bssfp(M0, T1, T2, f, , TR, TE, , true)0.09889476203144602 + 0.09290409266118088imjulia> sig_exact = bssfp(M0, T1, T2, f, , TR, TE, )0.09880282817239999 + 0.09281666742806782imjulia> isapprox(sig_exact, sig_manual; rtol = 0.001)true

The results are less than 0.1% different from each other, indicating the manually simulated steady-state does approach the mathematically computed steady-state, as expected.

Now let’s use bssfp to plot the characteristic bSSFP off-resonance profile. First we compute the steady-state signal for various off-resonance values.

julia> (M0, T1, T2, , TR, TE, , N) = (1, 1000, 80, /6, 5, 2.5, , 801)(1, 1000, 80, 0.5235987755982988, 5, 2.5, , 801)julia> f = range(-2 / (TR / 1000), 2 / (TR / 1000), N)-400.0:1.0:400.0julia> sig = bssfp.(M0, T1, T2, f, , TR, TE, );

Note the dot (.) when calling bssfp. This essentially is shorthand for

sig = zeros(ComplexF64, length(f))for i = 1:length(f)    sig[i] = bssfp(M0, T1, T2, f[i], , TR, TE, )end

(See our blog post on broadcasting for more information.)

And now we can plot the results.

julia> using Plotsjulia> pmag = plot(f, abs.(sig); label = "", title = "bSSFP Off-Resonance Profile", ylabel = "Magnitude", color = 1);julia> pphase = plot(f, angle.(sig); label = "", xlabel = "Off-Resonance (Hz)", ylabel = "Phase (rad)", yticks = ([-, 0, ], ["-", "0", ""]), color = 2);julia> plot(pmag, pphase; layout = (2, 1))

Summary

In this post, we have learned how to use BlochSim.jl to simulate the steady-state signal obtained from a bSSFP scan. We used the bssfp function we wrote to plot the characteristic bSSFP off-resonance profile to demonstrate that our code produces the expected results.

Additional Links

MRI Scan Simulation Made Easy with BlochSim.jl

By: Steven Whitaker

Re-posted from: https://blog.glcs.io/mri-scan-simulation-made-easy-with-blochsimjl

MRI scans manipulate magnetization vectors to produce an observable signal.

Often it can be useful to simulate what signal an MRI scan will generate. Example scenarios include simulating MRI signals for educational purposes, for developing a new MRI scan, or for fitting model parameters to an acquired signal.

In this post, we will learn how to use the Julia programming language, a free and open source programming language that excels especially in scientific computing, to simulate the signal generated by an MRI scan. Specifically, we will use the BlochSim.jl package to simulate a balanced steady-state free precession (bSSFP) scan.

Note that this post assumes a basic understanding of MRI pulse sequences and Bloch simulations. See our previous posts for an overview of the Bloch equations and an overview of BlochSim.jl.

Prerequisites

To run the code in this post, both Julia and BlochSim.jl need to be installed. See our previous post for installation instructions.

bSSFP

First, we need to review bSSFP so we know how to simulate it.

Here is a pulse sequence diagram of bSSFP (source).

For this post, we are not simulating signal localization, so we will ignore the imaging gradients.

Thus, for each TR we will simulate

  1. excitation,

  2. free precession until the echo time, and

  3. free precession until the end of the TR.

This, by itself, would be essentially the same simulation as done in our previous post. However, two aspects of bSSFP make simulating it a bit more interesting.

  1. bSSFP, as the name states, is a steady-state scan. This means we don’t simulate just one excitation followed by free precession to the echo time. Instead, we need to simulate multiple TRs until we get to steady-state. We can do this either by manually simulating multiple TRs or by mathematically computing the steady-state magnetization. More on this later.

  2. bSSFP utilizes phase-cycling, where the phase of the RF pulse is incremented after each TR.

Therefore, our goal is to write a Julia function (that we will call bssfp) that computes the steady-state signal given by a bSSFP scan that uses phase-cycling.

Throughout this post, we will be editing a text file called bssfp.jl that will load BlochSim.jl and define our bssfp function.

Function Outline

We will begin with an outline of what our bssfp function needs to do.

# bssfp.jl

using BlochSim

# Step 0: Specify the inputs to the function.
function bssfp(inputs)

    # Step 1: Create a `Spin` object.
    # Step 2: Create the RF pulse.
    # Step 3: Compute the steady-state signal.

end

First, we need to specify what inputs bssfp needs. We need values for M0, T1, T2, and off-resonance frequency \( \Delta f \) to create a Spin object, a flip angle \( \alpha \) to create the RF pulse, and the TR, TE, and phase-cycling factor \( \Delta\phi \) to use. We will also add an input that specifies how to compute the steady-state magnetization.

# bssfp.jl

using BlochSim

function bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)

    # Step 1: Create a `Spin` object.
    # Step 2: Create the RF pulse.
    # Step 3: Compute the steady-state signal.

end

Here we see that Julia allows using Greek letters in variable names, which helps the code look more like the math, enhancing readability. (To type, e.g., , type \alpha and then press TAB, and similarly for other Greek letters.)

We also made manual_ss an optional argument. If the user does not specify manual_ss, it will default to false.

Now we can focus on writing code for each step outlined above.

Step 1: Create a Spin Object

As discussed in our previous post, a Spin object is created as follows.

spin = Spin(M0, T1, T2, f)

That’s it!

So let’s update bssfp.

# bssfp.jl

using BlochSim

function bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)

    # Step 1: Create a `Spin` object.
    spin = Spin(M0, T1, T2, f)

    # Step 2: Create the RF pulse.
    # Step 3: Compute the steady-state signal.

end

Step 2: Create the RF Pulse

In this simulation, we will assume the excitation pulse is very short relative to relaxation and off-resonance effects. In this case, we will use an InstantaneousRF object for the RF pulse that will simulate excitation by just rotating the magnetization vector.

rf = InstantaneousRF()

Let’s add that to bssfp.

# bssfp.jl

using BlochSim

function bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)

    # Step 1: Create a `Spin` object.
    spin = Spin(M0, T1, T2, f)

    # Step 2: Create the RF pulse.
    rf = InstantaneousRF()

    # Step 3: Compute the steady-state signal.

end

Step 3: Compute the Steady-State Signal

Now we are ready to tackle the core of the simulation. As mentioned earlier, there are two ways to go about computing the steady-state signal:

  1. we can manually simulate multiple TRs, or

  2. we can mathematically compute the steady-state magnetization.

We will discuss how to do both.

Version 1: Simulate Multiple TRs

The idea behind simulating multiple TRs is to simulate for long enough so that the magnetization reaches a steady-state. A good rule of thumb is to simulate for a few T1’s worth of time. Thus, the number of TRs we will simulate is 3T1 / TR, rounded up so we get an integer value.

nTR = ceil(Int, 3 * T1 / TR)

(Of course, the longer we simulate the closer to the true steady-state we will get.)

Then for each TR, we have excitation followed by free precession. But we also need to remember to increment the phase of the RF pulse. That means we will need to create a new InstantaneousRF object for each excitation. In this case, we will pass in the flip angle as we did previously, but we will also pass in the phase of the RF pulse (which will be the phase of the previous pulse plus the phase-cycling increment).

for i = 1:nTR
    excite!(spin, rf)
    freeprecess!(spin, TR)
    rf = InstantaneousRF(, rf. + )
end

Now we have the magnetization at the end of the TR, but we want the signal at the echo time. So we need to excite one more time, and then free precess until the echo time.

excite!(spin, rf)
freeprecess!(spin, TE)

Then the steady-state signal can be obtained. One detail to remember, however, is that when doing phase-cycling, the phase of the receiver coil is matched to the phase of the RF pulse. To simulate this, we will multiply our signal by a phase factor.

sig = signal(spin) * cis(-rf.)

(cis() is an efficient way to compute \( e^{i \theta} \) .)

And there we have our steady-state signal!

Let’s update bssfp.

# bssfp.jl

using BlochSim

function bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)

    # Step 1: Create a `Spin` object.
    spin = Spin(M0, T1, T2, f)

    # Step 2: Create the RF pulse.
    rf = InstantaneousRF()

    # Step 3: Compute the steady-state signal.
    if manual_ss
        # Version 1: Simulate multiple TRs.
        nTR = ceil(Int, 3 * T1 / TR)
        for i = 1:nTR
            excite!(spin, rf)
            freeprecess!(spin, TR)
            rf = InstantaneousRF(, rf. + )
        end
        excite!(spin, rf)
        freeprecess!(spin, TE)
        sig = signal(spin) * cis(-rf.)
    else
        # Version 2: Mathematically compute the steady-state.
        # This version is used by default if `manual_ss` is not provided.
    end

    return sig

end

Now let’s see how to compute the steady-state signal a different way.

Version 2: Mathematically Compute the Steady-State

In steady-state, a magnetization vector immediately after excitation (call it \( \mathbf{M}^{+} \) ) is the same as the one immediately after the previous excitation (call it \( \mathbf{M} \) ). Let \( \mathbf{A} \) and \( \mathbf{B} \) be such that \( \mathbf{A} \mathbf{M} + \mathbf{B} \) applies free precession to the magnetization vector, and let \( \mathbf{R} \) be the rotation matrix that represents the instantaneous excitation. Then in steady-state we have

\[
\mathbf{M}^{+}=\mathbf{R}(\mathbf{A}\mathbf{M}+\mathbf{B})=\mathbf{M}.
\]

Solving for \( \mathbf{M} \) gives us

\[
\mathbf{M}=(\mathbf{I}\mathbf{R}\mathbf{A})^{1}\mathbf{R}\mathbf{B},
\]

where \( \mathbf{I} \) is the identity matrix.

This derivation assumes a constant RF phase, so we need to account for phase-cycling somehow. Phase-cycling in bSSFP shifts the off-resonance profile, so it will suffice to use a Spin object with a suitably chosen off-resonance frequency to simulate phase-cycling.

Let’s use this derivation in our code.

spin_pc = Spin(M0, T1, T2, f - ( / 2 / (TR / 1000)))
(R,) = excite(spin_pc, rf)
(A, B) = freeprecess(spin_pc, TR)
M = (I - R * A) \ (R * B)

Here, I comes from the LinearAlgebra standard library module, so we will need to make sure to load it.

Notice we didn’t modify spin, so we need to make sure to copy the computed steady-state magnetization to spin.

copyto!(spin.M, M)

The above gives us the steady-state magnetization immediately after excitation, so now we need to simulate free precession until the echo time.

freeprecess!(spin, TE)

Then we can get the steady-state signal.

sig = signal(spin)

Updating bssfp gives us the following.

# bssfp.jl

using BlochSim, LinearAlgebra

function bssfp(M0, T1, T2, f, , TR, TE, , manual_ss = false)

    # Step 1: Create a `Spin` object.
    spin = Spin(M0, T1, T2, f)

    # Step 2: Create the RF pulse.
    rf = InstantaneousRF()

    # Step 3: Compute the steady-state signal.
    if manual_ss
        # Version 1: Simulate multiple TRs.
        nTR = ceil(Int, 3 * T1 / TR)
        for i = 1:nTR
            excite!(spin, rf)
            freeprecess!(spin, TR)
            rf = InstantaneousRF(, rf. + )
        end
        excite!(spin, rf)
        freeprecess!(spin, TE)
        sig = signal(spin) * cis(-rf.)
    else
        # Version 2: Mathematically compute the steady-state.
        # This version is used by default if `manual_ss` is not provided.
        spin_pc = Spin(M0, T1, T2, f - ( / 2 / (TR / 1000)))
        (R,) = excite(spin_pc, rf)
        (A, B) = freeprecess(spin_pc, TR)
        M = (I - R * A) \ (R * B)
        copyto!(spin.M, M)
        freeprecess!(spin, TE)
        sig = signal(spin)
    end

    return sig

end

Now bssfp is complete and ready to use!

Using bssfp

To use bssfp, we need to run the bssfp.jl file that loads BlochSim.jl and defines bssfp.

julia> include("bssfp.jl")
bssfp (generic function with 2 methods)

First, let’s make sure the two methods for computing the steady-state magnetization give approximately the same result.

julia> (M0, T1, T2, f, , TR, TE, ) = (1, 1000, 80, 0, /6, 5, 2.5, /2)
(1, 1000, 80, 0, 0.5235987755982988, 5, 2.5, 1.5707963267948966)

julia> sig_manual = bssfp(M0, T1, T2, f, , TR, TE, , true)
0.09889476203144602 + 0.09290409266118088im

julia> sig_exact = bssfp(M0, T1, T2, f, , TR, TE, )
0.09880282817239999 + 0.09281666742806782im

julia> isapprox(sig_exact, sig_manual; rtol = 0.001)
true

The results are less than 0.1% different from each other, indicating the manually simulated steady-state does approach the mathematically computed steady-state, as expected.

Now let’s use bssfp to plot the characteristic bSSFP off-resonance profile. First we compute the steady-state signal for various off-resonance values.

julia> (M0, T1, T2, , TR, TE, , N) = (1, 1000, 80, /6, 5, 2.5, , 801)
(1, 1000, 80, 0.5235987755982988, 5, 2.5, , 801)

julia> f = range(-2 / (TR / 1000), 2 / (TR / 1000), N)
-400.0:1.0:400.0

julia> sig = bssfp.(M0, T1, T2, f, , TR, TE, );

Note the dot (.) when calling bssfp. This essentially is shorthand for

sig = zeros(ComplexF64, length(f))
for i = 1:length(f)
    sig[i] = bssfp(M0, T1, T2, f[i], , TR, TE, )
end

(See our blog post on broadcasting for more information.)

And now we can plot the results.

julia> using Plots

julia> pmag = plot(f, abs.(sig); label = "", title = "bSSFP Off-Resonance Profile", ylabel = "Magnitude", color = 1);

julia> pphase = plot(f, angle.(sig); label = "", xlabel = "Off-Resonance (Hz)", ylabel = "Phase (rad)", yticks = ([-, 0, ], ["-", "0", ""]), color = 2);

julia> plot(pmag, pphase; layout = (2, 1))

Summary

In this post, we have learned how to use BlochSim.jl to simulate the steady-state signal obtained from a bSSFP scan. We used the bssfp function we wrote to plot the characteristic bSSFP off-resonance profile to demonstrate that our code produces the expected results.

Additional Links

How does Tables.jl handle schema-less tables?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/09/01/tables.html

Introduction

Tables.jl is a fundamental package in the JuliaData ecosystem.

One of the key concepts used in Tables.jl is a schema of a table.
Schema is information about the column names and types of a table.
Having access to schema is useful as being able to query these properties
is constantly needed in practice.

In this post I want to discuss in what cases a table might not have
a schema information associated with it and how Tables.jl handles this case.

The post was written using Julia 1.9.2, Tables.jl 1.10.1, and DataFrames.jl 1.6.1.

What is a schema of a table?

Let us create some example tables and investigate their schema:

julia> df = DataFrame(a=[1, 2], b=[3.0, 4.0], c=[1, "a"])
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> Tables.schema(df)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> ntv = Tables.columntable(df)
(a = [1, 2], b = [3.0, 4.0], c = Any[1, "a"])

julia> Tables.schema(ntv)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> vnt = Tables.rowtable(df)
2-element Vector{NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}}:
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((1, 3.0, 1))
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((2, 4.0, "a"))

julia> Tables.schema(vnt)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

I have created a data frame df with three columns and then converted it into
a named tuple of vectors ntv and a vector of named tuples vnt.
All these three objects are Tables.jl tables.

In all cases we have checked that the Tables.schema properly identifies the schema
of the se tables. They have three columns :a, :b, and :c, and these columns
have types :Int64, Float64, and Any.

So in what case a table might not have a schema? To understand this consider the
following transformation:

julia> vnt2 = [(; collect(pairs(r))...) for r in vnt]
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(vnt2)

julia> isnothing(Tables.schema(vnt2))
true

Let us understand what happens in this code. With
(; collect(pairs(r))...) operation I make the type of the fields
of the named tuples stored in the vnt vector concrete.
To understand this better note:

julia> typeof(vnt[1])
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}

julia> typeof((; collect(pairs(vnt[1]))...))
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Int64}}

Note that the type of the :c field is Any and Int64 respectively.

This transformation means that the vnt2 vector itself does not, in consequence,
have a concrete element type:

julia> eltype(vnt2)
NamedTuple{(:a, :b, :c)}

This type only specifies column names, but not their types. In such a situation
the Tables.schema function returns nothing signaling to the user that the schema
of a table is unknown.

You might think that this is an artificial example but consider the following common case:

julia> vnt3 = [(a=1, b=2), (a=missing, b=3)]
2-element Vector{NamedTuple{(:a, :b)}}:
 (a = 1, b = 2)
 (a = missing, b = 3)

julia> Tables.schema(vnt3)

julia> isnothing(Tables.schema(vnt3))
true

As you can see when we have a vector of named tuples, a mix in of missing values
makes it schema-less.

How does Tables.jl handle schema-less tables? Part 1

If Tables.jl detects a schema-less table it tries to dynamically detect its schema
when functions defined in Tables.jl are called. The three key functions here are:

  • Tables.columns: returns an object that can be queried column-wise;
  • Tables.rows: returns an object that can be queried row-wise;
  • Tables.dictrowtable: returns a row-wise vector that performs “column unioning”.

You might wonder what this “column unioning” part means. We will get to it in a second.

First let us investigate how these functions work on our vnt2 object that is schema-less:

julia> Tables.columns(vnt2)
Tables.CopiedColumns{NamedTuple{(:a, :b, :c), Tuple{Vector{Int64}, Vector{Float64}, Vector{Any}}}} with 2 rows, 3 columns, and schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.schema(Tables.columns(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.rows(vnt2)
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(Tables.rows(vnt2))

julia> isnothing(Tables.schema(Tables.rows(vnt2)))
true

julia> Tables.dictrowtable(vnt2)
Tables.DictRowTable([:a, :b, :c], Dict{Symbol, Type}(:a => Int64, :b => Float64, :c => Union{Int64, String}), Dict{Symbol, Any}[Dict(:a => 1, :b => 3.0, :c => 1), Dict(:a => 2, :b => 4.0, :c => "a")])

julia> Tables.schema(Tables.dictrowtable(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Union{Int64, String}

As you can see Tables.columns performed eltype detection and determined the type of column :c as Any.
The Tables.rows produces a schema-less object as for vnt2 object calling Tables.rows on it just returns the input.
Finally Tables.dictrowtable performs a narrower eltype for column :c which is Union{Int64, String}.
All these results are technically correct, but it is important to note that they start to matter when we create
a data frame from the result:

julia> DataFrame(Tables.columns(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(vnt2)
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(Tables.dictrowtable(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Union…
─────┼────────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

In this case for Tables.columns and Tables.rows we have Any as element type for :c.
While for Tables.dictrowtable we get the Union. This difference might matter to you
if you wanted to later change data stored in :c column.

However, the key difference between Tables.dictrowtable and other methods is visible
in case when we have heterogeneous column list data.

How does Tables.jl handle schema-less tables? Part 2

What is the data that has a heterogeneous column list?
It is relatively common in practice. Let me create two examples:

julia> h1 = [(; a=1), (; a=2, b=3)]
2-element Vector{NamedTuple}:
 (a = 1,)
 (a = 2, b = 3)

julia> h2 = [(; a=1), (; b=3)]
2-element Vector{NamedTuple{names, Tuple{Int64}} where names}:
 (a = 1,)
 (b = 3,)

Both h1 and h2 have a different set of columns. How does the
Tables.jl handle them? Let us check by creating a data frame from them:

julia> DataFrame(Tables.columns(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.rows(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.dictrowtable(h1))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1  missing
   2 │     2        3

For the h1 input we see that using Tables.columns and Tables.rows drops the :b
column while Tables.dictrowtable keeps it and fills missing entries with missing.
This behavior is called “column unioning”.

Now let us check the h2 case:

julia> DataFrame(Tables.columns(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.rows(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.dictrowtable(h2))
2×2 DataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │       1  missing
   2 │ missing        3

Now we have an even stranger behavior. The first two calls error, while the
Tables.dictrowtable works by “column unioning”. Why do the first two cases error?
The reason is that DataFrame constructor internally always calls Tables.columns
on a passed table. And Tables.columns when doing column detection assumes that
the list of columns for row-wise table is given by its first row. So even
without calling DataFrame we get an error when we do:

julia> Tables.columns(h2)
ERROR: type NamedTuple has no field a

In summary we have two cases of behaviors for row-wise tables:

  • assume the columns are given in the first row of a table;
    this is what Tables.columns does, and, in consequence DataFrame
    constructor;
  • perform “column unioning”; this is what Tables.dictrowtable does.

Conclusions

The take-away from the examples we have given today are as follows:

  • by default DataFrame(row_wise_table) inherits the Tables.columns behavior and assumes
    that the set of columns of the passed row-wise table is given by its first row;
  • if you want to override the default behavior, and want to perform “column unioning” instead
    call DataFrame(Tables.dictrowtable(row_wise_table)); this option is useful if you have
    data that has heterogeneous column lists in different rows.

The examples given today were a bit low-level, but I hope they improved your understanding
how the Tables.jl functions handle different tabular data sources.