Does DataFrames.jl copy or not copy, that is the question

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/09/15/copying.html

Introduction

Some time ago I wrote a post about my thoughts on copying data when working with it in Julia.

Today I want to focus on a related but narrower topic specific to DataFrames.jl.
People starting to work with this package are sometimes confused about when columns
get copied and when they do not. I want to discuss the most common cases in this post.

Spoiler! The post is a bit long. If you just want simple advice, you can skip to the section with conclusions.

The post was written using Julia 1.9.2 and DataFrames.jl 1.6.1.

Getting a column from a data frame

Let us start with the simpler case. When does copying happen if we get a column from a data frame?

First we set up some initial data:

julia> using DataFrames

julia> df = DataFrame(a=1:10^6)
1000000×1 DataFrame
     Row │ a
         │ Int64
─────────┼─────────
       1 │       1
       2 │       2
       3 │       3
       4 │       4
       5 │       5
    ⋮    │    ⋮
  999997 │  999997
  999998 │  999998
  999999 │  999999
 1000000 │ 1000000
999991 rows omitted

There are three ways to get the :a column from this data frame: df.a, df[:, :a] and df[!, :a].
Let us check them one by one. Start with df.a:

julia> df.a
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> @allocated df.a
0

df.a extracts the column without copying data. You can see it by the fact that there are no allocations performed in this operation.

Now check df[:, :a], which uses a standard row index : that is also used in arrays:

julia> df[:, :a]
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> @allocated df[:, :a]
8000048

df[:, :a] copies the data: this time we see a lot of memory allocated (roughly 8 MB, i.e. 8 bytes for each of the 10^6 Int64 values). This is identical to how : works for arrays.
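The array behavior referred to here can be checked with plain Base Julia. The following is a minimal sketch; the names v, w and s are illustrative and do not come from the original post:

```julia
v = [10, 20, 30]

w = v[:]                    # indexing with : allocates a new vector
w[1] = -1                   # changes only the copy
same_object = w === v       # false: w is a different object
original_first = v[1]       # 10: v is unaffected

s = view(v, :)              # a view refers to the memory of v instead of copying it
s[1] = 99                   # writes through to v
updated_first = v[1]        # 99
```

The same copy-on-index rule is what makes df[:, :a] return a fresh vector.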

Finally check df[!, :a], which uses a non-standard ! row index:

julia> df[!, :a]
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> @allocated df[!, :a]
0

We can see that df[!, :a] does not allocate. It is equivalent to df.a, just with slightly different syntax
(the indexing syntax with ! is handy if we want to select multiple columns from a data frame, which is not possible with the df.a syntax).
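The practical consequence of copying versus not copying is whether later mutation is visible in the data frame. Here is a small sketch of that, assuming a tiny hypothetical data frame called small (not part of the original post):

```julia
using DataFrames

small = DataFrame(a=[1, 2, 3])   # hypothetical three-row example

copied = small[:, :a]    # a fresh vector, independent of `small`
aliased = small[!, :a]   # the very vector stored in `small` (same as small.a)

copied[1] = 100          # does not affect `small`
aliased[2] = 200         # mutates the :a column of `small`

is_same = small.a === aliased    # true
column_now = small.a             # [1, 200, 3]
```

So the non-copying forms are cheap, but any in-place change made through them also changes the data frame.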

This part was relatively easy. Now let us turn to a harder case of setting a column of a data frame.

Case 1: setting a column in a data frame using assignment

First store the :a column in a temporary variable a (without copying it):

julia> a = df.a
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

Now let us check the various ways of creating a column that will store a.
Begin with creating a new column.

julia> df.b = a
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> df.b === a
true

We can see that if we put df.b on the left-hand side, the operation does not copy the passed data.
You can probably already guess that the same happens with df[!, :c] on the left-hand side. Indeed,
this is the case:

julia> df[!, :c] = a
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> df.c === a
true

What about df[:, :d]? Let us see:

julia> df[:, :d] = a
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

julia> df.d === a
false

So we see the first difference. When creating a new column with df[:, :d] the data was copied.
But what would happen if the column already existed in the data frame?

Well, for the df.b and df[!, :c] syntaxes nothing would change, as they just put
the right-hand side vector into the data frame without copying it.
But for df[:, :d] the situation is different. Let us check:

julia> d = df.d;

julia> df[:, :d] = a;

julia> df.d === a
false

julia> df.d === d
true

We can see that if we use the df[:, :d] syntax on the left-hand side, the operation is in-place:
the vector already present in df is reused and the data is written into it.
This means that we cannot use df[:, :d] = ... to change
the element type of column :d. Let us see:

julia> df[:, :d] = a .+ 0.5;
ERROR: InexactError: Int64(1.5)

Indeed, a .+ 0.5 contains floating-point values, and the :d column allows only integers.
Note that with df.b = ... or df[!, :c] = ... we would not have this issue, as they
replace the column with what is passed on the right-hand side:

julia> df.b = a .+ 0.5
1000000-element Vector{Float64}:
      1.5
      2.5
      3.5
      4.5
      5.5
      6.5
      7.5
      ⋮
 999995.5
 999996.5
 999997.5
 999998.5
 999999.5
      1.0000005e6

There is one more twist to this story. It is related to ranges.
The issue is that a DataFrame object always materializes ranges
stored in it.
Therefore the following operation allocates data:

julia> df.b = 1:10^6
1:1000000

julia> df.b
1000000-element Vector{Int64}:
       1
       2
       3
       4
       5
       6
       7
       ⋮
  999995
  999996
  999997
  999998
  999999
 1000000

In general df.b = ... does not allocate, but since we disallow storing
ranges as columns of a data frame (in our case the 1:10^6 range), an allocation still takes place here.
You would get the same behavior with df[!, :c] = 1:10^6.
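The contrast between a range and an already materialized vector can be sketched on a small scale. The names tbl, r and v below are illustrative assumptions, not taken from the original post:

```julia
using DataFrames

tbl = DataFrame(x=[1, 2, 3])   # hypothetical small data frame

r = 1:3
tbl.y = r                      # the range is materialized when stored
stored_as_vector = tbl.y isa Vector{Int}   # true: the column is not a range

v = collect(r)                 # materialize the range ourselves
tbl.z = v                      # a Vector is stored as-is, without copying
stored_without_copy = tbl.z === v          # true
```

If you need to keep a handle to the exact vector that ends up in the data frame, collect the range first and assign the resulting vector.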

Case 2: setting a column in a data frame using broadcasted assignment

Julia is famous for its powerful broadcasting capabilities. Let us thus investigate what happens when we
replace = with .= in our experiments. We will reproduce all the examples we gave above from scratch.

Start with df.b .= a:

julia> df = DataFrame(a=1:10^6);

julia> a = df.a;

julia> df.b .= a;

julia> df.b === a
false

We now see a difference. The :b column is freshly allocated.

Let us check the two other ways of creating a new column:

julia> df[!, :c] .= a;

julia> df.c === a
false

julia> df[:, :d] .= a;

julia> df.d === a
false

They have the same effect: a new column gets allocated.

In the case of an existing column df.b .= ... and df[!, :c] .= ...
would again create a new copied column:

julia> df.b .= a .+ 0.5
1000000-element Vector{Float64}:
      1.5
      2.5
      3.5
      4.5
      5.5
      6.5
      7.5
      ⋮
 999995.5
 999996.5
 999997.5
 999998.5
 999999.5
      1.0000005e6

The difference is with df[:, :d] .= ...:

julia> d = df.d;

julia> df[:, :d] .= a;

julia> df.d === a
false

julia> df.d === d
true

julia> df[:, :d] .= a .+ 0.5
ERROR: InexactError: Int64(1.5)

So we see that this is an in-place operation, just like with df[:, :d] = ....
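The six ways of creating a new column discussed in Case 1 and Case 2 can be compared side by side. This is only a compact sketch on a small hypothetical data frame (tbl and src are illustrative names); the expected results restate what the experiments above showed for Julia 1.9.2 and DataFrames.jl 1.6.1:

```julia
using DataFrames

tbl = DataFrame(a=[1, 2, 3])   # hypothetical small data frame
src = tbl.a                    # the source vector, taken without copying

# plain assignment
tbl.b = src
tbl[!, :c] = src
tbl[:, :d] = src

# broadcasted assignment
tbl.e .= src
tbl[!, :f] .= src
tbl[:, :g] .= src

# for each new column, check whether it is the same object as `src`
aliases = [name => (tbl[!, name] === src) for name in [:b, :c, :d, :e, :f, :g]]
# expected: only :b and :c are aliased with `src`
```

Only plain assignment with df.col or df[!, :col] on the left-hand side stores the passed vector itself; every other form stores a freshly allocated column.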

Conclusions

As a summary let me discuss a common anti-pattern:

df.a = df.b

Given the examples I presented we know that after this operation the :a and :b columns
of the df data frame are aliased, i.e. df.a === df.b produces true. Usually this is not
a desired situation as many operations assume that columns of a data frame do not share memory.

Fortunately, we also already learnt an easy fix to the aliasing problem. You can just write:

df.a .= df.b

This stores a copy of :b in column :a.
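To make the anti-pattern and the fix concrete, here is a minimal sketch on a hypothetical data frame named demo (an illustrative name, not from the original post):

```julia
using DataFrames

demo = DataFrame(b=[1, 2, 3])   # hypothetical small data frame

demo.a = demo.b                 # anti-pattern: :a and :b share memory
aliased = demo.a === demo.b     # true
demo.a[1] = 100
leaked = demo.b[1]              # 100: the change to :a leaked into :b

demo.a .= demo.b                # fix: :a becomes a freshly allocated column
independent = demo.a !== demo.b # true
demo.a[1] = -1
untouched = demo.b[1]           # still 100
```

Writing demo.a = copy(demo.b) is another way to get an independent column, since it passes an explicit copy to the non-copying assignment.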

I hope the examples I gave in my post today will be useful for your work with DataFrames.jl.

RAW photo library automation with Julia

By: jkrumbiegel.com

Re-posted from: https://jkrumbiegel.com/pages/2023-09-12-capture-one-photos-sqlite/index.html

In this post I describe how I use Julia to automatically synchronize my Capture One raw photo catalog to my iCloud via Apple Photos, so that I can view and share the jpegs from my iPhone at any time with the same interface as my iPhone photos. The official AppleScript interfaces are not powerful enough to do what I need. My solution is accessing the SQLite databases of Capture One and Apple Photos directly and doing some simple data wrangling which Julia is perfectly suited for.

Installing Julia and VS Code – A Comprehensive Guide

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/install-julia-and-vscode

Julia is a relatively new, free, and open-source programming language. It has a syntax similar to that of other popular programming languages such as MATLAB and Python, but it boasts being able to achieve C-like speeds.

One popular IDE to use with Julia is Visual Studio Code, or VS Code.

In this post, we will learn how to install Julia and VS Code. We will also learn how to configure VS Code to get it to work with Julia.

The instructions for installing Julia and VS Code vary by operating system. Please use the table of contents to jump to the instructions that are relevant to you. Note that the instructions for configuring VS Code are the same for all operating systems.

Windows

Installing Julia

  1. Go to https://julialang.org/downloads.
  2. In the table under the heading “Current stable release”, go to the row labeled “Windows” and click the “64-bit (installer)” link. (Of course, click the 32-bit link if you have a 32-bit machine.)

    Downloading Julia installer on Windows

    (See Installing a Specific Julia Version for details on installing an older Julia version.)

  3. Run the installer.
  4. Choose where to install Julia. (The default location works well unless you have a specific need to install Julia somewhere else.) Then click “Next”.

    Selecting Julia installation directory

  5. Specify some additional options.

    • “Create a Desktop shortcut”: Allows you to run Julia by clicking the shortcut on your Desktop.
    • “Create a Start Menu entry”: Allows you to pin Julia to your Start Menu. Also allows you to easily search for Julia from the Start Menu.
    • “Add Julia to PATH”: (Recommended) Allows you to run Julia from the command line without specifying the full path to the Julia executable. Also enables easier integration with VS Code.

    Then click “Next”.

    Specifying additional options

  6. Installation is complete! Click “Finish” to exit the installer.

    Installation is complete

Running Julia

There are a few different ways to run Julia on Windows.

  • If you created a Desktop shortcut, you can double-click the shortcut to start Julia.

    Julia Desktop icon in Windows

  • If you created a Start Menu entry, you can run Julia from the Start Menu. First search for “julia”.

    Start menu search bar

    And then click on the appropriate result to start Julia.

    Start menu Julia search results

  • You can also run Julia from the command line (Command Prompt or PowerShell). If Julia was added to PATH, you can run
    C:\Users\user> julia

    Otherwise, you can specify the full (or relative) path.

    C:\Users\user> .\AppData\Local\Programs\Julia-1.9.3\bin\julia.exe

After starting Julia you will be greeted with a fresh Julia prompt.

Fresh Julia prompt on Windows

Now you know how to install and run Julia on Windows!

Installing VS Code

  1. Go to https://code.visualstudio.com.
  2. Click “Download for Windows”.

    Downloading VS Code installer on Windows

  3. Run the installer.
  4. Click “I accept the agreement” and then click “Next”.

    VS Code license agreement on Windows

  5. Choose where to install VS Code. (The default location works well unless you have a specific need to install VS Code somewhere else.) Then click “Next”.

    Selecting VS Code installation directory

  6. Specify the Start Menu folder for VS Code. (The default folder is fine.) Then click “Next”.

    Choosing VS Code start menu folder

  7. Specify additional options if desired. (The defaults are fine, but if you want a Desktop shortcut be sure to select that option.) Then click “Next”.

    Specifying additional options

  8. Click “Install”.

    Confirming installation options for VS Code in Windows

  9. Installation is complete! Click “Finish” to exit the installer.

    Installation is complete

Running VS Code

You can run VS Code using any of the methods described above for running Julia:

  • using the Desktop shortcut.

    VS Code Desktop icon in Windows

  • using the Start Menu.

    Start menu VS Code search results

  • using the command line.
    C:\Users\user> code

Now you know how to install and run VS Code on Windows!

Jump to Configuring VS Code for Julia to learn how to configure VS Code.

macOS

Installing Julia

  1. Go to https://julialang.org/downloads.
  2. In the table under the heading “Current stable release”, go to the rows labeled “MacOS” and click the “64-bit (.dmg)” link for Apple Silicon, or the Intel or Rosetta link for older machines or special scenarios.

    Downloading Julia .dmg on macOS

    (See Installing a Specific Julia Version for details on installing an older Julia version.)

  3. Open the downloads folder and double-click on the Julia Disk Image you just downloaded.

    Open Julia .dmg on macOS

  4. In the new window, drag Julia into the Applications folder.

    Drag Julia

  5. Allow Julia to be copied with your credentials or Touch ID.

    Allow Julia

  6. Eject the disk image, either with the eject button or with right-click and selecting “Eject”.

    Eject Julia Disk Image

  7. We recommend you follow these final instructions to make it easier to open Julia on your Mac. Open a new terminal window and run the following commands:

    sudo mkdir -p /usr/local/bin
    sudo rm -f /usr/local/bin/julia
    sudo ln -s /Applications/Julia-1.9.app/Contents/Resources/julia/bin/julia /usr/local/bin/julia

    Original source.

Running Julia

You can start Julia through your normal means of opening applications or through typing julia into your terminal (if you did the last step above). See Apple’s support if you need help.

After starting Julia you will be greeted with a fresh Julia prompt.

Fresh Julia prompt on macOS

Now you know how to install and run Julia on macOS!

Installing VS Code

  1. Go to https://code.visualstudio.com.
  2. Click “Download Mac Universal”.

    Downloading VS Code app on macOS

  3. Open the downloads folder and drag VSCode into the Applications folder.

    Drag VSCode

  4. Allow VSCode to be moved with your credentials or Touch ID.

    Allow VSCode

You can start VS Code through your normal means of opening applications.

Now you know how to install and run VS Code on macOS!

Jump to Configuring VS Code for Julia to learn how to configure VS Code.

Linux

Installing Julia

  1. Go to https://julialang.org/downloads.
  2. In the table under the heading “Current stable release”, go to the row labeled “Generic Linux on x86” and click the “64-bit (glibc)” link. (This link should work for most Linux systems. If you have a different computer architecture, you probably already know to choose a different download link.)

    Downloading Julia installer on Linux

    (See Installing a Specific Julia Version for details on installing an older Julia version.)

  3. Open a terminal and navigate to the directory where you want to install Julia. In this post, we will install Julia in ~/programs/julia/. (Note that everything up to and including the $ is the terminal prompt, so the actual command to run is everything after the $.)
    ~$ mkdir -p ~/programs/julia
    ~$ cd ~/programs/julia
  4. Move the downloaded archive to the current directory.
    ~/programs/julia$ mv ~/Downloads/julia-1.9.3-linux-x86_64.tar.gz .
  5. Unarchive the file.
    ~/programs/julia$ tar xzf julia-1.9.3-linux-x86_64.tar.gz
  6. Add Julia to your PATH by adding the following to your .bashrc or .zshrc file. (Remember to change /home/user/programs/julia to the directory where you installed Julia.)
    export PATH="$PATH:/home/user/programs/julia/julia-1.9.3/bin"

    (Restart the terminal to actually update the PATH.)

Julia is now installed!

Running Julia

You can run Julia from the command line.

$ ~/programs/julia/julia-1.9.3/bin/julia

Or, if you added Julia to your PATH:

$ julia

After starting Julia you will be greeted with a fresh Julia prompt.

Fresh Julia prompt on Linux

Now you know how to install and run Julia on Linux!

Installing VS Code

The following steps were tested on a computer running Ubuntu. They should work as written for any Debian-based Linux distribution, but modifications may be necessary for other Linux distributions. Note that you will need admin privileges for one of the steps below.

  1. Go to https://code.visualstudio.com.
  2. Click “.deb”.

    Downloading VS Code installer on Linux

  3. (Requires admin privileges) Either open your file manager and double-click the downloaded file, or install VS Code via the command line. (Remember to change the file path/name as needed.)
    $ sudo dpkg -i /home/user/Downloads/code_1.81.1-1691620686_amd64.deb

VS Code is now installed!

Running VS Code

You can run VS Code from the command line.

$ code

Now you know how to install and run VS Code on Linux!

Jump to Configuring VS Code for Julia to learn how to configure VS Code.

Installing a Specific Julia Version

If you need a specific version of Julia, you can navigate to the older releases page and find the appropriate installer for the version you need. For example, to install Julia 1.0.5 on Windows, you would find “v1.0.5” on the left column and then click on the download link to download the installer.

Installing Julia 1.0.5

Otherwise, the installation instructions should be basically the same.
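After installing, it can be useful to confirm which Julia version is actually running, especially when several versions are installed. A minimal sketch, run at the Julia prompt; the 1.9 threshold and the variable names are only illustrative assumptions:

```julia
# VERSION is a built-in constant holding the version of the running Julia
running_version = VERSION
println("Running Julia ", running_version)

# Version numbers can be compared using v"..." literals
is_at_least_1_9 = running_version >= v"1.9"
println("At least 1.9? ", is_at_least_1_9)
```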

Configuring VS Code for Julia

After starting VS Code for the first time you will be greeted with “Get Started with VS Code”.

Get Started with VS Code

Feel free to walk through the options, but for this post we will ignore them and just click “< Welcome” on the top left of the window. That brings us to the welcome page.

VS Code welcome page

We need to install the Julia extension for VS Code.

  1. Click on the extensions button.

    VS Code extensions button

  2. Search for “julia”. Click “install” on the Julia extension.

    Installing Julia VS Code extension

Now we need to make sure the Julia extension knows where Julia is installed. If you added Julia to PATH, this step can be skipped. However, if you get errors trying to run Julia in VS Code or if you find the wrong Julia version is being used (if you have multiple versions installed), you can follow these steps.

  1. Open the Julia extension settings by clicking the settings icon. Click “Extension Settings”.

    Opening Julia extension settings

  2. Search for “executable path”. Then type the path to the Julia executable in the “Julia: Executable Path” extension setting.

    Telling VS Code where the Julia executable is
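If you are unsure what path to enter, one way to find it is to ask a running Julia session where its own executable lives. A small sketch, assuming you can start the desired Julia version by some other means (for example from its installation folder); julia_path is an illustrative variable name:

```julia
# Sys.BINDIR is the directory containing the running Julia executable;
# Base.julia_exename() returns the executable's file name (e.g. julia or julia.exe)
julia_path = joinpath(Sys.BINDIR, Base.julia_exename())
println(julia_path)
```

The printed path is what can be pasted into the “Julia: Executable Path” setting.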

Now VS Code is ready for Julia development!

Developing Julia Code in VS Code

  1. Click the file icon in the top left to open the Explorer pane. Then click “Open Folder”.

    Explorer pane in VS Code

  2. Navigate to a folder (or create one) where your Julia code will be saved. In this post, we created a folder called Julia.
  3. Click the “New File” button next to the folder in the Explorer pane. Then type a name for the file where you will write Julia code. It should have a .jl extension. In this post, we created a file called script.jl.

    Creating a new file in VS Code

    VS Code will now open a new tab with a blank file ready for us to edit.

  4. Add the following code to your file and then save it.
    println("Hello, World!")
  5. To run the code, place your cursor on the line of code to execute and then press Shift+Enter. This command will start a Julia session (if one has not already been started) and then run the code. (The very first time you run code may take a little while because packages need to precompile.)

    Hello, World from VS Code

  6. Now add more lines of code.

    function plus1(x)
        x + 1
    end
    a = 3
    b = plus1(a)
    b == a + 1
  7. To run all the code in the file, open the command palette with Ctrl+Shift+P and search for “Julia: Execute active file in REPL”. Highlight the command and press Enter.

    VS Code command palette

    File execution in VS Code

    Note that two lines of output were added to the REPL: the println statement and the value of b == a + 1 (the last line that was executed).

  8. Now that some code has run, we can look at the Julia pane on the left and see the variables and function we defined in the “Workspace” tab.

    Workspace pane in VS Code

  9. We can also use the REPL directly.

    VS Code Julia REPL

And now you know how to write and run Julia code in VS Code!

If you would like to see more tips and tricks for how to use Julia in VS Code, be sure to comment below!

Summary

In this post, we learned how to install Julia and VS Code on different operating systems. We also learned how to configure VS Code for developing Julia code.

Once you have installed Julia, move on to the next post to learn about variables and functions! Or, feel free to take a look at our other Julia tutorial posts!