Resources for Learning the Julia Programming Language

By: Jacob Zelko

Re-posted from: https://jacobzelko.com/10082023195125-julia-learning-resources/index.html

Date: October 8 2023

Summary: A non-exhaustive list of recommendations for how I suggest learning Julia to language newcomers

Keywords: #julia #programming #beginners #recommendations #learning #archive #blog

Table of Contents

    1. Motivation
    2. Before Programming with Julia, Let's Set It Up
    3. Julia Programming for New Programmers
    4. Quickly Picking Up Julia Programming
    5. What Is a Julian?
    6. Building Up Expertise in Julia Programming
    7. Domain Specific Workflows in Julia
      1. Working with Data
      2. Plotting
    8. Conclusion
    9. How To Cite

Motivation

I saw an interesting post on BlueSky recently that got me thinking about Julia learning resources. I tend to give out a lot of advice about how to go about learning Julia, but I realized I have never really kept that information in one central place. This blog post shares my personal opinions about the Julia ecosystem along with my recommendations for how to learn Julia.

Before Programming with Julia, Let's Set It Up

The fantastic initiative, Modern Julia Workflows, spearheaded by Guillaume Dalle and collaborators, has a number of sections that can help with getting set up fast (I'll be referring to their work quite a bit throughout this post). In particular, here are the sections I'd recommend to get set up fastest:

  1. How to install Julia on your computer

  2. What you need to write Julia. A special note on this from me is that you really do not need much – you could use something like Notepad on Windows, TextEdit on macOS, or KWrite on *nix systems. I like the stance Dalle takes in recommending VS Code, however, as this gives you the best mileage whether you are a beginner or an expert programmer.

Suggestion 2 here will most likely take you the longest if you have never worked with a text editor before (a piece of software for creating and editing many different types of files). If that is you, no worries: take your time and enjoy the learning here!

Julia Programming for New Programmers

If you are completely new to programming in general, I'd recommend the course, Julia Programming for Nervous Beginners, by Dr. Henri Laurie. It really eases you through how to start with programming and uses Julia as that learning tool. Otherwise, skip to the next section.

Quickly Picking Up Julia Programming

To pick up Julia programming, I recommend Introduction to Julia (for programmers) by Dr. Jane Herriman. This will get you going with Julia the fastest – especially if you already know some programming.

What Is a Julian?

Before continuing your Julia adventure, it is worth pausing to discuss a couple of aspects of Julia that one may not immediately recognize but that are crucial to a productive Julia workflow. Otherwise, one may end up despairing over the supposed virtues of Julia. Here are some specific pieces:

  1. Julia favors a REPL-centric workflow. If you are unfamiliar with what a REPL is, please see this reference for details, but in short, the Julia REPL is a read-eval-print loop: it reads the code you enter, evaluates it, prints the result, and waits for more. Whether you are loading a file, experimenting with code, or calling functions, the REPL serves as a scratchpad for building up your overall Julia software iteratively with immediate feedback.

  2. Julia is compiled – packages and functions will take a moment to load for use. This builds on the previous point: as Julia is just-in-time compiled, any package or function you want to use may take slightly longer the first time you use it, but the compiled code is then reused for the duration of your work session. This is why you want your Julia workflow to be REPL-centric: by keeping one session open, you pay the compilation cost only once.

  3. Julians organize Julia software into "projects" or packages. Whether you are writing a collection of small scripts to analyze some data or developing a completely new software package, to effectively maneuver through your Julia code, make liberal use of Pkg.jl. Dalle has an excellent reference that talks about this concept of project environments as well as how to build your own local package.

  4. Working within Julia can be extremely efficient – if you know how. This is a circular statement as it naturally raises the question of, "how do I actually build a concrete Julia workflow?" Thankfully, much has been written about this.

  5. Julians want to help you. What is wonderful about the Julia community is that, in contrast to some other internet communities, the bulk of Julians greatly enjoy helping not only other Julians but other programmers in general (there have been numerous occasions where I have seen Julians help other language users become even more proficient in their workflows). This is an invaluable assortment of where to find your fellow Julians.

I hope this section does not come off as overly prescriptive, but I have seen the notion of "you are holding the tool wrong" or "what is Julian" (i.e. how do proficient Julia users do X) pop up too many times for new Julians or those experimenting with the language. I hope that with this nudging guidance, a new Julian can more clearly understand the "why" of what other more proficient Julians recommend.
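Points 2 and 3 above can be tried directly at the REPL. The following is a minimal sketch, not a recommended workflow: the temporary project directory and the `sum_of_squares` helper are hypothetical names chosen for illustration. It activates a throwaway project environment with Pkg.jl and then times the same call twice, so you can compare the first call (which includes compilation) with the second.

```julia
using Pkg

# Point 3: work inside a project environment. A temporary directory is used
# here (an assumption for this sketch) so your default environment is untouched.
project_dir = mktempdir()
Pkg.activate(project_dir)

# Point 2: the first call to a function includes compilation time;
# later calls in the same session reuse the compiled code.
sum_of_squares(xs) = sum(x -> x^2, xs)  # hypothetical helper

data = collect(1:1_000)
first_call = @elapsed sum_of_squares(data)
second_call = @elapsed sum_of_squares(data)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The exact timings depend on your machine and Julia version, so none are shown here; the point is only that the two numbers typically differ within one session.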

Building Up Expertise in Julia Programming

At this stage, we can now move from the beginner to intermediate Julian stage. Here, I think the world of Julia quite truly opens up to the new user. To delve deeper into Julia, here are some resources I would personally recommend:

  • Believe it or not, the Julia documentation is actually really nice to read and accessible. Now, I don't just say this because I have helped write some of it; I do truly think it is worth looking through to get a better feel for aspects of Julia one may not otherwise consider. I would suggest starting with the Manual section of the documentation.

  • Check out the MIT Computational Thinking Course to have a more hands-on introduction to scientific computing. I have never personally gone through it, but I hear it highly praised.

  • Try solving problems on Exercism.io to practice and improve your skills. I am a mentor there, although I don't have as much time anymore to help review. I still find this to be a really great place to further your learning and to get better at programming Julia – you'll often get feedback from expert Julia users which, in itself, is extremely valuable.

Domain Specific Workflows in Julia

I will probably spin out the following sub-sections into their own blog posts, but here are some selected domain-specific workflows that I use regularly within Julia.

Working with Data

This admittedly broad workflow encompasses much, but the most important packages in this space are:

  • DataFrames.jl: This package provides a powerful data manipulation and analysis tool for Julia, similar to the pandas library in Python.

    • Additionally, the author of the package, Bogumił Kamiński, is an extremely prolific blogger who shares many different ways of using DataFrames.jl. I highly suggest his blog.

  • CSV.jl: Utility library for working with CSV and other delimited files in the Julia programming language.

  • TerminalPager.jl: a REPL-based Julia variable and documentation explorer
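To give a feel for how CSV.jl and DataFrames.jl fit together, here is a small sketch. The CSV content and the column names (`species`, `mass`) are made up for illustration, and the data is read from an in-memory buffer so the example is self-contained; in practice you would usually pass a file path to `CSV.read`.

```julia
using CSV, DataFrames, Statistics

# Hypothetical CSV content standing in for a file on disk.
csv_text = """
species,mass
cat,4.2
dog,21.0
cat,3.8
dog,25.0
"""

# Parse the delimited text into a DataFrame.
df = CSV.read(IOBuffer(csv_text), DataFrame)

# Mean mass per species, roughly analogous to a pandas groupby-aggregate.
by_species = combine(groupby(df, :species), :mass => mean => :mean_mass)
println(by_species)
```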

Plotting

When I first started with Julia, this was the one area that I felt was sorely lacking within the ecosystem. However, I am happy to say that this is no longer the case! In my mind, the best Julia plotting package is Makie.jl. It is an interactive data visualization and plotting ecosystem that supports multiple backends, with output ranging from publication-quality static images and 3D images to fully interactive plots and visualizations. I use it whenever I can.
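As a taste of what a Makie plot looks like, here is a minimal sketch. It assumes the CairoMakie backend (the one aimed at static output) is installed, and the output file name `sine.png` is just a placeholder.

```julia
using CairoMakie  # assumption: the static-image backend of Makie

xs = range(0, 2π; length=200)

# A figure holds one or more axes laid out on a grid.
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="x", ylabel="sin(x)", title="A first Makie plot")
lines!(ax, xs, sin.(xs))

# Write the figure to disk; the file name is a placeholder.
save("sine.png", fig)
```

Swapping `CairoMakie` for another backend such as `GLMakie` is how you would move toward interactive windows, with the plotting calls themselves staying the same.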

Conclusion

NOTE: This blog post is a continuous work in progress.

As this blog post is a continuous work in progress, please feel free to comment below with questions or suggestions for how I could improve it or explain more. That said, my goal with this blog post was not to cover every aspect of the Julia ecosystem but to show how to quickly go from knowing nothing about programming to becoming a self-sufficient Julian. May this concise guide help you on your way to achieving all that you want within Julia.

How To Cite

Zelko, Jacob. Resources for Learning the Julia Programming Language. https://jacobzelko.com/10082023195125-julia-learning-resources. October 8 2023.



COPIERTemplate.jl: A new template for Julia using copier

By: julia on Abel Soares Siqueira

Re-posted from: https://abelsiqueira.com/blog/2023-10-07-copiertemplate/

I help manage over 50 packages in the Julia Smooth Optimizers organization, and sometimes we have to make a small update to all of these packages. For instance, one of the workflows was updated, or something new is introduced, or the LTS version of Julia changes.
In these situations, our usual approach is to write a script that downloads all of these packages, applies the change, and then creates a pull request with the modifications.

Column unioning in Tables.jl: row vs column oriented storage

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/10/06/tables.html

Introduction

Today I want to continue exploring the functionality of the Tables.jl package.
Some time ago I wrote a post about getting a schema of Tables.jl tables.
In the post I discussed, in particular, “column unioning”.

The column unioning approach allows you to put data with heterogeneous entries into a single table
by filling the missing entries with missing values.

In my previous post I discussed the Tables.dictrowtable function that performs column unioning.
Today I want to introduce you to the Tables.dictcolumntable function, which has similar functionality
but uses a different storage format.

The post was written using Julia 1.9.2, Tables.jl 1.11.0, and DataFrames.jl 1.6.1.

Introduction: an example of column unioning

Let us start with a simple example of column unioning behavior:

julia> using DataFrames

julia> vnt = [(a=1, b=2), (a=3, c=4), (b=5, d=6)]
3-element Vector{NamedTuple{names, Tuple{Int64, Int64}} where names}:
 (a = 1, b = 2)
 (a = 3, c = 4)
 (b = 5, d = 6)

julia> DataFrame(Tables.dictrowtable(vnt))
3×4 DataFrame
 Row │ a        b        c        d
     │ Int64?   Int64?   Int64?   Int64?
─────┼────────────────────────────────────
   1 │       1        2  missing  missing
   2 │       3  missing        4  missing
   3 │ missing        5  missing        6

julia> DataFrame(Tables.dictcolumntable(vnt))
3×4 DataFrame
 Row │ a        b        c        d
     │ Int64?   Int64?   Int64?   Int64?
─────┼────────────────────────────────────
   1 │       1        2  missing  missing
   2 │       3  missing        4  missing
   3 │ missing        5  missing        6

We have a heterogeneous table vnt (a vector of named tuples).
Each of its rows has a different set of columns.
After the column unioning operation we get a table (displayed as a data frame in our example) where each row has four columns and the absent entries are filled with missing.

Such situations are quite common, for example, when one processes JSON data in which each object may have a different set of attributes.

Why do we have two functions that perform column unioning?

A natural question to ask is why we have both Tables.dictrowtable and Tables.dictcolumntable if they perform the same operation.

The reason is simple. Tables.dictrowtable returns a row-oriented object, while Tables.dictcolumntable returns a column-oriented one.

We can easily check it by digging into the internals of these objects:

julia> getfield(Tables.dictrowtable(vnt), :values)
3-element Vector{Dict{Symbol, Any}}:
 Dict(:a => 1, :b => 2)
 Dict(:a => 3, :c => 4)
 Dict(:b => 5, :d => 6)

julia> getfield(Tables.dictcolumntable(vnt), :values)
OrderedCollections.OrderedDict{Symbol, AbstractVector} with 4 entries:
  :a => Union{Missing, Int64}[1, 3, missing]
  :b => Union{Missing, Int64}[2, missing, 5]
  :c => Union{Missing, Int64}[missing, 4, missing]
  :d => Union{Missing, Int64}[missing, missing, 6]

As you can see, Tables.dictrowtable internally stores a vector of dictionaries, while Tables.dictcolumntable stores a dictionary of vectors.

So the question is: when should you use which? A simple answer is that most of the time Tables.dictcolumntable is what you want,
unless you want to process data row-by-row and the data is very sparse. The reason is that creating a Dict for each row of the data
has a big overhead. Additionally, analytical routines most often expect data stored in columns. For example, DataFrame uses columnar storage.
Therefore, creating a data frame from a Tables.dictcolumntable object will be much faster.

Let us make a small test (I repeat the same @time call several times to capture the variability of the results and make sure everything gets compiled):
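The difference in storage also shows up in how you would naturally read the data back through the Tables.jl interface. The following is a small sketch using only the generic `Tables.rows` and `Tables.getcolumn` accessors; the variable names `c_from_rows` and `c_from_cols` are mine, chosen for illustration. It extracts column :c once by walking the rows of the row-oriented object and once by fetching the column from the column-oriented object.

```julia
using Tables

vnt = [(a=1, b=2), (a=3, c=4), (b=5, d=6)]
rt = Tables.dictrowtable(vnt)     # row-oriented: vector of dictionaries
ct = Tables.dictcolumntable(vnt)  # column-oriented: dictionary of vectors

# Row-oriented access: iterate over rows and look up a field in each one.
# Entries absent from a row come back as missing.
c_from_rows = [Tables.getcolumn(row, :c) for row in Tables.rows(rt)]

# Column-oriented access: fetch the whole column in one call.
c_from_cols = Tables.getcolumn(ct, :c)

# isequal treats missing as equal to missing, unlike ==.
println(isequal(c_from_rows, c_from_cols))
```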

julia> vnt2 = repeat(vnt, 10^6);

julia> @time DataFrame(Tables.dictrowtable(vnt2));
  9.625842 seconds (99.00 M allocations: 5.018 GiB, 20.14% gc time)

julia> @time DataFrame(Tables.dictrowtable(vnt2));
  9.792331 seconds (99.00 M allocations: 5.018 GiB, 24.41% gc time)

julia> @time DataFrame(Tables.dictcolumntable(vnt2));
  2.029009 seconds (30.00 M allocations: 892.629 MiB)

julia> @time DataFrame(Tables.dictcolumntable(vnt2));
  2.251707 seconds (30.00 M allocations: 892.629 MiB)

Indeed, we have a faster execution time, fewer allocations, and less memory allocated with Tables.dictcolumntable.

If we wanted to squeeze out more performance we could pass copycols=false to the DataFrame constructor, since we know
that in this case we can safely skip making copies of the vectors representing columns that are passed to the constructor:

julia> @time DataFrame(Tables.dictcolumntable(vnt2), copycols=false);
  2.584373 seconds (30.02 M allocations: 790.861 MiB, 17.83% gc time, 3.40% compilation time)

julia> @time DataFrame(Tables.dictcolumntable(vnt2), copycols=false);
  2.139848 seconds (30.00 M allocations: 789.632 MiB)

julia> @time DataFrame(Tables.dictcolumntable(vnt2), copycols=false);
  1.959573 seconds (30.00 M allocations: 789.632 MiB)

Indeed, less memory is allocated (the number of allocations stays about the same), but the timing of the operation is only minimally better.
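To see what copycols=false changes, one can compare the columns of the resulting data frame with the vectors held by the Tables.dictcolumntable object. This is a sketch on the small vnt table from the beginning of the post, and it assumes Tables.jl can be loaded directly in your environment; the names `df_copy`, `df_alias`, and `col_a` are illustrative.

```julia
using DataFrames, Tables

vnt = [(a=1, b=2), (a=3, c=4), (b=5, d=6)]
ct = Tables.dictcolumntable(vnt)

df_copy = DataFrame(ct)                   # default: column vectors are copied
df_alias = DataFrame(ct, copycols=false)  # column vectors are reused as-is

col_a = Tables.getcolumn(ct, :a)

# === checks object identity: is the data frame column the very same vector?
println("default shares memory:        ", df_copy.a === col_a)
println("copycols=false shares memory: ", df_alias.a === col_a)
```

Skipping the copy is safe here only because nothing else holds on to the intermediate Tables.dictcolumntable object; if something did, mutating the data frame would also change that object.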

Conclusions

I dedicated my post today to just two functions: Tables.dictrowtable and Tables.dictcolumntable.
The reason is that they are not commonly known, and at the same time they are very useful if you
need column unioning behavior when working with your source data. Happy hacking!