Author Archives: Steven Whitaker

Julia’s Parallel Processing

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/parallel-processing

Julia is a relatively new, free, and open-source programming language. It has a syntax similar to that of other popular programming languages such as MATLAB and Python, but it boasts being able to achieve C-like speeds.

While serial Julia code can be fast, sometimes even more speed is desired. In many cases, writing parallel code can further reduce run time. Parallel code takes advantage of the multiple CPU cores included in modern computers, allowing multiple computations to run at the same time, or in parallel.

Julia provides two methods for writing parallel CPU code: multi-threading and distributed computing. This post will cover the basics of how to use these two methods of parallel processing.

This post assumes you already have Julia installed. If you haven’t yet, check out our earlier post on how to install Julia.

Multi-Threading

First, let’s learn about multi-threading.

To enable multi-threading, you must start Julia in one of two ways:

  1. Set the environment variable JULIA_NUM_THREADS to the number of threads Julia should use, and then start Julia. For example, JULIA_NUM_THREADS=4.

  2. Run Julia with the --threads (or -t) command line argument. For example, julia --threads 4 or julia -t 4.

After starting Julia (either with or without specifying the number of threads), the Threads module will be loaded. We can check the number of threads Julia has available:

julia> Threads.nthreads()
4

The simplest way to start writing parallel code is just to use the Threads.@threads macro. Inserting this macro before a for loop will cause the iterations of the loop to be split across the available threads, which will then operate in parallel. For example:

Threads.@threads for i = 1:10
    func(i)
end

Without Threads.@threads, first func(1) will run, then func(2), and so on. With the macro, and assuming we started Julia with four threads, the iterations are divided into four chunks (1:3, 4:6, 7:8, and 9:10), so func(1), func(4), func(7), and func(9) will run first, in parallel. When a thread finishes an iteration, it moves on to the next iteration in its chunk (if any remain), regardless of whether the other threads have finished their iterations yet. Therefore, this loop will theoretically finish 10 iterations in the time it takes a single thread to do 3.

Note that Threads.@threads is blocking, meaning code after the threaded for loop will not run until the loop has finished.
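As a minimal sketch (assuming Julia was started with several threads, e.g. julia -t 4), the loop below writes each iteration's result into its own slot of a preallocated vector. Because no two iterations touch the same element, no extra synchronization is needed, and because Threads.@threads blocks, results is complete once the loop returns. The function slow_square is a hypothetical stand-in for func.

```julia
# Hypothetical stand-in for `func`: a little work per iteration.
slow_square(i) = (sleep(0.01); i^2)

results = zeros(Int, 10)

# Each iteration writes to a distinct index, so threads never touch the same element.
Threads.@threads for i in 1:10
    results[i] = slow_square(i)
end

# `Threads.@threads` blocks, so every element has been filled in by this point.
println(results)
```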

Image of threaded for loop

threads_for

Julia also provides another macro for multi-threading: Threads.@spawn. This macro is more flexible than Threads.@threads because it can be used to run any code on a thread, not just for loops. But let’s illustrate how to use Threads.@spawn by implementing the behavior of Threads.@threads:

# Function for splitting up `x` as evenly as possible
# across `np` partitions.
function partition(x, np)
    (len, rem) = divrem(length(x), np)
    Base.Generator(1:np) do p
        i1 = firstindex(x) + (p - 1) * len
        i2 = i1 + len - 1
        # The first `rem` chunks each get one extra element.
        if p <= rem
            i1 += p - 1
            i2 += p
        else
            i1 += rem
            i2 += rem
        end
        chunk = x[i1:i2]
    end
end

N = 10
chunks = partition(1:N, Threads.nthreads())
# `func` is assumed to be defined, as in the previous example.
tasks = map(chunks) do chunk
    Threads.@spawn for i in chunk
        func(i)
    end
end
wait.(tasks)

Let’s walk through this code, assuming Threads.nthreads() == 4:

  • First, we split the 10 iterations evenly across the 4 threads using partition. So, chunks yields the ranges 1:3, 4:6, 7:8, and 9:10. (We could have hard-coded the partitioning, but now you have a nice partition function that can work with more complicated partitionings!)

  • Then, for each chunk, we create a Task via Threads.@spawn that will call func on each element of the chunk. This Task will be scheduled to run on an available thread. tasks contains a reference to each of these spawned Tasks.

  • Finally, we wait for the Tasks to finish with the wait function.

To reemphasize, note that Threads.@spawn creates a Task; it does not wait for the task to run. As such, it is non-blocking, and program execution continues as soon as the Task is returned. The code wrapped in the task will also run, but in parallel, on a separate thread. This behavior is illustrated below:

julia> Threads.@spawn (sleep(2); println("Spawned task finished"))
Task (runnable) @0x00007fdd4b10dc30

julia> 1 + 1 # This code executes without waiting for the above task to finish
2

julia> Spawned task finished # Prints 2 seconds after spawning the above task

julia>

Spawned tasks can also return data. While wait just waits for a task to finish, fetch waits for a task and then obtains the result:

julia> task = Threads.@spawn (sleep(2); 1 + 1)
Task (runnable) @0x00007fdd4a5e28b0

julia> fetch(task)
2
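Because fetch works on any task, several independent computations can be started at once and their results combined afterward. A small sketch:

```julia
# Spawn two independent computations; neither waits for the other.
t1 = Threads.@spawn sum(1:100)   # 5050
t2 = Threads.@spawn prod(1:5)    # 120

# `fetch` blocks until each task finishes and returns its value.
total = fetch(t1) + fetch(t2)
println(total)  # 5170
```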

Thread Safety

When using multi-threading, memory is shared across threads. If one thread writes to a memory location while another thread reads from or writes to that same location, the result is a race condition with unpredictable results. To illustrate:

julia> s = 0;

julia> Threads.@threads for i = 1:1000000
           global s += i
       end

julia> s
19566554653 # Should be 500000500000

Race condition

race_condition

There are two methods we can use to avoid the race condition. The first involves using a lock:

julia> s = 0; l = ReentrantLock();

julia> Threads.@threads for i = 1:1000000
           lock(l) do
               global s += i
           end
       end

julia> s
500000500000

In this case, the addition can only occur on a given thread once that thread holds the lock. If a thread does not hold the lock, it must wait for whatever thread controls it to release the lock before it can run the code within the lock block.

Using a lock in this example is suboptimal, however, as it eliminates all parallelism because only one thread can hold the lock at any given moment. (In other examples, however, using a lock works great, particularly when only a small portion of the code depends on the lock.)
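As a sketch of the case where a lock works well, the loop below does its (here, trivial) computation outside the lock and only holds the lock for the brief push! into a shared vector. The names results and lk are illustrative.

```julia
results = Int[]
lk = ReentrantLock()

Threads.@threads for i in 1:100
    y = i^2                 # the "real" work runs in parallel, outside the lock
    lock(lk) do
        push!(results, y)   # only the shared `push!` is serialized
    end
end

# Iterations finish in an unpredictable order, so sort before comparing.
sort!(results)
```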

The other way to eliminate the race condition is to use task-local buffers:

julia> s = 0; chunks = partition(1:1000000, Threads.nthreads());

julia> tasks = map(chunks) do chunk
           Threads.@spawn begin
               x = 0
               for i in chunk
                   x += i
               end
               x
           end
       end;

julia> thread_sums = fetch.(tasks);

julia> for i in thread_sums
           s += i
       end

julia> s
500000500000

In this example, each spawned task has its own x that stores the sum of the values just in the task’s chunk of data. In particular, none of the tasks modify s. Then, once each task has computed its sum, the intermediate values are summed and stored in s in a single-threaded manner.

Using task-local buffers works better for this example than using a lock because most of the parallelism is preserved.

(Note that it used to be advised to manage task-local buffers using the threadid function. However, doing so does not guarantee each task uses its own buffer. Therefore, the method demonstrated in the above example is now advised.)
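The same task-local pattern can also be written without the hand-rolled partition function by using Iterators.partition from Base, which splits a collection into chunks of a given length. This is a sketch, not the post's own code; note that Iterators.partition produces fixed-length chunks (the last one may be shorter) rather than the balanced chunks produced by partition.

```julia
n = 1_000_000
# Chunk length chosen so there is roughly one chunk per thread.
chunk_len = cld(n, Threads.nthreads())
chunks = Iterators.partition(1:n, chunk_len)

# Each task accumulates into its own local variable; nothing shared is mutated.
tasks = map(chunks) do chunk
    Threads.@spawn begin
        local_sum = 0
        for i in chunk
            local_sum += i
        end
        local_sum
    end
end

# Combine the per-task results serially on the calling task.
s = sum(fetch, tasks)
```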

Packages for Quickly Utilizing Multi-Threading

In addition to writing your own multi-threaded code, there exist packages that utilize multi-threading. Two such examples are ThreadsX.jl and ThreadTools.jl.

ThreadsX.jl provides multi-threaded implementations of several common functions such as sum and sort, while ThreadTools.jl provides tmap, a multi-threaded version of map.

These packages can be great for quickly boosting performance without having to figure out multi-threading on your own.
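To get a feel for what such a helper does, here is a minimal hand-rolled threaded map. The name my_tmap is hypothetical, and this sketch simply spawns one task per element, which is not necessarily how ThreadTools.jl implements tmap; for many cheap elements, chunking the work would usually be faster.

```julia
# Hypothetical helper: apply `f` to every element of `xs` using the available threads.
function my_tmap(f, xs)
    # One task per element; the scheduler spreads them over the threads.
    tasks = [Threads.@spawn f(x) for x in xs]
    # `fetch` waits for each task; results come back in the original order.
    return map(fetch, tasks)
end

squares = my_tmap(x -> x^2, 1:8)
```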

Distributed Computing

Besides multi-threading, Julia also provides for distributed computing, or splitting work across multiple Julia processes.

There are two ways to start multiple Julia processes:

  1. Load the Distributed standard library package with using Distributed and then use addprocs. For example, addprocs(2) to add two additional Julia processes (for a total of three).

  2. Run Julia with the -p command line argument. For example, julia -p 2 to start Julia with three total Julia processes. (Note that running Julia with -p will implicitly load Distributed.)

Added processes are known as worker processes, while the original process is the main process. Each process has an id: the main process has id 1, and worker processes have id 2, 3, etc.

By default, code runs on the main process. To run code on a worker, we need to explicitly give code to that worker. We can do so with remotecall_fetch, which takes as inputs a function to run, the process id to run the function on, and the input arguments and keyword arguments the function needs. Here are some examples:

# Create a zero-argument anonymous function to run on worker 2.
julia> remotecall_fetch(2) do
           println("Done")
       end
      From worker 2:    Done

# Create a two-argument anonymous function to run on worker 2.
julia> remotecall_fetch((a, b) -> a + b, 2, 1, 2)
3

# Run `sum([1 3; 2 4]; dims = 1)` on worker 3.
julia> remotecall_fetch(sum, 3, [1 3; 2 4]; dims = 1)
1×2 Matrix{Int64}:
 3  7

If you don’t need to wait for the result immediately, use remotecall instead of remotecall_fetch. This will create a Future that you can later wait on or fetch (similarly to a Task spawned with Threads.@spawn).
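A minimal sketch of that non-blocking pattern, assuming a fresh session (the worker ids returned by addprocs are typically 2 and 3):

```julia
using Distributed

# Start two worker processes.
ids = addprocs(2)

# `remotecall` returns a Future immediately; the work runs on worker `ids[1]`.
fut = remotecall(+, ids[1], 1, 2)

# ... the main process is free to do other work here ...

# `fetch` blocks until the remote result is available.
result = fetch(fut)

# Remove the workers started above.
rmprocs(ids)
```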

Super computer

super_computer

Separate Memory Spaces

One significant difference between multi-threading and distributed processing is that memory is shared in multi-threading, while each distributed process has its own separate memory space. This has several important implications:

  • To use a package on a given worker, it must be loaded on that worker, not just on the main process. To illustrate:

      julia> using LinearAlgebra

      julia> I
      UniformScaling{Bool}
      true*I

      julia> remotecall_fetch(() -> I, 2)
      ERROR: On worker 2:
      UndefVarError: `I` not defined

    To avoid the error, we could use @everywhere using LinearAlgebra to load LinearAlgebra on all processes.

  • Similarly to the previous point, functions defined on one process are not available on other processes. Prepend a function definition with @everywhere to allow using the function on all processes:

      julia> @everywhere function myadd(a, b)
                 a + b
             end;

      julia> myadd(1, 2)
      3

      # This would error without `@everywhere` above.
      julia> remotecall_fetch(myadd, 2, 3, 4)
      7

  • Global variables are not shared, even if defined everywhere with @everywhere:

      julia> @everywhere x = [0];

      julia> remotecall_fetch(2) do
                 x[1] = 2
             end;

      # `x` was modified on worker 2.
      julia> remotecall_fetch(() -> x, 2)
      1-element Vector{Int64}:
       2

      # `x` was not modified on worker 3.
      julia> remotecall_fetch(() -> x, 3)
      1-element Vector{Int64}:
       0

    If needed, an array of data can be shared across processes by using a SharedArray, provided by the SharedArrays standard library package:

      julia> @everywhere using SharedArrays

      # We don't need `@everywhere` when defining a `SharedArray`.
      julia> x = SharedArray{Int,1}(1)
      1-element SharedVector{Int64}:
       0

      julia> remotecall_fetch(2) do
                 x[1] = 2
             end;

      julia> remotecall_fetch(() -> x, 2)
      1-element SharedVector{Int64}:
       2

      julia> remotecall_fetch(() -> x, 3)
      1-element SharedVector{Int64}:
       2
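Putting these points together, the sketch below loads a package and defines a function on every process with @everywhere, then lets each worker write into its own slot of a SharedArray. The helper name record_worker! is hypothetical, and the sketch assumes all workers run on the same machine as the main process (as they do with addprocs(n)).

```julia
using Distributed
ids = addprocs(2)

# Load SharedArrays and define the helper on every process, including the main one.
@everywhere using SharedArrays
# Hypothetical helper: record which worker wrote slot `i`.
@everywhere function record_worker!(a, i)
    a[i] = myid()
    return nothing
end

# One slot per worker; the array's memory is visible to all local processes.
x = SharedArray{Int,1}(length(ids))

# Ask each worker to fill in its own slot, waiting for each call to complete.
for (i, w) in enumerate(ids)
    remotecall_wait(record_worker!, w, x, i)
end

println(x)  # the main process sees the values written by the workers
```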

Now, a note about command line arguments. When adding worker processes with -p, those processes are spawned with the same command line arguments as the main Julia process. With addprocs, however, each added process is started without the main process's command line arguments. Below is an example of where this behavior might cause some confusion:

$ JULIA_NUM_THREADS=4 julia --banner=no -t 1
julia> Threads.nthreads()
1

julia> using Distributed

julia> addprocs(1);

julia> remotecall_fetch(Threads.nthreads, 2)
4

In this situation, the environment variable JULIA_NUM_THREADS is set to 4 (for example, because normally we run Julia with four threads). But in this particular case we want to run Julia with just one thread, so we set -t 1. Then we add a process, but it turns out that process has four threads, not one! This is because the environment variable was set, but no command line arguments were given to the added process. To use just one thread for the added process, we would need to use the exeflags keyword argument to addprocs:

addprocs(1; exeflags = "--threads=1")
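A minimal sketch that checks the result by asking the new worker how many threads it actually has. Passing the flag as the single string "--threads=1" is one way to write it; the worker should report one thread even when JULIA_NUM_THREADS is set.

```julia
using Distributed

# Pass the thread count to the new worker explicitly.
ids = addprocs(1; exeflags = "--threads=1")

# Ask the worker how many threads it was started with.
worker_threads = remotecall_fetch(Threads.nthreads, ids[1])
println(worker_threads)  # 1

# Remove the worker again.
rmprocs(ids)
```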

As a final note, if needed, processes can be removed with rmprocs, which removes the processes associated with the provided worker ids.

Summary

In this post, we have provided an introduction to parallel processing in Julia. We discussed the basics of both multi-threading and distributed computing, how to use them in Julia, and some things to watch out for.

As a parting piece of advice, when choosing whether to use multi-threading or distributed processing, choose multi-threading unless you have a specific need for multiple processes with distinct memory spaces. Multi-threading has lower overhead and generally is easier to use.

How do you use parallel processing in your code? Let us know in the comments below!

Additional Links

Exploring Julia 1.10 – Key Features and Updates

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/julia-1-10

A new version of the Julia programming language was just released! Version 1.10 is now the latest stable version of Julia.

This release is a minor release, meaning it includes language enhancements and bug fixes but should also be fully compatible with code written in previous Julia versions (from version 1.0 and onward).

In this post, we will check out some of the features and improvements introduced in this newest Julia version. Read the full post, or skip ahead to the features that interest you.

If you are new to Julia (or just need a refresher), feel free to check out our Julia tutorial series, beginning with how to install Julia and VS Code.

Improved Latency, or Getting Started Faster

Julia 1.10 has improved latency, which means you can get started faster.

Historically, two major sources of latency in Julia have been slow package loading and just-in-time code compilation. A classic example where this latency was readily noticeable was when trying to create a plot; consequently, this latency often is called the time to first plot (TTFP), or how long one has to wait before seeing a plot.

Note that the TTFP issue exists in the first place because Julia was designed with a trade-off in mind: by taking the time to compile a function the first time it is called, subsequent calls to the function can run at speeds comparable to C. This, however, leads to increased latency on the first call.
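A quick way to see this trade-off for yourself is to time the first and second calls of a freshly defined function. This is a sketch; the timings depend on your machine and Julia version, so no specific numbers are claimed here. The function sumsq is a hypothetical example.

```julia
# A freshly defined function has not been compiled yet.
sumsq(v) = sum(x -> x^2, v)

v = rand(1_000)

# First call: includes just-in-time compilation for `Vector{Float64}` inputs.
t_first = @elapsed sumsq(v)

# Second call: reuses the compiled code, so it is typically much faster.
t_second = @elapsed sumsq(v)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```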

Recent Julia versions have been tackling this issue, and Julia 1.10 further improves latency.

Below is a screenshot of a slide shared during the State of Julia talk at JuliaCon 2023. It shows how the time it takes to load Plots.jl and then call plot decreases when moving from Julia 1.8 to Julia 1.9 and then to Julia 1.10 (at the time, Julia 1.10 hadn't been released yet, so the alpha version was used).

Improved latency

I saw similar results on my computer comparing Julia 1.9.4 to Julia 1.10.0-rc1 (the first release candidate of Julia 1.10):

# Julia 1.9.4
julia> @time using Plots
  1.278046 seconds (3.39 M allocations: 194.392 MiB, 10.10% gc time, 6.28% compilation time: 89% of which was recompilation)

julia> @time display(plot(1:10))
  0.365514 seconds (246.08 k allocations: 16.338 MiB, 58.76% compilation time: 10% of which was recompilation)

# Julia 1.10.0-rc1
julia> @time using Plots
  0.713279 seconds (1.42 M allocations: 97.684 MiB, 3.30% gc time, 15.26% compilation time: 86% of which was recompilation)

julia> @time display(plot(1:10))
  0.257097 seconds (247.72 k allocations: 17.621 MiB, 6.29% gc time, 81.56% compilation time: 9% of which was recompilation)

It’s amazing how much latency has been improved!

Better Error Messages

Julia 1.10 now uses JuliaSyntax.jl as the default parser, replacing the old Lisp-based parser.

Having a new parser doesn’t change how the language runs, but the new parser does improve error messages, enabling easier debugging and creating a lower barrier to entry for new Julia users.

As an example, consider the following buggy code:

julia> count = 0;

julia> for i = 1:10
           count++
       end

Can you spot the error?

Julia 1.9 gives the following error message:

ERROR: syntax: unexpected "end"

Julia 1.10 gives the following:

ERROR: ParseError:
# Error @ REPL[2]:3:1
    count++
end
  invalid identifier

There are at least three improvements to the error message:

  1. The file location of the offending token is prominently displayed. (REPL[2]:3:1 means the second REPL entry, the third line, and the first character. This would be replaced with a file path and line and character numbers if the code were run in a file.)
  2. The specific offending token is pointed out with some context.
  3. It is now clear that an identifier (i.e., a variable name) was expected after count++. (Note that ++ is a user-definable infix operator in Julia; so just as a + end is an error, so too is count ++ end.)
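To see why the parser expects an operand after ++, here is a small sketch that gives ++ a definition. Base does not define ++, so we are free to; using it for vector concatenation is a hypothetical choice for illustration.

```julia
# `++` parses as a binary operator, but Base gives it no methods.
# Hypothetical example: use `++` to concatenate two vectors.
(a::AbstractVector) ++ (b::AbstractVector) = vcat(a, b)

joined = [1, 2] ++ [3, 4]   # parsed as ++([1, 2], [3, 4])
println(joined)
```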

Improved error messages are certainly a welcome addition!

Multithreaded Garbage Collection

Part of Julia’s garbage collection is now parallelized in Julia 1.10, resulting in faster garbage collection.

Below is a screenshot of a slide shared during the State of Julia talk at JuliaCon 2023. It shows the percentage of time a piece of code spent doing garbage collection in different Julia versions (here the master branch is a pre-release version of Julia 1.10). The takeaway is that using threads decreased garbage collection time!

Faster garbage collection

The parallelization is implemented using threads, and the number of threads available for garbage collection can be specified when starting Julia with the command line argument --gcthreads. For example, to use four threads for garbage collection:

julia --gcthreads=4

By default, --gcthreads is half the total number of threads Julia is started with.

Experiment with different numbers of garbage collection threads to see what works best for your code.
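One way to run that experiment is a small script that reports what fraction of a workload's run time went to garbage collection, using the gctime and time fields returned by @timed. This is a sketch: churn is a hypothetical allocation-heavy function, and you would run the script several times with different settings, for example julia -t 8 --gcthreads=1 gc_test.jl versus julia -t 8 --gcthreads=4 gc_test.jl (the file name gc_test.jl is also hypothetical).

```julia
# Hypothetical allocation-heavy workload: many short-lived vectors.
function churn(n)
    total = 0
    for _ in 1:n
        v = collect(1:100)   # allocates a new 100-element vector each iteration
        total += sum(v)
    end
    return total
end

churn(10)   # warm-up call so compilation is not measured

stats = @timed churn(1_000_000)
gc_percent = 100 * stats.gctime / stats.time
println("GC time: ", round(gc_percent; digits = 1), "% of ", round(stats.time; digits = 3), " s")
```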

Timing Package Precompilation

Timing how long individual packages take to precompile is now easily achieved with Pkg.precompile(timing = true).

In Julia 1.9, Pkg.precompile reported just the overall time precompilation took:

julia> using Pkg; Pkg.precompile()
Precompiling project...
  20 dependencies successfully precompiled in 91 seconds. 216 already precompiled.

Pkg.precompile() (without the timing option) behaves the same in Julia 1.10. But now there is the option to report the precompilation time for individual packages:

julia> using Pkg; Pkg.precompile(timing = true)
Precompiling project...
  19850.9 ms   DataFrames
   2858.4 ms   Flux
  26206.5 ms   Plots
  3 dependencies successfully precompiled in 49 seconds. 235 already precompiled.

Now it is easy to see which packages precompile faster than others!

Broadcasting Defined for CartesianIndex

Julia 1.10 now defines broadcasting for CartesianIndex objects.

A CartesianIndex is a way to represent an index into a multidimensional array and can be useful for working with loops over arrays of arbitrary dimensionality.
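As a small sketch of that use, the function below sums an array of any dimensionality with a single loop over CartesianIndices, and the last lines show that an offset can be added to a CartesianIndex directly. The function name sum_all is hypothetical.

```julia
# Hypothetical helper: one loop that works for arrays of any dimensionality.
function sum_all(A)
    total = zero(eltype(A))
    for I in CartesianIndices(A)
        # `I` is a CartesianIndex{N}; `A[I]` indexes all N dimensions at once.
        total += A[I]
    end
    return total
end

A = reshape(1:24, 2, 3, 4)   # a 3-D array
println(sum_all(A))          # 300

# Adding an offset steps to a neighboring index.
first_idx = first(CartesianIndices(A))          # CartesianIndex(1, 1, 1)
neighbor = first_idx + CartesianIndex(1, 0, 0)  # CartesianIndex(2, 1, 1)
```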

Suppose we define the following:

julia> indices = [CartesianIndex(2, 3), CartesianIndex(4, 5)];

julia> I = CartesianIndex(1, 1);

In Julia 1.9, attempting to broadcast over a CartesianIndex (for example, indices .+ I) resulted in the following error:

ERROR: iteration is deliberately unsupported for CartesianIndex.

Previously, we would have had to wrap the CartesianIndex in a Tuple (e.g., indices .+ (I,)). Now that broadcasting is defined, the following works:

julia> indices .+ I
2-element Vector{CartesianIndex{2}}:
 CartesianIndex(3, 4)
 CartesianIndex(5, 6)
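Shifted indices like these can be used to index an array directly. The sketch below uses the tuple-wrapped form so that it also runs on Julia versions before 1.10; on 1.10 and later, indices .+ offset gives the same result.

```julia
A = reshape(1:30, 5, 6)
indices = [CartesianIndex(2, 3), CartesianIndex(4, 5)]
offset = CartesianIndex(1, 1)

# Wrapping `offset` in a tuple makes broadcasting treat it as a single value.
shifted = indices .+ (offset,)   # [CartesianIndex(3, 4), CartesianIndex(5, 6)]

# A vector of CartesianIndex values picks out individual elements.
picked = A[shifted]              # A[3, 4] and A[5, 6]
println(picked)                  # [18, 30]
```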

Summary

In this post, we learned about some of the new features and improvements introduced in Julia 1.10. Curious readers can check out the release notes for the full list of changes.

What are you most excited about in Julia 1.10? Let us know in the comments below!

Additional Links

Delving into Open Source Packages for Julia

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/learning-packages

Julia is a relatively new,free, and open-source programming language.It has a syntaxsimilar to that of other popular programming languagessuch as MATLAB and Python,but it boasts being able to achieve C-like speeds.

Julia comes with a lot of functionality built-in. However, functionality that isn’t already built-in needs to be created from base Julia. Fortunately, Julia provides a simple, yet powerful, mechanism for reusing and sharing code: packages.

Thanks to packages, we don’t have to write the code to do many common tasks ourselves. (Imagine having to write a plotting function from scratch…)

We learned in a previous post about the Pkg REPL promptand how it can be used to install packages.

To install a package, you have to know it exists. And when using a package for the first time, how do you know where to begin? The purpose of this post is to address this question.

In this post, we will learn how to find useful packages and demonstrate how to discover a package’s functionality and learn how to use it.

This post assumes you already have a basic understanding of variables and functions in Julia. You should also understand the difference between functions and methods. If you haven’t yet, check out our earlier post on variables and functions as well as our post on multiple dispatch, which explains the difference between functions and methods.

Package Discovery

The first step to learning a Julia package is actually finding the package.

Essentially all Julia packages are registered, or made available for download via the Pkg REPL prompt, in Julia’s General registry. Therefore, one could look through the registry to get a sense of the different packages available.

For a more powerful way to explore Julia packages, check out juliapackages.com. This website gathers information from GitHub to allow sorting by popularity (as measured by the number of GitHub stars) and when packages were last updated (which can help give a sense of how actively maintained or updated packages are). You can also explore packages by category.

Screenshot of juliapackages.com

Finally, another way to discover packages is to visit Julia Discourse. You can look at package announcements to see what packages are being created. You can also peruse the domain-specific tags to see what packages people are talking about and get a feel for what packages people use for different applications.

Now that we have some tools for discovering packages, let’s discuss how to learn how to use a package.

Learning Package Functionality

Look at Documentation

The first step to finding out what a package has to offer is to look at the package’s documentation.

Picture of an open book

Most packages will have at least a README that will list package functionality and provide some examples of how to use the package. See Interpolations.jl’s README as an example.

Often, more established packages will also have dedicated documentation (that typically is linked in the README). Documentation typically includes more in-depth examples of how to perform specific tasks using the package. For example, DataFrames.jl includes a “First Steps” page in its documentation.

Another common feature of package documentation is a list of functions, types, constants, and other symbols defined by the package. See, for example, ForwardDiff.jl’s differentiation API. This list can be useful for discovering all possible package functionality, especially when the examples elsewhere in the documentation cover only a small portion of package functionality.

Explore in the REPL

Besides looking at online documentation, the REPL can also be useful for learning how to use a package.

After a package is loaded, an exhaustive list of symbols defined in that package can be obtained via tab completion:

julia> using Debugger

julia> Debugger.<tab><tab>
@bp                             @breakpoint                     @enter
@make_frame                     @run                            DebugCompletionProvider
DebuggerState                   HIGHLIGHT_24_BIT                HIGHLIGHT_256_COLORS
HIGHLIGHT_OFF                   HIGHLIGHT_SYSTEM_COLORS         HighlightOption
LimitIO                         LimitIOException                LineNumbers
MAX_BYTES_REPR                  NUM_SOURCE_LINES_UP_DOWN        RESET
RunDebugger                     SEARCH_PATH                     WATCH_LIST
__init__                        _current_theme                  _eval_code
_iscall                         _isdotcall                      _make_frame
_preprocess_enter               _print_full_path                _syntax_highlighting
active_frame                    add_breakpoint!                 add_watch_entry!
append_any                      assert_allow_step               body_for_method
break_off                       break_on                        break_on_error
breakpoint                      breakpoint_char                 breakpoint_linenumbers
check_breakpoint_index          clear_watch_list!               completions
compute_source_offsets          disable_breakpoint!             enable_breakpoint!
eval                            execute_command                 get_function_in_module_or_Main
highlight_code                  include                         interpret_variable
invalid_command                 julia_prompt                    locdesc
locinfo                         maybe_quote                     parse_as_much_as_possible
pattern_match_apply_call        pattern_match_kw_call           print_codeinfo
print_frame                     print_lines                     print_locals
print_next_expr                 print_sourcecode                print_status
print_var                       promptname                      remove_breakpoint!
repr_limited                    set_highlight                   set_theme
show_breakpoint                 show_breakpoints                show_watch_list
stacklength                     suppressed                      toggle_breakpoint!
toggle_lowered                  toggle_mode                     write_prompt
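The same kind of list can also be obtained programmatically with the names function from Base, which by default returns only exported names and, with all = true, also internal ones like the underscore-prefixed names above. A small sketch using a hypothetical module Geometry as a stand-in for a package:

```julia
# A small hypothetical module standing in for a package.
module Geometry
export area

area(r) = π * r^2      # exported
helper() = 42          # not exported

end

# Exported names only.
exported = names(Geometry)

# `all = true` also includes non-exported names such as `helper`.
everything = names(Geometry; all = true)
```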

As discussed in our post about the Julia REPL, the help prompt can be used to display documentation for individual functions and types:

# Press ? to enter help mode
help?> filter
search: filter filter! fieldtype fieldtypes

  filter(f, a)

  Return a copy of collection a, removing elements for which f is false. The function f is passed one argument.

If you want to find out what methods exist for a given function, you can use tab completion:

julia> print(<tab>
print(io::IO, ex::Union{Core.GotoNode, Core.SSAValue, Expr, GlobalRef, Core.GotoIfNot, LineNumberNode, Core.PhiCNode, Core.PhiNode, QuoteNode, Core.ReturnNode, Core.Slot, Core.UpsilonNode}) @ Base show.jl:1384
print(io::IO, s::Union{SubString{String}, String}) @ Base strings/io.jl:246
print(io::IO, x::Union{Float16, Float32}) @ Base.Ryu ryu/Ryu.jl:128
print(io::IO, n::Unsigned) @ Base show.jl:1144

You can also use the methods function:

julia> methods(print)
# 35 methods for generic function "print" from Base:
  [1] print(io::IO, ex::Union{Core.GotoNode, Core.SSAValue, Expr, GlobalRef, Core.GotoIfNot, LineNumberNode, Core.PhiCNode, Core.PhiNode, QuoteNode, Core.ReturnNode, Core.Slot, Core.UpsilonNode})
     @ show.jl:1384
  [2] print(io::IO, s::Union{SubString{String}, String})
     @ strings/io.jl:246
  [3] print(io::IO, x::Union{Float16, Float32})
     @ Base.Ryu ryu/Ryu.jl:128
  [4] print(io::IO, n::Unsigned)
     @ show.jl:1144

The methods function also allows filtering on input types and on the module in which the methods are defined. For example, to get a list of methods of print that can be called with two arguments, the second of which is an AbstractChar:

julia> methods(print, (Any, AbstractChar))
# 5 methods for generic function "print" from Base:
 [1] print(io::IO, c::Char)
     @ char.jl:252
 [2] print(io::IO, c::AbstractChar)
     @ char.jl:253
 [3] print(io::IO, x)
     @ strings/io.jl:32
 [4] print(io::IO, xs...)
     @ strings/io.jl:42
 [5] print(xs...)
     @ coreio.jl:3

(See also the related methodswith function.)

And to get methods of print defined in the Dates package:

julia> using Dates

julia> methods(print, Dates)
# 4 methods for generic function "print" from Base:
 [1] print(io::IO, x::Period)
     @ Dates ~/programs/julia/julia-1.9.4/share/julia/stdlib/v1.9/Dates/src/periods.jl:48
 [2] print(io::IO, t::Time)
     @ Dates ~/programs/julia/julia-1.9.4/share/julia/stdlib/v1.9/Dates/src/io.jl:55
 [3] print(io::IO, dt::Date)
     @ Dates ~/programs/julia/julia-1.9.4/share/julia/stdlib/v1.9/Dates/src/io.jl:714
 [4] print(io::IO, dt::DateTime)
     @ Dates ~/programs/julia/julia-1.9.4/share/julia/stdlib/v1.9/Dates/src/io.jl:705
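These queries also work on your own functions, which makes it easy to check how filtering behaves. A small sketch with a hypothetical function describe that has three methods; hasmethod is a related Base function that answers a yes/no question for a given argument-type tuple.

```julia
# A hypothetical function with three methods.
describe(x::Integer) = "integer"
describe(x::AbstractFloat) = "float"
describe(x, y) = "pair"

n_all = length(methods(describe))                 # 3
n_float = length(methods(describe, (Float64,)))   # 1: only the AbstractFloat method matches
has_string = hasmethod(describe, Tuple{String})   # false: no one-argument method accepts a String
```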

As another example, suppose you have a DataFrame and want to use the groupby function but aren’t sure what other arguments groupby expects. Tab completion (or the methods function) can help:

julia> using DataFrames

julia> x = DataFrame(a = 1:3, b = rand(3));

julia> gx = groupby(x, <tab>
groupby(df::AbstractDataFrame, cols; sort, skipmissing) @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/groupeddataframe.jl:218

After running a package’s function, you might want to learn more about what the function returned. The typeof function returns the type of its input, and fieldnames, given a type, returns the names of the fields that can be accessed on objects of that type:

julia> d = Dict("a" => 1, "b" => 2)
Dict{String, Int64} with 2 entries:
  "b" => 2
  "a" => 1

julia> x = collect(d)
2-element Vector{Pair{String, Int64}}:
 "b" => 2
 "a" => 1

julia> typeof(x)
Vector{Pair{String, Int64}} (alias for Array{Pair{String, Int64}, 1})

julia> fieldnames(typeof(x[1]))
(:first, :second)

julia> x[1].first
"b"

Tab completion can also be used to list properties:

julia> x[1].<tab><tab>
first   second
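The same tools apply to custom types, such as a struct returned by some package function. A small sketch with a hypothetical Measurement type; fieldtypes is a related Base function that lists each field's declared type.

```julia
# A hypothetical type, standing in for something a package function returns.
struct Measurement
    value::Float64
    unit::String
end

m = Measurement(9.81, "m/s^2")

fieldnames(typeof(m))   # (:value, :unit)
fieldtypes(typeof(m))   # (Float64, String)
m.unit                  # "m/s^2"
```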

You can also see where in the type hierarchy an object’s type lies with the supertype function:

julia> supertype(typeof(x))
DenseVector{Pair{String, Int64}} (alias for DenseArray{Pair{String, Int64}, 1})
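Calling supertype repeatedly walks all the way up the hierarchy. A minimal sketch; the helper name type_chain is hypothetical.

```julia
# Hypothetical helper: collect `T` and all of its supertypes up to `Any`.
function type_chain(T::Type)
    chain = Any[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

type_chain(Int64)   # [Int64, Signed, Integer, Real, Number, Any]
```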

Picture of source code

Read Source Code

In addition to reading documentation and experimenting in the REPL, sometimes the best way to learn a package is to read the source code directly. While that may seem daunting at first, remember that Julia is a high-level language, making it somewhat easy to read (at least after getting used to it).

There are two main benefits of reading the source code:

  1. You get to see how the package creates and uses custom types and functions.
  2. Typically code is organized in a logical manner, so you get to see what symbols logically belong together. For example, a file that defines a type typically will also define constructors for that type and functions that operate on it.

If you see a function call and want to know what method will be called, the @which macro in the REPL can help. For example:

julia> using DataFrames

julia> df = DataFrame(a = 1:3);

julia> @which hcat(df, df)
hcat(df1::AbstractDataFrame, df2::AbstractDataFrame; makeunique, copycols)
     @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/abstractdataframe.jl:1608

Now we know the file and line number where the hcat method that acts on two DataFrames is defined, and we can look at the source code to learn more about what the method does.

If you don’t have any objects to work with but know the types of the inputs, you can use the which function instead:

julia> which(hcat, (DataFrame, DataFrame))
hcat(df1::AbstractDataFrame, df2::AbstractDataFrame; makeunique, copycols)
     @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/abstractdataframe.jl:1608
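The Method object that which returns can also be inspected directly, and Base.functionloc returns the file and line where it was defined. A small sketch with a hypothetical function double:

```julia
# Hypothetical function with two methods.
double(x::Integer) = 2x
double(x::AbstractFloat) = 2.0 * x

# Which method would handle an `Int` argument?
m = which(double, (Int,))
println(m)   # shows the method's signature and where it was defined

# `functionloc` returns the file and line number of the method's definition.
file, line = Base.functionloc(m)
println(file, ":", line)
```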

Summary

That wraps up our discussion about how to find useful packages and how to discover and learn to use a package’s functionality. We listed a few tools for finding packages and walked through some different methods for learning how to use a package, including looking at documentation, exploring in the REPL, and reading source code.

Do you have any tips or tricks for learning how to use packages in Julia? Let us know in the comments below!

Now that you have a better idea of how to learn Julia packages, move on to the next post to learn about parallel processing in Julia! Or, feel free to take a look at our other Julia tutorial posts.

Additional Links