Author Archives: Tamás K. Papp

Working with large Julia source files in Emacs

By: Tamás K. Papp

Re-posted from: https://tpapp.github.io/post/large-files-julia/

When writing software, especially libraries, a natural question is how to organize source code into files. Some languages, e.g. Matlab, encourage a very fragmented style (one function per file), while for others (C/C++) a separation between the interface (.h) and the implementation (.c/.cpp) is traditional.

Julia has no such constraint: include allows the source code for a module to be organized into small pieces, possibly scattered in multiple directories, or it can be a single monolithic piece of code. The choice on this spectrum is up to the authors, and is largely a matter of personal preference.
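As a minimal sketch of the "small pieces" end of this spectrum: the snippet below writes a two-file module to a temporary directory and loads it. The names `Foo`, `estimators.jl` and `ml_estimate` are made up for illustration, and the code assumes Julia 1.x.

```julia
# Hypothetical two-file layout, written to a temporary directory so that
# the sketch is self-contained.
dir = mktempdir()

# A small piece of the module, kept in its own file.
write(joinpath(dir, "estimators.jl"), "ml_estimate(x) = sum(x) / length(x)\n")

# The main file of the module pulls the pieces in with `include`.
write(joinpath(dir, "Foo.jl"), """
module Foo
export ml_estimate
include("estimators.jl")   # resolved relative to the file containing this line
end
""")

include(joinpath(dir, "Foo.jl"))   # defines the module Main.Foo
using .Foo
ml_estimate([1.0, 2.0, 3.0])
```

Pasting the contents of `estimators.jl` into `Foo.jl` in place of the `include` line gives the monolithic variant; the module behaves the same either way.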

When I started working with Julia, I followed the example of some prominent packages and organized code into small pieces (roughly 500 LOC each). Lately, whenever I refactored my code, I ended up putting it in a single file.

I found the following Emacs tools very helpful for navigation.

Form feeds and page-break-lines-mode

Form feed, or \f, is an ASCII control character that was used to request a new page in line printers. Your editor may display it as ^L. It has a long history of being used as a separator, and Emacs supports it in various ways.

By default, C-x [ and C-x ] take you to the previous and next form feed separators. Combined with numerical prefixes, e.g. C-3 C-x [, they let you jump across multiple separators very quickly. Other commands with page in their name provide narrowing, marking, and other functions.

Many Emacs packages provide extra functionality for page breaks. My favorite is page-break-lines, which replaces ^L with a horizontal line, so that the output looks like this:

export
    ML_estimator

────────────────────────────────────────
# general API

"""
    ML_estimator(ModelType, data...)

Estimate `ModelType` using maximum likelihood on `data`, which is model-specific.
"""
function ML_estimator end
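To make the notion of a "page" concrete, here is a small Julia sketch that splits a source text at form feeds, which is roughly the unit the page commands move across. The source string is made up for illustration.

```julia
# Hypothetical source text; "\f" is the form feed that Emacs displays as ^L.
src = "export ML_estimator\n\f\n# general API\nfunction ML_estimator end\n\f\n# implementation\n"

# Each chunk between form feeds is one "page" in Emacs terminology.
pages = split(src, '\f')

for (i, page) in enumerate(pages)
    # Print the first non-empty line of each page as a rough table of contents.
    first_line = first(filter(!isempty, split(page, '\n')))
    println("page ", i, ": ", first_line)
end
```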

Finding things quickly

I use helm pervasively. helm-occur is very handy for listing all occurrences of something and navigating them. The following is an excerpt from base/operators.jl, looking for isless:

operators.jl:213:types with a canonical total order should implement `isless`.
operators.jl:227:<(x, y) = isless(x, y)
operators.jl:300:# this definition allows Number types to implement < instead of isless,
operators.jl:302:isless(x::Real, y::Real) = x<y
operators.jl:303:lexcmp(x::Real, y::Real) = isless(x,y) ? -1 : ifelse(isless(y,x), 1, 0)

You can move across these matches, jump to one in an adjacent buffer while keeping this list open, or save the list for later use. Its big brother helm-do-grep-ag is even more powerful, using ag to find something in a directory tree.

With these two tools, I find navigating files of around 5K LOC very convenient: the better I learn Emacs, the larger my threshold for a “large” file becomes.[1]


  1. In case you are wondering, the largest files in Julia Base are around 6K LOC at the moment.

Publication quality plots in Julia

By: Tamás K. Papp

Re-posted from: https://tpapp.github.io/post/plot-workflow/

In light of recent discussions on Julia’s Discourse forum about getting “publication-quality” or simply “nice” plots in Julia, I thought it would be worthwhile to briefly summarize what works for me.[1] If you are a seasoned Julia user, this post may have nothing new for you, but I hope that newcomers to Julia find it useful.

Generate the data

I try to separate data generation and plotting. The first may be time-consuming (some calculations can take hours or days to run), and I find it best to save the results independently of any plotting. Recently I was sitting at a conference where a presentation about a really interesting topic had some plots that were extremely hard to see: if I remember correctly, something like 10×2 subplots, with almost all fine detail lost due to the resolution of the projector or the human eye. When someone in the audience asked about this, the presenting author replied that he was aware of the issue, but remaking the plots would involve rerunning the calculations, which take weeks. Saving the data separately will ensure that you are never in this situation; also, you can benefit from updates to plotting libraries when tweaking your plots.
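The pattern can be sketched without any plotting at all. The snippet below is a minimal illustration, not the workflow used in this post: it uses the Serialization standard library (the file-based `serialize` and `deserialize` methods need Julia 1.1 or later) purely so that it runs without extra packages, and `cached` is a hypothetical helper name.

```julia
using Serialization   # standard library, used here only as a dependency-free stand-in

# Hypothetical helper: load saved results if the file exists,
# otherwise run the (possibly expensive) computation and save its result.
function cached(compute, path)
    isfile(path) && return deserialize(path)
    result = compute()
    serialize(path, result)
    return result
end

path = joinpath(mktempdir(), "results.jls")

# First call: the computation runs and the result is written to disk.
draws = cached(() -> randn(1000), path)

# Second call: the computation is skipped, so the error is never raised.
again = cached(() -> error("the computation should not rerun"), path)
```

A plotting script can then call `cached` (or simply load the file) as often as needed while the plot is being tweaked.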

For saving results, JLD2 is probably the most convenient tool: while it is technically a work in progress, it is stable and fast.[2] The key question is where to save the data: I find it best to use a consistent path that you can just include in scripts.

You have several options:

  1. define a global variable in your ~/.juliarc.jl for your projects, and construct a path with joinpath,

  2. if you have packaged your code, Pkg.dir can be used to obtain a subdirectory in the package root,

  3. if your code is in a module, you can wrap @__DIR__ in a function to obtain a directory.

For this blog post I used the first option, while in practice I use the second and the third.
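A minimal sketch of the first and third options follows. The value given to `BLOG_POSTS` and the names `data_path`, `MyProject`, `results_dir` and `results_path` are assumptions made for illustration; only the use of joinpath and @__DIR__ comes from the list above.

```julia
# Option 1: a global root directory. In practice this would live in the
# startup file; the location used here is hypothetical.
const BLOG_POSTS = joinpath(homedir(), "blog", "posts")

# Hypothetical helper that builds a path below that root.
data_path(name) = joinpath(BLOG_POSTS, "plot-workflow", name)

# Option 3: wrap @__DIR__ in a function inside a module, so that paths are
# anchored at the directory of the source file that defines the module.
module MyProject
results_dir() = joinpath(@__DIR__, "results")
results_path(name) = joinpath(results_dir(), name)
end

data_path("data.jld2")
MyProject.results_path("data.jld2")
```

When this is evaluated in the REPL rather than from a file, @__DIR__ falls back to the current working directory.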

To illustrate plots, I use the code below to generate random variates for sample skewness, and save it.

download as data.jl

using StatsBase                 # for skewness
using JLD2                      # saving data
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
sample_skewness = [skewness(randn(100)) for _ in 1:1000]
@save "data.jld2" sample_skewness # save data
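For readers unfamiliar with the statistic, the following sketch spells out the usual moment-based definition of sample skewness, g1 = m3 / m2^(3/2), where mk is the k-th central moment with 1/n normalization. It is a textbook formula written with the Statistics standard library; whether it matches the library function used above in every detail is an assumption, and `sample_skew` is a hypothetical name.

```julia
using Statistics   # standard library, for `mean`

# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
function sample_skew(x)
    μ = mean(x)
    m2 = mean((x .- μ) .^ 2)   # second central moment
    m3 = mean((x .- μ) .^ 3)   # third central moment
    return m3 / m2^1.5
end

sample_skew([1.0, 2.0, 3.0])        # symmetric sample, so the skewness is zero
sample_skew([0.0, 0.0, 0.0, 1.0])   # long right tail, so the skewness is positive
```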

Make the plot

No plotting so far, so let’s remedy that. I use Plots.jl, which is a metapackage that unifies syntax for plotting via various plotting backends in Julia. I find this practical, because I can quickly switch backends for different purposes, and experiment with various options when I find the output suboptimal. The price you pay for this flexibility is compilation time, a known issue which means that you have to wait a bit to get your first plot. Separating plotting and data generation has the advantage that once I fire up the plotting infrastructure, I switch to “plotting mode” and clean up several plots at the same time.

Users frequently ask what the “best” backend is. This all depends on your needs, but these days I use the pgfplots() backend almost exclusively.[3] The gr() backend is also useful, because it is very fast.

Time to tweak the plot! I find the attributes documentation the most useful for this. For this plot I need axis labels, a title, and prefer to disable the legend since I am plotting a single series. I am also using LaTeXStrings.jl, which means that I can use LaTeX-compatible syntax for labels seamlessly (notice the L before the string).

download as plot.jl

using JLD2                      # loading data
using Plots; pgfplots()         # PGFPlots backend
using LaTeXStrings              # nice LaTeX strings
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
@load "data.jld2"                         # load data
# make plot and tweak; this is the end result
plot(histogram(sample_skewness, normalize = true),
     xlab = L"\gamma_1", fillcolor = :lightgray,
     yaxis = ("frequency", (0, 2)), title = "sample skewness", legend = false)
# finally save
savefig("sample_skewness.svg")  # for quick viewing and web content
savefig("sample_skewness.tex")  # for inclusion into papers
savefig("sample_skewness.pdf")  # for quick viewing

Having generated the plot, I save it in various formats with savefig. The SVG output is shown below.

[Figure: the plot]

How to get help

If you cannot achieve the desired output, you can

  1. reread the Plots.jl manual,

  2. study the example plots,

  3. ask for help in the Visualization topic.

For the third option, make sure you include a self-contained minimal working example,[4] which also generates or loads the data, so that others can run your code as is. Randomly generated data should be fine, as are standard datasets from RDatasets.jl.

Sometimes you will find that the feature you are looking for is not (yet) supported. You should check if there is an open issue for your problem (the discussion forum linked above is useful for this), and if not, open one.

When asking for help or just discussing plotting libraries in Julia, please keep in mind that they are a community effort, with volunteers devoting their time to address a very difficult problem. Plotting is not a well-defined exercise: it involves a lot of heuristics and special cases, and most languages took years to get it right (for a given value of “right”). Make it easy for people to help you by making a reproducible, clean MWE: it is very hard to explain how to improve your plot without the actual code and output.


  1. Code in this post was written in December 2017; you may need to tweak it if the API of the packages changes.
  2. In the worst case scenario, you can always regenerate the data ☺
  3. Note that you need a working TeX installation, which is easy to obtain on Linux.
  4. You should use triple-backticks ``` to format your code.

WIP: making error locations in julia-repl clickable

By: Tamás K. Papp

Re-posted from: https://tpapp.github.io/post/wip-julia-repl-clickable/

I scratched a long-standing itch and made locations in error messages "clickable" in julia-repl. The change is not yet merged into master; it is in the clickable-locations branch.

Testing is needed because the change relies on some hacks (again, I am not an Emacs expert); I will see whether any issues come up and then merge it. This is what it looks like; the red and orange lines take you to the source:

julia> include("/tmp/Foo.jl")
ERROR: LoadError: UndefVarError: T not defined
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:576
 [2] include(::String) at ./sysimg.jl:14
while loading /tmp/Foo.jl, in expression starting on line 9

julia>
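The locations in the trace above follow a regular pattern (a file path, a colon, and a line number after the word "at"), which is what makes them recognizable by a program. The Julia sketch below only illustrates that pattern matching; it is not the julia-repl implementation, which lives in Emacs, and the regular expression and the name `locations` are assumptions.

```julia
# Hypothetical pattern for locations of the form "at ./loading.jl:576".
const LOCATION = r"at (\S+):(\d+)"

# Collect (file, line) pairs for every location found in the text.
function locations(text)
    return [(m.captures[1], parse(Int, m.captures[2]))
            for m in eachmatch(LOCATION, text)]
end

trace = """
 [1] include_from_node1(::String) at ./loading.jl:576
 [2] include(::String) at ./sysimg.jl:14
"""

locations(trace)
```

Note that the last line of the error message ("while loading /tmp/Foo.jl, in expression starting on line 9") has a different shape and would need a pattern of its own.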