Author Archives: Tamás K. Papp

Continuous integration for Julia packages using Docker

By: Tamás K. Papp

This post may be useful for maintainers of Julia packages that require large binary dependencies on CI services like Travis.

I have recently started using Kristoffer Carlsson’s excellent PGFPlotsX for plotting. The package is a thin wrapper which emits LaTeX code for use with pgfplots, which is extremely versatile and well-documented. [1] However, since most of the action happens in LaTeX, unit testing requires a lot of binary dependencies, including the TeXLive suite and some related packages. This is not a problem on one’s own machine, where these need to be installed just once, but when I submitted PRs, tests on Travis timed out more often than not because all of these had to be installed for every run using apt-get.

The Travis documentation suggested that Docker may be a solution for such cases, and I had been looking for an opportunity to experiment with it anyway. After reading their tutorial, it was relatively quick to produce an image based on plain vanilla Ubuntu 17.10 (which is available as a Docker image to build on), with the required TeXLive and related packages, plus some utilities.

When building the image, I download the binaries for the stable version of Julia; nightly is downloaded on demand. This speeds up CI by 40–50 seconds for stable.

This is how it is run:

  1. the directory of the Julia package is mounted in the container at /mnt,

  2. Pkg.clone() and testing proceed as usual,

  3. coverage results are copied back to /mnt when done.
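
As an illustration of the last step, here is a minimal sketch in Julia of copying coverage files back to the mounted directory. The helper name `copy_coverage` is hypothetical and is not taken from texlive-julia-minimal-docker; the sketch only assumes that coverage output files end in `.cov`, as they do when Julia runs tests with coverage enabled.

```julia
# Hypothetical helper: copy every `*.cov` file found under `src` to the same
# relative location under `dest`, and return the relative paths that were copied.
function copy_coverage(src::AbstractString, dest::AbstractString)
    copied = String[]
    for (root, _, files) in walkdir(src)
        for file in files
            endswith(file, ".cov") || continue      # skip everything but coverage files
            rel = relpath(joinpath(root, file), src)
            target = joinpath(dest, rel)
            mkpath(dirname(target))                  # create subdirectories as needed
            cp(joinpath(root, file), target; force = true)
            push!(copied, rel)
        end
    end
    return sort!(copied)
end
```

Inside the container, `dest` would be the mount point `/mnt`, and `src` the directory where the package was installed for testing.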

The resulting image runs in 3–4 minutes consistently. In case someone finds it useful for Julia packages with similarly large binary dependencies, I made it available as texlive-julia-minimal-docker on GitHub. [2] Naturally, for projects with other large binary dependencies, one would install different Ubuntu packages or binaries.

  1. Using this package accelerated my plotting workflow in Julia. A post on this will follow soon.
  2. “Minimal” turns out to be a misnomer, since some dependencies end up requiring X11 and the image is >700MB.

Working with large Julia source files in Emacs

By: Tamás K. Papp

When writing software, especially libraries, a natural question is how to organize source code into files. Some languages, e.g. Matlab, encourage a very fragmented style (one function per file), while for some other languages (C/C++), a separation between the interface (.h) and the implementation (.c/.cpp) is traditional.

Julia has no such constraint: include allows the source code for a module to be organized into small pieces, possibly scattered in multiple directories, or it can be a single monolithic piece of code. The choice on this spectrum is up to the authors, and is largely a matter of personal preference.

When I started working with Julia, I was following the example of some prominent packages, and organized code into small pieces (~ 500 LOC). Lately, whenever I refactored my code, I ended up putting it in a single file.

I found the following Emacs tools very helpful for navigation.

Form feeds and page-break-lines-mode

Form feed, or \f, is an ASCII control character that was used to request a new page in line printers. Your editor may display it as ^L. It has a long history of being used as a separator, and Emacs supports it in various ways.
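
To make the idea of "pages" concrete, here is a rough sketch in Julia (not part of Emacs or of any package mentioned here) that splits a source string at form feed characters and lists the first non-blank line of each page. The function name `page_headings` is made up for illustration.

```julia
# Split `source` into pages at form feed characters ('\f') and return the
# first non-blank line of each page (an empty string for a blank page).
function page_headings(source::AbstractString)
    headings = String[]
    for page in split(source, '\f')
        lines = filter(!isempty, strip.(split(page, '\n')))
        push!(headings, isempty(lines) ? "" : String(first(lines)))
    end
    return headings
end

source = "# general API\nf(x) = 1\n\f\n# utilities\ng(x) = 2\n"
page_headings(source)   # one entry per page
```

Starting each page with a comment line, as in this example, makes such a listing read like a table of contents.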

By default, C-x [ and C-x ] take you to the previous and next form feed separators. Combined with numerical prefixes, e.g. C-3 C-x [, you can jump across multiple ones very quickly. Other commands with page in their name allow narrowing, marking, and other functions.

Many Emacs packages provide extra functionality for page breaks. My favorite is page-break-lines, which replaces ^L with a horizontal line, so that the output looks like this:


# general API

"""
    ML_estimator(ModelType, data...)

Estimate `ModelType` using maximum likelihood on `data`, which is model-specific.
"""
function ML_estimator end

Finding things quickly

I use helm pervasively. helm-occur is very handy for listing all occurrences of something, and navigating them. The following is an excerpt from base/operators.jl, looking for isless:

operators.jl:213:types with a canonical total order should implement `isless`.
operators.jl:227:<(x, y) = isless(x, y)
operators.jl:300:# this definition allows Number types to implement < instead of isless,
operators.jl:302:isless(x::Real, y::Real) = x<y
operators.jl:303:lexcmp(x::Real, y::Real) = isless(x,y) ? -1 : ifelse(isless(y,x), 1, 0)

You can move across these matches, jump to one in an adjacent buffer while keeping this list open, or save the list for later use. Its big brother helm-do-grep-ag is even more powerful, using ag to find something in a directory tree.
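
For readers unfamiliar with occur-style listings, the `file:line:text` format shown above can be imitated with a few lines of Julia. This is only a sketch of the idea, with a hypothetical function name `occur_lines`; helm-occur itself is interactive and does far more.

```julia
# Return one "file:line:text" entry for every line of `path` containing `needle`.
function occur_lines(path::AbstractString, needle::AbstractString)
    hits = String[]
    for (n, line) in enumerate(eachline(path))
        if occursin(needle, line)
            push!(hits, string(basename(path), ":", n, ":", line))
        end
    end
    return hits
end
```

Calling `occur_lines` on a local copy of operators.jl with the needle `"isless"` would give a listing in the same shape as the excerpt, although the line numbers depend on the Julia version.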

With these two tools, I find navigating files around 5K LOC very convenient; the better I learn Emacs, the larger my threshold for a “large” file becomes. [1]

  1. In case you are wondering, the largest files are around 6K LOC in Julia Base at the moment.

Publication quality plots in Julia

By: Tamás K. Papp

In light of recent discussions on Julia’s Discourse forum about getting “publication-quality” or simply “nice” plots in Julia, I thought it would be worthwhile to briefly summarize what works for me. [1] If you are a seasoned Julia user, this post may have nothing new for you, but I hope that newcomers to Julia find it useful.

Generate the data

I try to separate data generation and plotting. The first may be time-consuming (some calculations can take hours or days to run), and I find it best to save the results independently of any plotting. Recently I was sitting at a conference where a presentation about a really interesting topic had some plots that were extremely hard to see: if I remember correctly, something like 10×2 subplots, with almost all fine detail lost due to the resolution of the projector or the human eye. When someone in the audience asked about this, the presenting author replied that he was aware of the issue, but remaking the plots would involve rerunning the calculations, which take weeks. Saving the data separately ensures that you are never in this situation; you can also benefit from updates to plotting libraries when tweaking your plots.

For saving results, JLD2 is probably the most convenient tool: while it is technically a work in progress, it is stable and fast. [2] The key question is where to save the data: I find it best to use a consistent path that you can just include in scripts.

You have several options:

  1. define a global variable in your ~/.juliarc.jl for your projects, and construct a path with joinpath,

  2. if you have packaged your code, Pkg.dir can be used to obtain a subdirectory in the package root,

  3. if your code is in a module, you can wrap @__DIR__ in a function to obtain a directory.

For this blog post I used the first option, while in practice I use the second and the third.
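
A minimal sketch of the first and third options follows. The name `BLOG_POSTS` is the one used in the scripts in this post, but the directory it points to here, the module name `MyProject`, and the `data` directory layout are made-up examples.

```julia
# Option 1: a global defined once in the startup file.
# The directory is a made-up example; point it at your own projects.
const BLOG_POSTS = joinpath(homedir(), "blog", "posts")
post_dir = joinpath(BLOG_POSTS, "plot-workflow")

# Option 3: wrap @__DIR__ in a function inside a module.
module MyProject                 # hypothetical module name

"Directory of the file that defines this module."
source_dir() = @__DIR__

"Path under a hypothetical `data` directory next to the source directory."
data_path(parts...) = normpath(joinpath(source_dir(), "..", "data", parts...))

end

MyProject.data_path("data.jld2")
```

Note that from Julia 0.7 onwards the startup file is `~/.julia/config/startup.jl`, so the global for the first option would go there.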

To illustrate plotting, I use the code below to generate random variates for sample skewness, and save them.

download as data.jl

using StatsBase                 # for skewness
using JLD2                      # saving data
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
sample_skewness = [skewness(randn(100)) for _ in 1:1000]
@save "data.jld2" sample_skewness # save data
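
For reference, the quantity being simulated can be written out by hand. The sketch below computes the moment-based sample skewness m3 / m2^(3/2), where m2 and m3 are the second and third central moments, using only the Statistics standard library. That `skewness` from StatsBase uses this same definition without a bias correction is an assumption here, and the function name `moment_skewness` is made up for illustration.

```julia
using Statistics: mean

# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2 (no bias correction).
function moment_skewness(x::AbstractVector{<:Real})
    d = x .- mean(x)          # deviations from the sample mean
    m2 = mean(d .^ 2)         # second central moment
    m3 = mean(d .^ 3)         # third central moment
    return m3 / m2^1.5
end

moment_skewness([1, 2, 3, 4, 10])   # positive: the sample has a long right tail
```

A symmetric sample gives zero, which is why the simulated values for normal draws scatter around zero.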

Make the plot

No plotting so far, so let’s remedy that. I use Plots.jl, which is a metapackage that unifies syntax for plotting via various plotting backends in Julia. I find this practical, because I can quickly switch backends for different purposes, and experiment with various options when I find the output suboptimal. The price you pay for this flexibility is compilation time, a known issue which means that you have to wait a bit to get your first plot. Separating plotting and data generation has the advantage that once I fire up the plotting infrastructure, I switch to “plotting mode” and clean up several plots at the same time.

Users frequently ask what the “best” backend is. This all depends on your needs, but these days I use the pgfplots() backend almost exclusively. [3] The gr() backend is also useful, because it is very fast.

Time to tweak the plot! I find the attributes documentation the most useful for this. For this plot I need axis labels, a title, and prefer to disable the legend since I am plotting a single series. I am also using LaTeXStrings.jl, which means that I can use LaTeX-compatible syntax for labels seamlessly (notice the L before the string).

download as plot.jl

using JLD2                      # loading data
using Plots; pgfplots()         # PGFPlots backend
using LaTeXStrings              # nice LaTeX strings
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
@load "data.jld2"                         # load data
# make plot and tweak; this is the end result
plot(histogram(sample_skewness, normalize = true),
     xlab = L"\gamma_1", fillcolor = :lightgray,
     yaxis = ("frequency", (0, 2)), title = "sample skewness", legend = false)
# finally save
savefig("sample_skewness.svg")  # for quick viewing and web content
savefig("sample_skewness.tex")  # for inclusion into papers
savefig("sample_skewness.pdf")  # for quick viewing

Having generated the plot, I save it in various formats with savefig. The SVG output is shown below.

The plot

How to get help

If you cannot achieve the desired output, you can

  1. reread the Plots.jl manual,

  2. study the example plots,

  3. ask for help in the Visualization topic.

For the third option, make sure you include a self-contained minimal working example, [4] which also generates or loads the data, so that others can run your code as is. Randomly generated data should be fine, or standard datasets from RDatasets.jl.

Sometimes you will find that the feature you are looking for is not (yet) supported. You should check if there is an open issue for your problem (the discussion forum linked above is useful for this), and if not, open one.

When asking for help or just discussing plotting libraries in Julia, please keep in mind that they are a community effort with volunteers devoting their time to address a very difficult problem. Plotting is not a well-defined exercise: it involves a lot of heuristics and special cases, and most languages took years to get it right (for a given value of “right”). Make it easy for people to help you by making a reproducible, clean MWE: it is very hard to explain how to improve your plot without the actual code and output.

  1. Code in this post was written in December 2017; you may need to tweak it if the API of the packages changes.
  2. In the worst case scenario, you can always regenerate the data ☺
  3. Note that you need a working TeX installation, which is easy to obtain on Linux.
  4. You should use triple-backticks ``` to format your code.