Author Archives: Tamás K. Papp

How my Julia coding style changed

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/2019-09-07-coding_style_retrospective/

The recent redesign of DynamicHMC.jl provides an opportunity to reflect on how my Julia coding style changed. I started this package two years ago, and did not do any major refactoring after it became usable, mostly because I did not have a good comprehensive test suite or a solid conceptual understanding of all the ramifications of the algorithm, especially the adaptation, and I was afraid of breaking something that I just got working. Consequently, the code before the redesign largely reflected the coding style I developed for Julia 0.5 and then partly 0.6.

I am one of those people who recognize the benefits of reflecting on the past from time to time, but are too busy lazy to keep a diary. But code under version control is effectively a diary about the life of software, and I find it interesting to think about how my Julia coding style has changed in the past to years.

It is important to emphasize that this writeup reflects how I code at the moment. I recognize that other coding styles in Julia can be perfectly legitimate, and when I contribute to packages, I do my best to follow existing style. Also, it is very likely that my own coding style will change as the language evolves. In this blog post, I won’t address details of coding style about which there is a more or less general agreement — see the Style Guide in the manual and YASGuide for useful summaries. Finally, some examples are stylized adaptations of real code for simpler exposition.

Randomness

Random numbers are very important for scientific computing, but at the same time present an important challenge for writing bug-free parallel code and unit testing. A lot of code in Random and Base follows the interface conventions

rand([rng], ...)

making the random number generator an optional first argument, and defaulting to the global one.

While this can be convenient for user code, in packages I find that it is better to keep the random number generator as an explicit argument to almost all methods. This paves the way for making the code work nicely with composable multi-threading too.

If random and deterministic parts of the algorithm can be separated, it should be done, as it usually makes the code cleaner, and unit testing much easier. An example of this is factoring out the random directions for tree doubling in NUTS: drawing a single set of bit flags once and providing this to the tree building algorithm, the latter can treat the doubling directions as predetermined. This also made it much easier to verify detailed balance in the unit tests.

Named tuples

Named tuples are extremely useful for organizing multiple values as inputs and outputs when a composite type (struct) is not warranted or needed. If a function returns multiple values, especially more than two, I now always default to using named tuples, ie

function f(x, y)
    a = ...
    b = ...
    c = ...
    (a = a, b = b, c = c)
end

then use @unpack from Parameters.jl for destructuring:

@unpack a, b = f(x, y) # I don't need c here

This is costless and makes the interfaces more flexible: for example, I can add extra fields, or eventually make the returned value a composite type later, without changing anything in the callers.

Multi-liners as opposed to one-liners

Almost all the “clever” one liners are now expunged from the code. Compare

EBFMI(x) = (πs = map(x -> x.π, x); mean(abs2, diff(πs)) / var(πs))

with

function EBFMI(tree_statistics)
    πs = map(x -> x.π, tree_statistics)
    mean(abs2, diff(πs)) / var(πs)
end

The first one is kind of crammed, and difficult to read, while the second one makes it clear that an interim value is obtained for the calculation.

When a function returns multiple values, I find it nice to assign each to a variable first, ie

function f(x, y)
    a = g(x, y)
    b = h(x, y)
    a, b
end

as opposed to

f(x, y) = g(x, y), h(x, y)

This makes the returned structure much more readable when I need to look at some code later.

That said, I still have a strong preference for writing small functions that “do one thing”. This again is frequently an optimal strategy for Julia, as the compiler is very good about inlining decisions (and when not, can be nudged with @inline).

Abstract types can be an implementation detail

Abstract types can be very useful for organizing method dispatch: code like

abstract type SomeAbstractType end

f(x::SomeAbstractType) = do_something()

struct ThatSubType{T} <: SomeAbstractType
    y::T
end

g(x::ThatSubType) = do_something_else(x.y)

can help avoid repetition.

But abstract types are just one convenient tool for this, and building a type tree is a tool, not an end in itself. When they are no longer adequate, other solutions such as traits should be used instead.

Consequently, these days I mostly consider type hierarchies an implementation detail and I am wary of exposing them in the API of a package. This is especially important if some functionality requires that the user of a package implements their own custom types, e.g. for a model. Requiring that

UserDefinedModel <: SomePackage.ModelType

to work with some API is usually unnecessary, and can cause unnecessary complications when the user would want UserDefinedModel to implement some other API. Either duck typing, or an explicit interface check like

implements_model(::Any) = false
implements_model(::Type{<:UserDefinedModel}) = true

can work better (returning types when that is necessary, eg for further dispatch — see Base.IteratorSize for an example).

Small type unions

The ability of the compiler create efficient code for small type unions made code organization much easier for me. For example, DynamicHMC.adjacent_tree attempts to build a tree next to some position, but this can fail because of divergence or turning in the leapfrog integrator, in which case it is useful to bail out early and return information about the reason. Consequently, this function either returns a tree, or an InvalidTree, which is a container with the aforementioned diagnostic information. The compiler recognizes this and generates fast code.

Some attention should be paid to avoid combinatoric type explosions, but when used well, small type unions can make code much simpler at little or no extra cost.

Functional is often the best solution

Preallocating containers for outputs can make code faster, but at the cost of extra complexity, especially that of having to figure out type parameters for said containers.

My general strategy is to write code in a functional, side-effect-free style until I am really convinced that preallocation would help, and then confine it to carefully compartmentalized parts of the code. This is of course ideal only for some kinds of code — when allocations dominate, a functional style is not optimal.

That said, Julia’s compiler is getting more and more clever about optimizing allocations, so I frequently find that I never rewrite code with preallocated containers, and occasionally even revert from preallocated code to a functional style without a relevant performance cost.

When in doubt, document

Some readers of the code of DynamicHMC.jl will have the impression that it is overdocumented. Probably 20–25% of the lines in the source are docstrings, and most internal interfaces and convenience functions have one.

Primarily, this is driven by instances of the embarassing realization of having written some code two years ago which kind of works, but no longer being sure of all the details that would be nice to know before refactoring. But another consideration is encouraging contributions, or uses of the packages for research — besides being an implementation of the NUTS sampler, DynamicHMC.jl is also a platform for experimenting, which is easier to do when the internals are documented.

Julia’s documentation ecosystem is now very mature and robust. Besides the excellent Documenter.jl, I also find DocStringExtensions.jl very useful.

Internal interfaces have high return on investment

Many packages, especially those that are work in progress or experimental, do not document their user interfaces very carefully. This is of course understandable, but I have developed a preference for abstracting out and documenting interfaces in internally used code, too, as the package matures.

This makes refactoring and compartmentalization much easier, and makes sense even if a particular interface has only a single implementation in the package, because unit tests can just provide a “dummy” implementation and test related code that way. An example of this is the code for “trees” and the related tests: trees here are an abstraction for walking integer positions in two directions. This is key for the NUTS algorithm, but can be handled as an abstraction separate from the details of leapfrog integrators.

ANN: DynamicHMC 2.0

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/2019-08-29-dynamichmc2/

I am very pleased to announce version 2.0 of DynamicHMC.jl. I briefly summarize the new the developments in this blog post.

Code used in this post was written for Julia v1.* and, when applicable, reflects the state of packages used on 2019-09-03. You may need to modify it for different versions.

Note: my blog no longer has a comment section. Feel free to ask questions about this post on the Julia Discourse forum.

Some context

DynamicHMC.jl is part of a collection of packages for implementing Bayesian inference using a modular approach:

  1. TransformVariables.jl just maps \(\mathbb{R}^n\) vectors into collections of constrained parameters, like positive real numbers or valid correlation matrices,

  2. LogDensityProblems.jl provides an interface for log density functions \(\mathbb{R}^n \to \mathbb{R}\) and their derivative, also calculating the latter using automatic differentiation via one of the supported native Julia AD frameworks (see below),

  3. DynamicHMC.jl, which just takes a log density and its gradient on \(\mathbb{R}^n\), and performs MCMC using NUTS,

  4. MCMCDiagnostics.jl for generic convergence diagnostics (ie not specific to a particular MCMC method).

In contrast to a single interface and a DSL for describing models, these packages provide a suite of tools for modern MCMC, with easily interchangable and modular building blocks. For example, the gradients of a log density can be obtained with

  1. ForwardDiff.jl, which is very robust,
  2. ReverseDiff.jl, which works better for medium-sized problems using a classic taped approach,
  3. Flux.jl which is more modern and can be faster on larger problems, and of course
  4. Zygote.jl which is very promising, but work in progress;

and switching between these usually just requires changing a single line. I find this especially important since as Zygote keeps maturing, it will play a more and more important role for fast AD. Moreover, the user is free to code all gradients manually, or just parts of them to help out AD, for example with the adjoint method.

In addition to this, some packages use DynamicHMC.jl as a backend. This is encouraged and remains to be supported. Finally, the third common use case for this package is as a research platform for experimenting with modern HMC algorithms: this is supported by a detailed documentation of internals.

Why a new API was needed

As bug reports and test cases kept accumulating, it was very clear that better adaptation heuristics, and a more sophisticated diagnosic and warmup interface was needed. When NUTS/HMC goes wrong with models it should be able to handle otherwise, the problem is usually with adaptation. The user should be able to learn what went wrong, and either manually tune stepsize and the kinetic energy metric, or choose an adaptation better suited to the model.

The old API of DynamicHMC was lacking in several ways. The main entry point was something like

chain, tuning = NUTS_init_tune_mcmc(∇P, 1000)

with tuning containing the adapted stepsize and kinetic energy metric.

In practice, it turns out that

  1. either the model works fine, and then the user cares little about warmup,
  2. or it doesn’t, then information about the warmup would be necessary to debug the model.

Fine-tuning the adaptation sequence was possible, but rather unintuitive in the old API. I doubt that many people used this feature, or were even aware that it was available. Statistics on why adaptation failed or sampling didn’t mix were also difficult to obtain.

After collecting the various problems, I spent some time on redesigning the internals, then the API to address all major issues.

Here I would like to thank all users of this library who provided a lot of valuable feedback, especially in the form of bug reports which I could study and use to tweak the sampler. Robert J Goedman coded all the models of the excellent “Statistical Rethinking” book in DynamicHMCModels.jl, which is the most comprehensive collection of examples for this package, and also provides and extremely useful test suite for it. DiffEqBayes.jl included DynamicHMC as a backend, while Soss.jl provides a higher-level DSL for building models. Users (who are, mostly, also the authors) of these packages provided a lot of example models and test cases.

Changes in 2.0

Documentation

The documentation was rewritten from scratch, now including a worked example. All functions of the API have extensive docstrings, usually with examples where relevant. This documentation should be the starting point for using this package.

Of course, if you have questions, feature requests, or bug reports, don’t hesitate to open an issue; I would like to emphasize that it is still perfectly fine to open issues just to ask questions. You can also address questions to @Tamas_Papp on the Julia discourse forum.

Main interface

Most people would now call the sampler with

results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P, 1000)

where ∇P is an object that supports the LogDensityProblems.jl API (basically “give me a log density and its gradient at this position”, withs some bells and whistles). results is a NamedTuple with fields like chain (a vector of positions in \(\mathbb{R}^n\), which most users probably need to transform into a collection of constrained parameters), information on tree statistics and the adapted parameters. A detailed worked example is available.

In contrast to the previous API, the random number generator needs to be explicitly provided. This is in preparation for multithreading, encouraging the user to be conscious of RNGs as mutable states; the internals now also follow this approach with no default RNG in the sampling part.

Exposed warmup building blocks

The new API allows fine-grained control over the warmup stages. For example, this is how one would skip local optimization, ask for a dense (Symmetric) metric instead of the default Diagonal, and provide an initial position while at the same time allowing the stepsize to be found and adapted using the default heuristic:

q = ... # some initial position
warmup_stages = default_warmup_stages(; local_optimization = nothing,
                                      M = Symmetric)
κ = GaussianKineticEnergy(5, 0.1)
mcmc_with_warmup(rng, , N;
                 initialization = (q = q, κ = κ),
                 warmup_stages = warmup_stages)

The warmup_stages above is just a Tuple, with no secret sauce, you are provided the tools to make up your own if necessary.

Changed heuristics and warmup defaults

In contrast to Stan, the old API did not look for a local optimum before starting the stepsize adaptation. This is fine in most cases, but occasionally adapting to an otherwise atypical region of the parameter space can cause problems with the algorithm later on. On the other hand, adapting too eagerly to models with singularities in the log posterior can also backfire very easily.

The new default heuristics make a half-hearted effort to go near some local optimum, but don’t overdo it, which I think is the right compromise. Also, there is some protection against singularities on the “edges” of \(\mathbb{R}^n\) (which can happen with the “funnels” of hierarchical models).

Stepsize adaptation became a bit more robust with some tweaks. I still think that the initial bracketing algorithm in this package is better than Stan’s in some cases and worth the extra couple of evaluations, as it plays nicer with the dual averaging for posteriors where the optimal stepsize can change rapidly.

The default kinetic energy metric is now Diagonal. This should be nearly optimal except for very heavily correlated posteriors which also happen to be devoid of any other pathologies (this is rare in practice).

Diagnostics

Diagnostics were reorganized into a DynamicHMC.Diagnostics submodule of their own. They are intended for interactive use, and the exposed API can change with just a minor version increment. You can import them into your current module with

using DynamicHMC.Diagnostics

Note that this package has no plotting functionality. Use your favorite plotting package to visualize information, I made the plots below with PGFPlotsX.jl (and relevant code may eventually end up in a mini-package, for now see the link below).

download code as plots.jl

Detailed tree statistics

Each result that contains a chain also comes with corresponding tree_statistics, which is a vector of statistics for each NUTS step. It contains information about acceptance ratios, number of leapfrog steps and tree depth, and the doubling directions. In 2.0, it also contains a field which informs the user about the location (in steps relative to the starting point).

There is a summarize_tree_statistics function that produces a useful summary about acceptance rations:

julia> results = mcmc_with_warmup(Random.GLOBAL_RNG, , 1000;
                                  reporter = NoProgressReport());

julia> results.tree_statistics[1]
DynamicHMC.TreeStatisticsNUTS(-1.7246438302263802, 2,
                              turning at positions 1:4, 0.963443025058039,
                              7, DynamicHMC.Directions(0x595b1b9c))

julia> summarize_tree_statistics(results.tree_statistics)
Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.94, 5/25/50/75/95%: 0.75 0.92 0.97 1.0 1.0
  termination: divergence => 0%, max_depth => 0%, turning => 100%
  depth: 0 => 0%, 1 => 14%, 2 => 55%, 3 => 21%, 4 => 10%, 5 => 0%

Calculating leapfrog trajectories

A call not unlike

traj = leapfrog_trajectory(, [0, -2], 0.2, -6:10;
                           κ = GaussianKineticEnergy(2),
                           p = [2, 2])

was used to produce the information that was used for the plot below. The resulting vector (here, traj) contains a NamedTuple of positions, momenta, and relative energy. Here the stepsize \(\epsilon = 0.2\) was selected to on purpose as near-but-not-quite-unstable, starting from position [0, -2], taking 6 leapfrog steps backward and 10 forward.

Hamiltonian trajectory

The plot below visualizes the energy relative to the starting point.

Relative energy

Exploring log acceptance ratios

It can be very useful to explore log acceptance ratios. explore_log_acceptance_ratios returns a matrix of them with random momenta. In the plot below, we can see things become iffy for \(\epsilon > 0.5\), approximately.

Internal changes

Although there is a new API, the bulk of the changes were internal: resulting in (hopefully) much cleaner and more generic code, better unit tests, and improved documentation documentation for the internals, which are especially relevant for users using this package for research. If this affects you, please read the code and the docstrings and feel free to ask questions.

PSA: breaking API changes LogDensityProblems and DynamicHMC

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/2019-07-25-psa-dhmc-api-change/

The API of LogDensityProblems and DynamicHMC is being redesigned (issues: DynamicHMC#30, LogDensityProblems#45). This is necessary to address some issues related to ease of use, debugging, robustness, adaptation, and better integration with some advanced AD packages like Zygote.

Once the changes are complete, I will post a detailed writeup for the transition. But in the meantime, if you are using my MCMC libraries, you may want to pin versions, or put the following in your Project.toml‘s [compat] section (you only need the [compat] if there isn’t one):

[compat]
DynamicHMC = "^1.0.5"
LogDensityProblems = "^0.8.3"

For more information, please refer to the Pkg docs.