Proper Bayesian Estimation of a Point Process in Julia

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2020/11/03/BayesPointProcess.html

I know how to use Stan and I know how to use Turing. But how do
those packages perform the posterior sampling for the underlying
models. Can I write a posterior distribution down and get
AdvancedHMC.jl to sample it? This is exactly what I want to do with
a point process where the posterior distribution of the model is a
touch more complicated than your typical regression problems.

This post will take you through my thought process and how you got from an idea, to a simulation of that idea, frequentist estimation of the simulated data and then a full Bayesian sampling of the problem.

But first, these are the Julia libraries that we will be using.

using Plots
using PlotThemes
using StatsPlots
using Distributions

Inhomogeneous Point Processes

A point process basically describes the time when something happens. That “thing” we can call an event and they happen between \(0\) and some maximum time \(T\). We describe the probability of an event happening at time \(t\) with an intensity \(\lambda\). Specifically we are going to use 4 different parameters for a polynomial.

\[\lambda (t) = \exp \left( \beta _0 + \beta _1 t + \beta _2 t^2 + \beta _3 ^2 t^3 \right)\]

We take the exponent to ensure that the function is positive throughout the time period. What does this look like? We can simple plot the function from 0 to 100 with some random values for the \(\beta _i\)s.

λ(t::Number, params::Array{<:Number}) = exp(params[1] + params[2]*(t/100) + params[3]*(t/100)^2 + params[4]*(t/100)^3)
λ(t::Array{<:Number}, params::Array{<:Number}) = map(x-> λ(x, params), t)

testParams = [3, -0.5, -0.8, -2.9]
maxT = 100

plot(λ(collect(0:maxT), testParams), label=:none)

This looks like something that definitely changes over time. When
\(\lambda(t)\) is high we expect more events and likewise when it is
low there will be fewer events.

Simulating by Thinning

Let us simulate a point process using this intensity function. To do
so we use a procedure called thinning. This can be explained as a
three step process:

  1. Firstly simulate a constant Poisson process with intensity \(\lambda ^\star\) which is greater than \(\lambda (t)\) for all \(t\). This gives the un-thinned events, \(t^*_i\).
  2. For each un-thinned event calculate the probability it will become one of the final events as \(\frac{\lambda (t^*_i)}{\lambda ^\star}\).
  3. Sample from these probabilities to get the final events.

Simple enough to code up in a few lines of Julia.

lambdaMax = maximum(λ(collect(0:0.1:100), testParams)) * 1.1
rawEvents = rand(Poisson(lambdaMax * maxT), 1)[1]
unthinnedEvents = sort(rand(Uniform(0, maxT), rawEvents))
acceptProb = λ(unthinnedEvents, testParams) / lambdaMax
events = unthinnedEvents[rand(length(unthinnedEvents)) .< acceptProb];
histogram(events,label=:none)

svg

A steady decreasing amount of events following the intensity function from above.

Maximum Likelihood Estimation

The log likelihood of a point process can be written as:

\[\mathcal{L} = \Sigma _{i = 1} ^N \log \lambda (t_i) – \int _0 ^T \lambda (t) \mathrm{d} t\]

Again, easy to write the code for this. The only technical difference is I am using the QuadGK.jl package to numerically integrate the function rather than doing the maths myself. This keeps it simple and also flexible if we decided to change the intensity function later.

function likelihood(params, rate, events, maxT)
    sum(log.(rate(events, params))) - quadgk(t-> rate(t, params), 0, maxT)[1]
end

For maximum likelihood estimation we simply pass this function through to an optimiser and find the maximum point. As optimize actually finds minimum points we have to invert the function.

using Optim
using QuadGK
opt = optimize(x-> -1*likelihood(x, λ, events, maxT), rand(4))
plot(λ(collect(0:maxT), testParams), label="True")
plot!(λ(collect(0:maxT), Optim.minimizer(opt)), label = "MLE")

svg

Not a bad result! Our estimated intensity function is pretty close to
the actual function. So now we know that we can both simulate from a inhomogeneous point
process and that our likelihood can infer the correct parameters.

Bayesian Inference

Now for the good stuff. All of the above is needed for the Bayesian inference procedure. If you can’t get the maximum likelihood working for a relatively simple problem like above, adding in the complications of Bayesian inference will just get you knotted up without any results. So with the good results from above let us proceed to the Bayes methods. With the AdvancedHMC.jl package I can use all the fancy MCMC algos and upgrade from the basic Metropolis Hastings sampling.

I’ve shamelessly copied the README from AdvancedHMC.jl and changed the bits needed for this problem.

using AdvancedHMC, ForwardDiff

D = 4; initial_params = rand(D)

n_samples, n_adapts = 5000, 2000

target(x) = likelihood(x, λ, events, maxT) + sum(logpdf.(Normal(0, 5), x))

metric = DiagEuclideanMetric(D)
hamiltonian = Hamiltonian(metric, target, ForwardDiff)

initial_ϵ = find_good_stepsize(hamiltonian, initial_params)
integrator = Leapfrog(initial_ϵ)
proposal = NUTS{MultinomialTS, GeneralisedNoUTurn}(integrator)
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

samples1, stats1 = sample(hamiltonian, proposal, initial_params, 
                        n_samples, adaptor, n_adapts; progress=true);
samples2, stats2 = sample(hamiltonian, proposal, initial_params, 
                        n_samples, adaptor, n_adapts; progress=true);

Samples done, now to manipulate the results to get the parameter
estimation.

a11 = map(x -> x[1], samples1)
a12 = map(x -> x[1], samples2)
a21 = map(x -> x[2], samples1)
a22 = map(x -> x[2], samples2)
a31 = map(x -> x[3], samples1)
a32 = map(x -> x[3], samples2)
a41 = map(x -> x[4], samples1)
a42 = map(x -> x[4], samples2)

bayesEst = map( x -> mean(x[1000:end]), [a11, a21, a31, a41])
bayesLower = map( x -> quantile(x[1000:end], 0.25), [a11, a21, a31, a41])
bayesUpper = map( x -> quantile(x[1000:end], 0.75), [a11, a21, a31, a41])
density(a21, label="Chain 1")
density!(a22, label="Chain 2")
vline!([testParams[2]], label="True")
plot!(-4:4, pdf.(Normal(0, 5), -4:4), label="Prior")

svg

The chains have sampled correctly and are centered around the correct
value. Plus it’s suitably different from the prior, which shows it has
updated with the information from the events.

plot(a11, label="Chain 1")
plot!(a12, label="Chain 2")

svg

Looking at the convergence of the chains is also positive. So for this
simple model, everything looks like it has worked correctly.

plot(λ(collect(0:maxT), testParams), label="True")
plot!(λ(collect(0:maxT), Optim.minimizer(opt)), label = "MLE")
plot!(λ(collect(0:maxT), bayesEst), label = "Bayes")

svg

Again, the bayesian estimate of the function isn’t too far from the true intensity. Success!

Conclusion

So what have I learnt after writing all this:

  • AdvancedHMC.jl is easy to use and despite all the scary terms and settings you can get away with the defaults.

What I have hopefully taught you after reading this:

  • Point process simulation through thinning.
  • What the likelihood of a point process looks like.
  • Maximum likelihood using Optim.jl
  • How to use AdvancedHMC.jl for that point process likelihood to get the posterior distribution.

NeuriViz (Part 1) – Performant Graphics for Neuroinformatics

By: The Cedar Ledge

Re-posted from: http://jacobzelko.com/neuriviz-1/

NeuriViz – an open experiment into creating performant graphics for neuroinformatics!

I am quite excited to introduce a new side-project of mine called NeuriViz!
NeuriViz is a proof of concept application of the Javis.jl package to create performant animated graphics and visualizations for the domain of neuroinformatics!

If you find this blog post useful, please consider citing it:

Jacob Zelko. NeuriViz (Part 1) – Performant Graphics for Neuroinformatics. November 1st, 2020. http://jacobzelko.com

How Did NeuriViz Come to Be?

NeuriViz started as an offshoot of my work on Javis.jl that my friend, Ole Kröger, and I created.
The Javis package acts as a general purpose library to easily construct informative, performant, and winsome animated graphics using the Julia programming language.
While working on Javis, I realized it had potential to be applied to domain specific graphics that can be difficult to make.
Given my background in health sciences and cognitive disabilities I used Javis to create an animation of a 10-20 EEG Electrode Array:

After I had created this, I reached out to a neuroscientist that I knew.
From there, I was connected with my collaborator on the project, Zachary Christensen, the creator of the JuliaNeuroscience organization and an MD/PhD student at the University of Rochester.
Finally, after a few different conversations, NeuriViz was created.

Creating EEG Topoplots with Julia

Obviously, the 10-20 EEG electrode array animation is not particularly useful to a neuroinformaticist.
However, what it was useful for was to illustrate an important aspect of NeuriViz: every single component of that animation was original.
I created the “template” for this array by using Luxor objects which Javis exports and was able to, after defining constants and package imports, draw the EEG within 40 lines of code.

The first animation goal of NeuriViz is to produce a performant EEG topoplot.
The following image is taken from documentation of the popular MATLAB EEG toolbox, EEGLAB, to illustrate exactly what is envisioned:

Currently, with help from the creator of Luxor, Cormullion, I have been able to produce the following topoplot example using signal noise and my template:

The data here is nonsensical but creating this entire prototype of a 10-20 EEG electrode array only took about 20 lines more than my original example.

There is still much to do with making a fully functioning topoplot using Javis.
For starters, the poor subject is missing a nose and ears!
As shown in the example from EEGLAB, there are no boundaries between different activity gradients.
Finally, taking actual channel data and interpolating it across the contour of the skull to create the actual topoplot is a non-trivial process.

However, I am confident that not only is this surmountable but that we can make animations for brain activity across epochs in a fast and performant way.
Why do I think that?
Let’s get to that in the next section where we talk about data to visualization pipelines!

NeuriViz Data-to-Visualization Pipelines

For this project, I have been working with Go-nogo categorization and detection task dataset submitted by Delorme et. al. to the open science neuroinformatics dataset service, OpenNEURO. [1]
The entire dataset is roughly 10GB worth of EEG data following the BIDS standard, the standard format for managing EEG data. [2]
Further, this dataset has been used in a variety of studies already and its validity as a strong dataset has been proven in literature. [3 – 5]

An initial challenge with handling this data was determining how best to represent the data in terms of a file format and data structure.
To address the issue of the file format, I turned to Arrow.jl, which is maintained by the JuliaData organization, to use the Apache Arrow format.
The Arrow format was created in 2016 by Wes McKinney, the creator of the extremely popular pandas Python package, to produce language-independent open standards and libraries to accelerate and simplify in-memory computing. [6]
Of note, it is ideal for columnar data, cache-efficiency on CPUs and GPUs, leverages parallel processing, and optimized for scan and random access.
Arrow.jl brilliantly used memory mapping to implement the standard in an immensely efficient way.

Here are light benchmarks from my usage of Arrow.jl on a single ~25MB data file:

I thought this was immensely impressive and performant for processing my entire dataset which, in this case, would be “larger than memory” in regards to my computer’s memory.
Further, the minimal allocations is fantastic – I am sure some of my methodology could be improved, but I am already very pleased for the NeuriViz project.

Having tentatively solved the problem of data formats to enable high speeds and efficiency for data manipulation, came the problem of actually creating a structure to hold this data.
The BIDS format enforces an architecture that does enhance reproducibility of science, but it results in having data split across multiple files and formats.
Thankfully, I found a solution by using the following packages:

  • BIDSTools.jl – I used this to easily parse out relevant data files from the dataset.
  • AxisIndices.jl – This was created by my collaborator, Zachary Christensen, and I utilized it to load in all the disparate files into one data structure.

After the loading was complete, I ended up with the following intuitive syntax:

Currently, this is the idea for a user interface and is being actively worked on in v0.1.0 of NeuriViz.
The goal then becomes piping this data to generate the topoplot and interfacing it with the array layout I created with Javis.

Conclusion

All the pieces that I need to accomplish the goal of making a performant visualization using Javis and raw EEG data are here.
I am quite excited to see what a prototype will produce with NeuriViz but, based on my prior work, I am quite optimistic about what will come!
Feel free to watch or star the project on GitHub as developments are soon to come!

Take care and all the best! ~ jz

If you spot any errors or have any questions, feel free to contact me about them!

References

[1] A. Delorme and M. Fabre-Thorpe, Go-nogo categorization and detection task. 2020.

[2] K. J. Gorgolewski et al., “The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments,” Sci Data, vol. 3, no. 1, p. 160044, Dec. 2016, doi: 10.1038/sdata.2016.44.

[3] Fabre-Thorpe, M., Delorme, A., Marlot, C., & Thorpe, S. (2001). A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes. Journal of cognitive neuroscience, 13(2), 171–180. https://doi.org/10.1162/089892901564234

[4] Delorme, A., Rousselet, G. A., Macé, M. J., & Fabre-Thorpe, M. (2004). Interaction of top-down and bottom-up processing in the fast visual analysis of natural scenes. Brain research. Cognitive brain research, 19(2), 103–113. https://doi.org/10.1016/j.cogbrainres.2003.11.010

[5] Delorme, A., Makeig, S., Fabre-Thorpe, M., & Sejnowski, T. (2002). From single-trial EEG to brain area dynamics. Neurocomputing, 44–46, 1057–1064. https://doi.org/10.1016/S0925-2312(02)00415-0

[6] Wes McKinney, Apache Arrow and the Future of Data Frames. 2020.