Author Archives: Posts | Welcome!

Bayesian Inference on State Space Models – Part 4

By: Posts | Welcome!

Re-posted from: https://paschermayr.github.io/post/statespacemodels-3-inference-on-state-space-models-part-4/

Particle MCMC for State Space Model Estimation

Building our first PMCMC algorithm!

This will be the final post of the series where everything will (hopefully) come together. In the first posts about
HMMs and
HSMMs,
I explained the basics about said models. In the

first blog post of my Inference in state space models series,
I outlined a rough sketch on how to conduct parameter estimation for these models, and concluded that Bayesian methods are particularly suitable if we want to jointly sample
the model parameter and the state sequence given the observed data. In the

following post,
I coded up a basic particle filter from scratch to sample the state sequence and, in my

latest post,
I implemented a framework to sample the corresponding model parameter. Please note that going forward you will need to import all Julia functions
that we defined so far in order to make the code in this article work. If you do not want to copy paste code through all the articles, you can download
the Julia scripts from my
GitHub profile.

Explaining the estimation problem

Let us straight jump into the topic: the goal is to infer the posterior $P( s_{1:T}, \theta \mid e_{1:T})$. $\theta$ is, in our case, all the observation and transition parameter.
If we assume univariate Normal observation distributions, we need to estimate $\mu$ and $\sigma$ for each state. In addition, we need to estimate the transition
matrix, which I will keep fixed for now. You can also estimate this matrix if you extend our code in the
MCMC part of the series for vector valued distributions.
As a quick recap, we will use the particle filter from
this post
to sample the state trajectory, and the Metropolis sampler from
this post
to propose new model parameter. With the information gathered from both steps, we will either accept or reject this pair.

Coding the PMCMC framework

Let us start by loading all necessary libraries and defining two helper functions that we need
in the PMCMC framework: a function to transfer the current values in our parameter container into the corresponding model distributions as well as
a function to show all current values in this parameter container.

#Support function to transfer parameter into distributions
function get_distribution(μᵥ::Vector{ParamInfo}, σᵥ::Vector{ParamInfo})
    return [Normal(μᵥ[iter].value, σᵥ[iter].value) for iter in eachindex(μᵥ)]
end
get_distribution(param::HMMParameter) = get_distribution(param.μ, param.σ)

#Support function to grab all parameter in HMMParameter container
function get_parameter(parameter::HMMParameter)
    θ = Float64[]
    for field_name in fieldnames( typeof( parameter ) )
        sub_field = getfield(parameter, field_name )
        append!(θ, [sub_field[iter].value for iter in eachindex(sub_field) ] )
    end
    return θ
end
get_parameter (generic function with 1 method)

Neither of the two functions are needed to grasp the concept of PMCMC, but are useful for the sampling process. Next, let us
sample the corresponding observed and latent data for some HMM. We will use these model parameter later on to compare the accuracy of our PMCMC
algorithm:

#Generate data
T = 100
evidence_true =  [Normal(2., 1.), Normal(-2.,1.)]
transition_true = [ Categorical([0.85, 0.15]), Categorical([0.5, 0.5]) ]

s, e = sampleHMM(evidence_true, transition_true, T)

#Plot data
plot( layout=(2,1), label=false, margin=-2Plots.px)
plot!(e, ylabel="observed data", label=false, subplot=1, color="gold4")
plot!(s, yticks = (1:2), ylabel="latent state", label=false, subplot=2, color="black")


I explained this code already in an
earlier post,
so check this one out if there are any unclarities. I will now initialize the model parameter and the corresponding particle
filter – note that there are many methods to do so, but for now we just initialize them manually, far enough away from the true parameter:

#Generate Initial parameter estimates - for univariate normal observations
μᵥ = [ParamInfo(3.0, Normal() ), ParamInfo(-3.0, Normal() )] #Initial value and Prior
σᵥ = [ParamInfo(1.5, Gamma(2,2) ), ParamInfo(2.0, Gamma(2,2) )] #Initial value and Prior
param_initial = HMMParameter(μᵥ, σᵥ)

#Generate initial state trajectory
evidence_current = get_distribution(param_initial)
transition_current = deepcopy(transition_true)
initial_current = Categorical( gth_solve( Matrix(get_transition_param(transition_current) ) ) )
pf = ParticleFilter(initial_current, transition_current, evidence_current)

You can check out
this post
if the code above is still unclear to you. Now all that is left to do is implementing the PMCMC
sampling routine. I did my best to
comment and make the code as clear as possible by adding comments at almost each line – the steps itself do not differ much from a standard Metropolis algorithm,
so this function should be relatively straightforward to understand. The Unicode support for Julia also helps a lot here, as I
could use a subscript t for variables that are defined in the transformed parameter space. Some more comments to understand the code a bit more clearly:

  • The function input are the particle filter and the initial parameter guess we stated above. Additionally, the observed data will be used for the
    likelihood evaluation. Iterations stands for the number of samples that will be obtained. Scaling is a tuning parameter that states how big the proposal
    steps for the parameter will be. The larger the tuning parameter, the larger the movements. This influences the
    speed of convergence and acceptance rate into opposite directions. Check it out yourself by plugging in different values.

  • The function output are the parameter and state trajectory samples. The results are pre-allocated at the beginning of the function in matrices
    called traces and will ultimately be used to check the accuracy of our algorithm.

  • Before the loop starts, the initial parameter and state trajectory are assigned, and the log prior and likelihood given these values are calculated.

  • The algorithm then proceeds by iteratively sampling new parameter and a state trajectory given the new parameter. Each iteration will be stored in the corresponding trace:
    if the step is accepted, the proposed parameter-trajectory pair is stored. Otherwise the current pair is stored.

  • That’s it! Let us have a look:

function sampling(pf::ParticleFilter, parameter::HMMParameter,
                  observations::AbstractArray{T}; iterations = 5000, scaling = 1/4) where {T}
    ######################################## INITIALIZATION
    #initial trajectory and parameter container
    trace_trajectory =  zeros(Int64, size(observations, 1), iterations)
    trace_θ =  zeros(Float64, size( get_parameter(parameter), 1), iterations)
    #Initial initial prior, likelihood, parameter and trajectory
    logπₜ_current = calculate_logπₜ(parameter)
    θ_current = get_parameter(parameter)
    loglik_current, _, s_current = propose(pf, observations)
    #Initialize MCMC sampler
    θₜ_current = transform(parameter)
    mcmc = Metropolis(θₜ_current, scaling)

    ######################################## PROPAGATE THROUGH ITERATIONS
    for iter in Base.OneTo(iterations)

        #propose a trajectory given current parameter
        θₜ_proposed = propose(mcmc)
        #propose new parameter given current parameter - start by changing modelparameter in pf with current θₜ_proposed
        inverse_transform!(parameter, θₜ_proposed)
        pf.observations = get_distribution(parameter)
        logπₜ_proposed = calculate_logπₜ(parameter)
        loglik_proposed, _, s_proposed = propose(pf, observations)
        #Accept or reject proposed trajectory and parameter
        accept_ratio = min(1.0, exp( (loglik_proposed + logπₜ_proposed) - (loglik_current + logπₜ_current) ) )
        accept_maybe =  accept_ratio > rand()
        if accept_maybe
            θ_current = get_parameter(parameter)
            s_current = s_proposed
            logπₜ_current = logπₜ_proposed
            loglik_current = loglik_proposed
            #Set current θ_current as starting point for Metropolis algorithm
            mcmc.θₜ = θₜ_proposed #transform(parameter)
        end
        #Store trajectory and parameter in trace
        trace_trajectory[:, iter] = s_current
        trace_θ[:, iter] = θ_current
    end
    ########################################
    return trace_trajectory, trace_θ
end
sampling (generic function with 1 method)

Let us run this function with our initial parameter estimates:

trace_s, trace_θ = sampling(pf, param_initial, e; iterations = 3000, scaling = 1/4)

The final step is to check our posterior samples. I will do so by comparing the traces of the trajectories and the parameter
against the parameter that have been used to generate the data:

#Check how well we did - trajectories
plot_latent = Plots.plot(layout=(1,1), legend=:topright)
Plots.plot!(s, ylabel="latent state", label=false, color="black")
Plots.plot!( round.( mean(trace_s; dims=2) ), color="gold4", label="Posterior nearest state")
plot_latent


#Check how well we did - θ
plot_θ = Plots.plot(layout=(2,1), legend=:topright)
Plots.hline!([2.,-2.], ylabel="μ", xlabel="", subplot=1, color="black", legend=false)
Plots.plot!( trace_θ[1:2,:]', color="gold4", label="Posterior draws", subplot=1, legend=false)

Plots.hline!([1.], ylabel="σ", xlabel="PMCMC Iterations", subplot=2, color="black", legend=false)
Plots.plot!( trace_θ[3:4,:]', color="gold4", label="Posterior draws", subplot=2, legend=false)
plot_θ


Not bad! Test it out yourself with different starting values, and check what happens if you change some of the tuning parameter, like the
scaling size for the MCMC sampler, or the number of particles in the particle filter. In any case, you should now have a rough
understanding about the PMCMC machinery, well done!

All done!

This concludes the mini series about inference on state space models. I hope this primer was enough to create some interest, shoot me a message if you have any
comments! There are a lot of topics left to make further posts: we have not touched parameter initialization, expanding the algorithm to
other state space models, or making the algorithm more efficient. So far, we have not even discussed forecasting!
Though I have not decided on future blog posts yet, I will certainly expand on some of these topics. See you soon!

Bayesian Inference on State Space Models – Part 3

By: Posts | Welcome!

Re-posted from: https://paschermayr.github.io/post/statespacemodels-3-inference-on-state-space-models-part-3/

The MCMC in PMCMC – Making a Proposal Distribution for State Space Model Parameter

Almost done!

In my previous blog posts

here
and
here,
we explored a general framework on how to conduct parameter estimation
for state space models (SSM), and took a first step in implementing this machinery. In order to
apply Particle MCMC (PMCMC), we have to

    1. Find a good proposal distribution $f(\theta^{\star} \mid \theta)$ for $ \theta^{\star}$.
    1. Find a good proposal distribution $P(s^\star_{1:T} \mid \theta^{\star}, e_{1:T})$ for $ S^{\star}_{1:T}$.
    1. Find a sufficiently ‘good’ likelihood approximation $\hat{\mathcal{L}}$$_{\theta}(e_{1:T})$ that we can plug into the acceptance rate.

We have already tackled the latter two problems in the previous posts, so what about the first one?

The MCMC in PMCMC

If we want to jointly sample the parameter vector $\theta$, the first thing we need to do is to check the corresponding boundary conditions
of our parameter. For instance, if we want to estimate parameter of univariate normal observations, the mean parameter is
typically defined on the whole Reals, while the variance is constrained to be positive. There are several ways to proceed:

  • i. We could, in principle, use some multivariate proposal distribution $f( . \mid \theta)$, and reject all proposals where one of the parameter
    falls outside of the corresponding boundaries. This is, as you already guessed, very wasteful.

  • ii. Alternatively, you could use a truncated multivariate proposal distribution. Note that this is NOT the same as (i.), as one needs
    to adjust the normalisation constants in the MH acceptance ratio, see

    this blog post for a more detailed explanation.
    This approach is intuitively more efficient than the first option, but highly model specific and sometimes difficult to track for errors.

  • iii. Another possibility is to first transform the corresponding parameter into an unconstrained space, perform the MCMC step in this
    unconstrained space, and then transform the proposed parameter back into the original space for the likelihood evaluation.
    In this case, we have to be careful when calculating the prior. As we are now working with the transformed
    parameter, we need to adjust the prior distribution with the Jacobian of the transform (keyword: push-forward measure). If you have trouble understanding that,
    I recommend
    this explanation
    before you proceed reading.
    This approach has a much higher initial workload, but to my mind is the most general and effective way to perform
    MCMC, and almost all libraries that you use work in this way under the hood. Consequently, we will focus on the third case.

So, how can we code this up? The most straightforward way to implement this approach is by using information from the corresponding
prior: boundaries, length and dimension of each parameter can be inferred from here. I will use a new Julia package

Bijectors.jl that makes the workflow significantly easier, but there are similar packages
in other languages as well.

Coding it all up

In my
last post, we implemented a particle
filter for basic hidden Markov models, so let us continue to use this example.
Hence, we are interested in the mean and variance parameter for each state, and the corresponding transition matrix of the
latent Markov chain. For now, let us assume the transition matrix is known, so we can implement
the MCMC machinery for scalar based parameter, and do not need to extend our functions for vector valued parameter. To my mind,
this would put too much attention to the code rather than the actual topic.
We start by creating a container that can hold one parameter and its corresponding priors.

using Distributions, Bijectors, Parameters
import Base: count

#A container with necessary information about θ
mutable struct ParamInfo{T, D<:Union{Nothing, <:Distributions.Distribution} }
    value   :: T #Can be Real, Integer, or a Arrays composed of it
    prior   :: D #Corresponding prior of value, needs to fulfill boundary constraints!
end
ParamInfo(value::T) where {T} = ParamInfo(value, nothing)

#Check the number of parameter in ParamInfo struct
function count(paraminfo::ParamInfo)
    return length(paraminfo.value)
end
count (generic function with 17 methods)

Note that I also defined a function that informs us about the parameter size inside this container. Next, we will make a
summary container for all the different parameter that we will need during the tutorial – i.e., mean and variance parameter of a univariate Normal distribution
for each state in the basic HMM case.

#A summary container for all relevant parameter
mutable struct HMMParameter
    μ   :: Vector{ParamInfo} #Observation Distribution Mean
    σ   :: Vector{ParamInfo} #Observation Distribution Variance
end

Further, we want to define some helper functions that enable us to transform and inverse transform the parameter
in said container. Remember that our goal is to sample from some unconstrained space, and then inverse transform said parameter back to our constrained space. This will help us to accept more parameter proposals in the MCMC step. Note that I will use a subscript t for
certain variables in order to indicate that the operation takes place in the transformed space.

#Wrapper to transform θ into an unconstrained space
function transform(parameter::ParamInfo)
    return Bijectors.link(parameter.prior, parameter.value)
end
#Wrapper to transform θₜ back into the original space, and store it in parameter struct
function inverse_transform!(parameter::ParamInfo, θₜ::T) where {T}
    value = Bijectors.invlink(parameter.prior, θₜ)
    @pack! parameter = value
    return value
end

#The same two functions dispatched on the summary container
function transform(parameter::HMMParameter)
    θₜ = Float64[]
    for field_name in fieldnames( typeof( parameter ) )
        sub_field = getfield(parameter, field_name )
        append!(θₜ, transform.(sub_field) )
    end
    return θₜ
end
function inverse_transform!(parameter::HMMParameter, θₜ::Vector{T}) where {T}
    counter = 1
    for field_name in fieldnames( typeof( parameter ) )
        sub_field = getfield(parameter, field_name )
        dim = sum( count.(sub_field) )
        inverse_transform!.( sub_field, θₜ[ counter:(-1 + counter + dim )] )
        counter += dim
    end
end
inverse_transform! (generic function with 2 methods)

We must define this for both the individual ParamInfo as well as the summary container. This is, admittedly, not a straightforward process. However,
the important part is that we now have a container that holds all the
unknown observation parameter that we need in the HMM case, and we can freely transform and inverse transform parameter
that are contained in there. If you understand this goal, then you are perfectly fine to continue.

As an example, let us now create an object that contains the mean and variance parameter of a univariate Normal two-state HMM. Next,
we transform all parameter into an unconstrained space, sample the transformed parameter and inverse transform them back into the original space:

#Create a Vector of univariate Normal Model parameter and assign a prior to each of them:
μᵥ = [ParamInfo(-2.0, Normal() ), ParamInfo(2.0, Normal() )] #Initial value and Prior
σᵥ = [ParamInfo(1.0, Gamma() ), ParamInfo(1.0, Gamma(2,2) )] #Initial value and Prior

hmmparam = HMMParameter(μᵥ, σᵥ)

#Transform parameter:
transform(hmmparam)

#Sample parameter from an unconstrained distribution, then inverse transform and plug into our container:
θₜ_proposed = randn(4)
inverse_transform!(hmmparam, θₜ_proposed)
hmmparam #Check that parameter in hmmparam fulfill boundary conditions

Well done, we achieved our initial goal! You can run the above code several times, and will see that the plugged in
parameter in the constrained space will satisfy the corresponding boundary conditions, even though we sampled from a multivariate
Normal distribution. At the very beginning, I mentioned that when transforming parameter in an unconstrained space,
one has to adjust the corresponding prior. These adjustments will all be handeled by the Bijectors package, so
we can conveniently write a function that calculates the prior for each of our parameter and returns the sum of it.
This will be helpful when we run the PMCMC algorithm in the next blog post.

#function to calculate prior including Jacobian adjustment
function calculate_logπₜ(parameter::ParamInfo)
    return logpdf_with_trans(parameter.prior, parameter.value, true)
end

#Wrapper to calculate prior including jacobian adjustment on all parameter
function calculate_logπₜ(parameter::HMMParameter)
    πₜ = 0.0
    for field_name in fieldnames( typeof( parameter ) )
        sub_field = getfield(parameter, field_name )
        πₜ += sum( calculate_logπₜ.(sub_field) )
    end
    return πₜ
end
calculate_logπₜ (generic function with 2 methods)

Perfect! Now we are almost done for today,
the last thing to do is to create a MCMC sampler. We will use a straightforward Metropolis algorithm with a multivariate Normal proposal distribution.
As before, we will first define a container with all the necessary information and a function on it to sample via this object:

# Container with necessary information for a Metropolis MCMC step
mutable struct Metropolis{T<:Real}
    θₜ           ::  Vector{T}
    scaling     ::  Float64
end

# Function to propose a Metropolis step
function propose(sampler::Metropolis)
    @unpack θₜ, scaling = sampler
    return rand( MvNormal(θₜ, scaling) )
end
propose (generic function with 1 method)

Scaling is a tuning parameter that determines how large the proposal steps for the parameter will be. The larger the tuning parameter,
the larger the proposed movements will be, which might result in a lower acceptance rate. Check this out yourself. At the end, we check our framework by assigning an initial parameter value
and sampling from there via this MCMC sampler:

#Create an initial sampler, and propose new model parameter:
θₜ_inital = randn(4)
metropolissampler = Metropolis(θₜ_inital, 1/length(θₜ_inital) )
propose(metropolissampler)
4-element Array{Float64,1}:
 -0.5791553185118626
  0.5875000886407763
 -0.09563827588507412
  1.1869926689230335

Way to go!

This section was more straightforward than the previous one! We can now store, transform and inverse transform model parameter
of a basic HMM. We can also create a basic MCMC algorithm to sample said parameter. In my final post, we will combine the previous instalments of this series
and implement the PMCMC algorithm to estimate HMM model and state trajectory parameter jointly. See you soon!

Inference in State Space Models – Part 2

By: Posts | Welcome!

Re-posted from: https://paschermayr.github.io/post/statespacemodels-3-inference-in-state-space-models-part-2/

Particle Filtering in State Space Models

Welcome back!

In my
previous blog post, we explored a
general framework to conduct parameter estimation
on state space models (SSM). We call this framework Particle MCMC (PMCMC), and the three major
steps to successfully apply this algorithm are as follows:

    1. Find a good proposal distribution $f(\theta^{\star} \mid \theta)$ for $ \theta^{\star}$.
    1. Find a good proposal distribution $P(s^\star_{1:T} \mid \theta^{\star}, e_{1:T})$ for $ S^{\star}_{1:T}$.
    1. Find a sufficiently ‘good’ likelihood approximation $\hat{\mathcal{L}}$$_{\theta}(e_{1:T})$ that we can plug into the acceptance rate.

In this post, we will try to solve problems 2. and 3. simultaneously. Hence, we need to find a way
to sample a state trajectory and get an approximation for the likelihood given the
current parameter vector $\theta$. In order to use all the code we use here, please load all the functions
from my
introductory HMM article,
or just include the corresponding HMM Julia file from my
GitHub account.

Filtering the latent state trajectory

In PMCMC, a particle filter (PF) is used to first sample a trajectory $s^{\star}_{1:T}$ and then calculate an
unbiased estimate of the corresponding likelihood. Note that unbiased does not mean our estimate is automatically
‘good’ enough, so there is quite a bit of ongoing research on how to keep the variance of particle filter likelihood estimates in check. For an in-depth explanation about
PFs, I refer to (Doucet and Johansen, 2011), as explaining the whole machinery of it in detail is simply out of reach in
a simple blog post.

In a nutshell, particle filters are usually used to solve filtering equations in the form of
$$
\begin{equation}
\begin{split}
\pi_t( x_{1:t} ) &= \frac{ \tau_t(x_{1:t}) }{ Z_t }. \\
&= \frac{ w_{1:t} q_t(x_{1:t}) }{ Z_t }\
\end{split}
\end{equation}
$$

Here, the goal is to sequentially sample a sequence of random variables, $X_t, t \in (1,…, T)$ that come
from a sequence of target probabilities $\pi_t( x_{1:t} )$ with the same computational complexity at
each time step. The last part is crucial as we do not want to increase the computing time as more and more data comes in.
You will notice I rewrote the equation by introducing $w_{1:t}=\frac{ \tau_t(x_{1:t}) }{ q_t(x_{1:t}) }$. Those familiar with
importance sampling will likely understand why, but the reason we did so is because usually it is very difficult
to sample from $\tau_t(x_{1:t})$. We introduce an easier distribution $q_t(x_{1:t})$ and reweight this distribution via
$w_{1:t}$. Consequently, we need to find a good proposal distribution $q_t(x_{1:t})$, such that the computational complexity will not increase over time.
A convenient by-product is that we can approximate the normalizing constant
$Z_t$ with these weights:
$$
\begin{equation}
\hat{Z_t} = \frac{1}{t} \sum_{i=1}^{K} \frac{\tau_t(x_{1:t})}{q_t(x_{1:t})} = \frac{1}{t} \sum_{i=1}^{K} w_t(x^i_{1:t}).
\end{equation}
$$

In the basic HMM case, the joint distribution and the normalizing constant are of the form
$\tau_t(x_{1:t}) = \prod_{k=1}^{t} f(s_k \mid s_{k-1}) g(e_k \mid s_k)$
and $Z = p(e_{1:t})$. Hence you can use the weights from above to approximate the likelihood that we need in the PMCMC sampler. In order to have a constant
time complexity, one can use the Markovian properties of the latent states by imposing some transition density $q_{t}( x_{t} \mid x_{t-k})$
and propagating particles forward at each time step t. Afterwards,
these particles will be weighted with the weight function we introduced above. Particles that are very unlikely given the observed data
will then be replaced by more
realistic particles – this has the effect that variance of our likelihood estimate will not be ‘too’ large. After we propagated
a bunch of particles forward up to the final time point T, we can sample one trajectory of these particles to get
$P(s^{\star}_{1:T} \mid \theta^{\star}, e_{1:T})$ for our PMCMC algorithm.

If you read through the paper mentioned above, you will discover the optimal importance/proposal distribution
in terms of minimising the variance of the likelihood estimate for processes with
Markov property is $q(s_t \mid s_{t-1}, e_t) = p(s_t \mid s_{t-1}, e_t)$. This is usually not availabe, but gives some insight about good candidates.
A common and simple choice is the so-called Bootstrap filter,
which takes $q(s_t \mid s_{t-1}, e_t) = p(s_t \mid s_{t-1})$ and simplifies the corresponding particle weights to to $p(e_t \mid s_t)$.
Both of these distributions are readily available in the HMM and HSMM case, so we jump straight into the coding section.

Coding our very first particle filter

The first thing that we need to define is the particle filter container itself. We will use the same style and structure as in our
HMM post,
and the advantage of the Bootstrap filter is that we can just plug in the model distributions from the HMM to define our PF. As a result,
we define a field for the initial and transition distribution for our particles. The last field, observations, is used to weight the particles that we sample,
so we can filter our particles that do not seem to capture the data well enough:

using Distributions, Parameters, Plots
import QuantEcon: gth_solve

mutable struct ParticleFilter
        initial         ::      Distributions.Distribution
        transition      ::      Vector{<:Distributions.Distribution}
        observations    ::      Vector{<:Distributions.Distribution}
end

In addition, we will define a helper function to calculate the corresponding weights in the log domain. This usually helps the numerical stability of the algorithm.

#Helper function
function logsumexp(arr::AbstractArray{T}) where {T <: Real}
max_arr = maximum(arr)
max_arr + log(sum(exp.(arr .- max_arr)))
end

function logmeanexp(arr::AbstractVector{T}) where {T <: Real}
    log( 1/size(arr, 1) ) + logsumexp( arr )
end

logsumexp (generic function with 1 method)

Good! Now we can straight jump into the actual code. As always, I do my best to make as many comments as possible, but to fully understand each step,
there is probably no escape from reading the actual particle filter literature I mentioned above. Some comments to
understand the code a bit more clearly:

  • The function input are the particle filter we defined above, and some observed data that we use to weight the particles that we propagate forward.

  • The function output is a log likelihood estimate, all the particle trajectories that we propagated forward, and a single trajectory that we will later on use in the PMCMC algorithm.

  • Before we start the for loop, we pre-allocate the weights and the particles that we calculate later on. As you can see, we also normalize the weights, which I will denote with a
    subscript norm.

  • The first particles are chosen from the initial distribution. If the latent states of the data are conceived as a subsequence of a
    long-running process, the probability of the initial state should be set to the stationary state probabilities
    of this unobserved Markov chain. Hence, this will be a function of the transition distribution of our model.

  • The for-loop evolves around iteratively propagating particles forward and calculating
    the corresponding weights of the particles. As you can see, we will not always reweighting the particles at each iteration, but only if a certain threshold is achieved.
    This usually results in a even lower variance for our likelihood estimate. You can test this out yourself by altering the code and reweight at each iteration.

  • That’s it! Let us have a look:

#Create a function to run a particle filter
function propose(pf::ParticleFilter, evidence::AbstractArray{T}; Nparticles=100, threshold=75) where {T<:Real}
        #Initialize variables
        @unpack initial, transition, observations = pf #Model parameter
        ℓlik = 0.0 #Log likelihood

        ℓweights = zeros(Float64, Nparticles)#Weights that are used to approximate log likelihood
        ℓweightsₙₒᵣₘ = similar(ℓweights)

        #initialize particles and sample first time point via initial distribution
        particles = zeros(Int64, size(evidence, 1), Nparticles )
        particles[1,:] =  rand(initial, Nparticles)

        #loop through time
        for t in 2:size(evidence, 1)

        #propagate particles forward
        particles[t, :] .= rand.( transition[ particles[t-1, :] ] )

        #Get new weights and calculate log likelihood
        ℓweights .= logpdf.( observations[ particles[t, :] ], evidence[t] )
        ℓweightsₙₒᵣₘ .= ℓweights .- logsumexp(ℓweights)
        ℓlik += logmeanexp(ℓweights) #add incremental likelihood

        #reweight particles if resampling threshold achieved
        if exp( -logsumexp(2. * ℓweightsₙₒᵣₘ) ) <= threshold
                paths = rand( Categorical( exp.(ℓweightsₙₒᵣₘ) ), Nparticles )
                particles .= particles[:, paths] #Whole trajectory!
        end

        end

        #Draw 1 trajectory path at the end
        path = rand( Categorical( exp.(ℓweightsₙₒᵣₘ) ) )
        trajectory = particles[:, path] #to keep type

        return ℓlik, particles, trajectory
end
propose (generic function with 1 method)

Perfect! With the weights, we can obtain the likelihood approximation, and with the particles, we can get the
state trajectory that we need for the PMCMC algorithm. We still need to compute a function that translates our transition matrix to the
initial distribution of the corresponding Markov chain. As such, we create a helper function so we can extract all the transition matrix parameter from our particle filter:

#Create a function so that we can obtain initial distribution from transition matrix
function get_transition_param(transitionᵥ::Vector{<:Categorical})
    return reduce(hcat, [transitionᵥ[iter].p for iter in eachindex(transitionᵥ)] )'
end
get_transition_param (generic function with 1 method)

Do not worry, this function is not necessary to understand the mechanics of PMCMC. It is just a convenient wrapper
to create a transition matrix from all the parameter in our transition distribution. Now, let us check how good our algorithm works by first
sampling some HMM data:

T = 150
HMMevidence =  [Normal(2., .5), Normal(-2.,2.)]
HMMtransition = [ Categorical([0.95, 0.05]), Categorical([0.5, 0.5]) ]

state, observation = sampleHMM(HMMevidence, HMMtransition, T)
plot( layout=(2,1), label=false, margin=-2Plots.px)
plot!(observation, ylabel="data", label=false, subplot=1, color="gold4")
plot!(state, yticks = (1:2), ylabel="state", label=false, subplot=2, color="black")


Next, let us create a Bootstrap particle filter with the corresponding transition and observation distribution from the HMM. If our code is correct, then
the particle trajectories of our PF should be very close to the latent state sequence from the sample HMM data. As already mentioned, the initial distribution will be
a function of the transition matrix:

#Initialize PF
pf = ParticleFilter( Categorical( gth_solve( Matrix(get_transition_param(HMMtransition) ) ) ),
                    HMMtransition,
                    HMMevidence
                    )

Cool! Now let us run the particle filter once and plot the propagated trajectories against the sample data:

ll, particles, trajectory = propose(pf, observation; Nparticles=500, threshold=500 )

Plots.plot(state, label="HMM latent states", xlabel="time", ylabel="latent state")
Plots.plot!( mean(particles; dims=2) , label="Particle Filter state trajectories")


Good! The propagated particle trajectories from our algorithm are close to the sample data, exactly what we wanted to have.
Play around with this yourself for different HMM parameter. There are, however, still a few aspects we have not yet dived into. For instance, how many particles do we
need to achieve a “good” likelihood approximation? In the basic HMM case, we could actually
calculate the likelihood exactly, and then check the solution against our approximation. This has been implemented many times, so it is easy to
code this up yourself. I will instead focus on a different aspect: we said that our approximation is unbiased, but what about the variance of this estimate?
There is a pretty good discussion and ideas on how to tune the number of particles on Professor Darren Wilkinson’s
blog post about Particle MCMC.
We already know that we can decrease the variance of our estimate by inflating the number of particles, but this comes at the cost of additional computing time – so how much is enough? Moreover,
given we apply the particle filter in combination
with a MCMC algorithm, it is important to know on whether the variance of our estimate is constant across the support of our parameter. To check this, we
will initiate a grid of possible values for one parameter of your choice, and then run the particle filter several times
for each element in the grid. In the end, we can visually check if the variance for these estimates is (a) low
enough for your problem and (b) stays constant for a range of possible values:

#Check variance of likelihood estimate over a range of θ
function check_ll(pf::ParticleFilter, evidence::AbstractArray{T}, grid; runs = 20, Nparticles = 100) where {T<:Real}

        #Assign a matrix to store log likelihood estimate for each run
        ll_estimate = zeros(Float64, runs, length(grid))

        #Loop through the grid "runs" number of times, and assign the likelihood estimate to the preallocated matrix
        for iter in eachindex(grid)
                pf.observations[1] = Normal( grid[iter], pf.observations[1].σ)
                Base.Threads.@threads for idx in Base.OneTo(runs)
                        ll_estimate[idx, iter], _, _ = propose(pf, observation; Nparticles = Nparticles, threshold = Nparticles )
                end
        end

        #Return the log likelihood estimates
        return ll_estimate
end

grid = 1.0:0.05:3.0
ll_estimate = check_ll(pf, observation, grid; Nparticles = 500)
plot(grid, ll_estimate', seriestype = :scatter, ms=3.0, label="", xlabel="Parameter value", ylabel="log likelihood estimate")


In this case, the variance does not seem to increase drastically given enough particles. You can see the contrast when assigning only a fraction of them:

ll_estimate = check_ll(pf, observation, grid; Nparticles = 50)
plot(grid, ll_estimate', seriestype = :scatter, ms=3.0, label="", xlabel="Parameter value", ylabel="log likelihood estimate")


In practice, you can choose the number of particles that you want to use by running a few trials for different parameter grids, but there are also methods to do so on the fly, some of them
mentioned in my linked blog post above.

Nice one!

Wow, it took me a very long time to write this article, because it was extremely difficult to find a satisfying trade-off between making this
post intuitive and explaining enough theory to ensure the code above makes any sense to you. To be honest, I would recommend reading
one or two tutorials on particle filtering to fully understand the code, but I hope you could grasp the general goal through this blog post.
That being said, the most difficult part is done! We can now sample a state trajectory and obtain a likelihood estimate
for our MCMC algorithm for state space models. In part 3, we will use the most basic MCMC sampler and then
finish the PMCMC algorithm to obtain estimates for a basic HMM. See you soon!

References

Doucet, A. and Johansen, A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later.