Automatic differentiation of discontinuous integrals

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/discontinuous_integral_ad/

This is a much simplified writeup of a problem I encountered, in a self-contained blog post.

You want to approximate the integral

\[
I(\theta) = \int 1(g(x) > 0) f(x,\theta) dx
\]

where \(g\) is a continuous function, and \(f(x,\theta)\) is a parametric distribution over \(x\). Everthing is continous, and thus \(I\) is, too.

You face the following constraints:

  1. \(g\) is a black box. We will pretend that you can’t invert it (except for checking our results, of course).

  2. You can calculate the probability density function \(f\) and even draw \(x\)‘s for a particular \(\theta\), but that’s pretty much it. You don’t even get a cdf! (Again, except for checking our results.)

Using Monte Carlo methods, you can do the following:

  1. draw \(x_i \sim F(\cdot, \theta)\), \(i=1,\dots,N\),

  2. approximate

\[
I(\theta) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0)
\]

You could code this in Julia as

d = distr(θ)   # suppose this returns some distribution that supports Base.rand
x = rand(d, N)
mean(g.(x) > 0)

So far, neat and simple. Now, the fly in the ointment: you need the derivative \(I'(\theta)\) for optimization or Hamiltonian Monte Carlo. The problem is that you cannot ForwardDiff your way through the code above: AD’ing a discontinuous step function will just give you \(0\), and rand does not work with ForwardDiff.Duals anyway (which is very sensible).

However, there is a solution: rewrite the approximation as

\[
I(\theta; \theta_0) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0) \frac{f(x_i,\theta)}{f(x_i,\theta_0)}
\]

where \(\theta_0\) is the parameter used to simulate the \(x_i\). Differentiate the above at \(\theta = \theta_0\). This approximates

\[
I'(\theta) = \int 1(g(x) > 0) \frac{\partial f(x,\theta)/\partial \theta}{f(x,\theta)}f(x,\theta) dx = \int 1(g(x) > 0) \partial f(x,\theta)/\partial \theta dx
\]

which is exactly what you want.

Assume that the actual calculation is very complicated, so we would rather avoid implementing it for the integral and the derivative separately. It turns out that this is very simple to do with ForwardDiff.Dual values: the code is literally a one-liner and a fallback method:

elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

which you can use in a function like

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))

I demonstrate this with \(g(x) = x\) and \(x \sim \text{Normal}(\theta, 1)\), for which of course we know that the analytical values for \(I\) and \(I’\) are the right tail probability and the pdf at 0, respectively.

Graphs below show that the approximation is reasonable — we could make it much better with low-discrepancy sequences, but that is an orthogonal issue.

integral

derivative

It is amazing how much you can accomplish with two lines of code in Julia! The problem that motivated this blog post is multivariate with irregular regions over which \(\{ x: g(x) > 0 \}\), but I used elasticity as above.

Self-contained code for everything is available below.

download as code.jl

using ForwardDiff
import ForwardDiff: Dual, value, partials, Tag
using Distributions
using Plots
using LaTeXStrings
gr()

######################################################################
# elasticity calculation
######################################################################

"""
    elasticity(x)

Calculate `x/x`, stripping `x` of partial derivative information. Useful for
calculating elasticities of the form `∂f/f` using ForwardDiff.
"""
elasticity(x::Real) = one(x)

elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

######################################################################
# example application
######################################################################

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))

"Helper function that returns the value and derivative at the same time."
function value_and_derivative(f::F, x::R) where {F,R<:Real}
    T = typeof(Tag(F, R))
    y = f(Dual{T}(x, one(x)))
    value(y), partials(y, 1)
end

distr(θ) = Normal(θ, 1)

g(x) = x

function ID_analytical(θ)
    d = distr(θ)
    ccdf(d, 0), pdf(d, 0)
end

function ID_approx(θ)
    x = rand(distr(θ), 1000)
    value_and_derivative(θ->integral_approx(g, distr(θ), x), θ)
end

θs = linspace(-2, 2, 51)
ID0s = ID_analytical.(θs)
ID1s = ID_approx.(θs)

scatter(θs, first.(ID0s), xlab = "\\theta", ylab = "integral",
        label = "analytical", legend = :topleft)
scatter!(θs, first.(ID1s), label = "approximation")
savefig("integral.svg")

scatter(θs, last.(ID0s), xlab = "\\theta", ylab = "derivative",
        label = "analytical", legend = :topleft)
scatter!(θs, last.(ID1s), label = "approximation")
savefig("derivative.svg")