By: Tamás K. Papp
Reposted from: https://tpapp.github.io/post/discontinuous_integral_ad/
This is a much-simplified write-up of a problem I encountered, in a self-contained blog post.
You want to approximate the integral
\[
I(\theta) = \int 1(g(x) > 0) f(x,\theta) dx
\]
where \(g\) is a continuous function, and \(f(x,\theta)\) is a parametric distribution over \(x\). Everything is continuous, and thus \(I\) is, too.
You face the following constraints:

\(g\) is a black box. We will pretend that you can’t invert it (except for checking our results, of course).

You can calculate the probability density function \(f\) and even draw \(x\)‘s for a particular \(\theta\), but that’s pretty much it. You don’t even get a cdf! (Again, except for checking our results.)
Using Monte Carlo methods, you can do the following:

draw \(x_i \sim F(\cdot, \theta)\), \(i=1,\dots,N\),

approximate
\[
I(\theta) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0)
\]
You could code this in Julia as
d = distr(θ) # suppose this returns some distribution that supports Base.rand
x = rand(d, N)
mean(g.(x) .> 0)
So far, neat and simple. Now, the fly in the ointment: you need the derivative \(I'(\theta)\) for optimization or Hamiltonian Monte Carlo. The problem is that you cannot ForwardDiff your way through the code above: AD’ing a discontinuous step function will just give you \(0\), and rand does not work with ForwardDiff.Duals anyway (which is very sensible).
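To see the first problem concretely, here is a minimal sketch (my addition, not from the original post): the indicator is piecewise constant, so the derivative ForwardDiff reports is \(0\) almost everywhere, even though \(I\) itself is smooth in \(\theta\).

```julia
using ForwardDiff

# the indicator 1(x > 0), as it would appear inside the Monte Carlo sum
step(x) = float(x > 0)

# AD sees a locally constant function, so the derivative is 0 away from the kink
ForwardDiff.derivative(step, 1.0)  # 0.0
```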
However, there is a solution: rewrite the approximation as
\[
I(\theta; \theta_0) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0) \frac{f(x_i,\theta)}{f(x_i,\theta_0)}
\]
where \(\theta_0\) is the parameter used to simulate the \(x_i\). Differentiate the above at \(\theta = \theta_0\). This approximates
\[
I'(\theta) = \int 1(g(x) > 0) \frac{\partial f(x,\theta)/\partial \theta}{f(x,\theta)}f(x,\theta) dx = \int 1(g(x) > 0) \partial f(x,\theta)/\partial \theta dx
\]
which is exactly what you want.
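As a sanity check (my addition, not part of the original derivation), the identity can be verified numerically for the example used later in the post, \(g(x) = x\) and \(x \sim \text{Normal}(\theta, 1)\), where \(I'(\theta_0)\) is the density of \(\text{Normal}(\theta_0, 1)\) at \(0\):

```julia
using Distributions, ForwardDiff, Statistics, Random

Random.seed!(1)
θ0 = 0.5
x = rand(Normal(θ0, 1), 10^6)   # draws at the fixed θ0

# differentiate the importance-ratio form at θ = θ0
ratio_sum(θ) = mean((x .> 0) .* pdf.(Normal(θ, 1), x) ./ pdf.(Normal(θ0, 1), x))
dI = ForwardDiff.derivative(ratio_sum, θ0)

dI, pdf(Normal(θ0, 1), 0)   # the two agree up to Monte Carlo error
```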
Assume that the actual calculation is very complicated, so we would rather avoid implementing it for the integral and the derivative separately. It turns out that this is very simple to do with ForwardDiff.Dual values: the code is literally a one-liner and a fallback method:
elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))
which you can use in a function like
integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
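To see what elasticity actually does, here is a small sketch (the tag Nothing and the numbers are just for illustration): fed a Dual carrying a value \(f\) and a partial \(\partial f\), it returns a Dual with value \(1\) and partial \(\partial f / f\), which is exactly the weight needed above.

```julia
using ForwardDiff
import ForwardDiff: Dual, value, partials

elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

x = Dual{Nothing}(2.0, 3.0)   # value f = 2.0, partial ∂f = 3.0
e = elasticity(x)
value(e), partials(e, 1)      # (1.0, 1.5): value stripped to one, partial is ∂f/f
```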
I demonstrate this with \(g(x) = x\) and \(x \sim \text{Normal}(\theta, 1)\), for which of course we know the analytical values for \(I\) and \(I'\): the right tail probability and the pdf at 0, respectively.
Graphs below show that the approximation is reasonable; we could make it much better with low-discrepancy sequences, but that is an orthogonal issue.
It is amazing how much you can accomplish with two lines of code in Julia! The problem that motivated this blog post is multivariate, with irregular regions \(\{ x : g(x) > 0 \}\), but I used elasticity as above.
Self-contained code for everything is available below.
using ForwardDiff
import ForwardDiff: Dual, value, partials, Tag
using Distributions
using Plots
using LaTeXStrings
gr()
######################################################################
# elasticity calculation
######################################################################
"""
elasticity(x)
Calculate `x/x`, stripping `x` of partial derivative information. Useful for
calculating elasticities of the form `∂f/f` using ForwardDiff.
"""
elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))
######################################################################
# example application
######################################################################
integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
"Helper function that returns the value and derivative at the same time."
function value_and_derivative(f::F, x::R) where {F,R<:Real}
    T = typeof(Tag(F, R))
    y = f(Dual{T}(x, one(x)))
    value(y), partials(y, 1)
end
distr(θ) = Normal(θ, 1)
g(x) = x
function ID_analytical(θ)
    d = distr(θ)
    ccdf(d, 0), pdf(d, 0)
end
function ID_approx(θ)
    x = rand(distr(θ), 1000)
    value_and_derivative(θ -> integral_approx(g, distr(θ), x), θ)
end
θs = linspace(-2, 2, 51)
ID0s = ID_analytical.(θs)
ID1s = ID_approx.(θs)
scatter(θs, first.(ID0s), xlab = "\\theta", ylab = "integral",
        label = "analytical", legend = :topleft)
scatter!(θs, first.(ID1s), label = "approximation")
savefig("integral.svg")
scatter(θs, last.(ID0s), xlab = "\\theta", ylab = "derivative",
        label = "analytical", legend = :topleft)
scatter!(θs, last.(ID1s), label = "approximation")
savefig("derivative.svg")