Tag Archives: CUDA

Julia GPU Programming with WSL2

By: Fabian Becker

Re-posted from: https://geekmonkey.org/julia-gpu-programming-with-wsl2/


I use Windows for gaming. It's been a long time since I last did any serious development work on my Windows machine, and yet I still spent a good chunk of money on building out a beefy machine for my efforts to learn machine learning. It has taken me a few months to finally sit down and get this machine ready for anything other than gaming.

With the recent announcement of GUI support for WSL (WSLg) I got really excited to try out WSL and see how good the GPU support actually is – but that's not the main reason. I've been shying away from developing on Windows because I'm used to a *NIX environment. WSL gives you that, but until recently you couldn't interact with the GPU at all – and that changed with this announcement!


The first half of this article shows how to get everything set up, and in the second half we'll set up CUDA.jl in Julia. Since the latter part is about CUDA, I'll assume you have a compatible NVIDIA GPU.

Installation

Here's a summary of what we need to go through to prepare our environment:

  • Update Windows 10 to the latest release on the dev channel
  • Install the NVIDIA CUDA drivers
  • Install Ubuntu 20.04 in WSL2
  • Install Linux CUDA packages
  • 🎉

Windows 10 Insider Preview

At the time of writing all of these features are only available through the Windows Insider Program. The Windows Insider Program allows you to receive new Windows features before they hit the main update line. The program is split into three channels: Dev, Beta and Release Preview.

To receive the update with WSLg and GPU support we will need to switch to the dev channel.


Note: The dev channel comes with some rough edges and potential for system instability. Be mindful of this when you switch and make sure you have backups!

After installing all downloaded updates you should end up with OS Build 21364 or higher. You can check your OS Build by running winver in PowerShell/cmd.

Run winver using the Windows Run command

With this all set, we can move on to installing the latest WSL2-compatible CUDA drivers.

CUDA drivers

NVIDIA provides special CUDA drivers for WSL on Windows 10. The link below will take you to the download page; you'll need to sign up for the NVIDIA Developer Program, which is free.

GPU in Windows Subsystem for Linux (WSL) – CUDA on Windows Subsystem for Linux (WSL) Public Preview (NVIDIA Developer download page)

Follow the setup wizard (I chose the express installation which keeps existing settings in place).

Note: There's also a documentation page provided by NVIDIA around setting up your GPU for WSL. I found the docs to be outdated and not working on my machine.

Installing Ubuntu 20.04 LTS with WSL2

Before we can proceed with installing Ubuntu I advise updating the WSL kernel by running:

wsl --update

In case you're like me and don't enjoy the default terminal Windows comes with, I suggest you install Windows Terminal from the Microsoft Store. It's a lot more pleasant to use than either cmd or the PowerShell terminal ever were.

It's also a good idea to set WSL to default to version 2:

wsl --set-default-version 2

Finally let's install Ubuntu with:

wsl --install --distribution Ubuntu-20.04

In case you were wondering what other distributions are available you can simply run: wsl --list --online.

Ubuntu 20.04 LTS

With Ubuntu installed we have a couple of final steps. First we will add an upstream repo to apt for getting the latest CUDA builds directly from NVIDIA:

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
Add CUDA apt repository for Ubuntu 20.04

We also need to add NVIDIA's GPG key for the apt repo:

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub 
Add NVIDIA GPG key

And finally to make sure that we prefer the packages provided by NVIDIA over packages in mainline Ubuntu we need to pin the apt repo:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
Pin file to prioritize CUDA apt repository

With this out of the way, we're ready to install the CUDA drivers inside our WSL Ubuntu installation:

 sudo apt update && sudo apt install -y cuda-drivers

In case setting up WSL2 with GPU support was all you wanted to do – we're done!

Julia + WSL2 + CUDA.jl

Ok, so as promised in the headline, we're now going to install Julia inside our Ubuntu 20.04 installation and set up CUDA.jl.

Install Julia

At the time of writing Julia 1.6.1 is the latest version available (make sure to check for updates on Julia's download page).

First let's fetch the latest Julia tarball:

wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.1-linux-x86_64.tar.gz

Extract the .tar.gz:

tar -xvzf julia-1.6.1-linux-x86_64.tar.gz

Move the extracted folder to /opt:

sudo mv julia-1.6.1 /opt/

Finally, create a symbolic link to julia inside the /usr/local/bin folder:

sudo ln -s /opt/julia-1.6.1/bin/julia /usr/local/bin/julia

You may pick a different target directory for your installation of Julia or use a version manager like asdf-vm.

Install CUDA.jl

At this point simply run julia in your terminal and you should be dropped into the Julia REPL. I assume you've worked with Julia before and know how to operate its package manager.

Hit ] to enter pkg mode and install CUDA with:

activate --temp
add CUDA

From here, hit backspace to leave pkg mode and import CUDA. CUDA.jl provides a useful function called functional which confirms that we've done everything right (well, that's the hope at least, right?).

using CUDA
CUDA.functional()
Ensuring Julia CUDA.jl is functional inside WSL2

You can additionally run CUDA.versioninfo() to get a more detailed breakdown of the supported features on your GPU.
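
Here's a minimal sanity check of my own (not part of the original walkthrough) that pushes a little work onto the GPU once everything above is in place:

using CUDA

x = CUDA.rand(1024)     # a CuArray allocated in GPU memory
y = 2f0 .* x .+ 1f0     # broadcasting compiles to and runs as a GPU kernel
sum(y)                  # reductions also run on the GPU

If any of these lines error, CUDA.functional() and the driver setup above are the first things to re-check.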

At this point you should have a working installation with WSL2, Ubuntu 20.04, Julia and CUDA.jl. In case you're new to CUDA.jl I suggest you follow the excellent introduction to GPU programming by JuliaGPU or jump in at the deep end with FluxML's GPU support.


If you like articles like this one, please consider subscribing to my free newsletter where at least once a week I send out my latest work covering Julia, Python, Machine Learning and other tech.

You can also follow me on Twitter.

Neural Jump SDEs (Jump Diffusions) and Neural PDEs

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/neural-jump-sdes-jump-diffusions-and-neural-pdes/

This is just an exploration of some new neural models I decided to jot down for safe keeping. DiffEqFlux.jl gives you the differentiable programming tools to allow you to use any DifferentialEquations.jl problem type (DEProblem) mixed with neural networks. We demonstrated this before, not just with neural ordinary differential equations, but also with things like neural stochastic differential equations and neural delay differential equations.

At the time we made DiffEqFlux, we were the “first to the gate” for many of these differential equation types and left it as an open question for people to find a use for these tools. And judging by the arXiv papers that went out days after NeurIPS submissions were due, it looks like people now have justified some machine learning use cases for them. There were two separate papers on neural stochastic differential equations, showing them to be the limit of deep latent Gaussian models. Thus when you stick these new mathematical results on our existing adaptive high order GPU-accelerated neural SDE solvers, you get some very interesting and fast ways to learn some of the most cutting-edge machine learning methods.

So I wanted to help you guys out with staying one step ahead of the trend by moving on to the next class of differential equations. One of the interesting NeurIPS-timed arXiv papers was on jump ODEs. Following the DiffEqFlux.jl spirit, you can just follow the DifferentialEquations.jl tutorials on these problems, implement them, add a neural network, and it will differentiate through them. So let's take it one step further and show an example of how you'd do that. I wanted to take a look at jump diffusions, or jump stochastic differential equations, which are exactly what they sound like: a mixture of these two methods. After that, I wanted to show how using some methods for stiff differential equations plus a method of lines discretization gives a way to train neural partial differential equations.

Instead of being fully defined by neural networks, I will also be showcasing how you can selectively make parts of a differential equation neuralitized and other parts pre-defined, something we’ve been calling mixed neural differential equations, so we’ll demonstrate a mixed neural jump stochastic differential equation and a mixed neural partial differential equation with fancy GPU-accelerated adaptive etc. methods. I’ll then leave as homework how to train a mixed neural jump stochastic partial differential equation with the fanciest methods, which should be easy to see from this blog post (so yes, that will be the MIT 18.337 homework). This blog post will highlight that these equations are all already possible within our framework, and will also show the specific places we see that we need to accelerate to really put these types of models into production.

Neural Jump Stochastic Differential Equations (Jump Diffusions)

To get to jump diffusions, let’s start with a stochastic differential equation. A stochastic differential equation is defined via

 dX_t = f(t,X_t)dt + g(t,X_t)dW_t

which is essentially saying that there is a deterministic term f and a continuous randomness term g driven by a Brownian motion. Theorems like Donsker’s theorem can be thought of as a generalization of the central limit theorem, saying that continuous stochastic processes of some large class can be reinterpreted as this kind of process (due to the Gaussian-ness of Brownian motion), so in some sense this is a very large encompassing class. If you haven’t seen the previous blog post which mentions how to define neural SDEs, please check that out now. Let’s start with a code that uses reverse-mode automatic differentiation through a GPU-accelerated high order adaptive SDE solver. The code looks like:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo
 
u0 = Float32[2.; 0.] |> gpu
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
    true_A = [-0.1 2.0; -2.0 -0.1] |> gpu
    du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
mp = Float32[0.2,0.2] |> gpu
function true_noise_func(du,u,p,t)
    du .= mp.*u
end
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100)
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))
 
dudt = Chain(x -> x.^3,
             Dense(2,50,tanh),
             Dense(50,2)) |> gpu
ps = Flux.params(dudt)
n_sde = x->neural_dmsde(dudt,x,mp,tspan,SOSRI(),saveat=t,reltol=1e-1,abstol=1e-1)
 
pred = n_sde(u0) # Get the prediction using the correct initial condition
 
dudt_(u,p,t) = Flux.data(dudt(u))
g(u,p,t) = mp.*u
nprob = SDEProblem(dudt_,g,u0,(0.0f0,1.2f0),nothing)
 
monte_nprob = MonteCarloProblem(nprob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 100)
monte_nsum = MonteCarloSummary(monte_nsol)
#plot(monte_nsol,color=1,alpha=0.3)
p1 = plot(monte_nsum, title = "Neural SDE: Before Training")
scatter!(p1,t,sde_data',lw=3)
 
scatter(t,sde_data[1,:],label="data")
scatter!(t,Flux.data(pred[1,:]),label="prediction")
 
function predict_n_sde()
  n_sde(u0)
end
loss_n_sde1() = sum(abs2,sde_data .- predict_n_sde())
loss_n_sde10() = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:10])
Flux.back!(loss_n_sde1())
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
cb = function () #callback function to observe training
  sample = predict_n_sde()
  # loss against current data
  display(sum(abs2,sde_data .- sample))
  # plot current prediction against data
  cur_pred = Flux.data(sample)
  pl = scatter(t,sde_data[1,:],label="data")
  scatter!(pl,t,cur_pred[1,:],label="prediction")
  display(plot(pl))
end
 
# Display the SDE with the initial parameter values.
cb()
 
Flux.train!(loss_n_sde1 , ps, Iterators.repeated((), 100), opt, cb = cb)
Flux.train!(loss_n_sde10, ps, Iterators.repeated((), 100), opt, cb = cb)
 
dudt_(u,p,t) = Flux.data(dudt(u))
g(u,p,t) = mp.*u
nprob = SDEProblem(dudt_,g,u0,(0.0f0,1.2f0),nothing)
 
monte_nprob = MonteCarloProblem(nprob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 100)
monte_nsum = MonteCarloSummary(monte_nsol)
#plot(monte_nsol,color=1,alpha=0.3)
p2 = plot(monte_nsum, title = "Neural SDE: After Training", xlabel="Time")
scatter!(p2,t,sde_data',lw=3,label=["x" "y" "z" "y"])
 
plot(p1,p2,layout=(2,1))
 
savefig("neural_sde.pdf")
savefig("neural_sde.png")

This just uses the diffeq_rd layer function to tell Flux to use reverse-mode AD (using Tracker.jl, unless you check out a bunch of weird Zygote.jl branches: wait for Zygote) and then trains the neural network using a discrete adjoint. While the previously posted example uses forward-mode, we have found that this is much much faster on neural SDEs, so if you’re trying to train them, I would recommend using this code instead (and I’ll get the examples updated).

Now to this equation let’s add jumps. A jump diffusion is defined like:

 du = f(u,p,t)dt + \sum g_i(u,t)dW^i + \sum c_i(u,p,t)dp_i

where dp_i are the jump terms. The jump terms differ from the Brownian terms because they are non-continuous: they are zero except at countably many time points where you “hit” the equation with an amount  c_i(u,p,t) . The timing at which these occur is based on an internal rate  \lambda_i of the jump  dp_i .

Jump diffusions are important because, just as there is a justification for the universality of stochastic differential equations, there is a justification here as well. The Lévy decomposition says that essentially any Markov process can be decomposed into something of this form. They also form the basis for many financial models, because for example changing regimes into a recession isn’t gradual but rather sudden. Models like Merton’s model thus use these as an essential tool in quantitative finance. So let’s train a neural network on that!

What we have to do is define jump processes and append them onto an existing differential equation. The documentation shows how to use the different jump definitions along with their pros and cons, so for now we will use ConstantRateJump. Let’s define a ConstantRateJump which has a constant rate and a neural network that decides what the effect of the jump ( c_i(u,p,t) ) will be. To do this, you’d simply put the neural network in there:

rate(u,p,t) = 2.0
affect!(integrator) = (integrator.u = dudt2(integrator.u))
jump = ConstantRateJump(rate,affect!)

where dudt2 is another neural network, and then wrap that into a jump problem:

prob = SDEProblem(dudt_,g,gpu(param(x)),tspan,nothing)
jump_prob = JumpProblem(prob,Direct(),jump,save_positions=(false,false))

And of course you can make this fancier: just replace that rate 2.0 with another neural network, make the g(u,p,t) term also have a neural network, etc.: explore this as you wish and go find some cool stuff. Let’s just stick with this as our example though, but please go ahead and make these changes and allow DiffEqFlux.jl to help you to explore your craziest mathematical idea!
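
As a small, hedged sketch of my own (not code from the post), here's what the second suggestion could look like: swapping the diffusion term g(u,p,t) for a neural network while keeping the constant-rate neural jump. (For the first suggestion, note that a rate which varies continuously with u or t formally calls for a VariableRateJump rather than a ConstantRateJump.)

using Flux, StochasticDiffEq, DiffEqJump

dudt  = Chain(x -> x.^3, Dense(2,50,tanh), Dense(50,2))  # drift network, as in the post
dudt2 = Chain(Dense(2,50,tanh), Dense(50,2))             # jump-effect network, as in the post
g_nn  = Chain(Dense(2,32,tanh), Dense(32,2))             # hypothetical diffusion network

f_(u,p,t) = dudt(u)
g_(u,p,t) = g_nn(u)                                      # neural g(u,p,t) instead of mp.*u
rate(u,p,t) = 2.0
affect!(integrator) = (integrator.u = dudt2(integrator.u))
jump = ConstantRateJump(rate,affect!)

u0 = Float32[2.; 0.]
prob = SDEProblem(f_,g_,u0,(0.0f0,1.0f0),nothing)
jump_prob = JumpProblem(prob,Direct(),jump,save_positions=(false,false))
sol = solve(jump_prob,SOSRI(),saveat=0.1f0,abstol=0.1,reltol=0.1)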

Now when you solve this, the jumps also occur along with the stochastic differential equation. To show what that looks like, let’s define a jump diffusion and solve it 100 times, taking its mean as our training data:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo,
    DiffEqJump
 
u0 = Float32[2.; 0.]
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
  true_A = [-0.1 2.0; -2.0 -0.1]
  du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
const mp = Float32[0.2,0.2]
function true_noise_func(du,u,p,t)
  du .= mp.*u
end
 
true_rate(u,p,t) = 2.0
true_affect!(integrator) = (integrator.u[1] = integrator.u[1]/2)
true_jump = ConstantRateJump(true_rate,true_affect!)
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
jump_prob = JumpProblem(prob,Direct(),true_jump,save_positions=(false,false))
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(jump_prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100,parallel_type=:none)
plot(monte_sol,title="Training Data")
 
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))

From the plot you can see wild discontinuities mixed in with an equation with continuous randomness. Just lovely.

A full code for training a neural jump diffusion thus is:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo,
    DiffEqJump
 
u0 = Float32[2.; 0.] |> gpu
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
  true_A = [-0.1 2.0; -2.0 -0.1] |> gpu
  du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
const mp = Float32[0.2,0.2] |> gpu
function true_noise_func(du,u,p,t)
  du .= mp.*u
end
 
true_rate(u,p,t) = 2.0
true_affect!(integrator) = (integrator.u[1] = integrator.u[1]/2)
true_jump = ConstantRateJump(true_rate,true_affect!)
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
jump_prob = JumpProblem(prob,Direct(),true_jump,save_positions=(false,false))
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(jump_prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100,parallel_type=:none)
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))
 
dudt = Chain(x -> x.^3,
           Dense(2,50,tanh),
           Dense(50,2)) |> gpu
dudt2 = Chain(Dense(2,50,tanh),
            Dense(50,2)) |> gpu
ps = Flux.params(dudt,dudt2)
 
g(u,p,t) = mp.*u
n_sde = function (x)
    dudt_(u,p,t) = dudt(u)
    rate(u,p,t) = 2.0
    affect!(integrator) = (integrator.u = dudt2(integrator.u))
    jump = ConstantRateJump(rate,affect!)
    prob = SDEProblem(dudt_,g,param(x),tspan,nothing)
    jump_prob = JumpProblem(prob,Direct(),jump,save_positions=(false,false))
    solve(jump_prob, SOSRI(); saveat=t ,abstol = 0.1, reltol = 0.1) |> Tracker.collect
end
 
pred = n_sde(u0) # Get the prediction using the correct initial condition
 
dudt__(u,p,t) = Flux.data(dudt(u))
rate__(u,p,t) = 2.0
affect!__(integrator) = (integrator.u = Flux.data(dudt2(integrator.u)))
jump = ConstantRateJump(rate__,affect!__)
nprob = SDEProblem(dudt__,g,u0,(0.0f0,1.0f0),nothing)
njump_prob = JumpProblem(nprob,Direct(),jump, save_positions = (false,false))
 
monte_nprob = MonteCarloProblem(njump_prob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 1000,parallel_type=:none, abstol = 0.1, reltol = 0.1)
monte_nsum = MonteCarloSummary(monte_nsol)
 
#plot(monte_nsol,color=1,alpha=0.3)
p1 = plot(monte_nsum, title = "Neural Jump Diffusion: Before Training")
scatter!(p1,t,sde_data',lw=3)
 
scatter(t,sde_data[1,:],label="data")
scatter!(t,Flux.data(pred[1,:]),label="prediction")
 
function predict_n_sde()
    n_sde(u0)
end
 
loss_n_sde1() = sum(abs2,sde_data .- predict_n_sde())
 function loss_n_sde100()
    loss = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:100])
    @show loss
    loss
end
function loss_n_sde500()
    loss = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:500])
    @show loss
    loss
end 
Flux.back!(loss_n_sde1())
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
cb = function () #callback function to observe training
    sample = predict_n_sde()
    # loss against current data
    display(sum(abs2,sde_data .- sample))
    # plot current prediction against data
    cur_pred = Flux.data(sample)
    pl = scatter(t,sde_data[1,:],label="data")
    scatter!(pl,t,cur_pred[1,:],label="prediction")
    display(plot(pl))
end
 
# Display the SDE with the initial parameter values.
cb()
 
Flux.train!(loss_n_sde1 , ps, Iterators.repeated((), 100), opt, cb = cb)

Notice how it’s almost exactly the same as the SDE code but with the definition of the jumps. You still get the same high order adaptive GPU-accelerated (choice of implicit, etc.) SDE solvers, but now applied to this more generalized class of problems. Using the GPU gives a good speedup in the neural network case, but slows things down quite a bit when generating the training data since that part isn’t very parallel. Finding new ways to use GPUs here is one thing I am interested in pursuing. Additionally, using a lower tolerance causes a stack overflow in Tracker.jl, which is something we have fixed with Zygote.jl and which will come to releases once Zygote.jl on the differential equation solvers is more robust. Lastly, the plotting with GPU-based arrays is wonky right now; we’ll need to make the interface a little bit nicer. However, this is a proof of concept that this stuff does indeed work, though it takes a while to train it to a “decent” loss (way more than the number of repetitions showcased here).

[Note: you need to add using CuArrays to enable the GPU support. I turned it off by default because I was training this on my dinky laptop :)]

Neural Partial Differential Equations

Now let’s do a neural partial differential equation (PDE). We can start by pulling code from this older blog post on solving systems of stochastic partial differential equations with GPUs. Here I’m going to strip the stochastic part off, simply because I want to train this on my laptop before the flight ends, so again I’ll leave it as an exercise to do the same jump diffusion treatment to this PDE. Let’s start by defining the method of lines discretization for our PDE. If you don’t know what that is, please go read that blog post on defining SPDEs. What happens is the discretization gives you a set of ODEs to solve, which looks like:

using OrdinaryDiffEq, RecursiveArrayTools, LinearAlgebra,
      DiffEqOperators, Flux, CuArrays
 
# Define the constants for the PDE
const α₂ = 1.0f0
const α₃ = 1.0f0
const β₁ = 1.0f0
const β₂ = 1.0f0
const β₃ = 1.0f0
const r₁ = 1.0f0
const r₂ = 1.0f0
const D = 100.0f0
const γ₁ = 0.1f0
const γ₂ = 0.1f0
const γ₃ = 0.1f0
const N = 100
const X = reshape([i for i in 1:N for j in 1:N],N,N) |> gpu
const Y = reshape([j for i in 1:N for j in 1:N],N,N) |> gpu
const α₁ = 1.0f0.*(X.>=80)
 
const Mx = Array(Tridiagonal([1.0f0 for i in 1:N-1],[-2.0f0 for i in 1:N],[1.0f0 for i in 1:N-1])) |> gpu
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
u0 = rand(Float32,N,N,3) |> gpu
const MyA = zeros(Float32,N,N) |> gpu
const AMx = zeros(Float32,N,N) |> gpu
const DA = zeros(Float32,N,N) |> gpu
 
# Define the discretized PDE as an ODE function
function f(_du,_u,p,t)
  u = reshape(_u,N,N,3)
  du= reshape(_du,N,N,3)
  A = @view u[:,:,1]
  B = @view u[:,:,2]
  C = @view u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,vec(u0),(0.0f0,100.0f0))
@time sol = solve(prob,BS3(),  progress=true,saveat = 5.0)
@time sol = solve(prob,ROCK2(),progress=true,saveat = 5.0)
 
 
using Plots; pyplot()
p1 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,1],title = "[A]")
p2 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,2],title = "[B]")
p3 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
savefig("neural_pde_training_data.png")
 
using DiffEqFlux, Flux
 
u0 = param(u0)
tspan = (0.0f0,100.0f0)
 
ann = Chain(Dense(3,50,tanh), Dense(50,3)) |> gpu
p1 = DiffEqFlux.destructure(ann)
ps = Flux.params(ann)
 
_ann = (u,p) -> reshape(p[3*50+51 : 2*3*50+50],3,50)*
                    tanh.(reshape(p[1:3*50],50,3)*u + p[3*50+1:3*50+50]) + p[2*3*50+51:end]
 
function dudt_(_u,p,t)
  u = reshape(_u,N,N,3)
  A = u[:,:,1]
  DA = D .* (A*Mx + My*A)
  _du = mapslices(x -> _ann(x,p),u,dims=3) |> gpu
  du = reshape(_du,N,N,3)
  x = vec(cat(du[:,:,1]+DA,du[:,:,2],du[:,:,3],dims=3))
end
 
prob = ODEProblem(dudt_,vec(Flux.data(u0)),tspan,Flux.data(p1))
@time diffeq_fd(p1,Array,length(u0)*length(0.0f0:5.0f0:100.0f0),prob,ROCK2(),progress=true,
                saveat=0.0f0:5.0f0:100.0f0)
 
function predict_fd()
  diffeq_fd(p1,Array,length(u0)*length(0.0f0:5.0f0:100.0f0),prob,ROCK2(),progress=true,
                  saveat=0.0f0:5.0f0:100.0f0)
end
 
function loss_fd()
  _sol = predict_fd()
  loss = sum(abs2,Array(sol) .- _sol)
  @show loss
  loss
end
loss_fd()
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
 
Flux.train!(loss_fd, ps, data, opt)

The interesting part of this neural differential equation is the local/global aspect of its parts. The mapslices call makes it so that there’s a local nonlinear function of 3 variables applied at each point in space. While it keeps the neural network small, this currently does not do well with reverse-mode automatic differentiation or GPUs. That isn’t a major problem here because, since the neural network is kept small in this architecture, the number of parameters is also quite small. That said, reverse-mode AD will be required for fast adjoint passes, so this is still a work in progress / proof of concept, with a very specific point made (all that’s necessary here is overloads to make mapslices work well).
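
To make the local/global split concrete, here's a tiny stand-alone illustration of my own of what the mapslices pattern does (local_f below is just a stand-in for the neural network):

u = rand(Float32, 4, 4, 3)            # a tiny 4×4 grid with 3 reactants at each point
local_f(x) = x .^ 2 .- sum(x)         # any function of the 3 local values
out = mapslices(local_f, u, dims=3)   # applied independently at each (i,j) grid point
size(out)                             # (4, 4, 3): same grid, and locations never mix

The diffusion term, by contrast, is the global piece that couples neighboring grid points.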

One point that really came out of this was the ODE solver methods. The ROCK2 method is much faster when generating the training data and when running diffeq_fd. It was a difference of 3 minutes with ROCK2 vs 40 minutes with BS3 (on the CPU), showing how specialized methods really are the difference between the problem being solvable or not. The standard implicit methods like Rodas5 aren’t performing well here either since the Jacobian is a 30,000×30,000 dense matrix, and I didn’t take the time to specify sparsity patterns or whatnot to actually make them viable competitors. So for the lazy neural ODE use with sparsity, ROCK2 seems like a very interesting option. This is a testament to our newest GSoC crew’s results since it’s one of the newer methods implemented by our student Deepesh Thakur. There are still a few improvements that need to be made to make the eigenvalue estimates more GPU-friendly, which should let this performance result carry over to GPUs soon as well (currently, the indexing in this part of the code gives it trouble, so a PR is coming probably in a week or so). Lastly, I’m not sure what’s a good picture for these kinds of things, so I’m going to have to think about how to represent a global neural PDE fit.

Conclusion

Have fun with this. There are still some rough edges; for example, plotting is still a little wonky because all of the automatic DiffEq solution plotting seems to index, and the GPU-based arrays don’t like that (I’ll update that soon now that it’s becoming a standard part of the workflow). Use it as starter code and find some cool stuff. Note that the examples shown here are not the only ones that are possible. This all just uses Julia’s generic programming and differentiable programming infrastructure in order to automatically generate code that is compatible with GPUs and automatic differentiation, so it’s impossible for me to enumerate all of the possible combinations. That means there’s plenty of things to explore. These are very early preliminary results, but they show that these equations are all possible. These examples show some places where we want to continue accelerating by both improving the methods and their implementation details. I look forward to doing an update with Zygote soon.

CITATION:

 Christopher Rackauckas, Neural Jump SDEs (Jump Diffusions) and Neural PDEs, The Winnower6:e155975.53637 (2019). DOI:10.15200/winn.155975.53637


Solving Systems of Stochastic PDEs and using GPUs in Julia

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/solving-systems-stochastic-pdes-using-gpus-julia/

What I want to describe in this post is how to solve stochastic PDEs in Julia using GPU parallelism. I will go from start to finish, describing how to use the type-genericness of the DifferentialEquations.jl library in order to write a code that uses within-method GPU-parallelism on the system of PDEs. This is mostly a proof of concept: the most efficient integrators for this problem are not compatible with GPU parallelism yet, and the GPU parallelism isn’t fully efficient yet. However, I thought it would be nice to show an early progress report showing that it works and what needs to be fixed in Base Julia and various libraries for us to get the full efficiency.

Our Problem: 2-dimensional Reaction-Diffusion Equations

The reaction-diffusion equation is a PDE commonly handled in systems biology which is a diffusion equation plus a nonlinear reaction term. The dynamics are defined as:

u_t = D \Delta u + f(t,u)

But this doesn’t need to only have a single “reactant” u: this can be a vector of reactants and the f is then the nonlinear vector equations describing how these different pieces react together. Let’s settle on a specific equation to make this easier to explain. Let’s use a simple model of a 3-component system where A can diffuse through space to bind with the non-diffusive B to form the complex C (also non-diffusive, assume B is too big and gets stuck in a cell which causes C=A+B to be stuck as well). Other than the binding, we make each of these undergo a simple birth-death process, and we write down the equations which result from mass-action kinetics. If this all is meaningless to you, just understand that it gives the system of PDEs:

A_t = D \Delta A + \alpha_A(x) - \beta_A  A - r_1 A B + r_2 C

B_t = \alpha_B - \beta_B B - r_1 A B + r_2 C

C_t = \alpha_C - \beta_C C + r_1 A B - r_2 C

One addition that was made to the model is that we let \alpha_A(x) be the production of A, and we let that be a function of space so that it is only produced on one side of our domain. Let’s make it a constant when x>80, and 0 otherwise, and let our spatial domain be x \in [0,100] and y \in [0,100].

This model is spatial: each reactant u(t,x,y) is defined at each point in space, and all of the reactions are local, meaning that f at spatial point (x,y) only uses u_i(t,x,y). This is an important fact which will come up later for parallelization.

Discretizing the PDE into ODEs

In order to solve this via a method of lines (MOL) approach, we need to discretize the PDE into a system of ODEs. Let’s do a simple uniformly-spaced grid finite difference discretization. Choose dx = 1 and dy = 1 so that we have 100*100=10000 points for each reactant. Notice how fast that grows! Put the reactants in a matrix such that A[i,j] = A(x_j,y_i), i.e. the columns of the matrix are the x values and the rows are the y values (this way looking at the matrix is essentially like looking at the discretized space).

So now we have 3 matrices (A, B, and C) for our reactants. How do we discretize the PDE? In this case, the diffusion term simply becomes a tridiagonal matrix M where [1,-2,1] is the central band. You can notice that MA performs diffusion along the columns of A, and so this is diffusion along y. Similarly, AM flips the indices and thus does diffusion along the rows of A, making this diffusion along x. Thus D(M_yA + AM_x) is the discretized Laplacian (we could have separate diffusion constants and dx \neq dy if we want by using different constants on the M, but let’s not do that for this simple example. I’ll leave that as an exercise for the reader). I enforced a Neumann boundary condition with zero derivative (also known as a no-flux boundary condition) by reflecting the changes over the boundary. Thus the derivative operator is generated as:

const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
# Do the reflections, different for x and y operators
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

I also could have done this using the DiffEqOperators.jl library, but I wanted to show what it truly is at its core.
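
As a quick sanity check of my own (written against current Julia, where full has been replaced by Array, and using fresh names so nothing clashes with the constants below), the matrix form My*A + A*Mx reproduces the familiar 5-point Laplacian stencil at interior points:

using LinearAlgebra

n  = 5
Lx = Array(Tridiagonal([1.0 for i in 1:n-1],[-2.0 for i in 1:n],[1.0 for i in 1:n-1]))
Ly = copy(Lx)
W  = rand(n,n)
lap = Ly*W + W*Lx
i, j = 3, 3   # an interior point
lap[i,j] ≈ W[i-1,j] + W[i+1,j] + W[i,j-1] + W[i,j+1] - 4*W[i,j]   # true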

Since all of the reactions are local, each point in space reacts separately. This shows up as element-wise equations on the reactants, so we can write it out quite simply. The ODE which represents the PDE is then, in pseudo Julia code:

DA = D*(M*A + A*M)
@. DA + α₁ - β₁*A - r₁*A*B + r₂*C
@. α₂ - β₂*B - r₁*A*B + r₂*C
@. α₃ - β₃*C + r₁*A*B - r₂*C

Note here that I am using α₁ as a matrix (or row-vector, since that will broadcast just fine) where every point in space with x<80 has this as zero, and all of the others have it as a constant. The other coefficients are all scalars. How do we do this with the ODE solver?

Our Type: ArrayPartition

The ArrayPartition is an interesting type from RecursiveArrayTools.jl which allows you to define “an array” as actually being different discrete subunits of arrays. Let’s assume that our initial condition is zero for everything and let the production terms build it up. This means that we can define:

A = zeros(N,N); B  = zeros(N,N); C = zeros(N,N)

Now we can put them together as:

u0 = ArrayPartition((A,B,C))

You can read the RecursiveArrayTools.jl README to get more familiar with what the ArrayPartition is, but really it’s an array where u[i] indexes into A first, B second, then C. It also has efficient broadcast, doing the A, B and C parts together (and this is efficient even if they don’t match types!). But since this acts as an array, to DifferentialEquations.jl it is an array!
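
For a concrete feel, here's a small sketch of my own showing how the indexing and broadcasting behave on a toy ArrayPartition:

using RecursiveArrayTools

up = ArrayPartition(([1.0, 2.0], [10.0, 20.0]))
up[3]          # 10.0: linear indexing runs through the first array, then the second
v = up .* 2    # broadcast hits both pieces at once and returns another ArrayPartition
v.x[1]         # [2.0, 4.0]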

The important part is that we can “decouple” the pieces of the array at anytime by accessing u.x, which holds our tuple of arrays. Thus our ODE using this ArrayPartition as its container can be written as follows:

function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  DA = D*(M*A + A*M)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

where this is using @. to do in-place updates on our du to say how the full ArrayPartition should update in time. Note that we can make this more efficient by adding some cache variables to the diffusion matrix multiplications and using A_mul_B!, but let’s ignore that for now.
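
In case the @. macro is unfamiliar, here's a tiny check of my own that it simply fuses the whole right-hand side into a single in-place broadcast:

a = zeros(3); b = [1.0, 2.0, 3.0]; c = 2.0
@. a = c*b + 1          # identical to a .= c .* b .+ 1: one fused, allocation-free pass
a == c .* b .+ 1        # true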

Together, the ODE which defines our PDE is thus:

prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,BS3())

if I want to solve it on t \in [0,100]. Done! The solution gives back ArrayPartitions (and interpolates to create new ones if you use sol(t)). We can plot it in Plots.jl and see the pretty gradients. Using this 3rd order explicit adaptive Runge-Kutta method we solve this equation in about 40 seconds. That’s okay.

Some Optimizations

There are some optimizations that can still be done. When we do A*B as matrix multiplication, we create another temporary matrix. These allocations can bog down the system. Instead we can pre-allocate the outputs and use the in-place function A_mul_B! to make better use of memory. The easiest way to store these cache arrays is as constant globals, but you can use closures (anonymous functions which capture data, i.e. (x)->f(x,y)) or call-overloaded types to do it without globals. The globals way (the easy way) is simply:

const MyA = zeros(N,N)
const AMx = zeros(N,N)
const DA = zeros(N,N)
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

For reference, closures looks like:

MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
function f_full(t,u,du,MyA,AMx,DA)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
f = (t,u,du)-> f_full(t,u,du,MyA,AMx,DA)

and a call overloaded type looks like:

struct MyFunction{T} <: Function
  MyA::T
  AMx::T
  DA::T
end
 
# Now define the overload
function (ff::MyFunction)(t,u,du)
  # This is a function which references itself via ff
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(ff.MyA,My,A)
  A_mul_B!(ff.AMx,A,Mx)
  @. ff.DA = D*(ff.MyA + ff.AMx)
  @. dA = ff.DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
 
f = MyFunction(MyA,AMx,DA)
# Now f(t,u,du) is our function!

These last two ways enclose the pointer to our cache arrays locally but still present a function f(t,u,du) to the ODE solver.

Now since PDEs are large, many times we don’t care about getting the whole timeseries. Using the output controls from DifferentialEquations.jl, we can make it only output the final timepoint.

sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)

Also, if you’re using Juno this’ll give you a nice progress bar so you can track how it’s going.

Quick Note About Performance

We are using an explicit Runge-Kutta method here because that’s what works with GPUs so far. Matrix factorizations need to be implemented for GPUArrays before the implicit (stiff) solvers will be available, so here we choose BS3 since it’s fully broadcasting (not all methods are yet) and it’s fully GPU compatible. In practice, right now using an NxNx3 tensor as the initial condition / dependent variable with either OrdinaryDiffEq’s Rosenbrock23(), Rodas4(), or Sundials’ CVODE_BDF() is actually more efficient right now. But after Julia fixes its broadcasting issue and with some updates to Julia’s differentiation libraries to handle abstract arrays like in DiffEqDiffTools.jl, the stiff solvers will be usable with GPUs and all will be well.

Thus for reference I will show some ways to do this efficiently with stiff solvers. With a stiff solver we will not want to factorize the dense Jacobian since that would take forever. Instead we can use something like Sundials’ Krylov method:

u0 = zeros(N,N,3)
const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N)
function f(t,u,du)
  A = @view u[:,:,1]
  B = @view u[:,:,2]
  C = @view u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
using Sundials
@time sol = solve(prob,CVODE_BDF(linear_solver=:BCG))

and that will solve it in about a second. In this case it wouldn’t be more efficient to use the banded linear solver since different parts of the system interact with each other, which makes the bands large, and thus a Krylov method is preferred. See this part of the docs for details on the available linear solvers from Sundials. DifferentialEquations.jl exposes a ton of Sundials’ possible choices so hopefully one works for your problem (preconditioners coming soon).
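
For reference, here's a quick sketch of my own of how swapping in some of Sundials' other Krylov options looks on the same prob defined above (the option names are the ones Sundials.jl exposes):

using Sundials
@time sol = solve(prob,CVODE_BDF(linear_solver=:GMRES))   # GMRES instead of BiCGStab
@time sol = solve(prob,CVODE_BDF(linear_solver=:TFQMR))   # transpose-free QMR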

To do something similar with OrdinaryDiffEq.jl, we would need to make use of the linear solver choices in order to override the internal linear solve functions with some kind of sparse matrix solver like a Krylov method from IterativeSolvers.jl. For this size of problem, though, a multistep method like BDF is probably preferred, at least until we implement some IMEX methods.

So if you want to solve it quickly right now, that’s how you do it. But let’s get back to our other story: the future is more exciting.

The Full ODE Code

As a summary, here’s a full PDE code:

using OrdinaryDiffEq, RecursiveArrayTools
 
# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
 
const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
A = zeros(N,N); B  = zeros(N,N); C = zeros(N,N)
u0 = ArrayPartition((A,B,C))
 
const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)
 
using Plots; pyplot()
p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))

Making Use of GPU Parallelism

That was all using the CPU. How do we turn on GPU parallelism with DifferentialEquations.jl? Well, you don’t. DifferentialEquations.jl “doesn’t have GPU bits”. So wait… can we not do GPU parallelism? No, this is the glory of type-genericness, especially in broadcasted operations. To make things use the GPU, we simply use a GPUArray. If instead of zeros(N,M) we used GPUArray(zeros(N,M)), then u becomes an ArrayPartition of GPUArrays. GPUArrays naturally override broadcast such that dotted operations are performed on the GPU. DifferentialEquations.jl uses broadcast internally (except in this list of current exceptions due to a limitation with Julia’s inference engine which I have discussed with Jameson Nash (@vtjnash), who mentioned this should be fixed in Julia’s 1.0 release), and thus just by putting the array as a GPUArray, the array-type will take over how all internal updates are performed and turn this algorithm into a fully GPU-parallelized algorithm that doesn’t require copying to the CPU. Wasn’t that simple?

From that you can probably also see how to multithread everything, or how to set everything up with distributed parallelism. You can make the ODE solvers do whatever you want by defining an array type where the broadcast does whatever special behavior you want.

So to recap, the entire difference from above is changing to:

using CLArrays
gA = CLArray(A); gB  = CLArray(B); gC = CLArray(C)
const gMx = CLArray(Mx)
const gMy = CLArray(My)
const gα₁ = CLArray(α₁)
gu0 = ArrayPartition((gA,gB,gC))
 
const gMyA = CLArray(MyA)
const gAMx = CLArray(AMx)
const gDA = CLArray(DA)
function gf(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(gMyA,gMy,A)
  A_mul_B!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
GPUArrays.allowslow(false) # makes sure none of the slow fallbacks are used
@time sol = solve(prob2,BS3(),progress=true,dt=0.003,adaptive=false,save_everystep=false,save_start=false)
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
sol = solve(prob2,BS3(),progress=true,save_everystep=false,save_start=false)
# Adaptivity currently fails due to https://github.com/JuliaGPU/CLArrays.jl/issues/10

You can use CuArrays if you want as well. It looks exactly the same as using CLArrays except you exchange the CLArray calls for CuArray. Go have fun.

And Stochastic PDEs?

Why not make it an SPDE? All that we need to do is extend each of the PDE equations to have a noise function. In this case, let’s use multiplicative noise on each reactant. This means that our noise update equation is:

function g(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  @. dA = γ₁*A
  @. dB = γ₂*A
  @. dC = γ₃*A
end

Now we just define and solve the system of SDEs:

prob = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob,SRIW1())

We can see the cool effect that diffusion dampens the noise in [A] but is unable to dampen the noise in [B] which results in a very noisy [C]. The stiff SPDE takes much longer to solve even using high order plus adaptivity because stochastic problems are just that much more difficult (current research topic is to make new algorithms for this!). It gets GPU’d just by using GPUArrays like before. But there we go: solving systems of stochastic PDEs using high order adaptive algorithms with within-method GPU parallelism. That’s gotta be a first? The cool thing is that nobody ever had to implement the GPU-parallelism either, it just exists by virtue of the Julia type system.

Side Notes

Warning: This can take a while to solve! An explicit Runge-Kutta algorithm isn’t necessarily great here, though to use a stiff solver on a problem of this size requires once again smartly choosing sparse linear solvers. The high order adaptive method is pretty much necessary though since something like Euler-Maruyama is simply not stable enough to solve this at a reasonable dt. Also, the current algorithms are not so great at handling this problem. Good thing there’s a publication coming along with some new stuff…

Note: the version of SRIW1 which uses broadcast for GPUs is not on the current versions of StochasticDiffEq.jl since it’s slower due to a bug when fusing too many broadcasts which will hopefully get fixed in one of Julia’s 1.x releases. Until then, GPUs cannot be used with this algorithm without a (quick) modification.

Conclusion

So that’s where we’re at. GPU parallelism works because of abstract typing. But in some cases we need to help the GPU array libraries get up to snuff to handle all of the operations, and then we’ll really be in business! Of course there’s more optimizing that needs to be done, and we can do this by specializing code paths on bottlenecks as needed.

I think this is at least a nice proof of concept showing that Julia’s generic algorithms allow for one to not only take advantage of things like higher precision, but also take advantage of parallelism and extra hardware without having to re-write the underlying algorithm. There’s definitely more work that needs to be done, but I can see this usage of abstract array typing as being one of Julia’s “killer features” in the coming years as the GPU community refines its tools. I’d give at least a year before all of this GPU stuff is compatible with stiff solvers and linear solver choices (so that way it can make use of GPU-based Jacobian factorizations and Krylov methods). And comparable methods for SDEs are something I hope to publish soon since the current tools are simply not fit for this scale of problem: high order, adaptivity, sparse linear solvers, and A/L-stability all need to be combined in order to tackle this problem efficiently.

Full Script

Here’s the full script for recreating everything:

#######################################################
### Solve the PDE
#######################################################
 
using OrdinaryDiffEq, RecursiveArrayTools
 
# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
 
const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
A = zeros(N,N); B  = zeros(N,N); C = zeros(N,N)
u0 = ArrayPartition((A,B,C))
 
const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
@time sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)
 
using Plots; pyplot()
p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
 
#######################################################
### Solve the PDE using CLArrays
#######################################################
 
using CLArrays
gA = CLArray(A); gB  = CLArray(B); gC = CLArray(C)
const gMx = CLArray(Mx)
const gMy = CLArray(My)
const gα₁ = CLArray(α₁)
gu0 = ArrayPartition((gA,gB,gC))
 
const gMyA = CLArray(MyA)
const gAMx = CLArray(AMx)
const gDA = CLArray(DA)
function gf(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(gMyA,gMy,A)
  A_mul_B!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
GPUArrays.allowslow(false)
@time sol = solve(prob2,BS3(),progress=true,dt=0.003,adaptive=false,save_everystep=false,save_start=false)
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
sol = solve(prob2,BS3(),progress=true,save_everystep=false,save_start=false)
# Adaptivity currently fails due to https://github.com/JuliaGPU/CLArrays.jl/issues/10
 
#######################################################
### Solve the SPDE
#######################################################
 
using StochasticDiffEq
 
function g(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  @. dA = γ₁*A
  @. dB = γ₂*A
  @. dC = γ₃*A
end
 
prob3 = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob3,SRIW1(),progress=true,save_everystep=false,save_start=false)
 
p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
 
# Exercise: Do SPDE + GPU
