Re-posted from: https://blog.glcs.io/ude_symbolic_regression
This post was written by Steven Whitaker.
In this Julia for Devs post,we will explore the fascinating world of using universal differential equations (UDEs)for model discovery.A UDE is a differential equationwhere part (or all) of itis defined by a universal approximator(typically a neural network).These UDEs play a pivotal rolein uncovering missing aspectsof established models.
UDEs enable seamless integrationof known mechanistic modelswith data-driven modeling approaches.Rather than omit or guess at unknown aspects of a model,a neural network can be embedded into the model,which can then be trained using observed data.
Once trained,the embedded neural networkcan be approximated by a mathematical expression.This allows us to replace the neural networkwith an interpretable formula,improving transparencywhile filling in the missing aspectsof the original modelwith new structure learned from observed data.
This post will illustratekey ideasand design decisions madewhile helping one of our clientsuncover missing dynamics in their model,a complex system representingthe human body under treatmentof a rare disease.
By replacing just one aspect of our client’s modelwith a neural network,we achieved a significant milestone:a 30% reduction in the error of the model predictions.This success story is a testament to the powerof UDEs in enhancing model performance.
We will not share client-specific code,but will provide similar examplesto illustrate our approach.
Here are some of the key ideas we will focus on:
- Ensure nonnegativity of state variables (as needed).
- Appropriately initialize neural network weights.
- Normalize neural network inputs.
- Make dynamics modular.
- Fit symbolic expression.
Note that we will discussfully connected neural networks,so some comments belowmay not apply if other architectures are used.
And with that,let’s dive in!
Ensure Nonnegativity of States
When working with a modelrepresenting a real phenomenon,chances are that at least some of the model’s state variablesare naturally nonnegative.For example,the energy of a systemor the concentration of a chemicalcannot be negative quantities.
However,a differential equation solversees only math,not the physical phenomenonthe math represents.So,we have to tell the solverto respect the natural constraintsthat exist.
One way to enforce nonnegativityis to pass the isoutofdomain optionto solve,where isoutofdomain is a functionthat returns truewhen at least one of the provided state variablesviolates the nonnegativity constraint.For example,if all state variables should be nonnegative,the function can be defined as:
isoutofdomain(u, p, t) = any(<(0), u)
As another example,if only the first and third states should be nonnegative:
isoutofdomain(u, p, t) = u[1] < 0 || u[3] < 0
For more informationand other toolsfor tackling nonnegativity,check out the SciML docs.
The ideaof telling the solverabout the nonnegativity constraintsmay seem obvious.Still, it has important implicationsfor the initialization of neural networksembedded into the model.Care should be takento ensure the initial network weights(before training)do not cause states to become negative.See the next section below for more details.
Appropriately Initialize Neural Network Weights
To learn missing dynamics,the neural network in a UDEneeds to be trained.
Training the neural network can occuronly if the UDE can be solvedusing the initial network weights.Otherwise,the gradient of the loss,which is used by optimization algorithmsto train the neural network,cannot be computed.
As a result,it is necessaryto choose a good initializationfor the neural network weights.Here are some approaches for initializationand how well they work:
- Random initialization:Typically,neural network weightsare initialized randomly.However,this results in random UDE dynamics,so whether a state variablestays positive(or satisfies any other constraints)is left to chance.And if constraints are violated,but we try to enforce themvia, e.g.,
isoutofdomain,the solver will be unable to solve the UDE,preventing computation of the loss gradient.So,random initialization might work,but it might not. - Zero initialization:Don’t do this.Network weights won’t ever update during trainingbecause the gradients with respect to those weightswill always be zero.
- Constant initialization:Don’t do this, either.Initializing all the weightsto the same, non-zero valuewill result in some amount of learning,but it severely limitsthe expressivity of the network.
- Pre-training:If the neural network in the UDEis used to replace an expressionthat is believed to be incorrect(but otherwise gives a usable model),one way to initialize the network weightsfor UDE trainingis to initialize the weights randomlybut then train the network directly(i.e., not the UDE as a whole)to fit the expression the network is to replace.This works because no constraints are involvedduring pre-training(so randomly initializing the weights is okay),but when training the UDE,the network weights have been setto avoid running into issues with constraints.
- Smoothed Collocation:Smoothed collocation is another approachfor pre-training the neural networkwithout involving constraints.See the SciML docs for more info.
For our client,randomly initialized weightsfailed to give a usable gradientfor training,so we decided on pre-trainingbecause they had a reasonable guessthat we could fit the initial network weights to.
Normalize Neural Network Inputs
One key to ensuring UDE training proceeds nicelyis to normalize the inputsto the neural networkto ensure all inputsare roughly of the same magnitude.
Normalizing is essentialbecause the gradient of the training loss functionis proportional to the network inputs.If one input tends to be 1000 times largerthan another,the gradient will also tend to be that much larger,dominating the learning algorithm.

To illustrate how to normalize,suppose we have the following neural network(created using Lux.jl):
nn = Chain( Dense(2, 5, tanh), Dense(5, 5, tanh), Dense(5, 1),)
Then we can add a function to the Chainthat performs the normalization:
nn_normalized = Chain( x -> x ./ normalization, # Other layers as before)
In this example,normalization is a Vectorcontaining the valuesto scale each input variable.And of course,we’re not limited to scaling the inputs;we can implement whatever function we wantto process the inputs(such as z-score standardization).
The neural network in our client’s UDEdid not learn without normalization.After normalizing the inputs,we got the neural networkto learn meaningful dynamics.
Make Dynamics Modular
While not strictly necessary for UDEs,making the dynamics modularvastly simplifies comparingbetween different approaches.
For our client,we wanted to comparethree versions of the dynamics:the original dynamics,the UDE(where part of the original dynamicswas replaced with a neural network),and the updated dynamics(where the trained neural networkwas replaced with a symbolic expressionfit to the neural network).
Rather than duplicate the dynamics function three timeswith each slight variation,we allowed the variable part of the dynamicsto be passed in as an optionto the dynamics function.To illustrate:
function f!(du, u, p, t; g = original_g) # ... some code ... # `a` and `b` are values computed above # or pulled from `u`, etc. val = g(a, b, p) # `val` is used to update `du` in some way. # ... some code ...end
However,ODEProblems don’t allow passing optionsto the dynamics function.But that’s not a problem;we can create a closureto set the optionbefore handing the function to an ODEProblem.For example,if we want g to use a Lux.jl neural network:
nn = Chain(...)(p_nn, st) = Lux.setup(...)# Be sure to include `p_nn` in the `ODEProblem` parameters `p`.g_nn = (a, b, p) -> nn([a, b], p.p_nn, st)[1][1]f_nn! = (du, u, p, t) -> f!(du, u, p, t; g = g_nn)
Setting up our client’s code like thisreduced code duplicationwhile allowing easy comparisonsacross different versions of the dynamics.
Fit Symbolic Expression
One of the key stepsto learning interpretable dynamicsfrom a UDEis to fit a symbolic expressionto the trained neural network.Doing so allows us to replacethe black box neural networkwith a mathematical expressionwe can understand.
The first stepis to collect the data to usefor the fitting.If the neural network takes two inputs,we will need a Matrix of values:
X = [ a1 a2 ... aN; b1 b2 ... bN;]
The (ai, bi) pairsshould cover the expected range of input values.One way to determine this rangeis to solve the UDEand save off the valuesthat end up going into the neural network.
To get the target values,evaluate the neural networkwith the X we just created:
y = nn(X, p_nn, st)[1] |> vec
(Here we assume the neural networkoutputs a single numberper input pair.)
Now we can figure outa mathematical expressionrelating X to y.One way to do thisis to use SymbolicRegression.jl.
After obtaining an Expression object eq(called tree in the README for SymbolicRegression.jl),we can use it in our dynamics functionas described in the previous section:
# `p` needs to be an input even though it's not used.g_eq = (a, b, p) -> eq([a; b;;])[1]f_eq! = (du, u, p, t) -> f!(du, u, p, t; g = g_eq)
Using SymbolicRegression.jlto fit the trained UDE neural network,we could provide our clientwith a precise mathematical expressionrepresenting the dynamics that were missingfrom their original differential equation.
Summary
In this post,we discussed vital aspectsof implementing UDEsto learn missing dynamics.We learned some tipsfor helping UDE training,including appropriately initializing network weightsand normalizing network inputs.We also saw the benefitsof code modularityand how that can easily allow usto insert a learned mathematical expressioninto the dynamics.For our client,we were able to use these ideasto train a UDEto achieve 30% less errorin the model predictions.
Do you want to leverage UDEsto improve your models’ predictive capabilities?Contact us, and we can help you out!
Additional Links
- SciML UDE Tutorial
- UDE tutorial from official Julia SciML docs.
- SciML Docs about Nonnegativity
- More information about debugging problems with negativity,including a discussion about
isoutofdomain.
- More information about debugging problems with negativity,including a discussion about
- Lux.jl
- Package for creating and training neural networks.
- SymbolicRegression.jl
- Package for fitting expressions to data.
- GLCS Modeling & Simulation
- Connect with us for Julia Modeling & Simulation consulting.

