Category Archives: Julia

Direct Automatic Differentiation of (Differential Equation) Solvers vs Analytical Adjoints: Which is Better?

Re-posted from: http://www.stochasticlifestyle.com/direct-automatic-differentiation-of-solvers-vs-analytical-adjoints-which-is-better/

Automatic differentiation of a “solver” is a subject with many details for doing it in the most effective form. For this reason, there are a lot of talks and courses that go into lots of depth on the topic. I recently gave a talk on some of the latest stuff in differentiable simulation with the American Statistical Association, and have some detailed notes on such adjoint derivations as part of the 18.337 Parallel Computing and Scientific Machine Learning graduate course at MIT. And there are entire organizations like my SciML Open Source Software Organization which work day-in and day-out on the development of new differentiable solvers.

I’ll give a brief summary of all my materials here below.

Continuous vs Discrete Differentiation of Solvers

AD of a solver can be done in essentially two different ways: either by directly applying automatic differentiation to the steps of the solver, or by defining higher-level adjoint rules that will compute the derivative. In some cases these can be mathematically equivalent. For example, forward sensitivity analysis of an ODE $$u' = f(u,p,t)$$ follows by the chain rule:

$$\frac{d}{dp} \frac{du}{dt} = \frac{d}{dp} f(u,p,t) = \frac{df}{du} \frac{du}{dp} + \frac{\partial f}{\partial p}$$

Thus if you solve the extended system of equations:

$$u' = f(u,p,t)$$
$$s' = \frac{df}{du} s + \frac{\partial f}{\partial p}$$

then you get $$s = \frac{du}{dp}$$ as the solution to the new equations. So solve these bigger ODEs and you get the derivative of the solution with respect to the parameters as the extra piece. One way to do “automatic differentiation” is to add a derivative rule to the AD library that says “if you see an ODE solve, then replace the solve with this extended solve and take the latter part as the derivative”. The other way, of course, is to simply do forward-mode automatic differentiation of the ODE solver library’s steps themselves. It turns out that in this case, if you work out the math, the two are mathematically equivalent. Note that they are not computationally equivalent though, since the AD process may SIMD the expressions in a different way, doing some constant folding and common subexpression elimination (CSE) in a way that’s different from the hand-coded version, and thus the performance can be very different even though it’s mathematically the same algorithm.
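To make the extended system concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration and not something from the talk or the course notes: the test problem $$u' = -pu$$ with $$u(0) = 1$$, the hand-rolled RK4 stepper, and the function names. For this problem the sensitivity equation is $$s' = -ps - u$$, and the exact answer is $$\frac{du}{dp} = -t e^{-pt}$$, so the extra state can be checked directly.

```python
import math

def rk4_step(rhs, y, t, h):
    """One classical Runge-Kutta step for a list-valued state y."""
    k1 = rhs(y, t)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], t + 0.5 * h)
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], t + 0.5 * h)
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)], t + h)
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def solve_with_sensitivity(p, u0=1.0, t_end=1.0, n_steps=100):
    """Solve u' = -p*u together with s' = (df/du)*s + df/dp = -p*s - u."""
    def extended_rhs(y, t):
        u, s = y
        return [-p * u, -p * s - u]

    h = t_end / n_steps
    y = [u0, 0.0]  # s(0) = 0 because the initial condition does not depend on p
    for i in range(n_steps):
        y = rk4_step(extended_rhs, y, i * h, h)
    return y

u_end, s_end = solve_with_sensitivity(p=2.0)
exact_u = math.exp(-2.0)
exact_s = -1.0 * math.exp(-2.0)  # d/dp exp(-p*t) = -t*exp(-p*t) at t = 1
print("u(1):", u_end, "exact:", exact_u)
print("du/dp(1):", s_end, "exact:", exact_s)
```

The second component of the extended state is the derivative, with no AD tool involved. The derivative-rule approach described above amounts to having the AD library do this substitution for you.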

However, there are cases where the “analytical” way of writing the derivative is not equivalent to its automatic differentiation counterpart. For example, the adjoint method is a different way to get $$\frac{du}{dp}$$ values in $$\mathcal{O}(n+p)$$ time (instead of the $$\mathcal{O}(np)$$ time of the forward sensitivities above) by solving an ODE forward and some related ODE backwards (for a full derivation and description, see the lecture notes or the recorded video). If you were to do reverse-mode automatic differentiation of the solver, you do not get a mathematically equivalent algorithm. For example, if the solver for the ODE was Euler’s method, reverse-mode AD would be mathematically equivalent to solving the forward ODE with Euler’s method and the reverse ODE with something like implicit Euler (where part of the implicit equation is solved exactly using a cached value from the forward solve).
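A small sketch of why the two are not the same algorithm, under assumptions that are mine and not from the manuscript: the scalar test problem $$u' = -pu^2$$ with $$u(0) = 1$$, a loss equal to $$u(T)$$, a fixed-step Euler solve, and a hand-written reverse pass standing in for a reverse-mode AD tool. The “continuous” version discretizes the adjoint ODE $$\lambda' = -\lambda \frac{df}{du}$$ backwards with explicit Euler, reusing the stored forward values.

```python
def f(u, p):
    return -p * u * u

def df_du(u, p):
    return -2.0 * p * u

def df_dp(u, p):
    return -u * u

def euler_forward(p, u0=1.0, t_end=1.0, n=100):
    """Fixed-step Euler solve; returns all stored states and the step size."""
    h = t_end / n
    us = [u0]
    for _ in range(n):
        us.append(us[-1] + h * f(us[-1], p))
    return us, h

def discrete_adjoint(p):
    """Reverse pass through the Euler loop itself (what reverse-mode AD computes)."""
    us, h = euler_forward(p)
    lam, grad = 1.0, 0.0  # lam holds d u_N / d u_{k+1}
    for k in reversed(range(len(us) - 1)):
        grad += lam * h * df_dp(us[k], p)   # step k used u_k, so evaluate there
        lam *= 1.0 + h * df_du(us[k], p)
    return grad

def continuous_adjoint_euler(p):
    """Adjoint ODE lam' = -lam * df/du stepped backwards with explicit Euler."""
    us, h = euler_forward(p)
    lam, grad = 1.0, 0.0  # lam(T) = 1 because the loss is u(T)
    for k in range(len(us) - 1, 0, -1):
        grad += h * lam * df_dp(us[k], p)   # quadrature of lam * df/dp
        lam += h * lam * df_du(us[k], p)    # evaluated at the current end u_k
    return grad

def euler_end(p):
    return euler_forward(p)[0][-1]

p = 1.0
eps = 1e-6
fd = (euler_end(p + eps) - euler_end(p - eps)) / (2 * eps)
g_disc = discrete_adjoint(p)
g_cont = continuous_adjoint_euler(p)
exact = -1.0 / (1.0 + p) ** 2  # d/dp of u(1) = 1/(1 + p)
print("finite difference of the Euler solve:", fd)
print("discrete adjoint:", g_disc)
print("continuous adjoint (explicit Euler):", g_cont)
print("exact derivative of the true solution:", exact)
```

The discrete adjoint reproduces the derivative of the Euler solve itself, while the discretized continuous adjoint is a slightly different number. Both approach the exact derivative as the step size shrinks. The only difference between the two loops is which stored state each backward step evaluates the Jacobian at, which is the “implicit-like” flavour mentioned above.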

So What is Better, Continuous Derivative Rules or Discrete Derivatives of the Solver?

Like any complex question, it depends. We had a manuscript which looked at this in quite some detail (and a biologically-oriented follow-up), and can boil it down to a few basic notes:

Forward-mode outperforms reverse-mode / adjoint techniques when the equations are “sufficiently small”. For modern implementations the crossover seems to be at around 100 states and parameters combined.
For forward-mode cases, “good” automatic differentiation libraries can make use of structure between the primal and derivative constructions to better CSE/SIMD the generated code for the derivative term, thus forward-mode AD of the solver can be much faster than forward sensitivity analysis even though the two are mathematically the same operation.
For reverse-mode cases, the continuous adjoints seem to be faster with current implementations.

But that last bit then has many caveats to put on it. For one, there seems to be a trade-off between performance and stability here. This is noted in the appendix of the paper “Universal Differential Equations for Scientific Machine Learning”, which states:

Previous research has shown that the discrete adjoint approach is more stable than continuous adjoints in some cases [53, 47, 94, 95, 96, 97] while continuous adjoints have been demonstrated to be more stable in others [98, 95] and can reduce spurious oscillations [99, 100, 101]. This trade-off between discrete and continuous adjoint approaches has been demonstrated on some equations as a trade-off between stability and computational efficiency [102, 103, 104, 105, 106, 107, 108, 109, 110]. Care has to be taken as the stability of an adjoint approach can be dependent on the chosen discretization method [111, 112, 113, 114, 115]

with the references pointing to those in the manuscript.

This is discussed in even more detail in the manuscript Stiff Neural Ordinary Differential Equations which showcases how there are many ways to implement “the adjoint method”, and they can have major differences in stability, essentially trading off memory or performance for improved stability properties.

Special Case: Implicit Equations

The above discussion shows that there are good reasons to differentiate solvers directly, and good reasons to instead write derivative rules for solvers which use forward/adjoint equations. For time series equations, this always has a trade-off. There is an important special case here though: for methods which iterate to convergence, automatic differentiation of the solver is essentially never a good idea. The reason is that the implicit function theorem gives that the derivative of the solution is directly defined at the solution point. For example, for solving $$f(x,p) = 0$$, if $$x^\ast$$ is the value of $$x$$ which satisfies the equation, then $$\frac{d x^\ast}{dp} = …$$. In other words, Newton’s method might take $$n$$ steps, and thus automatic differentiation will need to differentiate $$f$$ at least $$n$$ times. But if you use the implicit function theorem result, then you only need to differentiate it once!

Note of course that a similar performance vs stability trade-off does apply here, since this derivation assumes you have $$x^\ast$$ such that $$f(x^\ast,p) = 0$$ exactly, but you don’t. Newton’s method from the solve will give you something that satisfies the equation only to tolerance, so maybe $$f(x^\ast,p) \approx 10^{-8}$$, which means that the derivative expression is also only approximate, and this then induces an error in the gradient. Thus direct differentiation of Newton’s method can be more accurate, and you need to worry about the tolerance here if the gradients seem sufficiently off.
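As a hedged illustration of the two routes, the sketch below uses the textbook form of the implicit function theorem for a scalar equation, $$\frac{d x^\ast}{dp} = -\left(\frac{\partial f}{\partial x}\right)^{-1} \frac{\partial f}{\partial p}$$ evaluated at $$x^\ast$$. That formula is my assumption about what the expression above is meant to be; it is not copied from the post. The test problem $$f(x,p) = x^2 - p$$ and the function name are likewise made up for the example. The derivative is also propagated by hand through every Newton update, which is what forward-mode AD of the solver would do.

```python
import math

def newton_sqrt_with_derivative(p, x0=1.0, tol=1e-12, max_iter=50):
    """Newton's method on f(x, p) = x^2 - p, differentiating each step w.r.t. p."""
    x, dx = x0, 0.0  # dx tracks d x_k / d p through the iteration
    steps = 0
    while abs(x * x - p) > tol and steps < max_iter:
        # The update is x <- (x + p/x) / 2; differentiate it using the old x first
        dx = 0.5 * (dx + 1.0 / x - p * dx / (x * x))
        x = 0.5 * (x + p / x)
        steps += 1
    return x, dx, steps

p = 2.0
x_star, dx_through_solver, n_steps = newton_sqrt_with_derivative(p)

# Implicit function theorem route: one derivative evaluation at the solution.
df_dx = 2.0 * x_star
df_dp = -1.0
dx_implicit = -df_dp / df_dx

print("Newton steps:", n_steps)
print("derivative through the solver:", dx_through_solver)
print("derivative from the implicit function theorem:", dx_implicit)
print("exact 1/(2*sqrt(p)):", 1.0 / (2.0 * math.sqrt(p)))
```

With a tight tolerance the two agree. Loosening `tol` is an easy way to see the accuracy caveat above, since the implicit-function-theorem value is then evaluated at a point that only approximately solves the equation.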

This does lead to some counter-intuitive results. For example, we had a paper where we exploited this to note that differentiating an ODE solve which goes to infinity (steady state) is faster than a “long ODE”, since steady states have a similar implicit definition. It’s quicker to go to infinity than it is to go to 1000, who would’ve thought? Math is fun.

Does Differentiation of Solver Internals Make Sense or Have a Meaning?

“ODE solvers” have all sorts of things in there, like adaptivity parameters and heuristics. One of the things that happens when you do automatic differentiation of the solver is that you aren’t just differentiating the solver’s states and parameters, but the process will differentiate everything. It turns out that AD of a solver can thus be useful in some tricky ways which put this to use. For example, at ICML we had a paper which regularized the parameters of a neural ODE by the sum of the computed error estimates of the adaptivity heuristics. This would then push the learned equation towards an area of parameter space where the adaptivity gives the largest time steps possible, and thus the learned equation is the “fastest possible learned equation that fits the time series”. Such a trick is only possible if you are doing automatic differentiation of the solver since you’d need to differentiate the solver’s internals in order to have access to those values in the loss function! This just shows one of many ways in which AD’s “extra information” which analytical continuous derivative definitions don’t have could potentially be useful for some applications.

Automatic Differentiation in Continuous Sensitivity Methods

Finally, I want to note that even when you attempt to avoid automatic differentiation of the solver by using continuous sensitivity methods, it turns out that the optimal way to build the extended equations is to use automatic differentiation!

Summary: there are many good reasons to do automatic differentiation of solvers, but there are also many good reasons to use some analytical derivative techniques. But even if you do analytical derivative techniques, you still want to automatically differentiate something in order to do it optimally!

For example, let’s return to the forward sensitivity equations:

$$u' = f(u,p,t)$$
$$s' = \frac{df}{du} s + \frac{\partial f}{\partial p}$$

It turns out that $$\frac{df}{du} s$$ does not require computing the full Jacobian. This operation, known as a Jacobian-vector product (jvp), is the primitive operation of forward-mode automatic differentiation, and thus special seeding of a forward-mode AD tool gives a faster and more robust algorithm than a finite difference form. When done correctly, this operation is computed without ever building the full Jacobian. A trick for this does exist in the finite difference sense as well:

$$\frac{df}{du} s \approx \frac{f(u + \epsilon s) - f(u)}{\epsilon}$$

since it is equivalent to the directional derivative. This is explained in more detail in these lectures (or accompanying video).
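Here is a minimal sketch of both forms of the jvp in plain Python. The tiny `Dual` class, the helper `sin`, the example function `f`, and the names `jvp_forward` and `jvp_finite_difference` are all hypothetical and written just for this illustration; a real forward-mode AD tool does the same thing with far more care. Seeding the dual parts with the vector $$s$$ gives $$\frac{df}{du} s$$ in a single evaluation of `f`, with no Jacobian ever formed.

```python
import math

class Dual:
    """Number of the form val + eps*e with e*e = 0 (forward-mode AD carrier)."""
    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def sin(x):
    """sin that works on floats and on Dual numbers."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.eps)
    return math.sin(x)

def f(u):
    """Example right-hand side with Jacobian [[u1, u0], [cos(u0), 2*u1]]."""
    return [u[0] * u[1], sin(u[0]) + u[1] * u[1]]

def jvp_forward(func, u, s):
    """Seed the dual parts with s; the dual parts of the output are J*s."""
    out = func([Dual(ui, si) for ui, si in zip(u, s)])
    return [y.eps for y in out]

def jvp_finite_difference(func, u, s, eps=1e-7):
    """Directional derivative (f(u + eps*s) - f(u)) / eps."""
    bumped = func([ui + eps * si for ui, si in zip(u, s)])
    base = func(u)
    return [(a - b) / eps for a, b in zip(bumped, base)]

u = [1.0, 2.0]
s = [0.5, -1.0]
jvp_ad = jvp_forward(f, u, s)
jvp_fd = jvp_finite_difference(f, u, s)
print("forward-mode jvp:", jvp_ad)
print("finite-difference jvp:", jvp_fd)
```

Both cost about one extra evaluation of `f`, but the dual-number version has no step size to tune and no cancellation error.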

In the same vein, continuous adjoints of ODE solves boil down to defining a differential equation which is solved backwards, and that equation has a term $$\frac{df}{du}^T s$$, i.e. the Jacobian transposed times a vector, also known as the vector-Jacobian product (vjp) because it’s equivalent to $$s^T \frac{df}{du}$$ when transposed. It turns out that this is the primitive operation of reverse-mode AD, which then allows for computing this operation without fully building the Jacobian. There is no analogue for this operation with finite differencing, which means that there’s a pretty massive performance gain from doing this properly. Our paper A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions measures this effect on a stiff partial differential equation, getting:

The takeaway from this plot is that using these AD tricks results in a few orders of magnitude of performance improvement (by avoiding the Jacobian construction, which are the “seeding” versions on the left; the right shows the difference that different AD techniques make, which itself is another few orders of magnitude). When people note that the Julia differential equation adjoint solvers are much faster than the adjoints from Sundials CVODES and IDAS on large equations, this part right here is one of the major factors, because Sundials does not embed a reverse AD engine into its adjoint code to do the vjp definitions, and instead falls back to a numerical formulation unless the user provides a vjp override, which seems to be uncommon but, from these plots, clearly should be done more often.
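To show why the vjp has no cheap finite-difference counterpart, here is a small sketch with a hand-written reverse pass. The example function and both helper names are assumptions for illustration only, and the hand-coded pullback stands in for what a reverse-mode AD engine would generate. The reverse pass produces $$\frac{df}{du}^T s$$ in one sweep, while the finite-difference fallback has to perturb each input separately, effectively building the Jacobian column by column.

```python
import math

def f(u):
    """Example function y0 = u0*u1, y1 = sin(u0) + u1^2."""
    return [u[0] * u[1], math.sin(u[0]) + u[1] * u[1]]

def vjp_reverse(u, s):
    """Hand-written reverse pass: returns J(u)^T s without forming J."""
    u0, u1 = u
    # Pull the output weights s back through each output's dependence on u
    ubar0 = s[0] * u1 + s[1] * math.cos(u0)
    ubar1 = s[0] * u0 + s[1] * 2.0 * u1
    return [ubar0, ubar1]

def vjp_finite_difference(u, s, eps=1e-7):
    """Finite differences need one perturbed evaluation of f per input."""
    base = f(u)
    result = []
    for i in range(len(u)):
        bumped = list(u)
        bumped[i] += eps
        column = [(a - b) / eps for a, b in zip(f(bumped), base)]  # column i of J
        result.append(sum(si * ci for si, ci in zip(s, column)))
    return result

u = [1.0, 2.0]
s = [0.5, -1.0]
vjp_rev = vjp_reverse(u, s)
vjp_fd = vjp_finite_difference(u, s)
print("reverse-pass vjp:", vjp_rev)
print("finite-difference vjp:", vjp_fd)
```

For two inputs the difference is trivial, but the finite-difference loop scales with the number of states, which is where the gap on large equations comes from.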

Summary

In total, what can we take away so far about differentiating solvers?

There are some advantages to differentiating solvers, but there are also some advantages to mixing in analytical continuous adjoints. It’s context-dependent which is better.
Even when mixing in analytical continuous derivative rules, these are best defined with automatic differentiation within their constructed equations, so one cannot avoid AD completely if one wishes to achieve full performance on arbitrary models.
For cases which converge to some kind of implicitly defined solution, using special adjoint tricks will be much better than direct differentiation of the solver.

There’s still a lot more to mention, especially as stochastic simulation gets involved, but I’ll cut this here for now. As you can see, there are still some open questions that are being investigated in the field, so if you find this interesting please feel free to get in touch.

The post Direct Automatic Differentiation of (Differential Equation) Solvers vs Analytical Adjoints: Which is Better? appeared first on Stochastic Lifestyle.

Polyglot Sorting

By: Jonathan Carroll

Re-posted from: https://jcarroll.com.au/2022/10/08/polyglot-sorting/

I’ve had the impression lately that everyone is learning Rust and there’s plenty of great material out there to make that easier. {gifski} is perhaps the most well-known example of an R package wrapping a Rust Cargo crate. I don’t really know any system language particularly well, so I figured I’d wade into it and see what it’s like.

The big advantages I’ve heard are that it’s more modern than C++, is “safe” (in the sense that you can’t compile something that tries to read out of bounds memory), and it’s super fast (it’s a compiled, strictly-typed language, so one would hope so).

I had a browse through some beginner material, and watched some videos on YouTube. Just enough to have some understanding of the syntax and keywords so I could actually search for things once I inevitably hit problems.

Getting everything up and running went surprisingly smoothly. Installing the toolchain went okay on my Linux (Pop!_OS) machine, and the getting started guide was straightforward enough to follow along with. I soon enough had Ferris welcoming me to the world of Rust

----------------------------
< Hello fellow Rustaceans! >
----------------------------
              \
               \
                 _~^~^~_
             \) /  o o  \ (/
               '_   -   _'
               / '-----' \

Visual Studio Code works nicely as a multi-language editor, and while it’s great to have errors visible to you immediately, I can imagine that gets annoying pretty quick (especially if you write as much bad Rust code as I do).

Next I needed to actually code something up myself. I love small, silly problems for learning – you don’t know exactly what problems you’ll solve along the way. This one ended up being really helpful.

I had this tweet

This week I’ve been posting #Python quizzes about sorting.

Let’s see if you can put everything together and solve a challenge! #CuriousAboutCode pic.twitter.com/ht51eA3Ttj

— David Amos (@somacdivad) September 16, 2022

in my bookmarks because I wanted to try to solve this with R (naturally) but I decided it was a reasonable candidate for trying to solve a problem and learn some language at the same time, so I decided to give it a go with Rust. This is slightly more complicated than an academic “sort some strings” because it’s “natural sorting” (2 before 10) and has a complicating character in the middle.
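To see why this isn’t just “sort some strings”, here’s a quick Python sketch (my own illustration, using the same five strings as the rest of the post) of what a plain character-by-character sort does, and why comparing a (letters, number) pair fixes it.

```python
strings = ["aa-2", "ab-100", "aa-10", "ba-25", "ab-3"]

# A plain sort compares character by character, so "aa-10" lands before "aa-2"
# because the character "1" sorts before "2".
lexicographic = sorted(strings)
print(lexicographic)

# Comparing the pieces as (text, integer) pairs gives the natural order.
print(("aa", 2) < ("aa", 10))   # numeric comparison on the second element
print("aa-2" < "aa-10")         # string comparison gets it the other way round
```

So the job is really to split each string at the "-" and compare the second piece as a number.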

The first step was to get Rust to read in and just print back the ‘data’ (strings). I managed to copy some “print a vector of strings” code and got that working. I’ll figure out later what’s going on with the format string here

println!("{:?}", x);

After that, I battled errors in converting between String, &str, and i32 types; returning a Result (error) rather than a value; dealing with obscure errors (“cannot move out of borrowed content”, “expected named lifetime parameter” – ???); and a lack of method support for a struct I just created (which didn’t have any inherited ‘type’). All in all, nothing too surprising given I know approximately 0 Rust, but I got there in the end!

Now, this won’t be anything “good”, but it does compile and appears to give the right answer, so I’m led to believe that means it’s “right”.

// enable printing of the struct
#[derive(Debug)]
// create a struct with a String and an integer
// not using &str due to lifetime issues
struct Pair {
    x: String,
    y: i32
}

fn main() {
    // input data vector
    let v = vec!["aa-2", "ab-100", "aa-10", "ba-25", "ab-3"];
    // create an accumulating vector of `Pair`s
    let mut res: Vec<Pair> = vec![];
    // for each string, split at '-', 
    //  convert the first part to String and the second to integer.
    //  then push onto the accumulator
    for s in v {
        let a: Vec<&str> = s.split("-").collect();
        let tmp_pair = Pair {x: a[0].to_string(), y: a[1].parse::<i32>().unwrap() };
        res.push(tmp_pair);
    }
    // sort by Pair.x then Pair.y
    res.sort_by_key(|k| (k.x.clone(), k.y.clone()));
    // start building a new vector for the final result
    let mut res2: Vec<String> = vec![];
    // paste together Pair.x, '-', and Pair.y (as String)
    for s2 in res {
        res2.push(s2.x + "-" + &s2.y.to_string());
    }

    // ["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
    println!("{:?}", res2);
}

Running

cargo run --release

produces the expected output

["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]

Feel free to suggest anything that could be improved, I’m sure there’s plenty.

That might have been an okay place to stop, but I did still want to see if I could solve the problem with R, and how that might compare (in approach, readability, and speed), so I coded that up as

# input vector
s <- c("aa-2", "ab-100", "aa-10", "ba-25", "ab-3")
# split into pairs of strings
x <- strsplit(s, "-")
# take elements of s sorted by the first elements of x then
#  the second (as integers)
s[order(sapply(x, `[[`, 1), as.integer(sapply(x, `[[`, 2)))]

## [1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25"

I don’t love that I had to use sapply() twice, but the only other alternative I could think of was to strip out the first and second element lists and use those in a do.call()

s[do.call(order, list(unlist(x)[c(T, F)], as.integer(unlist(x)[c(F,T)])))]

## [1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25"

which… isn’t better.

I also had an idea to shoehorn dplyr::arrange() into this, but that requires a data.frame. One idea I had was to read in the data, using "-" as a delimiter, explicitly stating that I wanted to read it as character and integer data. That seemed to work, which means I can try what I hoped

suppressMessages(library(dplyr, quietly = TRUE))
# input vector
s <- c("aa-2", "ab-100", "aa-10", "ba-25", "ab-3")

# read strings as fields delimited by '-', 
#  expecting character and integer
s %>% read.delim(
    text = .,
    sep = "-",
    header = FALSE,
    colClasses = c("character", "integer")
) %>%
    # sort by first then second column
    arrange(V1, V2) %>%
    # collapse to single string per row
    mutate(res = paste(V1, V2, sep = "-")) %>%
    pull()

## [1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25"

Why stop there? I know other languages! Okay, the Python and Julia examples I found in other Tweets.

In Julia, two options were offered. This one

strings = String["aa-2", "ab-100", "aa-10", "ba-25", "ab-3"];
print(join.(sort(split.(strings, "-"), by = x -> (x[1], parse(Int, x[2]))), "-"))

## ["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]

(I added a type to the input and an explicit print), and this one

strings = String["aa-2", "ab-100", "aa-10", "ba-25", "ab-3"];
print(sort(strings, by = x->split(x, "-") |> v->(v[1], parse(Int, v[2]))))

## ["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]

The Python example offered by the original author of the challenge was

def parts(s):
    letters, nums = s.split("-")
    return letters, int(nums)

strings = ["aa-2", "ab-100", "aa-10", "ba-25", "ab-3"]

print(sorted(strings, key=parts))

## ['aa-2', 'aa-10', 'ab-3', 'ab-100', 'ba-25']

I actually really like this one – it’s the approach I wanted to use for R; provide sort with a function returning the keys to use. Alas.

Lastly, I remembered that there’s a sort command available from bash (GNU sort) that can do natural sorting with the -V flag. I’m reminded of this anecdote (“More shell, less egg”) about using a very simple bash script when it’s possible. That came together okay

#!/bin/bash 

v=("aa-2" "ab-100" "aa-10" "ba-25" "ab-3")
readarray -t a_out < <(printf '%s\n' "${v[@]}" | sort -V)
printf '%s ' "${a_out[@]}"
echo 

exit 0

## aa-2 aa-10 ab-3 ab-100 ba-25

By the way, aside from the Rust example, all of these were run directly in the Rmd source of this post with knitr’s powerful engines… multi-language support FTW!

So, how do all these compare? I haven’t tuned any of these for performance; they’re how I would have written them as a developer trying to achieve something. Sure, if performance was an issue, I’d do some optimization, but I was curious just how the performance compares ‘out of the box’.

Mainly for my own posterity, I’ll add how I tracked this. I wrote the relevant code for each language in a file with suffix/filetype appropriate to each language. They’re all here, in case anyone is interested. Then I wanted to run each of them a few times, keeping track of the timing in a file. The solution I went with was to echo into a file (appending each time) both the input and output, with e.g.

echo "Rust (optimized/release)" >> timing
{time cargo run --release} >> timing 2>&1
{time cargo run --release} >> timing 2>&1
{time cargo run --release} >> timing 2>&1

(yes, trivial to loop 3 times, but whatever).
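For what it’s worth, the loop could be sketched in Python along these lines. This is not what I ran: `time_command` is a hypothetical helper, it only records wall-clock time (not the user/system split that the shell’s `time` reports), and the command in the example is a stand-in for the real scripts.

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3, log_path=None):
    """Run cmd several times and return the wall-clock time of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(cmd, capture_output=True, text=True)
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        if log_path is not None:
            # Append the command's output and its timing, like `>> timing` does
            with open(log_path, "a") as log:
                log.write(result.stdout)
                log.write(f"{' '.join(cmd)}  {elapsed:.3f} total\n")
    return timings

# Stand-in command: start a Python interpreter and print one line.
timings = time_command([sys.executable, "-c", "print('hello')"])
print(timings)
```

Passing `log_path="timing"` would append to the same kind of file as the shell version.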

Doing this for all the languages (with both versions for R and Julia) I get

Rust (optimized/release)
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/sort`
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
cargo run --release  0.04s user 0.02s system 99% cpu 0.066 total
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/sort`
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
cargo run --release  0.07s user 0.01s system 99% cpu 0.087 total
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/sort`
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
cargo run --release  0.06s user 0.02s system 98% cpu 0.084 total

R1
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort1.R  0.15s user 0.05s system 102% cpu 0.197 total
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort1.R  0.17s user 0.05s system 102% cpu 0.206 total
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort1.R  0.16s user 0.05s system 103% cpu 0.202 total

R2
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort2.R  0.72s user 0.05s system 100% cpu 0.774 total
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort2.R  0.67s user 0.06s system 100% cpu 0.720 total
[1] "aa-2"   "aa-10"  "ab-3"   "ab-100" "ba-25" 
Rscript sort2.R  0.69s user 0.04s system 99% cpu 0.737 total

Python
['aa-2', 'aa-10', 'ab-3', 'ab-100', 'ba-25']
python3 sort.py  0.03s user 0.00s system 98% cpu 0.032 total
['aa-2', 'aa-10', 'ab-3', 'ab-100', 'ba-25']
python3 sort.py  0.02s user 0.01s system 98% cpu 0.034 total
['aa-2', 'aa-10', 'ab-3', 'ab-100', 'ba-25']
python3 sort.py  0.03s user 0.02s system 98% cpu 0.059 total

Julia1
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort1.jl  1.10s user 0.68s system 236% cpu 0.750 total
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort1.jl  1.14s user 0.64s system 233% cpu 0.765 total
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort1.jl  1.13s user 0.62s system 241% cpu 0.725 total

Julia2
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort2.jl  0.97s user 0.64s system 270% cpu 0.596 total
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort2.jl  1.00s user 0.58s system 259% cpu 0.607 total
["aa-2", "aa-10", "ab-3", "ab-100", "ba-25"]
julia sort2.jl  0.96s user 0.63s system 276% cpu 0.578 total

Bash
aa-2 aa-10 ab-3 ab-100 ba-25 
./sort.sh  0.01s user 0.00s system 109% cpu 0.013 total
aa-2 aa-10 ab-3 ab-100 ba-25 
./sort.sh  0.00s user 0.01s system 108% cpu 0.015 total
aa-2 aa-10 ab-3 ab-100 ba-25 
./sort.sh  0.01s user 0.00s system 99% cpu 0.009 total
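The number I care about on each timing line is the “total” at the end. A small Python sketch for pulling those out of the log (the regular expression is my guess at a pattern that fits the lines above, and the three sample lines are copied from the Julia1 block):

```python
import re

raw = """\
julia sort1.jl  1.10s user 0.68s system 236% cpu 0.750 total
julia sort1.jl  1.14s user 0.64s system 233% cpu 0.765 total
julia sort1.jl  1.13s user 0.62s system 241% cpu 0.725 total
"""

# Matches lines of the form "<command>  <x>s user <y>s system <z>% cpu <t> total"
pattern = re.compile(
    r"^(?P<cmd>.+?)\s+[\d.]+s user [\d.]+s system \d+% cpu (?P<total>[\d.]+) total$"
)

totals = [
    float(m.group("total"))
    for line in raw.splitlines()
    if (m := pattern.match(line))
]
mean_total = sum(totals) / len(totals)
print(totals)
print(round(mean_total, 3))
```

Lines that are just program output don’t match the pattern and are skipped.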

This wouldn’t be much of a coding/benchmark post without a plot, so I also did a visual comparison

library(ggplot2)
d <- tibble::tribble(
  ~language, ~version, ~run, ~time,
  "Rust", "", 1, 0.066,
  "Rust", "", 2, 0.087,
  "Rust", "", 3, 0.084,
  "R", "1", 1, 0.197,
  "R", "1", 2, 0.206,
  "R", "1", 3, 0.202,
  "R", "2", 1, 0.774,
  "R", "2", 2, 0.720,
  "R", "2", 3, 0.737,
  "Julia", "1", 1, 0.750,
  "Julia", "1", 2, 0.765,
  "Julia", "1", 3, 0.725,
  "Julia", "2", 1, 0.596,
  "Julia", "2", 2, 0.607,
  "Julia", "2", 3, 0.578,
  "Python", "", 1, 0.032,
  "Python", "", 2, 0.034,
  "Python", "", 3, 0.059,
  "Bash", "", 1, 0.013,
  "Bash", "", 2, 0.015,
  "Bash", "", 3, 0.009
)

d$language <- factor(
  d$language, 
  levels = c("Rust", "R", "Julia", "Python", "Bash")
)

ggplot(d, aes(language, time, fill = language, group = run)) + 
  geom_col(position = position_dodge(0.9)) + 
  facet_grid(
    ~language + version, 
    scales = "free_x", 
    labeller = label_wrap_gen(multi_line = FALSE), 
    switch = "x"
  ) + 
  theme_minimal() +
  theme(axis.text.x = element_blank()) + 
  labs(
    title = "Performance of sort functions by language", 
    y = "Time [s]", 
    x = "Language, Version"
  ) + 
  scale_fill_brewer(palette = "Set1")

It’s true – Rust does pretty well, even with my terrible coding. My R implementation (the sensible one) isn’t too bad – perhaps over many strings it would be a bit slow. Surprisingly, the Julia implementations are actually quite slow. I don’t have a good explanation for that. I’m using Julia 1.5.0 which is slightly out of date, so perhaps that needs an update. The Python implementation does particularly well – I really should learn more Python. The syntax there isn’t the worst, either. Oh, no – do I like that?

The big winner, though, is the simplest of all – Bash crushes the rest of the languages with a 2 liner, and calling it doesn’t involve compiling anything.

As I said, I’m not particularly interested in optimizing any of these – this is how they compare as written.

In summary, I learned some Rust – enough to actually manipulate some data. I’ll keep trying and hopefully some day I’ll be semi literate in it.

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 21.04               
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language en_AU:en                    
##  collate  en_AU.UTF-8                 
##  ctype    en_AU.UTF-8                 
##  tz       Australia/Adelaide          
##  date     2022-10-08                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version date       lib source        
##  assertthat     0.2.1   2019-03-21 [3] CRAN (R 4.0.1)
##  blogdown       1.8     2022-02-16 [1] CRAN (R 4.1.2)
##  bookdown       0.24    2021-09-02 [1] CRAN (R 4.1.2)
##  brio           1.1.1   2021-01-20 [3] CRAN (R 4.0.3)
##  bslib          0.3.1   2021-10-06 [1] CRAN (R 4.1.2)
##  cachem         1.0.3   2021-02-04 [3] CRAN (R 4.0.3)
##  callr          3.7.0   2021-04-20 [1] CRAN (R 4.1.2)
##  cli            3.2.0   2022-02-14 [1] CRAN (R 4.1.2)
##  colorspace     2.0-0   2020-11-11 [3] CRAN (R 4.0.3)
##  crayon         1.5.0   2022-02-14 [1] CRAN (R 4.1.2)
##  DBI            1.1.1   2021-01-15 [3] CRAN (R 4.0.3)
##  desc           1.4.1   2022-03-06 [1] CRAN (R 4.1.2)
##  devtools       2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
##  digest         0.6.27  2020-10-24 [3] CRAN (R 4.0.3)
##  dplyr        * 1.0.8   2022-02-08 [1] CRAN (R 4.1.2)
##  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.1.2)
##  evaluate       0.14    2019-05-28 [3] CRAN (R 4.0.1)
##  fansi          0.4.2   2021-01-15 [3] CRAN (R 4.0.3)
##  farver         2.0.3   2020-01-16 [3] CRAN (R 4.0.1)
##  fastmap        1.1.0   2021-01-25 [3] CRAN (R 4.0.3)
##  fs             1.5.0   2020-07-31 [3] CRAN (R 4.0.2)
##  generics       0.1.0   2020-10-31 [3] CRAN (R 4.0.3)
##  ggplot2      * 3.3.5   2021-06-25 [1] CRAN (R 4.1.2)
##  glue           1.6.1   2022-01-22 [1] CRAN (R 4.1.2)
##  gtable         0.3.0   2019-03-25 [3] CRAN (R 4.0.1)
##  here           1.0.1   2020-12-13 [1] CRAN (R 4.1.2)
##  highr          0.8     2019-03-20 [3] CRAN (R 4.0.1)
##  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
##  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.1.2)
##  jsonlite       1.7.2   2020-12-09 [3] CRAN (R 4.0.3)
##  JuliaCall      0.17.4  2021-05-16 [1] CRAN (R 4.1.2)
##  knitr          1.37    2021-12-16 [1] CRAN (R 4.1.2)
##  labeling       0.4.2   2020-10-20 [3] CRAN (R 4.0.2)
##  lattice        0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
##  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
##  magrittr       2.0.1   2020-11-17 [3] CRAN (R 4.0.3)
##  Matrix         1.3-2   2021-01-06 [4] CRAN (R 4.0.4)
##  memoise        2.0.0   2021-01-26 [3] CRAN (R 4.0.3)
##  munsell        0.5.0   2018-06-12 [3] CRAN (R 4.0.1)
##  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.1.2)
##  pkgbuild       1.2.0   2020-12-15 [3] CRAN (R 4.0.3)
##  pkgconfig      2.0.3   2019-09-22 [3] CRAN (R 4.0.1)
##  pkgload        1.2.4   2021-11-30 [1] CRAN (R 4.1.2)
##  png            0.1-7   2013-12-03 [3] CRAN (R 4.0.2)
##  prettyunits    1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx       3.5.2   2021-04-30 [1] CRAN (R 4.1.2)
##  ps             1.5.0   2020-12-05 [3] CRAN (R 4.0.3)
##  purrr          0.3.4   2020-04-17 [3] CRAN (R 4.0.1)
##  R6             2.5.0   2020-10-28 [3] CRAN (R 4.0.2)
##  RColorBrewer   1.1-2   2014-12-07 [3] CRAN (R 4.0.1)
##  Rcpp           1.0.9   2022-07-08 [1] CRAN (R 4.1.2)
##  remotes        2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  reticulate     1.24    2022-01-26 [1] CRAN (R 4.1.2)
##  rlang          1.0.1   2022-02-03 [1] CRAN (R 4.1.2)
##  rmarkdown      2.13    2022-03-10 [1] CRAN (R 4.1.2)
##  rprojroot      2.0.2   2020-11-15 [3] CRAN (R 4.0.3)
##  rstudioapi     0.13    2020-11-12 [3] CRAN (R 4.0.3)
##  sass           0.4.0   2021-05-12 [1] CRAN (R 4.1.2)
##  scales         1.1.1   2020-05-11 [3] CRAN (R 4.0.1)
##  sessioninfo    1.1.1   2018-11-05 [3] CRAN (R 4.0.1)
##  stringi        1.5.3   2020-09-09 [3] CRAN (R 4.0.2)
##  stringr        1.4.0   2019-02-10 [3] CRAN (R 4.0.1)
##  testthat       3.1.2   2022-01-20 [1] CRAN (R 4.1.2)
##  tibble         3.1.6   2021-11-07 [1] CRAN (R 4.1.2)
##  tidyselect     1.1.2   2022-02-21 [1] CRAN (R 4.1.2)
##  usethis        2.1.5   2021-12-09 [1] CRAN (R 4.1.2)
##  utf8           1.1.4   2018-05-24 [3] CRAN (R 4.0.2)
##  vctrs          0.3.8   2021-04-29 [1] CRAN (R 4.1.2)
##  withr          2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
##  xfun           0.30    2022-03-02 [1] CRAN (R 4.1.2)
##  yaml           2.2.1   2020-02-01 [3] CRAN (R 4.0.1)
## 
## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library

SciML Ecosystem Update: Better Error Messages, Compile Times, and Documentation

By: Staging Team

Re-posted from: https://sciml.ai/news/2022/10/08/error_messages/index.html


juliabloggers.com

A Julia Language Blog Aggregator
