Tag Archives: Uncategorized

Optim.jl v0.9.0

By: pkofod

Re-posted from: http://www.pkofod.com/2017/06/02/optim-jl-v0-9-0/

I am very happy to say that we can finally announce that Optim.jl v0.9.0 is out. This version has quite a few user facing changes. Please read about the changes below if you use Optim.jl in a package, a script, or anything else, as you will quite likely have to make some changes to your code.

As always, I have to thank my two partners in crime: Asbjørn Nilsen Riseth (@anriseth) and Christoph Ortner (@cortner) for their help in making the changes, transitions, and tests that are included in v0.9.0.

The last update (form v0.6.0 to v0.7.0 had) some changes that were a long time coming, and so does v0.9.0. Hopefully, these fixes to old design problems will greatly improve the user experience and performance of Optim.jl, and pave the way for more exiting features in the future.

We’ve tried to make the transition as smooth as possible, although we do have breaking changes in this update. Please consult the documentation if you face problems, join us on gitter or ask the community at discourse!

Okay, now to the changes.

Why not v0.8.0?
First of all, why v0.9.0? Last version was v0.7.8! The is because we are dropping support for Julia v0.4 and v0.5 simultaneously, so we are reserving v0.8.0 for backporting serious fixes to Julia v0.5. However, v0.6 should be just around the corner. With Julia v0.7 and v1.0.0 not too far out in the horizon either, I’ve decided it’s more important to move forward than to keep v0.4 and v0.5 up to speed. The dev time is constrained, so currently it’s one or the other. Of course, for users of Julia v0.5. they can simply continue to use Optim.jl v0.7.8. Post Julia’s proper release, backwards compatibility and continuity will be more important, even if it comes at the expense of development speed.

Another note about the version number: The next version of Optim.jl will be v1.0.0, and we will follow SEMVER 2.0 fully.

Change order of evaluation point and storage arguments
This one is very breaking, although we have set up op a system such that all gradients and Hessians will be checked before proceeding. This check will be removed shortly in a v1.0.0 version bump, so please correct your code now. Basically, we closed a very old issue (#156) concerning the input argument order in gradients and Hessians. In Julia, an in-place function typically has an exclamation mark at the end of its name, and the cache as the first argument. In Optim.jl it has been the other way around for the argument order. We’ve changed that, and this means that you now have to provide “g” or “H” as the first argument, and “x” as the second. The old version

function g!(x, g)
    ... do something ...

is now

function g!(g, x)
    ... do something ...

Since v0.7.0, we’ve moved some of the basic infrastructure of Optim.jl to NLSolversBase.jl. This is currently the Non-, Once-, and TwiceDifferentiable types and constructors. This is done to, as a first step, share code between Optim.jl and LineSearches.jl, and but also NLsolve.jl in the future. At the same time, we’ve made the code a little smarter, such that superfluous calls to the objective function, gradient, and Hessian are now avoided. As an example, compare the objective and gradient calls in the example in our readme. Here, we optimize the Rosenbrock “banana” function using BFGS. Since last version of Optim we had to change the output, as it has gone from 157 calls to 53. Much of this comes from this refactoring, but some of it also comes form a better choices for initial line search steps for BFGS and Newton introduced in #328.

As mentioned, we’ve made the *Differentiable-types a bit smarter, including moving the gradient and Hessian caches into the respective types. This also means, that a OnceDifferentiable type instance needs to know what the return type of the gradient is. This is done by providing an x seed in the constructor

rosenbrock = Optim.UnconstrainedProblems.examples["Rosenbrock"]
f = rosenbrock.f
g! = rosenbrock.g!
x_seed = rosenbrock.initial_x
od = OnceDifferentiable(f, g!, x_seed)

If the seed also happens to be the initial x, then you do not have to provide an x when calling optimize

julia> optimize(od, BFGS(), Optim.Options(g_tol=0.1))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [1.0005999613152214,1.001138415164852]
 * Minimizer: [1.0005999613152214,1.001138415164852]
 * Minimum: 7.427113e-07
 * Iterations: 13
 * Convergence: true
   * |x - x'| < 1.0e-32: false 
     |x - x'| = 1.08e-02 
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN 
   * |g(x)| < 1.0e-01: true 
     |g(x)| = 2.60e-02 
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 45
 * Gradient Calls: 45

If you’ve used Optim.jl before, you’ll notice that the output carries a bit more information about the convergence criteria.

LineSearches.jl turned Julian
Line searches used to be chosen using symbols in the method constructor for line search based methods such as GradientDescent, BFGS, and Newton by use of the linesearch keyword. The new version of LineSearches.jl uses types and dispatch exactly like Optim.jl does for solvers. This means that you now have to pass a type instance instead of a keyword, and this also means that we can open up for easy tweaking of line search parameters through fields in the line search types.

Let us illustrate by the following example how the new syntax works. First, we construct a BFGS instance without specifying the linesearch. This defaults to HagerZhang.

julia> rosenbrock(x) =  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)
julia> result = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926033423,0.9999999852005353]
 * Minimum: 5.471433e-17
 * Iterations: 16
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 3.47e-07
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 2.33e-09
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 53
 * Gradient Calls: 53

or we could choose a backtracking line search instead

julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking()))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926655744,0.9999999853309254]
 * Minimum: 5.379380e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 1.13e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 8.79e-11
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 31
 * Gradient Calls: 24

this defaults to cubic backtracking, but quadratic can be chosen using the order keyword

julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking(order = 2)))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926644578,0.9999999853284671]
 * Minimum: 5.381020e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 4.73e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 1.76e-10
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 29
 * Gradient Calls: 24

LineSearches.jl should have better documentation coming soon, but the code is quite self-explanatory for those who want to twiddle around with these parameters.

The method state is now an argument to optimize
While not always that useful to know for users, we use method states internally to hold all the pre-allocated cache variables that are needed. In the new version of Optim.jl, this can be explicitly provided by the user such that you can retrieve various diagnostics after the optimization routine is done. One such example is the inverse Hessian estimate that BFGS spits out.

method = BFGS()
options = Optim.Options()
initial_x = rand(2)
d = OnceDifferentiable(f, g!, initial_x)
my_state = Optim.initial_state(method, options, d, initial_x)
optimize{d, method, options, my_state)

The future
We have more changes coming in the near future. There’s PR #356 for a Trust Region solver for cases where you can explicitly calculate Hessian-vector products without forming the Hessian (from @jeff-regier from the Celeste.jl project), the interior point replacement for our current barrier function approach to box constrained optimization in PR #303, and more.

Solving a simple discrete choice model using Gaussian quadrature

By: pkofod

Re-posted from: http://www.pkofod.com/2017/05/30/solving-a-simple-discrete-choice-model-using-gaussian-quadrature/

In the style of some of the earlier posts, I present a simple economic problem, that uses some sort of numerical method as part of the solution method. Of course, we use Julia to do so. However, this time we’re actually relying a bit on R, but don’t tell anyone.

Rust models

In the empirical discrete choice literature in economics, a relatively simple and popular framework is the one that matured in Rust (1987, 1988), and was later named Rust models in Aguirregabiria and Mira (2010). Basically, we consider an agent who has to choose a (in the infinite horizon) stationary policy (sequence of actions), to solve the following problem

\(\max_{a}E\left\{\sum_{t=0}^{T} \beta^t U(a_t, s_t)|s_0\right\}\)

where \(a=(a_0, a_1, \ldots, a_T)\), and \(s_t\) denotes the states. For simplicity, we consider binary decision problems such that \(a_t\in\{1,2\}\). Assume that there an additive shock, \(\varepsilon_t\), to utility such that

\(U(a_t,s_t)=U(a_t, x_t, \varepsilon_t) = u(a_t, x_t)+\varepsilon_t\)

where \(s_t=(x_t,\varepsilon_t)\) and \(x_t\) is usually called the observed states.

The additive and time separable nature of the problem allows us to consider a set of simpler problems instead. We reformulate the problem according to the principle of optimality, and write the problem in its dynamic programming formulation

\( V_t(x_t, \varepsilon_t) = max_{a_t}\left[u(a_t, x_t)+\epsilon_t + \beta E_{s_{t+1}|s_t}(V_{t+1}(x_{t+1}, \varepsilon_{t+1}))\right], \forall t\in\{0,1,\ldots,T\} \)

The object \(V_t\) is called the value function, as it summarizes the optimal value we can obtain. If we assume conditional independence between the observed states and the shocks along with the assumptions explained in the articles above, we can instead consider this simpler problem

\( W_t(x_t) = E_{\varepsilon_{t+1}|\varepsilon_t}\left\{max_{a_t}(u(a_t, x_t)+\varepsilon_t + \beta E_{x_{t+1}|x_t}((W_{t+1}(x_{t+1}))\right\}, \forall t\in\{0,1,\ldots,T\} \)

where \(W(x_t)\equiv E_{\varepsilon_{t+1}|\varepsilon}\left\{V_{t+1}(x_{t+1},\varepsilon_{t+1})\right\}\). This object is often called the ex-ante or integrated value function. Now, if we assume that the shocks are mean 0 extreme value type I, we get the following

\( W_t(x_t) = \log\left\{\sum_{a\in\mathcal{A}} \exp\left[u(a_t,x_t)+\beta E_{x_{t+1}|x_t}\left(W_{t+1}(x_{t+1})\right)\right]\right\}, \forall t\in\{0,1,\ldots,T\} \)

At this point we’re very close to something we can calculate. Either we just recursively apply the above to find the finite horizon solution, or if we’re in the infinite horizon case, then then we can come up with a guess for \(W\), and apply value function iterations (successive application of the right-hand side in the equation above), to find a solution. We just need to be able to handle the evaluation of the expected value inside the \(\exp\)‘s.

Solving for a continuous function

In the application below, we’re going to have a continuous state. Rust originally solved the problem of handling a continuous function on a computer by discretizing the continuous state in 90 or 175 bins, but we’re going to approach it a bit differently. We’re going to create a type that allows us to construct piecewise linear functions. This means that we’re going to have some nodes where we calculate the function value, and in between these, we simply use linear interpolation. Outside of the first and last nodes we simply set the value to the value at these nodes. We’re not going to extrapolate below, so this won’t be a problem.

Let us have a look at a type that can hold this information.

type PiecewiseLinear

To construct an instance of this type from a set of nodes and a function, we’re going to use the following constructor

function PiecewiseLinear(nodes, f)
    slopes = Float64[]
    fn = f.(nodes)
    for i = 1:length(nodes)-1
        # node i and node i+1
        ni, nip1 = nodes[i], nodes[i+1]
        # f evaluated at the nodes
        fi, fip1 = fn[i], fn[i+1]
        # store slopes in each interval, so we don't have to recalculate them every time
        push!(slopes, (fip1-fi)/(nip1-ni))
    # Construct the type
    PiecewiseLinear(nodes, fn, slopes)

Using an instance of \(PiecewiseLinear\) we can now evaluate the function at all input values between the first and last nodes. However, we’re going to have some fun with types in Julia. In Julia, we call a function using parentheses \(f(x)\), but we generally cannot call a type instance.

julia> pwl = PiecewiseLinear(1:10, sqrt)
julia> pwl(3.5)
ERROR: MethodError: objects of type PiecewiseLinear are not callable

… but wouldn’t it be great if we could simply evaluate the interpolated function value at 3.5 that easily? We can, and it’s cool, fun, and extremely handy. The name of the concept is: call overloading. We simply need to define the behavior of “calling” (using parentheses with some input) an instance of a type.

function (p::PiecewiseLinear)(x)
    index_low = searchsortedlast(p.nodes, x)
    n = length(p.nodes)
    if 0 < index_low < n
        return p.values[index_low+1]+(x-p.nodes[index_low+1])*p.slopes[index_low]
    elseif index_low == n
        return p.values[end]
    elseif index_low == 0
        return p.values[1]

Basically, we find out which interval we’re in, and then we interpolate appropriately in said interval. Let me say something upfront, or… almost upfront. This post is not about optimal performance. Julia is often sold as “the fast language”, but to me Julia is much more about productivity. Sure, it’s great to be able to optimize your code, but it’s also great to simply be able to do *what you want to do* – without too much hassle. Now, we can do what we couldn’t before.

julia> pwl(3.5)

We can even plot it together with the actual sqrt function on the interval (2,4).

julia> using Plots
julia> plot(x->pwl(x), 2, 4, label="Piecewise Linear sqrt")
julia> plot!(sqrt, 2, 4, label="sqrt")
julia> savefig("sqrtfig")

and we get

which seems to show a pretty good approximation to the square root function considering the very few nodes.

Numerical integrals using Gaussian quadrature

So we can now create continuous function approximations based on finite evaluation points (nodes). This is great, because this allow us to work with \(W\) in our code. The only remaining problem is: we need to evaluate the expected value of \(W\). This can be expressed as

\( E_x(f) = \int_a^b f(x)w(x)dx\)

where \(w(x)\) is going to be a probability density function and \(a\) and \(b\) can be finite or infinite and represent upper and lower bounds on the values the random variable (state) can take on. In the world of Gaussian quadrature, \(w\)‘s are called weight functions. Gaussian quadrature is basically a method for finding good evaluation points (nodes) and associated weights such that the following approximation is good

\(\int_a^b f(x)w(x)dx\approx \sum_{i=1}^N f(x_i)w_i\)

where \(N\) is the number of nodes. We’re not going to provide a long description of the methods involved, but we will note that the package DistQuads.jl allows us to easily obtain nodes and weights for a handful of useful distributions. To install this package write


as it is not currently tagged in METADATA.jl. This is currently calling out to R’s statmod package. The syntax is quite simple. Define a distribution instance, create nodes and weights, and calculate the expected value of the function in three simple steps:

julia> using Distributions, DistQuads
julia> bd = Beta(1.5, 50.0)
Distributions.Beta{Float64}(α=1.5, β=50.0)
julia> dq = DistQuad(bd, N = 64)
DistQuads.DistQuad([0.000334965,0.00133945,0.00301221,0.0053512,0.00835354,0.0120155,0.0163327,0.0212996,0.0269103,0.03315780.738325,0.756581,0.774681,0.792633,0.810457,0.828194,0.845925,0.863807,0.882192,0.902105],[0.00484732,0.0184431,0.0381754,0.0603806,0.0811675,0.097227,0.106423,0.108048,0.102724,0.0920435  …  1.87035e-28,5.42631e-30,1.23487e-31,2.11992e-33,2.60541e-35,2.13019e-37,1.03855e-39,2.52575e-42,2.18831e-45,2.90458e-49],Distributions.Beta{Float64}(α=1.5, β=50.0))
julia> E(sqrt, dq)

We can then try Monte Carlo integration with many nodes to see how close they are

julia> mean(sqrt.(rand(bd, 100000000)))

and they appear to be in the same ballpark.

To replace, or not to replace

The model we’re considering here is a simple one. It’s a binary choice model very close to the model in Rust (1987). An agent is in charge of maintaining a bus fleet and has a binary choice each month then the buses come in for maintenance: replace the engine (effectively renewing the bus) or maintain it. Replacement costs a fixed price RC, and regular maintenance has a cost that is a linear function of the odometer reading since last replacement (or purchase if replacement has never occurred). We can use the expressions above to solve this model, but first we need to specify how the odometer reading changes from month to month conditional on the choices made. We assume that the odometer reading (mileage) changes according to the following

\(x_{t+1}=\tilde{a}x_{t}+(1-\tilde{a}x_t)\Delta x, \quad\text{where }\Delta x \sim Beta(1.4, 50.0)\)

where \(\tilde{a}=2-a\), and as we remember \(a\in\{1,2\}\). As we see, a replacement returns the state to 0 plus whatever mileage might accumulate that month, and regular maintenance means that the bus will end up with end of period mileage between \(x_{t+1}\) and 1. To give an idea about the state process, we see the pdf for the distribution of \(\Delta x\) below.

Solving the model

We are now ready to solve the model. Let us say that the planning horizon is 96 months and the monthly discount factor is 0.9. After the 96 months, the bus is scrapped at a value of 2 units of currency, such that


and from here on, we use the recursion from above. First, set up the state space.

using Distributions, DistQuads, Plots
# State space
bd = Beta(1.5, 50.0)
dq = DistQuad(bd, N = 64)
Sˡ = 0
Sʰ = 1
n_nodes = 100 # arbitrary, but could be varied
nodes = linspace(Sˡ, Sʰ, n_nodes) # doesn't have to be uniformly distributed
RC = 11.5 # Replacement cost
c = 9.0 # parameter in linear maintenance cost c*x
β = 0.9 # discount factor

Then, define utility function and expectation operator

u1 = PiecewiseLinear(nodes, x->-c*x) # "continuous" cost of maintenance
u2 = PiecewiseLinear(nodes, x->-RC) # "continuous" cost of replacement (really just a number, but...)
# Expected value of f at x today given a where x′ is a possible state next period
Ex(f, x, a, dq) = E(x′->f((2-a)*x.+(Sʰ-(2-a)*x).*x′), dq)

Then, we simply

#### SOLVE
V = Array{PiecewiseLinear,1}(70)
V[70] = PiecewiseLinear(nodes, x->2)
for i = 69:-1:1
    EV1 = PiecewiseLinear(nodes, x->Ex(V[i+1], x, 1, dq))
    EV2 = PiecewiseLinear(nodes, x->Ex(V[i+1], x, 2, dq))
    V[i] = PiecewiseLinear(nodes, x->log(exp(u1(x)*EV1(x))+exp(u2(x)*EV2(x))))

To get our solution. We can then plot either integrated value functions or policies (choice probabilities). We calculate the policies using the following function

function CCP(x, i)
    EV1 = PiecewiseLinear(nodes, x->Ex(V[i], x, 1, dq))
    EV2 = PiecewiseLinear(nodes, x->Ex(V[i], x, 2, dq))

We see that there are not 69/70 distinct curves in the plots. This is because we eventually approach the “infinite horizon”/stationary policy and solution.


Given the CCPs from above, it is very straight forward to simulate an agent say from period 1 to period 69.

x0 = 0.0
x = [x0]
a0 = 0
a = [a0]
T = 69
for i = 2:T
   _a = rand()<CCP(x[end], i) ? 1 : 2
   push!(a, _a)
   push!(x, (2-_a)*x[end]+(Sʰ-(2-_a)*x[end])*rand(bd))
plot(1:T, x, label="mileage")
Is = []
for i in eachindex(a)
    if a[i] == 2
        push!(Is, i)
vline!(Is, label="replacement")

This blog post had a look at simple quadrature, creating custom types with call overloading in Julia, and how this can be used to a solve a very simple discrete choice model in Julia. Interesting extensions are of course to allow for more states, more choices, other shock distributions than extreme value type I and so on. Let me know if you try to extend the model in any of those directions, and I would love to have a look!


  • Aguirregabiria, Victor, and Pedro Mira. “Dynamic discrete choice structural models: A survey.” Journal of Econometrics 156.1 (2010): 38-67.
  • Rust, John. “Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher.” Econometrica: Journal of the Econometric Society (1987): 999-1033.
  • John, Rust. “Maximum likelihood estimation of discrete control processes.” SIAM Journal on Control and Optimization 26.5 (1988): 1006-1024.
  • Exploring Fibonacci Fractions with Julia

    By: perfectionatic

    Re-posted from: http://perfectionatic.org/?p=367

    Recently, I came across a fascinating blog and video from Mind you Decisions. It is about how a fraction
    would show the Fibonacci numbers in order when looking at its decimal output.

    On a spreadsheet and most standard programming languages, such output can not be attained due to the limited precision for floating point numbers. If you try this on R or Python, you will get an output of 1e-48.
    Wolfram alpha,however,allows arbitrary precision.

    In Julia by default we get a little better that R and Python

    julia> 1/999999999999999999999998999999999999999999999999
    julia> typeof(ans)

    We observe here that we are getting the first few Fibonacci numbers 1, 1, 2, 3. We need more precision to get more numbers. Julia has arbitrary precision arithmetic baked into the language. We can crank up the precision of the BigFloat type on demand. Of course, the higher the precision, the slower the computation and the greater the memory we use. We do that by setprecision.

    julia> setprecision(BigFloat,10000)

    Reevaluating, we get

    julia> 1/999999999999999999999998999999999999999999999999

    That is looking much better. However it we be nice if we could extract the Fibonacci numbers that are buried in that long decimal. Using the approach in the original blog. We define a function


    and calculate the decimal


    Here we use the non-standard string literal big"..." to insure proper interpretation of our input. Using BigFloat(1e-24)) would first construct at floating point with limited precision and then do the conversion. The initial loss of precision will not be recovered in the conversion, and hence the use of big. Now we extract our Fibonacci numbers by this function

    function extract_fib(a)
       for i=1:div(length(x)-24,24)

    Here we first convert our very long decimal number of a string and they we exploit the fact the Fibonacci numbers occur in blocks that 24 digits in length. We get out output in an array of BigInt. We want to compare the output with exact Fibonacci numbers, we just do a quick and non-recursive implementation.

    function fib(n)
        for i=3:n+1

    Now we compare…

    for i in eachindex(fib_frac)
         println(fib_exact[i], " ", fib_exact[i]-fib_frac[i])

    We get a long sequence, we just focused here on when the discrepancy happens.

    184551825793033096366333 0
    298611126818977066918552 0
    483162952612010163284885 0
    781774079430987230203437 -1
    1264937032042997393488322 999999999999999999999998
    2046711111473984623691759 1999999999999999999999997

    The output shows that just before the extracted Fibonacci number exceeds 24 digits, a discrepancy occurs. I am not quite sure why, but this was a fun exploration. Julia allows me to do mathematical explorations that would take one or even two orders of magnitude of effort to do in any other language.