Index Funds with Mixed-Integer-Programming¶

We will analyze daily price data for stocks in the Dow Jones index and then try to build an accurate index fund using a small numbers of stocks therein.

Similar material was already used in a presentation at PyData Berlin 2017. See the “Tour of popular packages” notebook. Back then, we worked with Julia 0.6 and used the packages DataFrames, Plots and JuMP. Now, we work with Julia 1.0 and use packages from the Queryverse, VegaLite and IndexedTables for data prep and visualization. Also, I added an alternative model.

Loading the Data¶

using Queryverse
using IndexedTables: ndsparse

The data consists of daily closing prices from 2016 of the 30 stocks that participate in the Dow Jones and are given in a CSV file:

;head -n4 ../files/dowjones2016.csv

Date,Symbol,Price
2016-01-04,AAPL,105.349997999999999
2016-01-04,AXP,67.589995999999999
2016-01-04,BA,140.500000000000000

price = load("../files/dowjones2016.csv") |> ndsparse
price |> @take(3)

price |> @vlplot(:line, x=:Date, y=:Price, color={"Symbol:o", scale={scheme="viridis"}},
                 width=600, height=450)

Computing the Dow Jones Index¶

The Dow Jones index is computed from the prices of its stocks, as a weighted average, where the weight is itself defined through the price of the stock.

We will compute the average price of each stock over the days. Then we will normalize these values by dividing through the total of the average prices. The normalized weights are then multiplied to the daily prices to get the daily value of the index.

using Statistics: mean

avgprice = price |>
    @groupby(_.Symbol) |>
    @map({Symbol=key(_), AvgPrice=mean(_.Price)}) |> ndsparse
avgprice |> @take(4)

avgprice |> @vlplot(:bar, x=:Symbol, y=:AvgPrice)

totalavg = avgprice |> @map(_.AvgPrice) |> sum

2617.739246769842

weight = avgprice |> @map({Symbol=_.Symbol, Weight=_.AvgPrice / totalavg}) |> ndsparse
weight |> @take(4)

dowjones = price |> 
    @join(weight, _.Symbol, _.Symbol, {_.Date, _.Symbol, Contrib=_.Price * __.Weight}) |>
    @groupby(_.Date) |>
    @map({Date=key(_), Value=sum(_.Contrib)}) |> ndsparse
dowjones |> @take(4)

@vlplot(width=600, height=450) +
@vlplot(mark={:line, strokeWidth=1, opacity=0.8}, data=price, x=:Date, y=:Price,
        color={"Symbol:n", scale={scheme="blues"}}) +
@vlplot(mark={:line, strokeWidth=3, color=:orange}, data=dowjones, x=:Date, y=:Value)

In the previous plot, the Dow Jones index is marked in orange. In the following, it will be our target, which we want to approximate by using a small number of the stocks.

Model for Linear Fit¶

Let us start with an optimization model similar to an ordinary linear regression, but using the l1-norm, which is readily formulated as a linear program. In compact vector form, this reads

\begin{align*}
\text{minimize} \quad & \lVert w^T P – I \rVert_1 \\
\text{subject to} \quad & w \ge 0
\end{align*}

where $P$ stands for the prices of the individual st, $I$ for our index (the target) and $w$ for the weights we use in our fund. We only allow non-negative weights for use in the portfolio.

We can use a standard linear programming trick and introduce auxiliary variables for the positive and negative parts inside the absolute values and minimize their sum. This is formulated using JuMP:

using JuMP
using SCIP

dates = unique(price.index.columns.Date)
symbols = unique(price.index.columns.Symbol)

length(dates), length(symbols)

(252, 30)

We could use all days of the year as input for our model, but for our evaluation we only use the days from the first three quarters. That way, we can see how our fit extrapolates through the remaining quarter.

training_days = 189
traindates = dates[1:training_days]
testdates = dates[training_days + 1:end]
@show traindates[end] testdates[1]

# for visualization
datebreak = [(Date=traindates[end],)] |> @take(1)

length(traindates), length(testdates)

traindates[end] = 2016-09-30
testdates[1] = 2016-10-03

(189, 63)

function solve_index_linear()
    m = Model(solver=SCIPSolver("display/verblevel", 2))
    
    @variable(m, weight[symbols] >= 0)
    @variable(m, posdev[traindates] >= 0)
    @variable(m, negdev[traindates] >= 0)
    
    for d in traindates
        @constraint(m, sum(weight[s]*price[d,s][1] for s in symbols) - dowjones[d][1] == posdev[d] - negdev[d])
    end
    
    @objective(m, :Min, sum(posdev[d] + negdev[d] for d in traindates))
    
    status = solve(m)
    return (
        status = status,
        weight = getvalue(weight),
    )
end

solve_index_linear (generic function with 1 method)

sol = solve_index_linear()
sol.status

:Optimal

solweight = [(Symbol=s, SolWeight=sol.weight[s]) for s in symbols]
weight |>
    @join(solweight, _.Symbol, _.Symbol, {_.Symbol, _.Weight, __.SolWeight}) |>
    @vlplot(:point, x=:Weight, y=:SolWeight)

In the scatter chart above, we compare the weights from the Dow Jones index, with the weights found by the LP model. As we can see (the points lie on the diagonal), we recover the actual weights perfectly, even though we are only using a subset of the data.

Model with Sparsity Constraint¶

The last model gave us an exact fit, but used all available stocks.

We are changing it now, and allow it to use only a given number of stocks. For this end, we will introduce additional binary variables to select stocks to become active in our index fund.

The weight of inactive variables must be forced to 0, which we do with a so-called big-M constraint. The constant is chosen in such a way, that any active stock could single-handedly approximate the target index.

bigM = 1 / minimum(weight.data.columns[1])

90.926297738811

We start by using only a single stock and then go up to subsets of 6 stocks out of the 30.

nstocks_range = 1:6

1:6

The model seems to be relatively difficult to solve for SCIP, probably because of the big-M constraints, so we set a gap limit of 5%, rather than solving to proven optimality.

function solve_index_sparse(nstocks)
    m = Model(solver=SCIPSolver("display/verblevel", 2, "limits/gap", 0.05))
    
    @variable(m, active[symbols], Bin)
    @variable(m, weight[symbols] >= 0)
    @variable(m, posdev[traindates] >= 0)
    @variable(m, negdev[traindates] >= 0)
    
    for d in traindates
        @constraint(m, sum(weight[s]*price[d,s][1] for s in symbols) - dowjones[d][1] == posdev[d] - negdev[d])
    end

    for s in symbols
        @constraint(m, weight[s] <= bigM*active[s])
    end
    
    @constraint(m, sum(active[s] for s in symbols) <= nstocks)
    
    @objective(m, :Min, sum(posdev[d] + negdev[d] for d in traindates))
    
    status = solve(m, suppress_warnings=true)
    return (
        status = status,
        weight = getvalue(weight)
    )
end

solve_index_sparse (generic function with 1 method)

sols = [solve_index_sparse(n) for n in nstocks_range];

We extract the model solutions and concatenate them into a table for visualization.

solweight = [(NStocks=n, Symbol=s, SolWeight=sols[n].weight[s]) for n in nstocks_range for s in symbols] |>
            @filter(_.SolWeight > 0.0)
solweight |> @take(3)

indices = price |>
    @join(solweight, _.Symbol, _.Symbol, {__.NStocks, _.Date, _.Symbol, Contrib=_.Price * __.SolWeight}) |>
    @groupby({_.NStocks, _.Date}) |>
    @map({NStocks=_.NStocks[1], Date=_.Date[1], Value=sum(_.Contrib)}) |> ndsparse
indices |> @take(3)

@vlplot(width=600, height=300) +
@vlplot(mark={:line, color=:orange, strokeWidth=3},
        data=dowjones, x=:Date, y={:Value, scale={domain=[80, 125]}}) +
@vlplot(mark={:line, strokeWidth=1, opacity=0.8},
        data=indices, x=:Date, y=:Value, color="NStocks:o") +
@vlplot(:rule, data=datebreak, x=:Date)

As we can see, our solutions stick to the target fund quite closely during the training period, but then diverge from it quickly. Allowing more stocks gives us a better fit.

Computing Returns¶

In the previous model, we tried to fit our index fund to the absolute day-to-day closing prices. But the next model will be based on picking representative that are similar, which is defined through correlation of returns.

So let’s start by computing daily returns from the closing prices (for the training period).

prevdates = (Date=dates[2:end], PrevDate=dates[1:end-1])
prevdates |> @take(3)

price[:, "AAPL"] |> @take(3)

returns = prevdates |>
    @filter(_.Date <= traindates[end]) |>
    @join(price, _.Date, _.Date, {_.Date, _.PrevDate, __.Symbol, __.Price}) |>
    @join(price, {Date=_.PrevDate, _.Symbol}, {_.Date, _.Symbol},
                 {__.Symbol, _.Date, Return=_.Price / __.Price - 1.0}) |>
    ndsparse
returns |> @take(3)

returns |>
    @vlplot(mark={:line, opacity=0.8, strokeWidth=1}, width=600, height=450,
            x=:Date, y=:Return, color={"Symbol:o", scale={scheme="viridis"}})

Analyzing Correlation¶

We compute pair-wise correlation of the 30 stocks, but first need to compute the mean and standard deviations of the returns.

returnagg = returns |>
    @groupby(_.Symbol) |>
    @map({Symbol=key(_), MeanReturn=mean(_.Return)}) |>
    @join(returns, _.Symbol, _.Symbol,
          {_.Symbol, __.Date, _.MeanReturn, ShiftedSqr=(__.Return - _.MeanReturn)^2}) |>
    @groupby(_.Symbol) |>
    @map({Symbol=key(_), MeanReturn=_.MeanReturn[1],
          StdDevReturn=sqrt(sum(_.ShiftedSqr)/(length(_) - 1))})
returnagg |> @take(3)

The mean and standard deviation of the stock returns can be used as performance and risk markers. In the following scatter chart, we give an overview of these (bottom right is best).

returnagg |>
    @vlplot(:point, x=:MeanReturn, y=:StdDevReturn, color={"Symbol:o", scale={scheme="viridis"}},
            width=450, height=450)

shiftedreturns = returns |>
    @join(returnagg, _.Symbol, _.Symbol, {_.Symbol, _.Date, ShiftedReturn=_.Return - __.MeanReturn})
shiftedreturns |> @take(3)

correlation = shiftedreturns |>
    @join(shiftedreturns, _.Date, _.Date,
          {Left=_.Symbol, Right=__.Symbol, Product=_.ShiftedReturn * __.ShiftedReturn}) |>
    @groupby({_.Left, _.Right}) |>
    @map({Left=_.Left[1], Right=_.Right[1], Covariance=mean(_.Product)}) |>
    @join(returnagg, _.Left, _.Symbol, {_.Left, _.Right, _.Covariance, LeftStdDev=__.StdDevReturn}) |>
    @join(returnagg, _.Right, _.Symbol,
          {_.Left, _.Right, Correlation=_.Covariance / (_.LeftStdDev * __.StdDevReturn)}) |>
    ndsparse
correlation |> @take(3)

correlation |>
    @vlplot(:rect, x=:Left, y=:Right, color=:Correlation, width=450, height=450)

As we can see here, all of the stocks are either unrelated or positively related, but never negatively related, which is not at all obvious to me.

Model for Picking Representatives¶

The following model is taken from the book Optimization Methods in Finance.

Rather than looking at the day-to-day values, it precomputes a similary for each pair of stocks, and then selects a representative for each stock while limiting the total number of representatives. The actual weight given to the stocks is not part of the model, but computed in a post-processing step.

The authors note that this model can be solved more efficiently using a Lagrangian Relaxation approach, but I found that this is not necessary for our small dataset.

function solve_index_repr(nstocks)
    m = Model(solver=SCIPSolver("display/verblevel", 2))
    
    @variable(m, active[symbols], Bin)       # is stock in index fund?
    @variable(m, repr[symbols,symbols], Bin) # is stock 'r' represented by 's'?
    
    @constraint(m, sum(active[s] for s in symbols) <= nstocks)
    
    for r in symbols
        for s in symbols
            @constraint(m, repr[r,s] <= active[s])
        end
    end
    
    for r in symbols
        @constraint(m, sum(repr[r,s] for s in symbols) == 1)
    end
    
    @objective(m, :Max, sum(correlation[r,s][1] * repr[r,s] for r in symbols for s in symbols))
    
    status = solve(m, suppress_warnings=true)
    
    # post-processing: determine weights for representatives
    reprsol = getvalue(repr)
    accweight = [sum(weight[r][1] * reprsol[r,s] for r in symbols)/avgprice[s][1] for s in symbols]
                    
    return (
        status = status,
        active = getvalue(active),
        weight = accweight * mean(dowjones.data.columns.Value)
    )
end

solve_index_repr (generic function with 1 method)

sols_repr = [solve_index_repr(n) for n in nstocks_range];

reprweight = [(NStocks=n, Symbol=s, SolWeight=w) for n in nstocks_range
              for (s,w) in zip(symbols,sols_repr[n].weight)] |>
            @filter(_.SolWeight > 0.0)
reprweight |> @take(3)

reprindices = price |>
    @join(reprweight, _.Symbol, _.Symbol, {__.NStocks, _.Date, _.Symbol, Contrib=_.Price * __.SolWeight}) |>
    @groupby({_.NStocks, _.Date}) |>
    @map({NStocks=_.NStocks[1], Date=_.Date[1], Value=sum(_.Contrib)}) |> ndsparse
reprindices |> @take(3)

@vlplot(width=600, height=300) +
@vlplot(mark={:line, color=:orange, strokeWidth=3},
        data=dowjones, x=:Date, y={:Value, scale={domain=[80, 125]}}) +
@vlplot(mark={:line, strokeWidth=1, opacity=0.8},
        data=reprindices, x=:Date, y=:Value, color="NStocks:o") +
@vlplot(:rule, data=datebreak, x=:Date)

We can see that the solution funds are not following the target index as closely as before, on a day-to-day basis, but the general shape of the curve is very similar. On the other hand, there seems to be hardly any difference in performance between the training and the test period, so it extrapolates quite well!

Comparison of methods¶

We could compare the approaches more systematically, by computing the losses with different norms (l1, or l2) on the training or test data, but I feel the visualization already gives a good enough impression.

But as a final comparison, let us count how often the different stocks are used for the index funds.

count1 = solweight |>
    @groupby(_.Symbol) |>
    @map({Symbol=key(_), Method="L1Fit", Count=length(_)}) |>
    @orderby_descending(_.Count)
count2 = reprweight|>
    @groupby(_.Symbol) |>
    @map({Symbol=key(_), Method="Repr", Count=length(_)}) |>
    @orderby_descending(_.Count)

counts = [count1...; count2...] |> ndsparse

counts |>
    @vlplot(:bar, column=:Symbol, x={field=:Method, axis={title=""}}, y=:Count, color=:Method)

It looks like the two model have very different taste in stocks and can only agree on 2-3 of them!

Notes on packages used¶

Since this blog was also an exercise in using Queryverse and VegaLite, I would like to comment on my experience now.
For Query itself, building the queries was relatively straight-forward, after watching the tutorial given by David Anthoff at JuliaCon 2018.

Some of the computations look quite awkward and are probably not efficient either, such as the twofold join of price with itself and another table that links each day with the previous day. Here, an array-based view using [2:end] and [1:end-1] would have been more natural. Also, after a while I found it a little annoying that I had to repeatedly name all of the columns that I wanted to keep, in @join or @map calls.

Finally, the combination of Query and ndsparse of IndexedTables.jl is a little unfortunate, when the latter one is used as a sink for some query. Typically, ndsparse is constructed with a set of columns to be used as an index, and another (single, unnamed) column as a value. Individual values can then be referenced just like an array, that is, table[i,j]. But the constructor of ndsparse used on query results takes the first $n – 1$ columns as index and the last as value, but wrapped in a singledton NamedTuple. So, table[i,j] will actually be a (k=23,). This meant that I had to use price[d,s][1] as coefficient in my model, rather than simply price[d,s]. I could not figure out how to avoid this, without creating essentially another copy of the data.

We compare the macronutrional content of baby formula with that of cow’s milk and then try to recreate the formula by blending milk with water and adding supplements as needed.

Human milk has a vastly different macronutrient composition compared to that of other mammals, in particular cows. While the energy content is about the same, it is much richer in carbs and a lot lower in protein. The reasoning is that human babies already have a large and active brain for learning and need the sugar to fuel it. At the same time, their body does not need to grow as rapidly as that of other mammals who are already much more mature when born.

For parents who do not (or no longer) breastfeed their baby, a substitute can be used made from a powder, which mimics the nutritional content of human milk. As the baby turns older, and its digestive system is already used to different kinds of foods, cow milk (and other dairy products) may be introduced in the diet.

From other parents, I have heard that they will not feed cow’s milk as is, but dilute it with water during the night. I wondered whether this approach is justified with respect to the macronutrients which led to the following analysis.

This analysis is done purely out of curiosity. Please consult a health care professional prior to changing your infant’s diet!

Data¶

As a reference for baby formula, we will use Hipp PRE BIO COMBIOTIK. They provide the following data on macronutrients:

(Energy: 66 kcal / 100 ml)
Fat: 3.5 g / 100 ml
Carbs: 7.3 g / 100 ml
Protein: 1.25 g / 100 ml

For cow’s milk, we will use both whole milk as well as reduced-fat milk. The macronutrients as given for milk at LIDL are, for whole milk:

(Energy: 68 kcal / 100 ml)
Fat: 3.9 g / 100 ml
Carbs: 4.9 g / 100 ml
Protein: 3.4 g / 100 ml

And for reduced-fat milk:

(Energy: 48 kcal / 100 ml)
Fat: 1.6 g / 100 ml
Carbs: 4.9 g / 100 ml
Protein: 3.5 g / 100 ml

Model¶

We setup an optimization model inspired by the classical diet problem, where we want to use four ingredients (water, whole milk, reduced-fat milk, lactose) to match the nutrional profile of baby formula.

My thinking was that a blend of whole milk and reduced-fat milk should get the amount of fat right, while a dilution with water will get the protein level down to the target. Finally, we can add sugar in the form of lactose powder. This is widely available in drug stores, since it is also used as a mild laxative for infants.

using JuMP
using GLPKMathProgInterface

m = Model(solver=GLPKSolverLP())

# our ingredients (without the implicit water)
@variable(m, whole_milk ≥ 0)   # in 100 ml
@variable(m, lowfat_milk ≥ 0)  # in 100 ml
@variable(m, lactose ≥ 0)      # in g

# slack in our macro targets 
@variable(m, slack_fat ≥ 0)
@variable(m, slack_carb ≥ 0)
@variable(m, slack_protein ≥ 0)

# fit liquid volume (100 ml)
@constraint(m, whole_milk + lowfat_milk ≤ 1)

# match fat content (g)
@constraint(m, 3.9 * whole_milk + 1.6 * lowfat_milk == 3.5 + slack_fat)

# match carb content (g)
@constraint(m, 4.9 * whole_milk + 4.9 * lowfat_milk + lactose == 7.3  + slack_carb )

# match protein content (g)
@constraint(m, 3.4 * whole_milk + 3.5 * lowfat_milk == 1.25  + slack_protein)

# minimize the mismatch of target
@objective(m, :Min, slack_carb + slack_fat + slack_protein);

status = solve(m)

:Optimal

@show getvalue(whole_milk) getvalue(lowfat_milk) getvalue(lactose)
@show getvalue(slack_fat) getvalue(slack_carb) getvalue(slack_protein);

getvalue(whole_milk) = 0.8974358974358976
getvalue(lowfat_milk) = 0.0
getvalue(lactose) = 2.902564102564101
getvalue(slack_fat) = 0.0
getvalue(slack_carb) = 0.0
getvalue(slack_protein) = 1.8012820512820515

It seems that we can not get a perfect match since the slack variables are not zero.

In particular, we will overshoot the protein content, or else not meet the fat content. So, with these ingredients, we will not be able to reproduce the baby formula as planned.

Adding Canola Oil¶

In order to reduce the protein content, we need another non-dairy fat source. Let us use rapeseed oil which is often recommended for preparation of (non-dairy) infant meals because of its neutral taste and rich content of essential fatty acids (omega-3).

m = Model(solver=GLPKSolverLP())

# our ingredients
@variable(m, whole_milk ≥ 0)   # in 100 ml
@variable(m, lowfat_milk ≥ 0)  # in 100 ml
@variable(m, lactose ≥ 0)      # in g
@variable(m, oil ≥ 0)  # in g

# no slack variables this time, we are confident in feasibility!

# fit liquid volume (100ml)
@constraint(m, whole_milk + lowfat_milk ≤ 1)

# match fat content (g)
@constraint(m, 3.9 * whole_milk + 1.6 * lowfat_milk + oil == 3.5)

# match carb content (g)
@constraint(m, 4.9 * whole_milk + 4.9 * lowfat_milk + lactose == 7.3)

# match protein content (g)
@constraint(m, 3.4 * whole_milk + 3.5 * lowfat_milk == 1.25)

# minimize supplements
@objective(m, :Min, lactose + oil);

status = solve(m)

:Optimal

@show getvalue(whole_milk) getvalue(lowfat_milk) getvalue(lactose) getvalue(oil);

getvalue(whole_milk) = 0.36764705882352944
getvalue(lowfat_milk) = 0.0
getvalue(lactose) = 5.498529411764705
getvalue(oil) = 2.0661764705882355

The interpretation of this solution is as follows: To reproduce 100 ml of baby formula, we should mix 37 ml of whole milk with 5.5 g of lactose, 2 ml of oil and about 60 ml of water. Reduced-fat milk is not useful but both lactose and oil supplements are necessary.

Of course, this only takes the macronutrients into account, while the sophisticated baby formulas also strive to meet the micronutrient needs of the baby.

Diluting the Drink¶

Once the infant has reached a certain age, it is not necessary to supply its body with food every 3-4 hours anymore. In particular, they do not really need to drink milk during the night and might only do so out of habit. In fact, it seems that at one year old, the infant does not need even need to get any liquid and will be fine anyways.

At the same time, frequent food intake (in particular drinking) will increase the risk of tooth decay. So it makes sense to avoid night-time feeding.

In order to ensure some peaceful sleep nonetheless, I would like to scale down the energy content of our night-time drink over time. We will end up with just water, but go there in small steps. We will use our ingredients (with supplements) from above but change the targets: Rather then fixing the macronutrient levels, we will only ask for a specific energy content and put a limit on the protein level, since the infant’s kidney might not be ready to handle larger amounts yet.

function dilution(rel_energy=1.0)
    m = Model(solver=GLPKSolverLP())

    # our ingredients
    @variable(m, whole_milk ≥ 0)   # in 100 ml
    @variable(m, lowfat_milk ≥ 0)  # in 100 ml
    @variable(m, lactose ≥ 0)      # in g
    @variable(m, oil ≥ 0)  # in g

    # fit liquid volume (100ml)
    @constraint(m, whole_milk + lowfat_milk ≤ 1)

    # match energy content (kcal)
    @constraint(m, 68 * whole_milk + 48 * lowfat_milk + 4 * lactose + 9 * oil == rel_energy * 66)
    
    # limit protein content (g)
    @constraint(m, 3.4 * whole_milk + 3.5 * lowfat_milk <= 1.25)

    # minimize supplements
    @objective(m, :Min, lactose + oil);

    status = solve(m)
    sol = getvalue(whole_milk), getvalue(lowfat_milk), getvalue(lactose), getvalue(oil)
    
    return status, sol
end

dilution (generic function with 2 methods)

status, sol = dilution(1)

(:Optimal, (0.36764705882352944, 0.0, 0.0, 4.555555555555555))

By using a parameter value of 1, we will get a drink with the same energy content as our baby formula, but with a more relaxed relation of macronutrients.

Again, reduced-fat milk is not used and we find that only an oil supplement is needed but no sugar supplement. This makes sense, since the energy content of fat is higher than that of sugar and our objective here was to minimize the total mass of supplements.

Let us now compute a sequence of solutions with increasingly low energy content, over the course of two weeks.

# iterate over parameter range and collect solutions
r = range(1.0, stop=1e-6, length=14)
sols = []
for rel_nrg in r
    status, sol = dilution(rel_nrg)
    sol = collect(sol)' # tuple to row vector
    push!(sols, sol)
end

# collect all solutions values into one 2d-array
sols = vcat(sols...)

14×4 Array{Float64,2}:
 0.367647    0.0  0.0  4.55556  
 0.367647    0.0  0.0  3.99145  
 0.367647    0.0  0.0  3.42735  
 0.367647    0.0  0.0  2.86325  
 0.367647    0.0  0.0  2.29915  
 0.367647    0.0  0.0  1.73505  
 0.367647    0.0  0.0  1.17094  
 0.367647    0.0  0.0  0.606842 
 0.367647    0.0  0.0  0.0427396
 0.298643    0.0  0.0  0.0      
 0.223983    0.0  0.0  0.0      
 0.149322    0.0  0.0  0.0      
 0.0746615   0.0  0.0  0.0      
 9.70588e-7  0.0  0.0  0.0

We can see that we should never use more than 37 ml of whole milk per 100 ml of our drink.
In the beginning, this amount of milk stays the same and the oil supplementation will go down to account for the reduced energy content. Only when there is no more oil to be added, does the amount of milk go down. At this point, we will only mix whole milk and water.

Let us also analyze the macronutrient composition of our dilutions:

whole_milk, oil = sols[:, 1], sols[:, 4]

fat = 3.9 * whole_milk + oil
carbs = 4.9 * whole_milk
protein = 3.4 * whole_milk

energy = 9 * fat + 4 * carbs + 4 * protein

macros = hcat(fat, carbs, protein, energy)

14×4 Array{Float64,2}:
 5.98938     1.80147     1.25      66.1103    
 5.42528     1.80147     1.25      61.0334    
 4.86118     1.80147     1.25      55.9565    
 4.29707     1.80147     1.25      50.8795    
 3.73297     1.80147     1.25      45.8026    
 3.16887     1.80147     1.25      40.7257    
 2.60477     1.80147     1.25      35.6488    
 2.04067     1.80147     1.25      30.5719    
 1.47656     1.80147     1.25      25.495     
 1.16471     1.46335     1.01539   20.3973    
 0.873532    1.09751     0.761541  15.298     
 0.582356    0.731678    0.507695  10.1987    
 0.29118     0.365841    0.253849   5.09938   
 3.78529e-6  4.75588e-6  3.3e-6     6.62912e-5

Not surprisingly, our drink is relatively high in fat, more so than either whole milk or baby formula.

Similarly, we can compute the relative energy from macros in the above sequence:

macro_nrg = hcat(9 * fat, 4 * carbs, 4 * protein) ./ energy

14×3 Array{Float64,2}:
 0.815371  0.108998  0.0756312
 0.800013  0.118065  0.0819224
 0.781868  0.128777  0.0893552
 0.760102  0.141626  0.0982713
 0.733511  0.157325  0.109164 
 0.70029   0.176937  0.122773 
 0.657607  0.202135  0.140257 
 0.600748  0.235703  0.163549 
 0.521243  0.28264   0.196117 
 0.513909  0.286969  0.199122 
 0.513909  0.286969  0.199122 
 0.513909  0.286969  0.199122 
 0.513909  0.286969  0.199122 
 0.513909  0.286969  0.199122

Energy from fat gos down, while the contribution from carbs and protein goes up at the same rate:

macro_nrg[:, 2] ./ macro_nrg[:, 3]

14-element Array{Float64,1}:
 1.4411764705882357
 1.4411764705882355
 1.4411764705882355
 1.4411764705882357
 1.4411764705882353
 1.4411764705882355
 1.4411764705882355
 1.4411764705882355
 1.4411764705882357
 1.4411764705882353
 1.4411764705882355
 1.4411764705882353
 1.4411764705882353
 1.4411764705882353

That’s it for this analysis. I’m sorry for the lack of visualization!

Criteria	Julia	Matlab
Cost	Free	Hundreds of dollars per year
License	Open-Source	One year user license
Source	Non-profitable foundation and community	MathWorks company
Scope	Mainly numeric	Numeric only
Performance	Excellent	Faster than Python but slower than Julia
Editor/IDE	Jupyter is recommended	IDE already included
Packaging	Pkg manager included	Toolboxes included
Usage	Generic	Industry and research in academia
Fame	Young but starts to be known	Old and well-known
Support	Community	MathWorks
Documentation	Growing	Okay

Symbol	AvgPrice
“AAPL”	104.604
“AXP”	63.7933
“BA”	133.112
“CAT”	78.698

Symbol	Weight
“AAPL”	0.0399597
“AXP”	0.0243696
“BA”	0.0508498
“CAT”	0.0300634

Date	Value
2016-01-04	100.573
2016-01-05	100.511
2016-01-06	99.0142
2016-01-07	96.606

NStocks	Symbol	SolWeight
1	“TRV”	0.91443
2	“BA”	0.394428
2	“MMM”	0.313139

juliabloggers.com

A Julia Language Blog Aggregator

Tag Archives: optimization

Optimization with JuMP in Julia

JuMP: A package for mathematical programming in Julia

Comparing Julia vs. Matlab for optimization problems

JuMP and GLPK packages for optimization with Julia

A test problem

Solving the problem

Index Funds with Mixed-Integer-Programming