# Build & Deploy Machine Learning Apps on Big Data Platforms with Microsoft Linux Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft.

This post covers our latest additions to the Microsoft Linux Data Science Virtual Machine (DSVM), a custom VM image on Azure, purpose-built for data science, deep learning and analytics. Offered in both Microsoft Windows and Linux editions, DSVM includes a rich collection of tools, seen in the picture below, and makes you more productive when it comes to building and deploying advanced machine learning and analytics apps.

The central theme of our latest Linux DSVM release is to enable the development and testing of ML apps for deployment to distributed scalable platforms such as Spark, Hadoop and Microsoft R Server, for operating on data at a very large scale. In addition, with this release, DSVM also offers Julia Computing’s JuliaPro on both Linux and Windows editions.

Here’s more on the new DSVM components you can use to build and deploy intelligent apps to big data platforms:

#### Microsoft R Server 9.0

Version 9.0 of Microsoft R Server (MRS) is a major update to enterprise-scale R from Microsoft, supporting parallel and distributed computation. MRS 9.0 supports analytics execution in the Spark 2.0 context. There’s a new architecture and simplified interface for deploying R models and functions as web services via a new library called mrsdeploy, which makes it easy to consume models from other apps using the open Swagger framework.

#### Local Spark Standalone Instance

Spark is one of the premier platforms for highly scalable big data analytics and machine learning. Spark 2.0 launched in mid-2016 and brings several improvements such as the revised machine learning library (MLLib), scaling and performance optimization, better ANSI SQL compliance and unified APIs. The Linux DSVM now offers a standalone Spark instance (based on the Apache Spark distribution), PySpark kernel in Jupyter to help you build and test applications on the DSVM and deploy them on large scale clusters like Azure HDInsight Spark or your own on-premises Spark cluster. You can develop your code using either Jupyter notebook or with the included community edition of the Pycharm IDE for Python or RStudio for R.

#### Single Node Local Hadoop (HDFS and YARN) Instance

To make it easier to develop Hadoop programs and/or use HDFS storage locally for development and testing, a single node Hadoop installation is built into the VM. Also, if you are developing on the Microsoft R Server for execution in Hadoop or Spark remote contexts, you can first test things locally on the Linux DSVM and then deploy the code to a remote scaled out Hadoop or Spark cluster or to Microsoft R Server. These DSVM additions are designed to help you iterate rapidly when developing and testing your apps, before they get deployed into large-scale production big data clusters.

The DSVM is also a great environment for self-learning and running training classes on big data technologies. We provide sample code and notebooks to help you get started quickly on the different data science tools and technologies offered.

#### DSVM Resources

New to DSVM? Here are resources to get you started:

The goal of DSVM is to make data scientists and developers highly productive in their work and provide a broad array of popular tools. We hope you find it useful to have these new big data tools pre-installed with the DSVM.

Gopi

# DynProg Class – Week 2

By: pkofod

Re-posted from: http://www.pkofod.com/2017/02/18/dynprog-class-week-2/

This post, and other posts with similar tags and headers, are mainly directed at the students who are following the Dynamic Programming course at Dept. of Economics, University of Copenhagen in the Spring 2017. The course is taught using Matlab, but I will provide a few pointers as to how you can use Julia to solve the same problems. So if you are an outsider reading this I welcome you, but you won’t find all the explanations and theory here. If you want that, you’ll have to come visit us at UCPH and enroll in the class!

This week we continue with a continuous choice model. This means we have to use interpolation and numerical optimization.

## A (slightly less) simple model

Consider a simple continuous choice consumption-savings model:

$$V_t(M_t) = \max_{C_t}\sqrt{C_t}+\mathbb{E}[V_{t+1}(M_{t+1})|C_t, M_t]$$
subject to
$$M_{t+1} = M_t – C_t+R_t\\ C_t\leq M_t\\ C_t\in\mathbb{R}_+$$

where $$R_t$$ is 1 with probability $$\pi$$ and 0 with probability $$1-\pi$$, $$\beta=0.9$$, and $$\bar{M}=5$$

Last week the maximization step was merely comparing values associated with the different discrete choices. This time we have to do continuous optimization in each time period. Since the problem is now continuous, we cannot solve for all $$M_t$$. Instead, we need to solve for $$V_t$$ at specific values of $$M_t$$, and interpolate in between. Another difference to last time is the fact that the transitions are stochastic, so we need to form (very simple) expectations as part of the Bellman equation evaluations.

### Interpolation

It is of course always possible to make your own simple interpolation scheme, but we will use the functionality provided by the Interpolations.jl package. To perform interpolation, we need a grid to be used for interpolation $$\bar{x}$$, and calculate the associated function values.

f(x) = (x-3)^2
x̄ = linspace(1,5,5)
fx̄ = f.(x̄)

Like last time, we remember that the dot after the function name and before the parentheses represent a “vectorized” call, or a broadcast – that is we call f on each element of the input. We now use the Interpolations.jl package to create an interpolant $$\hat{f}$$.

using Interpolations
f̂ = interpolate((collect(x̄),), fx̄, Gridded(Linear()))

We can now index into $$\hat{f}$$ as if it was an array

f̂[-3] #returns 16.0

We can also plot it

using Plots
plot(f̂)

which will output

### Solving the model

Like last time, we prepare some functions, variables, and empty containers (Arrays)

# Model again
u(c) = sqrt(c)
T = 10; β = 0.9
π = 0.5 ;M₁ = 5
# Number of interpolation nodes
Nᵐ = 50 # number of grid points in M grid
Nᶜ  = 50 # number of grid points in C grid

M = Array{Vector{Float64}}(T)
V = Array{Any}(T)
C = Array{Any}(T)

The V and C arrays are allocated using the type “Any”. We will later look at how this can hurt performance, but for now we will simply do the convenient thing. Then we solve the last period

M[T] = linspace(0,M₁+T,Nᵐ)
C[T] = M[T]
V[T] = interpolate((M[T],), u.(C[T]), Gridded(Linear()))

This new thing here is that we are not just saving V[T] as an Array. The last element is the interpolant, such that we can simply index into V[T] as if we had the exact solution at all values of M (although we have to remember that it is an approximation). For all periods prior to T, we have to find the maximum as in the Bellman equation from above. To solve this reduced “two-period” problem (sum of utility today and discounted value tomorrow), we need to form expectations over the two possible state transitions given an M and a C today, and then we need to find the value of C that maximizes current value. We define the following function to handle this

# Create function that returns the value given a choice, state, and period
v(c, m, t, V) = u(c)+β*(π*V[t+1][m-c+1]+(1-π)*V[t+1][m-c])

Notice how convenient it is to simply index into V[t] using the values we want to evaluate tomorrow’s value function at. We perform the maximization using grid search on a predefined grid from 0 to the particular M we’re solving form. If we abstract away from the interpolation step, this is exactly what we did last time.

for t = T-1:-1:1
M[t] = linspace(0,M₁+t,Nᵐ)
C[t] = zeros(M[t])
Vt = fill(-Inf, length(M[t]))
for (iₘ, m) = enumerate(M[t])
for c in linspace(0, m, Nᶜ)
_v = v(c, m, t, V)
if _v >= Vt[iₘ]
Vt[iₘ] = _v
C[t][iₘ] = c
end
end
end
V[t] = interpolate((M[t],), Vt, Gridded(Linear()))
end

Then we can plot the value functions to verify that they look sensible

Nicely increasing in time and in M.

### Using Optim.jl for optimization

The last loop could just as well have been done using a proper optimization routine. This will in general be much more robust, as we don’t confine ourselves to a certain amount of C-values. We use the one of the procedures in Optim.jl. In Optim.jl, constrained, univariate optimization is available as either Brent’s method or Golden section search. We will use Brent’s method. This is the standard method, so an optimization call simply has the following syntax

using Optim
f(x) = x^2
optimize(f, -1.0, 2.0)

Unsurprisingly, this will return the global minimizer 0.0. However, if we constrain ourselves to a strictly positive interval

optimize(f, 1.0, 2.0)

we get a minimizer of 1.0. This is not the unconstrained minimizer of the square function, but it is minimizer given the constraints. Then, it should be straight forward to see how the grid search loop can be converted to a loop using optimization instead.

for t = T-1:-1:1
update_M!(M, M₁, t, Nᵐ)
C[t] = zeros(M[t])
Vt = fill(-Inf, length(M[t]))
for (iₘ, m) = enumerate(M[t])
if m == 0.0
C[t][iₘ] = m
Vt[iₘ] = v(m, m, t, V)
else
res = optimize(c->-v(c, m, t, V), 0.0, m)
Vt[iₘ] = -Optim.minimum(res)
C[t][iₘ] = Optim.minimizer(res)
end
end
V[t] = interpolate((M[t],), Vt, Gridded(Linear()))
end

If our agent has no resources at the beginning of the period, the choice set has only one element, so we skip the optimization step. We also have to negate the minimum to get the maximum we’re looking for. The main advantage of using a proper optimization routine is that we’re not restricting C to be in any predefined grid. This increases precision. If we look at the number of calls to “v” (using Optim.f_calls(res)), we see that it generally takes around 10-30 v calls. With only 10-30 grid points from 0 up to M, we would generally get an approximate solution of much worse quality.

### Julia bits and pieces

This time we used a few different package from the Julia ecosystem: Plots.jl, Interpolations.jl, and Optim.jl. These are based on my personal choices (full disclosure: I’ve contributed to the former and the latter), but there are lots of packages to explore. Visit the JuliaLang discourse forum or gitter channel to discuss Julia and the various packages with other users.

# DynProg Class – Week 1

By: pkofod

Re-posted from: http://www.pkofod.com/2017/02/10/dynprog-class-week-1/

This post, and other posts with similar tags and headers, are mainly directed at the students who are following the Dynamic Programming course at Dept. of Economics, University of Copenhagen in the Spring 2017. The course is taught using Matlab, but I will provide a few pointers as to how you can use Julia to solve the same problems. So if you are an outsider reading this I welcome you, but you won’t find all the explanations and theory here. Then you’ll have to come visit us at UCPH and enroll in the class!

We start out with very simple examples and Julia code, and gradually ramp up the difficulty level.

## Simple model

Consider the simplest consumption-savings model:

$$V^*(M) = \max_{C_1,C_2,\ldots,C_T}\beta^0\sqrt{C_1},\beta^1\sqrt{C2},\ldots,\beta^T\sqrt{C_T}$$

subject to
$$\bar{M} = C_1+C_2+\ldots+C_T\equiv m_1$$
$$m_{t+1} = m_t – C_t$$
$$C_t\in\mathbb{N}$$
$$\beta=0.9,\quad\bar{M}=5$$

Let us try to solve such models.

### 1) Brute force

This is not a very clever way, but let us start somewhere. We don’t have a better method – yet! Set T=2. Note, there is only one free choice to vary. Given some consumption in period t, the other is given due to the “No cake left” boundary condition.

    # Primitives
β = 0.90
α = 0.5
u(x, α) = x^α
## State
M̄ = 5.0
Cᵒᵖᵗ = u(0.0, α) + β*u(M̄, α)
Vᵒᵖᵗ = -Inf
for C1 in 1.0:1.0:M̄
V = u(C1, α) + β*u(M̄-C1, α)
if Vᵒᵖᵗ &lt;= V
Vᵒᵖᵗ = V
Cᵒᵖᵗ = C1
end
end
println("Results")
println("* optimimum: ", Vᵒᵖᵗ)
println("* optimizer: ", Cᵒᵖᵗ)

This tells us that our agent should consume three units in the first period and two in the last period.

### 2) Vary β and M

Check that different M’s and β’s give sensible predictions.

We don’t want to rerun that code manually each time – we’re going to need something smarter. Let’s wrap it in a function. In Julia, a typical function (method) definition looks something like the following.

function brute_force(M̄, α, β)
Vᵒᵖᵗ = u(0.0, α) + β*u(M̄, α)
Cᵒᵖᵗ = 0

for C1 in 1.0:1.0:M̄
V = u(C1, α) + β*u(M̄-C1, α)
if Vᵒᵖᵗ <= V
Vᵒᵖᵗ = V
Cᵒᵖᵗ = C1
end
end
Vᵒᵖᵗ, Cᵒᵖᵗ
end

Notice that whatever is on the last line will be returned. For example, you can verify that it returns the same thing as the loop above, or check what happens if β is 0:

brute_force(M̄, α, 0.0)

The output tells us that the agent should consume everything in the first period. This is consistent with the intuition that the agent doesn’t care about the future at all. What if we have no units at the first period?

brute_force(0, α, β)

We see that the agent should consume nothing. This might seem like a stupid thing to test for, but it is always important to sanity check your code. If it can’t predict things that are obvious, it will probably also give wrong answers to more complex cases. It can be hard to come up with such cases, but the last two examples will always hold. Full discounting -> full consumption today. No ressources -> no consumption today (unless you can borrow).

### 3) Solve a model using backwards induction

It requires little convincing to realize, that it is wasteful to check all possible combinations – especially when the number of periods increases. Instead, we use the principle of optimality to solve the problem using dynamic programming. To solve the model using dynamic programming, we need to create a few variables to hold the value and policy functions. These will hold the solutions from each (t,m) subproblem. We start with T=3

T = 3
M = 0.0:M̄
Vᵒᵖᵗ = [zeros(M) for t = 1:T]
Cᵒᵖᵗ = [zeros(M) for t = 1:T]

We’ve used vectors of vectors here, but could just as well have created a $5\times 3$ matrix in this case. However, in a more general class of models, the number of states might vary over time, and there a vector of vector becomes convenient. With these variables in place, we can solve the last period by realizing that given some resource level in period T, we need to consume everything in order fulfill the constraints of the problem (no cake left on the table!).

Vᵒᵖᵗ[end] = sqrt.(M)
Cᵒᵖᵗ[end] = M

Notice the dot after sqrt. This is Julia syntax for “broadcast the square root function over M”. Then, we solve for the period t=T-1, then t=T-2, and then we’re done. We implement a straight forward loop over all possible m’s and feasible consumption given these m’s, by using the fact that the feasible values for consumption make up the set
$$\{0,1,2,…,m_t\}$$.

Again, you should try to see if some of all this shouldn’t really be separate functions.
Julia bits and pieces
To close off this post, let me just briefly touch upon some of the things we used above, that might be different from Matlab.
To allocate an empty array, write

V = Vector(10)

or a vector with some specific element value (here, a Float64 NaN)

W = fill(NaN, 10)

Indexing into arrays is done with square brackets. For example, let’s assign the first element of W to a variable WW

WW = W[1]

And the same goes for setting the value at some index

function f(x,y)
x+y+rand() # returns the sum of the two inputs and a random number
end

g(x,y) = x+y+rand()



and comments are inserted using # or the #= =# blocks

# This is a comment

#= This
is
a
comment
=#

For-loops are written as

for a = some_iterator
# do something
end

or to get both the value and a counter use enumerate

for (i, a) = enumerate(some_iterator)
# do stuff with a and index something with i
end

Code that should be run conditional on some statement being true is written as

if condition
# do something if condition is satisfied
end

and that’s all for now, folks!