My quest to responsive visualizations with Julia

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2016/09/my-quest-to-responsive-visualizations-with-julia/

Responsive, fun interactivity is notoriously hard to achieve! But it’s also the key to less frustration and more patience when working on a project.

There are endless important problems that humanity needs to solve. To become less of a burden on the environment and to build a better society, we constantly need to advance the state of the art.

Still, people give up on such work once their initial patience, motivation and curiosity are depleted without new incentives to replenish them.

This actually happened to me with my 3D modeling hobby a few years ago. Not only was the 3D software difficult to learn, it also turned unresponsive fairly quickly, even on a powerful machine.

Just as an example, rendering a 30-second clip in high quality took ~2 weeks with my PC running day and night. This slowness drained my patience, and at some point I decided that it was too much hassle for a hobby.

Half a year later, I ran into a similar problem when working on a computer vision pipeline. We simply couldn’t find frameworks which were fast enough for real time processing while maintaining high programming productivity. This time I didn’t give up, but the job could have been much more fun and productive!

At this point I decided that I want to work on software that turns really tough problems like object recognition of 3D models into weekend projects.

I started working on GLVisualize, a library for fast and interactive graphics written in Julia.

Why graphics?

Visualizing something is the first step to understanding and it allows us to explore huge problem spaces that were invisible before.

But current tools seem to be divided into 2D and 3D libraries, and into high-performance libraries that are a hassle to use versus easy-to-use libraries that quickly turn into an unresponsive mess.

So with my background, this seemed like a worthy start!

Why Julia?

While there are several reasons for choosing Julia, today I want to show you its impressive speed despite being an easy-to-use dynamic language.

Let me show you the benefit with two toy examples (all graphics including the plots were produced with GLVisualize!).

Consider the following function, the famous Lorenz Attractor:

It takes a point t0 and parameters a to d, and returns a new point. Can you imagine what kind of line this will draw when it is applied recursively to its own result while the parameters change?

I guess no one has such a powerful brain, which is why we visualize these kinds of things, together with sliders to explore the function:

[Animation: interactive Lorenz attractor rendered from 100,000 points, with sliders for the parameters]

[Link: code for this example]

These are 100,000 points, and as you can see GLVisualize can deal with that amount easily.
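
As a rough sketch of the idea (the exact formula and code are linked above), here is the classic Lorenz system stepped with a simple Euler update and the usual σ, ρ, β parameters, not the post's exact a–d formula:

# One Euler step of the classic Lorenz system (stand-in for the post's function)
function lorenz_step(p, σ, ρ, β, dt = 0.01)
    x, y, z = p
    dx = σ * (y - x)
    dy = x * (ρ - z) - y
    dz = x * y - β * z
    (x + dt * dx, y + dt * dy, z + dt * dz)
end

# Recursively apply it to build the point cloud that gets visualized
points = NTuple{3, Float64}[(0.1, 0.0, 0.0)]
for i in 1:100_000
    push!(points, lorenz_step(points[end], 10.0, 28.0, 8/3))
end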

Let’s see how long we could have stayed at pleasant, interactive speeds in another language.

Let’s take Python, for example, a language of comparable usability:

 

[Figure: Julia vs. Python run times for the Lorenz attractor at increasing point counts]

 

minimal speedup 70.4680453961563
maximal speedup 148.02061279172827
max seconds Julia 0.193422334
max seconds Python 13.630093812942505

As you can see, computation times jump beyond one second at around 10⁵ points for Python, while Julia stays interactive with up to 10⁷ points.

So if you’re interested in higher resolution in Python, you’d better crank up your patience or call out to C, abruptly giving up all of Python’s convenience!

Next example is a 3D mandelbulb visualization:

[Animation: interactive mandelbulb visualization computed on the GPU]

[Link: code for this example]

One step through the function takes around 24 seconds with Julia on the CPU, which makes it fairly painful to explore the parameters.

So why is the animation shown above still smooth? Well, when choosing Julia, I was betting that it would be fairly simple to run Julia code on the GPU. That bet is now starting to become reality.

I’m using CUDAnative and GPUArrays to speed up the calculation of the mandelbulb.

CUDAnative allows you to run Julia code on the GPU while GPUArrays offers a simpler array interface for this task.
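
As a minimal sketch of what this looks like (a toy scalar function stands in for the actual mandelbulb kernel, and the GPU lines are only indicated, since constructor and backend-selection names vary between GPUArrays versions):

using GPUArrays          # plus a GPU backend, e.g. CUDAnative for NVIDIA cards

# Toy stand-in for one iteration step of the real kernel
mandel_step(z, c) = z * z + c

zs = rand(Float32, 10^6)
cs = rand(Float32, 10^6)

cpu_result = mandel_step.(zs, cs)      # ordinary broadcast on the CPU

# The promise of GPUArrays: upload the data and reuse the very same broadcast.
# (API sketch only; constructor and backend selection differ between versions)
# gpu_zs = GPUArray(zs)
# gpu_cs = GPUArray(cs)
# gpu_result = mandel_step.(gpu_zs, gpu_cs)   # same code, now running on the GPU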

Here are the benchmark results:

[Figure: CPU vs. CUDA run times for the mandelbulb computation]

minimal speedup 37.05708590791309
maximal speedup 401.59165062897495
max seconds cpu 24.275534426
max seconds cuda 0.128474155

This means that by running Julia on the GPU we can still enjoy a smooth interaction with this function.

But the problem is actually so hard that I can’t run this interactively at the resolution I would like.

As you can see, the iso surface visualization still looks very coarse. So even with state-of-the-art software, you still run into problems that won’t compute in under a second.

But with Julia you can at least squeeze the last bit of performance out of your hardware, be it CPU or GPU, while enjoying the comfort of a dynamic high-level language!

I’m sure that with Julia and advances in hardware, algorithms and optimizations, we can soon crack even the hardest problem with ease!

 


Benchmarking system:

RAM: 32GB

CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8

GPU: GeForce GTX 950

 

An Introduction to Machine Learning in Julia

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2016/09/28/knn-char-recognition.html

Introduction

Machine learning is now pervasive in every field of inquiry and has led to
breakthroughs in various fields from medical diagnoses to online advertising.
Practical machine learning is quite computationally intensive, whether it involves
millions of repetitions of simple mathematical methods such as
Euclidean distance or more
intricate optimizers
or backpropagation algorithms.
Such computationally intensive techniques need a fast and expressive language –
one that enables scientists to write simple, readable code that
performs well. Enter Julia.

In this post, we introduce a simple machine learning algorithm
called K Nearest Neighbors,
and demonstrate certain Julia features that allow for its easy and efficient
implementation. We will demonstrate that the code we write is inherently
generic, and show the use of the same code to run on GPUs via the
ArrayFire package.

Problem statement

We need to come up with character recognition software that can identify images
of textual characters. The data we’ll be using for this problem are a
set of images from Google Street View.
The algorithm we’ll be using is called k-Nearest Neighbors (kNN), which is a
useful starting point for working on problems of this nature.

Solution

Let us begin by taking a good look at our data. First, we read the data from a set of
CSV files from disk, and plot a histogram that represents the distribution of the training data:

# Assumes the DataFrames and Gadfly packages are loaded (readtable, by, plot, Theme, Guide)
# Read training set
trainLabels = readtable("$(path)/trainLabels.csv")
counts = by(trainLabels, :Class, nrow);
plot(x = counts[:Class], y = counts[:x1], color = counts[:Class],
     Theme(key_position = :none), Guide.xlabel("Characters"))

Training data distribution

When developing this code within an integrated environment such as an IJulia notebook, we can explore this data interactively. For example, we can scroll through the images using the
Interact package:

# Assumes the Interact (@manipulate) and Images/FileIO (load) packages are loaded
@manipulate for i = 1:size(trainLabels, 1)
    load("$(path)/trainResized/$i.Bmp")
end

kNN

Labelling an image with its textual character involves finding the “distances” from the
image under consideration to every image in the training set. The image is then classified
as having the most common label among its k nearest training images.
We use Euclidean distance
to measure the proximity of the images.
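
Concretely, for two images flattened to pixel vectors a and b of length n, the distance is

d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}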

Let X be a set of m training samples, Y be the labels for each of those samples, and x be
the unknown sample to be classified. We perform the following steps:

  • First, calculate all the distances:
for i = 1:m
    distances[i] = d(X[i], x)
end

We can also represent this generally in matrix notation:

# Squared Euclidean distances from the single image `imageI` (one column of pixels)
# to every column of the image matrix `x`
function get_all_distances(imageI::AbstractArray, x::AbstractArray)
    diff = imageI .- x                       # broadcast imageI against every column
    distances = vec(sum(diff .* diff, 1))    # sum over the pixel dimension
end
  • Then sort the distances in ascending order and select the first k indices
    from the sorted list.
function get_k_nearest_neighbors(x::AbstractArray, imageI::AbstractArray, k::Int = 3, train::Bool = true)
    distances = get_all_distances(imageI, x)
    sortedNeighbors = sortperm(distances)
    if train
        # in training, the nearest neighbor is the image itself (distance 0), so skip it
        return sortedNeighbors[2:k+1]
    else
        return sortedNeighbors[1:k]
    end
end
  • Finally, classify x as the most common element among the k indices.
# Classify `imageI` with the most common label among its k nearest neighbors
function assign_label(x::AbstractArray, y::AbstractArray{Int64}, imageI::AbstractArray, k, train::Bool)
    kNearestNeighbors = get_k_nearest_neighbors(x, imageI, k, train)
    counts = Dict{Int, Int}()
    highestCount = 0
    mostPopularLabel = 0
    for n in kNearestNeighbors
        labelOfN = y[n]
        if !haskey(counts, labelOfN)
            counts[labelOfN] = 0
        end
        counts[labelOfN] += 1
        if counts[labelOfN] > highestCount
            highestCount = counts[labelOfN]
            mostPopularLabel = labelOfN
        end
    end
    mostPopularLabel
end

The objective is to find the best value of k such that we reach optimal
accuracy.

Training

We use the labeled data to find the optimal value of k. The following code
will output the accuracy for values of k from 3 to 5, for example.

for k  = 3:5
    checks = trues(size(train_data, 2))
    @time for i = 1:size(train_data, 2)
        imageI = train_data[:,i]
        checks[i]  = (assign_label(train_data, trainLabelsInt, imageI, k, true) == trainLabelsInt[i])
    end
    accuracy = length(find(checks)) / length(checks)
    @show accuracy
end

Testing

Now, from our training, we should have identified a value of k that gives us
the best accuracy. Of course, this varies depending on the data. Suppose our runs
tell us the best value of k is 3. It’s time to now test our model. For this,
we can actually use the same code that we used for training, but we needn’t check
for accuracy or loop over values of k.

x = test_data
xT = train_data
yT = trainLabelsInt
k = 3  # say
prediction = zeros(Int,size(x,2))
@time for i = 1:size(x,2)
    imageI = x[:,i]
    prediction[i]  = assign_label(xT, yT, imageI, k, false)
end

Improving Performance

There are a couple of ways we can speed up our code. Notice that each iteration
of the for-loop in the main compute loop is independent. This looks like a great
candidate for parallelisation. So, we can spawn a couple
of Julia processes and execute the for loop in parallel.

Parallel for-loops

The macro @parallel will split the iteration space amongst the available Julia
processes and then perform the computation in each process simultaneously.

Let us first add n processes by using the addprocs function. We
then execute the following for loop in parallel.

addprocs(n)
# xTrain and yTrain are the training images and labels (train_data and trainLabelsInt above);
# the kNN functions must also be defined on every worker (e.g. via @everywhere)
@time sumValues = @parallel (+) for i in 1:size(xTrain, 2)
    Int(assign_label(xTrain, yTrain, xTrain[:, i], k, true) == yTrain[i, 1])
end

The following scaling plot shows performance variation as you spawn more processes:

GPU Acceleration

The ArrayFire package enables us to run
computations on various general purpose GPUs from within Julia programs.
The AFArray constructor from this package
converts Julia vectors to ArrayFire Arrays, which can be transferred to the GPU.
Converting Arrays to AFArrays allows us to trivially use the same code as above to run our
computation on the GPU.

# using ArrayFire   # provides the AFArray type used below
checks = trues(AFArray{Bool}, size(train_data, 2))
train_data = AFArray(train_data)
trainLabelsInt = AFArray(trainLabelsInt)
for k  = 3:5
    @time for i = 1:size(train_data, 2)
        imageI = train_data[:,i]
        checks[i]  = (assign_label(train_data, trainLabelsInt, imageI, k, true) == trainLabelsInt[i])
    end
    accuracy = length(find(checks)) / length(checks)
    @show accuracy
end

Benchmarking against the serial version shows a significant speedup, which has been achieved without much extra code!

Serial Time      ArrayFire Time
34 sec           20 sec

Conclusion

  • Julia allows for easy prototyping and deployment of machine learning models.
  • Parallelization is also very easy to implement.
  • Built-in parallelism gives the programmer a variety of options to speed up their code.
  • Generic code can be run on GPUs using the ArrayFire package.

Future Work

Congratulations! You now know how to write a simple image recognition model using
kNN. While this represents a good starting point for problems such as these, it is by
no means the state of the art. We shall explore more sophisticated methods, such as deep
neural networks, in a future blog post.

Machine Learning and Visualization in Julia

By: Tom Breloff

Re-posted from: http://www.breloff.com/JuliaML-and-Plots/

In this post, I’ll introduce you to the Julia programming language and a couple long-term projects of mine: Plots for easily building complex data visualizations, and JuliaML for machine learning and AI. After short introductions to each, we’ll quickly throw together some custom code to build and visualize the training of an artificial neural network. Julia is fast, but you’ll see that the real speed comes from developer productivity.

Introduction to Julia

Julia is a fantastic, game-changing language. I’ve been coding for 25 years, using mainstays like C, C++, Python, Java, Matlab, and Mathematica. I’ve dabbled in many others: Go, Erlang, Haskell, VB, C#, JavaScript, Lisp, etc. For every great thing about each of these languages, there’s something equally bad to offset it. I could never escape the “two-language problem”, which is when you must maintain a multi-language code base to deal with the deficiencies of each language. C can be fast to run, but it certainly isn’t fast to code. The lack of high-level interfaces means you’ll need to do most of your analysis work in another language. For me, that was usually Python. Now… Python can be great, but when it’s not good enough… ugh.

Python excels when you want high level and your required functionality already exists. If you want to implement a new algorithm with even minor complexity, you’ll likely need another language. (Yes… Cython is another language.) C is great when you just want to move some bytes around. But as soon as you leave the “sweet spot” of these respective languages, everything becomes prohibitively difficult.

Julia is amazing because you can properly abstract exactly the right amount. Write pseudocode and watch it run (and usually fast!) Easily create strongly-typed custom data manipulators. Write a macro to automate generation of your boilerplate code. Use generated functions to produce highly specialized code paths depending on input types. Create your own mini-language for domain-specificity. I often find myself designing solutions to problems that simply should not be attempted in other languages.

using Plots
pyplot()
labs = split("Julia C/C++ Python Matlab Mathematica Java Go Erlang")
ease = [0.8, 0.1, 0.8, 0.7, 0.6, 0.3, 0.5, 0.5]
power = [0.9, 0.9, 0.3, 0.4, 0.2, 0.8, 0.7, 0.5]
txts = map(i->text(labs[i], font(round(Int, 5+15*power[i]*ease[i]))), 1:length(labs))
scatter(ease, power,
    series_annotations=txts, ms=0, leg=false,
    xguide="Productivity", yguide="Power",
    formatter=x->"", grid=false, lims=(0,1)
)

I won’t waste time going through Julia basics here. For new users, there are many resources for learning. The takeaway is: if you’re reading this post and you haven’t tried Julia, drop what you’re doing and give it a try. With services like JuliaBox, you really don’t have an excuse.

Introduction to Plots

Plots (and the JuliaPlots ecosystem) is a set of modular tools behind a cohesive interface, which lets you very simply define and manipulate visualizations.

One of its strengths is the variety of supported backends. Choose text-based plotting from a remote server or real-time 3D simulations. Fast, interactive, lightweight, or complex… all without changing your code. Massive thanks to the creators and developers of the many backend packages, and especially to Josef Heinen and Simon Danisch for their work in integrating the awesome GR and GLVisualize frameworks.
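
As a tiny illustration of “without changing your code” (assuming the GR, PyPlot and UnicodePlots backend packages are installed; the data is just an example):

using Plots

x = collect(linspace(0, 2π, 100))   # example data (Julia 0.5 syntax, as in the rest of this post)

gr()                                # fast, lightweight GR backend
plot(x, sin.(x), title = "sin")

pyplot()                            # switch to the PyPlot backend…
plot(x, sin.(x), title = "sin")     # …same plotting code, different renderer

unicodeplots()                      # even text-based plots in the terminal
plot(x, sin.(x), title = "sin")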

However, more powerful than any individual feature is the concept of recipes. A recipe can be simply defined as a conversion with attributes. “User recipes” and “type recipes” can be defined on custom types to enable them to be “plotted” just like anything else. For example, the Game type in my AtariAlgos package will capture the current screen from an Atari game and display it as an image plot with the simple command plot(game):

“Series recipes” allow you to build up complex visualizations in a modular way. For example, a histogram recipe will bin data and return a bar plot, while a bar recipe can in turn be defined as a bunch of shapes. The modularity greatly simplifies generic plot design. Using modular recipes, we are able to implement boxplots and violin plots, even when a backend only supports simple drawing of lines and shapes:

To see many more examples of recipes in the wild, check out StatPlots, PlotRecipes, and more in the wider ecosystem.
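
To make the “user recipe” idea concrete, here is a minimal sketch with a made-up type, following the Julia 0.5 syntax used in the rest of this post:

using Plots

type MyHistory                 # hypothetical toy type holding a time series
    times::Vector{Float64}
    values::Vector{Float64}
end

# @recipe (from RecipesBase, re-exported by Plots) describes how to turn a
# MyHistory into something Plots already knows how to draw, plus default attributes.
@recipe function f(h::MyHistory)
    xguide --> "time"          # "-->" only sets a default; user-supplied keywords win
    yguide --> "value"
    seriestype --> :path
    h.times, h.values          # the data that will actually be plotted
end

t = collect(linspace(0, 10, 101))
plot(MyHistory(t, sin.(t)))    # now plots like any built-in series, on any backend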

For a more complete introduction to Plots, see my JuliaCon 2016 workshop and read through the documentation.

Introduction to JuliaML

JuliaML (Machine Learning in Julia) is a community organization that was formed to brainstorm and design cohesive alternatives for data science. We believe that Julia has the potential to change the way researchers approach science, enabling algorithm designers to truly “think outside the box” (non-conventional approaches are often simply too difficult to implement in other languages). Many of us had independently developed tools for machine learning before contributing. Some of my contributions to the current codebase in JuliaML are copied from or inspired by my work in OnlineAI.

The recent initiatives in the Learn ecosystem (LearnBase, Losses, Transformations, Penalties, ObjectiveFunctions, and StochasticOptimization) were spawned during the 2016 JuliaCon hackathon at MIT. Many of us, including Josh Day, Alex Williams, and Christof Stocker (by Skype), stood in front of a giant blackboard and hashed out the general design. Our goal was to provide fast, reliable building blocks for machine learning researchers, and to unify the existing fragmented development efforts.

  • Learn: The “meta” package for JuliaML, which imports and re-exports many of the packages in the JuliaML organization. This is the easiest way to get everything installed and loaded.
  • LearnBase: Lightweight method stubs and abstractions. Most packages import (and re-export) the methods and abstract types from LearnBase.
  • Losses: A collection of types and methods for computing loss functions for supervised learning. Both distance-based (regression/classification) and margin-based (Support Vector Machine) losses are supported. Optimized methods for working with array data are provided with both allocating and non-allocating versions. This package was originally Evizero/LearnBase.jl. Much of the development is by Christof Stocker, with contributions from Alex Williams and myself.
  • Transformations: Tensor operations with attached storage for values and gradients: activations, linear algebra, neural networks, and more. The concept is that each Transformation has both input and output Node for input and output arrays. These nodes implicitly link to storage for the current values and current gradients. Nodes can be “linked” together in order to point to the same underlying storage, which makes it simple to create complex directed graphs of modular computations and perform backpropagation of gradients. A Chain (generalization of a feedforward neural network) is just an ordered list of sub-transformations with appropriately linked nodes. A Learnable is a special type of transformation that also has parameters which can be learned. Utilizing Julia’s awesome array abstractions, we can collect params from many underlying transformations into a single vector, and avoid costly copying and memory allocations. I am the primary developer of Transformations.
  • Penalties: A collection of types and methods for regularization functions (penalties), which are typically part of a total model loss in learning algorithms. Josh Day (creator of the awesome OnlineStats) is the primary developer of Penalties.
  • ObjectiveFunctions: Combine transformations, losses, and penalties into an objective. Much of the interface is shared with Transformations, though this package allows for flexible Empirical Risk Minimization and similar optimization. I am the primary developer on the current implementation.
  • StochasticOptimization: A generic framework for optimization. The initial focus has been on gradient descent, but I have hopes that the framework design will be adopted by other classic optimization frameworks, like Optim. There are many gradient descent methods included: SGD with momentum, Adagrad, Adadelta, Adam, Adamax, and RMSProp. The flexible “Master Learner” framework provides a modular approach to optimization algorithms, allowing developers to add convergence criteria, custom iteration traces, plotting, animation, etc. We’ll see this flexibility in the example below. We have also redesigned data iteration/sampling/splitting, and the new iteration framework is currently housed in StochasticOptimization (though it will eventually live in MLDataUtils). I am the primary developer for this package.

Learning MNIST

Time to code! I’ll walk you through some code to build, learn, and visualize a fully connected neural network for the MNIST dataset. The steps I’ll cover are:

  • Load and initialize Learn and Plots
  • Build a special wrapper for our trace plotting
  • Load the MNIST dataset
  • Build a neural net and our objective function
  • Create custom traces for our optimizer
  • Build a learner, and learn optimal parameters

Custom visualization for tracking MNIST fit

Disclaimers:

  • I expect you have a basic understanding of gradient descent optimization and machine learning models. I don’t have the time or space to explain those concepts in detail, and there are plenty of other resources for that.
  • Basic knowledge of Julia syntax/concepts would be very helpful.
  • This API is subject to change, and this should be considered pre-alpha software.
  • This assumes you are using Julia 0.5.

Get the software (use Pkg.checkout on a package for the latest features):

# Install Learn, which will install all the JuliaML packages
Pkg.clone("https://github.com/JuliaML/Learn.jl")
Pkg.build("Learn")
Pkg.checkout("MLDataUtils", "tom") # call Pkg.free if/when this branch is merged

# A package to load the data
Pkg.add("MNIST")

# Install Plots and StatPlots
Pkg.add("Plots")
Pkg.add("StatPlots")

# Install GR -- the backend we'll use for Plots
Pkg.add("GR")

Start up Julia, then load the packages:

using Learn
import MNIST
using MLDataUtils
using StatsBase
using StatPlots

# Set up GR for plotting. x11 is uglier, but much faster
ENV["GKS_WSTYPE"] = "x11"
gr(leg=false, linealpha=0.5)

A custom type to simplify the creation of trace plots (which will probably be added to MLPlots):

# the type, parameterized by the indices and plotting backend
type TracePlot{I,T}
    indices::I
    plt::Plot{T}
end

getplt(tp::TracePlot) = tp.plt

# construct a TracePlot for n series.  note we pass through
# any keyword arguments to the `plot` call
function TracePlot(n::Int = 1; maxn::Int = 500, kw...)
    indices = if n > maxn
        # limit to maxn series, randomly sampled
        shuffle(1:n)[1:maxn]
    else
        1:n
    end
    TracePlot(indices, plot(length(indices); kw...))
end

# add a y-vector for value x
function add_data(tp::TracePlot, x::Number, y::AbstractVector)
    for (i,idx) in enumerate(tp.indices)
        push!(tp.plt.series_list[i], x, y[idx])
    end
end

# convenience: if y is a number, wrap it as a vector and call the other method
add_data(tp::TracePlot, x::Number, y::Number) = add_data(tp, x, [y])

Load the MNIST data and preprocess:

# our data:
x_train, y_train = MNIST.traindata()
x_test, y_test = MNIST.testdata()

# normalize the input data given μ/σ for the input training data
μ, σ = rescale!(x_train)
rescale!(x_test, μ, σ)

# convert class vector to "one hot" matrix
y_train, y_test = map(to_one_hot, (y_train, y_test))

train = (x_train, y_train)
test = (x_test, y_test)

Build a neural net with softplus activations for the inner layers and softmax output for classification:

nin, nh, nout = 784, [50,50], 10
t = nnet(nin, nout, nh, :softplus, :softmax)

Note: the nnet method is a very simple convenience constructor for Chain transformations. It’s pretty easy to construct the transformation yourself for more complex models. This is what is constructed on the call to nnet:

Chain{Float64}(
   Affine{784-->50}
   softplus{50}
   Affine{50-->50}
   softplus{50}
   Affine{50-->10}
   softmax{10}
)

Create an objective function to minimize, adding an Elastic (combined L1/L2) penalty/regularization. Note that the cross-entropy loss function is inferred automatically for us since we are using softmax output:

obj = objective(t, ElasticNetPenalty(1e-5))

Build TracePlot objects for our custom visualization:

# parameter plots
pidx = 1:2:length(t)
pvalplts = [TracePlot(length(params(t[i])), title="$(t[i])") for i=pidx]
ylabel!(pvalplts[1].plt, "Param Vals")
pgradplts = [TracePlot(length(params(t[i]))) for i=pidx]
ylabel!(pgradplts[1].plt, "Param Grads")

# nnet plots of values and gradients
valinplts = [TracePlot(input_length(t[i]), title="input", yguide="Layer Value") for i=1:1]
valoutplts = [TracePlot(output_length(t[i]), title="$(t[i])", titlepos=:left) for i=1:length(t)]
gradinplts = [TracePlot(input_length(t[i]), yguide="Layer Grad") for i=1:1]
gradoutplts = [TracePlot(output_length(t[i])) for i=1:length(t)]

# loss/accuracy plots
lossplt = TracePlot(title="Test Loss", ylim=(0,Inf))
accuracyplt = TracePlot(title="Accuracy", ylim=(0.6,1))

Add a method for computing the loss and accuracy on a subsample of test data:

function my_test_loss(obj, testdata, totcount = 500)
    totloss = 0.0
    totcorrect = 0
    for (x,y) in eachobs(rand(eachobs(testdata), totcount))
        totloss += transform!(obj,y,x)

        # logistic version:
        # ŷ = output_value(obj.transformation)[1]
        # correct = (ŷ > 0.5 && y > 0.5) || (ŷ <= 0.5 && y < 0.5)

        # softmax version:
        ŷ = output_value(obj.transformation)
        chosen_idx = indmax(ŷ)
        correct = y[chosen_idx] > 0

        totcorrect += correct
    end
    totloss, totcorrect/totcount
end

Our custom trace method which will be called after each minibatch:

tracer = IterFunction((obj, i) -> begin
    n = 100
    mod1(i,n)==n || return false

    # param trace
    for (j,k) in enumerate(pidx)
        add_data(pvalplts[j], i, params(t[k]))
        add_data(pgradplts[j], i, grad(t[k]))
    end

    # input/output trace
    for j=1:length(t)
        if j==1
            add_data(valinplts[j], i, input_value(t[j]))
            add_data(gradinplts[j], i, input_grad(t[j]))
        end
        add_data(valoutplts[j], i, output_value(t[j]))
        add_data(gradoutplts[j], i, output_grad(t[j]))
    end

    # compute approximate test loss and trace it
    if mod1(i,500)==500
        totloss, accuracy = my_test_loss(obj, test, 500)
        add_data(lossplt, i, totloss)
        add_data(accuracyplt, i, accuracy)
    end

    # build a heatmap of the total outgoing weight from each pixel
    pixel_importance = reshape(sum(t[1].params.views[1],1), 28, 28)
    hmplt = heatmap(pixel_importance, ratio=1)

    # build a nested-grid layout for all the trace plots
    plot(
        map(getplt, vcat(
                pvalplts, pgradplts,
                valinplts, valoutplts,
                gradinplts, gradoutplts,
                lossplt, accuracyplt
            ))...,
        hmplt,
        size = (1400,1000),
        layout=@layout([
            grid(2,length(pvalplts))
            grid(2,length(valoutplts)+1)
            grid(1,3){0.2h}
        ])
    )

    # show the plot
    gui()
end)

# trace once before we start learning to see initial values
tracer.f(obj, 0)

Finally, we build our learner and learn! We’ll use the Adadelta method with a learning rate of 0.05. Notice we just added our custom tracer to the list of parameters… we could have added others if we wanted. The make_learner method is just a convenience to optionally construct a MasterLearner with some common sub-learners. In this case we add a MaxIter(50000) sub-learner to stop the optimization after 50000 iterations.

We will train on randomly-sampled minibatches of 5 observations at a time, and update our parameters using the average gradient:

learner = make_learner(
    GradientDescent(5e-2, Adadelta()),
    tracer,
    maxiter = 50000
)
learn!(obj, learner, infinite_batches(train, size=5))

A snapshot after training for 30000 iterations

After a little while we are able to predict ~97% of the test examples correctly. The heatmap (which represents the “importance” of each pixel according to the outgoing weights of our model) depicts the important curves we have learned to distinguish the digits. The performance can be improved, and I might devote future posts to the many ways we could improve our model; however, model performance was not my focus here. Rather, I wanted to highlight the flexibility we have in learning and visualizing machine learning models.

Conclusion

There are many approaches and toolsets for data science. In the future, I hope that the ease of development in Julia convinces people to move their considerable resources away from inefficient languages and towards Julia. I’d like to see Learn.jl become the generic interface for all things learning, similar to how Plots is slowly becoming the center of visualization in Julia.

If you have questions, or want to help out, come chat. For those in the reinforcement learning community, I’ll probably focus my next post on Reinforce, AtariAlgos, and OpenAIGym. I’m open to many types of collaboration. In addition, I can consult and/or advise on many topics in finance and data science. If you think I can help you, or you can help me, please don’t hesitate to get in touch.