Author Archives: Julia Computing, Inc.

An Introduction to Machine Learning in Julia

Introduction

Machine learning is now pervasive in every field of inquiry and has lead to
breakthroughs in various fields from medical diagnoses to online advertising.
Practical machine learning is quite computationally intensive, whether it involves
millions of repetitions of simple mathematical methods such as
Euclidian Distance or more
intricate optimizers
or backpropagation algorithms.
Such computationally intensive techniques need a fast and expressive language –
one that enables scientists to write simple, readable code that
performs well. Enter Julia.

In this post, we introduce a simple machine learning algorithm
called K Nearest Neighbors,
and demonstrate certain Julia features that allow for its easy and efficient
implementation. We will demonstrate that the code we write is inherently
generic, and show the use of the same code to run on GPUs via the
ArrayFire package.

Problem statement

We need to come up with a character recognition software that can identify images
of textual characters. The data we’ll be using for this problem are a
set of images from Google Street View.
The algorithm we’ll be using is called k-Nearest Neighbors (kNN), which is a
useful starting point for working on problems of this nature.

Solution

Let us begin by taking a good look at our data. First, we read the data from a set of
CSV files from disk, and plot a histogram that represents the distribution of the training data:

# Read training set 
trainLabels = readtable("$(path)/trainLabels.csv")
counts=by(trainLabels, :Class, nrow);
plot(x = counts[:Class], y=counts[:x1], color = counts[:Class], Theme(key_position = :none), Guide.xlabel("Characters")

Training data distribution

When developing this code within an integrared environment such as IJulia Notebooks, we can explore this data interactively. For example, we can scroll through the images using the
Interact package:

@manipulate for i = 1:size(trainLabels,1)
    load("$(path)/trainResized/$i.Bmp")
end

kNN

The solution of labelling the image with its textual character involves finding the “distances” of each image in the training
set to every other image. Then, the image under consideration is classified
as having the most common label among its k nearest images.
We use Euclidean distance
to measure the proximity of the images:

Let X be a set of m training samples, Y be the labels for each of those samples, and x be
the unknown sample to be classified. We perform the following steps:

  • First, calculate all the distances:
for i = 1 to m
    distance = d(X(i), x)
end

We can also represent this generally in matrix notation:

function get_all_distances(imageI::AbstractArray, x::AbstractArray)
    diff = imageI .- x
    distances = vec(sum(diff .* diff,1))
end
  • Then sort the distances in ascending order and select first k indices
    from the sorted list.
function get_k_nearest_neighbors(x::AbstractArray, imageI::AbstractArray, k::Int = 3, train = true)
    nRows, nCols = size(x)
    distances = get_all_distances(imageI, x)
    sortedNeighbors = sortperm(distances)
    if train
        return kNearestNeighbors = Array(sortedNeighbors[2:k+1])
    else
        return kNearestNeighbors = Array(sortedNeighbors[1:k])  
    end
end
  • Finally, classify x as the most common element among the k indices.
function assign_label(x::AbstractArray, y::AbstractArray{Int64}, imageI::AbstractArray, k, train::Bool)
    kNearestNeighbors = get_k_nearest_neighbors(x, imageI, k, train)
    counts = Dict{Int, Int}()
    highestCount = 0
    mostPopularLabel = 0
    for n in kNearestNeighbors
        labelOfN = y[n]
        if !haskey(counts, labelOfN)
            counts[labelOfN] = 0
        end
        counts[labelOfN] += 1
        if counts[labelOfN] > highestCount
            highestCount = counts[labelOfN]
            mostPopularLabel = labelOfN
        end
     end
    mostPopularLabel
end

The objective is to find the best value of k such that we reach optimal
accuracy.

Training

We use the labeled data to find the optimal value of k. The following code
will output the accuracy for values of k from 3 to 5, for example.

for k  = 3:5
    checks = trues(size(train_data, 2))
    @time for i = 1:size(train_data, 2)
        imageI = train_data[:,i]
        checks[i]  = (assign_label(train_data, trainLabelsInt, imageI, k, true) == trainLabelsInt[i])
    end
    accuracy = length(find(checks)) / length(checks)
    @show accuracy
end

Testing

Now, from our training, we should have identified a value of k that gives us
the best accuracy. Of course, this varies depending on the data. Suppose our runs
tell us the best value of k is 3. It’s time to now test our model. For this,
we can actually use the same code that we used for training, but we needn’t check
for accuracy or loop over values of k.

x = test_data
xT = train_data
yT = trainLabelsInt
k = 3, #say
prediction = zeros(Int,size(x,2))
@time for i = 1:size(x,2)
    imageI = x[:,i]
    prediction[i]  = assign_label(xT, yT, imageI, k, false)
end

Improving Performance

There are a couple of ways we can speed up our code. Notice that each iteration
of the for-loop in the main compute loop is independent. This looks like a great
candidate for parallelisation. So, we can spawn a couple
of Julia processes and execute the for loop in parallel.

Parallel for-loops

The macro @parallel will split the iteration space amongst the available Julia
processes and then perform the computation in each process simultaneously.

Let us first add n processes by using the addprocs function. We
then execute the following for loop in parallel.

addprocs(n)
@time sumValues = @parallel (+) for i in 1:size(xTrain, 2)
    assign_label(xTrain, yTrain, k, i) == yTrain[i, 1]
end

The following scaling plot shows performance variation as you spawn more processes:

GPU Acceleration

The ArrayFire package enables us to run
computations on various general purpose GPUs from within Julia programs.
The AFArray constructor from this package
converts Julia vectors to ArrayFire Arrays, which can be transferred to the GPU.
Converting Arrays to AFArrays allows us to trivially use the same code as above to run our
computation on the GPU.

checks = trues(AFArray{Bool}, size(train_data, 2))
train_data = AFArray(train_data)
trainLabelsInt = AFArray(trainLabelsInt)
for k  = 3:5
    @time for i = 1:size(train_data, 2)
        imageI = train_data[:,i]
        checks[i]  = (assign_label(train_data, trainLabelsInt, imageI, k, true) == trainLabelsInt[i])
    end
    accuracy = length(find(checks)) / length(checks)
    @show accuracy
end

Benchmarking against the serial version shows a significant speedup, which has been achieved without much extra code!

Serial Time ArrayFire Time
34 sec 20 sec

Conclusion

  • Julia allows for easy prototyping and deployment of machine learning models.
  • Parallelization also very easy to implement.
  • Built-in parallelism gives the programmer a variety of options to speed up their code.
  • Generic code can be run on GPUs using the package ArrayFire

Future Work

Congratulations! You now know how to write a simple image recognition model using
kNN. While this represents a good starting point for problems such as these, it is by
no means the state of the art. We shall explore more sophisticated methods, such as deep
neural networks
, in a future blog post.

Julia Enters Top 50 Programming Languages

Julia has taken its place among the top 50 computer programming languages, according to the TIOBE Programming Community Index, an index of the world’s most popular programming languages:

Julia Enters Top 50 for the First Time

It was a matter of time until Julia would hit the top 50. This month it did so. The Julia programming language is meant for numerical computing. It combines functional programming paradigms with high speed. In other words, readable and stable code that performs. Chances are high that Julia will gain even more popularity [during] the next few months.

TIOBE Index for September 2016

About Julia

Julia is the simplest, fastest and most powerful numerical computing language available today. Julia combines the functionality of quantitative environments such as Python, R, MATLAB, SAS, SPSS and Stata, with the speed of production programming languages like Java and C++ to solve big data and analytics problems. Julia delivers dramatic improvements in simplicity, speed, capacity, and productivity for data scientists, algorithmic traders, quants, scientists, and engineers who need to solve massive computational problems quickly and accurately.

Julia offers an unbeatable combination of simplicity and productivity with speed that is thousands of times faster than other mathematical, scientific and statistical computing languages.

Partners and users include: Intel, The Federal Reserve Bank of New York, Lincoln Laboratory (MIT), The Moore Foundation and a number of private sector finance and industry leaders, including several of the world’s leading hedge funds, investment banks, asset managers and insurers.

About the TIOBE Programming Community Index

The TIOBE Programming Community Index is an indicator of the popularity of programming languages. The index is updated once a month. The ratings are based on the number of skilled engineers world-wide, courses and third party vendors. Popular search engines such as Google, Bing, Yahoo!, Wikipedia, Amazon, YouTube and Baidu are used to calculate the ratings. For more information, please visit www.tiobe.com.

GPU Programming in Julia

Scientific problems have been traditionally solved with powerful clusters of homogeneous CPUs connected in a variety of network topologies.
However, the number of supercomputers that employ accelerators has steadily been on the rise.
The latest Top500 list released at SC ‘15 shows that the number of supercomputers that employ accelerators has risen to 109.

Accelerators SC15

Accelerators that are employed in practice are mostly graphics processing units (GPUs), Xeon Phis and FPGAs.
These accelerators take advantage of many-core architectures which can be used to exploit coarse and fine grained parallelism.
However, the traditional problem with using GPUs and other accelerators has been the ease (or lack thereof) of programming them.
To this end, NVIDIA Corporation designed the currently pervasive Compute Unified Device Architecture (CUDA) to allow for a C-like interface for scientific and general purpose programming.
This was a considerable improvement over previous frameworks such as DirectX or OpenGL that required advanced skills in graphics programming.
However, CUDA would still feature low on a productivity curve, with programmers having to fine tune their applications for different devices and algorithms.
In this context, interactive programming on the GPU would provide tremendous benefits to scientists and programmers who not only wish to prototype their applications, but to deploy them with little or no code changes.

Julia on GPUs

Julia offers programmers the ability to code interactively on the GPU.
There are several libraries wrapped in Julia,
giving Julia users access to accelerated BLAS,
FFTs, sparse routines and
solvers, and deep learning.
With a combination of these packages, programmers can interactively develop custom GPU kernels.
One such example is the Conjugate Gradient, which has been benchmarked below:

Conjugate Gradient

However, one might argue that low-level wrapper libraries do not in any manner
increase programmer productivity as they involve working with obscure function interfaces.
In such a case, it would be ideal to have a clean array interface for arrays on the GPU with a
convenient standard library that operates on these arrays. Each operation would in turn be tuned with
regards to the device in question to achieve great performance. The folks over at ArrayFire have put together a high quality open source library to work on scientific problems with GPUs.

ArrayFire.jl

ArrayFire.jl is a set of Julia bindings to the library.
It is designed to mimic the Julia standard library in its versatility and ease of use, providing an easy-yet-powerful
rrray interface that points to locations on GPU memory.

Julia’s multiple dispatch and generic programming capabilities make it possible for users to write natural mathematical code and transparently leverage GPUs for performance.This is done by defining a type AFArray as a subtype of AbstractArray.
AFArray now acts as an interface to an array on device memory. A set of functions are imported from Base Julia and are dispatched across the new AFArray type. Thus users may be able to write code in Julia that runs on the CPU and port it to run on the GPU with very few code changes. In addition to functions that mimic Julia’s standard library, ArrayFire.jl provides powerful functions in image processing and computer vision, amongst others.

Usage

The following examples illustrate high level usage:

using ArrayFire

#Random number generation
a = rand(AFArray{Float64}, 100, 100)
b = randn(AFArray{Float64}, 100, 100)

#Transfer to device from the CPU
host_to_device = AFArray(rand(100,100))

#Transfer back to CPU
device_to_host = Array(host_to_device)

#Basic arithmetic operations
c = sin(a) + 0.5
d = a * 5

#Logical operations
c = a .> b
any_trues = any(c)

#Reduction operations
total_max = maximum(a)
colwise_min = min(a,2)

#Matrix operations
determinant = det(a)
b_positive = abs(b)
product = a * b
dot_product = a .* b
transposer = a'

#Linear Algebra
lu_fact = lu(a)
cholesky_fact = chol(a*a') #Multiplied to create a positive definite matrix
qr_fact = qr(a)
svd_fact = svd(a)

#FFT
fast_fourier = fft(a)

Benchmarks

ArrayFire.jl has also been benchmarked for common operations (Note that Julia’s default RNG and the one that ArrayFire uses are not directly comparable):

general

The benefits of accelerated code can be seen in real world applications.
Consider the following image segmentation demo on satellite footage of the Hurricane Katrina.
Image segmentation is an important step in weather forecasting,
and should be performed on many high definition images on a daily basis. In such a use-case,
interactive GPU programming would allow the applications designer to
leverage powerful graphics processing on the GPU with little or no code changes from his original prototype.
The application used the K-means algorithm which can easily be expressed in Julia
and accelerated by ArrayFire.jl.
It initializes some random clusters
and then reassigns the clusters according to Manhattan distances.

Another interesting example is non-negative matrix factorization, which is often used in linear algebra
and multivariate analysis. It is applied in fields such as computer vision,
document clustering, chemometrics, audio signal processing,
and recommender systems.
The following application implements the Lee-Seung algorithm:

NMF Benchmark

Changing Backends

ArrayFire.jl also has the added advantage that it can switch backends at runtime,
which allows a user to choose the appropriate backend according to hardware availability.

setBackend(AF_BACKEND_CUDA)

Future Work

  • Allowing ArrayFire.jl users to easily interface with other packages in the JuliaGPU ecosystem
    would allow them access to accelerated and possibly more memory-efficient kernels
    (for signal processing or deep learning, for example).

  • Currently, only dense linear algebra is supported. It would be worthwhile to wrap sparse linear algebra
    libraries and interface with them seamlessly.

  • Allow users to interface with packages such as GLVisualize.jl
    for 3D visualizations on the GPU using OpenGL (or Vulkan, its recently released successor.)