
Installing LightTable and Juno on Ubuntu

The recipe below works for Light Table v. 0.7.2 and Julia v. 0.4.0. It might work for other versions too, but these are the ones I can vouch for.

Grab the distribution from the Light Table homepage. Unpack it and move the resulting folder somewhere suitable.

$ tar -zxvf LightTableLinux64.tar.gz 
$ sudo mv LightTable /opt/

Go ahead and fire it up.

$ /opt/LightTable/LightTable

At this stage Light Table is just a generic editor: it doesn’t know anything about Julia or Juno. We’ll need to install a plugin to make that connection. In the Light Table IDE type Ctrl-Space, which will open the Commands dialog. Type show plugin manager into the search field and then click on the resulting entry.

[Screenshot: the Light Table plugin manager]

Search for Juno among the list of available plugins and select Install.

[Screenshot: searching for the Juno plugin]

Open the Commands dialog again using Ctrl-Space. Type settings into the search field.

[Screenshot: the Commands dialog showing the settings entries]

Click on the User behaviors entry.

[Screenshot: the User behaviors configuration file]

Add the following line to the configuration file (if the julia executable is not on your PATH, replace "julia" with its full path):

[:app :lt.objs.langs.julia/julia-path "julia"]

At this point you should start up Julia in a terminal and install the Jewel package.

Pkg.add("Jewel")

I ran into some issues with the configuration file for the Julia plugin, so I replaced the contents of ~/.config/LightTable/plugins/Julia/jl/init.jl with the following:

using Jewel

Jewel.server(map(parseint, ARGS)..., true)

That strips out a lot of error checking, but as long as you have a recent installation of Julia and you have installed the Jewel package, you’re all good.

Time to restart Light Table.

$ /opt/LightTable/LightTable

You should find that it starts in Juno mode.

Finally, to make things easier, we can define a shell alias for Juno (add it to your ~/.bashrc if you want it to persist across sessions).

$ alias juno='/opt/LightTable/LightTable'
$ juno

Enjoy.


#MonthOfJulia Day 33: Evolutionary Algorithms


This seems like an opportune time to mention that a new stable version of Julia has been released. Read the release announcement for Julia 0.4.

There are two packages implementing evolutionary computation in Julia: GeneticAlgorithms and Evolutionary. Today we’ll focus on the latter. The Evolutionary package already has an extensive range of functionality and is under active development. The documentation is a little sparse but the author is very quick to respond to any questions or issues you might raise.

I used a GA to optimize seating assignments at my wedding reception. 80 guests over 10 tables. Evaluation function was based on keeping people with their dates, putting people with something in common together, and keeping people with extreme opposite views at separate tables. I ran it several times. Each time, I got nine good tables, and one with all the odd balls. In the end, my wife did the seating assignments.
Adrian McCarthy on stackoverflow

Let’s get the package loaded up and then we’ll be ready to begin.

julia> using Evolutionary

We’ll be using a genetic algorithm to solve the knapsack problem. We first need to set up an objective function, which in turn requires data giving the utility and mass of each item we might consider putting in our knapsack. Suppose we have nine potential items with the following characteristics:

julia> utility = [10, 20, 15, 2, 30, 10, 30, 45, 50];
julia> mass = [1, 5, 10, 1, 7, 5, 1, 2, 10];

To get an idea of their relative worth we can look at the utility per unit mass.

julia> utility ./ mass
9-element Array{Float64,1}:
 10.0    
  4.0    
  1.5    
  2.0    
  4.28571
  2.0    
 30.0    
 22.5    
  5.0  

Evidently item 7 has the highest utility/mass ratio, followed by item 8. So these two items are quite likely to be included in an optimal solution.
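
As a quick check, sortperm() will rank the items by this ratio, with the indices of the most valuable items appearing first.

julia> ranked = sortperm(utility ./ mass, rev = true);    # item indices in decreasing order of utility/mass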

The objective function is simply the total utility for a set of selected items. We impose a penalty on the total mass of the knapsack by setting the total utility to zero if our knapsack becomes too heavy (the maximum permissible mass is set to 20).

julia> function summass(n::Vector{Bool})
           sum(mass .* n)
       end
summass (generic function with 1 method)
julia> function objective(n::Vector{Bool})
           (summass(n) <= 20) ? sum(utility .* n) : 0
       end
objective (generic function with 1 method)

We’ll give those a whirl just to check that they make sense. Suppose our knapsack holds items 3 and 9.

julia> summass([false, false, true, false, false, false, false, false, true])
20
julia> objective([false, false, true, false, false, false, false, false, true])
65

Looks about right. Note that we want to maximise the objective function (total utility) subject to the mass constraints of the knapsack.

We’re ready to run the genetic algorithm. Note that ga() takes as its first argument a function which it will minimise. We therefore give it the reciprocal of the objective function.

julia> best, invbestfit, generations, tolerance, history = ga(
           x -> 1 / objective(x),               # Function to MINIMISE
           9,                                   # Length of chromosome
           initPopulation = collect(randbool(9)),
           selection = roulette,                # Options: sus
           mutation = inversion,                # Options: insertion, swap2, scramble, shifting
           crossover = singlepoint,             # Options: twopoint, uniform
           mutationRate = 0.2,
           crossoverRate = 0.5,
           ɛ = 0.1,                             # Elitism
           debug = false,
           verbose = false,
           iterations = 200,
           populationSize = 50,
           interim = true);
julia> best
9-element Array{Bool,1}:
  true
  true
 false
  true
 false
 false
  true
  true
  true

The optimal solution consists of items 1, 2, 4, 7, 8 and 9. Note that items 7 and 8 (with the highest utility per unit mass) are included. We can check up on the mass constraint and total utility for the optimal solution.

julia> summass(best)
20
julia> objective(best)
157
julia> 1 / invbestfit
157.0

Examining the debug output from ga() is rather illuminating (set the debug and verbose parameters to true). You’ll want to limit the population size and number of iterations when you do this though, otherwise the information deluge can get a little out of hand. The output shows how each member of the population is initialised with the same set of values. The last field on each line is the corresponding value of the objective function.

INIT 1: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 2: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 3: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 4: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 5: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885

Each subsequent iteration dumps output like this:

BEST: [1,2,4,3,5]
MATE 2+4>: Bool[true,true,false,true,true,true,true,false,false] : Bool[true,true,false,true,false,false,true,true,true]
MATE >2+4: Bool[true,true,false,true,true,true,true,true,true] : Bool[true,true,false,true,false,false,true,false,false]
MATE 5+1>: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MATE >5+1: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MUTATED 2>: Bool[true,true,false,true,false,false,true,false,false]
MUTATED >2: Bool[true,false,true,false,false,true,false,true,false]
MUTATED 4>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >4: Bool[true,true,false,false,true,false,true,true,true]
MUTATED 5>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >5: Bool[true,true,false,true,true,true,true,false,false]
ELITE 1=>4: Bool[true,true,false,true,false,false,true,true,true] => Bool[true,true,false,false,true,false,true,true,true]
FIT 1: 0.0
FIT 2: 79.99999999999858
FIT 3: 101.9999999999977
FIT 4: 156.99999999999451
FIT 5: 101.9999999999977
BEST: 0.006369426751592357: Bool[true,true,false,true,false,false,true,true,true], G: 8
BEST: [4,3,5,2,1]

We start with a list of the members from the preceding iteration in order of descending fitness (so member 1 has the highest fitness to start with). MATE records detail crossover interactions between pairs of members. These are followed by MUTATED records which specify which members undergo random mutation. ELITE records show which members are promoted unchanged to the following generation (these will always be selected from the fittest of the previous generation). Next we have the FIT records which give the fitness of each of the members of the new population (after crossover, mutation and elitism have been applied). Here we can see that the new member 1 has violated the total mass constraint and thus has a fitness of zero. Two BEST records follow. The first gives details of the single best member from the new generation. Somewhat disconcertingly the first number in this record is the reciprocal of fitness. The second BEST record again rates the members of the new generation in terms of descending fitness.
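
For example, taking the reciprocal of the leading number in the first BEST record recovers the total utility of the best member:

julia> 1 / 0.006369426751592357
157.0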

Using the history of interim results generated by ga() I could produce the Plotly visualisation below which shows the average and maximum fitness as a function of generation. It’s clear to see how the algorithm rapidly converges on an optimal solution. Incidentally, I asked the package author to modify the code to return these interim results and he complied with a working solution within hours.
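
A minimal sketch of how those two series could be extracted from history is shown below. It assumes that history holds one collection of population fitness values per generation; the exact structure of the interim results may differ in your version of the package.

julia> avgfitness = [mean(g) for g in history];     # average fitness per generation (assumed structure)
julia> maxfitness = [maximum(g) for g in history];  # best fitness per generation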

In addition to genetic algorithms, the Evolutionary package also implements two other evolutionary algorithms which I will not pretend to understand. Not even for a moment. However, you might want to check out es() and cmaes() to see how well they work on your problem. For me, that’s an adventure for another day.

There are a number of other related projects which are also worth perusing.

This series is drawing to a close, but there are still a few more things I want to write about (although I have already violated the “Month” constraint). I’ll be back later in the week.


#MonthOfJulia Day 32: Classification


Yesterday we had a look at Julia’s regression model capabilities. A natural counterpart to these are models which perform classification. We’ll be looking at the GLM and DecisionTree packages. But, before I move on to that, I should mention the MLBase package which provides a load of functionality for data preprocessing, performance evaluation, cross-validation and model tuning.

Logistic Regression

Logistic regression lies on the border between the regression techniques we considered yesterday and the classification techniques we’re looking at today. In effect though it’s really a classification technique. We’ll use some data generated in yesterday’s post to illustrate. Specifically we’ll look at the relationship between the Boolean field valid and the three numeric fields.

julia> head(points)
6x4 DataFrame
| Row | x        | y       | z       | valid |
|-----|----------|---------|---------|-------|
| 1   | 0.867859 | 3.08688 | 6.03142 | false |
| 2   | 9.92178  | 33.4759 | 2.14742 | true  |
| 3   | 8.54372  | 32.2662 | 8.86289 | true  |
| 4   | 9.69646  | 35.5689 | 8.83644 | true  |
| 5   | 4.02686  | 12.4154 | 2.75854 | false |
| 6   | 6.89605  | 27.1884 | 6.10983 | true  |

To further refresh your memory, the plot below shows the relationship between valid and the variables x and y. We’re going to attempt to capture this relationship in our model.

[Figure: the synthetic data from yesterday's post, showing valid as a function of x and y]

Logistic regression is also performed with the glm() function from the GLM package. The call looks very similar to the one used for linear regression, except that the error family is now Binomial() and we’re using a logit link function.

julia> model = glm(valid ~ x + y + z, points, Binomial(), LogitLink())
DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Binomial,LogitLink},
                         DensePredChol{Float64,Cholesky{Float64}}},Float64}:

Coefficients:
              Estimate Std.Error   z value Pr(>|z|)
(Intercept)   -23.1457   3.74348  -6.18295    <1e-9
x            -0.260122  0.269059 -0.966786   0.3337
y              1.36143  0.244123    5.5768    <1e-7
z             0.723107   0.14739   4.90606    <1e-6

According to the model there is a significant relationship between valid and both y and z but not x. Looking at the plot above we can see that x does have an influence on valid (there is a gradual transition from false to true with increasing x), but that this effect is rather “fuzzy”, hence the large p-value. By comparison there is a very clear and abrupt change in valid at y values of around 15. The effect of y is also about twice as strong as that of z. All of this makes sense in light of the way that the data were constructed.
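
As a rough check on how well the model separates the two classes, we can compare its fitted probabilities to the original labels. This is only a sketch: it assumes that predict() on the fitted model returns probabilities on the response scale, and it classifies using a 0.5 threshold.

julia> phat = predict(model);                           # fitted probabilities
julia> mean((phat .> 0.5) .== array(points[:valid]))    # proportion classified correctly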

Decision Trees

Now we’ll look at another classification technique: decision trees. First load the required packages and then grab the iris data.

julia> using MLBase, DecisionTree
julia> using RDatasets, Distributions
julia> iris = dataset("datasets", "iris");
julia> iris[1:5,:]
5x5 DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species  |
|-----|-------------|------------|-------------|------------|----------|
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | "setosa" |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | "setosa" |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | "setosa" |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | "setosa" |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | "setosa" |

We’ll also define a Boolean variable to split the data into training and testing sets.

julia> train = rand(Bernoulli(0.75), nrow(iris)) .== 1;

We split the data into features and labels and then feed those into build_tree(). In this case we are building a classifier to identify whether or not a particular iris is of the versicolor variety.

julia> features = array(iris[:,1:4]);
julia> labels = [n == "versicolor" ? 1 : 0 for n in iris[:Species]];
julia> model = build_tree(labels[train], features[train,:]);

Let’s have a look at the product of our labours.

julia> print_tree(model)
Feature 3, Threshold 3.0
L-> 0 : 36/36
R-> Feature 3, Threshold 4.8
    L-> Feature 4, Threshold 1.7
        L-> 1 : 38/38
        R-> 0 : 1/1
    R-> Feature 3, Threshold 5.1
        L-> Feature 1, Threshold 6.7
            L-> Feature 2, Threshold 3.2
                L-> Feature 4, Threshold 1.8
                    L-> Feature 1, Threshold 6.3
                        L-> 0 : 1/1
                        R-> 1 : 1/1
                    R-> 0 : 5/5
                R-> 1 : 1/1
            R-> 1 : 2/2
        R-> 0 : 29/29

The textual representation of the tree above breaks the decision process down into a series of branches, where the model moves to the left (L) or right (R) branch according to whether the value of a given feature falls below or above a threshold. So, for example, on the third line of the output we move to the left or right depending on whether feature 3 (PetalLength) is less than or greater than 4.8.
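
We can trace a single observation through those branches with apply_tree(). The measurements below are made up purely for illustration: a petal length of 4.2 lies between the 3.0 and 4.8 thresholds and a petal width of 1.5 falls below 1.7, so the tree above should label this observation as versicolor (1).

julia> apply_tree(model, [5.9, 3.0, 4.2, 1.5])    # [SepalLength, SepalWidth, PetalLength, PetalWidth]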

We can then apply the decision tree model to the testing data and see how well it performs using standard metrics.

julia> predictions = apply_tree(model, features[!train,:]);
julia> ROC = roc(labels[!train], convert(Array{Int32,1}, predictions))
ROCNums{Int64}
  p = 8
  n = 28
  tp = 7
  tn = 28
  fp = 0
  fn = 1
julia> precision(ROC)
1.0
julia> recall(ROC)
0.875

A true positive rate of 87.5% and a true negative rate of 100% are not too bad at all!
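
Those rates can be read directly off the counts above (tp/p and tn/n respectively), or computed with the corresponding MLBase accessors, assuming they are available in your version of the package.

julia> true_positive_rate(ROC)    # tp / p = 7 / 8
0.875
julia> true_negative_rate(ROC)    # tn / n = 28 / 28
1.0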

You can find a more extensive introduction to using decision trees with Julia here. The DecisionTree package also implements random forest and boosting models (a quick random forest sketch follows below), and there are a number of other related packages in the ecosystem.
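
Here is a quick sketch of the random forest functionality, reusing the features, labels and train variables from above. The positional arguments to build_forest() are assumed to be the number of features sampled at each split and the number of trees; check the DecisionTree documentation for the exact signature in your version.

julia> forest = build_forest(labels[train], features[train,:], 2, 10);
julia> forest_predictions = apply_forest(forest, features[!train,:]);
julia> mean(forest_predictions .== labels[!train])    # proportion of test cases classified correctly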

All of these are definitely worth checking out if you have the time. My time is up though. Come back soon to hear about what Julia provides for evolutionary programming.
