#MonthOfJulia Day 34: Networking

Today’s post is a mashup of various things relating to networking with Julia. We’ll have a look at FTP transfers, HTTP requests and using the Twitter API.

Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it.
Linus Torvalds (1996)

Back in the mid-90s Linus Torvalds was a big fan of FTP. I suspect that his sentiments have not changed, although now he’d probably modify that statement with 's/upload/push/;s/ftp/github/'. He might have made it more gender neutral too, but it’s hard to be sure.

FTP

FTP seems a little “old school”, but if you grew up in the 1980s, before scp and sftp came along, then you’ll probably feel (like me) that FTP is an intrinsic part of the internet experience. There are still a lot of anonymous FTP sites in operation. You can find a list here, although it appears to have last been updated in 2003, so some of that information might no longer be valid. We’ll use ftp://speedtest.tele2.net/ for illustrative purposes since it also allows uploads.

First we initiate a connection to the FTP server.

julia> using FTPClient
julia> ftp_init();
julia> ftp = FTP(host = "speedtest.tele2.net", user = "anonymous", pswd = "[email protected]")
Host:      ftp://speedtest.tele2.net/
User:      anonymous
Transfer:  passive mode
Security:  None

Grab a list of files available for download.

julia> readdir(ftp)
18-element Array{ByteString,1}:
 "1000GB.zip"
 "100GB.zip" 
 "100KB.zip" 
 "100MB.zip" 
 "10GB.zip"  
 "10MB.zip"  
 "1GB.zip"   
 "1KB.zip"   
 "1MB.zip"   
 "200MB.zip" 
 "20MB.zip"  
 "2MB.zip"   
 "3MB.zip"   
 "500MB.zip" 
 "50MB.zip"  
 "512KB.zip" 
 "5MB.zip"   
 "upload"

This site (as its name would imply) has the sole purpose of conducting speed tests. So the content of those files is not too interesting. But that’s not going to stop me from downloading one.

julia> binary(ftp)                                 # Change transfer mode to BINARY
julia> download(ftp, "1KB.zip", "local-1KB.zip");
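
Why switch to binary mode first? In ASCII mode the line endings may be rewritten in transit, which is harmless for text but can corrupt an archive. The sketch below is a toy illustration in plain Julia (it does not use FTPClient, and lf_to_crlf is a made-up helper), showing how rewriting each LF byte as CR LF alters a binary payload.

```julia
# Toy stand-in for an ASCII-mode translation: rewrite each LF (0x0a) as CR LF.
# This is an illustration only, not what FTPClient does internally.
function lf_to_crlf(bytes::Vector{UInt8})
    out = UInt8[]
    for b in bytes
        b == 0x0a && push!(out, 0x0d)   # insert a CR before every LF
        push!(out, b)
    end
    return out
end

# The ZIP signature "PK\x03\x04" followed by two arbitrary bytes, one of them 0x0a.
payload = UInt8[0x50, 0x4b, 0x03, 0x04, 0x0a, 0x00]
translated = lf_to_crlf(payload)

println(length(payload), " bytes before, ", length(translated), " bytes after")
```

The payload grows by one byte, so whatever is on the receiving end is no longer the file that was sent.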

Generally anonymous FTP sites do not allow uploads, but this site is an exception. We’ll test that out too.

julia> cd(ftp, "upload")
julia> ascii(ftp)                                  # Change transfer mode to ASCII
julia> upload(ftp, "papersize", open("/etc/papersize"));

Close the connection when you’re done.

julia> close(ftp);
julia> ftp_cleanup()

Okay, I’m over the historical reminiscences now. Onto something more current.

HTTP Clients

There are a few Julia packages implementing HTTP methods. We’ll focus on the Requests package. The package homepage makes use of http://httpbin.org/ to illustrate the various bits of functionality. This is a good choice since it allows essentially all of the functionality in Requests to be exercised. We’ll take a different approach and apply a subset of the functionality to a couple of more realistic scenarios. Specifically we’ll look at the GET and POST requests.

First we’ll use a GET request to retrieve information from Google Books using ISBN to specify a particular book. The get() call below is equivalent to opening this URL in your browser.

julia> r1 = get("https://www.googleapis.com/books/v1/volumes";
                query = {"q" => "isbn:178328479X"});
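
To see what the query argument amounts to, here is a minimal sketch that assembles the equivalent URL by hand. It is written for a current Julia release and uses only Base. The helpers escape_component and build_url are my own illustrative names, not part of Requests, and the exact escaping that Requests applies may differ.

```julia
# Percent-encode every byte except the RFC 3986 "unreserved" characters.
function escape_component(s::AbstractString)
    io = IOBuffer()
    for b in codeunits(s)
        c = Char(b)
        if 'A' <= c <= 'Z' || 'a' <= c <= 'z' || '0' <= c <= '9' || c in "-._~"
            write(io, b)
        else
            print(io, '%', uppercase(string(b, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

# Join key/value pairs into "key=value&key=value" and append them to the base URL.
function build_url(base::AbstractString, query)
    isempty(query) && return String(base)
    parts = ["$(escape_component(k))=$(escape_component(v))" for (k, v) in query]
    return string(base, "?", join(parts, "&"))
end

url = build_url("https://www.googleapis.com/books/v1/volumes", ["q" => "isbn:178328479X"])
println(url)   # https://www.googleapis.com/books/v1/volumes?q=isbn%3A178328479X
```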

We check that everything went well with the request: the status code of 200 indicates that it was successful. The response headers provide some additional metadata.

julia> r1.status
200
julia> r1.headers
Dict{AbstractString,AbstractString} with 18 entries:
  "Alt-Svc"                => "quic=\":443\"; p=\"1\"; ma=604800"
  "Date"                   => "Mon, 12 Oct 2015 06:01:13 GMT"
  "http_minor"             => "1"
  "Keep-Alive"             => "1"
  "status_code"            => "200"
  "Cache-Control"          => "private, max-age=0, must-revalidate, no-transform"
  "Server"                 => "GSE"
  "Expires"                => "Mon, 12 Oct 2015 06:01:13 GMT"
  "ETag"                   => "\"65-LEm5ATkHVhzLpHrk8rG7RWww/xI4TbmPbZwN2eJh_EyxSqn0UHDU\""
  "X-XSS-Protection"       => "1; mode=block"
  "Content-Length"         => "2092"
  "X-Content-Type-Options" => "nosniff"
  "Vary"                   => "X-Origin"
  "http_major"             => "1"
  "Alternate-Protocol"     => "443:quic,p=1"
  "Content-Type"           => "application/json; charset=UTF-8"
  "X-Frame-Options"        => "SAMEORIGIN"
  "Content-Language"       => "en"

The actual content is found in the JSON payload which is stored as an array of unsigned bytes in the data field. We can have a look at the text content of the payload using Requests.text(), but accessing fields in these data is done via Requests.json(). Finding the data you’re actually looking for in the resulting data structure may take a bit of trial and error.

julia> typeof(r1.data)
Array{UInt8,1}
julia> Requests.json(r1)["items"][1]["volumeInfo"]     # Parsed JSON
Dict{AbstractString,Any} with 17 entries:
  "publisher"           => "Packt Publishing"
  "industryIdentifiers" => Any[Dict{AbstractString,Any}("identifier"=>"178328479X","type"=>"ISBN_10"),Dict{AbstractString,Any}("identifier"=>"9781783…
  "language"            => "en"
  "contentVersion"      => "preview-1.0.0"
  "imageLinks"          => Dict{AbstractString,Any}("smallThumbnail"=>"http://books.google.co.za/books/content?id=Rc0drgEACAAJ&printsec=frontcover&im…
  "readingModes"        => Dict{AbstractString,Any}("image"=>false,"text"=>false)
  "printType"           => "BOOK"
  "infoLink"            => "http://books.google.co.za/books?id=Rc0drgEACAAJ&dq=isbn:178328479X&hl=&source=gbs_api"
  "previewLink"         => "http://books.google.co.za/books?id=Rc0drgEACAAJ&dq=isbn:178328479X&hl=&cd=1&source=gbs_api"
  "allowAnonLogging"    => false
  "publishedDate"       => "2015-02-26"
  "canonicalVolumeLink" => "http://books.google.co.za/books/about/Getting_Started_with_Julia_Programming_L.html?hl=&id=Rc0drgEACAAJ"
  "title"               => "Getting Started with Julia Programming Language"
  "categories"          => Any["Computers"]
  "pageCount"           => 214
  "authors"             => Any["Ivo Balbaert"]
  "maturityRating"      => "NOT_MATURE"

We see that the book in question was written by Ivo Balbaert and is entitled “Getting Started with Julia Programming Language”. It was published by Packt Publishing earlier this year. It’s a pretty good book, well worth checking out.
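
The trial and error involved in digging through a parsed payload gets less painful with a small helper. Below is a sketch, assuming a current Julia release and using a hand-built stand-in for the parsed JSON (only three fields reproduced). The function dig is a hypothetical helper of mine, not part of Requests or any JSON package.

```julia
# Hand-built stand-in for the parsed payload; only a few fields are reproduced.
payload = Dict{String,Any}(
    "items" => Any[
        Dict{String,Any}(
            "volumeInfo" => Dict{String,Any}(
                "title" => "Getting Started with Julia Programming Language",
                "authors" => Any["Ivo Balbaert"],
                "pageCount" => 214,
            ),
        ),
    ],
)

# Walk a path of dictionary keys and array indices, returning `default`
# as soon as one step is missing instead of throwing an error.
function dig(data, path...; default = nothing)
    for step in path
        if data isa AbstractDict && haskey(data, step)
            data = data[step]
        elseif data isa AbstractVector && step isa Integer && checkbounds(Bool, data, step)
            data = data[step]
        else
            return default
        end
    end
    return data
end

println(dig(payload, "items", 1, "volumeInfo", "title"))
println(dig(payload, "items", 1, "volumeInfo", "subtitle"; default = "(none)"))
```

The first call prints the title. The second asks for a field that is not present and falls back to the default rather than raising a KeyError.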

If the payload is not JSON then we process the data differently. For example, after using get() to download CSV content from Quandl you’d simply use readtable() from the DataFrames package to produce a data frame.

julia> URL = "https://www.quandl.com/api/v1/datasets/EPI/8.csv";
julia> using DataFrames
julia> population = readtable(IOBuffer(get(URL).data), separator = ',', header = true);
julia> names!(population, [symbol(i) for i in ["Year", "Industrial", "Developing"]]);
julia> head(population)
6x3 DataFrames.DataFrame
| Row | Year         | Industrial | Developing |
|-----|--------------|------------|------------|
| 1   | "2100-01-01" | 1334.79    | 8790.14    |
| 2   | "2099-01-01" | 1333.72    | 8786.27    |
| 3   | "2098-01-01" | 1332.64    | 8782.08    |
| 4   | "2097-01-01" | 1331.54    | 8777.6     |
| 5   | "2096-01-01" | 1330.43    | 8772.83    |
| 6   | "2095-01-01" | 1329.32    | 8767.78    |
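
The IOBuffer() wrapper is what makes this work: it lets a reader treat an in-memory array of bytes as if it were a file. Here is a rough sketch of the same idea without DataFrames, for a current Julia release. The two data rows are copied from the table above, while the header line is an assumption (the real file uses different column names, which is why they were renamed).

```julia
# Stand-in for `get(URL).data`: raw CSV bytes held in memory.
csv_text = "Year,Industrial,Developing\n2100-01-01,1334.79,8790.14\n2099-01-01,1333.72,8786.27\n"
raw = collect(codeunits(csv_text))          # Vector{UInt8}, like the data field

io = IOBuffer(raw)                          # treat the bytes as a readable stream
header = split(readline(io), ',')           # first line holds the column names
rows = [split(line, ',') for line in eachline(io) if !isempty(line)]

# Convert the columns we care about from text to numbers.
years = [parse(Int, first(split(r[1], '-'))) for r in rows]
industrial = [parse(Float64, r[2]) for r in rows]

println(header)
println(years)
println(industrial)
```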

Of course, as we saw on Day 15, if you’re going to access data from Quandl it would make more sense to use the Quandl package.

Those two queries above were submitted using GET. What about POST? We’ll directly access the Twitter public API to see how many times the URL http://julialang.org/ has been included in a tweet.

julia> r3 = post("http://urls.api.twitter.com/1/urls/count.json";
                 query = {"url" => "http://julialang.org/"}, data = "Quite a few times!");
julia> Requests.json(r3)
Dict{AbstractString,Any} with 2 entries:
  "count" => 2639
  "url"   => "http://julialang.org/"

The JSON payload has an element count which indicates that to date that URL has been included in 2639 distinct tweets.

We’ve just seen how to directly access the Twitter API using a POST request. We also know that there is a Quandl package which provides a wrapper around the Quandl API. Not too surprisingly there’s also a wrapper for the Twitter API in the Twitter package. This package greatly simplifies interacting with the Twitter API. No doubt wrappers for other services will follow.

First you need to load the package and authenticate yourself. I’ve got my keys and secrets stored in environment variables, which I retrieve from the global ENV dictionary.

julia> using Twitter
julia> consumer_key = ENV["CONSUMER_KEY"];
julia> consumer_secret = ENV["CONSUMER_SECRET"];
julia> oauth_token = ENV["OAUTH_TOKEN"];
julia> oauth_secret = ENV["OAUTH_SECRET"];
julia> twitterauth(consumer_key, consumer_secret, oauth_token, oauth_secret)
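
Indexing ENV directly throws a rather terse KeyError if a variable is missing. A small wrapper gives a friendlier message. This is only a sketch for a current Julia release. require_env is a hypothetical helper of mine, and the key below is obviously a placeholder.

```julia
# Hypothetical helper: fetch a required setting or fail with a readable message.
# `env` defaults to the process environment but can be any dictionary,
# which makes the function easy to try out without touching real credentials.
function require_env(name::AbstractString; env = ENV)
    value = get(env, name, "")
    isempty(value) && error("environment variable $name is not set")
    return value
end

demo_env = Dict("CONSUMER_KEY" => "not-a-real-key")
println(require_env("CONSUMER_KEY"; env = demo_env))
```

With the default env argument, require_env("CONSUMER_KEY") reads from the real environment in the same way as ENV["CONSUMER_KEY"].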

I’ll take this opportunity to pander to my own vanity, looking at which of my tweets have been retweeted. To make sense of the results, convert them to a DataFrame.

julia> retweets = DataFrame(get_retweets_of_me());
julia> retweets[:, [:created_at, :text]]
20x2 DataFrames.DataFrame
| Row | created_at                       | text                                                                                                              |
|-----|----------------------------------|-------------------------------------------------------------------------------------------------------------------|
| 1   | "Mon Oct 12 21:03:57 +0000 2015" | "Sparkline theory and practice  Edward Tufte http://t.co/THgFkv3ZZS #Statistics @EdwardTufte"                     |
| 2   | "Mon Oct 12 18:33:49 +0000 2015" | "R Developer Fluent in Shiny and ggvis ($100 for ~2 hours gig) http://t.co/sM8JRVOKiA #jobs"                      |
| 3   | "Mon Oct 12 15:31:39 +0000 2015" | "Installing LightTable and Juno on Ubuntu http://t.co/2sbEFR7MXR http://t.co/ZMmQ0QHEZs"                          |
| 4   | "Sun Oct 11 20:05:08 +0000 2015" | "On Forecast Intervals "too Wide to be Useful" http://t.co/pxqrpgkewu #Statistics"                                |
| 5   | "Sun Oct 11 20:04:01 +0000 2015" | "P-value madness: A puzzle about the latest test ban (or dont ask, dont tell) http://t.co/aBSgVYCb3E #Statistics" |
| 6   | "Sat Oct 10 19:04:37 +0000 2015" | "Seasonal adjusment on the fly with X-13ARIMA-SEATS, seasonal and ggplot2 http://t.co/hB9gW8LPn5 #rstats"         |
| 7   | "Sat Oct 10 14:34:04 +0000 2015" | "Doomed to fail:  A pre-registration site for parapsychology http://t.co/NTEfpJim5k #Statistics"                  |
| 8   | "Sat Oct 10 13:34:41 +0000 2015" | "Doomed to fail:  A pre-registration site for parapsychology http://t.co/7NwYJZRsky #Statistics"                  |
| 9   | "Sat Oct 10 08:34:43 +0000 2015" | "Too Much Information Can Ruin Your Presentation http://t.co/RdRp9V6EDd #Presentation #speaking"                  |
| 10  | "Fri Oct 09 20:03:32 +0000 2015" | "Manage The Surge In Unstructured Data http://t.co/fhqfNCNq6O #visualization #infographics"                       |
| 11  | "Fri Oct 09 12:33:50 +0000 2015" | "Julia 0.4 Release Announcement http://t.co/jqaKWflomJ #julialang"                                                |
| 12  | "Fri Oct 09 12:04:22 +0000 2015" | "User-friendly scaling http://t.co/P9rYu38FeD #rstats"                                                            |
| 13  | "Thu Oct 08 16:03:37 +0000 2015" | "#MonthOfJulia Day 31: Regression http://t.co/HBJv5xDHcy #julialang"                                              |
| 14  | "Thu Oct 08 15:33:06 +0000 2015" | "MIT Master's Program To Use MOOCs As 'Admissions Test' http://t.co/OjF8CVYBzW #slashdot"                         |
| 15  | "Thu Oct 08 06:03:36 +0000 2015" | "Announcing: Calls For Speakers For 2016 Conferences http://t.co/HOqzeAJ3Bx #Presentation #speaking"              |
| 16  | "Wed Oct 07 21:05:45 +0000 2015" | "Spark Turns Five Years Old! http://t.co/TislhgsDrz #bigdata"                                                     |
| 17  | "Wed Oct 07 21:03:49 +0000 2015" | "5 Reasons To Learn Hadoop http://t.co/ZdmSdkoJUI #bigdata"                                                       |
| 18  | "Wed Oct 07 16:04:56 +0000 2015" | "#MonthOfJulia Day 30: Clustering http://t.co/dh6AUqSqKe #julialang"                                              |
| 19  | "Wed Oct 07 15:01:04 +0000 2015" | "#MonthOfJulia Day 30: Clustering http://t.co/IEm60jRNYp http://t.co/tn9iZ65L4j"                                  |
| 20  | "Wed Oct 07 00:34:48 +0000 2015" | "What is Hadoop? Great Infographics Explains How it Works http://t.co/36Cm2raL1w #visualization #infographics"    |

You can have a lot of fun playing around with the features in the Twitter API. Trust me.

HTTP Servers

The HttpServer package provides low level functionality for implementing an HTTP server in Julia. The Mux package implements a higher level of abstraction. There are undoubtedly easier ways of serving your HTTP content, but being able to do it from the ground up in Julia is cool if nothing else! Case in point: Sudoku-as-a-Service is hosted using the HttpServer package. The code is available on the project page and serves as an excellent illustration of why you might want to use Julia to serve your content directly.
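
To give a feel for what “from the ground up” means, here is a bare-bones sketch using only the Sockets standard library of a current Julia release. It does not use HttpServer or Mux, and the function names (http_response, serve_once) are illustrative. It answers exactly one request and then shuts down, so treat it as a toy rather than a server.

```julia
using Sockets

# Build a minimal HTTP/1.1 response by hand.
function http_response(body::AbstractString; status = "200 OK")
    return "HTTP/1.1 $status\r\n" *
           "Content-Type: text/plain; charset=utf-8\r\n" *
           "Content-Length: $(sizeof(body))\r\n" *
           "Connection: close\r\n\r\n" *
           body
end

# Accept a single connection on localhost, reply, and shut down.
function serve_once(port::Integer)
    server = listen(ip"127.0.0.1", port)
    try
        sock = accept(server)
        request_line = readline(sock)        # e.g. "GET / HTTP/1.1"
        while !isempty(readline(sock))       # skip the remaining request headers
        end
        write(sock, http_response("You asked for: $request_line\n"))
        close(sock)
    finally
        close(server)
    end
end

print(http_response("Hello from Julia!\n"))
```

Calling serve_once(8000) (the port is an arbitrary choice) blocks until a client connects, for example by pointing a browser at http://127.0.0.1:8000/.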

That’s it for today. I realise that I have already broken through the “month” boundary. I still have a few more topics that I want to cover. It might end up being something more like “A Month and a Week of Julia”.

The post #MonthOfJulia Day 34: Networking appeared first on Exegetic Analytics.

Installing LightTable and Juno on Ubuntu

The recipe below works for Light Table v. 0.7.2 and Julia v. 0.4.0. It might work for other versions too, but these are the ones I can vouch for.

Grab the distribution from the Light Table homepage. Unpack it and move the resulting folder somewhere suitable.

$ tar -zxvf LightTableLinux64.tar.gz 
$ sudo mv LightTable /opt/

Go ahead and fire it up.

$ /opt/LightTable/LightTable

At this stage Light Table is just a generic editor: it doesn’t know anything about Julia or Juno. We’ll need to install a plugin to make that connection. In the Light Table IDE type Ctrl-Space, which will open the Commands dialog. Type show plugin manager into the search field and then click on the resulting entry.

Search for Juno among the list of available plugins and select Install.

Open the Commands dialog again using Ctrl-Space. Type settings into the search field.

Click on the User behaviors entry.

Add the following line to the configuration file:

[:app :lt.objs.langs.julia/julia-path "julia"]

At this point you should start up Julia in a terminal and install the Jewel package.

Pkg.add("Jewel")

I ran into some issues with the configuration file for the Julia plugin, so I replaced the contents of ~/.config/LightTable/plugins/Julia/jl/init.jl with the following:

using Jewel

Jewel.server(map(s -> parse(Int, s), ARGS)..., true)   # parseint() is deprecated in Julia 0.4

That strips out a lot of error checking, but as long as you have a recent installation of Julia and you have installed the Jewel package, you’re all good.

Time to restart Light Table.

$ /opt/LightTable/LightTable

You should find that it starts in Juno mode.

Finally, to make things easier we can define a shell alias for Juno.

$ alias juno='/opt/LightTable/LightTable'
$ juno

Enjoy.

The post Installing LightTable and Juno on Ubuntu appeared first on Exegetic Analytics.

#MonthOfJulia Day 33: Evolutionary Algorithms

This seems like an opportune time to mention that a new stable version of Julia has been released. Read the release announcement for Julia 0.4.

There are two packages implementing evolutionary computation in Julia: GeneticAlgorithms and Evolutionary. Today we’ll focus on the latter. The Evolutionary package already has an extensive range of functionality and is under active development. The documentation is a little sparse but the author is very quick to respond to any questions or issues you might raise.

I used a GA to optimize seating assignments at my wedding reception. 80 guests over 10 tables. Evaluation function was based on keeping people with their dates, putting people with something in common together, and keeping people with extreme opposite views at separate tables. I ran it several times. Each time, I got nine good tables, and one with all the odd balls. In the end, my wife did the seating assignments.
Adrian McCarthy on stackoverflow

Let’s get the package loaded up and then we’ll be ready to begin.

julia> using Evolutionary

We’ll be using a genetic algorithm to solve the knapsack problem. We first need to set up an objective function, which in turn requires data giving the utility and mass of each item we might consider putting in our knapsack. Suppose we have nine potential items with the following characteristics:

julia> utility = [10, 20, 15, 2, 30, 10, 30, 45, 50];
julia> mass = [1, 5, 10, 1, 7, 5, 1, 2, 10];

To get an idea of their relative worth we can look at the utility per unit mass.

julia> utility ./ mass
9-element Array{Float64,1}:
 10.0    
  4.0    
  1.5    
  2.0    
  4.28571
  2.0    
 30.0    
 22.5    
  5.0  

Evidently item 7 has the highest utility/mass ratio, followed by item 8. So these two items are quite likely to be included in an optimal solution.
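
That ranking suggests a simple baseline to compare the genetic algorithm against: a greedy heuristic which packs items in order of decreasing utility/mass ratio. The sketch below is for a current Julia release. greedy_knapsack is my own illustrative function, not part of Evolutionary, and the capacity of 20 is the limit used in this post. Greedy packing is not guaranteed to be optimal in general.

```julia
utility = [10, 20, 15, 2, 30, 10, 30, 45, 50]
mass = [1, 5, 10, 1, 7, 5, 1, 2, 10]

# Greedy heuristic: consider items from highest to lowest utility/mass ratio
# and pack each one that still fits.
function greedy_knapsack(utility, mass, capacity)
    order = sortperm(utility ./ mass, rev = true)
    chosen = Int[]
    load = 0
    for i in order
        if load + mass[i] <= capacity
            push!(chosen, i)
            load += mass[i]
        end
    end
    return sort(chosen), load
end

items, load = greedy_knapsack(utility, mass, 20)
println(items)                  # [1, 2, 4, 7, 8, 9]
println(load)                   # 20
println(sum(utility[items]))    # 157
```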

The objective function is simply the total utility for a set of selected items. We impose a penalty on the total mass of the knapsack by setting the total utility to zero if our knapsack becomes too heavy (the maximum permissible mass is set to 20).

julia> function summass(n::Vector{Bool})
           sum(mass .* n)
       end
summass (generic function with 1 method)
julia> function objective(n::Vector{Bool})
           (summass(n) <= 20) ? sum(utility .* n) : 0
       end
objective (generic function with 1 method)

We’ll give those a whirl just to check that they make sense. Suppose our knapsack holds items 3 and 9.

julia> summass([false, false, true, false, false, false, false, false, true])
20
julia> objective([false, false, true, false, false, false, false, false, true])
65

Looks about right. Note that we want to maximise the objective function (total utility) subject to the mass constraints of the knapsack.

We’re ready to run the genetic algorithm. Note that ga() takes as its first argument a function which it will minimise. We therefore give it the reciprocal of the objective function. An overweight knapsack has an objective of zero, so its reciprocal is Inf, the worst possible cost.

julia> best, invbestfit, generations, tolerance, history = ga(
           x -> 1 / objective(x),               # Function to MINIMISE
           9,                                   # Length of chromosome
           initPopulation = collect(randbool(9)),
           selection = roulette,                # Options: sus
           mutation = inversion,                # Options: insertion, swap2, scramble, shifting
           crossover = singlepoint,             # Options: twopoint, uniform
           mutationRate = 0.2,
           crossoverRate = 0.5,
           ɛ = 0.1,                             # Elitism
           debug = false,
           verbose = false,
           iterations = 200,
           populationSize = 50,
           interim = true);
julia> best
9-element Array{Bool,1}:
  true
  true
 false
  true
 false
 false
  true
  true
  true

The optimal solution consists of items 1, 2, 4, 7, 8 and 9. Note that items 7 and 8 (with the highest utility per unit mass) are included. We can check up on the mass constraint and total utility for the optimal solution.

julia> summass(best)
20
julia> objective(best)
157
julia> 1 / invbestfit
157.0
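
With only nine items there are just 2^9 = 512 possible knapsacks, so we can confirm the result by exhaustive search. The sketch below is for a current Julia release and redefines the data and objective so that it stands alone. brute_force is an illustrative name of mine.

```julia
utility = [10, 20, 15, 2, 30, 10, 30, 45, 50]
mass = [1, 5, 10, 1, 7, 5, 1, 2, 10]

summass(n) = sum(mass .* n)
objective(n) = summass(n) <= 20 ? sum(utility .* n) : 0

# Enumerate every subset: bit i of `code` decides whether item i is packed.
function brute_force(nitems)
    best_value, best_items = -1, Bool[]
    for code in 0:(2^nitems - 1)
        n = [(code >> (i - 1)) & 1 == 1 for i in 1:nitems]
        value = objective(n)
        if value > best_value
            best_value, best_items = value, n
        end
    end
    return best_value, best_items
end

best_value, best_items = brute_force(length(utility))
println(best_value)             # 157
println(findall(best_items))    # [1, 2, 4, 7, 8, 9]
```

So the genetic algorithm did find the global optimum. Exhaustive search doubles in cost with every extra item, though, which is where a heuristic like a GA starts to earn its keep.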

Examining the debug output from ga() is rather illuminating (set the debug and verbose parameters to true). You’ll want to limit the population size and number of iterations when you do this though, otherwise the information deluge can get a little out of hand. The output shows how each member of the population is initialised with the same set of values. The last field on each line is the corresponding value of the objective function.

INIT 1: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 2: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 3: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 4: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 5: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885

Each subsequent iteration dumps output like this:

BEST: [1,2,4,3,5]
MATE 2+4>: Bool[true,true,false,true,true,true,true,false,false] : Bool[true,true,false,true,false,false,true,true,true]
MATE >2+4: Bool[true,true,false,true,true,true,true,true,true] : Bool[true,true,false,true,false,false,true,false,false]
MATE 5+1>: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MATE >5+1: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MUTATED 2>: Bool[true,true,false,true,false,false,true,false,false]
MUTATED >2: Bool[true,false,true,false,false,true,false,true,false]
MUTATED 4>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >4: Bool[true,true,false,false,true,false,true,true,true]
MUTATED 5>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >5: Bool[true,true,false,true,true,true,true,false,false]
ELITE 1=>4: Bool[true,true,false,true,false,false,true,true,true] => Bool[true,true,false,false,true,false,true,true,true]
FIT 1: 0.0
FIT 2: 79.99999999999858
FIT 3: 101.9999999999977
FIT 4: 156.99999999999451
FIT 5: 101.9999999999977
BEST: 0.006369426751592357: Bool[true,true,false,true,false,false,true,true,true], G: 8
BEST: [4,3,5,2,1]

We start with a list of the members from the preceding iteration in order of descending fitness (so member 1 has the highest fitness to start with). MATE records detail crossover interactions between pairs of members. These are followed by MUTATED records which specify which members undergo random mutation. ELITE records show which members are promoted unchanged to the following generation (these will always be selected from the fittest of the previous generation). Next we have the FIT records which give the fitness of each of the members of the new population (after crossover, mutation and elitism have been applied). Here we can see that the new member 1 has violated the total mass constraint and thus has a fitness of zero. Two BEST records follow. The first gives details of the single best member from the new generation. Somewhat disconcertingly the first number in this record is the reciprocal of fitness. The second BEST record again rates the members of the new generation in terms of descending fitness.
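
If the MATE and MUTATED records seem opaque, it might help to see textbook versions of the two operators selected above. The sketch below is for a current Julia release. The function names are mine, and the singlepoint and inversion implementations in Evolutionary may well differ in detail.

```julia
using Random

# Single-point crossover: pick a cut point, then the children swap tails.
function singlepoint_sketch(a::Vector{Bool}, b::Vector{Bool}; rng = Random.default_rng())
    cut = rand(rng, 1:(length(a) - 1))
    child1 = vcat(a[1:cut], b[(cut + 1):end])
    child2 = vcat(b[1:cut], a[(cut + 1):end])
    return child1, child2
end

# Inversion mutation: reverse the genes between two random positions.
function inversion_sketch(x::Vector{Bool}; rng = Random.default_rng())
    i, j = minmax(rand(rng, 1:length(x)), rand(rng, 1:length(x)))
    y = copy(x)
    y[i:j] = reverse(x[i:j])
    return y
end

rng = MersenneTwister(34)      # fixed seed so that runs are repeatable
mum = [true, true, false, true, false, true, true, false, false]
dad = [true, true, false, true, false, false, true, true, true]

child1, child2 = singlepoint_sketch(mum, dad; rng = rng)
mutant = inversion_sketch(child1; rng = rng)

println(child1)
println(child2)
println(mutant)
```

Crossover only shuffles genes between the two parents and inversion only reorders genes within one member, so neither operator changes how many items are selected in total.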

Using the history of interim results generated by ga() I could produce the Plotly visualisation below which shows the average and maximum fitness as a function of generation. It’s easy to see how rapidly the algorithm converges on an optimal solution. Incidentally, I asked the package author to modify the code to return these interim results and he complied with a working solution within hours.

In addition to genetic algorithms, the Evolutionary package also implements two other evolutionary algorithms which I will not pretend to understand. Not even for a moment. However, you might want to check out es() and cmaes() to see how well they work on your problem. For me, that’s an adventure for another day.

Other related projects you should peruse:

This series is drawing to a close. Still a few more things I want to write about (although I have already violated the “Month” constraint). I’ll be back later in the week.

The post #MonthOfJulia Day 33: Evolutionary Algorithms appeared first on Exegetic Analytics.