juliabloggers.com http://www.juliabloggers.com A Julia Language Blog Aggregator Wed, 23 Aug 2017 08:10:48 +0000 en-US hourly 1 https://wordpress.org/?v=4.8.1 69897842 Local packages in a separate directory in Julia http://www.juliabloggers.com/local-packages-in-a-separate-directory-in-julia/ Wed, 23 Aug 2017 08:10:48 +0000 http://tpapp.github.io/post/julia-local-test/ ]]> By: Julia on Tamás K. Papp's website

Re-posted from: http://tpapp.github.io/post/julia-local-test/

I run Pkg.update() fairly often to stay up to date and benefit from the latest improvements of various packages. I rarely ever pin to a specific package version, but I occasionally checkout master for some packages, especially if I am contributing.
Despite updating regularly, I found that the documentation subtly diverged from what I was experiencing for some packages. After looking into the issue, I learned that I was 2–3 minor versions behind despite updating regularly.

]]>
3790
Conducting analysis on LendingClub data using JuliaDB http://www.juliabloggers.com/conducting-analysis-on-lendingclub-data-using-juliadb/ Tue, 22 Aug 2017 00:00:00 +0000 http://juliacomputing.com/blog/2017/08/22/lendingclub-demo-blog If there is anything years of research on how to streamline data analytics has shown us, it is that working with big data is not cake walk. No matter how one looks at it, it is time consuming and computationally intensive to create, maintain, and build models based upon large datasets.

Introducing JuliaDB

In Julia v0.6, we aim to take another step towards solving this problem with our new package, JuliaDB.

JuliaDB is a high performance, distributed, column-oriented data store providing functionality for both in-memory and out-of-core calculations. Being fully implemented in Julia, JuliaDB allows for ease of integration with data loading, analytics, and visualization packages throughout the Julia language ecosystem. Such seamless integration allows for rapid development of data and compute intensive applications.

This example shall use datasets provided by LendingClub, the world’s largest online marketplace for connecting borrowers and investors. On their website, they provide publicly available, detailed datasets that contain anonymous data regarding all loans that have been issued through their system, including the current loan status and latest payment information.

The analysis conducted below is similar to that performed on the same datasets in this post by the Microsoft R Server Tiger Team for the Azure Data Science VM.

The first step in conducting this analysis is to download the following files from the website: LoanStats2016_Q1.csv, LoanStats2016_Q2.csv, LoanStats2016_Q3.csv, LoanStats2016_Q4.csv, LoanStats2017_Q1.csv, LoanStats3b.csv, LoanStats3c.csv and LoanStats3d.csv. A basic clean-up of the data files is performed by deleting the first and last line of descriptive text from each csv.

Writing the Julia code

Once the file clean-up is done, add the following packages: JuliaDB, TextParse, IndexedTables, NullableArrays, DecisionTree, CoupledFields, Gadfly, Cairo, Fontconfig, Dagger, and Compose, followed by loading the required ones.

# Packages that need to be installed with Julia 0.6
 Pkg.add("JuliaDB")
 Pkg.add("TextParse")
 Pkg.add("IndexedTables")
 Pkg.add("NullableArrays")
 Pkg.add("DecisionTree")
 Pkg.add("CoupledFields")
 Pkg.add("Gadfly")
 Pkg.add("Cairo") # Needed for PNG creation with Gadfly
 Pkg.add("Fontconfig") # Needed for PNG creation with Gadfly
 Pkg.clone("https://github.com/AndyGreenwell/ROC.jl.git")
 Pkg.add("Dagger")
 Pkg.add("Compose")
 Pkg.add("BenchmarkTools")
 using Dagger, Compose
 using ROC, Gadfly
 using DecisionTree, JuliaDB, TextParse, NullableArrays
 import TextParse: Numeric, NAToken, CustomParser, tryparsenext, eatwhitespaces, Quoted, Percentage

Now define a variable that contains a path to the directory containing the data files, and a dictionary that contains the names of all of the columns that are contained in the dataset as keys.

dir = "/home/venkat/LendingClubDemo/files"
const floatparser = Numeric(Float64)
const intparser = Numeric(Int)

t  = Dict("id"                 => Quoted(Int),
            "member_id"                      => Quoted(Int),
            "loan_amnt"                      => Quoted(Nullable{Float64}),
            "funded_amnt"                    => Quoted(Nullable{Float64}),
            "funded_amnt_inv"                => Quoted(Nullable{Float64}),
            "term"                           => Quoted(TextParse.StrRange),
            "int_rate"                       => Quoted(NAToken(Percentage())),
            "delinq_2yrs"                    => Quoted(Nullable{Int}),
            "earliest_cr_line"               => Quoted(TextParse.StrRange),
            "inq_last_6mths"                 => Quoted(Nullable{Int}),
...and so on
           )

Calling the function “loadfiles” from the JuliaDB package parses the data files, and constructs the corresponding table (providing the above dictionary as input helps it construct the table, although it doesn’t necessarily need this input). Since none of the dictionary columns are index columns, JuliaDB will itself create its own implicit index column with each row having a unique integer value, starting with 1.

LS = loadfiles(glob("*.csv", dir), indexcols=[], colparsers=t, escapechar='"')

Once done, we classify some loans as bad loans and others as good loans based upon whether the payment on the loan is late, in default, or has been charged off. We then split the table based upon whether the loans are good or bad.

bad_status = ("Late (16-30 days)","Late (31-120 days)","Default","Charged Off")
# Determine which loans are bad loans
  is_bad = map(status->(status in bad_status),
               getdatacol(LS, :loan_status)) |> collect |> Vector{Bool}
# Split the table into two based on the loan classification
  LStrue = filter(x->x.loan_status in bad_status, LS)
  LSfalse = filter(x->!(x.loan_status in bad_status), LS)

Constructing a relevant model necessitates that we identify which factors are the best in identifying good and bad loans. Over here, the feature selection method that we use is a graphical comparison based upon how each numerical column’s row values are associated with either a good or bad categorization of individual loans. We construct two density plots of the values contained in each numerical column, one for good loans and the other for bad. This process necessitates that we first figure out which columns are numerical. We do that by using the following set of “isnumeric” functions.

# Define a function for determining if a value is numeric, whether or not the
# value is a Nullable.
  isnumeric(::Number) = true
  isnumeric{T<:Number}(::Nullable{T}) = true
  isnumeric(::Any) = false
  isnumeric{T<:Number}(x::Quoted{T}) = true
  isnumeric{T<:Nullable}(x::Quoted{T}) = eltype(T) <: Number

We then map our isnumeric function over each column of the JuliaDB table, construct Gadfly layers for each density plot for the good and bad loans, and then display that collection for feature selection.

# Produce density plots of the numeric columns based on the loan classification
  varnames = map(Symbol, collect(keys(filter((k,v)->(k != "id" && k!="member_id" && isnumeric(v)), t))))
  layers = Vector{Gadfly.Layer}[]

  for s in varnames
      nt = dropnull(collect(getdatacol(LStrue,s)))
      nf = dropnull(collect(getdatacol(LSfalse,s)))
      push!(layers, layer(x = nt, Geom.density, Theme(default_color=colorant"blue")))
      push!(layers, layer(x = nf, Geom.density, Theme(default_color=colorant"red")))
  end

  # Layout the individual plots on a 2D grid
  N = length(varnames)
  M = round(Int,ceil(sqrt(N)))
  cs = Array{Compose.Context}(M,M)
  for i = 1:N
      cs[i] = render(Gadfly.plot(layers[2i-1],layers[2i],
                     Guide.title(string(varnames[i])),
                     Guide.xlabel("value"),Guide.ylabel("density")))
  end
  for i = N+1:M^2
      cs[i] = render(Gadfly.plot(x=[0],y=[0]))
  end
  draw(PNG("featureplot.png",24inch, 24inch), gridstack(cs))

The Gadfly plots would typically look like this:

In order to make sure that our analysis is as close as possible as that conducted by Microsoft, we’ll select the same set of predictor variables that they did:

revol_util, int_rate, mths_since_last_record, annual_inc_joint, dti_joint
total_rec_prncp, all_util

Creating the predictive model

Our predictive model will be created by using the random forest model of the DecisionTree.jl package. There are two steps here — one where we use a large amount of data to construct the model, and two, a smaller set of data to test the model. So we randomly split the data into two parts, one containing 75% of the data points, to be used for training the model, and the other containing the other 25%, to be used to test the model.

# Split the data into 75% training / 25% test
  n = length(LS)
  srand(1)
  p = randperm(n)
  m = round(Int,n*3/4)
  a = sort(p[1:m])
  b = sort(p[m+1:end])
  LStrain = LS[a]
  LStest  = LS[b]
  labels_train = is_bad[a]

The random forest model needs us to create two vectors — one being a vector of labels, and the other being the corresponding feature matrix. For the label vector, we reuse the index vector used above (when extracting the training subset of the original data to extract the corresponding subset of the is_bad label vector). For the construction of the feature matrix, we extract the columns for our selected features from the distributed JuliaDB table, gather those columns to the master process, and finally concatenate the resulting vectors into our feature matrix.

features_train = [revol_util_train int_rate_train mths_since_last_record_train annual_inc_joint_train total_rec_prncp_train all_util_train]

Having done this, we can now call the “build_forest” function from the DecisionTree.jl package.

model = build_forest(labels_train, features_train, 3, 10, 0.8, 6)

Should we want to save our model to reuse at a later time, we can store it to our disk.

f = open("  loanmodel.jls", "w")
serialize(f, model)
close(f)

We can now test our model on the rest of the data. To do this, we will generate predictions in parallel across all workers by mapping the “apply_forest” function onto every row of the JuliaDB dataset.

predictions = collect(map(row->DecisionTree.apply_forest(model, [row.revol_util.value; row.int_rate.value;row.mths_since_last_record.value;row.annual_inc_joint.value;row.dti_joint.value;row.total_rec_prncp.value;row.all_util.value]), LStest)).data

With our set of predictions, we construct a ROC curve using the ROC.jl package and calculate the area under the curve to find a single measure of how predictive our trained model is on the dataset.

# Receiver Operating Characteristics curve
curve = roc(convert(Vector{Float64},predictions), convert(BitArray{1},is_bad[b]))

# An ROC plot in Gadfly with data calculuated using ROC.jl
Gadfly.plot(layer(x = curve.FPR,y = curve.TPR, Geom.line),
       layer(x = linspace(0.0,1.0,101), y = linspace(0.0,1.0,101),
       Geom.point, Theme(default_color=colorant"red")), Guide.title("ROC"),
       Guide.xlabel("False Positive Rate"),Guide.ylabel("True Positive Rate"))

The ROC would look like this.

Area under the curve would be:

# Area Under Curve
AUC(curve)
0.5878135617540067

There. That is how you would create a model that can predictively determine the quality of a loan using JuliaDB.

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/08/22/lendingclub-demo-blog.html

If there is anything years of research on how to streamline data analytics has shown us, it is that working with big data is not cake walk. No matter how one looks at it, it is time consuming and computationally intensive to create, maintain, and build models based upon large datasets.

Introducing JuliaDB

In Julia v0.6, we aim to take another step towards solving this problem with our new package, JuliaDB.

JuliaDB is a high performance, distributed, column-oriented data store providing functionality for both in-memory and out-of-core calculations. Being fully implemented in Julia, JuliaDB allows for ease of integration with data loading, analytics, and visualization packages throughout the Julia language ecosystem. Such seamless integration allows for rapid development of data and compute intensive applications.

This example shall use datasets provided by LendingClub, the world’s largest online marketplace for connecting borrowers and investors. On their website, they provide publicly available, detailed datasets that contain anonymous data regarding all loans that have been issued through their system, including the current loan status and latest payment information.

The analysis conducted below is similar to that performed on the same datasets in this post by the Microsoft R Server Tiger Team for the Azure Data Science VM.

The first step in conducting this analysis is to download the following files from the website: LoanStats2016_Q1.csv, LoanStats2016_Q2.csv, LoanStats2016_Q3.csv, LoanStats2016_Q4.csv, LoanStats2017_Q1.csv, LoanStats3b.csv, LoanStats3c.csv and LoanStats3d.csv. A basic clean-up of the data files is performed by deleting the first and last line of descriptive text from each csv.

Writing the Julia code

Once the file clean-up is done, add the following packages: JuliaDB, TextParse, IndexedTables, NullableArrays, DecisionTree, CoupledFields, Gadfly, Cairo, Fontconfig, Dagger, and Compose, followed by loading the required ones.

# Packages that need to be installed with Julia 0.6
 Pkg.add("JuliaDB")
 Pkg.add("TextParse")
 Pkg.add("IndexedTables")
 Pkg.add("NullableArrays")
 Pkg.add("DecisionTree")
 Pkg.add("CoupledFields")
 Pkg.add("Gadfly")
 Pkg.add("Cairo") # Needed for PNG creation with Gadfly
 Pkg.add("Fontconfig") # Needed for PNG creation with Gadfly
 Pkg.clone("https://github.com/AndyGreenwell/ROC.jl.git")
 Pkg.add("Dagger")
 Pkg.add("Compose")
 Pkg.add("BenchmarkTools")
 using Dagger, Compose
 using ROC, Gadfly
 using DecisionTree, JuliaDB, TextParse, NullableArrays
 import TextParse: Numeric, NAToken, CustomParser, tryparsenext, eatwhitespaces, Quoted, Percentage

Now define a variable that contains a path to the directory containing the data files, and a dictionary that contains the names of all of the columns that are contained in the dataset as keys.

dir = "/home/venkat/LendingClubDemo/files"
const floatparser = Numeric(Float64)
const intparser = Numeric(Int)

t  = Dict("id"                 => Quoted(Int),
            "member_id"                      => Quoted(Int),
            "loan_amnt"                      => Quoted(Nullable{Float64}),
            "funded_amnt"                    => Quoted(Nullable{Float64}),
            "funded_amnt_inv"                => Quoted(Nullable{Float64}),
            "term"                           => Quoted(TextParse.StrRange),
            "int_rate"                       => Quoted(NAToken(Percentage())),
            "delinq_2yrs"                    => Quoted(Nullable{Int}),
            "earliest_cr_line"               => Quoted(TextParse.StrRange),
            "inq_last_6mths"                 => Quoted(Nullable{Int}),
...and so on
           )

Calling the function “loadfiles” from the JuliaDB package parses the data files, and constructs the corresponding table (providing the above dictionary as input helps it construct the table, although it doesn’t necessarily need this input). Since none of the dictionary columns are index columns, JuliaDB will itself create its own implicit index column with each row having a unique integer value, starting with 1.

LS = loadfiles(glob("*.csv", dir), indexcols=[], colparsers=t, escapechar='"')

Once done, we classify some loans as bad loans and others as good loans based upon whether the payment on the loan is late, in default, or has been charged off. We then split the table based upon whether the loans are good or bad.

bad_status = ("Late (16-30 days)","Late (31-120 days)","Default","Charged Off")
# Determine which loans are bad loans
  is_bad = map(status->(status in bad_status),
               getdatacol(LS, :loan_status)) |> collect |> Vector{Bool}
# Split the table into two based on the loan classification
  LStrue = filter(x->x.loan_status in bad_status, LS)
  LSfalse = filter(x->!(x.loan_status in bad_status), LS)

Constructing a relevant model necessitates that we identify which factors are the best in identifying good and bad loans. Over here, the feature selection method that we use is a graphical comparison based upon how each numerical column’s row values are associated with either a good or bad categorization of individual loans. We construct two density plots of the values contained in each numerical column, one for good loans and the other for bad. This process necessitates that we first figure out which columns are numerical. We do that by using the following set of “isnumeric” functions.

# Define a function for determining if a value is numeric, whether or not the
# value is a Nullable.
  isnumeric(::Number) = true
  isnumeric{T<:Number}(::Nullable{T}) = true
  isnumeric(::Any) = false
  isnumeric{T<:Number}(x::Quoted{T}) = true
  isnumeric{T<:Nullable}(x::Quoted{T}) = eltype(T) <: Number

We then map our isnumeric function over each column of the JuliaDB table, construct Gadfly layers for each density plot for the good and bad loans, and then display that collection for feature selection.

# Produce density plots of the numeric columns based on the loan classification
  varnames = map(Symbol, collect(keys(filter((k,v)->(k != "id" && k!="member_id" && isnumeric(v)), t))))
  layers = Vector{Gadfly.Layer}[]

  for s in varnames
      nt = dropnull(collect(getdatacol(LStrue,s)))
      nf = dropnull(collect(getdatacol(LSfalse,s)))
      push!(layers, layer(x = nt, Geom.density, Theme(default_color=colorant"blue")))
      push!(layers, layer(x = nf, Geom.density, Theme(default_color=colorant"red")))
  end

  # Layout the individual plots on a 2D grid
  N = length(varnames)
  M = round(Int,ceil(sqrt(N)))
  cs = Array{Compose.Context}(M,M)
  for i = 1:N
      cs[i] = render(Gadfly.plot(layers[2i-1],layers[2i],
                     Guide.title(string(varnames[i])),
                     Guide.xlabel("value"),Guide.ylabel("density")))
  end
  for i = N+1:M^2
      cs[i] = render(Gadfly.plot(x=[0],y=[0]))
  end
  draw(PNG("featureplot.png",24inch, 24inch), gridstack(cs))

The Gadfly plots would typically look like this:

In order to make sure that our analysis is as close as possible as that conducted by Microsoft, we’ll select the same set of predictor variables that they did:

revol_util, int_rate, mths_since_last_record, annual_inc_joint, dti_joint
total_rec_prncp, all_util

Creating the predictive model

Our predictive model will be created by using the random forest model of the DecisionTree.jl package. There are two steps here — one where we use a large amount of data to construct the model, and two, a smaller set of data to test the model. So we randomly split the data into two parts, one containing 75% of the data points, to be used for training the model, and the other containing the other 25%, to be used to test the model.

# Split the data into 75% training / 25% test
  n = length(LS)
  srand(1)
  p = randperm(n)
  m = round(Int,n*3/4)
  a = sort(p[1:m])
  b = sort(p[m+1:end])
  LStrain = LS[a]
  LStest  = LS[b]
  labels_train = is_bad[a]

The random forest model needs us to create two vectors — one being a vector of labels, and the other being the corresponding feature matrix. For the label vector, we reuse the index vector used above (when extracting the training subset of the original data to extract the corresponding subset of the is_bad label vector). For the construction of the feature matrix, we extract the columns for our selected features from the distributed JuliaDB table, gather those columns to the master process, and finally concatenate the resulting vectors into our feature matrix.

features_train = [revol_util_train int_rate_train mths_since_last_record_train annual_inc_joint_train total_rec_prncp_train all_util_train]

Having done this, we can now call the “build_forest” function from the DecisionTree.jl package.

model = build_forest(labels_train, features_train, 3, 10, 0.8, 6)

Should we want to save our model to reuse at a later time, we can store it to our disk.

f = open("  loanmodel.jls", "w")
serialize(f, model)
close(f)

We can now test our model on the rest of the data. To do this, we will generate predictions in parallel across all workers by mapping the “apply_forest” function onto every row of the JuliaDB dataset.

predictions = collect(map(row->DecisionTree.apply_forest(model, [row.revol_util.value; row.int_rate.value;row.mths_since_last_record.value;row.annual_inc_joint.value;row.dti_joint.value;row.total_rec_prncp.value;row.all_util.value]), LStest)).data

With our set of predictions, we construct a ROC curve using the ROC.jl package and calculate the area under the curve to find a single measure of how predictive our trained model is on the dataset.

# Receiver Operating Characteristics curve
curve = roc(convert(Vector{Float64},predictions), convert(BitArray{1},is_bad[b]))

# An ROC plot in Gadfly with data calculuated using ROC.jl
Gadfly.plot(layer(x = curve.FPR,y = curve.TPR, Geom.line),
       layer(x = linspace(0.0,1.0,101), y = linspace(0.0,1.0,101),
       Geom.point, Theme(default_color=colorant"red")), Guide.title("ROC"),
       Guide.xlabel("False Positive Rate"),Guide.ylabel("True Positive Rate"))

The ROC would look like this.

Area under the curve would be:

# Area Under Curve
AUC(curve)
0.5878135617540067

There. That is how you would create a model that can predictively determine the quality of a loan using JuliaDB.

]]>
3788
FemtoCleaner – A bot to automatically upgrade your Julia syntax http://www.juliabloggers.com/femtocleaner-a-bot-to-automatically-upgrade-your-julia-syntax/ Thu, 17 Aug 2017 00:00:00 +0000 http://juliacomputing.com/blog/2017/08/17/femtocleaner TL;DR: FemtoCleaner is a GitHub bot that upgrades old Julia syntax to new syntax. It has been installed in more than 700 repositories, submitted 100+ pull requests and touched 10000 lines of code since last Friday. Scroll down for instructions, screen shots and pretty plots.

Background

As julia is approaching its 1.0 release, we have been revisiting several key areas of the language. We want to ensure that the 1.0 release is of sufficient quality that it can serve as a stable foundation of the Julia ecosystem for many years to come without requiring breaking changes. In effect, however, prioritizing such breaking changes over ones that can be safely done in a non-breaking fashion after 1.0 means that we are currently making many more breaking changes than we otherwise might. Two particularly disruptive such changes were the syntax changes to type keywords and parametric function syntax, both of which were introduced in 0.6. The old syntax is now deprecated on master will be removed in 1.0. The former change involves changing the type definition keywords from type/immutable to mutable struct/struct, e.g.

immutable RGB{T}
    r::T
    g::T
    b::T
end

becomes

struct RGB{T}
    r::T
    g::T
    b::T
end

The parametric function syntax change is a bit more tricky. In the simplest case, it involves rewriting functions like:

eltype{T}(::Array{T}) = T

to

eltype(::Array{T}) where {T} = T

which is relatively straightforward. However, there are more complicated corner cases involving inner constructors such as:

immutable Wrapper{T}
    data::Vector{T}
    Wrapper{S}(data::Vector{S}) = new(convert(Vector{T}, data))
end

which now has to read

struct Wrapper{T}
    data::Vector{T}
    Wrapper{T}(data::Vector{S}) where {T,S} = new(convert(Vector{T}, data))
end

This last example also shows why this syntax was changed. In prior versions of julia, the braces syntax (F{T} for some F,T) was inconsistent between meaning parameter application and introducing parameters for a method. Julia 0.6 features a significantly more powerful (and correct) type system. At the same time, the F{T} syntax was changed to always mean parameter application (modulo support for parsing the deprecated syntax for backwards compatibility of course), reducing confusion and making it possible to more easily express some of the new method signatures now made possible by the new type system improvements. For further information see also Stefan’s Discourse post and the 0.6 release notes.

Realizing the magnitude of the required change and the growing amount of Julia code that exists in the wild, several julia contributors suggested on Discourse that we attempt to automate these syntax upgrades. Unfortunately, is not simply a of search/replace in a source file. The rewrite can be quite complex and depends on the scope in which it is used. Nevertheless, we set out to build such an automated system, with the following goals in mind:

  • Correctness - Being able to upgrade syntax is not very useful if we have to go in and clean up after the automated process’ mistakes, it probably would have been faster to just do it ourselves in the first place.
  • Style preservation - Many programmers carefully write their code in their own preferred style and we should try hard to preserve such choices whenever possible (otherwise people might not want to use the tool)
  • Convenience - Ideally no setup would be required to use the tool

CSTParser

The first goal, correctness, forces us to use a proper parser for our undertaking, rather than relying on simple find/replace or regular expressions. Unfortunately, while julia’s parser is accessible from within the language and can be used to find these instances of deprecated syntax, it cannot be used for our purposes. This is because it does not support our second goal - style preservation. In going from the input text to the Abstract Syntax Tree, the parser discards a significant amount of semantically-irrelevant information (formatting, comments, distinctions between different syntax forms that are otherwise semantically equivalent). Instead, we need a parser that retains and exposes all of this information. There are several names of this concept, “round-tripable representation”, “Concrete Syntax Tree (CST)” or “Lossless Syntax Tree” being perhaps the most common. Luckily, in the Julia ecosystem we have not one, but two choices for such a parser:

  • JuliaParser.jl - a slightly older translation of the scheme parser from the main julia codebase into Julia, later retrofitted with precise location information.
  • CSTParser.jl - a ground up rewrite of the parser with the explicit goal of writing a high performance, correct, lossless parser, originally for use in the VS Code IDE extension

Ultimately the decision came down to the fact that CSTParser.jl was actively maintained, while JuliaParser.jl had not yet been updated to the new Julia syntax. With a number of small enhancements and additional features I contributed in order to make it useful for this project, CSTParser is now able to parse essentially all publicly available Julia code correctly, while retaining the needed formatting information.

The design of CSTParser.jl is somewhat similar to that of the Roslyn parser (a good overview can be found here). Each leaf node in the AST stores only its total size (but not its absolute position in the file), as well as what part of its contents are semantically significant as opposed to leading or trailing trivia (comments, whitespace, semicolons etc). This is useful for the IDE use case, since it allows efficient reparsing when small changes are made to a file (since a local change does not invalidate any data in a far away node). The resulting tree can be a little awkward to work with, but as we shall see it is easy to work around this for our use case.

Deprecations.jl

The new Deprecations.jl package is the heart of this project. It contains all the logic to rewrite Julia code making use of deprecated syntax constructs. It supports two modes of specifying such rewrites:

  • Using CST matcher templates
  • By working with the raw CST api Independent of the mode, a new rewrite is introduced as such:
      struct OldStructSyntax; end
      register(OldStructSyntax, Deprecation(
          "The type-definition keywords (type, immutable, abstract) where changed in Julia 0.6",
          "julia",
          v"0.6.0",
          v"0.7.0-DEV.198",
          typemax(VersionNumber)
      ))
    

    which gives a description, as well as some version bounds. This is important because we need to make sure to only apply rewrites that are compatible with the package’s declared minimum supported julia version (i.e. we need to make sure not to introduce julia 0.6 syntax to a package that still supports julia 0.5). Each Julia package provides a REQUIRE file specifying it’s supported minimum versions.

Having declared the new rewrite, let’s actually make it work by adding some CST matcher templates to it:

    match(OldStructSyntax,
        "immutable \$name\n\$BODY...\nend",
        "struct\$name\n\$BODY!...\nend",
        format_paramlist
    )
    match(OldStructSyntax,
        "type \$name\n\$BODY...\nend",
        "mutable struct\$name\n\$BODY!...\nend",
        format_paramlist
    )

The way this works is fairly straightforward. For each match call, the first line is the template to match and the second is its replacement. Under the hood, this works by parsing both expressions, pattern matching the resulting template tree against the tree we want to update and then splicing in the replacement tree (with the appropriate parameters taken from the tree we’re matching against). The whole thing is implemented in about 200 lines of code.

In this description I’ve skipped a bit of magic. Simply splicing together a new tree of CST nodes, doesn’t quite work. As mentioned above the CST nodes only know their kind and size and very little else. In particular, they know neither their position in the original buffer, nor what text is at that position. Instead, the replacement tree is made out of different kind of node that retains both pieces of information (which the original buffer is and where in the buffer that node is located). Conceptually this is again similar to Roslyn’s red-green trees. However, there is very little code associated with this abstraction. Most of the functionality is provided by the AbstractTrees.jl package by lifting the tree structure of the underlying CST nodes.

Lastly, there’s a couple of other node types to be found in this “replacement tree” to insert or replace whitespace or other trivia. This is useful for formatting purposes. E.g. the example above, we passed format_paramlist as a formatter. This function runs after the matching and adjusts formatting. To see this consider:

immutable RGB{RT,
              GT,
              BT}
    r::RT
    g::GT
    b::BT
end

Naively, this would end up as

struct RGB{RT,
              GT,
              BT}
    r::RT
    g::GT
    b::BT
end

leaving us with unhappy users. Instead, the formatter turns this into

struct RGB{RT,
           GT,
           BT}
    r::RT
    g::GT
    b::BT
end

by adjusting the leading trivia of the GT and BT nodes (or rather the trailing trivia of their predecessors).

Lastly, while the CST templates shown above are powerful, they are still limited to simple pattern matching. Sometimes we need to perform more complicated kinds of transformation to decide which rewrites to perform. One example is code like:

if VERSION > v"0.5"
    do_this()
else
    do_that()
end

which, depending on the current julia version, executes either one branch or the other. Of course, if the package declares that it requires julia 0.6 at a minimum, the condition is true for any supported julia version, so we can “constant fold” the expression and remove the else branch. Doing so with simple templates is infeasible, since we need to recognize all patterns of the form “comparison of VERSION against some version number” and then compute whether the condition is actually always true (or false) given the declared version bounds. Such transformations are possible using the raw API. Writing such transformations is more complicated (and beyond the scope of this post), but can be very powerful.

FemtoCleaner

Having addressed the first two goals, let’s get to the third goal - convenience. The vast majority of public Julia code is hosted on GitHub, so the natural way to do this is create a GitHub bot that clones a repository, applies the rewrites and submits a pull request to the relevant repository. The simplest way to do would be to clone all the repositories, apply the rewrites, and then programmatically submit pull requests to all of them (the PkgDev.jl packages has a function to automatically submit a pull request against a Julia package). However, this approach falls short for several reasons:

  • It’s very manual. When new features are added, we have to manually perform a new such run. This is also problematic, because in practice it means that these runs have to always be done by the person who knows how the setup works. He’s a very busy guy.
  • It would only catch registered Julia packages. There are a significant number of repositories that use Julia code, but are not registered Julia packages. Of course one could go the other way and submit pull requests to repositories that look like Julia code, but that risks creating a significant number of useless pull request (because of forks, long dead codebases, etc)
  • It wouldn’t work on private packages
  • It doesn’t allow the user to control and interact with the process

A better alternative that addresses all these problems is to create a GitHub bot (also called a GitHub app) to perform these functions. The Julia community is quite familiar with these already. We have the venerable nanosoldier, which performs on-demand performance benchmarking of julia commits, attobot which assists Julia users in registering their packages with METADTA and (perhaps less well known) jlbuild which controls the julia buildbots (which build releases and perform some additional continous integration on secondary platforms).

Joining these now is femtocleaner (phew that took a while to get to - I hope the background above was useful though), which performs exactly this function. Let’s see how it works. First go to https://github.com/apps/femtocleaner and click “Configure”. You’ll be presented with a choice of accounts to install femtocleaner into:

Choosing an account will give you the option to install femtocleaner on either one or all of the repositories in that account:

In this case, I will install femtocleaner in all repositories of the JuliaParallel organization. Without any further ado, femtocleaner will go to work, cloning each repository, applying the rewrites it knows about and then submitting a pull request to each repository where it was able to make a change:

From now on, FemtoCleaner will listen to events on these repositories and submit another pull request whenever these packages decide to drop support for an older julia version, thus allowing the bot remove more deprecated syntax. The bot can also be triggered manually by opening an issue with the title “Run femtocleaner”.

The bot has a few additional features meant to make interacting with it easier. The most used one is the bad bot command, which is used to inform the developers that the bot has made a mistake. It can be triggered by simply creating a “Changes Requested” GitHub PR review, and annotating an incorrect change with the review comment bad bot, like so:

In response the bot will open an issue on its source code repository giving the relevant context and linking back to the PR:

Enabling this functionality right from the the pull request review window has proven very powerful. Rather than requiring the user to leave their current context (reviewing a pull request) and navigate to a different repository to file an issue, everything can be done right there in the pull request review. Lastly, once the rewrite bug has been addressed, the bot will come back, update the pull request and leave a comment to inform the user it did so:

This workflow is also very convenient from the other side. All the issues are in one place (rather than having to monitor activity on all pull requests filed by the bug) and addressing the bug is as simple as fixing the code and pushing it to the source code repository. The bot will automatically, update itself and go back and fix up any pull requests that would now differ as a result of the new code:

Results

The whole project from the first line of code written in support of it until this blog post (which represents its completion) took about three weeks. As part of it, I made a number of changes to CSTParser and its dependencies (which should prove very useful for future parsing endeavors) as well as GitHub.jl (which will hopefully help write more of these kinds of bots to support the Julia community). After some initial testing and an alpha run on JuliaStats on Aug 8 (huge thanks to Alex Arslan for aggreeing to diligently review and try out the process), we announced the public availability of femtocleaner on discourse last friday (Aug 11). Since then, the bot has been installed on 759 repositories (though about 200 of them were ineligible for femtocleaner processing, either because they had missing or malformed REQUIRE files or because they were not actually Julia packages), submitting 132 pull requests that add 8850 lines and delete 9331. Most of these pull requests have been merged:

As people started using femtocleaner, a number of issues were discovered, but developers took advantage of the bad bot mechanism described above to report them and we did our best to address them quickly. The following graph shows the number of open/closed such issues over the time period that femtocleaner has been active:

Alex Arslan’s original testing on Aug 8 is well visible (and took a few days to catch up to), but all known issues have been addressed. Another interesting data point is the distribution of supported julia versions that femtocleaner was installed on. As discussed above, it was primarily written to aid in moving to the new syntax available in julia 0.6, though a few rewrites (such as the generic VERSION comparisons) are also applicable to older versions. The following shows the number of repositories as well as the number of open prs by minimum supported julia version (no pr opened means that the bot found no deprecated syntax):

As expected, packages supporting 0.6.0 got proportionally the most pull requests. However, this just means that femtocleaner will be back for the remaining 0.5.0 packages once they decide to drop support for 0.5.0. We can also look at the number of changed lines by the package’s supported minimum version:

Again the bias of the bot for upgrading 0.6 syntax stands out. It is perhaps interesting to note that most of the 0.6 packages with a small to medium number of changes had already been upgraded manually to the new syntax. Still, the bot was able to find a few changes that were missed in this process and clean them up automatically.

Conclusions

Overall, this work should accelerate the movement of the package ecosystem towards 1.0 by making upgrading code easier. Generally, the package ecosystem lags behind the julia release by a few months as package maintainers upgrade their code bases. We hope this system will help make sure that 1.0 is released with a full set of up-to-date packages, as well as ease the burden on package maintainers, allowing them to spend their time on improving their packages instead of being forced to spend a lot of time performing tedious syntax upgrades. We are very happy with the outcome of this work. There are already almost ten thousand fewer deprecation warnings across the Julia ecosystem and more will be removed automatically once the package developers are ready for it. Additionally, the underlying technology should help with a number of other developer-productivity tools and improvements, such as IDE support, better error messages and the debugger. All code is open source and available on GitHub. You are welcome to contribute, improve the code or build your own GitHub bots.

We would like to thank GitHub for providing a rich enough API to allow this convenient workflow.

Lastly, we thank and acknowledge the Sloan foundation for their continued supported of the Julia ecosystem by providing the funding for this work.

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/08/17/femtocleaner.html

TL;DR: FemtoCleaner is a GitHub bot that upgrades old Julia syntax to new syntax. It has been installed
in more than 700 repositories, submitted 100+ pull requests and touched 10000 lines of code since
last Friday. Scroll down for instructions, screen shots and pretty plots.

Background

As julia is approaching its 1.0 release, we have been revisiting
several key areas of the language. We want to ensure that the 1.0
release is of sufficient quality that it can serve as a stable
foundation of the Julia ecosystem for many years to come without
requiring breaking changes. In effect, however, prioritizing such
breaking changes over ones that can be safely done in a non-breaking
fashion after 1.0 means that we are currently making many more
breaking changes than we otherwise might. Two particularly disruptive
such changes were the syntax changes to type keywords and parametric
function syntax, both of which were introduced in 0.6. The old syntax
is now deprecated on master will be removed in 1.0. The former change
involves changing the type definition keywords from type/immutable to
mutable struct/struct, e.g.

immutable RGB{T}
    r::T
    g::T
    b::T
end

becomes

struct RGB{T}
    r::T
    g::T
    b::T
end

The parametric function syntax change is a bit more tricky.
In the simplest case, it involves rewriting functions like:

eltype{T}(::Array{T}) = T

to

eltype(::Array{T}) where {T} = T

which is relatively straightforward. However, there are more complicated corner cases
involving inner constructors such as:

immutable Wrapper{T}
    data::Vector{T}
    Wrapper{S}(data::Vector{S}) = new(convert(Vector{T}, data))
end

which now has to read

struct Wrapper{T}
    data::Vector{T}
    Wrapper{T}(data::Vector{S}) where {T,S} = new(convert(Vector{T}, data))
end

This last example also shows why this syntax was changed. In prior versions of julia,
the braces syntax (F{T} for some F,T) was inconsistent between meaning parameter application
and introducing parameters for a method. Julia 0.6 features a significantly more powerful (and correct) type
system. At the same time, the F{T} syntax was changed to always mean parameter application
(modulo support for parsing the deprecated syntax for backwards compatibility of course),
reducing confusion and making it possible to more easily express some of the new method signatures
now made possible by the new type system improvements. For further information see also
Stefan’s Discourse post
and the 0.6 release notes.

Realizing the magnitude of the required change and the growing amount of Julia code that exists in the wild,
several julia contributors suggested on Discourse that we attempt to automate these syntax upgrades.
Unfortunately, is not simply a of search/replace in a source file. The rewrite can be quite complex
and depends on the scope in which it is used. Nevertheless, we set out to build such an automated system, with the following goals in mind:

  • Correctness – Being able to upgrade syntax is not very useful if we have to go in and clean up after the automated process’ mistakes,
    it probably would have been faster to just do it ourselves in the first place.
  • Style preservation – Many programmers carefully write their code in their own preferred style and we should try hard to preserve
    such choices whenever possible (otherwise people might not want to use the tool)
  • Convenience – Ideally no setup would be required to use the tool

CSTParser

The first goal, correctness, forces us to use a proper parser for our undertaking, rather than
relying on simple find/replace or regular expressions. Unfortunately, while julia’s parser is
accessible from within the language and can be used to find these instances of deprecated syntax,
it cannot be used for our purposes. This is because it does not support our second goal – style preservation.
In going from the input text to the Abstract Syntax Tree, the parser discards a significant amount
of semantically-irrelevant information (formatting, comments, distinctions between different syntax
forms that are otherwise semantically equivalent). Instead, we need a parser that retains and exposes
all of this information. There are several names of this concept, “round-tripable representation”,
“Concrete Syntax Tree (CST)” or “Lossless Syntax Tree” being perhaps the most common. Luckily,
in the Julia ecosystem we have not one, but two choices for such a parser:

  • JuliaParser.jl – a slightly older translation of the scheme parser from the main julia codebase into Julia,
    later retrofitted with precise location information.
  • CSTParser.jl – a ground up rewrite of the parser with the explicit goal of writing a high performance, correct,
    lossless parser, originally for use in the VS Code IDE extension

Ultimately the decision came down to the fact that CSTParser.jl was actively maintained, while JuliaParser.jl had
not yet been updated to the new Julia syntax. With a number of small enhancements and additional features I
contributed in order to make it useful for this project, CSTParser is now able to parse essentially all publicly
available Julia code correctly, while retaining the needed formatting information.

The design of CSTParser.jl is somewhat similar to that of the Roslyn parser (a good overview can be found here). Each leaf node in the AST stores
only its total size (but not its absolute position in the file), as well as what part of its contents are semantically significant
as opposed to leading or trailing trivia (comments, whitespace, semicolons etc). This is useful for the IDE use case,
since it allows efficient reparsing when small changes are made to a file (since a local change does not invalidate any
data in a far away node). The resulting tree can be a little awkward to work with, but as we shall see it is easy to work
around this for our use case.

Deprecations.jl

The new Deprecations.jl package is the heart of this project. It contains all the
logic to rewrite Julia code making use of deprecated syntax constructs. It supports two modes of specifying such rewrites:

  • Using CST matcher templates
  • By working with the raw CST api
    Independent of the mode, a new rewrite is introduced as such:

      struct OldStructSyntax; end
      register(OldStructSyntax, Deprecation(
          "The type-definition keywords (type, immutable, abstract) where changed in Julia 0.6",
          "julia",
          v"0.6.0",
          v"0.7.0-DEV.198",
          typemax(VersionNumber)
      ))
    

    which gives a description, as well as some version bounds. This is important because we need to make sure
    to only apply rewrites that are compatible with the package’s declared minimum supported julia version
    (i.e. we need to make sure not to introduce julia 0.6 syntax to a package that still supports julia 0.5).
    Each Julia package provides a REQUIRE file specifying it’s supported minimum versions.

Having declared the new rewrite, let’s actually make it work by adding some CST matcher templates to it:

    match(OldStructSyntax,
        "immutable \$name\n\$BODY...\nend",
        "struct\$name\n\$BODY!...\nend",
        format_paramlist
    )
    match(OldStructSyntax,
        "type \$name\n\$BODY...\nend",
        "mutable struct\$name\n\$BODY!...\nend",
        format_paramlist
    )

The way this works is fairly straightforward. For each match call, the first line is the template to
match and the second is its replacement. Under the hood, this works by parsing both expressions, pattern
matching the resulting template tree against the tree we want to update and then splicing in the replacement
tree (with the appropriate parameters taken from the tree we’re matching against). The whole thing is implemented
in about 200 lines of code.

In this description I’ve skipped a bit of magic. Simply splicing together a new tree of CST nodes, doesn’t quite
work. As mentioned above the CST nodes only know their kind and size and very little else. In particular,
they know neither their position in the original buffer, nor what text is at that position. Instead, the replacement
tree is made out of different kind of node that retains both pieces of information (which the original buffer is
and where in the buffer that node is located). Conceptually this is again similar to Roslyn’s red-green trees. However, there
is very little code
associated with this abstraction. Most of the functionality is provided by the AbstractTrees.jl package by lifting the tree structure of the underlying CST nodes.

Lastly, there’s a couple of other node types to be found in this “replacement tree” to insert or replace
whitespace or other trivia. This is useful for formatting purposes. E.g. the example above, we passed format_paramlist
as a formatter. This function runs after the matching and adjusts formatting. To see this consider:

immutable RGB{RT,
              GT,
              BT}
    r::RT
    g::GT
    b::BT
end

Naively, this would end up as

struct RGB{RT,
              GT,
              BT}
    r::RT
    g::GT
    b::BT
end

leaving us with unhappy users. Instead, the formatter turns this into

struct RGB{RT,
           GT,
           BT}
    r::RT
    g::GT
    b::BT
end

by adjusting the leading trivia of the GT and BT nodes (or rather the trailing trivia of their predecessors).

Lastly, while the CST templates shown above are powerful, they are still limited to simple pattern matching.
Sometimes we need to perform more complicated kinds of transformation to decide which rewrites to perform.
One example is code like:

if VERSION > v"0.5"
    do_this()
else
    do_that()
end

which, depending on the current julia version, executes either one branch or the other. Of course, if
the package declares that it requires julia 0.6 at a minimum, the condition is true for any supported
julia version, so we can “constant fold” the expression and remove the else branch. Doing so with simple
templates is infeasible, since we need to recognize all patterns of the form “comparison of VERSION against
some version number” and then compute whether the condition is actually always true (or false) given the declared
version bounds. Such transformations are possible using the raw API. Writing such transformations is more complicated
(and beyond the scope of this post), but can be very powerful.

FemtoCleaner

Having addressed the first two goals, let’s get to the third goal – convenience. The vast majority of public Julia code
is hosted on GitHub, so the natural way to do this is create a GitHub bot that clones a repository, applies the rewrites
and submits a pull request to the relevant repository. The simplest way to do would be to clone all the repositories,
apply the rewrites, and then programmatically submit pull requests to all of them (the PkgDev.jl packages has a function
to automatically submit a pull request against a Julia package). However, this approach falls short for several reasons:

  • It’s very manual. When new features are added, we have to manually perform a new such run. This is also problematic,
    because in practice it means that these runs have to always be done by the person who knows how the setup works. He’s
    a very busy guy.
  • It would only catch registered Julia packages. There are a significant number of repositories that use Julia code,
    but are not registered Julia packages. Of course one could go the other way and submit pull requests to repositories
    that look like Julia code, but that risks creating a significant number of useless pull request (because of forks,
    long dead codebases, etc)
  • It wouldn’t work on private packages
  • It doesn’t allow the user to control and interact with the process

A better alternative that addresses all these problems is to create a GitHub bot (also called a GitHub app) to perform these functions. The
Julia community is quite familiar with these already. We have the venerable nanosoldier, which performs on-demand performance benchmarking of julia commits, attobot which assists Julia users in registering their packages with METADTA and (perhaps less well known) jlbuild which controls the julia buildbots (which build releases and perform some additional continous integration on secondary platforms).

Joining these now is femtocleaner (phew that took a while to get to – I hope the background above was useful though), which performs exactly this function. Let’s see how it works. First go to https://github.com/apps/femtocleaner and click “Configure”. You’ll be presented with a
choice of accounts to install femtocleaner into:

Choosing an account will give you the option to install femtocleaner on either one or all of
the repositories in that account:

In this case, I will install femtocleaner in all repositories of the JuliaParallel organization.
Without any further ado, femtocleaner will go to work, cloning each repository, applying
the rewrites it knows about and then submitting a pull request to each repository where it was
able to make a change:

From now on, FemtoCleaner will listen to events on these repositories and submit another pull
request whenever these packages decide to drop support for an older julia version, thus allowing
the bot remove more deprecated syntax. The bot can also be triggered manually by opening an
issue with the title “Run femtocleaner”.

The bot has a few additional features meant to make interacting with it easier. The most used one
is the bad bot command, which is used to inform the developers that the bot has made a mistake.
It can be triggered by simply creating a “Changes Requested” GitHub PR review, and annotating an incorrect
change with the review comment bad bot, like so:

In response the bot will open an issue on its source code repository giving the relevant context
and linking back to the PR:

Enabling this functionality right from the the pull request review window has proven very powerful.
Rather than requiring the user to leave their current context (reviewing a pull request) and navigate
to a different repository to file an issue, everything can be done right there in the pull request
review. Lastly, once the rewrite bug has been addressed, the bot will come back, update the pull
request and leave a comment to inform the user it did so:

This workflow is also very convenient from the other side. All the issues are in one place (rather
than having to monitor activity on all pull requests filed by the bug) and addressing the bug is
as simple as fixing the code and pushing it to the source code repository. The bot will automatically,
update itself and go back and fix up any pull requests that would now differ as a result of the new code:

Results

The whole project from the first line of code written in support of it until this blog post (which
represents its completion) took about three weeks. As part of it, I made a number of changes
to CSTParser and its dependencies (which should prove very useful for future parsing endeavors) as
well as GitHub.jl (which will hopefully help write more of these kinds of bots to support the Julia
community). After some initial testing and an alpha run on JuliaStats on Aug 8 (huge thanks to Alex Arslan for aggreeing to diligently review and try out the process), we announced the public availability of
femtocleaner on discourse last friday (Aug 11). Since then, the bot has been installed on 759 repositories (though about 200 of them were ineligible for femtocleaner processing, either because they had missing or malformed REQUIRE files or because they were not actually Julia packages), submitting 132 pull requests that add 8850 lines and
delete 9331. Most of these pull requests have been merged:

As people started using femtocleaner, a number of issues were discovered, but developers took advantage
of the bad bot mechanism described above to report them and we did our best to address them quickly.
The following graph shows the number of open/closed such issues over the time period that femtocleaner has
been active:

Alex Arslan’s original testing on Aug 8 is well visible (and took a few days to catch up to), but all known
issues have been addressed. Another interesting data point is the distribution of supported julia versions
that femtocleaner was installed on. As discussed above, it was primarily written to aid in moving to the new
syntax available in julia 0.6, though a few rewrites (such as the generic VERSION comparisons) are also applicable to older versions. The following shows the number of repositories as well as the number of open
prs by minimum supported julia version (no pr opened means that the bot found no deprecated syntax):

As expected, packages supporting 0.6.0 got proportionally the most pull requests. However, this just means that
femtocleaner will be back for the remaining 0.5.0 packages once they decide to drop support for 0.5.0.
We can also look at the number of changed lines by the package’s supported minimum version:

Again the bias of the bot for upgrading 0.6 syntax stands out. It is perhaps interesting to note that
most of the 0.6 packages with a small to medium number of changes had already been upgraded manually to
the new syntax. Still, the bot was able to find a few changes that were missed in this process and clean
them up automatically.

Conclusions

Overall, this work should accelerate the movement of the package ecosystem
towards 1.0 by making upgrading code easier. Generally, the package ecosystem lags
behind the julia release by a few months as package maintainers upgrade their code bases.
We hope this system will help make sure that 1.0 is released with a full set of up-to-date
packages, as well as ease the burden on package maintainers, allowing them to spend their time
on improving their packages instead of being forced to spend a lot of time performing tedious
syntax upgrades. We are very happy with the outcome of this work. There are already almost ten thousand
fewer deprecation warnings across the Julia ecosystem and more will be removed automatically once
the package developers are ready for it. Additionally, the underlying technology should help
with a number of other developer-productivity tools and improvements, such as IDE support, better
error messages and the debugger. All code is open source and available on GitHub.
You are welcome to contribute, improve the code or build your own GitHub bots.

We would like to thank GitHub for providing a rich enough API to allow this convenient workflow.

Lastly, we thank and acknowledge the Sloan foundation for their continued supported of the Julia
ecosystem by providing the funding for this work.

]]>
3780
Using julia -L startupfile.jl, rather than machinefiles for starting workers. http://www.juliabloggers.com/using-julia-l-startupfile-jl-rather-than-machinefiles-for-starting-workers/ Thu, 17 Aug 2017 00:00:00 +0000 http://white.ucc.asn.au/2017/08/17/starting-workers.html If one wants to have full control over the worker process to method to use is addprocs and the -L startupfile.jl commandline arguement when you start julia See the documentation for addprocs.

The simplest way to add processes to the julia worker is to invoke it with julia -p 4. The -p 4 argument says start 4 worker processes, on the local machine. For more control, one uses julia --machinefile ~/machines Where ~/machines is a file listing the hosts. The machinefile is often just a list of hostnames/IP-addresses, but sometimes is more detailed. Julia will connect to each host and start a number of workers on each equal to the number of cores.

Even the most detailed machinefile doesn’t give full control, for example you can not specify the topology, or the location of the julia exectuable.

For full control, one shoud invoke addprocs directly, and to do so, one should use julia -L startupfile.jl

Inkoking julia with julia -L startupfile.jl ... causes julia to exectute startupfile.jl before all other things. It can be invoked to start the REPL, or with a normal script afterwoods. It is thus a good place to do addprocs rather than doing it at the top of the script being run, since this allows the main script to be concerned only with the task.

On to a proper example, let us assume one has 4 servers.

  • host1 with 24 available cores
  • host2 with 12 available cores
  • host3 with 8 available cores

Doing this with machinefile would just be having file:

host1
host2
host3

Assuming the number of workers desired is equal to the number of cores; and you don’t need anything fancy then using julia -m ~/machinefile is fine. If you are working on a supercomputer or a cluster you likely have a method to generate a machinefile – it is the same as is used by MPI etc.

However, let us also say that you want to use the :master_slave topology, so all workers are only allowed to talk to the master process, not to each other. This is common in many clusters.

We can start a number of local workers, using addproc(4).

for host in ["host1", "host2", "host3"]
	addproc(host; topology=:master_slave)
end

There is also an overload for this which takes a vector of hostnames. So this can be done as:

addproc(["host1", "host2", "host3"]; topology=:master_slave)

This method will use the default number of workers, i.e. equal to the core count. It also takes an overload allowing one to specify the number of worker on each process:

addproc([("host1", 24), ("host2", 12), ("host3", 8)]; topology=:master_slave)

Further to this, instread of a hostname, one can provide a line from a machine file.

Thus: one can read every line from a machinefile by: collect(eachline("~/machinefile"))

addproc(collect(eachline("~/machinefile")); topology=:master_slave)

This is useful, if your system is providing you with a machine file.

The short of all this is, using julia -L startup.jl is the most powerful, and generally best way to do anything when it comes to setting up your distributed julia worker processes.

]]>
By: A Technical Blog -- julia

Re-posted from: http://white.ucc.asn.au/2017/08/17/starting-workers.html

If one wants to have full control over the worker process to method to use is addprocs and the -L startupfile.jl commandline arguement when you start julia
See the documentation for addprocs.

The simplest way to add processes to the julia worker is to invoke it with julia -p 4.
The -p 4 argument says start 4 worker processes, on the local machine.
For more control, one uses julia --machinefile ~/machines
Where ~/machines is a file listing the hosts.
The machinefile is often just a list of hostnames/IP-addresses,
but sometimes is more detailed.
Julia will connect to each host and start a number of workers on each equal to the number of cores.

Even the most detailed machinefile doesn’t give full control,
for example you can not specify the topology, or the location of the julia exectuable.

For full control, one shoud invoke addprocs directly,
and to do so, one should use julia -L startupfile.jl

Inkoking julia with julia -L startupfile.jl ...
causes julia to exectute startupfile.jl before all other things.
It can be invoked to start the REPL, or with a normal script afterwoods.
It is thus a good place to do addprocs rather than doing it at the top of the script being run,
since this allows the main script to be concerned only with the task.

On to a proper example,
let us assume one has 4 servers.

  • host1 with 24 available cores
  • host2 with 12 available cores
  • host3 with 8 available cores

Doing this with machinefile would just be having file:

host1
host2
host3

Assuming the number of workers desired is equal to the number of cores;
and you don’t need anything fancy then using julia -m ~/machinefile is fine.
If you are working on a supercomputer or a cluster you likely have a method to generate a machinefile – it is the same as is used by MPI etc.

However, let us also say that you want to use the :master_slave topology,
so all workers are only allowed to talk to the master process,
not to each other.
This is common in many clusters.

We can start a number of local workers, using addproc(4).

for host in ["host1", "host2", "host3"]
	addproc(host; topology=:master_slave)
end

There is also an overload for this which takes a vector of hostnames.
So this can be done as:

addproc(["host1", "host2", "host3"]; topology=:master_slave)

This method will use the default number of workers, i.e. equal to the core count.
It also takes an overload allowing one to specify the number of worker on each process:

addproc([("host1", 24), ("host2", 12), ("host3", 8)]; topology=:master_slave)

Further to this, instread of a hostname,
one can provide a line from a machine file.

Thus:
one can read every line from a machinefile by: collect(eachline("~/machinefile"))

addproc(collect(eachline("~/machinefile")); topology=:master_slave)

This is useful, if your system is providing you with a machine file.

The short of all this is,
using julia -L startup.jl is the most powerful,
and generally best way to do anything when it comes to setting up your distributed julia worker processes.

]]>
3778
Newsletter August 2017 http://www.juliabloggers.com/newsletter-august-2017/ Wed, 16 Aug 2017 00:00:00 +0000 http://juliacomputing.com/blog/2017/08/16/newsletter We wanted to thank all Julia users and well wishers for the support and for being part of the Julia Community, and to give an update on some exciting developments for 2017:

  1. Julia Joins Petaflop Club: Celeste is the first application written in a dynamic high-level language to exceed 1 petaflop per second
  2. JuliaRun: new and vastly improved for deployment and scaling with Julia v0.6
  3. JuliaFin: updated with JuliaDB and Julia v0.6
  4. Julia v0.6 and JuliaPro v0.6.0.1
  5. JuliaCon 2017
  6. Julia Computing Funding and Grant Announcements
  7. Julia and Julia Computing in the News
  8. Julia Case Studies
  9. Contact Us

  1. Julia Joins Petaflop Club: Celeste joins the rarified list of applications to exceed 1 petaflop per second performance, and is the first to do so in a dynamic high-level language. The Celeste research team processed 55 terabytes of visual data and classified 188 million astronomical objects in just 15 minutes, resulting in the first comprehensive catalog of all visible objects from the Sloan Digital Sky Survey. This is one of the largest problems in mathematical optimization ever solved. The Celeste team, which includes researchers from UC Berkeley, Lawrence Berkeley National Laboratory, National Energy Research Supercomputing Center, Intel, Julia Computing and the Julia Lab at MIT, used 9,300 Knights Landing (KNL) nodes on the NERSC Cori Phase II supercomputer to execute 1.3 million threads on 650,000 KNL cores.

  2. JuliaRun allows you to run and deploy Julia applications in production at scale, including parallel and distributed computing on private or public clusters. JuliaRun works seamlessly with AWS and Microsoft Azure, and can be configured to run with any private cloud. You can start a JuliaRun instance today with just a few minutes of setup time. Write to us at info@juliacomputing.com for an evaluation version.

  3. JuliaFin is a suite of Julia packages that simplify the workflow for quantitative finance including storage, retrieval, analysis and action. These include: Miletus, a domain specific language (DSL) for defining financial contracts; JuliaDB, a high performance in-memory database, with best performance time series analytics, in-memory and out-of-core analytics; integration with Bloomberg, Excel and other proprietary systems. Click here for details and to download for evaluation.

  4. Julia v0.6 and JuliaPro v0.6.0.1 were released last month with the following upgrades:

    Highlights of Julia v0.6:

    • New type system capabilities that make Julia even more expressive and accurate
    • Automatic broadcasting and loop fusion for all operators and functions
    • Significantly faster strings
    • Improved inter-task communications using channels
    • Significant improvements in the standard library
    • For more details, please see the release notes of Julia 0.6

    Highlights of JuliaPro v0.6.0.1:

    • Supports Julia v0.6
    • Updated Atom to the latest version (1.18.0), updated all Atom packages
    • Updated all bundled Julia packages to Julia v0.6
    • Added the following packages
    • In the interest of getting this release out in as timely a fashion as possible, this release unfortunately does not include Gallium or MXNet. Updating both of these packages for Julia 0.6 is in progress and they will be included in the next release of JuliaPro.
  5. JuliaCon 2017 was the biggest and most successful JuliaCon yet. It featured more than 300 participants and presenters, including presentations on how Julia is being used for deep learning, quantitative finance, energy, astrophysics, agriculture, medicine and more. Presentation videos are available on YouTube.

  6. Julia Computing Funding and Grant Announcements: Julia Computing has announced completion of our first round of seed funding and a significant grant from the Sloan Foundation which includes dedicated funding to promote diversity in the Julia community.

  7. Julia and Julia Computing in the News: There has been a huge increase in Julia and Julia Computing news mentions so far this year, consistent with significant increases in Julia adoption.

  8. Julia Case Studies: Several exciting new Julia case studies are available on the Julia Computing Website
    including:

    • San Jose Semaphore – When Science Meets Art: High school math teacher Jimmy Waters from Powell, Tennessee used Julia to solve a Silicon Valley cryptography stumper that had gone unsolved for nearly 5 years.
    • Milk Output Optimizer (MOO): Oscar Dawson from the University of Auckland Electric Power Optimization Centre (EPOC) is using Julia to optimize dairy farming.
    • Mapping Global Genetic Diversity: Researchers are using Julia to conduct analysis and mapping of global genetic diversity. Their results have been published in Science.
    • Cancer Genomics: UK scientists used Julia to model tumor growth, informing interpretation of cancer genomes. Their results have been published in Nature Genetics.
  9. Contact Us: Please contact us at info@juliacomputing.com if you wish to:

    • Purchase or obtain license information for Julia products such as JuliaPro, JuliaRun, JuliaFin or JuliaBox.
    • Obtain pricing for Julia consulting projects for your enterprise.
    • Schedule Julia training for your organization.
    • Share information about exciting new Julia case studies or use cases.

About Julia and Julia Computing

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 10x for macroeconomic modeling.

  2. Julia provides unlimited scalability. Julia applications can be deployed on large clusters with a click of a button and can run parallel and distributed computing quickly and easily on tens of thousands of nodes.

  3. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  4. Julia integrates well with existing code and platforms. Users of C, C++, Python, R and other languages can easily integrate their existing code into Julia.

  5. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing. It has advanced libraries that make programming simple and fast and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  6. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/08/16/newsletter.html

We wanted to thank all Julia users and well wishers for the support and for being part of the Julia Community, and to give an update on some exciting developments for 2017:

  1. Julia Joins Petaflop Club: Celeste is the first application written in a dynamic high-level language to exceed 1 petaflop per second
  2. JuliaRun: new and vastly improved for deployment and scaling with Julia v0.6
  3. JuliaFin: updated with JuliaDB and Julia v0.6
  4. Julia v0.6 and JuliaPro v0.6.0.1
  5. JuliaCon 2017
  6. Julia Computing Funding and Grant Announcements
  7. Julia and Julia Computing in the News
  8. Julia Case Studies
  9. Contact Us


  1. Julia Joins Petaflop Club: Celeste joins the rarified list of applications to exceed 1 petaflop per second
    performance,
    and is the first to do so in a dynamic high-level language. The Celeste research team processed 55
    terabytes of visual data and classified 188 million astronomical objects in just 15 minutes, resulting in the
    first comprehensive catalog of all visible objects from the Sloan Digital Sky Survey. This is one of the largest
    problems in mathematical optimization ever solved. The Celeste team, which includes researchers from UC Berkeley,
    Lawrence Berkeley National Laboratory, National Energy Research Supercomputing Center, Intel, Julia Computing and
    the Julia Lab at MIT, used 9,300 Knights Landing (KNL) nodes on the NERSC Cori Phase II supercomputer to execute
    1.3 million threads on 650,000 KNL cores.

  2. JuliaRun allows you to run and deploy Julia applications
    in production at scale, including parallel and distributed computing on private or public clusters. JuliaRun works
    seamlessly with AWS and Microsoft Azure, and can be configured to run with any private cloud. You can start a
    JuliaRun instance today with just a few minutes of setup time. Write to us at info@juliacomputing.com for an
    evaluation version.

  3. JuliaFin is a suite of Julia packages that simplify the
    workflow for quantitative finance including storage, retrieval, analysis and action. These include: Miletus, a
    domain specific language (DSL) for defining financial contracts; JuliaDB, a high performance in-memory
    database, with best performance time series analytics, in-memory and out-of-core analytics; integration with
    Bloomberg, Excel and other proprietary systems. Click here for
    details and to download for evaluation.

  4. Julia v0.6 and JuliaPro v0.6.0.1 were released last month with the following upgrades:

    Highlights of Julia v0.6:

    • New type system capabilities that make Julia even more expressive and accurate
    • Automatic broadcasting and loop fusion for all operators and functions
    • Significantly faster strings
    • Improved inter-task communications using channels
    • Significant improvements in the standard library
    • For more details, please see the release notes of Julia
      0.6

    Highlights of JuliaPro v0.6.0.1:

    • Supports Julia v0.6
    • Updated Atom to the latest version (1.18.0), updated all Atom packages
    • Updated all bundled Julia packages to Julia v0.6
    • Added the following packages
    • In the interest of getting this release out in as timely a fashion as possible, this release unfortunately does
      not include Gallium or MXNet. Updating
      both of these packages for Julia 0.6 is in progress and they will be included in the next release of JuliaPro.
  5. JuliaCon 2017 was the biggest and most successful JuliaCon yet. It featured more than 300 participants and
    presenters, including presentations on how Julia is being used for deep learning, quantitative finance, energy,
    astrophysics, agriculture, medicine and more. Presentation videos are available on YouTube.

  6. Julia Computing Funding and Grant Announcements: Julia Computing has announced completion of our first round of
    seed funding and a significant grant from the
    Sloan Foundation which includes dedicated funding
    to promote diversity in the Julia community.

  7. Julia and Julia Computing in the News: There has been a huge increase in Julia and Julia Computing
    news mentions so far this year, consistent with significant increases in Julia
    adoption.

  8. Julia Case Studies: Several exciting new Julia case studies are available on the Julia Computing Website
    including:

    • San Jose Semaphore – When Science Meets Art:
      High school math teacher Jimmy Waters from Powell, Tennessee used Julia to solve a Silicon Valley cryptography
      stumper that had gone unsolved for nearly 5 years.
    • Milk Output Optimizer (MOO): Oscar Dawson from the University of Auckland Electric Power Optimization Centre (EPOC) is using Julia to optimize dairy farming.
    • Mapping Global Genetic Diversity:
      Researchers are using Julia to conduct analysis and mapping of global genetic diversity. Their results have been
      published in Science.
    • Cancer Genomics: UK scientists used Julia to model tumor
      growth, informing interpretation of cancer genomes. Their results have been published in Nature Genetics.
  9. Contact Us: Please contact us at info@juliacomputing.com if you wish to:

    • Purchase or obtain license information for Julia products such as JuliaPro, JuliaRun, JuliaFin or JuliaBox.
    • Obtain pricing for Julia consulting projects for your enterprise.
    • Schedule Julia training for your organization.
    • Share information about exciting new Julia case studies or use cases.

About Julia and Julia Computing

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic
trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R,
Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed,
capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability
with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10
programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics,
genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One,
Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 10x for macroeconomic modeling.

  2. Julia provides unlimited scalability. Julia applications can be deployed on large clusters with a click of a button and can run parallel and distributed computing quickly and easily on tens of thousands of nodes.

  3. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  4. Julia integrates well with existing code and platforms. Users of C, C++, Python, R and other languages can
    easily integrate their existing code into Julia.

  5. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing. It
    has advanced libraries that make programming simple and fast and dramatically reduce the number of lines of code
    required – in some cases, by 90% or more.

  6. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R
    and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and
    reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and
provide support for businesses and researchers who use Julia.

]]>
3774
Parallelizing Distance Calculations Using A GPU With CUDAnative.jl http://www.juliabloggers.com/parallelizing-distance-calculations-using-a-gpu-with-cudanative-jl/ Mon, 14 Aug 2017 00:00:00 +0000 http://randyzwitch.com/cudanative-jl-julia/ Hacker News discussion: link

Code as Julia Jupyter Notebook

Julia has the reputation as a “fast” language in that it’s possible to write high-performing code. However, what I appreciate most about Julia is not just that the code is fast, but rather that Julia makes high-performance concepts accessible without having to have a deep computer science or compiled language background (neither of which I possess!)

For version 0.6 of Julia, another milestone has been reached in the “accessible” high-performance category: the ability to run Julia code natively on NVIDIA GPUs through the CUDAnative.jl package. While CUDAnative.jl is still very much in its development stages, the package is already far-enough along that within a few hours, as a complete beginner to GPU programming, I was able to see in excess of 20x speedups for my toy example to calculate haversine distance.

Getting Started

The CUDAnative.jl introduction blog post and documentation cover the installation process in-depth, so I won’t repeat the details here. I’m already a regular compile-from-source Julia user and I found the installation process pretty easy on my CUDA-enabled Ubuntu workstation. If you can already do TensorFlow, Keras or other GPU tutorials on your computer, getting CUDAnative.jl to work shouldn’t take more than 10-15 minutes.

Julia CPU Implementation

To get a feel for what sort of speedup I could expect from using a GPU, I wrote a naive implementation of a distance matrix calculation in Julia:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#https://github.com/quinnj/Rosetta-Julia/blob/master/src/Haversine.jl
haversine(lat1::Float32,lon1::Float32,lat2::Float32,lon2::Float32) = 2 * 6372.8 * asin(sqrt(sind((lat2-lat1)/2)^2 + cosd(lat1) * cosd(lat2) * sind((lon2 - lon1)/2)^2))

function pairwise_dist(lat::Vector{Float32}, lon::Vector{Float32})

    #Pre-allocate, since size is known
    n = length(lat)
    result = Array{Float32}(n, n)

    #Brute force fill in each cell, ignore that distance [i,j] = distance [j,i]
    for i in 1:n
        for j in 1:n
            @inbounds result[i, j] = haversine(lat[i], lon[i], lat[j], lon[j])
        end
    end

    return result

end

#Example benchmark call
lat10000 = rand(Float32, 10000) .* 45
lon10000 = rand(Float32, 10000) .* -120
@time native_julia_cellwise = pairwise_dist(lat10000, lon10000)

The above code takes a pair of lat/lon values, then calculates the haversine distance between the two points. This algorithm is naive in that a distance matrix is symmetric (i.e. the distance between A to B is the same from B to A), so I could’ve done half the work by setting result[i,j] and result[j,i] to the same value, but as a measure of work for a benchmark this toy example is fine. Also note that this implementation runs on a single core, no CPU-core-level parallelization has been implemented.

Or to put all that another way: if someone wanted to tackle this problem without thinking very hard, the implementation might look like this.

CUDAnative.jl Implementation

There are two parts to the CUDAnative.jl implementation: the kernel (i.e. the actual calculation) and the boilerplate code for coordinating the writing to/from the CPU and GPU.

Kernel Code

The kernel code has similarities to the CPU implementation, with a few key differences:

  • Method signature is one lat/lon point vs. the lat/lon vectors, rather than a pairwise distance calculation
  • Boilerplate code for thread index on the GPU (0-indexed vs. normal Julia 1-indexing)
  • The trigonometric functions need to be prepended with CUDAnative., to differentiate that the GPU functions aren’t the same as the functions from Base Julia
  • Rather than return an array as part of the function return, we use the out keyword argument to write directly to the GPU memory
1
2
3
4
5
6
7
8
9
10
11
12
13
14
using CUDAnative, CUDAdrv

#Calculate one point vs. all other points simultaneously
function kernel_haversine(latpoint::Float32, lonpoint::Float32, lat::AbstractVector{Float32}, lon::AbstractVector{Float32}, out::AbstractVector{Float32})

    #Thread index
    #Need to do the n-1 dance, since CUDA expects 0 and Julia does 1-indexing
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x

    out[i] =  2 * 6372.8 * CUDAnative.asin(CUDAnative.sqrt(CUDAnative.sind((latpoint-lat[i])/2)^2 + CUDAnative.cosd(lat[i]) * CUDAnative.cosd(latpoint) * CUDAnative.sind((lonpoint - lon[i])/2)^2))

    #Return nothing, since we're writing directly to the out array allocated on GPU
    return nothing
end

Coordination Code

The coordination code is similar to what you might see in a main() function in C or Java, where the kernel is applied to the input data. I am using the dev keyword with the default value of CuDevice(0) to indicate that the code should be run on the first (in my case, only) GPU device.

The remainder of the code has comments on its purpose, primarily:

  • Transfer Julia CPU arrays to GPU arrays (CuArray)
  • Set number of threads/blocks
  • Calculate distance between a point and all other points in the array, write back to CPU
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#validated kernel_haversine/distmat returns same answer as CPU haversine method (not shown)
function distmat(lat::Vector{Float32}, lon::Vector{Float32}; dev::CuDevice=CuDevice(0))

    #Create a context
    ctx = CuContext(dev)

    #Change to objects with CUDA context
    n = length(lat)
    d_lat = CuArray(lat)
    d_lon = CuArray(lon)
    d_out = CuArray(Vector{Float32}(n))

    #Calculate number of calculations, threads, blocks
    len = n
    threads = min(len, 1024)
    blocks = Int(ceil(len/threads))

    #Julia side accumulation of results to relieve GPU memory pressure
    accum = Array{Float32}(n, n)

    # run and time the test
    secs = CUDAdrv.@elapsed begin
        for i in 1:n
            @cuda (blocks, threads) kernel_haversine(lat[i], lon[i], d_lat, d_lon, d_out)
            accum[:, i] = Vector{Float32}(d_out)
        end
    end

    #Clean up context
    destroy!(ctx)

    #Return timing and bring results back to Julia
    return (secs, accum)

end

#Example benchmark call
timing, result = distmat(lat10000, lon10000)
result  native_julia_cellwise #validate results equivalent CPU and GPU

The code is written to process one row of the distance matrix at a time to minimize GPU memory usage. By writing out the results to the CPU after each loop iteration, I have n-1 extra CPU transfers, which is less performant than calculating all the distances first then transferring, but my consumer-grade GPU with 6GB of RAM would run out of GPU memory before completing the calculation otherwise.

Performance

The performance characteristics of the CPU and GPU calculations are below for various sizes of distance matrices. Having not done any GPU calculations before, I was surprised to see how much of a penalty there is writing back and forth to the GPU. As you can see from the navy-blue line, the execution time is fixed for matrices of size 1 to 1000, representing the fixed cost of moving the data from the CPU to the GPU.

Of course, once we get above 1000x1000 matrices, the GPU really starts to shine. Due to the log scale, it’s a bit hard to see the magnitude differences, but at 100000x100000 there is a 23x reduction in execution time (565.008s CPU vs. 24.32s GPU).

What I Learned

There are myriad things I learned from this project, but most important is that GPGPU processing can be accessible for people like myself without a CS background. Julia isn’t the first high-level language to provide CUDA functionality, but the fact that the code is so similar to native Julia makes GPU computing something I can include in my toolbox today.

Over time, I’m sure I’ll get better results as I learn more about CUDA, as CUDAnative.jl continues to smooth out the rough edges, etc. But the fact that as a beginner that I could achieve such large speedups in just an hour or two of coding and sparse CUDAnative.jl documentation bodes well for the future of GPU computing in Julia.

Code as Julia Jupyter Notebook

]]>
By: randyzwitch.com

Re-posted from: http://randyzwitch.com/cudanative-jl-julia/

Hacker News discussion: link

Code as Julia Jupyter Notebook

Julia has the reputation as a “fast” language in that it’s possible to write high-performing code. However, what I appreciate most about Julia is not just that the code is fast, but rather that Julia makes high-performance concepts accessible without having to have a deep computer science or compiled language background (neither of which I possess!)

For version 0.6 of Julia, another milestone has been reached in the “accessible” high-performance category: the ability to run Julia code natively on NVIDIA GPUs through the CUDAnative.jl package. While CUDAnative.jl is still very much in its development stages, the package is already far-enough along that within a few hours, as a complete beginner to GPU programming, I was able to see in excess of 20x speedups for my toy example to calculate haversine distance.

Getting Started

The CUDAnative.jl introduction blog post and documentation cover the installation process in-depth, so I won’t repeat the details here. I’m already a regular compile-from-source Julia user and I found the installation process pretty easy on my CUDA-enabled Ubuntu workstation. If you can already do TensorFlow, Keras or other GPU tutorials on your computer, getting CUDAnative.jl to work shouldn’t take more than 10-15 minutes.

Julia CPU Implementation

To get a feel for what sort of speedup I could expect from using a GPU, I wrote a naive implementation of a distance matrix calculation in Julia:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#https://github.com/quinnj/Rosetta-Julia/blob/master/src/Haversine.jl
haversine(lat1::Float32,lon1::Float32,lat2::Float32,lon2::Float32) = 2 * 6372.8 * asin(sqrt(sind((lat2-lat1)/2)^2 + cosd(lat1) * cosd(lat2) * sind((lon2 - lon1)/2)^2))

function pairwise_dist(lat::Vector{Float32}, lon::Vector{Float32})

    #Pre-allocate, since size is known
    n = length(lat)
    result = Array{Float32}(n, n)

    #Brute force fill in each cell, ignore that distance [i,j] = distance [j,i]
    for i in 1:n
        for j in 1:n
            @inbounds result[i, j] = haversine(lat[i], lon[i], lat[j], lon[j])
        end
    end

    return result

end

#Example benchmark call
lat10000 = rand(Float32, 10000) .* 45
lon10000 = rand(Float32, 10000) .* -120
@time native_julia_cellwise = pairwise_dist(lat10000, lon10000)

The above code takes a pair of lat/lon values, then calculates the haversine distance between the two points. This algorithm is naive in that a distance matrix is symmetric (i.e. the distance between A to B is the same from B to A), so I could’ve done half the work by setting result[i,j] and result[j,i] to the same value, but as a measure of work for a benchmark this toy example is fine. Also note that this implementation runs on a single core, no CPU-core-level parallelization has been implemented.

Or to put all that another way: if someone wanted to tackle this problem without thinking very hard, the implementation might look like this.

CUDAnative.jl Implementation

There are two parts to the CUDAnative.jl implementation: the kernel (i.e. the actual calculation) and the boilerplate code for coordinating the writing to/from the CPU and GPU.

Kernel Code

The kernel code has similarities to the CPU implementation, with a few key differences:

  • Method signature is one lat/lon point vs. the lat/lon vectors, rather than a pairwise distance calculation
  • Boilerplate code for thread index on the GPU (0-indexed vs. normal Julia 1-indexing)
  • The trigonometric functions need to be prepended with CUDAnative., to differentiate that the GPU functions aren’t the same as the functions from Base Julia
  • Rather than return an array as part of the function return, we use the out keyword argument to write directly to the GPU memory
1
2
3
4
5
6
7
8
9
10
11
12
13
14
using CUDAnative, CUDAdrv

#Calculate one point vs. all other points simultaneously
function kernel_haversine(latpoint::Float32, lonpoint::Float32, lat::AbstractVector{Float32}, lon::AbstractVector{Float32}, out::AbstractVector{Float32})

    #Thread index
    #Need to do the n-1 dance, since CUDA expects 0 and Julia does 1-indexing
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x

    out[i] =  2 * 6372.8 * CUDAnative.asin(CUDAnative.sqrt(CUDAnative.sind((latpoint-lat[i])/2)^2 + CUDAnative.cosd(lat[i]) * CUDAnative.cosd(latpoint) * CUDAnative.sind((lonpoint - lon[i])/2)^2))

    #Return nothing, since we're writing directly to the out array allocated on GPU
    return nothing
end

Coordination Code

The coordination code is similar to what you might see in a main() function in C or Java, where the kernel is applied to the input data. I am using the dev keyword with the default value of CuDevice(0) to indicate that the code should be run on the first (in my case, only) GPU device.

The remainder of the code has comments on its purpose, primarily:

  • Transfer Julia CPU arrays to GPU arrays (CuArray)
  • Set number of threads/blocks
  • Calculate distance between a point and all other points in the array, write back to CPU
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#validated kernel_haversine/distmat returns same answer as CPU haversine method (not shown)
function distmat(lat::Vector{Float32}, lon::Vector{Float32}; dev::CuDevice=CuDevice(0))

    #Create a context
    ctx = CuContext(dev)

    #Change to objects with CUDA context
    n = length(lat)
    d_lat = CuArray(lat)
    d_lon = CuArray(lon)
    d_out = CuArray(Vector{Float32}(n))

    #Calculate number of calculations, threads, blocks
    len = n
    threads = min(len, 1024)
    blocks = Int(ceil(len/threads))

    #Julia side accumulation of results to relieve GPU memory pressure
    accum = Array{Float32}(n, n)

    # run and time the test
    secs = CUDAdrv.@elapsed begin
        for i in 1:n
            @cuda (blocks, threads) kernel_haversine(lat[i], lon[i], d_lat, d_lon, d_out)
            accum[:, i] = Vector{Float32}(d_out)
        end
    end

    #Clean up context
    destroy!(ctx)

    #Return timing and bring results back to Julia
    return (secs, accum)

end

#Example benchmark call
timing, result = distmat(lat10000, lon10000)
result  native_julia_cellwise #validate results equivalent CPU and GPU

The code is written to process one row of the distance matrix at a time to minimize GPU memory usage. By writing out the results to the CPU after each loop iteration, I have n-1 extra CPU transfers, which is less performant than calculating all the distances first then transferring, but my consumer-grade GPU with 6GB of RAM would run out of GPU memory before completing the calculation otherwise.

Performance

The performance characteristics of the CPU and GPU calculations are below for various sizes of distance matrices. Having not done any GPU calculations before, I was surprised to see how much of a penalty there is writing back and forth to the GPU. As you can see from the navy-blue line, the execution time is fixed for matrices of size 1 to 1000, representing the fixed cost of moving the data from the CPU to the GPU.

Of course, once we get above 1000×1000 matrices, the GPU really starts to shine. Due to the log scale, it’s a bit hard to see the magnitude differences, but at 100000×100000 there is a 23x reduction in execution time (565.008s CPU vs. 24.32s GPU).

What I Learned

There are myriad things I learned from this project, but most important is that GPGPU processing can be accessible for people like myself without a CS background. Julia isn’t the first high-level language to provide CUDA functionality, but the fact that the code is so similar to native Julia makes GPU computing something I can include in my toolbox today.

Over time, I’m sure I’ll get better results as I learn more about CUDA, as CUDAnative.jl continues to smooth out the rough edges, etc. But the fact that as a beginner that I could achieve such large speedups in just an hour or two of coding and sparse CUDAnative.jl documentation bodes well for the future of GPU computing in Julia.

Code as Julia Jupyter Notebook

]]>
3771
SDIRK Methods http://www.juliabloggers.com/sdirk-methods/ Sun, 13 Aug 2017 01:30:00 +0000 http://juliadiffeq.org/2017/08/13/SDIRK.html This has been a very productive summer! Let me start by saying that a relative newcomer to the JuliaDiffEq team, David Widmann, has been doing some impressive work that has really expanded the internal capabilities of the ordinary and delay differential equation solvers. Much of the code has been streamlined due to his efforts which has helped increase our productivity, along with helping us identify and solve potential areas of floating point inaccuracies. In addition, in this release we are starting to roll out some of the results of the Google Summer of Code projects. Together, there’s some really exciting stuff!

]]>
By: JuliaDiffEq

Re-posted from: http://juliadiffeq.org/2017/08/13/SDIRK.html

This has been a very productive summer! Let me start by saying that a relative
newcomer to the JuliaDiffEq team, David Widmann, has been doing some impressive
work that has really expanded the internal capabilities of the ordinary and
delay differential equation solvers. Much of the code has been streamlined
due to his efforts which has helped increase our productivity, along with helping
us identify and solve potential areas of floating point inaccuracies. In addition,
in this release we are starting to roll out some of the results of the Google
Summer of Code projects. Together, there’s some really exciting stuff!

]]>
3765
Solving the code lock riddle with Julia http://www.juliabloggers.com/solving-the-code-lock-riddle-with-julia/ Sun, 06 Aug 2017 07:49:46 +0000 http://perfectionatic.org/?p=494 ]]> By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=494

I came across a neat math puzzle involving counting the number of unique combinations in a hypothetical lock where digit order does not count. Before you continue, please watch at least the first minute of following video:

The rest of the video describes two related approaches for carrying out the counting. Often when I run into complex counting problems, I like to do a sanity check using brute force computation to make sure I have not missed anything. Julia is fantastic choice for doing such computation. It has C like speed, and with an expressiveness that rivals many other high level languages.

Without further ado, here is the Julia code I used to verify my solution the problem.

  1. function unique_combs(n=4)
  2.     pat_lookup=Dict{String,Bool}()
  3.     for i=0:10^n-1
  4.         d=digits(i,10,n) # The digits on an integer in an array with padding
  5.         ds=d |> sort |> join # putting the digits in a string after sorting
  6.         get(pat_lookup,ds,false) || (pat_lookup[ds]=true)
  7.     end
  8.     println("The number of unique digits is $(length(pat_lookup))")
  9. end

In line 2 we create a dictionary that we will be using to check if the number fits a previously seen pattern. The loop starting in line 3, examines all possible ordered combinations. The digits function in line 4 takes any integer and generate an array of its constituent digits. We generate the unique digit string in line 5 using pipes, by first sorting the integer array of digits and then combining them in a string. In line 6 we check if the pattern of digits was seen before and make use of quick short short-circuit evaluation to avoid an if-then statement.

]]>
3761
Intro to Machine Learning with TensorFlow.jl http://www.juliabloggers.com/intro-to-machine-learning-with-tensorflow-jl/ Wed, 02 Aug 2017 00:00:00 +0000 http://white.ucc.asn.au/2017/08/02/Intro-to-Machine-Learning-with-TensorFlow.jl.html In this blog post, I am going to go through as series of neural network structures. This is intended as a demonstration of the more basic neural net functionality. This blog post serves as an accompanyment to the introduction to machine learning chapter of the short book I am writing ( Currently under the working title “Neural Network Representations for Natural Language Processing”)

I do have an earlier blog covering some similar topics. However, I exect the code in this one to be a lot more sensible, since I am now much more familar with TensorFlow.jl, having now written a significant chunk of it. Also MLDataUtils.jl is in different state to what it was.

Input:

using TensorFlow
using MLDataUtils
using MLDatasets

using ProgressMeter

using Base.Test
using Plots
gr()
using FileIO
using ImageCore

MNIST classifier

This is the most common benchmark for neural network classifiers. MNIST is a collection of hand written digits from 0 to 9. The task is to determine which digit is being shown. With neural networks this is done by flattening the images into vectors, and using one-hot encoded outputs with softmax.

Input:

"""Makes 1 hot, row encoded labels."""
onehot_encode_labels(labels_raw) = convertlabel(LabelEnc.OneOfK, labels_raw, LabelEnc.NativeLabels(collect(0:9)),  LearnBase.ObsDim.First())
"""Convert 3D matrix of row,column,observation to vector,observation"""
flatten_images(img_raw) = squeeze(mapslices(vec, img_raw,1:2),2)




@testset "data prep" begin
    @test onehot_encode_labels([4,1,2,3,0]) == [0 0 0 0 1 0 0 0 0 0
                                  0 1 0 0 0 0 0 0 0 0
                                  0 0 1 0 0 0 0 0 0 0
                                  0 0 0 1 0 0 0 0 0 0
                                  1 0 0 0 0 0 0 0 0 0]
    
    data_b1 = flatten_images(MNIST.traintensor())
    @test size(data_b1) == (28*28, 60_000)
    labels_b1 = onehot_encode_labels(MNIST.trainlabels())
    @test size(labels_b1) == (60_000, 10)
end;

Output:

Test Summary: | Pass  Total
data prep     |    3      3

A visualisation of one of the examples from MNIST. Code is a little complex because of the unflattening, and adding a border.

Input:

const frames_image_res = 30

"Convests a image vector into a framed 2D image"
function one_image(img::Vector)
    ret = zeros((frames_image_res, frames_image_res))
    ret[2:end-1, 2:end-1] = 1-rotl90(reshape(img, (28,28)))
    ret
end

train_images=flatten_images(MNIST.traintensor())
heatmap(one_image(train_images[:,10]))

Output:

10203010203000.10.20.30.40.50.60.70.80.91.0

In this basic example we use a traditional sigmoid feed-forward neural net. It uses just a single wide hidden layer. It works surprisingly well compaired to early benchmarks. This is becuase the layer is very wide compaired to what was possible 30 years ago.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/mnist-basic.png")

Output:

png

Input:

sess = Session(Graph())
@tf begin
    X = placeholder(Float32, shape=[-1, 28*28])
    Y = placeholder(Float32, shape=[-1, 10])

    W1 = get_variable([28*28, 1024], Float32)
    b1 = get_variable([1024], Float32)
    Z1 = nn.sigmoid(X*W1 + b1)

    W2 = get_variable([1024, 10], Float32)
    b2 = get_variable([10], Float32)
    Z2 = Z1*W2 + b2 # Affine layer on its own, to get the unscaled logits
    Y_probs = nn.softmax(Z2)

    losses = nn.softmax_cross_entropy_with_logits(;logits=Z2, labels=Y) #This loss function takes the unscaled logits
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 18:53:18.598588: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.598620: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.598626: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.789486: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-02 18:53:18.789997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.42GiB
2017-08-02 18:53:18.790010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-02 18:53:18.790016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-02 18:53:18.790027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

Train

We use normal minibatch training with Adam. We do use relatively large minibatches, as that gets best performance advantage on GPU, by minimizing memory transfers. A more advanced implementation might do a the batching within Tensorflow, rather than batching outside tensorflow and invoking it via run.

Input:

traindata = (flatten_images(MNIST.traintensor()), onehot_encode_labels(MNIST.trainlabels()))
run(sess, global_variables_initializer())


basic_train_loss = Float64[]
@showprogress for epoch in 1:100
    epoch_loss = Float64[]
    for (batch_x, batch_y) in eachbatch(traindata, 1000, (ObsDim.Last(), ObsDim.First()))
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x', Y=>batch_y))
        push!(epoch_loss, loss_o)
    end
    push!(basic_train_loss, mean(epoch_loss))
    #println("Epoch $epoch: $(train_loss[end])")
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:25

Input:

plot(basic_train_loss, label="training loss")

Output:

2550751000.250.500.751.00training loss

Test

Input:

testdata_x = flatten_images(MNIST.testtensor())
testdata_y = onehot_encode_labels(MNIST.testlabels())

y_probs_o = run(sess, Y_probs, Dict(X=>testdata_x'))
acc = mean(mapslices(indmax, testdata_y, 2) .== mapslices(indmax, y_probs_o, 2) )

println("Error Rate: $((1-acc)*100)%")

Output:

Error Rate: 1.9299999999999984%

Advanced MNIST classifier

Here we will use more advanced TensorFlow features, like indmax, and also a more advanced network.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/mnist-advanced.png")

Output:

png

Input:

sess = Session(Graph())

# Network Definition
begin
    X = placeholder(Float32, shape=[-1, 28*28])
    Y = placeholder(Float32, shape=[-1])
    KeepProb = placeholder(Float32, shape=[])
    
    # Network parameters
    hl_sizes = [512, 512, 512]
    activation_functions = Vector{Function}(size(hl_sizes))
    activation_functions[1:end-1]=z->nn.dropout(nn.relu(z), KeepProb)
    activation_functions[end] = identity #Last function should be idenity as we need the logits

    Zs = [X]
    for (ii,(hlsize, actfun)) in enumerate(zip(hl_sizes, activation_functions))
        Wii = get_variable("W_$ii", [get_shape(Zs[end], 2), hlsize], Float32)
        bii = get_variable("b_$ii", [hlsize], Float32)
        Zii = actfun(Zs[end]*Wii + bii)
        push!(Zs, Zii)
    end
    
    Y_probs = nn.softmax(Zs[end])
    Y_preds = indmax(Y_probs,2)-1 # Minus 1, to offset 1 based indexing

    losses = nn.sparse_softmax_cross_entropy_with_logits(;logits=Zs[end], labels=Y+1) # Plus 1, to offset 1 based indexing 
    #This loss function takes the unscaled logits, and the numerical labels
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 19:27:57.180945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

Train

Input:

traindata_x = flatten_images(MNIST.traintensor())
normer = fit(FeatureNormalizer, traindata_x)
predict!(normer, traindata_x); # perhaps oddly, in current version of MLDataUtils the Normalizer commands to normalize is `predict`

traindata_y = Int.(MNIST.trainlabels());

Input:

run(sess, global_variables_initializer())
adv_train_loss = Float64[]
@showprogress for epoch in 1:100
    epoch_loss = Float64[]
    for (batch_x, batch_y) in eachbatch((traindata_x, traindata_y), 1000, ObsDim.Last())
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x', Y=>batch_y, KeepProb=>0.5f0))
        push!(epoch_loss, loss_o)
    end
    push!(adv_train_loss, mean(epoch_loss))
    #println("Epoch $epoch: $(train_loss[end])")
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:10

Input:

plot([basic_train_loss, adv_train_loss], label=["basic", "advanced"])

Output:

2550751000.51.0basicadvanced

Test

Input:

testdata_x = predict!(normer, flatten_images(MNIST.testtensor()))
testdata_y = Int.(MNIST.testlabels());

y_preds_o = run(sess, Y_preds, Dict(X=>testdata_x', KeepProb=>1.0f0))
acc = mean(testdata_y .== y_preds_o )

println("Error Rate: $((1-acc)*100)%")

Output:

Error Rate: 1.770000000000005%

It can be seen that overall all the extra stuff done in the advanced model did not gain much. The margin is small enough that it can be attributed to in part to luck – repeating it can do better or worse depending on the random initialisations. Classifying MNIST is perhaps too simpler problem for deep techneques to pay off.

Bottle-knecking Autoencoder

An autoencoder is a neural network designed to recreate its inputs. There are many varieties, include RBMs, DBNs, SDAs, mSDAs, VAEs. This is one of the simplest being based on just a feedforward neural network.

The network narrows into to a very small central layer – in this case just 2 neurons, before exampanding back to the full size. It is sometimes called a Hour-glass, or Wine-glass autoencoder to describe this shape.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/autoencoder.png")

Output:

png

Input:

sess = Session(Graph())

# Network Definition

begin
    X = placeholder(Float32, shape=[-1, 28*28])
    
    # Network parameters
    hl_sizes = [512, 128, 64, 2, 64, 128, 512, 28*28]
    activation_functions = Vector{Function}(size(hl_sizes))
    activation_functions[1:end-1] = x -> 0.01x + nn.relu6(x)
        # Neither sigmoid, nor relu work anywhere near as well here
        # relu6 works sometimes, but the hidden neurons die too often
        # So we define a leaky ReLU6 as above
    activation_functions[end] = nn.sigmoid #Between 0 and 1


    Zs = [X]
    for (ii,(hlsize, actfun)) in enumerate(zip(hl_sizes, activation_functions))
        Wii = get_variable("W_$ii", [get_shape(Zs[end], 2), hlsize], Float32)
        bii = get_variable("b_$ii", [hlsize], Float32)
        Zii = actfun(Zs[end]*Wii + bii)
        push!(Zs, Zii)
    end
    
    
    Z_code = Zs[end÷2 + 1] # A name for the coding layer
    has_died = reduce_any(reduce_all(Z_code.==0f0, axis=2))
    
    losses = 0.5(Zs[end]-X)^2
    
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 19:52:56.573001: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.573045: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.573054: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.810042: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-02 19:52:56.810561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.42GiB
2017-08-02 19:52:56.810575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-02 19:52:56.810584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-02 19:52:56.810602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

The choice of activation function here, is (as mentioned in the comments) a bit special. On this particular problem, as a deep network, sigmoid was not going well presumably because of the exploding/vanishing gradient issue that normally cases it to not work out (though I did not check).

Switching to ReLU did not help, though I now suspect I didn’t give it enough tries. ReLU6 worked great the first few tries, but coming back to it later, and I found I couldn’t get it to train because one or both of the hidden units would die, which I did see the first times I trained it but not as commonly.

The trick to make this never happen was to allow the units to turn themselves back on. This is done by providing a non-zero gradient for the off-states. A leaky RELU6 unit. Mathematically it is given by

Training

Input:

train_images = flatten_images(MNIST.traintensor())
test_images = flatten_images(MNIST.testtensor());

Input:

run(sess, global_variables_initializer())
auto_loss = Float64[]
@showprogress for epoch in 1:75
    epoch_loss = Float64[]
    for batch_x in eachbatch(train_images, 1_000, ObsDim.Last())
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x'))
        push!(epoch_loss, loss_o)
    end
    push!(auto_loss, mean(epoch_loss))
    #println("Epoch $epoch loss: $(auto_loss[end])")
    
    ### Check to see if it died
    if run(sess, has_died, Dict(X=>train_images'))
        error("Neuron in hidden layer has died, must reinitialize.")
    end
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:51

Input:

plot([auto_loss], label="Autoencoder Loss")

Output:

2040600.020.040.06Autoencoder Loss

Input:

function reconstruct(img::Vector)
    run(sess, Zs[end], Dict(X=>reshape(img, (1,28*28))))[:]
end

Output:

reconstruct (generic function with 1 method)

Input:

id = 120
heatmap([one_image(train_images[:,id]) one_image(reconstruct(train_images[:,id]))])

Output:

20406010203000.10.20.30.40.50.60.70.80.91.0

Input:

id = 1
heatmap([one_image(test_images[:,id]) one_image(reconstruct(test_images[:,id]))])

Output:

20406010203000.10.20.30.40.50.60.70.80.91.0

Visualising similarity

One of the key uses of an autoencoder such as this is to project from a the high dimentional space of the inputs, to the low dimentional space of the code layer.

Input:

function scatter_image(images, res)
    canvas = ones(res, res)
    
    codes = run(sess, Z_code, Dict(X=>images'))
    codes = (codes .- minimum(codes))./(maximum(codes)-minimum(codes))
    @assert(minimum(codes) >= 0.0)
    @assert(maximum(codes) <= 1.0)
    
    function target_area(code)
        central_res = res-frames_image_res-1
        border_offset = frames_image_res/2 + 1
        x,y = code*central_res + border_offset
        
        get_pos(v) = round(Int, v-frames_image_res/2)
        x_min = get_pos(x)
        x_max = x_min + frames_image_res-1
        y_min =  get_pos(y)
        y_max = y_min + frames_image_res-1
        
        @view canvas[x_min:x_max, y_min:y_max]
    end
    
    for ii in 1:size(codes, 1)
        code = codes[ii,:]
        img = images[:,ii]
        area = target_area(code)        
        any(area.<1) && continue # Don't draw over anything
        area[:] = one_image(img)
    end
    canvas
end
heatmap(scatter_image(test_images, 700))

Output:

20040060020040060000.10.20.30.40.50.60.70.80.91.0

Input:

heatmap(scatter_image(test_images, 4_000))
savefig("mnist_scatter.pdf")

A high-resolution PDF with more numbers shown can be downloaded from here

So the position of each digit shown on the scatter-plot is given by the level of activation of the coding layer neurons. Which are basically a compressed repressentation of the image.

We can see not only are the images roughly grouped acording to their number, they are also positioned accoridng to appeance. In the top-right it can be seen arrayed are all the ones. With there posistion (seemingly) determined by the slant. Other numbers with similarly slanted potions are positioned near them. The implict repressentation found using the autoencoder unviels hidden properties of the images.

Conclusion

We have presented a few fairly basic neural network models. Hopefully, the techneques shown encourage you to experiment further with machine learning with Julia, and TensorFlow.jl.

]]>
By: A Technical Blog -- julia

Re-posted from: http://white.ucc.asn.au/2017/08/02/Intro-to-Machine-Learning-with-TensorFlow.jl.html

In this blog post, I am going to go through as series of neural network structures.
This is intended as a demonstration of the more basic neural net functionality.
This blog post serves as an accompanyment to the introduction to machine learning chapter of the short book I am writing (
Currently under the working title “Neural Network Representations for Natural Language Processing”)

I do have an earlier blog covering some similar topics.
However, I exect the code in this one to be a lot more sensible,
since I am now much more familar with TensorFlow.jl, having now written a significant chunk of it.
Also MLDataUtils.jl is in different state to what it was.

Input:

using TensorFlow
using MLDataUtils
using MLDatasets

using ProgressMeter

using Base.Test
using Plots
gr()
using FileIO
using ImageCore

MNIST classifier

This is the most common benchmark for neural network classifiers.
MNIST is a collection of hand written digits from 0 to 9.
The task is to determine which digit is being shown.
With neural networks this is done by flattening the images into vectors,
and using one-hot encoded outputs with softmax.

Input:

"""Makes 1 hot, row encoded labels."""
onehot_encode_labels(labels_raw) = convertlabel(LabelEnc.OneOfK, labels_raw, LabelEnc.NativeLabels(collect(0:9)),  LearnBase.ObsDim.First())
"""Convert 3D matrix of row,column,observation to vector,observation"""
flatten_images(img_raw) = squeeze(mapslices(vec, img_raw,1:2),2)




@testset "data prep" begin
    @test onehot_encode_labels([4,1,2,3,0]) == [0 0 0 0 1 0 0 0 0 0
                                  0 1 0 0 0 0 0 0 0 0
                                  0 0 1 0 0 0 0 0 0 0
                                  0 0 0 1 0 0 0 0 0 0
                                  1 0 0 0 0 0 0 0 0 0]
    
    data_b1 = flatten_images(MNIST.traintensor())
    @test size(data_b1) == (28*28, 60_000)
    labels_b1 = onehot_encode_labels(MNIST.trainlabels())
    @test size(labels_b1) == (60_000, 10)
end;

Output:

Test Summary: | Pass  Total
data prep     |    3      3

A visualisation of one of the examples from MNIST.
Code is a little complex because of the unflattening, and adding a border.

Input:

const frames_image_res = 30

"Convests a image vector into a framed 2D image"
function one_image(img::Vector)
    ret = zeros((frames_image_res, frames_image_res))
    ret[2:end-1, 2:end-1] = 1-rotl90(reshape(img, (28,28)))
    ret
end

train_images=flatten_images(MNIST.traintensor())
heatmap(one_image(train_images[:,10]))

Output:



















10


20


30


10


20


30













0


0.1


0.2


0.3


0.4


0.5


0.6


0.7


0.8


0.9


1.0

In this basic example we use a traditional sigmoid feed-forward neural net.
It uses just a single wide hidden layer.
It works surprisingly well compaired to early benchmarks.
This is becuase the layer is very wide compaired to what was possible 30 years ago.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/mnist-basic.png")

Output:

png

Input:

sess = Session(Graph())
@tf begin
    X = placeholder(Float32, shape=[-1, 28*28])
    Y = placeholder(Float32, shape=[-1, 10])

    W1 = get_variable([28*28, 1024], Float32)
    b1 = get_variable([1024], Float32)
    Z1 = nn.sigmoid(X*W1 + b1)

    W2 = get_variable([1024, 10], Float32)
    b2 = get_variable([10], Float32)
    Z2 = Z1*W2 + b2 # Affine layer on its own, to get the unscaled logits
    Y_probs = nn.softmax(Z2)

    losses = nn.softmax_cross_entropy_with_logits(;logits=Z2, labels=Y) #This loss function takes the unscaled logits
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 18:53:18.598588: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.598620: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.598626: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 18:53:18.789486: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-02 18:53:18.789997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.42GiB
2017-08-02 18:53:18.790010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-02 18:53:18.790016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-02 18:53:18.790027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

Train

We use normal minibatch training with Adam.
We do use relatively large minibatches, as that gets best performance advantage on GPU,
by minimizing memory transfers.
A more advanced implementation might do a the batching within Tensorflow,
rather than batching outside tensorflow and invoking it via run.

Input:

traindata = (flatten_images(MNIST.traintensor()), onehot_encode_labels(MNIST.trainlabels()))
run(sess, global_variables_initializer())


basic_train_loss = Float64[]
@showprogress for epoch in 1:100
    epoch_loss = Float64[]
    for (batch_x, batch_y) in eachbatch(traindata, 1000, (ObsDim.Last(), ObsDim.First()))
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x', Y=>batch_y))
        push!(epoch_loss, loss_o)
    end
    push!(basic_train_loss, mean(epoch_loss))
    #println("Epoch $epoch: $(train_loss[end])")
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:25

Input:

plot(basic_train_loss, label="training loss")

Output:



















25


50


75


100


0.25


0.50


0.75


1.00


training loss

Test

Input:

testdata_x = flatten_images(MNIST.testtensor())
testdata_y = onehot_encode_labels(MNIST.testlabels())

y_probs_o = run(sess, Y_probs, Dict(X=>testdata_x'))
acc = mean(mapslices(indmax, testdata_y, 2) .== mapslices(indmax, y_probs_o, 2) )

println("Error Rate: $((1-acc)*100)%")

Output:

Error Rate: 1.9299999999999984%

Advanced MNIST classifier

Here we will use more advanced TensorFlow features, like indmax,
and also a more advanced network.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/mnist-advanced.png")

Output:

png

Input:

sess = Session(Graph())

# Network Definition
begin
    X = placeholder(Float32, shape=[-1, 28*28])
    Y = placeholder(Float32, shape=[-1])
    KeepProb = placeholder(Float32, shape=[])
    
    # Network parameters
    hl_sizes = [512, 512, 512]
    activation_functions = Vector{Function}(size(hl_sizes))
    activation_functions[1:end-1]=z->nn.dropout(nn.relu(z), KeepProb)
    activation_functions[end] = identity #Last function should be idenity as we need the logits

    Zs = [X]
    for (ii,(hlsize, actfun)) in enumerate(zip(hl_sizes, activation_functions))
        Wii = get_variable("W_$ii", [get_shape(Zs[end], 2), hlsize], Float32)
        bii = get_variable("b_$ii", [hlsize], Float32)
        Zii = actfun(Zs[end]*Wii + bii)
        push!(Zs, Zii)
    end
    
    Y_probs = nn.softmax(Zs[end])
    Y_preds = indmax(Y_probs,2)-1 # Minus 1, to offset 1 based indexing

    losses = nn.sparse_softmax_cross_entropy_with_logits(;logits=Zs[end], labels=Y+1) # Plus 1, to offset 1 based indexing 
    #This loss function takes the unscaled logits, and the numerical labels
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 19:27:57.180945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

Train

Input:

traindata_x = flatten_images(MNIST.traintensor())
normer = fit(FeatureNormalizer, traindata_x)
predict!(normer, traindata_x); # perhaps oddly, in current version of MLDataUtils the Normalizer commands to normalize is `predict`

traindata_y = Int.(MNIST.trainlabels());

Input:

run(sess, global_variables_initializer())
adv_train_loss = Float64[]
@showprogress for epoch in 1:100
    epoch_loss = Float64[]
    for (batch_x, batch_y) in eachbatch((traindata_x, traindata_y), 1000, ObsDim.Last())
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x', Y=>batch_y, KeepProb=>0.5f0))
        push!(epoch_loss, loss_o)
    end
    push!(adv_train_loss, mean(epoch_loss))
    #println("Epoch $epoch: $(train_loss[end])")
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:10

Input:

plot([basic_train_loss, adv_train_loss], label=["basic", "advanced"])

Output:



















25


50


75


100


0.5


1.0


basic


advanced

Test

Input:

testdata_x = predict!(normer, flatten_images(MNIST.testtensor()))
testdata_y = Int.(MNIST.testlabels());

y_preds_o = run(sess, Y_preds, Dict(X=>testdata_x', KeepProb=>1.0f0))
acc = mean(testdata_y .== y_preds_o )

println("Error Rate: $((1-acc)*100)%")

Output:

Error Rate: 1.770000000000005%

It can be seen that overall all the extra stuff done in the advanced model did not gain much.
The margin is small enough that it can be attributed to in part to luck – repeating it can do better or worse depending on the random initialisations.
Classifying MNIST is perhaps too simpler problem for deep techneques to pay off.

Bottle-knecking Autoencoder

An autoencoder is a neural network designed to recreate its inputs.
There are many varieties, include RBMs, DBNs, SDAs, mSDAs, VAEs.
This is one of the simplest being based on just a feedforward neural network.

The network narrows into to a very small central layer – in this case just 2 neurons,
before exampanding back to the full size.
It is sometimes called a Hour-glass, or Wine-glass autoencoder to describe this shape.

Input:

load("Intro\ to\ Machine\ Learning\ with\ Tensorflow.jl/autoencoder.png")

Output:

png

Input:

sess = Session(Graph())

# Network Definition

begin
    X = placeholder(Float32, shape=[-1, 28*28])
    
    # Network parameters
    hl_sizes = [512, 128, 64, 2, 64, 128, 512, 28*28]
    activation_functions = Vector{Function}(size(hl_sizes))
    activation_functions[1:end-1] = x -> 0.01x + nn.relu6(x)
        # Neither sigmoid, nor relu work anywhere near as well here
        # relu6 works sometimes, but the hidden neurons die too often
        # So we define a leaky ReLU6 as above
    activation_functions[end] = nn.sigmoid #Between 0 and 1


    Zs = [X]
    for (ii,(hlsize, actfun)) in enumerate(zip(hl_sizes, activation_functions))
        Wii = get_variable("W_$ii", [get_shape(Zs[end], 2), hlsize], Float32)
        bii = get_variable("b_$ii", [hlsize], Float32)
        Zii = actfun(Zs[end]*Wii + bii)
        push!(Zs, Zii)
    end
    
    
    Z_code = Zs[end÷2 + 1] # A name for the coding layer
    has_died = reduce_any(reduce_all(Z_code.==0f0, axis=2))
    
    losses = 0.5(Zs[end]-X)^2
    
    loss = reduce_mean(losses)
    optimizer = train.minimize(train.AdamOptimizer(), loss)
end

Output:

2017-08-02 19:52:56.573001: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.573045: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.573054: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-02 19:52:56.810042: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-02 19:52:56.810561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.42GiB
2017-08-02 19:52:56.810575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-02 19:52:56.810584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-02 19:52:56.810602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
<Tensor Group:1 shape=unknown dtype=Any>

The choice of activation function here, is (as mentioned in the comments) a bit special.
On this particular problem, as a deep network, sigmoid was not going well presumably because of the exploding/vanishing gradient issue that normally cases it to not work out (though I did not check).

Switching to ReLU did not help, though I now suspect I didn’t give it enough tries.
ReLU6 worked great the first few tries, but coming back to it later,
and I found I couldn’t get it to train because one or both of the hidden units would die,
which I did see the first times I trained it but not as commonly.

The trick to make this never happen was to allow the units to turn themselves back on.
This is done by providing a non-zero gradient for the off-states.
A leaky RELU6 unit.
Mathematically it is given by

Training

Input:

train_images = flatten_images(MNIST.traintensor())
test_images = flatten_images(MNIST.testtensor());

Input:

run(sess, global_variables_initializer())
auto_loss = Float64[]
@showprogress for epoch in 1:75
    epoch_loss = Float64[]
    for batch_x in eachbatch(train_images, 1_000, ObsDim.Last())
        loss_o, _ = run(sess, (loss, optimizer), Dict(X=>batch_x'))
        push!(epoch_loss, loss_o)
    end
    push!(auto_loss, mean(epoch_loss))
    #println("Epoch $epoch loss: $(auto_loss[end])")
    
    ### Check to see if it died
    if run(sess, has_died, Dict(X=>train_images'))
        error("Neuron in hidden layer has died, must reinitialize.")
    end
end

Output:

Progress: 100%|█████████████████████████████████████████| Time: 0:01:51

Input:

plot([auto_loss], label="Autoencoder Loss")

Output:



















20


40


60


0.02


0.04


0.06


Autoencoder Loss

Input:

function reconstruct(img::Vector)
    run(sess, Zs[end], Dict(X=>reshape(img, (1,28*28))))[:]
end

Output:

reconstruct (generic function with 1 method)

Input:

id = 120
heatmap([one_image(train_images[:,id]) one_image(reconstruct(train_images[:,id]))])

Output:



















20


40


60


10


20


30













0


0.1


0.2


0.3


0.4


0.5


0.6


0.7


0.8


0.9


1.0

Input:

id = 1
heatmap([one_image(test_images[:,id]) one_image(reconstruct(test_images[:,id]))])

Output:



















20


40


60


10


20


30













0


0.1


0.2


0.3


0.4


0.5


0.6


0.7


0.8


0.9


1.0

Visualising similarity

One of the key uses of an autoencoder such as this is to project from a the high dimentional space of the inputs, to the low dimentional space of the code layer.

Input:

function scatter_image(images, res)
    canvas = ones(res, res)
    
    codes = run(sess, Z_code, Dict(X=>images'))
    codes = (codes .- minimum(codes))./(maximum(codes)-minimum(codes))
    @assert(minimum(codes) >= 0.0)
    @assert(maximum(codes) <= 1.0)
    
    function target_area(code)
        central_res = res-frames_image_res-1
        border_offset = frames_image_res/2 + 1
        x,y = code*central_res + border_offset
        
        get_pos(v) = round(Int, v-frames_image_res/2)
        x_min = get_pos(x)
        x_max = x_min + frames_image_res-1
        y_min =  get_pos(y)
        y_max = y_min + frames_image_res-1
        
        @view canvas[x_min:x_max, y_min:y_max]
    end
    
    for ii in 1:size(codes, 1)
        code = codes[ii,:]
        img = images[:,ii]
        area = target_area(code)        
        any(area.<1) && continue # Don't draw over anything
        area[:] = one_image(img)
    end
    canvas
end
heatmap(scatter_image(test_images, 700))

Output:



















200


400


600


200


400


600













0


0.1


0.2


0.3


0.4


0.5


0.6


0.7


0.8


0.9


1.0

Input:

heatmap(scatter_image(test_images, 4_000))
savefig("mnist_scatter.pdf")

A high-resolution PDF with more numbers shown can be downloaded from here

So the position of each digit shown on the scatter-plot is given by the level of activation of the coding layer neurons.
Which are basically a compressed repressentation of the image.

We can see not only are the images roughly grouped acording to their number,
they are also positioned accoridng to appeance.
In the top-right it can be seen arrayed are all the ones.
With there posistion (seemingly) determined by the slant.
Other numbers with similarly slanted potions are positioned near them.
The implict repressentation found using the autoencoder unviels hidden properties of the images.

Conclusion

We have presented a few fairly basic neural network models.
Hopefully, the techneques shown encourage you to experiment further with machine learning with Julia, and TensorFlow.jl.

]]>
3756
Introduction to the suite of JuliaFin packages (part 1) – Miletus.jl http://www.juliabloggers.com/introduction-to-the-suite-of-juliafin-packages-part-1-miletus-jl/ Wed, 26 Jul 2017 00:00:00 +0000 http://juliacomputing.com/blog/2017/07/26/juliafin-miletus As we roll out JuliaPro v0.6.0.1 (now supporting Julia 0.6), we see it as the best time to introduce our old users and new to our product JuliaFin, and also briefly introduce the enhanced suite of Julia packages it is now shipped with.

JuliaFin is a specialized Julia Computing product used in asset management, risk management, algorithmic trading, backtesting, and many other areas of computational finance, including the modelling of financial contracts.

Miletus.jl

Miletus is one of the core components of JuliaFin. Essentially a Domain Specific Language written on top of Julia, it can be used for financial contract definition and modelling. It is also used as a valuation framework.

Financial contracts are typically modelled in a functional programming style with languages like Haskell or OCaml, and the implementation of the valuation processes is then done in a second language like C++ or Java. Miletus on the other hand, leverages on Julia’s strong type system and multiple dispatch capabilities to both express these contract primitive constructs and generate efficient valuation code, solving yet another “two language problem”.

Milteus lets you construct complex contracts with a combination of simple primitive components and operations.

Example Contract Definition and Valuation

In the example below, we define and value a contract for a European call option, defined both in terms of a set of primitive contract types, as well as convenient, high-level constructors.

The first step would be to import the Miletus library.

# Import the library
 In[1]: using Miletus
        import Miletus: When, Give, Receive, Buy, Both, At, Either, Zero

The type constructors provided by Miletus lets you perform a few basic operations.

# Receive an amount of 100 USD
 In[2]: x=Receive(100USD)
Out[2]: Amount
         └─100USD

# Pay is the opposite of Receive
 In[3]: x=Pay(100USD)
Out[3]: Give
         └─Amount
            └─100USD

# Models are constructed and valued on a generic SingleStock type
 In[4]: s=SingleStock()
Out[4]: SingleStock

This is where it really gets interesting. These basic primitives can now be combined into higher level operations.

# Acquisition of a stock by paying 100USD
 In[5]: x=Both(s, Pay(100USD))
Out[5]: Both
    	 ├─SingleStock
     	 └─Give
            └─Amount
               └─100USD

# Which is equivalent to buying the stock
 In[6]: x=Buy(s, 100USD)
Out[6]: Both
    	 ├─SingleStock
     	 └─Give
            └─Amount
               └─100USD

# The notion of optionality is expressed by a choice of two outcomes
 In[7]: x=Either(s, Zero())
Out[7]: Either
    	 ├─SingleStock
     	 └─Zero

Another important aspect of any contract is the notion of time. Below we define a temporal condition on which to receive a payment of 100USD.

 In[8]: x=When(At(Date("2017-12-25")), Receive(100USD))
Out[8]: When
          ├─{==}
     	    ├─DateObs
     	    └─2017-12-25
     	  └─Amount
	      └─100USD

Combining that temporal condition with optionality defines a basic European Call option.

 In[9]: x=When(At(Date("2017-12-25")), Either(Buy(s, 100USD), Zero()))
Out[9]: When
    	 ├─{==}
     	   ├─DateObs
    	   └─2017-12-25
     	 └─Either
            ├─Both
              ├─SingleStock
              └─Give
                 └─Amount
                   └─100USD
            └─Zero

 In[10]: eucall = EuropeanCall(Date("2017-12-25"), SingleStock(), 100USD)
Out[10]: When
    	  ├─{==}
     	    ├─DateObs
    	    └─2017-12-25
     	  └─Either
             ├─Both
               ├─SingleStock
               └─Give
                  └─Amount
                    └─100USD
             └─Zero

And that would just be the beginning of the things you can do with Miletus. Read the documentation for more, and if we really did get you started, try it out for yourself!

About Julia Computing and Julia

Julia Computing was founded in 2015 by the co-creators of the Julia language to provide support to businesses and researchers who use Julia.

Julia is the fastest modern high performance open source computing language for data and analytics. It combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of Java and C++. Julia delivers dramatic improvements in simplicity, speed, scalability, capacity and productivity. Julia provides parallel computing capabilities out of the box and literally infinite scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia adoption is growing rapidly in finance, energy, robotics, genomics and many other fields.

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/07/26/juliafin-miletus.html

As we roll out JuliaPro v0.6.0.1 (now supporting Julia 0.6), we see it as the best time to introduce our old users and new to our product JuliaFin, and also briefly introduce the enhanced suite of Julia packages it is now shipped with.

JuliaFin is a specialized Julia Computing product used in asset management, risk management, algorithmic trading, backtesting, and many other areas of computational finance, including the modelling of financial contracts.

Miletus.jl

Miletus is one of the core components of JuliaFin. Essentially a Domain Specific Language written on top of Julia, it can be used for financial contract definition and modelling. It is also used as a valuation framework.

Financial contracts are typically modelled in a functional programming style with languages like Haskell or OCaml, and the implementation of the valuation processes is then done in a second language like C++ or Java. Miletus on the other hand, leverages on Julia’s strong type system and multiple dispatch capabilities to both express these contract primitive constructs and generate efficient valuation code, solving yet another “two language problem”.

Milteus lets you construct complex contracts with a combination of simple primitive components and operations.

Example Contract Definition and Valuation

In the example below, we define and value a contract for a European call option, defined both in terms of a set of primitive contract types, as well as convenient, high-level constructors.

The first step would be to import the Miletus library.

# Import the library
 In[1]: using Miletus
        import Miletus: When, Give, Receive, Buy, Both, At, Either, Zero

The type constructors provided by Miletus lets you perform a few basic operations.

# Receive an amount of 100 USD
 In[2]: x=Receive(100USD)
Out[2]: Amount
         └─100USD

# Pay is the opposite of Receive
 In[3]: x=Pay(100USD)
Out[3]: Give
         └─Amount
            └─100USD

# Models are constructed and valued on a generic SingleStock type
 In[4]: s=SingleStock()
Out[4]: SingleStock

This is where it really gets interesting. These basic primitives can now be combined into higher level operations.

# Acquisition of a stock by paying 100USD
 In[5]: x=Both(s, Pay(100USD))
Out[5]: Both
    	 ├─SingleStock
     	 └─Give
            └─Amount
               └─100USD

# Which is equivalent to buying the stock
 In[6]: x=Buy(s, 100USD)
Out[6]: Both
    	 ├─SingleStock
     	 └─Give
            └─Amount
               └─100USD

# The notion of optionality is expressed by a choice of two outcomes
 In[7]: x=Either(s, Zero())
Out[7]: Either
    	 ├─SingleStock
     	 └─Zero

Another important aspect of any contract is the notion of time. Below we define a temporal condition on which to receive a payment of 100USD.

 In[8]: x=When(At(Date("2017-12-25")), Receive(100USD))
Out[8]: When
          ├─{==}
     	    ├─DateObs
     	    └─2017-12-25
     	  └─Amount
	      └─100USD

Combining that temporal condition with optionality defines a basic European Call option.

 In[9]: x=When(At(Date("2017-12-25")), Either(Buy(s, 100USD), Zero()))
Out[9]: When
    	 ├─{==}
     	   ├─DateObs
    	   └─2017-12-25
     	 └─Either
            ├─Both
              ├─SingleStock
              └─Give
                 └─Amount
                   └─100USD
            └─Zero

 In[10]: eucall = EuropeanCall(Date("2017-12-25"), SingleStock(), 100USD)
Out[10]: When
    	  ├─{==}
     	    ├─DateObs
    	    └─2017-12-25
     	  └─Either
             ├─Both
               ├─SingleStock
               └─Give
                  └─Amount
                    └─100USD
             └─Zero

And that would just be the beginning of the things you can do with Miletus. Read the documentation for more, and if we really did get you started, try it out for yourself!

About Julia Computing and Julia

Julia Computing was founded in 2015 by the co-creators of the Julia language to provide support to businesses and researchers who use Julia.

Julia is the fastest modern high performance open source computing language for data and analytics. It combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of Java and C++. Julia delivers dramatic improvements in simplicity, speed, scalability, capacity and productivity. Julia provides parallel computing capabilities out of the box and literally infinite scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia adoption is growing rapidly in finance, energy, robotics, genomics and many other fields.

]]>
3763
Julia features in Intel’s Parallel Universe Magazine http://www.juliabloggers.com/julia-features-in-intels-parallel-universe-magazine/ Tue, 25 Jul 2017 00:00:00 +0000 http://juliacomputing.com/blog/2017/07/25/julia-in-intel-parallel-universe Parallel Universe is Intel’s quarterly publication, known to cover stories on notable and latest innovations in the field of software development, from high performance computing to threading hybrid applications.

Julia received a special mention in the Editor’s (Henry A. Gabb, Senior Principal Engineer at Intel Corporation) letter of Issue 29, the latest publication that came out this week. Henry talks about using Julia and how it gave him startling performance gains, not just with numerically intensive applications, but also with string manipulation applications. He summed this positive mention up saying that Julia has his attention.

The main story on Julia, co-authored by Julia Computing’s Viral Shah, Ranjan Anantharaman and Prof. Alan Edelman, begins on page 23 of the magazine, and comprehensively covers an overview of the language and a summary of it’s powerful features. It also goes on to talk about how Julia solves the “two language problem”, about parallelization in Julia, and ends on a high with coverage on Project Celeste, a joint project of Julia Computing, Lawrence Berkeley Labs, Intel and UC Berkeley researchers, an ambitiously compute intensive project aiming to catalog a digital atlas of the universe.

The link to the magazine is available here

About Henry A. Gabb

Henry A. Gabb, Senior Principal Engineer at Intel Corporation, is a longtime high-performance and parallel computing practitioner and has published numerous articles on parallel programming. He was editor/coauthor of “Developing Multithreaded Applications: A Platform Consistent Approach” and was program manager of the Intel/ Microsoft Universal Parallel Computing Research Centers.

About Julia Computing and Julia

Julia Computing was founded in 2015 by the co-creators of the Julia language to provide support to businesses and researchers who use Julia.

Julia is the fastest modern high performance open source computing language for data and analytics. It combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of Java and C++. Julia delivers dramatic improvements in simplicity, speed, scalability, capacity and productivity. Julia provides parallel computing capabilities out of the box and literally infinite scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia adoption is growing rapidly in finance, energy, robotics, genomics and many other fields.

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/07/25/julia-in-intel-parallel-universe.html

Parallel Universe is Intel’s quarterly publication, known to cover stories on notable and latest innovations in the field of software development, from high performance computing to threading hybrid applications.

Julia received a special mention in the Editor’s (Henry A. Gabb, Senior Principal Engineer at Intel Corporation) letter of Issue 29, the latest publication that came out this week. Henry talks about using Julia and how it gave him startling performance gains, not just with numerically intensive applications, but also with string manipulation applications. He summed this positive mention up saying that Julia has his attention.

The main story on Julia, co-authored by Julia Computing’s Viral Shah, Ranjan Anantharaman and Prof. Alan Edelman, begins on page 23 of the magazine, and comprehensively covers an overview of the language and a summary of it’s powerful features. It also goes on to talk about how Julia solves the “two language problem”, about parallelization in Julia, and ends on a high with coverage on Project Celeste, a joint project of Julia Computing, Lawrence Berkeley Labs, Intel and UC Berkeley researchers, an ambitiously compute intensive project aiming to catalog a digital atlas of the universe.

The link to the magazine is available here

About Henry A. Gabb

Henry A. Gabb, Senior Principal Engineer at Intel Corporation, is a longtime high-performance and parallel
computing practitioner and has published numerous articles on parallel programming. He was editor/coauthor of
“Developing Multithreaded Applications: A Platform Consistent Approach” and was program manager of the Intel/
Microsoft Universal Parallel Computing Research Centers.

About Julia Computing and Julia

Julia Computing was founded in 2015 by the co-creators of the Julia language to provide support to businesses and researchers who use Julia.

Julia is the fastest modern high performance open source computing language for data and analytics. It combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of Java and C++. Julia delivers dramatic improvements in simplicity, speed, scalability, capacity and productivity. Julia provides parallel computing capabilities out of the box and literally infinite scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia adoption is growing rapidly in finance, energy, robotics, genomics and many other fields.

]]>
3747
Julia calling C: A more minimal example http://www.juliabloggers.com/julia-calling-c-a-more-minimal-example/ Fri, 14 Jul 2017 09:58:34 +0000 http://perfectionatic.org/?p=470 ]]> By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=470

Earlier I presented a minimal example of Julia calling C. It mimics how one would go about writing C code, wrapping it a library and then calling it from Julia. Today I came across and even more minimal way of doing that while reading an excellent blog on Julia’s syntactic loop fusion. Associated with the blog was notebook that explores the matter further.

Basically, you an write you C in a string and pass it directly to the compiler. It goes something like

C_code= """
       double mean(double a, double b) {
         return (a+b) / 2;
       }
       """
const Clib=tempname()
open(`gcc -fPIC -O3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
     print(f, C_code)
end

The tempname function generate a unique temporary file path. On my Linux system Clib will be string like "/tmp/juliaivzRkT". That path is used to generate a library name "/tmp/juliaivzRkT.so" which will then used in the ccall:

julia> x=ccall((:mean,Clib),Float64,(Float64,Float64),2.0,5.0)
3.5

This approach would be be recommended if are writing anything sophisticated in C. However, it fun to experiment with for short bits of C code that you might like to call from Julia. Saves you the hassle of creating a Makefile, compiling, etc…

]]>
3738
Julia in ecology: why multiple dispatch is good http://www.juliabloggers.com/julia-in-ecology-why-multiple-dispatch-is-good/ Mon, 10 Jul 2017 13:45:54 +0000 http://armchairecology.blog/?p=1095 Continue reading "Julia in ecology: why multiple dispatch is good"

]]>
By: Timothée Poisot

Re-posted from: https://armchairecology.blog/2017/07/10/julia-in-ecology-why-multiple-dispatch-is-good/

In what is going to be the most technical note so far, I will try to reflect on a few years of using the Julia programming language for computational ecology projects. In particular, I will discuss how multiple dispatch changed my life (for the better), and how it can be used to make ecological analyses streamlined. I will most likely add a few entries to this series during the fall, leading up to a class I will give in the winter.

But what is multiple dispatch?

Imagine a recipe that calls for onions, and you have left in the cupboard is shallots. You know that shallots are little delicate bundles of gustative pleasure, and so you cook them differently (butter and half a teaspoon of sugar), extra gently. And when it’s done, you add them to the rest of the ingredients. This is multiple dispatch.

In computer terms now, we can express this  recipe as the following pseudocode:

function cook(x::Onion)
   return fry(x, butter)
end

function cook(x::Shallot)
   return roast(x, butter, sugar)
end

If x is an onion, then we fry it. If it is a shallot, we roast it. The important point is that the interface is the same: no matter what x is, we can cook it.

And where is the ecology in that?

20170709_114207Let’s talk about species interaction networks! One of the things that has been bugging me for a while was that we have no good, common interface to analyze them. There are a variety of packages that are either specific to some types of networks, or specific to some measures, or (worth) both. Because there are many different types of ecological networks.

Or are there? In EcologicalNetwork.jl, I reduced them to a combination of two factors. Are they bipartite or unipartite, and are they quantitative, probabilistic, or deterministic.

In Julia, this can be explained by a number of types and unions of types, and this hierarchy allows to create a number of functions that have the same name, but behave in the correct way based on their input. For example, the number of species in a network is calculated differently if it is bipartite or unipartite:

function richness(N::Bipartite)
   return sum(size(N.A))
end

function richness(N::Unipartite)
   return size(N.A, 1)
end

Where this becomes more interesting, is when we start chaining functions. For example, we can take an empirical network, generate the probabilistic version for a null model, then generate replicates, and finally measure the nestedness on every replicate:

using EcologicalNetwork
ollerton() |> null2 |> nullmodel .|> (x) -> nodf(x)[1]

This lines takes advantage of the fact that each function will take the “right” decision based on the type of its input. Specifically, it goes this way: the empirical network is a bipartite and deterministic one. The null2 function generates a probabilistic network which is also bipartite. This is passed to nullmodel, which will generate a number of bipartite deterministic networks, all of them are then passed through  the nodf function to measure their nestedness.

And the resulting pipeline is also clear to read, and expresses what we want to do (how we do it is determined based on the types). As a consequence, we can have a much more general package for network analysis.

But why does this matter?

Because, in short, it lets us (and yes, there are other paradigms that let us do the same thing) express what we want to do. A good example would be measuring the diversity of an ecological community. Let’s say we have a site by species matrix, and this matrix has presence/absence data. We can measure diversity as the number of species as the sum of each row:

function diversity(x::Array{Bool,2})
   return sum(A, 2)
end

But if we have quantitative information, then we may want to apply Pielou’s measure on each row instead:

function diversity(x::Array{Number,2})
   return mapslices(pielou, x, 2)
end

In the case where we have a phylogenetic tree, then what about using PD?

function diversity(x::Array{Number,2}, t::PhyloTree)
 return mapslices(n -> pd(n, t), x, 2)
end

And so on and so forth. In all of these situations, we know that the same concept (diversity) means different things as a function of the context – and for this reason, we want to do different things.

I like  this approach because it lets me focus on the intent of what I want to do. The (still young) EcoJulia project led by Michael Krabbe Borregaard is an attempt to use some of the niftiest features of Julia to develop general interfaces to some types of ecological data. This is something I am really excited to see happen.

]]>
3736
Video Introduction to DifferentialEquations.jl http://www.juliabloggers.com/video-introduction-to-differentialequations-jl/ Fri, 07 Jul 2017 17:41:22 +0000 http://www.stochasticlifestyle.com/?p=685 ]]> By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/video-introduction-differentialequations-jl/

Videos can be much easier to follow than text (though they usually have fewer details!). So, here’s a video introduction to DifferentialEquations.jl from JuliaCon. In this talk I walk through the major features of DifferentialEquations.jl by walking through the the tutorials in the documentation, highlighting usage details and explaining how to properly think about the code. I hope this helps make it easier to adopt DifferentialEquations.jl!

The post Video Introduction to DifferentialEquations.jl appeared first on Stochastic Lifestyle.

]]>
3707
High Order Rosenbrock and Symplectic Methods http://www.juliabloggers.com/high-order-rosenbrock-and-symplectic-methods/ Fri, 07 Jul 2017 01:30:00 +0000 http://juliadiffeq.org/2017/07/07/SymplecticRosenbrock.html For awhile I have been saying that JuliaDiffEq really needs some fast high accuracy stiff solvers and symplectic methods to take it to the next level. I am happy to report that these features have arived, along with some other exciting updates. And yes, they benchmark really well. With new Rosenbrock methods specifically designed for stiff nonlinear parabolic PDE discretizations, SSPRK enhancements specifically for hyperbolic PDEs, and symplectic methods for Hamiltonian systems, physics can look at these release notes with glee. Here’s the full ecosystem release notes.

]]>
By: JuliaDiffEq

Re-posted from: http://juliadiffeq.org/2017/07/07/SymplecticRosenbrock.html

For awhile I have been saying that JuliaDiffEq really needs some fast high
accuracy stiff solvers and symplectic methods to take it to the next level.
I am happy to report that these features have arived, along with some other
exciting updates. And yes, they benchmark really well. With new Rosenbrock methods
specifically designed for stiff nonlinear parabolic PDE discretizations, SSPRK
enhancements specifically for hyperbolic PDEs, and symplectic methods for Hamiltonian
systems, physics can look at these release notes with glee. Here’s the full ecosystem
release notes.

]]>
3705
Solving the Fish Riddle with JuMP http://www.juliabloggers.com/solving-the-fish-riddle-with-jump/ Thu, 29 Jun 2017 19:58:33 +0000 http://perfectionatic.org/?p=438 ]]> By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=438

Recently I came across a nice Ted-Ed video presenting a Fish Riddle.

I thought it would be fun to try solving it using Julia’s award winning JuMP package. Before we get started, please watch the above video-you might want to pause at 2:24 if you want to solve it yourself.

To attempt this problem in Julia, you will have to install the JuMP package.

julia> Pkg.add("JuMP")

JuMP provides an algebraic modeling language for dealing with mathematical optimization problems. Basically, that allows you to focus on describing your problem in a simple syntax and it would then take care of transforming that description in a form that can be handled by any number of solvers. Those solvers can deal with several types of optimization problems, and some solvers are more generic than others. It is important to pick the right solver for the problem that you are attempting.

The problem premises are:
1. There are 50 creatures in total. That includes sharks outside the tanks and fish
2. Each SECTOR has anywhere from 1 to 7 sharks, with no two sectors having the same number of sharks.
3. Each tank has an equal number of fish
4. In total, there are 13 or fewer tanks
5. SECTOR ALPHA has 2 sharks and 4 tanks
6. SECTOR BETA has 4 sharsk and 2 tanks
We want to find the number of tanks in sector GAMMA!

Here we identify the problem as mixed integer non-linear program (MINLP). We know that because the problem involves an integer number of fish tanks, sharks, and number of fish inside each tank. It also non-linear (quadratic to be exact) because it involves multiplying two two of the problem variables to get the total number or creatures. Looking at the table of solvers in the JuMP manual. pick the Bonmin solver from AmplNLWriter package. This is an open source solver, so installation should be hassle free.

julia> Pkg.add("AmplNLWriter")

We are now ready to write some code.

using JuMP, AmplNLWriter
 
# Solve model
m = Model(solver=BonminNLSolver())
 
# Number of fish in each tank
@variable(m, n>=1, Int)
 
# Number of sharks in each sector
@variable(m, s[i=1:3], Int)
 
# Number of tanks in each sector
@variable(m, nt[i=1:3]>=0, Int)
 
@constraints m begin
    # Constraint 2
    sharks[i=1:3], 1 <= s[i] <= 7
    numfish[i=1:3], 1 <= nt[i]
      # Missing uniqueness in restriction
    # Constraint 4
    sum(nt) <= 13
    # Constraint 5
    s[1] == 2
    nt[1] == 4
    # Constraint 6
    s[2] == 4
    nt[2] == 2
end
 
# Constraints 1 & 3
@NLconstraint(m, s[1]+s[2]+s[3]+n*(nt[1]+nt[2]+nt[3]) == 50)
 
# Solve it
status = solve(m)
 
sharks_in_each_sector=getvalue(s)
fish_in_each_tank=getvalue(n)
tanks_in_each_sector=getvalue(nt)
 
@printf("We have %d fishes in each tank.\n", fish_in_each_tank)
@printf("We have %d tanks in sector Gamma.\n",tanks_in_each_sector[3])
@printf("We have %d sharks in sector Gamma.\n",sharks_in_each_sector[3])

In that representation we could not capture the restriction that “no two sectors having the same number of sharks”. We end up with the following output:

We have 4 fishes in each tank.
We have 4 tanks in sector Gamma.
We have 4 sharks in sector Gamma.

Since the problem domain is limited, we can possible fix that by adding a constrain that force the number of sharks in sector Gamma to be greater than 4.

@constraint(m,s[3]>=5)

This will result in an answer that that does not violate any of the stated constraints.

We have 3 fishes in each tank.
We have 7 tanks in sector Gamma.
We have 5 sharks in sector Gamma.

However, this seems like a bit of kludge. The proper way go about it is represent the number of sharks in the each sector as binary array, with only one value set to 1.

# Number of sharks in each sector
@variable(m, s[i=1:3,j=1:7], Bin)

We will have to modify our constraint block accordingly

@constraints m begin
    # Constraint 2
    sharks[i=1:3], sum(s[i,:]) == 1
    u_sharks[j=1:7], sum(s[:,j]) <=1 # uniquness
    # Constraint 4
    sum(nt) <= 13
    # Constraint 5
    s[1,2] == 1
    nt[1] == 4
    # Constraint 6
    s[2,4] == 1
    nt[2] == 2
end

We invent a new variable array st to capture the number of sharks in each sector. This simply obtained by multiplying the binary array by the vector [1,2,\ldots,7]^\top

@variable(m,st[i=1:3],Int)
@constraint(m, st.==s*collect(1:7))

We rewrite our last constraint as

# Constraints 1 & 3
@NLconstraint(m, st[1]+st[2]+st[3]+n*(nt[1]+nt[2]+nt[3]) == 50)

After the model has been solved, we extract our output for the number of sharks.

sharks_in_each_sector=getvalue(st)

…and we get the correct output.

This problem might have been an overkill for using a full blown mixed integer non-linear optimizer. It can be solved by a simple table as shown in the video. However, we might not alway find ourselves in such a fortunate position. We could have also use mixed integer quadratic programming solver such as Gurobi which would be more efficient for that sort of problem. Given the small problem size, efficiency hardly matters here.

]]>
3703
Julia Computing Awarded $910,000 Grant by Alfred P. Sloan Foundation, Including $160,000 for STEM Diversity http://www.juliabloggers.com/julia-computing-awarded-910000-grant-by-alfred-p-sloan-foundation-including-160000-for-stem-diversity-2/ Mon, 26 Jun 2017 00:00:00 +0000 http://juliacomputing.com/press/2017/06/26/sloan-grant Cambridge, MA – Julia Computing has been granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community.

The grant will support Julia training, adoption, usability, compilation, package development, tooling and documentation.

The diversity portion of the grant will fund a new full-time Director of Diversity Initiatives plus travel, scholarships, training sessions, workshops, hackathons and Webinars. Further information about the new Director of Diversity Initiatives position is below for interested applicants.

Julia Computing CEO Viral Shah says, “Diversity of backgrounds increases diversity of ideas. With this grant, the Sloan Foundation is setting a new standard of support for diversity which we hope will be emulated throughout STEM.”

Diversity efforts in the Julia community have been led by JuliaCon Diversity Chair, Erica Moszkowski. According to Moszkowski, “This year, we awarded $12,600 in diversity grants to help 16 participants travel to, attend and present at JuliaCon 2017. Those awards, combined with anonymous talk review, directed outreach, and other efforts have paid off. To give one example, there are many more women attending and presenting than in previous years, but there is a lot more we can do to expand participation from underrepresented groups in the Julia community. This support from the Sloan Foundation will allow us to scale up these efforts and apply them not just at JuliaCon, but much more broadly through Julia workshops and recruitment.”

Julia Computing seeks job applicants for Director of Diversity Initiatives. This is a full-time salaried position. The ideal candidate would have the following characteristics:

  • Familiarity with Julia
  • Strong scientific, mathematical or numeric programming skills required – e.g. Julia, Python, R
  • Eager to travel, organize and conduct Julia trainings, conferences, workshops and hackathons
  • Enthusiastic about outreach, developing and leveraging relationships with universities and STEM diversity organizations such as YesWeCode, Girls Who Code, Code Latino and Black Girls Code
  • Strong organizational, communication, public speaking and training skills required
  • Passionate evangelist for Julia, open source computing, scientific computing and increasing diversity in the Julia community and STEM
  • This position is based in Cambridge, MA

Interested applicants should send a resume and statement of interest to jobs@juliacomputing.com.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of Python, R, Matlab and other languages can easily integrate their existing code into Julia.

  4. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing, and has advanced libraries that make coding simple and fast, and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  5. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

The Alfred P. Sloan Foundation is a not-for-profit grantmaking institution based in New York City. Founded by industrialist Alfred P. Sloan Jr., the Foundation makes grants in support of basic research and education in science, technology, engineering, mathematics, and economics. This grant was provided through the Foundation’s Data and Computational Research program, which makes grants that seek to leverage developments in digital information technology to maximize the efficiency and trustedness of research. sloan.org

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/press/2017/06/26/sloan-grant.html

Cambridge, MA – Julia Computing has been granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community.

The grant will support Julia training, adoption, usability, compilation, package development, tooling and documentation.

The diversity portion of the grant will fund a new full-time Director of Diversity Initiatives plus travel, scholarships, training sessions, workshops, hackathons and Webinars. Further information about the new Director of Diversity Initiatives position is below for interested applicants.

Julia Computing CEO Viral Shah says, “Diversity of backgrounds increases diversity of ideas. With this grant, the Sloan Foundation is setting a new standard of support for diversity which we hope will be emulated throughout STEM.”

Diversity efforts in the Julia community have been led by JuliaCon Diversity Chair, Erica Moszkowski. According to Moszkowski, “This year, we awarded $12,600 in diversity grants to help 16 participants travel to, attend and present at JuliaCon 2017. Those awards, combined with anonymous talk review, directed outreach, and other efforts have paid off. To give one example, there are many more women attending and presenting than in previous years, but there is a lot more we can do to expand participation from underrepresented groups in the Julia community. This support from the Sloan Foundation will allow us to scale up these efforts and apply them not just at JuliaCon, but much more broadly through Julia workshops and recruitment.”

Julia Computing seeks job applicants for Director of Diversity Initiatives. This is a full-time salaried position. The ideal candidate would have the following characteristics:

  • Familiarity with Julia
  • Strong scientific, mathematical or numeric programming skills required – e.g. Julia, Python, R
  • Eager to travel, organize and conduct Julia trainings, conferences, workshops and hackathons
  • Enthusiastic about outreach, developing and leveraging relationships with universities and STEM diversity organizations such as YesWeCode, Girls Who Code, Code Latino and Black Girls Code
  • Strong organizational, communication, public speaking and training skills required
  • Passionate evangelist for Julia, open source computing, scientific computing and increasing diversity in the Julia community and STEM
  • This position is based in Cambridge, MA

Interested applicants should send a resume and statement of interest to jobs@juliacomputing.com.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to
    1,000x for insurance model estimation, 225x for parallel
    supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and
    comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of
    Python, R, Matlab and other languages can easily integrate their
    existing code into Julia.

  4. Elegant code. Julia was built from the ground up for
    mathematical, scientific and statistical computing, and has advanced
    libraries that make coding simple and fast, and dramatically reduce
    the number of lines of code required – in some cases, by 90%
    or more.

  5. Julia solves the two language problem. Because Julia combines
    the ease of use and familiar syntax of Python, R and Matlab with the
    speed of C, C++ or Java, programmers no longer need to estimate
    models in one language and reproduce them in a faster
    production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

The Alfred P. Sloan Foundation is a not-for-profit grantmaking institution based in New York City. Founded by industrialist Alfred P. Sloan Jr., the Foundation makes grants in support of basic research and education in science, technology, engineering, mathematics, and economics. This grant was provided through the Foundation’s Data and Computational Research program, which makes grants that seek to leverage developments in digital information technology to maximize the efficiency and trustedness of research. sloan.org

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
3710
Julia Computing Awarded $910,000 Grant by Alfred P. Sloan Foundation, Including $160,000 for STEM Diversity http://www.juliabloggers.com/julia-computing-awarded-910000-grant-by-alfred-p-sloan-foundation-including-160000-for-stem-diversity/ Mon, 26 Jun 2017 00:00:00 +0000 http://juliacomputing.com/press/2017/06/26/grant Cambridge, MA – Julia Computing has been granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community.

The grant will support Julia training, adoption, usability, compilation, package development, tooling and documentation.

The diversity portion of the grant will fund a new full-time Director of Diversity Initiatives plus travel, scholarships, training sessions, workshops, hackathons and Webinars. Further information about the new Director of Diversity Initiatives position is below for interested applicants.

Julia Computing CEO Viral Shah says, “Diversity of backgrounds increases diversity of ideas. With this grant, the Sloan Foundation is setting a new standard of support for diversity which we hope will be emulated throughout STEM.”

Diversity efforts in the Julia community have been led by JuliaCon Diversity Chair, Erica Moszkowski. According to Moszkowski, “This year, we awarded $12,600 in diversity grants to help 16 participants travel to, attend and present at JuliaCon 2017. Those awards, combined with anonymous talk review, directed outreach, and other efforts have paid off. To give one example, there are many more women attending and presenting than in previous years, but there is a lot more we can do to expand participation from underrepresented groups in the Julia community. This support from the Sloan Foundation will allow us to scale up these efforts and apply them not just at JuliaCon, but much more broadly through Julia workshops and recruitment.”

Julia Computing seeks job applicants for Director of Diversity Initiatives. This is a full-time salaried position. The ideal candidate would have the following characteristics:

  • Familiarity with Julia
  • Strong scientific, mathematical or numeric programming skills required – e.g. Julia, Python, R
  • Eager to travel, organize and conduct Julia trainings, conferences, workshops and hackathons
  • Enthusiastic about outreach, developing and leveraging relationships with universities and STEM diversity organizations such as YesWeCode, Girls Who Code, Code Latino and Black Girls Code
  • Strong organizational, communication, public speaking and training skills required
  • Passionate evangelist for Julia, open source computing, scientific computing and increasing diversity in the Julia community and STEM
  • This position is based in Cambridge, MA

Interested applicants should send a resume and statement of interest to jobs@juliacomputing.com.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of Python, R, Matlab and other languages can easily integrate their existing code into Julia.

  4. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing, and has advanced libraries that make coding simple and fast, and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  5. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

The Alfred P. Sloan Foundation is a not-for-profit grantmaking institution based in New York City. Founded by industrialist Alfred P. Sloan Jr., the Foundation makes grants in support of basic research and education in science, technology, engineering, mathematics, and economics. This grant was provided through the Foundation’s Data and Computational Research program, which makes grants that seek to leverage developments in digital information technology to maximize the efficiency and trustedness of research. sloan.org

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/press/2017/06/26/grant.html

Cambridge, MA – Julia Computing has been granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community.

The grant will support Julia training, adoption, usability, compilation, package development, tooling and documentation.

The diversity portion of the grant will fund a new full-time Director of Diversity Initiatives plus travel, scholarships, training sessions, workshops, hackathons and Webinars. Further information about the new Director of Diversity Initiatives position is below for interested applicants.

Julia Computing CEO Viral Shah says, “Diversity of backgrounds increases diversity of ideas. With this grant, the Sloan Foundation is setting a new standard of support for diversity which we hope will be emulated throughout STEM.”

Diversity efforts in the Julia community have been led by JuliaCon Diversity Chair, Erica Moszkowski. According to Moszkowski, “This year, we awarded $12,600 in diversity grants to help 16 participants travel to, attend and present at JuliaCon 2017. Those awards, combined with anonymous talk review, directed outreach, and other efforts have paid off. To give one example, there are many more women attending and presenting than in previous years, but there is a lot more we can do to expand participation from underrepresented groups in the Julia community. This support from the Sloan Foundation will allow us to scale up these efforts and apply them not just at JuliaCon, but much more broadly through Julia workshops and recruitment.”

Julia Computing seeks job applicants for Director of Diversity Initiatives. This is a full-time salaried position. The ideal candidate would have the following characteristics:

  • Familiarity with Julia
  • Strong scientific, mathematical or numeric programming skills required – e.g. Julia, Python, R
  • Eager to travel, organize and conduct Julia trainings, conferences, workshops and hackathons
  • Enthusiastic about outreach, developing and leveraging relationships with universities and STEM diversity organizations such as YesWeCode, Girls Who Code, Code Latino and Black Girls Code
  • Strong organizational, communication, public speaking and training skills required
  • Passionate evangelist for Julia, open source computing, scientific computing and increasing diversity in the Julia community and STEM
  • This position is based in Cambridge, MA

Interested applicants should send a resume and statement of interest to jobs@juliacomputing.com.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to
    1,000x for insurance model estimation, 225x for parallel
    supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and
    comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of
    Python, R, Matlab and other languages can easily integrate their
    existing code into Julia.

  4. Elegant code. Julia was built from the ground up for
    mathematical, scientific and statistical computing, and has advanced
    libraries that make coding simple and fast, and dramatically reduce
    the number of lines of code required – in some cases, by 90%
    or more.

  5. Julia solves the two language problem. Because Julia combines
    the ease of use and familiar syntax of Python, R and Matlab with the
    speed of C, C++ or Java, programmers no longer need to estimate
    models in one language and reproduce them in a faster
    production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

The Alfred P. Sloan Foundation is a not-for-profit grantmaking institution based in New York City. Founded by industrialist Alfred P. Sloan Jr., the Foundation makes grants in support of basic research and education in science, technology, engineering, mathematics, and economics. This grant was provided through the Foundation’s Data and Computational Research program, which makes grants that seek to leverage developments in digital information technology to maximize the efficiency and trustedness of research. sloan.org

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
3697
Julia Computing Raises $4.6M in Seed Funding http://www.juliabloggers.com/julia-computing-raises-4-6m-in-seed-funding-2/ Mon, 19 Jun 2017 00:00:00 +0000 http://juliacomputing.com/press/2017/06/19/seed-funding Berkeley, California – Julia Computing is pleased to announce seed funding of $4.6M from investors General Catalyst and Founder Collective.

Julia Computing CEO Viral Shah says, “We selected General Catalyst and Founder Collective as our initial investors because of their success backing entrepreneurs with business models based on open source software. This investment helps us accelerate product development and continue delivering outstanding support to our customers, while the entire Julia community benefits from Julia Computing’s contributions to the Julia open source programming language.”

The General Catalyst team was led by Donald Fischer, who was an early product manager for Red Hat Enterprise Linux, and the Founder Collective team was led by David Frankel.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

According to Tim Thornham, Director of Financial Solutions Modeling at Aviva, Britain’s second-largest insurer, “Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.”

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of Python, R, Matlab and other languages can easily integrate their existing code into Julia.

  4. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing, and has advanced libraries that make coding simple and fast, and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  5. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia. Julia Computing’s founders are Viral Shah, Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi.

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/press/2017/06/19/seed-funding.html

Berkeley, California – Julia Computing is pleased to announce seed funding of $4.6M from investors General Catalyst and Founder Collective.

Julia Computing CEO Viral Shah says, “We selected General Catalyst and Founder Collective as our initial investors because of their success backing entrepreneurs with business models based on open source software. This investment helps us accelerate product development and continue delivering outstanding support to our customers, while the entire Julia community benefits from Julia Computing’s contributions to the Julia open source programming language.”

The General Catalyst team was led by Donald Fischer, who was an early product manager for Red Hat Enterprise Linux, and the Founder Collective team was led by David Frankel.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia, a product of the open source community, MIT CSAIL and MIT Mathematics, combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

According to Tim Thornham, Director of Financial Solutions Modeling at Aviva, Britain’s second-largest insurer, “Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.”

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to
    1,000x for insurance model estimation, 225x for parallel
    supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and
    comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of
    Python, R, Matlab and other languages can easily integrate their
    existing code into Julia.

  4. Elegant code. Julia was built from the ground up for
    mathematical, scientific and statistical computing, and has advanced
    libraries that make coding simple and fast, and dramatically reduce
    the number of lines of code required – in some cases, by 90%
    or more.

  5. Julia solves the two language problem. Because Julia combines
    the ease of use and familiar syntax of Python, R and Matlab with the
    speed of C, C++ or Java, programmers no longer need to estimate
    models in one language and reproduce them in a faster
    production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia. Julia Computing’s founders are Viral Shah, Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi.

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
3712
Julia Computing Raises $4.6M in Seed Funding http://www.juliabloggers.com/julia-computing-raises-4-6m-in-seed-funding/ Mon, 19 Jun 2017 00:00:00 +0000 http://juliacomputing.com/press/2017/06/19/funding Berkeley, California – Julia Computing is pleased to announce seed funding of $4.6M from investors General Catalyst and Founder Collective.

Julia Computing CEO Viral Shah says, “We selected General Catalyst and Founder Collective as our initial investors because of their success backing entrepreneurs with business models based on open source software. This investment helps us accelerate product development and continue delivering outstanding support to our customers, while the entire Julia community benefits from Julia Computing’s contributions to the Julia open source programming language.”

The General Catalyst team was led by Donald Fischer, who was an early product manager for Red Hat Enterprise Linux, and the Founder Collective team was led by David Frankel.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

According to Tim Thornham, Director of Financial Solutions Modeling at Aviva, Britain’s second-largest insurer, “Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.”

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to 1,000x for insurance model estimation, 225x for parallel supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of Python, R, Matlab and other languages can easily integrate their existing code into Julia.

  4. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing, and has advanced libraries that make coding simple and fast, and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  5. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia. Julia Computing’s founders are Viral Shah, Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi.

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/press/2017/06/19/funding.html

Berkeley, California – Julia Computing is pleased to announce seed funding of $4.6M from investors General Catalyst and Founder Collective.

Julia Computing CEO Viral Shah says, “We selected General Catalyst and Founder Collective as our initial investors because of their success backing entrepreneurs with business models based on open source software. This investment helps us accelerate product development and continue delivering outstanding support to our customers, while the entire Julia community benefits from Julia Computing’s contributions to the Julia open source programming language.”

The General Catalyst team was led by Donald Fischer, who was an early product manager for Red Hat Enterprise Linux, and the Founder Collective team was led by David Frankel.

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1 million downloads and +161% annual growth, Julia is one of the top 10 programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

According to Tim Thornham, Director of Financial Solutions Modeling at Aviva, Britain’s second-largest insurer, “Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.”

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Raytheon and Uber.

  1. Julia is lightning fast. Julia provides speed improvements up to
    1,000x for insurance model estimation, 225x for parallel
    supercomputing image analysis and 11x for macroeconomic modeling.

  2. Julia is easy to learn. Julia’s flexible syntax is familiar and
    comfortable for users of Python, R and Matlab.

  3. Julia integrates well with existing code and platforms. Users of
    Python, R, Matlab and other languages can easily integrate their
    existing code into Julia.

  4. Elegant code. Julia was built from the ground up for
    mathematical, scientific and statistical computing, and has advanced
    libraries that make coding simple and fast, and dramatically reduce
    the number of lines of code required – in some cases, by 90%
    or more.

  5. Julia solves the two language problem. Because Julia combines
    the ease of use and familiar syntax of Python, R and Matlab with the
    speed of C, C++ or Java, programmers no longer need to estimate
    models in one language and reproduce them in a faster
    production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia. Julia Computing’s founders are Viral Shah, Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi.

A few examples of how Julia is being used today include:

BlackRock, the world’s largest asset manager, is using Julia to power their trademark Aladdin analytics platform.

Aviva, Britain’s second-largest insurer, is using Julia to make Solvency II compliance models run 1,000x faster using just 7% as much code as the legacy program it replaced.

“Solvency II compliant models in Julia are 1,000x faster than Algorithmics, use 93% fewer lines of code and took one-tenth the time to implement.” Tim Thornham, Director of Financial Solutions Modeling

Berkery Noyes is using Julia for mergers and acquisitions analysis.

“Julia is 20 times faster than Python, 100 times faster than R, 93 times faster than Matlab and 1.3 times faster than Fortran. What really excites us is that it’s interesting that you can write high-level, scientific and numerical computing but without having to re-translate that. Usually, if you have something in R or Matlab and you want to make it go faster, you have to re-translate it to C++, or some other faster language; with Julia, you don’t—it sits right on top.” Keith Lubell, CTO

UC Berkeley Autonomous Race Car (BARC) is using Julia for self-driving vehicle navigation.

“Julia has some amazing new features for our research. The port to ARM has made it easy for us to translate our research codes into real world applications.” Francesco Borrelli, Professor of Mechanical Engineering and co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley

Federal Aviation Administration (FAA) and MIT Lincoln Labs are using Julia for the Next-Generation Aircraft Collision Avoidance System.

“The previous way of doing things was very costly. Julia is very easy to understand. It’s a very familiar syntax, which helps the reader understand the document with clarity, and it helps the writer develop algorithms that are concise. Julia resolves many of our conflicts, reduces cost during technology transfer, and because Julia is fast, it enables us to run the entire system and allows the specification to be executed directly. We continue to push Julia as a standard for specifications in the avionics industry. Julia is the right answer for us and exceeds all our needs.” Robert Moss, MIT Lincoln Labs

Augmedics is using Julia to give surgeons ‘x-ray vision’ via augmented reality.

“I stumbled upon Julia and gave it a try for a few days. I fell in love with the syntax, which is in so many ways exactly how I wanted it to be. The Julia community is helpful, Juno (the interactive development environment for Julia) is super-helpful. I don’t know how one can write without it. As a result, we are achieving much more and far more efficiently using Julia.” Tsur Herman, Senior Algorithms Developer

Path BioAnalytics is using Julia for personalized medicine.

“We were initially attracted to Julia because of the innovation we saw going on in the community. The computational efficiency and interactivity of the data visualization packages were exactly what we needed in order to process our data quickly and present results in a compelling fashion. Julia is instrumental to the efficient execution of multiple workflows, and with the dynamic development of the language, we expect Julia will continue to be a key part of our business going forward.” Katerina Kucera, Lead Scientist

Voxel8 is using Julia for 3D printing and drone manufacture.

“The expressiveness of a language matters. Being high level and having an ability to iterate quickly makes a major difference in a fast-paced innovative environment like at Voxel8. The speed at which we’ve been able to develop this has been incredible. If we were doing this in a more traditional language like C or C++, we wouldn’t be nearly as far as we are today with the number of developers we have, and we wouldn’t be able to respond nearly as quickly to customer feedback regarding what features they want. There is a large number of packages for Julia that we find useful. Julia is very stable – the core language is stable and fast and most packages are very stable.” Jack Minardi, Co-Founder and Software Lead

Federal Reserve Bank of New York and Nobel Laureate Thomas J. Sargent are using Julia to solve macroeconomic models 10x faster.

“We tested our code and found that the model estimation is about ten times faster with Julia than before, a very large improvement. Our ports (computer lingo for “translations”) of certain algorithms, such as Chris Sims’s gensys (which computes the model solution), also ran about six times faster in Julia than the … versions we had previously used.” Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski and Micah Smith, Federal Reserve Bank of New York

“Julia is a great tool. We like Julia. We are very excited about Julia because our models are complicated. It’s easy to write the problem down, but it’s hard to solve it – especially if our model is high dimensional. That’s why we need Julia. Figuring out how to solve these problems requires some creativity. This is a walking advertisement for Julia.” Thomas J. Sargent, Nobel Laureate

Intel, Lawrence Berkeley National Laboratory, UC Berkeley and the National Energy Research Scientific Computing Center are using Julia for parallel supercomputing to increase the speed of astronomical image analysis 225x.

Barts Cancer Institute, Institute of Cancer Research, University College London and Queen Mary University of London are using Julia to model cancer genomes.

“Coming from using Matlab, Julia was pretty easy, and I was surprised by how easy it was to write pretty fast code. Obviously the speed, conciseness and dynamic nature of Julia is a big plus and the initial draw, but there are other perhaps unexpected benefits. For example, I’ve learned a lot about programming through using Julia. Learning Julia has helped me reason about how to write better and faster code. I think this is primarily because Julia is very upfront about why it can be fast and nothing is hidden away or “under the hood”. Also as most of the base language and packages are written in Julia, it’s great to be able to delve into what’s going on without running into a wall of C code, as might be the case in other languages. I think this is a big plus for its use in scientific research too, where we hope that our methods and conclusions are reproducible. Having a language that’s both fast enough to implement potentially sophisticated algorithms at a big scale but also be readable by most people is a great resource. Also, I find the code to be very clean looking, which multiple dispatch helps with a lot, and I like the ability to write in a functional style.” Marc Williams, Barts Cancer Institute, Queen Mary University of London and University College London

]]>
3690
Reading DataFrames with non-UTF8 encoding in Julia http://www.juliabloggers.com/reading-dataframes-with-non-utf8-encoding-in-julia/ Mon, 12 Jun 2017 15:51:55 +0000 http://perfectionatic.org/?p=414 ]]> By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=414

Recently I ran into problem where I was trying to read a CSV files from a Scandinavian friend into a DataFrame. I was getting errors it could not properly parse the latin1 encoded names.

I tried running

using DataFrames
dataT=readtable("example.csv", encoding=:latin1)

but the got this error

ArgumentError: Argument 'encoding' only supports ':utf8' currently.

The solution make use of (StringEncodings.jl)[https://github.com/nalimilan/StringEncodings.jl] to wrap the file data stream before presenting it to the readtable function.

f=open("example.csv","r")
s=StringDecoder(f,"LATIN1", "UTF-8")
dataT=readtable(s)
close(s)
close(f)

The StringDecoder generates an IO stream that appears to be utf8 for the readtable function.

]]>
3684
Tupper’s self-referential formula in Julia http://www.juliabloggers.com/tuppers-self-referential-formula-in-julia/ Mon, 12 Jun 2017 15:15:29 +0000 http://perfectionatic.org/?p=399 ]]> By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=399

I was surprised when I came across on Tupper’s formula on twitter. I felt the compulsion to implement it in Julia.

The formula is expressed as

{1\over 2} < \left\lfloor \mathrm{mod}\left(\left\lfloor {y \over 17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y\rfloor, 17)},2\right)\right\rfloor

and yields bitmap facsimile of itself.

In [1]:
k=big"960 939 379 918 958 884 971 672 962 127 852 754 715 004 339 660 129 306 651 505 519 271 702 802 395 266 424 689 642 842 174 350 718 121 267 153 782 770 623 355 993 237 280 874 144 307 891 325 963 941 337 723 487 857 735 749 823 926 629 715 517 173 716 995 165 232 890 538 221 612 403 238 855 866 184 013 235 585 136 048 828 693 337 902 491 454 229 288 667 081 096 184 496 091 705 183 454 067 827 731 551 705 405 381 627 380 967 602 565 625 016 981 482 083 418 783 163 849 115 590 225 610 003 652 351 370 343 874 461 848 378 737 238 198 224 849 863 465 033 159 410 054 974 700 593 138 339 226 497 249 461 751 545 728 366 702 369 745 461 014 655 997 933 798 537 483 143 786 841 806 593 422 227 898 388 722 980 000 748 404 719"
setprecision(BigFloat,10000);

In the above, the big integer is the magic number that lets us generate the image of the formula. I also need to setprecision of BigFloat to be very high, as rounding errors using the default precision does not get us the desired results. The implementation was inspired by the one in Python, but I see Julia a great deal more concise and clearer.

In [2]:
function tupper_field(k)
    field=Array{Bool}(17,106)
    for (ix,x) in enumerate(0.0:1:105.0), (iy,y) in enumerate(k:k+16)
        field[iy,107-ix]=1/2<floor(mod(floor(y/17)*2^(-17*floor(x)-mod(floor(y),17)),2))
    end
   field 
end
In [3]:
f=tupper_field(k);
using Images
img = colorview(Gray,.!f)
Out[3]:

I just inverted the boolean array here to get the desired bitmap output.

 

]]>
3686
Sampling variation in effective sample size estimates (MCMC) http://www.juliabloggers.com/sampling-variation-in-effective-sample-size-estimates-mcmc-2/ Mon, 12 Jun 2017 14:25:57 +0000 http://tpapp.github.io/post/ess-sampling/ess-sampling/ ]]> By: Julia on Tamás K. Papp's website

Re-posted from: http://tpapp.github.io/post/ess-sampling/ess-sampling/

Introduction MCMC samples, used in Bayesian statistics, are not independent — in fact, unless one uses specialized methods or modern HMC, posterior draws are usually at highly autocorrelated. For independent draws, [ \text{variance of simulation mean} \propto \frac1N ] where $N$ is the sample size, but for correlated draws, one has to scale the sample size with a factor [ \tau = \frac{1}{1+2\sum_{k=1}^\infty \rho_k} ] where $\rho_k$ is the lag-$k$ autocorrelation.

]]>
3688
Sampling variation in effective sample size estimates (MCMC) http://www.juliabloggers.com/sampling-variation-in-effective-sample-size-estimates-mcmc/ Mon, 12 Jun 2017 14:25:57 +0000 http://tpapp.github.io/post/ess-sampling/ ]]> By: Julia on Tamás K. Papp's website

Re-posted from: http://tpapp.github.io/post/ess-sampling/

Introduction MCMC samples, used in Bayesian statistics, are not independent — in fact, unless one uses specialized methods or modern HMC, posterior draws are usually at highly autocorrelated. For independent draws, [ \text{variance of simulation mean} \propto \frac1N ] where $N$ is the sample size, but for correlated draws, one has to scale the sample size with a factor [ \tau = \frac{1}{1+2\sum_{k=1}^\infty \rho_k} ] where $\rho_k$ is the lag-$k$ autocorrelation.

]]>
3682
Optim.jl v0.9.0 http://www.juliabloggers.com/optim-jl-v0-9-0/ Fri, 02 Jun 2017 20:47:33 +0000 http://www.pkofod.com/?p=265 Continue reading Optim.jl v0.9.0 ]]> By: pkofod

Re-posted from: http://www.pkofod.com/2017/06/02/optim-jl-v0-9-0/

I am very happy to say that we can finally announce that Optim.jl v0.9.0 is out. This version has quite a few user facing changes. Please read about the changes below if you use Optim.jl in a package, a script, or anything else, as you will quite likely have to make some changes to your code.

As always, I have to thank my two partners in crime: Asbjørn Nilsen Riseth (@anriseth) and Christoph Ortner (@cortner) for their help in making the changes, transitions, and tests that are included in v0.9.0.

The last update (form v0.6.0 to v0.7.0 had) some changes that were a long time coming, and so does v0.9.0. Hopefully, these fixes to old design problems will greatly improve the user experience and performance of Optim.jl, and pave the way for more exiting features in the future.

We’ve tried to make the transition as smooth as possible, although we do have breaking changes in this update. Please consult the documentation if you face problems, join us on gitter or ask the community at discourse!

Okay, now to the changes.

Why not v0.8.0?
First of all, why v0.9.0? Last version was v0.7.8! The is because we are dropping support for Julia v0.4 and v0.5 simultaneously, so we are reserving v0.8.0 for backporting serious fixes to Julia v0.5. However, v0.6 should be just around the corner. With Julia v0.7 and v1.0.0 not too far out in the horizon either, I’ve decided it’s more important to move forward than to keep v0.4 and v0.5 up to speed. The dev time is constrained, so currently it’s one or the other. Of course, for users of Julia v0.5. they can simply continue to use Optim.jl v0.7.8. Post Julia’s proper release, backwards compatibility and continuity will be more important, even if it comes at the expense of development speed.

Another note about the version number: The next version of Optim.jl will be v1.0.0, and we will follow SEMVER 2.0 fully.

Change order of evaluation point and storage arguments
This one is very breaking, although we have set up op a system such that all gradients and Hessians will be checked before proceeding. This check will be removed shortly in a v1.0.0 version bump, so please correct your code now. Basically, we closed a very old issue (#156) concerning the input argument order in gradients and Hessians. In Julia, an in-place function typically has an exclamation mark at the end of its name, and the cache as the first argument. In Optim.jl it has been the other way around for the argument order. We’ve changed that, and this means that you now have to provide “g” or “H” as the first argument, and “x” as the second. The old version

function g!(x, g)
    ... do something ...
end

is now

function g!(g, x)
    ... do something ...
end

NLSolversBase.jl
Since v0.7.0, we’ve moved some of the basic infrastructure of Optim.jl to NLSolversBase.jl. This is currently the Non-, Once-, and TwiceDifferentiable types and constructors. This is done to, as a first step, share code between Optim.jl and LineSearches.jl, and but also NLsolve.jl in the future. At the same time, we’ve made the code a little smarter, such that superfluous calls to the objective function, gradient, and Hessian are now avoided. As an example, compare the objective and gradient calls in the example in our readme. Here, we optimize the Rosenbrock “banana” function using BFGS. Since last version of Optim we had to change the output, as it has gone from 157 calls to 53. Much of this comes from this refactoring, but some of it also comes form a better choices for initial line search steps for BFGS and Newton introduced in #328.

As mentioned, we’ve made the *Differentiable-types a bit smarter, including moving the gradient and Hessian caches into the respective types. This also means, that a OnceDifferentiable type instance needs to know what the return type of the gradient is. This is done by providing an x seed in the constructor

rosenbrock = Optim.UnconstrainedProblems.examples["Rosenbrock"]
f = rosenbrock.f
g! = rosenbrock.g!
x_seed = rosenbrock.initial_x
od = OnceDifferentiable(f, g!, x_seed)

If the seed also happens to be the initial x, then you do not have to provide an x when calling optimize

julia> optimize(od, BFGS(), Optim.Options(g_tol=0.1))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [1.0005999613152214,1.001138415164852]
 * Minimizer: [1.0005999613152214,1.001138415164852]
 * Minimum: 7.427113e-07
 * Iterations: 13
 * Convergence: true
   * |x - x'| < 1.0e-32: false 
     |x - x'| = 1.08e-02 
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN 
   * |g(x)| < 1.0e-01: true 
     |g(x)| = 2.60e-02 
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 45
 * Gradient Calls: 45

If you’ve used Optim.jl before, you’ll notice that the output carries a bit more information about the convergence criteria.

LineSearches.jl turned Julian
Line searches used to be chosen using symbols in the method constructor for line search based methods such as GradientDescent, BFGS, and Newton by use of the linesearch keyword. The new version of LineSearches.jl uses types and dispatch exactly like Optim.jl does for solvers. This means that you now have to pass a type instance instead of a keyword, and this also means that we can open up for easy tweaking of line search parameters through fields in the line search types.

Let us illustrate by the following example how the new syntax works. First, we construct a BFGS instance without specifying the linesearch. This defaults to HagerZhang.

julia> rosenbrock(x) =  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)
 
julia> result = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926033423,0.9999999852005353]
 * Minimum: 5.471433e-17
 * Iterations: 16
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 3.47e-07
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 2.33e-09
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 53
 * Gradient Calls: 53

or we could choose a backtracking line search instead

 
julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking()))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926655744,0.9999999853309254]
 * Minimum: 5.379380e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 1.13e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 8.79e-11
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 31
 * Gradient Calls: 24

this defaults to cubic backtracking, but quadratic can be chosen using the order keyword

julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking(order = 2)))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926644578,0.9999999853284671]
 * Minimum: 5.381020e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 4.73e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 1.76e-10
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 29
 * Gradient Calls: 24

LineSearches.jl should have better documentation coming soon, but the code is quite self-explanatory for those who want to twiddle around with these parameters.

The method state is now an argument to optimize
While not always that useful to know for users, we use method states internally to hold all the pre-allocated cache variables that are needed. In the new version of Optim.jl, this can be explicitly provided by the user such that you can retrieve various diagnostics after the optimization routine is done. One such example is the inverse Hessian estimate that BFGS spits out.

method = BFGS()
options = Optim.Options()
initial_x = rand(2)
d = OnceDifferentiable(f, g!, initial_x)
my_state = Optim.initial_state(method, options, d, initial_x)
optimize{d, method, options, my_state)

The future
We have more changes coming in the near future. There’s PR #356 for a Trust Region solver for cases where you can explicitly calculate Hessian-vector products without forming the Hessian (from @jeff-regier from the Celeste.jl project), the interior point replacement for our current barrier function approach to box constrained optimization in PR #303, and more.

]]>
3676