Forbes Names Julia Computing Co-Founder Keno Fischer to ‘30 Under 30’ List

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/communication/2018/11/19/30u30.html

Cambridge, MA – Forbes has named Julia Computing Co-Founder and Chief
Technology Officer Keno Fischer to its prestigious ‘30 Under 30’ list of
young leaders in enterprise technology.

The Forbes ‘30 Under 30’ list recognizes 30 extraordinary individuals
under the age of 30 for their accomplishments.

Keno Fischer began contributing to Julia when the language was first
released in 2012. At the time, Keno was a 16-year-old high school
student. A native of Hösel, Germany, Keno co-founded Julia Computing in
2015 and graduated from Harvard University in 2016.

According to Viral Shah, CEO of Julia Computing, “Keno’s contributions
are fundamental to Julia’s growth and development. Keno started
contributing to Julia in high school when he led the Julia port to
Windows. Keno also led Julia Computing’s efforts on Celeste, which is
the first petascale application in a dynamic computing language, and
Google.ai lead Jeff Dean recognized Keno’s work porting Julia to Google
Cloud Tensor Processing Units (TPUs) for artificial intelligence and
machine learning. Keno is only 23 years old and he is just getting
started!”

About Julia and Julia Computing

  • Julia is free and open source with a large and growing community of
    more than 800 contributors, 2 million downloads, 1,900 packages,
    41,000 GitHub stars (cumulative for the Julia language and Julia
    packages) and 101% annual download growth

  • Julia combines the high-level productivity and ease of use of Python
    and R with the lightning-fast speed of C++

  • Julia users, partners and employers hiring Julia programmers include
    Amazon, Apple, BlackRock, Booz Allen Hamilton, Capital One, Comcast,
    Disney, Ernst & Young, Facebook, Federal Aviation Administration,
    Federal Reserve Bank of New York, Ford, Google, IBM, Intel, KPMG,
    Microsoft, NASA, Netflix, Oracle, PwC and Uber

  • Julia is used at more than 1,500 universities, research laboratories
    and research institutions worldwide including Harvard, MIT, UC
    Berkeley, Stanford, University of Chicago, Caltech, Carnegie Mellon,
    Cambridge, Oxford, Lawrence Berkeley National Laboratory, Oak Ridge
    National Laboratory, Los Alamos National Laboratory, National Energy
    Research Scientific Computing Center, Lawrence Livermore National
    Laboratory, Alan Turing Institute, Max Planck Institute, National
    Renewable Energy Laboratory, Argonne National Laboratory, Ames
    Laboratory and Barts Cancer Institute

  • Julia is the only high-level dynamic language that has run at
    petascale

  • Julia leveraged 650,000 cores and 1.3 million threads on 9,300
    Knights Landing (KNL) nodes to catalog 188 million astronomical
    objects in just 14.6 minutes using the world’s sixth most powerful
    supercomputer

  • Julia provides speed and performance improvements of 1,000x or more
    for applications such as insurance risk modeling and astronomical
    image analysis

  • Julia delivers vast improvements in speed and performance on a wide
    range of architectures from a single laptop to the world’s sixth
    most powerful supercomputer, and from one node to thousands of nodes
    including multithreading, GPU and parallel computing capabilities

  • Julia powers the Federal Aviation Administration’s NextGen Aircraft
    Collision Avoidance System (ACAS-X), BlackRock’s trademarked Aladdin
    analytics platform and the New York Federal Reserve Bank’s Dynamic
    Stochastic General Equilibrium (DSGE) macroeconomic model

  • Julia Computing was founded in
    2015 by all of the co-creators of Julia to provide Julia users with
    Julia products, Julia training, and Julia support. Julia Computing
    is headquartered in Boston with offices in London and Bangalore

Julia at NIPS and the Future of Machine Learning Tools

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2018/11/15/julia-ml-three-papers.html

We are excited to share several research papers on the Julia and Flux machine learning ecosystem, to be presented at the NIPS Systems for ML Workshop. Since initially proposing the need for a first-class language and ecosystem for machine learning (ML), we have made considerable progress, including the ability to take gradients of arbitrary computations by leveraging Julia’s compiler, and compiling the resulting programs to specialized hardware such as Google’s Tensor Processing Units.

Here we discuss these papers and the projects that brought them to life: Flux.jl [paper], Zygote.jl [paper] and XLA.jl [paper].

Flux.jl is a library that offers a fresh take on machine learning: it exposes powerful tools to the user in a non-intrusive manner while remaining completely hackable, right down to its core.

“Careful design of the underlying automatic differentiation allows freely mixing mathematical expressions, built-in and custom layers and algorithms with control flow in one model. This makes Flux unusually easy to extend to new problems.”
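To give a concrete sense of that hackability, here is a minimal sketch of a user-defined layer, loosely following the Affine example from the Flux documentation. The name, sizes and initialisation are arbitrary, and registering the layer’s parameters for training requires one extra macro call whose name has varied across Flux versions, so it is omitted here:

using Flux

# A custom layer is just an ordinary Julia struct
struct Affine
    W
    b
end

# Convenience constructor with random initialisation
Affine(in::Integer, out::Integer) = Affine(randn(out, in), zeros(out))

# Making the struct callable turns it into a layer that composes
# with Flux's built-in layers
(a::Affine)(x) = a.W * x .+ a.b

layer = Affine(10, 5)
layer(rand(10))    # 5-element output

Because the layer is plain Julia code, it can contain arbitrary logic, call into other packages, and be inspected or modified like any other function.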



Flux plays nicely with the entire Julia ecosystem, leveraging Julia’s multiple dispatch to make sharing types and data between Flux and many widely used array types transparent (e.g., CuArrays for effortless movement of models and data to the GPU). It even lets users extend Julia’s compiler and write custom GPU kernels within the same program.
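As a rough sketch of what that transparency looks like in practice (assuming Flux and CuArrays are installed and a CUDA-capable GPU is available; the model below is arbitrary):

using Flux, CuArrays

# An ordinary Flux model, defined on the CPU
m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)

# Multiple dispatch means moving it to the GPU is just an array
# conversion; Flux's gpu helper maps the parameters to CuArrays
m_gpu = gpu(m)
x_gpu = gpu(rand(Float32, 10))

m_gpu(x_gpu)    # the same model code now runs on the GPU

No GPU-specific model code is required; the same Chain runs on either device depending only on the array types it holds.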

In the Flux paper, we demonstrate how easily one can take advantage of the underlying ecosystem to express complicated ideas. One example is how Flux models can be trained with custom training loops that can house arbitrary logic, including more complex gradient flows than a typical machine learning framework might support.

# Custom training loop: two losses computed from one model and
# backpropagated separately before applying the optimiser
for (x, c, d) in training_set
    c_hat, d_hat = model(x)
    c_loss = loss(c_hat, c) + λ*loss(d_hat, 1 - d)
    d_loss = loss(d_hat, d)
    back!(c_loss)
    back!(d_loss)
    opt()
end

Flux.jl has been shown to run on par with contemporary deep learning libraries while being dramatically simpler, providing intelligent abstractions and maintaining a minimalist API.

Calculating derivatives is a recurrent and intensive task when training any large model, and compiler-level optimisations for differentiable code have seen a recent surge in interest. Automatic differentiation, a topic of much interest in the current ML landscape, can be used almost transparently when hooked into the language compiler.

Zygote.jl is one such example: it performs source-to-source transformations on Julia’s Static Single Assignment (SSA) form, taking advantage of many of the recent improvements made to the base Julia compiler. Similar efforts such as Capstan.jl showcase alternative uses of these same compiler primitives for automatic differentiation.



Zygote transparently generates adjoint code for arbitrary Julia functions, sacrificing neither speed nor the dynamism of the full Julia language. It interacts directly with Julia’s existing compiler and utilizes its full set of optimisation heuristics. It exposes a familiar interface, making usage extremely simple, as shown by the following example:

julia> @code_llvm derivative(x -> 5x+3, 1)

define i64 @"julia_#625_38792"(i64) {
top:
    ret i64 5
}

It enables reverse mode AD while preserving existing language semantics. The Zygote paper also presents some benchmarks for simple functions against contemporary methods.
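As a rough sketch of that interface on ordinary Julia code (using Zygote's exported gradient function; the toy functions below are ours, not taken from the paper):

using Zygote

# gradient returns a tuple with one derivative per argument
gradient(x -> 3x^2 + 2x + 1, 5)      # (32,)

# It also differentiates through plain Julia control flow
function pow(x, n)
    r = one(x)
    for _ in 1:n
        r *= x
    end
    return r
end

gradient(x -> pow(x, 3), 2.0)        # (12.0,)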

“It opens up the opportunity for robust traditional compiler techniques to be extended to machine learning, enabling kernel fusion or compilation for accelerators with no artificial limitations on the kinds of models that researchers can express.
This combination has not previously been possible in a high-level, general-purpose programming language.”

XLA.jl, released recently, shows the ability to repurpose the Julia compiler to target Google’s TPUs.



This package combines simple and elegant Flux models with Zygote’s AD and offloads the entire forward and backward pass onto the TPU for maximum speed, bringing the entire story full circle. The XLA paper details the methodology, using Google’s latest XRT API to compile Julia code to XLA IR. It explains how the forward and backward passes are generated, how control flow is handled, and how dynamic Julia code is compiled down to static sub-segments for execution on the TPU.

“Targeting TPUs using our compiler, we are able to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s”

XLA.jl is written in under 1000 lines of code, a truly impressive feat considering the opportunities it opens up. It also shines a light on the language’s expressive power.

# An HLO operand that generates uniform random
# numbers of the specified shape and element type:
struct HloRng <: HloOp{:rng}
    Type
    Shape
end

"""A function that adds random numbers to
each entry of a 1000x1000 matrix"""
@eval function add_rand_1000x1000(
        A::XRTArray{Float32, (1000, 1000), 2})
    random = $(HloRng(Float32, (1000, 1000)))()
    result = $(HloAdd())(random, A)
    return result
end

Google Cloud TPUs provide an efficient, extremely high-performance computational platform that can dramatically speed up the demanding task of training models. From the BFloat16s.jl package, which allows algorithms to be prototyped on CPUs to check their numerical stability under the restricted precision available on TPUs, to the compiler and the surrounding ML ecosystem, Julia provides a dynamic, familiar and high-performance environment for taking advantage of this specialized hardware. The progress made within the past few months and the recognition it has received have us very excited about the future of machine learning in the Julia ecosystem and the world at large.
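As a small illustration of that precision-checking workflow, here is a sketch using the BFloat16 type from BFloat16s.jl on the CPU (the particular values are arbitrary):

using BFloat16s

# BFloat16 keeps Float32's dynamic range but only about 8 bits of
# significand, so small increments can be lost to rounding
a = BFloat16(256.0) + BFloat16(1.0)    # rounds back to BFloat16(256.0)
b = Float32(256.0) + Float32(1.0)      # 257.0f0

# Running an algorithm with BFloat16 inputs on the CPU gives an early
# indication of whether it stays numerically stable at TPU precision
Float32(a), b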
