## Forbes Names Julia Computing Co-Founder Keno Fischer to ‘30 Under 30’ List

Cambridge, MA – Forbes has named Julia Computing Co-Founder and
Chief Technology Officer Keno Fischer to its prestigious ‘30 Under 30’
list of young leaders in enterprise technology.

The Forbes ‘30 Under 30’ list recognizes 30 extraordinary individuals
under the age of 30 for their accomplishments.

Keno Fischer began contributing to Julia when the language was first
released in 2012. At the time, Keno was a 16-year-old high school
student. A native of Hösel, Germany, Keno co-founded Julia
Computing in 2015 and graduated from Harvard University in 2016.

According to Viral Shah, CEO of Julia Computing, “Keno’s contributions
are fundamental to Julia’s growth and development. Keno started
contributing to Julia in high school when he led the Julia port to
Windows. Keno also led Julia Computing’s efforts on Celeste, which
is the first petascale application in a dynamic computing language, and
Dean recognized Keno’s work porting Julia to Google Cloud Tensor
Processing Units (TPUs) for artificial intelligence and machine
learning. Keno is only 23 years old and he is just getting started!”

• Julia is free and open source with a large and growing community of
thousand GitHub stars (cumulative for Julia language and

• Julia combines the high-level productivity and ease of use of Python
and R with the lightning-fast speed of C++

• Julia users, partners and employers hiring Julia programmers include
Amazon, Apple, BlackRock, Booz Allen Hamilton, Capital One, Comcast,
Federal Reserve Bank of New York, Ford, Google, IBM, Intel, KPMG,
Microsoft, NASA, Netflix, Oracle, PwC and Uber

• Julia is used at more than 1,500 universities, research laboratories
and research institutions worldwide including Harvard, MIT, UC
Berkeley, Stanford, University of Chicago, Caltech, Carnegie Mellon,
Cambridge, Oxford, Lawrence Berkeley National Laboratory, Oak Ridge
National Laboratory, Los Alamos National Laboratory, National Energy
Research Scientific Computing Center, Lawrence Livermore National
Laboratory, Alan Turing Institute, Max Planck Institute, National
Renewable Energy Laboratory, Argonne National Laboratory, Ames
Laboratory and Barts Cancer Institute

• Julia is the only high-level dynamic language that has run at
petascale

• Julia leveraged 650,000 cores and 1.3 million threads on 9,300
Knights Landing (KNL) nodes to catalog 188 million astronomical
objects in just 14.6 minutes using the world’s sixth most powerful
supercomputer

• Julia provides speed and performance improvements of 1,000x or more
for applications such as insurance risk modeling and astronomical
image analysis

• Julia delivers vast improvements in speed and performance on a wide
range of architectures from a single laptop to the world’s sixth
most powerful supercomputer, and from one node to thousands of nodes
including multithreading, GPU and parallel computing capabilities

• Julia Computing was founded in 2015 by all of the co-creators of
Julia to provide Julia users with Julia products, Julia training, and
Julia support. Julia Computing is headquartered in Boston with
offices in London and Bangalore

## Julia at NIPS and the Future of Machine Learning Tools

We are excited to share several research papers on the Julia and Flux machine learning ecosystem, to be presented at the NIPS Systems for ML Workshop. Since initially proposing the need for a first-class language and ecosystem for machine learning (ML), we have made considerable progress, including the ability to take gradients of arbitrary computations by leveraging Julia’s compiler, and compiling the resulting programs to specialized hardware such as Google’s Tensor Processing Units.

Here we talk about these papers and the projects that have brought these to life, namely: Flux.jl [paper], Zygote.jl [paper] and XLA.jl [paper].

Flux.jl is a library that gives a fresh take on machine learning as it exposes powerful tools to the user in a non-intrusive manner while remaining completely hackable, right to its core.

“Careful design of the underlying automatic differentiation allows freely mixing mathematical expressions, built-in and custom layers and algorithms with control flow in one model. This makes Flux unusually easy to extend to new problems.”

Flux plays nicely with the entire Julia ecosystem, leveraging Julia’s multiple dispatch to make sharing types and data between Flux and many widely used array types transparent (e.g., CuArrays for effortless translation of models and data to the GPU). It even lets users extend Julia’s compiler and write custom GPU kernels within the same program.
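
As a rough illustration of how multiple dispatch lets the same model code work across array types, the sketch below uses only Julia’s standard library. The `dense` function is a hypothetical stand-in for a layer, not Flux’s API, and `Diagonal` stands in for a specialized array type such as a GPU array.

```julia
using LinearAlgebra  # standard library; provides Diagonal

# Hypothetical dense layer, written once against generic array operations.
dense(W, b, x) = tanh.(W * x .+ b)

b = [0.1, -0.2]
x = [1.0, 2.0]

W_dense = [1.0 0.0; 0.0 2.0]     # ordinary Matrix{Float64}
W_diag  = Diagonal([1.0, 2.0])   # structured array type from LinearAlgebra

# Multiple dispatch selects a specialized `*` method for each weight type,
# while the definition of `dense` stays unchanged.
y_dense = dense(W_dense, b, x)
y_diag  = dense(W_diag, b, x)
```

Both calls compute the same values; only the method chosen for the matrix-vector product differs, which is the property that lets a model move between array types without being rewritten.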

In the Flux paper, we demonstrate how easily one can take advantage of the underlying ecosystem to express complicated ideas. One example is how Flux models can be trained with custom training loops that house arbitrary logic, including more complex gradient flows than a typical machine learning framework might support.

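```julia
for (x, c, d) in training_set
    c_hat, d_hat = model(x)
    # `c` is used as the target for `c_hat` (the original snippet referenced an undefined `y`)
    c_loss = loss(c_hat, c) + λ*loss(d_hat, 1 - d)
    d_loss = loss(d_hat, d)
    # Backpropagate each loss, then take an optimiser step
    back!(c_loss)
    back!(d_loss)
    opt()
end
```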


Flux.jl has been shown to run on par with contemporary deep learning libraries while being dramatically simpler, providing intelligent abstractions and maintaining a minimalist API.

Calculating derivatives is a recurring and computationally intensive task when training any large model, and compiler-level optimisations for differentiable code have seen a recent surge in interest. Automatic differentiation (AD) can be used almost transparently when it is hooked into the language compiler.

Zygote.jl is one such example of doing source-to-source transformations of the Static Single Assignment (SSA) form, taking advantage of many of the recent improvements made to the base Julia compiler. Similar efforts such as Capstan.jl showcase an alternative application of these same compiler primitives toward automatic differentiation for different applications.

Zygote transparently generates adjoint code for arbitrary Julia functions, sacrificing neither speed nor the dynamism of the full Julia language. It interacts directly with Julia’s existing compiler and utilizes its full set of optimisation heuristics. It exposes a familiar interface, making usage extremely simple, as shown by the following example:

```julia
julia> @code_llvm derivative(x -> 5x+3, 1)
define i64 @"julia_#625_38792"(i64) {
top:
  ret i64 5
}
```


It enables reverse mode AD while preserving existing language semantics. The Zygote paper also presents some benchmarks for simple functions against contemporary methods.
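
To make the idea of reverse-mode AD more concrete, here is a minimal hand-written sketch for the same function, `5x + 3`. It is not Zygote’s implementation or API: the helper names (`mul`, `add`, `f_with_pullback`) are hypothetical, and the pullbacks are composed by hand rather than generated by a compiler transformation.

```julia
# Each primitive returns its value together with a "pullback" closure that
# maps the gradient of the output back to gradients of the inputs.
mul(a, b) = (a * b, Δ -> (Δ * b, Δ * a))
add(a, b) = (a + b, Δ -> (Δ, Δ))

# Hand-written forward pass for f(x) = 5x + 3 that records the pullbacks.
function f_with_pullback(x)
    y1, back_mul = mul(5, x)
    y2, back_add = add(y1, 3)
    function pullback(Δ)
        Δy1, _ = back_add(Δ)    # gradient flowing into the product
        _, Δx = back_mul(Δy1)   # gradient with respect to x
        return Δx
    end
    return y2, pullback
end

y, back = f_with_pullback(1)
dydx = back(1)   # seed the output gradient with 1
```

A source-to-source tool automates the step done manually in `f_with_pullback`: it generates the chain of pullback calls from the function’s own code.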

“It opens up the opportunity for robust traditional compiler techniques to be extended to machine learning, enabling kernel fusion or compilation for accelerators with no artificial limitations on the kinds of models that researchers can express. This combination has not previously been possible in a high-level, general-purpose programming language.”

XLA.jl, released recently, shows the ability to repurpose the Julia compiler to target Google’s TPUs.

This package takes the simple and elegant Flux models, applies Zygote’s AD, and offloads the entire forward and backward pass onto the TPU for the utmost speed, bringing the entire story full circle. The XLA paper details its methodology, using Google’s latest XRT API to compile Julia code to XLA IR. It explains how the forward and backward passes are generated, how control flow is handled, and how dynamic Julia code is compiled down to static sub-segments for execution on the TPU.

“Targeting TPUs using our compiler, we are able to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s”

XLA.jl is written in under 1000 lines of code, a truly impressive feat considering the opportunities it opens up. It also shines a light on the language’s expressive power.

```julia
# An HLO operand that generates uniform
# random numbers of the specified
# shape and element type:
struct HloRng <: HloOp{:rng}
    Type
    Shape
end

# Note: the function name `add_rand_1000x1000` is an assumed, illustrative name.
"""A function that adds random numbers to
each entry of a 1000x1000 matrix"""
@eval function add_rand_1000x1000(
        A::XRTArray{Float32, (1000, 1000), 2})
    random = $(HloRng(Float32, (1000, 1000)))()
    result = $(HloAdd())(random, A)
    return result
end
```


Google Cloud TPUs provide an efficient, extremely high-performance computational platform that can dramatically speed up the demanding task of training models. From the BFloat16s.jl package, which allows prototyping algorithms on CPUs to check their stability under the restricted precision available on TPUs, to the compiler internals and the related ML ecosystem, Julia offers a dynamic, familiar and high-performance environment for taking advantage of this specialized hardware. The progress made within the past few months and the recognition received have us very excited about the future of machine learning in the Julia ecosystem and the world at large.
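
To give a feel for what restricted precision means in practice, the following sketch emulates bfloat16-like storage in plain Julia by keeping only the top 16 bits of a `Float32`. It does not use BFloat16s.jl, the function name `truncate_to_bf16` is hypothetical, and plain truncation is an assumption made for brevity (a real implementation may round instead).

```julia
# Keep the sign bit, the 8 exponent bits and the top 7 mantissa bits of a
# Float32, zeroing the low 16 bits (truncation, assumed here for simplicity).
truncate_to_bf16(x::Float32) =
    reinterpret(Float32, reinterpret(UInt32, x) & 0xffff0000)

a = truncate_to_bf16(Float32(pi))   # only a few significant digits survive
b = truncate_to_bf16(257.0f0)       # 257 is not representable at this precision
```

Under this truncation `a` is `3.140625f0` and `b` collapses to `256.0f0`, which is the kind of effect an algorithm has to tolerate before it is moved to reduced-precision hardware.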

Re-posted from: http://juliacomputing.com/blog/2018/11/15/arxiv-papers.html