Author Archives: Julia Computing, Inc.

Newsletter October 2017

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/10/06/october-newsletter.html

We want to thank all Julia users and well-wishers for their continued use of and support for Julia, and to share some of the latest developments from Julia Computing and the Julia community.

  1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran
  2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips
  3. Julia Computing and Julia Featured in Forbes
  4. Julia Meetups in Your City
  5. Julia Computing at Open Data Science Conference
  6. Julia at a Conference Near You
  7. Julia Goes to Hollywood
  8. Recent Julia Events in New York
  9. Contact Us


  1. Comparison of Differential Equation Solvers in Julia, R, Python, C, Matlab, Mathematica, Maple and Fortran: Julia contributor Christopher Rackauckas from UC Irvine wrote a seminal blog post comparing the capabilities of differential equation solvers in Julia and other languages. Follow the detailed discussion on Hacker News.

  2. JuliaCon 2017: Machine Learning / Deep Learning Video Clips

    Mike Innes: Flux: Machine Learning with Julia

    Jonathan Malmaud: Modern Machine Learning in Julia with TensorFlow.jl

    Deniz Yuret: Machine Learning in 5 Slides

    Deniz Yuret: What is Knet.jl and Why Use Julia for Deep Learning

    JuliaCon 2017

  3. Julia Computing and Julia Featured in Forbes: Forbes published a major feature about Julia Computing and Julia titled “How a New Programming Language Created by Four Scientists Is Now Used by the World’s Biggest Companies” by Suparna Dutt D’Cunha. The article describes the origin of Julia and Julia Computing and how Julia is being used today. The article has more than 75 thousand page views.

  4. Julia Meetups in Your City: There are dozens of Julia meetup groups with more than 13,000 members around the globe. Julia Computing is looking to support Julia meetups by providing training materials, Webcasts and other support. Please email us if you would like to organize a meetup group or partner with Julia Computing on a meetup event.

  5. Julia Computing at Open Data Science Conference: Julia Computing will be presenting at the Open Data Science Conference in London Oct 12-14 and in San Francisco Nov 2-4.

  6. Julia at a Conference Near You: Julia Computing will be present at several upcoming conferences and talks.

  7. Julia Goes to Hollywood: Julia has been featured in two Hollywood television programs: The 100 and Casual. In The 100, Julia is used to represent the language of artificial intelligence in the year 2150. In Casual, Julia is described as a language for the latest generation of programmers.

  8. Recent Julia Events in New York: Julia co-creators Viral Shah and Stefan Karpinski presented an introduction to Julia at the Data Driven NYC meetup on September 25. Their presentation is available here. They also presented “Julia and Spark, Better Together” at the Strata Data Conference. Try out the Julia bindings for Spark with Spark.jl, with special thanks to Andrei Zhabinski.

  9. Contact Us: Please contact us if you wish to:

    • Purchase or obtain license information for Julia products such as JuliaPro, JuliaRun, JuliaDB, JuliaFin or JuliaBox
    • Obtain pricing for Julia consulting projects for your enterprise
    • Schedule Julia training for your organization
    • Share information about exciting new Julia case studies or use cases
    • Partner with Julia Computing to organize a Julia meetup group, hackathon, workshop, training or other event in your city

    About Julia and Julia Computing

    Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.

    Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:

    • JuliaPro for data science professionals and researchers to install and run Julia with more than one hundred carefully curated popular Julia packages on a laptop or desktop computer.
    • JuliaRun for deploying Julia at scale on dozens, hundreds or thousands of nodes in the public or private cloud, including AWS and Microsoft Azure.
    • JuliaFin for financial modeling, algorithmic trading and risk analysis including Bloomberg and Excel integration, Miletus for designing and executing trading strategies and advanced time-series analytics.
    • JuliaDB for in-database in-memory analytics and advanced time-series analysis.
    • JuliaBox for students or new Julia users to experience Julia in a Jupyter notebook right from a Web browser with no download or installation required.

    To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.

    Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.

Demystifying Auto-vectorization in Julia

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/blog/2017/09/27/auto-vectorization-in-julia.html

Introduction

One of the most impressive features of Julia is that it lets you write generic code. Julia’s powerful LLVM-based compiler can automatically generate highly efficient machine code for base functions and user-written functions alike, on any architecture supported by LLVM, making you worry less about writing specialized code for each of these architectures.

One additional benefit of relying on the compiler for performance, rather than hand-coding hot loops in assembly, is that it is significantly more future-proof. Whenever a next-generation instruction set architecture comes out, your Julia code automatically gets faster.

Following a (very) brief look at what the hardware provides, we’ll look at a simple example (the sum function) to see how the compiler can take advantage of the hardware architecture to accelerate generic Julia functions.

Intel SIMD Hardware and Addition Instructions

Modern Intel chips provide a range of instruction set extensions. Among these are the various revisions of the Streaming SIMD Extensions (SSE) and several generations of Advanced Vector Extensions (AVX, available with their latest processor families). These extensions provide Single Instruction Multiple Data (SIMD)-style programming, which can deliver significant speedups for code amenable to such a programming style.

SIMD registers are 128 (SSE), 256 (AVX) or 512 (AVX512) bits wide. They can generally be used in chunks of 8, 16, 32 or 64 bits, but exactly which divisions are available and which operations can be performed depends on the exact hardware architecture.
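
As a quick sanity check of those numbers, the lane count is simply the register width divided by the element width:

# Lanes per SIMD register for 32- and 64-bit floating-point elements
for bits in (128, 256, 512)   # SSE, AVX, AVX512 register widths
    println(bits, "-bit register: ", bits ÷ 32, " x Float32 or ", bits ÷ 64, " x Float64")
end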

Here is what the add instructions on this architecture look like:

  • (V)ADDPS: Takes two 128/256/512 bit values and adds 4/8/16 single precision values in parallel
  • (V)ADDPD: Takes two 128/256/512 bit values and adds 2/4/8 double precision values in parallel
  • (V)PADD(B/W/D/Q): Takes two 128/256/512 bit values and adds (up to 64) 8/16/32/64-bit integers in parallel
  • (V)ADDSUBP(S,D): Takes two inputs; the operation is (+,-,+,-,…) on packed values
  • There are also a few more exotic instructions that involve horizontal adds, saturating, etc.
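
These instructions can also be targeted explicitly from Julia. As a small sketch, assuming the SIMD.jl package is installed, adding two four-element double-precision vectors maps onto a single packed addition (vaddpd on AVX hardware):

using SIMD  # assumed to be installed

a = Vec{4,Float64}((1.0, 2.0, 3.0, 4.0))
b = Vec{4,Float64}((10.0, 20.0, 30.0, 40.0))
a + b  # one packed addition: (11.0, 22.0, 33.0, 44.0)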

An Example

The following code snippet shows a simple sum function (returning the sum of all the elements in an Array ‘a’) in Julia:

function mysum(a::Vector)
    total = zero(eltype(a))
    @simd for x in a
        total += x
    end
    return total
end
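
Before digging into how, we can measure that the @simd annotation pays off. A quick sketch, assuming the BenchmarkTools.jl package is installed; mysum_scalar is a name introduced here only for comparison, and timings will vary by machine:

using BenchmarkTools  # assumed to be installed

# Same loop as mysum, but without @simd: for Float64 the compiler
# must preserve the sequential summation order, so it cannot vectorize.
function mysum_scalar(a::Vector)
    total = zero(eltype(a))
    for x in a
        total += x
    end
    return total
end

a = rand(Float64, 100_000)
@btime mysum($a)          # vectorized
@btime mysum_scalar($a)   # one scalar addition per iteration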

Conceptually, this loop is a simple sequence of memory loads and additions, one element at a time.

However, this is not the code that Julia actually generates under the hood. By taking advantage of the SIMD instruction set, the compiler performs the addition in two phases:

  1. During the first step (the “vector body”), intermediate values are accumulated several at a time: four in our example, though the exact width depends on the hardware.

  2. The reduction step, in which the four partial sums are combined into the final result.
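
To make the two phases concrete, here is a hand-written Julia sketch of the transformation for a vector width of four (the compiler does this at the machine-code level; mysum_blocked is a name introduced here purely for illustration):

function mysum_blocked(a::Vector{Float64})
    acc1 = acc2 = acc3 = acc4 = 0.0
    i = 1
    # Phase 1, the "vector body": four independent partial sums,
    # which map onto the four lanes of a SIMD register
    while i + 3 <= length(a)
        acc1 += a[i]; acc2 += a[i+1]; acc3 += a[i+2]; acc4 += a[i+3]
        i += 4
    end
    # Phase 2, the reduction: combine the four partial sums
    total = (acc1 + acc2) + (acc3 + acc4)
    # Scalar tail for any leftover elements (see the caveats below)
    while i <= length(a)
        total += a[i]
        i += 1
    end
    return total
end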

This picture is simplified a bit, but it conveys the general idea of the transformation. In real code, there are a few extra caveats the compiler has to pay attention to:

  • If the array length is not known to be a multiple of the vector width, the compiler may have to generate an additional scalar loop to sum the remaining elements (for large vector widths, e.g. 32, that remainder loop may itself use vector instructions). Depending on the hardware, the same is true if the memory alignment is not known.

  • To take advantage of “superscalarness” (the ability of a processor to execute more than one instruction in parallel, over and above SIMD), compilers will often “unroll” the vector body, keeping more than one SIMD register’s worth of state at the expense of a larger reduction step (imagine the vector body replicated four times, each copy maintaining its own set of partial sums).

  • If you’re summing floating-point values, the above transformation must be explicitly allowed (Julia’s @simd macro does this for you), since floating-point arithmetic is in general not associative: the result of the sum may differ between the two methods of computing it, as the snippet below demonstrates.
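
A minimal demonstration of why that permission matters: merely reassociating a floating-point sum changes the result in the low-order bits.

a = rand(Float64, 10^6)
s1 = foldl(+, a)    # strict left-to-right summation
s2 = sum(a)         # Base sums in a different (blocked, pairwise) order
s1 == s2            # usually false: the low-order bits differ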

Machine Code Generated by the Compiler

In Julia, we can use the @code_native macro to inspect the native code generated for any particular function. Trying this for our mysum function, summing 100,000 random Float64 values on a machine that supports AVX2, we can see precisely the pattern we expected:

julia> @code_native mysum(rand(Float64, 100000))
vaddpd %ymm5, %ymm0, %ymm0
vaddpd %ymm6, %ymm2, %ymm2
vaddpd %ymm7, %ymm3, %ymm3
vaddpd %ymm8, %ymm4, %ymm4
; NOTE: Omitting length check/branch here
vaddpd %ymm0, %ymm2, %ymm0
vaddpd %ymm0, %ymm3, %ymm0
vaddpd %ymm0, %ymm4, %ymm0
vextractf128 $1, %ymm0, %xmm2
vaddpd %ymm2, %ymm0, %ymm0
vhaddpd %ymm0, %ymm0, %ymm0

The vector body on this machine is unrolled four times, using ymm0, ymm2, ymm3 and ymm4 as the accumulation registers.
The reduction phase then accumulates ymm2, ymm3 and ymm4 into ymm0, and finally sums the parts of ymm0 itself to give the final result.
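
The same structure is also visible one level up, in the LLVM IR, where the unrolled vector body shows up as fadd instructions on <4 x double> values (the exact IR depends on your machine and Julia version):

julia> @code_llvm mysum(rand(Float64, 100000))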

Here is what the machine code for the same function (again with Float64 arguments) looks like on a machine that supports AVX512:

julia> @code_native mysum(rand(Float64, 100000))
vaddpd -192(%rdx), %zmm0, %zmm0
vaddpd -128(%rdx), %zmm2, %zmm2
vaddpd -64(%rdx), %zmm3, %zmm3
vaddpd (%rdx), %zmm4, %zmm4
; NOTE: Omitting length check/branch here
vaddpd %zmm0, %zmm2, %zmm0
vaddpd %zmm0, %zmm3, %zmm0
vaddpd %zmm0, %zmm4, %zmm0
vshuff64x2 $14, %zmm0, %zmm0, %zmm2
vaddpd %zmm2, %zmm0, %zmm0
vpermpd $238, %zmm0, %zmm2
vaddpd %zmm2, %zmm0, %zmm0
vpermilpd $1, %zmm0, %zmm2
vaddpd %zmm2, %zmm0, %zmm0

The machine code will of course look different on other architectures or with different data types, and it may be more complicated, but the pattern of a vector body followed by a reduction phase is consistent everywhere, which is what makes auto-vectorization in Julia a cakewalk.

Julia Computing and Julia Featured in Forbes Asia

By: Julia Computing, Inc.

Re-posted from: http://juliacomputing.com/press/2017/09/25/julia-computing-featured-in-forbes-asia.html

Bengaluru, India – Forbes Asia has published a major feature on Julia Computing and Julia.

The article is titled “How a New Programming Language Created by Four Scientists Is Now Used by the World’s Biggest Companies” by Suparna Dutt D’Cunha. It describes the origins of Julia and Julia Computing and how Julia is being used today.

For example, Julia users, partners and firms hiring Julia programmers include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC and Uber.

So far, the article has more than 60 thousand page views and that number is still climbing.

As Stefan Karpinski, Julia Computing CTO for Open Source, explains, “Julia empowers data scientists, physicists, quantitative finance traders and robot designers to solve problems without having to become computer programmers or hire computer programmers to translate their functions into computer code.”

About Julia and Julia Computing

Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC and Uber.

  1. Julia is lightning fast. Julia is used in production today and has delivered speed improvements of up to 1,000x, for example in insurance model estimation and in parallel astronomical image analysis on supercomputers.

  2. Julia provides unlimited scalability. Julia applications can be deployed on large clusters at the click of a button and can run parallel and distributed computations quickly and easily on tens of thousands of nodes.

  3. Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.

  4. Julia integrates well with existing code and platforms. Users of C, C++, Python, R and other languages can easily integrate their existing code into Julia.

  5. Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing. It has advanced libraries that make programming simple and fast and dramatically reduce the number of lines of code required – in some cases, by 90% or more.

  6. Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.