The post Solving Systems of Stochastic PDEs and using GPUs in Julia appeared first on Stochastic Lifestyle.
Re-posted from: http://www.stochasticlifestyle.com/solving-systems-stochastic-pdes-using-gpus-julia/
What I want to describe in this post is how to solve stochastic PDEs in Julia using GPU parallelism. I will go from start to finish, describing how to use the type-genericness of the DifferentialEquations.jl library in order to write a code that uses within-method GPU-parallelism on the system of PDEs. This is mostly a proof of concept: the most efficient integrators for this problem are not compatible with GPU parallelism yet, and the GPU parallelism isn’t fully efficient yet. However, I thought it would be nice to show an early progress report showing that it works and what needs to be fixed in Base Julia and various libraries for us to get the full efficiency.
The reaction-diffusion equation is a PDE commonly handled in systems biology which is a diffusion equation plus a nonlinear reaction term. The dynamics are defined as:

$$u_t = D \Delta u + f(u)$$
But this doesn’t need to only have a single “reactant” u: u can be a vector of reactants, and $f$ is then the nonlinear vector equation describing how these different pieces react together. Let’s settle on a specific equation to make this easier to explain. Let’s use a simple model of a 3-component system where A can diffuse through space to bind with the non-diffusive B to form the complex C (also non-diffusive; assume B is too big and gets stuck in a cell, which causes C=A+B to be stuck as well). Other than the binding, we make each of these undergo a simple birth-death process, and we write down the equations which result from mass-action kinetics. If this all is meaningless to you, just understand that it gives the system of PDEs:

$$A_t = D \Delta A + \alpha_1 - \beta_1 A - r_1 A B + r_2 C$$
$$B_t = \alpha_2 - \beta_2 B - r_1 A B + r_2 C$$
$$C_t = \alpha_3 - \beta_3 C + r_1 A B - r_2 C$$
One addition that was made to the model is that we let $\alpha_1$ be the production rate of $A$, and we let that rate be a function of space so that $A$ is only produced on one side of our domain. Let’s make it a constant when $x \ge 80$, and $0$ otherwise, and let our spatial domain be $x \in [0,100]$ and $y \in [0,100]$.
This model is spatial: each reactant is defined at each point in space, and all of the reactions are local, meaning that $f$ at spatial point $(x,y)$ only uses the values $u(x,y)$. This is an important fact which will come up later for parallelization.
In order to solve this via a method of lines (MOL) approach, we need to discretize the PDE into a system of ODEs. Let’s do a simple uniformly-spaced grid finite difference discretization. Choose $dx = 1$ and $dy = 1$ so that we have $100 \times 100 = 10{,}000$ points for each reactant. Notice how fast that grows! Put the reactants in a matrix such that A[i,j] = $A(x_j, y_i)$, i.e. the row index $i$ gives the $y$ value and the column index $j$ gives the $x$ value (this way looking at the matrix is essentially like looking at the discretized space).
So now we have 3 matrices (A, B, and C) for our reactants. How do we discretize the PDE? In this case, the diffusion term simply becomes a tridiagonal matrix $M$ where $[1, -2, 1]$ is the central band. You can notice that $M_y A$ performs diffusion along the columns of $A$, and so this is diffusion along the $y$-axis. Similarly, $A M_x$ flips the indices and thus does diffusion along the rows of $A$, making this diffusion along $x$. Thus $D(M_y A + A M_x)$ is the discretized Laplacian (we could have separate diffusion constants $D_x$ and $D_y$ if we want by using different constants on the two terms, but let’s not do that for this simple example. I’ll leave that as an exercise for the reader). I enforced a Neumann boundary condition with zero derivative (also known as a no-flux boundary condition) by reflecting the changes over the boundary. Thus the derivative operators are generated as:
```julia
const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
# Do the reflections, different for x and y operators
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
```
I also could have done this using the DiffEqOperators.jl library, but I wanted to show what it truly is at its core.
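As a quick sanity check, the two matrix multiplications really do implement the classic 5-point Laplacian stencil in the interior, and the reflected entries make constant fields invisible to the operator (zero flux). Note that in current Julia versions `full` has been replaced by `Matrix` and `Tridiagonal` lives in `LinearAlgebra`, so a standalone check looks like:

```julia
using LinearAlgebra

N = 100
Mx = Matrix(Tridiagonal(fill(1.0, N-1), fill(-2.0, N), fill(1.0, N-1)))
My = copy(Mx)
# No-flux reflections, as in the operator definition above
Mx[2,1] = 2.0; Mx[end-1,end] = 2.0
My[1,2] = 2.0; My[end,end-1] = 2.0

A = rand(N, N)
L = My*A + A*Mx
i, j = 42, 17  # any interior point recovers the 5-point stencil
@assert L[i,j] ≈ A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1] - 4A[i,j]
# A constant field has zero Laplacian everywhere, boundary rows included:
@assert maximum(abs.(My*ones(N))) < 1e-12
```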
Since all of the reactions are local, each point in space reacts separately. Thus the reaction terms represent themselves as element-wise equations on the reactants, and we can write them out quite simply. The ODE which then represents the PDE is thus, in pseudo Julia code:
```julia
DA = D*(My*A + A*Mx)
@. DA + α₁ - β₁*A - r₁*A*B + r₂*C
@. α₂ - β₂*B - r₁*A*B + r₂*C
@. α₃ - β₃*C + r₁*A*B - r₂*C
```
Note here that I am using α₁ as a matrix (or row-vector, since that will broadcast just fine) where every point in space with x<80 has it as zero, and all of the others have it as a constant. The other coefficients are all scalars. How do we do this with the ODE solver?
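For concreteness, this spatially-dependent production term can be built exactly the way the full script later in this post does: a matrix that is 1.0 wherever $x \ge 80$ and 0.0 elsewhere (recall that $x$ corresponds to the column index here):

```julia
N = 100
# Each column of X is constant and equal to its column index, so X plays the role of x
X = reshape([i for i in 1:N for j in 1:N], N, N)
α₁ = 1.0 .* (X .>= 80)
@assert α₁[1,79] == 0.0 && α₁[1,80] == 1.0
@assert all(α₁[:, 80:end] .== 1.0)
```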
The ArrayPartition is an interesting type from RecursiveArrayTools.jl which allows you to define “an array” as actually being different discrete subunits of arrays. Let’s assume that our initial condition is zero for everything and let the production terms build it up. This means that we can define:
```julia
A = zeros(N,N); B = zeros(N,N); C = zeros(N,N)
```
Now we can put them together as:
u0 = ArrayPartition((A,B,C))
You can read the RecursiveArrayTools.jl README to get more familiar with what the ArrayPartition is, but really it’s an array where u[i] indexes into A first, B second, then C. It also has efficient broadcast, doing the A, B and C parts together (and this is efficient even if they don’t match types!). But since this acts as an array, to DifferentialEquations.jl it is an array!
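To get a feel for the indexing semantics, here is a toy sketch of the idea (the real implementation in RecursiveArrayTools.jl is more general and far more optimized; `ToyPartition` is purely illustrative):

```julia
# Toy version of the ArrayPartition idea: linear indexing walks through
# the component arrays in order (A first, then B, then C).
struct ToyPartition{T<:Tuple}
    x::T
end
function Base.getindex(p::ToyPartition, i::Int)
    for a in p.x
        i <= length(a) && return a[i]
        i -= length(a)
    end
    throw(BoundsError(p, i))
end

A = fill(1.0, 2, 2); B = fill(2.0, 2, 2); C = fill(3.0, 2, 2)
u = ToyPartition((A, B, C))
@assert u[1] == 1.0   # lands in A
@assert u[5] == 2.0   # past A's 4 elements, lands in B
@assert u[9] == 3.0   # lands in C
```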
The important part is that we can “decouple” the pieces of the array at any time by accessing u.x, which holds our tuple of arrays. Thus our ODE using this ArrayPartition as its container can be written as follows:
```julia
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  DA = D*(My*A + A*Mx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
```
where this is using @. to do inplace updates on our du to say how the full ArrayPartition should update in time. Note that we can make this more efficient by adding some cache variables to the diffusion matrix multiplications and using A_mul_B!, but let’s ignore that for now.
Together, the ODE which defines our PDE is thus:
```julia
prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,BS3())
```
if I want to solve it on $t \in [0,100]$. Done! The solution gives back ArrayPartitions (and interpolates to create new ones if you use sol(t)). We can plot it in Plots.jl and see the pretty gradients. Using this 3rd order explicit adaptive Runge-Kutta method, we solve this equation in about 40 seconds. That’s okay.
There are some optimizations that can still be done. When we do A*B as matrix multiplication, we create another temporary matrix. These allocations can bog down the system. Instead we can pre-allocate the outputs and use the inplace functions A_mul_B! to make better use of memory. The easiest way to store these cache arrays is as constant globals, but you can use closures (anonymous functions which capture data, i.e. (x)->f(x,y)) or call-overloaded types to do it without globals. The globals way (the easy way) is simply:
```julia
const MyA = zeros(N,N)
const AMx = zeros(N,N)
const DA = zeros(N,N)
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
```
For reference, the closure approach looks like:
```julia
MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
function f_full(t,u,du,MyA,AMx,DA)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
f = (t,u,du) -> f_full(t,u,du,MyA,AMx,DA)
```
and a call-overloaded type looks like:
```julia
struct MyFunction{T} <: Function
  MyA::T
  AMx::T
  DA::T
end

# Now define the overload
function (ff::MyFunction)(t,u,du)
  # This is a function which references itself via ff
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(ff.MyA,My,A)
  A_mul_B!(ff.AMx,A,Mx)
  @. ff.DA = D*(ff.MyA + ff.AMx)
  @. dA = ff.DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
f = MyFunction(MyA,AMx,DA)
# Now f(t,u,du) is our function!
```
These last two ways enclose the pointer to our cache arrays locally but still present a function f(t,u,du) to the ODE solver.
Now since PDEs are large, many times we don’t care about getting the whole timeseries. Using the output controls from DifferentialEquations.jl, we can make it only output the final timepoint.
sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)
Also, if you’re using Juno this’ll give you a nice progress bar so you can track how it’s going.
We are using an explicit Runge-Kutta method here because that’s what works with GPUs so far. Matrix factorizations need to be implemented for GPUArrays before the implicit (stiff) solvers will be available, so here we choose BS3 since it’s fully broadcasting (not all methods are yet) and it’s fully GPU compatible. In practice, using an NxNx3 tensor as the initial condition / dependent variable with either OrdinaryDiffEq’s Rosenbrock23(), Rodas4(), or Sundials’ CVODE_BDF() is actually more efficient right now. But after Julia fixes its broadcasting issue and with some updates to Julia’s differentiation libraries to handle abstract arrays like in DiffEqDiffTools.jl, the stiff solvers will be usable with GPUs and all will be well.
Thus for reference I will show some ways to do this efficiently with stiff solvers. With a stiff solver we will not want to factorize the dense Jacobian since that would take forever. Instead we can use something like Sundials’ Krylov method:
```julia
u0 = zeros(N,N,3)
const MyA = zeros(N,N); const AMx = zeros(N,N); const DA = zeros(N,N)
function f(t,u,du)
  A = @view u[:,:,1]
  B = @view u[:,:,2]
  C = @view u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
using Sundials
@time sol = solve(prob,CVODE_BDF(linear_solver=:BCG))
```
and that will solve it in about a second. In this case it wouldn’t be more efficient to use the banded linear solver, since the system of equations tends to have different parts of the system interact, which makes the bands large; thus a Krylov method is preferred. See this part of the docs for details on the available linear solvers from Sundials. DifferentialEquations.jl exposes a ton of Sundials’ possible choices, so hopefully one works for your problem (preconditioners coming soon).
To do something similar with OrdinaryDiffEq.jl, we would need to make use of the linear solver choices in order to override the internal linear solve functions with some kind of sparse matrix solver like a Krylov method from IterativeSolvers.jl. For this size of problem, though, a multistep method like BDF is probably preferred, at least until we implement some IMEX methods.
So if you want to solve it quickly right now, that’s how you do it. But let’s get back to our other story: the future is more exciting.
As a summary, here’s a full PDE code:
```julia
using OrdinaryDiffEq, RecursiveArrayTools

# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

# Define the initial condition as normal arrays
A = zeros(N,N); B = zeros(N,N); C = zeros(N,N)
u0 = ArrayPartition((A,B,C))

const MyA = zeros(N,N); const AMx = zeros(N,N); const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)

using Plots; pyplot()
p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
```
That was all using the CPU. How do we turn on GPU parallelism with DifferentialEquations.jl? Well, you don’t. DifferentialEquations.jl “doesn’t have GPU bits”. So wait… can we not do GPU parallelism? No, this is the glory of type-genericness, especially in broadcasted operations. To make things use the GPU, we simply use a GPUArray. If instead of zeros(N,M) we used GPUArray(zeros(N,M)), then u becomes an ArrayPartition of GPUArrays. GPUArrays naturally override broadcast such that dotted operations are performed on the GPU. DifferentialEquations.jl uses broadcast internally (except in this list of current exceptions due to a limitation with Julia’s inference engine which I have discussed with Jameson Nash (@vtjnash) who mentioned this should be fixed in Julia’s 1.0 release), and thus just by putting the array as a GPUArray, the array-type will take over how all internal updates are performed and turn this algorithm into a fully GPU-parallelized algorithm that doesn’t require copying to the CPU. Wasn’t that simple?
From that you can probably also see how to multithread everything, or how to set everything up with distributed parallelism. You can make the ODE solvers do whatever you want by defining an array type where the broadcast does whatever special behavior you want.
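As a tiny illustration of the mechanism (a hypothetical `euler_step!`, not the solver’s actual internals): a step function written purely with broadcast runs unchanged on any array and element type, and the container decides how the work is actually performed:

```julia
# A generic explicit Euler step written only with broadcast. Any array type
# that implements broadcast (GPU arrays, Float32 arrays, etc.) works unchanged.
euler_step!(du, u, f!, dt) = (f!(du, u); @. u += dt * du; u)
decay!(du, u) = (@. du = -u)  # du/dt = -u

u64 = ones(Float64, 4); du64 = similar(u64)
u32 = ones(Float32, 4); du32 = similar(u32)
euler_step!(du64, u64, decay!, 0.1)
euler_step!(du32, u32, decay!, 0.1f0)
@assert all(u64 .≈ 0.9)                # one Euler step of decay
@assert eltype(u32) == Float32         # the container's element type is preserved
@assert all(u32 .≈ 0.9f0)
```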
So to recap, the entire difference from above is changing to:
```julia
using CLArrays
gA = CLArray(A); gB = CLArray(B); gC = CLArray(C)
const gMx = CLArray(Mx)
const gMy = CLArray(My)
const gα₁ = CLArray(α₁)
gu0 = ArrayPartition((gA,gB,gC))
const gMyA = CLArray(zeros(N,N))
const gAMx = CLArray(zeros(N,N))
const gDA = CLArray(zeros(N,N))
function gf(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(gMyA,gMy,A)
  A_mul_B!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
GPUArrays.allowslow(false) # makes sure none of the slow fallbacks are used
@time sol = solve(prob2,BS3(),progress=true,dt=0.003,adaptive=false,save_everystep=false,save_start=false)

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
sol = solve(prob2,BS3(),progress=true,save_everystep=false,save_start=false)
# Adaptivity currently fails due to https://github.com/JuliaGPU/CLArrays.jl/issues/10
```
You can use CuArrays if you want as well. It looks exactly the same as using CLArrays, except you exchange the CLArray calls for CuArray. Go have fun.
Why not make it an SPDE? All that we need to do is extend each of the PDE equations to have a noise function. In this case, let’s use multiplicative noise, i.e. each noise term scales with the solution (each $du_i$ gets an extra $\gamma_i \, g_i(u) \, dW_i$ term). This means that our noise function is:
```julia
function g(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  @. dA = γ₁*A
  @. dB = γ₂*A
  @. dC = γ₃*A
end
```
Now we just define and solve the system of SDEs:
```julia
prob = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob,SRIW1())
```
We can see the cool effect that diffusion dampens the noise in [A] but is unable to dampen the noise in [B] which results in a very noisy [C]. The stiff SPDE takes much longer to solve even using high order plus adaptivity because stochastic problems are just that much more difficult (current research topic is to make new algorithms for this!). It gets GPU’d just by using GPUArrays like before. But there we go: solving systems of stochastic PDEs using high order adaptive algorithms with within-method GPU parallelism. That’s gotta be a first? The cool thing is that nobody ever had to implement the GPU-parallelism either, it just exists by virtue of the Julia type system.
Warning: This can take a while to solve! An explicit Runge-Kutta algorithm isn’t necessarily great here, though using a stiff solver on a problem of this size once again requires smartly choosing sparse linear solvers. The high order adaptive method is pretty much necessary, though, since something like Euler-Maruyama is simply not stable enough to solve this at a reasonable dt. Also, the current algorithms are not so great at handling this problem. Good thing there’s a publication coming along with some new stuff…
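For reference, the Euler-Maruyama scheme mentioned above is the simplest possible SDE solver: fixed time step, strong order 0.5, and a restrictive stability region, which is why the high order adaptive methods win here. A minimal scalar version (for illustration only, not the library’s implementation) is:

```julia
# Euler-Maruyama for dX = μ(X)dt + σ(X)dW: step the drift forward in time
# and add a Gaussian increment scaled by sqrt(dt) for the diffusion.
function euler_maruyama(μ, σ, x0, tspan, dt)
    t0, tf = tspan
    x = x0
    t = t0
    while t < tf - dt/2
        x += μ(x)*dt + σ(x)*sqrt(dt)*randn()
        t += dt
    end
    return x
end

# With zero noise it reduces to explicit Euler: dX = -X dt gives X(1) ≈ exp(-1)
x = euler_maruyama(x -> -x, x -> 0.0, 1.0, (0.0, 1.0), 1e-3)
@assert isapprox(x, exp(-1); atol=1e-3)
```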
Note: the version of SRIW1 which uses broadcast for GPUs is not on the current versions of StochasticDiffEq.jl since it’s slower due to a bug when fusing too many broadcasts which will hopefully get fixed in one of Julia’s 1.x releases. Until then, GPUs cannot be used with this algorithm without a (quick) modification.
So that’s where we’re at. GPU parallelism works because of abstract typing. But in some cases we need to help the GPU array libraries get up to snuff to handle all of the operations, and then we’ll really be in business! Of course there’s more optimizing that needs to be done, and we can do this by specializing code paths on bottlenecks as needed.
I think this is at least a nice proof of concept showing that Julia’s generic algorithms allow for one to not only take advantage of things like higher precision, but also take advantage of parallelism and extra hardware without having to re-write the underlying algorithm. There’s definitely more work that needs to be done, but I can see this usage of abstract array typing as being one of Julia’s “killer features” in the coming years as the GPU community refines its tools. I’d give at least a year before all of this GPU stuff is compatible with stiff solvers and linear solver choices (so that way it can make use of GPU-based Jacobian factorizations and Krylov methods). And comparable methods for SDEs are something I hope to publish soon since the current tools are simply not fit for this scale of problem: high order, adaptivity, sparse linear solvers, and A/L-stability all need to be combined in order to tackle this problem efficiently.
Here’s the full script for recreating everything:
```julia
#######################################################
### Solve the PDE
#######################################################

using OrdinaryDiffEq, RecursiveArrayTools

# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

# Define the initial condition as normal arrays
A = zeros(N,N); B = zeros(N,N); C = zeros(N,N)
u0 = ArrayPartition((A,B,C))

const MyA = zeros(N,N); const AMx = zeros(N,N); const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(MyA,My,A)
  A_mul_B!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
@time sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)

using Plots; pyplot()
p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))

#######################################################
### Solve the PDE using CLArrays
#######################################################

using CLArrays
gA = CLArray(A); gB = CLArray(B); gC = CLArray(C)
const gMx = CLArray(Mx)
const gMy = CLArray(My)
const gα₁ = CLArray(α₁)
gu0 = ArrayPartition((gA,gB,gC))
const gMyA = CLArray(MyA)
const gAMx = CLArray(AMx)
const gDA = CLArray(DA)
function gf(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  A_mul_B!(gMyA,gMy,A)
  A_mul_B!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
GPUArrays.allowslow(false)
@time sol = solve(prob2,BS3(),progress=true,dt=0.003,adaptive=false,save_everystep=false,save_start=false)

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
sol = solve(prob2,BS3(),progress=true,save_everystep=false,save_start=false)
# Adaptivity currently fails due to https://github.com/JuliaGPU/CLArrays.jl/issues/10

#######################################################
### Solve the SPDE
#######################################################

using StochasticDiffEq

function g(t,u,du)
  A,B,C = u.x
  dA,dB,dC = du.x
  @. dA = γ₁*A
  @. dB = γ₂*A
  @. dC = γ₃*A
end

prob3 = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob3,SRIW1(),progress=true,save_everystep=false,save_start=false)

p1 = surface(X,Y,sol[end].x[1],title = "[A]")
p2 = surface(X,Y,sol[end].x[2],title = "[B]")
p3 = surface(X,Y,sol[end].x[3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))

# Exercise: Do SPDE + GPU
```
Re-posted from: http://juliadiffeq.org/2017/12/11/Events.html
DifferentialEquations.jl 3.2 is just a nice feature update. It hits a few long-requested features.
Re-posted from: http://www.breloff.com/software-one-point-five/
I recently read Andrej Karpathy’s blog post proclaiming that we are entering an era of “Software 2.0”, where traditional approaches to developing software (a team of human developers writing code in their programming language of choice… i.e. v1.0) will become less prevalent and important.
Instead, the world will be run by neural networks. Why not? They’re really great at recognizing objects in images, winning at board games, and even writing movie scripts. (Well maybe not movie scripts.)
I can’t decide if he’s being naive or if we should be scared (no… not from an army of infinitely intelligent super-robots).
Neural networks are very powerful. There’s no question. But human software engineers do more than just pattern match inputs into outputs. In software development, it’s not enough to produce correct outputs 99% of the time (though even that is seemingly unachievable for most complex tasks). Imagine if your bank deposits only landed in the right account 99% of the time. Or if an air traffic control tower only assured your plane would land safely 99% of the time.
There are too many tasks that require near-certain guarantees on performance. And most importantly, many of those tasks require full human understanding of the processes and algorithms which determine the outcome. This is something we simply cannot expect from end-to-end neural (statistical) models.
I think he’s naive for claiming that statistical modeling can replace good ol’ fashioned software engineering.
Neural networks are fragile, complicated, opaque, compute-heavy, and easily tricked. They are simultaneously hard to understand and easy for bad actors to manipulate. But… they get some amazing results in certain domains (most notably sensorial tasks like vision, hearing, and speech).
Humans are gullible animals. We have implicit biases, and constantly change the facts to match our understanding of the world. In a world filled with Software 2.0, where the software programs are written by statistical models, the output of that software will start to look like magic. So much so that people will start to believe that it is magic.
Throughout history, people have been happy to worship and serve a power greater than them. What if people start to believe in computing magic, and trust important life decisions to a statistical model? Insurance companies might deny your coverage because a neural network told them a procedure wouldn’t help you. Employers will discriminate based on expected performance. Police will monitor and arrest people through statistical profiling, predicting crime that hasn’t yet happened. Courts will prosecute and sentence based on expectations of repeat offense.
You might be saying… “This is already happening!” I know. I think we should be scared of relying on statistical models without properly accounting for their biases and shortcomings.
Just like the spreading IoT time bomb, placing blind trust in Software 2.0 is a trojan horse. We let it into our lives without full understanding, and it puts us at risk in ways we can’t realize.
The path forward is in developing human-led technology. Building machines that can help and advise, but do not assert full control. We shouldn’t worship a machine, and we shouldn’t put our blind trust in statistical methods. Humans are more than just pattern matchers. We can transfer our experience to new environments. We can plan and reason, without having to fail at a task millions of times first.
Instead of rushing to Software 2.0, let’s view neural networks in their proper context: they are models, not magic.
Re-posted from: http://juliasnippets.blogspot.com/2017/12/tutorial-on-dataframes-in-julia.html
This time a blog post is more of an announcement :).
After the release of DataFrames 0.11 in Julia, I have decided that the ecosystem has stabilized enough to add it to my teaching materials.
There were some Issues and PRs submitted in the process. Finally, I have managed to compile basic notes about using DataFrames. If you are interested in them, I have published them on GitHub here.
Re-posted from: http://juliacomputing.com/blog/2017/12/05/december-newsletter.html
Happy holidays from Julia Computing and best wishes for a prosperous and productive 2018.
i. Major New Release of DataFrames.jl v0.11:
DataFrames v0.11 has been released by the Julia community with a number of important updates:
Don’t let the version number fool you. This version has been in the works for a long time, and leverages important new features in the Julia compiler. See the release announcement and try it out!
ii. JuliaBox – Commercial Version Now Available for University and Corporate Users with Enhancements and Support:
In response to user demand, Julia Computing has introduced a new and improved JuliaBox experience with increased memory and more support. For pricing and more information about the new commercial version of JuliaBox, please contact us. The free version of JuliaBox remains available for current and new users.
iii. Improved C++ Interoperability Interface:
Cxx.jl and CxxWrap.jl allow users to wrap C++ libraries in Julia. Use Cxx.jl to write the wrapper package in Julia code or CxxWrap.jl to write it entirely in C++ and call from Julia with a single line of Julia code. It is also possible to write and call Julia code from within C++, giving Julia and C++ complete two-way interoperability.
iv. JuliaPro Amazon Machine Image and Docker Image:
JuliaPro, the fastest on-ramp for quants, data scientists and researchers, is now available as an Amazon Machine Image on AWS EC2 (Red Hat Enterprise Linux v7.4 and Ubuntu 16.04) and as a Docker image (Ubuntu 16.04 and Centos 7) for use in containerized environments such as Kubernetes. More information is available here.
Denver hosted the Intel HPC Developer Conference November 11-12 and SC17 November 12-17. Julia Computing participated in both conferences and presented the Celeste case study, one of the latest and most exciting developments in high performance computing using Julia. Julia Computing’s Ranjan Anantharaman was recognized for providing the Best Tutorial at the Intel HPC Developer Conference.
Julia Computing’s Ranjan Anantharaman (left), winner of the Best Tutorial Award at the Intel HPC Developer Conference 2017
Julia Computing was featured as part of the Intel keynote presentation about the future of high performance computing at Analytics Vidhya’s DataHack Summit in Bangalore, India held November 9-11. Julia Computing’s Rajshekar Behar presented Julia’s work with Celeste, Intel and Intel Skylake architecture.
Helge Eichhorn, Software Engineer at Telespazio VEGA Deutschland, presented Astrodynamics.jl: An Open-Source Framework for Interactive High-Performance Mission Analysis at the Open Source Cubesat Workshop on Nov 23 at the European Space Operations Center (ESOC/ESA) in Darmstadt, Germany.
Professor Mark Vogelsberger, Theoretical Astrophysicist at MIT, published an article in Linux Magazine in Jan 2016 titled “Getting Parallel: Creating Parallel Applications with the Julia Programming Language.” According to Professor Vogelsberger: “The Julia code is … more than 100 times faster than the equivalent Python code. Multiple dispatch with function calls gives Julia extremely efficient code that is practically superior to any high-level language. Faster code in Julia can be achieved without any tricks like vectorization or outsourcing to C extensions. By contrast, such tricks are often necessary to speed up Python or R code.”
Tangent Works Uses Julia to Win IEEE Global Energy Forecasting Competition 2017: Tangent Works, a European machine learning company, used Julia to win the IEEE Global Energy Forecasting Competition 2017 (GEFCom2017).
Julia Featured in insideHPC’s “AI-HPC Is Happening Now” White Paper: insideHPC, a leading blog in the high performance computing community, featured Julia and Julia Computing in this white paper about artificial intelligence and high performance computing.
Julia Climbs to #35 on TIOBE Index of Most Popular Programming Languages: Julia entered the Top 50 most popular programming languages for the first time in September 2016, and has climbed to #35 since last year.
Julia Computing Featured Among 10 Most Innovative Startups in India That Will Rule in 2018 and Beyond: KnowStartup featured Julia Computing among the 10 Most Innovative Startups in India That Will Rule in 2018 and Beyond.
Julia Language Delivers Petascale HPC Performance: TheNextPlatform explains that “the Celeste team demonstrated that the Julia language can support both petascale compute and terascale big data analysis on a leadership HPC system plus scale to handle the seven petabytes of data expected to be produced by the Large Synoptic Survey Telescope (LSST) every year.”
Julia Computing CEO Viral Shah Featured in FactorDaily Outliers Podcast: FactorDaily’s Outliers Podcast with Pankaj Mishra featured an interview with Julia Computing CEO Viral Shah: “You can thank Viral, … along with Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi … the next time you have a safe flight in US airspace.”
Intel Reports Faster Stock Price Estimation Using Julia: @IntelBusiness reports that Julia Computing’s stock price estimation tool runs up to 38% faster – “a big gain for a fast-moving industry.”
Plotting in Julia: Tom Breloff published a blog post titled “Plots: Past, Present and Future” about plotting in Julia.
Feigenbaum’s Alpha: Professor Stuart Brorson from Northeastern University’s Department of Mathematics published a blog post entitled “A High Precision Calculation of Feigenbaum’s Alpha in Julia”.
Xavier Gandibleux, Professor of Operations Research and Computer Science at the Université de Nantes, is writing a book in French about using Julia and JuMP for modeling and solving linear optimization problems in Operations Research.
Do you know of any upcoming conferences, meetups, trainings, hackathons, talks, presentations or workshops involving Julia? Would you like to organize a Julia event on your own, or in partnership with your company, university or other organization? Let us help you spread the word and support your event by sending us an email with details. Here are some upcoming events:
Do you want to share photos, videos or details of your most recent conference, meetup, training, hackathon, talk, presentation or workshop involving Julia? Please send us an email with details and links.
Please contact us if you wish to:
Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.
Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:
To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.
Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.
Re-posted from: https://tpapp.github.io/post/large-files-julia/
When writing software, especially libraries, a natural question is how to organize source code into files. Some languages, eg Matlab, encourage a very fragmented style (one function per file), while for some other languages (C/C++), a separation between the interface (.h) and the implementation (.c/.cpp) is traditional.
Julia has no such constraint: include allows the source code for a module to be organized into small pieces, possibly scattered in multiple directories, or it can be a single monolithic piece of code. The choice on this spectrum is up to the authors, and is largely a matter of personal preference.
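For example, a package's top-level file might stitch smaller files together with include; the file names below are made up for illustration:

```julia
# src/MyPackage.jl — a hypothetical fragmented layout
module MyPackage

include("types.jl")       # type definitions
include("utilities.jl")   # small helpers
include("estimation.jl")  # the main algorithms

end # module
```

Deleting the include lines and pasting the contents directly into the module body gives the equivalent monolithic style.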
When I started working with Julia, I was following the example of some prominent packages, and organized code into small pieces (~ 500 LOC). Lately, whenever I refactored my code, I ended up putting it in a single file.
I found the following Emacs tools very helpful for navigation.
Form feed, or \f, is an ASCII control character that was used to request a new page in line printers. Your editor may display it as ^L. It has a long history of being used as a separator, and Emacs supports it in various ways.
By default, C-x [ and C-x ] take you to the previous and next form feed separators. Combined with numerical prefixes, eg C-3 C-x [, you can jump across multiple ones very quickly. Other commands with page in their name allow narrowing, marking, and other functions.
Many Emacs packages provide extra functionality for page breaks. My favorite is page-break-lines, which replaces ^L with a horizontal line, so that the output looks like this:
export ML_estimator
# general API

"""
    ML_estimator(ModelType, data...)

Estimate `ModelType` using maximum likelihood on `data`, which is model-specific.
"""
function ML_estimator end
I am using helm pervasively. helm-occur is very handy for listing all occurrences of something, and navigating them. The following is an excerpt from base/operators.jl, looking for isless:
operators.jl:213:types with a canonical total order should implement `isless`.
operators.jl:227:<(x, y) = isless(x, y)
operators.jl:300:# this definition allows Number types to implement < instead of isless,
operators.jl:302:isless(x::Real, y::Real) = x<y
operators.jl:303:lexcmp(x::Real, y::Real) = isless(x,y) ? -1 : ifelse(isless(y,x), 1, 0)
You can move across these matches, jump to one in an adjacent buffer while keeping this list open, or save the list for later use. Its big brother helm-do-grep-ag is even more powerful, using ag to find something in a directory tree.
With these two tools, I find navigating files around 5K LOC very convenient — the better I learn Emacs, the larger my threshold for a “large” file becomes.^{1}
Base at the moment. ^{[return]}
Re-posted from: http://www.breloff.com/plots-past-present-future/
Earlier this year, I backed away from the Julia community to pursue a full time opportunity with the exciting AI startup Elemental Cognition as a senior engineer. Elemental Cognition was founded by Dave Ferrucci, the AI visionary that led the original IBM Watson team to victory in Jeopardy. We’re a small team (though we’re hiring!) of talented and passionate researchers and engineers, some of whom were instrumental in the success of Watson, working to build machines with common sense and reasoning (thought partners, if you will). It’s incredibly interesting, but it also doesn’t leave time for hobbies.
In this post I wanted to provide some perspective on the Plots project, from origin to today, as well as to speculate on its future. If you have further questions about this project (or any of my other open source efforts), please use the public forums (Github, Discourse, Gitter, Slack) to seek help from other users and developers, as I have very little capacity to answer emails or messages sent directly to me. (Not to mention I probably won’t have the most up to date answer!)
I spent my career in finance building custom visualization software to analyze and monitor my trading and portfolios. When I started using Julia, the visualization options were not exciting. Most available packages were slow, lacking features, or cumbersome to use (or all of those things). As both the primary designer and user of my software in my previous roles, I knew a better approach was possible.
In 2015, early in my Julia experience, I created Qwt.jl, a Julia interface to a slightly customized wrapper of the Qwt visualization framework. I used it primarily to analyze trading simulations and watch networks of spiking neurons fire. It was (IMO) a massive step up in cleanliness and usability compared to my experiences doing visualization in Python, C++, and Java. I am a nut for convenience, and made sure all the defaults were set such that 90% of the time they were exactly what I wanted. Qwt.jl could be thought of as the design inspiration for the API of Plots.
In August of 2015, a bunch of devs in the Julia community (most of whom had “competing” visualization packages) set up the JuliaPlot (note the missing “s”) organization to discuss the state of Julia visualization. We all agreed that the community was too fragmented, but most thought it was too hard a problem to tackle properly. Each package had many strengths and weaknesses, and there were large differences in supported feature sets and API style.
I laid out a rough plan for “one interface to rule them all”. It was not well received, with the biggest objection that it wasn’t likely to be successful. People, after all, have very different preferences in naming, styles, and requirements. It would be impossible to please enough people enough of the time to make the time investment worthwhile. Now, telling me something is impossible is an effective way to motivate me. I pushed the initial commits of Plots that weekend.
Plots (and the larger JuliaPlots ecosystem) has been (again, IMO) a wildly successful project. Is it perfect? Of course not. Nothing is. There are precompilation issues, unsatisfying customization of legends, minimal interactivity, and more. But it has received a large following of loyal users and (much more important) dedicated contributors and maintainers.
Sadly, I don’t have the ability to work on the project, as described above. In fact, I bet you can guess when I joined Elemental Cognition given my Github activity:
However, even though I’ve backed away from the project, it is in good hands with many people invested in its continued success. Looking at the list of contributors to Plots (64 people at the time of writing this post) and the graph of commits (below), it seems very clear that this is an active and passionate community of Julia visualizers that care about the success of the ecosystem.
In fact, this graph seems to show that activity has risen since I handed over responsibility of the organization to the JuliaPlots team. I suspect that my departure gave other members the courage to take a more active role; before that, their contributions were not as aggressive and passionate.
The design of RecipesBase and the recipes concept has ensured that, even if something better comes along to replace the Plots core, things like StatPlots, PlotRecipes, and many other custom recipes can still be used. This is a motivating idea when deciding whether to invest time in a project… knowing that a contributed recipe can outlast the plotting package it was designed for. This is a primary reason that I expect JuliaPlots to remain active and vibrant.
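As a sketch of the recipes concept (the type and its fields below are invented for illustration), a package can depend only on the lightweight RecipesBase and still teach Plots how to display its types:

```julia
using RecipesBase

# A hypothetical user-defined type
struct MySimulation
    t::Vector{Float64}
    u::Vector{Float64}
end

# The recipe tells any Plots backend how to turn a MySimulation into a series
@recipe function f(sim::MySimulation)
    seriestype := :path        # force the series type
    label --> "simulation"     # a default the user can override
    sim.t, sim.u               # the data to plot
end
```

After this, plot(sim) works with whatever backend the user has loaded, and the recipe outlives any particular plotting engine.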
There are many ways to make visualization in Julia better. We need better compilation performance, fewer bugs, better support for interactive workflows, more complete documentation, as well as countless other issues. The number of things that can be improved is a testament to how insanely difficult it is to build a visualization platform. It’s perfectly natural to have 10 different solutions, because there are 1,000 different ways to look at a dataset. How could one solution possibly cover everything?
During (and after) JuliaCon 2016, Simon Danisch and I had a bunch of brainstorming sessions diving into how we could improve the core Plots engine. These conversations were mostly centered around strategies to support better performance and interactivity in the Plots API and core loop. We also wanted to give backends more control over lazily recomputing attributes and data, and optional updates to subparts of the visualization (when few things have changed). The goal was marrying extreme flexibility with extreme performance (similar to the goal of Julia itself).
I hope that Simon’s latest project MakiE is the realization of those ideas and goals. I would consider it a big success if he could replace the core of Plots with a new engine, without losing any of the flexibility and features that currently exist. Of course, it will be a ridiculously massive effort to achieve feature parity without tapping into the recipes framework and the Plots API. So my skepticism rests on the question of whether the existing concepts can be mapped onto a MakiE engine. I wish Simon the best of luck!
Aside from large rebuilds, there is some low-hanging fruit for a better ecosystem, some of which will be helped by things like “Pkg3” and other core Julia improvements. Also, the (eventual) release of Julia 1.0 will bring a wave of new development effort to fill in the gaps and add missing features.
All things told, I have high hopes for the future of Julia and especially the visualization, data science, and machine learning sub-communities within. I hope to find my way back to the language someday!
Re-posted from: https://giordano.github.io/blog/2017-12-02-julia-assignment/
Today I realized by chance that in the Julia programming language, like in C/C++, assignment returns the assigned value. To be more precise, assignment returns the right-hand side of the assignment expression. This doesn’t appear to be documented in the current (as of December 2017) stable version of the manual, but a FAQ has been added to its development version. Be aware of this subtlety, because you can get unexpected results if you rely on the type of the returned value of an assignment:
julia> let a
a = 3.0
end
3.0
julia> let a
a::Int = 3.0
end
3.0
In both cases the let blocks return 3.0, even though in the former case a is really 3.0 and in the latter case it’s been marked to be an Int.
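To see both values at once, you can capture what the assignment returns in a second variable (a sketch in the same spirit as the let blocks above):

```julia
julia> let a
           b = (a::Int = 3.0)  # the assignment returns the right-hand side, 3.0
           (a, b)
       end
(3, 3.0)
```

The assignment stores the converted Int in a, but the expression itself evaluates to the unconverted right-hand side.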
The fact that assignment returns the assigned value is a very handy feature. First of all, this is probably the reason why, in Julia’s REPL, the assigned value is printed after an assignment, so that you don’t have to type the name of the variable again to view its value:
julia> x = sin(2.58) ^ 2 + cos(2.58) ^ 2
1.0
Instead, in Python, for example, assignment returns None. Thus, in the interactive mode of the interpreter you have to type the name of the variable again to check its value:
>>> from math import *
>>> x = sin(2.58) ** 2 + cos(2.58) ** 2
>>> x
1.0
Another useful consequence of this feature is that you can use an assignment as the condition in, e.g., if and while, making them more concise. For example, consider this brute-force method to compute Euler’s number, stopping when the desired precision is reached:
euler = 0.0
i = 0
while true
tmp = inv(factorial(i))
euler += tmp
tmp < eps(euler) && break
i += 1
end
By using this feature the while can be simplified to:
euler = 0.0
i = 0
while (tmp = inv(factorial(i))) >= eps(euler)
euler += tmp
i += 1
end
A common objection against having this feature in Python is that it is error-prone. Quoting from a Python tutorial:

Note that in Python, unlike C, assignment cannot occur inside expressions. C programmers may grumble about this, but it avoids a common class of problems encountered in C programs: typing = in an expression when == was intended.
This is indeed an issue because in C and Python conditions can be pretty much anything that can be converted to plain-bit 0 and 1, including, e.g., characters. In addition, in Python a variable can freely change its type, so such a mistake can easily go unnoticed.
Consider the following C program:
#include <stdio.h>
int main()
{
char a = '\0';
if (a = 0)
printf("True\n");
else
printf("False\n");
if (a == 0)
printf("True\n");
else
printf("False\n");
return 0;
}
which prints
False
True
Mistyping the condition (= instead of == or vice versa) in this language can lead to unexpected results.
Instead, this is mostly a non-issue in Julia because conditions can only be instances of the Bool type. One way to make an error in Julia is the following:
julia> a = false
false
julia> if (a = false)
println("True")
else
println("False")
end
False
julia> if a == false
println("True")
else
println("False")
end
True
but it is kind of useless to test equality with a Bool as a condition, since you can use the Bool itself. The only other possible error you can make in Julia that I came up with is the following:
julia> a = false
false
julia> b = false
false
julia> if (a = b)
println("True")
else
println("False")
end
False
julia> if a == b
println("True")
else
println("False")
end
True
For this case I don’t have an easy replacement that could prevent the error, but this is also an unlikely situation. The Julia style guide suggests not to parenthesize conditions. Following this recommendation is a good way to catch the error in this case, because the assignment requires parens:
julia> if a = b
println("True")
else
println("False")
end
ERROR: syntax: unexpected "="
Re-posted from: https://tpapp.github.io/post/plot-workflow/
In light of recent discussions on Julia’s Discourse forum about getting “publication-quality” or simply “nice” plots in Julia, I thought it would be worthwhile to briefly summarize what works for me.^{1} If you are a seasoned Julia user, this post may have nothing new for you, but I hope that newcomers to Julia find it useful.
I try to separate data generation and plotting. The first may be time-consuming (some calculations can take hours or days to run), and I find it best to save the results independently of any plotting. Recently I was sitting at a conference where a presentation about a really interesting topic had some plots that were extremely hard to see: if I remember correctly, something like 10×2 subplots, with almost all fine detail lost due to the resolution of the projector or the human eye. When someone in the audience asked about this, the presenting author replied that he is aware of the issue, but remaking the plots would involve rerunning the calculations, which take weeks. Saving the data separately will ensure that you are never in this situation; also, you can benefit from updates to plotting libraries when tweaking your plots.
For saving results, JLD2 is probably the most convenient tool: while it is technically work in progress, it is stable, fast, and convenient.^{2} The key question is where to save the data: I find it best to use a consistent path that you can just include in scripts.
You have several options:
define a global variable in your ~/.juliarc for your projects, and construct a path with joinpath,
if you have packaged your code, Pkg.dir can be used to obtain a subdirectory in the package root,
if your code is in a module, you can wrap @__DIR__ in a function to obtain a directory.
For this blog post I used the first option, while in practice I use the second and the third.
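In code, the three options might be sketched as follows (BLOG_POSTS and MyPackage are placeholder names, not part of the original setup):

```julia
# Option 1: a global variable defined in ~/.juliarc, combined with joinpath
datadir = joinpath(BLOG_POSTS, "plot-workflow")

# Option 2: inside a package, a subdirectory of the package root
datadir = joinpath(Pkg.dir("MyPackage"), "data")

# Option 3: inside a module, a directory relative to the source file
module PathDemo
datadir() = joinpath(@__DIR__, "data")
end
```

Whichever you pick, the point is that every script refers to the same path without hard-coding it.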
To illustrate plots, I use the code below to generate random variates for sample skewness, and save it.
download as data.jl
using StatsBase # for skewness
using JLD2 # saving data
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
sample_skewness = [skewness(randn(100)) for _ in 1:1000]
@save "data.jld2" sample_skewness # save data
No plotting so far, so let’s remedy that. I use Plots.jl, which is a metapackage that unifies syntax for plotting via various plotting backends in Julia. I find this practical, because I can quickly switch backends for different purposes, and experiment with various options when I find the output suboptimal. The price you pay for this flexibility is compilation time, a known issue which means that you have to wait a bit to get your first plot. Separating plotting and data generation has the advantage that once I fire up the plotting infrastructure, I switch to “plotting mode” and clean up several plots at the same time.
Users frequently ask what the “best” backend is. This all depends on your needs, but these days I use the pgfplots() backend almost exclusively.^{3} The gr() backend is also useful, because it is very fast.
Time to tweak the plot! I find the attributes documentation the most useful for this. For this plot I need axis labels, a title, and prefer to disable the legend since I am plotting a single series. I am also using LaTeXStrings.jl, which means that I can use LaTeX-compatible syntax for labels seamlessly (notice the L before the string).
download as plot.jl
using JLD2 # loading data
using Plots; pgfplots() # PGFPlots backend
using LaTeXStrings # nice LaTeX strings
cd(joinpath(BLOG_POSTS, "plot-workflow")) # default path
@load "data.jld2" # load data
# make plot and tweak; this is the end result
plot(histogram(sample_skewness, normalize = true),
xlab = L"\gamma_1", fillcolor = :lightgray,
yaxis = ("frequency", (0, 2)), title = "sample skewness", legend = false)
# finally save
savefig("sample_skewness.svg") # for quick viewing and web content
savefig("sample_skewness.tex") # for inclusion into papers
savefig("sample_skewness.pdf") # for quick viewing
Having generated the plot, I save it in various formats with savefig. The SVG output is shown below.
If you cannot achieve the desired output, you can
reread the Plots.jl manual,
study the example plots,
ask for help in the Visualization topic.
For the third option, make sure you include a self-contained minimal working example,^{4} which also generates or loads the data, so that others can run your code as is. Randomly generated data should be fine, or standard datasets from RDatasets.jl.
Sometimes you will find that the feature you are looking for is not (yet) supported. You should check if there is an open issue for your problem (the discussion forum linked above is useful for this), and if not, open one.
When asking for help or just discussing plotting libraries in Julia, please keep in mind that they are a community effort with volunteers devoting their time to address a very difficult problem. Plotting is not a well-defined exercise, it involves a lot of heuristics and special cases, and most languages took years to get it right (for a given value of “right”). Make it easy for people to help you by making a reproducible, clean MWE: it is very hard to explain how to improve your plot without the actual code and output.
Use ``` to format your code. ^{[return]}
Re-posted from: http://juliacomputing.com/blog/2017/12/01/cxx-and-cxxwrap-intro.html
Cxx.jl is a Julia package that provides a C++ interoperability interface for Julia. It also provides an experimental C++ REPL mode for the Julia REPL. With Cxx.jl, it is possible to directly access C++ using the @cxx macro from Julia.
With Cxx.jl and CxxWrap.jl, when facing the task of wrapping a C++ library in a Julia package, authors now have two options:
Functionality
There are two ways to access the main functionality provided by this package. The first is using the @cxx macro, which puns on Julia syntax to provide C++ compatibility.
The macro supports two main usages:
@cxx mynamespace::func(args...)
@cxx m->foo(args...)
Additionally, this package provides the cxx"" and icxx"" custom string literals for inputting C++ syntax directly. The two string literals are distinguished by the C++-level scope they represent.
In summary, the two approaches to embed C++ functions in Julia discussed above look like this:
# Using @cxx (e.g.):
cxx""" void cppfunction(args){ ... } """ => @cxx cppfunction(args)
# Using icxx (e.g.):
julia_function(args) = icxx""" *code here* """
The C++ REPL
This package contains an experimental C++ REPL feature. Using the package will automatically add a new pane to your REPL that is accessible by pressing the < key.
Installation
The package is installable on Julia 0.5 and newer and is available through Julia’s package manager:
Pkg.add("Cxx")
Building the C++ code requires the same system tools necessary for building Julia from source. Further, Debian/Ubuntu users should install libedit-dev and libncurses5-dev, and RedHat/CentOS users should install libedit-devel.
Using Cxx.jl with examples
Example 1: Embedding a simple C++ function in Julia
# include headers
julia> using Cxx
julia> cxx""" #include<iostream> """
# Declare the function
julia> cxx"""
void mycppfunction() {
int z = 0;
int y = 5;
int x = 10;
z = x*y + 2;
std::cout << "The number is " << z << std::endl;
}
"""
# Convert C++ to Julia function
julia> julia_function() = @cxx mycppfunction()
julia_function (generic function with 1 method)
# Run the function
julia> julia_function()
The number is 52
Example 2: Pass numeric arguments from Julia to C++
julia> jnum = 10
10
julia> cxx"""
void printme(int x) {
std::cout << x << std::endl;
}
"""
julia> @cxx printme(jnum)
10
Example 3: Pass strings from Julia to C++
julia> cxx"""
void printme(const char *name) {
// const char* => std::string
std::string sname = name;
// print it out
std::cout << sname << std::endl;
}
"""
julia> @cxx printme(pointer("John"))
John
Example 4: Pass a Julia expression to C++
julia> cxx"""
void testJuliaPrint() {
$:(println("\nTo end this test, press any key")::Nothing);
}
"""
julia> @cxx testJuliaPrint()
To end this test, press any key
Example 5: Embedding C++ code inside a Julia function
function playing()
for i = 1:5
icxx"""
int tellme;
std::cout<< "Please enter a number: " << std::endl;
std::cin >> tellme;
std::cout<< "\nYour number is "<< tellme << "\n" <<std::endl;
"""
end
end
playing();
Click here for more information, examples, and documentation.
This package lets you write the code for the Julia wrapper in C++, and then use a one-liner on the Julia side to make the wrapped C++ library available there.
The mechanism behind this package is that functions and types are registered in C++ code that is compiled into a dynamic library. This dynamic library is then loaded into Julia, where the Julia part of this package uses the data provided through a C interface to generate functions accessible from Julia. The functions are passed to Julia either as raw function pointers (for regular C++ functions that don’t need argument or return type conversion) or std::functions (for lambda expressions and automatic conversion of arguments and return types). The Julia side of this package wraps all this into Julia methods automatically.
Installation
Like any other registered Julia package, installation completes by running the following package manager command:
Pkg.add("CxxWrap")
Features
A Hello, World example with CxxWrap.jl
Suppose we want to expose the following C++ function to Julia in a module called CppHello:
std::string greet()
{
return "hello, world";
}
Using the C++ side of CxxWrap, this can be exposed as follows:
#include "jlcxx/jlcxx.hpp"
JULIA_CPP_MODULE_BEGIN(registry)
jlcxx::Module& hello = registry.create_module("CppHello");
hello.method("greet", &greet);
JULIA_CPP_MODULE_END
Once this code is compiled into a shared library (say libhello.so) it can be used in Julia as follows:
using CxxWrap
# Load the module and generate the functions
wrap_modules(joinpath("path/to/built/lib","libhello"))
# Call greet and show the result
@show CppHello.greet()
More such examples and documentation for the package can be found here.
This post was formatted for the Julia Computing blog by Rajshekar Behar
Re-posted from: https://cbrownley.wordpress.com/2017/11/29/data-wrangling-in-julia-based-on-dplyr-flights-tutorials/
A couple of my favorite tutorials for wrangling data in R with dplyr are Hadley Wickham’s dplyr package vignette and Kevin Markham’s dplyr tutorial. I enjoy the tutorials because they concisely illustrate how to use a small set of verb-based functions to carry out common data wrangling tasks.
I tend to use Python to wrangle data, but I’m exploring the Julia programming language so I thought creating a similar dplyr-based tutorial in Julia would be a fun way to examine Julia’s capabilities and syntax. Julia has several packages that make it easier to deal with tabular data, including DataFrames and DataFramesMeta.
The DataFrames package provides functions for reading and writing, split-apply-combining, reshaping, joining, sorting, querying, and grouping tabular data. The DataFramesMeta package provides a set of macros that are similar to dplyr’s verb-based functions in that they offer a more convenient, readable syntax for munging data and chaining together multiple operations.
For this tutorial, let’s follow along with Kevin’s tutorial and use the hflights dataset. You can obtain the dataset from R with the following commands, or simply download it here: hflights.csv
install.packages("hflights")
library(hflights)
write.csv(hflights, "hflights.csv")
To begin, let’s start the Julia REPL, load the DataFrames and DataFramesMeta packages, and load and inspect the hflights dataset:
using DataFrames
using DataFramesMeta
hflights = readtable("/Users/clinton/Documents/Julia/hflights.csv");
size(hflights)
names(hflights)
head(hflights)
describe(hflights)
The semicolon on the end of the readtable command prevents it from printing the dataset to the screen. The size command returns the number of rows and columns in the dataset. You can specify you only want the number of rows with size(hflights, 1) or columns with size(hflights, 2). This dataset contains 227,496 rows and 21 columns. The names command lists the column headings. By default, the head command prints the header row and six data rows. You can specify the number of data rows to display by adding a second argument, e.g. head(hflights, 10). The describe command prints summary statistics for each column.
AND: All of the conditions must be true for the returned rows
# Julia DataFrames approach to view all flights on January 1
hflights[.&(hflights[:Month] .== 1, hflights[:DayofMonth] .== 1), :]
# DataFramesMeta approach
@where(hflights, :Month .== 1, :DayofMonth .== 1)
Julia’s DataFrames’ row filtering syntax is similar to R’s syntax. To specify multiple AND conditions, use “.&()” and place the filtering conditions, separated by commas, between the parentheses. Like dplyr’s filter function, DataFramesMeta’s @where macro simplifies the syntax and makes the command easier to read.
OR: One of the conditions must be true for the returned rows
# Julia DataFrames approach to view all flights where either AA or UA is the carrier
hflights[.|(hflights[:UniqueCarrier] .== "AA", hflights[:UniqueCarrier] .== "UA"), :]
# DataFramesMeta approach
@where(hflights, .|(:UniqueCarrier .== "AA", :UniqueCarrier .== "UA"))
To specify multiple OR conditions, use “.|()” and place the filtering conditions between the parentheses. Again, the DataFramesMeta approach is more concise.
SET: The values in a column are in a set of interest
# Julia DataFrames approach to view all flights where the carrier is in Set(["AA", "UA"])
carriers_set = Set(["AA", "UA"])
hflights[findin(hflights[:UniqueCarrier], carriers_set), :]
# DataFramesMeta approach
@where(hflights, findin(:UniqueCarrier, carriers_set))
To filter for rows where the values in a particular column are in a specific set of interest, create a Set with the values you’re interested in and then specify the column and your set of interest in the findin function.
PATTERN / REGULAR EXPRESSION: The values in a column match a pattern
# Julia DataFrames approach to view all flights where the carrier matches the regular expression r"AA|UA"
carriers_pattern = r"AA|UA"
hflights[[ismatch(carriers_pattern, String(carrier)) for carrier in hflights[:UniqueCarrier]], :]
# DataFramesMeta approach
@where(hflights, [ismatch(carriers_pattern, String(carrier)) for carrier in :UniqueCarrier])
To filter for rows where the values in a particular column match a pattern, create a regular expression and then use it in the ismatch function in an array comprehension.
# Julia DataFrames approach to selecting columns
hflights[:, [:DepTime, :ArrTime, :FlightNum]]
# DataFramesMeta approach
@select(hflights, :DepTime, :ArrTime, :FlightNum)
Julia’s DataFrames’ syntax for selecting columns is similar to R’s syntax. Like dplyr’s select function, DataFramesMeta’s @select macro simplifies the syntax and makes the command easier to read.
# Julia DataFrames approach to selecting columns
# first three columns
hflights[:, 1:3]
# pattern / regular expression
heading_pattern = r"Taxi|Delay"
hflights[:, [ismatch(heading_pattern, String(name)) for name in names(hflights)]]
# startswith
hflights[:, filter(name -> startswith(String(name), "Arr"), names(hflights))]
# endswith
hflights[:, filter(name -> endswith(String(name), "Delay"), names(hflights))]
# contains
hflights[:, filter(name -> contains(String(name), "Month"), names(hflights))]
# AND conditions
hflights[:, filter(name -> startswith(String(name), "Arr") && endswith(String(name), "Delay"), names(hflights))]
# OR conditions
hflights[:, filter(name -> startswith(String(name), "Arr") || contains(String(name), "Cancel"), names(hflights))]
# DataFramesMeta approach
# first three columns
@select(hflights, 1:3)
# pattern / regular expression
heading_pattern = r"Taxi|Delay"
@select(hflights, [ismatch(heading_pattern, String(name)) for name in names(hflights)])
# startswith
@select(hflights, filter(name -> startswith(String(name), "Arr"), names(hflights)))
# endswith
@select(hflights, filter(name -> endswith(String(name), "Delay"), names(hflights)))
# contains
@select(hflights, filter(name -> contains(String(name), "Month"), names(hflights)))
# AND conditions
@select(hflights, filter(name -> startswith(String(name), "Arr") && endswith(String(name), "Delay"), names(hflights)))
# OR conditions
@select(hflights, filter(name -> startswith(String(name), "Arr") || contains(String(name), "Cancel"), names(hflights)))
# Kevin Markham's multiple select conditions example
# select(flights, Year:DayofMonth, contains("Taxi"), contains("Delay"))
# Julia Version of Kevin's Example
# Taxi or Delay in column heading
mask = [ismatch(r"Taxi|Delay", String(name)) for name in names(hflights)]
# Also include first three columns, i.e. Year, Month, DayofMonth
mask[1:3] = true
@select(hflights, mask)
These examples show you can select columns by position and name, and that you can combine multiple conditions with AND (&&) or OR (||). Similar to filtering rows, you can select specific columns based on a pattern by using the ismatch function in an array comprehension. You can also use contains, startswith, and endswith in the filter function to select columns that contain, start with, or end with a specific text pattern.
In R, dplyr provides, via the magrittr package, the %>% operator, which enables you to chain together multiple commands into a single data transformation pipeline in a very readable fashion. In Julia, the DataFramesMeta package provides the @linq macro and |> symbol to enable similar functionality. Alternatively, you can load the Lazy package and use an @> begin end block to chain together multiple commands.
# Chaining commands with DataFrameMeta’s @linq macro
@linq hflights[find(.!isna.(hflights[:,:DepDelay])), :] |>
@where(:DepDelay .> 60) |>
@select(:UniqueCarrier, :DepDelay)
# Chaining commands with Lazy’s @> begin end block
using Lazy
@> begin
hflights[find(.!isna.(hflights[:,:DepDelay])), :]
@where(:DepDelay .> 60)
@select(:UniqueCarrier, :DepDelay)
end
These two blocks of code produce the same result: a DataFrame containing carrier names and departure delays for which the departure delay is greater than 60. In each chain, the first expression is the input DataFrame, e.g. hflights. In these examples, I use find and .!isna. to start with a DataFrame that doesn’t contain NA values in the DepDelay column, because the commands fail when NAs are present. I prefer the @linq macro version over the @> begin end version because it’s so similar to the dplyr-magrittr syntax, but both versions are more succinct and readable than their non-chained counterparts. The screenshot shows how to assign the pipeline results to variables.
Both DataFrames and DataFramesMeta provide functions for sorting rows in a DataFrame by values in one or more columns. In the first pair of examples, we want to select the UniqueCarrier and DepDelay columns and then sort the results by the values in the DepDelay column in descending order. The last example shows how to sort by multiple columns with the @orderby macro.
# Julia DataFrames approach to sorting
sort(hflights[find(.!isna.(hflights[:,:DepDelay])), [:UniqueCarrier, :DepDelay]], cols=[order(:DepDelay, rev=true)])
# DataFramesMeta approach (add a minus sign before the column symbol for descending)
@linq hflights[find(.!isna.(hflights[:,:DepDelay])), :] |>
@select(:UniqueCarrier, :DepDelay) |>
@orderby(-:DepDelay)
# Sort hflights dataset by Month, descending, and then by DepDelay, ascending
@linq hflights |>
@orderby(-:Month, :DepDelay)
DataFrames provides the sort and sort! functions for ordering rows in a DataFrame; sort! orders the rows in place. The DataFrames user guide provides additional examples of ordering rows, in ascending and descending order, based on multiple columns, as well as applying functions to columns, e.g. uppercase, before using the column for sorting.
DataFramesMeta provides the @orderby macro for ordering rows in a DataFrame. Specify multiple column names in the @orderby macro to sort the rows by multiple columns. Use a minus sign before a column name to sort in descending order.
Creating new variables in Julia DataFrames is similar to creating new variables in Python and R. You specify a new column name in square brackets after the name of the DataFrame and assign it a collection of values, sometimes based on values in other columns. DataFramesMeta’s @transform macro simplifies the syntax and makes the transformation more readable.
# Julia DataFrames approach to creating new variable
hflights[:Speed] = hflights[:Distance] ./ hflights[:AirTime] .* 60
hflights[:, [:Distance, :AirTime, :Speed]]
# Delete the variable so we can recreate it with DataFramesMeta approach
delete!(hflights, :Speed)
# DataFramesMeta approach
@linq hflights |>
@select(:Distance, :AirTime) |>
@transform(Speed = :Distance ./ :AirTime .* 60) |>
@select(:Distance, :AirTime, :Speed)
# Save the new column in the original DataFrame
hflights = @linq hflights |>
@transform(Speed = :Distance ./ :AirTime .* 60)
The first code block illustrates how to create a new column in a DataFrame and assign it values based on values in other columns. The second code block shows you can use delete! to delete a column. The third example demonstrates the DataFramesMeta approach to creating a new column using the @transform macro. The last example shows how to save a new column in an existing DataFrame using the @transform macro by assigning the result of the transformation to the existing DataFrame.
dplyr provides group_by and summarise functions for grouping and summarising data. DataFrames and DataFramesMeta also support the split-apply-combine strategy with the by function and the @by macro, respectively. Here are Julia versions of Kevin’s summarise examples.
# Julia DataFrames approach to grouping and summarizing
by(hflights[complete_cases(hflights[[Symbol(name) for name in names(hflights)]]), :],
:Dest,
df -> DataFrame(meanArrDelay = mean(df[:ArrDelay])))
# DataFramesMeta approach
@linq hflights[complete_cases(hflights[[Symbol(name) for name in names(hflights)]]), :] |>
@by(:Dest, meanArrDelay = mean(:ArrDelay))
DataFrames and DataFramesMeta don’t have dplyr’s summarise_each function, but it’s easy to apply different functions to multiple columns inside the @by macro.
@linq hflights |>
@by(:UniqueCarrier,
meanCancelled = mean(:Cancelled), meanDiverted = mean(:Diverted))
@linq hflights[complete_cases(hflights[[Symbol(name) for name in names(hflights)]]), :] |>
@by(:UniqueCarrier,
minArrDelay = minimum(:ArrDelay), maxArrDelay = maximum(:ArrDelay),
minDepDelay = minimum(:DepDelay), maxDepDelay = maximum(:DepDelay))
DataFrames and DataFramesMeta also don’t have dplyr’s n and n_distinct functions, but you can count the number of rows in a group with size(df, 1) or nrow(df), and you can count the number of distinct values in a group with countmap.
# Group by Month and DayofMonth, count the number of flights, and sort descending
# Count the number of rows with size(df, 1)
sort(by(hflights, [:Month,:DayofMonth], df -> DataFrame(flight_count = size(df, 1))), cols=[order(:flight_count, rev=true)])
# Group by Month and DayofMonth, count the number of flights, and sort descending
# Count the number of rows with nrow(df)
sort(by(hflights, [:Month,:DayofMonth], df -> DataFrame(flight_count = nrow(df))), cols=[order(:flight_count, rev=true)])
# Split grouping and sorting into two separate operations
g = by(hflights, [:Month,:DayofMonth], df -> DataFrame(flight_count = nrow(df)))
sort(g, cols=[order(:flight_count, rev=true)])
# For each destination, count the total number of flights and the number of distinct planes
by(hflights[find(.!isna.(hflights[:,:TailNum])),:], :Dest) do df
DataFrame(flight_count = size(df,1), plane_count = length(keys(countmap(df[:,:TailNum]))))
end
While these examples reproduce the results in Kevin’s dplyr tutorial, they’re definitely not as succinct and readable as the dplyr versions. Grouping by multiple columns, summarizing with counts and distinct counts, and gracefully chaining these operations are areas where DataFrames and DataFramesMeta can improve.
Randomly sampling a fixed number or fraction of rows from a DataFrame can be a helpful operation. dplyr offers the sample_n and sample_frac functions to perform these operations. In Julia, StatsBase provides the sample function, which you can repurpose to achieve similar results.
using StatsBase
# randomly sample a fixed number of rows
hflights[sample(1:nrow(hflights), 5), :]
hflights[sample(1:size(hflights,1), 5), :]
# randomly sample a fraction of rows
hflights[sample(1:nrow(hflights), ceil(Int,0.0001*nrow(hflights))), :]
hflights[sample(1:size(hflights,1), ceil(Int,0.0001*size(hflights,1))), :]
Randomly sampling a fixed number of rows is fairly straightforward: you use the sample function to randomly select a fixed number of rows, in this case five, from the DataFrame. Randomly sampling a fraction of rows is slightly more complicated: since the sample function takes an integer for the number of rows to return, you need to use the ceil function to convert the fraction of rows, in this case 0.0001*nrow(hflights), into an integer.
In R, dplyr sets a high bar for wrangling data well with succinct, readable code. In Julia, DataFrames and DataFramesMeta provide many useful functions and macros that produce similar results; however, some of the syntax isn’t as concise and clear as it is with dplyr, e.g. selecting columns in different ways and chaining together grouping and summarizing operations. These are areas where Julia’s packages can improve.
I enjoyed becoming more familiar with Julia by reproducing much of Kevin’s dplyr tutorial. It was also informative to see differences in functionality and readability between dplyr and Julia’s packages. I hope you enjoyed this tutorial and find it to be a useful reference for wrangling data in Julia.
Filed under: Analytics, General, Julia, Python, R, Statistics Tagged: DataFrames, DataFramesMeta, dplyr, Julia, Python, R
Re-posted from: http://juliacomputing.com/blog/2017/11/29/juliapro-ami-and-docker-image.html
We are pleased to announce the release of JuliaPro in the form of an AMI (Amazon Machine Image) for use on the AWS EC2 platform, as well as a Docker image for use in containerised environments, including Kubernetes.
JuliaPro is the fastest on-ramp to Julia for individual researchers, quants, traders, economists, engineers, scientists, students and others. Beginners and experts can build better software quicker while benefiting from Julia’s unparalleled high performance. It includes a Julia compiler, a profiler, and a Julia IDE (integrated development environment) bundled with over 100 curated packages that include data visualization and plotting.
JuliaPro was always available as a single installer bundle, making it easy for desktop users to get started.
However, requiring an installation step makes devops more difficult than it should be for production workloads. We know many of our users are running Julia applications on large server clusters in production, and we wanted to make it easy to do so.
We are releasing two variants of JuliaPro for the AMI.
Contents of the AMI
Both variants of JuliaPro mentioned above have the following additional software installed.
JuliaPro packages such as PyCall, JavaCall, RCall, ZMQ.jl, and HDF5.jl are configured to work with the pre-installed software, so the AMI is ready to use as soon as you boot up your instance.
Accessing the JuliaPro AMIs
JuliaPro v0.6.1.1 is installed in the following location on both AMI variants
"$HOME/JuliaPro-0.6.1.1
The JuliaPro REPL can be accessed from the following location
"$HOME/JuliaPro-0.6.1.1 /Julia/bin/julia”
Search for JuliaPro in the following regions to access our AMIs:
The main purpose of making this image available is to enable Docker and Kubernetes users to easily work with Julia packages, and to also extend the JuliaPro infrastructure to meet their needs.
JuliaPro’s Docker Image is hosted on Dockerhub and comes with two variants of the base images:
The following are the available tags:
The Docker Image can be pulled using the command
docker pull juliacomputing/juliapro:latest
The JuliaPro Installation Path in the container is
/juliapro/bin/JuliaPro-[version]/
Ways to access the JuliaPro Docker Image
By starting the Julia REPL with the command: docker run -it juliacomputing/juliapro:latest
By starting a Jupyter Notebook with the command: docker run -it -p 8888:8888 --entrypoint jupyter_notebook juliacomputing/juliapro:latest, followed by opening the displayed link in a web browser.
By directly running Julia Expressions: docker run -it --entrypoint julia juliacomputing/juliapro:latest -e "println(1+2)"
Or by running Bash: docker run -it --entrypoint bash juliacomputing/juliapro:latest
The theory of dynamical systems has been an area of active research amongst physicists and mathematicians for several decades. Perhaps one of the most interesting objects of study in dynamical systems is the phenomenon of chaos.
Chaos is a phenomenon observed in some deterministic systems where the system’s time-dependent trajectory neither diverges, nor converges to a limit point, nor moves periodically. Rather, the system moves on an orbit which looks random.
A simple model system which evidences chaos is the so-called logistic map:

$$x_{n+1} = \lambda x_n (1 - x_n),$$

where $\lambda$ is a parameter allowed to vary over the domain $[0, 4]$, and $x$ is the dynamical variable which varies over the domain $[0, 1]$.
The idea is to choose a value of $\lambda$, then start with an arbitrary $x_0$, and insert it into the equation to get a new value, $x_1$. Then insert $x_1$ into the equation to get an update, $x_2$.
Repeat this process for large numbers of iterations; for many values of $\lambda$ the value $x_n$ converges to a value (or set of values). This value is called the “fixed point” of the iteration, typically written $x_\infty$. The value of $x_\infty$ depends upon $\lambda$.
A plot of the fixed point $x_\infty$ vs. $\lambda$ is shown in Figure 1 below. What’s interesting are the different types of behavior obtained for different values of $\lambda$.
For $\lambda < 3$, $x_n$ converges to a single value. But starting at $\lambda = 3$, a new behavior emerges: instead of converging to one value, the variable hops from one value to a second one, then back.
For example, when $\lambda$ is slightly above 3 (say $\lambda = 3.2$), $x_n$ hops between approximately $0.513$ and $0.799$. This is called a “period two” orbit, since $x_n$ visits two values, alternating with each iteration. Moreover, increasing $\lambda$ leads to a situation where $x_n$ visits 4 values, then 8, then 16, and eventually at a particular value of $\lambda$, the period becomes infinite, meaning that $x_n$ has no periodic orbit. In this case, $x_n$ wanders around the unit interval quasi-randomly – the behavior we call chaos.
Figure 1: The bifurcation diagram for the logistic map. (From Wikipedia.)
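The behavior just described is easy to reproduce numerically. Here is a minimal sketch (written in Python for brevity rather than the post’s Julia; the λ values, starting point, and iteration counts are illustrative choices, not from the original post):

```python
# Iterate the logistic map x_{n+1} = lam*x*(1-x) and inspect the long-run orbit.
def orbit(lam, x0=0.2, burn=1000, keep=8):
    x = x0
    for _ in range(burn):      # discard the transient
        x = lam * x * (1 - x)
    vals = []
    for _ in range(keep):      # record the settled orbit
        x = lam * x * (1 - x)
        vals.append(round(x, 4))
    return vals

print(orbit(2.8))  # a single fixed point, repeated
print(orbit(3.2))  # two values alternating (period 2)
print(orbit(3.5))  # four values (period 4)
```

Burning in a long transient first matters: the orbit only settles onto the attracting fixed point or cycle after the initial wandering dies out.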
This transition from the period-one orbit to chaos is called the “period-doubling” route to chaos. It attracted the attention of many physicists in the 1970s and 80s since it offered the promise of illustrating a mechanism by which we could understand the development of complicated chaotic phenomena such as turbulence in fluids. Indeed, several physical systems were identified which evidenced a period-doubling route to chaos, including Duffing’s oscillator, a dripping faucet, and some simple electronic circuits. Unfortunately, the larger ambition of finally getting a grasp on turbulence by studying the logistic equation did not pan out. Nonetheless, some very interesting mathematics was discovered in the process.
In the late 1970s, Mitchell Feigenbaum, a mathematician at Los Alamos research laboratory, was playing around with the logistic equation using a hand calculator. He noticed some interesting numbers characterize the period doubling transition to chaos in the logistic map. He observed the following:
Examine the length of each interval in $\lambda$ as the period doubles from $1 \to 2 \to 4 \to 8$, etc. Feigenbaum found that the ratio of succeeding interval lengths converged to a value which he called $\delta$. The process is shown pictorially in Figure 2. Feigenbaum defined $\delta$ as

$$\delta = \lim_{n \to \infty} \frac{\lambda_n - \lambda_{n-1}}{\lambda_{n+1} - \lambda_n} = 4.669201\ldots,$$

where $\lambda_n$ is the parameter value at the $n$th period doubling.

Figure 2: A close-up of the logistic map’s bifurcation diagram showing how $\delta$ is defined. (Figure adapted from Wikipedia.)
Examine the width of the opening of each parabolic segment when it hits the value $x = 1/2$. (This value of $x$ is called the “superstable” point, and its value depends upon which map is under consideration. The value is 1/2 for the logistic map.) The ratio of successive widths $w_n / w_{n+1}$ also converges to a value. This is shown in Figure 3. Feigenbaum called this value $\alpha$. We have

$$\alpha = \lim_{n \to \infty} \frac{w_n}{w_{n+1}} = -2.502907\ldots,$$

where the sign records that successive openings flip orientation.

Figure 3: A close-up of the logistic map’s bifurcation diagram showing how $\alpha$ is defined. (Figure adapted from Wikipedia.)
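As a rough numerical check of the ratio defining δ, one can plug in the first few period-doubling thresholds of the logistic map. A quick sketch (in Python for brevity; the first two thresholds are exact, while the period-8 and period-16 onsets are standard values quoted in the literature, not computed here):

```python
# Crude estimate of delta from the first few period-doubling thresholds.
# lam[0] and lam[1] are exact; lam[2] and lam[3] are literature values.
import math

lam = [3.0, 1 + math.sqrt(6), 3.544090, 3.564407]  # onsets of period 2, 4, 8, 16
r1 = (lam[1] - lam[0]) / (lam[2] - lam[1])
r2 = (lam[2] - lam[1]) / (lam[3] - lam[2])
print(r1, r2)  # successive ratios drift toward delta = 4.669201...
```

Even these first two ratios land near 4.7 and 4.66, illustrating how slowly the finite ratios approach the limit.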
Most importantly, Feigenbaum found that the value of these numbers was independent of the exact map used to generate period-doubling and chaos. Specifically, as long as the map of the unit interval to itself,

$$x_{n+1} = \lambda f(x_n),$$

is strictly concave and has a quadratic peak, then the map will evidence a period doubling transition to chaos characterized by numbers $\delta$ and $\alpha$ having exactly the values shown above.
For example, the sine map $x_{n+1} = \lambda \sin(\pi x_n)$ behaves the same as the logistic map. Of course, the exact location of each period doubling event occurs at different $\lambda$ values, but the ratios $\delta$ and $\alpha$ are the same. Therefore, the numbers $\delta$ and $\alpha$ are universal numerical constants on an equal footing with commonly known constants such as $\pi$ and $e$.
This raises the question: How to calculate high-precision values for $\delta$ and $\alpha$?
I’ll save $\delta$ for a future post, and concentrate on $\alpha$ here. Using a renormalization group argument, Feigenbaum proposed that the stretching behavior of the logistic map around $x = 1/2$ was described by a universal even function $g(x)$ which obeyed this equation in the $n \to \infty$ limit:

$$g(x) = \alpha\, g(g(x/\alpha)), \qquad g(0) = 1,$$

and specifically, $\alpha$ could be found via $\alpha = 1/g(1)$ (set $x = 0$ and use $g(0) = 1$). I won’t try to explain where this equation comes from here. Interested readers can read Feigenbaum’s summary paper [4].
My goal is to compute this function, and then use it to compute a high-precision value for $\alpha$.
I am aware of three earlier high-precision computations of $\alpha$. The first was performed by Australian mathematician Keith Briggs in his PhD thesis [5]. He computed 576 decimal places (of which 344 were later found to be correct). The second was performed by UK mathematician David Broadhurst in 1999 [6], and is linked from the OEIS. He computed 1018 digits (after the decimal point). Finally, while I was preparing this post Professor Broadhurst pointed me to a paper by Andrea Molteni [7], who used a Chebyshev expansion instead of a Taylor expansion for $g$ and got 10,000 digits.
I became interested in chaos as an undergraduate, my curiosity including learning about the universal numbers $\delta$ and $\alpha$.
Decades later, when I became aware of Julia and its capabilities, I realized that it has some features which make it a good tool for computing these numbers. Julia’s important features include:
Support for BigFloat. Prior computations used C or other low-level languages and manually linked to arbitrary-precision floating point libraries. This is complicated and error-prone. Julia makes it easy: Just use big() or BigFloat when declaring your high-precision numbers, and the BigFloat type will propagate through the rest of the calculation.
Autodiff support. Like Briggs and Broadhurst, I employ the multidimensional Newton’s method to find the universal function above. This involves computing a Jacobian. Computing a Jacobian by hand is also complicated and error-prone. Fortunately, over the last ten years or so, automatic differentiation techniques using dual numbers have become commonplace [8]. Even better, Julia makes a number of autodiff packages available; they are gathered at the website http://www.juliadiff.org/. For my calculation I used the package ForwardDiff.jl.
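The dual-number idea underlying forward-mode autodiff is compact enough to sketch. This toy version (in Python, not the actual ForwardDiff.jl machinery) differentiates a single logistic-map step:

```python
# Toy forward-mode autodiff: carry (value, derivative) pairs through arithmetic.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __sub__(self, o):
        o = self._lift(o)
        return Dual(self.val - o.val, self.der - o.der)
    def __rsub__(self, o):
        return self._lift(o).__sub__(self)
    def __mul__(self, o):
        o = self._lift(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def f(x):               # one logistic-map step with lam = 3.2
    return 3.2 * x * (1 - x)

y = f(Dual(0.3, 1.0))   # seed the derivative dx/dx = 1
print(y.val, y.der)     # f(0.3) = 0.672, f'(0.3) = 3.2*(1 - 2*0.3) = 1.28
```

Seeding der=1 on the input makes every intermediate value carry its derivative along via the sum and product rules, which is how a Jacobian column can be computed in one forward pass.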
With these features in mind, here is the method I used to compute alpha. It is the same used by Briggs and by Broadhurst, except that I implemented the computation in Julia.
Expand $g(x)$ using an even power series to order $2N$:

$$g(x) = 1 + \sum_{k=1}^{N} c_k x^{2k}.$$
Define a function $F$ from the equation for $g$. I will do root finding on $F$ using Newton’s method on the interval $[0, 1]$. We have,

$$F(x) = g(x) - \alpha\, g(g(x/\alpha)), \qquad \alpha = 1/g(1).$$
Evaluate $F$ at discrete points $x_i$, $i = 1, \ldots, N$. This gives $N$ equations in $N$ unknowns (the unknowns are the expansion coefficients $c_k$). The system of equations is

$$F(x_i) = 0, \qquad i = 1, \ldots, N.$$
Then use Newton’s method to solve the system. I use gradient.jl (part of the ForwardDiff.jl package) to compute the gradient of $F$ at each $x_i$, then assemble the gradients into a Jacobian matrix. Newton’s method returns the expansion coefficients $c_k$.
I wrap Newton’s method with an outer loop which calls it repeatedly. I start by asking for only, say, 10 digits. Newton’s method performs the solve then returns the expansion coefficients used to get the 10 digits. I then call Newton’s method again, but ask for more digits, say 30. I use the previously-found coefficients as the starting point for the new computation. This helps keep Newton’s method from wandering away from the desired solution as the number of requested digits increases. It is also useful when validating my result since I have a sequence of converging values which I can compare to each other.
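A low-order, double-precision sketch of the steps above (in Python, with a finite-difference Jacobian standing in for ForwardDiff.jl; N, the collocation points, and the starting guess are illustrative choices, not the post’s) already recovers several digits of α:

```python
# Collocation + Newton solve of g(x) = alpha*g(g(x/alpha)), alpha = 1/g(1),
# with g an even polynomial. Floats only; the post uses BigFloat for precision.
import numpy as np

N = 6  # number of unknown even-series coefficients (illustrative)

def g(c, x):
    # g(x) = 1 + c[0]*x^2 + c[1]*x^4 + ...  (even series with g(0) = 1)
    return 1.0 + sum(ck * x ** (2 * (k + 1)) for k, ck in enumerate(c))

def F(c):
    # residuals of the functional equation at N collocation points on (0, 1]
    alpha = 1.0 / g(c, 1.0)
    xs = [(i + 1) / N for i in range(N)]
    return np.array([g(c, x) - alpha * g(c, g(c, x / alpha)) for x in xs])

c = np.array([-1.5] + [0.0] * (N - 1))  # rough guess near the known solution
for _ in range(40):                      # Newton with finite-difference Jacobian
    Fc, J, h = F(c), np.empty((N, N)), 1e-7
    for j in range(N):
        cp = c.copy()
        cp[j] += h
        J[:, j] = (F(cp) - Fc) / h
    step = np.linalg.solve(J, Fc)
    c = c - step
    if np.max(np.abs(step)) < 1e-12:
        break

alpha = 1.0 / g(c, 1.0)
print(alpha)  # roughly -2.5029 at this low order
```

The post’s real implementation differs in the ways that matter for precision: BigFloat coefficients, an exact Jacobian from ForwardDiff.jl, and the outer loop that ratchets up the requested digits while reusing the previous coefficients as the starting guess.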
My code to calculate $\alpha$ is available on GitHub for you to peruse. Here is the result of a typical run. As you see, I start by requesting a small number of digits, then walk up the requested precision. The variable N reports the requested precision. Note that the number of converged digits is usually somewhat larger than the requested precision.
=========
N = 10
-2.5029078750941558550433538847562588590939864154825157840834005501784649816887308647883415497467233528729158301607769681388012068696297105559293578143978
==========
N = 30
-2.50290787509589282228390287321821578646680176860039691031775654917766795265509729744526592646450042740446507878591787900049867737721570507320536950555730233579295835513049755147443371058966655808267708238725531317972934168867025986314993674780442067259540137373182289118085931144660762909479796000659221064009292629086715493687279575196998009641864622893829398787716116103377437075875363832302242567069139487987577674934100040362548883506087905376040253
==========
N = 50
-2.502907875095892822283902873218215786381271376727149977336192057101332548200937314996958507672489476908625081364772906241928997687752969884387075775409826023146215641547577466532741243269601282751925295432306116410106657184918989460966694448022967464829388745475907553581492862863985138991456223981472184668490051357340519579612964363612398303916281396319853356754143270841575980410327246872085525908340194148771845711576975756396502129782335009689753534469725319971417668188501056069881195819569086079634680349686470334534259147981662223720090416466839684581834935585761392847958319659186046886869906843047200313111109896554722068418988625250918702774965150788866328601960325890564496371195820144217636314897987844658619824730317832913871839316087814207
If you compare each successive result to the next one, you will see that the sequence converges – the N=10 result contains 11 converged digits (after the decimal point), the N=30 result contains 36 converged digits, and so on.
Here is my best result so far, 1177 digits. This includes new digits at the end, beyond those computed by Broadhurst.
2.50290787509589282228390287321821578638127137672714997733619205677923546317959020670329964974643383412959523186999585472394218237778544517927286331499337257811216359487950374478126099738059867123971173732892766540440103066983138346000941393223644906578899512205843172507873377463087853424285351988587500042358246918740820428170090171482305182162161941319985606612938274264970984408447010080545496779367608881264464068851815527093240075425064971570470475419932831783645332562415378693957125097066387979492654623137674591890981311675243422111013091312783716095115834123084150371649970202246812196440812166865274580430262457825610671501385218216449532543349873487413352795815351016583605455763513276501810781194836945957485023739823545262563277947539726990201289151664579394201989202488033940516996865514944773965338769797412323540617819896112494095990353128997733611849847377946108428833293833903950900891408635152562680338141466927991331074334970514354520134464342647520016213846107299226419943327729189777690538025968518850841613864279936834741390166705544353112159412076097447476975360415562684762316863202036958955323302591969942848633937659618260681047820499176267330237410
To check my results, I wrote a Julia script which accepts two files holding computed digits. The script then starts at the beginning of each file, and counts the number of matching characters.
stest = readstring("alpha_me.txt") #reading the test file
sref = readstring("alpha_oeis.txt") #reading the reference file
s = ""
count = 0
for i in 1:min(length(stest), length(sref))  #iterating over the common prefix
    if stest[i] == sref[i]
        count = count + 1
        s = s * string(stest[i])
    else
        break  #stop counting at the first mismatch
    end
end
println("alpha_me.txt has $(length(stest)) digits.") #printing results
println("alpha_oeis.txt has $(length(sref)) digits.")
println("Test file agrees with reference file to $count digits.")
println("Digits of agreement: alpha = $s")
Here’s a run comparing one of my results against the digits reported by Broadhurst:
alpha_me.txt has 2441 digits.
alpha_oeis.txt has 1021 digits.
Test file agrees with reference file to 1020 digits.
Digits of agreement: alpha =
2.5029078750958928222839028732182157863812713767271499773361920567792354631795902067032996497464338341295952318699958547239421823777854451792728633149933725781121635948795037447812609973805986712397117373289276654044010306698313834600094139322364490657889951220584317250787337746308785342428535198858750004235824691874082042817009017148230518216216194131998560661293827426497098440844701008054549677936760888126446406885181552709324007542506497157047047541993283178364533256241537869395712509706638797949265462313767459189098131167524342211101309131278371609511583412308415037164997020224681219644081216686527458043026245782561067150138521821644953254334987348741335279581535101658360545576351327650181078119483694595748502373982354526256327794753972699020128915166457939420198920248803394051699686551494477396533876979741232354061781989611249409599035312899773361184984737794610842883329383390395090089140863515256268033814146692799133107433497051435452013446434264752001621384610729922641994332772918977769053802596851
This result matches all of Broadhurst’s digits, giving good confidence that my results are correct. Then, as I compute higher and higher number of digits, I can compare successive files against one another to find out how many digits have converged. As mentioned above, my current best result (above) is 1177 digits.
I am not yet the world’s record holder when it comes to computing $\alpha$ – that honor belongs to Andrea Molteni, who computed 10,000 digits.
However, her calculation was written in C and linked to MPFR and MPI, so undoubtedly took some time and effort to write and debug. By taking advantage of Julia’s high-level features, I was able to write my program inside of a few hours.
[1] Holmes, P., Whitley, D., “On the attracting set for Duffing’s equation”, Physica D, 111–123 (1983).
[2] A. D’Innocenzo and L. Renna, “Modeling leaky faucet dynamics”, Phys. Rev. E 55, 6676 (1997).
[3] Paul Lindsay, “Period Doubling and Chaotic Behavior in a Driven Anharmonic Oscillator”, Physical Review Letters 47, 1349 (1981).
[4] Mitchell J. Feigenbaum, “Universal Behavior in Nonlinear Systems,” in Los Alamos Science, Summer 1980, pp. 4-27 (available online at http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-80-5007).
[5] Keith Briggs, “Feigenbaum scaling in discrete dynamical systems” (PhD thesis, University of Melbourne, 1997) (available online via http://keithbriggs.info/thesis.html).
[6] David Broadhurst, private communication. His results are available online at http://www.plouffe.fr/simon/constants/feigenbaum.txt.
[7] Andrea Molteni, “An Efficient Method for the Computation of the Feigenbaum Constants to High Precision”, https://arxiv.org/pdf/1602.02357.pdf. Her results are available online at http://converge.to/feigenbaum/.
[8] Philipp H. W. Hoffmann, “A Hitchhiker’s Guide to Automatic Differentiation”, https://arxiv.org/pdf/1411.0583.pdf.
This is a guest blog post, written and edited by Stuart Brorson, Dept of Mathematics, Northeastern University, and formatted by Julia Computing’s Rajshekar Behar
]]>Re-posted from: http://juliacomputing.com/blog/2017/11/27/high-precision-feigenbaum-alpha-calc-using-julia.html
The theory of dynamical systems has been an area of active research amongst physicists and mathematicians for several decades. Perhaps one of the most interesting objects of study in dynamical systems is the phenomenon of chaos.
Chaos is a phenomenon observed in some deterministic systems where the system’s time-dependent trajectory neither diverges, nor converges to a limit point, nor moves periodically. Rather, the system moves on an orbit which looks random.
A simple model system which evidences chaos is the so-called logistic map:
where is a parameter allowed to vary over the domain , and is the dynamical variable which varies over the domain .
The idea is to choose a value of , then start with an arbitrary , and insert it into the equation to get a new value, . Then insert into the equation to get an update, .
Repeat this process for large numbers of iterations and for many values of the value converges to a value (or set of values). This value is called the “fixed point” of the iteration, typically written . The value of depends upon .
A plot of the fixed point vs. is shown in Figure 1 below. What’s interesting are different types of behavior obtained for different values of .
For , converges to a single value. But starting at , a new behavior emerges: instead of converging to one value, the variable hops from one value to a second one, then back.
For example, when , hops between approximately and . This is called a “period two” orbit, since visits two values, alternating with each iteration. Moreover, increasing leads to a situation where visits 4 values, then 8 then 16, and eventually at a particular value of lambda, the period becomes infinity, meaning that has no periodic orbit. In this case, wanders around the unit interval quasi-randomly – the behavior we call chaos.
Figure 1: The bifurcation diagram for the logistic map. (From Wikipedia.)
This transition from the period-one orbit to chaos is called the “period-doubling” route to chaos. It attracted the attention of many physicists in the 1970s and 80s since it offered the promise of illustrating a mechanism by which we could understand the development of complicated chaotic phenomena such as turbulence in fluids. Indeed, several physical systems were identified which evidenced a period-doubling route to chaos, including Duffing’s oscillator, a dripping faucet, and some simple electronic circuits. Unfortunately, the larger ambition of finally getting a grasp on turbulence by studying the logistic equation did not pan out. Nonetheless, some very interesting mathematics was discovered in the process.
In the late 1970s, Mitchell Feigenbaum, a mathematician at Los Alamos research laboratory, was playing around with the logistic equation using a hand calculator. He noticed some interesting numbers characterize the period doubling transition to chaos in the logistic map. He observed the following:
Examine the length of each interval in as the period doubles from etc. Feigenbaum found that the ratio of succeeding interval lengths converged to a value which he called . The process is shown pictorially in Figure 2. Feigenbaum defined as
Figure 2: A close-up of the logistic map’s bifurcation diagram showing how is defined. (Figure adapted from Wikipedia.)
Examine the width of the opening of each parabolic segment when it hits the value . (This value of is called the “superstable” point, and its value depends upon which map is under consideration. The value is 1/2 for the logistic map.) The ratio of successive widths also converges to a value. This is shown in Figure 3. Feigenbaum called this value . We have
Figure 3: A close-up of the logistic map’s bifurcation diagram showing how is defined. (Figure adapted from Wikipedia.)
Most importantly, Feigenbaum found that the value of these numbers was independent of the exact map used to generate period-doubling and chaos. Specifically, as long as the the map of the unit interval to itself,
is strictly concave and has a quadratic peak, then the map will evidence a period doubling transition to chaos characterized by numbers delta and alpha having exactly the values shown above.
For example, and behave the same as the logistic map. Of course, the exact location of the period doubling event occurs at different values, but the ratios and are the same. Therefore, the numbers and are universal numerical constants on equal footing as the commonly known constants , , and .
This raises the question: How does one calculate high-precision values for δ and α?
I’ll save δ for a future post, and concentrate on α here. Using a renormalization group argument, Feigenbaum proposed that the stretching behavior of the logistic map around its maximum was described by a universal function g(x) which obeyed this equation in the limit:

g(x) = α g(g(x/α)),   with g(0) = 1

(α is negative in the sign convention used here), and specifically, α could be found via α = 1/g(1), which follows from setting x = 0 in the equation. I won’t try to explain where this equation comes from here. Interested readers can read Feigenbaum’s summary paper [4].
My goal is to compute this function, and then use it to compute a high-precision value for .
I am aware of three earlier high-precision computations of α. The first was performed by Australian mathematician Keith Briggs in his PhD thesis [5]. He computed 576 decimal places (of which 344 were later found to be correct). The second was performed by UK mathematician David Broadhurst in 1999 [6], and is linked from the OEIS. He computed 1018 digits (after the decimal point). Finally, while I was preparing this post Professor Broadhurst pointed me to a paper by Andrea Molteni [7], who used a Chebyshev expansion instead of a Taylor expansion for g(x) and got 10,000 digits.
I became interested in chaos as an undergraduate, and part of that interest was learning about the universal numbers δ and α.
Decades later, when I became aware of Julia and its capabilities, I realized that it has some features which make it a good tool for computing these numbers. Julia’s important features include:
Support for BigFloat. Prior computations used C or other low-level languages and manually linked to arbitrary-precision floating point libraries. This is complicated and error-prone. Julia makes it easy: Just use big() or BigFloat when declaring your high-precision numbers, and the BigFloat type will propagate through the rest of the calculation.
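For instance (a minimal illustration of my own, not code from the α computation), the same generic function runs at double or arbitrary precision depending only on the type of its argument:

```julia
# BigFloat propagates through generic code without any changes
f(x) = sqrt(2x + 1)

f(4.0)        # Float64 result: 3.0
f(big"4.0")   # BigFloat result, with 256 bits (~77 decimal digits) by default
```

No separate high-precision code path is needed; the whole calculation follows the type of its inputs.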
Autodiff support. Like Briggs and Broadhurst, I employ the multidimensional Newton’s method to find the universal function above. This involves computing a Jacobian. Computing a Jacobian by hand is also complicated and error-prone. Fortunately, over the last ten years or so, automatic differentiation techniques using dual numbers have become commonplace [8]. Even better, Julia makes a number of autodiff packages available; they are gathered at the website http://www.juliadiff.org/. For my calculation I used the package ForwardDiff.jl.
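The dual-number idea behind forward-mode autodiff can be sketched in a few lines (a toy of my own, not ForwardDiff.jl’s actual implementation):

```julia
# Toy forward-mode autodiff: carry (value, derivative) pairs through arithmetic
struct Dual
    val::Float64
    der::Float64
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(k::Real, a::Dual) = Dual(k * a.val, k * a.der)

f(x) = 3x * x - 2x          # derivative is 6x - 2
d = f(Dual(2.0, 1.0))       # seed derivative 1 at x = 2
# d.val == 8.0 (the value f(2)); d.der == 10.0 (the derivative f'(2))
```

ForwardDiff.jl generalizes exactly this trick to whole vectors of partials, which is how it assembles Jacobians.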
With these features in mind, here is the method I used to compute α. It is the same one used by Briggs and by Broadhurst, except that I implemented the computation in Julia.
Expand g(x) using an even power series to order N:

g(x) ≈ 1 + a_1 x^2 + a_2 x^4 + … + a_N x^(2N)
Define a function F(x) from the equation for g. I will do root finding on F using Newton’s method on the interval [0, 1]. We have

F(x) = g(x) − α g(g(x/α))
Evaluate F at discrete points x_i, i = 1, …, N. This gives N equations in N unknowns (the unknowns are the expansion coefficients a_i). The system of equations is

F(x_i) = 0,   i = 1, …, N
Then use Newton’s method to solve the system. I use gradient.jl (part of the ForwardDiff.jl package) to compute the gradient of F at each x_i, then assemble the gradients into a Jacobian matrix. Newton’s method returns the expansion coefficients a_i.
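A minimal sketch of this setup (my own illustration, not the author’s code; the names are hypothetical, and α is eliminated via the normalization g(0) = 1, α = 1/g(1), which follows from evaluating the functional equation at x = 0):

```julia
# g is an even power series with g(0) = 1 fixed; F is the residual to zero.
g_eval(a, x) = 1 + sum(a[i] * x^(2i) for i in eachindex(a))

function F(a, x)
    α = 1 / g_eval(a, one(x))                        # α = 1/g(1)
    return g_eval(a, x) - α * g_eval(a, g_eval(a, x / α))
end

F([-1.5, 0.1], 0.0)   # 0.0 by construction, for any coefficients
```

Newton’s method then only has to drive F(a, x_i) to zero at the chosen collocation points.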
I wrap Newton’s method with an outer loop which calls it repeatedly. I start by asking for only, say, 10 digits. Newton’s method performs the solve then returns the expansion coefficients used to get the 10 digits. I then call Newton’s method again, but ask for more digits, say 30. I use the previously-found coefficients as the starting point for the new computation. This helps keep Newton’s method from wandering away from the desired solution as the number of requested digits increases. It is also useful when validating my result since I have a sequence of converging values which I can compare to each other.
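The outer loop might look roughly like this (a hypothetical sketch: newton_solve stands in for the actual Newton solver, and the precision schedule is illustrative):

```julia
# Raise the working precision step by step, warm-starting each solve
# from the previous coefficients. `newton_solve(coeffs, digits)` is
# a stand-in for the real solver.
function precision_ladder(newton_solve, coeffs0, digit_steps = (10, 30, 50))
    coeffs = coeffs0
    for d in digit_steps
        # ~3.32 bits per decimal digit, plus some guard bits
        setprecision(BigFloat, ceil(Int, d * log2(10)) + 64) do
            coeffs = newton_solve(big.(coeffs), d)
        end
    end
    return coeffs
end
```

Warm-starting keeps Newton’s method close to the desired root as the precision grows, exactly as described above.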
My code to calculate α is available on GitHub for you to peruse. Here is the result of a typical run. As you see, I start by requesting a small number of digits, then walk up the requested precision. The variable N reports the requested precision. Note that the number of converged digits is usually somewhat larger than the requested precision.
=========
N = 10
-2.5029078750941558550433538847562588590939864154825157840834005501784649816887308647883415497467233528729158301607769681388012068696297105559293578143978
==========
N = 30
-2.50290787509589282228390287321821578646680176860039691031775654917766795265509729744526592646450042740446507878591787900049867737721570507320536950555730233579295835513049755147443371058966655808267708238725531317972934168867025986314993674780442067259540137373182289118085931144660762909479796000659221064009292629086715493687279575196998009641864622893829398787716116103377437075875363832302242567069139487987577674934100040362548883506087905376040253
==========
N = 50
-2.502907875095892822283902873218215786381271376727149977336192057101332548200937314996958507672489476908625081364772906241928997687752969884387075775409826023146215641547577466532741243269601282751925295432306116410106657184918989460966694448022967464829388745475907553581492862863985138991456223981472184668490051357340519579612964363612398303916281396319853356754143270841575980410327246872085525908340194148771845711576975756396502129782335009689753534469725319971417668188501056069881195819569086079634680349686470334534259147981662223720090416466839684581834935585761392847958319659186046886869906843047200313111109896554722068418988625250918702774965150788866328601960325890564496371195820144217636314897987844658619824730317832913871839316087814207
If you compare each successive result to the next one, you will see that the sequence converges – the N=10 result contains 11 converged digits (after the decimal point), the N=30 result contains 36 converged digits, and so on.
Here is my best result so far, 1177 digits. This includes new digits at the end, beyond those computed by Broadhurst.
2.50290787509589282228390287321821578638127137672714997733619205677923546317959020670329964974643383412959523186999585472394218237778544517927286331499337257811216359487950374478126099738059867123971173732892766540440103066983138346000941393223644906578899512205843172507873377463087853424285351988587500042358246918740820428170090171482305182162161941319985606612938274264970984408447010080545496779367608881264464068851815527093240075425064971570470475419932831783645332562415378693957125097066387979492654623137674591890981311675243422111013091312783716095115834123084150371649970202246812196440812166865274580430262457825610671501385218216449532543349873487413352795815351016583605455763513276501810781194836945957485023739823545262563277947539726990201289151664579394201989202488033940516996865514944773965338769797412323540617819896112494095990353128997733611849847377946108428833293833903950900891408635152562680338141466927991331074334970514354520134464342647520016213846107299226419943327729189777690538025968518850841613864279936834741390166705544353112159412076097447476975360415562684762316863202036958955323302591969942848633937659618260681047820499176267330237410
To check my results, I wrote a Julia script which accepts two files holding computed digits. The script then starts at the beginning of each file, and counts the number of matching characters.
stest = readstring("alpha_me.txt") #reading the test file
sref = readstring("alpha_oeis.txt") #reading the reference file
s = ""
count = 0
for i in 1:min(length(stest), length(sref)) #iterating over the shorter file
if stest[i] != sref[i] #stop at the first mismatch
break
end
count = count + 1
s = s * string(stest[i])
end
println("alpha_me.txt has $(length(stest)) digits.") #printing results
println("alpha_oeis.txt has $(length(sref)) digits.")
println("Test file agrees with reference file to $count digits.")
println("Digits of agreement: alpha = $s")
Here’s a run comparing one of my results against the digits reported by Broadhurst:
alpha_me.txt has 2441 digits.
alpha_oeis.txt has 1021 digits.
Test file agrees with reference file to 1020 digits.
Digits of agreement: alpha =
2.5029078750958928222839028732182157863812713767271499773361920567792354631795902067032996497464338341295952318699958547239421823777854451792728633149933725781121635948795037447812609973805986712397117373289276654044010306698313834600094139322364490657889951220584317250787337746308785342428535198858750004235824691874082042817009017148230518216216194131998560661293827426497098440844701008054549677936760888126446406885181552709324007542506497157047047541993283178364533256241537869395712509706638797949265462313767459189098131167524342211101309131278371609511583412308415037164997020224681219644081216686527458043026245782561067150138521821644953254334987348741335279581535101658360545576351327650181078119483694595748502373982354526256327794753972699020128915166457939420198920248803394051699686551494477396533876979741232354061781989611249409599035312899773361184984737794610842883329383390395090089140863515256268033814146692799133107433497051435452013446434264752001621384610729922641994332772918977769053802596851
This result matches all of Broadhurst’s digits, giving good confidence that my results are correct. Then, as I compute higher and higher number of digits, I can compare successive files against one another to find out how many digits have converged. As mentioned above, my current best result (above) is 1177 digits.
I am not yet the world’s record holder when it comes to computing α – that honor belongs to Andrea Molteni, who computed 10,000 digits.
However, her calculation was written in C and linked to MPFR and MPI, so undoubtedly took some time and effort to write and debug. By taking advantage of Julia’s high-level features, I was able to write my program inside of a few hours.
[1] P. Holmes and D. Whitley, “On the attracting set for Duffing’s equation”, Physica D, 111–123 (1983).
[2] A. D’Innocenzo and L. Renna, “Modeling leaky faucet dynamics”, Phys. Rev. E 55, 6676 (1997).
[3] Paul Lindsay, “Period Doubling and Chaotic Behavior in a Driven Anharmonic Oscillator”, Physical Review Letters 47, 1349 (1981).
[4] Mitchell J. Feigenbaum, “Universal Behavior in Nonlinear Systems”, Los Alamos Science, Summer 1980, pp. 4-27 (available online at http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-80-5007).
[5] Keith Briggs, “Feigenbaum scaling in discrete dynamical systems” (PhD Thesis, University of Melbourne, 1997) (available online via http://keithbriggs.info/thesis.html).
[6] David Broadhurst, private communication. His results are available online at http://www.plouffe.fr/simon/constants/feigenbaum.txt.
[7] Andrea Molteni, “An Efficient Method for the Computation of the Feigenbaum Constants to High Precision”, https://arxiv.org/pdf/1602.02357.pdf. Her results are available online at http://converge.to/feigenbaum/.
[8] Philipp H. W. Hoffmann, “A Hitchhiker’s Guide to Automatic Differentiation”, https://arxiv.org/pdf/1411.0583.pdf.
This is a guest blog post, written and edited by Stuart Brorson, Dept of Mathematics, Northeastern University, and formatted by Julia Computing’s Rajshekar Behar
]]>Re-posted from: http://juliasnippets.blogspot.com/2017/11/basics-of-generating-random-numbers-in.html
Recently there were two questions regarding random number generation in Julia: one in discussion on Discourse and the other on Stack Overflow.
In this post I have tried to collect the basic information related to those two topics.
An important detail of rand() is that it produces a value in the interval [0, 1) and generates exactly 52 bits of randomness – it can produce 2⁵² distinct values. To understand why, you must know that internally Julia generates a value from the interval [1, 2) and then subtracts 1 from it. The reason for this approach is that Float64 has a 52-bit significand, and in the range [1, 2) all representable floating-point values are equally spaced, see e.g. Wikipedia. This ensures that values generated by rand() are uniformly distributed.
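The [1, 2) trick can be reproduced by hand (a sketch of the idea, not Julia’s actual implementation):

```julia
# Put 52 random bits in the significand, force the sign/exponent bits of 1.0,
# reinterpret as a Float64 in [1, 2), then subtract 1.
function rand52()
    significand_bits = rand(UInt64) & 0x000fffffffffffff   # low 52 bits
    one_bits = reinterpret(UInt64, 1.0)                    # sign/exponent of 1.0
    u = reinterpret(Float64, one_bits | significand_bits)  # uniform in [1, 2)
    return u - 1.0                                         # uniform in [0, 1)
end

x = rand52()   # a value in [0, 1), always an integer multiple of 2^-52
```

Because the subtraction and the bit pattern are exact, every output is of the form k/2⁵², which is exactly the 52 bits of randomness described above.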
There are many options here, please refer to the Julia Manual for details. What I want to highlight here are the consequences of the fact that the underlying random number generator produces only 52 bits of randomness. Below we measure the time of obtaining one random sample of Int32 and Int64. Here is the benchmark:
julia> using BenchmarkTools
julia> @btime rand()
7.464 ns (0 allocations: 0 bytes)
0.5055246442408914
julia> @btime rand(Int32)
7.464 ns (0 allocations: 0 bytes)
37355051
julia> @btime rand(Int64)
10.730 ns (0 allocations: 0 bytes)
2952626120715896373
The reason for the difference is that in order to get Int64 we need to sample twice, because 64 > 52. You might ask if this matters in practical code. Actually it does, because on most machines running Julia, Int is Int64. Here is an example of sampling a value from a range (my machine is 64-bit):
julia> @btime rand(1:10)
37.789 ns (0 allocations: 0 bytes)
5
julia> @btime Int(rand(Int32(1):Int32(10)))
26.126 ns (0 allocations: 0 bytes)
4
If you can accept a bit of inaccuracy in the distribution, you can generate values from the range 1:n faster when n is small, with the following code (I use n=10 as above):
julia> @btime ceil(Int, 10rand())
15.395 ns (0 allocations: 0 bytes)
6
The discussion of why it is inaccurate is presented on Discourse. Here let me give you an extreme example (the proper value of the evaluated expression is 0.5 in both cases):
julia> mean(ceil(Int64, 2^60 * rand()) % Bool for i in 1:10^6)
0.0
julia> mean(rand(1:2^60) % Bool for i in 1:10^6)
0.500582
We know why the first approach fails – Float64 does not have enough precision and we will always get an even number if we multiply such a value by 2⁶⁰.
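We can verify the evenness claim mechanically (a sketch of my own; whether rand() produces multiples of 2⁻⁵² or 2⁻⁵³ depends on the Julia version, but the conclusion is the same):

```julia
# Every value of 2^60 * rand() is an exact even integer: rand() returns
# k / 2^52 (or k / 2^53 on newer Julia versions), so the product is k times
# a power of two of at least 2^7, and `% Bool` can never see a 1.
all(iseven(Int(2.0^60 * rand())) for _ in 1:10^5)   # true
```

Multiplying by a power of two never rounds here, so the Int conversion is exact and the parity argument holds for every sample.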
And here is a less extreme example, multiplying by something smaller than 2⁵² (the correct result is 1):
julia> mean(rand(1:(2^50+2^49)) % 3 for i in 1:10^7)
1.0000001
julia> mean(ceil(Int64, (2^50+2^49) * rand()) % 3 for i in 1:10^7)
0.9586021
And again we see a bias (although not as large).
Sometimes it is enough to have a single source of pseudorandomness. In such cases Base.GLOBAL_RNG is enough. However, in many settings (like the one discussed on Stack Overflow) we need more than one independent random number generator. The usual approach is to generate seeds for consecutive random number generators that ensure that the sequences produced by them do not overlap. In the paper Efficient Jump Ahead for F₂-Linear Random Number Generators you can find a detailed discussion of the topic.
In Julia you can generate such independent generators using randjump function. You give it a number of generators you require and they are guaranteed to use seeds that correspond to random number generator states separated by 10²⁰ steps.
There is one good practice if you decide to use randjump: call it only once to produce the required generators. Consecutive uses of randjump based on the same random number generator can give surprising results. In order to understand the reason for this advice consider the following example:
julia> srand(1)
MersenneTwister(UInt32[0x00000001], Base.dSFMT.DSFMT_state(Int32[1749029653, 1072851681, 1610647787, 1072862326, 1841712345, 1073426746, -198061126, 1073322060, -156153802, 1073567984 … 1977574422, 1073209915, 278919868, 1072835605, 1290372147, 18858467, 1815133874, -1716870370, 382, 0]), [1.23603, 1.34652, 1.31271, 1.00791, 1.48861, 1.21097, 1.95192, 1.9999, 1.25166, 1.98667 … 1.1489, 1.77623, 1.16774, 1.00214, 1.46738, 1.03244, 1.36938, 1.75271, 1.43027, 1.71293], 382)
julia> rand(100);
julia> x = randjump(Base.GLOBAL_RNG, 2)
2-element Array{MersenneTwister,1}:
MersenneTwister(UInt32[0x00000001], Base.dSFMT.DSFMT_state(Int32[-423222098, 1072940746, 1823958146, 1073056597, 94617959, 1073021145, 2081944769, 1072701541, -1344696523, 1073205595 … -1005791883, 1073144418, 24484970, 1073440808, 1370926729, 1336278534, -1527371338, -19485865, 382, 0]), [1.23603, 1.34652, 1.31271, 1.00791, 1.48861, 1.21097, 1.95192, 1.9999, 1.25166, 1.98667 …
1.1489, 1.77623, 1.16774, 1.00214, 1.46738, 1.03244, 1.36938, 1.75271, 1.43027, 1.71293], 100)
MersenneTwister(UInt32[0x00000001], Base.dSFMT.DSFMT_state(Int32[2095040610, 1073104849, 93118390, 1073625566, 1967429533, 1073003175, -889983042, 1073278562, 29166119, 1073602896 … 1102161927, 1072737217, -1901932835, 1073559995, 1843649418, -1848103721, -1079494630, -1219397333, 382, 0]), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 382)
julia> rand();
julia> y = randjump(Base.GLOBAL_RNG, 2)
2-element Array{MersenneTwister,1}:
MersenneTwister(UInt32[0x00000001], Base.dSFMT.DSFMT_state(Int32[-423222098, 1072940746, 1823958146, 1073056597, 94617959, 1073021145, 2081944769, 1072701541, -1344696523, 1073205595 … -1005791883, 1073144418, 24484970, 1073440808, 1370926729, 1336278534, -1527371338, -19485865, 382, 0]), [1.23603, 1.34652, 1.31271, 1.00791, 1.48861, 1.21097, 1.95192, 1.9999, 1.25166, 1.98667 …
1.1489, 1.77623, 1.16774, 1.00214, 1.46738, 1.03244, 1.36938, 1.75271, 1.43027, 1.71293], 101)
MersenneTwister(UInt32[0x00000001], Base.dSFMT.DSFMT_state(Int32[2095040610, 1073104849, 93118390, 1073625566, 1967429533, 1073003175, -889983042, 1073278562, 29166119, 1073602896 … 1102161927, 1072737217, -1901932835, 1073559995, 1843649418, -1848103721, -1079494630, -1219397333, 382, 0]), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 382)
julia> x[1] === y[1]
true
julia> x[2] === y[2]
false
julia> x[2] == y[2]
true
What do we learn from it: x[1] === y[1] because randjump returns the generator it was passed as the first element, so both calls hand back the very same object. In contrast, x[2] and y[2] are distinct objects (=== is false) but their internal states are identical (== is true) – the second randjump call produced a generator in exactly the same state as the first one, so the two “independent” streams would overlap completely. This is why randjump should be called only once.
Re-posted from: http://juliadiffeq.org/2017/11/24/Jacobians.html
The DifferentialEquations.jl 3.0 release had most of the big features and was
featured in a separate blog post.
Now in this release we had a few big incremental developments. We expanded
the capabilities of our wrapped libraries and completed one of the most
requested features: passing Jacobians into the IDA and DASKR DAE solvers.
Let’s just get started there:
Re-posted from: http://juliacomputing.com/press/2017/11/21/tangent-works-uses-julia-to-win-ieee-competition.html
Piscataway, NJ – Tangent Works used Julia to win the IEEE Global Energy Forecasting Competition 2017 (GEFCom2017).
Tangent Works is a European machine learning company that conducts real time energy forecasting.
Out of 177 competition teams, Tangent Works was one of just two competitors to win both the qualifying match and the final match.
According to Jan Dolinsky, Chief Technology Officer at Tangent Works, “We relied 100% on Julia to win this competition. At the core of our product we do numerous matrix operations where we rely heavily on unique aspects of the language such as out of the box support of SIMD instruction sets, direct access to BLAS, broadcasting, syntactic loop fusion and others.”
Dr. Tao Hong, Chair of the IEEE Working Group on Energy Forecasting and General Chair of the Global Energy Forecasting Competition explains:
Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.
Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:
To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.
Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.
Tangent Works is a European machine learning company providing the next generation of prediction solutions for businesses by automating the predictive model building process. With its breakthrough solution, TIM, companies can now generate accurate predictive models for time series analysis in seconds, fully automatic, as a transparent formula providing useful insights about the dynamics hidden in the data. TIM eliminates the data science skills burden, and is extremely efficient on computing time which allows machine learning solutions to be executed directly on a device. The core of the technology is applicable for a variety of advanced analytics challenges such as time series analysis, complex pattern recognition, anomaly detection and classification. Tangent Works was initiated in 2013 by four experienced partners, combining advanced machine learning knowledge, software product development, and years of international marketing & sales development in high tech environments. The company employs 15 specialists in offices in Belgium and Slovakia.
The Bailey–Borwein–Plouffe formula is one of the several algorithms to compute π:

π = Σ_{k=0}^{∞} 1/16^k · (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6))

What makes this formula stand out among other approximations of π is that it allows one to directly extract the n-th fractional digit of the hexadecimal value of π without computing the preceding ones.
Image credit: Cormullion, Julia code here.
The Wikipedia article about the Bailey–Borwein–Plouffe formula explains that the n-th fractional digit (well, actually it’s the (n+1)-th) is given by

digit = ⌊16 · frac(4Σ(n, 1) − 2Σ(n, 4) − Σ(n, 5) − Σ(n, 6))⌋

where

Σ(n, j) = Σ_{k=0}^{n} (16^(n−k) mod (8k+j)) / (8k+j) + Σ_{k=n+1}^{∞} 16^(n−k) / (8k+j)
Only the fractional part of the expression on the right-hand side is relevant; thus, in order to avoid rounding errors, when we compute each term of the finite sum above we can take only its fractional part. This allows us to always use ordinary double-precision floating-point arithmetic, without resorting to arbitrary-precision numbers. In addition, note that the terms of the infinite sum get very small very quickly, so we can stop the summation when they become negligible.
Here is a Julia implementation of the algorithm to extract the n-th fractional digit of π:
# Return the fractional part of x, modulo 1, always positive
fpart(x) = mod(x, one(x))
function Σ(n, j)
# Compute the finite sum
s = 0.0
denom = j
for k in 0:n
s = fpart(s + powermod(16, n - k, denom) / denom)
denom += 8
end
# Compute the infinite sum
num = 1 / 16
frac = num / denom
while frac > eps(s)
s += frac
num /= 16
denom += 8
frac = num / denom
end
return fpart(s)
end
pi_digit(n) =
floor(Int, 16 * fpart(4Σ(n-1, 1) - 2Σ(n-1, 4) - Σ(n-1, 5) - Σ(n-1, 6)))
pi_string(n) = "0x3." * join(hex.(pi_digit.(1:n))) * "p0"
The pi_digit function gives the n-th hexadecimal fractional digit of π as a base-10 integer, and the pi_string function returns the first n hexadecimal digits of π as a valid hexadecimal floating-point literal:
julia> pi_digit(1)
2
julia> pi_digit(6)
10
julia> pi_string(1000)
"0x3.243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89452821e638d01377be5466cf34e90c6cc0ac29b7c97c50dd3f84d5b5b54709179216d5d98979fb1bd1310ba698dfb5ac2ffd72dbd01adfb7b8e1afed6a267e96ba7c9045f12c7f9924a19947b3916cf70801f2e2858efc16636920d871574e69a458fea3f4933d7e0d95748f728eb658718bcd5882154aee7b54a41dc25a59b59c30d5392af26013c5d1b023286085f0ca417918b8db38ef8e79dcb0603a180e6c9e0e8bb01e8a3ed71577c1bd314b2778af2fda55605c60e65525f3aa55ab945748986263e8144055ca396a2aab10b6b4cc5c341141e8cea15486af7c72e993b3ee1411636fbc2a2ba9c55d741831f6ce5c3e169b87931eafd6ba336c24cf5c7a325381289586773b8f48986b4bb9afc4bfe81b6628219361d809ccfb21a991487cac605dec8032ef845d5de98575b1dc262302eb651b8823893e81d396acc50f6d6ff383f442392e0b4482a484200469c8f04a9e1f9b5e21c66842f6e96c9a670c9c61abd388f06a51a0d2d8542f68960fa728ab5133a36eef0b6c137a3be4ba3bf0507efb2a98a1f1651d39af017666ca593e82430e888cee8619456f9fb47d84a5c33b8b5ebee06f75d885c12073401a449f56c16aa64ed3aa62363f77061bfedf72429b023d37d0d724d00a1248db0fead3p0"
While I was preparing this post I found an unregistered package PiBBP.jl that implements the Bailey–Borwein–Plouffe formula. This is faster than my code above, mostly thanks to a function for modular exponentiation more efficient than that available in Julia standard library.
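For reference, the modular exponentiation at the heart of the Σ function (the powermod(16, n - k, denom) call) can be sketched as binary square-and-multiply. This toy version is my own illustration, not the optimized routine from PiBBP.jl:

```julia
# Square-and-multiply: compute b^e mod m in O(log e) multiplications,
# never forming the huge intermediate value b^e.
function my_powermod(b::Integer, e::Integer, m::Integer)
    r = one(b)
    b = mod(b, m)
    while e > 0
        isodd(e) && (r = mod(r * b, m))
        b = mod(b * b, m)
        e >>= 1
    end
    return r
end

my_powermod(16, 1000, 8 * 25 + 1) == powermod(16, 1000, 8 * 25 + 1)   # true
```

Keeping every intermediate reduced mod m is what lets the digit-extraction code stay within ordinary machine integers.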
Let’s check if the function is working correctly. We can use the parse function to convert the string to a decimal floating-point number. IEEE 754 double-precision floating-point numbers have a 53-bit mantissa, amounting to about 13 hexadecimal digits:
julia> pi_string(13)
"0x3.243f6a8885a30p0"
julia> parse(Float64, pi_string(13))
3.141592653589793
julia> Float64(π) == parse(Float64, pi_string(13))
true
Generator expressions allow us to obtain the decimal value of the number in a very simple way, without using the hexadecimal string:
julia> 3 + sum(pi_digit(n)/16^n for n in 1:13)
3.141592653589793
We can use the arbitrary-precision BigFloat type to check the correctness of the result for even more digits. By default, BigFloat numbers in Julia have a 256-bit mantissa:
julia> precision(BigFloat)
256
The result is correct for the first 64 hexadecimal digits:
julia> pi_string(64)
"0x3.243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89p0"
julia> BigFloat(π) == parse(BigFloat, pi_string(64))
true
julia> 3 + sum(pi_digit(n)/big(16)^n for n in 1:64)
3.141592653589793238462643383279502884197169399375105820974944592307816406286198
It’s possible to increase the precision of BigFloat numbers, to further test the accuracy of the Bailey–Borwein–Plouffe formula:
julia> setprecision(BigFloat, 4000) do
BigFloat(π) == parse(BigFloat, pi_string(1000))
end
true
Since the Bailey–Borwein–Plouffe formula extracts the n-th digit of π without computing the other ones, we can write a multi-threaded version of pi_string, taking advantage of native support for multi-threading in Julia:
function pi_string_threaded(N)
digits = Vector{Int}(N)
Threads.@threads for n in eachindex(digits)
digits[n] = pi_digit(n)
end
return "0x3." * join(hex.(digits)) * "p0"
end
For example, running Julia with 4 threads gives a 2× speed-up:
julia> Threads.nthreads()
4
julia> using BenchmarkTools
julia> pi_string(1000) == pi_string_threaded(1000)
true
julia> @benchmark pi_string(1000)
BenchmarkTools.Trial:
memory estimate: 105.28 KiB
allocs estimate: 2016
--------------
minimum time: 556.228 ms (0.00% GC)
median time: 559.198 ms (0.00% GC)
mean time: 559.579 ms (0.00% GC)
maximum time: 564.502 ms (0.00% GC)
--------------
samples: 9
evals/sample: 1
julia> @benchmark pi_string_threaded(1000)
BenchmarkTools.Trial:
memory estimate: 113.25 KiB
allocs estimate: 2018
--------------
minimum time: 270.577 ms (0.00% GC)
median time: 271.075 ms (0.00% GC)
mean time: 271.598 ms (0.00% GC)
maximum time: 278.350 ms (0.00% GC)
--------------
samples: 19
evals/sample: 1
Re-posted from: https://giordano.github.io/blog/2017-11-21-hexadecimal-pi/
Re-posted from: http://white.ucc.asn.au/2017/11/20/Thread-Parallelism-in-Julia.html
Julia has 3 kinds of parallelism.
The well known, safe, slowish and easyish, distributed parallelism, via pmap
, @spawn
and @remotecall
.
The wellish known, very safe, very easy, not-actually-parallelism, asynchronous parallelism via @async
.
And the more obscure, less documented, experimental, really unsafe, shared memory parallelism via @threads
.
It is the last we are going to talk about today.
I’m not sure if I can actually teach someone how to write threaded code.
Let alone efficient threaded code.
But this is me giving it a shot.
The example here is going to be fairly complex.
For a much simpler example of use,
on a problem that is more easily parallelizable,
see my recent stackoverflow post on parallelizing sorting.
(Spoilers: in the end I don’t manage to extract any serious performance gains from parallelizing this prime search. Unlike parallelizing that sorting. Parallelizing sorting worked out great.)
In a previous post,
I used prime generation as an example to motivate the use of coroutines as generators.
Now coroutines are neither parallelism, nor fast.
Let’s see how fast we can go if we want to crank it up using Base.Threads
.
(answer: not as much as you might hope).
I feel that julia threading is a bit nerfed,
in that all threading must take place in a for-loop, where work is distributed equally to all threads,
and the loop end blocks until all threads are done.
You can’t just fire off one thread to do a thing and then let it go.
I spent some time a while ago trying to workout how to do that,
and in short I found that end of thread block is hard to get around.
@async
on its own can’t break out of it.
Though one could rewrite ones whole program to never actually exit that loop.
But then one ends up building one's own threading system.
And I have a thesis to finish.
This is the same paragraph from that earlier post.
I’ll let you know now, this is not an optimal prime number finding algorithm, by any means.
We’re just using it for demonstration. It has a good kind of complexity for talking about shared memory parallelism.
If a number is prime, then no prime (except the number itself), will divide it.
Since if it has a divisor that is non-prime, then that divisor itself, will have a prime divisor that will divide the whole.
So we only need to check primes as candidate divisors.
Further: one does not need to check divisibility by all prior primes in order to check if a number $x$ is prime.
One only needs to check divisibility by the primes less than or equal to $\sqrt{x}$, since if $x=a \times b$, for some $a>\sqrt{x}$ that would imply that $b<\sqrt{x}$, and so its composite nature would have been found when $b$ was checked as a divisor.
Here is the channel code for before:
Input:
Input:
Output:
primes_ch (generic function with 2 methods)
Input:
Output:
So the first and obvious thing to do is to switch to doing this eagerly with an array.
Input:
Output:
primes_array (generic function with 2 methods)
Input:
Output:
Input:
Output:
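As a hedged sketch of such an eager array version (the body is my reconstruction; only the name primes_array comes from the output above):

```julia
# Eagerly collect the first n primes into an array, checking
# divisibility only by already-found primes p with p^2 ≤ x.
function primes_array(n)
    known = Int[]
    sizehint!(known, n)
    x = 2
    while length(known) < n
        isprime = true
        for p in known
            p * p > x && break        # no need to check past √x
            if x % p == 0
                isprime = false
                break
            end
        end
        isprime && push!(known, x)
        x += 1
    end
    return known
end
```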
This gives an improvement, but not as much as we might really hope for.
(as you will see below getting more out of it is harder).
@threads for
The @threads
macro eats for-loops and breaks up their ranges equally, one block per thread.
That isn’t very practical if your plan is not just processing some data in a way that doesn’t depend strongly on the order of processing.
We can’t simply break all the numbers into equal blocks, since the final thread would not be able to do anything until almost all the other threads were done.
For this algorithm we need to know all the prime numbers less than $\sqrt{x}$ before we can check if $x$ is prime.
So we have a sequential component.
So we gut the @threads
macro, taking the core functionality,
and we will manage giving work to the threads ourselves.
Input:
Output:
everythread
Just to check it is working:
Input:
Output:
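As a rough, hedged approximation (not the post's internals-based implementation), one can get a function to run once on every thread by iterating exactly nthreads() times under @threads:

```julia
# @threads splits 1:nthreads() into one iteration per thread,
# so f runs once on each thread.
function everythread(f)
    Threads.@threads for _ in 1:Threads.nthreads()
        f()
    end
end

# Check it is working: each thread bumps its own slot.
ran_on = zeros(Int, Threads.nthreads())
everythread(() -> ran_on[Threads.threadid()] += 1)
```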
Before we can get into actually working on the parallelism, we need another part.
Pushing to the end of our list of known_primes
is no longer going to guarantee order.
One thing we will need is the ability to push!
that does maintain order.
Because otherwise we could end up thinking we have checked enough factors but actually we skipped over one.
(I made that mistake in an earlier version of this code).
We could use a priority queue for this, but since known primes will always be almost sorted,
I think it is going to be faster just to insert the elements into a normal vector.
Less pointer dereferencing than using a heap.
Input:
Output:
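A minimal sketch of such an order-maintaining insert, using searchsortedfirst on a plain vector (the name insertsorted! is mine, not the post's):

```julia
# Insert x into the already-sorted vector v, keeping it sorted.
# searchsortedfirst finds the first index at which x can go.
function insertsorted!(v::Vector, x)
    insert!(v, searchsortedfirst(v, x), x)
    return v
end
```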
Ok so here we go with the main content of this piece.
Here is our plan.
known_primes
recording what we know.
So what can go wrong?
The most important part of getting shared-memory parallelism correct is making sure that at no point is the same piece of memory being both written and read (or written and written).
There is no promise that any operation is actually atomic, except atomic operations, and the setting of locks.
Which brings me to our two tools for dealing with ensuring that memory is not dual operated on.
Atomic operators are a small set of operations available on primitive types.
They run on atomic types.
They might not perform quite the operation you expect (so read the docs).
For example atomic_add!(a::Atomic{T}, b::T)
updates the value of a
, but returns its old value, as type T
.
Julia’s atomics come out of LLVM, more or less directly.
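A small example of that return-the-old-value behaviour:

```julia
# atomic_add! updates the atomic in place but returns the OLD value.
a = Threads.Atomic{Int}(10)
old = Threads.atomic_add!(a, 5)
# old == 10, a[] == 15
```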
Then there are locks.
These are what you use if you want to make a block of code (which might modify non-primitively typed memory) not run at the same time as some other block of code.
Julia has two kinds of locks TatasLock
/SpinLock
, and Mutex
.
We’re going to use the first kind, they are based on atomics.
The second kind (the Mutex
) is based on libuv’s OS-agnostic wrapper of the OS’s locking system.
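A sketch of guarding a non-thread-safe vector mutation with a SpinLock:

```julia
# Only one thread at a time may run the locked region.
l = Threads.SpinLock()
shared = Int[]
Threads.@threads for i in 1:100
    lock(l)
    try
        push!(shared, i)   # Vector mutation is not thread-safe on its own
    finally
        unlock(l)
    end
end
```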
So what do we need to restrict:
next_check
: the integer that keeps track of the next number to check. If we let multiple threads read it at the same time then they will initially keep checking the same numbers as each other. Once they get out of sync bad things will happen. Since it is a primitive type (unless a BigInt or similar is passed as the type) we can use an atomic.
known_primes
: the list of primes we know about. Here are the operations we need to prevent against:
Vector
basically reserves (and uses) the right to move itself in memory whenever an element is added, even if you sizehint!
it. If this occurs in the middle of a getindex
then the value you think you are reading might not be there any more.
The other thing we have going on is that we want to sleep our current thread if we are blocked waiting for a missing prime.
This is done using Condition
, wait
and notify
(docs).
The advantage of sleeping the thread while it is waiting is that if oversubscribed (or you are doing other things on the computer), any threads currently waiting for a turn on the CPU can get it. I’m not oversubscribing here so it doesn’t really matter. If anything it is slowing it down.
Still it is good practice, and makes you a polite multi-threading citizen.
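A minimal wait/notify example between the main task and one helper task (tasks, not threads):

```julia
# The main task sleeps on the condition until the helper notifies it.
cond = Condition()
result = Ref(0)
@async begin
    result[] = 42      # do some work
    notify(cond)       # wake anyone waiting on cond
end
wait(cond)             # yields; the async task runs, then wakes us
```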
Input:
Output:
primes_threaded (generic function with 2 methods)
Input:
Output:
0-element Array{Int64,1}
Input:
Output:
That is right,
this multi-threaded code is much slower than the array code.
Getting performance out of multi-threading is hard.
I can’t teach it.
But I can show you what I am going to try next.
My theory is that there is too much lock contention.
Working in blocks reduces contention, it also results in more cache-friendly code.
Instead of each thread asking for one number to check then checking it,
then asking for another,
each thread asks for a bunch to check at a time.
The obvious contention reduction is with the atomic next_check
.
The less obvious is in the lock for known_primes
which is acquired every time one wants to know how long it is, to test whether it is time to exit the loop.
In the code that follows, while each thread asks for a block of numbers to check at a time, it reports found primes individually. I looked at having each thread collect them up locally in a block and then insert them into the main list all at once. But I found that actually slowed things down. It meant allocating a lot more memory, and (I guess) the longer consecutive time in which known_primes
was locked for the big inserts was problematic.
Delaying checks for longer.
The really big cause of contention, I feel is the time to read known_primes
.
Especially, the earlier elements.
The smaller the prime the more likely it is to be a factor.
So we would like to at least be able to check these early primes without worrying about getting locks.
To do that we need to maintain a separate list of them.
I initially just wanted to have an atomic value keeping track of how far into known_primes
it was safe to read, without having to worry about the elements changing.
Such that everything was in one array; and we knew which were safe to read.
But we can’t do that, because inserting elements can cause the array to reallocate, so requires it to be locked.
So we just use a second array.
Input:
Output:
primes_avoid (generic function with 3 methods)
Input:
Output:
Input:
Output:
So with that we’ve managed to scrape a bit closer, but we are still losing to the single-threaded array.
There is actually a flaw in this algorithm, I think.
Potentially, if your threads are far enough out of sync,
one could be waiting for a prime potential factor,
and the prime factor that arrives next, is not actually the next prime;
and furthermore that early-arriving prime is larger than $\sqrt{x}$, so it terminates the search,
incorrectly reporting $x$ as prime.
If the prime that should have arrived next was smaller than $\sqrt{x}$ and was a prime factor of $x$, then $x$ is in fact not prime.
One solution would be to keep trace of which indices are actually stable.
We know an index is stable if every thread is now working on checking a number that is greater than the prime at that index.
Pretty sure it is super unlikely and never happens,
but a fix for it gives me an idea for how to go faster.
Before I said we were working in blocks but we were still pushing everything into a single array at the end.
We could actually work in Vector of Vectors,
it makes indexing harder but lets us be fine grained with our locks.
So what we are going to do is at the start of each block,
we are going to reserve a slot in our Vector of Vectors of known primes,
as well as what numbers we are going to check.
We need to allocate all the block locations at the start,
because increasing the size of an array is not threadsafe.
A big complication is we don’t know how many blocks we are going to need.
It took me a long time to workout the solution to this.
What we do is when we run out of allocated memory we let the block of code that is running on all threads terminate,
then we allocate more memory and restart it.
This code is pretty complex.
As you can see from all the assert statements it took me a fair bit of debugging to get it right.
It’s still not much (if at all) better than serial.
But I think it well illustrates how you have to turn problems around to eke out speed when trying to parallelize them.
Note in particular how reserved_blocks
is a vector of atomics indexed by threadid()
to keep track of what memory is being held by what thread.
Input:
Output:
primes_blockmore (generic function with 3 methods)
Input:
Output:
Input:
Output:
Input:
Output:
Input:
Output:
So with all that,
we are still losing to the single-threaded code.
Maybe if we were using more threads.
Or if the code was smarter,
we could pull ahead and go faster.
But today, I am willing to admit defeat.
It is really hard to make this kinda code actually speed-up.
If you can do better, I’d be keen to know.
You can get the notebook that is behind this post from
github,
you could even fork it and make a PR and I’ll regenerate this blog post (:-D).
One way to make it much much faster is to use a different algorithm.
I’m sure there actually exist well documented parallel prime finders.
Re-posted from: http://white.ucc.asn.au/2017/11/18/Lazy-Sequences-in-Julia.html
I wanted to talk about using Coroutines for lazy sequences in julia.
Because I am rewriting CorpusLoaders.jl to do so in a nondeprecated way.
This basically corresponds to C# and Python’s yield
return statements.
(Many other languages also have this but I think they are the most well known).
The goal of using lazy sequences is to be able to iterate though something,
without having to load it all into memory.
Since you are only going to be processing it a single element at a time.
Potentially for some kind of moving average, or for acausal language modelling,
a single window of elements at a time.
Point is, at no point do I ever want to load all 20Gb of wikipedia into my program,
nor all 100Gb of Amazon product reviews.
And I especially do not want to load $\infty$ bytes of every prime number.
First some definitions.
An iterable is in essence something that can be processed sequentially.
The iterator is the tool used to do this.
Sometimes the terms are used casually and interchangeably.
The exact implementation varies from language to language a surprising amount,
though the core idea remains the same.
In julia they are defined by (perhaps the most formal of all) informal interfaces.
In short iterators in julia are defined by a start
function which takes an iterable,
and gives back the initial state for an iterator upon it.
a next
function that takes an iterable and state and gives back the next value, and a new state,
and a done
function that takes the same and returns a boolean saying if the iterator is complete.
They also have Holy traits to define what eltype they return and how long they are.
Not going to go into this here, but I will note that specifying the eltype
is a feature channels have that the 0.5 producer-based code does not,
and that 0.6 generators don’t have either right now.
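To show the shape of that protocol without overloading Base, here is a countdown iterable with stand-alone start/next/done-style functions (in 0.6 these would be methods of Base.start, Base.next and Base.done; the cd_* names here are mine):

```julia
struct Countdown
    from::Int
end
cd_start(c::Countdown) = c.from            # initial state
cd_next(c::Countdown, s) = (s, s - 1)      # (value, new state)
cd_done(c::Countdown, s) = s == 0          # finished?

# Drive the protocol by hand, as a for-loop would in 0.6.
function collect_countdown(c::Countdown)
    out = Int[]
    s = cd_start(c)
    while !cd_done(c, s)
        v, s = cd_next(c, s)
        push!(out, v)
    end
    return out
end
```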
Honestly if you are not familiar with iterators the rest of this probably isn’t going to make much sense.
Coroutines are the generalisation of Functions (subroutines),
to have multiple entry and exit points.
In julia they are called Task.
They are most notable for being used in the implementation of @async
etc.
For our purposes,
they have the ability to yield up control to another coroutine;
and when they get control passed back to them, they resume where they left off.
It’s worth noting that they are not threads.
They do not themselves ever cause more than 1 thing to happen at a time on a computer.
Though there are similarities in how they operate.
Coroutines can be used to implement iterators, as we will show below.
I think this is best explained by examples.
So here are 3.
Input:
We will start, just to explain the concepts with the fibonacci sequence.
We are not going to touch the slow recursive definition (recall that julia does not have tail-call optimization).
That is not the point we want to illustrate here.
To begin with an array based methods.
This of course does not actually satisfy our goal of being an infinite sequence.
You must tell it how many to generate in advance.
As we will see later it is absurdly fast compared to any of the lazy methods.
It does however use the full amount of RAM it ever needs all the time.
Where as the lazy methods that follow only need at any point a fixed amount of RAM.
This doesn’t matter for Fibonacci, as it doesn’t use much memory.
It would however matter if you were loading hundreds of gigabytes of machine learning data.
As a side note fibonacci numbers very quickly overflow an Int64
long before we can get to the 10_000th, but that is ok for our demonstration purposes.
Input:
Output:
fib_array (generic function with 1 method)
Input:
Output:
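A plausible reconstruction of fib_array (my sketch; only the name comes from the output above):

```julia
# Eagerly fill an array with the first n Fibonacci numbers.
function fib_array(n)
    fibs = Vector{Int}(undef, n)
    prev, cur = 0, 1
    for i in 1:n
        fibs[i] = cur
        prev, cur = cur, prev + cur
    end
    return fibs
end
```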
Tasks in 0.5 used to have a kind of built in functionality similar to Channel/put!
.
(They still kind of do at a lower level, but it is no longer exposed as an iterable)
This was deprecated in 0.6 at #19841,
and is not in 0.7-dev.
Here for reference is what it would look like.
Input:
Output:
fib_task (generic function with 1 method)
It is actually easier to explain than the newer Channels way so I will explain it first.
In the main task there is (will be) an iterator.
When next
is called on that iterator,
the main task calls consume
, which yields control to a Task which is running the code in the do block.
When that code hits a produce
it yields control back to the main iterator Task, returning the value from produce to the consume
call, which results in next
returning that value.
When next
is called again in the main task, it will again call consume
which will result in the control being yielded back to the generating task, which will continue on from where it left, immediately after the produce
call.
This is how coroutines are used for generating data.
As they can pause midway through a function, return a result, and continue afterwards when asked to.
I’m not displaying its timings here as it basically shoots a giant pile of deprecation warnings.
It is about 5 seconds, though some of that slowness might be attributed to the deprecation warnings slowing things down.
This is what we are really talking about.
From a pure logic standpoint one can think of this as functioning just like the task.
Where put!
triggers a yield to the main iterator task, and take!
(inside next
) triggers a yield back down to the generating task.
What is actually happening is similar, but with a buffer involved.
The function passed to a channel runs its own Task until it tries to do a put!
when the buffer is full.
When that happens it yields control back to the main Task, and “sleeps” the generating task.
Which during iteration will take!
an element off the buffer,
which will rewake the Task that wants to put things on the buffer (it uses @schedule
to make it act).
But it will not necessarily get it to run straight away,
the main (iterator) Task does not yield until the buffer is empty and it can’t do a take!
.
(though something else, like IO, might trigger a yield, but not in this code).
Input:
Output:
fib_ch (generic function with 2 methods)
Input:
Output:
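A hedged reconstruction of a Channel-based fib_ch, written for current Julia (the buffer size of 32 is an arbitrary choice):

```julia
# The do-block runs in its own Task; put! blocks once the
# 32-slot buffer fills, until the consumer takes elements off.
function fib_ch(n)
    Channel{Int}(32) do ch
        prev, cur = 0, 1
        for _ in 1:n
            put!(ch, cur)
            prev, cur = cur, prev + cur
        end
    end
end
```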
It took me a little time to get my head around what the buffer was (because Task producer didn’t use them).
In retrospect it is obvious.
It is a buffer.
The point of the buffer from a practical point of view is that it means that there is less task switching.
Modern CPU speeds work because the CPU is very efficient at running predictable code.
Everything stays in cache, pipelining occurs, maybe even SIMD.
Not to mention doing a task switch itself has an overhead.
So the buffer lets that be avoided, while still avoiding doing all the (potentially infinite) amount of work that would be done in the array case.
The trade-off is in how big to make your buffer.
Too small and you do task switching anyway, losing the advantages.
Too large and you end up calculating more outputs than you will consume,
and use a lot of memory at any point in time.
I initially made the mistake of setting the buffer size to typemax(Int64)
,
which resulted in the code hanging, until I did an interrupt with Ctrl + C.
Which resulted it in completing normally with no errors.
(because it killed the task, midway through filling the huge buffer, well after the points I needed were enqueued).
For interest below is a plot of the timings with different buffer sizes.
Input:
Output:
So channels are pretty great.
Easy to write and nice.
One problem is, as you can see compared to arrays they are very slow.
Especially if you have a buffer that is too small or too large.
They also allocate (and then free) a lot of memory.
Enter ResumableFunctions.jl.
ResumableFunctions lets you keep the same “coroutines pausing after every output” logical model.
But in implementation, it is actually using macros to rewrite your code into a normal iterator,
with a state machine (in the state) tracking where it is upto.
The result of this is (normally) faster, more memory efficient code.
It uses @yield
instead of put!
(or produce
),
and the whole function has to be wrapped in a @resumable
macro, which does the rewrite.
Input:
Output:
fib_rf (generic function with 1 method)
Input:
Output:
For interest this is what the code it is generating looks like:
Input:
Output:
quote
begin
mutable struct ##835 <: ResumableFunctions.FiniteStateMachineIterator
_state::UInt8
prev::Int64
cur::Int64
function ##835(; )::##835
fsmi = new()
fsmi._state = 0x00
fsmi
end
end
end
function (_fsmi::##835)(_arg::Any=nothing; )::Any
_fsmi._state == 0x00 && $(Expr(:symbolicgoto, Symbol("#367#_STATE_0")))
_fsmi._state == 0x01 && $(Expr(:symbolicgoto, Symbol("#366#_STATE_1")))
error("@resumable function has stopped!")
$(Expr(:symboliclabel, Symbol("#367#_STATE_0")))
_fsmi._state = 0xff
_arg isa Exception && throw(_arg)
_fsmi.prev = 0
_fsmi.cur = 1
while true
_fsmi._state = 0x01
return _fsmi.cur
$(Expr(:symboliclabel, Symbol("#366#_STATE_1")))
_fsmi._state = 0xff
_arg isa Exception && throw(_arg)
(_fsmi.prev, _fsmi.cur) = (_fsmi.cur, _fsmi.prev + _fsmi.cur)
end
end
function fib_rf(; )::##835
##835()
end
end
It’s pretty neat.
those $(Expr(:symbolicgoto...
are julia’s good old-fashioned @label
and @goto
macros (just expanded because macroexpand).
See there is no actual Task switching occurring in the expanded code.
One difference compared to using Channels is that the final element returned from a ResumableFunction is the return value of the resumable function itself.
In the case of infinite series (like fib
) this doesn’t matter, since there is no last value,
but for finite iterables it does,
and it can make writing the resumable function difficult.
(I’m not sure if it would be possible to modify @resumable
in order to not do this.
I think it would be nontrivial.)
See ResumableFunctions.jl/#2.
Perhaps the most well known way of creating lazy sequences in julia are generator expressions.
(foo(x) for x in bar)
.
They can also be written in do block functional form as shown below.
Input:
Output:
fib_gen (generic function with 1 method)
Input:
Output:
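Fibonacci needs two pieces of state, but the laziness of generator expressions is easiest to see with a one-state infinite sequence (my illustration, not the post's fib_gen):

```julia
# Lazy: no squares are computed until we take from the generator.
squares = (x^2 for x in Iterators.countfrom(1))
first_four = collect(Iterators.take(squares, 4))  # [1, 4, 9, 16]
```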
Generators are not as flexible as the coroutine techniques above,
because they can only have a single return statement.
They are fine this time, because that is all you need for Fibonacci.
They will not work for our next example though.
(You can do it, but it is complicated and I think kinda slow.)
Interleave is an iterator operation.
Interleave takes a set of iterables,
and outputs one element from each in turn.
It is basically the transpose of IterTools.chain
.
It is the basis of the logic programming language Microkanren.
It allows the language to have infinitely many pending parallel checks, but to terminate if any one branch finds the answer.
We are not going to implement microkanren today, though some people have made julia implementations before.
MuKanren.jl, LilKanren.jl (I’ve not tried these to see if they currently work. Today is not a day I need a logic programming language.).
Input:
Output:
interleave_ch (generic function with 1 method)
Input:
Output:
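A hedged sketch of a Channel-based interleave (my reconstruction, written for current Julia):

```julia
# Round-robin over the given iterables, dropping ones that are exhausted.
function interleave_ch(iters...)
    Channel{Any}(0) do ch
        states = Any[iterate(it) for it in iters]
        while any(st -> st !== nothing, states)
            for i in eachindex(states)
                st = states[i]
                st === nothing && continue   # this iterable is done
                val, s = st
                put!(ch, val)
                states[i] = iterate(iters[i], s)
            end
        end
    end
end
```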
The resumable function version of this is broken at time of the writing this blog post.
See ResumableFunctions.jl/#11.
This example is not so good because even the iterator version has to keep all past outputs in memory.
However, I’ll show it off here,
because it’s a kinda neat algorithm for finding all primes.
(The array version is also going to be faster, even faster in this case, because one can use the Prime Number Theorem to estimate how many primes are less than the number to check up to, and sizehint!
the array to be returned.)
If a number is prime, then no prime (except the number itself), will divide it.
Since if it has a divisor that is non-prime, then that divisor itself, will have a prime divisor that will divide the whole.
So we only need to check primes as candidate divisors.
Further: one does not need to check divisibility by all prior primes in order to check if a number $x$ is prime.
One only needs to check divisibility by the primes less than or equal to $\sqrt{x}$, since if $x=a \times b$, for some $a>\sqrt{x}$ that would imply that $b<\sqrt{x}$, and so its composite nature would have been found when $b$ was checked as a divisor.
Input:
Output:
primes_ch (generic function with 1 method)
Input:
Output:
Input:
Output:
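A hedged reconstruction of what a Channel-based primes_ch could look like in current Julia (Iterators.takewhile needs Julia 1.4 or later; the buffer size is arbitrary):

```julia
# Lazily produce the first n primes, checking divisibility only by
# already-found primes p with p^2 ≤ x.
function primes_ch(n)
    Channel{Int}(32) do ch
        known = Int[]
        x = 2
        while length(known) < n
            if all(p -> x % p != 0,
                   Iterators.takewhile(p -> p * p <= x, known))
                push!(known, x)
                put!(ch, x)
            end
            x += 1
        end
    end
end
```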
Input:
Output:
primes_rf (generic function with 1 method)
Input:
Output:
Interestingly, the channel version is 2x as fast as the ResumableFunctions version.
I’m not sure why that is.
Could be cache related.
These coroutine based sequence generators are pretty great.
They are much more flexible than generator expressions.
They are much less annoying to write than custom iterators.
They let you do things lazily to avoid using all your RAM.
They’ll probably get faster in future versions of julia.
ResumableFunctions.jl is a neat package to keep an eye on.
Re-posted from: http://juliacomputing.com/press/2017/11/17/jc-wins-risktech-award.html
New York, NY – Julia Computing was selected by Chartis Research as a RiskTech Rising Star for 2018.
The RiskTech100® Rankings are acknowledged globally as the most comprehensive and independent study of the world’s major players in risk and compliance technology. Based on nine months of detailed analysis by Chartis Research, the RiskTech100® Rankings assess the market effectiveness and performance of firms in this rapidly evolving space.
Rob Stubbs, Chartis Research Head of Research, explains, “We interviewed thousands of risk technology buyers, vendors, consultants and systems integrators to identify the leading RiskTech firms for 2018. We know that risk analysis, risk management and regulatory requirements are increasingly complex and require solutions that demand speed, performance and ease of use. Julia Computing has been developing next-generation solutions to meet many of these requirements.”
For example, Aviva, Britain’s second-largest insurer, selected Julia to achieve compliance with the European Union’s new Solvency II requirements. According to Tim Thornham, Aviva’s Director of Financial Modeling Solutions, “Solvency II compliant models in Julia are 1,000x faster than our legacy system, use 93% fewer lines of code and took 1/10 the time to implement.” Furthermore, the server cluster size required to run Aviva’s risk model simulations fell 95% from 100 servers to 5 servers, and simpler code not only saves programming, testing and execution time and reduces mistakes, but also increases code transparency and readability for regulators, updates, maintenance, analysis and error checking.
About Julia and Julia Computing
Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.
Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:
To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.
Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.
About Chartis Research
Chartis Research is the leading provider of research and analysis on the global market for risk technology. It is part of Infopro Digital, which owns market-leading brands such as Risk and WatersTechnology. Chartis’ goal is to support enterprises as they drive business performance through improved risk management, corporate governance and compliance, and to help clients make informed technology and business decisions by providing in-depth analysis and actionable advice on virtually all aspects of risk technology.
Online algorithms accept input one observation at a time. Consider a mean of n
data points: $\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i$.
By adding a single observation, the mean could be recalculated from scratch (offline): $\mu_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n+1} x_i$
Or we could use only the current estimate and the new observation (online): $\mu_{n+1} = \mu_n + \frac{1}{n+1}\left(x_{n+1} - \mu_n\right)$
A big advantage of online algorithms is that data does not need to be revisited when new observations are added. It is therefore not necessary for the dataset to be fixed in size or small enough to fit in computer memory. The disadvantage is that not everything can be calculated exactly like the mean above. Whenever exact solutions are impossible, OnlineStats relies on state of the art stochastic approximation algorithms.
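The online mean update can be sketched in a few lines (a toy illustration; RunningMean and fit_one! are hypothetical names, not the OnlineStats API):

```julia
# μ ← μ + (x - μ)/n : update the running mean from one new observation.
mutable struct RunningMean
    n::Int
    μ::Float64
end
RunningMean() = RunningMean(0, 0.0)

function fit_one!(m::RunningMean, x::Real)
    m.n += 1
    m.μ += (x - m.μ) / m.n
    return m
end
```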
The statistics/models of OnlineStats are subtypes of OnlineStat
:
using OnlineStats, Plots
# Each OnlineStat is a type
o = IHistogram(100)
o2 = Sum()
# OnlineStats are grouped together in a Series
s = Series(o, o2)
# Updating the Series updates the grouped OnlineStats
y = randexp(100_000)
# fit!(s, y) translates to:
for yi in y
fit!(s, yi)
end
plot(o)
A Series groups together any number of OnlineStats which share a common input. The input (single observation) of an OnlineStat can be a scalar (e.g. Variance
), a vector (e.g. CovMatrix
), or a vector/scalar pair (e.g. LinReg
).
The Series constructor optionally accepts data to fit!
right away.
julia> Series(randn(100), Mean(), Variance())
▦ Series{0} with EqualWeight
├── nobs = 100
├── Mean(0.0899071)
└── Variance(0.952008)
MV
type can turn a scalar-input OnlineStat into a vector-input version.
julia> Series(randn(100, 2), CovMatrix(2), MV(2, Mean()))
▦ Series{1} with EqualWeight
├── nobs = 100
├── CovMatrix([0.916472 0.089655; 0.089655 0.984442])
└── MV{Mean}(0.17287277199330608, -0.12199728546589127)
julia> Series((randn(100, 3), randn(100)), LinReg(3))
▦ Series{(1, 0)} with EqualWeight
├── nobs = 100
└── LinReg: β(0.0) = [-0.0486756 -0.0437766 -0.160813]
value
returns the stat’s value
julia> o = Mean()
Mean(0.0)
julia> value(o)
0.0
value
on a Series maps value
to the stats
julia> s = Series(Mean(), Variance())
▦ Series{0} with EqualWeight
├── nobs = 0
├── Mean(0.0)
└── Variance(-0.0)
julia> value(s)
(0.0, -0.0)
stats
returns a tuple of stats
julia> m, v = stats(s)
(Mean(0.0), Variance(-0.0))
At first glance, it appears necessary that a Series must be fit!-ed
serially, but OnlineStats
provides merge
/merge!
methods for combining two Series into one. This is how
JuliaDB is able to use OnlineStats in a
distributed fashion. Below is a simple (not actually parallel) example of merging.
s1 = Series(Mean(), Variance())
s2 = Series(Mean(), Variance())
s3 = Series(Mean(), Variance())
fit!(s1, randn(1000))
fit!(s2, randn(1000))
fit!(s3, randn(1000))
merge!(s1, s2)
merge!(s1, s3)
This is a small sample of OnlineStats functionality. For more information, stay tuned for future posts or check out the OnlineStats Github repo and documentation.
Re-posted from: http://juliacomputing.com/blog/2017/11/16/onlinestats.html
OnlineStats is a package for computing statistics and models via online algorithms. It is designed for taking on big data and can naturally handle out-of-core processing, parallel/distributed computing, and streaming data. JuliaDB fully integrates OnlineStats for providing analytics on large persistent datasets. While future posts will dive into this integration, this post serves as a light introduction to OnlineStats.
Online algorithms accept input one observation at a time. Consider the mean of n data points:

x̄ₙ = (x₁ + x₂ + ⋯ + xₙ) / n

By adding a single observation, the mean could be recalculated from scratch (offline):

x̄ₙ₊₁ = (x₁ + x₂ + ⋯ + xₙ₊₁) / (n + 1)

Or we could use only the current estimate and the new observation (online):

x̄ₙ₊₁ = x̄ₙ + (xₙ₊₁ − x̄ₙ) / (n + 1)
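To make the online update concrete, here is a minimal sketch in plain Julia (the helper names update_mean and online_mean are made up for this illustration; they are not part of OnlineStats):

```julia
# Online mean update: given the current estimate μ over n observations and a
# new observation x, return the updated mean without revisiting earlier data.
update_mean(μ, n, x) = μ + (x - μ) / (n + 1)

# Folding the update over a vector reproduces the offline mean.
function online_mean(xs)
    μ, n = 0.0, 0
    for x in xs
        μ = update_mean(μ, n, x)
        n += 1
    end
    return μ
end
```

For example, online_mean([1.0, 2.0, 3.0]) returns 2.0, the same value as sum(xs) / length(xs), but each observation is touched exactly once and then can be discarded.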
A big advantage of online algorithms is that data does not need to be revisited when new observations are added. It is therefore not necessary for the dataset to be fixed in size or small enough to fit in computer memory. The disadvantage is that not everything can be calculated exactly like the mean above. Whenever exact solutions are impossible, OnlineStats relies on state-of-the-art stochastic approximation algorithms.
The statistics/models of OnlineStats are subtypes of OnlineStat:
using OnlineStats, Plots
# Each OnlineStat is a type
o = IHistogram(100)
o2 = Sum()
# OnlineStats are grouped together in a Series
s = Series(o, o2)
# Updating the Series updates the grouped OnlineStats
y = randexp(100_000)
# fit!(s, y) translates to:
for yi in y
    fit!(s, yi)
end
plot(o)
A Series groups together any number of OnlineStats which share a common input. The input (single observation) of an OnlineStat can be a scalar (e.g. Variance), a vector (e.g. CovMatrix), or a vector/scalar pair (e.g. LinReg).
The Series constructor optionally accepts data to fit! right away.
julia> Series(randn(100), Mean(), Variance())
▦ Series{0} with EqualWeight
├── nobs = 100
├── Mean(0.0899071)
└── Variance(0.952008)
The MV type can turn a scalar-input OnlineStat into a vector-input version.
julia> Series(randn(100, 2), CovMatrix(2), MV(2, Mean()))
▦ Series{1} with EqualWeight
├── nobs = 100
├── CovMatrix([0.916472 0.089655; 0.089655 0.984442])
└── MV{Mean}(0.17287277199330608, -0.12199728546589127)
julia> Series((randn(100, 3), randn(100)), LinReg(3))
▦ Series{(1, 0)} with EqualWeight
├── nobs = 100
└── LinReg: β(0.0) = [-0.0486756 -0.0437766 -0.160813]
value returns the stat’s value:
julia> o = Mean()
Mean(0.0)
julia> value(o)
0.0
value on a Series maps value to the stats:
julia> s = Series(Mean(), Variance())
▦ Series{0} with EqualWeight
├── nobs = 0
├── Mean(0.0)
└── Variance(-0.0)
julia> value(s)
(0.0, -0.0)
stats returns a tuple of stats:
julia> m, v = stats(s)
(Mean(0.0), Variance(-0.0))
At first glance, it appears necessary that a Series must be fit!-ted serially, but OnlineStats provides merge/merge! methods for combining two Series into one. This is how JuliaDB is able to use OnlineStats in a distributed fashion. Below is a simple (not actually parallel) example of merging.
s1 = Series(Mean(), Variance())
s2 = Series(Mean(), Variance())
s3 = Series(Mean(), Variance())
fit!(s1, randn(1000))
fit!(s2, randn(1000))
fit!(s3, randn(1000))
merge!(s1, s2)
merge!(s1, s3)
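Merging works because each statistic knows how to combine two partial results; for a mean, this is just a count-weighted average. Below is a hedged sketch of the idea in plain Julia (the PartialMean type and merge_means function are illustrative inventions, not the actual OnlineStats internals):

```julia
# Illustrative sketch of how merging running means works (not OnlineStats internals).
struct PartialMean
    n::Int        # number of observations seen
    μ::Float64    # current mean estimate
end

# Fit a partial mean over one chunk of data.
PartialMean(xs::AbstractVector) = PartialMean(length(xs), sum(xs) / length(xs))

# Combine two partial results: a weighted average by observation count.
merge_means(a::PartialMean, b::PartialMean) =
    PartialMean(a.n + b.n, (a.n * a.μ + b.n * b.μ) / (a.n + b.n))
```

Merging PartialMean([1.0, 2.0]) with PartialMean([3.0, 4.0, 5.0]) yields the same mean (3.0) as fitting all five observations in one pass, which is what lets partial results be computed on separate workers and combined at the end.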
This is a small sample of OnlineStats functionality. For more information, stay tuned for future posts or check out the OnlineStats Github repo and documentation.
Re-posted from: http://juliacomputing.com/blog/2017/11/16/julia-featured-in-insidehpc.html
Julia and Julia Computing are featured in a new insideHPC white paper titled “AI-HPC Is Happening Now.”
insideHPC is a leading blog in the high-performance computing (HPC) community.
The article notes that “Julia … recently delivered a peak performance of 1.54 petaflops using 1.3 million threads on 9,300 Intel Xeon Phi processor nodes of the Cori supercomputer at NERSC. The Celeste project utilized a code written entirely in Julia that processed approximately 178 terabytes of celestial image data and produced estimates for 188 million stars and galaxies in 14.6 minutes.”
Julia Computing CTO (Tools) Keno Fischer explains, “We used Julia on the world’s sixth most powerful supercomputer to achieve a performance improvement of 1,000x over unoptimized single core execution. We have demonstrated that Julia scales effectively and efficiently from a single laptop or desktop to dozens or hundreds of nodes in the cloud and multithreaded parallel supercomputing at petascale. Julia has been downloaded more than 1.2 million times, an annual increase of +161%. Julia is also helping quantitative finance analysts on Wall Street and rocket scientists at NASA’s Jet Propulsion Laboratory achieve faster computing speeds with higher productivity.”
About Julia and Julia Computing
Julia is the fastest modern high performance open source computing language for data, analytics, algorithmic trading, machine learning and artificial intelligence. Julia combines the functionality and ease of use of Python, R, Matlab, SAS and Stata with the speed of C++ and Java. Julia delivers dramatic improvements in simplicity, speed, capacity and productivity. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub and adoption is growing rapidly in finance, insurance, energy, robotics, genomics, aerospace and many other fields.
Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, Grindr, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC and Uber.
Julia is lightning fast. Julia is being used in production today and has generated speed improvements up to 1,000x for insurance model estimation and for astronomical image analysis on parallel supercomputers.
Julia provides unlimited scalability. Julia applications can be deployed on large clusters with a click of a button and can run parallel and distributed computing quickly and easily on tens of thousands of nodes.
Julia is easy to learn. Julia’s flexible syntax is familiar and comfortable for users of Python, R and Matlab.
Julia integrates well with existing code and platforms. Users of C, C++, Python, R and other languages can easily integrate their existing code into Julia.
Elegant code. Julia was built from the ground up for mathematical, scientific and statistical computing. It has advanced libraries that make programming simple and fast and dramatically reduce the number of lines of code required – in some cases, by 90% or more.
Julia solves the two language problem. Because Julia combines the ease of use and familiar syntax of Python, R and Matlab with the speed of C, C++ or Java, programmers no longer need to estimate models in one language and reproduce them in a faster production language. This saves time and reduces error and cost.
Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.
Re-posted from: https://giordano.github.io/blog/2017-11-12-analemma/
You may know that if you check the position of the Sun every day in the same
place at the same time (accounting for daylight saving time if necessary),
you’ll find that it slightly moves. This is a combination of the tilt of the
Earth’s axis and the Earth’s orbital eccentricity. The path traced out by the
position in the sky of the Sun during its wandering is
called an analemma.
Afternoon analemma taken in 1998–99 by Jack Fishburn in Murray Hill, New
Jersey, USA. Image
credit:
Jfishburn, Wikimedia Commons, GFDL 1.2+ and CC-BY-SA 3.0.
We can use Julia to plot the analemma. In particular,
we’ll employ AstroLib.jl to do the
needed calculations. Throughout this post I’ll assume you have installed
the latest stable version of Julia and the
necessary packages with
the
built-in package manager.
What we want to do is to determine the position of the Sun at a specific time every day in a year, say at noon for the whole of 2018. This is the recipe:
1. compute the Julian dates of the chosen times;
2. compute the Sun’s position in equatorial coordinates for those dates;
3. convert the equatorial coordinates to horizontal coordinates (altitude and azimuth) for the observer’s location.
The trickiest part is to get the right Julian dates. The jdcnv function in AstroLib.jl assumes that times are given in UTC standard, but Heidelberg is one hour ahead of Greenwich. In order to work around this issue we can use the TimeZones.zdt2julian function provided by the TimeZones.jl package, which takes care of the time zones. In addition, Germany adopts daylight saving time from March to October, thus noon on May 15th is not actually the same time of day as noon on November 7th. However, noon on January 1st is the same time of day as noon on December 31st, so we can create a range between these two times with step one (Julian) day.
using AstroLib, TimeZones
function analemma(start_date, end_date,
                  latitude, longitude, elevation)
    julian_dates = TimeZones.zdt2julian(start_date):TimeZones.zdt2julian(end_date)
    right_ascension, declination = sunpos(julian_dates)
    altaz = eq2hor.(right_ascension, declination,
                    julian_dates, latitude, longitude, elevation)
    altitude = getindex.(altaz, 1)
    azimuth  = getindex.(altaz, 2)
    return azimuth, altitude
end
We have used sunpos to get the position of the Sun in equatorial coordinates and converted them with eq2hor to horizontal coordinates, specifying the coordinates of Heidelberg. The broadcast version of this function returns an array of 2-tuples, the first element being the altitude of the Sun and the second element its azimuth. We’ve used getindex.(altaz, i) to obtain the arrays with the i-th elements of the tuples. Now we can draw the analemma. I recommend using the Plots.jl package, which provides a single interface to several different back-ends (GR, PyPlot, PGFPlots, etc.).
using Plots, Base.Dates

azimuth, altitude =
    analemma(ZonedDateTime(2018, 1, 1, 12, tz"Europe/Berlin"),
             ZonedDateTime(2018, 12, 31, 12, tz"Europe/Berlin"),
             ten(49, 25), ten(8, 43), 114)
scatter(azimuth, altitude, aspect_ratio = :equal,
        xlabel = "Azimuth (°)", ylabel = "Altitude (°)")
You can check with the JPL HORIZONS System that this is accurate to within a few arcminutes.
Re-posted from: http://juliacomputing.com/blog/2017/11/08/november-newsletter.html
We wanted to thank all Julia users and well wishers for the continued use of and support for Julia, and share some of the latest developments from Julia Computing and the Julia community.
Julia Computing Selected for RiskTech100 2018 Rising Star Award: Julia Computing was honored to be selected for the RiskTech100 2018 Rising Star Award.
Julia Computing CEO Viral Shah (center) accepts RiskTech100 2018 Rising Star Award from Chartis Research Head of Research Rob Stubbs (left) and Neuberger Berman Senior Portfolio Manager Steve Eisman (right)
NVIDIA: High Performance GPU Computing in the Julia Programming Language by Julia developer Tim Besard – Oct 25, 2017
The chart below compares CUDAnative.jl performance with CUDA C++ for 10 benchmarks. CUDAnative.jl provides a 30%+ performance improvement compared with CUDA C++ for the nn benchmark and is comparable (+/- 7%) for the other nine benchmarks tested.
Path BioAnalytics and Julia Computing Research Collaboration: Path BioAnalytics and Julia Computing entered into a research collaboration to advance precision medicine and drug development for cystic fibrosis.
Upcoming Events Featuring Julia: Do you know of any upcoming conferences, meetups, trainings, hackathons, talks, presentations or workshops involving Julia? Would you like to organize a Julia event on your own, or in partnership with your company, university or other organization? Let us help you spread the word and support your event by sending us an email with details. Here are a few upcoming events:
Recent Events Featuring Julia: Do you want to share photos, videos or details of your most recent conference, meetup, training, hackathon, talk, presentation or workshop involving Julia? Please send us an email with details and links.
Recent highlights include:
Grace Hopper Celebration of Women in Computing. Jane Herriman, Director of Diversity and Outreach, represented Julia Computing at the Grace Hopper Celebration of Women in Computing in Orlando, FL October 4-6. Details are available here.
Alan Turing Institute. Julia Computing’s Mike Innes and UCL’s Pontus Stenetorp presented “Best Practice from Julia: Impact through Efficient Research Code” at the British Library on October 24. Details and a link to the video are available here.
Julia Computing Presents Celeste at the US Library of Congress. The Planetary Society invited Julia Computing to present the Celeste project at the US Library of Congress in Washington DC on October 25. Details are available here.
Other recent Julia events include:
Recent Blog Posts in the Julia Community:
Writing Extendable and Hardware Agnostic GPU Libraries by Simon Danisch – Nov 6, 2017
“I hope this blog post illustrates how nice it can be to write GPU code using Julia!”
Drawing 2.7 Billion Points in 10 Seconds by Simon Danisch – Oct 31, 2017
“Since I’ve been very happy at how quickly I was able to create a very fast solution [using Julia], I decided to share my experience!”
DifferentialEquations.jl 3.0 and a Roadmap for 4.0 by Christopher Rackauckas – Oct 30, 2017
“The 30 people who make up the JuliaDiffEq team have really built a software which has the methods to solve most differential equations that users encounter and also do so efficiently.”
Defining Custom Units in Julia and Python by Erik Engheim – Oct 29, 2017
“This is an interesting case for Julia because it shows quite clearly the advantages of using a language supporting multiple dispatch in comparison to a more traditional object-oriented language such as Python, which relies on single dispatch.”
I Like Julia Because It Scales and Is Productive – Insights from a Julia Developer by Christopher Rackauckas – Oct 13, 2017
“Julia is not only a fast language, but what makes it unique is how predictable the performance and the compilation process is.”
Non-Linear Regression in Julia by Julio Cardenas-Rodriguez – June 14, 2017
“[F]or a simple processing task of calculating a T1 map of a lemon, Julia is 10 times faster than Python and ~635 times faster than Matlab.”
Contact Us: Please contact us if you wish to:
About Julia and Julia Computing
Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.
Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:
To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.
Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.
Re-posted from: http://juliacomputing.com/blog/2017/11/03/celeste-at-library-of-congress.html
Julia Computing was invited by The Planetary Society to represent the Celeste team at a space research reception for members of Congress, their staff, scientists and space, science and technology policymakers at the Library of Congress in Washington DC on October 25.
Participants included Julia Computing, Boeing, Lockheed Martin, Johns Hopkins Applied Physics Lab, Aerospace Industries Association, The Planetary Society, several members of Congress and more than 100 Congressional staff, current and former NASA employees and contributors, scientists, policymakers and advocates.
Photos by Tushar Dayal for The Planetary Society
“The Celeste project wouldn’t have happened without government support,” affirmed Julia Computing’s Keno Fischer. Two of the Celeste partners – Lawrence Berkeley National Laboratory and the National Energy Research Scientific Computing Center (NERSC) – are federally funded institutions. They contributed thousands of hours of staff time plus the world’s sixth most powerful supercomputer to help the Celeste team achieve the following milestones:
Analyzed 178 terabytes of astronomical image data from the Sloan Digital Sky Survey and catalogued 188 million stars and galaxies in 14.6 minutes
Achieved peak performance of 1.54 petaflops using 1.3 million threads on 9,300 Knights Landing (KNL) nodes on the world’s sixth most powerful supercomputer
Achieved a performance improvement of 1,000x in single threaded execution
When the Large Synoptic Survey Telescope (LSST) begins operation in 2019, it will produce more visual data every few days than the Sloan Digital Sky Survey’s Apache Point telescope has produced in 20 years. Using Julia and the Cori supercomputer, the Celeste team will be able to analyze and catalog every object in the LSST nightly images in just 5 minutes.