### Christina Lee

So far in this blog, all my plots have used either PyPlot or the experimental GLVisualize. BUT, NO LONGER!

While writing up my research notes, I’ve been trying to improve my PyPlot graphs. I was hoping for them to look nicer; Julia’s PyPlot was saying no. We had a falling out over background colors and resolution. So, I have decided to move on to a plotting partner more willing to cope with my demands.

When I first started learning Julia, Gadfly seemed too unfamiliar and its controls too tricky, so I didn’t give it much further thought. After some digging, Gadfly has changed my mind, and I hope today I can change yours too. Right now, like Julia, Gadfly doesn’t have the maturity or documentation of PyPlot, but once it does, you better watch out.

Gadfly preferentially produces plots in SVG (scalable vector graphics) form, instead of the rasterized form of PNG or JPG favored by PyPlot. As displayed in the figure at left, rasterized images are composed of colored squares, so they don’t look the best when highly magnified. SVG, which is based on XML, stores the positions of vector elements and scales to high resolution without artifacts. Tools such as Inkscape also work in SVG, and can edit the individual pieces of a graph since they haven’t been rasterized together.
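To make this concrete, here is a minimal hand-written SVG file (a hypothetical example for illustration, not actual Gadfly output): the file is just XML text describing shapes, which any renderer can rasterize at whatever resolution it needs.

```julia
# A hypothetical minimal example: an SVG file is just XML text describing shapes.
# (Hand-written for illustration, not Gadfly output.)
svg = """
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="steelblue"/>
</svg>
"""
open("circle.svg", "w") do io
    write(io, svg)
end
```

Zooming in on this circle re-renders the curve from its mathematical description, so there is no pixelation, and an editor like Inkscape can grab the `<circle>` element individually.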

## Let’s plot some graphs!

By Yug, modifications by 3247 (Unknown) [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons

```julia
# Time to load our packages
#Pkg.update()

using Gadfly, Compose

# Some test data
x=collect(1:.4:10);
y1=sin(x);
y2=cos(x);

# A simple plot
plot(x=x,y=y1)
```


Unlike PyPlot, we can give plot functions as well as points. After receiving a function, a minimum value, and a maximum value, Gadfly figures everything else out for itself.

For the function, we can pass in either an inbuilt function, a user defined function, or an anonymous function.

The brackets [ ] allow us to group multiple functions together in the plot.

```julia
# plotting functions

function testfunction(x::Real)
    return 1-x^2
end

plot([cos,testfunction,x->x^2],0,1)
```


But that’s just one set of points…
How do we plot more than one set of data? That’s where layers come in. Instead of using plot, we use layer, and set it as a variable. We can then plot those layers.

```julia
p1=layer(x=x,y=y1,Geom.point)
p2=layer(x=x,y=y2,Geom.point)
plot(p1,p2)
```


## Different styles of plotting: Geom

Now, what if we want to plot something other than lines?
Gadfly allows a wide variety of other ways to plot our data. We control this with the Geom (Geometry) object.

```julia
plot(x=x,y=y1,Geom.point,Geom.line)

plot(x=x,y=y1,ymin=y1-.1,ymax=y1+.1,Geom.point,Geom.errorbar)

plot(x=x,y=y1,ymin=y1-.1,ymax=y1+.1,Geom.line,Geom.ribbon)

plot(x=x,y=y1,Geom.bar)

plot(x=x,y=y1,Geom.step)

plot(x=y1,y=y2,Geom.path())
```


## Edit Point and Line style

Disclaimer: I’ve learned how to do the next two things by reading the code, not the documentation.

It seems like they are newer and less developed features than everything else I’m discussing here. The syntax seems less polished than in other areas, so I believe it’s still under construction.

```julia
z=collect(0:.2:2)

xp=[z;z+.5;z+1;z+1.5;z+2;z+2.5;z+3;z+3.5]
yp=[z;z;z;z;z;z;z;z]
sh=[ones(z);2*ones(z);3*ones(z);4*ones(z);5*ones(z);6*ones(z);7*ones(z);8*ones(z)]

plot(x=xp,y=yp,shape=sh,Geom.point,Theme(default_point_size=5pt))

# or Compose.mm for smaller sizes
# These ratios and numbers can be changed around however you like
dash = 6 * Compose.cm
dot = 1.5 * Compose.cm
gap = 1 * Compose.cm

l1=layer(x=x,y=zeros(x),Geom.line,Theme(line_style=[dot]))
l2=layer(x=x,y=ones(x),Geom.line,Theme(line_style=[dash]))
l3=layer(x=x,y=2*ones(x),Geom.line,Theme(line_style=[gap]))
l4=layer(x=x,y=3*ones(x),Geom.line,Theme(line_style=[dot,dash,dot]))
l5=layer(x=x,y=4*ones(x),Geom.line,Theme(line_style=[gap,dash]))

plot(l1,l2,l3,l4,l5,Coord.Cartesian(ymin=-.5,ymax=4.5),Theme(grid_line_width=0pt))
```


## Guide: Labeling Axes

Where Geom alters how we plot, Guide alters the labeling.
Guide ties in with Compose.jl through the Guide.annotate command, but that will take a tutorial in itself.

```julia
plot(x=x,y=y2,color=y2,
     Guide.xlabel("xlabel"),Guide.ylabel("ylabel"),Guide.title("Something explanatory"),
     Guide.colorkey("y2"))
```


Here’s something we can do with a combination of Guide and Scale. Using Guide, we can set where we specifically want our xticks to be, namely at multiples of $\pi$. But then, the labeling would write out some irrational number, making the plot look horrible. So, we create a function that takes in a number and outputs a string label for Scale to use.

```julia
function labelname(x::Real)
    n=round(Int,x/π) # nearest integer*pi
    if n==0
        return "0"
    elseif n==1
        return "π"
    end
    return("$n π")
end

xmarks=[0,π,2π,3π]
ymarks=[-1,0,1]

plot(x=x,y=y2,
     Guide.yticks(ticks=ymarks),
     Guide.xticks(ticks=xmarks),
     Scale.x_continuous(labels=labelname))
```

Some other cool things we can do with Scale:

* Automatically transform the axis according to log, log10, log2, asinh, or sqrt.
* Write numbers in :plain, :scientific, :engineering, or :auto notation.

```julia
plot(x->exp(x),0,10,Scale.y_log,Scale.x_continuous(format=:scientific))
```

# Themes

I mostly chose Gadfly because of the control I could have with the Theme command. http://dcjones.github.io/Gadfly.jl/themes.html has a much more exhaustive list than what I will be demonstrating here.

```julia
# Solarized Colors that I like working with
# http://ethanschoonover.com/solarized
using Colors
base03=parse(Colorant,"#002b36");
base02=parse(Colorant,"#073642");
base01=parse(Colorant,"#586e75");
base00=parse(Colorant,"#657b83");
base0=parse(Colorant,"#839496");
base1=parse(Colorant,"#839496");
base2=parse(Colorant,"#eee8d5");
base3=parse(Colorant,"#fdf6e3");

yellow=parse(Colorant,"#b58900");
orange=parse(Colorant,"#cb4b16");
red=parse(Colorant,"#dc322f");
magenta=parse(Colorant,"#d33682");
violet=parse(Colorant,"#6c71c4");
blue=parse(Colorant,"#268bd2");
cyan=parse(Colorant,"#3aa198");
green=parse(Colorant,"#859900");
```

```julia
plot(x=x,y=y1,
     Theme(highlight_width=0pt,        # lose the white ring around the points
           default_point_size=3pt,     # size of the dots
           default_color=magenta,      # color of the dots
           background_color=base03,    # background color
           grid_color=base2,           # the grid lines
           minor_label_color=base2,    # color of numbers on axes
           key_label_color=base2,      # color of color key numbers
           key_title_color=cyan,       # color of the color key label
           major_label_color=cyan,     # color of axes labels and title
           major_label_font_size=20pt, # font size for axes labels and title
           minor_label_font_size=15pt, # font size for numbers on axes
           panel_opacity=1))           # setting background to opaque
```

## Coord: Setting the boundaries

The documentation states that you can change the range of the axes using Scale, but I’ve found it better to use this format to set my min and max values:

```julia
plot(x=x,y=y1,Geom.line,Coord.Cartesian(ymin=0,ymax=1,xmin=0,xmax=10))
```

So far, we have covered Guide and Themes separately, and Coord and Scale partially. Individually, each aspect doesn’t add too much clunkiness to the code, but if we start to add everything together, the plot function looks quite nasty. Luckily, just like layers for data points, we can put our modifiers into variables, and then simply call the variables in our plot function. This also helps when we want to use one theme for a series of graphs: we can define the theme variable up top and include it in every graph thereafter, never having to declare it again. This helps me keep my graphs uniform in style.

```julia
function labelname(x::Real)
    n=round(Int,x/π)
    if n==0
        return "0"
    elseif n==1
        return "π"
    end
    return("$n π")
end

xticks=[0,π,2π,3π]
yticks=[-1,0,1]

data=layer(x=x,y=y1,ymin=y1-.1,ymax=y1+.1,Geom.line,Geom.ribbon)
f=layer(cos,0,3π)

yt=Guide.yticks(ticks=yticks)
xt=Guide.xticks(ticks=xticks)
xs=Scale.x_continuous(labels=labelname)
t=Theme(highlight_width=0pt,
        default_point_size=3pt,
        default_color=blue,
        background_color=base03,
        grid_color=base2,
        minor_label_color=base2,
        key_label_color=base2,
        key_title_color=cyan,
        major_label_color=cyan,
        major_label_font_size=20pt,
        minor_label_font_size=15pt,
        panel_opacity=1)
xl=Guide.xlabel("time")
yl=Guide.ylabel("y axis")
GT=Guide.title("Important title")

plot(data,f,yt,xt,xs,t,xl,yl,GT)
```

Although we still have to call quite a few variables, this is a much simpler way of doing things.

# Saving the Figure

Gadfly naturally saves to SVG (or SVGJS). We have to specify the x and y dimensions of the plot, but since these images rescale so easily, we can choose some reasonable numbers:

```julia
p=plot(data,f,yt,xt,xs,t,xl,yl,GT)
draw(SVG("myplot.svg", 15cm, 9cm), p)
```

If you want to save to PNG, PDF, or PS, the package Cairo.jl provides that ability:

```julia
using Cairo
draw(PNG("myplot.png",15cm,9cm),p)
```

# Happy Plotting!

# GPU Programming in Julia

Re-posted from: http://juliacomputing.com/blog/2016/06/09/julia-gpu.html

Scientific problems have traditionally been solved with powerful clusters of homogeneous CPUs connected in a variety of network topologies. However, the number of supercomputers that employ accelerators has steadily been on the rise. The latest Top500 list released at SC ‘15 shows that the number of supercomputers that employ accelerators has risen to 109. Accelerators that are employed in practice are mostly graphics processing units (GPUs), Xeon Phis and FPGAs. These accelerators take advantage of many-core architectures which can be used to exploit coarse and fine grained parallelism. However, the traditional problem with using GPUs and other accelerators has been the ease (or lack thereof) of programming them.
To this end, NVIDIA Corporation designed the currently pervasive Compute Unified Device Architecture (CUDA) to allow for a C-like interface for scientific and general purpose programming. This was a considerable improvement over previous frameworks such as DirectX or OpenGL that required advanced skills in graphics programming. However, CUDA would still feature low on a productivity curve, with programmers having to fine-tune their applications for different devices and algorithms. In this context, interactive programming on the GPU would provide tremendous benefits to scientists and programmers who not only wish to prototype their applications, but to deploy them with little or no code changes.

# Julia on GPUs

Julia offers programmers the ability to code interactively on the GPU. There are several libraries wrapped in Julia, giving Julia users access to accelerated BLAS, FFTs, sparse routines and solvers, and deep learning. With a combination of these packages, programmers can interactively develop custom GPU kernels. One such example is the Conjugate Gradient, which has been benchmarked below:

However, one might argue that low-level wrapper libraries do not in any manner increase programmer productivity, as they involve working with obscure function interfaces. In such a case, it would be ideal to have a clean array interface for arrays on the GPU, with a convenient standard library that operates on these arrays. Each operation would in turn be tuned with regard to the device in question to achieve great performance. The folks over at ArrayFire have put together a high quality open source library to work on scientific problems with GPUs.

# ArrayFire.jl

ArrayFire.jl is a set of Julia bindings to the library. It is designed to mimic the Julia standard library in its versatility and ease of use, providing an easy-yet-powerful array interface that points to locations on GPU memory.
Julia’s multiple dispatch and generic programming capabilities make it possible for users to write natural mathematical code and transparently leverage GPUs for performance. This is done by defining a type AFArray as a subtype of AbstractArray. AFArray then acts as an interface to an array on device memory. A set of functions are imported from Base Julia and dispatched across the new AFArray type. Thus users can write code in Julia that runs on the CPU and port it to run on the GPU with very few code changes. In addition to functions that mimic Julia’s standard library, ArrayFire.jl provides powerful functions in image processing and computer vision, amongst others.

### Usage

The following examples illustrate high level usage:

```julia
using ArrayFire

# Random number generation
a = rand(AFArray{Float64}, 100, 100)
b = randn(AFArray{Float64}, 100, 100)

# Transfer to device from the CPU
host_to_device = AFArray(rand(100,100))

# Transfer back to CPU
device_to_host = Array(host_to_device)

# Basic arithmetic operations
c = sin(a) + 0.5
d = a * 5

# Logical operations
c = a .> b
any_trues = any(c)

# Reduction operations
total_max = maximum(a)
colwise_min = min(a,2)

# Matrix operations
determinant = det(a)
b_positive = abs(b)
product = a * b
dot_product = a .* b
transposer = a'

# Linear Algebra
lu_fact = lu(a)
cholesky_fact = chol(a*a') # multiplied to create a positive definite matrix
qr_fact = qr(a)
svd_fact = svd(a)

# FFT
fast_fourier = fft(a)
```

### Benchmarks

ArrayFire.jl has also been benchmarked for common operations (note that Julia’s default RNG and the one that ArrayFire uses are not directly comparable):

The benefits of accelerated code can be seen in real world applications. Consider the following image segmentation demo on satellite footage of Hurricane Katrina. Image segmentation is an important step in weather forecasting, and must be performed on many high definition images on a daily basis.
In such a use-case, interactive GPU programming would allow the application designer to leverage powerful graphics processing on the GPU with little or no code changes from the original prototype. The application used the K-means algorithm, which can easily be expressed in Julia and accelerated by ArrayFire.jl. It initializes some random clusters and then reassigns the clusters according to Manhattan distances.

Another interesting example is non-negative matrix factorization, which is often used in linear algebra and multivariate analysis. It is applied in fields such as computer vision, document clustering, chemometrics, audio signal processing, and recommender systems. The following application implements the Lee-Seung algorithm:

### Changing Backends

ArrayFire.jl also has the added advantage that it can switch backends at runtime, which allows a user to choose the appropriate backend according to hardware availability:

```julia
setBackend(AF_BACKEND_CUDA)
```

### Future Work

* Allowing ArrayFire.jl users to easily interface with other packages in the JuliaGPU ecosystem would give them access to accelerated and possibly more memory-efficient kernels (for signal processing or deep learning, for example).
* Currently, only dense linear algebra is supported. It would be worthwhile to wrap sparse linear algebra libraries and interface with them seamlessly.
* Allow users to interface with packages such as GLVisualize.jl for 3D visualizations on the GPU using OpenGL (or Vulkan, its recently released successor).

# Using Julia’s Type System For Hidden Performance Gains

What I want to share today is how you can use Julia’s type system to hide performance gains in your code. What I mean is this: in many cases you may find out that the optimal way to do some calculation is not a “clean” solution. What do you do?
What I want to do is show how you can define special arrays which are wrappers such that these special “speedups” are performed in the background, without having to keep all of that muck in your main algorithms. This is easiest to show by example.

The examples I will be building towards are useful for solving ODEs and SDEs. Indeed, these tricks have all been implemented as part of DifferentialEquations.jl, and so these examples come from a real use case! They really highlight a main feature of Julia: since Julia code is fast (as long as you have type stability!), you don’t need to worry about writing code outside of Julia, and you can take advantage of higher-level features (with caution). In Python, normally one would only get speed benefits by going down to C, and so utilizing these complex objects would not get speed benefits over simply using numpy arrays in the most vectorized fashion. The same holds for R. In MATLAB… it would be really tough to implement most of this in MATLAB!

## Taking Random Numbers in Chunks: ChunkedArrays

Let’s say we need to take random numbers in a loop like the following:

```julia
for i = 1:N
    dW = randn(size(u))
    # Do some things and add dW to u
end
```

While this is the “intuitive” code to write, it’s not necessarily the best. While there have been some improvements made since early Julia, in principle it’s just slower to make 1000 random numbers via repeated calls to randn() than to use randn(1000). This is because of internal speedups due to caching, SIMD, etc., and you can find mentions of this fact all over the web, especially when people are talking about fast random number generators like those from the VSL library.

So okay, what we really want to do is the following: every bufferSize steps, create a new random buffer dW of size size(u)*bufferSize, go through using the buffer until it is all used up, and then grab another buffer.
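You can check the underlying claim, that batched generation beats one-at-a-time calls, with a quick timing sketch of my own (illustrative only; exact numbers vary by machine and Julia version):

```julia
# Illustrative timing sketch: scalar randn() calls vs one batched randn(n) call.
function scalar_draws(n)
    s = 0.0
    for i in 1:n
        s += randn()   # one random number per call
    end
    s
end

function batched_draws(n)
    s = 0.0
    for r in randn(n)  # all n random numbers generated in one call
        s += r
    end
    s
end

scalar_draws(10); batched_draws(10)  # warm up / compile first
t_scalar  = @elapsed scalar_draws(10^7)
t_batched = @elapsed batched_draws(10^7)
println("scalar: $t_scalar s, batched: $t_batched s")
```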
In code, that buffered loop looks like:

```julia
for i = 1:N
    if i%bufferSize == 0
        dW = randn(size(u),bufferSize)
    end
    # Do some things and add dW[..,i] to u
end
```

But wait, what if we don’t always use one random number? Sometimes the algorithm may need to use more than one! So you can make an integer k which tracks the current state in the buffer, and then at each point where it can be incremented, add the conditional to grab a new buffer, etc. Also, what if you want to have the buffer generated in parallel? As you can see: code complexity explosion, just to go a little faster?

This is where ChunkedArrays come in. What I did is define an array type which does the chunking/buffering in the background, so that the code in the algorithm stays clean. A ChunkedArray is a wrapper over an array which uses the next command to hide all of this complexity. Thus, to generate random numbers in chunks to get this speed improvement, you can use code like this:

```julia
rands = ChunkedArray(u)
for i = 1:N
    dW = next(rands)
    # Do some things and add dW to u
end
```

Any time another random number is needed, you just call next. It internally stores an array and the state of the buffer, and the next function automatically checks and replenishes the buffer, launching another process to do this in parallel if the user wants. Thus we get the optimal solution without sacrificing cleanliness. I chopped off about 10% of a runtime in Euler-Maruyama code in DifferentialEquations.jl by switching to ChunkedArrays, and haven’t thought about doing a benchmark since.

## Safe Vectors of Arrays and Conversion: GrowableArrays

First let’s look at the performance difference between Vectors of Arrays and higher-dimensional contiguous arrays when using them in a loop. Julia’s arrays can take a parametric type which makes the array hold arrays; this makes the array essentially an array of pointers. The issue is that this adds an extra cost every time the array is dereferenced.
However, for high-dimensional arrays, the : way of referencing has to generate a slice each time. Which way is more performant?

```julia
function test1()
    u = Array{Int}(4,4,3)
    u[:,:,1] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    u[:,:,2] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    u[:,:,3] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    j = 1
    for i = 1:100000
        j += sum(u[:,:,1] + u[:,:,2] + 3u[:,:,3] + u[:,:,i%3+1] - u[:,:,(i%j)%2+1])
    end
end

function test2()
    u = Vector{Matrix{Int}}(3)
    u[1] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    u[2] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    u[3] = [1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1]
    j = 1
    for i = 1:100000
        j += sum(u[1] + u[2] + 3u[3] + u[i%3+1] - u[(i%j)%2+1])
    end
end

function test3()
    u = Array{Int}(4,4,4)
    u[1,:,:] = reshape([1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1],(1,4,4))
    u[2,:,:] = reshape([1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1],(1,4,4))
    u[3,:,:] = reshape([1 2 3 4; 1 3 3 4; 1 5 6 3; 5 2 3 1],(1,4,4))
    j = 1
    for i = 1:100000
        j += sum(u[1,:,:] + u[2,:,:] + 3u[3,:,:] + u[i%3+1,:,:] - u[(i%j)%2+1,:,:])
    end
end

# Pre-compile
test1(); test2(); test3()

t1 = @elapsed for i=1:10 test1() end
t2 = @elapsed for i=1:10 test2() end
t3 = @elapsed for i=1:10 test3() end
println("Test results: t1=$t1, t2=$t2, t3=$t3")
# Test results: t1=1.239379946, t2=0.576053075, t3=1.533462129
```

So using Vectors of Arrays is fast for dereferencing.

Now think about adding to an array. If you have a Vector of pointers and need to resize the array, it’s much easier to resize and copy over some pointers than it is to copy over all of the arrays. So, if you’re going to grow an array in a loop, the Vector of Arrays is the fastest implementation! Here’s a quick benchmark from GrowableArrays.jl:

```julia
using GrowableArrays, EllipsisNotation
using Base.Test

tic()
const NUM_RUNS = 100
const PROBLEM_SIZE = 1000

function test1()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = u
    for i = 1:PROBLEM_SIZE
        uFull = hcat(uFull,u)
    end
    uFull
end

function test2()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = u

    for i = 1:PROBLEM_SIZE
        uFull = vcat(uFull,u)
    end
    uFull
end

function test3()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = Vector{Int}(0)
    sizehint!(uFull,PROBLEM_SIZE*16)
    append!(uFull,vec(u))

    for i = 1:PROBLEM_SIZE
        append!(uFull,vec(u))
    end
    reshape(uFull,4,4,PROBLEM_SIZE+1)
    uFull
end

function test4()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = Vector{Array{Int}}(0)
    push!(uFull,copy(u))

    for i = 1:PROBLEM_SIZE
        push!(uFull,copy(u))
    end
    uFull
end

function test5()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = Vector{Array{Int,2}}(0)
    push!(uFull,copy(u))

    for i = 1:PROBLEM_SIZE
        push!(uFull,copy(u))
    end
    uFull
end

function test6()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = Vector{typeof(u)}(0)
    push!(uFull,u)

    for i = 1:PROBLEM_SIZE
        push!(uFull,copy(u))
    end
    uFull
end

function test7()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = GrowableArray(u)
    for i = 1:PROBLEM_SIZE
        push!(uFull,u)
    end
    uFull
end

function test8()
    u = [1 2 3 4
         1 3 3 4
         1 5 6 3
         5 2 3 1]

    uFull = GrowableArray(u)
    sizehint!(uFull,PROBLEM_SIZE)
    for i = 1:PROBLEM_SIZE
        push!(uFull,u)
    end
    uFull
end

println("Run Benchmarks")
println("Pre-Compile")
# Compile test functions
test1(); test2(); test3(); test4()
test5(); test6(); test7(); test8()

println("Running Benchmarks")
t1 = @elapsed for i=1:NUM_RUNS test1() end
t2 = @elapsed for i=1:NUM_RUNS test2() end
t3 = @elapsed for i=1:NUM_RUNS test3() end
t4 = @elapsed for i=1:NUM_RUNS test4() end
t5 = @elapsed for i=1:NUM_RUNS test5() end
t6 = @elapsed for i=1:NUM_RUNS test6() end
t7 = @elapsed for i=1:NUM_RUNS test7() end
t8 = @elapsed for i=1:NUM_RUNS test8() end

println("Benchmark results: $t1 $t2 $t3 $t4 $t5 $t6 $t7 $t8")

# Benchmark results: 1.923640854 2.131108443 0.012493308 0.00866045 0.005246504 0.00532613 0.00773568 0.00819909
```

As you can see in test7 and test8, I created a GrowableArray, which is an array that acts like a Vector of Arrays, with the added functionality that if you copy(G), what you get is the contiguous array. Therefore in the loop you can grow the array the quickest way, and at any time (say, to plot the array after the loop) you can copy it to a contiguous array, which is better suited for interop with C and other goodies.

It also hides a few painful things. Notice that in the code we pushed a copy of u (copy(u)). This is because when u is an array, only the reference to the array is pushed, so if we simply push!(uFull,u), every element of uFull is actually the same item! This benchmark won’t catch the issue, but try changing u and you will see that every element of uFull changes if you don’t use copy. This can be a nasty bug, so instead we build copy() into the push!() command for the GrowableArray. This gives rise to another issue: since copying a GrowableArray changes it, you need to make sure push! doesn’t copy arguments that are themselves GrowableArrays (to allow GrowableArrays of GrowableArrays). However, this is easily managed via dispatch.
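To see the reference-sharing pitfall in isolation, here is a standalone sketch using plain Base arrays (my own illustrative code, not GrowableArrays itself):

```julia
# Demonstrates why push!(uFull,u) without copy stores aliases, as described above.
u = [1 2; 3 4]

refs   = Vector{Matrix{Int}}()
copies = Vector{Matrix{Int}}()

push!(refs, u)          # stores a reference to u
push!(copies, copy(u))  # stores an independent copy

u[1,1] = 99             # mutate the original

println(refs[1][1,1])   # 99 -- the "stored" element changed too!
println(copies[1][1,1]) # 1  -- the copy is unaffected
```

This is exactly why building copy() into GrowableArray’s push! is a safety win: the caller can’t forget it.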

## Helping Yourself and the Julia Community

Simple projects like these lead to re-usable solutions that improve performance while allowing for ease of use. I have just detailed some projects I have personally done (and have more to do!), but there are others that should be pointed out. I am fond of projects like VML.jl, which speeds up standard functions, and DoubleDouble.jl, which implements efficient quad-precision numbers that you can then use in place of other number types.
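That “use in place of other number types” idea works because Julia functions are generic: the same algorithm runs with a different number type just by changing the input. A sketch of my own, with Base’s BigFloat standing in for a package-provided type like DoubleDouble’s Double:

```julia
# Generic code picks up whatever number type you feed it.
# BigFloat here stands in for a package type such as DoubleDouble's Double.
function naive_sum(xs)
    s = zero(eltype(xs))  # accumulator matches the element type
    for x in xs
        s += x
    end
    s
end

println(naive_sum([0.5, 0.25, 0.125]))                    # Float64 arithmetic
println(naive_sum(BigFloat.(["0.5", "0.25", "0.125"])))   # higher-precision arithmetic
```

No code changes to the algorithm itself; dispatch on the element type does the rest.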

I think Julia will succeed not by the “killer packages” that are built in Julia, but by a rich type ecosystem that will make everyone want to build their “killer package” in Julia.

The post Using Julia’s Type System For Hidden Performance Gains appeared first on Stochastic Lifestyle.