
GLVisualize – A modern Graphics Platform for Julia

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2016/11/glvisualize-a-modern-graphics-platform-for-julia/

GLVisualize is a package written in pure Julia that can create 2D/3D visualizations and GUIs. It's GPU accelerated and can thus easily handle large datasets and complex 2D/3D scenes. At its core, it's a visualization library that makes heavy use of Reactive signals, allowing the user to animate and interact with the graphics. I recently wrote a GLVisualize backend for Plots.jl, which is a great general-purpose plotting library. As a result, GLVisualize + Plots.jl gives you much of the standard plotting functionality known from libraries such as Matplotlib.

Just install it and try it out! I've collected some videos that should give a good overview of what's possible.

 

Why GLVisualize?

Creating a library like GLVisualize completely from scratch is pretty involved. So I’ve been asking myself why go through all this trouble when there are wonderful alternatives to GLVisualize. Why not concentrate on one existing package?

First of all, some of the established libraries don't really live up to what I would call a fast, modern, and easy-to-extend library.

More modern libraries seem to be mostly written for the web. It’s great, but sometimes one really wants a library that works well on the desktop, is not written in JS, and plays well with other desktop APIs (e.g. GPGPU). But I see the beauty of sharing things online, so I do want to break with my `all Julia` approach and write a web backend for GLVisualize as soon as possible!

GUIs are another important aspect to look at. I've seen a lot of frameworks use third-party GUI libraries, which is a very reasonable way to go! But I have some use cases in mind that need the graphics rendering to integrate seamlessly with the GUI library: for example, a point cloud where every point is a button, or a line plot that acts as the control for a complex parameter space. I've already needed to create quite a few complex 3D UI elements that wouldn't be possible in most GUI libraries. So, at the end of the day, putting time into an API for interactive widgets makes a lot of sense.

Another motivation is that, in the long run, Julia will need some native graphics tools, for several reasons. It's impossible to write one plotting library that solves everyone's needs, so people will start to create their own visualization tools. Having no graphics libraries to build upon would mean throwing these users in at the deep end or making them switch languages. The livelier Julia's own (GPU-accelerated) graphics ecosystem, the higher the probability that you can create great, novel visualizations and get insights into your data. GLVisualize can be the starting point, and we as a community have full control over the features and use cases!

Finally, I still haven't seen another language bring together speed, scientific computing, and ease of use as beautifully as Julia does. These are all crucial elements for any visualization library, so GLVisualize is in a very good place with Julia!

State

This is the first version of GLVisualize that I feel comfortable recommending to the public! I've been fixing bugs and backend issues, and adding the most crucial features for general plotting. As you saw in the video, it now contains a large collection of widgets and many visualization types to choose from. Please note that I haven't optimized GLVisualize much yet. This means GLVisualize should only get faster from here on! But it also means that there are a couple of bottlenecks. I've sketched out a road map to remove them and to improve latency and performance in general. Depending on the use case, you can expect speed improvements of an order of magnitude. One such use case is animation with Plots.jl, which is quite slow right now: for every frame, Plots.jl completely tears down GLVisualize's state and rebuilds it, which is not what I've optimized for yet. So even though GLVisualize might offer the fastest animation speeds of all backends, animations done with Plots.jl can be slow due to this bad interaction. There are a couple of other issues, which I've collected in one meta issue.

Please report bugs and feedback swiftly 🙂

Best,

Simon


I’m Moving to Gadfly

By: Christina Lee

So far in this blog, all my plots have either used PyPlot, or the experimental GLVisualize. BUT, NO LONGER!

While writing up my research notes, I’ve been trying to improve my PyPlot graphs. I was hoping for them to look nicer; Julia’s PyPlot was saying no. We had a falling out over background colors and resolution. So, I have decided to move on to a plotting partner more willing to cope with my demands.

When I first started learning Julia, Gadfly seemed too unfamiliar and its controls too tricky, so I didn't give it much further thought. After some digging, I've changed my mind about Gadfly, and I hope today I can change yours too. Right now, like Julia, Gadfly doesn't have the maturity or documentation of PyPlot, but once it does, you'd better watch out.


Gadfly preferentially produces plots in SVG (scalable vector graphics) form, instead of the rasterized PNG or JPG form favored by PyPlot. As shown in the figure on the left, rasterized images are composed of colored squares, so they don't look their best when highly magnified. SVG, which is based on XML, stores the positions of vector elements and scales to high resolution without artifacts. Tools such as Inkscape also work in SVG and can edit the individual pieces of a graph, since they haven't been rasterized together.

Let’s plot some graphs!

By Yug, modifications by 3247 (Unknown) [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons

# Time to load our Packages
#Pkg.update()
#Pkg.add("Gadfly")
using Gadfly

#Pkg.add("Compose")
using Compose
# Some test data
x=collect(1:.4:10);
y1=sin.(x);  # the dot applies sin to each element of x
y2=cos.(x);
# A simple plot
plot(x=x,y=y1)

Unlike PyPlot, Gadfly lets us plot functions as well as points. Given a function, a min value, and a max value, Gadfly figures everything else out for itself.

For the function, we can pass in a built-in function, a user-defined function, or an anonymous function.

The brackets [ ] allow us to group multiple functions together in the plot.

# plotting function

function testfunction(x::Real)
    return 1-x^2
end

plot([cos,testfunction,x->x^2],0,1)
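To make "Gadfly figures everything else out" a little more concrete: plotting a function boils down to evaluating it at many evenly spaced points between the min and max. The sketch below is a rough pure-Julia illustration, not Gadfly's own code; the helper name `sample_function` and its default of 250 samples are assumptions.

```julia
# Hypothetical helper: evaluate f on an evenly spaced grid from xmin to xmax.
# The sample count n is an assumption; Gadfly picks its own.
function sample_function(f, xmin::Real, xmax::Real; n::Int=250)
    xs = collect(range(xmin, stop=xmax, length=n))
    return xs, f.(xs)   # broadcast f over every grid point
end

# Same function as above, written in one-line form.
testfunction(x::Real) = 1 - x^2

xs, ys = sample_function(testfunction, 0, 1)
```

The resulting pair could then be drawn by hand with `plot(x=xs, y=ys, Geom.line)`, which should look much like the function plot above.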

But that's just one set of points…
How do we plot more than one set of data? That's where layers come in. Instead of calling plot, we call layer and assign the result to a variable. We can then pass those layers to plot.

p1=layer(x=x,y=y1,Geom.point)
p2=layer(x=x,y=y2,Geom.point)
plot(p1,p2)

Different styles of plotting: Geom

Now, what if we want to plot something other than lines?
Gadfly allows a wide variety of other ways to plot our data. We control this with the Geom (Geometry) object.

plot(x=x,y=y1,Geom.point,Geom.line)
plot(x=x,y=y1,ymin=y1 .- 0.1,ymax=y1 .+ 0.1,Geom.point,Geom.errorbar)
plot(x=x,y=y1,ymin=y1 .- 0.1,ymax=y1 .+ 0.1,Geom.line,Geom.ribbon)
plot(x=x,y=y1,Geom.bar)
plot(x=x,y=y1,Geom.step)
plot(x=y1,y=y2,Geom.path())

Edit Point and Line style

Disclaimer: I’ve learned how to do the next two things by reading the code, not the documentation.

It seems like they are newer and less developed features than everything else I’m discussing here. The syntax seems less polished than in other areas, so I believe it’s still under construction.

z=collect(0:.2:2)

xp=[z; z .+ 0.5; z .+ 1; z .+ 1.5; z .+ 2; z .+ 2.5; z .+ 3; z .+ 3.5]
yp=[z;z;z;z;z;z;z;z]
n=length(z)
sh=[ones(n);2*ones(n);3*ones(n);4*ones(n);5*ones(n);6*ones(n);7*ones(n);8*ones(n)]

plot(x=xp,y=yp,shape=sh,Geom.point,Theme(default_point_size=5pt))

# Dash-pattern lengths; use Compose.mm instead of Compose.cm for smaller sizes.
# Change these ratios and numbers around however you like.

dash = 6 * Compose.cm
dot = 1.5 * Compose.cm
gap = 1 * Compose.cm

l1=layer(x=x,y=zeros(length(x)),Geom.line,Theme(line_style=[dot]))
l2=layer(x=x,y=ones(length(x)),Geom.line,Theme(line_style=[dash]))
l3=layer(x=x,y=2*ones(length(x)),Geom.line,Theme(line_style=[gap]))
l4=layer(x=x,y=3*ones(length(x)),Geom.line,Theme(line_style=[dot,dash,dot]))
l5=layer(x=x,y=4*ones(length(x)),Geom.line,Theme(line_style=[gap,dash]))

plot(l1,l2,l3,l4,l5,Coord.Cartesian(ymin=-.5,ymax=4.5),Theme(grid_line_width=0pt))
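The xp, yp, and sh vectors in the point-shape example are built by typing out eight blocks by hand. A more compact sketch using `repeat` and `reduce(vcat, ...)` should produce the same vectors; the `offsets` name is introduced here only for illustration.

```julia
z = collect(0:0.2:2)
offsets = 0:0.5:3.5   # one horizontal offset per shape (eight in total)

xp = reduce(vcat, [z .+ o for o in offsets])                # z shifted by each offset, stacked
yp = repeat(z, length(offsets))                              # z stacked eight times
sh = repeat(collect(1.0:length(offsets)), inner=length(z))  # 1.0 for the first block, 2.0 for the next, ...
```

Changing `offsets` then changes the number of shapes in one place instead of eight.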

Guide: Labeling Axes

Where Geom alters how we plot, Guide alters the labeling.
Guide ties in with Compose.jl through the Guide.annotate command, but that will take a tutorial in itself.

plot(x=x,y=y2,color=y2,
Guide.xlabel("xlabel"),Guide.ylabel("ylabel"),Guide.title("Something explanatory"),
Guide.colorkey("y2"))

Here's something we can do with a combination of Guide and Scale. Using Guide, we can specify where we want our xticks to be, namely at multiples of $\pi$. But then the tick labels would show long decimal approximations, making the plot look horrible. So we create a function that takes in a number and outputs a string label for Scale to use.

function labelname(x::Real)
    n=round(Int,x/π) #nearest integer*pi
    if n==0
        return "0"
    elseif n==1
        return "π"
    end
    return("$n π")
end

xmarks=[0,π,2π,3π]
ymarks=[-1,0,1]

plot(x=x,y=y2,
Guide.yticks(ticks=ymarks),
Guide.xticks(ticks=xmarks),Scale.x_continuous(labels=labelname))
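Because labelname is plain Julia, it can be checked without drawing anything. The version below is the same function with one hypothetical extra case that labels -π as "-π" rather than "-1 π"; otherwise it behaves like the original.

```julia
function labelname(x::Real)
    n = round(Int, x / π)   # nearest integer multiple of π
    n == 0 && return "0"
    n == 1 && return "π"
    n == -1 && return "-π"  # hypothetical extra case, not in the original
    return "$n π"
end

labels = labelname.([0, π, 2π, 3π])
```

Since the function rounds to the nearest multiple of π, a tick that is not a multiple of π (for example 1.0) would be mislabeled as "0", so it only makes sense together with ticks like xmarks.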

Some other cool things we can do with Scale:
* Automatically transform the axis according to log, log10, log2, asinh, or sqrt.
* Write axis numbers in :plain, :scientific, :engineering, or :auto format.

plot(x->exp(x),0,10,Scale.y_log,Scale.x_continuous(format=:scientific))

Themes

I mostly chose Gadfly because of the control the Theme command gives me. The Gadfly theme documentation at http://dcjones.github.io/Gadfly.jl/themes.html has a much more exhaustive list of options than what I will demonstrate here.

# Solarized Colors that I like working with
# http://ethanschoonover.com/solarized
using Colors
base03=parse(Colorant,"#002b36");
base02=parse(Colorant,"#073642");
base01=parse(Colorant,"#586e75");
base00=parse(Colorant,"#657b83");
base0=parse(Colorant,"#839496");
base1=parse(Colorant,"#93a1a1");
base2=parse(Colorant,"#eee8d5");
base3=parse(Colorant,"#fdf6e3");

yellow=parse(Colorant,"#b58900");
orange=parse(Colorant,"#cb4b16");
red=parse(Colorant,"#dc322f");
magenta=parse(Colorant,"#d33682");
violet=parse(Colorant,"#6c71c4");
blue=parse(Colorant,"#268bd2");
cyan=parse(Colorant,"#2aa198");
green=parse(Colorant,"#859900");
plot(x=x,y=y1,
    Theme(highlight_width=0pt, # lose the white ring around the points
    default_point_size=3pt,    # size of the dots
    default_color=magenta,     # color of the dots
    background_color=base03,   # ... background color ...
    grid_color=base2,     # the lines
    minor_label_color=base2,  # numbers on axes color
    key_label_color=base2, # color key numbers color
    key_title_color=cyan,  # label of color key color
    major_label_color=cyan, # axes label and title color
    major_label_font_size=20pt, # font size for axes label and title
    minor_label_font_size=15pt, #font size for numbers on axes
    panel_opacity=1  # set the background to opaque
))

Coord: Setting the boundaries

The documentation states that you can change the range of the axes using Scale, but I've found it works better to set my min and max values with Coord.Cartesian, in this format:

plot(x=x,y=y1,Geom.line,Coord.Cartesian(ymin=0,ymax=1,xmin=0,xmax=10))

So far, we have covered Guide, Themes, and (partially) Coord and Scale separately. Individually, each aspect doesn't add too much clunkiness to the code. However, if we start to add everything together, the plot call will look quite nasty.

Luckily, just like layers for data points, we can put our modifiers into variables. Then we can simply call the variables in our plot function.

This also helps when we want to use one theme for a series of graphs. We can define the theme variable up top and then include it in every graph thereafter, never having to declare it again. This helps me keep my graphs uniform in style.

function labelname(x::Real)
    n=round(Int,x/π)
    if n==0
        return "0"
    elseif n==1
        return "π"
    else
        return("$n π")
    end
end

xticks=[0,π,2π,3π]
yticks=[-1,0,1]

data=layer(x=x,y=y1,ymin=y1 .- 0.1,ymax=y1 .+ 0.1,Geom.line,Geom.ribbon)
f=layer(cos,0,3π)

yt=Guide.yticks(ticks=yticks)
xt=Guide.xticks(ticks=xticks)
xs=Scale.x_continuous(labels=labelname)

t= Theme(highlight_width=0pt,
    default_point_size=3pt,
    default_color=blue,
    background_color=base03,
    grid_color=base2,
    minor_label_color=base2,
    key_label_color=base2,
    key_title_color=cyan,
    major_label_color=cyan,
    major_label_font_size=20pt,
    minor_label_font_size=15pt,
    panel_opacity=1)

xl=Guide.xlabel("time")
yl=Guide.ylabel("y axis")
GT=Guide.title("Important title")

plot(data,f,yt,xt,xs,t,xl,yl,GT)

Although we still have to pass quite a few variables, this is a much simpler way of doing things.

Saving the Figure

Gadfly natively saves to SVG (or SVGJS).
We have to specify the x and y dimensions of the plot, but since these images rescale so easily, we can choose some reasonable numbers.

p=plot(data,f,yt,xt,xs,t,xl,yl,GT)
draw(SVG("myplot.svg", 15cm, 9cm), p)

If you want to save to PNG, PDF, or PS, the package Cairo.jl provides that ability.

using Cairo
draw(PNG("myplot.png",15cm,9cm),p)


# Happy Plotting!


Using ASCIIPlots.jl

No Julia plotting package has been crowned king yet. Winston and Gadfly are the main competitors. PyPlot is a Julia wrapper around Python’s matplotlib; it is a stop-gap for use while the native Julia implementations mature. However, all three of these packages have non-Julia dependencies; this can cause installation frustration. An alternative is ASCIIPlots; it’s the simplest plotting package to install, due to having no dependencies. To be fair, ASCIIPlots is also a tiny package with only basic functionality.

The small size of the package makes it a great target for Julia users looking to make their first contributions to the ecosystem. There are four source files totaling about 250 lines of code; the entire premise is taking in Arrays of numbers and printing out characters. The small size and lack of conceptual complexity make it an approachable package, even for less experienced Julians. I’ll mention new feature ideas throughout this post, in the hopes that some of you will submit pull requests.

Compared to the standard approach of using images, plotting with ASCII characters has some drawbacks, namely low resolution (256 pixels per 12pt character) and few options (2^8 to 2^24 colors vs. 95 printable ASCII characters). Currently, ASCIIPlots only uses ASCII characters and does not support color, even if your terminal does. Adding coloring to any of the plot types would be neat; you could use terminal escape sequences to change the styling.
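As a rough idea of what that could look like, the helpers below wrap a plot character in an ANSI escape sequence. They are hypothetical (ASCIIPlots has no such functions) and assume a terminal that understands ANSI color codes.

```julia
# Hypothetical helper: wrap one character in an ANSI color escape.
# 31 = red, 32 = green, 33 = yellow, 34 = blue; "\e[0m" resets the styling.
colorize(c::Char, code::Integer) = string("\e[", code, "m", c, "\e[0m")

# Color every plotted symbol in one row of an ASCII plot, leaving spaces and borders alone.
colorize_row(row::AbstractString, sym::Char, code::Integer) =
    join(ch == sym ? colorize(ch, code) : string(ch) for ch in row)

println(colorize_row("|   ^     ^  |", '^', 31))
```

On Julia 1.0 and later, Base's `printstyled` (for example `printstyled("^", color=:red)`) is another option when printing directly to the terminal.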

You can install ASCIIPlots with Pkg.add("ASCIIPlots") at the Julia REPL. This command will clone the repo from GitHub into your ~/.julia/v0.X directory, where all installed Julia packages are stored. When you want to start using ASCIIPlots, you'll need to run using ASCIIPlots to get access to the package's functions.

ASCIIPlots exports three functions: scatterplot, lineplot, and imagesc. The first two functions have fairly clear names; the last is a “heatmap” function, with a funny name because Matlab.

Scatter Plots

Of all the ASCIIPlots functions, scatterplot seems to suffer the least from the constraints of ASCII art. The points appear well placed, and it has some logic to handle too many points for its resolution.

scatterplot is happy to accept one or two Vectors (1 dimensional Arrays). If one vector is provided, then its values are the y-values and their indices are the x-values. If two vectors are passed in, then the first will contain the x-values and the second will contain the y-values.
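Both calling conventions map naturally onto Julia's multiple dispatch. The sketch below is a hypothetical, heavily simplified stand-in (`tinyscatter` is not part of ASCIIPlots and draws no borders or axis labels); it shows the core idea of scaling each point into a grid of characters, with the one-vector method reusing the indices as x-values.

```julia
# Hypothetical sketch, not ASCIIPlots' source: rasterize points into rows of characters.
function tinyscatter(xs::AbstractVector{<:Real}, ys::AbstractVector{<:Real};
                     width::Int=60, height::Int=20, sym::Char='^')
    length(xs) == length(ys) || throw(ArgumentError("xs and ys must have equal length"))
    grid = fill(' ', height, width)
    xmin, xmax = extrema(xs)
    ymin, ymax = extrema(ys)
    # Guard against a zero span when all values are equal.
    xspan = xmax == xmin ? 1.0 : float(xmax - xmin)
    yspan = ymax == ymin ? 1.0 : float(ymax - ymin)
    for (x, y) in zip(xs, ys)
        col = 1 + round(Int, (x - xmin) / xspan * (width - 1))
        # Row 1 is the top of the plot, so larger y values get smaller row numbers.
        row = height - round(Int, (y - ymin) / yspan * (height - 1))
        grid[row, col] = sym
    end
    return [String(grid[r, :]) for r in 1:height]
end

# One-vector form: the indices become the x-values.
tinyscatter(ys::AbstractVector{<:Real}; kwargs...) = tinyscatter(1:length(ys), ys; kwargs...)

rows = tinyscatter(collect(10:20); width=11, height=11)
foreach(println, rows)
```

With 11 values on an 11 by 11 grid, each point lands in its own cell, so the output is a clean diagonal like the first example below.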

Plotting a Vector

As a first example, let's plot the integers 10 through 20. This allows us to differentiate the values from the indices.

scatterplot(collect(10:20))

Result:

    -------------------------------------------------------------
    |                                                           ^| 20.00
    |                                                            |
    |                                                     ^      |
    |                                                            |
    |                                               ^            |
    |                                                            |
    |                                         ^                  |
    |                                                            |
    |                                   ^                        |
    |                                                            |
    |                             ^                              |
    |                                                            |
    |                       ^                                    |
    |                                                            |
    |                 ^                                          |
    |                                                            |
    |           ^                                                |
    |                                                            |
    |     ^                                                      |
    |^                                                           | 10.00
    -------------------------------------------------------------
    1.00                                                    11.00

The placement of the points looks pretty good; forming a line is a good sign. We can see the indices are on the horizontal axis, since its range is 1 to 11; the vertical axis has a range of 10 to 20, corresponding to our values.

We can also mix up the values, to see how noisier data looks. I’ll sneak an additional option into this example.

scatterplot(shuffle!(collect(10:20)); sym='*')

sym is an optional keyword argument; it takes an ASCII character to use for the plotted points. As we saw above, the default is ^.

Result:

    -------------------------------------------------------------
    |*                                                           | 20.00
    |                                                            |
    |                                   *                        |
    |                                                            |
    |     *                                                      |
    |                                                            |
    |                                                     *      |
    |                                                            |
    |                       *                                    |
    |                                                            |
    |                                         *                  |
    |                                                            |
    |                 *                                          |
    |                                                            |
    |                                                           *|
    |                                                            |
    |                             *                              |
    |                                                            |
    |                                               *            |
    |           *                                                | 10.00
    -------------------------------------------------------------
    1.00                                                    11.00

I had been hoping to use the Unicode snowman (☃) to plot those points. Alas, ASCIIPlots is true to its name and only uses ASCII characters. Maybe one of you could fix this and add some Unicode support? Plotting with characters like ☃ is pretty important.

Plotting Two Vectors

If we pass in two Vectors, then the first will be the horizontal coordinates and the second will be the vertical coordinates. The Array indices will not be used, other than to match up the two coordinates for each point. We can use two non-overlapping ranges for our Vectors to see which Vector is on which axis.

scatterplot(collect(10:20),collect(31:41))

Result:

    -------------------------------------------------------------
    |                                                           ^| 41.00
    |                                                            |
    |                                                     ^      |
    |                                                            |
    |                                               ^            |
    |                                                            |
    |                                         ^                  |
    |                                                            |
    |                                   ^                        |
    |                                                            |
    |                             ^                              |
    |                                                            |
    |                       ^                                    |
    |                                                            |
    |                 ^                                          |
    |                                                            |
    |           ^                                                |
    |                                                            |
    |     ^                                                      |
    |^                                                           | 31.00
    -------------------------------------------------------------
    10.00                                                    20.00

It's not clear to me whether this is the right API. Since Julia has multidimensional Arrays, taking an Array{T,2} with a column of x-values and a column of y-values would make at least as much sense as two vectors. Alternatively, the two-vector version could take a vector of tuples. API design isn't something I have much experience with, so I'm open to other opinions. In a well-designed API, the signature and name of a function provide a clear idea of how to use it; I'm not sure how to achieve that here.
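One way to accept all three input layouts is multiple dispatch: a thin front end could normalize each layout to the two vectors that scatterplot already takes. This is a minimal sketch with a hypothetical helper named `to_xy`; ASCIIPlots does not define it.

```julia
# Two vectors: already in the expected form.
to_xy(xs::AbstractVector{<:Real}, ys::AbstractVector{<:Real}) = (xs, ys)

# An n×2 matrix: column 1 holds x-values, column 2 holds y-values.
function to_xy(m::AbstractMatrix{<:Real})
    size(m, 2) == 2 || throw(ArgumentError("expected exactly two columns"))
    return (m[:, 1], m[:, 2])
end

# A vector of (x, y) tuples.
to_xy(points::AbstractVector{<:Tuple{Real,Real}}) = (first.(points), last.(points))
```

With such a helper, a call like `scatterplot(to_xy(m)...)` would work for any of the three layouts, while the underlying two-vector signature stays unchanged.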

Plotting Real Data

While plugging in random data is fine for seeing how the interface works, it doesn’t show how well ASCIIPlots might work for real data.
The RDatasets repo on GitHub has a bunch of small, simple, clean datasets; I'll be using Monthly Airline Passenger Numbers 1949-1960 here.

The first step is to get the data out of the file and into a reasonable format.

file = open("AirPassengers.csv")
raw_data = readcsv(file)  # on Julia 0.7+: using DelimitedFiles, then readdlm(file, ',')
close(file)

raw_data is an Array{Any,2}. It has three columns: index, time, and passengers. The time format is in fractional years: January of 1950 is 1950.0, February is 1950.08, April is 1950.25, and so on.
The first row is header strings.

raw_data
145x3 Array{Any,2}:
 "\"\""         "\"time\""     "\"AirPassengers\""
 "\"1\""    1949.0          112.0
 "\"2\""    1949.08         118.0
 "\"3\""    1949.17         132.0
 "\"4\""    1949.25         129.0
 "\"5\""    1949.33         121.0
 "\"6\""    1949.42         135.0
 ⋮
 "\"138\""  1960.42         535.0
 "\"139\""  1960.5          622.0
 "\"140\""  1960.58         606.0
 "\"141\""  1960.67         508.0
 "\"142\""  1960.75         461.0
 "\"143\""  1960.83         390.0
 "\"144\""  1960.92         432.0
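The time column appears to encode each month as year + (month - 1)/12, rounded to two decimals: 1949.17 is March 1949 and 1960.92 is December 1960. A small sketch of that encoding and its inverse, with hypothetical helper names `decimal_year` and `month_of`:

```julia
# Encode a (year, month) pair the way the AirPassengers time column appears to.
decimal_year(year::Integer, month::Integer) = round(year + (month - 1) / 12; digits=2)

# Recover the month (1 = January) from a decimal year.
month_of(t::Real) = round(Int, (t - floor(t)) * 12) + 1
```

Because the encoding increases monotonically month by month, sorting or plotting by these values keeps the months in order.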

The floating-point representation of a year-month is actually very convenient for us; these will plot in order without any work on our part. We want to get the second column as Float64s, without the first row.

The passenger counts are also written with a decimal point, but are (unsurprisingly) all integers. To get a Vector of these counts, we need the third column as Ints, again without the first row.

months = Float64.(raw_data[2:end,2])
passengers = Int.(raw_data[2:end,3])

Now that we have two numeric Vectors, I’m ready to plot. The months will be the horizontal values and the passenger counts will be the vertical ones.

scatterplot(months,passengers)

Result:

    -------------------------------------------------------------
    |                                                        ^   | 622.00
    |                                                         ^  |
    |                                                            |
    |                                                   ^^       |
    |                                                        ^   |
    |                                               ^         ^  |
    |                                          ^        ^^  ^^ ^ |
    |                                              ^            ^|
    |                                     ^   ^^    ^  ^^ ^^^    |
    |                                                  ^   ^   ^ |
    |                                ^   ^^  ^^   ^^ ^^   ^      |
    |                                ^       ^  ^^^   ^          |
    |                           ^   ^ ^ ^^ ^^^  ^^   ^           |
    |                      ^    ^  ^^ ^^^  ^                     |
    |                 ^   ^^   ^ ^^^                             |
    |                ^^  ^^ ^ ^^ ^^^  ^                          |
    |            ^  ^  ^^^  ^^^  ^                               |
    |       ^  ^^ ^^^^ ^    ^                                    |
    |^ ^^ ^^^^^^   ^                                             |
    |^^ ^^^^  ^                                                  | 104.00
    -------------------------------------------------------------
    1949.00                                                    1960.92

That plot looks pretty reasonable. Due to the poor display resolution, some columns contain multiple plotted values, even though there is only one y-axis data value per x-axis month. We can see that the data seems a bit noisy and increases over time. My hypothesis is that the noise is due in large part to seasonal variations in passenger counts.

To test this, let’s zoom in on a couple of years to see what they look like:

1949: scatterplot(months[1:12],passengers[1:12])

    -------------------------------------------------------------
    |                                ^    ^                      | 148.00
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^               ^                 |
    |          ^                                                 |
    |                                                            |
    |                ^                                           |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |     ^                                          ^          ^|
    |                                                            |
    |                                                            |
    |^                                                           |
    |                                                            |
    |                                                            |
    |                                                     ^      | 104.00
    -------------------------------------------------------------
    1949.00                                                    1949.92

The first year, 1949, has a spike in the spring (around March) and a bigger one in the summer (peaking in July and August).

1950: scatterplot(months[13:24],passengers[13:24])

    -------------------------------------------------------------
    |                                ^    ^                      | 170.00
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                                            |
    |          ^                                                 |
    |                                                           ^|
    |                ^                                           |
    |                                                ^           |
    |                                                            |
    |     ^                                                      |
    |                     ^                                      |
    |                                                            |
    |                                                            |
    |^                                                    ^      | 114.00
    -------------------------------------------------------------
    1950.00                                                    1950.92

The second year has about the same spikes (March and July/August).
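Each year occupies 12 consecutive entries, so the slice for any year can be computed instead of written by hand (the later years below use ranges like end-23:end-12). A minimal sketch with a hypothetical helper `year_range`, assuming the series starts in January 1949 and has no missing months:

```julia
# Index range of one year in a monthly series that starts in January of first_year.
year_range(year::Integer; first_year::Integer=1949) =
    (12 * (year - first_year) + 1):(12 * (year - first_year + 1))
```

For example, `scatterplot(months[year_range(1959)], passengers[year_range(1959)])` would select the same 1959 data as the call below.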

1959: scatterplot(months[end-23:end-12],passengers[end-23:end-12])

    -------------------------------------------------------------
    |                                     ^                      | 559.00
    |                                ^                           |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |          ^                                     ^          ^|
    |                ^                                           |
    |                                                            |
    |                                                            |
    |^                                                    ^      |
    |     ^                                                      | 342.00
    -------------------------------------------------------------
    1959.00                                                    1959.92

In the second-to-last year, the March spike is much smaller, but still there; July and August are still the peak travel months.

1960: scatterplot(months[end-11:end],passengers[end-11:end])

    -------------------------------------------------------------
    |                                ^                           | 622.00
    |                                                            |
    |                                     ^                      |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                                            |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |                ^                               ^           |
    |                                                            |
    |                                                           ^|
    |^         ^                                                 |
    |                                                            |
    |     ^                                               ^      | 390.00
    -------------------------------------------------------------
    1960.00                                                    1960.92

The final year seems to lack the March spike, but still has the overall peak in July/August.

These seasonal variations probably account for much of the spread of the numbers in the 1949-1960 chart. In each of these four years, the lowest month has about two-thirds as many passengers as the highest month. As the number of passengers per year grows, so does the absolute spread, even though the gap stays at roughly one-third of the peak.
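The low-to-high ratio can also be computed directly. Here is a minimal sketch in current Julia (1.x) syntax. It assumes `passengers` holds whole years of monthly counts in order, 12 per year, as in the AirPassengers data plotted above. `yearly_spread` is a hypothetical helper, not part of ASCIIPlots.

```julia
# Hypothetical helper: ratio of the quietest month to the busiest month,
# one value per year. Assumes 12 consecutive monthly values per year.
function yearly_spread(passengers::AbstractVector{<:Real})
    length(passengers) % 12 == 0 ||
        throw(ArgumentError("expected whole years of monthly data"))
    by_year = reshape(passengers, 12, :)   # one column per year
    return [minimum(col) / maximum(col) for col in eachcol(by_year)]
end

# The 1960 chart spans 390 (lowest month) to 622 (highest month):
round(390 / 622, digits = 2)   # 0.63, roughly two-thirds
```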

Line Plots

The interface for lineplot is identical to scatterplot's: one or two Vectors, which control the axes in the same way as above. The difference is in the characters used to plot the points. When ASCIIPlots tries to draw a line, it picks /, \, and - characters to show the slope of the line at each point.
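As a rough illustration of picking a character from the slope, here is a minimal sketch. `slope_char` is a hypothetical helper, not ASCIIPlots' own code. It looks only at the sign of the change between neighboring points, and the real package may decide differently.

```julia
# Hypothetical helper: choose a line character from the change in y
# between two neighboring points ('/' rising, '\\' falling, '-' flat).
function slope_char(dy::Real)
    if dy > 0
        return '/'
    elseif dy < 0
        return '\\'
    else
        return '-'
    end
end

ys = [1, 3, 3, 2]
[slope_char(d) for d in diff(ys)]   # ['/', '-', '\\']
```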

Plotting a Vector

First, I’ll plot a line. With the name lineplot, you might have some high expectations of the output here.

lineplot([11:20])

Result:

    -------------------------------------------------------------
    |                                                           /| 20.00
    |                                                            |
    |                                                            |
    |                                                    /       |
    |                                                            |
    |                                             /              |
    |                                                            |
    |                                       /                    |
    |                                                            |
    |                                /                           |
    |                                                            |
    |                          /                                 |
    |                                                            |
    |                   /                                        |
    |                                                            |
    |             /                                              |
    |                                                            |
    |      /                                                     |
    |                                                            |
    |/                                                           | 11.00
    -------------------------------------------------------------
    1.00                                                    10.00

It’s not terrible; you can see the linear-ness and easily play connect-the-dots with the slashes.

We can make things a lot harder for lineplot by shuffling the data around, so that it’s not linear.
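The calls in this post use older Julia syntax, in which `[11:20]` expanded the range into a vector. In Julia 1.x, `[11:20]` is instead a one-element vector holding the range, and `shuffle!` lives in the `Random` standard library. The following minimal sketch builds the same shuffled input on a current Julia version. It only prepares the data and does not assume ASCIIPlots itself runs there.

```julia
using Random

v = collect(11:20)   # Vector{Int} with the values 11 through 20
shuffle!(v)          # permute the elements in place
```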

lineplot(shuffle!([11:20]))

Result:

    -------------------------------------------------------------
    |                                                           | 20.00
    |                                                            |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |                   /                                        |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |      /                                                     |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                             /              | 11.00
    -------------------------------------------------------------
    1.00                                                    10.00

lineplot’s output is not as good as I would like here; I find it much harder to play connect-the-slashes. Part of the problem is the number of points I gave it versus the resolution it’s using: even though several columns of characters fit between neighboring data points, lineplot does not fill in extra slashes. Filling in would help more here, where there are large vertical gaps between points, than it would in the previous example.
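One way to fill the gap would be to place a character in every column between two data points, interpolating the row. The sketch below is hypothetical (`fill_segment` is not part of ASCIIPlots). It works in character-grid coordinates, where columns and rows are integer cell positions.

```julia
# Hypothetical sketch: list one character cell per column between two
# data points on the plot grid, by linear interpolation of the row.
function fill_segment(c1::Int, r1::Real, c2::Int, r2::Real)
    # A single column: just round the starting row.
    c1 == c2 && return [(c1, round(Int, r1))]
    dir = sign(c2 - c1)   # +1 to walk right, -1 to walk left
    # Interpolate the row for each column; Julia's round sends halves
    # to the nearest even integer.
    return [(c, round(Int, r1 + (r2 - r1) * (c - c1) / (c2 - c1))) for c in c1:dir:c2]
end

fill_segment(0, 0, 4, 2)   # [(0, 0), (1, 0), (2, 1), (3, 2), (4, 2)]
```

For very steep segments, one cell per column still leaves vertical gaps. A fuller version could step along whichever axis changes more, as Bresenham-style line drawing does.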

Plotting Two Vectors

We can see in this example that putting more slashes in makes the lines look better.

lineplot([2:20],[32:50])
    -------------------------------------------------------------
    |                                                           /| 50.00
    |                                                            |
    |                                                       /    |
    |                                                    /       |
    |                                                 /          |
    |                                             /              |
    |                                          /                 |
    |                                       /                    |
    |                                    /                       |
    |                                /                           |
    |                             /                              |
    |                          /                                 |
    |                      /                                     |
    |                   /                                        |
    |                /                                           |
    |             /                                              |
    |         /                                                  |
    |      /                                                     |
    |   /                                                        |
    |/                                                           | 32.00
    -------------------------------------------------------------
    2.00                                                    20.00

More data points are less helpful in a shuffled data set, because they also make the line much more wiggly. lineplot does better the less wiggly the line is, and the more points you provide.
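To put a number on "wiggly," one option is to count how often the line switches between rising and falling. `direction_changes` below is a hypothetical helper for comparing inputs before plotting. ASCIIPlots does not provide it.

```julia
# Hypothetical helper: count how often a series switches between
# rising and falling.
function direction_changes(v::AbstractVector{<:Real})
    s = filter(!iszero, sign.(diff(v)))   # +1 rising, -1 falling; flat steps dropped
    return count(i -> s[i] != s[i + 1], 1:length(s) - 1)
end

direction_changes(collect(2:20))   # 0: the sorted data never changes direction
direction_changes([1, 3, 2, 4])    # 2
```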

lineplot(shuffle!([2:40]),[32:70])
    -------------------------------------------------------------
    |                                                           | 70.00
    |                                                          |
    |                                                          |
    |                                                          |
    |                                                          /|
    |                                                          |
    |                              /                            |
    |                                                          |
    |      /                                                    |
    |                             /        /                     |
    |                                                          |
    |                    /                           /           |
    |          /                                              /  |
    |                                 /                         |
    |         /                                                 |
    |    /                                /                      |
    |                 /                                         |
    |                                             /     /        |
    |               /                                       /    |
    | /                      /                                   | 32.00
    -------------------------------------------------------------
    2.00                                                    40.00

Plotting Real Data

So far we’ve just been drawing lines. I’ve pulled another dataset out of RDatasets: this time, it’s Average Yearly Temperatures in New Haven.

First, we need to read in the file.

file = open("nhtemp.csv")
rawdata = readcsv(file)
close(file)

This CSV has three columns: index, year, temperature.

rawdata
61x3 Array{Any,2}:
 """"        ""time""    ""nhtemp""
 ""1""   1912.0          49.9          
 ""2""   1913.0          52.3          
 ""3""   1914.0          49.4          
 ""4""   1915.0          51.1          
 ""5""   1916.0          49.4          
                                                 
 ""56""  1967.0          50.8          
 ""57""  1968.0          51.9          
 ""58""  1969.0          51.8          
 ""59""  1970.0          51.9          
 ""60""  1971.0          53.0          

We want the years from the second column; they’re all whole numbers, so we want Ints.
To get the temperatures, we want the third column as Float64s.
We can pull out the two interesting columns, minus their header rows, like this:

years = int(rawdata[2:end,2])
temps = float(rawdata[2:end,3])
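`readcsv` and the array-wide `int`/`float` conversions above come from older Julia versions. On Julia 1.x, the usual replacement for `readcsv` is `readdlm(source, ',')` from DelimitedFiles. To stay self-contained, the minimal sketch below uses only Base string functions instead. It reads the first five rows shown above from an in-memory string rather than `nhtemp.csv`, and assumes the file is simple comma-separated text without embedded commas.

```julia
# The first rows of nhtemp.csv as shown above (header plus five years).
csv = """
"","time","nhtemp"
"1",1912,49.9
"2",1913,52.3
"3",1914,49.4
"4",1915,51.1
"5",1916,49.4
"""

# Split into rows and fields, dropping the header row.
rows = [split(line, ',') for line in split(strip(csv), '\n')][2:end]

# Parse years as Float64 first so both "1912" and "1912.0" work;
# Int() then throws an InexactError if a year is not a whole number.
years = [Int(parse(Float64, r[2])) for r in rows]
temps = [parse(Float64, r[3]) for r in rows]    # third column: temperature
```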

The plotting part is also pretty similar to the scatterplot example:

lineplot(years,temps)
    -------------------------------------------------------------
    |                                                           | 54.60
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                            |
    |                                        /                  /|
    |                                      /                  |
    |                                                           |
    |                                      -           // |
    |                  /      /     /              /         |
    |                                   /              /      |
    |                               /       /      /   /    |
    |              / /      /      /                  /         |
    |  /                                                       |
    |        /   /                                               |
    |                            /                               |
    |              /                                             |
    |     /                                                      | 47.90
    -------------------------------------------------------------
    1912.00                                                    1971.00

The plot is OK, but not great. It’s a bit hard to play connect-the-dots with the slashes; the line just moves up and down more than lineplot can handle gracefully. Improving this is probably mostly a matter of trying different approaches to drawing an ASCII line through the points; there’s probably something better than the current one.

Heat Map

I have a lot of trouble remembering this function’s name; it’s called imagesc due to MATLAB tradition.
imagesc takes a matrix (Array{T,2}) as input. There are five different levels of shading, from a blank space to @#.
If you can find more characters that clearly represent other shades, it should be pretty easy to integrate them into imagesc.
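To make the idea of shading levels concrete, here is a minimal sketch that maps a numeric matrix onto five two-character shades, from blank to `@#`. It is a hypothetical reconstruction (`SHADES` and `shade_matrix` are not ASCIIPlots' names or implementation). It scales each value linearly between the matrix's minimum and maximum, and adding a shade is just a matter of extending `SHADES`.

```julia
# Hypothetical shading palette, lightest to darkest; each entry is two
# characters wide so that cells look roughly square.
const SHADES = ["  ", ". ", "+ ", "# ", "@#"]

# Render a numeric matrix as text, one line per row.
function shade_matrix(A::AbstractMatrix{<:Real})
    lo, hi = extrema(A)
    n = length(SHADES)
    # Scale v into 1..n; the maximum value is clamped into the top shade.
    level(v) = hi == lo ? 1 : min(n, 1 + floor(Int, n * (v - lo) / (hi - lo)))
    return join([join(SHADES[level(v)] for v in row) for row in eachrow(A)], "\n")
end

println(shade_matrix(reshape(1:10, 1, 10)))   # "    . . + + # # @#@#"
```

With this rule, the values 1 through 10 fall two per shade, similar to the banding in the gradient examples below.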

Plotting a Matrix

One easy way to produce a two-dimensional Array is with a comprehension over two variables.
Using this approach, we can make gradients that change horizontally, vertically, or both.

The first variable in a two-variable comprehension will vary as you go down a column.

imagesc([x for x=1:10,y=1:10])

Result:

    . . . . . . . . . . 
    . . . . . . . . . . 
    + + + + + + + + + + 
    + + + + + + + + + + 
    # # # # # # # # # # 
    # # # # # # # # # # 
    @#@#@#@#@#@#@#@#@#@#
    @#@#@#@#@#@#@#@#@#@#

The second variable will vary as you go across a row.

imagesc([y for x=1:10,y=1:10])

Result:

        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#

We can also combine the previous two by taking the larger of x and y, which gives a sort of corner gradient.

imagesc([max(x,y) for x=1:10,y=1:10])

Result:

              . . + # @#
              . . + # @#
              . . + # @#
              . . + # @#
              . . + # @#
    . . . . . . . + # @#
    . . . . . . . + # @#
    + + + + + + + + # @#
    # # # # # # # # # @#
    @#@#@#@#@#@#@#@#@#@#
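The row/column behavior of two-variable comprehensions is easier to see in small 3×3 versions of the three matrices above. These are the same comprehensions with smaller ranges, and they need no plotting package.

```julia
# x is the first variable, so it indexes rows: values grow down each column.
A = [x for x = 1:3, y = 1:3]
# y is the second variable, so it indexes columns: values grow along each row.
B = [y for x = 1:3, y = 1:3]
# Taking the larger of the two gives the corner gradient.
C = [max(x, y) for x = 1:3, y = 1:3]

A   # [1 1 1; 2 2 2; 3 3 3]
B   # [1 2 3; 1 2 3; 1 2 3]
C   # [1 2 3; 2 2 3; 3 3 3]
```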

Plotting Real Data

For a final dataset from RDatasets, I’ll use Edgar Anderson’s Iris Data. The data spans three species of iris; for each flower/data-point, they measured the petals and sepals.

file = open("iris.csv")
raw_data = readcsv(file)
close(file)

raw_data has six columns: index, sepal length, sepal width, petal length, petal width, and species. This file has the most columns of any dataset in this post; for making a heatmap, more columns of data means more columns of output.

The first row (the headers) and the first and last columns have string values; everything else is Float64s.

julia> raw_data
151x6 Array{Any,2}:
 """"      ""Sepal.Length""   ""Sepal.Width""   ""Petal.Length""   ""Petal.Width""  ""Species""  
 ""1""    5.1                  3.5                 1.4                  0.2                 ""setosa""   
 ""2""    4.9                  3.0                 1.4                  0.2                 ""setosa""   
 ""3""    4.7                  3.2                 1.3                  0.2                 ""setosa""   
 ""4""    4.6                  3.1                 1.5                  0.2                 ""setosa""   
 ""5""    5.0                  3.6                 1.4                  0.2                 ""setosa""   
 ""6""    5.4                  3.9                 1.7                  0.4                 ""setosa""   
 ""7""    4.6                  3.4                 1.4                  0.3                 ""setosa""   
 ""8""    5.0                  3.4                 1.5                  0.2                 ""setosa""   
 ""9""    4.4                  2.9                 1.4                  0.2                 ""setosa""   
 ""10""   4.9                  3.1                 1.5                  0.1                 ""setosa""   
                                                                                                           
 ""140""  6.9                  3.1                 5.4                  2.1                 ""virginica""
 ""141""  6.7                  3.1                 5.6                  2.4                 ""virginica""
 ""142""  6.9                  3.1                 5.1                  2.3                 ""virginica""
 ""143""  5.8                  2.7                 5.1                  1.9                 ""virginica""
 ""144""  6.8                  3.2                 5.9                  2.3                 ""virginica""
 ""145""  6.7                  3.3                 5.7                  2.5                 ""virginica""
 ""146""  6.7                  3.0                 5.2                  2.3                 ""virginica""
 ""147""  6.3                  2.5                 5.0                  1.9                 ""virginica""
 ""148""  6.5                  3.0                 5.2                  2.0                 ""virginica""
 ""149""  6.2                  3.4                 5.4                  2.3                 ""virginica""
 ""150""  5.9                  3.0                 5.1                  1.8                 ""virginica""

imagesc needs numeric data, so an Array{Float64,2} would be a good fit here. To generate the biggest plot, we want the largest rectangle of floating point values we can get. The middle four columns line up with that goal.

data = float(raw_data[2:end,2:5])
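Choosing the middle four columns by hand works for this file. The minimal sketch below finds the all-numeric columns programmatically, in Julia 1.x syntax. `numeric_columns` is a hypothetical helper, and the small matrix mimics the header and first two rows of `raw_data` shown above.

```julia
# Hypothetical helper: indices of columns whose body (every row below
# the header) holds only numbers.
numeric_columns(raw::AbstractMatrix) =
    [j for j in axes(raw, 2) if all(x -> x isa Real, raw[2:end, j])]

raw = Any[""  "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species";
          "1" 5.1            3.5           1.4            0.2           "setosa";
          "2" 4.9            3.0           1.4            0.2           "setosa"]

cols = numeric_columns(raw)          # [2, 3, 4, 5]
data = Float64.(raw[2:end, cols])    # 2×4 Matrix{Float64}
```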

The rows are sorted by iris species, so we can get a sort of general impression from the plot:

imagesc(data)
    # +     
    # + .   
    # +     
    @##     
    # + .   
    # . .   
    # + .   
    # +     
    # +     
    # .     
    @#+ #   
    @#. #   
    # . +   
    @#+ #   
    @#+ # . 
    @#. #   
    # . +   
    @#+ # . 
    # . #   
    @#. +   
    @#+ @#. 
    @#. @#. 
    @#+ # . 
    @#+ # . 
    @#+ @#. 
    @#+ @#. 
    @#. @#. 
    @#. @#. 
    @#+ # . 
    @#. # . 

The sepal length seems to be higher for the last species; the sepal width has a less clear trajectory.
Petals also seem to be larger in the later examples; both width and length increase.

Conclusion

ASCIIPlots is easy to install and works well at the REPL. I don’t like installation problems and mostly use the REPL (rather than IJulia), so ASCIIPlots is my most-used Julia plotting package. However, there’s room for improvement; here are some features that you could add:

  • Add Unicode character support for scatterplot
  • Use Unicode characters to enhance lineplot and imagesc
  • Integrate imagesc with ImageTerm.jl
  • Change scatterplot to handle multiple datasets, each using a different symbol
  • Make lineplot lines easier to follow
  • Use escape sequences to colorize output, allowing for multiple lines or more imagesc options
  • Add optional axis labels and plot titles in scatterplot and lineplot
  • Add control over axis ranges (rather than only fitting to the data)
  • Add 3D plotting, taking inspiration from 3D ASCII games
  • Add styled HTML output, for use in IJulia notebooks
  • Add a barplot function that takes a Vector or a Dict
  • Add more shades to imagesc

Exploring a new codebase can be intimidating, but it’s the first step to making a pull request. I’m planning to write another blog post about how it’s implemented, but until I find time to take you on a tour, please feel free to read the code, and consider making a pull request.