Author Archives: Leah Hanson

Using ASCIIPlots.jl

No Julia plotting package has been crowned king yet. Winston and Gadfly are the main competitors. PyPlot is a Julia wrapper around Python’s matplotlib; it is a stop-gap for use while the native Julia implementations mature. However, all three of these packages have non-Julia dependencies; this can cause installation frustration. An alternative is ASCIIPlots; it’s the simplest plotting package to install, due to having no dependencies. To be fair, ASCIIPlots is also a tiny package with only basic functionality.

The small size of the package makes it a great target for Julia users looking to make their first contributions to the ecosystem. There are four source files totaling about 250 lines of code; the entire premise is taking in Arrays of numbers and printing out characters. The small size and lack of conceptual complexity make it an approachable package, even for less experienced Julians. I’ll mention new feature ideas throughout this post, in the hopes that some of you will submit pull requests.

Compared to the standard approach of using images, plotting using ASCII characters has some drawbacks, namely low resolution (256 pixels per 12pt character) and few options (2^8 to 2^24 colors vs 95 printable ASCII characters). Currently, ASCIIPlots only uses ASCII characters and does not support color, even if your terminal does. Adding color to any of the plot types would be neat; you could use terminal escape sequences to change the styling.
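As a rough idea of what the escape-sequence approach could look like, here is a minimal sketch. The helper name colorize is made up for illustration and is not part of ASCIIPlots; the numeric codes are the standard ANSI SGR color codes (31 is red, 34 is blue, 0 resets).

```julia
# Wrap a plot symbol in ANSI SGR escape codes so a color-capable terminal
# renders it in the given color. `colorize` is a hypothetical helper, not
# an ASCIIPlots function.
function colorize(sym::Char, code::Int)
    return string("\e[", code, "m", sym, "\e[0m")
end

print(colorize('*', 31))    # a red asterisk
println(colorize('^', 34))  # a blue caret
```

A plotting function could call something like this when it writes each point, leaving the grid layout untouched, since the escape codes take up no visible width in the terminal.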

You can install ASCIIPlots with Pkg.add("ASCIIPlots") at the Julia REPL. This command will clone the repo from GitHub into your ~/.julia/v0.X directory, where all installed Julia packages are stored. When you want to start using ASCIIPlots, you’ll need to run using ASCIIPlots to get access to the package’s functions.

ASCIIPlots exports three functions: scatterplot, lineplot, and imagesc. The first two functions have fairly clear names; the last is a “heatmap” function, with a funny name because Matlab.

Scatter Plots

Of all the ASCIIPlots functions, scatterplot seems to take the least damage from the constraints of ASCII art. The points appear well placed, and it has some logic to handle too many points for its resolution.

scatterplot is happy to accept one or two Vectors (1 dimensional Arrays). If one vector is provided, then its values are the y-values and their indices are the x-values. If two vectors are passed in, then the first will contain the x-values and the second will contain the y-values.

Plotting a Vector

As a first example, let’s plot the integers 10 through 20. This allows us to differentiate the values from the indices.

scatterplot([10:20])

Result:

    -------------------------------------------------------------
    |                                                           ^| 20.00
    |                                                            |
    |                                                     ^      |
    |                                                            |
    |                                               ^            |
    |                                                            |
    |                                         ^                  |
    |                                                            |
    |                                   ^                        |
    |                                                            |
    |                             ^                              |
    |                                                            |
    |                       ^                                    |
    |                                                            |
    |                 ^                                          |
    |                                                            |
    |           ^                                                |
    |                                                            |
    |     ^                                                      |
    |^                                                           | 10.00
    -------------------------------------------------------------
    1.00                                                    11.00

The placement of the points looks pretty good; forming a line is a good sign. We can see the indices are on the horizontal axis, since its range is 1 to 11; the vertical axis has a range of 10 to 20, corresponding to our values.
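To make the placement concrete, here is a minimal sketch of how a value could be mapped onto a character grid of 60 columns and 20 rows, the size of the plots in this post. The helper cell is hypothetical; ASCIIPlots may round differently, so the exact cells can be off by one from the plot above.

```julia
# Map a value v in [lo, hi] onto one of n character cells (1-based).
# Hypothetical helper; this is not the ASCIIPlots implementation.
function cell(v, lo, hi, n)
    hi == lo && return 1   # constant data: avoid dividing by zero
    return 1 + round(Int, (v - lo) / (hi - lo) * (n - 1))
end

ys = collect(10:20)
xs = collect(1:length(ys))

cols = [cell(x, 1, 11, 60) for x in xs]   # horizontal cell for each index
rows = [cell(y, 10, 20, 20) for y in ys]  # vertical cell for each value
```

The smallest value lands in cell 1 and the largest in cell n, with everything else spread linearly in between.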

We can also mix up the values, to see how noisier data looks. I’ll sneak an additional option into this example.

scatterplot(shuffle!([10:20]);sym='*')

sym is an optional named argument; it takes an ASCII character to use for the plotted points. As we saw above, the default is ^.

Result:

    -------------------------------------------------------------
    |*                                                           | 20.00
    |                                                            |
    |                                   *                        |
    |                                                            |
    |     *                                                      |
    |                                                            |
    |                                                     *      |
    |                                                            |
    |                       *                                    |
    |                                                            |
    |                                         *                  |
    |                                                            |
    |                 *                                          |
    |                                                            |
    |                                                           *|
    |                                                            |
    |                             *                              |
    |                                                            |
    |                                               *            |
    |           *                                                | 10.00
    -------------------------------------------------------------
    1.00                                                    11.00

I had been hoping to use a Unicode snowman to plot those points. Alas, ASCIIPlots is true to its name and only uses ASCII characters. Maybe one of you could fix this and add some Unicode support? Plotting with Unicode symbols is pretty important.

Plotting Two Vectors

If we pass in two Vectors, then the first will be the horizontal coordinates and the second will be the vertical coordinates. The Array indices will not be used, other than to match up the two coordinates for each point. We can use two non-overlapping ranges for our Vectors to see which Vector is on which axis.

scatterplot([10:20],[31:41])

Result:

    -------------------------------------------------------------
    |                                                           ^| 41.00
    |                                                            |
    |                                                     ^      |
    |                                                            |
    |                                               ^            |
    |                                                            |
    |                                         ^                  |
    |                                                            |
    |                                   ^                        |
    |                                                            |
    |                             ^                              |
    |                                                            |
    |                       ^                                    |
    |                                                            |
    |                 ^                                          |
    |                                                            |
    |           ^                                                |
    |                                                            |
    |     ^                                                      |
    |^                                                           | 31.00
    -------------------------------------------------------------
    10.00                                                    20.00

It’s not clear to me whether this is the right API. Since Julia has multidimensional Arrays, taking an Array{T,2} with a column of x-values and a column of y-values would make at least as much sense as two vectors. Alternatively, the two-vector version could take a vector of tuples. API design isn’t something I have much experience with, so I’m open to other opinions. In a well-designed API, the signature and name of a function provide a clear idea of how to use it; I’m not sure how to achieve that here.
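For comparison, here is a sketch of what adapters for those two alternative input shapes might look like. The function xy_columns is a made-up name, not part of ASCIIPlots; it only splits the input into the two vectors that scatterplot already accepts.

```julia
# Hypothetical adapters; neither method is part of ASCIIPlots.

# An n-by-2 matrix: first column is x, second column is y.
xy_columns(points::AbstractMatrix) = (points[:, 1], points[:, 2])

# A vector of (x, y) tuples.
xy_columns(points::AbstractVector) = ([p[1] for p in points], [p[2] for p in points])

xs, ys = xy_columns([10 31; 11 32; 12 33])
xs2, ys2 = xy_columns([(10, 31), (11, 32), (12, 33)])
# Either pair could then be passed along as scatterplot(xs, ys).
```

Multiple dispatch makes it cheap to support all three shapes under one function name, which leaves the question of which shape the documentation should treat as the main one.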

Plotting Real Data

While plugging in random data is fine for seeing how the interface works, it doesn’t show how well ASCIIPlots might work for real data.
The RDatasets repo on GitHub has a bunch of small, simple, clean datasets; I’ll be using Monthly Airline Passenger Numbers 1949-1960 here.

The first step is to get the data out of the file and into a reasonable format.

file = open("AirPassengers.csv")
raw_data = readcsv(file)
close(file)

raw_data is an Array{Any,2}. It has three columns: index, time, and passengers. The time format is in fractional years: January of 1950 is 1950.0, February is 1950.08, April is 1950.25, and so on.
The first row is header strings.
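The fractional-year encoding described above can be sketched as a one-line formula. This is inferred from the examples in the data (for instance 1949.17 for March and 1960.92 for December), not taken from the dataset’s documentation, and it is written in current Julia (1.x) syntax.

```julia
# Convert a calendar year and month (1 = January) to a fractional year,
# rounded to two decimals as in the CSV. The formula is an assumption
# inferred from the values in the file.
fractional_year(year, month) = round(year + (month - 1) / 12, digits=2)

fractional_year(1950, 1)   # January 1950
fractional_year(1950, 4)   # April 1950
```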

raw_data
145x3 Array{Any,2}:
 """"         ""time""     ""AirPassengers""
 ""1""    1949.0          112.0                 
 ""2""    1949.08         118.0                 
 ""3""    1949.17         132.0                 
 ""4""    1949.25         129.0                 
 ""5""    1949.33         121.0                 
 ""6""    1949.42         135.0                 
                                                 
 ""138""  1960.42         535.0                 
 ""139""  1960.5          622.0                 
 ""140""  1960.58         606.0                 
 ""141""  1960.67         508.0                 
 ""142""  1960.75         461.0                 
 ""143""  1960.83         390.0                 
 ""144""  1960.92         432.0 

The floating-point representation of a year-month is actually very convenient for us; these will plot in order without any work on our part. We want to get the second column as Float64s, without the first row.

The passenger counts are also written with a decimal point, but are (unsurprisingly) all integers. To get a Vector of these counts, we need the third column as Ints, again without the first row.

months = float(raw_data[2:end,2])
passengers = int(raw_data[2:end,3])
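If you are on a current Julia (1.x), where the array versions of float and int are no longer available, the same conversion can be sketched with broadcasting. The small raw matrix below just copies the first three data rows shown above so the example runs without the CSV file.

```julia
# A few rows shaped like raw_data: a header row, then index, time, passengers.
raw = Any["" "time" "AirPassengers";
          "1" 1949.0 112.0;
          "2" 1949.08 118.0;
          "3" 1949.17 132.0]

# Skip the header row and convert each column element by element.
months = Float64.(raw[2:end, 2])
passengers = round.(Int, raw[2:end, 3])
```

round.(Int, ...) is used here rather than a plain Int conversion only as a precaution; the counts in this file are whole numbers either way.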

Now that we have two numeric Vectors, I’m ready to plot. The months will be the horizontal values and the passenger counts will be the vertical ones.

scatterplot(months,passengers)

Result:

    -------------------------------------------------------------
    |                                                        ^   | 622.00
    |                                                         ^  |
    |                                                            |
    |                                                   ^^       |
    |                                                        ^   |
    |                                               ^         ^  |
    |                                          ^        ^^  ^^ ^ |
    |                                              ^            ^|
    |                                     ^   ^^    ^  ^^ ^^^    |
    |                                                  ^   ^   ^ |
    |                                ^   ^^  ^^   ^^ ^^   ^      |
    |                                ^       ^  ^^^   ^          |
    |                           ^   ^ ^ ^^ ^^^  ^^   ^           |
    |                      ^    ^  ^^ ^^^  ^                     |
    |                 ^   ^^   ^ ^^^                             |
    |                ^^  ^^ ^ ^^ ^^^  ^                          |
    |            ^  ^  ^^^  ^^^  ^                               |
    |       ^  ^^ ^^^^ ^    ^                                    |
    |^ ^^ ^^^^^^   ^                                             |
    |^^ ^^^^  ^                                                  | 104.00
    -------------------------------------------------------------
    1949.00                                                    1960.92

That plot looks pretty reasonable. Due to the poor display resolution, there are multiple values plotted in some columns, despite there being only one y-axis data value per x-axis month. We can see that the data seems a bit noisy and increases over time. My hypothesis is that the noise is due in large part to seasonal variations in passenger counts.

To test this, let’s zoom in on a couple of years to see what they look like:

1949: scatterplot(months[1:12],passengers[1:12])

    -------------------------------------------------------------
    |                                ^    ^                      | 148.00
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^               ^                 |
    |          ^                                                 |
    |                                                            |
    |                ^                                           |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |     ^                                          ^          ^|
    |                                                            |
    |                                                            |
    |^                                                           |
    |                                                            |
    |                                                            |
    |                                                     ^      | 104.00
    -------------------------------------------------------------
    1949.00                                                    1949.92

The first year, 1949, has a spike in the spring (around March) and a bigger one in the summer (peaking in July and August).

1950: scatterplot(months[13:24],passengers[13:24])

    -------------------------------------------------------------
    |                                ^    ^                      | 170.00
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                                            |
    |          ^                                                 |
    |                                                           ^|
    |                ^                                           |
    |                                                ^           |
    |                                                            |
    |     ^                                                      |
    |                     ^                                      |
    |                                                            |
    |                                                            |
    |^                                                    ^      | 114.00
    -------------------------------------------------------------
    1950.00                                                    1950.92

The second year has about the same spikes (March and July/August).

1959: scatterplot(months[end-23:end-12],passengers[end-23:end-12])

    -------------------------------------------------------------
    |                                     ^                      | 559.00
    |                                ^                           |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |          ^                                     ^          ^|
    |                ^                                           |
    |                                                            |
    |                                                            |
    |^                                                    ^      |
    |     ^                                                      | 342.00
    -------------------------------------------------------------
    1959.00                                                    1959.92

In the second to last year, the March spike is much smaller, but still there; July and August are still the peak travel months.

1960: scatterplot(months[end-11:end],passengers[end-11:end])

    -------------------------------------------------------------
    |                                ^                           | 622.00
    |                                                            |
    |                                     ^                      |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                                                            |
    |                          ^                                 |
    |                                                            |
    |                                          ^                 |
    |                                                            |
    |                                                            |
    |                     ^                                      |
    |                ^                               ^           |
    |                                                            |
    |                                                           ^|
    |^         ^                                                 |
    |                                                            |
    |     ^                                               ^      | 390.00
    -------------------------------------------------------------
    1960.00                                                    1960.92

The final year seems to lack the March spike, but still has the overall peak in July/August.

These seasonal variations probably contribute a lot to the spread of the numbers in the 1949-1960 chart. The lowest month for each of these four years has about two-thirds the number of passengers for the highest month. As the number of passengers per year increases, so does the spread, despite still being one-third of the peak.
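The arithmetic behind that comparison can be checked quickly. The sketch below uses the lowest and highest counts read off the vertical axis labels of the four yearly plots above, on the assumption that those labels are the data’s minimum and maximum; it uses current Julia (1.x) syntax.

```julia
# (year, lowest month, highest month), read off the axis labels above.
extremes = [(1949, 104, 148), (1950, 114, 170), (1959, 342, 559), (1960, 390, 622)]

ratios  = [round(lo / hi, digits=2) for (year, lo, hi) in extremes]
spreads = [hi - lo for (year, lo, hi) in extremes]   # [44, 56, 217, 232]

for (i, (year, lo, hi)) in enumerate(extremes)
    println(year, ": low/high = ", ratios[i], ", spread = ", spreads[i])
end
```

The ratio stays in a narrow band while the absolute spread grows roughly fivefold between 1949 and 1960.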

Line Plots

The interface for lineplot is identical to scatterplot: one or two Vectors, which control the axes in the same way as above. The difference is in the characters used to plot the points. When ASCIIPlots tries to draw a line, it picks /s, \s, and -s in order to show the slope of the line at each point.
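A minimal sketch of that idea, choosing a character from the direction of change between neighboring points. The rule below is a guess for illustration, and slope_char is a hypothetical name; the package’s own logic may well use thresholds or look at both neighbors.

```julia
# Pick a character for the segment ending at a point, from the change in y.
# Hypothetical rule; not taken from the ASCIIPlots source.
function slope_char(dy)
    dy > 0 && return '/'    # rising
    dy < 0 && return '\\'   # falling
    return '-'              # flat
end

ys = [11, 13, 13, 12]
chars = [slope_char(ys[i] - ys[i-1]) for i in 2:length(ys)]
```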

Plotting a Vector

First, I’ll plot a line. With the name lineplot, you might have some high expectations of the output here.

lineplot([11:20])

Result:

    -------------------------------------------------------------
    |                                                           /| 20.00
    |                                                            |
    |                                                            |
    |                                                    /       |
    |                                                            |
    |                                             /              |
    |                                                            |
    |                                       /                    |
    |                                                            |
    |                                /                           |
    |                                                            |
    |                          /                                 |
    |                                                            |
    |                   /                                        |
    |                                                            |
    |             /                                              |
    |                                                            |
    |      /                                                     |
    |                                                            |
    |/                                                           | 11.00
    -------------------------------------------------------------
    1.00                                                    10.00

It’s not terrible; you can see the linear-ness and easily play connect-the-dots with the slashes.

We can make things a lot harder for lineplot by shuffling the data around, so that it’s not linear.

lineplot(shuffle!([11:20]))

Result:

    -------------------------------------------------------------
    |                                                           | 20.00
    |                                                            |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |                   /                                        |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                           |
    |                                                            |
    |      /                                                     |
    |                                                            |
    |                                                           |
    |                                                            |
    |                                             /              | 11.00
    -------------------------------------------------------------
    1.00                                                    10.00

lineplot’s output is not as good as I would like here; I find it much harder to connect-the-slashes. Part of the problem is the number of points I gave it versus the resolution it’s using. Despite the fact that more columns of characters fit between my data points, lineplot does not fill in more slashes. Filling in would be more useful here, where there’s a large vertical gap between points, than it would be for the previous example.
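One way filling in could work is linear interpolation between consecutive points, so that every character column between two data points gets its own y-value. This is only a sketch of a possible contribution, not how lineplot behaves today; fill_columns is a made-up name, and it assumes the column positions are strictly increasing.

```julia
# Linearly interpolate between consecutive points so that every column from
# the first point to the last has a y-value. Assumes cols is strictly
# increasing. Hypothetical helper, not part of ASCIIPlots.
function fill_columns(cols::Vector{Int}, ys::Vector{Float64})
    filled = Tuple{Int,Float64}[]
    for i in 1:length(cols)-1
        c0, c1 = cols[i], cols[i+1]
        for c in c0:(c1 - 1)
            t = (c - c0) / (c1 - c0)   # fraction of the way from c0 to c1
            push!(filled, (c, ys[i] + t * (ys[i+1] - ys[i])))
        end
    end
    push!(filled, (cols[end], ys[end]))
    return filled
end

pts = fill_columns([1, 5, 9], [11.0, 19.0, 15.0])
```

Each interpolated point could then be drawn with a slash matching the direction of its segment, which would make steep jumps easier to follow.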

Plotting Two Vectors

We can see in this example that putting more slashes in makes the lines look better.

lineplot([2:20],[32:50])
    -------------------------------------------------------------
    |                                                           /| 50.00
    |                                                            |
    |                                                       /    |
    |                                                    /       |
    |                                                 /          |
    |                                             /              |
    |                                          /                 |
    |                                       /                    |
    |                                    /                       |
    |                                /                           |
    |                             /                              |
    |                          /                                 |
    |                      /                                     |
    |                   /                                        |
    |                /                                           |
    |             /                                              |
    |         /                                                  |
    |      /                                                     |
    |   /                                                        |
    |/                                                           | 32.00
    -------------------------------------------------------------
    2.00                                                    20.00

More data points are less helpful in a shuffled data set, because it also makes the line a lot more wiggly. lineplot does better the less wiggly the line is, and the more points you provide for it.

lineplot(shuffle!([2:20]),[32:50])
    -------------------------------------------------------------
    |                                                           | 70.00
    |                                                          |
    |                                                          |
    |                                                          |
    |                                                          /|
    |                                                          |
    |                              /                            |
    |                                                          |
    |      /                                                    |
    |                             /        /                     |
    |                                                          |
    |                    /                           /           |
    |          /                                              /  |
    |                                 /                         |
    |         /                                                 |
    |    /                                /                      |
    |                 /                                         |
    |                                             /     /        |
    |               /                                       /    |
    | /                      /                                   | 32.00
    -------------------------------------------------------------
    2.00                                                    40.00

Plotting Real Data

So far we’ve just been drawing lines. I’ve pulled another dataset out of RDatasets: this time, it’s Average Yearly Temperature in New Haven.

First, we need to read in the file.

file = open("nhtemp.csv")
rawdata = readcsv(file)
close(file)

This CSV has three columns: index, year, temperature.

rawdata
61x3 Array{Any,2}:
 """"        ""time""    ""nhtemp""
 ""1""   1912.0          49.9          
 ""2""   1913.0          52.3          
 ""3""   1914.0          49.4          
 ""4""   1915.0          51.1          
 ""5""   1916.0          49.4          
                                                 
 ""56""  1967.0          50.8          
 ""57""  1968.0          51.9          
 ""58""  1969.0          51.8          
 ""59""  1970.0          51.9          
 ""60""  1971.0          53.0          

We want the years from the second column; this time they’re all integers so we want Ints.
To get the temperatures, we want the third column as Float64s.
We can pull out the two interesting columns, minus their header rows, like this:

years = int(rawdata[2:end,2])
temps = float(rawdata[2:end,3])

The plotting part is also pretty similar to the scatterplot example:

lineplot(years,temps)
    -------------------------------------------------------------
    |                                                           | 54.60
    |                                                            |
    |                                                           |
    |                                                            |
    |                                                            |
    |                                        /                  /|
    |                                      /                  |
    |                                                           |
    |                                      -           // |
    |                  /      /     /              /         |
    |                                   /              /      |
    |                               /       /      /   /    |
    |              / /      /      /                  /         |
    |  /                                                       |
    |        /   /                                               |
    |                            /                               |
    |              /                                             |
    |     /                                                      | 47.90
    -------------------------------------------------------------
    1912.00                                                    1971.00

The plot is ok, but not great. It’s a bit hard to play connect the dots with the slashes; the line just moves up & down more than lineplot can handle gracefully. Making this better is probably mostly about fiddling with different approaches to drawing an ASCII line from points; there’s probably something better than the current approach.

Heat Map

I have a lot of trouble remembering this function’s name; it’s called imagesc due to Matlab tradition.
imagesc takes a matrix (Array{T,2}) as input. There are five different levels of shading, from blank (two spaces) to @#.
If you can find more characters that clearly represent other shades, it should be pretty easy to integrate them into imagesc.
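To show where extra shades would slot in, here is a sketch of mapping values to the five two-character shades. The equal-width binning is an assumption for illustration and shade is a hypothetical helper; the package may divide the range differently.

```julia
# The five two-character shades, lightest to darkest.
const SHADES = ["  ", ". ", "+ ", "# ", "@#"]

# Map v in [lo, hi] to a shade by splitting the range into equal bins.
# The binning rule is assumed, not taken from the ASCIIPlots source.
function shade(v, lo, hi)
    hi == lo && return SHADES[1]
    i = 1 + floor(Int, (v - lo) / (hi - lo) * length(SHADES))
    return SHADES[min(i, length(SHADES))]   # v == hi falls in the last bin
end

row = join([shade(v, 1, 10) for v in 1:10])
```

With a table like SHADES, adding a sixth shade is a matter of inserting one more string in the right position; the binning adapts to the new length.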

Plotting a Matrix

One easy way to produce a two-dimensional Array is with a comprehension over two variables.
Using this approach, we can make gradients that change horizontally, vertically, or both.

The first variable in a two-variable comprehension will vary as you go down a column.

imagesc([x for x=1:10,y=1:10])

Result:

    . . . . . . . . . . 
    . . . . . . . . . . 
    + + + + + + + + + + 
    + + + + + + + + + + 
    # # # # # # # # # # 
    # # # # # # # # # # 
    @#@#@#@#@#@#@#@#@#@#
    @#@#@#@#@#@#@#@#@#@#

The second variable will vary as you go across a row.

imagesc([y for x=1:10,y=1:10])

Result:

        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
        . . + + # # @#@#
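The index order behind the two gradients above can be checked without plotting. This small sketch (assuming a recent Julia release, where indexing a single row returns a Vector) builds a 3x2 matrix each way and inspects one column and one row.

```julia
# The first comprehension variable indexes rows, the second indexes columns.
a = [x for x = 1:3, y = 1:2]   # values change going down each column
b = [y for x = 1:3, y = 1:2]   # values change going across each row

println(size(a))     # rows come from x, columns from y
println(a[:, 1])     # first column of a
println(b[1, :])     # first row of b
```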

We can also intersect the previous two to get a sort of corner gradient.

imagesc([max(x,y) for x=1:10,y=1:10])

Result:

              . . + # @#
              . . + # @#
              . . + # @#
              . . + # @#
              . . + # @#
    . . . . . . . + # @#
    . . . . . . . + # @#
    + + + + + + + + # @#
    # # # # # # # # # @#
    @#@#@#@#@#@#@#@#@#@#

Plotting Real Data

For a final dataset from RDatasets, I’ll use Edgar Anderson’s Iris Data. The data spans three species of iris; for each flower/data-point, they measured the petals and sepals.

file = open("iris.csv")
raw_data = readcsv(file)
close(file)

raw_data has six columns: index, sepal length, sepal width, petal length, petal width, and species. This file has the most columns of any dataset in this post; for making a heatmap, more columns of data means more columns of output.

The first row (the headers) and the first and last columns have string values; everything else is Float64s.

julia> raw_data= readcsv(file)
151x6 Array{Any,2}:
 """"      ""Sepal.Length""   ""Sepal.Width""   ""Petal.Length""   ""Petal.Width""  ""Species""  
 ""1""    5.1                  3.5                 1.4                  0.2                 ""setosa""   
 ""2""    4.9                  3.0                 1.4                  0.2                 ""setosa""   
 ""3""    4.7                  3.2                 1.3                  0.2                 ""setosa""   
 ""4""    4.6                  3.1                 1.5                  0.2                 ""setosa""   
 ""5""    5.0                  3.6                 1.4                  0.2                 ""setosa""   
 ""6""    5.4                  3.9                 1.7                  0.4                 ""setosa""   
 ""7""    4.6                  3.4                 1.4                  0.3                 ""setosa""   
 ""8""    5.0                  3.4                 1.5                  0.2                 ""setosa""   
 ""9""    4.4                  2.9                 1.4                  0.2                 ""setosa""   
 ""10""   4.9                  3.1                 1.5                  0.1                 ""setosa""   
                                                                                                           
 ""140""  6.9                  3.1                 5.4                  2.1                 ""virginica""
 ""141""  6.7                  3.1                 5.6                  2.4                 ""virginica""
 ""142""  6.9                  3.1                 5.1                  2.3                 ""virginica""
 ""143""  5.8                  2.7                 5.1                  1.9                 ""virginica""
 ""144""  6.8                  3.2                 5.9                  2.3                 ""virginica""
 ""145""  6.7                  3.3                 5.7                  2.5                 ""virginica""
 ""146""  6.7                  3.0                 5.2                  2.3                 ""virginica""
 ""147""  6.3                  2.5                 5.0                  1.9                 ""virginica""
 ""148""  6.5                  3.0                 5.2                  2.0                 ""virginica""
 ""149""  6.2                  3.4                 5.4                  2.3                 ""virginica""
 ""150""  5.9                  3.0                 5.1                  1.8                 ""virginica""

imagesc needs numeric data, so an Array{Float64,2} would be a good fit here. To generate the biggest plot, we want the largest rectangle of floating point values we can get. The middle four columns line up with that goal.

data = raw_data[2:end,2:5]
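Slicing an Array{Any,2} gives back another Array{Any,2}, even when every remaining cell is a Float64. If a concretely typed matrix is wanted, it can be converted explicitly. The sketch below uses a tiny stand-in matrix (raw is hypothetical, with values copied from the first two iris rows) rather than the real file, and assumes a recent Julia release.

```julia
# Stand-in for the parsed CSV: a header row plus string columns on either side.
raw = Any["" "Sepal.Length" "Sepal.Width" "Species";
          "1" 5.1 3.5 "setosa";
          "2" 4.9 3.0 "setosa"]

slice = raw[2:end, 2:3]                       # still an Array{Any,2}
numeric = convert(Array{Float64,2}, slice)    # now an Array{Float64,2}

println(eltype(slice), " -> ", eltype(numeric))
```

The conversion throws an error if any cell is not numeric, which is a cheap way to confirm the slice really excludes the header row and the string columns.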

The rows are sorted by iris species, so we can get a sort of general impression from the plot:

imagesc(data)
    # +     
    # + .   
    # +     
    @##     
    # + .   
    # . .   
    # + .   
    # +     
    # +     
    # .     
    @#+ #   
    @#. #   
    # . +   
    @#+ #   
    @#+ # . 
    @#. #   
    # . +   
    @#+ # . 
    # . #   
    @#. +   
    @#+ @#. 
    @#. @#. 
    @#+ # . 
    @#+ # . 
    @#+ @#. 
    @#+ @#. 
    @#. @#. 
    @#. @#. 
    @#+ # . 
    @#. # . 

The sepal length seems to be higher for the last species; the sepal width has a less clear trajectory.
Petals also seem to be larger in the later examples; both width and length increase.

Conclusion

ASCIIPlots is easy to install and works well at the REPL. I don’t like installation problems and mostly use the REPL (rather than IJulia), so ASCIIPlots is my most-used Julia plotting package. However, there’s room for improvement; here are some features that you could add:

  • Add Unicode character support for scatterplot
  • Use Unicode characters to enhance lineplot and imagesc
  • Integrate imagesc with ImageTerm.jl
  • Change scatterplot to handle multiple datasets, each using a different symbol
  • Make lineplot lines easier to follow
  • Use escape sequences to colorize output, allowing for multiple lines or more imagesc options
  • Add optional axis labels and plot titles in scatterplot and lineplot
  • Add control over axis ranges (rather than only fitting to the data)
  • Add 3D plotting, taking inspiration from 3D ASCII games
  • Add styled html output, for using in IJulia notebooks
  • Add a barplot function, that takes a Vector or a Dict
  • Add more shades to imagesc

Exploring a new codebase can be intimidating, but it’s the first step to making a pull request. I’m planning to write another blog post about how it’s implemented, but until I find time to take you on a tour, please feel free to read the code, and consider making a pull request.

Julia Helps

Programmers are always learning. You learn how to use new APIs, new functions, new types. You learn why your code doesn’t work.

Julia has a lot of built-in tools to help you navigate, learn, and debug. You can use these in the REPL and in normal code.

In this post, I split these functions and macros up based on what they help you do: understand functions, examine types, navigate the type hierarchy, or debug code.

Exploring New Functions

If you’re in the REPL, you’re probably playing with something new, trying to make it work. You use new-to-you functions, which means the question “how do I use this function?” comes up frequently.

methods

The most basic answer to this question is to list the method signatures for the function in question. You can often guess what a method does just from the arguments’ names and types. If your problem is passing the arguments in the wrong order, this will solve it.

In [2]:
methods(open)
Out[2]:
# 4 methods for generic function "open":
open(fname::String,rd::Bool,wr::Bool,cr::Bool,tr::Bool,ff::Bool) at io.jl:316
open(fname::String) at io.jl:327
open(fname::String,mode::String) at io.jl:330
open(f::Function,args...) at io.jl:340
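The same works for functions you define yourself. A minimal sketch, assuming a recent Julia release; area is a made-up function with two methods.

```julia
# A hypothetical function with one method per calling convention.
area(r::Real) = pi * r^2            # circle, from its radius
area(w::Real, h::Real) = w * h      # rectangle, from width and height

ms = methods(area)
println(length(ms), " methods")

# hasmethod answers the narrower question "can I call it with these types?"
println(hasmethod(area, Tuple{Int, Int}))
println(hasmethod(area, Tuple{String}))
```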

help

While you can get a lot from a verb and the list of nouns/types it works on, a hand-written description of what the function does is even better. We can get those at the REPL, too. The output of help is exactly the same as the online function documentation; they’re generated from the same source files.

Currently, help will only work for functions in the base libraries. For packages, you’ll have to read their documentation online. However, there is good coverage for functions in base.

In [1]:
help(open)
Loading help data...
Base.open(file_name[, read, write, create, truncate, append]) -> IOStream

   Open a file in a mode specified by five boolean arguments. The
   default is to open files for reading only. Returns a stream for
   accessing the file.

Base.open(file_name[, mode]) -> IOStream

   Alternate syntax for open, where a string-based mode specifier is
   used instead of the five booleans. The values of "mode"
   correspond to those from "fopen(3)" or Perl "open", and are
   equivalent to setting the following boolean groups:

   +------+-----------------------------------+
   | r    | read                              |
   +------+-----------------------------------+
   | r+   | read, write                       |
   +------+-----------------------------------+
   | w    | write, create, truncate           |
   +------+-----------------------------------+
   | w+   | read, write, create, truncate     |
   +------+-----------------------------------+
   | a    | write, create, append             |
   +------+-----------------------------------+
   | a+   | read, write, create, append       |
   +------+-----------------------------------+

Base.open(f::function, args...)

   Apply the function "f" to the result of "open(args...)" and
   close the resulting file descriptor upon completion.

   **Example**: "open(readall, "file.txt")"

Because help is so useful, there’s even a shorthand for it. You can use ? to call the help function:

In [2]:
?help

 Welcome to Julia. The full manual is available at

    http://docs.julialang.org

 To get help, try help(function), help("@macro"), or help("variable").
 To search all help text, try apropos("string"). To see available functions,
 try help(category), for one of the following categories:

  "Getting Around"
  "All Objects"
  "Types"
  "Generic Functions"
  "Syntax"
  "Iteration"
  "General Collections"
  "Iterable Collections"
  "Indexable Collections"
  "Associative Collections"
  "Set-Like Collections"
  "Dequeues"
  "Strings"
  "I/O"
  "Network I/O"
  "Text I/O"
  "Multimedia I/O"
  "Memory-mapped I/O"
  "Mathematical Operators"
  "Mathematical Functions"
  "Data Formats"
  "Numbers"
  "BigFloats"
  "Random Numbers"
  "Arrays"
  "Combinatorics"
  "Statistics"
  "Signal Processing"
  "Numerical Integration"
  "Parallel Computing"
  "Distributed Arrays"
  "System"
  "C Interface"
  "Errors"
  "Tasks"
  "Events"
  "Reflection"
  "Internals"
  "Collections and Data Structures"
  "Constants"
  "Filesystem"
  "Graphics"
  "Linear Algebra"
  "BLAS Functions"
  "Package Manager Functions"
  "Profiling"
  "Sorting and Related Functions"
  "Sparse Matrices"
  "Unit and Functional Testing"

Exploring New Types

Julia is a dynamically typed language where you can talk about types; types are first class values. Just as for functions, there are tools for helping you understand what you can do with unfamiliar types.

typeof

If you have a value, but aren’t sure of its type, you can use typeof.

In [11]:
typeof(5.0)
Out[11]:
Float64

The type that represents types is DataType. typeof is not just printing out a name; it is returning the type as a value.

In [12]:
typeof(Float64)
Out[12]:
DataType
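Because the result is an ordinary value, it can be stored in a variable, compared, and passed to other functions. A short sketch, assuming a recent Julia release:

```julia
t = typeof(5.0)          # t holds the type itself, not a string

println(t == Float64)    # types can be compared like any other value
println(zero(t))         # and passed to functions that accept a type
println(t(3))            # or called as a constructor
```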

methods

Types in Julia define special constructor functions of the same name as the type, like in OO languages. For other functions, you use the methods function to find out what combinations of arguments it can take; this also works for type constructors.

In [15]:
methods(Dict)
Out[15]:
# 6 methods for generic function "Dict":
Dict() at dict.jl:300
Dict{K,V}(ks::AbstractArray{K,N},vs::AbstractArray{V,N}) at dict.jl:302
Dict{K,V}(ks::(K...,),vs::(V...,)) at dict.jl:306
Dict{K}(ks::(K...,),vs::(Any...,)) at dict.jl:307
Dict{V}(ks::(Any...,),vs::(V...,)) at dict.jl:308
Dict(ks,vs) at dict.jl:303

names

Sometimes, you’ll get a new type and want to know not just what methods are already defined, but the structure of the type itself. Types in Julia are like records or structs in other languages: they have named properties. names will list the name of each property. These are Symbols rather than Strings because that is how Julia represents identifiers (variable names, etc.).

In [4]:
names(IOStream)
Out[4]:
3-element Array{Any,1}:
 :handle
 :ios   
 :name  

types

You can also get the types of the fields. They are stored in the types field of a DataType, as a tuple of DataTypes in the same order as the names returned by names.

In [3]:
IOStream.types
Out[3]:
(Ptr{None},Array{Uint8,1},String)
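In recent Julia releases, the field-listing role of names is played by fieldnames, and fieldtypes returns the matching types (names now lists what a module exports). A minimal sketch, assuming Julia 1.1 or later; Reading is a made-up type.

```julia
# A hypothetical record type with two fields.
struct Reading
    value::Float64
    count::Int
end

println(fieldnames(Reading))   # field names, as a tuple of Symbols
println(fieldtypes(Reading))   # field types, in the same order
```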

methodswith

Once you have a value and know its type, you want to know what can be done with it.

When you want to shell out to other programs from Julia, you create Cmds. Creating one is easy — just put in backticks what you’d type at the command line: `echo hi`. However, creating a Cmd doesn’t actually run it: you just get a value.

What can you do with your Cmd? Just ask methodswith, which prints out method signatures for all methods that take that type.

In [2]:
methodswith(Cmd)
spawn(pc::Union(ProcessChain,Bool),cmd::Cmd,stdios::(Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream),Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream),Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream)),exitcb::Union(Function,Bool),closecb::Union(Function,Bool)) at process.jl:324
ignorestatus(cmd::Cmd) at process.jl:131
show(io::IO,cmd::Cmd) at process.jl:32
detach(cmd::Cmd) at process.jl:133
readdir(cmd::Cmd) at deprecated.jl:19
setenv{S<:Union(ASCIIString,UTF8String)}(cmd::Cmd,env::Array{S<:Union(ASCIIString,UTF8String),N}) at process.jl:135
setenv(cmd::Cmd,env::Associative{K,V}) at process.jl:136

In the case of Cmd, that’s not so helpful. I still don’t see a way to run my Cmd. 🙁

However, that’s not all you can do with a Cmd or the methodswith function. Passing true as the second argument will show all of the methods that take Cmd or any of its supertypes. (Be prepared for a very long list for most types.)

In [1]:
methodswith(Cmd,true)
spawn(pc::Union(ProcessChain,Bool),cmd::Cmd,stdios::(Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream),Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream),Union(File,IOStream,FileRedirect,Ptr{None},AsyncStream)),exitcb::Union(Function,Bool),closecb::Union(Function,Bool)) at process.jl:324
spawn(pc::Union(ProcessChain,Bool),cmds::AbstractCmd,args...) at process.jl:370
spawn(cmds::AbstractCmd,args...) at process.jl:371
ignorestatus(cmd::Cmd) at process.jl:131
readall(cmd::AbstractCmd) at process.jl:437
readall(cmd::AbstractCmd,stdin::AsyncStream) at process.jl:437
|(a::AbstractCmd,b::AbstractCmd) at deprecated.jl:19
show(io::IO,cmd::Cmd) at process.jl:32
success(cmd::AbstractCmd) at process.jl:472
readsfrom(cmds::AbstractCmd) at process.jl:409
readsfrom(cmds::AbstractCmd,stdin::AsyncStream) at process.jl:411
detach(cmd::Cmd) at process.jl:133
<(a::AbstractCmd,b::String) at deprecated.jl:19
writesto(cmds::AbstractCmd,stdout::AsyncStream) at process.jl:417
writesto(cmds::AbstractCmd) at process.jl:420
&(left::AbstractCmd,right::AbstractCmd) at process.jl:138
run(cmds::AbstractCmd,args...) at process.jl:452
>>(src::AbstractCmd,dest::String) at process.jl:151
>(a::Union(File,IOStream,FileRedirect,AsyncStream),b::AbstractCmd) at deprecated.jl:19
>(a::String,b::AbstractCmd) at deprecated.jl:19
>(a::AbstractCmd,b::Union(File,IOStream,FileRedirect,AsyncStream)) at deprecated.jl:19
>(a::AbstractCmd,b::String) at deprecated.jl:19
eachline(cmd::AbstractCmd,stdin) at process.jl:400
eachline(cmd::AbstractCmd) at process.jl:406
readdir(cmd::Cmd) at deprecated.jl:19
readbytes(cmd::AbstractCmd) at process.jl:428
readbytes(cmd::AbstractCmd,stdin::AsyncStream) at process.jl:428
readandwrite(cmds::AbstractCmd) at process.jl:423
setenv{S<:Union(ASCIIString,UTF8String)}(cmd::Cmd,env::Array{S<:Union(ASCIIString,UTF8String),N}) at process.jl:135
setenv(cmd::Cmd,env::Associative{K,V}) at process.jl:136
.>(src::AbstractCmd,dest::AbstractCmd) at process.jl:140
.>(src::AbstractCmd,dest::Union(File,IOStream,FileRedirect,AsyncStream)) at process.jl:145
.>(src::AbstractCmd,dest::String) at process.jl:150

As you can see, most of the relevant methods are defined for AbstractCmd rather than Cmd. You can also see both the relevant execution functions (run,readall,readsfrom,writesto,readandwrite,etc) and the redirection ones (|,&,>,>>,etc). (Julia parses the Cmd and execs the process itself, so there’s no shell involved; instead, you use Julia code for redirection and globs. For more on Cmd see these blog posts or the manual.)
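The same effect can be reproduced on a small scale. The sketch below assumes a recent Julia release, where methodswith lives in the InteractiveUtils standard library and the supertype switch is the keyword supertypes=true rather than a positional true. Shape, Circle, describe, and radius are made-up names; restricting the search to a single function keeps the output short.

```julia
using InteractiveUtils   # provides methodswith outside the REPL

abstract type Shape end
struct Circle <: Shape
    r::Float64
end

describe(s::Shape) = "a shape"   # defined on the abstract supertype
radius(c::Circle) = c.r          # defined on the concrete type

# Without supertypes, only methods that mention Circle itself are found.
println(length(methodswith(Circle, radius)))
println(length(methodswith(Circle, describe)))

# With supertypes=true, methods defined for Shape show up as well.
println(length(methodswith(Circle, describe; supertypes=true)))
```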

Exploring the Type Hierarchy

In Julia, types are not just individual, unconnected values. They are organized into a hierarchy, as in most languages.

super

Each type has one supertype; you can find out what it is by using the super function.

The type hierarchy is a connected graph: you can follow a path of supertypes up from any node to Any (whose supertype is Any). Let’s do that in code, starting from Float64.

In [5]:
super(Float64)
Out[5]:
FloatingPoint
In [7]:
super(FloatingPoint)
Out[7]:
Real
In [8]:
super(Real)
Out[8]:
Number
In [9]:
super(Number)
Out[9]:
Any
In [10]:
super(Any)
Out[10]:
Any
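The walk up to Any can also be automated. This sketch assumes a recent Julia release, where super has been renamed supertype and FloatingPoint has been renamed AbstractFloat; ancestors is a made-up helper name.

```julia
# Collect a type and all of its supertypes, stopping once Any is reached.
function ancestors(T::Type)
    chain = Type[T]
    while T != Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

println(ancestors(Float64))
```

The loop stops at Any instead of calling supertype again, since Any is its own supertype and the walk would never end.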

subtypes

We can also go in the other direction. Let’s see what the subtypes of Any are.

In [17]:
subtypes(Any)
Out[17]:
182-element Array{Any,1}:
 AbstractArray{T,N}                                                                                   
 AbstractCmd                                                                                          
 AbstractRNG                                                                                          
 Algorithm                                                                                            
 Any                                                                                                  
 Associative{K,V}                                                                                     
 AsyncWork                                                                                            
 Available                                                                                            
 BoundingBox                                                                                          
 Box                                                                                                  
 BuildInfo                                                                                            
 BuildStep                                                                                            
 CPUinfo                                                                                              
 ⋮                                                                                                    
 Vec2                                                                                                 
 VersionInterval                                                                                      
 VersionNumber                                                                                        
 VersionSet                                                                                           
 VersionWeight                                                                                        
 WeakRef                                                                                              
 Worker                                                                                               
 Zip                                                                                                  
 c_CholmodDense{T<:Union(Complex{Float64},Complex{Float32},Float64,Float32)}                          
 c_CholmodFactor{Tv<:Union(Complex{Float64},Complex{Float32},Float64,Float32),Ti<:Union(Int64,Int32)} 
 c_CholmodSparse{Tv<:Union(Complex{Float64},Complex{Float32},Float64,Float32),Ti<:Union(Int64,Int32)} 
 c_CholmodTriplet{Tv<:Union(Complex{Float64},Complex{Float32},Float64,Float32),Ti<:Union(Int64,Int32)}

subtypes is returning actual instances of DataType, which can be passed back into itself.

In [23]:
subtypes(Real)
Out[23]:
4-element Array{Any,1}:
 FloatingPoint       
 Integer             
 MathConst{sym}      
 Rational{T<:Integer}
In [27]:
subtypes(subtypes(subtypes(Real)[2])[end-1])
Out[27]:
5-element Array{Any,1}:
 Int128
 Int16 
 Int32 
 Int64 
 Int8  
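Since the results can be fed back in, a short recursive helper can walk the whole tree below a type instead of chaining calls by hand. A minimal sketch, assuming a recent Julia release (where subtypes lives in the InteractiveUtils standard library); leaf_types is a made-up name.

```julia
using InteractiveUtils   # provides subtypes outside the REPL

# Return every type below T that has no subtypes of its own.
function leaf_types(T)
    subs = subtypes(T)
    isempty(subs) && return [T]
    return reduce(vcat, [leaf_types(S) for S in subs])
end

println(leaf_types(Integer))
```

The exact list depends on which packages are loaded, since any package can add new subtypes to an abstract type.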

issubtype

Some interesting type relations span more than a single generation. Stepping around using super and subtypes makes exploring these by hand tedious.

issubtype is a function to tell you if its first argument is a descendent of its second argument. One reason this is useful is that if you have a method to handle the second argument, then you don’t have to worry if there’s an implementation for the first — it will use the implementation for the closest type ancestor that has one.

In [29]:
issubtype(Int,Integer)
Out[29]:
true
In [31]:
issubtype(Float64,Real)
Out[31]:
true
In [32]:
issubtype(Any,DataType)
Out[32]:
false
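The fallback behaviour described above is easy to see with two methods of a made-up function, kind. This sketch assumes a recent Julia release, where the subtype test is written with the <: operator rather than issubtype.

```julia
kind(x::Number) = "some number"    # general fallback
kind(x::Integer) = "an integer"    # more specific method

println(Int <: Integer)      # Int is a descendant of Integer
println(Float64 <: Integer)  # Float64 is not

println(kind(3))     # uses the Integer method
println(kind(2.5))   # no Float64 method exists, so the Number method is used
```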

Debugging

Once you’ve written your code, the built-in tools can continue to help you as you make it work correctly. Rather than showing you what’s available, these tools help you see what your code is actually doing.

Better print statement debugging with @show

While Julia has an interactive debugging package, it also has a @show macro that makes println debugging easier and more useful.

The @show macro does two things:

  1. Print out a representation of the expression and the value it evaluates to
  2. Return that value

In [9]:
@show 2 + 2
+(2,2) => 4

Out[9]:
4

That second thing is important for embedding @show calls in the middle of expressions. Because it returns the resulting value, it is equivalent to the original expression.

In [28]:
x = 5
y = x + @show x * 2
*(x,2) => 10

Out[28]:
15
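The same property makes it safe to drop @show into the middle of a function without restructuring it. A small sketch, assuming a recent Julia release (the printed format differs from the older output shown above); mean_of_squares is a made-up function.

```julia
function mean_of_squares(v)
    squares = @show v .^ 2                     # prints the expression and its value
    return @show(sum(squares)) / length(v)     # parentheses limit what @show wraps
end

result = mean_of_squares([1, 2, 3])
println(result)
```

Removing both @show calls leaves the function’s result unchanged, which is exactly what makes this style of debugging low-risk.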

@which

Something is going wrong in your code. When you read it, it looks fine: you’re definitely calling the right function. But when you run it, something is obviously wrong.

Which method is getting called there? Is that implementation correct?

The @which macro will take a function call and tell you not only what method would be called, but also give you a file name and line number.

In [13]:
@which 2 + 2
+(x::Int64,y::Int64) at int.jl:41

In [15]:
@which 2 + 2.0
+(x::Number,y::Number) at promotion.jl:148

In [19]:
@which 'h' + 2
+(x::Char,y::Integer) at char.jl:30

Each of the above examples is a method in the base library, which means that you can either clone julia or look in your source install: look in the base folder for a file of the name @which indicates. For non-base code, it gives a file path rather than just a name.
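@which is just as useful on your own functions. The sketch below assumes a recent Julia release, where @which comes from the InteractiveUtils standard library and returns a Method value that can be inspected; halve is a made-up function.

```julia
using InteractiveUtils   # provides @which outside the REPL

halve(x::Integer) = div(x, 2)   # integer division
halve(x::Number) = x / 2        # everything else

m_int = @which halve(4)
m_float = @which halve(4.0)

println(m_int)     # the Integer method, with its file and line
println(m_float)   # the Number method
```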

macroexpand

Writing macros tends to be a bit complex and sometimes issues of macro hygiene can be difficult to predict from looking at the code of your macro. Just running the macro on some expressions doesn’t always help; you want to see what code the macro application results in.

In Julia, you can do that. You give macroexpand a quoted expression using your macro; it will return that expression with the macro transformations applied.

In [31]:
macroexpand(:(@show 2+2))
Out[31]:
quote 
    $(Base).println("+(2,2) => ",$(Base).repr(begin  # show.jl, line 91:
                value#82 = +(2,2)
            end))
    value#82
end

You can also use macroexpand to see what other macros actually do. For example, a not uncommon pattern is to have the actual implementation in a normal function, while the macro allows users to pass in unquoted expressions.

In [34]:
macroexpand(:(@which 2+2))
Out[34]:
:($(Base).which(+,2,2))
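That pattern is short enough to sketch in full. The names describe_expr and @describe are made up for illustration, and the code assumes a recent Julia release, where macroexpand also takes the module to expand in as its first argument.

```julia
# The real work happens in an ordinary function that receives a quoted expression.
describe_expr(ex) = string("expression: ", ex)

# The macro only exists so callers can write the expression unquoted.
macro describe(ex)
    return :(describe_expr($(QuoteNode(ex))))
end

println(@describe 2 + 2)

# Expanding the macro shows the plain function call it turns into.
println(macroexpand(@__MODULE__, :(@describe 2 + 2)))
```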