Tag Archives: Visualization

Visualizing Data with Julia using Makie

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/visualizing-data-with-julia-using-makie-7685d7850f06?source=rss-8bd6ec95ab58------2

Plotting in Julia with Makie

A Brief Tutorial on Makie.jl

When starting learning Julia, one might get lost in the many different packages available to do data visualization. Right out of the cuff, there is Plots, Gadfly, VegaLite … and there is Makie.

Makie is fairly new (~2018), yet, it’s very versatile, actively developed and quickly growing in number of users. This article is a quick introduction to Makie, yet, by the end of it, you will be able to do a plethora of different plots.

The Future of Plotting in Julia

When I started coding in Julia, Makie was not one of the contenders for “best” plotting libraries. As time passed, I started to here more and more about it around the community. For some reason, people were saying that:

“Makie is the future” — People in the Julia Community

I never fully understood why that was the case, and every time I tried to learn it, I’d be turned off by the verbose syntax, and, frankly, ugly examples. It was only when I bumped into Beautiful Makie that I decided to put aside my prejudices and get on with the times.

Hence, if you are starting to code in Julia, and is wondering which plotting package you should invest your time to learn, I say to you that Makie is the way to go, since I guess “Makie is the future”.

Number of GitHub Star’s in per repository. I guess indeed Makie is the future, if this trend keeps going.

Starting with Makie… Pick your backend

The versatility in Makie can make it a bit unwelcoming for those that “just want to do a damn scatter plot”. First of all, there is Makie.jl, CairoMakie.jl, GLMakie.jl WGLMakie.jl 😰. Which one should you use?

Well, here is the deal. Makie.jl is the main plotting package, but you have to choose a backend to which you will display your plots. The choice depends on your objectives. So yes, besides Makie.jl, you will need to install one of the backends. Here is a small description to help you chose:

CairoMakie.jl: It’s the easiest to use of all three, and it’s the ideal choice if you just want to produce static plots (not interactive);
GLMakie.jl: Uses OpenGL to display the plots, hence, you need to have OpenGL installed. Once you do a plot and run the display(myplot) , it’ll open an interactive window with your plot. If you want to do interactive 3D plots, then this is the backend for you;
WGLMakie.jl: It’s the hardest one to work with. Still, if you want to create interactive visualizations in the web, this is your choice.

In this tutorial, we’ll use CairoMakie.jl.

Your first plot

After picking our backend, we can now start plotting! I’ll go out on a limb and say that Makie is very similar to Matplotlib. It does not work with any fancy “Grammar of Graphics” (but if you like this sort of stuff, take a look at the AlgebraOfGraphics.jl, which implements an “Algebra of Graphics” on Makie).

Thus, there are a bunch of ready to use functions for some of the most common plots.

using CairoMakie #Yeah, no need to import Makie
scatter(rand(10,2))

Easy breezy… Yet, if you are plotting this in a Jupyter Notebook, you might be slightly ticked off by two things. First, the image is just too large. And second, it’s kind of low quality. What is going on?

By default, CairoMakie uses raster format for images, and the default size is a bit large. If you are like me and prefer your plots to be in svg and a bit smaller, then no worries! Just do the following:

using CairoMakie
CairoMakie.activate!(type = "svg")
scatter(rand(10,2),figure=(;resolution=(300,300)))

In the code above, the CairoMakie.activate!() is a command that tells Makie which backend you are using. You can import more than one backend at a time, and switch between them using this activation commands. Also, the CairoMakie backend has the option to do svg plots (to my knowledge, this is not possible for the other backends). Hence, with this small line of code, all our plots will now be displayed in high quality.

Next, we defined a “resolution” to our figure. In my opinion, this is a bit of an unfortunate name, because the resolution is actually the size of our image. Yet, as we’ll see further on, the attribute resolution actually belongs to our figure, and not to the actual scatter plot. For this reason we pass the whole figure = (; resolution=(300,300)) (if you are new to Julia, the ; is just a way of separating attributes that have names, from unnamed ones, i.e. args and kwags).

Congrats! You now know the bare minimum of Makie to do a whole bunch of different plots! Just go to the Makie’s website and see how to use all the different ready-to-use plotting functions! In order to be self contained, here is a small cheat sheet from the great book Julia Data Science.

Of course, we still haven’t talked about a bunch of important things, like titles, subplots, legends, axes limits, etc. Just keep on reading…

Storopoli, Huijzer and Alonso (2021). Julia Data Science. https://juliadatascience.io. ISBN: 9798489859165.

Figure, Axis and Plot

Commands like scatter produce a “FigureAxisPlot” object, which contains a figure, a set of axes and the actual plot. Each of these objects has different attributes and are fundamental in order to customize your visualization. By doing:

fig, ax, plt = scatter(rand(10,2))

We save each of these objects in a different variable, and can more easily modify them. In this example, the function scatter is actually creating all three objects, and not only the plot. We could instead create each of these objects individually. Here is how we do it:

fig = Figure(resolution=(600, 400)) 
ax = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
    title = "Title")
lines!(ax, 1:0.1:10, x->sin(x))

Let’s explain the code above. First, we created the empty figure and stored it in fig . Next, we created an “Axis”. But, we need to tell to which figure this object belongs, and this is where the fig[1,1] comes in. But, what is this “[1,1]”?

Every figure in Makie comes with a grid layout underneath, which enable us to easily create subplots in the same figure. Hence, the fig[1,1] means “Axis belongs to fig row 1 and column 1”. Since our figure only has one element, then our axis will occupy the whole thing. Still confused? Don’t worry, once we do subplots you’ll understand why this is so useful.

The rest of the arguments in “Axis” are easy to understand. We are just defining the names in each axis and then the title.

Finally, we add the plot using lines! . The exclamation is a standard in Julia that means that a function is actually modifying an object. In our case, the lines!(ax, 1:0.1:10, x->sin(x)) is appending a line plot to the ax axis.

It’s clear now how we can, for example, add more line plots. By running the same lines! , this will append more plots to our ax axis. In this case, let’s also add a legend to our plot.

fig = Figure(resolution=(600, 400)) 
ax = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
    title = "Title")
lines!(ax, 1:0.1:10, x->sin(x), label="sin")
stairs!(ax, 1:0.1:10, x->cos(x), label="cos", color=:black)
axislegend(ax)

#*Tip*: if you are using Jupyter and want to display your
#       visualization, you can do display(fig) or just write fig in
#       the end of the cell.

Ok, our plots are starting to look good. Let me end this section talking about subplots. As I said, this is where the whole “fig[1,1]” comes into play. If instead of doing two plots in the same axis we wanted to create two parallel plots in the same figure, here is how we would do this.

fig = Figure(resolution=(600, 300)) 
ax1 = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
    title = "Title1")
ax2 = Axis(fig[1, 2], xlabel = "x label", ylabel = "y label",
    title = "Title2")

lines!(ax1, 1:0.1:10, x->sin(x), label="sin")
stairs!(ax1, 1:0.1:10, x->cos(x), label="cos", color=:black)
density!(ax2, randn(100))

axislegend(ax)
save("figure.png", fig)

This time, in the same figure, we created two axis, but the first one is in the first row and first column, while the second one is in the second column. We then just append the plot to the respective axis. Lastly, we save the figure in “png” format.

Final Words

That’s it for this tutorial. Of course, there is much more the talk about, as we have only scratched the surface. Makie has some awesome capabilities in terms of animations, and much more attributes/objects to play with in order to create truly astonishing visualizations. If you want to learn more, take a look at Makie’s documentation, it’s very nice. And also, the Julia Data Science book has a chapter only on Makie.

References

This article draws heavily on the Julia Data Science book and Makie’s own documentation.

Storopoli, Huijzer and Alonso (2021). Julia Data Science. https://juliadatascience.io. ISBN: 9798489859165.

Danisch & Krumbiegel, (2021). Makie.jl: Flexible high-performance data visualization for Julia. Journal of Open Source Software, 6(65), 3349, https://doi.org/10.21105/joss.03349

Visualizing Data with Julia using Makie was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Analyzing Graphs with Julia

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/analyzing-graphs-with-julia-38e26d1d2f62?source=rss-8bd6ec95ab58------2

A brief tutorial on how to use Julia to analyze graphs using the JuliaGraphs packages

Example of Graph created using LigthsGraphs.jl and VegaLite.jl

First of all, let’s be clear. The goal of this article is to briefly introduce how to use Julia for analyzing graphs. Besides the many types of graphs (undirected, directed, bipartite, weighted…), there are also many methods for analyzing them (degree distribution, centrality measures, clustering measures, visual layouts …). Hence, a comprehensive introduction to Graph Analysis with Julia would be too large of a task.

Therefore, this tutorial focuses on undirected weighted graphs, since they encompass weightless graphs, and are usually more common than directed graphs¹.

JuliaGraphs Project

Almost every package you will need can be found in the JuliaGraphs Project. The project contains specific packages for plotting, network layouts, weighted graphs, and more. In our example, we’ll be using GraphPlot.jl and SimpleWeightedGraphs.jl. The good thing about the project is that these packages work together and are very similar in design. Hence, the functions you use for creating a weighted graph are very similar to the ones you use for creating a simple graph with LightGraphs.jl.

Creating your first Graph

Let’s create a DataFrame using the DataFrames.jl package. Each column will represent a person, and each row will represent an attribute. Therefore, our graph will be composed of nodes (people) and edges (people share the same attribute).

After creating the DataFrame, we create the graph. Note here that they are actually separate objects. To create the graph, you only need to specify the number of nodes, which is the number of columns. Below I present the code for the creation of the DataFrame and the Graph.

<a href="https://medium.com/media/6c44a260a10e39dfd8a1e1cafa7aac4b/href">https://medium.com/media/6c44a260a10e39dfd8a1e1cafa7aac4b/href</a>

The nodes were inserted, now we have to create the appropriate edges. This is done by using the command add_edge!() which takes the graph, the nodes that must be connected, and the edge weight.

In our example, the the weight is equal to the number of shared attributes between two columns. For example, the first and second column both have 1’s in rows 1 and 7. Therefore, they share an edge with weight 2:

add_edge!(g,1,2,2) # add_edge!(graph, node_1, node_2, weight)

The following code loops through the data, and adds the edges to the graph.

<a href="https://medium.com/media/3bfa8ba6ca142cd02f904295d4fa4678/href">https://medium.com/media/3bfa8ba6ca142cd02f904295d4fa4678/href</a>

Visualizing the Graph

With this, our graph is ready to be visualized. This can be easily done with the following command:

gplot(g,nodelabel=names(df),edgelinewidth=ew)

There are several different layouts to chose from, just take a look at the GraphPlots.jl page. Here is another example:

gplot(g,nodelabel=names(df),edgelinewidth=ew,layout=circular_layout)

Centrality and Minimum Spanning Tree

Let’s do some analysis in this graph. As I said in the beginning, there are many way to analyze a graph. Two very common methods are studying the centrality of each node, and creating a Minimum Spanning Tree (MST). The goal of this article is not to explain theoretical aspects of graphs, so I’ll assume that the reader knows what I’m talking about.

Here is an example of how to calculate the betweeness centrality and visualizing the results. Note that I added 1 in the first line of code just to enable a better visualization.

<a href="https://medium.com/media/0481ccea239376ba8929236cc1e6a7e3/href">https://medium.com/media/0481ccea239376ba8929236cc1e6a7e3/href</a>

Output of code above. The nodes colors represent the centrality.

Finally, one can also easily create a MST. To do so, we must create a new graph, since we’ll be removing edges and adjusting the nodes’ locations. The code below shows how to do this. The only new function here is the kruskal_mst() which will give us the MST. You can choose if the MST is minimizing of maximizing the path. In our case, we want to create a tree the contains only the “strongest” connections, hence, we are maximizing.

<a href="https://medium.com/media/7c7173bc6edfb31a029cdfe1360fe2ae/href">https://medium.com/media/7c7173bc6edfb31a029cdfe1360fe2ae/href</a>

Conclusion

This is the end of this brief tutorial. There are much more functionalities in the JuliaGraphs project, enabling a much more thorough analysis than the one presented here. Also, one can create more interactive visualizations using VegaLite.jl, but I’ll leave that for another article.

¹ Authors opinion.

² A Jupyter Notebook can also be found on Github.

Analyzing Graphs with Julia was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Six of One (Plot), Half-Dozen of the Other

By: randyzwitch - Articles

Re-posted from: http://badhessian.org/2014/07/six-of-one-plot-half-dozen-of-the-other/

This is a guest post by Randy Zwitch (@randyzwitch), a digital analytics and predictive modeling consultant in the Greater Philadelphia area. Randy blogs regularly about Data Science and related technologies at http://randyzwitch.com. He’s blogged at Bad Hessian before here.

WordPress Stats - Visitors vs. Views — WordPress Stats – Visitors vs. Views

For those of you with WordPress blogs and have the Jetpack Stats module installed, you’re intimately familiar with this chart. There’s nothing particularly special about this chart, other than you usually don’t see bar charts with the bars shown superimposed.

I wanted to see what it would take to replicate this chart in R, Python and Julia. Here’s what I found. (download the data).

R: ggplot2

Although I prefer to use other languages these days for my analytics work, there’s a certain inertia/nostalgia that when I think of making charts, I think of using ggplot2 and R. Creating the above chart is pretty straightforward, though I didn’t quite replicate the chart, as I couldn’t figure out how to make my custom legend not do the diagonal bar thing.

The R Cookbook talks about a hack to remove the diagonal lines from legends, so I don’t feel too bad about not getting it. I also couldn’t figure out how to force ggplot2 to give me the horizontal line at 10000. If anyone in the R community knows how to fix these, let me know!

(Pythonistas: I’m aware of the ggplot port by Yhat; functionality I used in my R code is still in TODO, so I didn’t pursue plotting with ggplot in Python)

R: Base Graphics

Of course, not everyone finds ggplot2 to be easy to understand, as it requires a different way of thinking about coding than most ‘base’ R functions. To that end, there are the base graphics built into R, which produced this plot: While I was able to nearly replicate the WordPress chart (except for the feature of having the dark bars slightly smaller width than the lighter), the base R syntax is horrid. The abbreviations for plotting arguments are indefensible, the center and width keywords seem to shift the range of the x-axis instead of changing the actual bar width, and in general, the experience plotting using base R was the worst of the six libraries I evaluated.

Python: matplotlib

In the past year or so, there’s been quite a lot of activity towards improving the graphics capabilities in Python. Historically, there’s been a lot of teeth-gnashing about matplotlib being too low-level and hard to work with, but with enough effort, the results are quite pleasant. Unlike with ggplot2 and base R, I was able to replicate all the features of the WordPress plot:

Python: Seaborn

One of the aforementioned improvements to matplotlib is Seaborn, which promises to be a higher-level means of plotting data than matplotlib, as well as adding new plotting functionality common in statistics and research. Re-creating this plot using Seaborn is a waste of the additional functionality of Seaborn, and as such, I found it more difficult to make this plot using Seaborn than I did with matplotlib.

To replicate the plot, I ended up hacking a solution together using both Seaborn functionality and matplotlib in order to be able to set bar width and to create the legend, which defeats the purpose of using Seaborn in the first place.

Julia: Gadfly

In the Julia community, Gadfly is clearly the standard for plotting graphics. Supporting d3.js, PNG, PS, and PDF, Gadfly is built to work with many popular back-end environments. I was able to replicate everything about the WordPress graph except for the legend:While Gadfly took a line or two more than base R in terms of fewest lines of code, I find the Gadfly syntax significantly more pleasant to work with.

Julia: Plot.ly

Plot.ly is an interesting ‘competitor’ in this challenge, as it’s not a language-specific package per-se. Rather, Plot.ly is a means of specifying plots using JSON, with lightweight Julia/Python/MATLAB/R wrappers. I was able to replicate nearly everything about the WordPress plot, with the exception of not having a line at 10000, having the legend vertical instead of horizontal and I couldn’t figure out how to set the bar widths separately.

And The Winner Is…matplotlib?!

If you told me at the beginning of this exercise that matplotlib (and by extension, Seaborn) would be the only library that I would be able to replicate all the features of the WordPress graph, I wouldn’t have believed it. And yet, here we are. ggplot2 was certainly very close, and I’m certain that someone knows how to fix the diagonal line issue. I suspect I could submit an issue ticket to Gadfly.jl to get the feature added to create custom legends (and for that matter, make the request of Plot.ly for horizontal legends), so in the future there could be feature parity using these two libraries as well.

I hope we all agree there’s no hope for Base Graphics in R besides quick throwaway plots.

In the end, the best thing I can say from this exercise is that the analytics community is fortunate to have so many talented people working to provide these amazing visualization libraries. This graph was rather pedestrian in nature, so I didn’t even scratch the surface of what these various libraries can do. Even beyond the six libraries I chose, there are others I didn’t choose, including: prettyplotlib (Python), Bokeh (Python), Vincent (Python), rCharts (R), ggvis (R), Winston (Julia), ASCII Plots (Julia) and probably even more that I’m not even aware of! All free and open-source and miles apart from terrible looking Microsoft graphics in Excel and Powerpoint.

juliabloggers.com

A Julia Language Blog Aggregator