Visualizing Data with Julia using Makie

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/visualizing-data-with-julia-using-makie-7685d7850f06?source=rss-8bd6ec95ab58------2

Plotting in Julia with Makie

A Brief Tutorial on Makie.jl

When you start learning Julia, you might get lost in the many different packages available for data visualization. Right off the bat, there is Plots, Gadfly, VegaLite … and there is Makie.

Makie is fairly new (~2018), yet it’s very versatile, actively developed and quickly growing in number of users. This article is a quick introduction to Makie, but by the end of it you will be able to produce a plethora of different plots.

The Future of Plotting in Julia

When I started coding in Julia, Makie was not one of the contenders for “best” plotting library. As time passed, I started to hear more and more about it around the community. For some reason, people were saying that:

“Makie is the future” — People in the Julia Community

I never fully understood why that was the case, and every time I tried to learn it, I’d be turned off by the verbose syntax, and, frankly, ugly examples. It was only when I bumped into Beautiful Makie that I decided to put aside my prejudices and get on with the times.

Hence, if you are starting to code in Julia and are wondering which plotting package you should invest your time in learning, I say to you that Makie is the way to go, since I guess “Makie is the future”.

Number of GitHub stars per repository. I guess Makie is indeed the future, if this trend keeps going.

Starting with Makie… Pick your backend

The versatility of Makie can make it a bit unwelcoming for those who “just want to do a damn scatter plot”. First of all, there is Makie.jl, CairoMakie.jl, GLMakie.jl and WGLMakie.jl 😰. Which one should you use?

Well, here is the deal. Makie.jl is the main plotting package, but you have to choose a backend with which to display your plots. The choice depends on your objectives. So yes, besides Makie.jl, you will need to install one of the backends. Here is a small description to help you choose:

  • CairoMakie.jl: It’s the easiest to use of all three, and it’s the ideal choice if you just want to produce static plots (not interactive);
  • GLMakie.jl: Uses OpenGL to display the plots, hence you need to have OpenGL installed. Once you create a plot and run display(myplot), it’ll open an interactive window with your plot. If you want to do interactive 3D plots, then this is the backend for you;
  • WGLMakie.jl: It’s the hardest one to work with. Still, if you want to create interactive visualizations in the web, this is your choice.

In this tutorial, we’ll use CairoMakie.jl.

Your first plot

After picking our backend, we can now start plotting! I’ll go out on a limb and say that Makie is very similar to Matplotlib. It does not work with any fancy “Grammar of Graphics” (but if you like this sort of stuff, take a look at AlgebraOfGraphics.jl, which implements an “Algebra of Graphics” on top of Makie).

Thus, there are a bunch of ready to use functions for some of the most common plots.

using CairoMakie #Yeah, no need to import Makie
scatter(rand(10,2))

Easy breezy… Yet, if you are plotting this in a Jupyter Notebook, you might be slightly ticked off by two things. First, the image is just too large. And second, it’s kind of low quality. What is going on?

By default, CairoMakie uses raster format for images, and the default size is a bit large. If you are like me and prefer your plots to be in svg and a bit smaller, then no worries! Just do the following:

using CairoMakie
CairoMakie.activate!(type = "svg")
scatter(rand(10,2),figure=(;resolution=(300,300)))

In the code above, CairoMakie.activate!() is a command that tells Makie which backend you are using. You can import more than one backend at a time, and switch between them using these activation commands. Also, the CairoMakie backend has the option to produce svg plots (to my knowledge, this is not possible for the other backends). Hence, with this small line of code, all our plots will now be displayed in high quality.

Next, we defined a “resolution” for our figure. In my opinion, this is a bit of an unfortunate name, because the resolution is actually the size of our image (newer Makie releases, 0.20 onwards, rename this attribute to size). As we’ll see further on, the attribute resolution belongs to the figure, and not to the scatter plot itself. For this reason we pass the whole figure = (; resolution=(300,300)). If you are new to Julia: the leading ; makes (; resolution=(300,300)) a NamedTuple, and it is the same ; that separates positional arguments from keyword arguments (args and kwargs) in a function call.
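
To make the (; resolution=(300,300)) syntax less mysterious, here is a minimal sketch in plain Julia, without Makie. The function describeplot is a hypothetical stand-in invented for this illustration; it only echoes what it receives and is not part of any plotting package.

```julia
# `describeplot` is a hypothetical stand-in for a plotting function (not Makie API).
# It takes data plus a `figure` keyword holding a NamedTuple of figure options.
describeplot(data; figure = NamedTuple()) = (npoints = length(data), figure = figure)

# A leading `;` inside parentheses builds a NamedTuple, even with a single field.
opts = (; resolution = (300, 300))

# The same `;` separates positional arguments from keyword arguments in a call.
result = describeplot(rand(10); figure = opts)

println(opts isa NamedTuple)         # true
println(result.figure.resolution)    # (300, 300)
```

Without the leading ;, (resolution = (300, 300)) would just be a parenthesized assignment, which is why the semicolon matters for one-field tuples.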

Congrats! You now know the bare minimum of Makie to do a whole bunch of different plots! Just go to Makie’s website and see how to use all the different ready-to-use plotting functions! In order to be self-contained, here is a small cheat sheet from the great book Julia Data Science.

Of course, we still haven’t talked about a bunch of important things, like titles, subplots, legends, axes limits, etc. Just keep on reading…

Storopoli, Huijzer and Alonso (2021). Julia Data Science. https://juliadatascience.io. ISBN: 9798489859165.

Figure, Axis and Plot

Commands like scatter produce a “FigureAxisPlot” object, which contains a figure, a set of axes and the actual plot. Each of these objects has different attributes, and each is fundamental for customizing your visualization. By doing:

fig, ax, plt = scatter(rand(10,2))

We save each of these objects in a different variable, and can more easily modify them. In this example, the function scatter is actually creating all three objects, and not only the plot. We could instead create each of these objects individually. Here is how we do it:

fig = Figure(resolution=(600, 400)) 
ax = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
title = "Title")
lines!(ax, 1:0.1:10, x->sin(x))
Plot from code above

Let’s explain the code above. First, we created the empty figure and stored it in fig. Next, we created an “Axis”. But we need to say which figure this object belongs to, and this is where fig[1,1] comes in. But what is this “[1,1]”?

Every figure in Makie comes with a grid layout underneath, which enables us to easily create subplots in the same figure. Hence, fig[1,1] means “Axis belongs to fig row 1 and column 1”. Since our figure only has one element, our axis will occupy the whole thing. Still confused? Don’t worry, once we do subplots you’ll understand why this is so useful.

The rest of the arguments in “Axis” are easy to understand. We are just defining the names in each axis and then the title.

Finally, we add the plot using lines!. The exclamation mark is a Julia convention signalling that a function modifies one of its arguments. In our case, lines!(ax, 1:0.1:10, x->sin(x)) appends a line plot to the ax axis.
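
The same naming convention shows up all over Base Julia. A small sketch, independent of Makie, contrasting sort (returns a new vector) with sort! and push! (modify their argument):

```julia
v = [3, 1, 2]

sorted_copy = sort(v)   # no `!`: returns a new sorted vector, `v` is untouched
println(v)              # [3, 1, 2]

sort!(v)                # with `!`: sorts `v` in place
push!(v, 4)             # with `!`: appends to `v` in place
println(v)              # [1, 2, 3, 4]
```

In the Makie code, ax plays the role of v: lines! changes the axis it is given instead of creating a new one.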

It’s clear now how we can, for example, add more plots: calling lines! again (or another plotting function ending in !, such as stairs!) appends another plot to our ax axis. In this case, let’s also add a legend to our plot.

fig = Figure(resolution=(600, 400)) 
ax = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
title = "Title")
lines!(ax, 1:0.1:10, x->sin(x), label="sin")
stairs!(ax, 1:0.1:10, x->cos(x), label="cos", color=:black)
axislegend(ax)
#*Tip*: if you are using Jupyter and want to display your
# visualization, you can do display(fig) or just write fig in
# the end of the cell.

Ok, our plots are starting to look good. Let me end this section talking about subplots. As I said, this is where the whole “fig[1,1]” comes into play. If instead of doing two plots in the same axis we wanted to create two parallel plots in the same figure, here is how we would do this.

fig = Figure(resolution=(600, 300)) 
ax1 = Axis(fig[1, 1], xlabel = "x label", ylabel = "y label",
title = "Title1")
ax2 = Axis(fig[1, 2], xlabel = "x label", ylabel = "y label",
title = "Title2")
lines!(ax1, 1:0.1:10, x->sin(x), label="sin")
stairs!(ax1, 1:0.1:10, x->cos(x), label="cos", color=:black)
density!(ax2, randn(100))
axislegend(ax1) # the labelled plots live on ax1; there is no variable called ax here
save("figure.png", fig)

This time, in the same figure, we created two axes: the first one is in the first row and first column, while the second one is in the second column. We then just append each plot to its respective axis. Lastly, we save the figure in “png” format.

Final Words

That’s it for this tutorial. Of course, there is much more to talk about, as we have only scratched the surface. Makie has some awesome capabilities in terms of animations, and many more attributes/objects to play with in order to create truly astonishing visualizations. If you want to learn more, take a look at Makie’s documentation, it’s very nice. Also, the Julia Data Science book has a whole chapter on Makie.

References

This article draws heavily on the Julia Data Science book and Makie’s own documentation.

Storopoli, Huijzer and Alonso (2021). Julia Data Science. https://juliadatascience.io. ISBN: 9798489859165.

Danisch & Krumbiegel, (2021). Makie.jl: Flexible high-performance data visualization for Julia. Journal of Open Source Software, 6(65), 3349, https://doi.org/10.21105/joss.03349


Visualizing Data with Julia using Makie was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Alien facehugger wasps, a pandemic, webcrawlers and julia

By: Ömür Özkir

Re-posted from: https://medium.com/oembot/alien-facehugger-wasps-a-pandemic-webcrawlers-and-julia-c1f136925f8?source=rss----386c2bd736a1--julialang

collect and analyze covid 19 numbers for Hamburg, Germany

TL;DR

  1. Build a webcrawler in julia.
  2. Use the data for a simple plot.

Motivation

The web is full of interesting bits and pieces of information. Maybe it’s the current weather, stock prices, or the wikipedia article about the wasp that goes all alien parasite facehugger on other insects, which you vaguely remember from one of those late night documentaries (already sorry you are reading this?).

If you are lucky, that data is available via an API, making it usually pretty easy (not always tho, if API developers come up with byzantine authentications, required headers or other elegant/horrible designs, the fun is over) to get to the data.

A lot of the shiny nuggets are not available via a nice API, tho. Which means, we have to crawl webpages.

Pick something that you are really interested in / want to use for a project of yours, that’ll make it less of a chore and far more interesting!

Local = Relevant

For me, currently, that’s the Covid-19 pandemic. And more specifically, how it is developing close to me. In my case, that means the city of Hamburg in Germany.

Chances are, these specific numbers are not relevant to you. But that’s a good thing: you can use what you learned here and mine the website of your home city maybe (or whatever you are interested in).

Nothing helps your brain absorb new things better than generalizing those new skills and using them to solve related problems!

There is the official website of the city, that has a page for the covid-19 numbers, hamburg.de.

The page with the numbers is in german, but don’t worry, that’s what our webcrawler will hopefully help us with — we can get to the numbers without having to understand all the surrounding text. I will try to help out and translate what is relevant, but that will only be a minor detail when we try to find the right text to extract from.

If you like, you can check out the notebook or even code along in binder.

First, let’s get some of the dependencies out of the way:

Aside from HTTP.jl to request the website, we will also use Gumbo to parse html and Cascadia to extract data from the html document via selectors.

using HTTP
using Gumbo, Cascadia
using Cascadia: matchFirst

We need to fetch and parse the website, which is easily done with HTTP.jl and Gumbo.

url = "https://www.hamburg.de/corona-zahlen"
response = HTTP.get(url)
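# String(response) serializes the whole response (status line and headers too),
# which is why they show up inside <body> below; String(response.body) passes only the HTML.
html = parsehtml(String(response))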
# =>
HTML Document:
<!DOCTYPE >
HTMLElement{:HTML}:<HTML lang="de">
<head></head>
<body class="no-ads">
HTTP/1.1 200 OK
ServerHost: apache/portal5
X-Frame-Options: SAMEORIGIN
Access-Control-Allow-Origin: *
Content-Type: text/html;charset=UTF-8
Content-Language: de-DE
Date: Wed, 05 Aug 2020 19:30:23 GMT
Transfer-Encoding: chunked
Connection: keep-alive, Transfer-Encoding
Set-Cookie: JSESSIONID=AAF197B2F1191AACC08B70C4F8DAB18F.liveWorker2; Path=/; HttpOnly
Set-Cookie: content=13907680; Path=/servlet/segment/de/corona-zahlen/
Set-Cookie: BIGipServerv5-webstatic-80-12-cm7=201658796.20480.0000; path=/; Httponly; Secure
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta charset="utf-8"/>
<meta content="text/html" http-equiv="content-type"/>
<script type="text/javascript">window.JS_LANG='de'; </script>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
...

Alright, we can now start to parse data from the html document, by using query selectors.

You know, they might actually be called css selectors, I don’t know. How precise is the frontend terminology anyways, right?

Oh look, a pack of wild frontenders! Hmm, what are they doing, are they encircling us? They do look kinda angry, don’t they? I guess they just tried to vertically align some div the whole day or something.

Ok… I guess we should leave now.

Seems important

hamburg.de: Visual hierarchy, the topmost information is usually something important

We could start with the first information we see on the page, after all, there must hopefully be a reason that it is at the top of the page.

The three bullet points with the numbers mean confirmed cases, recovered and new cases. Now the trick is to find the best selectors. There are a few plugins for the different browsers that help finding the right selectors quickly.

But it is also pretty easy to do by hand. When right-click/inspecting an element on the page (this requires the developer tools) one can pretty quickly find a decently close selector.

If you want to test it out in the browser first, you can write something like this document.querySelectorAll(".c_chart.one .chart_legend li") in the browser console. Some browsers even highlight the element on the page when you hover over the array elements of the results.

Using the selectors in julia is pretty neat:

eachmatch(sel".c_chart.one .chart_legend li", html.root)
# => 
3-element Array{HTMLNode,1}:
HTMLElement{:li}:<li>
<span style="display:inline-block;width:.7em;height:.7em;margin-right:5px;background-color:#003063"></span>
Bestätigte Fälle 5485
</li>


HTMLElement{:li}:<li>
<span style="display:inline-block;width:.7em;height:.7em;margin-right:5px;background-color:#009933"></span>
Davon geheilt 5000
</li>


HTMLElement{:li}:<li>
<span style="display:inline-block;width:.7em;height:.7em;margin-right:5px;background-color:#005ca9"></span>
Neuinfektionen 25
</li>

Ok, we need to extract the numbers from the text of each html element. Using a simple regex seems like the easiest solution in this case. Check this out, it looks very similar to the previous selector matching:

match(r"\d+", "Neuinfektionen 25")
# =>
RegexMatch("25")

Nice, huh? Ok, but we only need the actual match.

match(r"\d+", "Neuinfektionen 25").match
# =>
"25"

And we need to cast it to a number:

parse(Int, match(r"\d+", "Neuinfektionen 25").match)
# =>
25

We want to do this for each element now, so we extract the text from the second node (the span element is the first, see the elements above).

Then we do all the previously done matching and casting and we got our numbers!

function parsenumbers(el)
    text = el[2].text
    parse(Int, match(r"\d+", text).match)
end
map(parsenumbers, eachmatch(sel".c_chart.one .chart_legend li", html.root))
# =>
3-element Array{Int64,1}:
5485
5000
25
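
One caveat with this approach: match returns nothing when the text contains no digits, and accessing .match on nothing throws an error. A minimal sketch of a more forgiving helper, using only Base Julia; the name extractint is made up for this example and the second input string is invented test data:

```julia
# Hypothetical helper: return the first run of digits as an Int,
# or `nothing` when the text contains no digits at all.
function extractint(text::AbstractString)
    m = match(r"\d+", text)
    m === nothing ? nothing : parse(Int, m.match)
end

println(extractint("Neuinfektionen 25"))   # 25
println(extractint("keine Angabe"))        # nothing
```

Whether to skip such entries or fail loudly is a design choice; for a crawler that runs unattended, failing loudly is often the safer default.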

Learning how to Date in german

We should also extract the date when those numbers were published. The selector for the date on the page is very easy this time: .chart_publication.

In the end we want some numbers, that we can use to instantiate a Date object, something like this Date(year, month, day).

We are starting out with this, however:

date = matchFirst(sel".chart_publication", html.root)[1].text
# =>
"Stand: Mittwoch, 5. August 2020"

Oh dear, it’s in german again. We need "5. August 2020" from this string.

parts = match(r"(\d+)\.\s*(\S+)\s*(\d{4})", date).captures
# =>
3-element Array{Union{Nothing, SubString{String}},1}:
"5"
"August"
"2020"

Better, but it’s still in german!

Ok, last bit of german lesson, promised, how about we collect all the month names in a tuple?

Then we can find its index in the tuple. That would be the perfect input for our Date constructor.

using Dates
const MONTHS = ("januar", "februar", "märz", "april", "mai", "juni", "juli", "august", "september", "oktober", "november", "dezember")
findfirst(m -> m == lowercase(parts[2]), MONTHS) # => 8
Date(parse(Int, parts[3]),
     findfirst(m -> m == lowercase(parts[2]), MONTHS),
     parse(Int, parts[1]))
# => 2020-08-05
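
The date steps above can be bundled into a single function. This is a sketch rather than a definitive implementation: the names parsegermandate and GERMAN_MONTHS are made up for this example, and it returns nothing instead of throwing when the text does not look like a date.

```julia
using Dates

const GERMAN_MONTHS = ("januar", "februar", "märz", "april", "mai", "juni",
                       "juli", "august", "september", "oktober", "november", "dezember")

# Hypothetical helper: pull "5. August 2020" out of a longer string and build a Date.
function parsegermandate(text::AbstractString)
    m = match(r"(\d+)\.\s*(\S+)\s*(\d{4})", text)
    m === nothing && return nothing
    day, monthname, year = m.captures
    month = findfirst(==(lowercase(monthname)), GERMAN_MONTHS)
    month === nothing && return nothing      # unknown month name
    Date(parse(Int, year), month, parse(Int, day))
end

println(parsegermandate("Stand: Mittwoch, 5. August 2020"))   # 2020-08-05
```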

More local = more relevant!

There are a few more interesting nuggets of information, I think the hospitalization metrics would be very interesting, especially to investigate the correlation between when cases are confirmed and the delayed hospitalizations.

But one thing that is especially interesting (and I don’t think such locally detailed information is available anywhere else) is the number of cases in the last 14 days, for each borough.

Speaking of local, this is probably the most local we can get.

List of boroughs, number of new infections aggregated for the last 14 days

By now, you probably start to see a pattern:

  1. find good selector
  2. extract content
  3. parse/collect details

rows = eachmatch(sel".table-article tr", html.root)[17:end]
df = Dict()
foreach(rows) do row
    name = matchFirst(sel"td:first-child", row)[1].text
    num = parse(Int, matchFirst(sel"td:last-child", row)[1].text)
    df[name] = num
end
df
# =>
Dict{Any,Any} with 7 entries:
"Bergedorf" => 17
"Harburg" => 28
"Hamburg Nord" => 26
"Wandsbek" => 63
"Altona" => 14
"Eimsbüttel" => 12
"Hamburg Mitte" => 41

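A Dict has no meaningful order, so before reading or plotting the numbers it can help to rank them. A minimal sketch in plain Julia, with the counts copied from the output above into a typed Dict (the variable names cases and ranked are just for this example):

```julia
# Counts copied from the output above; Dict{String,Int} avoids the Dict{Any,Any} type.
cases = Dict{String,Int}(
    "Bergedorf" => 17, "Harburg" => 28, "Hamburg Nord" => 26, "Wandsbek" => 63,
    "Altona" => 14, "Eimsbüttel" => 12, "Hamburg Mitte" => 41,
)

# collect turns the Dict into a vector of pairs; `last` picks the count of each pair.
ranked = sort(collect(cases), by = last, rev = true)

for (borough, n) in ranked
    println(rpad(borough, 15), n)
end
```
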
great, that’s it?

No! No, now the real fun begins. Do something with the data! You will probably already have some idea what you want to do with the data.

How about ending this with something visual?

Visualizations, even a simple plot, can help a lot with getting a feel for the structure of the data:

using Gadfly
Gadfly.set_default_plot_size(700px, 300px)

There are a lot of great plotting packages for julia, I personally really like Gadfly.jl for its beautiful plots.

plot(x=collect(keys(df)),
     y=collect(values(df)),
     Geom.bar,
     Guide.xlabel("Boroughs"),
     Guide.ylabel("New Infections"),
     Guide.title("New infections in the last 14 days"),
     Theme(bar_spacing=5mm))

Even such a simple plot already helps understanding the data better, right?

The end! Right?

Ha ha ha ha- nope. Webcrawlers are notoriously brittle, simply because the crawled websites tend to change over time. And with it, the selectors. It’s a good idea to test if everything works, once in a while, depending on how often you use your webcrawler.

Be prepared to maintain your webcrawler more often than other pieces of software.
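
One cheap way to notice breakage early is a sanity check on whatever the crawler returns. The sketch below is only an illustration: the function name checkscrape is made up, and the expected count of 7 boroughs is taken from the output earlier in this post.

```julia
# Hypothetical sanity check for scraped borough counts.
# Returns a list of human-readable problems; an empty list means "looks plausible".
function checkscrape(counts::AbstractDict; expected = 7)
    problems = String[]
    if length(counts) != expected
        push!(problems, "expected $expected entries, got $(length(counts))")
    end
    for (name, n) in counts
        if !(n isa Integer) || n < 0
            push!(problems, "suspicious value for $name: $n")
        end
    end
    problems
end

println(checkscrape(Dict("Altona" => 14)))   # one problem: wrong number of entries
```

If the selectors silently start matching the wrong table, a check like this tends to fail right away instead of producing a pretty but wrong plot.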

A few things to check out

Very close to the topic: I created a little package, Hamburg.jl, that has a few datasets about Hamburg, including all the covid-19 related numbers that we scraped a little earlier.

The official julia docs should get you up and running with your local julia dev setup.

One more crawler

Ok, one more thing, before I let you off to mine the web for all its information:

html = parsehtml(String(HTTP.get("https://en.wikipedia.org/wiki/Emerald_cockroach_wasp")))
ptags = eachmatch(sel".mw-parser-output p", html.root)[8:9]
join(map(n -> nodeText(n), ptags))
# =>
"Once the host is incapacitated, the wasp proceeds to chew off half of each of the roach's antennae, after which it carefully feeds from exuding hemolymph.[2][3] The wasp, which is too small to carry the roach, then leads the victim to the wasp's burrow, by pulling one of the roach's antennae in a manner similar to a leash. In the burrow, the wasp will lay one or two white eggs, about 2 mm long, between the roach's legs[3]. It then exits and proceeds to fill in the burrow entrance with any surrounding debris, more to keep other predators and competitors out than to keep the roach in.\nWith its escape reflex disabled, the stung roach simply rests in the burrow as the wasp's egg hatches after about 3 days. The hatched larva lives and feeds for 4–5 days on the roach, then chews its way into its abdomen and proceeds to live as an endoparasitoid[4]. Over a period of 8 days, the final-instar larva will consume the roach's internal organs, finally killing its host, and enters the pupal stage inside a cocoon in the roach's body.[4] Eventually, the fully grown wasp emerges from the roach's body to begin its adult life. Development is faster in the warm season.\n"

…the wasp proceeds to chew off half of each of the roach’s antennae, after which it carefully feeds from exuding…

…what…

…The hatched larva lives and feeds for 4–5 days on the roach, then chews its way into its abdomen…

…the…

…Over a period of 8 days, the final-instar larva will consume the roach’s internal organs, finally killing its host…

…hell mother nature, what the hell…


Alien facehugger wasps, a pandemic, webcrawlers and julia was originally published in oembot on Medium, where people are continuing the conversation by highlighting and responding to this story.

#MonthOfJulia Day 18: Plotting

There’s a variety of options for plotting in Julia. We’ll focus on those provided by Gadfly, Bokeh and Plotly.

Gadfly

Gadfly is the flavour of the month for plotting in Julia. It’s based on the Grammar of Graphics, so users of ggplot2 should find it familiar.

To start using Gadfly we’ll first need to load the package. To enable generation of PNG, PS, and PDF output we’ll also want the Cairo package.

julia> using Gadfly
julia> using Cairo

You can easily generate plots from data vectors or functions.

julia> plot(x = 1:100, y = cumsum(rand(100) .- 0.5), Geom.point, Geom.smooth)
julia> plot(x -> x^3 - 9x, -5, 5)

Gadfly plots are by default rendered onto a new tab in your browser. These plots are mildly interactive: you can zoom and pan across the plot area. You can also save plots directly to files of various formats.

julia> dampedsin = plot([x -> sin(x) / x], 0, 50)
julia> draw(PNG("damped-sin.png", 800px, 400px), dampedsin)

damped-sin

Let’s load up some data from the nlschools dataset in R’s MASS package and look at the relationship between language test score and IQ for pupils, broken down according to whether or not they are in a mixed-grade class.

julia> using RDatasets
julia> plot(dataset("MASS", "nlschools"), x="IQ", y="Lang", color="COMB",
            Geom.point, Geom.smooth(method=:lm), Guide.colorkey("Multi-Grade"))

nlschools

Those two examples just scratched the surface. Gadfly can produce histograms, boxplots, ribbon plots, contours and violin plots. There’s detailed documentation with numerous examples on the homepage.

Watch the video below (Daniel Jones at JuliaCon 2014) then read on about Bokeh and Plotly.

Bokeh

Bokeh is a visualisation library for Python. Bokeh, like D3, renders plots as Javascript, which is viewable in a web browser. In addition to the examples on the library homepage, more can be found on the homepage for Julia’s Bokeh package.

The first thing you’ll need to do is install the Bokeh library. If you already have a working Python installation then this is easily done from the command line:

$ pip install bokeh

Next load up the package and generate a simple plot.

julia> using Bokeh
julia> autoopen(true);
julia> x = linspace(0, pi);
julia> y = cos(2 * x);
julia> plot(x, y, title = "Cosine")
Plot("Cosine" with 1 datacolumns)
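
The session above was written for an early Julia release. On Julia 1.x, linspace no longer exists and functions such as cos are applied elementwise with dot syntax. A minimal sketch of the same data in current syntax, with no plotting package involved (100 points is an arbitrary choice here):

```julia
# Current-Julia equivalent of the two data lines above.
x = range(0, stop = pi, length = 100)   # replaces linspace(0, pi)
y = cos.(2 .* x)                        # the dots broadcast over every element of x

println(length(y))   # 100
println(y[1])        # 1.0
```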

The plot will be written to a file bokeh_plot.html in the working directory, which will in turn be opened by the browser. Use plotfile() to change the name of the file. The plot is interactive, with functionality to pan and zoom as well as resize the plot window.

bokeh-plot

Plotly

The Plotly package provides a complete interface to plot.ly, an online plotting service with interfaces for Python, R, MATLAB and now Julia. To get an idea of what’s possible with plot.ly, check out their feed. The first step towards making your own awesomeness will be loading the package.

using Plotly

Next you should set up your plot.ly credentials using Plotly.set_credentials_file(). You only need to do this once because the values will be cached.

Data series are stored in Julia dictionaries.

julia> p1 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "markers"];
julia> p2 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "lines"];
julia> p3 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "lines+markers"];
julia> Plotly.plot([p1, p2, p3], ["filename" => "basic-line", "fileopt" => "overwrite"])
Dict{String,Any} with 5 entries:
  "error"    => ""
  "message"  => ""
  "warning"  => ""
  "filename" => "basic-line"
  "url"      => "https://plot.ly/~collierab/17"

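A note for readers on current Julia: the square-bracket literal used above built a dictionary in early releases, but on Julia 1.x it produces a Vector of Pairs. A minimal sketch of the explicit Dict constructor, in plain Julia; whether a given version of the Plotly package accepts this form is not checked here.

```julia
# On Julia 1.x a dictionary is built with the Dict constructor.
p1 = Dict("x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "markers")

# The square-bracket form now yields a Vector of Pairs instead.
oldstyle = ["type" => "scatter", "mode" => "markers"]

println(p1 isa Dict)           # true
println(oldstyle isa Vector)   # true
println(p1["mode"])            # markers
```
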
You can either open the URL provided in the result dictionary or do it programmatically:

julia> Plotly.openurl(ans["url"])

plotly-scatter

By making small jumps through similar hoops it’s possible to create some rather intricate visualisations like the 3D scatter plot below. For details of how that was done, check out my code on github.
plotly-3d-scatter

Obviously plotting and visualisation in Julia are hot topics. Other plotting packages worth checking out are PyPlot, Winston and Gaston. Come back tomorrow when we’ll take a look at using physical units in Julia.




The post #MonthOfJulia Day 18: Plotting appeared first on Exegetic Analytics.