By: Karl Pettersson
Re-posted from: https://static-dust.klpn.se/posts/2017-03-01-mcsite.html
A website with mortality charts built using Julia
Since 2015, I have run a website with cause-specific mortality trends. The idea is to have a static site, which gives fast and easy access to information about international mortality trends, using open data available from WHO (2017), which, for many countries, covers the time period from 1950 up until recent times. The website is inspired by Whitlock (2012), which contains comprehensible charts with mortality trends based on these data, but has been unmaintained since 2013, when its creator died. Other sites with international cause-specific mortality trends I have seen tend to be slower, due to dynamic chart generation, and to cover only shorter time periods.
My implementation of the site generator, which was written in Python and R, had become rather messy, and the chart tools I used (matplotlib and ggplot2) are not really suited to make interactive web charts. I decided to rewrite the routines to generate the charts and the site files in Julia (albeit with the help of some non-Julia tools, as described below). These routines are now available as a GitHub repo, and I use them to generate the site in both English and Swedish versions.
The site is built as follows with the Julia package (see the README in the repo for instructions). The whole process is controlled with a JSON configuration file. YAML, using some non-JSON features, might be less cumbersome, and will perhaps be used once there is full YAML write support implemented in Julia. Julia functions mentioned are in the main Mortchartgen.jl file, if not otherwise stated.
- The WHO (2017) data files are downloaded and read into a MySQL database, using the functions in the Download.jl file.
- These data files contain cause of death codes from many different versions of the ICD classifications for different time periods and countries, and the codes are also often at a much more detailed level than I use in the charts. Therefore, the data on deaths is grouped using regular expressions defined in the configuration file. To avoid repeating this time-consuming regular expression matching, the resulting DataFrames can be saved in CSV files. There are still some issues with unsupported datatypes in the MySQL.jl package, which mean that grouping cannot be done at the SQL level and that prepared SQL statements cannot be used.
- The charts themselves are generated from the DataFrames created in step 2, using the Python Bokeh library, which is well-suited for interactive web visualizations. I call Bokeh directly using PyCall, instead of using the Bokeh.jl package, which is unmaintained. There is a
batchplotfunction to generate all the charts for the site using the settings in the configuration file.
writeplotsitefunction generates the charts as well as HTML tables with links to the charts, a documentation file in Markdown format, and navigation menus for a given language, and copies these to a given output location. To generate the site files, except for the charts themselves, templates processed with Mustache.jl are used.
- The final generation of the site is done using Hakyll, a static site generator written in Haskell. In the output directory generated in step 4, there will be a Haskell source file,
site.hs, which, provided that a Haskell complier and the Hakyll libraries are installed, can be compiled to an executable file. This file can then be run as
Whitlock, Gary. 2012. “Mortality Trends [archived 21 december 2014].” http://web.archive.org/web/20141221203103/http://www.mortality-trends.org/.
WHO. 2017. “WHO Mortality Database.” http://www.who.int/healthinfo/mortality_data/en/index.html.