
Creating a Mathematics Blog with Jekyll

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/creating-a-mathematics-blog-with-jekyll-78cdee0339f3?source=rss-8bd6ec95ab58------2

A Tutorial on how to setup a blog using GitHub, Jekyll and MathJax

There are several ways to create your personal blog on the web, but not as many alternatives when you want to focus on technical subjects and write mathematical equations. In this brief article, I explain how you can use GitHub with Jekyll to create your personal blog with MathJax, so you can write beautiful equations using LaTeX notation. Note that this tutorial is written for Ubuntu, but can easily be adapted for a different OS.

What are Jekyll and MathJax

In this tutorial, I will assume that you know what GitHub is, and that you can use it to host your personal website using GitHub Pages. If you don’t, then take a look at this Tutorial. With that being said, let’s explain what Jekyll and MathJax are…

Jekyll is a static site generator. It takes text written in your favorite markup language and uses layouts to create a static website. You can tweak the site’s look and feel, URLs, the data displayed on the page, and more.

— Jekyll Official Website

In other words, Jekyll can be thought of as an application written in Ruby that generates a website which is easy to manage and lets you write in a markup language, which is quite handy. Jekyll is also supported by GitHub and comes with some pre-built themes, so your blog looks beautiful right out of the box.

MathJax is a JavaScript display engine for mathematics that works in all browsers.

— MathJax Official Website

In other words, MathJax is a service that renders mathematical equations so they look neat in your browser. You can then write in your blog

$$ x = y^2 $$

And in your browser, MathJax will render it as

Mathematical equation rendered in the browser with MathJax. Note that Medium doesn’t use MathJax, so I actually just copied and pasted an image of how the rendered text would look.

The idea is to create a GitHub page, then to use Jekyll to style your blog and allow the use of markdown, together with MathJax to properly render the equations.

Setting up your Blog with Jekyll

There are several ways to set up Jekyll with your GitHub page. One can install themes using Ruby gems, or use one of the themes provided by GitHub. But to enable MathJax, one has to manually tweak the HTML code for the theme. So instead, we will fork a specific theme and modify it.

First, we need to install Jekyll and Bundler (which helps manage Ruby gems). If you are using Ubuntu 20.04, you might already have Ruby installed. Otherwise, you might need to install it. The commands below will install Ruby, and then install Jekyll and Bundler.

sudo apt install ruby-dev
gem install jekyll
gem install bundler

With these few lines of code, you can already set up your blog with Jekyll; just run the following:

jekyll new my-website

This will create a folder “my-website” with all website code inside it. If you go inside the directory and run one command, you can start a local server with your blog running.

cd ./my-website
bundle exec jekyll serve

Now open your browser, go to “localhost:4000”, and you can see your blog.

Example of Website running using Jekyll

Your blog is already functional! You can just move all the files inside the “./my-website” folder to your git repository, and then push them to GitHub. After a few seconds, your blog will be up on the web. Below I show an example:

mv ./my-website/* ./coffee-in-a-klein-bottle.github.io
cd ./coffee-in-a-klein-bottle.github.io
git add -A
git commit -m "Initiated my Jekyll blog"
git push -u origin master

Quick guide to using Jekyll

Now, I’ve explained how to set up Jekyll, but not exactly how to use it. There are a ton of tutorials out there, but you actually don’t need to know much to start using Jekyll effectively. Once your blog is created, personalizing it and writing posts is straightforward. If you go inside the folder containing your new blog, this is what it might look like:

Screenshot of Terminal showing inside the “my-website” folder created by using Jekyll

The “.markdown” files are just, as the name says, markdown files containing the content of each specific webpage. You can, for example, open the “about.markdown” file and start modifying it with your personal information. After that, just run

bundle exec jekyll serve

from inside the “my-website” folder, and Jekyll will update your website files, which are inside the “_site/” folder. Note that you don’t need to do anything with the “_site/” folder; the files in there are taken care of by Jekyll under the hood.

Screenshot of “_config.yml” file

The “_config.yml” file contains configuration properties used by Jekyll, such as the theme being used, the plugins to install, the title of your website, and so on. We will be using a preset theme, so you won’t need to worry much about this file for now.
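As a sketch, a minimal “_config.yml” for a blog like this might look as follows (the title, email, description, and URL values are placeholders you would replace with your own):

```yaml
title: My Mathematics Blog          # shown in the header and browser tab
email: you@example.com
description: Notes on mathematics and programming.
baseurl: ""                         # subpath of your site, e.g. /blog
url: "https://username.github.io"   # base hostname of your GitHub page
theme: minima                       # the preset theme this tutorial uses
plugins:
  - jekyll-feed
```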

The “Gemfile” and “Gemfile.lock” contain information for Ruby, such as the version of Jekyll and so on. You don’t need to worry about them.

Finally, the “_posts/” folder is where you will store your post files. These posts are markdown files, so there is not much to be said here. Just follow the same structure as the example post provided, and you will be fine.
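To illustrate, a new post following the structure of the example post would be a file such as “_posts/2020-08-01-my-first-post.markdown” (the date-prefixed filename is the convention Jekyll expects; the name itself is just an example), containing a small “front matter” header followed by the post body:

```markdown
---
layout: post
title: "My First Post"
date: 2020-08-01 10:00:00 -0300
categories: mathematics
---

This is the body of the post, written in markdown.

Once MathJax is enabled, equations can be written in LaTeX notation:

$$ x = y^2 $$
```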

Updating the theme to enable MathJax

Unfortunately, most Jekyll themes don’t come with MathJax enabled right out of the box, so we have to do this manually, and this requires modifying the theme’s files. When hosting your blog on GitHub, some themes are natively supported (such as the minima theme), but the theme files cannot be modified this way. So we will need to bring all the theme files into the repository and modify them there.

First, go to the “minima” theme’s GitHub repository and clone it to your computer. Delete all the files in the GitHub repository containing your website, and then copy all the files from the “minima” folder into your repository. Then, run the Jekyll command to see if your site is still working. Below are commands showing an example of how to do this:

cd ~/
git clone https://github.com/jekyll/minima.git
cd ./coffee-in-a-klein-bottle.github.io
rm -r ./*
mv ~/minima/* ./
bundle exec jekyll serve

Go to “localhost:4000” and see if your site is working properly. If it is, then stop the server, and let’s modify the theme to enable MathJax.

Open the “_layouts/default.html” file and add the CDN for MathJax. In other words, open “_layouts/default.html” and paste these two lines at the end of the file:

<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

The image below shows what your file should look like

Screenshot of the “default.html” file

Writing Mathematical Equations

You are done! The only thing left to do is to actually start writing your blog posts. So let’s do an example. Open one of the post files and write

$$ x = y^2 $$

Then, run Jekyll again to visualize your website on the localhost.

This is an example of how your post should look:

Example of using MathJax with Jekyll

Finally, just push all the files to your repository, and your personal mathematics blog will be up and running on the web.


Creating a Mathematics Blog with Jekyll was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Belated course announcement: Heike Hofmann’s Julia seminar

Something I probably should have mentioned two weeks ago: Heike Hofmann is teaching a 1 credit Julia Seminar in the Iowa State Statistics Department this semester. It meets Wednesdays from 12-1, and so far has gone through something close to the contents of Leah Hanson’s “Learn Julia in Y minutes.” You can see the schedule on the course’s GitHub page, https://github.com/heike/stat590f, and it should be interesting and fun.

Filed under: Blog Tagged: julia, programming, statistics

Cobbling together parallel random number generation in Julia

I’m starting to work on some computationally demanding projects (Monte Carlo simulations of bootstraps of out-of-sample forecast comparisons), so I thought I should look at Julia some more. Unfortunately, since Julia’s so young (it’s almost at version 0.3.0 as I write this), a lot of code still needs to be written. Like Random Number Generators (RNGs) that work in parallel. So this post describes an approach that parallelizes computation using a standard RNG; for convenience, I’ve put the code (a single function) in a grotesquely ambitiously named package on GitHub: ParallelRNGs.jl. (Also see this thread on the Julia Users mailing list.)

A few quick points about RNGs and simulations. Most econometrics papers have a section that examines the performance of a few estimators in a known environment (usually the estimators proposed by the paper and a few of the best preexisting estimators). We do this by simulating data on a computer, using that data to produce estimates, and then comparing those estimates to the parameters they’re estimating. Since we’ve generated the data ourselves, we actually know the true values of those parameters, so we can make a real comparison. Do that for 5000 simulated data sets and you can get a reasonably accurate view of how the statistics might perform in real life.

For many reasons, it’s useful to be able to reproduce the exact same simulations again in the future. (Two obvious reasons: it allows other researchers to be able to reproduce your results, and it can make debugging much faster when you discover errors.) So we almost always use pseudo Random Number Generators that use a deterministic algorithm to produce a stream of numbers that behaves in important ways like a stream of independent random values. You initialize these RNGs by setting a starting value (the “pseudo” aspect of the RNGs is implicit from now on) and anyone who has that starting value can reproduce the identical sequence of numbers that you generated. A popular RNG is the “Mersenne Twister,” and “popular” is probably an understatement: it’s the default RNG in R, Matlab, and Julia. And (from what I’ve read; this isn’t my field at all) it’s well regarded for producing a sequence of random numbers for statistical simulations.
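As a small sketch of this (using the `srand` call from Julia 0.3, as in the rest of this post; later versions of Julia use `Random.seed!` instead), reseeding the default RNG with the same value reproduces the identical sequence:

```julia
srand(84537423)   # seed the default RNG
a = rand(5)       # draw five pseudo-random numbers
srand(84537423)   # reset the RNG to the same starting value
b = rand(5)       # draws the identical five numbers
a == b            # true
```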

But it’s not necessarily appropriate for producing several independent sequences of random numbers. Which is vitally important because I have an 8 core workstation that needs to run lots of simulations, and I’d like to execute 1/8th of the total simulations on each of its cores.

There’s a common misconception that you can get independent random sequences just by choosing different initial values for each sequence, but that’s not guaranteed to be true. There are algorithms for choosing different starting values that are guaranteed to produce independent streams for the Mersenne Twister (see this research by one of the MT’s inventors), but they aren’t implemented in Julia yet. (Or in R, as far as I can tell; they use a different RNG for parallel applications.) And it turns out that Mersenne Twister is the only RNG that’s included in Julia so far.

So, this would be a perfect opportunity for me to step up and implement some of these advanced algorithms for the Mersenne Twister. Or to implement some of the algorithms developed by L’Ecuyer and his coauthors, which are what R uses. And there’s already C code for both options.

But I haven’t done that yet. I’m lazy busy.

Instead, I’ve written an extremely small function that wraps Julia’s default RNG, calls it from the main process alone to generate random numbers, and then sends those random numbers to each of the other processes/cores where the rest of the simulation code runs. The function’s really simple:

function replicate(sim::Function, dgp::Function, n::Integer)
    function rvproducer()
        for i=1:n
            produce(dgp())
        end
    end
    return(pmap(sim, Task(rvproducer)))
end

That’s all. If you’re not used to Julia, you can ignore the “::Function” and the “::Integer” parts of the arguments. Those just identify the datatype of the argument and you can read it as “dgp_function” if you want (and explicitly providing the types like this is optional anyway). So, you give “replicate” two functions: “dgp” generates the random numbers and “sim” does the remaining calculations; “n” is the number of simulations to do. All of the work is done in “pmap” which parcels out the random numbers and sends them to different processors. (There’s a simplified version of the source code for pmap at that link.)

And that’s it. Each time a processor finishes one iteration, pmap calls dgp() again to generate more random numbers and passes them along. It automatically waits for dgp() to finish, so there are no race conditions and it produces the exact same sequence of random numbers every time. The code is shockingly concise. (It shocked me! I wrote it up assuming it would fail so I could understand pmap better and I was pretty surprised when it worked.)

A quick example might help clear up its usage. We’ll write a DGP for the bootstrap:

const n = 200     #% Number of observations for each simulation
const nboot = 299 #% Number of bootstrap replications
addprocs(7)       #% Start the other 7 cores
dgp() = (randn(n), rand(1:n, (n, nboot)))

The data are iid Normal, (the “randn(n)” component) and it’s an iid nonparametric bootstrap (the “rand(1:n, (n, nboot))”, which draws independent values from 1 to n and fills them into an n by nboot matrix). Oh, and there’s a good reason for those weird “#%” comments; “#” is Julia’s comment character, but WordPress doesn’t support syntax highlighting for Julia, so we’re pretending this is Matlab code. And “%” is Matlab’s comment character, which turns the comment green.

We’ll use a proxy for some complicated processing step:

@everywhere function sim(x)
    nboot = size(x[2], 2)
    bootvals = Array(Float64, nboot)
    for i=1:nboot
        bootvals[i] = mean(x[1][x[2][:,i]])
    end
    confint = quantile(bootvals, [0.05, 0.95])
    sleep(3) #% not usually recommended!
    return(confint[1] < 0 < confint[2])
end

So “sim” calculates the mean of each bootstrap sample and calculates the 5th and 95th percentile of those simulated means, giving a two-sided 90% confidence interval for the true mean. Then it checks whether the interval contains the true mean (0). And it also wastes 3 seconds sleeping, which is a proxy for more complicated calculations but usually shouldn’t be in your code. The initial “@everywhere” is a Julia macro that loads this function into each of the separate processes so that it’s available for parallelization. (This is probably as good a place as any to link to Julia’s “Parallel Computing” documentation.)

Running a short Monte Carlo is simple:

julia> srand(84537423); #% Initialize the default RNG!!!
julia> @time mc1 = mean(replicate(sim, dgp, 500))

elapsed time: 217.705639 seconds (508892580 bytes allocated, 0.13% gc time)
0.896 #% = 448/500

So, about 3.6 minutes and the confidence intervals have coverage almost exactly 90%.

It’s also useful to compare the execution time to a purely sequential approach. We can do that by using a simple for loop:

function dosequential(nsims)
    boots = Array(Float64, nsims)
    for i=1:nsims
        boots[i] = sim(dgp())
    end
    return boots
end

And, to time it:

julia> dosequential(1); #% Force compilation before timing
julia> srand(84537423); #% Reinitialize the default RNG!!!
julia> @time mc2 = mean(dosequential(500))

elapsed time: 1502.038961 seconds (877739616 bytes allocated, 0.03% gc time)
0.896 #% = 448/500

This takes a lot longer: over 25 minutes, 7 times longer than the parallel approach (exactly what we’d hope for, since the parallel approach runs the simulations on 7 cores). And it gives exactly the same results since we started the RNG at the same initial value.

So this approach to parallelization is great… sometimes.

This approach should work pretty well when there aren’t that many random numbers being passed to each processor, and when there aren’t that many simulations being run; i.e. when “sim” is an inherently complex calculation. Otherwise, the overhead of passing the random numbers to each process can start to matter a lot. In extreme cases, “dosequential” can be faster than “replicate” because the overhead of managing the simulations and passing around random variables dominates the other calculations. In those applications, a real parallel RNG becomes a lot more important.

If you want to play with this code yourself, I made a small package for the replicate function: ParallelRNGs.jl on GitHub. The name is misleadingly ambitious (ambitiously misleading?), but if I do add real parallel RNGs to Julia, I’ll put them there too. The code is still buggy, so use it at your own risk and let me know if you run into problems. (Filing an issue on GitHub is the best way to report bugs.)

P.S. I should mention again that Julia is an absolute joy of a language. Package development isn’t quite as nice as in Clojure, where it’s straightforward to load and unload variables from the package namespace (again, there’s lots of code that still needs to be written). But the actual language is just spectacular and I’d probably want to use it for simulations even if it were slow. Seriously: seven lines of new code to get an acceptable parallel RNG.

Filed under: Blog Tagged: econometrics, julia, programming