
Deep Learning: Exploring High Level APIs of Knet.jl and Flux.jl in comparison to Tensorflow-Keras

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//julia/python/packages/knet/flux/tensorflow/machine-learning/deep-learning/2019/06/20/Deep-Learning-Exploring-High-Level-APIs-of-Knet.jl-and-Flux.jl-in-comparison-to-Tensorflow-Keras.html

When it comes to complex modeling, specifically in the field of deep learning, the go-to tool for most researchers is Google’s TensorFlow. There are a number of good reasons why, one of which is the fact that it provides both high- and low-level APIs that suit the needs of beginners and advanced users, respectively. I have used it in some of my projects, and indeed it was powerful enough for the task. This is also due to the fact that TensorFlow is one of the most actively developed deep learning frameworks, with Bayesian inference or probabilistic reasoning as a recent extension (see TensorFlow Probability; another extension is TensorFlow.js). While the library is written mostly in C++ for optimization, the main API is served in Python for ease of use. This design works around a static computational graph that needs to be defined declaratively before it is executed. The static nature of this graph, however, makes the models difficult to debug, since the code itself is data for defining the computational graph. Hence, you cannot use a debugger to check the results of the model line by line. Thankfully, it’s 2019 already and we have a stable Eager Execution that allows users to immediately check the results of any TensorFlow operation. Indeed, this is more intuitive and more pythonic. In this article, however, we’ll attempt to explore what else we have in 2019. In particular, let’s take a look at Julia’s deep learning libraries and compare them to the high-level API of TensorFlow, i.e. Keras’ model specification.

As a language that leans towards numerical computation, it’s no surprise that Julia offers a number of choices for deep learning. Here are some of the stable libraries:

  1. Flux.jl – The Elegant Machine Learning Stack.
  2. Knet.jl – Koç University deep learning framework.
  3. MLJ.jl – Julia machine learning framework by Alan Turing Institute.
  4. MXNet.jl – Apache MXNet Julia package.
  5. TensorFlow.jl – A Julia wrapper for TensorFlow.

Other related packages are maintained in JuliaML. For this article, we are going to focus on the usage of
Flux.jl and Knet.jl, and we are going to use the Iris dataset for a classification task with a Multilayer Perceptron. To start with, we need to install the following packages. I’m using Julia 1.1.0 and Python 3.7.3.
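As a minimal sketch of the setup (assuming the standard Pkg API; exact package versions may differ), the Julia packages used throughout this article can be installed as follows:

```julia
using Pkg

# Install the Julia packages used in this article (exact versions may differ)
Pkg.add(["Flux", "Knet", "RDatasets", "DataFrames", "CSV", "Gadfly", "Cairo", "Fontconfig"])
```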


Loading and Partitioning the Data

The Iris dataset is available in the RDatasets.jl Julia package and in Python’s Scikit-Learn. The following code loads the libraries and the data itself.
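A minimal sketch of the Julia side, assuming RDatasets.jl and DataFrames.jl are installed (the seed value 123 is an arbitrary choice):

```julia
using RDatasets, DataFrames, Random

Random.seed!(123)                    # for reproducibility; the seed value is arbitrary
iris = dataset("datasets", "iris")   # 150 × 5 DataFrame
first(iris, 5)                       # peek at the first rows
```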



The random seed set above is meant for reproducibility, as it gives us the same random initial values for model training. The iris variable in line 11 (referring to the Julia code) contains the data, a data frame with 150 × 5 dimensions, where the columns are: Sepal Length, Sepal Width, Petal Length, Petal Width, and Species. There are several ways to partition this data into training and testing datasets; one procedure is stratified sampling, with simple random sampling without replacement as the selection method within each stratum (the species). The following code defines a function for partitioning the data using the sampling design just described:
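Here is one possible sketch of such a partition function, assuming a recent DataFrames.jl; the original implementation may differ in details:

```julia
using Random, DataFrames

# Stratified partition: within each level of `target`, take a simple random
# sample (without replacement) of `ratio`×n rows for training; the rest is testing.
function partition(df::DataFrame, ratio::Float64; target::Symbol = :Species)
    train_idx = Int[]
    for level in unique(df[!, target])
        idx = findall(df[!, target] .== level)   # rows belonging to this stratum
        n   = round(Int, ratio * length(idx))
        append!(train_idx, shuffle(idx)[1:n])    # SRS without replacement
    end
    test_idx = setdiff(1:nrow(df), train_idx)
    return df[train_idx, :], df[test_idx, :]
end
```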


Extract the training and testing datasets using the function above as follows:
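Continuing the sketch above (the integer codes 1, 2, 3 for the species and the 70/30 split are assumptions consistent with the dimensions reported below):

```julia
trn, tst = partition(iris, 0.7)     # 70% training, 30% testing per species

features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]
codes    = Dict("setosa" => 1, "versicolor" => 2, "virginica" => 3)

xtrn = Matrix(trn[:, features])                  # 105 × 4
ytrn = [codes[string(s)] for s in trn.Species]   # 105-element integer labels
xtst = Matrix(tst[:, features])                  # 45 × 4
ytst = [codes[string(s)] for s in tst.Species]   # 45-element integer labels
```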



All three code snippets above extract xtrn, the training feature matrix with 105 × 4 dimensions (105 observations by 4 features); ytrn, the corresponding training target variable with 105 × 1 dimension; xtst, the feature matrix of the testing dataset with 45 × 4 dimensions; and ytst, the target variable of the testing dataset with 45 × 1 dimension. Moreover, contrary to TensorFlow-Keras, Knet.jl and Flux.jl need further data preparation from the above partitions. In particular, Knet.jl takes a minibatch object as input data for model training, while Flux.jl needs one-hot encoding for the target variables ytrn and ytst. Further, unlike Knet.jl, which ships with a minibatch function, Flux.jl gives the user the flexibility to create their own. A sketch of this preparation is given below.
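In this sketch the batch size of 15 is an arbitrary choice, and the function names match the Knet and Flux APIs current at the time of writing:

```julia
using Knet: minibatch
using Flux: onehotbatch

# Knet expects observations along the last dimension, hence the transpose to 4 × n
dtrn = minibatch(Array(xtrn'), ytrn, 15)   # iterator of (x, y) minibatches
dtst = minibatch(Array(xtst'), ytst, 15)

# Flux: one-hot encode the integer targets (3 classes)
ytrn_hot = onehotbatch(ytrn, 1:3)          # 3 × 105 one-hot matrix
ytst_hot = onehotbatch(ytst, 1:3)
```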

Specify the Model

The model that we are going to use is a Multilayer Perceptron with the following architecture: 4 neurons for the input layer, 10 neurons for the hidden layer, and 3 neurons for the output layer. The first two layers contain biases, and the neurons of the last two layers are activated with the Rectified Linear Unit (ReLU) and softmax functions, respectively. The diagram below illustrates the architecture described:


The code below specifies the model:
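As a sketch of the two Julia specifications (run them as separate scripts to avoid name clashes between the frameworks; KDense and KChain are hypothetical names standing in for the structs used in the original post):

```julia
# Flux: Keras-like model specification via Flux.Chain
using Flux

model_flux = Chain(
    Dense(4, 10, relu),   # input layer (4 features) → hidden layer (10 neurons), ReLU
    Dense(10, 3),         # hidden layer → output layer (3 classes)
    softmax               # class probabilities
)
```

```julia
# Knet: no built-in Dense/Chain, so the layers are built from Knet's primitives
using Knet: param, param0, relu, nll

struct KDense; w; b; f; end                              # weights, bias, activation
KDense(i::Int, o::Int, f = identity) = KDense(param(o, i), param0(o), f)
(d::KDense)(x) = d.f.(d.w * x .+ d.b)

struct KChain; layers; end                               # minimal feed-forward chain
(c::KChain)(x) = foldl((x, l) -> l(x), c.layers; init = x)
(c::KChain)(x, y) = nll(c(x), y)                         # loss method used by Knet's optimizers

model_knet = KChain((KDense(4, 10, relu), KDense(10, 3)))
```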



Coming from TensorFlow-Keras, Flux.jl provides a Keras-like API for model specification, with Flux.Chain as the counterpart of Keras’ Sequential. This is different from Knet.jl, where the highest-level API you get is the nuts and bolts for constructing the layers. Having said that, however, Flux.Dense is defined almost exactly like the Dense struct of the Knet.jl code above (check the source code here). In addition, the fact that both Flux.jl and Knet.jl are written purely in Julia makes the source code under the hood accessible to beginners, giving the user a full understanding not just of the code but also of the math. Check the screenshots below for the distribution of the file types in the GitHub repos of the three frameworks:



From the above figure, it’s clear that Flux.jl is 100% Julia. On the other hand, Knet.jl, while not apparent, is actually 100% Julia as well. The 41.4% of Jupyter Notebooks and the other small percentages account for the tutorials, tests, and examples, not the source code.


Train the Model

Finally, train the model as follows for 100 epochs:
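A Flux-flavored sketch of such a loop (targeting a Zygote-based Flux release, roughly 0.11–0.13, where Flux.params, ADAM, and Flux.Losses.crossentropy are available; the file name and the full-batch training per epoch are arbitrary choices):

```julia
using Flux, DataFrames, CSV
using Flux: onecold
using Statistics: mean

loss(x, y)     = Flux.Losses.crossentropy(model_flux(x), y)
accuracy(x, y) = mean(onecold(model_flux(x)) .== onecold(y))

opt  = ADAM()
xt   = Float32.(Array(xtrn'))               # 4 × 105, features along columns
data = [(xt, ytrn_hot)]                     # a single full batch per epoch, for simplicity
hist = DataFrame(epoch = Int[], loss = Float64[], accuracy = Float64[])

for epoch in 1:100
    Flux.train!(loss, Flux.params(model_flux), data, opt)
    push!(hist, (epoch, loss(xt, ytrn_hot), accuracy(xt, ytrn_hot)))
end

CSV.write("flux_history.csv", hist)         # saved for the plots below
```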



The Julia code above saves both the loss and the accuracy for every epoch into a data frame and then into a CSV file. These will be used for visualization. Moreover, unlike Flux.jl and Knet.jl, which require minibatch preparation prior to training, TensorFlow-Keras specifies this in the fit method as shown above. Further, it is also possible to train the model in Knet.jl using a single function without saving the metrics. This is done as follows:
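A sketch of that single-call training, reusing the model and minibatches prepared above (adam and progress! match the Knet API current at the time of writing):

```julia
import Knet
using Knet: adam, progress!

# Train for 100 epochs in one call: `adam` returns an iterator of update steps
# over the (repeated) minibatches, and `progress!` drives it with a progress bar.
progress!(adam(model_knet, repeat(dtrn, 100)))

# Built-in helpers for evaluation on the test minibatches
Knet.nll(model_knet, dtst), Knet.accuracy(model_knet, dtst)
```

For completeness, here is a sketch of the equivalent Flux call, reusing the loss, data, and opt defined in the training loop above (the @epochs macro is available in the Flux releases current at the time of writing but has since been deprecated):

```julia
using Flux: @epochs

# Repeat one training pass 100 times without writing the for loop explicitly
@epochs 100 Flux.train!(loss, Flux.params(model_flux), data, opt)
```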


The Flux.jl code above simply illustrates the use of the Flux.@epochs macro for looping instead of an explicit for loop. The loss of the model for 100 epochs is visualized below across frameworks:

From the above figure, one can observe that Flux.jl had bad starting values set by the random seed earlier; fortunately, Adam drives the gradient vector rapidly towards the minimum. The figure was plotted using Gadfly.jl. Install this package using Pkg as described in the first code block, along with Cairo.jl and Fontconfig.jl. The latter two packages are used to save the plot in PNG format; see the code below to reproduce it:
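A sketch of such a plot with Gadfly.jl, assuming a recent CSV.jl; the file names and the long-format data frame with columns epoch, loss, and framework are assumptions about how the per-epoch metrics were assembled:

```julia
using Gadfly, DataFrames, CSV
import Cairo, Fontconfig                     # required by Gadfly for PNG output

# Assumed long-format metrics: one row per (framework, epoch) pair
loss_df = CSV.read("losses.csv", DataFrame)

p = plot(loss_df,
         x = :epoch, y = :loss, color = :framework,
         Geom.line,
         Guide.xlabel("Epoch"), Guide.ylabel("Cross-Entropy Loss"))

draw(PNG("loss.png", 8inch, 5inch), p)       # save the figure as PNG
```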

Evaluate the Model

The output of the model is a vector of three neurons. The index or location of a neuron in this vector defines the corresponding integer encoding, with the 1st index as setosa, the 2nd as versicolor, and the 3rd as virginica. Thus, the code below takes the argmax of the vector to get the integer encoding for evaluation.
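A sketch of this evaluation on the Flux side (onecold is Flux's built-in argmax-style decoder; the manual comprehension is shown only to make the argmax explicit):

```julia
using Flux: onecold
using Statistics: mean

scores = model_flux(Float32.(Array(xtst')))                  # 3 × 45 class probabilities
ŷ      = [argmax(scores[:, j]) for j in 1:size(scores, 2)]   # argmax per observation
mean(ŷ .== ytst)                                             # test accuracy

# onecold does the same argmax in a single call
mean(onecold(scores) .== ytst)
```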



The figure below shows the traces of the accuracy during training:

TensorFlow took 25 epochs before surpassing 50% accuracy again. To reproduce the figure, run the following code (make sure to load Gadfly.jl and the other related libraries mentioned earlier for generating the loss plots):

Benchmark

At this point, we are going to record the training time of each framework.
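A sketch of the timing, reusing the models and data prepared above; @elapsed simply returns the wall-clock seconds of the expression it wraps:

```julia
import Knet
using Knet: adam, progress!
using Flux

# Wall-clock training time for 100 epochs in each framework
flux_time = @elapsed for epoch in 1:100
    Flux.train!(loss, Flux.params(model_flux), data, opt)
end

knet_time = @elapsed progress!(adam(model_knet, repeat(dtrn, 100)))

println("Flux: $(round(flux_time, digits = 3)) s, Knet: $(round(knet_time, digits = 3)) s")
```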



The benchmark was done by running the above code about 10 times for each framework; I then took the lowest elapsed time among the results. In addition, before running the code for each framework, I started from a fresh restart of my machine.

The code for the above figure is given below (make sure to load Gadfly.jl and the other related libraries mentioned earlier for generating the loss plots):

Conclusion

In conclusion, I would say Julia is worth investing in, even for deep learning, as illustrated in this article. The two frameworks, Flux.jl and Knet.jl, provide a clean API that introduces a new way of defining models, as opposed to the object-oriented approach of TensorFlow-Keras. One thing to emphasize here is the for loop that I plainly added when training the model, just to save the accuracy and loss metrics. The for loop did not compromise the speed (though Knet.jl is much faster without it). This is crucial since it lets the user spend more time on solving the problem and less on optimizing the code. Further, between the two Julia frameworks, I find Knet.jl to be Julia + little-else, as described by Professor Deniz Yuret (the main developer): there are no special APIs for Dense, Chain, etc.; you have to code them. Although this is also possible in Flux.jl, Knet.jl doesn’t have these out of the box; it ships only with the nuts and bolts, and that’s the highest-level API the user gets. Having said that, I think Flux.jl is a better recommendation for beginners coming from TensorFlow-Keras. This is not to say that Knet.jl is hard; it’s not, if you already know Julia. In addition, I do love the extent of flexibility of Knet.jl by default, which I think is best for advanced users. Lastly, just like the different extensions of TensorFlow, Flux.jl is flexible enough that it works well with Turing.jl for doing Bayesian deep learning, which is a good alternative to TensorFlow Probability. For Neural Differential Equations, Flux.jl works well with DifferentialEquations.jl; check out DiffEqFlux.jl.

Next Steps

In my next article, we will explore the low-level APIs of Flux.jl and Knet.jl in comparison to the low-level APIs of TensorFlow. One thing also missing from the above exercise is the use of the GPU for model training, which I hope to tackle in future articles. Finally, I plan to test these Julia libraries on real deep learning problems, such as computer vision and natural language processing (check out the workshops on these from JuliaCon 2018).

Complete Codes

If you are impatient, here is the complete code, excluding the benchmarks and the plots. It should work after installing the required libraries shown above:



References

Software Versions

Julia: Introduction to Web Scraping (PHIVOLCS’ Seismic Events)

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//web/scraping/philippines/julia/programming/packages/2018/10/30/Introduction-to-Web-Scraping-Julia.html

Data nowadays are almost everywhere, stored in media as simple as traditional log books and as complex as multi-connected databases. Efficient collection of these datasets is crucial for analytics, since data processing takes almost 50% of the overall workflow. An example where manual data collection can be automated is the case of datasets published on websites, where the providers are usually government agencies. For example, in the Philippines there is a website dedicated to Open Stat, initiated by the Philippine Statistics Authority (PSA). The site hosts public datasets for researchers to use, and these are well prepared in CSV format, so consumers can simply download the files. Unfortunately, for some agencies this feature is not yet available. That is, users need to either copy-paste the data from the website, or request it from the agency directly (which also takes time). A good example of this is the seismic events of the Philippine Institute of Volcanology and Seismology (PHIVOLCS).

Data encoded in HTML can be parsed and saved into formats that are workable for doing analyses (e.g. CSV, TSV, etc.). The task of harvesting and parsing data from the web is called web scraping, and PHIVOLCS’ Latest Seismic Events is a good playground for beginners. There are several tutorials available, especially for Python (see this) and R (see this), but not much for Julia. Hence, this article is primarily for Julia users. However, it introduces web developer tools as well, specifically how to use them for inspecting the components of a website, which can be useful for non-Julia users too.

Why Julia?

The creators of the language described it well in their first announcement (I suggest you read the full post), Why We Created Julia. Here’s part of it:

We are greedy: we want more.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

I used Julia in my master’s thesis for my MCMC simulations and benchmarked it against R. It took seconds in Julia while R took more than an hour (sampling from the posterior distribution). I could have optimized my R code using Rcpp (writing the performance-critical part in C++ to speed it up and wrapping/calling it in R), but I had no time for that. Hence, Julia solves the two-language problem.

Getting to know HTML

Since the data published on websites are usually encoded as a table, it is best to understand the structure of the HTML document before performing web scraping. HTML (Hypertext Markup Language) is a standardized system for tagging text files to achieve font, color, graphic, and hyperlink effects on World Wide Web pages [1]. For example, bold text in HTML is enclosed inside the <b> tag, e.g. <b>text</b>, and the result is rendered as bold text. A webpage is an HTML document that can be structured in several ways; one possible case is as follows:

Scrapers must be familiar with the hierarchy of the HTML document, as this is the template for the frontend source code of every website. Following the structure of the above figure, data encoded in an HTML table are placed inside the td (table data) tag, where td is under tr (table row), tr is under tbody (table body), and so on. td is the lowest-level tag (sorting by hierarchy) in the figure above that can contain data. However, td can also contain p (paragraph), a (hyperlink), b (bold), i (italic), span (span), and even div (division) tags, so expect to encounter these under td as well.

As indicated in the figure, each HTML tag can have attributes, such as id and class. To understand how the two differ, consider id="yellow" and id="orange", which are unique identities (ids) of colors. These ids can be grouped into a class, e.g. class="colors". HTML tags are not required to have these attributes, but they are useful for adding custom styles and behavior when doing web development. This article will not dive into the details of the HTML document, but rather aims to give the reader a high-level understanding. There are many resources available on the web, just google.

Inspecting the Source of the Website

In order to have an idea of the structure of the website, browsers such as Google Chrome and Mozilla Firefox include tools for web developers. For purposes of illustration, but without loss of generality, this article will only scrape a portion (why? read on and see the explanation below) of the September 2018 earthquake events. The web developer tools can be accessed from Tools > Web Developer in Firefox, and from View > Developer in Google Chrome. The following video shows how to use the inspector tool of Mozilla Firefox.

Scraping using Julia

To perform web scraping, Julia offers three libraries for the job: Cascadia.jl, Gumbo.jl, and HTTP.jl. HTTP.jl is used to download the frontend source code of the website, which is then parsed by Gumbo.jl into a hierarchical structured object; and Cascadia.jl provides a CSS selector API for easy navigation.

To start with, the following code will download the frontend source code of the PHIVOLCS’ Seismic Events for September 2018.
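A sketch of the download step with HTTP.jl; the URL below is a placeholder, so substitute the actual address of the PHIVOLCS page for September 2018:

```julia
using HTTP

url = "https://earthquake.phivolcs.dost.gov.ph/"   # placeholder: use the actual page address
response = HTTP.get(url)
response.status                                    # 200 indicates a successful request
```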

Extract the HTML source code and parse it as follows:
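A minimal sketch with Gumbo.jl:

```julia
using Gumbo

# Convert the raw response bytes to a String and parse it into a tree of HTML elements
doc  = parsehtml(String(response.body))
html = doc.root                          # the <html> element; its children are <head> and <body>
```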

Now to extract the header of the HTML table, use the Web Developer Tools for preliminary inspection on the components of the website. As shown in the screenshot below, the header of the table is enclosed inside the p tag of the td. Further, the p tag is of class auto-style33, which can be accessed via CSS selector by simply prefixing it with ., i.e. .auto-style33.
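With Cascadia.jl, the query can be sketched as follows (recent Cascadia releases use eachmatch; older ones provided matchall instead):

```julia
using Cascadia

# All elements of class auto-style33, i.e. the header cells of the table
qres = eachmatch(Selector(".auto-style33"), html)
```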


qres contains the HTML tags that matched the CSS selector’s query. The result is further cleaned by removing the tabs, spaces and line breaks via Regular Expressions, and is done as follows:
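A sketch of that cleaning step (nodeText collects the text content of a matched element):

```julia
using Cascadia: nodeText

# Collapse tabs, line breaks, and repeated spaces into single spaces, then trim
header = [strip(replace(nodeText(q), r"\s+" => " ")) for q in qres]
```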

Having the header names, the next step is to extract the data from the HTML table. Upon inspection, the tds containing the data next to the header rows seem to have the following classes (see screenshot below): auto-style21 for the first column (Date-Time), auto-style81 for the second column (Latitude), auto-style80 for the third and fourth columns (Longitude and Depth), auto-style74 for the fifth column (Magnitude), and auto-style79 for the sixth column (Location). Unfortunately, this is not consistent across rows (trs), so it is best not to rely on these classes with Cascadia.jl. Instead, use Gumbo.jl to navigate down the hierarchy of the Document Object Model of the HTML.

Starting with the table tag which is of class .MsoNormalTable (see screenshot below), the extraction proceeds down to tbody then to tr and finally to td.

The following code describes how parsing is done, read the comments:
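The parsing is sketched below under a couple of assumptions: the rows are reached through the .MsoNormalTable selector rather than positional indexing, and complete data rows are assumed to contain exactly six cells:

```julia
using Gumbo, Cascadia, DataFrames
using Cascadia: nodeText

# Locate the table(s) of class MsoNormalTable, then walk table → tbody → tr → td.
# Indexing a Gumbo element accesses its children.
tables = eachmatch(Selector(".MsoNormalTable"), html)
tbody  = tables[1][1]                                # first table, its <tbody> child

rows = Vector{Vector{String}}()
for tr in tbody.children                             # each <tr> holds one seismic event
    tr isa HTMLElement || continue                   # skip stray text nodes
    tds = [nodeText(td) for td in tr.children if td isa HTMLElement]
    cleaned = [strip(replace(t, r"\s+" => " ")) for t in tds]
    length(cleaned) == 6 && push!(rows, cleaned)     # keep only complete data rows
end

# Assemble into a DataFrame using the header names extracted earlier
df = DataFrame(permutedims(hcat(rows...)), Symbol.(header))
```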

Complete Code for PHIVOLCS’ September 2018 (Portion) Seismic Events

The September 2018 Seismic Events are encoded in two separate HTML tables of the same class, named MsoNormalTable. For the sake of simplicity, this article will only scrape the first portion (3rd-indexed, see line 14 below: tbody = html[3][1];) of the table (581 rows). The second portion (4th-indexed, change line 14 below to: tbody = html[4][1];) is left to the reader to try out and scrape as well.

The following code wraps the parsers into functions, namely htmldoc (downloads and parses the HTML source code of the site), scraper (scrapes the downloaded HTML document), firstcolumn (logic for parsing the first column of the table, used inside scraper function).


With the data in hand, analysts can now proceed to exploratory analyses; for example, the following are the descriptive statistics of the variables:
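For instance, with the data frame assembled above:

```julia
using DataFrames

# Column-wise summary: mean, min, median, max, number of unique values,
# missing count, and element type (where applicable)
describe(df)
```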

describe is clever enough not only to omit the mean and median for non-continuous variables, but also to determine the min, max, and nunique (number of unique values) for these variables (date and location).

End Note

I use Python primarily at work, with BeautifulSoup as my go-to library for web scraping. Compared to Cascadia.jl and Gumbo.jl, BeautifulSoup offers comprehensive documentation and other resources that are useful for figuring out bugs and understanding how the module works. Having said that, I hope this article somehow contributes to the documentation of the said Julia libraries. Further, I am confident to say that Cascadia.jl and Gumbo.jl are stable enough for the job.

Lastly, as a precaution to beginners, make sure to read the privacy policy (if any) of any website you want to scrape.

Software Versions

Julia, Python, R: Introduction to Bayesian Linear Regression

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//data/analyses/wrangling/julia/programming/packages/2018/10/14/Introduction-to-Bayesian-Linear-Regression.html

Reverend Thomas Bayes (see Bayes, 1763) is known to be the first to formulate the Bayes’ theorem, but the comprehensive mathematical formulation of this result is credited to the works of Laplace (1986). The Bayes’ theorem has the following form:

\begin{equation}
\label{eq:bayes-theorem}
\mathbb{P}(\mathbf{w}|\mathbf{y}) = \frac{\mathbb{P}(\mathbf{w})\mathbb{P}(\mathbf{y}|\mathbf{w})}{\mathbb{P}(\mathbf{y})}
\end{equation}

where $\mathbf{w}$ is the weight vector and $\mathbf{y}$ is the data. This simple formula is the main foundation of Bayesian modeling. Any model estimated using Maximum Likelihood can also be estimated using the above conditional probability. What makes it different is that the Bayes’ theorem considers uncertainty not only in the observations but also in the weights or the objective parameters.

As an illustration of Bayesian inference in basic modeling, this article discusses the Bayesian approach to linear regression. Let $\mathscr{D}\triangleq\{(\mathbf{x}_1,y_1),\cdots,(\mathbf{x}_n,y_n)\}$, where $\mathbf{x}_i\in\mathbb{R}^{d}, y_i\in \mathbb{R}$, be the paired dataset. Suppose the response values, $y_1,\cdots,y_n$, are independent given the parameter $\mathbf{w}$, and are distributed as $y_i\overset{\text{iid}}{\sim}\mathcal{N}(\mathbf{w}^{\text{T}}\mathbf{x}_i,\alpha^{-1})$, where $\alpha$ (assumed to be known in this article) is referred to as the precision parameter, which is useful for the derivation later. From the Bayesian perspective, the weights are assumed to be random and are governed by some a priori distribution. The choice of this distribution is subjective, but choosing an arbitrary a priori can often result in an intractable integral, especially for interesting models. For simplicity, a conjugate prior is used for the latent weights. Specifically, assume that ${\mathbf{w}\overset{\text{iid}}{\sim}\mathcal{N}(\mathbf{0},\beta^{-1}\mathbf{I})}$, where $\beta>0$ is a hyperparameter assumed to be known in this experiment. The posterior distribution based on the Bayes’ rule is given by

\begin{equation}\label{eq:bayesrulepost}
\mathbb{P}(\mathbf{w}|\mathbf{y})=\frac{\mathbb{P}(\mathbf{w})\mathbb{P}(\mathbf{y}|\mathbf{w})}{\mathbb{P}(\mathbf{y})},
\end{equation}

where $\mathbb{P}(\mathbf{w})$ is the a priori distribution of the parameter, $\mathbb{P}(\mathbf{y}|\mathbf{w})$ is the likelihood, and $\mathbb{P}(\mathbf{y})$ is the normalizing factor. The likelihood is given by

$$
\begin{align}
\mathbb{P}(\mathbf{y}|\mathbf{w})&=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\alpha^{-1}}}\exp\left[-\frac{\alpha(y_i-\mathbf{w}^{\text{T}}\mathbf{x}_i)^2}{2}\right]\nonumber\\
&=\left(\frac{\alpha}{2\pi}\right)^{n/2}\exp\left[-\sum_{i=1}^n\frac{\alpha(y_i-\mathbf{w}^{\text{T}}\mathbf{x}_i)^2}{2}\right].\label{eq:likelihood:blreg}
\end{align}
$$

In matrix form, this can be written as

\begin{equation}
\mathbb{P}(\mathbf{y}|\mathbf{w})\propto\exp\left[-\frac{\alpha}{2}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})\right]
\end{equation}

where $\boldsymbol{\mathfrak{A}}\triangleq\left[(\mathbf{x}_i^{\text{T}})\right]$, i.e. $\boldsymbol{\mathfrak{A}}\in(\mathbb{R}^{n}\times\mathbb{R}^d)$, this matrix is known as the design matrix. Given that $\mathbf{w}$ has the following prior distribution

\begin{equation}\label{eq:wpriori}
\mathbb{P}(\mathbf{w})=\frac{1}{\sqrt{(2\pi)^{d}|\beta^{-1}\mathbf{I}|}}\exp\left[-\frac{1}{2}\mathbf{w}^{\text{T}}\beta\mathbf{I}\mathbf{w}\right],
\end{equation}

implies that the posterior has the following form:

$$
\begin{align}
\mathbb{P}(\mathbf{w}|\mathbf{y})&\propto\exp\left[-\frac{\alpha}{2}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})\right]\exp\left[-\frac{1}{2}\mathbf{w}^{\text{T}}\beta\mathbf{I}\mathbf{w}\right]\nonumber\\
&=\exp\left\{-\frac{1}{2}\left[\alpha(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})+\mathbf{w}^{\text{T}}\beta\mathbf{I}\mathbf{w}\right]\right\}.
\end{align}
$$

Expanding the terms in the exponent gives

\begin{equation}\label{eq:expterms}
\alpha\mathbf{y}^{\text{T}}\mathbf{y}-2\alpha\mathbf{w}^{\text{T}}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}+\mathbf{w}^{\text{T}}(\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I})\mathbf{w}.
\end{equation}

The next step is to complete the square of the above expression so that it resembles the exponent of the Gaussian distribution. That is, the quadratic form in the exponential term of a $\mathcal{N}(\mathbf{w}|\boldsymbol{\mu},\boldsymbol{\Sigma})$ is given by

$$
\begin{align}
(\mathbf{w}-\boldsymbol{\mu})^{\text{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{w}-\boldsymbol{\mu})&=(\mathbf{w}-\boldsymbol{\mu})^{\text{T}}(\boldsymbol{\Sigma}^{-1}\mathbf{w}-\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})\nonumber\\
&=\mathbf{w}^{\text{T}}\boldsymbol{\Sigma}^{-1}\mathbf{w}-
2\mathbf{w}^{\text{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}+\boldsymbol{\mu}^{\text{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}.\label{eq:expnorm}
\end{align}
$$

The terms in Equation (\ref{eq:expterms}) are matched with those in (\ref{eq:expnorm}), so that

\begin{equation}\label{eq:sigmablrgauss}
\boldsymbol{\Sigma}^{-1}=\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I}
\end{equation}

and

$$
\begin{align}
\mathbf{w}^{\text{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}&=\alpha\mathbf{w}^{\text{T}}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}\nonumber\\
\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}&=\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}\nonumber\\
\boldsymbol{\mu}&=\alpha\boldsymbol{\Sigma}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}.\label{eq:mublrgauss}
\end{align}
$$

Thus the a posteriori is a Gaussian distribution with location parameter given by Equation (\ref{eq:mublrgauss}) and scale parameter given by the inverse of Equation (\ref{eq:sigmablrgauss}). I’ll leave to the reader the proper mathematical derivation of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ without matching terms as we did above.

Simulation Experiment

In this section, we are going to apply the theory above using simulated data. I will use Julia as the primary programming language for this article, but I also provide code for R and Python. To start with, load the following libraries:
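A sketch of the Julia setup (the seed value is an arbitrary choice):

```julia
using Distributions, LinearAlgebra, Random

Random.seed!(123)     # reproducibility; the seed value is arbitrary
```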


Next, define the following functions for data simulation and parameter estimation. The estimate of the parameters is governed by the a posteriori, which from above is a multivariate Gaussian distribution, with mean given by Equation (\ref{eq:mublrgauss}) and variance-covariance matrix defined by the inverse of Equation (\ref{eq:sigmablrgauss}).
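A sketch of the two functions in Julia; the simulated model $y = w_0 + w_1 x + \varepsilon$ and the uniform design points are assumptions for illustration:

```julia
using Distributions, LinearAlgebra

# Simulate n points from y = w0 + w1*x + ε with ε ~ N(0, 1/α);
# returns the design matrix A (intercept column plus x) and the response y.
function simulate(n::Int, w::Vector{Float64}, α::Float64)
    x = rand(Uniform(-1, 1), n)
    A = hcat(ones(n), x)
    y = A * w .+ rand(Normal(0, sqrt(1 / α)), n)
    return A, y
end

# Posterior of the weights: N(μ, Σ) with Σ⁻¹ = αAᵀA + βI and μ = αΣAᵀy,
# as derived above.
function posterior(A::Matrix{Float64}, y::Vector{Float64}, α::Float64, β::Float64)
    Σ = inv(α * (A' * A) + β * I)
    μ = α * Σ * (A' * y)
    return μ, Σ
end
```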



Execute the above functions and return the necessary values as follows:
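Continuing the sketch (the true weights and the hyperparameters below are arbitrary choices):

```julia
wtrue = [-0.3, 0.5]          # true intercept and slope
α, β  = 1 / 0.2^2, 2.0       # known noise precision and prior precision

A, y = simulate(20, wtrue, α)
μ, Σ = posterior(A, y, α, β)

μ                            # posterior mean, i.e. the MAP estimate of the weights
```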



Finally, plot the fitted lines whose weights are samples from the a posteriori. The red line in the plot below is the Maximum A Posteriori (MAP) estimate of the parameter of interest. Note, however, that the code provided for the animated plot below is Julia only. Python and R users can use matplotlib.pyplot (Julia’s Plots backend) and gganimate, respectively.
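A sketch of the animation with Plots.jl, assuming the simulated data and posterior from above; the number of sampled lines and the axis limits are arbitrary choices:

```julia
using Plots, Distributions, LinearAlgebra

wpost = MvNormal(μ, Matrix(Symmetric(Σ)))   # the a posteriori distribution (symmetrized covariance)
xgrid = range(-1, 1, length = 100)

anim = @animate for i in 1:50
    w = rand(wpost)                         # one sampled weight vector per frame
    plot(xgrid, μ[1] .+ μ[2] .* xgrid, color = :red, lw = 2, label = "MAP",
         xlim = (-1, 1), ylim = (-1.5, 1.5))
    plot!(xgrid, w[1] .+ w[2] .* xgrid, color = :gray, alpha = 0.7, label = "sampled line")
    scatter!(A[:, 2], y, color = :black, label = "data")
end

gif(anim, "bayesian-linear-regression.gif", fps = 5)
```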

End Note

There are many libraries available for Bayesian modeling. For Julia we have Klara.jl, Mamba.jl, Stan.jl, Turing.jl, and more related packages; for Python, my favorite is PyMC3; and for R, I prefer RStan.

As always, coding from scratch is a good exercise and it helps you appreciate the math. Further, I found Julia to be quite easy to use as a tool for statistical problems. In fact, Julia’s linear algebra API is very close to the mathematical formulae above.

References

  • Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions, 53, 370-418. URL: http://www.jstor.org/stable/105741
  • Laplace, P. S. (1986). Memoir on the probability of the causes of events. Statist. Sci., 1(3), 364–378. doi: 10.1214/ss/1177013621

Software Versions