Author Archives: Estadistika -- Julia

Bayesian Sequential Learning

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//bayesian/probability/modeling/inference/julia/2020/03/01/Bayesian-Sequential-Learning.html

In my previous article, I discussed how we can use the Bayesian approach in estimating the parameters of the model. The process revolves around solving the following conditional probability, popularly known as Bayes' Theorem:

$$
\mathbb{P}(\mathbf{w}|\mathbf{y})=\frac{\mathbb{P}(\mathbf{w})\mathbb{P}(\mathbf{y}|\mathbf{w})}{\mathbb{P}(\mathbf{y})},
$$

where $\mathbb{P}(\mathbf{w})$ is the a priori (prior distribution) for the objective parameters, $\mathbb{P}(\mathbf{y}|\mathbf{w})$ is the likelihood, and $\mathbb{P}(\mathbf{y})$ is the normalizing constant (the model evidence) with the following form:

$$
\mathbb{P}(\mathbf{y})=\int_{\mathscr{P}}\mathbb{P}(\mathbf{y}|\mathbf{w})\mathbb{P}(\mathbf{w})\operatorname{d}\mathbf{w},
$$

where $\mathscr{P}$ is the parameter space.

Posterior Distribution

The details of the derivation of the a posteriori were also provided in the said article, but there were missing pieces, which I think are necessary for us to support our proposition, and thus we have the following result:

Proposition
Let $\mathscr{D}\triangleq\{(\mathbf{x}_1,y_1),\cdots,(\mathbf{x}_n,y_n)\}$ be the set of data points s.t. $\mathbf{x}_i\in\mathbb{R}^{p}$. If
$$y_i\overset{\text{iid}}{\sim}\mathcal{N}(w_0+w_1x_i,\alpha^{-1})$$
and $\mathbf{w}\triangleq[w_0,w_1]^{\text{T}}$ s.t. $\mathbf{w}\sim\mathcal{N}_2(\mathbf{0},\beta^{-1}\mathbf{I})$, then $\mathbf{w}|\mathbf{y}\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$ where

$$
\begin{align}
\boldsymbol{\mu}&=\alpha\boldsymbol{\Sigma}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y},\\
\boldsymbol{\Sigma}&=(\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I})^{-1}
\end{align}
$$

and $\boldsymbol{\mathfrak{A}}\triangleq\left[\mathbf{x}_i^{\text{T}}\right]_{\forall i}$ is the design matrix whose $i$th row is $\mathbf{x}_i^{\text{T}}\triangleq[1, x_i]$.

Proof. Let $\hat{y}_i\triangleq w_0+w_1x_i$ be the model, then the data can be described as follows:
\begin{align}
y_i=\hat{y}_i+\varepsilon_i,\quad\varepsilon_i\overset{\text{iid}}{\sim}\mathcal{N}(0,1/\alpha),
\end{align}
where $\varepsilon_i$ is the innovation that the model can’t explain, and $\mathbb{Var}(\varepsilon_i)=\alpha^{-1}$ since $\mathbb{Var}(y_i)=\alpha^{-1}$ as given above. Then the likelihood of the model is given by:

\begin{align}
\mathcal{L}(\mathbf{w}|\mathbf{y})\triangleq\mathbb{P}(\mathbf{y}|\mathbf{w})&=\prod_{\forall i}\frac{1}{\sqrt{2\pi\alpha^{-1}}}\text{exp}\left[-\frac{(y_i-\hat{y}_i)^2}{2\alpha^{-1}}\right]\\
&=\frac{\alpha^{n/2}}{(2\pi)^{n/2}}\text{exp}\left[-\alpha\sum_{\forall i}\frac{(y_i-\hat{y}_i)^2}{2}\right],
\end{align}

or in vector form:

\begin{align}
\mathcal{L}(\mathbf{w}|\mathbf{y})\propto\text{exp}\left[-\frac{\alpha(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})}{2}\right],
\end{align}

where $\boldsymbol{\mathfrak{A}}\triangleq[\mathbf{x}_i^{\text{T}}]_{\forall i}$ is the design matrix given above. If the a priori of the parameter is assumed to be a zero-mean bivariate Gaussian with precision matrix $\beta\mathbf{I}$, i.e. $\mathbf{w}\sim\mathcal{N}_2(\mathbf{0}, \beta^{-1}\mathbf{I})$, then

\begin{align}
\mathbb{P}(\mathbf{w}|\mathbf{y})&\propto\mathcal{L}(\mathbf{w}|\mathbf{y})\mathbb{P}(\mathbf{w})\\
&\propto\text{exp}\left[-\frac{\alpha(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})}{2}\right]\exp\left[-\frac{1}{2}\mathbf{w}^{\text{T}}\beta\mathbf{I}\mathbf{w}\right]\\
&\propto\text{exp}\left\{-\frac{1}{2}\left[\alpha(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})^{\text{T}}(\mathbf{y}-\boldsymbol{\mathfrak{A}}\mathbf{w})+\mathbf{w}^{\text{T}}\beta\mathbf{I}\mathbf{w}\right]\right\}.
\end{align}

Expanding the terms in the exponential function returns the following:

$$
\alpha\mathbf{y}^{\text{T}}\mathbf{y}-2\alpha\mathbf{w}^{\text{T}}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}+\mathbf{w}^{\text{T}}(\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I})\mathbf{w},
$$

thus

\begin{equation}
\mathbb{P}(\mathbf{w}|\mathbf{y})\propto\mathcal{C}\text{exp}\left\{-\frac{1}{2}\left[\mathbf{w}^{\text{T}}(\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I})\mathbf{w}-2\alpha\mathbf{w}^{\text{T}}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}\right]\right\}.
\end{equation}

The inner terms of the exponential function are of the form $ax^2-2bx$. This is a quadratic expression and can therefore be factored by completing the square. To do so, let $\mathbf{D}\triangleq\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}}+\beta\mathbf{I}$ and $\mathbf{b}\triangleq\alpha\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y}$, then

\begin{align}
\mathbb{P}(\mathbf{w}|\mathbf{y})&\propto\mathcal{C}\text{exp}\left[-\frac{1}{2}\left(\mathbf{w}^{\text{T}}\mathbf{D}\mathbf{w}-2\mathbf{w}^{\text{T}}\mathbf{b}\right)\right]\\&=\mathcal{C}\text{exp}\left[-\frac{1}{2}\left(\mathbf{w}^{\text{T}}\mathbf{D}\mathbf{w}-\mathbf{w}^{\text{T}}\mathbf{b}-\mathbf{b}^{\text{T}}\mathbf{w}\right)\right].
\end{align}

In order to proceed, the matrix $\mathbf{D}$ must be symmetric and invertible (this can be proven separately). If satisfied, then $\mathbf{I}=\mathbf{D}\mathbf{D}^{-1}=\mathbf{D}^{-1}\mathbf{D}$, so that the terms inside the exponential function above become:

$$
\mathbf{w}^{\text{T}}\mathbf{D}\mathbf{w}-\mathbf{w}^{\text{T}}\mathbf{D}\mathbf{D}^{-1}\mathbf{b}-\mathbf{b}^{\text{T}}\mathbf{D}^{-1}\mathbf{D}\mathbf{w}+\underset{\text{constant introduced}}{\underbrace{(\mathbf{b}^{\text{T}}\mathbf{D}^{-1}\mathbf{D}\mathbf{D}^{-1}\mathbf{b}-\mathbf{b}^{\text{T}}\mathbf{D}^{-1}\mathbf{D}\mathbf{D}^{-1}\mathbf{b})}}.
$$

Finally, let $\boldsymbol{\Sigma}\triangleq\mathbf{D}^{-1}$ and $\boldsymbol{\mu}\triangleq\mathbf{D}^{-1}\mathbf{b}$, then

\begin{align}
\mathbb{P}(\mathbf{w}|\mathbf{y})&\propto\mathcal{C}\text{exp}\left[-\frac{1}{2}\left(\mathbf{w}^{\text{T}}\boldsymbol{\Sigma}^{-1}\mathbf{w}-\mathbf{w}^{\text{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}-\boldsymbol{\mu}^{\text{T}}\boldsymbol{\Sigma}^{-1}\mathbf{w}+\boldsymbol{\mu}^{\text{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\right)\right]\\
&=\mathcal{C}\text{exp}\left[-\frac{(\mathbf{w}-\boldsymbol{\mu})^{\text{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{w}-\boldsymbol{\mu})}{2}\right],
\end{align}

where $-\mathbf{b}^{\text{T}}\mathbf{D}^{-1}\mathbf{D}\mathbf{D}^{-1}\mathbf{b}$ becomes part of $\mathcal{C}$, and that proves the proposition. $\blacksquare$

Simulation Experiment

The above result can be applied to any linear model (cross-sectional or time series), and I'm going to demonstrate how we can use it to model the following simulated data. I will be using Julia in Nextjournal (be sure to head over to this link for reproducibility), which already has an image available for version 1.3.1. Having said that, some of the libraries are already pre-installed in the said image, for example Plots.jl. Thus, we only have to install the remaining libraries that we will need for this experiment.

Load the libraries as follows:
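Since the original notebook's code cells live on Nextjournal, here is a minimal sketch of that setup cell, assuming the experiment needs Distributions.jl on top of the pre-installed Plots.jl; the theme name and seed value are placeholders rather than the notebook's exact choices:

```julia
using Pkg
Pkg.add("Distributions")            # Plots.jl is already available in the Nextjournal image

using Distributions, LinearAlgebra, Plots, Random

theme(:default)                     # placeholder theme; any Plots.jl theme works here
Random.seed!(123)                   # placeholder seed, for reproducibility
```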

The theme above simply sets the theme of the plots below. Further, for reproducibility purposes, I provided a seed as well. The following function will be used to simulate a cross-sectional dataset with population parameters $w_0\triangleq-.3$ and $w_1\triangleq-.5$, for a sample size of 20.
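A sketch of such a simulator, assuming the data-generating process of the proposition, $y_i = w_0 + w_1 x_i + \varepsilon_i$ with precision $\alpha$; the function name, the predictor's range, and the default $\alpha$ are illustrative:

```julia
# simulate n points from y = w0 + w1*x + ε, with ε ~ N(0, 1/α)
function simulate(n::Int = 20; w0::Real = -0.3, w1::Real = -0.5, α::Real = 1 / 0.2^2)
    x = rand(Uniform(-1, 1), n)                           # predictors on the unit interval
    y = w0 .+ w1 .* x .+ rand(Normal(0, sqrt(1 / α)), n)  # add Gaussian noise
    return x, y
end
```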

From the above results, the parameters of the a posteriori can be implemented as follows:
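A direct transcription of the proposition might look as follows (a sketch; `A` is the design matrix, and the precisions `α` and `β` are treated as known):

```julia
# Σ = (αAᵀA + βI)⁻¹ and μ = αΣAᵀy, as derived in the proposition
Σ(A::Matrix, α::Real, β::Real) = inv(α * A'A + β * I)
μ(A::Matrix, y::Vector, α::Real, β::Real) = α * Σ(A, α, β) * A'y
```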

One feature of Julia is that it supports unicode, making it easy to relate the code to the math above, i.e. Σ and μ in the code are obviously $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$ above, respectively. The vector operations in Julia are also cleaner compared to those in R and in Python; for example, we can encode Julia's A'A above as t(A) %*% A in R, and A.T.dot(A) in Python.

Finally, we simulate the data as follows:
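Something along these lines, where the variable names are assumptions:

```julia
wtrue = [-0.3, -0.5]                                   # population parameters w0 and w1
x, y  = simulate(20, w0 = wtrue[1], w1 = wtrue[2])
A     = hcat(ones(length(x)), x)                       # design matrix with rows [1  xᵢ]
```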

We don't really have to write down the wtrue variable above, since those are the default values of the w0 and w1 arguments, but we do so just for emphasis.

While the main subject here is Bayesian Statistics, it would be better if we have an idea as to what the Frequentist solution would be. As most of you are aware, the solution to the weights above is given by the following normal equation:

$$
\hat{\mathbf{w}}=(\boldsymbol{\mathfrak{A}}^{\text{T}}\boldsymbol{\mathfrak{A}})^{-1}\boldsymbol{\mathfrak{A}}^{\text{T}}\mathbf{y};
$$

this is a known result, which we can prove in a separate article. The above equation is implemented as follows:
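With the design matrix `A` defined above, this is a one-liner (sketch):

```julia
# ordinary least squares: ŵ = (AᵀA)⁻¹Aᵀy
ŵ = inv(A'A) * A'y
```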

Therefore, for the current sample data, the estimate we got when assuming the weights as fixed and unknown is $\hat{\mathbf{w}}=[-0.32264, -0.59357]^{\text{T}}$. The figure below depicts the corresponding fitted line.


Not bad given that we have a very small dataset. Now let's proceed and see how we can infer the parameters in the Bayesian framework. The prior distribution can be implemented as follows:
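A sketch using Distributions.jl, with the hyperparameters stated in the proposition (zero mean and identity covariance, i.e. $\beta = 1$):

```julia
# prior over w = [w0, w1]: standard bivariate Gaussian
prior = MvNormal(zeros(2), Matrix{Float64}(I, 2, 2))
```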

As indicated in the above proposition, the parameters are jointly modelled by a bivariate Normal distribution, as indicated by the dimension of the hyperparameter μ above. Indeed, the true parameter we are interested in is the weight vector, $\mathbf{w}$, but because we consider it to be random, the parameters of the model we assign to it are called hyperparameters, in this case the vector $\boldsymbol{\mu}$ and the identity matrix $\mathbf{I}$ of the prior distribution.

Moreover, the likelihood of the data can be implemented as follows:
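A sketch of how this might be written, with the mean of each observation given by the linear model `A[i, :]'w`; the argument list is an assumption, and the two methods below are what the discussion of multiple dispatch refers to:

```julia
# likelihood of the i-th observation under weights w and noise precision α
likelihood(i::Int, w::Vector, A::Matrix, y::Vector, α::Real) =
    pdf(Normal(A[i, :]'w, sqrt(1 / α)), y[i])

# likelihood of the whole data: a second method, dispatched on the argument types
likelihood(w::Vector, A::Matrix, y::Vector, α::Real) =
    prod(likelihood(i, w, A, y, α) for i in 1:length(y))
```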

The mean of the likelihood is the specified linear model itself, which in vector form is the inner product of the transformed design matrix, $\boldsymbol{\mathfrak{A}}$, and the weights vector, $\mathbf{w}$, i.e. A[i, :]'w. This assumption is valid since the fitted line must be at the center of the data points, and the error should be random. One of my favorite features of the Julia language is multiple dispatch. For example, the two likelihood functions defined above are not in conflict, since Julia evaluates the inputs based on the types of the function arguments. The same is true for the posterior distribution implemented below. In R and in Python, by contrast, I usually have to write this as a helper function, e.g. likelihood_helper.
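The posterior itself has the closed form derived above, so a sketch of its implementation only needs `Σ` and `μ` from earlier (the keyword defaults are assumptions):

```julia
# posterior over w given the first n observations, using the closed form of the proposition
function posterior(A::Matrix, y::Vector, n::Int; α::Real = 1 / 0.2^2, β::Real = 1.0)
    An, yn = A[1:n, :], y[1:n]
    Σn = Σ(An, α, β)
    return MvNormal(μ(An, yn, α, β), Matrix(Hermitian(Σn)))   # symmetrize for MvNormal
end
```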

Finally, the prediction is done by sampling the weights from the posterior distribution. The center of these weights is of course the mean of the a posteriori.

For example, sampling 30 weights from the posterior distribution using all the sampled data, and returning the corresponding predictions, is done as follows:
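A sketch of such a `predicted` function and of the call described above; the grid of x values and the helper's signature are assumptions:

```julia
# sample `nlines` weight vectors from the posterior on the first n observations,
# and return the x grid together with the corresponding predicted lines
function predicted(A::Matrix, y::Vector, n::Int; nlines::Int = 30)
    post = posterior(A, y, n)
    xs   = range(-1, 1, length = 50)
    ws   = rand(post, nlines)                              # 2 × nlines sampled weights
    ys   = [w[1] + w[2] * xi for xi in xs, w in eachcol(ws)]
    return xs, ys
end

# 30 weights sampled from the posterior using all the sampled data; index 2 keeps the ŷs
predicted(A, y, length(y), nlines = 30)[2]
```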

The predicted function returns both x and y values, which is why we indexed the above result with 2, to show the predicted ys only. Further, using only the first 10 observations of the data for calculating the $\hat{y}$s is done as follows:
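Under the same sketch, that call would be:

```julia
# posterior fitted on the first 10 observations only
predicted(A, y, 10, nlines = 30)[2]
```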

So you might be wondering: what's the motivation for using only the first 10 and not all of the observations? Well, we want to demonstrate how Bayesian inference learns the weights or the parameters of the model sequentially.

Visualization

At this point, we are ready to generate our main visualization. The first function (dataplot) plots the generated fitted lines from the a priori or a posteriori, both of which are plotted using the extended method of contour we defined below.
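The full plotting code is in the Nextjournal notebook; a stripped-down sketch of the two ingredients, with the styling omitted, might look as follows:

```julia
import Plots: contour

# extended method of contour: density of a bivariate Gaussian over the weight space,
# with w0 on the x-axis and w1 on the y-axis
function contour(d::MvNormal; len::Int = 100)
    w0s, w1s = range(-1, 1, length = len), range(-1, 1, length = len)
    contour(w0s, w1s, (a, b) -> pdf(d, [a, b]), xlabel = "w0", ylabel = "w1", legend = false)
end

# data space: the observations together with fitted lines sampled from d (prior or posterior)
function dataplot(d::MvNormal, x::Vector, y::Vector; nlines::Int = 6)
    plt = scatter(x, y, legend = false, xlabel = "predictor", ylabel = "response")
    for w in eachcol(rand(d, nlines))
        plot!(plt, sort(x), w[1] .+ w[2] .* sort(x))       # one fitted line per sampled weight
    end
    return plt
end
```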

Tying all the codes together, gives us this beautiful grid plot.


Since we went with style before comprehension, let me guide you then with the axes. All figures have unit square space, with contour plots having the following axes: $w_0$ (the x-axis) and $w_1$ (the y-axis). Obviously, the data space has the following axes: predictor (the x-axis) and response (the y-axis).

Discussions

We commenced the article with emphasis on the approach of Bayesian Statistics to modeling, whereby the estimation of the parameters as mentioned is based on the Bayes’ Theorem, which is a conditional probability with the following form:
\begin{equation}
\mathbb{P}(\mathbf{w}|\mathbf{y})=\frac{\mathbb{P}(\mathbf{w})\mathbb{P}(\mathbf{y}|\mathbf{w})}{\mathbb{P}(\mathbf{y})}.
\end{equation}
Now we will relate this to the above figure using some analogy, as to how the model sequentially learns the optimal estimate for the parameter of the linear model.

Consider the following: say you forgot where you left your phone, and for some reason you can't ring it up, because it could be dead or can't pick up any signal. Further, suppose you don't want to look for it yourself; rather, you let Mr. Bayes, your staff, do the task. How would he then proceed? Well, let us take the weight vector, $\mathbf{w}\triangleq[w_0,w_1]^{\text{T}}$, to be the location of your phone. In order to find, or at least best approximate, the exact location, we first need to consider some prior knowledge of the event. In this case, we need to ask the following questions: where were you the last time you had the phone? Were you in the living room? Or in the kitchen? Or in the garage? And so on. In the context of modeling, this prior knowledge about the true location can be described by a probability distribution, and we refer to this as the a priori (or the prior distribution). This set of possible distributions is obviously a model itself with parameters, referred to above as the hyperparameters, which we can tweak to describe our prior knowledge of the event. For example, you might consider the kitchen as the most probable place where you left your phone. So we adjust the location parameter of our a priori model to where the kitchen is. Hence Mr. Bayes should be in the kitchen already, assessing the coverage of his search area. Of course, you need to help Mr. Bayes with the extent of this coverage. The coverage or domain can be described by the scale parameter of your a priori. If we relate this to the main plot, we assumed the prior distribution over the weight vector to be a standard bivariate Gaussian distribution, centered at the zero vector with identity variance-covariance matrix. Since the prior knowledge can have a broad domain or coverage of the possible values of the weight vector, the samples we get generate random fitted lines, as we see in the right-most plot of the first row of the figure above.

Once we have the prior knowledge in place, that is, we are already in the kitchen and we know how wide the search area is likely to be, we can start looking for evidence. The evidence is the realization of your true model; relating this to the math above, these realizations are the $y_i$s, coming from $y_i=f(x|\mathbf{w})$, where $f(x|\mathbf{w})$ is the link function of the true model, which we attempt to approximate with our hypothesized link function, $h(x|\hat{\mathbf{w}})$, that generates the predicted $\hat{y}_i$s. For example, you may not know where exactly your phone is, but you are sure about your habits. So you inform Mr. Bayes that the last time you were with your phone in the kitchen, you were drinking some coffee. Mr. Bayes will then use this as his first evidence, and assess the likelihood of each suspected location in the kitchen. That is, what is the likelihood that a particular location formed (or realized, generated, or is connected to) the first evidence (taking some coffee)? For example, is it even comfortable to drink coffee in the sink? Obviously not, so very low likelihood; but it is likely at the dining table or close to where the coffee maker is. If we assess all possible locations within our coverage using the first evidence, we get the profile likelihood, which is what we have in the first column of the grid plot above: the profile likelihood for the $i$th evidence. Further, with the first evidence observed, the prior knowledge of Mr. Bayes needs to be updated to obtain the posterior distribution. The new distribution will have updated location and scale parameters. If we relate this to the above figure, we can see the samples of the fitted lines in the data space plot (third column, second row) starting to make guesses of possible lines given the first evidence observed. Moving on, you inform Mr. Bayes of the second evidence, that you were reading some newspaper while having that coffee. At this point, the prior belief of Mr. Bayes, for the next posterior, will be the posterior of the first evidence, and so the coverage becomes more restrictive and has a new location, which further helps Mr. Bayes manage the search area. The second evidence, as mentioned, will then return a new posterior. You do this again and again, informing Mr. Bayes of the remaining pieces of evidence sequentially until the last one. The final piece of evidence will end up with the final posterior distribution, which we expect to have a new location parameter, closer to the exact location, and a small scale parameter, covering a small region around the exact solution. The final posterior will then be your best guess describing the exact location of your phone.

This may not be the best analogy, but that is how the above figure sequentially learns the optimal estimate for the weight vector in Bayesian framework.

Bayesian Deep Learning

This section deserves a separate article, but I will briefly give some motivation on how we can generalize the above discussion into complex modeling.

The intention of the article is to give the reader a low-level understanding of how Bayes' theorem works, and without loss of generality, I decided to go with simple linear regression to demonstrate the above subject. However, this can be applied to any model indexed by, or a function of, some parameters or weights $\mathbf{w}$, with the assumption that the solution is random but governed by some probability distribution.

Complex modeling such as in Deep Learning is usually based on the assumption that the weights are fixed and unknown, which in Statistics is the Frequentist approach to inference, but without assuming some probability distribution on the error of the model. Therefore, if we are to assume some randomness on the weights, we can then use Bayesian inference to derive or at least approximate (for models with no closed-form solution) the posterior distribution. Approximate Bayesian inference is done via Markov Chain Monte Carlo (MCMC) or Variational Inference, which we can tackle in a separate post.

Libraries

There are several libraries for doing Bayesian inference; the classic and still one of the most powerful is Stan. For Python, we have PyMC3, Pyro (based on PyTorch), and TensorFlow Probability. For Julia, we have Turing.jl, Mamba.jl, Gen.jl, and Stan.jl. I will have a separate article for these libraries.

Next Steps

The obvious next step for readers to try out is to model the variance as well, since in the above result the variance of the innovation, or the error, is assumed known and equal to $\alpha^{-1}$. Further, one might consider Frequentist sequential learning as well, or proceed with other nonlinear complex models, such as neural networks. We can have these in a separate article.

Software Versions

Model Productization: Crafting a Web Application for Iris Classifier

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//julia/python/software-engineering/ui-ux/model-deployment/productization/2019/07/25/Model-Productization-Creating-a-Web-Application-for-Iris-Classifier.html

Any successful data science project must end with productization. This is the stage where trained models are deployed as applications that can be easily accessed by end users. The application can either be part of an already existing system, or it can be a standalone application working in the back-end of some interface. In any case, this part of the project deals mainly with software engineering — a task that involves front-end programming.

In my previous article, we talked about data engineering by interfacing with relational database using MySQL.jl and PyMySQL; and we’ve done data wrangling as well. Moreover, we also touched on modeling using Multilayer Perceptron (MLP) for classifying the Iris species. Hence, to cover the end-to-end spectrum of a typical data science project (excluding the business part), we’re going to deploy our Iris model as a web application.

There are of course several options for the user interface other than the web, for example desktop and mobile front-ends; but I personally prefer web-based since you can access the application without having to install anything — only the browser.

Final Output

The final interface of our application is given below. The app is powered by a server on both ends, with two models (Julia and Python) running in the back-end. Try playing with the app to see how it works.

See the Pen "Tutorial" by Al Asaad (@alstat) on CodePen.

As indicated in the status pane of the application, the UI above is not powered by the back-end servers. That is, the prediction is not coming from the model, but is rather randomly sampled from the set of possible Iris species, i.e. Setosa, Versicolor, and Virginica. The same is the case when estimating the model: the app simply mimics the model training. This is because I don't have a cloud server, so this only works on the local server on my machine. The application above, however, does send requests to the servers (localhost ports 8081 and 8082, in this case), but if they are inaccessible, it will randomly sample from the set of species when classifying, and randomly generate a test accuracy when training — and this is why we still get predictions despite having no communication with the back-end servers. In the succeeding sections though, I will show you (via screencast) how the application works with the back-end servers.

Project Source Codes

The complete codes for this article are available here as a Github repo, and I encourage you to have it in your machine if you want to follow the succeeding discussions. The repo has the following folder structure:

Software Architecture

As mentioned earlier, the application is powered by back-end servers that are instances of Julia and Python. The communication between the user-interface (client) and the servers is done via HTTP (HyperText Transfer Protocol). The following is the architecture of our application:

As shown in the figure, the client side handles responses asynchronously. That is, we can send multiple requests without waiting for the response to the preceding request. On the other hand, the server side processes requests synchronously, that is, the requests from the client are handled one at a time. These networks work via HTTP.jl for Julia, and Flask for Python. If you are interested in an asynchronous back-end server, check out Python's Klein library (Flask only works synchronously); and for Julia you can set HTTP.jl to work asynchronously. I should mention though that HTTP.jl sits at a lower API level than Flask; in fact, HTTP.jl is more comparable to Python's Requests library. For a Flask counterpart, however, Julia offers Genie.jl, and I encourage you to check that out as well.

HyperText Transfer Protocol (HTTP)

We've emphasized in the previous section that the communication between the interface and the model is done via HTTP. In this section, we'll attempt to briefly cover the basics of this protocol. To do this, we'll set up client and server instances both in Julia and in Python: run the server in a Terminal/CMD first, then run the client in a separate Terminal/CMD (follow the order: execute the server before the client). Here are the codes:
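Here is a minimal sketch of the Julia side only (the Python counterpart, using Flask and requests, is in the project repo); the exact scripts differ, but the idea is the same. Assuming HTTP.jl and JSON.jl, the server is run first:

```julia
# server.jl -- run this first: julia server.jl
using HTTP, JSON

HTTP.serve("127.0.0.1", 8081) do req::HTTP.Request
    data = JSON.parse(String(req.body))          # data sent by the client
    @info "received" data
    # the header tells the server to respond to any (*) client
    return HTTP.Response(200, ["Access-Control-Allow-Origin" => "*"];
                         body = JSON.json(data))
end
```

The client, in a separate terminal, then sends a POST request with the data {"Hello" : "World"}:

```julia
# client.jl -- run this in a separate terminal
using HTTP, JSON

resp = HTTP.request("POST", "http://127.0.0.1:8081",
                    ["Content-Type" => "application/json"],
                    JSON.json(Dict("Hello" => "World")))
println(String(resp.body))                       # the server echoes the data back
```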




For your reference, here are the outputs of the above codes.


As shown in the screenshots above, the codes were executed at the root folder of the project repo (see the Project Source Codes section for the folder tree structure). The server we set up is running at localhost:8081 or 127.0.0.1:8081 — waiting (or listening) for any incoming request from the client. Thus, when we ran the client codes, which send a POST request with the data {"Hello" : "World"} to localhost:8081, the server immediately responded back to the client — echoing the data received. The header specified in the response, "Access-Control-Allow-Origin" => "*", simply tells the server to respond to any (*) client. For more details on HTTP, I think this video is useful.

Server: Iris Classifier

At this point, you should have at least an idea of how HTTP works. To set this up in Julia and in Python, here are the codes for the REST API:
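The full scripts (whose line numbers are referenced in the discussion below) are in the project repo; the following is a condensed sketch of the Julia side only, assuming HTTP.jl and JSON.jl, with the actual model calls stubbed out:

```julia
using HTTP, JSON

function classify(req::HTTP.Request)
    data = JSON.parse(String(req.body))                  # features sent by the UI
    # ... load the trained MLP and predict here (see model-training.jl in the repo) ...
    species = "setosa"                                   # placeholder prediction
    return HTTP.Response(200, ["Access-Control-Allow-Origin" => "*"];
                         body = JSON.json(Dict("species" => species)))
end

function train(req::HTTP.Request)
    # ... re-fit the model and compute the test accuracy here ...
    accuracy = 0.0                                       # placeholder value
    return HTTP.Response(200, ["Access-Control-Allow-Origin" => "*"];
                         body = JSON.json(Dict("accuracy" => accuracy)))
end

# route each incoming request to the appropriate handler based on the target URL
function router(req::HTTP.Request)
    if req.target == "/classify"
        return classify(req)
    elseif req.target == "/train"
        return train(req)
    else
        return HTTP.Response(404)
    end
end

HTTP.serve(router, "127.0.0.1", 8081)
```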


The requests are received at line 38 of the Julia code, and are handled depending on the target URL. If the target URL of the client's request is /classify, then the classifier router will handle the request via the classify function defined in line 7. The same is true when we receive a training request: the train function defined in line 19 will handle it. Each of these functions then returns a response back to the client with the data. On the other hand, we see that Python's Flask library uses a decorator for specifying the router, with two main helper functions defined in lines 32 and 37. The two approaches are indeed easy to follow, despite the difference in implementation. I should emphasize though that the above codes refer to other files to perform the training and classification, including model-training.jl and model_training.py; I will leave it to the reader to explore these scripts in the project repo.

Client: Web Interface

Now that we have an idea of how the requests are processed in the servers, in this section we need to understand how the client prepares and sends requests to the server, and how it processes the response. In the CodePen embedded above, the client codes are in the JS (Javascript) tab. The following is a copy of it:

Lines 23-28 define the event listeners for the buttons in the application. Line 24, for example, adds functionality to the Classify button, which is defined in line 44. This is a void function, but it sends an HTTP request in line 79 to the URL defined in lines 63-67. The httpRequest function in line 5 is the one that handles the communication with the servers. It takes three arguments: the URL (the address of the server), the data (the request data to be sent to the server), and the callback (the function that will handle the response from the server). The request, as mentioned, is handled asynchronously, which is specified by the third argument — true (asynchronous) — of the xhr.open method defined in line 8. The rest of the codes are functions defining the functionalities of the buttons, status pane, output display, etc. of the app.

HTML/CSS

The other codes that are part of the application are the HTML and CSS files. I will leave these to the reader, since they are simply declarations of how the layout and style of the app should look.

Screencast

The following video shows how the application works with the back-end servers running.

Conclusion

The aim of this exercise is to demonstrate the capability of Julia in production, and one of my concerns was precompilation, specifically whether it occurs for every client request. Fortunately, it only happens at the first request; the succeeding ones are guaranteed to be fast. Having said that, just as Python is stable for production, I think Julia is already stable for creating a fully featured web application, and is therefore capable enough for an end-to-end project.

Software Versions

Interfacing with Relational Database using MySQL.jl and PyMySQL

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//julia/python/packages/relational-databases/2019/07/07/Interfacing-with-Relational-Database-using-MySQL.jl-and-PyMySQL.html

Prior to the advent of computing, a relational database could be thought of as a log book, typically used for inventory and visitors' time-in and time-out. These books contain rows that define the item/transaction, and columns describing the features of each row. Indeed, these are the core attributes of any relational database. Unlike spreadsheets, which are used for handling small datasets, databases are mostly used for storing huge transactional data for later use. They run on a server and often at the back-end of any user (client) interface such as websites and mobile applications. These applications communicate with the database via processing servers (e.g. Flask and Django). The figure below illustrates the request and response communications between clients and servers.

As mentioned earlier, databases are meant to store data for later use — in the sense that we can use it as a response to clients' requests, such as viewing or data extraction for insights. In this article, we are interested in data extraction from the database. In particular, the objective is to illustrate how to send requests to a MySQL server, and how to process the response, both from Julia and from Python.

MySQL Server Setup

To start with, we need to setup MySQL server in our machine. Click the following link to download and install the application.

Note that I recommend downloading the latest version of MySQL, since the setup above uses an older version.

Query: Creating Database

In order to appreciate what we are aiming for in this article, we need to go through some basic SQL queries to understand what type of requests to send to the MySQL server. I'm using macOS, but the following should work on Windows as well. For macOS users, open the MySQL Server Shell by running mysql -u root -p in the terminal (hit return or enter, and type in the MySQL root password you specified during the installation setup from the previous section). For Windows, try to look for it in the Start Menu.



From here, we are going to check the available databases in MySQL server. To do this, run the following:
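In the MySQL shell:

```sql
SHOW DATABASES;
```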

Indeed, there are four databases defined out of the box already, and we don't want to touch those. Instead, we are going to create our own database; let's call it tutorial. To do this, run the following codes:
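A condensed version of those queries (the original snippet, together with its shell output, is longer):

```sql
CREATE DATABASE tutorial;
USE tutorial;

-- confirm that we are indeed using the tutorial database
SELECT DATABASE();
```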

The best thing about SQL syntax is that everything is self-explanatory, except maybe for the confirmation query at the end, which simply confirms that we are using tutorial as our database.

Query: Creating Table

Next is to create a table for our database. We are going to use the 2019 Philippine Election results, with the columns: Last Name, First Name, Party, and Votes. Further, for purposes of illustration, we are going to use the top 5 senators only.
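Something like the following, where the table name is an assumption:

```sql
CREATE TABLE senatorial_2019 (
    last_name  VARCHAR(50),
    first_name VARCHAR(50),
    party      VARCHAR(50),
    votes      BIGINT
);
```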

Query: Inserting Values

The following codes will insert the top five senators from the 2019 Philippine election results.
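The INSERT syntax looks as follows; the rows below are placeholders only, so substitute the actual names, parties, and vote counts from the official results:

```sql
-- placeholder rows: replace with the actual top-five results
INSERT INTO senatorial_2019 (last_name, first_name, party, votes)
VALUES
    ('LastName1', 'FirstName1', 'Party1', 0),
    ('LastName2', 'FirstName2', 'Party2', 0);
```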

Query: Show Data

To view the encoded data, we simply select all (*) the columns from the table.
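That is:

```sql
SELECT * FROM senatorial_2019;
```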

MySQL Clients on Julia and Python

In this section, we are going to interface with the MySQL server from Julia and Python. This is possible using the MySQL.jl and PyMySQL libraries. To start with, install the necessary libraries as follows:
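For Julia this is a one-off Pkg call (the Python side is the usual pip install pymysql); the package list here is an assumption based on what the rest of the article uses:

```julia
using Pkg
Pkg.add(["MySQL", "DataFrames", "CSV"])
```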


For this exercise, our goal is to save the NYC Flights (2013) data into the database and query it from Julia and Python.

Downloading NYC Flights Data

I have a copy of the dataset on Github, and so the following code will download the said data:
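A sketch of the download step; the URL below is a placeholder for the Github copy mentioned above:

```julia
using CSV, DataFrames

# placeholder URL: point this to the copy of the dataset referenced in the article
url = "https://raw.githubusercontent.com/user/repo/master/nycflights13.csv"
download(url, "nycflights13.csv")
nycflights = CSV.File("nycflights13.csv") |> DataFrame
```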


Connect to MySQL Server

In order for the client to send requests to the MySQL server, the user/client needs to connect to it using the credentials set during the installation.
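In Julia this looks roughly as follows; the credentials are placeholders, and the keyword names follow the MySQL.jl version current at the time of writing (check the repo scripts for the version pinned there):

```julia
using MySQL

# use the host, user, and password you set during the MySQL installation
con = MySQL.connect("127.0.0.1", "root", "p@ssword", db = "tutorial")
```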


Note that you need to have a strong password, and this configuration should not be exposed to the public. The above snippets are meant for illustration.

First Request

To test the connection, let’s send our first request — to show the tables in the database:
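In Julia, for example:

```julia
using DataFrames

MySQL.Query(con, "SHOW TABLES;") |> DataFrame
```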


In Julia, the response is received as a MySQL.Query object and can be viewed using DataFrame. For Python, however, you will get a tuple object.

Create NYC Flights Table

At this point, we can now create the table for our dataset. To do this, run the following:
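A sketch of the Julia side, where the column types are simplified to TEXT for illustration:

```julia
# build the CREATE TABLE statement from the DataFrame's column names
cols  = join(string.(names(nycflights), " TEXT"), ", ")
query = "CREATE TABLE IF NOT EXISTS nycflights (" * cols * ");"
MySQL.execute!(con, query)
```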


As shown in the previous section, sending requests to the server both in Julia and in Python is done by simply using a string of SQL queries as input to the MySQL.jl and PyMySQL APIs. Hence, the query object (in line 3 of the Julia code and line 4 of the Python code) above simply automates the concatenation of the SQL query for creating a table. Having said that, you can of course write the query manually. To check if we have indeed created the table, run the following codes:
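For instance, from Julia:

```julia
MySQL.Query(con, "SHOW TABLES;") |> DataFrame
MySQL.Query(con, "SELECT * FROM nycflights;") |> DataFrame   # the table exists but has no rows yet
```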


As you can see, we've created it already, but with no entries yet.

Populating the Table

Finally, we are going to populate the table in the database by inserting the values row by row.
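A sketch of the Julia side, using a prepared statement with ? placeholders; the exact MySQL.Stmt/execute signatures follow the MySQL.jl version used at the time, so check the repo scripts if the API has since changed:

```julia
# prepared INSERT statement with one ? placeholder per column
placeholders = join(fill("?", length(names(nycflights))), ", ")
stmt = MySQL.Stmt(con, "INSERT INTO nycflights VALUES (" * placeholders * ");")

# map each row's values one-to-one to the ?s, timing the whole loop
@time for row in eachrow(nycflights)
    MySQL.execute!(stmt, collect(row))
end
```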


From the above Julia code, the result of the stmt is an SQL INSERT query with placeholder values indicated by ?. The timed (@time in the Julia code) loop in line 9 above maps the values of the vector, one-to-one, to the elements (?) of the tuple in stmt. Having said that, MySQL.Stmt has no equivalent in PyMySQL. Further, one major difference between these libraries is that PyMySQL will not populate the table even after executing all sorts of SQL queries unless you commit it (con.commit), as shown above. This is contrary to MySQL.jl, which automatically commits every execution of the SQL queries. I do like the idea of having con.commit in PyMySQL, since this avoids accidental deletion or modification in the database, thus adding a layer of security. To check if we have indeed populated the table, run the following:
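For example:

```julia
MySQL.Query(con, "SELECT * FROM nycflights LIMIT 5;") |> DataFrame
```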


To disconnect from the server, run MySQL.disconnect(con) (Julia) or con.close() (Python).

Benchmark

For the benchmark, I recorded the elapsed time of populating and reading the table in the previous section. The figure below summarizes the results.

The figure was plotted using Gadfly.jl. Install this package using Pkg as described above (see the first code block under the MySQL Clients on Julia and Python section), along with Cairo.jl and Fontconfig.jl. The latter two packages are used to save the plot in PNG format. See the code below to reproduce:

Conclusion

The aim of this article was simply to illustrate the usage of the MySQL.jl APIs in comparison with PyMySQL's; and I would say both libraries have similar APIs (as expected) and are stable for the tasks. I should emphasize though that I do like the con.commit of PyMySQL, since this adds a level of security, and I think it would be a good addition to MySQL.jl in the future.

Complete Codes

If you are impatient, here are the complete codes excluding the MySQL codes and the plots. These should work after installing the required libraries shown above:


References and Resources

  • MySQL.jl Github Repo: https://github.com/JuliaDatabases/MySQL.jl
  • PyMySQL Github Repo: https://github.com/PyMySQL/PyMySQL
  • Flaticon: https://www.flaticon.com/

Software Versions