By: Bradley Setzler

Re-posted from: http://juliaeconomics.com/2014/06/15/introductory-example-ordinary-least-squares/

* The script to reproduce the results of this tutorial in Julia is located here.

In this post, I show in Julia how to perform ordinary least squares (OLS) estimation after first simulating the OLS data generating process (DGP). At the end, we see that the parameter estimates converge to the true parameter as sample size grows large. If you have not yet installed Julia, it takes 5 minutes following these instructions.

As a reminder and to clarify notation, the OLS DGP is,

$$Y = X\beta + \epsilon,$$

where $Y$ is the dependent variable, $X$ is the matrix of independent variables, $\beta$ is the vector of parameters that we wish to estimate, and $\epsilon$ is the error satisfying $E[\epsilon | X] = 0$. Because we assume that the first column of $X$ is the constant number 1, we will find it useful below to work with a parameter vector of length $K+1$, whose first entry is the intercept. The least squares estimator is,

$$\hat{\beta}_{OLS} = \left(X'X\right)^{-1}X'Y.$$

**Matrix Algebra and Simulated Random Variables: The OLS DGP**

First, we generate the independent variables , then we use to generate the independent variable . To begin, create a new file in Julia Studio and save it to your computer. In the file editor (script), insert the following commands:

```julia
using Distributions
N = 1000
K = 3
genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
X_noconstant = X
constant = ones(N)
X = [constant X]
```

The *using* command lets Julia know that we will be using the Distributions package. The *MvNormal()* command initializes a multivariate normal distribution, which is an object including methods such as *pdf* for the probability density function and *rand* for drawing random variables. *eye(K)* means $I_K$, the identity matrix of size $K \times K$. We only told *MvNormal* the covariance matrix, leaving the mean blank, which Julia assumes means that we would like a zero mean. The distribution of $X$ is arbitrary; we only used the multivariate normal for simplicity. At the end, we concatenate the vector of *ones* to *X* using brackets, *[]*.

Tip: To ensure that the matrices are of the appropriate dimension, use the *size* command. Above, we transposed *X* after finding that *rand* returned it as a $K \times N$ matrix, when we need it to be $N \times K$. Misaligned dimensions are one of the most common and frustrating errors to make in writing a program.
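For example, repeating the commands above (in the same pre-1.0 Julia syntax used throughout this tutorial), *size* confirms why the transpose is needed:

```julia
using Distributions

N = 1000
K = 3
genX = MvNormal(eye(K))     # K-dimensional standard normal
X = rand(genX, N)           # draws come back as a K x N matrix
println(size(X))            # (3,1000) -- rows are variables
X = X'                      # transpose so rows are observations
println(size(X))            # (1000,3) -- now N x K, as needed
```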

Now that we have created $X$ as a matrix containing a column of ones as well as three independent variables, we wish to multiply it by a vector of regression coefficients $\beta$ of length 4 (including the intercept) and add the normally distributed shock, $\epsilon$. For simplicity, we assume $\epsilon \sim N(0,1)$.

```julia
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [0.1,0.5,-0.3,0.]
Y = X*trueParams + epsilon
```

Matrix algebra in Julia can be written either in function-call form, *\*(A,B)*, which means $A \times B$, or in the more R-like infix way we used above, *A\*B*. Then, you can click run (the little green arrow in Julia Studio) and it will perform the operations in the file above. To make sure it worked, you can now go to the Console and type,

julia> mean(Y)

and press Enter. If the code worked correctly, this should return the mean of the simulated $Y$, which should be near the true intercept 0.1 (since each of the independent variables has mean zero, the true mean of $Y$ is just the intercept).

**Functions in Julia: The OLS Estimator**

Functions are defined in Julia using the command *function*, followed by the desired name of your function, and parentheses containing the arguments of the function. An *end* statement is required upon completion of the function definition. Indentation within the body of the function is conventional rather than required, and it is a good practice to explicitly include a return statement. The OLS estimator, for any particular data *y* and *x*, is defined as follows:

```julia
function OLSestimator(y,x)
    estimate = inv(x'*x)*(x'*y)
    return estimate
end
```

This function uses matrix multiplication (*\**) three times, the transpose (*'*) twice, and the inverse (*inv()*) once. This function works for any matrices *x* and *y* with the appropriate dimensions. Once you have defined this function by running the file, you can obtain the OLS estimates of the true parameters by typing,

julia> estimates = OLSestimator(Y,X)

Because of the return statement, the parameter estimates will be returned by *OLSestimator()* and stored in *estimates*. If you compute *estimates* in the script, you can print them to the screen using,

println(estimates)
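As a quick sanity check, *OLSestimator()* can be compared against Julia's backslash solver, *x\y*, which computes the same least squares solution by a more numerically stable route (the tiny dataset below is purely illustrative):

```julia
function OLSestimator(y,x)
    estimate = inv(x'*x)*(x'*y)
    return estimate
end

y = [1.0, 2.0, 3.0, 5.0]
x = [ones(4) [0.0, 1.0, 2.0, 3.0]]   # constant plus one regressor
println(OLSestimator(y,x))           # [0.8, 1.3]
println(x\y)                         # the same estimates
```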

Finally, change the parameter *N* defined at the beginning of your code. When *N* is small (say, 100), estimates will usually be further from *trueParams* than when *N* is large (say, 10,000). This should be very easy to change; if you used the print statement for the estimates, just change the value of *N*, run the code, and see the new *estimates* printed to the console. In order to make your estimates reproducible (i.e., the exact same draws from the random distributions), set the random seed at the beginning of your code using,

srand(2)

where 2 is a possible seed you could choose.
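A minimal sketch of this consistency check, reusing the DGP and estimator from above (the variable *errs* and the loop structure are illustrative, not part of the original script):

```julia
using Distributions

srand(2)                                       # fix the seed for reproducibility
OLSestimator(y,x) = inv(x'*x)*(x'*y)
trueParams = [0.1, 0.5, -0.3, 0.0]
errs = Float64[]
for N in [100, 1000, 10000]
    X = [ones(N) rand(MvNormal(eye(3)), N)']   # constant plus three regressors
    Y = X*trueParams + rand(Normal(0,1), N)    # the OLS DGP
    push!(errs, norm(OLSestimator(Y,X) - trueParams))
end
println(errs)   # the estimation error typically shrinks as N grows
```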

**Results**

When I run the code above with random seed 2, I find that,

```julia
julia> estimates
4-element Array{Float64,1}:
  0.11216
  0.476437
 -0.290574
  0.0108337
```

so you can use this to check your results.

If you are only trying to estimate OLS, you can use the built-in command *linreg()*, but do not include the vector of ones, as *linreg()* will add its own vector of ones. This is why I created *X_noconstant* above, which is just *X* without the column of ones. The syntax is,

```julia
julia> linreg(X_noconstant,Y)
4-element Array{Float64,1}:
  0.11216
  0.476437
 -0.290574
  0.0108337
```

and the estimates are identical to those in *estimates*.

Bradley J. Setzler