IJulia on Amazon EC2 instance

Re-posted from: http://grollchristian.wordpress.com/2014/09/01/ijulia-for-amazon-ec2/

In this blog post I want to give a description of how to set up a running IJulia server on an Amazon EC2 instance. Although there already are some useful resources on the topic out there, I didn’t find any comprehensive tutorial yet. Hence, I decided to write down my experiences, just in case it might be a help for others.

1 What for?

At this point, you might think: “okay – but what for?”. Well, what you get with Amazon AWS is a setting where you can easily load a given image on a computer that can be chosen from a range of different memory and processing characteristics (you can find a list of available instances here). In addition, you can also store some private data on a separate volume with Amazon’s Elastic Block Store. So taking both components together, this allows fast setup of the following workflow:

choose a computer with optimal characteristics for your task
load your ready-to-go working environment with all relevant software components pre-installed
mount an EBS volume with your private data

And your customized computer is ready for work!

So, what you get is a flexible environment that you can either use for selected tasks with high computational requirements only, or that you steadily use with a certain baseline level of CPU performance and only scale it up when needed. This way, for example, you could outsource computations from your slow netbook to the Amazon cloud whenever the weather allows you to do your work outside in your garden. Any easy to transport and lightweight netbook already provides you with an entrance to huge computational power in the Amazon cloud. However, compared to a moderately good local computer at home, it doubt that the Amazon cloud will be a cost-efficient alternative for permanent work yet. Besides, you always have the disadvantage of having to rely on a good internet connection, which especially becomes problematic when traveling.

Anyways, I am sure you will find your own use cases for IJulia in the cloud, so let’s get started with the setup.

2 Setting up an Amazon AWS account

The first step, of course, is setting up an account at Amazon web services. You can either log in with an existing Amazon account, or just create a completely new account from scratch. At this step, you will already be required to leave your credit card details. However, Amazon automatically provides you with a free usage tier, comprising (among other benefits) 750 hours of free EC2 instance usage, as well as 30 GB of EBS storage. For more details, just take a look at Amazon’s pricing webpage.

3 Setting up an instance

Logging in to your account will take you to the AWS Management Console: a graphical user interface that allows managing all of your AWS products. Should you plan on making some of your products available to other users as well (employees, students, …), you could easily assign rights to groups and users through their AWS Identity and Access Management (IAM), so that you are the only person with access to the full AWS Management Console.

Now let’s create a new instance. In the overview, click on EC2 to get to the EC2 Dashboard, where you can manage and create Amazon computing instances. You need to specify the region where your instance should physically be created, which you can select in the upper right corner. Then, click on instances on the left side, and select Launch Instance to access the step by step instructions.

In the first step you need to select a basic software equipment that your instance should use. This determines your operating system, but also could comprise additional preinstalled software. You can select from a range of Amazon Machine Images (AMI), which could be images that are put together either by Amazon, by other members of the community or by yourself. At the end of the setup, we will save our preinstalled IJulia server as AMI as well, so that we can easily select it as ready-to-go image for future instances. For now, however, we simply select an image with a basic operating system installed. For the instructions described here, I select Ubuntu Server 14.04 LTS.

In the next step we now need to pick the exact memory and processing characteristics for our instance. For this basic setup, I will use a t2.micro instance type, as it is part of the free usage tier in the first 12 months. If you wanted to use Amazon EC2 for computationally demanding tasks, you would select a type with better computational performance.

In the next two steps, Configure Instance and Add Storage, I simply stick with the default values. Only in step 5 I manually add some tags to the instance. Tags can be added as key-value pairs. So, for example, we could use the following tags to describe our instance:

task = ijulia,
os = ubuntu,
cpu = t2.micro

By now, all characteristics of the instance itself are defined, and we only need to determine the traffic that is allowed to reach our instance. In other words: we have defined a virtual computer in Amazon’s cloud – but how can we communicate with it?

In general, there are several ways to access a remote computer. The standard way on Linux computers would be through a terminal with SSH connection. This way, we can execute commands at the remote computer, which allows the installation of software, for example. In addition, however, we might want to communicate with the remote computer through other channels than the terminal. For example, in our case we want to access the computing power of the remote through an IJulia tab in our browser. Hence, we need to determine all ways through which a communication to the remote should be allowed. For security reasons, you want to keep these communication channels as sparse and restricted as possible. Still, you need to allow some of these channels in order to use our instance in the manner that you want to. In our case, we add a new rule that allows access to our instance through the browser, and hence select All TCP as its type. Be warned, however, that I am definitely not an expert on computer security issues, so that this rule might be overly permissive. Ultimately, you only want to permit communication through https with a given port (8998 in our example), and there might be better ways to achieve this. Still, I will add an additional layer of protection by restricting permission to my current IP only. If you do this, however, then you most likely need to adapt your settings each time that you restart your computer, as your IP address might change. Anyways, in many cases your instance will only live temporarily, and will be deleted after completion of a certain task. Your security concerns for such a short-living instance hence might be rather limited. Either way, you can store your current settings as a new security group, so that you can select the exact same settings for some of your future instances. I will choose “IJulia all tcp” as name of the group.

Once you are finished with your security settings, click Review and Launch and confirm your current settings with Launch. This will bring up a dialogue that lets you exchange ssh keys between remote computer and local computer. You can either create a new key pair or choose an already existing pair. In any case, make sure that you will really have access to the chosen key afterwards, because it will be the only way to get full access to your remote. If you create a new pair, the key will automatically be downloaded to your computer. Just click Launch Instance afterwards, and your instance will be available in a couple of minutes.

4 Installing and configuring IJulia

Once your instance will be displayed as running you can connect to it through ssh. Therefore, we need to use the ssh key that we did get during the setup, which I stored in my download folder with name aws_ijulia_key.pem. First, however, we will need to change the permissions of the key, so that it can not be accessed by everyone. This step is mandatory – otherwise the ssh connection will fail.

chmod 400 ~/Downloads/aws_ijulia_key.pem

Now we are ready to connect. Therefore, we need to link to the ssh key through option -i. The address of your instance is composed of two parts. First, the username, which is ubuntu for the case of an Ubuntu operating system. And second, we need to copy the Public DNS of the instance, which we can find listed in the description when we select the instance in the AWS Management Console.

ssh -i ~/Downloads/aws_ijulia_key.pem 
   [email protected]

Now that we have access to a terminal within our instance, we can use it to install all required software components for IJulia. First, we install Julia from the julianightlies repository.

sudo add-apt-repository ppa:staticfloat/julia-deps
sudo add-apt-repository ppa:staticfloat/julianightlies
sudo apt-get update
sudo apt-get -y install julia

We now already could use Julia through the terminal. Then we need to install a recent version of ipython, which we can do using pip.

sudo apt-get -y install python-dev ipython python-pip
sudo pip install --upgrade ipython

IJulia also requires some additional packages:

sudo pip install jinja2 tornado pyzmq

At last, we need to install the IJulia package:

julia -e 'Pkg.add("IJulia")'

At this step, we have installed all required software components, which we now only need to configure correctly.

The first thing we need to do is set up a secure way of connecting to IJulia through our browser. Therefore, we need to set a password that we will be required to type in when we log in. We will create this password within ipython. To start ipython on your remote, type

ipython

Within ipython, type the following

from IPython.lib import passwd
passwd()

Now type in your password. For this demo, I chose ‘juliainthecloud’, equal to the setup here. IPython then prints out the encrypted version of this password, which in my case is

'sha1:32c75aa1c075:338b3addb7db1c3cb2e10e7b143fbc60be54c1be'

You will need this hashed password later, so you should temporarily store it in your editor.

We now can exit ipython again:

exit

As a second layer of protection, we need to create a certificate which our browser will automatically use when it communicates with our IJulia server. This way, we will not have to send our password in plain text to the remote computer, but the connection will be encrypted right away. In your remote terminal, type:

mkdir ~/certificates
cd ~/certificates
openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout 
   mycert.pem -out mycert.pem

You are asked to add some additional information to your certificate, but you can also skip all fields by pressing enter.

Finally, we now only need to configure IJulia. Therefore, we add the following lines to the julia profile within ipython. This profile should already have been created through the installation of the IJulia package, so that we can open the respective file with vim:

vim ~/.ipython/profile_julia/ipython_notebook_config.py

Press i for insert mode, and after

c = get_config()

add the following lines (which I myself did get from here):

# Kernel config
c.IPKernelApp.pylab = 'inline'  # if you want plotting support always
# Notebook config

c.NotebookApp.certfile = u'/home/ubuntu/certificates/mycert.pem'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = 
   u'sha1:32c75aa1c075:338b3addb7db1c3cb2e10e7b143fbc60be54c1be'

# It is a good idea to put it on a known, fixed port
c.NotebookApp.port = 8998

Make sure that you did set c.NotebookApp.password to YOUR hashed password from before.

To leave vim again, press escape to end insert mode and :wq to save and quit.

5 Starting and connecting to IJulia server

We are done with the configuration, so that we now can connect to our IJulia server. To start the server on the remote, type

sudo ipython notebook --profile=julia
# nohup ipython notebook  --profile julia > ijulia.log &

And to connect to your server, open a browser on your local machine and go to

https://ec2-54-76-169-215.eu-west-1.compute.amazonaws.com:8998

This will pop up a warning that the connection is untrust. Don’t worry, this is just because we are using the encryption that we did set up ourselves, and this certificate is not officially registered. Once you allow the connection, an IJulia logo appears, and you will be prompted for your password.

Note, however, that the Public DNS of your instance changes whenever you stop your instance from running. Hence, you need to type a different URL when you connect to IJulia the next time.

6 Store as AMI

Now that we have done all this work, we want to store the exact configuration of our instance as our own AMI to be able to easily load it on any EC2 instance type in the future. Therefore, we simply stop our running instance in the AWS Management Console (after we also stopped the running IJulia server of course). Right-click on it as soon as the Instance State says stopped. Choose create image, select an appropriate name for your configuration, and you’re done. The next time you launch a new image, you can simply select this AMI instead of the Ubuntu 14.04 AMI that we did use previously.

However, creating an image of your current instance requires you to store a snapshot of your volume in your elastic block store, which might cause additional costs depending on your AWS profile. Using Amazon’s free usage tier, you have 1GB free snapshot storage at your disposal. Due to compression, your snapshot should be smaller in storage size than the original 8GB of your volume. Nevertheless, I do not yet know where to look up the overall storage size of my snapshots (determining the size of individual snapshots if you have multiple snapshots is even impossible). I only can say that Amazon did not bill me for anything yet, so I assume that the snapshot’s size should be smaller than 1GB.

7 Add volume

In order to have your data available for multiple instances, you should store it on a separate volume, which then can be attached to any instance you like. This also allows easy duplication of your data, so that you can attach one copy of it to your steadily running t2.micro instance, while the other one gets attached to some temporary instance with greater computational power in order to perform some computationally demanding task.

For creation and mounting of a data volume we largely follow the steps shown in the documentation.

On the left side panel, click on Volumes, then select Create Volume. As size we choose 5 GB, as this should leave us with enough free storage size in order to simultaneously run two instances, each with its own copy of personal data mounted. Also, we choose to not encrypt our volume, as t2.micro instances do not support encrypted volumes.

Now that the volume is created, we need to attach it to our running instance. Right-click on the volume and select Attach Volume. Type in the ID of your running instance, and as your device name choose /dev/sdf as proposed for linux operating systems.

Connect to your running instance with ssh, and display all volumes with the lsblk command in order to find the volume name. In my case, the new attached volume displays as xvdf. Now, since we created a new volume from scratch, we additionally need to format it. Be careful to not conduct this step if you attach an already existing volume with data, as it would delete all of your data! For my volume name, the command is

sudo mkfs -t ext4 /dev/xvdf

The device can now be mounted anywhere in your file system. Here we will mount it as folder research in the home directory.

sudo mkdir ~/research
sudo mount /dev/xvdf /home/ubuntu/research

And, in case you need it at some point, you can unmount it again with

umount -d /dev/xvdf

8 Customization

Well, now we know how to get an IJulia server running. To be honest, however, this really is only half of the deal if you want to use it for more than just some on-the-fly experimenting. For more elaborate work, you need to be able to also conveniently perform the following operations at your Amazon remote instance:

editing files on the remote
copying files to and from the remote

There are basically two ways that allow you to edit files on the remote. First, you use some way of editing files directly through the terminal. For example, you could open a vim session in your terminal on the remote itself. Second, you work with a local editor that allows to access files on a remote computer via ssh. As emacs user, I stick with this approach, as emacs tramp allows working with remote files as if they were local.

For synchronization of files to the remote I use git, as I host all of my code on github and bitbucket. However, I also want my remote to communicate with my code hosting platforms through ssh without any additional password query. In principle, one could achieve this through the standard procedure: creating a key pair on the remote, and copying the public key to the respective file hosting platform. However, this could become annoying, given that we have to perform it whenever we launch a new instance in the future. There is a much easier way to enable communication between remote server and github / bitbucket: using SSH agent forwarding. What this does, in simple terms, is passing your local ssh keys on to your remote computer, which is then allowed to use them for authentication itself. To set this up, we have to add some lines to ./ssh/config (take a look at this post for some additional things that you could achieve with your config file). And since we are already editing the file, we also take the chance to set an alias aws_julia_instance for our server and automatically link to the required login ssh key file. Our config file now is

Host aws_julia_instance
     HostName ec2-54-76-169-215.eu-west-1.compute.amazonaws.com
     User ubuntu
     IdentityFile ~/Downloads/aws_ijulia_key.pem
     ForwardAgent yes

Remember: you have to change the HostName in your config file whenever you stop your instance, since the Public DNS changes.

Given the settings in our config file, we now can access our remote through ssh with the following simple command:

ssh aws_julia_instance

And, given that we have set up ssh access to github and bitbucket on our local machine, we can use these keys on the remote as well.

As a last convenience for the synchronization of files on your remote I recommend the use of myrepos, which allows you to easily keep track of all modifications in a whole bunch of repositories. You can install it with

mkdir ~/programs
cd ~/programs
git clone https://github.com/joeyh/myrepos.git
cd /usr/bin/
sudo ln -s ~/programs/myrepos/mr

Filed under: Julia, tools Tagged: amazon, aws, ec2, ijulia, Julia

juliabloggers.com

A Julia Language Blog Aggregator

Table of Contents