Tag Archives: julialang

PhysicalConstants.jl: Julia Package of Physical Constants

PhysicalConstants is a Julia package which has the values of a range of physical constants. Currently MKS and CGS units are supported.

Installation

The package can be installed directly from its github repository:

Pkg.clone("https://github.com/DataWookie/PhysicalConstants.jl")

Usage

Usage is pretty straightforward. Start off by loading the package.

julia> using PhysicalConstants

Now, for example, access Earth’s gravitational acceleration in MKS units.

julia> PhysicalConstants.MKS.GravAccel
9.80665

Or in CGS units.

julia> PhysicalConstants.CGS.GravAccel
980.665

Or, finally, in Imperial units.

julia> PhysicalConstants.Imperial.GravAccel
32.174049

The post PhysicalConstants.jl: Julia Package of Physical Constants appeared first on Exegetic Analytics.

#MonthOfJulia Day 16: Databases

Julia-Logo-Database

Yesterday we looked at how time series data can be sucked into Julia from Quandl. What happens if your data are sitting in a database? No problem, Julia can handle that too. There are a number of database packages available. I’ll be focusing on SQLite and ODBC, but it might be worthwhile checking out JDBC, LevelDB and LMDB too.

SQLite

SQLite is a lightweight transactional SQL database engine that does not require a server or any major configuration. Installation is straightforward on most platforms.

The first step towards using SQLite from Julia is to load the package.

julia> using SQLite

Next, for illustrative purposes, we’ll create a database (which exists as a single file in the working directory) and add a table which we’ll populate directly from a delimited file.

julia> db = SQLiteDB("passwd.sqlite")
SQLiteDB{UTF8String}("passwd.sqlite",Ptr{Void} @0x00000000059cde38,0)
julia> create(db, "passwd", readdlm("/etc/passwd", ':'), ["username", "password", "UID", "GID",
                                                          "comment", "homedir", "shell"])
1x1 ResultSet
| Row | "Rows Affected" |
|-----|-----------------|
| 1   | 0               |

Then the interesting bit: we execute a simple query.

julia> query(db, "SELECT username, homedir FROM passwd LIMIT 10;")
10x2 ResultSet
| Row | "username" | "homedir"         |
|-----|------------|-------------------|
| 1   | "root"     | "/root"           |
| 2   | "daemon"   | "/usr/sbin"       |
| 3   | "bin"      | "/bin"            |
| 4   | "sys"      | "/dev"            |
| 5   | "sync"     | "/bin"            |
| 6   | "games"    | "/usr/games"      |
| 7   | "man"      | "/var/cache/man"  |
| 8   | "lp"       | "/var/spool/lpd"  |
| 9   | "mail"     | "/var/mail"       |
| 10  | "news"     | "/var/spool/news" |

Most of the expected SQL operations are supported by SQLite (check the documentation) and hence also by the Julia interface. When we’re done we close the database connection.

julia> close(db)

Of course, the database we created in Julia is now available through the shell too.

colliera@propane:~/proj/Z-212-language-julia/src$ ls -l passwd.sqlite 
-rw-r--r-- 1 colliera colliera 6144 Sep 18 07:21 passwd.sqlite
colliera@propane:~/proj/Z-212-language-julia/src$ sqlite3 passwd.sqlite 
SQLite version 3.8.7.4 2014-12-09 01:34:36
Enter ".help" for usage hints.
sqlite> pragma table_info(passwd);
0|username|TEXT|0||0
1|password|TEXT|0||0
2|UID|REAL|0||0
3|GID|REAL|0||0
4|comment|TEXT|0||0
5|homedir|TEXT|0||0
6|shell|TEXT|0||0
sqlite> 

ODBC

If you need to access an enterprise DB (for example, Oracle, PostgreSQL, MySQL, Microsoft SQL Server or DB2) then the ODBC interface will be the way to go. To avoid the overhead of using one of these fancy DBs, I will demonstrate Julia’s ODBC functionality using the SQLite database we created above. Before we do that though, you’ll need to setup ODBC for SQLite. It’s not an onerous procedure at all. Then we fire up the ODBC package and we’re ready to roll.

julia> using ODBC

First we’ll check which drivers are available for ODBC (just SQLite in my case) and what data source names (DSNs) are registered.

julia> listdrivers()
(String["SQLite","SQLite3"],String["Description=SQLite ODBC Driver\0Driver=libsqliteodbc.so\0Setup=libsqliteodbc.so\0UsageCount=1\0","Description=SQLite3 ODBC Driver\0Driver=libsqlite3odbc.so\0Setup=libsqlite3odbc.so\0UsageCount=1\0"])
julia> listdsns()
(String["passwd"],String["SQLite3"])

We see that there is a DSN available for the passwd database. So we create a connection:

julia> db = ODBC.connect("passwd")
ODBC Connection Object
----------------------
Connection Data Source: passwd
passwd Connection Number: 1
Contains resultset(s)? No

At this point I’d like to execute a query. However, somewhat disappointingly, this doesn’t work. No error message but also no results. I’ve logged an issue with the package maintainer, so hopefully this will be resolved soon.

julia> query("SELECT * FROM passwd LIMIT 5;", db)
0x0 DataFrame

What’s promising though is that I can still retrieve the metadata for that query.

julia> querymeta("SELECT * FROM passwd LIMIT 5;", db)
Resultset metadata for executed query
-------------------------------------
Query:   SELECT * FROM passwd LIMIT 5
Columns: 7
Rows:    0
7x5 DataFrame
| Row | Names      | Types                  | Sizes | Digits | Nullable |
|-----|------------|------------------------|-------|--------|----------|
| 1   | "username" | ("SQL_LONGVARCHAR",-1) | 65536 | 0      | 1        |
| 2   | "password" | ("SQL_LONGVARCHAR",-1) | 65536 | 0      | 1        |
| 3   | "UID"      | ("SQL_DOUBLE",8)       | 54    | 0      | 1        |
| 4   | "GID"      | ("SQL_DOUBLE",8)       | 54    | 0      | 1        |
| 5   | "comment"  | ("SQL_LONGVARCHAR",-1) | 65536 | 0      | 1        |
| 6   | "homedir"  | ("SQL_LONGVARCHAR",-1) | 65536 | 0      | 1        |
| 7   | "shell"    | ("SQL_LONGVARCHAR",-1) | 65536 | 0      | 1        |

Again, when we’re done, we close the database connection.

julia> disconnect(db)

We’re now covered a number of means for getting data into Julia. Over the next few days we’ll be looking at Julia’s capabilities for analysing data. Stay tuned. In the meantime you can check out the code for today (and previous days) on github. Also take a look at the talk below.

The post #MonthOfJulia Day 16: Databases appeared first on Exegetic Analytics.

#MonthOfJulia Day 14: Data

Julia-Logo-DataFrame

DataFrames

The DataFrame type in Julia is not dissimilar to the analogous types in R and Python/pandas. It provides a way of grouping data which is convenient for analysis and reminiscent of a database table.

I’m assuming that you’ve already installed the DataFrames package. If not, take a look at yesterday’s post. The first step is then to load it up:

julia> using DataFrames

Next we can start assembling our data. A DataFrame can be built up one field at a time (as is done in the example below) or by passing all of the data at once to the constructor.

julia> people = DataFrame();
julia> people[:name]   = ["Andrew", "Claire", "Bob", "Alice"];
julia> people[:gender] = [0, 1, 0, 1];
julia> people[:age]    = [43, 35, 27, 32];
julia> people
4x3 DataFrame
| Row | name     | gender | age |
|-----|----------|--------|-----|
| 1   | "Andrew" | 0      | 43  |
| 2   | "Claire" | 1      | 35  |
| 3   | "Bob"    | 0      | 27  |
| 4   | "Alice"  | 1      | 32  |

names() and eltypes() provide a high level overview of the data, giving the names and data types respectively for each column.

julia> names(people)
3-element Array{Symbol,1}:
 :name  
 :gender
 :age   
julia> eltypes(people)
3-element Array{Type{T<:Top},1}:
 ASCIIString
 Int64      
 Int64 

You can dig deeper with describe(), which gives a simple statistical summary of each column. It does essentially the same thing as summary() in R.

Indexing operations allow you to access the data in various ways. There’s also head() and tail(), which return the first and last few records in the data.

julia> people[:age]
4-element DataArray{Int64,1}:
 43
 35
 27
 32
julia> people[2]
4-element DataArray{Int64,1}:
 0
 1
 0
 1
julia> people[:,2]
4-element DataArray{Int64,1}:
 0
 1
 0
 1
julia> people[1,:]
1x3 DataFrame
| Row | name     | gender | age |
|-----|----------|--------|-----|
| 1   | "Andrew" | 0      | 43  |

You can apply a range of operations to columns. Note, however, that there is a subtle difference in syntax: while == is the normal equality operator, .== is the element-wise equality operator which must be applied to columns in order to make element-by-element comparisons. A similar syntax pertains to other operators like .<= and .>.

julia> people[:gender] = ifelse(people[:gender] .== 1, 'F', 'M');
julia> people
4x3 DataFrame
| Row | name     | gender | age |
|-----|----------|--------|-----|
| 1   | "Andrew" | 'M'    | 43  |
| 2   | "Claire" | 'F'    | 35  |
| 3   | "Bob"    | 'M'    | 27  |
| 4   | "Alice"  | 'F'    | 32  |
julia> people[:gender] .== 'M'
4-element DataArray{Bool,1}:
  true
 false
  true
 false
julia> people[:age] .<= 40
4-element DataArray{Bool,1}:
 false
  true
  true
  true

Of course you're not likely to construct any serious collection of data manually. It's more likely to come from a database or file. There are various ways to accomplish this. The simplest is reading from a delimited file.

julia> passwd = readtable("/etc/passwd", separator = ':', header = false);
julia> names!(passwd, [symbol(i) for i in ["username", "passwd", "UID", "GID",
                                           "comment", "home", "shell"]]);
julia> passwd[1:5,:]
5x7 DataFrame
| Row | username | passwd | UID | GID   | comment  | home        | shell               |
|-----|----------|--------|-----|-------|----------|-------------|---------------------|
| 1   | "root"   | "x"    | 0   | 0     | "root"   | "/root"     | "/bin/bash"         |
| 2   | "daemon" | "x"    | 1   | 1     | "daemon" | "/usr/sbin" | "/usr/sbin/nologin" |
| 3   | "bin"    | "x"    | 2   | 2     | "bin"    | "/bin"      | "/usr/sbin/nologin" |
| 4   | "sys"    | "x"    | 3   | 3     | "sys"    | "/dev"      | "/usr/sbin/nologin" |
| 5   | "sync"   | "x"    | 4   | 65534 | "sync"   | "/bin"      | "/bin/sync"         |

Note how names!() was used to alter the column names. There are other ways of loading data from a delimited text file that will handle column names more elegantly. We'll get to those in a few days time.

Watch the video below and then read further to find out about the DataArrays package.

DataArrays

Data are seldom perfect and missing values are not uncommon. Now, you might use some a particular numerical value (like -9999, for example) to indicate a missing datum. However, this is a bit of a kludge, difficult to maintain and open to ambiguity. The DataArrays package introduces the singleton NA type which can be used to unambiguously indicate missing data.

A vector with missing data is created using the @data macro.

julia> using DataArrays
julia> x = @data([1, 2, 3, 4, NA, 6])
6-element DataArray{Int64,1}:
 1  
 2  
 3  
 4  
  NA
 6  

Functions anyna() and allna() can be used to test whether any or all of the elements of a vector are missing.

Two ways of dealing with NAs are to either drop them or replace them with another value.

julia> dropna(x)
5-element Array{Int64,1}:
 1
 2
 3
 4
 6
julia> convert(Array, x, -1)
6-element Array{Int64,1}:
  1
  2
  3
  4
 -1
  6

Data frames have support for NAs already baked in.

julia> people[:age][2] = NA;
julia> people
4x3 DataFrame
| Row | name     | gender | age |
|-----|----------|--------|-----|
| 1   | "Andrew" | 'M'    | 43  |
| 2   | "Claire" | 'F'    | NA  |
| 3   | "Bob"    | 'M'    | 27  |
| 4   | "Alice"  | 'F'    | 32  |
julia> mean(people[:age])
NA
julia> mean(dropna(people[:age]))
34.0

Note how dropna() was used to calculate the mean of the non-missing data.

Metaprogramming with a DataFrame

The DataFramesMeta package provides a handful of macros for applying metaprogramming techniques to data frames. For example:

julia> using DataFramesMeta
julia> @with(passwd, maximum(:UID))
65534
julia> @select(people, :gender)
4x1 DataFrame
| Row | gender |
|-----|--------|
| 1   | 'M'    |
| 2   | 'F'    |
| 3   | 'M'    |
| 4   | 'F'    |

Further examples can be found on the github page for MonthOfJulia.

The post #MonthOfJulia Day 14: Data appeared first on Exegetic Analytics.