#MonthOfJulia Day 38: Imaging

By: Andrew Collier

Re-posted from: http://www.exegetic.biz/blog/2015/10/monthofjulia-day-38-imaging/


Julia has a few packages aimed at image processing. We’ll start by looking at the TestImages package, which hosts a selection of sample images, then briefly visit the ImageView package before moving onto the Images package, which implements a range of functions for image manipulation.

Test Images

The TestImages package currently provides 25 sample images, which form a convenient basis for experimentation.

julia> using TestImages
julia> readdir(joinpath(homedir(), ".julia/v0.4/TestImages/images/"))
25-element Array{ByteString,1}:

We’ll load the archetypal test image (the November 1972 Playboy centerfold of Lena Söderberg).

julia> lena = testimage("lena_color_256.tif");

Of course, now that we’ve loaded that image, we’ll want to take a look at it. To do that we’ll need the ImageView package.

julia> using ImageView
julia> view(lena)
(ImageCanvas,ImageSlice2d: zoom = Graphics.BoundingBox(0.0,256.0,0.0,256.0))

You can optionally specify the pixel spacing as a parameter to view(), which then ensures that the aspect ratio of the image is conserved on resizing. There are various other bobs and whistles associated with view(): you can click-and-drag within the image to zoom in on a particular region; various simple transformations (flipping and rotation) are possible; images can be annotated and multiple images can be arranged on a canvas for simultaneous viewing.

Image Representation

Outside of the test images, an arbitrary image file can be loaded using imread() from the Images package. Naturally, there are also functions for writing images, imwrite() and writemime().

julia> using Images
julia> earth = imread(joinpath(homedir(), ".julia/v0.4/TestImages/images/earth_apollo17.jpg"))
RGB Images.Image with:
  data: 3000x3002 Array{ColorTypes.RGB{FixedPointNumbers.UfixedBase{UInt8,8}},2}
    IMcs: sRGB
    spatialorder:  x y
    pixelspacing:  1 1

The default representation for the Image object tells us its dimensions, storage type and colour space. The spatial order indicates that the image data are stored using row major ordering. It’s also possible to specify physical units for the pixel spacing, which is particularly important if you are analysing images where absolute scale matters (for example, medical imaging). There are convenience methods for a few image properties.

julia> colorspace(earth)
julia> height(earth)
julia> width(earth)

We can examine individual pixels within the image using the indexing operator.

julia> earth[1,1]

Each pixel is of type RGB (defined in the Colors package), which encapsulates a tuple giving the proportion of red, green and blue for that pixel. The underlying image data can also be accessed via the data() method.


The image can be split into its component colour channels using separate().

julia> earth_rgb = separate(earth)
RGB Images.Image with:
  data: 3002x3000x3 Array{FixedPointNumbers.UfixedBase{UInt8,8},3}
    IMcs: sRGB
    colorspace: RGB
    colordim: 3
    spatialorder:  y x
    pixelspacing:  1 1

Note that the result is a three-dimensional Array. The spatial order has also changed, which means that the data are now represented using column major ordering. The data are thus effectively transposed.

Simple Image Processing

Kernel-based filtering can be applied using imfilter() or imfilter_fft(), where the latter is better suited to larger kernels. There’s a variety of helper functions for constructing kernels, like imaverage() and gaussian2d().

julia> lena_smooth = imfilter(lena, imaverage([3, 3]));
julia> lena_very_smooth = imfilter_fft(lena, ones(10, 10) / 100);
julia> lena_gauss_smooth = imfilter_gaussian(lena, [1, 2]);

The effects of the above smoothing operations can be seen below, with the original image on the left, followed by the 3-by-3 and 10-by-10 boxcar filtered versions and finally the Gaussian filtered image.


The imgradients() function calculates gradients across the image. You can choose from a set of methods for calculating the gradient. The morphological dilation and erosion operations are available via dilate() and erode().

julia> (lena_sobel_x, lena_sobel_y) = imgradients(lena, "sobel");
julia> lena_dilate = dilate(lena);
julia> lena_erode = erode(lena);

Below are the two components of the image gradient calculated using the Sobel operator followed by the results of dilate() and erode().



Other Packages

The ImageMagick package implements further imaging functionality. If, in the future, it provides an interface to the full functionality on the ImageMagick suite then it will be a truly phenomenal resource. Also worth looking at is the PiecewiseAffineTransforms package which implements a technique for warping portions of an image.

If you’d like to exercise your image processing and machine learning skills in Julia, take a look at the First Steps With Julia competition on kaggle.

The post #MonthOfJulia Day 38: Imaging appeared first on Exegetic Analytics.

#MonthOfJulia Day 37: Fourier Techniques

By: Andrew Collier

Re-posted from: http://www.exegetic.biz/blog/2015/10/monthofjulia-day-37-fourier-techniques/


The Fourier Transform is often applied to signal processing and other analyses. It allows a signal to be transformed between the time domain and the frequency domain. The efficient Fast Fourier Transform (FFT) algorithm is implemented in Julia using the FFTW library.

1D Fourier Transform

Let’s start by looking at the Fourier Transform in one dimension. We’ll create test data in the time domain using a wide rectangle function.

julia> f = [abs(x) <= 1 ? 1 : 0 for x in -5:0.1:5];
julia> length(f)

This is what the data look like:
We’ll transform the data into the frequency domain using fft().

julia> F = fft(f);
julia> typeof(F)
julia> length(F)
julia> F = fftshift(F);

The frequency domain data are an array of Complex type with the same length as the time domain data. Since each Complex number consists of two parts (real and imaginary) it seems that we have somehow doubled the information content of our signal. This is not true because half of the frequency domain data are redundant. The fftshift() function conveniently rearranges the data in the frequency domain so that the negative frequencies are on the left.

This is what the resulting amplitude and power spectra look like:
The analytical Fourier Transform of the rectangle function is the sinc function, which agrees well with numerical data in the plots above.

2D Fourier Transform

Let’s make things a bit more interesting: we’ll look at the analogous two-dimensional problem. But this time we’ll go in the opposite direction, starting with a two-dimensional sinc function and taking its Fourier Transform.

Building the array of sinc data is easy using a list comprehension.

julia> f = [(r = sqrt(x^2 + y^2); sinc(r)) for x in -6:0.125:6, y in -6:0.125:6];
julia> typeof(f)
julia> size(f)

It doesn’t make sense to think about a two-dimensional function in the time domain. But the Fourier Transform is quite egalitarian: it’s happy to work with a temporal signal or a spatial signal (or a signal in pretty much any other domain). So let’s suppose that our two-dimensional data are in the spatial domain. This is what it looks like:

Generating the Fourier Transform is again a simple matter of applying fft(). No change in syntax: very nice indeed!

julia> F = fft(f);
julia> typeof(F)
julia> F = fftshift(F);

The power spectrum demonstrates that the result is the 2D analogue of the rectangle function.

Higher Dimensions and Beyond

It’s just as easy to apply the FFT to higher dimensional data, although in my experience this is rarely required.

Most of the FFTW library’s functionality has been implemented in the Julia interface. For example:

  • it’s possible to generate plans for optimised FFTs using plan_fft();
  • dct() yields the Discrete Cosine Transform;
  • you can exploit conjugate symmetry in real transforms using rfft(); and
  • it’s possible to run over multiple threads using FFTW.set_num_threads().

Watch the video below in which Steve Johnson demonstrates many of the features of FFTs in Julia.

The post #MonthOfJulia Day 37: Fourier Techniques appeared first on Exegetic Analytics.

JSoC project update – DataStreams.jl

By: Julia Developers

Re-posted from: http://feedproxy.google.com/~r/JuliaLang/~3/VtZnNbOxvJc/datastreams

Data processing got ya down? Good news! The DataStreams.jl package, er, framework, has arrived!

The DataStreams processing framework provides a consistent interface for working with data, from source to sink and eventually every step in-between. It’s really about putting forth an interface (specific types and methods) to go about ingesting and transferring data sources that hopefully makes for a consistent experience for users, no matter what kind of data they’re working with.

How does it work?

DataStreams is all about creating “sources” (Julia types that represent true data sources; e.g. csv files, database backends, etc.), “sinks” or data destinations, and defining the appropriate Data.stream!(source, sink) methods to actually transfer data from source to sink. Let’s look at a quick example.

Say I have a table of data in a CSV file on my local machine and need to do a little cleaning and aggregation on the data before building a model with the GLM.jl package. Let’s see some code in action:

using CSV, SQLite, DataStreams, DataFrames

# let's create a Julia type that understands our data file
csv_source = CSV.Source("datafile.csv")

# let's also create an SQLite destination for our data
# according to its structure
db = SQLite.DB() # create an in-memory SQLite database

# creates an SQLite table
sqlite_sink = SQLite.Sink(Data.schema(csv_source), db, "mydata")

# parse the CSV data directly into our SQLite table
Data.stream!(csv_source, sqlite_sink)

# now I can do some data cleansing/aggregation
# ...various SQL statements on the "mydata" SQLite table...

# now I'm ready to get my data out and ready for model fitting
sqlite_source = SQLite.Source(sqlite_sink)

# stream our data into a Julia structure (Data.Table)
dt = Data.stream!(sqlite_source, Data.Table)

# convert to DataFrame (non-copying)
df = DataFrame(dt)

# do model-fitting
OLS = glm(Y~X,df,Normal(),IdentityLink())

Here we see it’s quite simple to create a Source type by wrapping a true datasource (our CSV file), a destination for that data (an SQLite table), and to transfer the data. We can then turn our SQLite.Sink into an SQLite.Source for getting the data back out again.

So What Have You Really Been Working On?

Well, a lot actually. Even though the DataStreams framework is currently simple and minimalistic, it took a lot of back and forth on the design, including several discussions at this year’s JuliaCon at MIT. Even with a tidy little framework, however, the bulk of the work still lies in actually implementing the interface in various packages. The two that are ready for release today are CSV.jl and SQLite.jl. They are currently available for julia 0.4+ only.

Quick rundown of each package:

  • CSV: provides types and methods for working with CSV and other delimited files. Aims to be (and currently is) the fastest and most flexible CSV reader in Julia.
  • SQLite: an interface to the popular SQLite local-machine database. Provides methods for creating/managing database files, along with executing SQL statements and viewing the results of such.
So What’s Next?
  • ODBC.jl: the next package to get the DataStreams makeover is ODBC. I’ve already started work on this and hopefully should be ready soon.
  • Other packages: I’m always on the hunt for new ways to spread the framework; if you’d be interested in implementing DataStreams for your own package or want to collaborate, just ping me and I’m happy to discuss!
  • transforms: an important part of data processing tasks is not just connecting to and moving the data to somewhere else: often you need to clean/transform/aggregate the data in some way in-between. Right now, that’s up to users, but I have some ideas around creating DataStreams-friendly ways to easily incorporate transform steps as data is streamed from one place to another.
  • DataStreams for chaining pipelines + transforms: I’m also excited about the idea of creating entire DataStreams, which would define entire data processing tasks end-to-end. Setting up a pipeline that could consistently move and process data gets even more powerful as we start looking into automatic-parallelism and extensibility.
  • DataStream scheduling/management: I’m also interested in developing capabilities around scheduling and managing DataStreams.

The work on DataStreams.jl was carried out as part of the Julia Summer of Code program, made possible thanks to the generous support of the Gordon and Betty Moore Foundation, and MIT.