
FluxArchitectures: LSTNet

By: Sören Dobberschütz

Re-posted from: http://sdobber.github.io/FA_LSTNet/

The first model in the FluxArchitectures repository is the “Long- and Short-term Time-series network” described by Lai et al., 2017.

Model Architecture

Model structure: image from Lai et al., “Long- and Short-term Time-series network”, arXiv 2017.

The neural net consists of the following elements:

  • A convolutional layer that operates on some window of the time series.
  • Two recurrent layers: a GRU cell with relu activation function, and a SkipGRU cell similar to the previous GRU cell, with the difference that its hidden state is taken from a specific number of timesteps back in time. Both the GRU and the SkipGRU layer take their input from the convolutional layer.
  • A dense layer that operates on the concatenated output of the previous two layers.
  • An autoregressive layer operating on the input data itself, being added to the output of the dense layer.

The Convolutional Layer

We use the standard Flux convolutional layer. Since it stems from an image-analysis background, it expects the input data in “width, height, channels, batch size” order. For our application, we pool a window of the time series of input features together, giving

  • width: The number of input features.

  • height: The number of timesteps we pool together.

  • channels: We are only using one channel.

  • batch size: The number of samples, i.e. pooled time windows, fed to the network at once.

The number of output channels is determined by the number of convolutional filters convlayersize we would like to have in our model. This gives

Conv((in, poolsize), 1 => convlayersize, σ)
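
As a quick shape check (the sizes and the call below are just a hypothetical example, not part of the original model code), a batch of pooled windows then has the layout described above:

using Flux

in, poolsize, convlayersize = 10, 6, 2          # example sizes
convlayer = Conv((in, poolsize), 1 => convlayersize, Flux.relu)

x = rand(Float32, in, poolsize, 1, 32)          # width × height × channels × batch
size(convlayer(x))                              # (1, 1, convlayersize, 32) = (1, 1, 2, 32)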

Recurrent Layers

Flux has a GRU layer available; however, it uses a different (fixed) activation function¹. We therefore alter the code slightly to obtain the ReluGRU part of the model:

mutable struct ReluGRUCell{A,V}
  Wi::A
  Wh::A
  b::V
  h::V
end

ReluGRUCell(in, out; init = Flux.glorot_uniform) =
  ReluGRUCell(init(out*3, in), init(out*3, out),
          init(out*3), zeros(Float32, out))

function (m::ReluGRUCell)(h, x)
  b, o = m.b, size(h, 1)
  gx, gh = m.Wi*x, m.Wh*h
  r = σ.(Flux.gate(gx, o, 1) .+ Flux.gate(gh, o, 1) .+ Flux.gate(b, o, 1))
  z = σ.(Flux.gate(gx, o, 2) .+ Flux.gate(gh, o, 2) .+ Flux.gate(b, o, 2))
  h̃ = relu.(Flux.gate(gx, o, 3) .+ r .* Flux.gate(gh, o, 3) .+ Flux.gate(b, o, 3))
  h′ = (1 .- z).*h̃ .+ z.*h
  return h′, h′
end

Flux.hidden(m::ReluGRUCell) = m.h
Flux.@functor ReluGRUCell

"""
    ReluGRU(in::Integer, out::Integer)

Gated Recurrent Unit layer with `relu` as activation function.
"""
ReluGRU(a...; ka...) = Flux.Recur(ReluGRUCell(a...; ka...))

This is more or less a direct copy from the Flux code, only changing the activation function.
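
As a small usage sketch (assuming the Flux version this post was written for, where Flux.Recur, Flux.gate and Flux.hidden behave as above), the new layer is used like any other recurrent layer:

rgru = ReluGRU(4, 3)              # 4 input features, hidden state of size 3
y = rgru(rand(Float32, 4))        # returns the updated hidden state, a vector of length 3
Flux.reset!(rgru)                 # reset the hidden state before the next sequence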

To get access to the hidden state of a ReluGRUCell from prior timepoints, we alter the gate function:

skipgate(h, n, p) = (1:h) .+ h*(n-1)
skipgate(x::AbstractVector, h, n, p) = x[skipgate(h,n,p)]
skipgate(x::AbstractMatrix, h, n, p) = x[skipgate(h,n,p),circshift(1:size(x,2),-p)]
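
To illustrate what skipgate does (the matrix below is just an arbitrary example): for a matrix whose columns hold consecutive hidden states, it selects the rows belonging to gate n and cyclically shifts the columns by p positions:

x = [10i + j for i in 1:4, j in 1:5]   # 4×5 matrix, i.e. two gates of size h = 2 over 5 columns
skipgate(x, 2, 2, 1)
# 2×5 Matrix{Int64}:
#  32  33  34  35  31
#  42  43  44  45  41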

With this, we can adapt the ReluGRU cell and add the skip-length parameter p to construct the SkipGRU part:

mutable struct SkipGRUCell{N,A,V}
  p::N
  Wi::A
  Wh::A
  b::V
  h::V
end

SkipGRUCell(in, out, p; init = Flux.glorot_uniform) =
  SkipGRUCell(p, init(out*3, in), init(out*3, out),
          init(out*3), zeros(Float32, out))

function (m::SkipGRUCell)(h, x)
  b, o = m.b, size(h, 1)
  gx, gh = m.Wi*x, m.Wh*h
  p = m.p
  r = σ.(Flux.gate(gx, o, 1) .+ skipgate(gh, o, 1, p) .+ Flux.gate(b, o, 1))
  z = σ.(Flux.gate(gx, o, 2) .+ skipgate(gh, o, 2, p) .+ Flux.gate(b, o, 2))
  h̃ = relu.(Flux.gate(gx, o, 3) .+ r .* skipgate(gh, o, 3, p) .+ Flux.gate(b, o, 3))
  h′ = (1 .- z).*h̃ .+ z.*h
  return h′, h′
end

Flux.hidden(m::SkipGRUCell) = m.h
Flux.@functor SkipGRUCell

"""
    SkipGRU(in::Integer, out::Integer, p::Integer)

Skip Gated Recurrent Unit layer with skip length `p`. The hidden state is recalled
from `p` steps prior to the current calculation.
"""
SkipGRU(a...; ka...) = Flux.Recur(SkipGRUCell(a...; ka...))

Having decided on the number recurlayersize of hidden units in the recurrent layers, as well as the number of time steps skiplength for going back in the hidden state, we can use these two layers as

ReluGRU(convlayersize,recurlayersize; init = init)
SkipGRU(convlayersize,recurlayersize, skiplength; init = init)

Their concatenated output is subsequently fed to a dense layer with scalar output and the identity as activation function. We use the standard Flux layer with

Dense(2*recurlayersize, 1, identity)

Autoregressive Layer

For the autoregressive part of the model, we use a dense layer with the number of features as the input size:

Dense(in, 1 , identity; initW = initW, initb = initb)

Putting it Together

Now that we have all the ingredients, we need to put them together in a reasonable way, dropping singleton dimensions or extracting the right input features where necessary.

We first define a struct to hold all our layers

mutable struct LSTnetCell{A, B, C, D, G}
  ConvLayer::A
  RecurLayer::B
  RecurSkipLayer::C
  RecurDense::D
  AutoregLayer::G
end

To create an LSTNet layer, we define the following constructor:

function LSTnet(in::Integer, convlayersize::Integer, recurlayersize::Integer, poolsize::Integer, skiplength::Integer, σ = Flux.relu;
	init = Flux.glorot_uniform, initW = Flux.glorot_uniform, initb = Flux.zeros)

	CL = Chain(Conv((in, poolsize), 1 => convlayersize, σ))
	RL = Chain(a -> dropdims(a, dims = (findall(size(a) .== 1)...,)),
			ReluGRU(convlayersize,recurlayersize; init = init))
	RSL = Chain(a -> dropdims(a, dims = (findall(size(a) .== 1)...,)),
			SkipGRU(convlayersize,recurlayersize, skiplength; init = init))
	RD = Chain(Dense(2*recurlayersize, 1, identity))
	AL = Chain(a -> a[:,1,1,:], Dense(in, 1 , identity; initW = initW, initb = initb) )

    LSTnetCell(CL, RL, RSL, RD, AL)
end

The parts a -> dropdims(a, dims = (findall(size(a) .== 1)...,)) and a -> a[:,1,1,:] make sure that we only feed two-dimensional datasets to the following layers.
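
For the first of these, a small shape example (sizes chosen for illustration only):

a = rand(Float32, 1, 1, 2, 32)                   # output shape of the convolutional layer
b = dropdims(a, dims = (findall(size(a) .== 1)...,))
size(b)                                          # (2, 32): ready for the recurrent layers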

The actual output from the model is obtained in the following way:

function (m::LSTnetCell)(x)
	modelRL1 = m.RecurLayer(m.ConvLayer(x))
	modelRL2 = m.RecurSkipLayer(m.ConvLayer(x))
	modelRL =  m.RecurDense(cat(modelRL1, modelRL2; dims=1))
	return modelRL + m.AutoregLayer(x)
end

That’s it! We’ve defined our LSTNet layer. The only missing piece is that calls to Flux.params and Flux.reset! will not yet work properly. We fix that by defining

Flux.params(m::LSTnetCell) = Flux.params(m.ConvLayer, m.RecurLayer, m.RecurSkipLayer, m.RecurDense, m.AutoregLayer)
Flux.reset!(m::LSTnetCell) = Flux.reset!.((m.ConvLayer, m.RecurLayer, m.RecurSkipLayer, m.RecurDense, m.AutoregLayer))

To see an example where the model is trained, head over to the GitHub repository.
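
For reference, a minimal training sketch could look as follows; the sizes, the random data and the hyperparameters are placeholders, and the repository contains a complete, worked example:

using Flux

in, convlayersize, recurlayersize, poolsize, skiplength = 10, 2, 3, 6, 60
model = LSTnet(in, convlayersize, recurlayersize, poolsize, skiplength)

X = rand(Float32, in, poolsize, 1, 200)     # features × window × channel × samples
Y = rand(Float32, 1, 200)                   # target values

loss(x, y) = Flux.mse(model(x), y)
Flux.train!(loss, Flux.params(model), [(X, Y)], ADAM(0.01))
Flux.reset!(model)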


  1. The activation function cannot be chosen freely because Nvidia only has limited support for recurrent neural nets in its GPU acceleration; see this issue.

Where to go from here? Announcing `FluxArchitectures`

By: Sören Dobberschütz

Re-posted from: http://sdobber.github.io/changetoflux/

It’s been a while since anything happened on this blog. So what happened in the meantime? First of all, I abandoned Tensorflow.jl in favour of Flux.jl. It seems to be the package that most people are using these days. It has a nice way of setting up models, and it is well integrated with other parts of the Julia ecosystem (for example, it can be combined with differential equations for scientific machine learning).

So how to proceed from here for improving one’s data science skills? As Casey Kneale put it

Focus on your analytical reasoning. Learn/brush up on basic statistics, think about them. Learn the limitations of the statistics you learned, otherwise they are useless. Learn basics of experimental science (experimental design!, the basics of the scientific method) and practice it :). Learn how to collect (scrape, parse, clean, organize, store) and curate data (quality, quantity, how to set up small scale infrastructure). Study some core algorithms, nothing fancy unless your math skills are strong. Try tweaking an algorithm that you think is cool to do something better for a dataset/problem, and study how well it does. Learn statistical methods for comparing experimental outcomes without bias. Make a ton of mistakes, even intentionally.

My point of interest is applying neural network methods to medical data while brushing up on and learning more about these things. More specifically, I want to try out models for predicting future blood glucose levels for patients suffering from diabetes. Here, even a reasonable forecast 30 to 45 minutes ahead could provide an opportunity to mitigate adverse effects.

All of this is centered around the modeling of time series. When I started out, I was lacking good examples of slightly more complex models for Flux.jl than the standard examples from the documentation. To give a home to some of these models, I started the FluxArchitectures repository.

Moving to Julia 1.1

By: Sören Dobberschütz

Re-posted from: https://tensorflowjulia.blogspot.com/2019/03/moving-to-julia-11.html

Moving from Julia 0.6 to 1.1


Finally, all files in the GitHub repository have been updated to run on Julia 1.1. To run them (at the time of writing), the development versions of the Tensorflow.jl and PyCall.jl packages need to be installed. Some notable changes are listed below:

General

  • The shuffle command has been moved to the Random package as Random.shuffle().
  • linspace has been replaced with range(start, stop=…, length=…).
  • To determine the number of columns of a matrix (e.g. the number of samples in a dataset), use size(…,2) instead of length().
  • For accessing the first and last part of datasets, head() has been replaced with first(), and tail() with last().
  • For arithmetic calculations involving constants and arrays, the broadcasting syntax const .- array needs to be used instead of const - array. In connection with this, a space might be required before the operator; i.e. to avoid confusion with number types, use 1 .- array, and not 1.-array.
  • Conversion of a dataframe df to a matrix can be done via convert(Matrix, df) instead of convert(Array, df).
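
A few of these replacements side by side (the old Julia 0.6 syntax is shown in the comments):

using Random

Random.shuffle(1:10)                 # was: shuffle(1:10)
range(0, stop = 1, length = 11)      # was: linspace(0, 1, 11)

A = rand(3, 4)
size(A, 2)                           # number of columns; was determined with length() before

x = rand(5)
1 .- x                               # was: 1 - x; note the space before the dot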

Exercise 8

  • The creation of an initially undefined vector etc. now requires the undef keyword as in activation_functions = Vector{Function}(undef, size(hidden_units,1)).
  • An assignment of a function to multiple entries of a vector requires the dot-operator: activation_functions[1:end-1] .= z->nn.dropout(nn.relu(z), keep_probability)
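
Both points combined in a small, self-contained example (using a plain relu instead of the TensorFlow dropout call from the exercise):

hidden_units = [10, 10, 1]                         # hypothetical layer sizes
activation_functions = Vector{Function}(undef, size(hidden_units, 1))
activation_functions[1:end-1] .= z -> max.(z, 0)   # relu for all but the last layer
activation_functions[end] = identity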

Exercise 10

  • For this exercise to work, the MNIST.jl package needs an update. A quick fix can be found in the repository together with the exercise notebooks.
  • Instead of flipdim(A, d), use reverse(A, dims=d).
  • indmax has been replaced by argmax.
  • Parsing expressions has been moved to the Meta package and can be done by Meta.parse(…).
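
The last three points in a short example:

A = [1 2; 3 4]
reverse(A, dims = 1)          # was: flipdim(A, 1)

argmax([0.1, 0.7, 0.2])       # returns 2; was: indmax([0.1, 0.7, 0.2])

Meta.parse("1 + 2")           # was: parse("1 + 2")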

Exercise 11

  • The behaviour of taking the transpose of a vector has been changed – it now creates an abstract linear algebra object. We use collect() on the transpose for functions that cannot handle the new type.
  • To test if a string contains a certain word, contains() has been replaced by occursin().
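
Both changes in a short example:

using LinearAlgebra

v = [1, 2, 3]
t = transpose(v)                  # a lazy 1×3 Transpose object, not a plain matrix
collect(t)                        # materialise it as an ordinary 1×3 matrix

occursin("cat", "concatenate")    # true; was: contains("concatenate", "cat")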