Author Archives: Sören Dobberschütz

Tips and tricks to register your first Julia package

By: Sören Dobberschütz

Re-posted from: http://sdobber.github.io/juliapackage/

FluxArchitectures has finally been published as a Julia package! Installation is now as easy as typing ] to activate the package manager, and then

add FluxArchitectures

at the REPL prompt.
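If you prefer to install the package programmatically, the same can be achieved with the Pkg API:

using Pkg
Pkg.add("FluxArchitectures")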

While preparing the package, I noticed that a lot of good and useful information about the details of registering a package is spread across different pieces of documentation and not easily accessible. In this post, I try to give a walkthrough of the different steps. There is also an older (but highly recommended) video from Chris Rackauckas explaining some of the process.

Package Template

Start out by creating a package using the PkgTemplates.jl package. It takes care of most of the internals for setting everything up. I decided to have the automatic workflows for running tests and building the documentation run on GitHub, so the code for creating the template was

using PkgTemplates

t = Template(; 
    user="UserName",
    dir="~/Code/PkgLocation/",
    julia=v"1.6",
    plugins=[
        Git(; manifest=true, ssh=true),
        GitHubActions(; x86=true),
        Codecov(),
        Documenter{GitHubActions}(),
    ],
)

t("PkgName")

Of course, change UserName to your GitHub user name, and PkgLocation and PkgName to the location and package name of your choice. A good starting point is this part of the PkgTemplates.jl documentation, which also describes the many available options.

Depending on which Julia versions you want to run your tests on, you might need to edit the .github/workflows/CI.yml file. For example, to also run the tests on the latest Julia nightly, edit the matrix section:

matrix:
        version:
          - '1.6'
          - 'nightly'

To allow for errors in the nightly version, edit the steps section to include a continue-on-error statement:

    - uses: julia-actions/julia-runtest@v1
      continue-on-error: ${{ matrix.version == 'nightly' }}

(The full file can be found here.)

Develop Code and Tests

The next step is of course to fill the template with Julia code (which goes into the src folder) and tests for your code (residing in the test folder).
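As a minimal sketch, a test/runtests.jl file could look like the following (myfunction is a hypothetical function of your package, used only for illustration):

using PkgName
using Test

@testset "PkgName.jl" begin
    # `myfunction` stands in for whatever your package actually exports
    @test myfunction(2) == 4
    @test_throws ArgumentError myfunction(-1)
end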

Documentation

A package should have some documentation describing its features. It goes into the docs folder in the form of Markdown files. The make.jl file takes care of preparing everything, though we need to tell it the desired sidebar structure of our documentation. This goes into the pages keyword:

pages=[
        "Home" => "index.md",
        "Examples" => "examples/examples.md",
        "Exported Functions" => "functions.md",
        "Models" =>
                    ["Model 1" => "models/model1.md",
                     "Model 2" => "models/model2.md"],
        "Reference" => "reference.md",
    ]

All the files on the right-hand side of the pairs need to exist in the docs/src directory (or subfolders thereof). It is strongly advised to run the make.jl file locally and check that everything works.

See the Documenter.jl documentation on how to automatically add docstrings from the package source code, add links to other sections etc. One can also learn a lot by looking at the code for other packages’ documentation.
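PkgTemplates already generates a make.jl for us; just for orientation, a stripped-down version might look roughly like this (the site name, module name and repository URL are placeholders matching the template values from above):

using Documenter, PkgName

makedocs(
    sitename = "PkgName.jl",
    modules = [PkgName],
    pages = [
        "Home" => "index.md",
        "Exported Functions" => "functions.md",
    ],
)

# Pushes the built documentation to the gh-pages branch
deploydocs(repo = "github.com/UserName/PkgName.jl.git")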

External Data

In case your package includes some larger files with example data etc., it is a good idea to include them via Julia’s Artifact system. This consists of the following steps:

  • Create a .tar.gz archive of your files.
  • Upload the files to an accessible location (e.g. in a separate GitHub repository). For FluxArchitectures, I used a FA_data repository.
  • Create an Artifacts.toml file in your package folder containing information about the files to download.
  • Access your files in Julia by finding their location through the Artifact system – Julia will automatically take care of downloading them, storing them and making them accessible. This works as follows:
      using Pkg.Artifacts
      rootpath = artifact"DatasetName"
    

These steps are described in detail in the Pkg documentation. For the Artifacts.toml file, a file SHA and a git tree SHA are needed. They can be produced by Julia itself – see the linked documentation and the sketch below. If one adds the lazy = true keyword to the section containing the git tree SHA, the data is only downloaded when the user requests it for the first time.
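As a rough sketch of how the SHAs and the Artifacts.toml entry can be produced (assuming the example data sits in a local folder called data, and using the FA_data URL mentioned below):

using Pkg.Artifacts

artifact_toml = joinpath(@__DIR__, "Artifacts.toml")

# Copy the files into a new artifact; this returns the git tree SHA
hash = create_artifact() do artifact_dir
    cp("data", joinpath(artifact_dir, "data"))  # "data" is a hypothetical local folder
end

# Pack the artifact into a .tar.gz archive; this returns the file SHA
tarball_sha = archive_artifact(hash, "data.tar.gz")

# Write the entry to Artifacts.toml; `lazy=true` defers the download to first use
bind_artifact!(artifact_toml, "DatasetName", hash;
    download_info=[("https://github.com/sdobber/FA_data/raw/main/data.tar.gz", tarball_sha)],
    lazy=true, force=true)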

A common mistake is to “just” copy-paste the GitHub URL to the archive into the Artifacts.toml file, which will not work. Make sure to use a link to the “raw” data, which usually can be obtained by inserting a raw into the GitHub URL, for example as in https://github.com/sdobber/FA_data/raw/main/data.tar.gz.

Check Requirements

It is a good idea to check the requirements for new packages, which can be found in the RegistryCI.jl documentation. This document gives some hints about proper naming etc.

Publish to GitHub & Add JuliaRegistrator

If not done already, make sure that your package is available on GitHub.

Click on the “install app” button on JuliaRegistrator’s page and allow the bot to access your package repository.

Stable vs Dev Documentation

The CI.yml file created by the package template contains a workflow that builds your documentation and makes it available in your repository. The documentation is pushed to a new branch called gh-pages. You might need to tell GitHub to use this branch as a GitHub Pages site. Follow the instructions in this GitHub documentation, and set the “publishing source” to the gh-pages branch.

With the default settings, only documentation for the current development version of the package will be created. If you also want to create and keep documentation for each tagged version, you need to create a key pair and add it to the DOCUMENTER_KEY secret of the GitHub repository. The easiest way I found to produce the keys is to install DocumenterTools.jl and run

using DocumenterTools
DocumenterTools.genkeys()

in the Julia REPL.

The REPL output will present you with two strings that need to be pasted into different places:

  • The first key needs to be added as a public key to your repository, see this documentation and start at step 2.
  • The second key needs to be added as a repository secret. Follow this document. The name of the secret needs to be DOCUMENTER_KEY, and the value is the output string from Julia.

Register Package

On GitHub, open a new issue in your repository, and write @JuliaRegistrator register in the comment area. The JuliaRegistrator bot will pick this up, and after a three-day waiting period, your new package hopefully gets added to the general Julia registry.

FluxArchitectures: TPA-LSTM

By: Sören Dobberschütz

Re-posted from: http://sdobber.github.io/FA_TPALSTM/

The next model in the FluxArchitectures repository is the Temporal Pattern Attention LSTM network, based on the paper “Temporal Pattern Attention for Multivariate Time Series Forecasting” by Shih et al. It claims to have a better performance than the previously implemented LSTNet, with the additional advantage that an attention mechanism automatically tries to determine important parts of the time series, instead of introducing parameters that need to be optimized by the user.

Model Architecture

Model Structure

Image from Shih et al., “Temporal Pattern Attention for Multivariate Time Series Forecasting”, arXiv, 2019.

The neural net consists of the following elements. The first part is an embedding and stacked LSTM layer made up of the following parts:

  • A Dense embedding layer for the input data.
  • A StackedLSTM layer for the transformed input data.

The temporal attention mechanism consists of

  • A Dense layer that transforms the hidden state of the last LSTM layer in the StackedLSTM.
  • A convolutional layer operating on the pooled output of the previous layer, estimating the importance of the different datapoints.
  • A Dense layer operating on the LSTM hidden state and the output of the attention mechanism.

A final Dense layer is used to calculate the output of the network.

The code is based on a PyTorch implementation of the same model by Jing Wang, with slight adjustments.

We define a struct to hold all layers and some metadata:

mutable struct TPALSTMCell{A, B, C, D, E, F}
    # Prediction layers
    embedding::A
    output::B
    lstm::C
    # Attention layers
    attention_linear1::D
    attention_linear2::E
    attention_conv::F
    # Metadata ...
end

These layers are initialized as follows:

function TPALSTM(in, hiddensize, poollength, layers=1, filternum=32, filtersize=1)
    embedding = Dense(in, hiddensize, Flux.relu)
    output = Dense(hiddensize, 1)
    lstm = StackedLSTM(hiddensize, hiddensize, hiddensize, layers)
    attention_linear1 = Dense(hiddensize, filternum)
    attention_linear2 = Dense(hiddensize + filternum, hiddensize)
    attention_conv = Conv((filtersize, poollength - 1), 1 => filternum)
    return TPALSTMCell(...)
end

We use the same input data format as for the previous LSTNet layer, i.e. “Number of input features x Number of pooled timesteps x 1 x Number of data points”. The StackedLSTM layer is described later – it is basically a number of LSTM layers, where the hidden state of one layer gets fed to the next layer as input.
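As a small usage sketch (the sizes are arbitrary, and we assume that the TPALSTM constructor above is exported by FluxArchitectures), the model can be constructed and applied to dummy data of this shape as follows:

using FluxArchitectures

inputfeatures, poollength, datapoints = 10, 20, 100
model = TPALSTM(inputfeatures, 32, poollength)  # hiddensize = 32, defaults otherwise

x = rand(Float32, inputfeatures, poollength, 1, datapoints)
y = model(x)  # 1 x datapoints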

The model output is obtained by the following function:

function (m::TPALSTMCell)(x)
    inp = dropdims(x, dims=3)  # num_features x poollength x batchsize
    H_raw = _TPALSTM_gethidden(inp, m)
    H = Flux.relu.(H_raw)  # hiddensize x (poollength - 1) x batchsize
    x = inp[:,end,:]  # num_features x batchsize
    xconcat = m.embedding(x)  # hiddensize x batchsize
    _ = m.lstm(xconcat)
    h_last = m.lstm.chain[end].state[1]  # hiddensize x batchsize
    ht = _TPALSTM_attention(H, h_last, m)  # hiddensize x batchsize
    return m.output(ht)  # 1 x batchsize
end

The following calculations are performed:

  1. Drop the singleton dimension of the input data.
  2. Get the hidden state from feeding a section of past input data to the stacked LSTM network.
  3. Obtain the hidden state for the current input data.
  4. Transform this hidden state by the attention mechanism.
  5. Obtain the final output.

Steps 2 and 4 are described in the following subsections.

Obtaining the hidden states

This function basically runs through the pooled data, feeding it to the LSTM part of the network. In order to be able to collect the outputs, we use a Zygote.Buffer to store the results and return a copy to get back to normal arrays.

function _TPALSTM_gethidden(inp, m::TPALSTMCell)
    batchsize = size(inp,3)
    H = Flux.Zygote.Buffer(Array{Float32}(undef, m.hiddensize, m.poollength-1, batchsize))
    for t in 1:m.poollength-1
        x = inp[:,t,:]
        xconcat = m.embedding(x)
        _ = m.lstm(xconcat)
        hiddenstate = m.lstm.chain[end].state[1]
        H[:,t,:] = hiddenstate
    end
    return copy(H)
end

Attention mechanism

The attention mechanism is contained in the function

function _TPALSTM_attention(H, h_last, m::TPALSTMCell)
    H_u = Flux.unsqueeze(H, 3)  # hiddensize x (poollength - 1) x 1 x batchsize
    conv_vecs = Flux.relu.(dropdims(m.attention_conv(H_u), dims=2))  # (hiddensize - filtersize + 1) x filternum x batchsize

    w = m.attention_linear1(h_last) |>  # filternum x batchsize
        a -> Flux.unsqueeze(a, 1) |>  # 1 x filternum x batchsize
        a -> repeat(a, inner=(m.attention_features_size,1,1))  # (hiddensize - filtersize + 1) x filternum x batchsize
    alpha = Flux.sigmoid.(sum(conv_vecs.*w, dims=2))  # (hiddensize - filtersize + 1) x 1 x batchsize
    v = repeat(alpha, inner=(1,m.filternum,1)) |>  # (hiddensize - filtersize + 1) x filternum x batchsize
        a -> dropdims(sum(a.*conv_vecs, dims=1), dims=1)  # filternum x batchsize

    concat = cat(h_last, v, dims=1)  # (filternum + hiddensize) x batchsize
    return m.attention_linear2(concat)  # hiddensize x batchsize
end

It consists of the following steps:

  1. We make sure that the matrix of pooled hidden states H has the right shape for a convolutional network by adding a third dimension of size one (making it the same size as the original input data).
  2. The Conv layer is applied, followed by a relu activation function.
  3. The transformed current hidden state of the LSTM part is multiplied with the output of the convolutional net. A sigmoid activation function gives attention weights alpha.
  4. The output of the convolutional net is weighted by the attention weights and concatenated with the current hidden state of the LSTM part.
  5. A Dense layer reduces the size of the concatenated vector.

Stacked LSTM

The stacked version of a number of LSTM cells is obtained by feeding the hidden state of one cell as input to the next one. Flux.jl’s standard setup only allows feeding the output of one cell as the new input, thus we adjust some of the internals:

  • Management of hidden states in Flux is done by the Recur structure, which returns the output of a recurrent layer. We use a similar HiddenRecur structure instead which returns the hidden state.
mutable struct HiddenRecur{T}
  cell::T
  init
  state
end

function (m::HiddenRecur)(xs...)
  h, y = m.cell(m.state, xs...)
  m.state = h
  return h[1]  # return hidden state of LSTM
end
  • The StackedLSTM-function chains everything together depending on the number of layers. (One layer corresponds to a standard LSTM cell.)
mutable struct StackedLSTMCell{A}
	chain::A
end

function StackedLSTM(in, out, hiddensize, layers::Integer)
	if layers == 1  # normal LSTM cell
		chain = Chain(LSTM(in, out))
	elseif layers == 2  
		chain = Chain(HiddenRecur(Flux.LSTMCell(in, hiddensize)),
					LSTM(hiddensize, out))
	else
		chain_vec=[HiddenRecur(Flux.LSTMCell(in, hiddensize))]
		for i=1:layers-2
			push!(chain_vec, HiddenRecur(Flux.LSTMCell(hiddensize, hiddensize)))
		end
		chain = Chain(chain_vec..., LSTM(hiddensize, out))
	end
	return StackedLSTMCell(chain)
end

function (m::StackedLSTMCell)(x)
	return m.chain(x)
end

FluxArchitectures: DA-RNN

By: Sören Dobberschütz

Re-posted from: http://sdobber.github.io/FA_DARNN/

The next model in the FluxArchitectures repository is the “Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction”, based on the paper by Qin et al., 2017. It claims to have a better performance than the previously implemented LSTNet, with the additional advantage that an attention mechanism automatically tries to determine important parts of the time series, instead of introducing parameters that need to be optimized by the user.

Model Architecture

The neural network has a rather complex structure. Built around an encoder-decoder design, it consists of two units: an input attention mechanism and a temporal attention mechanism.

  • The input attention mechanism feeds the input data to an LSTM network. In subsequent calculations, only its hidden state is used; additional network layers try to estimate the importance of the different hidden variables.

  • The temporal attention mechanism takes the hidden state of the encoder network and combines it with the hidden state of another LSTM decoder. Additional network layers try again to estimate the importance of the hidden variables of the encoder and decoder combined.

  • Linear layers combine the outputs of the different layers into the final time series prediction.

Our implementation follows the one for PyTorch. We start out by creating a struct to hold all the necessary elements:

mutable struct DARNNCell{A, B, C, D, E, F, W, X, Y, Z}
    # Encoder part
    encoder_lstm::A
    encoder_attn::B
    # Decoder part
    decoder_lstm::C
    decoder_attn::D
    decoder_fc::E
    decoder_fc_final::F
    # Index for original data etc.
    encodersize::W
    decodersize::X
    orig_idx::Y
    poollength::Z
end

In addition to the layers we need for constructing the DA-RNN network, we also store some metadata needed for the calculations: the sizes of the encoder and decoder networks, the index orig_idx describing where in the input data the original time series can be found, and the number of time steps over which the input data was pooled (corresponding to T in the picture below).

The constructor initializes all layers with their correct size:

function DARNN(inp::Integer, encodersize::Integer, decodersize::Integer, poollength::Integer, orig_idx::Integer)
    # Encoder part
    encoder_lstm = LSTM(inp, encodersize)
    encoder_attn = Chain(Dense(2*encodersize + poollength, poollength),
                         a -> tanh.(a),
                         Dense(poollength, 1))
    # Decoder part
    decoder_lstm = LSTM(1, decodersize)
    decoder_attn = Chain(Dense(2*decodersize + encodersize, encodersize),
                         a -> tanh.(a),
                         Dense(encodersize, 1))
    decoder_fc = Dense(encodersize + 1, 1)
    decoder_fc_final = Dense(decodersize + encodersize, 1)

    return DARNNCell(encoder_lstm, encoder_attn, decoder_lstm, decoder_attn, decoder_fc,
                     decoder_fc_final, encodersize, decodersize, orig_idx, poollength)
end

Encoder network

Model Structure Encoder

Image from Qin et al., “Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction”, arXiv, 2017.

We use the same input data format as for the previous LSTNet layer, i.e. “Number of input features x Number of pooled timesteps x 1 x Number of data points”. Before feeding the data to the encoder, we drop the singleton dimension: input_data = dropdims(x; dims=3).
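As a small usage sketch (the sizes are arbitrary, and orig_idx = 1 assumes that the target time series is the first input feature), the model can be applied to dummy data of this shape as follows:

using FluxArchitectures

inputfeatures, poollength, datapoints = 10, 20, 100
model = DARNN(inputfeatures, 32, 32, poollength, 1)  # encodersize = decodersize = 32

x = rand(Float32, inputfeatures, poollength, 1, datapoints)
y = model(x)  # 1 x datapoints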

The encoder loops over the pooled timesteps to perform a scaling of the input data: it extracts the hidden state and cell state of the encoder LSTM layer, concatenates them with the input data and feeds them to the attention network. Using a softmax function, we obtain the scaling of the input data for timestep t, which is then fed to the LSTM network. In the following code, we indicate the equation numbers from the paper cited in the introduction.

for t in 1:m.poollength
    hidden = m.encoder_lstm.state[1]
    cell = m.encoder_lstm.state[2]

    # Eq. (8)
    x = cat(repeat(hidden, inner=(1,1,size(input_data,1))),
            repeat(cell, inner=(1,1,size(input_data,1))),
            permutedims(input_data,[2,3,1]), dims=1) |>  # (2*encodersize + poollength) x datapoints x features
        a -> reshape(a, (:, size(input_data,1)*size(input_data,3))) |>  # (2*encodersize + poollength) x (features * datapoints)
        m.encoder_attn  # 1 x (features * datapoints)

    # Eq. (9)
    attn_weights = Flux.softmax(reshape(x, (size(input_data,1), size(input_data,3))))  # features x datapoints
    # Eq. (10)
    weighted_input = attn_weights .* input_data[:,t,:]  # features x datapoints
    # Eq. (11)
    _ = m.encoder_lstm(weighted_input)

    input_encoded[:,t,:] = Flux.unsqueeze(m.encoder_lstm.state[1], 2)  # encodersize x 1 x datapoints
end

In order to make this code trainable by Flux, we wrap the input_encoded into a Zygote.Buffer structure, and return copy(input_encoded).

Decoder Network

Model Structure Decoder

Image from Qin et al., “Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction”, arXiv, 2017.

The decoder operates on input_encoded from the encoder, i.e. a collection of hidden states of the encoder LSTM network. It also loops over the pooled timesteps, calculating “attention weights” that select relevant encoder hidden states, and a “context vector” as the corresponding weighted sum of hidden states.

for t in 1:m.poollength
    # Extract hidden state and cell state from decoder
    hidden = m.decoder_lstm.state[1]
    cell = m.decoder_lstm.state[2]

    # Eq. (12) - (13)
    x = cat(permutedims(repeat(hidden, inner=(1,1,m.poollength)), [1,3,2]),
            permutedims(repeat(cell, inner=(1,1,m.poollength)), [1,3,2]),
            input_encoded, dims=1) |>  # (2*decodersize + encodersize) x poollength x datapoints
        a -> reshape(a, (2*m.decodersize + m.encodersize, :)) |>  # (2*decodersize + encodersize) x (poollength * datapoints)
        m.decoder_attn |>  # 1 x (poollength * datapoints)
        a -> Flux.softmax(reshape(a, (m.poollength, :)))  # poollength x datapoints

    # Eq. (14)
    context = dropdims(NNlib.batched_mul(input_encoded, Flux.unsqueeze(x, 2)), dims=2)  # encodersize x datapoints
    # Eq. (15)
    ỹ = m.decoder_fc(cat(context, input_data[m.orig_idx,t,:]', dims=1))  # 1 x datapoints
    # Eq. (16)
    _ = m.decoder_lstm(ỹ)
end

The decoder returns the context vector context of the last timestep.

Final Output

The final model output is obtained by feeding the encoder output to the decoder, and calling the final Dense layer on the concatenation of the decoder hidden state and the context vector:

function (m::DARNNCell)(x)
	# Initialization code missing...

	input_data = dropdims(x; dims=3)
	input_encoded = darnn_encoder(m, input_data)
	context = darnn_decoder(m, input_encoded, input_data)
	# Eq. (22)
	return m.decoder_fc_final( cat(m.decoder_lstm.state[1], context, dims=1))
end

Helper functions

To make sure that Flux knows which parameters to train, and how to reset the model, we define

Flux.trainable(m::DARNNCell) = (m.encoder_lstm, m.encoder_attn, m.decoder_lstm,
    m.decoder_attn, m.decoder_fc, m.decoder_fc_final)
Flux.reset!(m::DARNNCell) = Flux.reset!.((m.encoder_lstm, m.decoder_lstm))

When the DA-RNN network is reset, the hidden states of the LSTM units do not have the desired size. To initialize them, we manually feed input data of the right size to those layers:

function darnn_init(m::DARNNCell,x)
	m.encoder_lstm(x[:,1,1,:])
	m.decoder_lstm(x[m.orig_idx,1,1,:]')
	return nothing
end
Flux.Zygote.@nograd darnn_init