# Intro to Machine Learning with TensorFlow.jl

In this blog post, I am going to go through a series of neural network structures.
This is intended as a demonstration of the more basic neural network functionality in TensorFlow.jl.
The post accompanies the introduction to machine learning chapter of the short book I am writing
(currently under the working title “Neural Network Representations for Natural Language Processing”).

I do have an earlier blog post covering some similar topics.
However, I expect the code in this one to be a lot more sensible,
since I am now much more familiar with TensorFlow.jl, having written a significant chunk of it.
Also, MLDataUtils.jl is in a different state from what it was then.


# MNIST classifier

This is the most common benchmark for neural network classifiers.
MNIST is a collection of handwritten digits from 0 to 9.
The task is to determine which digit is being shown.
With neural networks this is done by flattening each 28×28 image into a 784-element vector,
and using one-hot encoded outputs with a softmax.
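As a rough illustration of that encoding, here is a plain-Julia sketch. It does not use TensorFlow.jl, and `flatten_image`, `onehot` and `softmax` are helper names made up for this example. A 28×28 image becomes a 784-element vector, a digit label becomes a length-10 one-hot vector, and softmax turns 10 raw output scores into probabilities.

```julia
# Sketch of the MNIST input/output encoding in plain Julia (hypothetical helper names).

# Flatten a 28×28 image into a 784-element vector (Julia uses column-major order).
flatten_image(img::AbstractMatrix) = vec(img)

# One-hot encode a digit label 0-9 as a length-10 vector.
function onehot(label::Integer; nclasses::Integer = 10)
    0 <= label < nclasses || throw(ArgumentError("label $label out of range"))
    v = zeros(Float32, nclasses)
    v[label + 1] = 1          # Julia is 1-indexed, so digit d goes in slot d + 1
    return v
end

# Numerically stable softmax: subtracting the maximum leaves the result unchanged
# but prevents overflow in exp.
function softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

img = rand(Float32, 28, 28)        # stand-in for a real MNIST image
x = flatten_image(img)             # 784-element input vector
y = onehot(3)                      # target for the digit 3
p = softmax(randn(Float32, 10))    # stand-in for the network's 10 output scores
```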


A visualisation of one of the examples from MNIST.
The code is a little complex because it has to unflatten the image and add a border.
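The unflattening and border step might look something like the following plain-Julia sketch. `with_border` is a hypothetical helper, and the border value is arbitrary. The sketch assumes the pixels were flattened in Julia's column-major order. If the data was flattened row by row instead, the result needs a `permutedims` to avoid a transposed picture.

```julia
# Hypothetical helper: unflatten a 784-pixel MNIST vector to 28×28 and pad it with a border.
# Assumes the vector was flattened in Julia's column-major order.
function with_border(x::AbstractVector; side::Integer = 28, width::Integer = 1, border = 1.0)
    length(x) == side^2 ||
        throw(DimensionMismatch("expected $(side^2) pixels, got $(length(x))"))
    img = reshape(x, side, side)                    # undo the flattening
    T = promote_type(eltype(x), typeof(border))     # element type of the padded image
    padded = fill(T(border), side + 2width, side + 2width)
    padded[width+1:width+side, width+1:width+side] = img   # copy the digit into the centre
    return padded
end

# Usage with a made-up vector in place of a real MNIST example:
x = rand(784)
framed = with_border(x)    # 30×30 matrix, ready for an image/heatmap plotting function
```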

*Figure: one MNIST example digit, unflattened and surrounded by a border, plotted as an image with a pixel-intensity colour scale from 0 to 1.*

In this basic example we use a traditional sigmoid feed-forward neural network
with just a single, wide hidden layer.
It works surprisingly well compared to early benchmarks,
because the layer is very wide compared to what was possible 30 years ago.
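To make the structure concrete, here is a plain-Julia sketch of the forward pass of such a network: 784 inputs, one wide sigmoid hidden layer, and a 10-way softmax output. It is not the TensorFlow.jl model from this post. The hidden width of 1024 and the initialisation scale are illustrative assumptions.

```julia
# Forward pass of a single-hidden-layer sigmoid network (plain Julia sketch).
# Hidden width 1024 and the 0.01 initialisation scale are assumptions for illustration.
using Random

sigmoid(z) = 1 / (1 + exp(-z))

function softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))      # shift by the max for numerical stability
    return e ./ sum(e)
end

struct OneHiddenLayerNet
    W1::Matrix{Float64}; b1::Vector{Float64}   # input -> hidden
    W2::Matrix{Float64}; b2::Vector{Float64}   # hidden -> output
end

function OneHiddenLayerNet(rng::AbstractRNG; nin = 784, nhidden = 1024, nout = 10)
    return OneHiddenLayerNet(0.01 .* randn(rng, nhidden, nin), zeros(nhidden),
                             0.01 .* randn(rng, nout, nhidden), zeros(nout))
end

# Class probabilities for one flattened image x.
function predict(net::OneHiddenLayerNet, x::AbstractVector)
    h = sigmoid.(net.W1 * x .+ net.b1)         # wide sigmoid hidden layer
    return softmax(net.W2 * h .+ net.b2)       # 10-way softmax output
end

net = OneHiddenLayerNet(MersenneTwister(0))
p = predict(net, rand(MersenneTwister(1), 784))
```

Almost all of the parameters sit in `W1`. That is where the "very wide" layer pays for its expressiveness.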



### Train

We use standard minibatch training with Adam.
We use relatively large minibatches, as this gets the most performance out of a GPU
by minimizing memory transfers.
A more advanced implementation might do the batching within TensorFlow,
rather than batching outside TensorFlow and feeding each batch in via `run`.
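The batching outside TensorFlow and the Adam update can both be sketched in a few lines of plain Julia. `minibatch_indices`, `AdamState` and `adam_step!` are hypothetical names for this example, not TensorFlow.jl API. The default hyperparameters (η = 0.001, β₁ = 0.9, β₂ = 0.999, ϵ = 10⁻⁸) are the ones suggested in the original Adam paper.

```julia
# Shuffled minibatching plus a hand-written Adam update (plain Julia sketch).
using Random

# Split 1:n into shuffled batches of at most `batchsize` indices.
function minibatch_indices(rng::AbstractRNG, n::Integer, batchsize::Integer)
    perm = randperm(rng, n)
    return [perm[i:min(i + batchsize - 1, n)] for i in 1:batchsize:n]
end

mutable struct AdamState
    m::Vector{Float64}   # running mean of gradients (first moment)
    v::Vector{Float64}   # running mean of squared gradients (second moment)
    t::Int               # step count, used for bias correction
end
AdamState(n::Integer) = AdamState(zeros(n), zeros(n), 0)

# One in-place Adam update of parameters θ given gradient g.
function adam_step!(θ, g, s::AdamState; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    s.t += 1
    s.m .= β1 .* s.m .+ (1 - β1) .* g
    s.v .= β2 .* s.v .+ (1 - β2) .* g .^ 2
    mhat = s.m ./ (1 - β1^s.t)          # bias-corrected first moment
    vhat = s.v ./ (1 - β2^s.t)          # bias-corrected second moment
    θ .-= η .* mhat ./ (sqrt.(vhat) .+ ϵ)
    return θ
end

# Usage: iterate over large shuffled batches, e.g. 60_000 examples in batches of 1_000.
for batch in minibatch_indices(MersenneTwister(42), 60_000, 1_000)
    # here one would compute the gradient on X[:, batch], Y[:, batch] and call adam_step!
end
```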

*Figure: training loss of the basic MNIST classifier.*


## Advanced MNIST classifier

Here we use some more advanced TensorFlow features, such as `indmax`,
and also a more advanced network.
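To show what an `indmax`-style step is for, here is a plain-Julia sketch that turns a batch of class probabilities into predicted digits and an accuracy. In Julia 1.x, Base's old `indmax` is called `argmax`. `predicted_digits` and `accuracy` are hypothetical helpers. The sketch assumes one example per row, matching TensorFlow's batch-first layout.

```julia
# Turning softmax outputs into predicted digits and an accuracy (plain Julia sketch).
# Each row of `probs` holds the 10 class probabilities for one example.
predicted_digits(probs::AbstractMatrix) =
    [argmax(view(probs, i, :)) - 1 for i in axes(probs, 1)]   # -1 maps index 1..10 to digit 0..9

# Fraction of examples whose predicted digit equals the true label (0-9).
accuracy(probs::AbstractMatrix, labels::AbstractVector{<:Integer}) =
    sum(predicted_digits(probs) .== labels) / length(labels)
```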

*Figure: training loss of the advanced classifier, with the basic classifier's loss shown for comparison.*

### Test


Overall, all the extra machinery in the advanced model did not gain much.
The margin is small enough that it can be attributed in part to luck: repeating the experiment can do better or worse depending on the random initialisation.
Classifying MNIST is perhaps too simple a problem for deep techniques to pay off.

# Bottlenecking Autoencoder

An autoencoder is a neural network designed to recreate its inputs.
There are many varieties, including RBMs, DBNs, SDAs, mSDAs and VAEs.
This is one of the simplest, being based on just a feed-forward neural network.

The network narrows down to a very small central layer (in this case just 2 neurons)
before expanding back to the full size.
Because of this shape, it is sometimes called an hourglass or wineglass autoencoder.
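The hourglass shape can be sketched in plain Julia as a stack of dense layers whose widths shrink to the 2-unit code layer and grow back again. Only the 784-pixel input and the 2-unit code layer come from the text. The intermediate widths (500, 100) are assumptions for illustration. `tanh` is a placeholder activation, and the weights are random and untrained.

```julia
# Hourglass autoencoder shape in plain Julia (random, untrained weights).
# Intermediate widths 500 and 100 are assumptions; tanh is a placeholder activation.
using Random

const LAYER_SIZES = [784, 500, 100, 2, 100, 500, 784]

struct Dense
    W::Matrix{Float64}
    b::Vector{Float64}
end
Dense(rng::AbstractRNG, nin::Integer, nout::Integer) =
    Dense(randn(rng, nout, nin) ./ sqrt(nin), zeros(nout))

(layer::Dense)(x) = tanh.(layer.W * x .+ layer.b)   # apply one layer

build_autoencoder(rng::AbstractRNG, sizes = LAYER_SIZES) =
    [Dense(rng, sizes[i], sizes[i + 1]) for i in 1:length(sizes) - 1]

# Encoder = layers up to and including the narrowest (code) layer; decoder = the rest.
function split_at_code(layers, sizes = LAYER_SIZES)
    k = argmin(sizes) - 1
    return layers[1:k], layers[k + 1:end]
end

# Feed x through a sequence of layers in order.
run_layers(layers, x) = foldl((h, layer) -> layer(h), layers; init = x)
```

Training would minimise a reconstruction error between `run_layers(layers, x)` and `x`. Splitting at the narrowest layer gives the 2-D code used later for visualisation.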



The choice of activation function here is (as mentioned in the code comments) a bit special.
On this particular problem, since the network is deep, sigmoid was not training well, presumably because of the exploding/vanishing gradient problem that usually causes deep sigmoid networks to fail (though I did not check).

Switching to ReLU did not help, though I now suspect I didn’t give it enough tries.
ReLU6 worked well for the first few tries, but when I came back to it later,
I found I couldn’t get it to train, because one or both of the hidden units would die.
I had seen this happen the first few times I trained it, but less often.

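The trick to stop this from ever happening was to allow the units to turn themselves back on.
This is done by providing a non-zero gradient in the off-states,
which gives a leaky ReLU6 unit.
Mathematically, with a small positive leak slope $\alpha$, it is given by

$$
\varphi(x) = \begin{cases}
\alpha x & x < 0 \\
x & 0 \le x \le 6 \\
6 + \alpha (x - 6) & x > 6
\end{cases}
$$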
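Here is a plain-Julia sketch of that piecewise definition and its gradient. The leak slope α = 0.01 is an assumed value, and any small positive α gives the saturated regions a non-zero gradient. The helper names are made up for this example. Plain ReLU6 is included for comparison: its gradient is exactly zero outside [0, 6], which is what lets a unit die.

```julia
# Leaky ReLU6 and its gradient (plain Julia sketch; α = 0.01 is an assumed leak slope).
const LEAK = 0.01

relu6(x::Real) = min(max(x, zero(x)), 6)          # ordinary ReLU6, for comparison

function leaky_relu6(x::Real; α = LEAK)
    x < 0 && return α * x                          # leaky "off" region below 0
    x > 6 && return 6 + α * (x - 6)                # leaky saturated region above 6
    return float(x)                                # identity in the active range
end

# Gradient: 1 in the active range, α (not 0) in both off regions, so a unit pushed
# outside [0, 6] still receives a signal that can bring it back.
leaky_relu6_grad(x::Real; α = LEAK) = (0 <= x <= 6) ? one(α) : α
```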

## Training

*Figure: autoencoder loss during training.*



*Figures: two MNIST examples alongside their reconstructions from the autoencoder, plotted with a pixel-intensity colour scale from 0 to 1.*

## Visualising similarity

One of the key uses of an autoencoder such as this is to project from the high-dimensional space of the inputs to the low-dimensional space of the code layer.

*Figure: MNIST digits drawn at positions given by the activations of the two code-layer neurons.*

A high-resolution PDF, with more digits shown, can be downloaded from here.

So the position of each digit on the scatter plot is given by the activation levels of the two coding-layer neurons,
which are essentially a compressed representation of the image.
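Here is a plain-Julia sketch of how such scatter-plot coordinates arise: each image's (x, y) position is just the pair of code-layer activations. The single sigmoid layer with random weights is a hypothetical stand-in for the trained encoder half of the autoencoder. The images and labels are random placeholders, not MNIST. Per-digit mean positions give a rough numeric summary of the clustering.

```julia
# Code-layer activations as 2-D plot coordinates (plain Julia sketch).
# W and b are hypothetical, random stand-ins for a trained encoder (784 -> 2).
using Random

rng = MersenneTwister(0)
W = randn(rng, 2, 784) ./ sqrt(784)
b = zeros(2)

# Sigmoid code activations; works on one image (vector) or many (one per column).
encode(X) = 1 ./ (1 .+ exp.(-(W * X .+ b)))

images = rand(rng, 784, 100)          # placeholder "images", one flattened image per column
labels = rand(rng, 0:9, 100)          # placeholder digit labels

coords = encode(images)               # 2 × 100: column j is where image j would be drawn

# Mean position of each digit class.
centroids = Dict(d => vec(sum(coords[:, labels .== d]; dims = 2)) ./ count(==(d), labels)
                 for d in unique(labels))
```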

We can see that not only are the images roughly grouped according to their digit,
they are also positioned according to appearance.
In the top-right, all the ones can be seen arrayed,
with their position (seemingly) determined by their slant.
Other digits with similarly slanted portions are positioned near them.
The implicit representation found by the autoencoder unveils hidden properties of the images.

# Conclusion

We have presented a few fairly basic neural network models.
Hopefully, the techniques shown encourage you to experiment further with machine learning in Julia and TensorFlow.jl.