By: Logan Kilpatrick
Re-posted from: https://towardsdatascience.com/working-with-flux-jl-models-on-the-hugging-face-hub-b95af2b80a47?source=rss-2c8aac9051d3------2
How to use the Julia Deep Learning library to interact with models from Hugging Face
By: DSB
Re-posted from: https://medium.com/coffee-in-a-klein-bottle/deep-learning-with-julia-e7f15ad5080b?source=rss-8bd6ec95ab58------2

Flux.jl is the most popular deep learning framework in Julia. It provides a very elegant way of programming neural networks. Unfortunately, since Julia is still not as popular as Python, there aren’t as many tutorials on how to use it. Also, Julia is improving very fast, so things can change a lot in a short amount of time.
I’ve been trying to learn Flux.jl for a while, and I realized that most tutorials out there are outdated, so this is a brief, updated one.
The goal of this tutorial is to build a simple classification neural network. That will be enough for anyone interested in using Flux: after learning the very basics, the rest is mostly a matter of altering network architectures and loss functions.
Instead of importing data from somewhere, let’s keep everything self-contained. We write two auxiliary functions to generate our data:
# Auxiliary functions for generating our data
using Flux, Plots, Statistics

function generate_real_data(n)
    x1 = rand(1, n) .- 0.5
    x2 = (x1 .* x1) * 3 .+ randn(1, n) * 0.1
    return vcat(x1, x2)
end

function generate_fake_data(n)
    θ = 2π * rand(1, n)
    r = rand(1, n) / 3
    x1 = @. r * cos(θ)
    x2 = @. r * sin(θ) + 0.5
    return vcat(x1, x2)
end
# Creating our data
train_size = 5000
real = generate_real_data(train_size)
fake = generate_fake_data(train_size)
# Visualizing
scatter(real[1,1:500],real[2,1:500])
scatter!(fake[1,1:500],fake[2,1:500])

The creation of neural network architectures with Flux.jl is very direct and clean (cleaner than in any other library I know). Here is how you do it:
function NeuralNetwork()
    return Chain(
        Dense(2, 25, relu),
        Dense(25, 1, σ)
    )
end
The code is very self-explanatory. The first layer is a dense layer with input size 2, output size 25, and ReLU as the activation function. The second is a dense layer with input size 25, output size 1, and a sigmoid activation function. Chain ties the layers together. Yeah, it’s that simple.
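As a quick sanity check (a minimal sketch, assuming Flux is installed), we can feed the untrained network a batch of random points and confirm the output shape:

```julia
using Flux

m = Chain(Dense(2, 25, relu), Dense(25, 1, σ))
x = rand(Float32, 2, 10)  # 10 points, one per column
y = m(x)
size(y)                   # (1, 10): one probability per point
```

Since the last activation is a sigmoid, every entry of the output lies strictly between 0 and 1.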
Next, let’s prepare our model to be trained.
# Organizing the data in batches
X = hcat(real, fake)
Y = vcat(ones(train_size), zeros(train_size))
data = Flux.Data.DataLoader(X, Y', batchsize=100, shuffle=true);

# Defining our model, optimization algorithm and loss function
m = NeuralNetwork()
opt = Descent(0.05)
loss(x, y) = sum(Flux.Losses.binarycrossentropy(m(x), y))
In the code above, we first organize our data into a single dataset. We use the DataLoader from Flux, which creates the batches and shuffles our data. Then we instantiate our model and define the loss function and the optimization algorithm. In this example, we use gradient descent for optimization and binary cross-entropy as the loss function.
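To see what the DataLoader actually produces, here is a small sketch with toy arrays (not the article’s dataset); note that the two-positional-argument form matches the Flux version used in this tutorial, while newer Flux releases expect a tuple, DataLoader((Xs, Ys), ...):

```julia
using Flux

Xs = rand(2, 10)                   # toy features, one point per column
Ys = reshape(rand(0:1, 10), 1, :)  # toy labels as a 1×10 row
loader = Flux.Data.DataLoader(Xs, Ys, batchsize=4, shuffle=false)
for (x, y) in loader
    @show size(x) size(y)          # batches of up to 4 columns each
end
```

With 10 points and batchsize=4, iterating yields three batches: two full ones and a final batch of 2.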
Everything is ready, and we can start training the model. Here, I’ll show two ways of doing it.
ps = Flux.params(m)
epochs = 20
for i in 1:epochs
    Flux.train!(loss, ps, data, opt)
end
println(mean(m(real)), " ", mean(m(fake))) # Print the mean model predictions
In this code, we first declare which parameters are going to be trained, using the Flux.params() function. The reason for this is that we can choose not to train a layer in our network, which might be useful in the case of transfer learning. Since in our example we are training the whole model, we just pass all the parameters to the training function.
Other than this, there is not much to be said. The final line of code just prints the mean prediction probability our model assigns to each dataset.
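For instance, here is a sketch of the transfer-learning idea mentioned above (assuming Flux’s Chain indexing): passing only one layer’s parameters to the training function would leave the other layer frozen.

```julia
using Flux

m = Chain(Dense(2, 25, relu), Dense(25, 1, σ))
ps_all  = Flux.params(m)         # weights and biases of both layers
ps_last = Flux.params(m[2])      # only the second Dense layer
length(ps_all), length(ps_last)  # (4, 2): two arrays per Dense layer
```

Training with ps_last instead of ps_all would update only the output layer.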
m = NeuralNetwork()
function trainModel!(m, data; epochs=20)
    for epoch in 1:epochs
        for d in data
            gs = gradient(Flux.params(m)) do
                l = loss(d...)
            end
            Flux.update!(opt, Flux.params(m), gs)
        end
    end
    @show mean(m(real)), mean(m(fake))
end
trainModel!(m, data; epochs=20)
This method is a bit more involved, because we are doing the training “manually” instead of using the training function provided by Flux. This is interesting because it gives more control over the training, which can be useful for more customized training procedures. Perhaps the most confusing part of the code is this one:
gs = gradient(Flux.params(m)) do
    l = loss(d...)
end
Flux.update!(opt, Flux.params(m), gs)
The gradient function receives the parameters with respect to which the gradient will be computed, and applies it to the loss function, which is evaluated on the batch d. The splat operator (the three dots) is just a neat way of passing x and y to the loss function. Finally, the update! function adjusts the parameters according to the gradients, which are stored in the variable gs.
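The same pattern can be seen on a tiny standalone sketch (independent of the model above): gradient with implicit parameters returns a Grads object indexed by the arrays themselves, and the splat expands a tuple into separate arguments.

```julia
using Flux

w = [1.0, 2.0]
ps = Flux.params(w)
gs = gradient(ps) do
    sum(w .^ 2)        # a toy "loss"
end
gs[w]                  # [2.0, 4.0]: the gradient of sum(w²) is 2w

f(x, y) = x + y
d = (3, 4)
f(d...)                # 7: the splat passes 3 and 4 as separate arguments
</imports>
```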
Finally, the model is trained, and we can visualize its performance against the dataset.
scatter(real[1,1:100], real[2,1:100], zcolor=m(real)')
scatter!(fake[1,1:100], fake[2,1:100], zcolor=m(fake)', legend=false)

Note that our model performs quite well: it classifies the points in the middle with probability close to 0, implying that they belong to the “fake” data, while the rest get probability close to 1, meaning that they belong to the “real” data.
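To turn these probabilities into a single accuracy number, a sketch with made-up probabilities (with the trained model you would use m(real) and m(fake) instead):

```julia
# Hypothetical model outputs: real points should score near 1, fake near 0
p_real = [0.9, 0.8, 0.95, 0.4]
p_fake = [0.1, 0.2, 0.05, 0.3]
correct  = sum(p_real .> 0.5) + sum(p_fake .<= 0.5)  # threshold at 0.5
accuracy = correct / (length(p_real) + length(p_fake))  # 0.875
```

Here one of the eight points is misclassified, giving 7/8 = 0.875.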
That’s all for our brief introduction. Hopefully this is the first article in a series on how to do machine learning with Julia.
Note that this tutorial focuses on simplicity, not on writing the most efficient code. To learn how to improve performance, look here.
TL;DR
Here is the code with everything put together:
# Auxiliary functions for generating our data
using Flux, Plots, Statistics

function generate_real_data(n)
    x1 = rand(1, n) .- 0.5
    x2 = (x1 .* x1) * 3 .+ randn(1, n) * 0.1
    return vcat(x1, x2)
end

function generate_fake_data(n)
    θ = 2π * rand(1, n)
    r = rand(1, n) / 3
    x1 = @. r * cos(θ)
    x2 = @. r * sin(θ) + 0.5
    return vcat(x1, x2)
end

# Creating our data
train_size = 5000
real = generate_real_data(train_size)
fake = generate_fake_data(train_size)

# Visualizing
scatter(real[1,1:500], real[2,1:500])
scatter!(fake[1,1:500], fake[2,1:500])

# Defining the model
function NeuralNetwork()
    return Chain(
        Dense(2, 25, relu),
        Dense(25, 1, σ)
    )
end

# Organizing the data in batches
X = hcat(real, fake)
Y = vcat(ones(train_size), zeros(train_size))
data = Flux.Data.DataLoader(X, Y', batchsize=100, shuffle=true);

# Defining our model, optimization algorithm and loss function
m = NeuralNetwork()
opt = Descent(0.05)
loss(x, y) = sum(Flux.Losses.binarycrossentropy(m(x), y))

# Training Method 1
ps = Flux.params(m)
epochs = 20
for i in 1:epochs
    Flux.train!(loss, ps, data, opt)
end
println(mean(m(real)), " ", mean(m(fake))) # Print the mean model predictions

# Visualizing the model predictions
scatter(real[1,1:100], real[2,1:100], zcolor=m(real)')
scatter!(fake[1,1:100], fake[2,1:100], zcolor=m(fake)', legend=false)
Deep Learning with Julia was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.
Authored by Paul Shealy, Senior Software Engineer, and Gopi Kumar, Principal Program Manager, at Microsoft.
Deep learning has received significant attention recently for its ability to create machine learning models with very high accuracy. It’s especially popular in image and speech recognition tasks, where the availability of massive datasets with rich information make it feasible to train ever-larger neural networks on powerful GPUs and achieve groundbreaking results. Although there are a variety of deep learning frameworks available, getting started with one means taking time to download and install the framework, libraries, and other tools before writing your first line of code.
Microsoft’s Data Science Virtual Machine (DSVM) is a family of popular VM images published on the Azure marketplace with a broad choice of machine learning and data science tools. Microsoft is extending it with the introduction of a brand-new offering in this family – the Data Science Virtual Machine for Linux, based on Ubuntu 16.04 LTS – that also includes a comprehensive set of popular deep learning frameworks.
Deep learning frameworks in the new VM include:
The image can be deployed on VMs with GPUs or CPU-only VMs. It also includes OpenCV, matplotlib and many other libraries that you will find useful.
Run dsvm-more-info at a command prompt or visit the documentation for more information about these frameworks and how to get started.
Sample Jupyter notebooks are included for most frameworks. Start Jupyter or log in to JupyterHub to browse the samples for an easy way to explore the frameworks and get started with deep learning.
Training a deep neural network requires considerable computational resources, so things can be made significantly faster by running on one or more GPUs. Azure now offers NC-class VM sizes with 1-4 NVIDIA K80 GPUs for computational workloads. All deep learning frameworks on the VM are compiled with GPU support, and the NVIDIA driver, CUDA and cuDNN are included. You may also choose to run the VM on a CPU if you prefer, and that is supported without code changes. And because this is running on Azure, you can choose a smaller VM size for setup and exploration, then scale up to one or more GPUs for training.
The VM comes with nvidia-smi to monitor GPU usage during training and help optimize parameters to make full use of the GPU. It also includes NVIDIA Docker if you want to run Docker containers with GPU access.
The Data Science Virtual Machine family of VM images on Azure includes the DSVM for Windows, a CentOS-based DSVM for Linux, and an Ubuntu-based DSVM for Linux. These images come with popular data science and machine learning tools, including Microsoft R Server Developer Edition, Microsoft R Open, Anaconda Python, Julia, Jupyter notebooks, Visual Studio Code, RStudio, xgboost, and many more. A full list of tools for all editions of the DSVM is available here. The DSVM has proven popular with data scientists as it helps them focus on their tasks and skip mundane steps around tool installation and configuration.

To try deep learning on Windows with GPUs, the Deep Learning Toolkit for DSVM contains all tools from the Windows DSVM plus GPU drivers, CUDA, cuDNN, and GPU versions of CNTK, MXNet, and TensorFlow.
We invite you to use the new image to explore deep learning frameworks or for your machine learning and data science projects – DSVM for Linux (Ubuntu) is available today through the Marketplace. Free Azure credits are available to help get you started.
Paul & Gopi