Author Archives: Tejan Karmali

GSoC’19: Duckietown.jl Summary

By: Tejan Karmali

Re-posted from: https://tejank10.github.io/jekyll/update/2019/08/24/GSoC-2019-Duckietown.jl-Summary.html

Hello there,

Over the past year, I continued my streak with Julia by contributing to some interesting experiments with differentiable programming. That got me super-excited about the paradigm of differentiable learning. The main idea is that if we know the system, that knowledge can be used to simplify and accelerate the training process. Together with Mike and Avik, we planned a mission: a self-driving car simulator using differentiable programming.

We chose the Duckietown environment to test our approach. Duckietown is a project started by Prof. Liam Paull. It is a miniature model of a town with buildings, vehicles, traffic signals, and pedestrians. Maxime Chevalier-Boisvert et al., from MILA, have built an awesome simulator of Duckietown, called gym-duckietown. It is used for testing algorithms before deploying them in the real Duckietown environment. But since it is written in Python, it is not differentiable. Hence, to make it differentiable, we had to build it in Julia.

The creation of Duckietown.jl is spread over three parts:

  • The Simulator
  • Rendering with RayTracer
  • Training

Let’s get started!

The Environment

The Duckietown environment contains maps for different tasks – straight road, loop, zigzag turns, UdeM, etc. There are also some variants of these maps with dynamic objects, like traffic signals and pedestrians. The maps are encoded in .yaml files and can be parsed with YAML.jl. Each environment contains a map in the form of a grid. Each element of the grid is assigned a tile: for example, a road, asphalt, an office floor, or a grassy surface. A logical arrangement of these tiles is all that is required to set up a bare-bones version of the map. That’s all the straight road and loopy road maps have. To make these maps more challenging, we need to add objects to them. Objects can be static or dynamic. Static objects include a house, a tree, a traffic sign, a bus, a truck, a traffic cone, etc. Dynamic objects are traffic signals and duckies, which are the pedestrians in Duckietown. The positions of these objects are defined in the map. An object is represented as a mesh, with a texture wrapped around it.
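To give a feel for the map format, here is a minimal sketch of loading one with YAML.jl; the file path and the tiles/objects keys below follow the gym-duckietown map layout and are assumptions, not Duckietown.jl’s actual loader.

using YAML

# Hypothetical path and keys, loosely following the gym-duckietown map layout
map_data = YAML.load_file("maps/straight_road.yaml")

tiles   = map_data["tiles"]              # grid of tile names, e.g. "straight/W", "grass"
objects = get(map_data, "objects", [])   # static and dynamic objects placed on the map

@show length(tiles) length(objects)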

[Image: the UdeM map]

The Simulator

using Duckietown, Flux, Zygote

sim = Simulator(map_name="straight_road", camera_width = 128, camera_height = 128)

Simulator manages the subtasks involved in running the Duckietown environment and maintains related statistics. The subtasks include updating the states and positions of the different objects involved, running an action on the duckiebot, maintaining data such as the velocity and position of the duckiebot and the action performed on it, rendering what the bot sees, etc. The parameters of the simulator are defined in a FixedParams object; these are the parameters that define the behavior of the simulator.

Rendering the view

render_obs(...) is used to render what the duckiebot sees. We use the differentiable RayTracer.jl for this purpose. For rendering, we first need to define a camera model. The camera needs to know where the bot is looking from and at, the dimensions of the image, the field of view, the focal length, and the up vector.

x, y, z = sim.cur_pos
# get the direction in which bot is looking
dx, dy, dz = get_dir_vec(sim.cur_angle)

## Define camera model
# Looking from
eye = Vec3([x], [y], [z])
# Looking at
target = Vec3([x + dx], [y + dy], [z + dz])
# vup is vector pointing in upward direction
vup = Vec3([0f0], [1f0], [0f0])
cam = Camera(eye, target, vup, cam_fov_y, focal_length, cam_width, cam_height)

A scene is generated containing all the objects in the environment. These objects are decomposed into triangles.

## Scene generation
scene = Vector{Triangle}()
# Decompose the objects into triangles
obj_Δs = map(obj->render(obj, fp.draw_bbox), objs)

for obj_Δ in obj_Δs
    scene = vcat(scene, obj_Δ)
end

A light source and its position are defined, which are then used to raytrace the scene.

# Define light source
light_pos = Vec3([-40f0], [200f0], [100f0])
# PointLight takes color, intensity and position of light source as args
light = PointLight(Vec3([1f0]), 5f15, light_pos)
origin, direction = get_primary_rays(cam)

# Rendering what duckiebot sees
im = raytrace(origin, direction, scene, light, origin, 2)

Taking action

step!(...) is used to take an action on the duckiebot. action is a vector of length 2. It specifies the speed of the left and the right wheel. Each element belongs to [-1, 1], where a positive speed implies moving in the forward direction. The velocities of the two wheels give us the steering direction of the robot. For example, to move on a straight road both velocities should be equal, whereas to take a left turn the velocity of the left wheel should be less than that of the right wheel. Using this, the robot’s new position and direction are calculated.
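As a rough sketch of what such an update looks like (not the simulator’s exact code), a differential-drive model turns the two wheel speeds into a forward velocity and a turning rate, which are then integrated over a timestep; wheel_dist and dt below are assumed constants.

# Minimal differential-drive update, assuming action = [v_left, v_right]
function drive_step(pos, angle, action; wheel_dist = 0.1f0, dt = 1f0 / 30f0)
    v_left, v_right = action
    v = (v_left + v_right) / 2            # forward speed
    ω = (v_right - v_left) / wheel_dist   # turning rate

    angle += ω * dt
    x, y, z = pos
    # the bot drives on the x–z plane; y is the vertical axis
    pos = [x + v * cos(angle) * dt, y, z - v * sin(angle) * dt]
    return pos, angle
end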

The path to be followed is determined by Bézier curves. Each tile has its own Bézier curve defined. For example, a straight road tile has a straight line as its curve, whereas that of a left turn is approximately circular. Based on this curve, two kinds of rewards are defined: the distance from the curve and the angular distance from the tangent of the curve. There is also a penalty to prevent collisions. Each object has a safety circle surrounding it, and the collision penalty is the degree of overlap between the safety circle of the bot and that of an object.
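For intuition, here is a cubic Bézier curve over four control points, along with an illustrative reward built from the terms described above; the weights are made up for the example and are not the simulator’s actual values.

# Point on a cubic Bézier curve at parameter t ∈ [0, 1]
bezier(p0, p1, p2, p3, t) =
    (1 - t)^3 .* p0 .+ 3 * (1 - t)^2 * t .* p1 .+ 3 * (1 - t) * t^2 .* p2 .+ t^3 .* p3

# Illustrative reward: go fast along the lane, stay close to and aligned with
# the curve, and avoid overlapping safety circles
reward(speed, dist, angle_dev, collision) =
    speed * cos(angle_dev) - 10f0 * abs(dist) - 40f0 * collision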

With these details, we are now equipped to train a model!

Training a model

We define a simple Flux model, which extracts features from the image using Conv layers and passes them on to the fully connected layers.

# model: Input- Rendering of what duckiebot sees
#        Output- Action to be taken
model = Chain(
           Conv((3, 3), 3=>8, relu, pad = 1),
           MaxPool((2, 2)),
           Conv((3, 3), 8=>16, relu, pad = 1),
           MaxPool((2, 2)),
           Conv((3, 3), 16=>32, relu, pad = 1),
           x -> reshape(x, :, 1),
           Dense((32 * 32 * 32), 64, relu),
           Dense(64, 16, relu),
           Dense(16, 2),
           x -> reshape(x, 2))

opt = ADAM(0.001f0)

We begin by dividing an episode into sequences. Let’s call such a sequence a μEpisode. In each μEpisode, actions are performed for a small number of timesteps. We take the loss as the negative of the reward and add an action penalty. Recall that actions should lie in [-1, 1]. Since, for the first few timesteps, the actions could lie anywhere in the real domain, this penalty is required. Also, the reward is proportional to speed: if a very large action were chosen, it would set a very high speed and in turn produce a very high reward, which is not what we want. The action penalty is somewhat similar to a regularisation loss.

function μEpisode(model, sim, initial_render, μEp_len)
    obs, action, reward, done, info = step!(sim, model(initial_render))
    loss = -reward + action_penalty(action)

    done && return loss

    for iter in 2:μEp_len
        obs, action, reward, done, info = step!(sim, model(obs))
        loss += -reward + action_penalty(action)
        done && return loss
    end

    return loss
end
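The action_penalty used above is not shown in this post; a minimal sketch, assuming a simple quadratic penalty on actions that stray outside [-1, 1] plus a small L2 term:

# Hypothetical penalty: cost on the part of the action outside [-1, 1],
# plus a small L2 term to keep action magnitudes in check
action_penalty(action; λ = 0.1f0) =
    sum(abs2, max.(abs.(action) .- 1f0, 0f0)) + λ * sum(abs2, action)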

In the episode!(...) function, the gradient of the loss with respect to the parameters of the model is taken using Zygote.jl, a source-to-source AD package. Gradients are clamped to prevent overflow due to gradient explosion.

function episode!(sim)    
    for μEp in 1:NUM_μEPISODES
        # Get the gradients of μEpisode
        initial_render = render_obs(sim)
        gs = Zygote.gradient(params(model)) do
            μEpisode(model, sim, initial_render, μEPISODE_LENGTH)
        end

        # Update the weights
        for p in params(model)
            clamp!(gs[p], -0.01f0, 0.01f0)
            Flux.Optimise.update!(opt, p, gs[p])
        end

        sim.done && break
    end

    reset!(sim)
end
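With episode!(...) in place, training is just a matter of repeating it; the constants below are placeholders for illustration.

# Hypothetical top-level loop
NUM_μEPISODES   = 20
μEPISODE_LENGTH = 10
NUM_EPISODES    = 500

for ep in 1:NUM_EPISODES
    episode!(sim)   # run the μEpisodes, update the model, then reset the simulator
end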

And after a while, you should be able to see the bot guiding itself on the lane!
[Animation: the duckiebot driving on the straight road map]

What’s next?

What a productive summer it was! With Duckietown.jl you can now research autonomous driving in Julia, and also leverage its differentiability. I believe this is just the start for differentiable programming. By knowing the system, we can speed up the training of a model on it by leaps and bounds. In the future, I plan to work on:

  • Transfer learning: evaluating a model trained on one map by testing it on other maps.
  • Defining tasks over different maps.
  • There have been some advances in packages for physical environments for deep learning. I plan to do some experiments on those using differentiable programming.

Acknowledgments

I am extremely grateful to my mentor Mike Innes for placing his faith in me for this ambitious project. A huge thanks to my fellow GSoC’er Avik Pal for his amazing RayTracer, and for helping me out from time to time. I would also like to thank Dhairya Gandhi for his valuable inputs, Julia Computing Bengaluru for hosting me, and Julia Computing for providing machines for training. Finally, I thank Google for giving me this amazing opportunity to be part of the mission to drive open-source culture.


GSoC’18: Final Summary

By: Tejan Karmali

Re-posted from: https://tejank10.github.io/jekyll/update/2018/08/06/GSoC-Final-Summary.html

Hello, world!

In this post I’m going to briefly summarize the machine learning models I worked on during this summer for GSoC. I worked towards enriching the model zoo of Flux.jl, a machine learning library written in Julia. My project covered reinforcement learning and computer vision models.

The project is spread over these four codebases:

  1. Flux-baselines
  2. AlphaGo.jl
  3. GAN models
  4. DNI model

In the process, I achieved most of my targets. I had to skip a few of them, and I also built some unplanned models. Below, I discuss the work repository by repository.

1. Flux-baselines

Flux-baselines is a collection of various deep reinforcement learning models, including Deep Q Networks, Actor-Critic, and DDPG.

The basic structure of an RL problem is as follows: there is an environment; let’s say the game of Pong is our environment. The environment may contain many objects which interact with each other. In Pong there are three objects: a ball and two paddles. The environment has a state, which is the current situation in the environment in terms of various features of the objects in it, such as their position, velocity, or color. An action needs to be chosen to play a move in the environment and obtain the next state, and actions keep being chosen until the game ends. An RL model essentially finds the actions that need to be chosen.
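As a sketch of that loop in Julia (the env, reset!, step!, and select_action names mirror a Gym-style interface and are assumptions here):

# Generic agent–environment interaction loop
state = reset!(env)
done  = false
total_reward = 0f0

while !done
    action = select_action(model, state)          # the agent picks a move
    state, reward, done, _ = step!(env, action)   # environment returns the next state
    total_reward += reward
end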

Over the past few years, deep Q-learning has gained a lot of popularity. After DeepMind’s paper on human-level control through deep reinforcement learning, there was no looking back. It combined the advances in RL as well as deep learning to get an AI player with superhuman performance. I made the basic DQN and Double DQN during the pre-GSoC phase, followed by Dueling DQN in the first week of GSoC.
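At the heart of DQN is the one-step temporal-difference target; here is a hedged sketch of the loss for a single transition, with Q and Q_target as Flux models (the names are assumptions for illustration).

# y = r + γ * max over a of Q_target(s′, a), unless the episode ended
function dqn_loss(Q, Q_target, s, a, r, s′, done; γ = 0.99f0)
    y = r + (1f0 - done) * γ * maximum(Q_target(s′))
    return (Q(s)[a] - y)^2
end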

The idea used in the A2C model is different from the one in DQN. A2C falls in the class of “Actor-Critic” models. In AC models we have two neural networks: a policy network and a value network. The policy network accepts the state of the game and returns a probability distribution over the action space. The value network takes the state (and the action chosen by the policy network) as input and determines how suitable that action is for that state.
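A minimal sketch of the two networks in Flux; the state/action sizes are placeholders, and the critic here is a state-value network, a common A2C choice.

using Flux

state_size, n_actions = 4, 2   # placeholder sizes

# Policy network: state -> probability distribution over actions
policy = Chain(Dense(state_size, 32, relu), Dense(32, n_actions), softmax)

# Value network: state -> scalar estimate of how good that state is
value_net = Chain(Dense(state_size, 32, relu), Dense(32, 1))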

DDPG is particularly useful when the actions to be chosen are spread over a continuous space. One possible solution you may have in mind is: what if we discretize the action space? If we discretize it finely, we end up with a huge number of actions; if we discretize it coarsely, we lose important information.

[Figure: DDPG – score vs. episodes]

Some of these models have been deployed on Flux’s website. The CartPole example was trained with a Deep Q Network and the Pong example with a Dueling DQN.

Here is a demo of Pong trained using Flux.

Targets achieved

  1. Advantage Actor-Critic
  2. Duel DQN

Extra mile

  1. DDPG
  2. Prioritized DQN

Future Work

  1. Add more variety of models, especially the ones which have come up in the last 18 months.
  2. Create an interface to easily train and test any environment from OpenAIGym.jl.

2. AlphaGo.jl

This mini-project of GSoC phase 2 was the most challenging part. AlphaGo Zero is a breakthrough AI by Google DeepMind. It is an AI to play Go, which is considered to be one of the most challenging games in the world, mainly due to the number of states it can lead to. AlphaGo Zero defeated the best Go player in the world. AlphaGo.jl’s objective is to achieve the results produced by the AlphaGo Zero algorithm on Go, and to achieve similar results on any zero-sum game.

Now we have a package to train an AlphaGo Zero model in Julia! And it is really simple to train the model: we just pass the training parameters and the environment on which we want to train the model, and then play with it.
For more info on AlphaGo.jl, refer to the blog post.

Targets achieved

  1. Game of Go
  2. Monte Carlo tree search

Targets not achieved

  1. Couldn’t train the model well

Extra Mile

  1. Game of Gomoku to test the algorithm (since it is an easier game)

Future work

  1. Train a model on any game
  2. AlphaChess

3. Generative Adversarial Networks

GANs have been extremely successful in learning the underlying representation of data. By doing so, they can reproduce fake data. For example, a GAN trained on the MNIST handwritten digits dataset can produce fake images which look very similar to those in MNIST. These neural nets have great applications in image editing: they can remove certain features from an image and add new ones, depending on the dataset. A GAN consists of two networks: a generator and a discriminator. The generator’s objective is to generate fake images, whereas the discriminator’s objective is to differentiate between the fake images generated by the generator and the real images in the dataset.
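A hedged sketch of the two objectives for a standard (non-saturating) GAN, with generator and discriminator as Flux models whose discriminator outputs a probability; all the names here are assumptions, not the repository’s actual code.

using Flux, Statistics

# Discriminator: tell real images apart from generated ones
function d_loss(discriminator, generator, real_imgs, noise)
    fake = generator(noise)
    -mean(log.(discriminator(real_imgs) .+ 1f-8)) -
        mean(log.(1f0 .- discriminator(fake) .+ 1f-8))
end

# Generator: make the discriminator call fakes real
g_loss(discriminator, generator, noise) =
    -mean(log.(discriminator(generator(noise)) .+ 1f-8))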

[Figures: sample outputs from LSGAN, DCGAN, WGAN, and MADE]

Targets achieved

  1. ConvTranspose layer
  2. DCGAN

Extra Mile

  1. LSGAN
  2. WGAN

Future work

  1. More GAN models, like InfoGAN, BEGAN, and CycleGAN
  2. Some cool animations with GANs
  3. A data pipeline for training and producing images, taking the dataset and GAN type as input.

4. Decoupled Neural Interface

Decoupled Neural Interface (DNI) is a new technique to train a model. It does not use backpropagation from the output layer all the way to the input layer. Instead, it uses a trick to “estimate” the gradient: a small linear neural network predicts the gradients instead of computing the true gradients via backpropagation. The advantage of such a model is that it can be parallelized. This technique results in a slight dip in accuracy, but we gain speed if the layers of the network are trained in parallel.
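A rough sketch of the idea for a single layer, using Zygote’s implicit-parameter pullback; the layer sizes and the synthetic-gradient module M are placeholders, not the code used in the project.

using Flux, Zygote

layer = Dense(32, 32, relu)    # one layer of the network
M     = Dense(32, 32)          # synthetic-gradient module: predicts ∂loss/∂h from h
opt   = ADAM(0.001f0)
x     = rand(Float32, 32, 1)   # activation coming from the previous layer

ps = params(layer)
h, back = Zygote.pullback(() -> layer(x), ps)   # forward pass, keep the pullback
ĝ  = M(h)                                       # predicted gradient w.r.t. h
gs = back(ĝ)                                    # push it through to layer’s parameters

for p in ps
    Flux.Optimise.update!(opt, p, gs[p])        # update without waiting for the true gradient
end
# M itself is later trained to match the true gradient once it becomes available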

[Figures: training loss curves for the DNI model]

Targets achieved:

Conclusion

During the past three months, I learnt a lot about reinforcement learning and AlphaGo in particular. I experienced training an RL model for days and finally saw it working well! I encountered the issues faced in training such models and learnt to overcome them. All in all, as an aspiring ML engineer, these three months have been my most productive months. I am glad that I could meet most of my objectives, and I worked on some extra models to make up for the objectives I could not meet.

Acknowledgements

I would really like to thank my mentor Mike Innes for guiding me throughout the project, and James Bradbury for his valuable inputs on improving the code in the reinforcement learning models. I would also like to thank Neethu Mariya Joy for deploying the trained models on the web. And last but not least, NumFOCUS, for sponsoring me and all the other JSoC students to attend JuliaCon’18 in London.