Friday, 15 March 2019

Reinforcement Q-Learning in ESL

The language ESL merges Actors with Functional Programming and other features to make the construction of highly concurrent applications as easy as possible. ESL is currently in development, and we are adding support for Machine Learning, and Reinforcement Learning in particular.

One form of Reinforcement Learning is called Q-Learning (described here). A simple example is a Predator-Prey situation in a grid-world where a predator can move horizontally and vertically in the grid and must navigate to a randomly placed prey. To start off, the predator has no knowledge of how to do this; it just moves randomly in the world. When it catches the prey, the predator gets a reward. Q-Learning then backs up the reward along the path that led to the goal, thereby reinforcing the choices made along that path the next time the same situations are encountered by the predator. If this is repeated a sufficient number of times, the resulting policy will allow the predator to catch the randomly placed prey every time, even though the hard-coded behaviour for the predator has no knowledge of such a strategy.
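To make the back-up step concrete, here is a minimal sketch of the standard tabular Q-Learning update, written in Python rather than ESL. It is not the code from the video: the learning rate `ALPHA`, the discount factor `GAMMA`, the `q_update` helper and the three-cell corridor are all assumptions chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(q, state, action, reward, next_state, actions):
    """Apply one tabular Q-Learning update and return the new Q-value."""
    # Best value currently known for the state we arrived in.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the old estimate part of the way towards reward + discounted future.
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return q[(state, action)]


# A three-cell corridor: the predator starts in cell 0, the prey sits in cell 2.
actions = ["left", "right"]
q = defaultdict(float)  # every unseen (state, action) pair starts at 0.0

# Replay the same successful path twice. The reward is only given on capture.
for _ in range(2):
    q_update(q, 0, "right", 0.0, 1, actions)  # step towards the prey, no reward yet
    q_update(q, 1, "right", 1.0, 2, actions)  # capture: reward of 1
```

After the first pass only the final step has a non-zero value. On the second pass that value is discounted and copied one step further back, which is the sense in which the reward is backed up along the path.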

The video below shows three things:
  1. An implementation of a generalised Q-Learning policy creation function mkPolicy in ESL.
  2. A Predator-Prey example consisting of a world and an independent predator actor. The predator actor uses mkPolicy to create a policy, and then supplies information to improve the policy.
  3. A run of the example showing two phases: the learning phase followed by a stable phase. The learning phase starts the predator and prey in random positions and uses the current version of the policy (which is initially random) until the predator finds the prey. During the learning phase the policy is updated with information about the world and gradually improves. The stable phase does not change the policy: it just uses it. You can see that the learning phase involves lots of random movement (although it gets less random towards the end) and the stable phase is very directed.
To see the source code, you may need to pause the video, or you can download the files policy.esl and policy_pred.esl.
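For readers who cannot watch the video, the two phases can be approximated with a small Python sketch. This is not a translation of policy.esl or policy_pred.esl: the `mk_policy` function below is a hypothetical analogue of mkPolicy, and the grid size, reward of 1, exploration rate, step cap and episode counts are all assumed values.

```python
import random

SIZE = 4  # width and height of the grid (assumed)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def mk_policy(actions, alpha=0.5, gamma=0.9, epsilon=0.2, rng=random):
    """Return (choose, learn) closures that share one Q-table."""
    q = {}

    def value(state, action):
        return q.get((state, action), 0.0)

    def choose(state, explore):
        # While learning, occasionally pick a random action.
        if explore and rng.random() < epsilon:
            return rng.choice(actions)
        # Otherwise pick a best-valued action, breaking ties at random.
        best = max(value(state, a) for a in actions)
        return rng.choice([a for a in actions if value(state, a) == best])

    def learn(state, action, reward, next_state, done):
        # Standard Q-Learning update; no future value after a capture.
        future = 0.0 if done else max(value(next_state, a) for a in actions)
        old = value(state, action)
        q[(state, action)] = old + alpha * (reward + gamma * future - old)

    return choose, learn


def step(predator, action):
    """Move one cell; moving into a wall leaves the predator in place."""
    dx, dy = MOVES[action]
    x = min(max(predator[0] + dx, 0), SIZE - 1)
    y = min(max(predator[1] + dy, 0), SIZE - 1)
    return (x, y)


def run_episode(choose, learn, rng, learning, max_steps=200):
    """Run one chase; return the steps taken, capped at max_steps."""
    cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    predator, prey = rng.sample(cells, 2)  # two distinct random cells
    for steps in range(1, max_steps + 1):
        state = (predator, prey)
        action = choose(state, explore=learning)
        predator = step(predator, action)
        caught = predator == prey
        if learning:  # the stable phase never changes the policy
            learn(state, action, 1.0 if caught else 0.0, (predator, prey), caught)
        if caught:
            return steps
    return max_steps


rng = random.Random(0)
choose, learn = mk_policy(list(MOVES), rng=rng)
learning_steps = [run_episode(choose, learn, rng, learning=True) for _ in range(2000)]
stable_steps = [run_episode(choose, learn, rng, learning=False) for _ in range(20)]
```

Comparing the early entries of `learning_steps` with `stable_steps` gives a rough numeric counterpart to what the video shows visually. The sketch makes no claim about how quickly the policy converges, which depends on the assumed parameters.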
