Introduction
I created a simulator in OpenGL/C++ that uses a genetic algorithm to study emergent behavior in simple simulated robots. In these simulations the goal is to find the configuration of a creature that lets it travel the farthest across a terrain in a fixed time, for example a few seconds. They are called creatures because they move as though they were alive. They are built from pistons and rigid bars, but it's easier to think of these as muscles and bones. The best way to understand it is to watch the video below.
I was inspired by a few YouTube videos and thought those simulated creatures were so cool that I decided to try it myself. My first version was 2D, and I used to just watch the creatures walking across the screen on my TV; I could watch all the different types of locomotion all day. Then I said, pfft, 2D? Go big or go home, and I wrote the 3D version using OpenGL in C/C++.
How it works
Our goal is to find the optimal configuration of muscles and bones (i.e., locations, strengths, lengths, quantities, etc.) that leads to the maximum displacement of the creature. To do this, we start with an initial population that is randomly generated. We then simulate each creature to determine how well it performed. There are many different ways to measure fitness; for simplicity, I chose distance from the origin.
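A minimal C++ sketch of a random initial population and the distance-from-origin fitness is below. The genome layout (Vec3, Link, Creature), the node-count range and the helper names randomCreature, randomPopulation, distanceBetween and fitness are all hypothetical. The physics simulation itself is left out, so fitness takes the node positions the simulator would produce after its run, and it assumes y is the vertical axis.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical link: joins two nodes, either a rigid bar ("bone") or a piston ("muscle").
struct Link {
    int a, b;          // indices of the two nodes it connects
    bool isBone;       // true = rigid bar, false = actuated piston
    double restLength; // natural length, taken from the initial node spacing
    double strength;   // actuation force for muscles (unused for bones)
};

struct Creature {
    std::vector<Vec3> nodes;
    std::vector<Link> links;
};

double distanceBetween(const Vec3& p, const Vec3& q) {
    return std::sqrt((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y) +
                     (p.z - q.z) * (p.z - q.z));
}

// Build one random creature. Every new node is linked to an earlier one,
// so the body is always a single connected piece.
Creature randomCreature(std::mt19937& rng, int minNodes = 3, int maxNodes = 8) {
    assert(minNodes >= 2 && minNodes <= maxNodes);
    std::uniform_int_distribution<int> nodeCount(minNodes, maxNodes);
    std::uniform_real_distribution<double> coord(-1.0, 1.0);
    std::uniform_real_distribution<double> strength(0.1, 1.0);
    std::bernoulli_distribution bone(0.5);

    Creature c;
    const int n = nodeCount(rng);
    for (int i = 0; i < n; ++i)
        c.nodes.push_back({coord(rng), coord(rng) + 1.5, coord(rng)}); // start above the ground
    for (int i = 1; i < n; ++i) {
        const int j = std::uniform_int_distribution<int>(0, i - 1)(rng);
        c.links.push_back({j, i, bone(rng), distanceBetween(c.nodes[j], c.nodes[i]), strength(rng)});
    }
    return c;
}

std::vector<Creature> randomPopulation(int size, std::mt19937& rng) {
    std::vector<Creature> pop;
    pop.reserve(size);
    for (int i = 0; i < size; ++i) pop.push_back(randomCreature(rng));
    return pop;
}

// Fitness: horizontal distance of the centre of mass from the origin,
// measured on the node positions after the physics simulation has run.
double fitness(const std::vector<Vec3>& finalNodes) {
    if (finalNodes.empty()) return 0.0;
    double cx = 0.0, cz = 0.0;
    for (const Vec3& p : finalNodes) { cx += p.x; cz += p.z; }
    cx /= finalNodes.size();
    cz /= finalNodes.size();
    return std::hypot(cx, cz);
}
```

Linking each new node to an earlier one is only one simple way to guarantee a connected body; a fuller generator would likely add extra links as well.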
After we finish simulating the whole population, we sort and rank the creatures from best to worst. We then apply a selection function that dictates how likely a creature is to make it to the next generation, given its rank. Obviously, you want the better creatures to be more likely to make it, but we also need a good amount of variation. This was actually one of the points of investigation in a paper I wrote: I wanted to know how various selection functions affected how quickly the population reached its maximum fitness, as well as what that maximum was.
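The selection functions from the paper aren't given here, so the sketch below ranks a population and applies two illustrative, hypothetical rank-based survival curves (linearSurvival and exponentialSurvival). The Scored struct and selectSurvivors helper are also assumptions; the point is only to show how "probability of surviving, given rank" plugs into the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Scored {
    int id;         // index of the creature in the population (hypothetical)
    double fitness; // distance travelled
};

// Sort best-to-worst so index 0 is rank 0 (the best creature).
void rankPopulation(std::vector<Scored>& pop) {
    std::sort(pop.begin(), pop.end(),
              [](const Scored& a, const Scored& b) { return a.fitness > b.fitness; });
}

// Example selection functions: probability that the creature at `rank`
// (0 = best) out of `n` survives to the next generation.
double linearSurvival(int rank, int n) {
    return 1.0 - static_cast<double>(rank) / n;
}

double exponentialSurvival(int rank, int n, double k = 5.0) {
    return std::exp(-k * rank / n); // larger k = harsher on low ranks
}

// Keep each ranked creature with the probability its rank is given.
template <typename SurvivalFn>
std::vector<Scored> selectSurvivors(const std::vector<Scored>& ranked, SurvivalFn survival,
                                    std::mt19937& rng) {
    assert(!ranked.empty());
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::vector<Scored> survivors;
    const int n = static_cast<int>(ranked.size());
    for (int r = 0; r < n; ++r)
        if (coin(rng) < survival(r, n)) survivors.push_back(ranked[r]);
    return survivors;
}
```

To use a non-default steepness, wrap the curve in a lambda, e.g. `selectSurvivors(ranked, [](int r, int n) { return exponentialSurvival(r, n, 8.0); }, rng)`. A steeper curve favours the top ranks more strongly and so keeps less variation, which is exactly the trade-off described above.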
If this were all we did, we would only ever find the best creature from the initial population. We have to add variation to the population, and we do this through mutations. There is a set of about 8 mutations that change the various properties of the components and of the creature as a whole. This allows us to sample a wider range of creatures. We iterate this process until some end condition is met. It can be as simple as stopping when you feel like it, after a set number of generations, or once a particular fitness is achieved.
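The full list of about 8 mutations isn't given, so the sketch below uses three hypothetical stand-ins: two that change a single component and one that rescales the whole creature. It also swaps in the simplest possible selection (keep the top half, refill the rest with mutated copies) in place of a rank-based selection function. The names Link, Creature, mutate and evolve, and the toy evaluate callback that stands in for the physics run, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical genome: just the links, with their tunable properties.
struct Link { int a, b; double restLength, strength; };
struct Creature {
    std::vector<Link> links;
    double fitness = 0.0;
};

// Apply one randomly chosen mutation (three stand-ins for the ~8 real ones).
void mutate(Creature& c, std::mt19937& rng) {
    if (c.links.empty()) return;
    std::uniform_int_distribution<int> which(0, 2);
    std::uniform_int_distribution<std::size_t> pick(0, c.links.size() - 1);
    std::uniform_real_distribution<double> factor(0.9, 1.1); // +/-10% change
    switch (which(rng)) {
        case 0: c.links[pick(rng)].restLength *= factor(rng); break; // one component
        case 1: c.links[pick(rng)].strength *= factor(rng); break;   // one component
        default: {
            const double f = factor(rng); // whole creature: rescale the body uniformly
            for (Link& l : c.links) l.restLength *= f;
        }
    }
}

// Evaluate, rank, keep the top half, refill the bottom half with mutated
// copies of the survivors; stop after maxGenerations or at targetFitness.
Creature evolve(std::vector<Creature> pop, int maxGenerations, double targetFitness,
                const std::function<double(const Creature&)>& evaluate, std::mt19937& rng) {
    assert(pop.size() >= 2 && maxGenerations > 0);
    const std::size_t half = pop.size() / 2;
    for (int gen = 0; gen < maxGenerations; ++gen) {
        for (Creature& c : pop) c.fitness = evaluate(c); // stands in for the simulation
        std::sort(pop.begin(), pop.end(), [](const Creature& a, const Creature& b) {
            return a.fitness > b.fitness;
        });
        if (pop.front().fitness >= targetFitness) break; // fitness-based end condition
        for (std::size_t i = half; i < pop.size(); ++i) {
            pop[i] = pop[(i - half) % half]; // copy an unchanged survivor...
            mutate(pop[i], rng);             // ...and perturb it
        }
    }
    return pop.front(); // best creature from the last evaluated generation
}
```

Because the survivors are copied rather than replaced, the best fitness can never get worse from one generation to the next; the mutated copies are what let the search move beyond the initial population.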
Outcome
At the end of all this, we end up 'finding' better and better creatures with each generation. After a while, major improvements become so rare that the only thing left is fine-tuning the structure. Eventually even this comes to an end and improvement halts altogether; this can be considered the end state of the program. With so many possible configurations, it is almost impossible to find 'the' maximum, but we can get pretty close! I've since extended the creatures to control their muscles using neural networks that mimic brains. This gives them significantly more capability and produces some interesting results, so stay tuned for more on that!
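One way to detect that end state automatically is to record the best fitness of each generation and stop once it has barely moved over a recent window. The helper below (hasPlateaued, with its window and minGain parameters) is a hypothetical sketch, not a description of how the original program decides to stop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when the best fitness has improved by less than `minGain`
// over the last `window` generations, i.e. evolution has stalled.
bool hasPlateaued(const std::vector<double>& bestPerGeneration, std::size_t window,
                  double minGain) {
    assert(window > 0);
    if (bestPerGeneration.size() <= window) return false; // not enough history yet
    const double now = bestPerGeneration.back();
    const double before = bestPerGeneration[bestPerGeneration.size() - 1 - window];
    return now - before < minGain;
}
```

The window has to be long enough that a slow fine-tuning phase is not mistaken for a true halt.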