The paper (go read it!) contains lots of material, and I won't try to summarize the rather dense methods and results sections here. But let me reflect a bit on some of the major conclusions:
- First of all, TD-learning can be blazing fast when it works like it should. Of course, TD-learning could potentially be faster than evolution, as it learns from feedback during the lifetime of an individual, but we didn't expect it to be quite as fast as it sometimes was. A few times we saw a car controller start from tabula rasa and learn to drive decently between waypoints in a few hundred time steps, maybe 20 seconds of simulated time!
- But to balance the picture, TD-learning can be a bitch. Really. Performance is completely unpredictable: the very same parameter configuration gives completely different learning results in successive runs, learning very well sometimes and not at all at other times. Often, good behaviour that has already been learned is unlearned after a few more epochs. And so on. It is simply much easier to learn something sensible with evolution than with TD-learning. And in the end, the best evolved controllers are consistently better than the best TD-learned controllers.
- Which brings us to the question of whether these effects are inherent to the algorithms, or whether they are an artifact of Simon and me being much more familiar with evolution than with TD-learning. Interesting question. I don't know. We did, however, bring in Thomas Runarsson to help us with the experiments, and he's done quite a bit of TD-learning in the past.
- Another interesting thing that came out of our experiments is how good it is to have a forward model available. Evolving state value functions, which need a forward model to turn values into actions, consistently outperformed direct control. I think the use of forward models might very well be the next big thing in evolutionary robotics. We have a couple of exciting ideas for how to do this; now we just need time to get working on that...
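To make the forward-model point a bit more concrete, here is a minimal sketch of how a state value function can be turned into a controller: try each candidate action in the forward model, score the predicted next state, and pick the best. This is not the code from the paper. The toy 1-D "car", the `forward_model` dynamics, the hand-written `state_value` function (standing in for an evolved or TD-learned one) and the `ACTIONS` set are all assumptions made up for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy state: (position, velocity) of a 1-D "car".
State = Tuple[float, float]

# Hypothetical discrete action set: brake/reverse, coast, accelerate.
ACTIONS: Tuple[float, ...] = (-1.0, 0.0, 1.0)


def forward_model(state: State, action: float, dt: float = 0.1) -> State:
    """Predict the next state; the action is treated as an acceleration."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)


def state_value(state: State, waypoint: float = 1.0) -> float:
    """Stand-in value function: states closer to the waypoint score higher.

    In the setting discussed above this would be the evolved or
    TD-learned function; here it is hand-written for illustration.
    """
    pos, _vel = state
    return -abs(waypoint - pos)


def choose_action(
    state: State,
    actions: Sequence[float],
    model: Callable[[State, float], State],
    value: Callable[[State], float],
) -> float:
    """One-step lookahead: pick the action whose predicted next state has the highest value."""
    return max(actions, key=lambda a: value(model(state, a)))


if __name__ == "__main__":
    state: State = (0.0, 0.0)
    for _ in range(5):
        action = choose_action(state, ACTIONS, forward_model, state_value)
        state = forward_model(state, action)
        print(action, state)
```

Direct control, by contrast, would map the state straight to an action with no lookahead, so the controller has to encode both "what is good" and "how to get there" in one function.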
Anyway, that's all for today. Slightly more unstructured than usual. But so am I.