Friday, February 23, 2007

Evolution versus td-learning revisited

One of the papers we are presenting at this year's Computational Intelligence and Games symposium is about comparing td-learning and neuroevolution for learning car racing skills.

The paper (go read it!) contains lots of material, and I won't try to summarize the rather dense methods and results sections here. But let me reflect a bit on some of the major conclusions:

  • First of all, td-learning can be blazing fast when it works as it should. Of course td-learning has the potential to be faster than evolution, as it learns from feedback during the lifetime of an individual, but we didn't expect it to be quite as fast as it sometimes was. A few times we saw a car controller start from a tabula rasa and reach decent driving between waypoints in a few hundred time steps, maybe 20 seconds of simulated time!

  • But to balance the picture, td-learning can be a bitch. Really. Performance is completely unpredictable: the very same parameter configuration gives completely different results in successive runs, learning very well sometimes and not at all at other times. Often, good behaviour that has already been learned is unlearned after a few more epochs. Etc, etc. It is simply much easier to learn something sensible with evolution than with td-learning. And in the end, the best evolved controllers are consistently better than the best td-learned controllers.

  • Which brings us to the question of whether these effects are inherent to the algorithms, or whether they are an artifact of Simon and me being much more familiar with evolution than with td-learning. Interesting question. I don't know. We did, however, bring in Thomas Runarsson to help us with the experiments, and he's done quite a bit of td-learning in the past.

  • Another interesting thing that came out of our experiments is how useful it is to have a forward model available. Evolving state value functions, which need a forward model to turn state values into actions, consistently outperformed evolving direct control. I think the use of forward models might very well be the next big thing in evolutionary robotics. We have a couple of exciting ideas for how to do this; now we just need time to get working on them...
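
To make the forward model point a bit more concrete, here is a minimal sketch of how a state value function and a forward model can be combined into a controller. It assumes a toy one-dimensional "car" heading for a single waypoint, not the racing simulator from the paper, and every name and constant in it (`forward_model`, `features`, `value`, `greedy_action`, `td0_update`, the linear value function, the step sizes) is made up for illustration. The controller tries each action in the forward model and picks the one whose predicted next state has the highest value. The TD(0) update shows one standard way the same weights could be adjusted from feedback during a lifetime; the paper may well use a different variant. Evolution would instead mutate the weights and keep whichever set scores best over a whole trial.

```python
# Toy sketch only: a 1-D "car" with state (position, velocity) and one waypoint.
# This is NOT the setup from the paper; all names and constants are hypothetical.

ACTIONS = (-1.0, 0.0, 1.0)  # brake/reverse, coast, accelerate
WAYPOINT = 10.0             # assumed waypoint position
DT = 0.1                    # assumed simulation time step


def forward_model(state, action):
    """Predict the next state from the current state and an action."""
    pos, vel = state
    vel = vel + action * DT
    pos = pos + vel * DT
    return (pos, vel)


def features(state):
    """Hand-picked features: distance to the waypoint, speed, and a bias term."""
    pos, vel = state
    return (abs(WAYPOINT - pos), abs(vel), 1.0)


def value(weights, state):
    """Linear state value function: V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features(state)))


def greedy_action(weights, state):
    """Try every action in the forward model and pick the best predicted state."""
    return max(ACTIONS, key=lambda a: value(weights, forward_model(state, a)))


def td0_update(weights, state, reward, next_state, alpha=0.01, gamma=0.95):
    """One TD(0) step for the linear value function (returns new weights)."""
    delta = reward + gamma * value(weights, next_state) - value(weights, state)
    return tuple(w + alpha * delta * f for w, f in zip(weights, features(state)))
```

With weights that simply penalise distance to the waypoint, for example `(-1.0, 0.0, 0.0)`, `greedy_action` accelerates from a standstill at position 0, because the forward model predicts that accelerating ends up closest to the waypoint. Direct control, by contrast, would map the state straight to an action with no lookahead at all.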

Anyway, that's all for today. Slightly more unstructured than the usual. But so am I.

1 comment:

Marcelo said...

Hi, Julian!

Nice post, friend! :)

I have never heard about TDL. Neuro-evolution seems to be a big deal inside evolutionary computation and AI in general. Have you already heard about David Fogel's Blondie24?

If you permit me, I would like to say that I wrote a reply to your and Amir's comments upon that quote which deals with airplanes, toasters, and algorithms.

Please, see here:

See you!