Tuesday, February 06, 2007

Sensorless but not senseless

Imagine you were driving a car in a long, dark tunnel, and suddenly your headlights started flickering, going off and on irregularly at intervals of a second or so. What would you do? It seems the only way to keep from crashing would be to accurately remember the bends of the tunnel. For example, if the lights went out just before a left turn, you would have to predict how soon the turn begins and start turning at the right moment.

Now imagine you were driving a radio-controlled car, but due to some low-grade engineering, there was an unfortunate delay between when you issue a command (such as turning left) and when the command takes effect (angling the wheels). How would you handle this? It seems you would have to predict the effects of your turning, so that you start and stop turning slightly before you seem to need to.

These two situations are the inspiration for a paper we (Hugo, me and Magdalena) are presenting at the 2007 IEEE-ALife Symposium in Hawaii. Essentially, we wanted to see whether we could force our controllers to learn to predict. Of course, we used my good old car racing simulator for the experiments. To remind you, this is what one of our evolved controllers looks like when all six sensors are turned on and current: (The strange lines represent the sensors)

Now, let's turn off the sensors intermittently and see what happens: (No lines = no sensors)

Not very pretty. Can we improve on this? We tried, by recording the car driving around a few tracks and trying to teach neural networks to predict the future (what sensor input comes next, given current input and action taken). First, we used backpropagation for this. Combining such a predictor with the same evolved controller as before looks like this:

Better than before, but not much.
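
For the curious, the way a predictor can stand in for missing sensors can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code from the paper: `env_step`, `controller` and `predictor` are hypothetical callables, and feeding the predictor its own last output while the sensors stay off is an assumption about the setup described above.

```python
from typing import Callable, List, Sequence

State = List[float]


def run_with_dropouts(
    env_step: Callable[[float], State],           # hypothetical: applies an action, returns real sensors
    initial_state: State,
    controller: Callable[[State], float],         # hypothetical: sensor state -> action
    predictor: Callable[[State, float], State],   # hypothetical: (state, action) -> guessed next state
    sensors_on: Sequence[bool],                   # one flag per time step
) -> List[State]:
    """Return the sensor state the controller believed in after each step."""
    believed = list(initial_state)
    history = []
    for on in sensors_on:
        action = controller(believed)
        real = env_step(action)  # the world moves on whether or not we can see it
        if on:
            believed = real      # sensors available: use the real reading
        else:
            # sensors dark: fall back on the predictor's guess,
            # built from what we believed a moment ago
            believed = predictor(believed, action)
        history.append(believed)
    return history
```

With a perfect predictor the believed state never drifts from the real one; with a poor one, the error builds up for as long as the sensors stay off.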

So we tried another thing. Instead of training the predictor networks to predict, we evolved them to help the controller drive. At first this might not seem like much of a difference, but in fact it is crucial. Look for yourselves:

Clearly much better. And the difference turns out to be not only quantitative, but also qualitative. But before we go into the analysis, let's look at the other task: the delay task. Below is the same good old evolved controller as in the above examples, but with all sensor inputs delayed by three time steps:

Looks like the driver is drunk, doesn't it?
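
A delay like this is easy to picture as a short queue between the sensors and the controller. The sketch below is an illustration, not the simulator's actual code: the class name `DelayedSensors` is made up, and padding the queue with the initial reading is an assumption about what the controller sees during the first few steps.

```python
from collections import deque


class DelayedSensors:
    """Hand the controller readings that are `delay` time steps old."""

    def __init__(self, delay, initial_reading):
        # Pre-fill the queue so the first `delay` reads return the initial reading.
        self._buffer = deque([initial_reading] * (delay + 1), maxlen=delay + 1)

    def push(self, reading):
        """Store the newest real reading and return the one from `delay` steps ago."""
        self._buffer.append(reading)  # the oldest entry falls off the left end
        return self._buffer[0]


sensors = DelayedSensors(delay=3, initial_reading=0)
stale = [sensors.push(t) for t in range(1, 7)]
# stale is [0, 0, 0, 1, 2, 3]: the reading from step 1 only arrives at step 4
```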

Let's see if we can do something about this. First, we try to predict the current sensory state from the outdated perceptions, using a predictor trained with backpropagation. We then get something like this:

Pretty terrible. The driver went from drunk to stoned.

The next step was to instead evolve a predictor for maximum performance, as we did with the intermittent task above. Again, the result is strikingly different:

So, what's the take-home message from this? That evolution works better than backpropagation for learning predictors? Not so simple. When we analyse the various evolved and trained predictors, it turns out that the evolved predictors don't actually do any prediction! In other words, the mean squared error between the predicted next state and the real next state is quite low for the trained predictors, but horribly high for the evolved ones!
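
To make that measurement concrete, here is a minimal sketch of how such an error could be computed from a recorded drive. The helper names `make_pairs` and `prediction_mse` are hypothetical, and averaging over every sensor value of every step is an assumption; the paper may normalise differently.

```python
def make_pairs(states, actions):
    """Turn a recorded drive into ((state, action), next_state) examples."""
    return [((states[t], actions[t]), states[t + 1]) for t in range(len(states) - 1)]


def prediction_mse(predictor, pairs):
    """Mean squared error between predicted and real next states."""
    if not pairs:
        raise ValueError("need at least one recorded transition")
    total, count = 0.0, 0
    for (state, action), target in pairs:
        guess = predictor(state, action)
        for g, y in zip(guess, target):
            total += (g - y) ** 2
            count += 1
    return total / count
```

A predictor that really predicts scores close to zero here. One that has been shaped only by how well the car drives has no reason to.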

So, again, what does this mean? For one thing, the type of neural networks and the data we are using (only one prior state and action) are not enough to predict the next state as accurately as we needed. Therefore the predictors we got with supervised learning were not up to the task. Evolution, on the other hand, quickly figures out that accurate prediction is impossible and decides to go for something else. The evolved predictors instead act as extensions of the controller, changing its behaviour so that it copes better with the missing or delayed data. These changes might include slower driving, a higher propensity for turning one way rather than the other, or making sure that when bumping into walls, the back end of the car goes first, rather than the front.

At least, this is what we think happens. Let's say that the topic merits further study... please read the paper if you're interested.
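
The key point about the evolved predictors is the objective: they are scored on how well the car drives, never on prediction error. A bare-bones sketch of that idea follows. It is a generic hill climber with Gaussian mutation, used here only for illustration, since this post does not say which evolutionary algorithm the paper uses. The `fitness` argument is a hypothetical function that would plug the weights into a predictor, run the car and return a driving score.

```python
import random


def evolve(fitness, n_weights, generations=200, sigma=0.1, seed=0):
    """Hill-climb a weight vector, keeping mutants that score at least as well."""
    rng = random.Random(seed)
    best = [0.0] * n_weights
    best_score = fitness(best)
    for _ in range(generations):
        # Mutate every weight with a little Gaussian noise.
        child = [w + rng.gauss(0.0, sigma) for w in best]
        score = fitness(child)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score
```

Nothing in this loop ever looks at what the predictor outputs, only at the score that comes back, which is why a network evolved this way is free to stop predicting altogether.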

I'm not so sure if any of the above made much sense to you, dear reader. Is my habit of trying to summarise the main points of whole papers a good one? Or does it all just become compressed to the point of unintelligibility? Tell me!


mirko said...

interesting anyway, as usual ;-)

Sergiu Goschin said...

This is really interesting. But there are some things that are kind of fuzzy for me (as I am working on something similar right now):
1. How did you train the neural network with backpropagation? How did you know the desired outputs in this context?
2. What are the "states" in your approach - sensory input? If not, how did you represent them?
3. Isn't it normal that the prior state and action are not enough for more complex tasks? Because in the end this seems like improved reactive control - of course I don't know the architecture - it's just an idea. I mean, shouldn't there be a more complex handling of state history?
4. I hope you will be able to post the paper once it's published. It seems really interesting.
I am now working on combining reinforcement learning and evolutionary computation for simulated robot controllers.

Togelius said...

Thanks for the comments guys! As to Sergiu's questions:

1. We let the controller drive the car a couple of laps around the track, and recorded the succession of states. So the desired output is the next state, given previous state and action (at 20 Hz).

2. Yes. Six wall sensors, speed and way point sensor.

3. As what we're learning is not control but prediction (we've already learned control) it is more accurate to say that we're assuming that the task is Markovian. But you're right, this assumption is clearly not correct. It very rarely is. The question is just how wrong it is.

4. It's linked from the blog post now, and also from my home page at julian.togelius.com. There you'll also find another new paper on comparing evolution and td-learning, which could be of interest to you.

Would you mind telling me more about how you plan to combine evolution and reinforcement learning? In some recent experiments we're currently writing up for JMLR we've used td-learning for lifetime learning and evolution for learning over generations to sometimes very good effect, but it seems tricky to get it working consistently.

Sergiu Goschin said...

Thanks for the response. I answered in more detail in an email.
And another question popped into my head: did you try the evolved neural networks on different tracks or environments different from the one you trained them in?
Best regards,

Togelius said...

As to Sergiu's last question:

Yes. The predictors seem to generalise quite well over different tracks, which makes sense given that the controllers themselves have been evolved to work with several different tracks. But the predictors don't seem to generalise well over different controllers, so it seems that predictors take advantage of the way each controller works - another sign that they are not actually doing proper prediction.

Again, there's more on this in the paper.