Saturday, July 26, 2008

What should we call non-evolutionary reinforcement learning?

I work in evolutionary reinforcement learning. That is, I develop reinforcement learning problems, and evolutionary algorithms that solve such problems.

The problem is that many people in reinforcement learning (RL) would say that I'm not working on RL at all. These people work on things like temporal difference learning, policy gradients, or (more likely) some newfangled algorithms I have never heard of, but which are most certainly not evolutionary. Most of the people working on non-evolutionary RL probably don't know much (maybe nothing!) about evolutionary RL either. So disconnected are our communities. It's a shame.

In their discipline-defining (4880 citations on Google Scholar) book "Reinforcement Learning", Sutton and Barto start with defining RL as the study of algorithms that solve RL problems, and mention in passing that they can be solved by evolutionary algorithms as well. The book then mentions nothing more about evolution, and goes on to essentially discuss TD-learning and variations thereof for a few hundred pages.

In practice, the evolutionary and non-evolutionary RL folks publish in different conferences and journals, and don't cite (nor read?) each other much. We write our papers in very different styles (the non-evolutionary RL people having much more maths in them, evolutionary RL researchers often relying on qualitative argument coupled with experimental results), and I for one often simply don't understand non-evolutionary RL papers.

Again, it's a shame. And it would be great if we could find some way of bridging this divide, as we work on the same class of problems.

But to do this, we need to find a way of addressing the issue, which was really the purpose of this blog post. Simply put, what do we call the two classes of algorithms and the research communities studying them? This is an issue I run into now and then, most recently when writing a grant proposal, and now again when preparing lecture slides for a course I'll be teaching this autumn.

The non-evolutionary RL people would not want a negative definition, based on what their algorithms aren't rather than what they are. They would rather go for RL, plain and simple, but this has the problem that non-evolutionary RL is excluded from that field, in spite of being part of the definition. In the proposal we wrote we ended up talking about "classical" versus evolutionary RL, but this has the problem that evolutionary algorithms predated td-learning by several decades. We could also use the term "single-agent RL", but then again, a simple hill-climber is arguably a (degenerate) evolutionary algorithm, and very much single-agent. Besides, there is multi-agent non-evolutionary RL. Sigh.

So I really don't know.

10 comments:

  1. On one hand, I disagree with your core thesis argument, namely that you can conflate all other methods of solving reinforcement learning problems into a single, nameable unit, instead of realizing that there might be many other non-evolutionary approaches. Note that such a description would be as marginalizing as the "non-evolutionary" reinforcement-learning community's non-consideration of the evolutionary school.

    On the other hand, I think it is worthwhile for researchers in evolutionary reinforcement learning to have names for these various approaches, while realizing that these labels may not yet describe the diversity of practice.

    Two other schools that then might be closely related are the CoLT school, concerned with reinforcement learning as a classification setting (which action class with yield minimal loss under PAC-like bounds) as part of the overall program of computational learning theory. Another is closely related to optimization, and is called "neuro-dynamic programming", or the dynamic programming over approximate state architectures.

    ReplyDelete
  2. Hi, Julian!

    Nice to know your blog is still alive!

    Well, I have seen the same situation between the guys who call the EC field of Genetic and Evolutionary Computation and the others who prefer the short form: Evolutionary Computation.

    Would someone tell me what is the difference between Genetic Computation and Evolutionary Computation? I think both of them aim at the same target: Evolution!

    I think you should call it TNAOWORL: The New And Old Wave Of Reinforcement Learning.

    I think the Old School guys from RL may have an arrogant eye upon the EC RL dudes. Like those old school metalheads/headbangers and the New Metal kids. :)

    Bye!

    Marcelo

    ReplyDelete
  3. But, IMHO, your post show a similar kind of vocabulary gap. Indeed, reinforcement learning, and more generally machine-learning problems can be modelled as optimization problems, thus a lot of different algorithms may be used, but are considered to be out of scope.

    For example, you use the term "evolutionary algorithms", but that marginalize very similar algorithms, like simulated annealing or ant colony optimization (some ACO have been proved to be equivalent to an EDA and a gradient ascent, for instance).

    Thus, by using the term "evolutionary algorithms" instead of a more general "metaheuristic" one, you also contributes to divide the fields... And what about a more general "optimization algorithms" term ? For sure, there may be a lot of interesting stuff to use in constraint programming!

    Ok, I'm kidding, but I think the gap naturally tends to be reduced, it is a matter of time!

    ReplyDelete
  4. Heh, point taken Nojhan, but I think you've misunderstood my point. My point is not technique-centered at all, because the techniques themselves don't care if they are lumped together in one classification system or another. Instead, I'm talking about communities of practice.

    Among the folks who work on reinforcement learning, and generally use techniques that practitioners agree aren't evolutionary techniques, there are at least two groups of practioners. How did I know? Because these groups themselves make similar complaints that the work of the other is opaque or has different criteria for judging quality. And these folks certainly do have differences that they care about, and that in order to work with them effectively, you have to care about too.

    ReplyDelete
  5. Indeed, more generally, I've see such problems between 1) problem-centered peoples and algorithm-centered ones and 2) specialists and generalists.

    I think that peoples working on a specific problems claims to be agnostic, but eventually tends to prefer one kind of technique... and that peoples claiming to work on an algorithms tends to prefer a kind of problem.

    At the end, you have two axes : problems/algorithms VS specialists/generalists. Trying to work with peoples on the other side is always a matter of cultural adaptation.

    ReplyDelete
  6. Oh, certainly, although I still think that while it can be a good first order approximation to come in with your own typology (generalist/specialist or technique/problem), it doesn't replace asking the folks themselves. Instead, why not ask them who's doing similar work, who's doing different work on similar problems, and who's doing work they don't consider legitimate? I think that, to some extent, cultural adaptation required cultural approaches.

    ReplyDelete
  7. Yes, and pay special attention to the vocabulary, everyone has different definition of technical terms :-)

    ReplyDelete
  8. Thanks for all the comments!

    John, I'll definitely try to have a look at the fields/approaches you mention. I say "try to" as I'm not sure I understand much, but hey, that's true for most of us and most things in the world.

    Nojhan, I'm not sure the "gap naturally tends to be reduced", as you claim. In fact, there seems to be fewer and fewer people that have a grasp of the whole field of learning/optimization. Once upon a time you could be a "computer scientist" and have a reasonable grasp of what was going on in all subfields of this science. Now it's hard to even keep up with what's happening in evolutionary computation, and if you try to keep up with e.g. the sort of research that gets presented at ICML. You would have to go to all conferences in the world, and not have time to do any research of your own any more.

    But maybe this is unavoidable. Francis Crick eventually quit molecular biology because even he couldn't keep up with even the major things that happened in the field, after having invented it!

    Marcelo: the difference between genetic and evolutionary computation boils down to the (personal? professional?) differences between Goldberg/Koza (ACM) and Fogel (IEEE), I think.

    ReplyDelete
  9. Hey Julian

    I am a CS Engineering student from India. Something has spurred an interest in me on Evolutionary computation, especially in genetic algorithms, and NEAT. I have not gotten too far though..,

    Anyway, given that the game of Life is so chaotic, do you think it is possible to evolve an initial population that keeps growing??

    I thought that'd be the ultimate in GA. I have been looking on the net for help on approach to generating such a population mathematically. I could get only populations. No info on how it was developed.

    Cheers,
    Sanjeev

    ReplyDelete
  10. If you are trying to name the communities, I'd go with GECCO RL vs. ICML RL. If you are trying to name the techniques, I'd say "evolutionary RL", "policy-search RL", "value-function-based RL", "model-based RL"...

    (This page came up on a websearch. Sorry for being late to the party!)

    ReplyDelete