I work in evolutionary reinforcement learning. That is, I develop reinforcement learning problems, and evolutionary algorithms that solve such problems.
The problem is that many people in reinforcement learning (RL) would say that I'm not working on RL at all. These people work on things like temporal difference learning, policy gradients, or (more likely) some newfangled algorithms I have never heard of, but which are most certainly not evolutionary. Most of the people working on non-evolutionary RL probably don't know much (maybe nothing!) about evolutionary RL either. So disconnected are our communities. It's a shame.
In their discipline-defining (4880 citations on Google Scholar) book "Reinforcement Learning", Sutton and Barto start by defining RL as the study of algorithms that solve RL problems, and mention in passing that such problems can also be solved by evolutionary algorithms. The book then says nothing more about evolution, and goes on to discuss essentially TD-learning and variations thereof for a few hundred pages.
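For readers coming from the evolutionary side, it's worth noting how little machinery the core of that book rests on: the tabular TD(0) value update fits in a few lines. Here is a minimal sketch (the function name and parameter defaults are my own, for illustration):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference (TD(0)) update: nudge the estimated
    value of state s toward the bootstrapped target r + gamma * V(s_next).

    V is a dict mapping states to value estimates; unseen states
    are treated as having value 0.0.
    """
    v_s = V.get(s, 0.0)
    v_next = V.get(s_next, 0.0)
    # alpha is the learning rate, gamma the discount factor
    V[s] = v_s + alpha * (r + gamma * v_next - v_s)
    return V[s]

# One step of experience: from state "s0", reward 1.0, landing in "s1"
V = {}
td0_update(V, "s0", 1.0, "s1")  # V["s0"] moves from 0.0 toward the target
```

The key contrast with evolutionary RL is that this update uses the reward signal at every single time step, whereas an evolutionary algorithm typically only sees an aggregate fitness at the end of an episode.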
In practice, the evolutionary and non-evolutionary RL folks publish in different conferences and journals, and don't cite (or even read?) each other much. We write our papers in very different styles: non-evolutionary RL papers contain much more maths, while evolutionary RL researchers often rely on qualitative arguments coupled with experimental results. I, for one, often simply don't understand non-evolutionary RL papers.
Again, it's a shame. And it would be great if we could find some way of bridging this divide, as we work on the same class of problems.
But to do this, we first need a way of even naming the issue, which is really the purpose of this blog post. Simply put, what do we call the two classes of algorithms and the research communities studying them? This is an issue I run into now and then, most recently when writing a grant proposal, and now again when preparing lecture slides for a course I'll be teaching this autumn.
The non-evolutionary RL people would not want a negative definition, based on what their algorithms aren't rather than what they are. They would rather go for RL, plain and simple, but this has the problem that evolutionary RL is then excluded from the field, in spite of being part of its definition. In the proposal we wrote, we ended up talking about "classical" versus evolutionary RL, but this has the problem that evolutionary algorithms predate TD-learning by several decades. We could also use the term "single-agent RL" for the non-evolutionary kind, but then again, a simple hill-climber is arguably a (degenerate) evolutionary algorithm, and very much single-agent. Besides, there is multi-agent non-evolutionary RL. Sigh.
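The hill-climber point can be made concrete: a stochastic hill-climber is exactly a (1+1) evolution strategy, i.e. an evolutionary algorithm whose "population" is a single parent producing a single mutated offspring per generation. A minimal sketch (the function name and toy fitness are my own, for illustration):

```python
import random

def one_plus_one_es(fitness, genome, generations=200, sigma=0.1):
    """A hill-climber viewed as a (1+1) evolution strategy:
    one parent, one Gaussian-mutated offspring per generation,
    and survivor selection that keeps whichever is fitter."""
    best, best_fit = genome, fitness(genome)
    for _ in range(generations):
        # Mutation: perturb every gene with Gaussian noise
        child = [g + random.gauss(0.0, sigma) for g in best]
        child_fit = fitness(child)
        # Selection: the offspring replaces the parent only if it
        # is at least as fit, so best_fit never decreases
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit

# Toy fitness: maximise -sum(x^2), whose optimum is the origin
best, fit = one_plus_one_es(lambda xs: -sum(x * x for x in xs), [1.0, 1.0])
```

With a population of one there is no crossover and no diversity to speak of, which is why "degenerate" is the right word; but structurally it is still mutation plus selection, so the single-agent/population boundary doesn't cleanly separate the two communities either.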
So I really don't know.