Friday, March 25, 2016

A way to deal with enormous branching factors

When you play Chess, there are on average something like 30 different moves you can make each turn. In other words, the branching factor is around 30. Go, on the other hand, has a branching of 300 or so. The standard Minimax algorithm, traditionally used to play Chess and similar board games, can't handle this branching factor very well. This is one of the two reasons why Go is so much harder for computers to play than Chess is, and why it took 19 years between the victory of AI over humans in Chess and the same victory in Go. (The other reason is the difficulty of state evaluation in Go.)

Let's think a bit about that number: 300. Apparently that's enough to stop Minimax (and Xerxes). Ultimately, Go was conquered with the help of Monte Carlo Tree Search (MCTS), which handles higher branching factors better than Minimax. But still, 300 is not a terribly big number.

Now consider this. Chess and Go are both games where you only move a single piece each turn. Many, perhaps most, turn based games are not like this. In board games such as Risk or Diplomacy you can move a number of units each turn. The same goes for computer strategy games such as Civilization, Advance Wars or XCOM. In trading card games such as Magic and Hearthstone you can play out a large number of cards each turn. Let's call these games multi-action games. (And let's not even get started on a game such as StarCraft, where you move lots of units in tandem but have the added complication of continuous time and space.) One might point out that such games model a number of real-world scenarios with varying degrees of accuracy, so an AI method for playing such games could be useful for non-game applications.

Hero Academy, a multi-action turn-based game with a characteristically huge branching factor.
What's the branching factor of a multi-action game? If you have 6 units to move, and you can move each unit to one of ten possible positions, you have a branching factor of 10 to the power of 6, or one million! Suddenly 300 does not feel like a very big number at all. And this is even simplifying things a bit, assuming that the order in which we move the units doesn't matter.

Obviously, Minimax is incapable of handling a branching factor of a million. A two-ply search means that a trillion positions will need to be examined. Even MCTS runs into very serious problems with this kind of branching factor. The basic problem is that you need to explore all the actions from the root node before doing anything else, and that might already take more time than you have that turn. (There are some clever ways of using MCTS in this space by considering individual unit actions, but that will have to be a topic for another time.)

At EvoStar 2016, we will present a paper that takes a fresh look at this problem and comes with a somewhat surprising solution. The paper is written by Niels Justesen, Tobias Mahlmann and myself; Niels was one of our star masters students at ITU Copenhagen, and Tobias and me supervised his masters thesis. We used a re-implementation of the strategy game Hero Academy as our testbed; Hero Academy is fairly representative of computer strategy games: each turn, a number of action points are available, and any action point can be used to move any character.

The basic idea of the paper is that the branching factor is so huge that we cannot do any tree search. Instead, we will have to concentrate on searching the space of possible action sequences that make up a single turn of the game. A simple state evaluation function will have to suffice when judging the quality of a turn–even though we cannot simulate the actions of the opponent, a good execution of each single turn might be all we need to play well. But even then, the branching factor is too high to search the possibilities of a single turn exhaustively. So we need to search smartly.

Therefore, we turned to evolution. We use an evolutionary algorithm to evolve sequences of actions that could make up a single turn. The fitness function is the quality of the board state of the end of the turn, as measured by some straightforward metrics. Both mutation and crossover are used to provide variation, as is common in evolutionary computation. We call the use of evolution to search actions within a turn Online Evolution.

The crossover scheme for Online Evolution in Hero Academy.
What about the results? Simply put, Online Evolution kicks ass. Evolution outperforms both a standard MCTS and two different greedy search methods by a wide margins. While it would certainly be possible to find MCTS versions that would handle this particular problem better, Online Evolution is at the very least highly competitive on this problem.

This is significant because the use of evolution for planning actions is very underexplored. While evolutionary algorithms have been used extensively to play games, it is almost always through evolving a program of some kind to play the game (for example through neuroevolution). The use of evolution in the actual playing phase seems to be almost virgin territory. We took some inspiration from the only other example of this general idea we know, Rolling Horizon Evolution for the PTSP game. But ultimately, there is no good reason that evolution could not be used to solve all the same problems that MCTS can. In fact, in his recent PhD thesis Cameron McGuinness showed that MCTS can be used to solve many of the same problems that evolution can. There seems to be some deeper connection here, which is well worth exploring.

Are you still with me? Good! Then you can go read our paper to get all the details!

Monday, March 21, 2016

Switching brains and putting the cart before the horse: EvoCommander, an experimental AI game

One of the best ways to make AI that is relevant to game development is to (1) make the AI and (2) make a game around the AI. That is, design a game that needs this particular flavor of AI to work.

To most people in the game industry (and far too many people in academia), this is the equivalent of putting the cart before the horse. The "proper" way to do game AI is to start with a requirement, a problem posed by the game's design, and then come up with some AI to solve this.

But many great innovations would never have been made if we insisted on putting the horse before the cart all the time. And making AI that solves the problems of existing game designs often leads to boring AI, because most games are designed to not need very interesting AI. It's more fun to turn things around - start with the AI and design a game that will need that AI, to prove that such a game design can be interesting.

I am definitely not the first person to think that thought. In particular, Ken Stanley has been involved in a couple of really interesting academic experiments in designing games around evolutionary neural networks, or neuroevolution. NERO is a game where you train the brains of a small army of troops, and then let your army fight other people's armies. Galactic Arms Race (with Erin Hastings as main developer) revolves around picking up and learning to use bizarre weaponry, which is evolved based on players' choices. Petalz (the offspring of my good friend Sebastian Risi and others) is a social network game about flowers, powered by actual evolution. I've been involved in a couple of such attempts myself, such as Infinite Tower Defense which uses several different AI mechanisms to adapt to the player's strategy and preferences, creating an ever-shifting challenge.

Of course, there are several examples of commercial games that seem to have been built partly to showcase interesting AI as well, including classics such as Creatures and Black and White. And there are many games that have chosen non-standard, AI-heavy solutions to design problems. But it would take us too far to dig deeper into those games, as I think it's about time that I get to the reason I wrote this blog post.

The reason I wrote this blog post is EvoCommander (game, paper). EvoCommander is a game designed and implemented by Daniel Jallov, while he was a student at ITU Copenhagen under the supervision of Sebastian Risi and myself; we also contributed to the game design.

A mission being played in EvoCommander.

EvoCommander's game play revolves around training your little combat robot, and then unleashing it against human or computer-controlled enemies. The robot's brain is a neural network, and training happens through neuroevolution. You train the robot by giving it a task and deciding what it gets rewarded for; for example, you can reward it for approaching the enemy, using one of its weapons, or simply keeping distance; you can also punish if for any of these things. Like a good dog, your little robot will learn to do things so as to maximize reward and minimize punishment, but these things are not always what you had in mind when you decided what to reward. Unlike a game like Pokemon, where "training" is simply progression along a pretermined skill trajectory, in EvoCommander training really is an art, with in principle limitless and open-ended outcomes. In this regard, the premise of the game resembles that of NERO (mentioned above).

Fierce PvP battle in the EvoCommander arena.
A key difference to that game, and also a key design innovation in EvoCommander, is the brain switching mechanic. You can train multiple different "brains" (neural networks) for different behaviors, some of which may be attacking tactics, others tactics for hiding behind walls etc. When battling an opponent, you can then decide which brain to use at each point in time. This gives you constant but indirect control over the robot. It also gives you considerable leeway in selecting your strategy, both in the training phase and the playing phase. You may decide to train complicated generic behaviors (remember that you can start training new brains from any brain you trained so far) and only switch brains rarely. Or you may train brains that only do simple things, and use brain switching as a kind of macro-action, a bit like a combo move in Street Fighter.

The robot bootcamp, where you train your brains.
As an experimental research game, EvoCommander is not as polished as your typical commercial game. However, that is not the point. The point is to take an interesting AI method and show how it can be the basis for a game design, and in the process invent a game design that would not be possible without this AI method.

You are welcome play the game and/or read the paper yourself to find out more!

Further reading: I've written in the past about why academia and game industry don't always get along, and strategies for overcoming this. Building AI-based games to show how interesting AI can be useful in games is one of my favorite strategies. An analysis (and idea repository) for how AI can be used in games can be found in our recent paper on AI-based game design patterns.