Tuesday, August 09, 2016

Algorithms that select which algorithm should play which game

This was meant to be a short post announcing some new papers of ours, but it grew. So what you get is a post announcing some new papers, with plenty of context.

Video games are possibly the best way of testing artificial intelligence algorithms. At least I've argued this (at some length) before. One thing that video games offer, and that for example robot problems don't, is the ability to easily and quickly test the same algorithm on a large number of games. That is one of the guiding principles of the General Video Game AI Competition (GVGAI), which has certain advantages over other game-based AI testbeds.

Some games implemented in General Video Game AI Framework.

Now in practice we don't yet have any algorithm that can play all different video games well. (If we had, we would pretty much have solved AI. Really. Remember that "video games" is a very broad category, including Sokoban and Zork as well as Witcher 3 and Ikaruga. The GVGAI competition is more limited, but still covers a broad range of games.) For some games, we have no algorithms that can play the game well at all. For other games, we have algorithms that can play those games at human level, or sometimes even better. But an algorithm which can play Space Invaders well might not be very good at Boulder Dash, and might suck completely at a puzzle game such as A Good Snowman is Hard to Build. Conversely, a good snowman-building AI might not be very adept at fending off invading space aliens. We like to say that game-playing performance is intransitive.

But what we want is a single algorithm that can play any game well. Because intelligence is not about being able to perform a single task, but about performing all (or most) tasks (from within some reasonably general domain) well. Given that what we have are different algorithms that can each only perform some minority of tasks well, how do we get there?

One answer is to build an algorithm that includes a number of different game-playing algorithms, the "champions" of each type of game. When faced with a new game, it would choose one of its constituent algorithms to play the game. If for each game the algorithm selects the best sub-algorithm to play that game, it would perform much better than any one algorithm on its own. E pluribus unum, stronger together, the whole is greater than the sum of its parts and all that. The question then is of course how to select the right algorithm for each game.
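As a rough sketch, such a portfolio agent might look something like this. This is a minimal illustration assuming each sub-agent exposes a play method; the names and interfaces here are made up for the sake of the example, not the actual GVGAI API:

```python
# A minimal sketch of a portfolio agent. All names here are
# illustrative placeholders, not the actual GVGAI interfaces.

class PortfolioAgent:
    def __init__(self, sub_agents, select_algorithm):
        # sub_agents: dict mapping a name to a game-playing agent,
        # e.g. {"mcts": mcts_agent, "rhea": rhea_agent}
        self.sub_agents = sub_agents
        # select_algorithm: callable mapping game features to a name
        self.select_algorithm = select_algorithm

    def play(self, game):
        # Extract features of the new game (hypothetical accessor),
        # pick the most promising sub-agent, and let it play.
        features = game.get_features()
        choice = self.select_algorithm(features)
        return self.sub_agents[choice].play(game)
```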

How well a number of different algorithms play different games in the GVGAI framework. Lighter=better. Several different categories or clusters of games appear based on the performance of the algorithms.

Luckily, people have studied the problem of how to automatically select algorithms to solve problems for some time, under the names algorithm selection (obviously) and hyper-heuristics (less obviously). Various methods for selecting which algorithms (or heuristics, which mean the same thing in this context) to use have been explored. But not for video games. Only for more abstract and... I can't say boring, can I? Whatever. Hyper-heuristics and algorithm selection techniques have mostly been applied to more abstract (and perhaps boring) problems such as satisfiability and combinatorial optimization. These are important problems for sure, but they are not all there is. In terms of problems that map to relevant challenges for human intelligence, general video game playing is arguably more important.

My coworkers and I therefore set out to investigate how hyper-heuristics can be applied to video games, more specifically video games in the GVGAI framework. Our first result is a paper to be published at the IEEE Conference on Computational Intelligence and Games this September - read it here. We show that using simple features easily available to the agent, we can predict with relatively high accuracy which of several algorithms will play an unseen game best. This allows us to build an agent based on a handful of the agents that did best in the 2014 and 2015 GVGAI competitions, and to verify that, by cleverly choosing which of these to use for each new game, it can outperform all of them on games it has not been trained on. The classifier at the heart of this agent is a simple decision tree, which also allows us to inspect on what basis it selects certain agents over others.

A decision tree used in one of our hyper-heuristic agents. Based on features of the game, it decides which algorithm to use to play it.
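For intuition, here is roughly how such a selector could be trained with an off-the-shelf decision tree. This is a sketch using scikit-learn; the features, numbers and agent labels are invented for illustration, not the ones from the paper:

```python
from sklearn.tree import DecisionTreeClassifier

# One row of features per training game. The features and labels are
# illustrative stand-ins, not the exact ones used in the paper.
# Features: [number of sprite types, avatar can shoot, avatar can die]
X_train = [
    [12, 1, 1],
    [5,  0, 1],
    [20, 1, 0],
    [8,  0, 0],
]
# Label: which agent played that game best during training.
y_train = ["mcts", "bfs", "rhea", "mcts"]

# A shallow tree keeps the selector interpretable, so we can inspect
# on what basis it prefers one agent over another.
selector = DecisionTreeClassifier(max_depth=3)
selector.fit(X_train, y_train)

# For an unseen game, compute the same features and pick an agent.
print(selector.predict([[15, 1, 1]]))  # e.g. ['mcts']
```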

However, we felt that these results were not enough - ideally, the agent should select the best-performing algorithm for a new game every time, not just most of the time. Maybe we could find better features to feed our algorithm selection algorithm, or maybe train it differently?

In our next paper, which will be presented at the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, we set out to investigate this question in more depth. Read the paper here. To begin with, we tested the performance of a very broad range of algorithms on all known GVGAI games. We identified a number of groups of games where particular algorithms do well. We also investigated a broader set of features for selecting algorithms, hoping to find better ways of identifying classes of games that would be susceptible to specific types of game-playing algorithms. One idea we had was that these classes would somehow correspond to game genres as identified by humans (e.g. some algorithms would be better at shooters, others at puzzle games, etc.). While we still don't have enough features to reliably classify games into genres as seen by humans, some results support this notion. In particular, we found that we could divide games into four different classes, where different algorithms play well, by looking at only two features: whether the avatar in the game can shoot, and whether it can die.

This decision tree looks only at whether the avatar can shoot and whether it can die.
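Spelled out as code, such a selection rule is essentially a two-bit lookup. Which agent goes in which branch below is a hypothetical example, not the assignment learned in the paper:

```python
def pick_agent(avatar_can_shoot, avatar_can_die):
    # Two binary features divide games into four classes. The agent
    # assigned to each class here is an invented placeholder, not the
    # actual mapping from the paper.
    if avatar_can_shoot:
        return "agent_a" if avatar_can_die else "agent_b"
    else:
        return "agent_c" if avatar_can_die else "agent_d"
```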

Of course, there's a lot of research left to do here. We still need better algorithm selection methods, and trying to (automatically?) construct features is a promising direction here. We could also probably do better if we learn to switch algorithms during gameplay - maybe some algorithm is more useful in the initial phases of a game, and another algorithm is more useful for the endgame? And let's not forget that we have many games which no existing algorithm plays well.