So you've built an artificial intelligence, or some kind of learning system. How do you know if it's any good? Well, you test it. And what do you test it on? You let it play games, of course! Games are excellent testbeds because they were made to test human thinking skills of various kinds - in fact, they are fun to play largely because they exercise your brain so well - so they will test relevant aspects of your AI's intelligence. Games are also excellent testbeds because they execute much faster than robots, don't need any expensive hardware, can be understood by everyone, and are fun to both watch and participate in. In fact, it is hard to understand why you would ever use an AI testbed that was not a game. To make things easier for you, there are now a number of competitions running at the major AI/games conferences, which allow you to submit your AI agents and test them on their capacity to play games such as car racing, Super Mario Bros, Othello, StarCraft, Ms. Pac-Man, Go or Unreal Tournament.
Alright, great! Sounds like we have AI benchmarking all sorted, right?
But hold up! There's something wrong with the picture I paint above. And it's not the claim that games are excellent testbeds; they truly are. And I certainly don't mean to demean the set of game-based AI competitions on offer (how could I, having started some of them myself?).
Instead, what's wrong with the picture is the idea that you first build an AI and then test it on some game. More likely, a researcher will look at the game, play it for a bit, and try to imagine how it could be played by a computer program. The researcher will then try to build an AI with the game in mind, and tune it to play this game specifically. The result, in many cases, is a piece of software that plays a particular game very well (often better than the researcher who built it) - but which might not be good at anything else.
So what do we learn about artificial intelligence - in general - from creating software that is good at playing a particular game? How could you design game-based competitions that make you learn more about AI? In fact, how could you design game-based competitions that are easy and fun to participate in yet are scientifically meaningful? And how could you write up the results of these competitions so that they add to the sum of scientific literature, while giving everybody who worked on the project due credit? Because running, and participating in, game-based AI competitions is a form of AI research too. It's just one we have not discussed very much yet from a methodological or epistemological standpoint, or even from a standpoint of giving researchers credit for their work.
I explore these issues in a new article in TCIAIG; the article is available here (or use this direct link if you don't have access to IEEE Xplore). The article is based on my tutorial at CIG 2013, and the comments I received afterwards; this tutorial is in turn based on my experience of running a number of these competitions.
Also, the article can be seen as a travesty of a particular "literary" genre. See if you can find all the genre tropes!
Wednesday, October 22, 2014
New York, New York
In January, I will move to New York City and take up a position as Associate Professor at New York University (more specifically, in the Department of Computer Science and Engineering at the Polytechnic School of Engineering).
I am extremely excited about this. I'll be joining a growing games programme at NYU, and will be affiliated with the Media and Games Network (MAGNET) and the NYU Game Center. My new department wrote a short story about my arrival, which emphasises my work in procedural content generation and interdisciplinary collaborations. This is also my plan - continue my current main research directions (PCG, general video game playing, player modelling, game generation) while also starting new collaborative projects with the excellent people at NYU.
My five and a half years at ITU have been excellent and I've learned a lot from my amazing colleagues and developed as a researcher. I'd like to point out that I'm not moving in order to leave ITU. I'm moving because it is time to move on and seek new challenges and opportunities, and because of all the great people I will work with here and the great city I will be living in.
Maybe you want to come here as well? I am looking for two new PhD students, to start in September. You need to apply soon. And you should definitely mail me first to discuss research proposal ideas. (Hint: have a look at some of my recent papers to see what my current research interests are; these include general video game playing/generation and mixed-initiative PCG.) More information about the procedure can be found here.
Friday, October 10, 2014
Why academics and game industry don't collaborate on AI, and how we could improve the situation
In academia, there is a research community focused on artificial intelligence (and computational intelligence, which is mostly the same thing) in games. Within the games industry, there are job titles such as AI programmer and game components known as AI, and there is also a community of people self-identifying as game AI experts. Each community has its own conferences, publications and associations: the CIG and AIIDE conferences, the TCIAIG journal and the Games Technical Committee on the academic side, and the GDC AI Summit, the Game AI Conference, the AIGameDev website and the AI Game Programmers' Guild on the industry side.
However, there is surprisingly little collaboration between these communities. Instead, there is a large degree of mutual ignorance and distrust. There are a number of shining examples of crossover of industry knowledge into academia or vice versa (these are the subject of a different blog post), but these are unfortunately exceptions to the rule. Every now and then, someone comes along and points out the lack of collaboration, and vigorous discussion ensues. Every time this discussion starts, many people who have not thought about the situation are honestly surprised (they might not even have known that there was game AI research in academia or game AI R&D in industry). I've seen this debate playing out on repeat dozens of times by now, so I thought I'd share my thoughts about it in compressed form for future reference.
First, here is an incomplete list of reasons why academics and game developers who work on AI don't collaborate, and often don't even understand each other.
- Commercial game AI and academic game AI only partly overlap. There are problems that are studied in both communities, but there are also technical problems in games that academics consider too uninteresting to count as AI, academic problems that are too abstract or weird to interest industry, and academic methods that won't work in real games.
- Some (many?) academics are not really interested in doing work which is useful in games, just in using games as testbeds.
- Some academics want to do work that is useful in games but do not know the demands and constraints of game design and development.
- Some academics want to do work that is useful in games but do not have access to code and/or data to do that. Industry generally won't make code and data, or even APIs, available because of the need to protect intellectual property.
- Some academics want to do work that is useful in games, but target the wrong type of games. In particular, there is the perception that only AAA games in well-established genres like FPS, RPG and RTS are worth targeting, while these games are often designed so as not to need much (interesting) AI. Why work on solving AI problems in a game that was designed not to need AI? Instead, the appetite and need for new AI techniques is often stronger in the indie scene and in emerging game genres.
- Some developers in industry harbour a dislike or distrust of academic research in general. This could be because the game industry tends to see itself as an entertainment industry rather than a tech industry. It could also be because certain developers were very dissatisfied with their own university educations.
- Some (hopefully very few) academics have a superiority complex and think their fancy degrees entitle them to talk down to the unwashed masses.
- Some (hopefully very few) developers in industry have a superiority complex and think the fact that their software sells entitles them to tell others how Things Really Are.
- There is little or no funding available for game-applicable AI research from government funding agencies, because games are not seen as important enough when you could instead do research on curing cancer or stopping terrorists.
- There is no funding available for game-applicable AI research from companies, because game development companies don't support research, unlike companies in many other sectors. This could be because of the very short time horizon of game development companies. A car company or insurance company will have a 20-year strategic plan, a game company most likely won't.
- The academic incentive system - publish, get cited, get grants - does not reward doing the "last mile" of converting your research prototype into something that could be plugged into a game engine.
- The game industry's incentive system - make sure the next game sells or you're out of a job - does not reward trying things that might not work.
- We simply don't go to the same places, because GDC is too expensive for academics and does not have any publication options, and academic conferences are deemed irrelevant by industry.
So what can we do about it? Some suggestions, for both communities:
- Respect each other. Don't slander each other.
- Be curious about each other's work, both in terms of contribution and motivation.
- Write blog posts (not just academic papers written in hard-to-penetrate academese) and post videos explaining your work. Goes for both academics and developers, maybe particularly for academics.
- Find ways of making datasets and source code from commercial games available, perhaps with an NDA; alternatively ship games with an API to interface AI.
- Make it easier to publish academic work based on collaboration with game developers. Special sessions of conferences and special issues in journals.
- Reward developers for going to academic conferences.
- Make it possible to publish academic papers at industry conferences, alternatively just make it cheaper to attend for academics.
- Place more value on published (or otherwise publicly released) games and playable experiences in academics' CVs.
- Fund applied research into game AI.
- (Ok, I'll stop joking. Not very likely that this will happen. We'll make this work without funding because we love games and AI.)
And some things academics in particular can do:
- Read up on what industry game AI developers are interested in and which problems they find hard to tackle. This can be hard, as the information is typically not available in the form of academic papers. But AIGameDev.com is a good starting point, and so is the AI Game Programming Wisdom book series. Stalking talkative developers on Twitter might also help.
- Approach game developers and ask if you can work together with them.
- Send your students on internships in game companies, with the objective of doing some project both you and the company find interesting. This has worked well for me in the past.
- Use benchmarks that are based on industrially relevant games. Several of the competitions that run each year at CIG and AIIDE - in particular the BotPrize and the StarCraft competition, but to some extent also Pac-Man and Mario AI (especially if you're not just focusing on performance but on other aspects of gameplay or content generation) - are relevant here.
- Make your own games. You don't have to work with the game industry in order to make an impact on games. In fact, you don't even have to work on the problems the industry define for you. If you are the kind of researcher who likes to come up with crazy new algorithms and then invent the problem they might solve, go one step further and build a whole game around it. Or many games. Release them out in the wild, let people play them even if they are not perfect (they will not be). They serve to prove a point, that this is AI which is relevant for games, even if it's not what the industry calls "Game AI". Lead, don't follow. Redefine what games are, and what AI is. Who's stopping you? I think this kind of research is the one which is most likely to end up being really important.
- Participate in game jams. This is an excellent way to get started with getting your hands dirty and actually making games. At least, that's what I hear, I never have time to participate in game jams because I'm busy arguing with people on the Internet.
Acknowledgements: this post started as a reply to a mail by Robert Zubek, which was a reaction to an ongoing discussion on the AI Game Programmers' Guild mailing list, which in turn partly continues the Twitter debate following Squirrel Eiserloh's recent keynote at AIIDE. My views have been shaped by ongoing discussions about this topic with more people than I can remember on various mailing lists and at various conferences. People whose opinions on this topic have illuminated my own recently include Dave Mark, Alex Champandard, Tommy Thompson, Mike Cook and Luke Dickens.
Friday, July 25, 2014
A Panorama of Artificial and Computational Intelligence in Games
Georgios Yannakakis and I are proud to present our new survey paper called "A Panorama of Artificial and Computational Intelligence in Games", to be published in IEEE TCIAIG.
It is available from IEEE Xplore here; if you don't have access to Xplore, there's a free pdf available here.
This is a very ambitious paper, which we have worked on, on and off, for two years. The aim is nothing less than to survey the active areas of research and application of computational intelligence (CI) and artificial intelligence (AI) in games, and to show how these areas inform each other.
To this end, we identify 10 key areas of research and application, and more than 50 ways in which these areas interact. So if you ever wanted to know how NPC learning affects procedural content generation, where player modelling can be used in commercial games or how game-based AI competitions inform planning research, these are exactly the kind of questions we seek to answer here.
Furthermore, we try to identify which of these research areas inform technology development for which part of the player-game interaction loop, and which key AI and machine learning algorithms are used in those areas. Yes, it is rather complex, though we have attempted to make it more like a readable paper and less like a phone directory or circuit diagram.
While the article is not really meant as an introduction to research in game AI, it might be used as supplementary reading in courses on AI/CI and games. However, its main purpose is to help you, as a researcher or practitioner in this field or some neighbouring field, structure and understand this rapidly evolving research field. As part of this mapping operation, we have also identified numerous underexplored or unexplored connections, so there are a number of suggestions for promising future research projects within the paper.
The article builds partly on the Dagstuhl seminar in 2012, where around 40 of the world's leading experts on CI and AI in games assembled to identify the key areas and problems of this research field. It also builds on our own experience of working in the field for around a decade, including running several workshops and conferences. However, despite our best attempts at being inclusive and adopting a panoramic perspective (meaning that we try to see everything), the article will of course be coloured by our own perspective. That's why we are so eager to hear your comments on the paper - we sincerely hope this paper can catalyse a discussion within the community on what exactly it is we are doing!
We will give a tutorial based on this paper at CIG 2014 - of course, we hope as many as possible of you can make it there.
Now go on and read the paper - we look forward to your feedback!
Wednesday, July 23, 2014
MCTS for arcade games
This blog post contains entertaining videos of an AI agent playing Super Mario Bros, making a series of interesting mistakes but ultimately playing quite well (after several modifications). But before watching those, I want you to read about the algorithm that plays the game.
Monte Carlo Tree Search (MCTS) is the new black in game AI. At least for board games and similar games, including many types of card games and puzzles. The algorithm has only been around since 2006, but has already become the method of choice for Go (the Asian board game) and "general game playing" (developing AI that can play any of a wide range of games). People had been working on AI for Go for decades without producing anything that played better than a human beginner, but after MCTS was invented playing strength increased drastically to the point where computers can now compete with human club players. All the best Go-playing programs use MCTS. For the General Game Playing Competition, and the General Video Game Playing Competition, agents based on MCTS are pretty much the only ones that work at all.
At its core, MCTS is a tree search algorithm. Just like Minimax, the simple algorithm that is used for playing games such as Chess and Checkers among many others, it looks at the possible moves that can be made from a particular (board) state, and selects the best move based on which states these moves result in and which moves can be taken from them. There are two big differences with Minimax, however. The first is that while Minimax explores all moves equally, MCTS is prejudiced and explores promising branches of the tree much deeper than others. For this, it uses the UCB equation, which determines which nodes are underexplored and should be explored further. The second difference is that while Minimax needs an evaluation function that assigns a "goodness" value to a game state (how close that game state is to winning), MCTS evaluates a state through playing a number of games until the end by taking random actions. Yes, random; it sounds crazy but it works. Because MCTS does not need an evaluation function, it works well for games where it is very hard to effectively evaluate the quality of a game state (such as Go), and because MCTS builds an unbalanced tree, it works for games that have very high branching factors (such as Go).
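If you prefer code to prose, here is a minimal sketch of the vanilla algorithm in Java. The GameState interface is something I made up for this post as a bare-bones forward model - it is not the API of any particular framework - and the code is meant to illustrate the four phases (selection, expansion, simulation, backpropagation) rather than to be an efficient implementation.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Hypothetical forward-model interface; not taken from any real framework.
    interface GameState {
        List<Integer> legalActions();
        GameState advance(int action);   // returns the successor state
        boolean isTerminal();
        double result();                 // e.g. 1 = win, 0 = loss; only meaningful at terminal states
    }

    class Node {
        final GameState state;
        final Node parent;
        final int actionFromParent;
        final List<Node> children = new ArrayList<>();
        int visits = 0;
        double totalValue = 0;
        double maxValue = Double.NEGATIVE_INFINITY;   // used later by the mixmax modification

        Node(GameState state, Node parent, int action) {
            this.state = state;
            this.parent = parent;
            this.actionFromParent = action;
        }
    }

    final class VanillaMcts {
        static final double C = Math.sqrt(2);   // UCB1 exploration constant
        static final Random rng = new Random();

        // Repeat selection, expansion, simulation and backpropagation, then
        // return the action leading to the most-visited child of the root.
        static int search(GameState rootState, int iterations) {
            Node root = new Node(rootState, null, -1);
            for (int i = 0; i < iterations; i++) {
                Node leaf = selectAndExpand(root);
                double value = rollout(leaf.state);
                backup(leaf, value);
            }
            Node best = root.children.get(0);
            for (Node c : root.children) if (c.visits > best.visits) best = c;
            return best.actionFromParent;
        }

        static Node selectAndExpand(Node node) {
            while (!node.state.isTerminal()) {
                List<Integer> actions = node.state.legalActions();
                if (node.children.size() < actions.size()) {
                    // Expand the next untried action (assumes legalActions() returns a stable order).
                    int action = actions.get(node.children.size());
                    Node child = new Node(node.state.advance(action), node, action);
                    node.children.add(child);
                    return child;
                }
                // All children exist: descend along the highest UCB1 value.
                Node best = null;
                double bestUcb = Double.NEGATIVE_INFINITY;
                for (Node c : node.children) {
                    double ucb = c.totalValue / c.visits
                            + C * Math.sqrt(Math.log(node.visits) / c.visits);
                    if (ucb > bestUcb) { bestUcb = ucb; best = c; }
                }
                node = best;
            }
            return node;
        }

        // Simulation: play uniformly random actions until the game ends.
        // This is what replaces the handcrafted evaluation function.
        static double rollout(GameState state) {
            while (!state.isTerminal()) {
                List<Integer> actions = state.legalActions();
                state = state.advance(actions.get(rng.nextInt(actions.size())));
            }
            return state.result();
        }

        static void backup(Node node, double value) {
            for (Node n = node; n != null; n = n.parent) {
                n.visits++;
                n.totalValue += value;
                n.maxValue = Math.max(n.maxValue, value);
            }
        }
    }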
Are you still with me? Good. The above was of course a gross simplification. For the full story of MCTS, including pseudocode, history and several common improvements, see this excellent survey paper. Now back to the story.
So, MCTS does really well for Go and for a number of other board game-like games. What other games would it work well on? A hypothesis of mine was that for MCTS to work well, the game needed to be:
1. Turn-based, or close to turn-based. Relatively few actions should be taken in a given time period, and each action should have a measurable and frequently meaningful impact on the game state.
2. Turn-limited. After playing a certain number of turns, even if just taking random actions, one should be guaranteed to arrive at an end state (win, loss or draw).
The reasoning is that if the game is not guaranteed to finish in a limited time, the playouts might go on forever; also, if the game is not turn-based, each action might mean too little for the value of each node to be estimated. Go, the main success story of MCTS, fulfils both of these criteria.
Most arcade games, and other action games, fulfil none of these criteria. Typically, you have high frame rates (meaning that each action only changes the state a little bit) and considerable freedom of movement (so that you're not pushed towards an inevitable end of the game). So I thought it would be interesting to investigate whether MCTS would work for such a game, in what ways it would fail if it didn't work, and how we could modify the algorithm to make it work. But then, someone would need to actually do this research. Luckily, two very talented students, Emil Juul Jacobsen and Rasmus Greve, walked into my office one day and inquired about doing a bachelor's thesis. So we had a deal. In what follows, Emil and Rasmus did all the hard work, whereas I was mostly waving my hands around and saying encouraging things ("supervising").
What follows below is mostly a concise summary of our recent paper.
First, we decided on a game. We chose the Mario AI Benchmark, which is a hacked version of a clone of Super Mario Bros. (You can find the old version here, the new version here and an overview paper here.) The version of the Mario AI Benchmark we used has rather simple levels, and it has previously been shown by Baumgarten that A* search in state space can play these levels very well. (In fact, our MCTS implementation actually reuses the forward model from Baumgarten's A* implementation.)
Our first, naive attempts at playing Mario with MCTS met with limited success. We had to make a couple of modifications to the algorithm to make it work at all. As it is very rarely possible to "roll out" (play randomly) until the end of the level, we had to cap the rollouts at a very low number of actions (e.g. 6), and if Mario wasn't either dead or winning by then, we returned how far Mario had travelled to the right as the state evaluation. This gave us rudimentary gameplay ability. After removing the possibility of pressing the "down" button (or rather, of issuing the "down" command) performance improved somewhat, but we still see a shortsighted and somewhat cowardly Mario in the video below.
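To make this concrete, here is a sketch of what such a capped rollout with distance-based evaluation might look like. MarioState is a hypothetical stand-in for the benchmark's forward model (the real API looks different), and MAX_PROGRESS is purely an illustrative scaling constant.

    import java.util.Random;

    // Hypothetical forward-model interface used by the sketches in this post;
    // not the actual Mario AI Benchmark API.
    interface MarioState {
        MarioState advance(int action);   // simulate one frame with the given button combination
        boolean isDead();
        boolean hasWon();
        boolean enemyNearby();            // used later by the macro-action sketch
        double xPosition();               // Mario's horizontal position in the level
    }

    final class CappedRollout {
        static final int ROLLOUT_CAP = 6;          // very short horizon, as described above
        static final int NUM_ACTIONS = 16;         // all button combinations
        static final double MAX_PROGRESS = 100.0;  // illustrative normalisation constant only
        static final Random rng = new Random();

        static double evaluate(MarioState state) {
            double startX = state.xPosition();
            for (int step = 0; step < ROLLOUT_CAP; step++) {
                if (state.hasWon()) return 1.0;
                if (state.isDead()) return 0.0;
                state = state.advance(rng.nextInt(NUM_ACTIONS));   // random playout action
            }
            // Neither dead nor winning: reward progress to the right, clamped to [0, 1].
            double progress = (state.xPosition() - startX) / MAX_PROGRESS;
            return Math.max(0.0, Math.min(1.0, progress));
        }
    }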
Indeed, not very impressive. Note the red line in front of Mario, which shows the best path found in each time step; it is not very long. The short duration of each action and the constraints of real-time control conspire to limit the maximum depth of the tree that can be explored. That explains the shortsightedness. But what about the cowardice, with Mario hesitating in front of pipes and cannons and not having the courage to jump? (An earlier version of the agent had him hesitating in front of holes too.) We hypothesised that this is because the average outcome of jumping (falling into a hole, or getting hurt by a flower) is worse than the average outcome of doing nothing, even though the best-case outcome is considerably better.
Now we had our baseline result. We decided to see whether we could improve on it by modifying MCTS to perform better on this type of task. Below, I describe the five modifications we implemented and evaluated.
Mixmax rewards tries to address the issue of cowardice. The basic idea is very simple: in addition to just backing up the average value of all children of a node, we also back up the maximum value of any child of that node. The actual value of the node is some combination of the average and the max. A bit of empirical testing showed that around 20% max was a good number. The Mixmax Mario player below shows somewhat more assertive behaviour, and also scores better than the baseline in systematic evaluations.
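In code the change is tiny. The sketch below uses the Node class from the vanilla sketch above, which already tracks the maximum backed-up value; Q_MIX = 0.2 corresponds to the "around 20% max" mentioned here.

    // Mixmax value, sketched: blend the mean with the best outcome observed
    // below the node. This value replaces the plain mean (totalValue / visits)
    // in the UCB1 formula.
    static final double Q_MIX = 0.2;

    static double mixmaxValue(Node n) {
        double mean = n.totalValue / n.visits;
        return Q_MIX * n.maxValue + (1 - Q_MIX) * mean;
    }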
The next modification is Macro actions. This is a rather crude attempt to counteract the shortsightedness of Mario by making each move last longer. Each time an action is explored, the same action is taken for a number (e.g. 3 or 5) of time steps. The idea of Macro-actions is certainly not new; it has been explored before in the context of the Physical Travelling Salesperson Problem. In our case, an added feature is that the search reverts to a single time step when Mario is close to an enemy. This yields a player who sees further into the future than standard MCTS but does not play much better.
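Sketched in code, again using the hypothetical MarioState interface from the rollout sketch (including its enemyNearby() query), the idea is roughly this:

    // Macro actions, sketched: each tree edge applies the same action for
    // several frames, reverting to single frames when an enemy is close.
    static final int MACRO_LENGTH = 3;   // e.g. 3 or 5 frames, as mentioned above

    static MarioState applyMacroAction(MarioState state, int action) {
        int frames = state.enemyNearby() ? 1 : MACRO_LENGTH;
        for (int i = 0; i < frames && !state.isDead() && !state.hasWon(); i++) {
            state = state.advance(action);
        }
        return state;
    }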
Partial expansion is another attempt to remedy Mario's shortsightedness. Here, we've modified the UCT formula that underlies the MCTS algorithm to make it possible to explore grandchildren of a node before having explored all children of that node. This allows the algorithm to explore deeper but sparser trees, and it works relatively well in practice.
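Here is one possible way to realise partial expansion - not necessarily the exact formula from our paper: unexpanded actions compete with the existing children through a fixed urgency score, so a promising child can be deepened before all of its siblings exist. hasUntriedActions() and expandOneChild() are hypothetical helpers, and the urgency constant is just an illustrative tuning value.

    static final double C = Math.sqrt(2);           // UCB1 exploration constant
    static final double UNEXPLORED_URGENCY = 0.5;   // illustrative tuning constant

    // Returns the child to descend into, expanding a new one only if no
    // existing child beats the urgency of the untried actions.
    static Node descendOrExpand(Node node) {
        Node best = null;   // null means "expand a new child of this node"
        double bestScore = hasUntriedActions(node) ? UNEXPLORED_URGENCY
                                                   : Double.NEGATIVE_INFINITY;
        for (Node c : node.children) {
            double ucb = c.totalValue / c.visits
                    + C * Math.sqrt(Math.log(node.visits) / c.visits);
            if (ucb > bestScore) { bestScore = ucb; best = c; }
        }
        return (best != null) ? best : expandOneChild(node);
    }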
The order in which actions are explored can be quite important in MCTS. Ideally, you want the best actions explored first - but of course, you don't know which those are. There have been several attempts at making more intelligent action ordering in the literature. We call our solution Roulette wheel selection; we simply estimate the average value of each action (there are only 16 possible button combinations) and then use the classic roulette wheel selection operator from genetic algorithms to probabilistically select which action to explore each time we expand a node.
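A sketch of how that expansion step might look, assuming per-action value statistics kept globally across the search (an assumption of this illustration, not necessarily how our code is organised), with a small prior so unseen actions still get a slice of the wheel:

    import java.util.List;
    import java.util.Random;

    final class RouletteExpansion {
        static final int NUM_ACTIONS = 16;   // all button combinations
        static final double[] valueSum = new double[NUM_ACTIONS];
        static final int[] count = new int[NUM_ACTIONS];
        static final Random rng = new Random();

        // Draw an untried action with probability proportional to its estimated value.
        static int pick(List<Integer> untriedActions) {
            double total = 0;
            for (int a : untriedActions) total += estimate(a);
            double r = rng.nextDouble() * total;
            for (int a : untriedActions) {
                r -= estimate(a);
                if (r <= 0) return a;
            }
            return untriedActions.get(untriedActions.size() - 1);   // numerical safety net
        }

        // Mean value with a small prior so unseen actions keep a non-zero slice.
        static double estimate(int action) {
            return (valueSum[action] + 1.0) / (count[action] + 2.0);
        }

        // Called during backpropagation to keep the per-action statistics current.
        static void record(int action, double value) {
            valueSum[action] += value;
            count[action] += 1;
        }
    }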
Finally, we experimented with throwing in a little bit of domain knowledge. Hole detection is a very simple modification of the reward scheme, where we simply give a negative reward for ever being inside a hole, rather than only for reaching the bottom of a hole. As you can see, this allows Mario to avoid holes better.
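As a sketch, assuming a hypothetical insideHole() query on the forward model: any state where Mario is inside a hole is scored as badly as dying (0 on the [0, 1] scale of the rollout sketch above), instead of waiting for him to reach the bottom.

    // Hole detection, sketched; insideHole() is an assumed query, not part of
    // the real benchmark API, and baseValue is the capped-rollout evaluation.
    static double evaluateWithHolePenalty(MarioState state, double baseValue) {
        return state.insideHole() ? 0.0 : baseValue;
    }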
We then evaluated all possible combinations of these various improvements. It turns out that there are several combinations that perform indistinguishably from A*, and thus seem to play optimally (at least on these simple linear levels). In general, all modifications work with each other most of the time except macro actions, which are frequently detrimental. Below you can see an example of gameplay with all modifications except macro actions turned on.
Pretty neat, eh?
It even works well if you remove the domain knowledge.
So, it seems that we've cracked the problem of how to make MCTS play Super Mario Bros. A couple of rather simple modifications to the basic MCTS algorithm take it from very bad performance to really rather good performance. We believe most of these modifications will translate well to other game domains, and might therefore be useful to anyone interested in applying MCTS to games that are more like Super Mario Bros than like Go.
There's only one tiny fly in the ointment: A* can already play Super Mario Bros just as well. So what's the point of attempting something fancy like MCTS when you can just use the trusty old best-first search algorithm?
Here's why. A* frequently produces solutions that are extremely brittle and rely on everything going as planned. So we decided to see what happens if we changed the benchmark a bit, so that things sometimes worked out differently than planned. We simply added 20% action noise, so that for any given action there was a 20% chance that a random action would be taken instead of the one the agent chose. This resulted in complete collapse for A*: performance degraded by around two thirds. MCTS, on the other hand, held up relatively well, losing some performance but playing considerably better than A* under noisy conditions.
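The noise model itself is trivial - roughly this, applied to every action before it is executed:

    import java.util.Random;

    // 20% action noise: with probability 0.2 the agent's chosen action is
    // replaced by a uniformly random one.
    final class ActionNoise {
        static final double NOISE_PROB = 0.2;
        static final Random rng = new Random();

        static int apply(int chosenAction, int numActions) {
            return rng.nextDouble() < NOISE_PROB ? rng.nextInt(numActions) : chosenAction;
        }
    }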
So MCTS can indeed be useful for arcade games, as it produces gameplay that is not only strong but also robust.
The above was of course a simplified account of what we did, missing lots of details and results. Read the paper, published in the proceedings of GECCO 2014, to find out more!
Saturday, May 17, 2014
I am a game interface conservative
On the occasion of Microsoft de-bundling the Kinect from the Xbox One, here's my conservative stance on game interfaces. The following are some human-game interface gimmicks that I never expected to outlive the initial buzz, or to have any kind of lasting impact:
- The Wiimote
- Kinect
- Location-based gaming (GPS as input)
- 3D screens
- Voice input
- VR goggles/helmets like the Oculus Rift
- Tilt-based controls
- EEG (electroencephalography)
- Eye tracking
- Skin resistance / blood volume pulse sensors
- Regular camera inputs (e.g. PlayStation Eye)
- Toy guns (e.g. NES Zapper)
As for the gimmicks on this list, they either have failed, are failing, were never seriously introduced, or play a role only for some minor niche audience. They are not gaining lasting mainstream success as input/output modalities for games.
It's easy to explain why. That's because they all make playing games harder, not easier. With all of these gimmicks, telling the game what you want to do is harder than with a keyboard/mouse/gamepad, and understanding what the game is telling you is harder than with a regular screen/TV. Fingers pressing buttons and eyes scanning a flat screen are just very, very effective at conveying information back and forth. And they do not require the player to jump around, wave their arms around, strap themselves with weird devices, wake up the neighbours, rearrange their living rooms or throw up from simulator sickness.
In a sense I'm happy that these gimmicks are failing. Not because I'm in principle against multimodal interfaces. There's lots of interesting research to be done here, and many of these technologies (Kinect, skin resistance sensors, eye tracking) have proven very useful as research tools. But when it comes to consumer technologies, I think it's important that we come up with tech that works better than what we already have before it's widely adopted. Or more likely: come up with an interesting case of something that could not be done with the "old" interfaces. If Kinect were to "succeed" it would actually be a step backwards for game interfaces: players would be using a worse interface to play essentially the same games in the same way.
Now, obviously, I'm not against research and innovation. Quite the opposite. But imagine just for a second that the same sort of resources that go into game interface research, development and marketing went into game AI research, development and marketing instead. Where would we be then?
Tuesday, April 22, 2014
Metablogging my research
When I started this blog in 2006, it was with the intention that I should write short blog posts explaining my own research in a more accessible way. It appears, however, that I haven't found the time to do much of that recently. (I was better at doing so in the first few years of the blog's existence - see some examples here, here and here.) The few blog posts I've written in recent years tend to be more general discussions about things such as human intelligence, conference acceptance rates or why I work so much.
While I still intend to write more blog posts explaining my research in more accessible terms, I am not saying I will get around to actually doing so anytime soon. However, some other researchers have been kind enough to write accessibly about research done in my group. So instead of a blog post about my research, this is a blog post about blog posts about my research. Here are some picks - if you have blogged about my research (or research I supervised) and want to be included in this list, tell me and I'll add it!
First we have Mike Cook at Goldsmiths, University of London. He's written several excellent posts on my team's work over at his own project blog for his Angelina project. For example, see this post on Antonis' work with Sentient Sketchbook, or this post on the Ropossum generator for Cut the Rope levels or maybe this post on the Procedural Procedural Level Generator Generator.
Tommy Thompson at University of Derby has produced a number of very well-written posts on his blog about games and AI recently, partly for the benefit of his students and partly for a general public. This post on PCG and the Mario AI Competition is a very good example.
Alex Champandard at AIGameDev.com (a very useful resource for industry-style game AI) did an interview with me a long time ago, soon after I finished my PhD. Re-reading my answers, I guess I was fairly naive back then. I probably still am.
There have also been a number of articles about my work in New Scientist, one of the more well-known and respected popular science magazines. These articles are generally written by competent and well-informed professional science journalists (unlike some of the coverage of my work in the mainstream press, which I won't link to here, as it is more likely to confuse than enlighten). Other good, short pieces (blog posts rather than articles, though it's hard to distinguish the two) from quality news outlets are the following from Kotaku, Wired and the Guardian. In addition to such secondary sources, there seem to be dozens of "tertiary" blog posts that are essentially just reformulations of some of the posts and articles I've linked to above - the Mario AI competition in particular is the focus of most of them.
I'll finish with the following post, which was written in 2007 and is a rather acidic attempt to ridicule some of the earliest work I did in my PhD. Apparently, I took this seriously enough back then to write an elaborate response post. I think I would be much more comfortable with that kind of criticism these days, now that I have established a position in the research community. However, as it's safer to criticise a PhD student than a respected researcher, nobody criticises me anymore. Which is a shame: there must be some people out there who think that what I do is completely misguided, and constructive, well-informed criticism would be very welcome. I've even tried actively seeking out critical views (and of course promoting myself) by posting stories about my work on Slashdot (for example here, here and here). Well, it turns out you do get critical comments, just not always well-informed ones...
So perhaps the conclusion is that while promoting your own and others' work (and mine!) to other researchers, students and the general public is all great, it's not necessarily the best way of finding out what people you respect really think about your work. For that, you have to go to a conference and get everybody really drunk.
Thursday, February 06, 2014
The Curse of Increasing Marginal Work Utility, or Why I Work So Much
Academics constantly complain about being overworked, yet they also praise their freedom to decide how and when to work as one of the great advantages of their job. This seems contradictory. If they have such freedom, why are they so overworked?
Part of the answer is of course that you are never really done. You can always do better: write more and better papers, get better research results and deliver better teaching. But this is true for most white-collar jobs these days - you could always work more on your next budget, product description, market report, technical drawing or whatever it is you do. If you are good at what you do, it should be enough to work normal hours, go home, get some good rest and return refreshed the next day to do more work. Few successful academics do this. There seems to be some extra factor affecting academics, contributing to why they always seem stressed out - and why you very rarely see successful academics below the age of 50 working reasonable hours.
That extra factor is increasing marginal work utility. Simply put, the more you work, the more you get out of each extra work hour. This gives you a very strong incentive to work as much as possible, which is clearly not a very good idea if you also want to live your life. Let me explain.
We begin by imagining I work the statutory work week of around 40 hours. If I work efficiently, I will then have time to prepare and conduct my teaching, supervise my masters students, do my administrative duties (including answering lots of mails), review papers for some conferences and devote the minimum acceptable time to supervising my PhD students. These are all tasks that I must do - not doing them would mean that I am simply not doing my job. This amount of work is clearly not enough, as I do not get time to do any research - though I could of course try to take credit for the work of PhD students that I don't supervise enough.
If I put in some more time, I can do some of the research where I am not in a "driving" role. I could participate some more in what my PhD students are doing, help my masters students write up papers based on their theses (which I have supervised), do my part in joint research projects with people at other institutions, and serve on organising committees of conferences. Also, write parts of collaborative grant proposals. These are activities where I am mostly fulfilling obligations to other people, though they are also activities where I get quite a lot of visibility for limited effort. But it still does not mean doing any research of my own.
So let's say I put in even more time. I can now advertise some of the research that I (and my collaborators) have been doing and help shape the direction of my own and others' research, for example by giving invited talks or writing parts of survey articles. I can start new initiatives, e.g. competitions, discussion groups or grant proposals. These are things that make a lot of sense to do, and I enjoy doing them, but it is still not the same as actually doing my own research.
Finally, when I'm done with all of the above and there are no urgent emails awaiting reply in my inbox, I could sit down with my own idea, write some code of my own, run my own experiments and write a paper with myself as first author. In other words, do some research of my own. This rarely happens: I seldom get to this point, and when I do, I rarely have any energy left.
The utility of that sixtieth work hour is so much higher than that of the tenth or twentieth because I can use it to do my own research. If I work 60 hours rather than 40, I don't get 50% more of my own research done, but almost infinitely more, as I would otherwise get almost none of my own research done. Given that I am interested in doing research of my own, there is a very strong incentive to work very long hours. It is not that I am uninterested in the other parts of my job - I enjoy all of them, except grant writing and meetings with management - but I am more interested in research.
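If it helps, here is a toy sketch of the argument in Python. The task list, the hour figures and the "value" weights are invented purely for illustration - this is not a measurement of my actual week - but the shape is the point: the marginal value of a work hour stays near zero until the obligations are cleared, and only the hours beyond that carry the value I actually care about.

```python
# A toy model of increasing marginal work utility. All task names, hour
# figures and "value" weights are illustrative assumptions, nothing more.

# Tasks in the order they get done each week:
# (description, hours it consumes, "own research" value per hour).
TASKS = [
    ("teaching, admin, reviewing, minimal PhD supervision", 40, 0.0),
    ("collaborative research and other obligations",        10, 0.2),
    ("talks, surveys, new initiatives",                      10, 0.3),
    ("own research",                                       1000, 1.0),  # effectively unbounded
]

def marginal_value(hour):
    """Return the (made-up) value of the hour-th work hour in a week."""
    used = 0
    for _description, hours, value in TASKS:
        used += hours
        if hour <= used:
            return value
    return 0.0

if __name__ == "__main__":
    for h in (10, 20, 40, 50, 55, 61, 70):
        print(f"hour {h:2d}: marginal value {marginal_value(h):.1f}")
```

Running it shows the tenth or twentieth hour being worth essentially nothing in research terms, while the sixty-first and beyond are worth everything - which is exactly the incentive to keep working.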
You could compare the job of an academic to having a teaching and administration job and having research as a hobby. Except that the "hobby" is the presumed core of the job as advertised, and the reason you applied for the job.
An interesting detail is that it wasn't always like this. Back when I was a PhD student, and to a large extent still when I was a postdoc, my marginal work utility was basically flat as I spent most of my time on my own research. But as I was good at research I got promoted to a position where I had to spend most of my time doing something else. (Incidentally, I have seen quite a few excellent researchers who are less than excellent at teaching and administration.)
Finally, let me point out that I am not complaining, just explaining. And I am certainly not blaming anyone, especially not my wonderful colleagues and students. After all, I love my job and I would not trade it for any other job. I just hope to have made the logic clear behind why I, and probably many others like me, work such long hours.