Wednesday, November 05, 2014

Game-based AI competitions

So you've built an artificial intelligence, or some kind of learning system. How do you know if it's any good? Well, you test it. And what do you test it on? You let it play games, of course! Games are excellent testbeds, because they were made to test human thinking skills of various kinds - in fact, they are fun to play largely because they exercise your brain so well - so they will test relevant aspects of your AI's intelligence. Also, games are excellent testbeds because they execute much faster than robots, don't need any expensive hardware, can be understood by everyone and are fun to both watch and participate in. In fact, it is hard to understand why you would ever use an AI testbed that was not a game. To make things easier for you, there are now a number of competitions running at the major AI/games conferences, which allow you to submit your AI agents and test them on their capacity to play games such as car racing, Super Mario Bros, Othello, StarCraft, Ms. Pac-Man, Go or Unreal Tournament.

Alright, great! Sounds like we have AI benchmarking all sorted, right?

But hold up! There's something wrong with the picture I paint above. And it's not that games are excellent testbeds; they truly are. And I am certainly not meaning to demean the set of game-based AI competitions on offer (how could I, having started some of them myself?).

Instead, what's wrong with the picture is the idea that you first build an AI and then test it on some game. More likely, a researcher will look at the game, play it for a bit, and try to imagine how it could be played by a computer program. The researcher will then try to build an AI with the game in mind, and tune it to play this game specifically. The result, in many cases, is a piece of software that plays a game very well (often better than the researcher who built it) - but which might not be good at anything else.

So what do we learn about artificial intelligence - in general - from creating software that is good at playing a particular game? How could you design game-based competitions that make you learn more about AI? In fact, how could you design game-based competitions that are easy and fun to participate in yet are scientifically meaningful? And how could you write up the results of these competitions so that they add to the sum of scientific literature, while giving everybody who worked on the project due credit? Because running, and participating in, game-based AI competitions is a form of AI research too. It's just one we have not discussed very much yet from a methodological or epistemological standpoint, or even from a standpoint of giving researchers credit for their work.

I explore these issues in a new article in TCIAIG; the article is available here (or use this direct link if you don't have access to IEEE Xplore). The article is based on my tutorial at CIG 2013, and the comments I received afterwards; this tutorial is in turn based on my experience of running a number of these competitions.

Also, the article can be seen as a travesty of a particular "literary" genre. See if you can find all the genre tropes!

Wednesday, October 22, 2014

New York, New York

In January, I will move to New York City and take up a position as Associate Professor at New York University (to be more specific in the Department of Computer Science and Engineering at the Polytechnic School of Engineering).

I am extremely excited about this. I'll be joining a growing games programme at NYU, and will be affiliated with the Media and Games Network (MAGNET) and the NYU Game Center. My new department published a short news story about my arrival, which emphasises my work in procedural content generation and interdisciplinary collaborations. This is also my plan - continue my current main research directions (PCG, general video game playing, player modelling, game generation) while also starting new collaborative projects with the excellent people at NYU.

My five and a half years at ITU have been excellent and I've learned a lot from my amazing colleagues and developed as a researcher. I'd like to point out that I'm not moving in order to leave ITU. I'm moving because it is time to move on and seek new challenges and opportunities, and because of all the great people I will work with here and the great city I will be living in.

Maybe you want to come here as well? I am looking for two new PhD students, to start in September. You need to apply soon. And you should definitely mail me first to discuss research proposal ideas. (Hint: have a look at some of my recent papers to see what my current research interests are; these include general video game playing/generation and mixed-initiative PCG.) More information about the procedure can be found here.

Friday, October 10, 2014

Why academics and game industry don't collaborate on AI, and how we could improve the situation

In academia, there is a research community focused on artificial intelligence (and computational intelligence, which is mostly the same thing) in games. Within the games industry, there are job titles such as AI programmer and game components known as AI, and there is also a community of people self-identifying as game AI experts. Each community has its own conferences, publications and associations: the CIG and AIIDE conferences, the TCIAIG journal and the Games Technical Committee on the academic side, and the GDC AI Summit, Game AI Conference, AIGameDev website and AI Game Programmers' Guild on the industry side.

However, there is surprisingly little collaboration between these communities. Instead, there is a large degree of mutual ignorance and distrust. There are a number of shining examples of crossover of industry knowledge into academia or vice versa (these are the subject of a different blog post) but these are unfortunately exceptions to the rule. Every now and then, someone comes along and points out the lack of collaboration, and vigorous discussion ensues. Every time this discussion starts, many people who have not thought about the situation before are honestly surprised (they might not even have known that there was game AI research in academia or game AI R&D in industry). I've seen this debate playing out on repeat dozens of times by now, so I thought I'd share my thoughts about it in a compressed form for future reference.

First, here is an incomplete list of reasons why academics and game developers who work on AI don't collaborate, and often don't even understand each other.
  • Commercial game AI and academic game AI only partly overlap. There are problems that are studied in both communities, but there are also technical problems in games that academics consider too uninteresting to count as AI, academic problems that are too abstract or weird to interest industry, and academic methods that won't work in real games.
  • Some (many?) academics are not really interested in doing work which is useful in games, just in using games as testbeds.
  • Some academics want to do work that is useful in games but do not know the demands and constraints of game design and development.
  • Some academics want to do work that is useful in games but do not have access to code and/or data to do that. Industry generally won't make code and data, or even APIs, available because of the need to protect intellectual property.
  • Some academics want to do work that is useful in games, but target the wrong type of games. In particular, there is the perception that only AAA games in well-established genres like FPS, RPG and RTS are worth targeting, while these games are often designed so as not to need much (interesting) AI. Why work on solving AI problems in a game which was designed not to need AI? Instead, the lust and need for new AI techniques is often stronger in the indie scene and emerging game genres.
  • Some developers in industry harbour a dislike or distrust of academic research in general. This could be because the game industry tends to see itself as an entertainment industry rather than a tech industry. It could also be because certain developers were very dissatisfied with their own university educations.
  • Some (hopefully very few) academics have a superiority complex and think their fancy degrees entitle them to talk down to the unwashed masses.
  • Some (hopefully very few) developers in industry have a superiority complex and think the fact that their software sells entitles them to tell others how Things Really Are.
  • There is little or no funding available for game-applicable AI research from government funding agencies, because games are not seen as important enough when you could instead do research on curing cancer or stopping terrorists.
  • There is no funding available for game-applicable AI research from companies, because game development companies don't support research, unlike companies in many other sectors. This could be because of the very short time horizon of game development companies. A car company or insurance company will have a 20-year strategic plan, a game company most likely won't.
  • The academic incentive system - publish, get cited, get grants - does not reward doing the "last mile" of converting your research prototype into something that could be plugged into a game engine.
  • The game industry's incentive system - make sure the next game sells or you're out of a job - does not reward trying things that might not work.
  • We simply don't go to the same places, because GDC is too expensive for academics and does not have any publication options, and academic conferences are deemed irrelevant by industry.
Perhaps most of these reasons point to an underlying difference in motivation. Researchers want to do crazy long-term stuff, developers want to do safe short-term stuff. I'm certainly of the crazy persuasion myself. But this shouldn't stop us from collaborating - rather, this complementarity is a fantastic opportunity. (Indeed, some of my own best research work has been done in collaboration with a more down-to-earth personality who tempers my wild flights of fancy.) We just need to find ways of collaborating, and stop blaming each other.

I'm convinced the intersection of the interests of academic researchers and industrial developers is large enough to permit much more collaboration than we see right now. So how could we improve this situation? Here are some suggestions, again an incomplete list:
  • Respect each other. Don't slander each other.
  • Be curious about each other's work, both in terms of contribution and motivation.
  • Write blog posts (not just academic papers written in hard-to-penetrate academese) and post videos explaining your work. Goes for both academics and developers, maybe particularly for academics.
  • Find ways of making datasets and source code from commercial games available, perhaps with an NDA; alternatively ship games with an API to interface AI.
  • Make it easier to publish academic work based on collaboration with game developers. Special sessions of conferences and special issues in journals.
  • Reward developers for going to academic conferences.
  • Make it possible to publish academic papers at industry conferences, alternatively just make it cheaper to attend for academics.
  • Place more value on published (or otherwise publicly released) games and playable experiences in academics' CVs.
  • Fund applied research into game AI.
  • (Ok, I'll stop joking. Not very likely that this will happen. We'll make this work without funding because we love games and AI.)
It is also worth pointing out that there is more than one way for an academic to make an impact on games. Some different modes of working - and I don't have to remind you that this is an incomplete list and that the different modes are certainly not mutually exclusive - are:
  • Read up on what industry game AI developers are interested in and which problems they find hard to tackle. This can be hard, as the information is typically not available in the form of academic papers. The AIGameDev website is a good starting point, as is the AI Game Programming Wisdom book series. Stalking talkative developers on Twitter might also help.
  • Approach game developers and ask if you can work together with them.
  • Send your students on internships in game companies, with the objective of doing some project both you and the company find interesting. This has worked well for me in the past.
  • Use benchmarks that are based on industrially relevant games. Several of the competitions that run each year at CIG and AIIDE - in particular the BotPrize and the StarCraft competition, but to some extent also Pac-Man and Mario AI (especially if you're not just focusing on performance but on other aspects of gameplay or content generation) - are relevant here.
  • Make your own games. You don't have to work with the game industry in order to make an impact on games. In fact, you don't even have to work on the problems the industry define for you. If you are the kind of researcher who likes to come up with crazy new algorithms and then invent the problem they might solve, go one step further and build a whole game around it. Or many games. Release them out in the wild, let people play them even if they are not perfect (they will not be). They serve to prove a point, that this is AI which is relevant for games, even if it's not what the industry calls "Game AI". Lead, don't follow. Redefine what games are, and what AI is. Who's stopping you? I think this kind of research is the one which is most likely to end up being really important.
  • Participate in game jams. This is an excellent way to get started with getting your hands dirty and actually making games. At least, that's what I hear, I never have time to participate in game jams because I'm busy arguing with people on the Internet.
In short, there are so many cool things to do that we should all stop complaining about our various differences and just get on with doing great work. So what are you waiting for? What am I waiting for? I'll stop writing now.
Acknowledgements: this post started as a reply to a mail by Robert Zubek, which was a reaction to an ongoing discussion on the AI Game Programmers' Guild's mailing list, which in turn partly continues the Twitter debate following Squirrel Eiserloh's recent keynote at AIIDE. My views have been shaped by ongoing discussions about this topic with more people than I can remember on various mailing lists and at conferences. People whose opinions on this topic have illuminated my own recently include Dave Mark, Alex Champandard, Tommy Thompson, Mike Cook and Luke Dickens.

Friday, July 25, 2014

A Panorama of Artificial and Computational Intelligence in Games

Georgios Yannakakis and I are proud to present our new survey paper called "A Panorama of Artificial and Computational Intelligence in Games", to be published in IEEE TCIAIG.

It is available from IEEE Xplore here; if you don't have access to Xplore, there's a free pdf available here.

This is a very ambitious paper, on which we have worked on and off for two years. The aim is nothing less than to survey the active areas of research and application of computational intelligence (CI) and artificial intelligence (AI) in games, and to show how these areas inform each other.

To this end, we identify 10 key areas of research and application, and more than 50 ways in which these areas interact. So if you ever wanted to know how NPC learning affects procedural content generation, where player modelling can be used in commercial games or how game-based AI competitions inform planning research, these are exactly the kind of questions we seek to answer here.

Furthermore, we try to identify which of these research areas inform technology development for which part of the player-game interaction loop, and which key AI and machine learning algorithms are used in those areas. Yes, it is rather complex, though we have attempted to make it more like a readable paper and less like a phone directory or circuit diagram.

While the article is not really meant as an introduction to research in game AI, it might be used as supplementary reading in courses on AI/CI and games. However, its main purpose is to help you, as a researcher or practitioner in this field or some neighbouring field, structure and understand this rapidly evolving research field. As part of this mapping operation, we have also identified numerous underexplored or unexplored connections, so there are a number of suggestions for promising future research projects within the paper.

The article builds partly on the Dagstuhl seminar in 2012, where around 40 of the world's leading experts on CI and AI in games assembled to identify the key areas and problems of this research field. It also builds on our own experience of working in the field for around a decade, including running several workshops and conferences. However, despite our best attempts at being inclusive and adopting a panoramic perspective (meaning that we try to see everything), the article will of course be coloured by our own perspective. That's why we are so eager to hear your comments on the paper - we sincerely hope this paper can catalyse a discussion within the community on what exactly it is we are doing!

We will give a tutorial based on this paper at CIG 2014 - of course, we hope as many as possible of you can make it there.

Now go on and read the paper - we look forward to your feedback!

Wednesday, July 23, 2014

MCTS for arcade games

This blog post contains entertaining videos of an AI agent playing Super Mario Bros, making a series of interesting mistakes but ultimately playing quite well (after several modifications). But before watching those, I want you to read about the algorithm that plays the game.

Monte Carlo Tree Search (MCTS) is the new black in game AI. At least for board games and similar games, including many types of card games and puzzles. The algorithm has only been around since 2006, but has already become the method of choice for Go (the Asian board game) and "general game playing" (developing AI that can play any of a wide range of games). People had been working on AI for Go for decades without producing anything that played better than a human beginner, but after MCTS was invented playing strength increased drastically to the point where computers can now compete with human club players. All the best Go-playing programs use MCTS. For the General Game Playing Competition, and the General Video Game Playing Competition, agents based on MCTS are pretty much the only ones that work at all.

At its core, MCTS is a tree search algorithm. Just like Minimax, the simple algorithm that is used for playing games such as Chess and Checkers among many others, it looks at the possible moves that can be made from a particular (board) state, and selects the best move based on which states these moves result in and which moves can be taken from them. There are two big differences with Minimax, however. The first is that while Minimax explores all moves equally, MCTS is prejudiced and explores promising branches of the tree much deeper than others. For this, it uses the UCB equation, which determines which nodes are underexplored and should be explored further. The second difference is that while Minimax needs an evaluation function that assigns a "goodness" value to a game state (how close that game state is to winning), MCTS evaluates a state through playing a number of games until the end by taking random actions. Yes, random; it sounds crazy but it works. Because MCTS does not need an evaluation function, it works well for games where it is very hard to effectively evaluate the quality of a game state (such as Go), and because MCTS builds an unbalanced tree, it works for games that have very high branching factors (such as Go).
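For concreteness, here is a minimal sketch of that selection step in Python. The node representation (a dict holding a total reward and a visit count) is my own illustration, not code from any particular Go engine or competition framework:

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for a child node: average reward (exploitation) plus an
    exploration bonus that grows for underexplored children.
    Unvisited children score infinity, so each gets tried at least once."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """The MCTS selection phase descends the tree by repeatedly picking
    the child with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch["reward"], ch["visits"], parent_visits))
```

The exploration constant `c` trades off exploiting known-good branches against exploring neglected ones; sqrt(2) is the textbook default, but it is typically tuned per game.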

Are you still with me? Good. The above was of course a gross simplification. For the full story of MCTS, including pseudocode, history and several common improvements, see this excellent survey paper. Now back to the story.

So, MCTS does really well for Go and for a number of other board game-like games. What other games would it work well on? A hypothesis of mine was that for MCTS to work well, the game needed to be:
1. Turn-based, or close to turn-based. Relatively few actions should be taken in a given time period, and each action should have a measurable and frequently meaningful impact on the game state.
2. Turn-limited. After playing a certain number of turns, even if just taking random actions, one should be guaranteed to arrive at an end state (win, loss or draw).
The reasoning is that if the game is not guaranteed to finish in a limited time, the playouts might go on forever; also, if the game is not turn-based, each action might mean too little for the value of each node to be estimated. Go, the main success story of MCTS, fulfils both of these criteria.

Most arcade games, and other action games, fulfil none of these criteria. Typically, you have high frame rates (meaning that each action only changes the state a little bit) and considerable freedom of movement (so that you're not pushed towards an inevitable end of the game). So I thought it would be interesting to investigate whether MCTS would work for such a game, in what ways it would fail if it didn't work, and how we could modify the algorithm to make it work. But then, someone would need to actually do this research. Luckily, two very talented students, Emil Juul Jacobsen and Rasmus Greve, walked into my office one day and inquired about doing a bachelor's thesis. So we had a deal. In what follows, Emil and Rasmus did all the hard work, whereas I was mostly waving my hands around and saying encouraging things ("supervising").

What follows below is mostly a concise summary of our recent paper.

First, we decided on a game. We chose the Mario AI Benchmark, which is a hacked version of a clone of Super Mario Bros. (You can find the old version here, the new version here and an overview paper here.) The version of the Mario AI Benchmark we used has rather simple levels, and it has previously been shown by Baumgarten that A* search in state space can play these levels very well. (In fact, our MCTS implementation actually reuses the forward model from Baumgarten's A* implementation.)

Our first, naive attempts at playing Mario with MCTS met with limited success. We had to make a couple of modifications to the algorithm to make it work at all. As it is very rarely possible to "roll out" (play randomly) until the end of the level, we had to cap the rollouts to a very low number of actions (e.g. 6), and if Mario wasn't either dead or winning by then we returned how far Mario had travelled to the right as a state evaluation. This gave us rudimentary gameplay ability. After removing the possibility to press the "down" button (or rather, to issue the "down" command) performance improved somewhat, but we still see a shortsighted and somewhat cowardly Mario in the video below.
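The capped-rollout evaluation can be sketched as follows. The state object here, with its `copy`, `is_win`, `is_dead`, `advance` and `mario_x` methods, is a hypothetical stand-in for the benchmark's actual forward model:

```python
import random

ROLLOUT_CAP = 6  # play at most this many random actions per rollout

def evaluate_state(state, actions, level_length):
    """Capped rollout: take up to ROLLOUT_CAP random actions; if the game
    hasn't ended by then, score the state by Mario's normalised
    rightward progress instead of a win/loss outcome."""
    s = state.copy()
    for _ in range(ROLLOUT_CAP):
        if s.is_win():
            return 1.0
        if s.is_dead():
            return 0.0
        s.advance(random.choice(actions))
    return s.mario_x() / level_length
```

This is what replaces the "play randomly until the end" step of standard MCTS when full playouts are infeasible.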

Indeed, not very impressive. Note the red line in front of Mario, which shows the best path found in each time step; it is not very long. The short duration of each action and the constraints of real-time control conspire to limit the maximum depth of the tree that can be explored. That explains the shortsightedness. But what about the cowardice, with Mario hesitating in front of pipes and cannons and not having the courage to jump? (An earlier version of the agent had him hesitating in front of holes too.) We hypothesised that this is because the average outcome of jumping (falling into a hole, or getting hurt by a flower) is worse than the average outcome of doing nothing, even though the best-case outcome is considerably better.

Now we had our baseline result. We decided to see whether we could improve on it, by modifying MCTS to perform better on this type of task. Below, I describe the five modifications we implemented and evaluated.

Mixmax rewards tries to address the issue of cowardice. The basic idea is very simple: in addition to backing up the average value of all children of a node, we also back up the maximum value of any child of that node. The actual value of the node is some combination of the average and the max. A bit of empirical testing showed that around 20% max was a good number. The Mixmax Mario player below shows a somewhat more assertive behaviour, and also scores better than the baseline in systematic evaluations.
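The mixmax blend itself is a one-liner; the 20% weight is the value mentioned above, and representing a node's statistics as a list of observed rewards is just for illustration:

```python
MIXMAX = 0.2  # weight on the best observed outcome (~20% worked well)

def mixmax_value(rewards):
    """Blend the average and the maximum of a node's observed rewards.

    Plain averaging makes risky-but-rewarding moves (like jumping a gap)
    look bad, because most random playouts end in disaster; mixing in
    the best-case outcome restores some optimism."""
    avg = sum(rewards) / len(rewards)
    return MIXMAX * max(rewards) + (1 - MIXMAX) * avg
```

With rewards [0.0, 1.0], pure averaging gives 0.5, while mixmax gives 0.6: the occasional successful jump pulls the node's value up.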

The next modification is Macro actions. This is a rather crude attempt to counteract the shortsightedness of Mario by making each move last longer. Each time an action is explored, the same action is taken for a number (e.g. 3 or 5) of time steps. The idea of Macro-actions is certainly not new; it has been explored before in the context of the Physical Travelling Salesperson Problem. In our case, an added feature is that the search reverts to a single time step when Mario is close to an enemy. This yields a player who sees further into the future than standard MCTS but does not play much better.
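A minimal sketch of the macro-action idea; the `near_enemy` predicate and the state's `advance` method are hypothetical stand-ins for the benchmark's actual interface:

```python
def apply_macro_action(state, action, near_enemy, macro_length=3):
    """Repeat the chosen action for macro_length frames so that each
    tree edge covers more game time, but revert to a single frame when
    an enemy is close and fine-grained control is needed.
    Returns the number of frames actually advanced."""
    repeats = 1 if near_enemy(state) else macro_length
    for _ in range(repeats):
        state.advance(action)
    return repeats
```

The effect is that a tree of the same depth looks further into the future, at the cost of coarser control.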

Partial expansion is another attempt to remedy Mario's shortsightedness. Here, we've modified the UCT formula that underlies the MCTS algorithm to make it possible to explore grandchildren of a node before having explored all children of that node. This allows the algorithm to explore deeper but sparser trees, and works relatively well in practice.

The order in which actions are explored can be quite important in MCTS. Ideally, you want the best actions explored first - but of course, you don't know which those are. There have been several attempts at making more intelligent action ordering in the literature. We call our solution Roulette wheel selection; we simply estimate the average value of each action (there are only 16 possible button combinations) and then use the classic roulette wheel selection operator from genetic algorithms to probabilistically select which action to explore each time we expand a node.
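Roulette wheel selection itself is the standard fitness-proportionate operator from genetic algorithms; here is a sketch, assuming non-negative value estimates, one per action:

```python
import random

def roulette_select(action_values):
    """Pick an action index with probability proportional to its
    estimated value (classic roulette wheel selection). With no value
    information yet, fall back to a uniform choice."""
    total = sum(action_values)
    if total <= 0:
        return random.randrange(len(action_values))
    r = random.uniform(0, total)
    cumulative = 0.0
    for i, v in enumerate(action_values):
        cumulative += v
        if r <= cumulative:
            return i
    return len(action_values) - 1  # guard against floating-point drift
```

In the Mario agent the 16 button combinations play the role of the actions; promising combinations get expanded first without starving the rest.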

Finally, we experimented with throwing in a little bit of domain knowledge. Hole detection is a very simple modification of the reward scheme, where we simply give a negative reward for ever being inside a hole, rather than only for reaching the bottom of a hole. As you can see, this allows Mario to avoid holes better.

We then evaluated all possible combinations of these various improvements. It turns out that there are several combinations that perform indistinguishably from A*, and thus seem to play optimally (at least on these simple linear levels). In general, all modifications work with each other most of the time except macro actions, which are frequently detrimental. Below you can see an example of gameplay with all modifications except macro actions turned on.

Pretty neat, eh?

It even works well if you remove the domain knowledge.

So, it seems that we've cracked the problem of how to make MCTS play Super Mario Bros. A couple of rather simple modifications to the basic MCTS algorithm takes it from very bad performance to really rather good performance. We believe most of these modifications will translate well to other game domains, and might therefore be useful to anyone interested in applying MCTS to games that are more like Super Mario Bros than like Go.

There's only one tiny fly in the ointment though: A* can already play Super Mario Bros just as well. So what's the point in attempting something fancy like MCTS when you can just use the trusty old best-first search algorithm?

Here's why. A* frequently produces solutions that seem extremely brittle, relying on everything going exactly as planned. So we decided to see what happens if we changed the benchmark a bit, so that things sometimes worked out differently than planned. We simply added 20% action noise, so that there was a 20% chance that, for any given action, a random action would be taken instead of the one the agent chose. This resulted in complete collapse for A*; performance degraded by around 2/3. MCTS, on the other hand, held up relatively well, losing some performance but playing considerably better than A* under noisy conditions.
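The noise injection is just a thin wrapper around action execution; the function and argument names here are my own, not the benchmark's:

```python
import random

NOISE = 0.2  # probability that the agent's chosen action is overridden

def noisy_action(chosen, all_actions):
    """With probability NOISE, discard the agent's chosen action and
    execute a uniformly random one instead. This is the perturbation
    used to compare the robustness of A* and MCTS."""
    if random.random() < NOISE:
        return random.choice(all_actions)
    return chosen
```

Because A* commits to a single long plan, one overridden action can invalidate everything downstream; MCTS replans every frame, so it absorbs the perturbation.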

So MCTS can indeed be useful for arcade games, as it produces gameplay that is not only strong but also robust.

The above was of course a simplified account of what we did, missing lots of details and results. Read the paper, published in the proceedings of GECCO 2014, to find out more!

Saturday, May 17, 2014

I am a game interface conservative

On occasion of Microsoft de-bundling the Kinect with Xbox One, here's my conservative stance on game interfaces. The following are some human-game interface gimmicks I've never expected to outlive the initial buzz, and to have any kind of lasting impact:
  • The Wiimote
  • Kinect
  • Location-based gaming (GPS as input)
  • 3D screens
  • Voice input
  • VR goggles/helmets like the Oculus Rift
  • Tilt-based controls
  • EEG (electroencephalography)
  • Eye tracking
  • Skin resistance / blood volume pulse sensors
  • Regular camera inputs (e.g. PlayStation Eye)
  • Toy guns (e.g. NES Zapper)
Obviously, the Wiimote is very much in use in millions of homes. But that's because it's the default input for a system (the Wii) that has some very good games people want to play. Most of these games don't actually use the waving-your-hands-around control mode very much.

As for the rest of the list, these gimmicks either have failed, are failing, were never seriously introduced, or play a role only for some minor niche audience. They are not gaining lasting mainstream success as input/output modalities for games.

It's easy to explain why. That's because they all make playing games harder, not easier. With all of these gimmicks, telling the game what you want to do is harder than with a keyboard/mouse/gamepad, and understanding what the game is telling you is harder than with a regular screen/TV. Fingers pressing buttons and eyes scanning a flat screen are just very, very effective at conveying information back and forth. And they do not require the player to jump around, wave their arms around, strap themselves with weird devices, wake up the neighbours, rearrange their living rooms or throw up from simulator sickness.

In a sense I'm happy that these gimmicks are failing. Not because I'm in principle against multimodal interfaces. There's lots of interesting research to be done here, and many of these technologies (Kinect, skin resistance sensors, eye tracking) have proven very useful as research tools. But when it comes to consumer technologies, I think it's important that we come up with tech that works better than what we already have before it's widely adopted. Or more likely: come up with an interesting case of something that could not be done with the "old" interfaces. If Kinect were to "succeed" it would actually be a step backwards for game interfaces: players would be using a worse interface to play essentially the same games in the same way.

Now, obviously, I'm not against research and innovation. Quite the opposite. But imagine just for a second that the same sort of resources that go into game interface research, development and marketing would go into game AI research, development and marketing. Where would we be then?

Tuesday, April 22, 2014

Metablogging my research

When I started this blog in 2006, it was with the intention that I should write short blog posts explaining my own research in a more accessible way. It appears, however, that I haven't found the time to do much of that recently. (I was better at doing so in the first few years of the blog's existence - see some examples here, here and here.) The few blog posts I've written in recent years tend to be more general discussions about things such as human intelligence, conference acceptance rates or why I work so much.

While I still intend to write more blog posts explaining my research in more accessible terms, I am not saying I will get around to actually doing so anytime soon. However, some other researchers have been kind enough to write accessibly about research done in my group. So instead of a blog post about my research, this is a blog post about blog posts about my research. Here are some picks - if you have blogged about my research (or research I supervised) and want to be included in this list, tell me and I'll add it!

First we have Mike Cook at Goldsmiths, University of London. He's written several excellent posts on my team's work over at his own project blog for his Angelina project. For example, see this post on Antonis' work with Sentient Sketchbook, or this post on the Ropossum generator for Cut the Rope levels or maybe this post on the Procedural Procedural Level Generator Generator.

Tommy Thompson at University of Derby has produced a number of very well-written posts on his blog about games and AI recently, partly for the benefit of his students and partly for the general public. This post on PCG and the Mario AI Competition is a very good example.

Alex Champandard, who runs a very useful resource for industry-style game AI, did an interview with me a long time ago, soon after I finished my PhD. Re-reading my answers, I guess I was fairly naive back then. I probably still am.

There have also been a number of articles about my work in New Scientist, one of the more well-known and respected popular science magazines. These articles are generally written by competent and well-informed professional science journalists (unlike some of the coverage of my work in the mainstream press, which I won't link to here, as it is more likely to confuse than enlighten). Other good, short pieces (blog posts rather than articles, though it's hard to distinguish the two) from quality news outlets are the following from Kotaku, Wired and the Guardian. In addition to such secondary sources, there seem to be dozens of "tertiary" blog posts that are essentially just reformulations of some of the posts and articles I've linked to above - the Mario AI competition in particular is the focus of most of them.

I'll finish with the following post, which was written in 2007 and is a rather acidic attempt to ridicule some of the earliest work I did in my PhD. Apparently, I took this seriously enough back then to write an elaborate response post. I think I would have been much more comfortable with that kind of criticism these days, now that I have established a position in the research community. However, as it's safer to criticise a PhD student than a respected researcher, nobody criticises me anymore. Which is a shame: there must be some people out there who think that what I do is completely misguided, and constructive, well-informed criticism would be very welcome. I've even tried actively seeking critical views (and, of course, promoting myself) by posting stories about my work on Slashdot (for example here, here and here). Well, it turns out you do get critical comments, just not always well-informed ones...

So perhaps the conclusion is that while promoting your own and others' work (and mine!) to other researchers, students and the general public is all great, it's not necessarily the best way of finding out what people you respect really think about your work. For that, you have to go to a conference and get everybody really drunk.

Thursday, February 06, 2014

The Curse of Increasing Marginal Work Utility, or Why I Work So Much

Academics constantly complain about being overworked, yet they also praise their freedom to decide how and when to work as one of the great advantages of their job. This seems contradictory. If they have such freedom, why are they so overworked?

Part of the answer is of course that you are never really done. You can always do better. Write more and better papers, get better research results and deliver better teaching. But this is true for most white-collar jobs these days - you could always work more on your next budget, product description, market report, technical drawing or whatever it is you do. If you are good at what you do, it should be enough to work normal hours, go home, get some good rest and return refreshed the next day to do more work. Few successful academics do this. There seems to be some extra factor affecting academics, which explains why they always seem stressed out - and why you very rarely see successful academics below the age of 50 working reasonable hours.

That extra factor is increasing marginal work utility. Simply put, the more you work, the more you get out of each extra work hour. This gives you a very strong incentive to work as much as possible, which is clearly not a very good idea if you also want to live your life. Let me explain.

We begin by imagining I work the statutory work week of around 40 hours. If I work efficiently, I will then have time to prepare and conduct my teaching, supervise my masters students, do my administrative duties (including answering lots of mails), review papers for some conferences and devote the minimum acceptable time to supervising my PhD students. These are all tasks that I must do - not doing them would mean that I am simply not doing my job. This amount of work is clearly not enough, as I do not get time to do any research - though I could of course try to take credit for the work of PhD students that I don't supervise enough.

If I put in some more time, I have time to do some of the research where I am not in a "driving" role. I could participate some more in what my PhD students are doing, help my masters students write up papers based on their theses (which I have supervised), do my part in joint research projects with people in other institutions, and serve on organising committees of conferences. Also, write parts of collaborative grant proposals. These are activities where I am mostly fulfilling obligations to other people, though they are also activities where I get relatively high visibility for limited effort. But none of it means doing any research of my own.

So let's say I put in even more time. I can now advertise some of the research I (and my collaborators) have been doing and help shape the directions of my own and others' research, for example by going to give invited talks, or write parts of survey articles. I can start new initiatives, e.g. competitions, discussion groups or grant proposals. These are things that make a lot of sense to do and I enjoy doing them, but it is still not the same as actually doing my own research.

Finally, when I'm done with all of the above and there are no urgent mails awaiting reply in my inbox, I could sit down with my own idea, write some code of my own, run my own experiments and write a paper with myself as first author. In other words, do some research of my own. This rarely happens, as I rarely get to this point and when I do I rarely have any energy left.

The utility of that sixtieth work hour is so much higher than the tenth or twentieth because I can use it to do my own research. If I work 60 hours rather than 40, I don't get 50% more of my own research done, but almost infinitely more, as I would otherwise get almost none of my own research done. Given that I am interested in doing research of my own, there is a very strong incentive to work very long hours. It is not that I am uninterested in any of the other parts of my job - I enjoy all of them, except grant writing and meetings with management - but I am more interested in research.

You could compare the job of an academic to having a teaching and administration job and having research as a hobby. Except that the "hobby" is the presumed core of the job as advertised, and the reason you applied for the job.

An interesting detail is that it wasn't always like this. Back when I was a PhD student, and to a large extent still when I was a postdoc, my marginal work utility was basically flat as I spent most of my time on my own research. But as I was good at research I got promoted to a position where I had to spend most of my time doing something else. (Incidentally, I have seen quite a few excellent researchers who are less than excellent at teaching and administration.)

Finally, let me point out that I am not complaining, just explaining. And I am certainly not blaming anyone, especially not my wonderful colleagues and students. After all, I love my job and I would not trade it for any other job. I just hope to have made the logic clear behind why I, and probably many others like me, work such long hours.

Tuesday, December 17, 2013

Against "selective" conferences

Computer science differs from most other academic fields in that conference papers are counted as real, citable publications. While journals are generally seen as more important, it is perfectly possible to get a tenured faculty position without ever publishing a journal paper.

This is mainly a good thing. The relatively short time from initial submission to publication (compared to traditional journals) makes sure that research gets published in a relatively timely manner. The deadlines make sure that people get their act together and get the paper submitted. Not all computer science papers are super-polished, but who cares? It's more important that people get their ideas and results out there for others to build on.

However, it has also had the result that many computer science conferences are hard to get into. In computational intelligence it's pretty common to accept about half of the submitted papers. Other fields are much harsher. In American academia, it is common to require that a conference accepts at most 30% of submitted papers in order to be counted as "selective". Many conferences are even stricter than that, with acceptance rates in the 10% range.

Why are acceptance rates so low? The often stated reason is that conference attendees don't have enough time to see lots of bad research being presented, and therefore only the best papers should be presented. However, this assumes that all attendees see all talks. If there are many talks (or posters) in parallel at a conference, people can choose which talk/poster they want to see. This is after all how it works in fields such as medicine and physics, where conference presentations are based on abstracts, and all or most abstracts are accepted.

The real reason is that conferences want to be prestigious through being exclusive. If a conference only accepts 13% of submitted papers, getting in is seen as an accomplishment for the author, something to put in your CV to impress various committees with. Paper acceptance becomes instant gratification for the scientist. For conferences, being an exclusive conference means that you get more submissions and more attendees, and as every organisation wants to perpetuate itself there is an incentive for conferences to be exclusive.

So what is the problem with this? There are several problems. To begin with, peer review is notoriously noisy. Most conferences have three or four reviewers per paper. It is very often the case that reviewers disagree with each other. One reviewer might think a paper is excellent, another thinks it is boring and derivative, a third thinks it is off-topic, and a fourth does not understand the paper and therefore thinks it's badly written and/or wrong. How do you make a decision based on this? Being the program chair of a conference means making decisions based on conflicting reviewer judgements, knowing that your decisions will often be wrong. If you have a low acceptance rate, it is extremely probable that you will reject papers with one or two negative reviews, thereby rejecting many good papers.

Rejecting good papers is bad, because you are holding up science. Good papers, whose research should be out there for the world to see, don't get published. Authors don't get feedback, and get dissuaded from doing further research.

Why are reviewer opinions so conflicting? Part of the reason is certainly that there are few incentives to do a good job when reviewing papers, or even to review papers at all, so why bother? But more fundamentally, it is not always possible to tell the good papers from the bad ones. It is often as hard to spot greatness as it is to spot errors and misconduct. Many groundbreaking papers initially got rejected. There are probably many other groundbreaking results that the world doesn't know about, because the papers were never published.

If conferences are selective, reviewers will become nasty. It is a well-known fact that reviewers in many parts of computer science are more negative than in other sciences. This is probably because they might themselves submit to the same conference, or have submitted in the past and gotten rejected, and they don't want the paper they are reviewing to be treated better than their own paper was or will be. This breeds a culture of nastiness.

People respond to incentives. With selective conferences, researchers will start writing papers to maximise the likelihood of acceptance, and start doing research that can be written up in such papers. This is a disastrous consequence, because the easiest way to get into a selective conference is to write a paper which makes a small incremental advance, and which is not wrong in any way. The easiest part of a reviewer's job (certainly so for a nasty reviewer) is to find faults in the paper under review. I personally feel I have done a good job as a reviewer when I've found many faults in the paper I'm reviewing. Papers that make drastic claims or bold hypotheses are the easiest to shoot down. It is much harder to reject a paper because it is "not exciting". Thus selective conferences could end up enforcing the Law of Jante, even though that is nobody's intention.

I suspect that this effect is even greater for people entering some sub-field of computer science, without being hand-led by someone who is already an insider. A newcomer to the field does not know which buzzwords to use, important people to cite and theories to subscribe to, making it very easy to shoot down an uncomfortable paper.

To sum this up, selective conferences are bad for science because good papers get rejected, because they perpetuate the myth that we can accurately judge the quality of a paper before it is even published, because they breed a culture of nastiness, and because they can reward mediocre research rather than ground-breaking research.

I see no reason why we should go on playing this game. Instead, we should have inclusive conferences, which accept all papers that are good enough. This could be 10%, 50% or even 100% of them. Good enough could be defined as being on topic, substantially correct, intelligibly written and making some sort of contribution. This would of course mean that we can no longer judge a researcher based on what conferences he or she gets papers accepted in.

So, if we can't judge papers based on what conference they have been published in, how should we judge them? Well, there are two ways. The first is to actually read them. Shocking as this suggestion might seem, reading a paper is the only way to really know its value. Of course, it requires that you know the research field well enough to understand the paper, and that you are prepared to spend the time it takes to read it.

The other way is to wait a few years and see if the paper influences other people's research, and therefore gets cited. In addition to citation count, we could have some sort of voting mechanism, where attendees of a conference or members of a research community get to vote on the most important papers of conferences from three or five years before. The problem with this is of course that you have to wait a few years.

But there is not really any way around it, if you don't want to or can't read the papers. Research takes time, and hiring or tenure decisions should not be based on the sort of incomplete information you can get from low-quality metrics such as in which venue a paper was published.

Thursday, July 18, 2013

On best paper award selection through reviewer nomination followed by attendee vote

At the IEEE Conference on Computational Intelligence on Games, we decide on our annual best paper award through reviewer nomination followed by attendee vote. Concretely, this means that we select all papers with an exceptionally high average review score, and all papers with a high score where several reviewers have separately indicated that the paper should be nominated for a best paper award. The nominated papers (for CIG 2013, there are 9 nominated papers) are presented in one or two plenary sessions, and the conference attendees are asked to vote for the best paper award at the end of the presentations by way of secret ballot. The paper with the most votes wins.
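
The nomination rule described above can be sketched in a few lines of Python. To be clear, the thresholds, function names and example data here are all invented for illustration; the post does not state the actual score cutoffs CIG uses.

```python
# Hypothetical sketch of the nomination rule: a paper is nominated if its
# average review score is exceptionally high, or if its score is merely
# high but several reviewers independently ticked "nominate for best paper".
# All thresholds below are made up for illustration.

def nominate(papers, high_avg=9.0, good_avg=8.0, min_nominations=2):
    """Return titles of papers nominated for the best paper session."""
    nominated = []
    for paper in papers:
        avg = sum(paper["scores"]) / len(paper["scores"])
        if avg >= high_avg:
            # Exceptionally high average score: nominated outright.
            nominated.append(paper["title"])
        elif avg >= good_avg and paper["bp_nominations"] >= min_nominations:
            # High score plus several independent reviewer nominations.
            nominated.append(paper["title"])
    return nominated

papers = [
    {"title": "A", "scores": [9, 10, 9], "bp_nominations": 1},
    {"title": "B", "scores": [8, 8, 9], "bp_nominations": 2},
    {"title": "C", "scores": [8, 9, 8], "bp_nominations": 0},
]
print(nominate(papers))  # A qualifies on score alone, B via nominations
```

The point of the two-branch rule is that the chairs never hand-pick anything: the shortlist falls out mechanically from the review scores and nomination ticks, and the attendee vote does the rest.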

To paraphrase someone famous, I think that while this system for deciding a best paper award might not be perfect, it is better than all the other systems that have been tried. However, we recently received a mail questioning this policy, and suggesting that we instead select a best paper award by a special awards committee. I wrote a longish answer, which I'm reproducing below (slightly edited):

While some conferences decide awards in a small committee, other conferences (including some reputable ones like Gecco, Foundations of Digital Games and EuroGP) decide on their awards like us, through reviewer nomination and votes among conference attendees. There are several reasons for why one might want to do it this way:

* It makes the award more legitimate and avoids many potential conflicts of interest. If the award is decided on by a small committee, which is either secret or otherwise conducts its work in secret, suspicions can always arise about why the committee decided like it did. This is especially true for a relatively small community where the conference organisers know many of the attendees personally. (It is worth noting that the chairs of CIG have this year not selected any papers directly at all; the 9 nominees are selected purely based on a cutoff in terms of reviewer scores and number of best paper nominations.)

* It ensures that a larger number of people with different expertise get to weigh in on the award. This could be seen as ensuring quality through "wisdom of the crowds", or simply as ensuring that a set of experts whose fields of competence and interest reflect those of the conference attendees get to decide. In a small committee, some fields of expertise are bound to be missing.

* It engages the audience. Attendees are more likely to pay close attention in a session where every attendee is expected to provide feedback, especially if this feedback has real impact.

* It incentivises good presentations. Twenty minutes is enough to present the core ideas of any conference paper. However, many researchers do not put sufficient effort into preparing and revising their presentations, and as a result conferences are often filled with poor presentations. Knowing that getting the best paper award or not depends partly on how well you can bring your message across tends to have a great effect on presentation quality.

As a personal anecdote, earlier this year I attended the best paper session at EuroGP in Vienna. The winner was a paper that was complex, unintuitive and challenged core notions of how genetic programming works. The presenter had gone to great lengths to prepare a presentation that most of the audience actually understood - and walked off with a very well-deserved award. To me, that presentation was worth as much as the rest of the conference combined.