Monday, August 08, 2022

Apology for Video Games Research

I just finished reading this excellent history of early digital computing, disguised as a biography of computing researcher and visionary J. C. R. Licklider. One of the things that the book drove home was the pushback, skepticism, and even hostility you faced if you wanted to work on things such as interactive graphics, networking, or time-sharing in the early decades of digital computers. In the fifties, sixties, and even seventies, the mainstream opinion was that computers were equipment for serious data processing and nothing else. Computers should be relatively few (maybe one per company or department), manned by professional computer operators, and work on serious tasks such as payrolls, nuclear explosion simulations, or financial forecasting. Computing should happen in batch mode, and interactive interfaces and graphical output were frivolities and at best a distraction.

In such an environment, Licklider had the audacity to believe in a future of interconnected personal computers with interactive, easy-to-use graphical interfaces and fingertip access to the world's knowledge as well as to your friends and colleagues. He wrote about this in 1960. Through enthusiasm, smart maneuvering, and happenstance he got to lead his own research group on these topics. But more importantly, he became a program manager at the organization that would become DARPA, and not only directed tons of money into this vision of the future but also catalyzed the formation of a research community on interactive, networked computing. The impact was enormous. Indirectly, Licklider is one of the key people in creating the type of computing that permeates our entire society.

When I go out and talk about artificial intelligence and games, I often make the point that games were important to AI research since the very beginning. And that's true if we talk about classical board games such as Chess and Checkers. Turing, von Neumann, and McCarthy all worked on Chess, because it was seen as a task that required real intelligence to do well at. It was also easy to simulate, and perhaps most importantly, it was respectable. Important people had been playing Chess for more than a millennium, and talked about the intellectual challenges of the game. And so, Chess was important in AI research for 50 years or so, leading to lots of algorithmic innovations, until we sucked that game dry.


Video games are apparently a completely different matter. They are a new form of media, invented only in the seventies (if you don't count Spacewar! from 1962), and from the beginning associated with pale teenagers in their parents' basements and rowdy kids wasting time and money at arcade halls. Early video games had such simple graphics that you couldn't see what you were doing; later the graphics got better, and you could see that what you were doing was often shockingly violent (on the other hand, Chess is arguably a very low-fidelity representation of violence). Clearly, video games are not respectable.

I started doing research using video games as AI testbeds in 2004. The first paper from my PhD concerned using a weight-sharing neural architecture in a simple arcade game, and the second paper was about evolving neural networks to play a racing game. That paper ended up winning a best paper award at a large evolutionary computation conference. The reactions I got to this were... mixed. Many people felt that while my paper was fun, the award should have gone to "serious" research instead. Throughout the following years, I often encountered the explicit or implicit question about whether I was going to start doing serious research soon. Something more important, and respectable, than AI for video games. 

Gradually, as a healthy research community has formed around AI for video games, people have grudgingly had to admit that there might be something there after all. If nothing else, the game industry is economically important, and courses on games draw a lot of students. That DeepMind and OpenAI have (belatedly) started using games as testbeds has also helped with recognition. But still, I get asked what might happen if video games go away: will my research field disappear then? Maybe video games are just a fad? And if I want to do great things, why am I working on video games?

Dear reader, please imagine me not rolling my eyes at this point.


As you may imagine, during my career I've had to make the case for why video games research is worthwhile, important even, quite a few times. So here, I'll try to distill this into not-too-many words. And while I'm at it, I'd like to point out that the "apology" in the title of this text should be read more like Socrates' apology, as a forceful argument. I'm certainly not apologizing for engaging in video games research. For now, I will leave it unsaid whether I think anyone else ought to apologize for things they said about video games.

To begin with, video games are the dominant medium of the generation that is in school now. Video games, for them, are not just a separate activity but an integrated part of social life, where Minecraft, Roblox, and Fortnite are at once places to be, ways of communicating, and activities to do. Before that, two whole generations grew up playing video games to various extents. Now, studying the dominant medium of today to try to understand it better would seem to be a worthwhile endeavor. Luckily, video games are eminently studiable. Modern games log all kinds of data with their developers, and it is also very easy to change the game for different players, creating different "experimental conditions". So, a perfect setting for both quantitative and qualitative research into how people actually behave in virtual worlds. While this ubiquitous data collection certainly has some nefarious applications, it also makes behavioral sciences at scale possible in ways that were never possible before.

People who don't play games much tend to underestimate the variety of game themes and mechanics out there. There are platform games (like Super Mario Bros), first-person shooters (like Call of Duty) and casual puzzle games (like Candy Crush)... is there anything else? Yes. For example, there are various role-playing games, dating simulators, flight simulators, racing games, team-based tactics games, turn-based strategy games, collectible card games, games where you open boxes, arrange boxes, build things out of boxes, and there are of course boxing games. I'm not going to continue listing game genres here; you get the point. My guess is that the variety of activities you can undertake in video games is probably larger than it is in most people's lives.

To me, it sounds ridiculous to suggest that video games would some day "go away" because we got tired of them or something. But it is very possible that in a decade or two, we don't talk much about video games. Not because they will have become less popular, but because they will have suffused into everything else. The diversity of video games may be so great that it might make no sense to refer to them as a single concept (this may already be the case). Maybe all kinds of activities and items will come with a digitally simulated version, which will in some way be like video games. In either case, it will all in some ways have developed from design, technology, and conventions that already exist.

In general, it's true that video games are modeled on the "real world". Almost every video game includes activities or themes that are taken from, or at least inspired by, the physical world we interact with. But it's also increasingly true that the real world is modeled on video games. Generations of people have spent large amounts of their time in video games, and have learned and come to expect certain standards for interaction and information representation; it is no wonder that when we build new layers of our shared social and technical world, we use conventions and ideas from video games. This runs the gamut from "gamification", which in its simplest form is basically adding reward mechanics to everything, to ways of telling stories, controlling vehicles, displaying data, and teaching skills. So, understanding how video games work and how people live in them is increasingly relevant to understanding how people live in the world in general.


The world of tomorrow will build not only on the design and conventions of video games, but also on their technology. More and more things will happen in 3D worlds, including simulating and testing new designs and demonstrating new products to consumers. We will get used to interacting with washing machines, libraries, highway intersections, parks, cafés and so on in virtual form before we interact with them in the flesh, and sometimes before they exist in the physical world. This is also how we will be trained on new technology and procedures. By far the best technology for such simulations, with an unassailable lead because of their wide deployment, is game engines. Hence, contributing to technology for games means contributing to technology that will be ubiquitous soon.

Now, let's talk about AI again. I brand myself an "AI and games researcher", which is convenient because the AI people have a little box to put me in, with the understanding that this is not really part of mainstream AI. Instead, it's a somewhat niche application. In my mind, of course, video games are anything but niche to AI. Video games are fully-fledged environments, complete with rewards and similar incentives, where neural networks and their friends can learn to behave. Games are really unparalleled as AI problems/environments, because not only do we have so many different games that contain tasks that are relevant for humans, but these games are also designed to gradually teach humans to play them. If humans can learn, so should AI agents. Other advantages include fast simulation time, unified interfaces, and huge amounts of data from human players that can be learned from. You could even say that video games are all AI needs, assuming we go beyond the shockingly narrow list of games that are commonly used as testbeds and embrace the weird and wonderful world of video games in its remarkable diversity.

AI in video games is not only about playing them. Equally importantly, we can use AI to understand players and to learn to design games and the content inside them. Both of these applications of AI can improve video games, and the things that video games will evolve into. Generating new video game content may also be crucial to help develop AI agents with more general skills, and understanding players means understanding humans.


It is true that some people insist that AI should "move on" from games to "real" problems. However, as I've argued above, the real world is about to become more like video games, and build more on video game technology. The real world comes to video games as much as video games come to the real world.

After reading this far, you might understand why I found reading about Licklider's life so inspirational. He was living in the future, while surrounded by people who were either uninterested or dismissive, but luckily also by some who shared the vision. This was pretty much how I felt maybe 15 years ago. These days, I feel that I'm living in the present, with a vision that many younger researchers nod approvingly to. Unfortunately, many of those who hold power over research funding and appointments have not really gotten the message. Probably because they belong to the shrinking minority (in rich countries) who never play video games.

I'd like to prudently point out that I am not comparing myself with Licklider in terms of impact or intellect, though I would love to one day get there. But his example resonated with me. And since we're talking about Licklider, one of his main contributions was building a research community around interactive and networked computing using defense money. For people who work on video games research and are used to constantly disguising our projects as being about something else, it would be very nice to actually have access to funding. Following the reasoning above, I think it would be well-invested money. If you are reading this and are someone with power over funding decisions, please consider this a plea.

If you are a junior researcher interested in video games research and face the problem that people with power over your career don't believe in your field, you may want to send them this text. Maybe it'll win them over. Or maybe they'll think that I am a total crackpot and wonder how I ever got a faculty job at a prestigious university, which is good for you because you can blame me for the bad influence. I don't care, I have tenure. Finally, next time someone asks you why video games research is important, try turning it around. Video games are central to our future in so many ways, so if your research has no bearing on video games, how is your research relevant for the world of tomorrow?

Note: Throughout this text I have avoided using the term "metaverse" because I don't know what it means and neither do you.

Thanks to Aaron Dharna, Sam Earle, Mike Green, Ahmed Khalifa, Raz Saremi, and Graham Todd for feedback on a draft version of this post.

Friday, July 29, 2022

Brief statement of research vision

I thought I would try to very briefly state the research vision that has in some incarnation animated me since I started doing research almost twenty years ago. Obviously, this could take forever and hundreds of pages. But I had some good wine and need to go to bed soon, so I'll try to finish this and post before I fall asleep, thus keeping it short. No editing, just the raw thoughts. Max one page.

The objective is to create more general artificial intelligence. I'm not saying general intelligence, because I don't think truly general intelligence - the ability to solve any solvable task - could exist. I'm just saying considerably more general artificial intelligence than what we have now, in the sense that the same artificial system could do a large variety of different cognitive-seeming things.

The way to get there is to train sets of diverse-but-related agents in persistent generative virtual worlds. Training agents to play particular video games is all good, but we need more than one game, we need lots of different games with lots of different versions of each. Therefore, we need to generate these worlds, complete with rules and environments. This generative process needs to be sensitive to the capabilities and needs/interests of the agents, in the sense that it generates the content that will best help the agents to develop.

The agents will need to be trained over multiple timescales, both faster "individual" timescales and slower "evolutionary" timescales; perhaps we will need many more different timescales. Different learning algorithms might be deployed at different timescales, perhaps with gradient descent for the lifetime learning and evolution at longer timescales. The agents need to be diverse - without diversity we will collapse to learning a single thing - but they will also need to build on shared capabilities. A quality-diversity evolutionary process might provide the right framework for this.
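To make the two-timescale idea a little more concrete, here is a minimal sketch, and very much a toy rather than the system described above. It assumes that an "agent" is a single number, that each "task" is a target value, that lifetime learning is a few gradient steps on a squared error, and that evolution selects the starting point that learning begins from. All names (lifetime_learning, fitness, evolve) are made up for illustration, and the sketch leaves out diversity entirely; a quality-diversity method would keep many different good agents around rather than a single best one.

```python
import random


def lifetime_learning(x, target, steps=5, lr=0.1):
    """Fast timescale: a few gradient steps on the loss (x - target)^2."""
    for _ in range(steps):
        x -= lr * 2.0 * (x - target)
    return x


def fitness(genome, targets):
    """Score a genome by its average loss after learning on each task."""
    losses = [(lifetime_learning(genome, t) - t) ** 2 for t in targets]
    return -sum(losses) / len(losses)


def evolve(targets, generations=30, pop_size=20, sigma=0.5, seed=0):
    """Slow timescale: evolve the starting point that learning begins from."""
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, targets), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [p + rng.gauss(0.0, sigma) for p in parents]
        population = parents + children  # parents survive (elitism)
    return max(population, key=lambda g: fitness(g, targets))


targets = [1.0, 2.0, 3.0]  # three made-up tasks
best = evolve(targets)
print(best, fitness(best, targets))
```

In this toy setting the evolved starting point should drift toward the middle of the tasks, since that is where a short burst of learning gets closest to all of them.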

Of course, drawing a sharp line between agents and environments is arbitrary and probably a dead end at some point. In the natural world, the environment largely consists of other agents, or is created by other agents, of the same species or others. Therefore, the environment and rule generation processes should also be agential, and subject to the same constraints and rewards; ideally, there is no difference between "playing" agents and "generating" agents.

Human involvement could and probably should happen at any stage. This system should be able to identify challenges and deliver them to humans, for example to navigate around a particular obstacle, devise a problem that a particular agent can't solve, and things like that. These challenges could be delivered to humans at a massively distributed scale in a way that provides a game-like experience for human participants, allowing them to inject new ideas into the process where the process needs it most and "anchoring" the developing intelligence in human capabilities. The system might model humans' interests and skills to select the most appropriate human participants to present certain challenges to.

Basically, we are talking about a giant, extremely diverse video game-like virtual world with enormous agent diversity constantly creating itself in a process where algorithms collaborate with humans, creating the ferment from which more general intelligence can evolve. This is important because current agential AI is held back by the tasks and environments we present it with far more than by architectures and learning algorithms.

Of course, I phrase this as a project where the objective is to develop artificial intelligence. But you could just as well turn it around, and see it as a system that creates interesting experiences for humans. AI for games rather than games for AI. Two sides of the same coin etc. Often, the "scientific objective" of a project is a convenient lie; you develop interesting technology and see where it leads.

I find it fascinating to think about how much of this plan has been there for almost twenty years. Obviously, I've been influenced by what other people think and do research-wise, or at least I really hope so. But I do think the general ideas have more or less been there since the start. And many (most?) of the 300 or so papers that have my name on them (usually with the hard work done by my students and/or colleagues) are in some way related to this overall vision.

The research vision I'm presenting here is certainly way more mainstream now than it was a decade or two ago; many of the ideas now fall under the moniker "open-ended learning". I believe that almost any idea worth exploring is more or less independently rediscovered by many people, and that there comes a time for every good idea when the idea is "in the air" and becomes obvious to everyone in the field. I hope this happens to the vision laid out above, because it means that more of this vision gets realized. But while I'm excited for this, it would also mean that I would have to actively go out and look for a new research vision. This might mean freedom and/or stagnation.

Anyway, I'm falling asleep. Time to hit publish and go to bed.

Friday, May 13, 2022

We tried learning AI from games. How about learning from players?

Aren't we done with games yet? Some would say that while games were useful for AI research for a while, our algorithms have mastered them now and it is time to move to real problems in the real world. I say that AI has barely gotten started with games, and we are more likely to be done with the real world before we are done with games.

I'm sure you think you've heard this one before. Both reinforcement learning and tree search largely developed in the context of board games. Adversarial tree search took big steps forward because we wanted our programs to play Chess better, and for more than a decade, TD-Gammon, Tesauro's 1992 Backgammon player, was the only good example of reinforcement learning being good at something. Later on, the game of Go catalyzed development of Monte Carlo Tree Search. A little later still, simple video games like those made for the old Atari VCS helped us make reinforcement learning work with deep networks. By pushing those methods hard and sacrificing immense amounts of compute to the almighty Gradient we could teach these networks to play really complex games such as DoTA and StarCraft. But then it turns out that networks trained to play a video game aren't necessarily any good at doing any tasks that are not playing video games. Even worse, they aren't even any good at playing another video game, or another level of the same game, or the same level of the same game with slight visual distortions. Sad, really. A bunch of ideas have been proposed for how to improve this situation, but progress is slow going. And that's where we are.



A Wolf Made from Spaghetti, as generated by the Midjourney diffusion model. All images in this blog post were generated by Midjourney using prompts relevant to the text.

As I said, that's not the story I'm going to tell here. I've told it before, at length. Also, I just told it, briefly, above.

It's not controversial to say that the most impressive results in AI from the last few years have not come from reinforcement learning or tree search. Instead, they have come from self-supervised learning. Large language models, which are trained to do something as simple as predicting the next word (okay, technically the next token) given some text, have proven to be incredibly capable. Not only can they write prose in a wide variety of different styles, but also answer factual questions, translate between languages, impersonate your imaginary childhood friends and many other things they were absolutely not trained for. It's quite amazing really, and we're not really sure what's going on more than that the Gradient and the Data did it. Of course, learning to predict the next word is an idea that goes back at least to Shannon in the 1940s, but what changed was scale: more data, more compute, and bigger and better networks. In a parallel development, unsupervised learning on images has advanced from barely being able to generate generic, blurry faces to creating high-quality high-resolution illustrations of arbitrary prompts in arbitrary styles. Most people could not produce a photorealistic picture of a wolf made from spaghetti, but DALL-E 2 presumably could. A big part of this is the progression in methods from autoencoders to GANs to diffusion models, but an arguably more important reason for this progress is the use of slightly obscene amounts of data and compute.


As impressive as progress in language and image generation is, these modalities are not grounded in actions in a world. We describe the world with words, and we do things with words. (I take an action when I ask you to pass me the sugar, and you react to this, for example by passing the sugar.) Still, GPT-3 and its ilk do not have a way to relate what they say to actions and their consequences in the world. In fact, they do not really have a way of relating to the world at all; instead they say things that "sound good" (are probable next words). If what a language model says happens to be factually true about the world, that's a side effect of its aesthetics (likelihood estimates). And to say that current language models are fuzzy about the truth is a bit of an understatement; recently I asked GPT-3 to generate biographies of me, and they are typically a mix of some verifiably true statements ("Togelius is a leading game AI researcher") with plenty of plausible-sounding but untrue statements such as that I was born in 1981 or that I'm a professor at the University of Sussex. Some of these false statements are flattering, such as that I invented AlphaGo, others less flattering, such as that I'm from Stockholm.

We have come to the point in any self-respecting blog post about AI where we ask what intelligence is, really. And really, it is about being an agent that acts in a world of some kind. The more intelligent the agent is, the more "successful" or "adaptive" or something like that the acting should be, relative to a world or a set of environments in a world.

Now, language models like GPT-3 and image generators like DALL-E 2 are not agents in any meaningful sense of the word. They did not learn in a world; they have no environments they are adapted to. Sure, you can twist the definition of agent and environment to say that GPT-3 acts when it produces text and its environment is the training algorithm and data. But the words it produces do not have meaning in that "world". A pure language model never has to learn what its words mean because it never acts or observes consequences in the world from which those words derive meaning. GPT-3 can't help lying because it has no skin in the game. I have no worries about a language model or an image generator taking over the world, because they don't know how to do anything.

Let's go back to talking about games. (I say this often.) Sure, tree search poses unreasonable demands on its environments (fast forward models), and reinforcement learning is awfully inefficient and has a terrible tendency to overfit, so that after spending huge compute resources you end up with a clever but oh so brittle model. For some types of games, reinforcement learning has not been demonstrated to work at all. Imagine training a language model like GPT-3 with reinforcement learning and some kind of text quality-based reward function; it would be possible, but I'll see you in 2146 when it finishes training.


But what games have got going for them is that they are about taking actions in a world and learning from the effects of the actions. Not necessarily the same world that we live most of our lives in, but often something close to that, and always a world that makes sense for us (because the games are made for us to play). Also, there is an enormous variety among those worlds, and the environments within them. If you think that all games are arcade games from the eighties or first-person shooters where you fight demons, you need to educate yourself. Preferably by playing more games. There are games (or whatever you want to call them, interactive experiences?) where you run farms, plot romantic intrigues, unpack boxes to learn about someone's life, cook food, build empires, dance, take a hike, or work in pizza parlors. Just to take some examples from the top of my head. Think of an activity that humans do with some regularity, and I'm pretty certain that someone has made a game that represents this activity at some level of abstraction. And in fact, there are lots of activities and situations in games that do not exist (or are very rare) in the real world. As more of our lives move into virtual domains, the affordances and intricacies of these worlds will only multiply. The ingenious mechanism that creates more relevant worlds to learn to act in is the creativity of human game designers; because originality is rewarded (at least in some game design communities) designers compete to come up with new situations and procedures to make games out of.

Awesome. Now, how could we use this immense variety of worlds, environments, and tasks to learn more general intelligence that is truly agentic? If tree search and reinforcement learning are not enough to do this on their own, is there a way we could leverage the power of unsupervised learning on massive datasets for this?

Yes, there is. But this requires a shift in mindset: we are going to learn as-general-as-we-can artificial intelligence not only from games, but also from gamers. Because while there are many games out there, there are even more gamers. Billions of them, in fact. My proposition here is simple: train enormous neural networks to learn to predict the next action given an observation of a game state (or perhaps a sequence of several previous game states). This is essentially what the player is doing when watching the screen of a game and manipulating a controller, mouse or keyboard to play it. It is also a close analogue of training a large language model on a vast variety of different types of human-written text. And while the state observation from most games is largely visual, we know from GANs and diffusion models that self-supervised learning can work very effectively on image data.
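As a deliberately tiny illustration of what "predict the next action given an observation" means, here is a sketch that replaces the enormous neural network with a lookup table of counts, much as an n-gram model stands in for a large language model. Everything in it is an assumption made for the example: the observation labels, the action names, the toy traces, and the function names train_behavior_model and predict_action. A real behavior foundation model would take pixels or other rich state descriptions as input and would have to generalize to states it has never seen, which a table cannot do.

```python
from collections import Counter, defaultdict


def train_behavior_model(traces):
    """Count which action players took in each observed state.

    `traces` is an iterable of playtraces; each playtrace is a list of
    (observation, action) pairs. Observations must be hashable here; a
    real system would use pixels or features and a neural network.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for observation, action in trace:
            counts[observation][action] += 1
    return counts


def predict_action(model, observation, default="noop"):
    """Return the action most often taken by players in this state."""
    if observation not in model:
        return default  # unseen state: the table cannot generalize
    return model[observation].most_common(1)[0][0]


# Hypothetical toy traces from a made-up platformer.
traces = [
    [("gap_ahead", "jump"), ("enemy_ahead", "jump"), ("clear", "right")],
    [("gap_ahead", "jump"), ("enemy_ahead", "shoot"), ("clear", "right")],
    [("gap_ahead", "jump"), ("enemy_ahead", "shoot"), ("clear", "right")],
]
model = train_behavior_model(traces)
print(predict_action(model, "enemy_ahead"))  # prints "shoot"
```

The training signal is the same kind as in language modeling: the "label" is simply what the human did next, so no reward function or forward model of the game is needed.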

So, if we manage to train deep learning models that take descriptions of game states as inputs and produce actions as output (analogously to a model that takes a text as input and produces a new word, or takes an image as input and produces a description), what does this get us? To paraphrase a famous philosopher, the foundation models have described the world, but the behavior foundation models will change it. The output will actually be actions situated in a world of sorts, which is something very different than text and images.

I don't want to give the impression that I believe that this would "solve intelligence"; intelligence is not that kind of "problem". But I do believe that behavior foundation models trained on a large variety (and volume) of gameplay traces would help us learn much about intelligence, in particular if we see intelligence as adaptive behavior. It would also almost certainly give us models that would be useful for robotics and all kinds of other tasks that involve controlling embodied agents including, of course, video games.



I think the main reason that this has not already been done is that the people who would do it don't have access to the data. Most modern video games "phone home" to some extent, meaning that they send data about their players to the developers. This data is mostly used to understand how their games are played, as well as for balancing and bug fixing. The extent and nature of this data varies widely, with some games mostly sending session information (when did you start and finish playing, which levels did you play) and others sending much more detailed data. It is probably very rare to log data at the level of detail we would need to train foundation models of behavior, but certainly possible and almost certainly already done by some games. The problem is that game development companies tend to be extremely protective about this data, as they see it as business critical.

There are some datasets available out there to start with, for example one used to learn from demonstrations in Counter-Strike (CS:GO). Other efforts, including some I've been involved in myself, used much less data. However, to train these models properly, you would probably need very large amounts of data from many different games. We would need a Common Crawl or at least an ImageNet of game behavior. (There is a Game Trace Archive, which could be seen as a first step.)

There are many other things that need to be worked out as well. What are the inputs - pixels, or something more clever? The outputs also differ somewhat between games (except on consoles, which use standardized controllers and conventions) - should there be some intermediate representations? How frequently does the data need to be captured? And, of course, there's the question of what kind of neural architecture would best support these kinds of models.

Depending on how you plan to use these models, there are some ethical considerations. One is that we would be building on lots of information that players are giving by playing games. This is of course already happening, but most people are not aware that some real-world characteristics of people are predictable from playtraces. As the behavior exhibited by trained models would not be any particular person's playstyle, and we are not interested in identifiable behavior, this may be less of a concern. Another thing to think about is what kind of behavior these models will learn from game traces, given that the default verb in many games is "shoot". And while a large portion of the world's population plays video games, the demographics are still skewed. It will be interesting to study what the equivalent of conditional inputs or prompting will be for foundation models of behavior, allowing us to control the output of these models.



Personally, I think this is the most promising road not yet taken to more general AI. I'm ready to get started. Both in my academic role as head of the NYU Game Innovation Lab, and in my role as research director at our game AI startup modl.ai, where we plan to use foundation models to enable game agents and game testing among other things. If anyone reading this has a large dataset of game behavior and wants to collaborate, please shoot me an email! Or, if you have a game with players and want modl.ai to help you instrument it to collect data to build such models (which you could use), we're all ears!

PS. Yesterday, as I was revising this blog post, DeepMind released Gato, a huge transformer network that (among many other things) can play a variety of Atari games based on training on thousands of playtraces. My first thought was "damn, they already did more or less what I was planning to do!". But, impressive as the results are, that agent is still trained on relatively few playtraces from a handful of dissimilar games of limited complexity. There are many games in the world that have millions of daily players, and there are millions of games available across the major app stores. Atari VCS games are some of the simplest video games there are, both in terms of visual representation and mechanical and strategic complexity. So, while Gato is a welcome step forward, the real work is ahead of us!

Thanks to those who read a draft of this post and helped improve it: M Charity, Aaron Dharna, Sam Earle, Maria Edwards, Michael Green, Christoffer Holmgård, Ahmed Khalifa, Sebastian Risi, Graham Todd, Georgios Yannakakis.