Togelius: 2008

Sunday, December 28, 2008

Automatic Game Design

One of the papers I presented at the recent CIG conference is called "An Experiment in Automatic Game Design". Designing games automatically, what's that all about? I thought I'd take a blog post to explain the main ideas in the paper (and it really is mostly a proof-of-concept paper).

What we're trying to do is search a space of game rules for rule sets that constitute fun games. This immediately raises two questions: how do you define and search a space of game rules, and how can you measure whether a game is fun?

We decided to create set of "meta-rules" or "axioms" that define a space of grid-based games, where one of the games that could be created would be a simple version of Pac-man. The game arena ("maze") is very simple (just a few walls) and is the same for all games. The player controls an agent (the purple blob in the figure below) and can move one block up, down, left or right every time step. Apart from the agent, there are a number of red, green and blue "things" in a game. They are deliberately called "things" and not opponents, food, mines, collaborators etc. because their exact relation to the player is decided by the rules of the game.

The rules of any particular game is defined by:

The number of red, green and blue things
A table of collision effects: what happens when a thing of one colour occupies the same space as a thing of another colour or with the agent - decrementing or incrementing the score, teleportation and/or death (usually different effects for the two parts in a collision)
Movement logics - how things of different colours move (random, clockwise, counter-clockwise)
The score the player has to reach in order to win the game
The time the player has to reach this score before losing the game

In this space, a Pacman-like could be defined using e.g. red things as pills (increments score and disappears when the agent moves over them) and green things as ghosts (kills the player when he touches them, moves randomly). But many other elements are possible, including things that eat other things, teleportation etc.

Here is a screenshot from a little example game:

It should be pretty straightforward to see how game rules can be represented to be evolved: just encode them as e.g. an array of integers, and define some sensible mutation and possibly recombination operators. (In this particular case, we use a simple generational EA without crossover.) For other rule spaces, some rules might be more like parameters, and could be represented as real numbers.

What's the much trickier question is the fitness function. How do you evaluate the fitness of a particular set of game rules? What we want is a fitness function for rulesets that somehow approximates how fun it would be for a human to play a game with that particular ruleset.

In our previous experiments on evolving racing tracks for car games, we used various measures of how a neural network-based controller drove on the track as the fitness function for tracks. In the case of evolving complete games, we can't really judge the behaviour of particular agent on the ruleset, as there is no agent that can play every game.

Our solution is to use learnability as a predictor of fun. A good game is one that is not winnable by a novice player, but which the player can learn to play better and better over time, and eventually win; it has a smooth learning curve. This can be seen as an interpretation of Raph Koster's "Theory of Fun for Fame Design", or of Juergen Schmidhuber's curiosity principle.

Somewhat more technically, our fitness function proceeds in two stages: first it tries to play the game using only random actions. If a random player can win a the game, the ruleset (=the game) is assigned a negative fitness. Otherwise, an evolutionary algorithm is used to try to learn a neural network that plays the game (using the score of the game as fitness function). The fitness of the game then becomes the best fitness found by the "inner" evolutionary algorithm after a certain number of generations.

The important and novel idea here is that a learning algorithm is used as a fitness measure inside another learning algorithm. We call this a dynamic fitness function, to discriminate it from the static fitness functions we used in our papers on track evolution and which did not assume that the agent was learning anything.

Using this setup we managed to evolve a couple of games - none of them very interesting in itself, but together a proof that the idea works. All the details and much more background is in the paper, which is available online.

There is a huge number of questions left to answer - are the generated games with high fitness really more fun for humans than those that have low fitness? Does the evolutionary algorithm learn to play games like a human? Does the approach scale up to more complex games, and different sorts of rule spaces? Well, that means there's lots of interesting research left to do...

Friday, December 26, 2008

CIG 2008 Conference Report

I'm still jetlagged from coming home from CIG 2008 in Perth four days ago (Perth-Singapore-London-Copenhagen is a 26 hours trip from first takeoff to last landing). But it was worth it. As usual, I find the CIG conferences enormously interesting and inspiring. Mostly because it's Computational Intelligence and Games is as much my "core" research area as it gets.

It's true that the acceptance rate it's a bit high, and there are quite a few papers which might not even be good science, in the sense of providing systematic experiments with statistically valid conclusions. In fact, someone who came from the broader CI or ML community and was not specifically interested in the application areas would probably complain about this. (However, it strikes me that most people can easily be made enthusiastic about games when you talk to them a bit... in striking contrast to some other common CI application areas such as scheduling or traffic flow optimization.)

We've beem discussing whether we should lower the acceptance rate, but I don't think we should. At least not yet. The main reason is that the research community is still quite small and we want it to grow. Another reason is that don't really believe in being too harsh. Reviewers can usually tell whether a paper is crap or not, but it's very hard to tell whether it's a seminal contribution. That's for posterity to judge, and to be approximated by the number of citations the paper has amassed after ten years or so. Conferences that only accept 25% or so of papers are, in my opinion, bound to make a rather arbitrary selection. Besides, we now have TCIAIG for publishing the cream of CIG research.

Enough about this, and back to the conference. There were keynotes representing both the ivory-tower variety of CIG research (Jonathan Schaeffer on solving Checkers), the industry perspective (Jason Hutchen, whose keynote I missed due to the conference banquet being the day before. Yes, free drinks) and the harmonious marriage of the two (Penny Sweetser on Emergence in Games).

One paper I particularly liked was Bobby Bryant and Matt Parker on learning to play Quake II using visual inputs only. I think their work is very relevant both for studying the evolutionary emergence of complex intelligence (seeing the FPS as a more advanced robot simulator) and for developing more lifelike NPC behaviour (e.g. aiming behaviour). The paper is not online yet, but here is a previous paper of theirs (NB their results are not very impressive yet, it's the idea I like.)

As for my own contributions, I gave a tutorial (with Georgios Yannakakis) on "Measuring and Optimizing Player Satisfaction". I also presented three papers, one on An Experiment in Automatic Game Design, one on Generating Diverse Opponents with Multiobjective Evolution and one detailing The WCCI 2008 Simulated Car Racing Competition. I hope to find time to write posts on this blog explaining the concepts behing the first two of these papers sometime soon, as I think they are really quite cool myself. I also presented the results of the CIG installment of the ongoing car racing competitions.

This post is already long enough, so I'll stop writing here. What can I say - if you are interested in games and AI/CI, you should have been there! And you should definitely come to the next CIG in Milan, Italy, September 2009. I'll be involved with the organization of that one, so I'll be writing more about it!

Thursday, October 16, 2008

Submit a paper to the CEC special session in CIG, or to TCIAIG

Another reminder: please consider submitting a paper to the CEC 2009 special session on Computational Intelligence and Games, which I am co-organizing together with Pier Luca Lanzi and Daniele Loiacono.

If you're looking to submit your paper to a journal rather than a conference, you might be interested in IEEE Transactions on Computational Intelligence and AI in Games, a new high quality journal that is starting next year (but already accepts submissions) and for which I am an associate editor. Quite a mouthful of a name, but it's bound to be the most important publication outlet for us researchers working in applying CI methods to games.

CIG 2008 Car racing competition

Just a reminder: please consider submitting to the TORCS-based car racing competition, that I am again organizing together with Daniele Loiacono and Pier Luca Lanzi. The goal is to use your favorite learning algorithm (evolution, td-learning, policy gradients, PSO etc.) and function representation (neural nets, expression trees, rulesets etc.) to develop the best controller for a racing car. Your car needs to win over all the over submitted controllers in a number of races.

Thursday, September 18, 2008

PPSN 2008, post 3

With some hindsight - that is, one day's worth of hindsight - I must say that PPSN 2008 was one of the best conferences I've ever attended. On my very personal ranking, it's up there in the top with CIG 2007 in Hawaii. The organisation of PPSN was top-notch, with nothing going wrong, the food good, the opportunities for socialising/networking plentiful and the schedule adhered to (this is Germany, after all).

It's striking how high the quality of the papers are in PPSN. In Gecco and CEC you know and then find papers you think should not have been accepted in a scientific conference, but never so in PPSN. There is also a difference in "scientificness". Many papers at the two other major conferences present yet another variation on a well-known algorithm, or yet another application, with little in the way of analysis and comparison to the state of the art. At PPSN, the norm seems to be to isolate a particular phenomenon, parameter or operator of known algorithms and benchmarks and study it further, making sure that even papers that are not groundbreaking (which is by necessity most papers) add to the body of human knowledge.

Of course, there are new algorithms and applications at PPSN as well. In particular, a number of variants of the CMA-ES were presented. CMA-ES seems to have become the standard algorithm to benchmark continuous optimization algorithms against, which makes sense, as it reaches good result very quickly on many problems.

Speaking of benchmarks, there seems to be a consensus that there is a lack of good reinforcement learning benchmark. A poster by Marc Schoenauer even went so far as to list "Stop balancing the double pole!" among it's conclusion. Of course, I tried to convince everybody who brought up the topic that they should use simplerace instead. A much better benchmark in many ways.

Now I'm going to prepare the talk I'll give tomorrow (with Marie Gustafsson) on "AI from Science to Fiction" at the Fantastic Film Festival in Lund, and the talk I will give on Monday on "Computational Intelligence and Game Design" at ITU Copenhagen.

Tuesday, September 16, 2008

PPSN 2008, post 2

We've now been to the conference dinner, in an old steel mill. Excellent food, and an excellent place to visit. I love lo-tech and especially huge, rusting metal structures.

Towards the end of the conference dinner, DJ JJ organized a singing competition between Spanish-speakers and a group of speakers of various other tongues(Swedish, English, Dutch, Hebrew etc.). Needless to say, we lost - we couldn't find any songs in English we all knew the lyrics to. After the dinner most of us went back to our respective hotels to recuperate. In my case, I'm recuperating from the excellent Dortmund nightlife. Clubbing until 4.30 on a packed dance floor, with the beers priced 50 cents each. On a Monday. Brilliant!

So you want to know about the scientific side of things? Well, PPSN is really a very good conference. It is hard to find a paper which is not good, though it is of course easy to find papers that don't interest me. I'm not interesting in everything. For example, a theoretical analysis of the behaviour of the 1+1 ES does not interest me very much, even if it is obvious that the research is sound and the paper good. Still, there are many papers here that I like.

Monday, September 15, 2008

PPSN 2008, post 1

I am at PPSN in Dortmund. I am listening to the talk on "Semidefinite Programming and Lift-and-Project Methods in Combinatorial Optimization" by Levent Tunçel. I do not understand anything at all of this talk. I am not the only one in the audience.

While I applaud the PPSN tradition of inviting people who are not evolutionary computation researchers to give keynotes, I don't think people outside our field understand how very little mathematics many (most?) people in this field know.

But the tutorials yesterday and the workshops on Saturday were really nice.

Saturday, July 26, 2008

What should we call non-evolutionary reinforcement learning?

I work in evolutionary reinforcement learning. That is, I develop reinforcement learning problems, and evolutionary algorithms that solve such problems.

The problem is that many people in reinforcement learning (RL) would say that I'm not working on RL at all. These people work on things like temporal difference learning, policy gradients, or (more likely) some newfangled algorithms I have never heard of, but which are most certainly not evolutionary. Most of the people working on non-evolutionary RL probably don't know much (maybe nothing!) about evolutionary RL either. So disconnected are our communities. It's a shame.

In their discipline-defining (4880 citations on Google Scholar) book "Reinforcement Learning", Sutton and Barto start with defining RL as the study of algorithms that solve RL problems, and mention in passing that they can be solved by evolutionary algorithms as well. The book then mentions nothing more about evolution, and goes on to essentially discuss TD-learning and variations thereof for a few hundred pages.

In practice, the evolutionary and non-evolutionary RL folks publish in different conferences and journals, and don't cite (nor read?) each other much. We write our papers in very different styles (the non-evolutionary RL people having much more maths in them, evolutionary RL researchers often relying on qualitative argument coupled with experimental results), and I for one often simply don't understand non-evolutionary RL papers.

Again, it's a shame. And it would be great if we could find some way of bridging this divide, as we work on the same class of problems.

But to do this, we need to find a way of addressing the issue, which was really the purpose of this blog post. Simply put, what do we call the two classes of algorithms and the research communities studying them? This is an issue I run into now and then, most recently when writing a grant proposal, and now again when preparing lecture slides for a course I'll be teaching this autumn.

The non-evolutionary RL people would not want a negative definition, based on what their algorithms aren't rather than what they are. They would rather go for RL, plain and simple, but this has the problem that non-evolutionary RL is excluded from that field, in spite of being part of the definition. In the proposal we wrote we ended up talking about "classical" versus evolutionary RL, but this has the problem that evolutionary algorithms predated td-learning by several decades. We could also use the term "single-agent RL", but then again, a simple hill-climber is arguably a (degenerate) evolutionary algorithm, and very much single-agent. Besides, there is multi-agent non-evolutionary RL. Sigh.

So I really don't know.

Wednesday, July 02, 2008

CIG 2008 special session on coevolution in games

Coevolution in Games: A Special Session at IEEE Symposium on Computational Intelligence and Games
Perth, Australia
15-18 December, 2008

Special Session Chairs: Julian Togelius, Alan Blair and Philip
Hingston
Contact: julian@togelius.com

Submission deadline: 15 August 2008

http://www.csse.uwa.edu.au/cig08/specialSessions.html

Description:

In coevolution, the fitness of a solution is determined not (only) by a fixed fitness function, but also by the other solution(s) being evaluated. Thus, coevolution has the potential to overcome several problems with static fitness functions, paving the way for more open-ended evolution. However, several phenomena common to coevolutionary algorithms are at present poorly understood, including cycling and loss of gradient. Further understanding of such phenomena would facilitate more widespread use of coevolutionary algorithms.

This special session seeks to bring together research that uses coevolutionary algorithms to learn to play games, uses games to investigate coevolution, or uses coevolution as a basis for game design. Due to their adversarial nature, often involving interaction of multiple agents, games are uniquely suited to be combined with
coevolution. We invite both theoretical and applied work in the intersection of coevolution and games, including but not limited to the following topics:

Competitive coevolution
Cooperative coevolution
Multiple populations in coevolution
Coevolution with diverse representations
Theory of coevolution
Preventing cycling and loss of gradient
Coevolution-based game design
Self-play and coevolutionary-like reinforcement learning
Relative versus absolute fitness metrics

About the organisers:

Julian Togelius is a researcher at the Dalle Molle Institute for Artificial Intelligence (IDSIA) in Lugano, Switzerland. His research interests include evolving game-playing agents, modelling player behaviour, and evolving interesting game content, mainly using evolutionary and coevolutionary techniques. He also co-organizes the well-attended Simulated Car Racing Competitions for the IEEE CIG and CEC conferences. See his home page for more information.

Philip Hingston is an associate professor of computer science at Edith Cowan University in Perth. His research interests are in the theory and application of artificial intelligence and computational intelligence. He has a particular interest in evolutionary computation as a tool for design, and in computer games. He is chair of the IEEE CIS Task Force on co-evolution. More information can be found on his home page.

Alan Blair is Chair of the IEEE CIS Task Force on Co-evolution and Games. His research interests include robot navigation, image and language processing as well as co-evolutionary learning for Backgammon, Tron, IPD, simulated hockey and language games. Homepage.

Friday, March 14, 2008

Busy days in Hong Kong in June

If you're going to WCCI2008 in Hong Kong in June, you might run into me. Chances are, however, that I'll just keep on running, as I've got a busier schedule than I've ever had in a conference. So just don't take it personally, OK?

To begin with I'm giving a tutorial together with Simon Lucas and Tom Runarsson on "Learning to play games". Preliminarily, Tom will talk about evolution versus other types of reinforcement learning, Simon will talk about different sorts of function approximators, including his n-tuple classifier, and I will discuss different ways in which such techniques can be used in games, with examples. Not only car racing, is the plan...

Then, I will participate in a panel discussion about Computational Intelligence and Games, together with some of the people that are defining this young field.

Staying on the topic of games, I will then present the results of the car racing competition, together with Daniele Loiacono and Pier Luca Lanzi. By the way, have you considered participating? We now have proper software packages available, so it's easy to get started with developing controllers using your favourite brand of learning algorithms.

Yes, I do have some papers as well. However, these are not (primarily) about games this time...

The first paper, a collaboration with my good friends Alberto Moraglio and Renzo De Nardi, is called Geometric PSO + GP = Particle Swarm Programming. The idea is to combine Alberto's geometric particle swarm optimization algorithm with genetic programming. The good thing with GPSO is that it can be used in any search space where a distance between two solutions can be defined, and a weighted recombination operator for these solutions can be devised; in contrast, standard PSO only works in continuous spaces. We have previously successfully used GPSO to solve Sudoku problems, to show this really works. In our new paper, we invent a few weighted recombination operators for expression trees, and test the PSP (GPSO + GP) algorithm on two standard GP benchmarks. The results are decent, but not as good as we would have wanted them; we think this is down to the recombination operators still needing some more work. Nevertheless, we think this is the first time PSO has been applied directly to GP.

The other paper, my first collaboration with Faustino Gomez and Juergen Schmidhuber (my new colleagues at IDSIA) is called Learning what to ignore: Memetic climbing in topology and weight space. It's about "memetic neuroevolution", an idea I've had for a while, and which someone else really should have thought about. The key algorithm here (the "memetic climber") is incredibly simple: evolve topology of weights of a neural network at different time scales. Topology mutations are almost always very destructive, so after each topology mutation, do local search (hill-climbing) in the space of neural network weights do find the potential of the new topology. If the fitness at the end of the local search is not at least as high as it was before the topology mutation, the topology mutation is discarded. This algorithm turns out to work really well, especially for problems where an agent has to handle high-dimensional input arrays. In such problems, which is really the ones i'm most interested in, most neuroevolution algorithms get stuck in local optima if you don't manually specify a good topology for the neural net. So I have big plans for this little algorithm in the future.

Friday, February 22, 2008

Mentioned at the GDC

A paper by me, Renzo and Simon was recently mentioned as "one of the top ten research finding in games studies" at the Game Developers Conference. This is really cool, as GDC is the most important conference for the games industry, attended by all the important developers and industry heads and tons of journalists.

The presentation can be found here:
http://www.avantgame.com/top10.htm

A short description and some discussion on Raph Koster's site:
http://www.raphkoster.com/2008/02/20/gdc2008-game-studies-download-30/

And the paper:
http://julian.togelius.com/Togelius2007Towards

Quite surprising to see that I'm doing "games studies", I thought I was doing artificial intelligence...

Thursday, February 21, 2008

Call for participation: IEEE WCCI 2008 Simulated Car Racing Competition

*** SIMULATED CAR RACING COMPETITION @ IEEE WCCI 2008 ***
*** DEVELOPING CONTROLLERS FOR A REALISTIC CAR RACING SIMULATOR ***

Homepage: http://cig.dei.polimi.it/?page_id=5

Deadline: May 25th, 2008
Organizers: Daniele Loiacono, Julian Togelius and Pier Luca Lanzi.
Email: carracing@gmail.com

The car racing competition, organized in association with the 2008 IEEE World Congress on Computational Intelligence WCCI 2008, is now open for submissions. Competition rules and further details can be found on the competition web site:

http://cig.dei.polimi.it/?page_id=5

while recent news can be found at

http://cig.dei.polimi.it/?cat=4

*** GOAL & SCOPE***

The goal of the competition is to develop racing car controllers that can compete on a set of tracks against the timer or other controllers.

There are no limitations on the type of controller, handcoded controllers are also welcome, although the focus of the competition, given the venue, will be on controllers using computational intelligence techniques (e.g. genetic programming, neuroevolution, td-learning, policy gradient search, fuzzy logic, etc.).

*** ORGANIZATION ***

The competition is organized as the successful car racing competitions held in association with CEC2007 and CIG2007, but it is based on "The Open Racing Car Simulator" (TORCS), a realistic car racing simulator, and structured as a client-server architecture.

A racing server will manage the race, while controllers will run as different processes. Communication between the racing server and the controllers is done through TCP/IP. The race is therefore in *real time* and the controller's reaction time is likely to be an important factor in the design.

We will score every submitted controller on the distance raced in a fixed amount of time when driving on its own on a set of tracks. At the end of the competition, the best few controllers will race against each other on a different set of tracks, validating that the controllers perform well in the presence of other cars and that their performance generalizes to other tracks than those they were trained for. The winner of the final competitive races will get to present their controller at WCCI2008 <http://www.wcci2008.org>, and will have their registration fee reimbursed.

*** SOFTWARE ***

The competition software can be downloaded from:

http://cig.dei.polimi.it/?page_id=5

where detailed installation instructions, a description of the sensors model, and examples of controllers are available; more example trainers and interfaces in different programming languages will be added soon.

Togelius