Google DeepMind just announced Genie 3, their new promptable world model, which is another term for a neural game engine. This is a big neural network that takes as input a description of a world or situation and produces a playable environment where you can move around and interact. There has been work on world models for quite some time, with standout papers such as Ha and Schmidhuber's World Models paper from 2018 and the GameNGen paper from last year, but Genie 3 is by far the most advanced such model yet.
My friends at Google DeepMind generously invited me for an early research preview of Genie 3, so I've had a chance to play with it myself and see what it can do. First of all, it's a very impressive model, and a big step forward. It generates beautiful environments, and you get great lighting and photorealistic detail for free, so to speak. You can interact with the generated environments by moving, camera panning, and "jumping" (which may translate to somewhat different actions depending on what, exactly, you generated). The environments render smoothly in real time, and while there is some control lag, I was told this is due to the infrastructure used to serve the model rather than the model itself.
(All videos below were generated by me during the research preview.)
Generally, scenarios that are more in-distribution give you "better" results. If you ask it for a racing game or platform game with a particular theme, you will get that. Not a great game, and there may be strange artifacts and weird levels, but it works. You can drive your car or walk around as a mutant squirrel.
There are of course limitations, some of which will be overcome with a little more work, while others may be more fundamental. You have a limited range of control inputs. There are often strange graphical artifacts, and the more out-of-distribution your scenario is, the more common they become. Game feel is often lacking. The version I tested was limited to a minute of playtime per scenario, and I was told that scenarios are typically playable for a few minutes before they decohere. Most importantly, the type and level of control you get from prompting the model is quite limited; every press of the Enter key is to some extent a jump into the unknown, and changing the prompt a little often does not change what you thought it would change.
So how will Genie 3 and its successors affect video games and game development? Here are some thoughts:
I think the use case for Genie 3 that is viable right now is ideation. Sure, the model worked best for things that were more or less in distribution (e.g. "race a Ferrari through Greenwich Village"), but those were also the least interesting results, and they were not games that anyone would want to play if they could instead play a good game. On the other hand, out-there prompts such as "Tetris #reallife #photorealistic" gave really interesting and evocative results: fully realized interactive fever dreams that could be probed to reveal new possibilities. The model becomes a thinking tool that can help professional or amateur designers come up with new scenarios, mechanics, and assets that could then be recreated in a game engine.
Some future version of Genie could also be a prototyping tool. Designers could describe what they are thinking of in detail, and in no time have a janky version of the described game scenario playable. Then they could iterate by making small changes to the prompt and testing again, before implementing what they want in a game engine.
There is also a use case for some version of Genie as a fast forward model, enabling planning and reinforcement learning. Current game engines are notoriously bad at fast simulation. But if you fine-tuned a model on your specific game, and then distilled it down to a low-res, really fast model, that would be very useful for planning.
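To make the forward-model idea concrete, here is a minimal sketch of one standard model-based planning loop: random-shooting planning with re-planning at every step. Everything here is illustrative; `toy_model` is a hypothetical stand-in for a distilled world model (the real thing would be a neural network), and nothing in the sketch reflects how Genie itself works internally.

```python
import random

# Hypothetical stand-in for a distilled, fast forward model: given a state
# and an action, predict the next state and a reward. In practice this would
# be a small network fine-tuned on one specific game.
def toy_model(state, action):
    next_state = state + action          # toy 1-D dynamics; actions are -1, 0, +1
    reward = -abs(next_state - 10)       # reward peaks at a goal position of 10
    return next_state, reward

def random_shooting_plan(model, state, horizon=8, n_candidates=200, seed=0):
    """Sample random action sequences, roll each one through the model,
    and return the first action of the highest-return sequence."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), 0
    for _ in range(n_candidates):
        seq = [rng.choice([-1, 0, 1]) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, seq[0]
    return best_first_action

# Model-predictive-control style loop: re-plan from the current state at every step.
state = 0
for _ in range(15):
    action = random_shooting_plan(toy_model, state)
    state, _ = toy_model(state, action)
print(state)
```

The point of distillation is exactly this loop: the model is queried hundreds of times per decision, so it has to be orders of magnitude faster than the full rendering model. A real system would swap `toy_model` for the learned network and might use a smarter sampler than random shooting.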
You could also imagine a social media use case for small user-designed playable experiences that are less than full games. A new type of interactive thing to post. A new way of getting engagement for your posts. Would be fun. (I have at points toyed with starting a company along those lines, but with more traditional technology.)
What I don't think this technology will do is replace game engines. I just don't see how you could get the very precise and predictable editing you have in a regular game engine from anything like the current model. The real advantage of game engines is how they allow teams of game developers to work together, making small and localized changes to a game project. And that is before we even get to the model's limited long-term coherence. However, one could imagine some kind of back-and-forth workflow, where you create a promptable model, translate the neural model into a game engine, make some changes, translate it back into a network, and so on. That could be really useful, and seems hard but potentially doable; someone should start a company around it.
2 comments:
Hi Julian, thank you for your review; as always, I really enjoyed your post.
I am very interested in the "decoherence" part. Did you experience a scenario collapsing? If so, what did it feel like (or, if you had conversations around it, what did people say)?
Thanks! Yes, there were various forms of decoherence. Even in games that played well and were "in distribution", there were things such as cars that looked good at the beginning of the play session and started looking more like angular blobs towards the end. If I generated something the model couldn't really handle, such as complex mazes, weird things happened: for example, the screen would gradually fade to black.