Google’s ‘Genie 3’ Interactive Generative Video Model Takes Us One Step Closer to the Holodeck

DeepMind, Google’s AI research lab, announced the release of Genie 3, a new AI system capable of generating interactive virtual environments in real-time—and bringing us one step closer to the Holodeck.

Google says in a DeepMind update that with a simple text prompt, Genie 3 can create dynamic, navigable scenes that run at 24 frames-per-second in 720p resolution.

Granted, Genie 3 can only be used on flatscreen monitors, so there’s no telling when we’ll get something similar for VR headsets. For comparison, Quest 3’s display has a per-eye resolution of 2,064 × 2,208, clocked at a base refresh rate of 90Hz, putting VR on the far end of the performance fringe (as usual).
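
To put that gap in rough numbers, here is a back-of-the-envelope sketch comparing raw pixel throughput. It assumes “720p” means 1,280 × 720, that a headset needs two views (one per eye) at full panel resolution, and that generation cost scales linearly with pixels per second. That last point is a simplification rather than anything Google has stated, and the function name is purely illustrative.

```python
# Back-of-the-envelope pixel throughput comparison (illustrative only).
# Assumption: cost scales linearly with pixels generated per second.

def pixels_per_second(width, height, fps, views=1):
    """Raw pixels that must be produced each second."""
    return width * height * views * fps

# Genie 3 as described: 720p (assumed 1280 x 720), 24 fps, one flat view
genie3 = pixels_per_second(1280, 720, 24)

# Quest 3 panel figures from the article: 2064 x 2208 per eye, 90Hz, two eyes
quest3 = pixels_per_second(2064, 2208, 90, views=2)

ratio = quest3 / genie3

print(f"Genie 3: {genie3:,} px/s")
print(f"Quest 3: {quest3:,} px/s")
print(f"Ratio:   {ratio:.1f}x")
```

Under those assumptions the headset would need roughly 37 times the pixel throughput, before accounting for the much tighter latency demands of head-tracked VR.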

It’s undoubtedly a prescient look at things to come though. Unlike static or pre-rendered simulations, Google says the model generates each frame on the fly, allowing for quicker user interaction and environmental feedback.

What’s more, these generated worlds can remain visually and physically consistent for several minutes, Google says, with the system retaining a form of short-term memory to reflect past actions.

Genie 3 is also capable of simulating a wide range of scenarios, including natural environments, historical settings, and both fictional and animated worlds. Meanwhile, users can trigger “promptable world events,” inserting in-world changes via text commands, like altering the weather or introducing new objects.

Beyond the fun of recreating 1800s Osaka, or making a jet ski appear in the canals of Amsterdam, Google says Genie 3 will also be a tool for embodied AI training, with potential applications in fields like robotics, gaming, and artificial general intelligence research.

For now, there are a few limitations. Google says Genie 3 currently has a limited “action space” for agents, and struggles with accurately modeling multi-agent interactions in shared environments. By “agents,” the company’s referring to AI systems that operate autonomously within the virtual environments, in a way making decisions, taking actions, and learning from experience.

It also faces challenges with simulating real-world locations with “perfect geographic accuracy,” rendering text clearly, and maintaining interactions beyond a few minutes.

Still, it’s a pretty amazing leap from the sort of non-interactive videos we’re seeing online now, many of which are pretty difficult to tell from the real deal. Will Smith spaghetti-eating simulations are only going to get more lifelike and, with systems like Genie 3, interactive too.

Well before the first modern XR products hit the market, Scott recognized the potential of the technology and set out to understand and document its growth. He has been professionally reporting on the space for nearly a decade as Editor at Road to VR, authoring more than 4,000 articles on the topic. Scott brings that seasoned insight to his reporting from major industry events across the globe.
  • Stephen Bard

    When will the new YouTube "Interactive" category be introduced, where every time you engage with a video, you can play the scene differently for several minutes?

  • Mike

    "By “agents,” the company’s referring to AI systems that operate autonomously within the virtual environments, in a way making decisions, taking actions, and learning from experience."

    Agents? Sounds like they're creating The Matrix…

  • Andrew Jakobs

    This is really getting unbelievable, and we're still in the baby years of AI..

  • MadHenGSH

    The illusion is getting deeper and deeper /sigh

  • XRC

    And to think Deepmind all started back in Camden town, London at Demis Hassabis's Elixir game studio by the canal. I visited to offer help with game testing once the build was ready.

    Whilst their game "Republic: The Revolution" (a political simulator) and follow up "Evil Genius" weren't a financial success it led to the founding of Deepmind, which certainly caught Google's attention, and the rest is history…

  • Impressive

  • Herbert Werters

    I believe that cooling GPUs will be the new CO2 that further heats up the globe.

  • david vincent

    I'm into generative AIs but I don't see what's the point of this one

  • david vincent

    So Genie 3 already requires enormous hardware resources just to generate 720p videos at 24 fps. I calculated (well Gemini did) that it would take 40× more resources to run it on a Quest 3 @72 Hz…
    When I asked Gemini "40x speed gain WHEN?", it did ultra-complex logarithmic calculations to answer me with y > 3.34 years, while reminding me that there’s also the issue of latency from cloud-based generation…

  • xyzs

    That's quite mind blowing.
    Between Unreal 5 pushing the envelope for "classic" realtime graphics and this that can create worlds on the fly, the future is going to be insane.

  • Feitan

    we are making it free.

  • Max-Dmg

    It'll be overly woke and censored just as current publicly accessible AI is. Most generations and prompts will be deemed unsafe.

  • Max-Dmg

    Every 5 minutes you will be approached by a DEI trans-salesman who is selling ads.