Hands-on: Apple Upgrades Personas for True Face-to-face Chats on Vision Pro

Apr 2, 2024

Apple today released ‘Spatial Personas’ in public beta on Vision Pro. The newly upgraded avatar system can now bring people right into your room. We got an early look.

Much has been said about Apple’s Persona avatar system for Vision Pro. Whether you find them uncanny or passable, one thing is certain: it’s the most photorealistic real-time avatar system built into any headset available today. And now Personas is getting upgraded with ‘Spatial Personas’.

But weren’t Personas already ‘spatial’? Let me explain.

Sorta Spatial

At launch the Persona system allowed users to scan their faces into the headset to create a digital identity that looks and moves like the user thanks to the bevy of sensors in Vision Pro. When doing a FaceTime call with another Vision Pro user (or users), their Persona(s) head, shoulders, and hands would be shown inside a floating box.

While this could feel like face-to-face talking at times, the fact that they were contained within a frame (which you can move or resize like any other window) made it feel like they weren’t actually standing right next to you. And that’s not just because of the frame, but also because you weren’t actually sharing the same space as them. It’s not like they could walk right up to you for a high-five, because they’d be stuck in the window on your screen.

Face-to-face

Now with Spatial Personas (released in beta today on the latest version of visionOS), each person’s avatar is rendered in a shared space without the frame. When I say ‘shared space’, I mean that if someone takes a step toward me in their room, I actually see them come one step closer to me.

Previously the frame made it feel sort of like you were doing a 3D video chat. Now with the shared space and no frame, it really feels like you’re standing right next to each other. It’s the ‘hang out on the same couch’ or ‘gather around the same table’ experience that wasn’t actually possible on Vision Pro at launch.

And it’s really quite compelling. I got a sneak peek at the new system in a Vision Pro FaceTime call with four people (though up to five are supported in total), all using Spatial Personas. You’ll still only see their head, shoulders, and hands, but now it really feels like a huddle instead of a 3D video chat. It feels much more personal.

Spatial Personas Are Opt-in

To be clear, the ‘video chat’ version of Personas (with the frame) still exists. In fact, it’s the default way that avatars are shown when a FaceTime call is started. Switching to a Spatial Persona requires hitting a button on the FaceTime menu.

And while this might seem like a strange choice, I actually think there’s something to it.

On the one hand, the default ‘FaceTime in Vision Pro’ experience feels like a video chat. In everyday business we’re all pretty used to seeing someone else on the other side of a webcam by now. And even though this is more personal than an audio-only call, it’s still a step away from actually meeting with someone in person.

Spatial Personas is more like you’re actually meeting up in person, since you can actually feel the interpersonal space between you and the other people in this shared space. If they walk up and get a little too close, you’ll truly feel it, the same way you would if someone stood too close to you in real life.

So it’s nice to have both of these options. I can ‘video chat’ with someone with the regular mode, or I can essentially invite them into my space if the situation calls for a more personal meeting.

The Little Details

Apple also thought through some smaller details for Spatial Personas, perhaps the most interesting of which is ‘locomotion’.

Room-scale locomotion is essentially the default. If you want to move closer to a person or app… you just physically walk over to it. But what happens if it’s outside the bounds of your physical space? Well, instead of directly moving yourself virtually, you can actually move the whole shared space closer or further from you.

You can do this any time, in any app, and everyone else will see your new position reflected within their space, keeping everything synchronized.

Apple also made it so that when two Spatial Personas get too close together, they temporarily revert to looking like a floating contact photo. I think this is probably because they want to avoid possible harassment or trolling (e.g., you want to annoy someone so you phase your virtual hand right through their virtual face, which is uncomfortable both visually and from an interpersonal space standpoint).

The headset’s excellent spatial audio is of course included by default, so everyone sounds like they’re coming from wherever they’re standing in the room, and their voices actually sound like they’re in your room (based on the headset’s estimate of what the acoustics should sound like). And if you move to a fully immersive space like an ‘environment’, the spatial audio transitions to that new acoustic environment—so for instance you can hear people faintly echoing in the Joshua Tree environment because of all the rock surfaces nearby. Hearing the acoustics fade from being inside your own room to being ‘outside’ in an environment is a subtle bit of magic.

And last but not least, it’s possible to have a mixed group of FaceTime participants. For instance you could have people using an iPhone, an Android tablet (yes, you can FaceTime with people on non-Apple devices), a normal Persona, and a Spatial Persona all at once. SharePlay in that case will also work between those formats (except non-Apple devices) as long as the app supports it. In cases with apps that are Vision Pro native, the iPhone user would get a notification that their device isn’t supported.

– – — – –

Spatial Personas is a big upgrade to Apple’s avatar system, but the company maintains the whole Persona system is still in ‘beta’. Presumably that means there are more improvements yet to come.

Adrian Meredith

You have to wonder what on earth Meta is playing at. They’ve been showing this off for years now and there’s no sign of them shipping it. Instead we’re stuck with those horrible avatars
- Ondrej
  
  No, Meta has never shown anything like this.
  
  Meta is doing hundreds of scans in a multimillion dollar studio and turning that data by experts into a few sophisticated examples. We don’t even know how much time and effort it takes for a single codec avatar.
  Their results look much better, but at what cost?
  
  Apple shipped it in a consumer device. All integrated.
  
  This is a world of difference.
  - CaryMGVR
    
    Sad.
    But true.
  - Dragon Marble
    
    Makes no difference for me. I have no one to share this experience with. When something requires >$7000 (at least two Vision Pros), it’s a “consumer device” in name only.
    - CaryMGVR
      
      Yeah, but rich consumers are gonna use it! lol
- Rogue Transfer
  
  Latency. That’s the main issue they remarked on in the recent research video online about Meta Codec Avatars. They said that earlier versions, like the Codec Avatars 1.0(the one they say worked on standalone) and 2.0 both suffer from a disconnect, due to the delay from all the processing needed to render even their simpler versions of Codec Avatars.
  
  They say that finally with the use of four RTX 4090s, locally in a PC workstation to render them quick enough, they have the feeling of the other person’s Codec Avatar being real-time & present. There are some other issues they have to do with initial avatar scanning(they require someone to do 65 expressions one-after-another), that are deal-breakers to launching it too.
  - CaryMGVR
    
    So who’s runnin’ the show over there that this *still* hasn’t been fixed …??
    Boz …?
    If that’s the case, ohhhhh boy ….
Charles U. Farley

Did they make them less creepy? Or did they just bring the creepy into 3D?
- foamreality
  
  Is it actually 3d though? Or just 2d without a frame. The article omits to explain this. I suspect its the latter. Lame.
CaryMGVR

Y’know what kills me about all of this …??
THIS IS ALL VERY DOABLE ON QUEST 3 RIGHT NOW ….

What do you think multiplayer VR games are:
you sharing space with other avatars.
What’s described in this article is precisely that, only in *AR* ….
Heck, we don’t even have that “Other Avatars In Your Home Enviornment” thing yet that Zuck promised us THREE YEARS AGO ….

This has nothing to do with “technical ability”.
All the tools are now, and *have* been, in place for a long while.
But tragically, this is Meta’s MO all over:
fantastic tech, but it just sits there, unused, rotting away.
And that makes me friggin’ INSANE ….
[]^ (
- Charles U. Farley
  
  You’re already insane.
  - CaryMGVR
    
    I resemble that remark ….
    It just so happens that I am N-O-T crazy!
    Anymore.
- ApocalypseShadow
  
  This is actually true. Facebook could do this currently. But they’ve been dangling the carrot in front of consumers telling them it’s in the future. You can only sell dreams so long until you have to produce something.
  
  Apple is like, “here it is.” And it just works. Even in beta. Even connecting with iPhone, Android phone or tablet users. Just like, “Here’s clear pass through” day one. “Here’s good hand tracking” day one. “Here’s a lot of useful apps you already use on your phone that you can use on your headset” day one.
  
  This is what Zuckerberg has feared. Actual competition from a company that already has a huge platform of content, a huge amount of followers. And, hardware and technical know-how. This isn’t Pico or PC or console VR that lacks many things that Apple has and will have and improve on it.
  
  It’s expensive now. But version 2 won’t be. It’ll do almost everything this first model does. It’s why Zuckerberg has downplayed Vision Pro. He can say that this is better or that is better on Quest. But he’s got to produce. Facebook is using gamers as a stepping stone to get to where cellphones users are. Apple is already there with cellphone users, can sell a more expensive device to their base and the masses. And is already offering useful things beyond the cellphone in their headset. Both want to be that next paradigm shift. But Apple is going directly at the masses. Not through gamers to get there.
  
  Facebook better get cracking and get those realistic avatars out. Or they’ll be left behind.
  - CaryMGVR
    
    Hear, hear!!
    []^ )
  - foamreality
    
    The one single reason apple is so successful (and its beyond belief that not a single tech company on earth has emulated it) can be summed in one word: Polish.
    
    They release software and hardware that is polished. Its not hard. But meta, and all the PCVR competitors are too dumb to polish any of their software even several years after release. Apple knows that people want things to work. And to work well. They don’t want half arsed gimicks and tech demo’s that look cool for 5 seconds until you realise they don’t really work properly and have no useful purpose Which is what every other tech company focuses on. People will pay 3500 dollars just for a bit of polish. To know things are going to work more or less as expected.
Ondrej

Hopefully, Apple not allowing competitors to use eye tracking in their social apps will be challenged sooner than later.

Of course they use the shameless “privacy” excuse. They are acting like a dictatorship justifying censorship with “only we know what’s good for our citizens”.
- gothicvillas
  
  I expected nothing less. Apple is as far left as it gets.
wheeler

There’s a funny pattern here. Over the years, Meta has tried to promote an XR vision/ideal encompassing certain XR / “spatial computing” features which we’ve known that Apple has been working on (hinted at through rumors/leaks over the years–and now confirmed with Apple actually releasing and rapidly iterating on those features). And yet even with Meta’s 10 year head start, Meta has failed to execute on these features and Quest is still a kids gaming console with an otherwise bad XR interface.

They clearly received some intel on what Apple was doing and threw together some snazzy promotions and tech demos to try and claim that vision as their own, but ultimately failed to implement them in time (or what they have implemented just stinks). They have neither the hardware or software to pull it all together.

Meanwhile Apple is actually making it happen. VR enthusiasts are criticizing Apple for not embracing immersive VR gaming, but Apple is probably looking at that market (with its extremely lopsided level of investment to return, and low retention despite high market penetration and accessibility) and thinking “why would we care about that?”
Dragon Marble

For me, this is the same as Meta’s codec avatar demo: cool but irrelevant. I am already crazy enough to purchase a Vision Pro for myself. Who am I going to watch a movie together with?

You see, just because one of them is a “consumer product” and the other a “technical demo” makes no practical difference for 99.9% of people.
- Arno van Wingerde
  
  I think you should see this in a long-term perspective, when the AVP arrives at 1000-1500 or so and some of your friends and colleagues will also have one. This would be possible with the Quest 3 today, at a lower quality, if Meta would actually take time to develop the software.
  - CaryMGVR
    
    That’s all it is, man: SOFTWARE.
    That’s what makes it so bloody frustrating!!
  - Blaexe
    
    Without eye and face tracking these realistic avatars wouldn’t work, at least today and maybe ever. So no, it wouldn’t be possible on Quest 3 today.
    - foamreality
      
      Could work on quest pro but not possible on quest 3. Pro is a far superior product except for the lower chipset. Such a shame the q3 didn’t have eye and face tracking. Also non OLED headsets are truly terrible, I don’t understand why anyone would buy one. Breaks immersion more than any other spec.
TonyVT SkarredGhost

Wow, we were expecting it for WWDC instead they already launched this feature. It’s very cool
ViRGiN

It’s cool, it’s really cool, but how is valve ever supposed to catch up?
Competition is great, forces everyone to make better products, except for valve, valve idea of connecting people is lowpoly shapes of robots with disconnected hands, like a ray-man.
