Facebook Reality Labs (FRL) chief scientist Michael Abrash believes AR and VR will be the primary way people work, play, and connect in the future. Abrash regularly speaks about when he expects these fundamental milestones in technology to occur, but in a new year-long blog series he wants to drill down to exactly how it’ll happen.
Abrash today revealed in a blog post more about the company’s research surrounding lifelike avatars, something undertaken by the Pittsburgh branch of the company’s skunkworks, Facebook Reality Labs.
The project, dubbed ‘Codec Avatars’, uses what the Pittsburgh office calls “groundbreaking 3D capture technology and AI systems” to generate lifelike virtual avatars that could provide the basis of a quick and easy personal avatar creator of the future.
The idea, FRL Pittsburgh’s director of research Yaser Sheikh says, is to close physical distances and make creating social connections in virtual reality as “natural and common as those in the real world.”
“It’s not just about cutting-edge graphics or advanced motion tracking,” Sheikh says. “It’s about making it as natural and effortless to interact with people in virtual reality as it is with someone right in front of you. The challenge lies in creating authentic interactions in artificial environments.”
It boils down to achieving what the team calls ‘social presence’. Vaulting the uncanny valley to deliver acceptably realistic avatars is something they’ve been working on for years; the team calls the process “passing the ego test and the mother test.”
“You have to love your avatar and your mother has to love your avatar before the two of you feel comfortable interacting like you would in real life. That’s a really high bar,” Sheikh maintains.
A demonstration showing two VR users talking with lifelike avatars gives an interesting look at what the future of VR avatars could be.
The company says that, for now, these sorts of real-time, photorealistic avatars require a lot of gear to achieve. The lab’s two capture studios, one for the face and one for the body, are admittedly both “large and impractical” at this point.
The ultimate goal, however, is to achieve all of this through lightweight headsets, although FRL Pittsburgh currently uses its own prototype Head Mounted Capture systems (HMCs) equipped with cameras, accelerometers, gyroscopes, magnetometers, infrared lighting, and microphones to capture the full range of human expression.
“Codec Avatars need to capture your three-dimensional profile, including all the subtleties of how you move and the unique qualities that make you instantly recognizable to friends and family,” the company says. “And, for billions of people to use Codec Avatars every day, making them has to be easy and without fuss.”
Using a small group of participants, the lab captures 1GB of data per second in an effort to create a database of physical traits. In the future, the hope is that consumers will be able to create their own avatars without a capture studio and without much data either.
At the moment, volumetric captures last around 15 minutes and require a large number of cameras to create the most photorealistic avatars possible. The lab then plans to use these captures to train AI systems so consumers could quickly and easily build a Codec Avatar from just a few snaps or videos.
Humans come in plenty of different shapes and sizes though, which will be its own challenge to surmount, FRL research scientist Shoou-I Yu says.
“This has taught me to appreciate how unique everyone is. We’ve captured people with exaggerated hairstyles and someone wearing an electroencephalography cap. We’ve scanned people with earrings, lobe rings, nose rings, and so much more,” says Yu. “We have to capture all of these subtle cues to get it all to work properly. It’s both challenging and empowering because we’re working to let you be you.”
There are still plenty of challenges to address on the way there, Sheikh maintains. One big problem looming on the horizon is ‘deepfakes’, or the practice of recreating a person’s appearance or voice to deceive others.
“Deepfakes are an existential threat to our telepresence project because trust is so intrinsically related to communication,” says Sheikh. “If you hear your mother’s voice on a call, you don’t have an iota of doubt that what she said is what you heard. You have this trust despite the fact that her voice is sensed by a noisy microphone, compressed, transmitted over many miles, reconstructed on the far side, and played by an imperfect speaker.”
Sheikh maintains we’re still years away from seeing this level of avatar photorealism, although the lab is currently exploring the idea of securing future avatars through an authentic account, as well as several security and identity verification options for future devices.
Abrash says we’ll be getting more blog posts surrounding optics and displays, computer vision, audio, graphics, haptic interaction, brain/computer interface, and eye/hand/face/body tracking.