As far as live-action VR video is concerned, volumetric video is the gold standard for immersion. And for static scene capture, the same holds true for photogrammetry. But both methods have limitations that detract from realism, especially when it comes to ‘view-dependent’ effects like specular highlights and lensing through translucent objects. Research from Thailand’s Vidyasirimedhi Institute of Science and Technology shows a stunning view synthesis algorithm that significantly boosts realism by handling such lighting effects accurately.
Researchers from the Vidyasirimedhi Institute of Science and Technology in Rayong, Thailand published work earlier this year on a real-time view synthesis algorithm called NeX. Its goal is to use just a handful of input images from a scene to synthesize new frames that realistically portray the scene from arbitrary points between the real images.
Researchers Suttisak Wizadwongsa, Pakkapon Phongthawee, Jiraphon Yenphraphai, and Supasorn Suwajanakorn write that the work builds on top of a technique called multiplane image (MPI). Compared to prior methods, they say their approach better models view-dependent effects (like specular highlights) and creates sharper synthesized imagery.
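For readers unfamiliar with multiplane images, the core idea can be sketched in a few lines: the scene is stored as a stack of semi-transparent layers at fixed depths, and a pixel is produced by blending those layers from back to front. The Python below is a minimal illustration of that generic blending step only, not NeX’s code; the function name `composite_mpi_ray` and the sample values are made up for the example, and the step that reprojects each plane into the new camera view is left out.

```python
def composite_mpi_ray(planes):
    """Blend one pixel's samples from a stack of MPI planes.

    `planes` is a list of (rgb, alpha) pairs ordered from the farthest
    plane to the nearest. rgb is a 3-tuple of floats in [0, 1] and alpha
    is that plane's opacity at this pixel, also in [0, 1].
    (Hypothetical helper for illustration; not taken from NeX.)
    """
    color = (0.0, 0.0, 0.0)  # start from black behind the farthest plane
    for rgb, alpha in planes:
        # Standard "over" blend: the nearer plane covers what is behind it
        # in proportion to its opacity.
        color = tuple(alpha * c + (1.0 - alpha) * acc
                      for c, acc in zip(rgb, color))
    return color


# An opaque red plane far away, with a half-transparent blue plane in front.
example_planes = [((1.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 1.0), 0.5)]
blended = composite_mpi_ray(example_planes)
```

In this toy case the result is an even mix of red and blue. A plain MPI like this gives each layer one fixed color, which is why highlights that should shift with the viewer’s position are hard to represent without something extra.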
On top of those improvements, the team has highly optimized the system, allowing it to run easily at 60Hz—a claimed 1000x improvement over the previous state of the art. And I have to say, the results are stunning.
Though not yet highly optimized for the use-case, the researchers have already tested the system using a VR headset with stereo-depth and full 6DOF movement.
The researchers conclude:
Our representation is effective in capturing and reproducing complex view-dependent effects and efficient to compute on standard graphics hardware, thus allowing real-time rendering. Extensive studies on public datasets and our more challenging dataset demonstrate state-of-art quality of our approach. We believe neural basis expansion can be applied to the general problem of light-field factorization and enable efficient rendering for other scene representations not limited to MPI. Our insight that some reflectance parameters and high-frequency texture can be optimized explicitly can also help recovering fine detail, a challenge faced by existing implicit neural representations.
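The ‘basis expansion’ idea mentioned in that conclusion can be illustrated roughly as follows: rather than storing one fixed color per pixel, each pixel stores a base color plus a few coefficients, and its final color is the base plus a weighted sum of functions of the viewing direction. In NeX those functions are learned by a neural network, as I understand the paper; the sketch below swaps in a single hand-written highlight lobe as a stand-in, and all names and numbers in it are hypothetical.

```python
def view_dependent_color(base_rgb, coeffs, basis, view_dir):
    """Return base color plus a weighted sum of view-direction functions.

    base_rgb: the view-independent color, a 3-tuple of floats.
    coeffs:   one rgb 3-tuple per basis function (per-pixel weights).
    basis:    list of callables mapping a view direction to a float.
    view_dir: unit 3-vector pointing from the surface toward the viewer.
    (Illustrative stand-in; the real basis in NeX is learned, not hand-written.)
    """
    weights = [h(view_dir) for h in basis]
    return tuple(
        base + sum(w * k[ch] for w, k in zip(weights, coeffs))
        for ch, base in enumerate(base_rgb)
    )


def highlight_lobe(view_dir):
    # Assumed toy basis: a tight lobe that peaks when looking along +z.
    return max(0.0, view_dir[2]) ** 16


wood = (0.2, 0.1, 0.05)        # made-up base color
glint = [(0.8, 0.8, 0.8)]      # made-up highlight coefficients

head_on = view_dependent_color(wood, glint, [highlight_lobe], (0.0, 0.0, 1.0))
side_on = view_dependent_color(wood, glint, [highlight_lobe], (1.0, 0.0, 0.0))
```

Viewed head-on, the pixel picks up the bright highlight; viewed from the side, it falls back to the plain base color. That change with viewing angle is the behavior a single baked-in texture cannot reproduce.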
You can find the full paper at the NeX project website, which includes demos you can try for yourself right in the browser. There are also WebVR-based demos that work with PC VR headsets if you’re using Firefox, but unfortunately they don’t work with Quest’s browser.
Notice the reflections in the wood and the complex highlights in the pitcher’s handle! View-dependent details like these are very difficult for existing volumetric and photogrammetric capture methods.
The volumetric video capture that I’ve seen in VR usually gets very confused by these sorts of view-dependent effects, often having trouble determining the appropriate stereo depth for specular highlights.
Photogrammetry, or ‘scene scanning’ approaches, typically ‘bake’ the scene’s lighting into textures, which often makes translucent objects look like cardboard (since the lighting highlights don’t move correctly as you view the object at different angles).
The NeX view synthesis research could significantly improve the realism of volumetric capture and playback in VR going forward.