Adobe Shows Method to Create Volumetric VR Video From Flat 360 Captures


With a new technique, Adobe wants to turn ordinary monoscopic 360-degree video into a more immersive 3D experience, complete with depth information that promises to let the viewer shift their physical vantage point in a way currently only possible with special volumetric cameras, and all with an off-the-shelf 360 camera rig.

Announced by Adobe’s head of research Gavin Miller at the National Association of Broadcasters (NAB) Show in Las Vegas this week, and originally reported as a Variety exclusive, the new software technique aims to bring six degrees of freedom (6-DoF) to your bog-standard monoscopic 360-degree video, meaning you’ll not only be able to look left, right, up, and down, but also move your head naturally from side to side and forward and backward, just as if you were actually there.

According to a group of Adobe researchers, the video gains positional tracking via a novel warping algorithm that can synthesize new views within the monoscopic video, all on the fly while critically maintaining 120 fps. As a consequence, the technique can also be used to stabilize the video, making it a useful tool for smoothing out the jerky hand-held captures that are generally unpleasant for VR users. All of this comes with one major drawback though: to recover 3D geometry from the monoscopic 360 video, the camera has to be moving.
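Adobe’s actual warping algorithm isn’t public, but the core idea of synthesizing a novel view from recovered geometry can be illustrated with a naive forward projection. The following is a hypothetical NumPy sketch; the pinhole model and all names are assumptions for illustration, not from Adobe’s paper:

```python
import numpy as np

def reproject(points, colors, R, t, width, height, f):
    # Forward-warp a colored 3D point cloud into a novel pinhole view.
    # R (3x3) and t (3,) define the new camera pose; f is focal length in pixels.
    p = points @ R.T + t                  # world -> new camera coordinates
    in_front = p[:, 2] > 1e-6             # keep only points in front of camera
    p, c = p[in_front], colors[in_front]
    u = (f * p[:, 0] / p[:, 2] + width / 2).astype(int)
    v = (f * p[:, 1] / p[:, 2] + height / 2).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    img = np.zeros((height, width, 3), dtype=colors.dtype)
    img[v[ok], u[ok]] = c[ok]             # naive splat; no z-buffering or hole filling
    return img
```

A real system would add depth ordering and blending to fill the “holes” the article mentions; this sketch only shows why a reconstructed point cloud lets you move the virtual camera at all.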

SEE ALSO
Adobe's Project Aero Aims to Help Creators Build AR Content

Here’s a quick run-down of what’s in the special sauce.

Adobe’s researchers report in their paper that they first employ a structure-from-motion (SfM) algorithm to compute the camera motion and create a basic 3D reconstruction. After inferring 3D geometry from the captured points, they map every frame of the video onto the six planes of a cube map, and then run a standard computer-vision tracking algorithm on each of the six image planes. Some of the inevitable artifacts of mapping each frame to a cube map are handled by using a field-of-view (FOV) greater than 45 degrees to generate overlapping regions between faces. The video heading this article shows the algorithms in action.
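As a rough illustration of the cube-map step, here is a minimal NumPy sketch (not Adobe’s implementation; the function names and nearest-neighbor sampling are assumptions) that projects an equirectangular 360 frame onto one cube-map face, using a per-face half-FOV above 45 degrees so adjacent faces overlap:

```python
import numpy as np

def cube_face_rays(face, size, half_fov_deg=50.0):
    # Unit ray directions for one cube-map face.
    # half_fov_deg > 45 makes adjacent faces overlap, as in the paper.
    t = np.tan(np.radians(half_fov_deg))
    u = np.linspace(-t, t, size)
    uu, vv = np.meshgrid(u, u)
    ones = np.ones_like(uu)
    axes = {  # one axis convention among several possible
        "+x": (ones, -vv, -uu), "-x": (-ones, -vv, uu),
        "+y": (uu, ones, vv),   "-y": (uu, -ones, -vv),
        "+z": (uu, -vv, ones),  "-z": (-uu, -vv, -ones),
    }
    d = np.stack(axes[face], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def project_face(equirect, face, size, half_fov_deg=50.0):
    # Sample an equirectangular panorama (H x W x C) onto one cube face
    # via nearest-neighbor lookup of each ray's longitude/latitude.
    h, w = equirect.shape[:2]
    d = cube_face_rays(face, size, half_fov_deg)
    lon = np.arctan2(d[..., 0], d[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))      # [-pi/2, pi/2]
    x = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    y = ((0.5 - lat / np.pi) * (h - 1)).astype(int)
    return equirect[y, x]
```

Running a planar feature tracker on each of the six undistorted faces, rather than on the heavily distorted equirectangular frame, is what makes standard computer-vision tooling applicable here.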

Photo courtesy Adobe, Jingwei Huang, Zhili Chen, Duygu Ceylan, and Hailin Jin

The technique isn’t a 3D panacea for your monoscopic woes just yet. Besides only working with a moving camera, the quality of the 3D reconstruction depends on how far the synthetically created view is from the original one; going too far, too fast risks a wonky end result. Problems also arise when natural phenomena like large textureless regions, occlusions, and illumination changes come into play, which can create severe noise in the reconstructed point cloud and ‘holes’ in the 3D effect. In the fixed-viewpoint demonstration, you’ll also see some warping artifacts on non-static objects as the algorithm tries to blend synthetic frames with original frames.

SEE ALSO
Researchers Exploit Natural Quirk of Human Vision for Hidden Redirected Walking in VR

Other techniques for achieving 6-DoF VR video usually require specialized capture hardware like HypeVR’s 6K/60 FPS LiDAR rig or Lytro’s giant Immerge light-field camera. While these will undoubtedly produce a higher-quality 3D effect, they’re also custom-built and ungodly expensive. Even though it might be a while until we see the technique come to an Adobe product, the thought of being able to produce what you might call ‘true’ 3D VR video from a consumer-grade 360 camera is exciting, to say the least.

This article may contain affiliate links. If you click an affiliate link and buy a product we may receive a small commission which helps support the publication. See here for more information.


  • DaKangaroo

    “structure-from-motion (SfM) algorithm”

    This is the same algorithm that is used for photogrammetry; it’s not a new technique. Any time you have a couple of images of the same things from multiple angles, you can in theory work out the position of the camera from the relative movement of those things in the images. However, the problem with photogrammetry has always been (a) reliable feature point detection and (b) accurate feature point tracking.

    Photogrammetry can be brought undone by a whole range of visual challenges that are complicated to solve. For example, a mirror (or highly reflective object) totally screws up the equations, because things move in ways which don’t make sense. Transparent objects are also a problem. But worst of all is just any kind of scene where there are very large featureless surfaces, like a clean white wall.

    This is fine for a cheap solution, but I think (high resolution, high speed, cheap) depth cameras would be a better way forward ultimately.

    • I’ve seen a lot of 360 arrays of GoPros, but never a 360 array of Kinects. I don’t suppose some enterprising person here would like to try it?

      • On it – well, with 5 Kinect 2s at the moment (stay tuned!). However, it’s an expensive system to set up, as each Kinect needs its own PC (if you’re on Windows) and the USB requirements are high and somewhat unpredictable. There is also an issue with the Kinects potentially interfering with each other, which will increase with directional overlap and the number of Kinects.

    • muchrockness

      Both photogrammetry and SfM employ the same concept of taking images from multiple perspectives, but the technique for capturing that depth data is so much simpler with SfM that an untrained monkey could do it.

      • Lucidfeuer

        SfM is way less accurate than pure photogrammetry (depending on the source as well of course).

        • Sponge Bob

          none of these are practical for consumer products at the moment

          • Lucidfeuer

            Well, they are practical in the sense that they work with any off-the-shelf camera and are easy to implement software-wise, but they are still old, inaccurate techniques for 3D reconstruction.

            I think the most important component here is not SfM or VSlam but depth-map capture. If only both could be combined to provide some sort of volumetric depth maps.

          • Dylan

            It’s being done on Tango devices now. ORB-SLAM and derivatives are coming, as is MonoFusion.

          • Lucidfeuer

            I’m sorry, but to me these are all shit, obsolete 3D reconstruction techniques. They’ve been great for the past 8 years, but anything SLAM-based is an archaic technique. Not that I know of anything better except high-resolution photogrammetry, which isn’t possible to do in real time anyway.

    • Sponge Bob

      high resolution + high speed != cheap

      applies to all cameras

  • Nice, this might make those 360 movies more like *ACTUAL* VR experiences. Didn’t Google do something like this with Street View? I seem to recall Street View trying to patch in some 3D-ness when it moved between fixed points, often with little success, but I give them credit for trying. That’s been there for a few years now. This seems more advanced, or at least it’s working with better-quality capture.

    • Simon Wood

      The Street View cars are equipped with LiDAR, which measures the distances to objects. They do not need to compute positions from the images, ‘just’ blend in the data…

      • Sponge Bob

        at a 50K price point you can do many amazing technical feats
        with lasers, LiDARs, ToF cameras, etc.
        try doing it with one (smartphone) camera (and no tracking)

      • Graham H Reeves

        The Street View depth data is INCREDIBLY low resolution. Any scene typically has about 6 approximate planes (each pixel being assigned to one of them). And even then, it’s wrong about 10% of the time.

  • DC

    I think it’s the automation of the process that’s exciting in many ways. Sure, you could do all the steps yourself like they did in their research, but try integrating that smoothly into a post-production workflow. Yeah, no thanks. It’s kind of like when Pioneer and Ableton brought auto beat-matching to DJ sets: all the established DJs were annoyed, but those with more open minds knew that it would open up a new realm of creativity for many (and yes, laziness for others). I think it’s fascinating to see how volumetric workflows are coming together in their first steps, and I look forward to losing many weeks of sleep while learning a few!

  • Lucidfeuer

    If automated à la Adobe, then it’s awesome. But I never understood how people plan on doing this for videos with moving objects, if your camera has to move to get spatial detail too. Does the algorithm disambiguate the movement of the camera from the movement of objects in the scene to make up for the discrepancy?

    • Sponge Bob

      obviously, for anything moving you would need at least 2 cameras
      in the 360 case it’s really problematic though

    • Dylan

      It works fine if the object is recognizable to the software. It’s the movement of the camera rig that enables what is essentially SLAM tech and photogrammetry. You only need a few frames of movement from the recording source and two similar-but-different angles to calculate the volume using trigonometry. Photogrammetry works the same way; this is just doing it 120 times a second at greater than a meter away. :D

      • Lucidfeuer

        Yes, I guess that with high-frequency photogrammetry this can work, but the problem is still that photogrammetry, even backed by real IR/LiDAR SLAM, is highly inaccurate.

        SLAM and IR techniques are getting old…

        • Dylan

          Not true when it’s MonoFusion. Microsoft’s 3D Scan, when done with a Kinect for Windows, is fantastic.

          • Lucidfeuer

            I don’t agree, it’s noisy and inaccurate as fuck.
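The “calculate the volume using trigonometry” step discussed in this thread is, at its core, two-view triangulation. Here is a minimal linear (DLT) sketch in NumPy, assuming known 3×4 projection matrices for the two views; it is illustrative only, not the software from the article:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation of one 3D point seen in two views.
    # P1, P2: 3x4 camera projection matrices; x1, x2: 2D observations.
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X; the solution is the null space of A.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # right singular vector of smallest value
    return X[:3] / X[3]        # dehomogenize
```

This is why the camera has to move: with only one viewpoint the constraints above are degenerate, and no depth can be recovered.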

  • Dylan

    I’ve been doing this process myself using off-the-shelf free software and an Xbox Kinect (or just with photogrammetry and some video-texture tricks). I’m glad they’re making an all-in-one package; I’ll probably buy this on day 1.

  • Dacian

    Can someone please contact me when this is available for us? Contact me at dacsol@gmail.com, I want this!