Nokia has taken a firm stride in VR territory with the introduction of Ozo, a VR camera designed for professional content makers, but ultimately it’s going to be consumers on the other end of the screen that experience the camera’s capabilities en masse. We’ve gone hands-on with three experiences filmed with the camera to find out just how well Ozo can transport viewers into the scene.

At Nokia’s Ozo reveal I got to check out three test scenes filmed with a prototype version of the camera. The array has eight cameras, each shooting at 2K x 2K resolution, which together capture an entire sphere of the scene. The synchronized cameras use a global shutter, which eliminates the rolling shutter effect that can distort fast-moving objects. Each camera lens covers a 195 degree field of view, offering quite a bit of overlap from one view to the next. Eight microphones capture spatial audio.

See Also: First Look – Nokia’s ‘OZO’ 3D VR Camera Aims at Professional Market with Real-time 3D Preview

When it comes to resolution/fidelity, the test footage I saw from Ozo is up there with the best 3D VR video capture I’ve seen. At this point it’s difficult to assess the maximal quality given that the resolution of today’s VR headsets is often the bottleneck. Indeed, this appeared to be the case as I could see a noticeable improvement between the same Ozo footage when viewed through the Oculus Rift DK2 (960×1080 effective resolution) and then the HTC Vive (1080×1200 effective resolution).

When it comes to playback, though, Nokia has a unique solution that bypasses the need for stitching (the process that combines the multiple camera feeds into a seamless sphere). Nokia’s custom player seems to fade dynamically between stereo pairs based on where the user is looking.

The effect is hard to describe, but it results in a noticeable ‘popping’ of scene depth as the view transitions from one stereo pair to the next. This can be jarring to one’s perception, as though the view has lurched slightly around you. It can be especially iffy if the section of the scene that you want to look at is right on a transition border, causing a quick back and forth of different depths. However, this method does bring some benefits, like reproducing a very natural stereo 3D effect for each given view and avoiding the issue of stitching seams (at least in the footage that I saw).

The good news about this stitchless playback solution, which definitely has some pros and cons to be considered, is that it’s optional. Nokia told me that the video captured by Ozo can be stitched with traditional methods. And while the stitchless playback will likely be restricted to platforms that specially support it, traditionally rendered video from Ozo can be distributed on any platform.
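To make the view-dependent fading described above more concrete, here is a minimal sketch of how a player might weight two neighboring stereo pairs as the viewer turns their head. This is not Nokia’s method, which wasn’t detailed; the number of stereo pairs, their even spacing around the horizon, the linear fade, and the 10 degree fade width are all assumptions chosen purely for illustration, and `stereo_pair_weights` is a hypothetical name.

```python
import math


def stereo_pair_weights(yaw_deg, num_pairs=8, fade_deg=10.0):
    """Return {pair_index: weight} for a given head yaw in degrees.

    Assumptions (not from Nokia): stereo pairs are centered at evenly
    spaced yaw angles, and the player cross-fades linearly inside a
    band of width fade_deg around the border between two neighbors.
    """
    spacing = 360.0 / num_pairs
    pos = (yaw_deg % 360.0) / spacing      # position in units of pair spacing
    lower = int(math.floor(pos)) % num_pairs
    upper = (lower + 1) % num_pairs
    frac = pos - math.floor(pos)           # 0 at lower center, 1 at upper center
    offset = (frac - 0.5) * spacing        # degrees from the transition border
    half = fade_deg / 2.0

    if offset <= -half:
        return {lower: 1.0}                # fully on the lower pair
    if offset >= half:
        return {upper: 1.0}                # fully on the upper pair
    t = (offset + half) / fade_deg         # 0..1 across the fade band
    return {lower: 1.0 - t, upper: t}


if __name__ == "__main__":
    for yaw in (0, 20, 22.5, 25, 45):
        print(yaw, stereo_pair_weights(yaw))
```

Under these assumptions, a viewer whose gaze hovers near a border (22.5 degrees here) keeps moving in and out of the fade band, which is consistent with the back and forth of depths I noticed when looking right at a transition.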

Though I haven’t been able to confirm it officially, I’ve been told by third parties that Ozo captures at 30 FPS. This actually came as a surprise to me, as the footage I saw seemed adequately smooth (not noticeably out of line with other 360 VR footage I’ve seen), though none of what I saw had fast motion, where a low capture rate would have been most telling. Of course, headtracking works at the headset’s higher framerate regardless of the capture rate.
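As a rough illustration of the gap between capture rate and headset framerate, the sketch below works out how many display refreshes each video frame would be held for. The 30 FPS figure is the unconfirmed number mentioned above; the refresh rates (75Hz for the Rift DK2, 90Hz for the Vive) are the headsets’ published specs rather than anything Nokia stated, and `refreshes_per_video_frame` is a hypothetical helper.

```python
from fractions import Fraction


def refreshes_per_video_frame(display_hz, capture_fps):
    """How many display refreshes each captured video frame is shown for."""
    return Fraction(display_hz, capture_fps)


if __name__ == "__main__":
    capture_fps = 30  # unconfirmed figure reported by third parties
    for name, hz in (("Oculus Rift DK2", 75), ("HTC Vive", 90)):
        ratio = refreshes_per_video_frame(hz, capture_fps)
        # The headset re-renders the view for head orientation on every
        # refresh, even though the video content changes less often.
        print(f"{name}: {float(ratio):.1f} display refreshes per video frame")
```

In other words, the scene content would update only every second or third refresh, while the view itself still responds to head movement on every refresh.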

A brief description of the three clips I saw recorded with an Ozo prototype:

Wedding

I was sitting in a pew in a small church amidst a wedding. The bride and groom were getting ready to say their “I do’s” when a wedding goer stood up to profess his love for the bride. The interruption turned into a comedic back and forth between the objector and the bride that was rife with awkwardness in the style of The Office’s Michael Scott.

The scene was a few minutes long and around me I could see the reactions of other wedding goers as the dialogue progressed. One couple in the pew in front of me was recording the exchange on a smartphone and whispering to each other as the scene unfolded, which I could hear directionally even while not looking at them thanks to the spatial audio capture. Eventually this couple had their own awkward exchange just before one of them stormed off.

News

This piece appeared aimed at demonstrating the potential of VR video in journalism. A mock crowd was gathered at the base of what appeared to be a courthouse, brandishing rainbow flags and cheering, apparently in celebration of a marriage equality ruling.

To my left I could see a cameraman holding a traditional camera aimed at a woman who was being interviewed. As I continued to hear the cheers of the crowd around me, a small picture-in-picture window appeared floating out in the scene that showed a 2D close-up of the interviewee from the perspective of the cameraman, showing how it’s possible to mix immersive and traditional video components.

After the interview, some of the crowd members came running around the camera, hootin’ and hollerin’ in celebration, raising a big rainbow colored tarp over the camera, making me feel as though I was underneath it.

Performance

I was positioned inside of a large warehouse structure with giant balloons floating at various altitudes surrounding me. Positioned behind widely scattered stations were a few members of some sort of band, all dressed curiously and playing several different instruments. Music of a genre that I won’t attempt to categorize was playing all around me throughout the space.

At the front of my view was the singer of this group, who was also curiously dressed, wearing something resembling a straw wig and angel wings. Stretching out between me and the singer was a long rope which the singer was pulling to draw my position ever closer. Below me I could see that I was sitting on a cart that also carried some keyboards. The motion was slow and steady enough so as not to introduce any feelings of discomfort. Eventually I was just a few feet away from the singer who let free a nearby balloon that I tracked upward as it lifted into the air.

All of these scenes featured positional audio which seemed accurate and suitably subtle, offering me cues about where to look, especially when picking up on conversations outside of my immediate field of view. As far as I could tell, each scene was continuously acted with no cuts. In this sense, it felt somewhat closer to theater than cinema, with a distinct appreciation for the skill of delivering an unbroken performance in one go.

Capture quality and playback aside, the success or failure of Ozo will be equally decided by whether or not the camera can find adoption among professional VR filmmakers.


Disclosure: Nokia covered travel and lodging expenses for Road to VR to attend the Ozo reveal event.


Ben is the world's most senior professional analyst solely dedicated to the XR industry, having founded Road to VR in 2011—a year before the Oculus Kickstarter sparked a resurgence that led to the modern XR landscape. He has authored more than 3,000 articles chronicling the evolution of the XR industry over more than a decade. With that unique perspective, Ben has been consistently recognized as one of the most influential voices in XR, giving keynotes and joining panel and podcast discussions at key industry events. He is a self-described "journalist and analyst, not evangelist."
  • spyro

    Please stop cutting resolution at half and call that ‘effective resolution’. That’s simply not true. It WOULD be true, if both eyes would get the very same picture. Instead, both eyes get unique information about the same world, different pixels.

    That’s actually the same situation as in 2D. It’s not just another view of the “same pixels” or something. That’s also the reason why a 3D picture of the same scene looks a lot sharper than the same scene in 2D with the same ‘effective’ resolution.

    • Ben Lang

      That’s an interesting, and understandable, way to look at it, but using the ‘effective resolution’ approach is probably the best way to quantify the fidelity of the display.

      It also isn’t always true that each eye is getting different info, like in the case of monoscopic 360 video or photos. Further, you wouldn’t say that a 1920×1080 3D TV has a resolution of twice its actual pixel count just because it’s sending a full frame of depth info to each eye every other frame.

      I’m not saying your method doesn’t make sense, just that the bulk of people do not bring 3D into the discussion of resolution in that sense. So for communicating what I’d like to communicate (fidelity of the screen), the use of effective resolution is the easiest way.

      • spyro

        Well, if the software just shows the same picture on both sides of the panel (monoscopic) then that’s true, of course. But that’s clearly not the intended use case. It’s like viewing an upscaled picture on a hi-res 2D screen.

        A 3DTV with shutter glasses in fact gives you twice as many individual pixels over the same time (compared to the same 2D movie) but that’s *temporal* resolution of course, basically just a higher frame rate.

        Normally I wouldn’t care but you are a well-respected VR expert, so please don’t do this or people will reference you as a source ;)

      • lhl

        It seems like we should switch to using arc-resolution sooner rather than later (probably PPD, pixels per degree). FOV measurement isn’t standardized yet (and can depend on eye relief, etc.), but that will come (remember when every monitor manufacturer used their own dot pitch measurements?), and it’d still be more useful/representative than talking about absolute pixels (1K @ 150 degrees is significantly less resolution than 1K @ 100 degrees).

        With distortion shaders, temporal super-resolution, and other factors, maybe perceived resolution (line pairs?) might be useful as well, but I suppose that will have to wait until it can be effectively measured (not to mention the complexities between center and corner sharpness).

        (btw, agree about monoscopic content – also even w/ stereo-disparity, once you account for rendering artifacts and quantization, I’d suspect when you split the difference, perceived resolution is much closer to the per-eye resolution than the stereo pair)

        • Don Gateley

          I sure agree with that relative to HMDs with integrated displays but what about phones used in various HMD adapters like those for Google Cardboard? For those you need a conversion factor from PPI to PPD because it can vary by adapter.

          • Don Gateley

            I should have worded that, “What about HMD adapters for phones like those used for Google Cardboard?”

  • Don Gateley

    Looking forward to content from this appearing on YouTube 360/3D suitable for Google Cardboard viewing.