Samsung Preps iPhone-Style Spatial Videos & Photos for “Galaxy XR Headset”, Leak Suggests


A new feature headed to Samsung smartphones is expected to bring the ability to capture 3D images and videos specifically for “Galaxy XR headsets,” SamMobile has discovered.

Samsung revealed its forthcoming XR headset, codenamed ‘Project Moohan’ (Korean for ‘Infinite’), late last year; the device is slated to bring competition to Apple Vision Pro sometime later this year.

Release date, price, and even the mixed reality headset’s official name are all still a mystery; however, a recent feature leak uncovered by SamMobile’s Asif Iqbal Shaik reveals Samsung smartphones could soon be able to capture 3D photos and video, just like iPhone does for Vision Pro.

Image courtesy SamMobile

Shaik maintains the latest version (4.0.0.3) of the Camera Assistant app contains the option to capture specifically for “Galaxy XR headsets,” initially hidden within an update to the app on Galaxy S25 FE. Transferring the APK file to a Galaxy S25 Ultra, however, reveals the option, seen above.

Speculation about the plural ‘Galaxy XR headsets’ aside, Samsung has gone on record multiple times since Project Moohan’s late 2024 unveiling to say the mixed reality headset will indeed release later this year, making the recent software slip an understandable mistake as the company ostensibly seeks to match Vision Pro feature-for-feature across its range of smartphones at launch.


Slated to be the first XR headset to run Google’s Android XR operating system, Moohan could be releasing sooner than you think. A recent report from Korea’s Newsworks maintained the device will be featured at a Samsung product event on September 29th. Notably, Moohan was a no-show at Samsung’s Galaxy event earlier this month, which saw the unveiling of Galaxy S25 FE, Galaxy Tab S11, and Galaxy Tab S11 Ultra.

Newsworks further suggests Moohan could launch first in South Korea on October 13th, priced somewhere between ₩2.5 million and ₩4 million (roughly $1,800 to $2,900 USD), with a global rollout set to follow.


  • asdf

    I am getting tired of the censorship on this forum. I get the distinct impression that Ben is the one doing it, because it happens even when I don't name the companies I am criticizing, and it happens at irregular intervals that make it look like it's being done manually. Unless Ben denies he's the one doing it, I will assume he is guilty.

    • Andrew Jakobs

      Did it contain weblinks?

      • Christian Schildwaechter

        It doesn't even have to be a fully qualified URL/link; the name of a domain with a dot will suffice. youtu_be is fine, but youtu[dot]be will be put onto a review list for eternity. But this isn't censorship, or even an attempt to keep people on the site, as even links to roadtovr_com get the same treatment.

        As will accidental typos, like a missing space after the period at the end of a sentence where the next sentence starts with "to" or something else that makes it look like a domain name. We have armies of bots spamming every forum with junk, and nobody can afford 24/7 staff to instantly check every single link before it goes live, so the only way to discourage posting junk with links is banning links altogether. Which of course sucks and renders one of the most unique and powerful features of the web useless. But unfortunately some humans will always act like a*holes, and that's why we can't have nice things.
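
        To illustrate, here is a minimal sketch of the kind of naive domain heuristic described above. This is my guess at the logic, not the forum's actual filter:

        ```python
        import re

        # Anything shaped like "word.tld" gets flagged for manual review,
        # which produces exactly the false positives described above.
        DOMAIN_LIKE = re.compile(r"\b[\w-]+\.[a-z]{2,}\b", re.IGNORECASE)

        def needs_review(post: str) -> bool:
            return bool(DOMAIN_LIKE.search(post))

        print(needs_review("watch youtu.be/abc"))         # True: an actual link
        print(needs_review("end of a sentence.to begin")) # True: typo that looks like a .to domain
        print(needs_review("watch youtu_be/abc"))         # False: the underscore workaround
        ```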

    • Nevets

      That's a harsh approach, good sir: branding somebody as guilty unless they issue a specific denial.

      • Christian Schildwaechter

        I seriously doubt that he would even accept a denial, mostly counting on none being given, to "prove" his point. Which is of course a lot more convenient than having to prove things yourself by systematically testing the issue. Enough of my posts never made it through the automatic checks because I included something resembling a domain name that I now make a copy before posting, so that if it vanishes, I can identify the thing that most likely triggered the spam filter, and then post again after replacing it.

        And we have statements from Ben regarding acceptable content. I remember people explicitly asking for MORE censorship while ViRGiN was still posting here, which Ben (rarely) responded to, defending a no-censorship position. So the "we don't censor based on content/opinion, at most we remove spam and (maybe) very obvious harassment" denial has already been made.

        • Olle

          What happened to ViRGiN? Was he banned? Not that I miss him deeply, but every now and then I had a quick laugh at the drama he consistently created.

  • Stephen Bard

    So-called "Spatial" photos/videos captured by iPhones are already low-quality 3D photos, because they are poorly "improvised" using mismatched cameras that are also too close together to provide proper eye-distance stereo separation. iPhone 3D photos only have about 40% of the "depth" you get from a proper 3D camera with eye-distance lens spacing. Even the $200 Xreal Beam Pro camera delivers 3D photos/videos far better than iPhone semi-spatial memories. Presumably the narrow camera spacing on the Samsung Galaxy phones will likewise produce 3D photos/videos for Moohan with very limited depth.

    • Christian Schildwaechter

      Pretty much every photo you take on a smartphone today is really the product of combining multiple cameras and running the results through a very powerful, dedicated image processor. There is no other way to get great pictures out of miniature cameras with teeny tiny apertures and a few square millimeters of sensor surface area. In the past you had to lug around huge and heavy cameras with 36mm sensors for this, plus even bigger and heavier lenses that allow a lot of light to pass, and neither the laws of physics nor the number of photons falling onto the same sensor area have changed since. On a pixel level, basically all phone photos are AI generated, or at least assisted by task-specific AI.

      So only looking at the physical configuration to establish quality makes no sense, as it is all software now. And that's even true for 3D movies, which are no longer shot with dual camera setups. Instead the 3D is added in post-production by calculating the second camera perspective, with the aid of extra 3D telemetry data captured during the shoot. This works so well that basically nobody bothers with dual sensors for "regular" 3D movies anymore, though there are special cameras like the Blackmagic URSA Cine Immersive that place two 180° fisheye lenses in front of a single 17K sensor, specifically to shoot 8K/eye immersive 180° video for AVP.

      So yes, the camera configuration on an iPhone, with cameras just a few millimeters apart, is obviously not enough to physically map the roughly 65mm average IPD of human eyes. But it is enough to record a tiny bit of parallax with high resolution sensors, and the closer an object is, the larger the parallax will be. So even this tiny shift allows calculating how far away different objects are, and with the proper software and a lot of compute power, you can then render a 3D image as if it had been recorded with cameras 65mm apart.
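
      A minimal sketch of the geometry involved, assuming simple rectified pinhole cameras (the numbers are illustrative, not actual iPhone specs):

      ```python
      # Stereo triangulation: depth = focal_length * baseline / disparity.
      # A narrow baseline yields a small disparity, but as long as it is
      # measurable, depth can be recovered and a wider virtual baseline
      # can be synthesized in software.

      def depth_from_disparity(focal_px, baseline_m, disparity_px):
          return focal_px * baseline_m / disparity_px

      def disparity_from_depth(focal_px, baseline_m, depth_m):
          return focal_px * baseline_m / depth_m

      FOCAL_PX = 2800.0  # assumed focal length in pixels

      depth = depth_from_disparity(FOCAL_PX, 0.012, 8.0)    # 8 px shift at a ~12 mm camera spacing
      synth = disparity_from_depth(FOCAL_PX, 0.065, depth)  # shift to render for a 65 mm IPD

      print(f"estimated depth: {depth:.2f} m")               # ~4.20 m
      print(f"disparity at 65 mm baseline: {synth:.1f} px")  # ~43.3 px, the view to synthesize
      ```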

  • eadVrim

    3D stereoscopic photos/videos: suddenly their name became "Spatial".

    • sfmike

      They had to do something, as 3D has developed a negative vibe with the general public thanks to online 3D haters.

    • Christian Schildwaechter

      The most obvious reason is that "spatial" is a much more consumer-friendly name than "3D stereoscopic". So while engineers might prefer the latter for technical accuracy, every marketing department and most consumers would hate it.

      And it's not even that technically correct, as the spatial photos on AVP go far beyond the stereoscopic photos that have been around for more than 150 years. Thanks to clever software, you are no longer bound to one fixed position with two slightly different perspectives for two eyes; you can actually move around a little bit.

      So saying this is just 3D stereoscopy is like saying VR HMDs are just like 3D glasses for TVs, completely missing the importance that head tracking and the capability to look around has for immersion.

      • eadVrim

        Currently on AVP and the new Samsung XR headset, it is 3D stereoscopic VR, not "spatial" in the sense of the user being able to move around in the video space.
        I can accept it as Apple's marketing term, but when other companies copy a term invented by Apple, I find it blind imitation.

        • Christian Schildwaechter

          youtu_be/68hnv3gEGhc

          The software splits the image into separate depth layers, so you can actually "move in the video space". Not a lot, but enough to make it feel like a view into an actual space, something stereoscopy with a fixed viewer position simply cannot do. And iOS/visionOS can do that even starting with 2D photos, just by throwing some AI at it. This is quite compute-intensive, so right now it is only possible for photos on iPhone/AVP, not for video. But as adding 3D in post-production in movies shows, this is not a fundamental issue.

          If you look at still frames of 3D stereoscopic movies, they look somewhat strange, as objects sticking out seem to follow your head movements. That is of course wrong: you should instead see their sides, and parts at different depths should move at different speeds. Consequently 3D movies look best when either the camera or the objects are moving, both because this emphasizes the effect and because it hides the wrong perspective during head movements.

          You can say that technically spatial and stereoscopic images are recorded the same way, just taking two pictures from slightly different perspectives, but the real difference here is how they are viewed/preprocessed on the HMD to turn them into what Apple calls spatial photos. And of course everybody else could implement these too, and there has been (cumbersome) software for converting 2D to 3D before AVP. But the resulting experience is very different from just watching old-school 3D stereoscopic images, so claiming that this is just a marketing term for something that we already had for a long time is simply false.

          • Olle

            Serious question to someone like you, Christian, who might know these things. I recently bought a new powerful PC and decided to give my Q2 another go with it. I ended up watching a video in which I could actually move around in the 3D space (6DoF) while it was playing. Ok, I admit it was a porn video. It made me wonder, how do they do this? How many cameras are needed? And is the tech required to record this expensive? I'm curious because it was damn cool and something I have been longing for (not just for adult content) since DK2, so naturally I wonder how much time it will take for this to go mainstream, as in, could this format come to TV series, sports, movies etc. in the near future? The app I used had in total about 2.5 minutes of video that took about 7 gigs of hard drive space, so it is obviously pretty impractical with today's storage. But the immersion was on a whole other level than when watching 3DoF vids.

          • Christian Schildwaechter

            My wild guess would be Gaussian splatting.

            "3D Gaussian Splatting – Why Graphics Will Never Be The Same" 2:11 youtu_be/HVv_IQKlafQ

            There have been a number of ways to create photorealistic images in the past. One well known approach is photogrammetry, where you take lots of single pictures from all directions, the software compares these based on overlapping areas, and from this generates a traditional 3D mesh model with a very realistic texture.

            Newer techniques like NeRF (Neural Radiance Fields) also start with a bunch of static images from all directions, but instead of generating a 3D model, they train a network to "learn" the density and radiance of points in space. Later you can ask this network how the scene would look from a different direction or with different lighting conditions, or from the perspective of your left and right eye. Gaussian splatting works similarly, but instead of learning properties for points in space, it learns Gaussian distribution curves in space. It involves a lot of math.
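
            For intuition, here is a toy sketch of the rendering side of Gaussian splatting: each primitive is a Gaussian with a position, covariance, opacity and color, and a pixel is shaded by alpha-compositing the Gaussians covering it, front to back. Real implementations project the 3D covariances to screen space and rasterize in tiles; this sketch assumes the 2D projection has already been done:

            ```python
            import numpy as np

            def shade_pixel(px, splats):
                """splats: iterable of (mean2d, inv_cov2d, opacity, rgb, depth),
                already projected to screen space."""
                color = np.zeros(3)
                transmittance = 1.0
                # composite front to back, nearest splat first
                for mean, inv_cov, opacity, rgb, _ in sorted(splats, key=lambda s: s[4]):
                    d = px - mean
                    alpha = min(opacity * float(np.exp(-0.5 * d @ inv_cov @ d)), 0.99)
                    color += transmittance * alpha * np.asarray(rgb)
                    transmittance *= 1.0 - alpha
                    if transmittance < 1e-3:  # pixel is effectively opaque, stop early
                        break
                return color

            splat = (np.array([10.0, 10.0]), np.eye(2) * 0.05, 0.8, (1.0, 0.2, 0.2), 1.5)
            print(shade_pixel(np.array([11.0, 10.0]), [splat]))  # reddish, dimmed by the falloff
            ```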

            The creation of all of these is the same: take tons of photos and throw some software at them. So in theory you can capture this with a smartphone and a lot of patience, but most will use arrays of cameras to speed up the process. The sky/your budget is the limit. Computational cost goes up significantly with the newer methods. While you will be able to create a photogrammetry scene on a regular PC, NeRF and esp. Gaussian splatting will pretty much only be trainable on very expensive AI accelerators due to their high memory consumption during training. So the expensive part will not be taking the images, but the money you have to pay some data center for using their Nvidia AI cluster to train the radiance field.

            The resulting data is still in the gigabytes for single scenes, but this is much less than storing the full volumetric data would take. Photogrammetry is much less storage heavy, but it effectively just creates painted paper models that are hollow, while radiance fields actually store spatial information. As your example was porn, which usually comes with a lot of movement in the scene, this cannot be regular Gaussian splatting. There is something called temporal 3D Gaussian splatting that extends the method to capturing dynamic scenes, but I understand maybe 1/10th of the terms used to describe how that actually works.

          • Olle

            Ok, good to hear some explanation. I guess I have to wait and see what happens with that. Hopefully as time goes by the computational capacity will improve enough for this to become relatively inexpensive. Being a tennis fan, what I would love to see is live tennis streams where you can move around on the court during the rallies. That would be awesome. I'm making an unqualified, hopeful guess that it will be possible in 2-3 decades and keeping my fingers crossed. Anyways, thank you for your reply.

          • Christian Schildwaechter

            It's going to happen much, much faster, esp. for live sports event streaming. Apple bought NextVR in 2020, a company that specialized in sports streaming for VR and had previously offered 180° NFL and basketball streaming with clients for Oculus Go, Quest and PCVR. At the time of the acquisition NextVR was working on new streaming tech that allows viewers to move around a little bit. Not like changing to a different part of the court, which would still require switching to another camera, but leaning left, right etc., making it a much more real-feeling 6DoF viewing experience compared to the 3DoF cameras used before, which both feel unnatural and cause discomfort. Apple signed a 10 year, USD 2.5B contract with Major League Soccer that explicitly includes developing "future types of broadcasting". They tried this with the NFL, but the NFL wasn't willing to commit to or support not-yet-existing forms of media.

            Methods like Gaussian splatting are very expensive on the encoding side, but Gaussian splatting in particular has become popular because it can be viewed on low-power devices and integrated into game engines like Unity. For a company like Apple or Meta it doesn't matter if you need a multi-million dollar AI cluster to create the "splats" in real time, if they are streaming to thousands of users. Technically these splats are sort of flat images, and you only need a few for every position, so instead of having you download 7GB for 2.5 minutes of free movement, they can stream only those splats that match the viewer's current position, which is a small fraction of that. Meta already offered a similar streaming demo for the Quest, where you can basically walk around, with only the data needed for your current position streamed from one of their data centers.
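
            Purely as a hypothetical sketch of how such position-based streaming could work (nothing like this is announced; all names and numbers here are made up): the server buckets splats by the viewpoint cells they contribute to, and the client only fetches the bucket matching its current head position:

            ```python
            from collections import defaultdict

            CELL = 0.25  # assumed viewpoint grid resolution in metres

            def cell_of(pos):
                return tuple(round(c / CELL) for c in pos)

            buckets = defaultdict(list)  # server side: splat ids indexed by viewpoint cell

            def publish(splat_id, visible_from):
                for pos in visible_from:
                    buckets[cell_of(pos)].append(splat_id)

            def fetch(head_pos):
                # client side: only the splats relevant to the current head
                # position, a small fraction of the full multi-gigabyte scene
                return buckets[cell_of(head_pos)]

            publish("splat-001", visible_from=[(0.0, 1.6, 0.0), (0.6, 1.6, 0.0)])
            print(fetch((0.05, 1.6, 0.02)))  # ['splat-001']
            ```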

            So we are probably talking more about 2-3 years than 2-3 decades, but definitely less than ten years, with five also a pretty safe bet. They'll start with sport events that draw a lot of people with high ticket prices, so football, basketball and soccer will probably show up first, but once the tech is established, I'd fully expect them to also extend these offers to tennis and other types of sports.

            Apple already offered "immersive video" recordings of some past games for AVP on Apple TV+, and just this month announced that they will live stream several Lakers games from the next NBA season in this immersive format. It is not quite the same as a space you can freely move around in, but way more immersive than 180° video, and for live event streaming pretty close to sitting in a stadium. The reactions so far have been extremely positive, with many considering this a potential killer app for XR HMDs, so you can bet that Apple and Meta and Google are all working on making this happen ASAP.

          • Olle

            I read about that NextVR acquisition a few years ago when I bought my Q2, but I hadn't heard anything since, so I thought it turned out to be a dead end. It's good to hear it is still being worked on, though I'm less optimistic than you when it comes to the time frame we're looking at. Consider that 4K tennis broadcasts are rare even today, more than a decade after 4K was introduced to the market. The technology is there, but the broadcasting companies cite bandwidth limitations as the reason they are still using 1080p. Some major broadcasters in the US like Fox and CBS are still using 720p and 1080i for live sports. Here in Sweden there are no channels broadcasting tennis in 4K. I have an IPTV subscription with many channels from all over the world, and a search among these for "tennis" gives 156 results while a "tennis 4k" search returns 9 results, all European. And even with clever algorithms and AI clusters, I expect the kind of thing I'm looking forward to to require substantially more bandwidth than 4K, perhaps even an exponential increase. For comparison, the internet speed in the apartment I live in has only increased by a factor of 6 (from 50 to 300 Mbit) in the decade I've lived here.

            And then there is the question of if and when VR goes mainstream, which is required for any broadcaster to be interested in making the investment to stream sports. I take it Meta and Apple are just experimenting at this stage, with both surely stepping away the moment they don't see a profit happening in the near future (I tried Meta's sports playback feature on Q2 in 2023; it sucked). At this stage, it all depends on the possible success of AR. But the first few iterations of those AR glasses are unlikely to offer reasonable resolution for sports streams. All in all, while your prediction for when the tech is available might be closer to the truth than mine, the time it takes for this kind of tech to become commercially feasible enough to be offered at a reasonable price point might be much longer.

          • Christian Schildwaechter

            In a way I'm optimistic precisely because XR hasn't gone mainstream yet, and Apple and Meta are struggling to find reasons why non-gamers especially should bother to buy one of their headsets.

            On AVP, watching immersive and 3D 4K video turned out to be one of the most popular uses, made possible by Apple already having the infrastructure for high-bitrate Apple TV(+) streaming to iPhones, iPads and dedicated Apple TV boxes. They list 25Mbit/sec as the minimum requirement for 4K streaming, and with AVP using MV-HEVC for stereoscopic/immersive video, the multiview extension of H.265 that stores only the small differences between the two eyes' perspectives for the second stream, 3D 4K streaming shouldn't be a lot more demanding, so 50Mbit/sec should be fine. Sending Gaussian splats can be even less demanding for small movements, as the actual image is still generated on the HMD itself, so with clever compression only a few actual changes per frame have to be transferred. But here I am speculating about what nifty optimizations Apple/NextVR might have come up with in the last five years; nothing is official or announced. Estimates for how much Apple paid for NextVR are around USD 100M. Like pretty much everything AVP, and all the XR activities at Meta, this is a huge bet on a far future. But this is what gets them to invest a lot of money into things that will not make money anytime soon, and maybe never.
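
            As a back-of-the-envelope check of those numbers (the MV-HEVC overhead factor is my assumption; Apple only publishes the 25 Mbit/sec minimum for flat 4K):

            ```python
            MONO_4K_MBPS = 25       # Apple's stated minimum for 4K streaming
            SIMULCAST_FACTOR = 2.0  # worst case: two fully independent eye streams
            MV_HEVC_FACTOR = 1.5    # assumed: inter-view prediction encodes the
                                    # second eye mostly as deltas from the first

            print(MONO_4K_MBPS * SIMULCAST_FACTOR)  # 50.0 Mbit/sec upper bound
            print(MONO_4K_MBPS * MV_HEVC_FACTOR)    # 37.5 Mbit/sec, plausible with MV-HEVC
            ```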

            There are probably only a limited number of people who really wish for 4K live tennis broadcasts, while most are fine with 1080p, so even though the technology exists, there isn't a lot of incentive. There are currently even fewer people who will use XR HMDs for sports events, but Apple/Meta/Google all hope that this is what will get people to buy their expensive and uncomfortable headsets in the first place, making the very high cost more a promotional expense for the platform than an actual income stream.

            And that's partly why I am so optimistic about the timeline. Apple probably sold less than 500K AVPs, and they apparently expect to sell less than 200K of the new AVP M5, so the user numbers will be ridiculously low at least until a lighter and cheaper Vision Air releases, in 2027 at the earliest. Nonetheless they already offer dedicated 3D 4K streams that only work on AVP, hire known directors to create expensive immersive videos that are mostly showcases of what the tech can do, and have now announced the live immersive streaming of Lakers games, for which the license will be very expensive.

            None of this makes economic sense right now, but while a lack of market demand may prevent 4K tennis broadcasts from being a thing, being a money sink is apparently not an issue for content production for AVP. And Apple spent a lot less on this than what Meta burned through at MRL. Everything you wrote makes a lot of sense in a market driven by economic concerns, and most of what I wrote only makes sense if money, at least for now, is not the issue. We'll have to wait and see what direction Apple/Meta/Google are now heading, and whether we will still see a lot of money pumped into XR, or if maybe they now all mostly focus on smartglasses, with only breadcrumbs remaining for the less popular brick-shaped XR HMDs.

          • Olle

            An additional point is that if XR makes a breakthrough and NextVR manages to create a compelling enough experience with VR sports streams for people to want to buy it, I would assume there would be a price hike, and that the price wouldn't come down until competitors start to offer alternatives, which could be much further in the future. And Apple isn't exactly known for being big on low prices. In the sports world, broadcasters buy the rights to an event and then have a monopoly in their country, so they can set the price to any unreasonable amount. For watching tennis in Sweden, the price of a subscription has increased more than threefold in 6 years, to about 38 euros a month, because the broadcasting companies are dicks. That is a lot of money if you, like me, intend to watch only a single match or two in a given tournament, and is what caused me to switch to IPTV. I digress; my point is that I don't expect Apple or Meta to miss the opportunity to fill their coffers just like today's broadcasters if given the chance, and that it will stay that way for the foreseeable future. I might be willing to pay about 150% more for a VR stream than the 13 euros/month I pay today, but I doubt it will be worth much more than that. But it is as you say, we will have to wait and see what happens.

  • STL

    I'm using spatial videos extensively with my Quest 3 and iPhone 15 Pro Max. My most favorite feature! It's not "3D", it's time travel to the past! I love it, and it was the one and only reason to purchase a new iPhone!
    I'm also grossly disappointed about the missing upgrade from 4K to 8K on the iPhone 17 Pro Max.

  • R. m

    LMAO, $1800 is absolutely DOA. These people took years to launch a headset with both an OS AND hardware significantly worse than visionOS and the Vision Pro, and then they can't even undercut them on price in a significant manner after almost 2 years?! The Play For Dream headset has the exact same specs that were leaked for this headset, and it is RIGHT NOW $1900. That is a MINUSCULE Chinese company that managed to launch the product BEFORE SAMSUNG, CHEAPER, and with their own OS!! Are you kidding me? This will basically showcase Android XR for Google and will spread the OS to other OEMs, which is good for XR technology, but SAMSUNG are truly destroying themselves with this pathetic hardware.

    • Christian Schildwaechter

      TL;DR: Nobody outside of a tiny subsegment of VR users cares about cheap HMDs for streaming PCVR, so when analyzing the design and pricing of HMDs, you have to look at what the makers actually want to achieve with them. And it doesn't even matter how many they sell, as these first gen high end, high price HMDs are mostly trial balloons for future, consumer targeted devices.

      Being cheap was never the goal, even if most VR users for years cried for someone to enter into a ruinous HMD price competition with Meta. Pico made a short attempt, paid for with the billions ByteDance made from TikTok, but right from the start even they only targeted the higher end Quest configurations with more storage, never the USD 300 entry level. And no other large player bothered to enter the standalone market until Apple legitimized a high price segment with the USD 3500 AVP, which according to teardowns only costs USD 1400-1800 to build.

      So Samsung is entering the market not to compete with Meta or any smaller competitor with a low price, but to establish themselves and Android XR as a high quality/high value option for media and productivity use. Samsung sells a lot of very expensive smartphones, taking more than just one page from Apple's playbook, and those two companies combined sell around 50% of all smartphones worldwide, which is pretty much the opposite of destroying themselves.

      And those MINUSCULE Chinese companies could create cheaper headsets only because they got a lot of venture capital in the wake of the Apple Vision Pro announcement, with lots of money invested to win a piece of the (high price) media/productivity HMD market, not the low price VR gaming market. And it was only possible because they could buy ready-made components like microOLEDs from BOE, SeeYa or Sony, SoCs from Qualcomm, and whole HMD construction kits from manufacturers like GoerTek, and use Google's free Android as a base plus the OpenXR stacks provided by Qualcomm/GoerTek to make it all work as a headset.

      And that still won't be enough, because they completely lack the applications and the market share to attract developers, so they will inevitably switch to Android XR (outside of China) for access to the Play Store, once it becomes available to companies other than Samsung. Play for Dream's only chance to get noticed was rushing to market before a lot of other, very similar HMDs appeared, and selling at a low price. That's less a sign of superior engineering capabilities and more a necessary survival strategy. But consequently their Android-based OS with a DIY visionOS clone will only cover a few core use cases like streaming PCVR or video, and otherwise in no way offer a similar experience to AVP or Project Moohan, making this a very apples vs. oranges comparison.

      Sure, you can use the Play for Dream as an untethered HMD for PCVR streaming, and it will be cheaper than Project Moohan for that while offering similar specs. But that is an ultra-niche application that pretty much nobody cares about outside of VR enthusiasts with enough money to buy a high end PC. So concluding that the HMD is DOA at USD 1800 just because it doesn't cater to that one specific low price use case that has very little to do with why Samsung entered the XR market in the first place, means that you are very much missing the bigger picture here. Especially the purpose of first gen devices with a very small target audience mostly paving the way for future, more capable and affordable versions.

  • fcpw

    Hilariously, Samsung even rips off Apple's failing products.