Forum Discussion

jeremy.deats
2 years ago

When will we get object and image classification (Computer Vision) for Quest 3 and Quest Pro?

If I wanted to build a Mixed Reality app that can detect when a certain brand logo is visible on a poster, coffee cup coaster, etc., and then allow spatial anchoring relative to that logo, there seems to be no way to achieve this today. Computer vision for Quest 3 and Quest Pro developers is limited to a very restricted list of "semantic classification" labels, all of them room-architecture and furnishing objects (ceiling, floor, wall, door fixture, lamp, desk, etc.); the full list is here: https://developer.oculus.com/documentation/unity/unity-scene-supported-semantic-labels/

This also prohibits any kind of AR/MR training experience where some physical-world object (e.g. a bulldozer operations panel) could be detected and spatial anchors placed relative to specific control-panel features to provide dialogs and so on: all the things you'd expect from industrial AR applications. And this is not just useful for Enterprise/industrial AR; image and object classification is a core AR/MR feature required to build compelling experiences. Without it, we just have novelty use cases.

Looking at the competition, I see ByteDance is solving this by simply allowing camera feed access on the Pico 4 Enterprise; on the retail version they block it. I doubt Meta will provide camera feed access, as they are no longer selling Enterprise-specific hardware and this would require a special firmware update to enable.

Apple has provided camera access to iOS developers through ARKit for years. For Vision Pro's ARKit implementation they are restricting camera feed access, but they are still providing image classification/detection via their own computer vision models, allowing developers to add their own images for recognition. Here's a page from their docs:

https://developer.apple.com/documentation/visionos/tracking-images-in-3d-space
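
A minimal sketch of what that pipeline looks like from the developer's side, based on the page above (written from memory of Apple's docs, so treat exact names and signatures as approximate): you ship reference images in an asset-catalog group, and ARKit hands you anchors when it spots them.

```swift
import ARKit

// Load the reference images bundled in an "AR Resources" asset-catalog group.
// ARKit runs the detection itself; the raw camera feed is never exposed.
let imageTracking = ImageTrackingProvider(
    referenceImages: ReferenceImage.loadReferenceImages(inGroupNamed: "AR Resources")
)
let session = ARKitSession()

Task {
    try await session.run([imageTracking])
    for await update in imageTracking.anchorUpdates {
        // Each ImageAnchor carries a world transform you can anchor content to.
        print("Saw \(update.anchor.referenceImage.name ?? "image"), tracked: \(update.anchor.isTracked)")
    }
}
```

The point worth noting is that the app only ever receives the anchor (a name plus a transform), never the pixels, which is exactly the kind of closed pipeline Meta could offer.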

I am really surprised Quest Pro has been out almost a year and this sort of core AR/MR functionality is completely absent. With Quest 3 now released, more attention will be on AR/MR experiences, and Meta has great in-house AI technology, including computer vision models. They could build a closed pipeline where the raw image feed is never accessible, but the classifier model is compiled in, and through that closed system the detection results can surface in Unity3D or Unreal apps.

Regardless of how they achieve it, this is very important to future MR/AR apps. Without it, basically all you can do is very simple spatial anchoring, which may be suitable for novelty games but is very restrictive and not reflective of the power of MR/AR.

21 Replies

  • Thanks Jeremy for such an elaborate post about that topic.
    I figure Meta is just slowly shifting their purely consumer-focused VR/XR approach over to a more business- and technology-oriented mindset. Or at least I hope that is what's happening.

    Consumer VR looks like it's staying in its niche for a while longer, I think.

    • jeremy.deats
      Protege

      In every way the Quest 2 and Quest 3 are marketed as a game console product, with Quest 3 even including a voucher for an upcoming game. I get that. But you can't do anything meaningful in AR/MR without computer vision. Quest 3 has computer vision; it's just restricted to common architecture and furnishings.

      Here's an example of a consumer application that the Quest 3 hardware is capable of, but that cannot be built due to the limitations on computer vision:

      Imagine an MR/AR experience where a standard deck of playing cards can be used and the headset uses image detection to determine what's in the player's hand (King of Hearts, Seven of Clubs, Ace of Spades, etc.). The experience then teaches the user how to play various card games, spatially anchoring dialogs to specific cards to provide information and clues.

      What's described above can only be achieved if developers are given access to computer vision models. The company Vuforia provides an SDK and an end-to-end solution where they host the model; as a developer, you upload images to train it for your experience. I could build the experience described above using the Vuforia SDK and services and make it multi-platform, working on HoloLens, Magic Leap, or on an iPhone or Android phone. But due to the API restrictions, AR SDK vendors like Vuforia, and even open tools built on OpenCV, will not work with Quest Pro or Quest 3.
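
      For contrast, here is roughly what the per-frame recognition step looks like on a platform that does expose camera frames, using Apple's Vision framework. CardClassifier is a hypothetical Core ML model standing in for whatever you trained; this is exactly the step you cannot perform on Quest, because the pixel buffer is never handed to you.

      ```swift
      import Vision
      import CoreML

      // CardClassifier is a hypothetical Core ML model trained on playing-card images.
      let cardModel = try VNCoreMLModel(for: CardClassifier(configuration: MLModelConfiguration()).model)

      // Run the classifier over a single camera frame.
      func classifyCard(in frame: CVPixelBuffer) {
          let request = VNCoreMLRequest(model: cardModel) { request, _ in
              guard let best = (request.results as? [VNClassificationObservation])?.first else { return }
              print("Detected \(best.identifier) (confidence \(best.confidence))")
          }
          try? VNImageRequestHandler(cvPixelBuffer: frame).perform([request])
      }
      ```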

      With native ARKit development I could build this for Apple's Vision Pro headset. I don't think a lot of people realize it yet, but this means the Vision Pro is going to be capable of delivering an entire broad array of AR-type experiences through MR that Quest 3 is incapable of, all due to this restriction and the lack of forethought about a CV pipeline. It's kind of asinine when you think about it.

      Without this there is no path to building true MR/AR experiences on Quest 3. To me, this is the difference between the device being a toy and being taken seriously as a tool. I really hope they provide developers a path to at least perform image recognition.

      spacefrog wrote:

      Thanks Jeremy for such an elaborate post about that topic.
      I figure Meta is just slowly shifting their purely consumer-focused VR/XR approach over to a more business- and technology-oriented mindset. Or at least I hope that is what's happening.

      Consumer VR looks like it's staying in its niche for a while longer, I think.

      Meta has a big push for AI technologies, which is great. But computer vision of the kind I'm describing above is the essential AI technology for AR/MR, and it is entirely off limits to developers.

  • I agree, having access to the raw camera feed is very important for industrial and other non-consumer-focused applications. If this could be accessed on a Quest 3 or Pro, even if we had to pay more for an enterprise version like with the Pico, it would open up the device to many more use cases. Robotics, education, medical, and many other fields would benefit a lot from image processing and object recognition.

    Because of the wider field of view (though the warping on the passthrough/video see-through could be improved), I could see this easily replacing development on the HoloLens or Magic Leap if the raw video feed, and possibly point-cloud data from the depth camera, were accessible.

    I can only assume this isn't currently available to developers because of privacy issues for regular consumers, but there should be a way to allow it for enterprise use. Hopefully this will come in the future; otherwise it is a missed opportunity.

    • jeremy.deats
      Protege

      Camera feed access is unavailable due to privacy concerns... Apple is also disabling raw camera feed access on the Vision Pro, but Apple does offer a pipeline where, through ARKit, developers can add images (and I believe objects as well) for recognition. So Apple hosts an instance of the computer vision model, developers can train that model, and apps get the ability to recognize images/objects and set spatial anchors on them in real time, all without access to the raw camera feed on Vision Pro. Apple does this without a fee. Apple developers do have to pay around $100 a year to be part of the Apple Developer program (a single annual fee, which covers all Apple devices), but there is nothing additional and no run-time expense involved in using ARKit.

      See:

      https://developer.apple.com/documentation/visionos/tracking-images-in-3d-space

      https://developer.apple.com/documentation/arkit/imageanchor
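
      From those docs, what the developer actually receives is just the anchor: an identifier, a transform, and a tracking flag. A minimal sketch of consuming those updates and pinning RealityKit content to a detected image (again paraphrased from memory of the linked docs, so the details are approximate):

      ```swift
      import ARKit
      import RealityKit

      // Keep one entity per detected image, keyed by the anchor's identifier.
      var markers: [UUID: Entity] = [:]

      func handle(_ update: AnchorUpdate<ImageAnchor>) {
          switch update.event {
          case .added:
              let marker = ModelEntity(mesh: .generateSphere(radius: 0.01))
              marker.transform = Transform(matrix: update.anchor.originFromAnchorTransform)
              markers[update.anchor.id] = marker
              // Add `marker` to your RealityView content here.
          case .updated:
              markers[update.anchor.id]?.transform =
                  Transform(matrix: update.anchor.originFromAnchorTransform)
          case .removed:
              markers[update.anchor.id]?.removeFromParent()
              markers[update.anchor.id] = nil
          }
      }
      ```

      Everything Meta would need to expose is in that handful of fields; the camera feed itself stays private.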

      All Meta needs to do is find their own way of providing a counterpart to what Apple offers through that API, and it would open the door to a large range of MR/AR experiences. The Quest 3 hardware is capable; it's just limited by software in this case. Developers need to be able, at a minimum, to detect images in the environment and place spatial anchors on them.

      Meta has invested heavily in AI technologies, including computer vision. They have all the technology to build a closed pipeline without exposing the raw camera feed. You can only build a very limited range of AR/MR experiences without this. 

      It would be a shame if Meta made this closed pipeline exclusive to some Enterprise package; that would close the door on consumer apps that could benefit from it. I also think it would be a mistake for them to try to monetize the pipeline. If they do go that route, I hope they provide a generous free tier so developers (including App Lab developers) can experiment and release great experiences.

      It really makes no sense that Apple's Vision Pro developers can build this entire range of apps that can't also be ported to Quest 3. Until Meta builds this closed CV pipeline or allows camera feed access, the Quest 3's MR/AR use cases are limited to Meta's in-house apps and to games that can only attach spatial anchors to, and augment over, basic room geometry and fixtures. It's one of the key differences between the device being a toy/console and being a spatial computer.

  • Hi Jeremy,

    I wonder if the Vision Pro can achieve object detection, for example, detecting a basketball.

    Thank you!

    • jeremy.deats
      Protege

      ARKit can on iOS, but from the documentation it appears that on visionOS Apple has only enabled developers to train the computer vision model on images.

      • monsterbai
        Explorer

        Image detection means it can only recognize 2D pictures but cannot recognize 3D objects, right? And will Apple open up access to object detection in the near future? I truly agree with your points about the limitations that exist without these algorithms.

  • And in this situation, if we want to recognize a basketball on Vision Pro, would we need to add maybe thousands of pictures of a basketball from different angles to the asset folder?

    • jeremy.deats
      Protege

      You would have to experiment and see. I've not worked first-hand with Apple's computer vision approach on AVP; I just noted it was possible in the developer docs.

      • monsterbai
        Explorer

        OK, I may give it a try; once I get results, I will let you know!

        Thank you! And I think we can have more discussions; I have learned a lot from you!

  • I don't know if this is relevant, but many years ago Apple bought a company called Metaio, which had an AR SDK at the time that competed against Vuforia and ARToolkit.

    Metaio allowed for image (fiducial marker) and 3D object detection and tracking (moving 3D objects, not just static placement).

    Here is an example video from about 10 years ago; there are probably many more online.

    https://m.youtube.com/watch?v=m73gJGdiTik&pp=ygUWTWV0YWlvIG9iamVjdCB0cmFja2luZw%3D%3D

    I can only imagine that Apple may integrate Metaio's old software into Apple Vision Pro someday (hopefully improved after all these years), and I hope this pushes Meta to offer something similar on the Quest as well, at least on the Quest Pro, to give developers better tools for building AR applications.

    • monsterbai
      Explorer

      Thank you for the update. I believe Apple and Meta will allow us to do object detection in the near future.

    • jeremy.deats
      Protege

      Keep in mind, neither company will grant access to the raw camera feed for official store apps. This means industrial-grade AR toolkits like Vuforia will never be supported without some deep partnership where these companies make exclusive agreements to allow it. Apple gets around this by controlling the pipeline and keeping it a black box to the developer: you can't touch the actual image feed coming in; through Apple's libraries it just gets funneled to the CV model for classification.

      Meta could do the same, but they would have to build a CV model and the entire pipeline. Without this, Mixed Reality is really just a gimmick. You could build very simple experiences, 3D tabletop games, things like that, but nothing beyond toys.

      • monsterbai
        Explorer

        I really am learning something new from you. I was not familiar with this. Thank you!

  • Here is an interesting post on the need to open up VR camera access to developers and possible ways companies could provide this while still focusing on privacy. The post mentions that the Pico 4 Enterprise headset provides camera access if requested, which is an interesting solution that Meta could adopt for their headsets: https://skarredghost.com/2024/03/20/camera-access-mixed-reality/

  • OK... Apple did it with visionOS. An Enterprise API with camera access has been released: https://developer.apple.com/documentation/visionOS/building-spatial-experiences-for-business-apps-with-enterprise-apis

    The list of features is wild: video capture, neural nets for machine-learning tasks, and object detection with parameter adjustment. As implied by the API name, the features are for business applications only, but the first step has been taken! 🙂
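
    For anyone curious, the camera-access piece reportedly boils down to a new ARKit provider gated behind an enterprise license file and entitlement. A rough sketch of the flow, written from memory of Apple's enterprise-API material (exact names and signatures may differ):

    ```swift
    import ARKit

    // Assumes the enterprise license file plus the
    // com.apple.developer.arkit.main-camera-access.allow entitlement.
    let session = ARKitSession()
    let cameraProvider = CameraFrameProvider()

    Task {
        try await session.run([cameraProvider])
        guard let format = CameraVideoFormat
                .supportedVideoFormats(for: .main, cameraPositions: [.left]).first,
              let frames = cameraProvider.cameraFrameUpdates(for: format) else { return }
        for await frame in frames {
            if let pixelBuffer = frame.sample(for: .left)?.pixelBuffer {
                // A raw camera frame: feed it to Vision, Core ML, OpenCV, etc.
                _ = pixelBuffer
            }
        }
    }
    ```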

    • jeremy.deats
      Protege

      I don't want to devote a lot of discussion to Apple Vision Pro on a Meta Quest developer forum, but I want to point out that the new Enterprise APIs for Vision Pro come with some strong caveats, the big one being that you cannot build a Vision Pro app with these APIs and list it on the Vision Pro store for general consumer use.

      The only Vision Pro hardware that will be able to run apps created with these Enterprise APIs is Vision Pro devices with the correct security provisioning. I will give kudos to Apple for making this provisioning process easy, and building Enterprise apps doesn't require any special fees to Apple: your $100/year dev account, which is required to deploy builds on any Apple hardware, also gives you the privilege to build Enterprise apps and provision them for Enterprise clients.

      But this is still a huge feature. It means Apple Vision Pro developers can now port any HoloLens or Magic Leap app, and it means Enterprise AR frameworks like Vuforia will now be able to fully support Apple Vision Pro.

      Meta has an opportunity to implement this and go further. Yes, there is a privacy concern with giving devs full camera access; no one is arguing that. However, Meta is much further along in the AI space than Apple in many regards. Meta can build a secure pipeline from the Quest's camera feed to their backend models and have the APIs simply broker it, meaning the image data/video feed flows from the cameras to Meta's servers to be processed by Meta-controlled AI models, and customers can opt in to running these types of apps without fear that the developer is inappropriately using the camera feed.

      It can all be done in a protected way. Meta has the AI technology to do this and gain leverage. Apple is playing catch up with AI, at least on the software side of things. 

      But will they? Are you listening, Boz?