When will we get object and image classification (Computer Vision) for Quest 3 and Quest Pro?

jeremy.deats
Protege

If I wanted to build a Mixed Reality app that can detect when a certain brand logo is visible on a poster, coffee cup coaster, etc., and then allow spatial anchoring relative to that logo, there seems to be no way to achieve this today. Computer vision for Quest 3 and Quest Pro developers is limited to a very restricted list of "semantic classification" labels, all of them room-architecture and furniture objects (ceiling, floor, wall, door frame, lamp, desk, etc.). Full list here: https://developer.oculus.com/documentation/unity/unity-scene-supported-semantic-labels/
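To make the limitation concrete, here is a minimal sketch of what the Scene API does let you do today, assuming Unity with the Oculus Integration SDK (OVRSceneManager / OVRSemanticClassification) and a room already captured via Space Setup. You can ask whether an anchor is a wall or a desk, but there is no hook for anything like a logo or a custom image target:

```csharp
using UnityEngine;

public class SemanticLabelLogger : MonoBehaviour
{
    void Start()
    {
        // OVRSceneManager fires this after it instantiates the room's anchors.
        FindObjectOfType<OVRSceneManager>().SceneModelLoadedSuccessfully += OnSceneLoaded;
    }

    void OnSceneLoaded()
    {
        // Every classified anchor carries an OVRSemanticClassification with
        // one or more labels from the fixed, room-architecture-only list.
        foreach (var c in FindObjectsOfType<OVRSemanticClassification>())
        {
            Debug.Log($"{c.name}: {string.Join(",", c.Labels)}");

            if (c.Contains(OVRSceneManager.Classification.WallFace))
            {
                // You can anchor content to "a wall" -- but there is no API
                // to detect a specific poster or logo on that wall.
            }
        }
    }
}
```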

This also prohibits any kind of AR/MR training experience where some physical-world object (e.g. a bulldozer operations panel) could be detected and spatial anchors placed relative to specific control panel features to provide dialogs and so on: all the things you'd expect from industrial AR applications. But this is not just useful for enterprise/industrial AR; image and object classification is a core AR/MR feature required to build compelling experiences. Without it, we just have novelty use cases.

Looking at the competition, I see ByteDance is solving this by simply allowing camera feed access on their enterprise Pico 4; on the retail version they block it. I doubt Meta will provide camera feed access, as they are no longer selling enterprise-specific hardware and this would require a special firmware update to enable.

Apple has provided camera access to iOS developers through ARKit for years. For Vision Pro's ARKit implementation they are restricting camera feed access, but they still provide image classification/detection through their computer vision models, allowing developers to add their own images for recognition. Here's a page from their docs:

https://developer.apple.com/documentation/visionos/tracking-images-in-3d-space

I am really surprised that Quest Pro has been out almost a year and this sort of core AR/MR functionality is completely absent. With Quest 3 now released, more attention will be on AR/MR experiences, and Meta has great in-house AI technology. They have computer vision models, so they could build a closed pipeline where the raw image feed is never accessible to the app: the classifier model is compiled, and through a closed system only the detection results surface to Unity3D or Unreal apps.
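To illustrate, here is a purely hypothetical sketch of what such a closed pipeline could look like from a Unity app's point of view. Nothing like this exists in Meta's SDK today; every type and member below is invented for illustration. The key property is that the app registers targets and receives only labels and poses, never pixels, regardless of whether inference runs on-device or is brokered through Meta's servers:

```csharp
using System;
using UnityEngine;

// Hypothetical result type: a label and a pose, but no image data.
public struct DetectionResult
{
    public string Label;          // e.g. "my_brand_logo"
    public Pose PoseInWorld;      // where to place a spatial anchor
    public float Confidence;
}

// Hypothetical broker interface the OS runtime would implement.
// The camera feed stays on the other side of this boundary.
public interface IClosedDetectionPipeline
{
    // Register a developer-supplied image target (e.g. a brand logo).
    void RegisterImageTarget(string label,
                             Texture2D referenceImage,
                             float physicalWidthMeters);

    // The runtime pushes detections to the app; raw frames never cross over.
    event Action<DetectionResult> OnDetected;
}
```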

Regardless of how they achieve it, this is very important to future MR/AR apps. Without it, basically all you can do is very simple spatial anchoring, which may be suitable for novelty games but is very restrictive and not reflective of the power of MR/AR.

21 REPLIES

I don't want to give a lot of discussion to Apple Vision Pro on a Meta Quest developer forum, but I want to point out that the new Enterprise API for Vision Pro comes with some strong caveats, the big one being that you cannot build a Vision Pro app with these APIs and list it on the Vision Pro store for general consumer use.

The only Vision Pro hardware that will be able to run apps created with these Enterprise APIs will be Vision Pro devices with the correct security provisioning. I will give kudos to Apple for making this provisioning process easy, and building enterprise apps doesn't require any special fees to Apple: the $100/year dev account required to deploy builds on any Apple hardware also gives you the privilege to build enterprise apps and provision them for enterprise clients.

But this is still a huge feature: it means Apple Vision Pro developers can now port any HoloLens or Magic Leap app, and it means enterprise AR frameworks like Vuforia will be able to fully support Apple Vision Pro.

Meta has an opportunity to implement this and go further. Yes, there is a privacy concern with giving devs full camera access; no one is arguing that. However, Meta is much further along in the AI space than Apple in many regards. Meta could build a secure pipeline from the Quest's camera feed to their backend models and let the APIs simply broker it: the image data/video feed flows from the cameras to Meta's servers to be processed by Meta-controlled AI models, and customers can opt in to running these types of apps without fear that the developer is inappropriately using the camera feed.

It can all be done in a protected way. Meta has the AI technology to do this and gain leverage. Apple is playing catch-up with AI, at least on the software side of things.

But will they? Are you listening, Boz?


What about on-device image classification/detection, as indicated in the original post? I would suppose that something like YOLO would run pretty well on Quest 3, Quest Pro, or even Quest 2.
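For what it's worth, the inference side is already straightforward in Unity; here is a minimal sketch using Unity's Barracuda package, assuming a YOLO-style model exported to ONNX and imported as an NNModel asset. The missing piece on Quest is the input: there is no API that hands the app a passthrough camera frame to feed into it.

```csharp
using Unity.Barracuda;
using UnityEngine;

public class OnDeviceClassifier : MonoBehaviour
{
    public NNModel modelAsset;   // YOLO-style model exported to ONNX (assumption)
    private IWorker _worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        // GPU compute path; Barracuda also has CPU/Burst backends as fallback.
        _worker = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, model);
    }

    // 'frame' would be a camera image -- exactly what Quest does not expose.
    public Tensor Classify(Texture2D frame)
    {
        using var input = new Tensor(frame, channels: 3);
        _worker.Execute(input);
        // Raw network output, owned by the worker until the next Execute();
        // a real YOLO pipeline would decode boxes and run NMS here.
        return _worker.PeekOutput();
    }

    void OnDestroy() => _worker?.Dispose();
}
```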

Meta did also publish research on Vision Transformers (ViTs) for object detection in 2022: https://ai.meta.com/blog/efficient-accurate-object-detection-for-hundreds-of-uncommon-object-classes...

I feel that even the ability to detect a set of standard objects and/or trainable images alone would significantly enhance the AR/MR experience.