When will we get object and image classification (Computer Vision) for Quest 3 and Quest Pro?


If I wanted to build a Mixed Reality app that can detect when a certain brand logo is visible on a poster, coffee cup, coaster, etc., and then allow spatial anchoring relative to that logo, there seems to be no way to achieve this today. Computer vision for Quest 3 and Quest Pro developers is limited to a very restricted list of "semantic classification" labels, all related to room architecture and furniture (ceiling, floor, wall, door fixture, lamp, desk, etc.); full list here:

This also rules out any kind of AR/MR training experience where a physical-world object (e.g. a bulldozer's operator panel) could be detected and spatial anchors placed relative to specific control-panel features to provide dialogs and so on, all the things you'd expect from industrial AR applications. But this is not just useful for enterprise/industrial AR: image and object classification is a core AR/MR feature required to build compelling experiences. Without it, we are left with novelty use cases.

Looking at the competition, I see ByteDance is solving this by simply allowing camera feed access on the enterprise Pico 4; on the retail version they block it. I doubt Meta will provide camera feed access, as they are no longer selling enterprise-specific hardware and this would require a special firmware update to enable.

Apple has provided camera access to iOS developers through ARKit for years. For Vision Pro's ARKit implementation they are restricting camera feed access, but they still provide image classification/detection via their computer vision models, allowing developers to add their own images for recognition; here's a page from their docs.
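For reference, the long-standing iOS ARKit flow being described looks roughly like this: you register reference images and ARKit hands you an anchor when one is spotted, without you ever touching the pixel buffer yourself. This is only a minimal sketch; the "BrandLogos" asset group name and the overlay geometry are illustrative assumptions, not anything from Apple's docs or this thread.

```swift
import ARKit
import SceneKit

final class LogoAnchorViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.delegate = self
        view.addSubview(sceneView)

        // Reference images live in an asset catalog group
        // ("BrandLogos" is a hypothetical group name).
        guard let logos = ARReferenceImage.referenceImages(inGroupNamed: "BrandLogos",
                                                           bundle: nil) else { return }
        let config = ARWorldTrackingConfiguration()
        config.detectionImages = logos          // enable 2D image detection
        config.maximumNumberOfTrackedImages = 1 // track one logo at a time
        sceneView.session.run(config)
    }

    // Called when ARKit detects one of the reference images; the anchor's
    // transform lets us place content spatially relative to the physical logo.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        let size = imageAnchor.referenceImage.physicalSize
        let plane = SCNPlane(width: size.width, height: size.height)
        plane.firstMaterial?.diffuse.contents = UIColor.green.withAlphaComponent(0.4)
        let overlay = SCNNode(geometry: plane)
        overlay.eulerAngles.x = -.pi / 2 // lay the highlight flat on the detected image
        node.addChildNode(overlay)
    }
}
```

This is exactly the kind of "detect a logo, then anchor relative to it" capability the Quest semantic-classification labels cannot provide today.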

I am really surprised that Quest Pro has been out for almost a year and this sort of core AR/MR functionality is completely absent. With Quest 3 now released, more attention will be on AR/MR experiences, and Meta has great in-house AI technology and computer vision models. They could build a closed pipeline where the raw image feed is not accessible but the classifier model is compiled in, so that through a closed system the detection can happen in Unity3D or Unreal apps.

Regardless of how they achieve it, this is very important for future MR/AR apps. Without it, basically all you can do is very simple spatial anchoring, which may be suitable for novelty games but is very restrictive and not reflective of the power of MR/AR.



And in this situation, if we want to recognize a basketball on Vision Pro, do we need to add maybe thousands of pictures of a basketball from different angles to the asset folder?

You would have to experiment and see. I've not worked first-hand with Apple's computer vision approach on AVP; I just noted it was possible in the developer docs.

OK, I may give it a try. Once I get results, I will inform you!

Thank you! And I think we can discuss this more; I have learned a lot from you!


I don't know if this is relevant, but many years ago Apple bought a company called Metaio, which had an AR SDK that competed against Vuforia and ARToolKit at the time.

Metaio allowed for image (fiducial marker) and 3D object detection and tracking (moving 3D objects, not just static placement).

Here is a video as an example from about 10 years ago, there are probably many more videos online.


I can only imagine that Apple may integrate Metaio's old software into Apple Vision Pro someday (hopefully improved after all these years), and I hope this pushes Meta to integrate something similar into the Quest as well, or at least into the Quest Pro, to give developers better tools for building AR applications.

Thank you for your update. I believe Apple and Meta will allow us to do object detection in the near future.

Keep in mind, neither company will grant access to the raw camera feed from official store apps. This means industrial-grade AR toolkits like Vuforia will never be supported without some deep partnership in which these companies make exclusive agreements to allow it. Apple handles this by controlling the pipeline and keeping it a black box to the developer: you can't touch the actual image feed coming in; through Apple's libraries it just gets funneled to the CV model for classification.

Meta could do the same, but they would have to build a CV model and the entire pipeline. Without this, Mixed Reality is really just a gimmick. You could build very simple experiences, 3D tabletop games, things like that, but nothing beyond toys.


I really did learn something new from you. I am not familiar with this. Thank you!


Here is an interesting post on the need to open VR camera access to developers and possible ways companies could provide this while still focusing on privacy. The post mentions that the Pico 4 Enterprise headset provides camera access if requested, which is an interesting solution that Meta could use with their headsets:

For anyone interested in this feature,... I am looking at you, Meta developers 😉, see here:

Anyway, I would love to see real object detection in Meta's OS, too. And I totally agree, Meta is leading in AI. Why not implement something CNN-based, e.g. YOLO?


OK,... Apple did it with visionOS. An Enterprise API with camera access has been released:

The list of features is wild,... video capture, neural nets for machine learning tasks, and object detection with parameter adjustment. As implied by the API name, the features are for business applications only, but the first step has been taken! 🙂