When will we get object and image classification (Computer Vision) for Quest 3 and Quest Pro?
If I wanted to build a Mixed Reality app that can detect when a certain brand logo is visible on a poster, coffee cup coaster, etc., and then allow spatial anchoring relative to that logo, there seems to be no way to achieve this today. Computer vision for Quest 3 and Quest Pro developers is limited to a very restricted list of "semantic classification" labels, all of them room-architecture and furniture objects (ceiling, floor, wall, door fixture, lamp, desk, etc.). The full list is here: https://developer.oculus.com/documentation/unity/unity-scene-supported-semantic-labels/
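For reference, this is roughly the extent of what the Scene API exposes today. A minimal sketch, assuming the Unity Oculus Integration SDK (OVRSceneAnchor / OVRSemanticClassification, circa v57), that walks the scene anchors and checks their semantic labels; notice there is no hook anywhere to register a custom image or object target:

```csharp
using UnityEngine;

// Sketch only: enumerates scene anchors loaded by OVRSceneManager and logs
// their semantic labels. Assumes the user has already run Scene Capture.
public class SemanticLabelLogger : MonoBehaviour
{
    void Start()
    {
        foreach (var anchor in FindObjectsOfType<OVRSceneAnchor>())
        {
            // Labels are limited to the fixed room/furniture set
            // (CEILING, FLOOR, WALL_FACE, TABLE, ...); there is no way
            // to add a custom category like a brand logo.
            if (anchor.TryGetComponent<OVRSemanticClassification>(out var classification))
            {
                Debug.Log($"{anchor.name}: {string.Join(",", classification.Labels)}");

                if (classification.Contains(OVRSceneManager.Classification.Table))
                {
                    // You can anchor content to a table, but not to a logo on it.
                    Debug.Log("Found a table at " + anchor.transform.position);
                }
            }
        }
    }
}
```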
This also prohibits any kind of AR/MR training experience where some physical-world object (e.g. a bulldozer operations panel) could be detected and spatial anchors placed relative to specific control panel features to provide dialogs and so on: all the things you'd expect from industrial AR applications. But this is not just useful for enterprise/industrial AR; image and object classification is a core AR/MR feature required to build compelling experiences. Without it, we are left with novelty use cases.
Looking at the competition, I see ByteDance is solving this by simply allowing camera feed access on their enterprise Pico 4; on the retail version they block it. I doubt Meta will provide camera feed access, as they are no longer selling enterprise-specific hardware and this would require a special firmware update to enable.
Apple has provided camera access to iOS developers through ARKit for years. For Vision Pro's ARKit implementation they are restricting camera feed access, but they still provide image classification/detection via their own computer vision models, allowing developers to add their own images for recognition. Here's a page from their docs:
https://developer.apple.com/documentation/visionos/tracking-images-in-3d-space
I am really surprised that Quest Pro has been out almost a year and this sort of core AR/MR functionality is completely absent. With Quest 3 now released, more attention will be on AR/MR experiences, and Meta has great in-house AI technology and computer vision models. They could build a closed pipeline where the raw image feed is never accessible to apps: the classifier model is compiled, and through a closed system the detection results are surfaced to Unity3D or Unreal apps, roughly as sketched below.
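To be concrete about what I mean, here is an entirely hypothetical API sketch; none of these types exist in any Meta SDK today. The developer ships a pre-compiled target set, the OS runs detection against the camera feed internally, and the app only ever receives target IDs and poses, never pixels:

```csharp
using System;
using UnityEngine;

// HYPOTHETICAL: illustrates the closed pipeline described above.
// Neither HypotheticalImageTracker nor its members are real Meta APIs.
public static class HypotheticalImageTracker
{
    // A target set compiled offline (e.g. from logo images), so neither the
    // classifier model nor raw camera frames are exposed to the app.
    public static void LoadTargetSet(string compiledTargetSetPath) { /* OS-side */ }

    // Fired when the system detects a registered target; the app gets a
    // target id and a world-space pose, but no image data.
    public static event Action<string, Pose> OnTargetDetected;
}

public class LogoAnchorExample : MonoBehaviour
{
    [SerializeField] private GameObject overlayPrefab;

    void OnEnable()
    {
        HypotheticalImageTracker.LoadTargetSet("Assets/Targets/brand_logos.bin");
        HypotheticalImageTracker.OnTargetDetected += HandleDetection;
    }

    void OnDisable()
    {
        HypotheticalImageTracker.OnTargetDetected -= HandleDetection;
    }

    private void HandleDetection(string targetId, Pose pose)
    {
        // Spatially anchor content relative to the detected logo,
        // exactly the workflow described at the top of this post.
        Instantiate(overlayPrefab, pose.position, pose.rotation);
    }
}
```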
Regardless of how they achieve it, this is very important to future MR/AR apps. Without it, basically all you can do is very simple spatial anchoring, which may be suitable for novelty games but is very restrictive and not reflective of the power of MR/AR.
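For contrast, this is what "very simple spatial anchoring" means in practice today: the user, not a detector, decides where content goes. A rough sketch assuming the Oculus Integration SDK's OVRSpatialAnchor and OVRInput:

```csharp
using UnityEngine;

// Sketch of manual anchor placement, the main option available today:
// content is placed at a controller pose on trigger press, with nothing
// detecting where it "should" go.
public class ManualAnchorPlacer : MonoBehaviour
{
    [SerializeField] private GameObject contentPrefab;

    void Update()
    {
        if (OVRInput.GetDown(OVRInput.Button.PrimaryIndexTrigger, OVRInput.Controller.RTouch))
        {
            // Tracking-space controller pose; assumes the camera rig
            // sits at the origin for simplicity.
            var pos = OVRInput.GetLocalControllerPosition(OVRInput.Controller.RTouch);
            var rot = OVRInput.GetLocalControllerRotation(OVRInput.Controller.RTouch);

            var go = Instantiate(contentPrefab, pos, rot);
            go.AddComponent<OVRSpatialAnchor>(); // persists the pose across sessions
        }
    }
}
```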