
Voice SDK - General Questions Thread

Retired Support

Do you have any general questions regarding the framework within the Voice SDK? Drop your questions, concerns, and feedback here!


Level 6

If a word I say is recognised incorrectly (e.g. I said 'Close' but the utterance text shows 'Lows'), the guidance seems to imply that I should correct the text and then validate it. Is that correct? And if the word is consistently misrecognised, is it OK to add the incorrect word to the list of keywords for the entity?

Level 6

How can you add context to the interpretation of utterances? I'm specifically thinking of ones that might be using the same words but have different meanings (or 'intents').

For example, "Let's play ball" is assigned to the scene_load intent, with 'ball' as an entity that has the role scene_name and resolves to 'basketball' (the Unity scene name).
However, in the same app I might want to detect "Pass me the ball" and map it to a different response (e.g. npc_command), with 'ball' this time being an entity with the role object_name. Or "Let's play ball" might be uttered in this new scene but this time need to trigger a different intent (perhaps npc_command, with the entity resolving to approach_player).

Hope that makes sense! Any thoughts on how best to approach this?
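One way to picture the problem is as a lookup keyed on both the utterance and the scene the player is currently in. The Python sketch below is a hypothetical, app-side illustration only, not a Wit.ai or Voice SDK feature: the intent, role and value names come from the post, while `RULES`, `interpret`, `Interpretation` and the `action` role are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interpretation:
    intent: str
    role: str
    value: str


# Hypothetical rule table: (normalised utterance, current scene) -> interpretation.
# A scene of None means the rule applies in any scene.
RULES = {
    ("let's play ball", None): Interpretation("scene_load", "scene_name", "basketball"),
    # Hypothetical "action" role; the post only says the entity resolves to approach_player.
    ("let's play ball", "basketball"): Interpretation("npc_command", "action", "approach_player"),
    ("pass me the ball", "basketball"): Interpretation("npc_command", "object_name", "ball"),
}


def interpret(utterance: str, current_scene: Optional[str]) -> Optional[Interpretation]:
    key = utterance.strip().lower()
    # Prefer a scene-specific rule, then fall back to the scene-independent one.
    return RULES.get((key, current_scene)) or RULES.get((key, None))
```

For example, `interpret("Let's play ball", "lobby")` falls back to the scene_load rule, while the same phrase inside the basketball scene returns the npc_command rule.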

Level 6

Some of my older Wit.Ai applications have 'live utterances', i.e. audio recordings in the Understanding section that need to be listened to and confirmed before they can be trained.
I've a few questions about this:
1. Why does my current app not do this? Is it because it is set to public whereas the others are private? If not, why not? (Has it been disabled recently? Is there a setting that needs to be enabled?)
2. What does making an app public actually mean?

3. Aren't there serious privacy implications in being able to replay the actual audio from a user's session in your app? In my initial tests, when I was still learning how to properly signpost that the mic was active, I was getting recordings of people talking in their homes without them really knowing it was happening. It seems to me there should be very clear guidance telling developers that they need to make this explicit to players. The generic "Do you allow this app to access your mic?" app manifest warning doesn't quite cover this.

Level 2

Not sure which forum to post this to. I already reached out here: so if this turns out to be a double post, please disregard it.


We are currently developing a VR simulation for Emergency Services to train users in the language and vocabulary used by personnel responding to active shooter situations & other mass casualty events. Our application relies heavily on speech recognition, and we have implemented Oculus's Voice SDK, which uses, and it works perfectly for our needs.


Unfortunately, due to the way the training program is delivered, we cannot guarantee internet access at the training locations, so we are looking to implement an offline solution, which to our knowledge is not supported.


We are looking to inquire:
1.) Is there possibly an offline-supported package for Oculus's Voice SDK using that may be available?
2.) We use a mobile server to network our training; is there a way we can host the required software on our mobile server?


Please let us know whether either of these is possible, or whether there is another solution that supports this speech-to-text software offline for our Emergency Services simulation.

Level 2

I'm having issues with parts of my transcripts going missing. If the user talks slowly or with even a hint of hesitancy between words, the transcription is cut off at the slightest pause. The expected behavior, quoting reliable sources here, is: "After activate is called it will listen indefinitely until the minimum volume threshold is hit. Once the minimum volume threshold is hit it will send up to 20 seconds of audio to Wit for processing." I've tuned the WIT voice config back and forth with no luck, and this happens even if I set the volume threshold to zero. The debug wave file seems fine, but the transcript from WIT ends too early whenever there is even a slight pause in the speech.
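To make the quoted description concrete, here is a minimal Python model of it, assuming audio arrives as a sequence of amplitude samples at a known sample rate. `capture_after_threshold`, `min_volume` and `max_seconds` are hypothetical names; this is not the Voice SDK's actual implementation, only a sketch of the behaviour as quoted.

```python
from typing import Iterable, List


def capture_after_threshold(samples: Iterable[float], sample_rate: int,
                            min_volume: float, max_seconds: float = 20.0) -> List[float]:
    """Model of the quoted behaviour: ignore audio until a sample's absolute
    level reaches min_volume, then keep up to max_seconds of audio from there."""
    max_samples = int(sample_rate * max_seconds)
    captured: List[float] = []
    triggered = False
    for s in samples:
        if not triggered:
            if abs(s) >= min_volume:
                triggered = True  # threshold hit: start capturing from this sample
            else:
                continue  # still "listening indefinitely" below the threshold
        captured.append(s)
        if len(captured) >= max_samples:
            break  # the only stop condition in the quote: the 20-second cap
    return captured
```

In this model a pause after the trigger does not end the capture; only the 20-second cap does, and a threshold of zero triggers on the very first sample.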