Forum Discussion
dev_pirate_00 (Honored Guest)
3 years ago
Using Voice SDK / Wit.ai as a speech-to-text converter
Wit is very good at understanding both human voice and intent, but sometimes I want the player to say exactly what I set in the Utterance section. An example could be an in-game passcode. I don't know how to do it, since Wit's response matcher seems to work differently from a simple speech-to-text converter.
TLDR: I want to use Wit in Unity as a speech-to-text converter.
Any ideas?
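For the passcode case, one possible approach (a sketch, not anything from the Voice SDK) is to take whatever plain transcript you get back and compare it to the expected phrase yourself, after normalizing case, punctuation, and spacing. The function names below are hypothetical, and the sketch is in Python for brevity; the same logic should port to C# in Unity.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation such as , . ! ?
    return " ".join(text.split())


def matches_passcode(transcript: str, passcode: str) -> bool:
    """Return True only if the spoken transcript equals the passcode
    after normalization (hypothetical helper, not an SDK call)."""
    return normalize(transcript) == normalize(passcode)


if __name__ == "__main__":
    print(matches_passcode("Open, Sesame!", "open sesame"))
    print(matches_passcode("open sesame please", "open sesame"))
```

Normalizing both sides means the transcript's capitalization and punctuation do not matter, while any extra or missing word still fails the check.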
4 Replies
- Khadappe (Honored Guest)
I ran into a similar issue and ended up combining Wit.ai with a separate image OCR process for text detection in screenshots. It worked surprisingly well: I used voice for one part and image OCR to pick up any text elements that weren't said out loud. If you're showing UI or text prompts during conversations, layering both captures more data and helps the backend understand intent better.
- RiverExplorer (Start Partner)
In addition to using it in the app itself, I find it great for saying error messages for bugs that are rare and difficult to reproduce. That way, it can help track down a bug pattern.
- RiverExplorer (Start Partner)
Meta has a voice dictation sample that is included in the voice package. You use 'dictation' and not the other (utterance) features. Their docs are very limited.
In addition (last I used it), the character limit is different from what their docs say. The actual text limit per packet is less than the documented one (I think it was about 1/3 less). Before sending the text to Wit, I split it into sentences. Then, if the sentences are long, I try to split them into parts. I throw the parts into a queue, then have a coroutine send the parts one at a time.
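A rough sketch of that split-and-queue idea, written in Python rather than as a Unity coroutine. The `MAX_CHARS` value is a placeholder assumption (the real limit is not given here), and `split_into_parts`, `drain`, and the `send` callback are hypothetical names, not part of the Voice SDK.

```python
import re
from collections import deque

MAX_CHARS = 140  # placeholder assumption; use the limit you actually observe


def split_into_parts(text, max_chars=MAX_CHARS):
    """Split text into sentences, then split long sentences on word
    boundaries so each part stays within max_chars where possible.
    A single word longer than max_chars is kept whole."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    parts = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            parts.append(sentence)
            continue
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                parts.append(current)  # close the part before it overflows
                current = word
            else:
                current = candidate
        if current:
            parts.append(current)
    return parts


def drain(queue, send):
    """Send queued parts one at a time, in order (stand-in for the coroutine)."""
    while queue:
        send(queue.popleft())


if __name__ == "__main__":
    pending = deque(split_into_parts("Hello there. This is a test!"))
    drain(pending, print)
```

In Unity the `drain` loop would presumably yield between sends instead of running straight through; the splitting logic stays the same.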
The harder part is synchronizing what was sent with when it is done, after you split the text. Most of the time that is not needed. It would be nice if the API gave back an ID when you sent text and then told you when that ID had finished. Instead, it sends you the text string that finished. So when you have to know when the entire (split) text has finished speaking, you have to string-compare the parts you sent against the parts that it reported as finished.
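The string-compare bookkeeping could look something like the sketch below. It assumes the finished callback hands back the same text that was sent (apart from surrounding whitespace); `SplitSpeechTracker` and its methods are hypothetical names, not SDK types. A counter is used so that identical parts sent twice are each waited for.

```python
from collections import Counter


class SplitSpeechTracker:
    """Tracks which parts of one split text are still outstanding,
    matching on the text itself because no request ID is available."""

    def __init__(self, parts):
        # Count duplicates so the same sentence sent twice is tracked twice.
        self._pending = Counter(p.strip() for p in parts)

    def on_part_finished(self, finished_text):
        """Call from the 'finished' event; returns True once every part is done."""
        key = finished_text.strip()
        if self._pending[key] > 0:  # ignore text that belongs to another request
            self._pending[key] -= 1
            if self._pending[key] == 0:
                del self._pending[key]
        return self.is_done()

    def is_done(self):
        return not self._pending


if __name__ == "__main__":
    tracker = SplitSpeechTracker(["Hello.", "Hello.", "Bye."])
    for text in ["Hello.", "Bye.", "Hello."]:
        print(text, tracker.on_part_finished(text))
```

If the service alters the text it reports back (for example by trimming or re-casing it), the comparison key would need the same normalization on both sides.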
- RiverExplorer (Start Partner)
In addition, it has several events you can catch. There are almost no docs, and none at all for some of the events. I have been trying to figure out which one gets me the audio (WAV or whatever) that is sent back, so I can cache the ones that are used frequently. They have a TTSCACHE; however, it is only used by them. It does not tell you what file name it saved your request to, and most of the time it caches nothing at all. It might work if it had docs to tell you how to use it. The docs it does have are outdated and do not match what they ship.