Forum Discussion
MrsVR (Honored Guest)
10 years ago
Voice input
Has anyone tried this? Simple things like "yes" and "no"? I have no idea how I would go about implementing this into my Unity project, but it's an idea I'd like to explore. I want to try to focus on social interactions with avatars.
Any tips? Software, hardware, best practices?
3 Replies
- saviornt (Protege)
For Unity, this should be pretty straightforward... should be.
The namespace is System.Speech.Recognition.
See the MSDN: https://msdn.microsoft.com/en-us/librar ... ition.aspx
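As a rough idea of what that namespace offers for the "yes"/"no" case, here is a minimal VB.NET sketch. It assumes a Windows .NET Framework project with a reference to System.Speech, a working default microphone, and an installed recognizer that matches the current culture. The module name YesNoDemo, the Interpret helper, and the 0.6 confidence cutoff are made up for illustration, and none of this has been tried inside Unity.

```vbnet
Imports System
Imports System.Speech.Recognition

Module YesNoDemo
    ' Results below this confidence are ignored; 0.6 is an arbitrary starting point, not a tuned value.
    Private Const MinConfidence As Single = 0.6F

    ' Pure helper (hypothetical) so the decision logic can be checked without a microphone.
    ' Returns True for "yes", False for "no", and Nothing for anything else or a low-confidence result.
    Function Interpret(text As String, confidence As Single) As Boolean?
        If confidence < MinConfidence Then Return Nothing
        Select Case text.ToLowerInvariant()
            Case "yes" : Return True
            Case "no" : Return False
            Case Else : Return Nothing
        End Select
    End Function

    Sub Main()
        Using engine As New SpeechRecognitionEngine()
            ' Restrict the recognizer to the two words we care about.
            Dim words As New Choices("yes", "no")
            engine.LoadGrammar(New Grammar(New GrammarBuilder(words)))
            engine.SetInputToDefaultAudioDevice()

            AddHandler engine.SpeechRecognized,
                Sub(sender As Object, e As SpeechRecognizedEventArgs)
                    Dim answer As Boolean? = Interpret(e.Result.Text, e.Result.Confidence)
                    If answer.HasValue Then Console.WriteLine("Heard: " & e.Result.Text)
                End Sub

            ' Multiple keeps listening after each result instead of stopping at the first one.
            engine.RecognizeAsync(RecognizeMode.Multiple)
            Console.WriteLine("Say yes or no; press Enter to quit.")
            Console.ReadLine()
            engine.RecognizeAsyncCancel()
        End Using
    End Sub
End Module
```

Keeping the yes/no decision in a separate function means the avatar logic only ever sees a nullable Boolean, so the recognizer behind it could be swapped out later.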
After a bit of searching, I found that AT&T released a simple* Unity API for it, which can be found here: http://developerboards.att.lithium.com/ ... ba-p/35570
- aparsons671 (Honored Guest)
I was working on a project that would work with the FSX ATC engine and let you control it by voice as if you were a pilot. It was a basic idea for playing with the speech recognition engine. I made a class to contain all the recognition items; it raises events that are captured in the main form for things like a volume bar that moves as you speak.
I wrote it in VB.NET in VB Express 2010. Keep in mind that it worked, but it was barely functional and still needed a lot of cleanup. This is from the get-it-working, proof-of-concept stage, but it should get the idea across. It recognized me when I gave it the right commands, and it ignored me when I issued an unknown command.
SpeechReco class
Imports System.Speech
Imports System.Speech.Synthesis
Imports System.Collections.ObjectModel
Imports System.IO
Imports System.Reflection
Imports System.Xml
Public Class SpeechReco
    Private RecoEnabled As Boolean = False 'is the engine running or is it off
    Private WithEvents Reco As Recognition.SpeechRecognitionEngine 'recognition engine - does all the hard work
    Private Gram As Recognition.Grammar 'grammar object - holds what the engine is trying to match with the input audio
    Private RecoLoaded As Boolean = False 'is the recognition engine loaded

    'event definitions for passing data back to the parent object
    Public Event AudioUpdated(ByVal nAudioLevel As Integer)
    Public Event AudioStateChanged(ByVal nState As Recognition.AudioState)
    Public Event RecognitionCompleted(ByVal nRecoTxt As String)

    'properties to be available from the parent object
    'is the recognition engine enabled - true/false
    Public ReadOnly Property RecognitionEnabled As Boolean
        Get
            Return RecoEnabled
        End Get
    End Property

    'current audio input volume level
    Public ReadOnly Property RecognitionInput As Integer
        Get
            Return Reco.AudioLevel
        End Get
    End Property

    'current audio input recognition status
    Public ReadOnly Property RecognitionState As Speech.Recognition.AudioState
        Get
            Return Reco.AudioState
        End Get
    End Property
    'updates the recognition engine's end-of-speech silence timeout
    'My.Settings.RecognitionDelay is stored in milliseconds
    Public Sub UpdateRecognitionTimeout()
        Reco.EndSilenceTimeout = TimeSpan.FromMilliseconds(My.Settings.RecognitionDelay)
    End Sub

    'enables the speech recognition engine to parse what is said
    Public Sub Enable()
        RecoEnabled = True
        Reco.SetInputToDefaultAudioDevice()
        Try
            Reco.RecognizeAsync()
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub

    'stops the recognition engine from listening to the input source
    Public Sub Disable()
        RecoEnabled = False
        Try
            Reco.RecognizeAsyncCancel()
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub

    'create a new instance of the SpeechReco object
    Public Sub New()
        Reco = New Recognition.SpeechRecognitionEngine
        UpdateRecognitionTimeout()
        Reco.UnloadAllGrammars()
        BuildGrammar()
    End Sub

    'build the ACTypes list and output to an XML-based file using the GRXML formatting
    Private Sub MakeBasicACTypes()
        Dim ACTypes As String() = {"Cessna", "Piper", "Baron", "Aztec", "Seneca", "Arrow", "Navajo", "Boeing", "Airbus"}
        'Using closes the file even if one of the writes throws
        Using XD As StreamWriter = File.CreateText(Application.StartupPath & "\Data\ACTypes.grxml")
            XD.WriteLine("<?xml version=""1.0"" encoding=""utf-8"" ?>")
            XD.WriteLine("<grammar version=""1.0"" xml:lang=""en-US"" mode=""voice"" root=""ACTypes"" tag-format=""semantics/1.0"" xmlns=""http://www.w3.org/2001/06/grammar"">")
            XD.WriteLine(vbTab & "<rule id=""ACTypes"" scope=""public"">")
            XD.WriteLine(vbTab & vbTab & "<tag>out="""";</tag>")
            XD.WriteLine(vbTab & vbTab & "<one-of>")
            'one <item> per aircraft type; the tag sets the semantic value returned on a match
            For Each ACType As String In ACTypes
                XD.WriteLine(vbTab & vbTab & vbTab & "<item>" & ACType & "<tag>out=""" & ACType & """;</tag></item>")
            Next
            XD.WriteLine(vbTab & vbTab & "</one-of>")
            XD.WriteLine(vbTab & "</rule>")
            XD.WriteLine("</grammar>")
        End Using
    End Sub

    'build the grammar by either creating it from scratch or loading a preexisting grxml file
    Private Sub BuildGrammar()
        Try
            If Not Directory.Exists(Application.StartupPath & "\Data\") Then
                Directory.CreateDirectory(Application.StartupPath & "\Data\")
            End If
            If Not File.Exists(Application.StartupPath & "\Data\ACTypes.grxml") Then
                MakeBasicACTypes()
            End If
            Gram = New Recognition.Grammar(Application.StartupPath & "\Data\ACTypes.grxml")
            Gram.Name = "ACTypes"
            Gram.Enabled = True
            Reco.LoadGrammarAsync(Gram)
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
        Try
            Dim Asm As Assembly = Assembly.GetExecutingAssembly()
            Dim AsmName As String = Asm.GetName.Name.ToString() & ".ATCRules.grxml"
            Dim SR As Stream = Asm.GetManifestResourceStream(AsmName)
            Gram = New Recognition.Grammar(SR)
            Gram.Name = "ATCComms"
            Gram.Enabled = True
            Reco.LoadGrammarAsync(Gram)
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub

    'when a user's speech results in a recognition event this is called
    Private Sub Reco_Recognized(ByVal sender As System.Object, ByVal Phrase As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles Reco.SpeechRecognized
        'Semantics.Value is typed as Object, so convert it to a String explicitly
        Dim RecoText As String = CStr(Phrase.Result.Semantics.Value)
        Console.WriteLine("Rec: " & RecoText)
        RaiseEvent RecognitionCompleted(RecoText)
        'Enable()
    End Sub

    'when the recognizer completes the input and is no longer parsing what is spoken this is called
    Private Sub Reco_Completed() Handles Reco.RecognizeCompleted
        'Enable()
    End Sub

    'when the recognition engine is guessing what is being said and comparing with all the possibilities this is called
    Private Sub Reco_Hypothesized(sender As System.Object, e As Recognition.SpeechHypothesizedEventArgs) Handles Reco.SpeechHypothesized
        Console.WriteLine("Hyp: " & e.Result.Text)
    End Sub

    'when the grammar files have been loaded this is called
    Private Sub Reco_LoadCompleted(sender As System.Object, e As Recognition.LoadGrammarCompletedEventArgs) Handles Reco.LoadGrammarCompleted
        RecoLoaded = True
        Console.WriteLine("Recognition Engine Load Completed: {0}", e.Grammar.Name)
    End Sub

    'as the audio input changes this is called - passes back out to the parent object via event
    Private Sub Reco_AudioUpdate(sender As Object, e As Recognition.AudioLevelUpdatedEventArgs) Handles Reco.AudioLevelUpdated
        RaiseEvent AudioUpdated(e.AudioLevel)
    End Sub

    'when the audio state changes this is called - passes back out to the parent object via event
    Private Sub Reco_AudioChanged(sender As Object, e As Recognition.AudioStateChangedEventArgs) Handles Reco.AudioStateChanged
        RaiseEvent AudioStateChanged(e.AudioState)
    End Sub
End Class
And here is some code from my frmMain.vb that uses the class...
Public Class frmMain
    ...
    Private WithEvents Speech As SpeechReco
    ...
    Public Sub New()
        ' This call is required by the designer.
        InitializeComponent()
        ' Add any initialization after the InitializeComponent() call.
        ... 'initialization stuff
    End Sub

    'on load, create a new instance of the speechreco object - can be any time you wish just do it before you use it
    Private Sub frmMain_Load(sender As Object, e As System.EventArgs) Handles Me.Load
        ...
        Speech = New SpeechReco()
        ...
    End Sub

    'when the form closes, disable the speech engine to prevent issues and allow other programs to latch on
    Private Sub frmMain_FormClosing(sender As Object, e As System.Windows.Forms.FormClosingEventArgs) Handles Me.FormClosing
        ...
        Speech.Disable()
        ...
    End Sub

    'the speechreco class' AudioUpdated event handler
    'updates a variable I use to determine maximum input volume...if the current value is
    'higher than the progressbar's maximum, update the progressbar's max to match
    'pbSpeechVol is a progressbar with a continuous style that indicates the current input volume heard by the recognition engine
    'the minimum is 0 and maximum is 10 at start of program, but the maximum will change as the user is heard
    'lblSpeechState is a label that is used to output the numeric state of the speech recognition engine
    'this is only so I know what the speech engine is doing and would be eliminated or modified for end users
    Private Sub Speech_AudioUpdated(ByVal nLvl As Integer) Handles Speech.AudioUpdated
        Try
            If nLvl > MaxInput Then
                If nLvl > AbsMax Then
                    nLvl = AbsMax
                    MaxInput = AbsMax
                    pbSpeechVol.Maximum = AbsMax
                Else
                    MaxInput = nLvl
                    pbSpeechVol.Maximum = MaxInput
                End If
            End If
            pbSpeechVol.Value = nLvl
            lblSpeechState.Text = CState
        Catch ex As Exception
        End Try
    End Sub

    'the speechreco class' AudioStateChanged event handler
    'stores the engine's current audio state and shows it in lblSpeechState
    Private Sub Speech_AudioStateChanged(ByVal nState As System.Speech.Recognition.AudioState) Handles Speech.AudioStateChanged
        Try
            CState = nState
            lblSpeechState.Text = CState
        Catch ex As Exception
        End Try
    End Sub

    'the speechreco class' RecognitionCompleted event handler
    'passes the recognized user input message to my sub-class for processing the input and deciding what to do with it
    Private Sub Speech_RecognitionCompleted(ByVal nRecoText As String) Handles Speech.RecognitionCompleted
        ATCSys.ParseRecoMessage(nRecoText)
    End Sub

    'used a keyboard hook as a Push to talk key
    'this enables speech to listen to the user
    Private Sub KeyboardHook_KeyDown(sender As Object, e As KeyEventArgs) Handles KbdHk.KeyDown
        ...
        Speech.Enable()
        ...
    End Sub

    'when the user lifts the push to talk key stop listening to the user
    Private Sub KeyboardHook_KeyUp(sender As Object, e As KeyEventArgs) Handles KbdHk.KeyUp
        ...
        Speech.Disable()
        ...
    End Sub
End Class
This program is very much a work in progress, and I cut out a lot of it, since the rest of what I'm doing with the speech recognition engine wouldn't make sense out of context. I left the class interaction items in to show you how I use it. I don't presume to be a world-renowned programmer, especially when it comes to speech, but it might give you something useful to get started on your own.
As for integrating into Unity, no idea...this is a stand-alone program that works with the speech recognition engine.
All of the work is done in the SpeechReco class. Keeping it in a class and responding to its events makes it easy to attach to a timer or a background worker and still respond to it in a thread-safe manner.
- Zhamul (Explorer)
Remember that you can also use the microphone for voice inputs that are not speech. For example, you can detect whether the player is holding their breath or screaming (ideal for horror or underwater stuff). I used simple "smooch" sound detection in Pus Pus Platypus to let players shoot kisses in the game by making smooching sounds into the microphone.
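One simple way to approach the screaming case is a loudness threshold on each buffer of microphone samples. The VB.NET sketch below is only an illustration of that idea, not the method used in Pus Pus Platypus. It assumes the samples arrive as floats between -1.0 and 1.0, and the names LoudnessDemo, RmsLevel, and IsLoud plus the 0.3 threshold are made up. Capturing the audio itself is not shown.

```vbnet
Imports System

Module LoudnessDemo
    ' Root-mean-square level of one buffer of samples, each expected in the range -1.0 to 1.0.
    Function RmsLevel(samples As Single()) As Double
        If samples Is Nothing OrElse samples.Length = 0 Then Return 0.0
        Dim sumOfSquares As Double = 0.0
        For Each s As Single In samples
            sumOfSquares += CDbl(s) * CDbl(s)
        Next
        Return Math.Sqrt(sumOfSquares / samples.Length)
    End Function

    ' True when the buffer is louder than the threshold (hypothetical value; tune it per microphone).
    Function IsLoud(samples As Single(), threshold As Double) As Boolean
        Return RmsLevel(samples) > threshold
    End Function

    Sub Main()
        ' Hand-written buffers standing in for real microphone data.
        Dim quiet As Single() = {0.01F, -0.02F, 0.015F, -0.01F}
        Dim scream As Single() = {0.8F, -0.9F, 0.85F, -0.7F}
        Console.WriteLine(IsLoud(quiet, 0.3))   ' prints False
        Console.WriteLine(IsLoud(scream, 0.3))  ' prints True
    End Sub
End Module
```

Holding breath could be treated as the opposite check: the level staying under a low threshold for some number of consecutive buffers.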