Forum Discussion

MrsVR
Honored Guest
10 years ago

Voice input

Has anyone tried this? Simple things like "yes" and "no"? I have no idea how I would go about implementing this in my Unity project, but it's an idea I'd like to explore. I want to focus on social interactions with avatars.

Any tips? Software, hardware, best practices?

3 Replies

  • I was working on a project that would work with the FSX ATC engine and let you control it by voice as if you were a pilot. It was a basic idea for playing with the speech recognition engine. I made a class to contain all the recognition items; it raises events that are captured in the main form for things like a volume bar that moves as you speak.

    I wrote it in VB.NET in VB Express 2010. Keep in mind that it worked, but it was just barely functional and still needed a lot of cleanup. This is more the get-it-working, proof-of-concept stage, but it should get the idea across. It recognized me when I gave it the right commands, and it ignored me when I issued an unknown command.

    SpeechReco class

    Imports System.Speech
    Imports System.Speech.Synthesis
    Imports System.Collections.ObjectModel
    Imports System.IO
    Imports System.Reflection
    Imports System.Xml

    Public Class SpeechReco
    Private RecoEnabled As Boolean = False 'is the engine running or is it off
    Private WithEvents Reco As Recognition.SpeechRecognitionEngine 'recognition engine - does all the hard work
    Private Gram As Recognition.Grammar 'grammar object - holds what the engine is trying to match with the input audio
    Private RecoLoaded As Boolean = False 'is the recognition engine loaded

    'event definitions for passing data back to the parent object
    Public Event AudioUpdated(ByVal nAudioLevel As Integer)
    Public Event AudioStateChanged(ByVal nState As Recognition.AudioState)
    Public Event RecognitionCompleted(ByVal nRecoTxt As String)

    'properties to be available from the parent object
    'is the recognition engine enabled - true/false
    Public ReadOnly Property RecognitionEnabled As Boolean
    Get
    Return RecoEnabled
    End Get
    End Property

    'current audio input volume level
    Public ReadOnly Property RecognitionInput As Integer
    Get
    Return Reco.AudioLevel
    End Get
    End Property

    'current audio input recognition status
    Public ReadOnly Property RecognitionState As Speech.Recognition.AudioState
    Get
    Return Reco.AudioState
    End Get
    End Property

    'updates the recognition engine's timeout
    Public Sub UpdateRecognitionTimeout()
    Reco.EndSilenceTimeout = TimeSpan.FromSeconds(My.Settings.RecognitionDelay / 1000)
    End Sub

    'enables the speech recognition engine to parse what is said
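    Public Sub Enable()
    RecoEnabled = True
    Try
    'setting the input device can also throw (for example when no microphone is present), so keep it inside the Try
    Reco.SetInputToDefaultAudioDevice()
    Reco.RecognizeAsync()
    Catch ex As Exception
    MsgBox(ex.Message)
    End Try
    End Sub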

    'stops the recognition engine from listening to the input source
    Public Sub Disable()
    RecoEnabled = False
    Try
    Reco.RecognizeAsyncCancel()
    Catch ex As Exception
    MsgBox(ex.Message)
    End Try
    End Sub

    'create a new instance of the SpeechReco object
    Public Sub New()
    Reco = New Recognition.SpeechRecognitionEngine
    UpdateRecognitionTimeout()
    Reco.UnloadAllGrammars()
    BuildGrammar()
    End Sub

    'build the ACTypes list and output to an XML-based file using the GRXML formatting
    Private Sub MakeBasicACTypes()
    Dim XD As StreamWriter = File.CreateText(Application.StartupPath & "\Data\ACTypes.grxml")
    Dim ACTypes As String() = {"Cessna", "Piper", "Baron", "Aztec", "Seneca", "Arrow", "Navajo", "Boeing", "Airbus"}

    XD.WriteLine("<?xml version=""1.0"" encoding=""utf-8"" ?>")
    XD.WriteLine("<grammar version=""1.0"" xml:lang=""en-US"" mode=""voice"" root=""ACTypes"" tag-format=""semantics/1.0"" xmlns=""http://www.w3.org/2001/06/grammar"">")
    XD.WriteLine(vbTab & "<rule id=""ACTypes"" scope=""public"">")
    XD.WriteLine(vbTab & vbTab & "<tag>out="""";</tag>")
    XD.WriteLine(vbTab & vbTab & "<one-of>")
    For Each ACType As String In ACTypes
    XD.WriteLine(vbTab & vbTab & vbTab & "<item>" & ACType & "<tag>out=""" & ACType & """;</tag></item>")
    Next
    XD.WriteLine(vbTab & vbTab & "</one-of>")
    XD.WriteLine(vbTab & "</rule>")
    XD.WriteLine("</grammar>")
    XD.Close()
    End Sub

    'build the grammar by either creating it from scratch or loading a preexisting grxml file
    Private Sub BuildGrammar()
    Try
    If Not Directory.Exists(Application.StartupPath & "\Data\") Then
    Directory.CreateDirectory(Application.StartupPath & "\Data\")
    End If
    If Not File.Exists(Application.StartupPath & "\Data\ACTypes.grxml") Then
    MakeBasicACTypes()
    End If
    Gram = New Recognition.Grammar(Application.StartupPath & "\Data\ACTypes.grxml")
    Gram.Name = "ACTypes"
    Gram.Enabled = True
    Reco.LoadGrammarAsync(Gram)
    Catch ex As Exception
    MsgBox(ex.Message)
    End Try
    Try
    Dim Asm As Assembly = Assembly.GetExecutingAssembly()
    Dim AsmName As String = Asm.GetName.Name.ToString() & ".ATCRules.grxml"
    Dim SR As Stream = Asm.GetManifestResourceStream(AsmName)
    Gram = New Recognition.Grammar(SR)
    Gram.Name = "ATCComms"
    Gram.Enabled = True
    Reco.LoadGrammarAsync(Gram)
    Catch ex As Exception
    MsgBox(ex.Message)
    End Try
    End Sub

    'when a user's speech results in a recognition event this is called
    Private Sub Reco_Recognized(ByVal sender As System.Object, ByVal Phrase As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles Reco.SpeechRecognized
    Console.WriteLine("Rec: " & Phrase.Result.Semantics.Value)
    RaiseEvent RecognitionCompleted(Phrase.Result.Semantics.Value)
    'Enable()
    End Sub

    'when the recognizer completes the input and is no longer parsing what is spoken this is called
    Private Sub Reco_Completed() Handles Reco.RecognizeCompleted
    'Enable()
    End Sub

    'when the recognition engine is guessing what is being said and comparing with all the possibilities this is called
    Private Sub Reco_Hypothesized(sender As System.Object, e As Recognition.SpeechHypothesizedEventArgs) Handles Reco.SpeechHypothesized
    Console.WriteLine("Hyp: " & e.Result.Text)
    End Sub

    'when the grammar files have been loaded this is called
    Private Sub Reco_LoadCompleted(sender As System.Object, e As Recognition.LoadGrammarCompletedEventArgs) Handles Reco.LoadGrammarCompleted
    RecoLoaded = True
    Console.WriteLine("Recognition Engine Load Completed: {0}", e.Grammar.Name)
    End Sub

    'as the audio input changes this is called - passes back out to the parent object via event
    Private Sub Reco_AudioUpdate(sender As Object, e As Recognition.AudioLevelUpdatedEventArgs) Handles Reco.AudioLevelUpdated
    RaiseEvent AudioUpdated(e.AudioLevel)
    End Sub

    'when the audio state changes this is called - passes back out to the parent object via event
    Private Sub Reco_AudioChanged(sender As Object, e As Recognition.AudioStateChangedEventArgs) Handles Reco.AudioStateChanged
    RaiseEvent AudioStateChanged(e.AudioState)
    End Sub
    End Class


    And here is some code from my frmMain.vb that uses the class...


    Public Class frmMain
    ...
    Private WithEvents Speech As SpeechReco
    ...

    Public Sub New()
    ' This call is required by the designer.
    InitializeComponent()

    ' Add any initialization after the InitializeComponent() call.
    ... 'initialization stuff
    End Sub

    'on load, create a new instance of the speechreco object - can be any time you wish just do it before you use it
    Private Sub frmMain_Load(sender As Object, e As System.EventArgs) Handles Me.Load
    ...
    Speech = New SpeechReco()
    ...
    End Sub

    'when the form closes, disable the speech engine to prevent issues and allow other programs to latch on
    Private Sub frmMain_FormClosing(sender As Object, e As System.Windows.Forms.FormClosingEventArgs) Handles Me.FormClosing
    ...
    Speech.Disable()
    ...
    End Sub



    'the speechreco class' AudioUpdated event handler
    'updates a variable I use to determine maximum input volume...if the current value is
    'higher than the progressbar's maximum, update the progressbar's max to match

    'pbSpeechVol is a progressbar with a continuous style that indicates the current input volume heard by the recognition engine
    'the minimum is 0 and maximum is 10 at start of program, but the maximum will change as the user is heard

    'lblSpeechState is a label that is used to output the numeric state of the speech recognition engine
    'this is only so I know what the speech engine is doing and would be eliminated or modified for end users
    Private Sub Speech_AudioUpdated(ByVal nLvl As Integer) Handles Speech.AudioUpdated
    Try
    If nLvl > MaxInput Then
    If nLvl > AbsMax Then
    nLvl = AbsMax
    MaxInput = AbsMax
    pbSpeechVol.Maximum = AbsMax
    Else
    MaxInput = nLvl
    pbSpeechVol.Maximum = MaxInput
    End If
    End If
    pbSpeechVol.Value = nLvl

    lblSpeechState.Text = CState
    Catch ex As Exception
    End Try
    End Sub

    'the speechreco class' AudioStateChanged event handler
    'stores the current audio state and shows it in lblSpeechState
    Private Sub Speech_AudioStateChanged(ByVal nState As System.Speech.Recognition.AudioState) Handles Speech.AudioStateChanged
    Try
    CState = nState
    lblSpeechState.Text = CState
    Catch ex As Exception
    End Try
    End Sub

    'the speechreco class' RecognitionCompleted event handler
    'passes the recognized user input message to my sub-class for processing the input and deciding what to do with it
    Private Sub Speech_RecognitionCompleted(ByVal nRecoText As String) Handles Speech.RecognitionCompleted
    ATCSys.ParseRecoMessage(nRecoText)
    End Sub

    'used a keyboard hook as a Push to talk key
    'this enables speech to listen to the user
    Private Sub KeyboardHook_KeyDown(sender As Object, e As KeyEventArgs) Handles KbdHk.KeyDown
    ...
    Speech.Enable()
    ...
    End Sub

    'when the user lifts the push to talk key stop listening to the user
    Private Sub KeyboardHook_KeyUp(sender As Object, e As KeyEventArgs) Handles KbdHk.KeyUp
    ...
    Speech.Disable()
    ...
    End Sub
    End Class


    This is a very work-in-progress program and I cut out a lot of it as it would make no sense for you to see what I'm doing with the speech recognition engine itself, but I left the class interaction items to show you how I use it. I don't presume to be a world-renowned programmer, especially when it comes to speech, but it might give you something useful you can use to get started on your own.

    As for integrating into Unity, no idea...this is a stand-alone program that works with the speech recognition engine.

    All of the work is done in the SpeechReco class. Keeping it in a class and responding to its events makes it easy to attach to a timer or a background worker and still respond to it in a thread-safe manner.
  • Remember that you can also use the microphone for voice input that is not speech. For example, you can detect whether the player is holding their breath or screaming (ideal for horror or underwater settings). I used simple "smooch" sound detection in Pus Pus Platypus to let players shoot kisses in the game by making smooching sounds into the microphone.
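
For the non-speech detection idea in the last reply, here is a minimal sketch of one common approach, loudness thresholding. It is not the method used in the game mentioned above, which the reply does not describe. It assumes the microphone audio has already been captured and split into frames of float samples in the range [-1.0, 1.0]. The names `frame_rms` and `detect_loud_events`, the 0.3 threshold, and the three-frame minimum are all illustrative choices, and the sketch is in Python only to keep it short; the arithmetic should port directly to other languages.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_loud_events(frames: Iterable[Sequence[float]],
                       threshold: float = 0.3,
                       min_frames: int = 3) -> List[int]:
    """Return the frame index at which each loud event is confirmed.

    An event is confirmed once `min_frames` consecutive frames have an RMS
    above `threshold`. It is reported once, and not again until the level
    has dropped back to or below the threshold.
    """
    events = []
    run = 0  # number of consecutive loud frames seen so far
    for index, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            run += 1
            if run == min_frames:
                events.append(index)
        else:
            run = 0
    return events


if __name__ == "__main__":
    # Synthetic frames standing in for captured microphone audio (assumption).
    quiet = [0.01] * 256
    loud = [0.8, -0.8] * 128
    frames = [quiet, loud, quiet, loud, loud, loud, loud, quiet]
    print(detect_loud_events(frames))
```

A sustained sound such as a scream passes the consecutive-frame check, while a single loud frame (a click or a bump on the microphone) does not. With real input the threshold would need tuning per microphone and room.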