Speech recognition is a complex mathematical and statistical process. It enables users to interact with computers or devices by using their voice.
When developing a speech recognition system, several design issues must be considered. The first is the basic pronunciation unit, such as a phoneme set or a syllable set. The second is the size of the vocabulary to be recognized, which may range from small to large. The third is the statistical language model, which restricts the large search space to valid word sequences. Grammar parsing and semantic analysis modules then help convert the speech into correct text. To further understand the user's intention, dialogue management modules perform discourse analysis and produce a final output sentence. A text-to-speech (TTS) module then converts that sentence into audible speech. TTS systems commonly use one of two methods, rule-based or corpus-based; the corpus-based method generally gives better sound quality.
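As a rough illustration of how a statistical language model can restrict the search to valid word sequences, the sketch below trains a tiny bigram model and uses it to rank candidate transcriptions. The function names (train_bigram, log_prob, best_hypothesis) and the three-sentence corpus are hypothetical and chosen only for illustration. The sketch also assumes unsmoothed maximum-likelihood estimates, so any word pair that never appears in the corpus is rejected outright; a production recognizer would more likely apply smoothing and combine the language model score with an acoustic score.

```python
import math
from collections import defaultdict


def train_bigram(sentences):
    """Count word pairs over whitespace-tokenized sentences.

    <s> and </s> mark the sentence start and end.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def log_prob(counts, sentence):
    """Log probability of a sentence under the unsmoothed bigram model.

    Returns -inf as soon as a word pair is unseen, which is how this toy
    model rules out invalid word sequences.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        followers = counts.get(prev)
        if not followers or cur not in followers:
            return float("-inf")
        total += math.log(followers[cur] / sum(followers.values()))
    return total


def best_hypothesis(counts, hypotheses):
    """Pick the candidate transcription the language model scores highest."""
    return max(hypotheses, key=lambda h: log_prob(counts, h))


# Toy training corpus (made up for this example).
corpus = [
    "call the front desk",
    "call the operator",
    "check the weather",
]
model = train_bigram(corpus)

# Two candidates a recognizer might propose for the same utterance.
candidates = ["call the operate", "call the operator"]
chosen = best_hypothesis(model, candidates)
```

Here the model keeps "call the operator" because every adjacent word pair occurs in the corpus, while "call the operate" is discarded at the unseen pair "the operate".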
The response phrases in Delta's telephony applications are prerecorded sound files that provide a natural and fluent speech quality.