What is Automated Speech Recognition?

Automated speech recognition (ASR) is a technology that converts spoken language into digital form so that it can be stored and manipulated. It is used primarily in word processing and translation, but it is also included in programs designed for accent reduction and speech therapy. It has limited applications in security as well, where it is used for voice identification.

Automated speech recognition had its start in the 1950s, when research was jointly funded by the defense and intelligence communities. The computing technology required to make it useful didn’t exist at the time, and the initial work didn’t prove fruitful. As technology advanced, development shifted to non-military uses such as computer accessibility for people with disabilities and dictation-based word processing.

The most basic type of automated speech recognition is discrete input. In this simple method, the speaker pauses between each word or phrase so that the program can tell where one ends and the next begins. It makes the user speak in a stilted fashion, carefully enunciating each individual word. Suited to slower processors and less advanced programs, this method is highly accurate but very slow to use.
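The pause-based approach described above can be illustrated with a short sketch. This is a minimal illustration only, assuming the audio has already been reduced to a list of per-frame loudness values; the function name `split_on_pauses` and the parameters `threshold` and `min_pause_frames` are hypothetical and do not come from any particular ASR product.

```python
def split_on_pauses(energies, threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame ranges for each spoken segment.

    energies: per-frame loudness values (assumed input, e.g. one value
    per short slice of audio). `end` is exclusive. A run of at least
    `min_pause_frames` quiet frames closes the current segment.
    """
    segments = []
    start = None  # frame index where the current segment began
    quiet = 0     # consecutive quiet frames seen inside the segment

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i  # first loud frame opens a new segment
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # The pause is long enough: close the segment just
                # after its last loud frame.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0

    if start is not None:
        # Audio ended mid-segment; drop any trailing quiet frames.
        segments.append((start, len(energies) - quiet))
    return segments


frames = [0, 0, 0.5, 0.6, 0, 0, 0, 0, 0.4, 0.5, 0.5, 0, 0]
word_ranges = split_on_pauses(frames)
```

Each range returned would then be handed to the recognizer as a single word or phrase. The sketch also suggests why the method is slow for the user: a pause shorter than `min_pause_frames` does not separate two words, so the speaker has to leave clear gaps.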

The development of faster computers and more complex programming led to the continuous speaking system, which permits the speaker to talk normally, in full sentences and at a regular cadence. Such a system is typically speaker-dependent: the program learns how an individual user speaks and then bases its word-choice predictions on that speaker. This knowledge makes the program very accurate, but only for the individual it has learned to understand.
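One way to picture speaker-dependent prediction is a per-user vocabulary profile that breaks ties between similar-sounding words. The sketch below is a simplified assumption, not a description of how any specific program works: the `SpeakerProfile` class, its methods, and the acoustic scores are all hypothetical.

```python
from collections import Counter


class SpeakerProfile:
    """Hypothetical per-user model of which words a speaker tends to use."""

    def __init__(self):
        self.word_counts = Counter()

    def learn(self, text):
        # Training: count the words this user has dictated before.
        self.word_counts.update(text.lower().split())

    def choose(self, candidates):
        """Pick a word from (word, acoustic_score) candidates.

        The acoustic score (assumed to come from elsewhere) is weighted
        by how often this speaker has used the word. Add-one smoothing
        keeps unseen words from scoring zero.
        """
        total = sum(self.word_counts.values())
        vocab = len(self.word_counts) + 1

        def combined(candidate):
            word, acoustic = candidate
            prior = (self.word_counts[word] + 1) / (total + vocab)
            return acoustic * prior

        return max(candidates, key=combined)[0]


profile = SpeakerProfile()
profile.learn("please write the report then write the summary")
best = profile.choose([("right", 0.5), ("write", 0.5)])
```

Because the profile is built from one person's speech, the same candidates could resolve differently for another user, which is why the accuracy gain applies only to the speaker the program was trained on.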

There are also speaker-independent and adaptive technologies that work with any user. These programs use predictive routines that compare the phonemes they detect against a large database and then produce the text. An adaptive program adjusts itself when the user corrects a faulty result, so that it makes the appropriate inference the next time it encounters that word. This method is not as accurate as the speaker-dependent system, because speech varies so much from one user to another. Most modern software combines dependent, independent and adaptive technology, and it boasts a recognition rate of more than 90 percent.
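The adaptive behavior described above, in which a user correction changes what the program produces the next time, can be sketched as a lookup table layered over a base recognizer. Everything here is an illustrative assumption: `AdaptiveRecognizer`, the `base_lookup` function, and the phoneme strings stand in for a real phoneme database and are not taken from any actual product.

```python
class AdaptiveRecognizer:
    """Hypothetical wrapper that remembers user corrections."""

    def __init__(self, recognize):
        # recognize: assumed base function mapping a phoneme string
        # to the recognizer's best-guess word.
        self.recognize = recognize
        self.corrections = {}

    def transcribe(self, sounds):
        # Prefer a remembered correction; otherwise use the base guess.
        return [
            self.corrections[s] if s in self.corrections else self.recognize(s)
            for s in sounds
        ]

    def correct(self, sound, word):
        # Called when the user fixes a wrong word in the output.
        self.corrections[sound] = word


# Toy stand-in for the phoneme database (illustrative entries only).
PHONEME_DB = {"R AY T": "right", "DH AH": "the", "N OW T": "note"}


def base_lookup(sound):
    return PHONEME_DB.get(sound, "<unknown>")


recognizer = AdaptiveRecognizer(base_lookup)
before = recognizer.transcribe(["R AY T", "DH AH", "N OW T"])
recognizer.correct("R AY T", "write")
after = recognizer.transcribe(["R AY T", "DH AH", "N OW T"])
```

A real system would weigh context rather than overwrite a word outright, but the sketch shows the basic loop: recognize, accept a correction, and apply it on the next encounter.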

Automated speech recognition technology is encountered every day. Banks and other businesses use it in telephone systems, allowing customers to state their questions and navigate menu options by voice. Court reporters using voice silencers are able to shut out background noise in the courtroom and produce a highly accurate transcript of legal proceedings. Finally, in a return to its original purpose, military units have used automated speech recognition in two-way phraselators that permit near-instant translation on the battlefield.