What Is the Connection between Speech Synthesis and Recognition?

Speech synthesis and recognition are the two sides of computerized speech processing. Speech synthesis is the creation of human speech by a computer; for instance, a computer reading written text aloud. Speech recognition is the conversion of spoken words into information a computer can use, such as when a person dictates a paper to a computer. While the two processes are not directly related, both rely on a computer’s ability to model human speech and inflection. Synthesis is output and recognition is input.

The processes used by speech synthesis and recognition are very similar, even if the end product is different. Each process consists of two parts, one with human interaction and one without. The human portion is when human words enter the program; the non-human portion is when the program interprets that input.

A speech synthesis program takes in human input in the form of typed or written language. The program reads the text and determines what each word is, using sentence placement and punctuation to determine inflection. When a word could be pronounced multiple ways, such as in the case of ‘live,’ the program looks at nearby words and context clues to determine which word is actually being used. The words then go to the second part of the program, where they are spoken aloud.
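The context-clue step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the cue-word lists, the pronunciation labels "LIHV" and "LYVE," and the function name choose_pronunciation are all made up for this example, and a real synthesis program would more likely rely on a part-of-speech tagger or a trained model.

```python
# Hypothetical cue words for each pronunciation of a word with two readings.
# These lists are invented for illustration, not taken from any real system.
HOMOGRAPH_CUES = {
    "live": {
        "LIHV": {"to", "we", "i", "they", "you", "will"},               # verb, rhymes with "give"
        "LYVE": {"a", "the", "broadcast", "concert", "music", "show"},  # adjective, rhymes with "five"
    }
}


def choose_pronunciation(words, index, window=2):
    """Pick a pronunciation for words[index] by counting cue words nearby.

    Returns None when the word has only one known pronunciation.
    On a tie, the first pronunciation listed for the word wins.
    """
    options = HOMOGRAPH_CUES.get(words[index].lower())
    if options is None:
        return None

    # Collect up to `window` words on each side of the ambiguous word.
    start = max(0, index - window)
    neighbors = words[start:index] + words[index + 1:index + 1 + window]
    neighbors = [w.lower() for w in neighbors]

    # Score each pronunciation by how many of its cue words appear nearby.
    scores = {
        label: sum(1 for n in neighbors if n in cues)
        for label, cues in options.items()
    }
    return max(scores, key=scores.get)


print(choose_pronunciation(["They", "live", "in", "Paris"], 1))
print(choose_pronunciation(["We", "watched", "a", "live", "concert"], 3))
```

In the first sentence the nearby word "They" points to the verb; in the second, "a" and "concert" point to the adjective.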

In a speech recognition program, the process is reversed. The input comes from a human speaker saying words into a computer. The computer listens to each word and compares the pattern generated by the speaker’s voice to a library of possible sounds and words. It then determines the most likely word and sends it to the second part of the system. This portion displays the words on the screen, similar to how the synthesis program says the words.
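The comparison against a library can be pictured as a nearest-match search. The Python sketch below assumes each spoken word has already been reduced to a short list of numbers; the library contents, the three-number patterns, and the function name recognize are hypothetical, and a real recognizer would use far richer acoustic features and statistical models.

```python
import math

# Hypothetical library mapping each known word to a stored sound pattern.
# The numbers are invented; they stand in for real acoustic measurements.
SOUND_LIBRARY = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.4, 0.4, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(pattern, library=SOUND_LIBRARY):
    """Return the library word whose stored pattern is closest to the input."""
    return min(library, key=lambda word: distance(pattern, library[word]))


# A pattern that is close to, but not exactly, the stored pattern for "yes".
print(recognize([0.85, 0.15, 0.35]))
```

The input never has to match a stored pattern exactly; the program simply picks whichever word is the closest fit, which is why a poor fit can still produce a wrong word.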

Since every speaker sounds slightly different, speech programs, and recognition programs in particular, often have a wide margin of error. One of the ways people combat these errors is through individualized speech profiles. A single speaker has their speech analyzed by the program to find their specific vocal patterns. When the speaker finds errors in the computer’s transcription, they can correct them individually. The corrections are analyzed and stored by the program so that when the troublesome word comes up again, the program transcribes it correctly.
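A stored-correction profile can be sketched as a simple lookup table. This Python example is a deliberate simplification: the class name SpeakerProfile and its methods are invented for illustration, and real programs generally adapt their underlying sound models to the speaker instead of only swapping words after the fact.

```python
class SpeakerProfile:
    """Hypothetical per-speaker store of corrections to misheard words."""

    def __init__(self):
        # Maps a misheard word (lowercase) to the word the speaker meant.
        self.corrections = {}

    def correct(self, heard, intended):
        """Remember that `heard` should have been `intended` for this speaker."""
        self.corrections[heard.lower()] = intended

    def apply(self, words):
        """Replace any previously corrected words in a new transcript."""
        return [self.corrections.get(w.lower(), w) for w in words]


profile = SpeakerProfile()
profile.correct("wreck", "recognize")  # the speaker fixes one error by hand
print(profile.apply(["please", "wreck", "speech"]))
```

Once the correction is stored, the same mistake is fixed automatically the next time it appears in that speaker's dictation.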

Speech synthesis and recognition programs have a wide range of applications. In the medical field, these programs allow people to communicate who otherwise might not be able to. In business, they offer a faster means of transcribing reports and documents. Speech recognition is also a common method of setting up hands-free devices in automobiles, allowing people to talk on the phone more safely while driving.