History Of Speech Recognition Computer Science Essay


What is Voice Recognition Technology?

Voice recognition is the process of converting an acoustic signal into a set of words that a computer can understand. The signal can be captured with a microphone or even a telephone. This process has evolved into an advanced technology that allows users to control a computer by speaking. The technology works closely with speech recognition software to perform its function. Speech recognition software (also called voice recognition software) captures the words spoken by the user and converts them into a form the computer can process. The converted words can then be displayed on the screen, used as a command for an application, entered as data, and more.

Speech recognition applications include voice user interfaces such as voice dialling and call routing, which are built into most high-end smartphones today. Voice dialling is a function that dials the person whose name the user speaks. Domestic appliance control and simple data entry are also speech recognition applications used by some organizations. One example is the Microsoft call centre, which uses speech recognition to serve its customers. Furthermore, speech recognition applications cover the preparation of structured documents such as radiology reports, speech-to-text processing integrated in word processors, and aircraft systems.

Types of Speech Recognition

Speech recognition systems can be grouped into several classes according to the kinds of utterances they are able to recognize.

Isolated Words

An isolated words recognizer is the simplest class of speech recognition system. The name does not mean that it can accept only a single word; rather, the user must say one word at a time. The user has to pause between words because the system processes speech word by word. Such a system usually has two states, "Listen" and "Not-Listen". While it is processing a word, it switches to the "Not-Listen" state and cannot accept another word.
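
The two-state behaviour described above can be sketched as a small state machine. This is only an illustration: the class name `IsolatedWordRecognizer` and its methods are hypothetical, and the "processing" step is a placeholder rather than real recognition.

```python
from enum import Enum


class State(Enum):
    LISTEN = "Listen"
    NOT_LISTEN = "Not-Listen"


class IsolatedWordRecognizer:
    """Toy model of an isolated-word recognizer (hypothetical, for illustration)."""

    def __init__(self):
        self.state = State.LISTEN
        self.recognized = []
        self._pending = None

    def hear(self, word):
        """Offer a word to the recognizer; return True if it was accepted."""
        if self.state is State.NOT_LISTEN:
            # Still busy with the previous word, so this one is dropped.
            return False
        self._pending = word
        self.state = State.NOT_LISTEN
        return True

    def finish_processing(self):
        """Placeholder for the real recognition work on the pending word."""
        if self._pending is not None:
            self.recognized.append(self._pending)
            self._pending = None
        self.state = State.LISTEN


recognizer = IsolatedWordRecognizer()
accepted_first = recognizer.hear("open")    # accepted, state becomes Not-Listen
accepted_second = recognizer.hear("file")   # rejected: no pause between words
recognizer.finish_processing()              # back to Listen
accepted_third = recognizer.hear("file")    # accepted after the pause
recognizer.finish_processing()
```

The second call to `hear` is rejected because the user did not pause, which is the limitation that continuous speech recognizers remove.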

Continuous Speech

A continuous speech recognizer is an enhancement of the isolated words recognizer. It is more complex and requires a longer development cycle, because additional components are needed to recognize continuous speech. The user can speak to the computer naturally, at the speed of normal conversation.

Spontaneous Speech

A spontaneous speech recognizer is similar to a continuous speech recognizer but slightly more advanced. It accepts speech that is natural sounding and not rehearsed. Speech recognition usually requires the user to pronounce words in a specific way, but spontaneous speech recognition does not. It can handle a variety of pronunciations.


Speaker-Dependent

A speaker-dependent speech recognizer requires the user to train the system by providing samples of his or her speech before the system can recognize words. This process is also known as user enrolment.


Speaker-Independent

A speaker-independent speech recognizer has a large set of words pre-coded in the system, so the user does not need to train the system to understand them.

How does speech recognition technology work?

The ability of speech recognition technology to accept spoken language and convert it to text involves the following steps:

Audio received from the speaker through a microphone is turned into a waveform, which is a mathematical representation of sound.

The sound waves are captured by the speech engine.

The sound waves are processed and converted into phonemes, the basic language units that the computer can work with.

Words are constructed from the phonemes.

The words are analyzed to avoid misinterpreting sound-alike words.

The words are finalized and either displayed on the screen or issued as a command.
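
The later steps above (phonemes to words, then analysis) can be sketched with a toy lookup table. This is a minimal sketch under several assumptions: the `LEXICON` table, the ARPAbet-style phoneme labels and both function names are hypothetical, the phonemes are assumed to be already grouped per word, and the analysis step is a placeholder that simply keeps the first candidate.

```python
# Hypothetical pronunciation lexicon: phoneme sequence -> candidate words.
LEXICON = {
    ("HH", "AH", "L", "OW"): ["hello"],
    ("W", "ER", "L", "D"): ["world"],
    ("T", "EY", "L"): ["tail", "tale"],
}


def build_words(phoneme_groups):
    """Look up candidate words for each group of phonemes."""
    return [LEXICON.get(tuple(group), ["<unknown>"]) for group in phoneme_groups]


def pick_words(candidates):
    """Placeholder for the analysis step: simply keep the first candidate."""
    return [options[0] for options in candidates]


phonemes = [["HH", "AH", "L", "OW"], ["W", "ER", "L", "D"]]
candidates = build_words(phonemes)
text = " ".join(pick_words(candidates))
print(text)  # hello world
```

Note that the entry for `("T", "EY", "L")` returns two candidates, which is why a real engine needs an analysis step before it finalizes the words.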

The speech recognition engine relies on two main tools to operate with an acceptable level of speed and accuracy. The first is a grammar bank. It is the most important tool in a speech engine because the grammar defines which words can be recognized. A grammar bank must therefore include nearly all the words in the dictionary. The second is a speaker profile. A speaker profile plays an important role in both speaker-independent and speaker-dependent speech recognition systems. With a speaker profile, the speech recognition software can identify and accommodate a user's unique speech patterns and accent. This tool helps to ensure the accuracy, consistency and reliability of a speech recognition engine.
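
One way to picture how these two tools cooperate is sketched below. The data structures are assumptions made for illustration: `GRAMMAR_BANK` is a tiny hypothetical word set, and `SpeakerProfile` is a hypothetical record that maps one user's pronunciation variants to the intended words. Real engines store far richer acoustic information.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Hypothetical per-user record of pronunciation variants."""
    name: str
    variants: dict = field(default_factory=dict)  # heard form -> intended word

    def normalize(self, heard):
        """Map a user-specific variant to the intended word, if known."""
        return self.variants.get(heard, heard)


# Hypothetical grammar bank: the set of words the engine may output.
GRAMMAR_BANK = {"open", "close", "tomato", "file"}


def recognize(heard, profile):
    """Return the word if the grammar bank defines it, otherwise None."""
    word = profile.normalize(heard.lower())
    return word if word in GRAMMAR_BANK else None


profile = SpeakerProfile("alice", {"tomahto": "tomato"})
```

Here `recognize("tomahto", profile)` yields `"tomato"` because the profile accommodates the user's accent, while a word outside the grammar bank yields `None`.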

Different Types of Speech Recognition Software

IBM ViaVoice

Features of speech recognition technology

Today, there are many speech recognition products on the market. However, almost all of them share some common features.

Commands system

Every speech recognition program has a command system. The command system makes it easier to navigate by clicking buttons or menus on the computer. The user only needs to say the name of the button he or she would like to press, and the command system will press it. This feature is especially useful for browsing the web, because the software can recognize the links the user names and click them accurately. One command is particularly useful when the user has a hard time getting the program to recognize where to click: the "show numbers" command. It displays a numbered box over every menu, link and button on the screen, and the user just has to say the number to select the item. With a proper command system, speech recognition software is very efficient.
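
The "show numbers" idea can be sketched as a simple numbering of on-screen elements. The function names and the element list below are hypothetical, and the sketch assumes the spoken number has already been recognized as a digit string.

```python
def show_numbers(elements):
    """Assign 1-based numbers to the clickable elements currently on screen."""
    return {number: element for number, element in enumerate(elements, start=1)}


def select_by_number(overlay, spoken):
    """Return the element for a spoken number, or None if it is not valid."""
    try:
        return overlay.get(int(spoken))
    except ValueError:
        return None


# Hypothetical on-screen elements.
overlay = show_numbers(["File menu", "Home link", "Submit button"])
choice = select_by_number(overlay, "2")
```

Because numbers are short and distinct, they are easier to recognize reliably than arbitrary link text, which is why this command helps when naming a target fails.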


Dictation

Speech recognition software performs even better in dictation mode than in the command system. Dictation, which draws on the grammar bank, is the heart of speech recognition software. The most impressive parts of dictation mode are its formatting, correction and punctuation capabilities. The accuracy with which punctuation and correction commands are recognized is key to the speech engine. The user can easily move the cursor around the screen or select a word by saying it. Dictation mode also makes it easier for the user to format the selected text.


Speech recognition software will predict and fix some incorrectly recognized words. If the user says a word that the software cannot recognize, it selects the most likely word from a list of alternatives that sound similar to what the user said. For example, if the user says "I want to go bag home", the software will automatically correct "bag" to "back".
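
A minimal sketch of choosing among alternatives is shown below. It assumes the engine scores each candidate by how well it fits the neighbouring words; the `BIGRAM_COUNTS` table is invented for illustration, and the essay does not say which scoring method real products use.

```python
# Hypothetical counts of how often one word follows another.
BIGRAM_COUNTS = {
    ("go", "back"): 50,
    ("back", "home"): 40,
    ("go", "bag"): 1,
    ("bag", "home"): 0,
}


def score(previous, word, following):
    """Sum the toy bigram counts on both sides of the candidate word."""
    return (BIGRAM_COUNTS.get((previous, word), 0)
            + BIGRAM_COUNTS.get((word, following), 0))


def correct(words, index, alternatives):
    """Return a copy of words with words[index] replaced by the best alternative."""
    previous = words[index - 1] if index > 0 else ""
    following = words[index + 1] if index + 1 < len(words) else ""
    best = max(alternatives, key=lambda w: score(previous, w, following))
    return words[:index] + [best] + words[index + 1:]


sentence = "i want to go bag home".split()
fixed = correct(sentence, 4, ["bag", "back", "beg"])
print(" ".join(fixed))  # i want to go back home
```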

Interactive Tutorial

Speech recognition software includes an interactive tutorial to shorten the end user's learning curve. It effectively improves the user's understanding of how to control and command the software. At the same time, the software learns to recognize the user's voice while the tutorial is being carried out.


Users can personalize the speech recognition software by adding their own voice commands. Each personalized command triggers a certain action.
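
Personalized commands can be pictured as a lookup from a spoken phrase to an action. The `CommandRegistry` class below is a hypothetical sketch, not the interface of any particular product, and the example action just returns a string instead of launching a program.

```python
class CommandRegistry:
    """Hypothetical registry mapping spoken phrases to actions."""

    def __init__(self):
        self._commands = {}

    def add(self, phrase, action):
        """Register a callable to run when the phrase is spoken."""
        self._commands[phrase.lower().strip()] = action

    def trigger(self, spoken):
        """Run the action for a spoken phrase; return None if unknown."""
        action = self._commands.get(spoken.lower().strip())
        return action() if action else None


registry = CommandRegistry()
registry.add("Open my mail", lambda: "launching mail client")
result = registry.trigger("open my mail")
```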


The more frequently a user uses the speech recognition software, the more accurate it becomes. The software adapts to both the user's speaking style and accent.

Benefits of speech recognition technology

Speech recognition technology has spread rapidly across multiple industries worldwide. Companies in different industries have implemented robust speech recognition systems to support their business processes. I strongly believe that speech recognition technology can bring real value to certain industries. The following are its benefits.

Cost reduction

Many businesses can use speech technology to reduce support costs. The best example is routine customer service inquiries. Speech recognition technology can replace some of the manpower needed in a customer service centre. Most importantly, a customer service centre with a speech recognition system can operate 24/7. This can reduce costs and improve customer satisfaction at the same time.

Improves efficiency

Speech recognition technology can reduce the number of live calls. Callers are offered options that let them obtain the information they need without speaking to an agent. This reduces the amount of time agents spend on the line and frees more agents to handle calls that cannot be resolved through self-service, which increases the number of calls each agent can handle. In this way the technology can improve a company's efficiency and productivity.

Improves accessibility

Speech recognition technology improves computer accessibility for visually impaired people and for people who are unable to use a keyboard and mouse. It allows people with vision problems to control the computer as if they were using conventional input devices. Furthermore, it enables students or patients with physical disabilities to enter text or control the computer verbally.

Ease of use

Accessing and controlling applications with voice commands is easy. Teachers and lecturers can record their speech and let students download the audio file to revise at home. Students who are unable to attend classes can "attend" them virtually. Furthermore, speech recognition technology has been used in several educational computer games for children under five years old. This shows that speech recognition technology is easy enough for a child under five to control.

Current issues for speech recognition technology

Over the years, computer users have often formed a bad impression of speech recognition technology because of poor user experiences. Users are typically frustrated by slow response and by inaccurate recognition. In fact, these problems are closely related to the following factors.


Noise

Noise can come from anywhere around us: a dog barking, a radio playing down the street, a car passing by, another person speaking in the background, and more. Another kind of noise is the echo effect, which is produced naturally when a person speaks. The sound wave may bounce off an object near the speaker and return to the microphone a few milliseconds later. The speech engine usually does not want this information, and it leads to inaccurate recognition.
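
The echo effect can be illustrated with a very simple model in which the microphone receives the original signal plus one delayed, weaker copy. This is a sketch under the assumption of a single reflection; real rooms produce many reflections. The function name `add_echo` and the values of `delay` (in samples) and `gain` are chosen only for illustration.

```python
def add_echo(signal, delay, gain):
    """Return the signal plus one delayed, attenuated copy of itself."""
    output = list(signal) + [0.0] * delay
    for i, sample in enumerate(signal):
        output[i + delay] += gain * sample
    return output


clean = [1.0, 0.0, 0.0, 0.0]                    # a single click
with_echo = add_echo(clean, delay=2, gain=0.5)
print(with_echo)  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

The extra value of 0.5 two samples after the click is the reflected sound, which the speech engine would have to ignore or filter out.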

Processor speed

A speech engine depends heavily on processor speed during conversion. A faster processor lets the speech engine capture and convert speech faster. Conversely, a slower processor introduces delay in converting the sound wave into a form the computer can process. This is why users experience slow response from speech recognition technology.

Speed of speech

Humans speak naturally without pauses between words; pauses normally appear only after a phrase or a sentence. Furthermore, different people speak at different speeds at different times. A tired person tends to speak more slowly and softly. On the other hand, a person who is angry or stressed will speak louder and faster. This poses a tough problem for speech recognition.


Homophones

Homophones are words that sound exactly the same but differ in spelling and meaning. Examples are "tale" and "tail", as in the tale of a dog and the tail of a dog. Both words have exactly the same pronunciation. Even humans sometimes find it difficult to distinguish between homophones. A speech engine faces the same difficulty and may produce an inaccurate result.

Quality of microphone

The quality of a microphone affects the quality of the audio it produces, and the audio quality in turn has a significant effect on the speech engine's recognition process. Poor-quality audio may deliver the wrong message or unreadable waves to the speech engine, which leads to inaccurate recognition.