Deaf Education Using Speech Recognition Technology

Abstract- This paper addresses the improvement of education for deaf students using speech recognition technology. The current education system for deaf students is based mainly on sign language and lip reading. Because of a shortage of teachers who know sign language, many schools for the deaf have closed. We propose a system that allows deaf students to be taught even when the teacher does not know sign language. The proposed system captures and recognizes the teacher's speech, and then displays the corresponding lip movement, sign language and word on the screen. The main advantage of the proposed system is that the page refreshes automatically every fraction of a second, so that the sign language, lip movement and word continuously follow the teacher's speech.

Deaf students acquire knowledge and communicate with others mainly through sign language, lip reading and writing. A hearing person who does not know sign language therefore has to communicate with a deaf person through writing alone. However, deaf students cannot be lectured through writing alone.

Existing System

The teacher must be proficient in sign language; only then can the teacher teach deaf students. Deaf education is provided only by teachers who know sign language. Hence learning sign language has become a prerequisite for providing deaf education. Most teachers can learn sign language only to some extent, i.e. the signs for the fundamentals and some basic concepts. Learning beyond this is difficult for most people, so teachers who know sign language well are few. This is one of the major constraints of the current deaf education system.

Proposed System

In this paper, we introduce a new approach that uses speech recognition to educate deaf students, so that a teacher without knowledge of sign language can teach them.

Speech Recognition

Speech recognition is a technology that converts a voice signal into the corresponding text or commands through a process of identification and understanding. Its ultimate goal is to achieve communication between humans and machines in natural language. It is a multilevel pattern recognition task, in which acoustic signals are examined and structured into a hierarchy of subword units (e.g., phonemes), words, phrases and sentences. Each level may provide additional temporal constraints, e.g., known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints can best be exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level.

Raw Speech

Speech is typically sampled at a high frequency, e.g. 16 kHz over a microphone or 8 kHz over a telephone. This yields a sequence of amplitude values over time.
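As a minimal sketch of what sampling yields, the Python snippet below samples a synthetic 440 Hz tone at the two rates mentioned above. The tone is only a stand-in for speech, and the function name `sample_tone` is a hypothetical helper introduced for illustration; neither is part of the proposed system.

```python
import math

def sample_tone(freq_hz, duration_s, sample_rate_hz):
    """Return the amplitude values of a sine tone sampled at sample_rate_hz."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# Half a second of the same tone at the two sampling rates.
mic = sample_tone(440.0, 0.5, 16000)   # microphone-style rate
phone = sample_tone(440.0, 0.5, 8000)  # telephone-style rate
print(len(mic), len(phone))  # 8000 4000
```

Doubling the sampling rate doubles the number of amplitude values that later stages have to process for the same stretch of speech.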

Signal analysis

Raw speech is first transformed and compressed to simplify subsequent processing. The main purpose of signal analysis is to extract the desired features and compress the data without losing any important information.

Speech Frames

The signal analysis yields a sequence of speech frames, at 10 millisecond intervals with about 16 coefficients per frame. These frames provide explicit information about speech dynamics and are also used for acoustic analysis.
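The 10 millisecond stepping can be illustrated with a short Python sketch. It only cuts a sample sequence into consecutive frames; it does not compute the coefficients, and real front ends commonly use overlapping windows. The function name `split_into_frames` is a hypothetical helper, not something defined in this paper.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split samples into consecutive, non-overlapping frames of frame_ms."""
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames)]

one_second = [0.0] * 16000            # one second of silence at 16 kHz
frames = split_into_frames(one_second, 16000)
print(len(frames), len(frames[0]))    # 100 160
```

At 16 kHz, each 10 ms frame covers 160 samples, so one second of speech gives 100 frames.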

Acoustic models

The acoustic models vary in their representation, granularity, context dependence and other properties. The speech frames are analysed using the acoustic models.


(1) Sign language is not uniform. Sign language includes

natural sign language

grammatical sign language

Natural sign language arises from regional habits and therefore differs from region to region, so it is very difficult to keep it uniform among the faculty.

(2) The word order of sign language sometimes does not fully follow that of the spoken language, which affects language rehabilitation and comprehension for deaf students.

(3) Sign language cannot be replayed. What the teacher conveys in sign language is transitory, yet time is needed to understand and digest the knowledge in class, which causes difficulty for deaf students.

(4) The sign language vocabulary known by the faculty is small. Professional, abstract and emotional terms in particular are very difficult to express in sign language, which creates barriers to communication.

(5) Sign language is polysemous: one hand gesture may match many words and expressions, so its meaning has to be guessed from the context. If the communicator is distracted, it is very difficult for deaf students to understand the precise meaning of the gesture.

(6) The teacher may use lip reading and writing to support sign language in teaching and communication, but lip reading places high demands on the environment and is not suitable for group teaching, while writing conveys little information. Moreover, if the teacher is doing something else or his hands are busy, it is difficult to teach through writing.

These disadvantages may reduce teaching effectiveness. Because of their hearing impairment, the eyes become the most active and most important sense organ for deaf people; they perceive the world mainly through sight.

Practice has shown that written language is the clearest, most accurate and most easily accepted medium for deaf and hard-of-hearing people. We have therefore been exploring how to use their normal vision to let deaf students benefit as much as possible from classroom teaching.

In the 21st century, with the development and maturing of computer technology, speech recognition technology, which converts voice to text, has gradually matured. In this project we introduce speech recognition technology to support education for deaf students.


Deaf education using speech recognition is a simple way to teach deaf students. Words and the corresponding gesture images and lip movement images are stored in the database.

When the faculty member pronounces a word, the system first looks the word up in the database. Once the word is identified, the corresponding gesture and lip movement images are displayed on the webpage. After a fraction of a second, the word and its images are removed from the webpage.

The webpage refreshes automatically every fraction of a second. When the faculty member lectures continuously, the webpage keeps refreshing and displays the gesture images, lip movement images and word for each recognized word in turn.
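The look-up, display and removal cycle described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's implementation: the dictionary `WORD_MEDIA` stands in for the database, the image paths are made up, and the `Display` class and its `hold_seconds` parameter are hypothetical names.

```python
import time

# Hypothetical in-memory stand-in for the database of stored words.
WORD_MEDIA = {
    "hello": {"sign_image": "signs/hello.png", "lip_image": "lips/hello.png"},
}

class Display:
    """Shows a recognized word with its images, then clears it after a delay."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self.current = None
        self.shown_at = None

    def on_word(self, word):
        """Look the word up; show it and return True only if it is stored."""
        media = WORD_MEDIA.get(word.lower())
        if media is None:
            return False
        self.current = {"word": word.lower(), **media}
        self.shown_at = self.clock()
        return True

    def refresh(self):
        """Called on every page refresh; drops the entry once it has expired."""
        if (self.current is not None
                and self.clock() - self.shown_at >= self.hold_seconds):
            self.current = None
        return self.current

# Usage with a fake clock so the example is deterministic.
now = [0.0]
display = Display(hold_seconds=0.5, clock=lambda: now[0])
display.on_word("Hello")
print(display.refresh())  # the entry for "hello" is still shown
now[0] = 1.0
print(display.refresh())  # None: the entry has been removed
```

Passing the clock in as a parameter is a design choice of this sketch; it lets the expiry logic be exercised without waiting in real time.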

With this method, it becomes easier to teach deaf students without any knowledge of sign language, and it also becomes possible to teach subjects such as programming languages in an easy way. This model is more efficient than the current system.


A microphone is a speech input device: it converts sound into an electrical signal, which electronic circuitry in the computer then converts to digital form. It is used here to capture the teacher's speech.
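The final step, turning the electrical signal into digital form, amounts to quantization. The Python sketch below maps amplitudes in the range -1.0 to 1.0 onto signed 16-bit integers. The 16-bit depth is an assumption made for illustration (the paper does not state one), and `quantize_16bit` is a hypothetical helper.

```python
def quantize_16bit(amplitudes):
    """Map amplitudes in [-1.0, 1.0] to signed 16-bit integer samples."""
    quantized = []
    for a in amplitudes:
        clipped = max(-1.0, min(1.0, a))   # out-of-range input is clipped
        quantized.append(int(round(clipped * 32767)))
    return quantized

print(quantize_16bit([0.0, 0.25, 1.0, -1.0, 1.7]))
# [0, 8192, 32767, -32767, 32767]
```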

Recognizing the speech

The two primary components of speech recognition are the acoustic model and the language model. The first, the acoustic model, analyses the sounds of the voice and converts them to phonemes, the basic elements of speech. The English language contains approximately 44 phonemes, the exact count depending on the dialect.

Here is how it breaks down the voice signal: first, the acoustic model removes noise and unneeded information such as changes in volume. Then, using mathematical calculations, it reduces the data to a spectrum of frequencies (the pitches of the sounds), analyses the data, and converts the sounds into digital representations of phonemes.
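To make the "spectrum of frequencies" step concrete, the Python sketch below applies a naive discrete Fourier transform to one 10 ms frame and reports the strongest frequency. A synthetic 1000 Hz tone stands in for speech, and `magnitude_spectrum` is a hypothetical helper. Practical recognizers use a fast Fourier transform and further processing; this is only an illustration of the idea.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin up to the Nyquist bin."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

sample_rate_hz = 16000
# One 10 ms frame (160 samples) of a 1000 Hz tone.
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate_hz) for t in range(160)]

spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * sample_rate_hz / len(frame)
print(peak_hz)  # 1000.0
```

With 160 samples at 16 kHz the bins are 100 Hz apart, so the 1000 Hz tone falls exactly on bin 10.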

The second major component of speech recognition software is the language model, which analyses the content of the speech. It compares the combinations of phonemes to the words in its digital dictionary, a large database of the most common words in the English language. Most of today's packages come with dictionaries containing about 150,000 words. The language model quickly decides which words were spoken and displays them on the screen.
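The dictionary comparison can be pictured with a toy Python example. The two-entry pronunciation dictionary, the phoneme symbols and the word frequencies below are all invented for illustration, and `decode_word` is a hypothetical helper; a real language model is far larger and also uses the surrounding words.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> candidate words.
PRONUNCIATIONS = {
    ("T", "UW"): ["two", "too", "to"],
    ("S", "AY", "N"): ["sign"],
}

# Hypothetical relative word frequencies used to rank the candidates.
WORD_FREQUENCY = {"to": 0.6, "two": 0.3, "too": 0.1, "sign": 1.0}

def decode_word(phonemes):
    """Return the most frequent dictionary word matching the phonemes."""
    candidates = PRONUNCIATIONS.get(tuple(phonemes), [])
    if not candidates:
        return None  # no dictionary entry for this phoneme sequence
    return max(candidates, key=lambda w: WORD_FREQUENCY.get(w, 0.0))

print(decode_word(["T", "UW"]))       # to
print(decode_word(["S", "AY", "N"]))  # sign
print(decode_word(["Z", "Z"]))        # None
```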

Application Server

The application server hosts the middleware. The languages used are JSP (JavaServer Pages), HTML (HyperText Markup Language) and VoiceXML.

JSP is used to carry the recognized word and check it against the database. The corresponding lip movement and sign language retrieved from the database are passed to the JSP, which displays them on the output screen.

VoiceXML is used to give speech output to the teacher in case of errors or when there is a message to convey. HTML is used to design the page. The HTML meta tag is used to refresh the page automatically at the desired interval.
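The kind of markup such a page might contain can be sketched as follows. The sketch is written in Python rather than JSP, purely to show the generated HTML; the function `render_page`, its parameters and the image paths are hypothetical. The `content` value of the refresh meta tag is given in seconds, and the whole-second interval used here is an assumption of this sketch.

```python
from html import escape

def render_page(word, sign_image_url, lip_image_url, refresh_seconds=1):
    """Build an HTML page that shows one word and reloads itself."""
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{int(refresh_seconds)}">'
        "</head><body>"
        f"<h1>{escape(word)}</h1>"
        f'<img src="{escape(sign_image_url)}" alt="sign">'
        f'<img src="{escape(lip_image_url)}" alt="lip movement">'
        "</body></html>"
    )

page = render_page("hello", "signs/hello.png", "lips/hello.png")
print(page)
```

Escaping the word and the image paths keeps unexpected characters in the recognized text from breaking the page.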


A database is a collection of related information. In our system we use a relational database management system. There are two tables: the word table, which holds all the words with their sign language and lip movement and uses the word as the primary key, and the desired word table, which holds the requested word and its corresponding sign language and lip movement. Whenever there is a request, the matching row is retrieved from the word table and placed in the desired word table, from which the JSP takes the needed information. Whenever the page is refreshed, the desired word table is truncated.
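The two-table arrangement can be sketched with Python's built-in `sqlite3` module. SQLite is used only as a convenient stand-in for the relational database; the table names, column names and the helpers `request_word` and `read_and_clear` are assumptions made for this illustration. SQLite has no TRUNCATE statement, so the sketch empties the table with DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (
        word TEXT PRIMARY KEY,
        sign_image TEXT NOT NULL,
        lip_image TEXT NOT NULL
    );
    CREATE TABLE desired_word (
        word TEXT NOT NULL,
        sign_image TEXT NOT NULL,
        lip_image TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO word VALUES (?, ?, ?)",
             ("hello", "signs/hello.png", "lips/hello.png"))

def request_word(conn, spoken):
    """Copy the matching row into desired_word; return True if it was found."""
    row = conn.execute(
        "SELECT word, sign_image, lip_image FROM word WHERE word = ?",
        (spoken.lower(),)).fetchone()
    if row is None:
        return False
    conn.execute("INSERT INTO desired_word VALUES (?, ?, ?)", row)
    return True

def read_and_clear(conn):
    """Return the rows awaiting display, then empty desired_word."""
    rows = conn.execute(
        "SELECT word, sign_image, lip_image FROM desired_word").fetchall()
    conn.execute("DELETE FROM desired_word")  # stands in for TRUNCATE
    return rows

request_word(conn, "Hello")
print(read_and_clear(conn))  # [('hello', 'signs/hello.png', 'lips/hello.png')]
print(read_and_clear(conn))  # []
```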


Support for Deaf Students

The proposed system provides the following support to deaf students:

It immediately adds subtitles, displayed in written form on the screen, and records them in real time during speeches or lectures.

This deepens deaf students' understanding of the material and helps their language rehabilitation.

Their use of language becomes more standardized, and their writing conforms better to the requirements of written language.

It supports PowerPoint, audio and video formats. The teacher can use the advantages of multimedia to create a rich and varied teaching and learning environment.

By converting silent knowledge into a tangible and colourful form in the classroom, the novel and lively teaching atmosphere transforms the traditional plain presentation of knowledge into teaching that is easier to understand and gives students more room for thinking.

This greatly arouses the students' enthusiasm for learning and can help compensate for weaknesses such as slow reaction, a narrow sensory range and difficulty in focusing attention, thus enhancing the quality of their learning in class.

Support for Teachers

The proposed system greatly reduces the difficulty teachers face in learning sign language in order to teach deaf students. Lectures can also be recorded using VoiceXML; teachers can use the recordings to improve their teaching ability, and the recordings can be used to revise the lectures.

Future Work

In this paper we have proposed a system that recognizes the teacher's lecture and displays the corresponding sign language and lip movement. As future work, these lectures can be saved in the database so that, when the teacher simply says the topic name, the system automatically presents the lecture to the deaf students without the teacher needing to be in front of them.