With the development of new technology and the growing popularity of computer applications, the quality of human-computer interaction has become a factor that can decide the success or failure of a computer system, and human-computer interaction technology is a key element of the new generation of computers. Nowadays, great changes have taken place in modern life, and people enjoy better lives. In this paper, we discuss new applications of speech recognition, including predicting speech and providing automatic captions with automatic timing for people with disabilities.
Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them [Hewett et al. 1992].
Since interactive computer technology came into wide use in the 1960s, an increasing number of people have become interested in and have researched human-computer interaction. In the early 1970s, researchers began to explore the basic principles and theories of human-computer interaction design from a cognitive perspective; at that time the work was mainly at an exploratory, theoretical stage. The 1980s marked a turning point in human-computer interaction research: an increasing number of researchers recognized that the quality of human-computer interaction was becoming a factor that could decide whether a computer system succeeds or fails, and that human-computer interaction technology is a key element of the new generation of advanced computer technologies.
In recent years, some findings from human-computer interaction research have begun to be put into practice, and a number of human-machine interfaces of practical value have appeared. Many countries have set up special institutes to carry out human-computer interface research, and human-computer interaction has become an independent branch of the discipline.
The purpose of research in human-computer interaction is to make the computer better suited to the needs of its users, to improve the interaction between human and computer, and to bring the computer system closer to the way humans work. The main measure of a computer system will then no longer be its storage capacity and processing speed, but the quality of its human-computer interaction.
Research in human-computer interaction relies on progress in many subjects, mainly artificial intelligence, cognitive psychology, and computer hardware and software. From a psychological point of view, we should understand whether the basic principles of human brain activity match computer theories, and which cognitive characteristics should be considered in computer design. From an engineering point of view, the focus is on research into intelligent sensors, signal recognizers, knowledge processing systems, and even new computer architectures such as the neural computer. In this paper, we introduce advanced speech recognition technology for human-computer interaction.
Speech recognition, also known as automatic speech recognition (ASR) or computer speech recognition, converts spoken words into text. The term is sometimes applied to speaker-dependent recognition, in which the system is trained for a particular speaker, as in most desktop recognition software: the system is tuned to a particular person's pronunciation in order to increase recognition accuracy.
More broadly, speech recognition refers to recognizing almost anybody's speech; in a call center, for example, the system is designed to recognize the speech of any caller. An utterance is a vocalization that conveys a single meaning to the computer; it can be a single word, a sentence, or multiple sentences.
Types of Speech Recognition
Speech recognition systems can be classified by the type of utterances they are able to recognize; most systems fit into more than one class depending on which mode they are using.
Isolated-word systems usually require silence on both sides of the sample window for each utterance. This does not mean that they can accept only a single word, but rather a single utterance at a time. These systems have listen and not-listen states, so the speaker must wait between utterances, and this pause is used to process the input.
Connected-word systems are similar to isolated-word systems; the only difference is that they allow separate utterances to run together with a minimum pause between them.
Continuous recognition is the most difficult capability to create, because the system must determine the utterance boundaries itself. This method allows the user to speak naturally while the system determines the content.
Spontaneous speech has various definitions, but at the basic level it can be thought of as natural-sounding, unrehearsed speech. An automatic speech recognizer with spontaneous-speech ability can handle a variety of natural speech phenomena, such as words run together and fillers like "um" and "ah".
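The listen/not-listen behavior of isolated-word systems rests on endpoint detection: finding where the utterance starts and stops inside the silence. A minimal sketch of the idea, assuming the audio is a mono NumPy array and using an illustrative fixed energy threshold (real systems use adaptive thresholds and more robust features):

```python
import numpy as np

def detect_utterance(signal, frame_len=400, threshold=0.01):
    """Crude endpoint detector for isolated-word recognition: return the
    sample boundaries of the first span of frames whose short-time energy
    exceeds a fixed threshold, or None if only silence was heard."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)      # per-frame mean-square energy
    active = np.where(energy > threshold)[0]   # frames above the threshold
    if active.size == 0:
        return None                            # stay in the "listen" state
    start, end = active[0], active[-1] + 1
    return start * frame_len, end * frame_len  # utterance boundaries in samples
```

For a signal consisting of silence, a tone, and silence again, the detector returns the tone's boundaries; the pause before and after the utterance is exactly the non-listen gap the paragraph above describes.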
Evolution of Interacting with Computers by Speech
Human-computer interaction via speech became a hot topic in the late twentieth century, and research continues to enhance performance and efficiency in this field. Interaction is carried out with one of the methods stated above, but the first thing we must understand is which technical issues are currently being addressed in the communication of information between human and computer using speech. There are two main problems in this field, automatic speech recognition (ASR) and text-to-speech synthesis (TTS), which handle transforming speech to text and text to speech respectively. A related task is concerned with concepts rather than text at one end of the process, namely speech understanding, where the output is an action taken by the computer based on its understanding of the speech instead of a literal textual transcription.
Most researchers have assumed that speech is at one end of the communication and text at the other. This is more true for synthesis, where the input is always text, than for recognition, since speech is increasingly used for machine control or to initiate a dialogue with a database rather than as a replacement for the keyboard.
At first ASR appears to be the inverse of speech synthesis, but in fact ASR simulates the listener while TTS simulates the speaker, and an algorithm designed for ASR cannot simply be reversed to perform speech synthesis. The human auditory system is similar to that of other mammals and evolved much earlier than the speech organs; many mammals have a similar vocal tract, but only humans have the brain power to use the vocal organs to produce speech.
ASR is the more difficult of the two speech processing tasks. To see the asymmetry between recognition and synthesis, consider two subtasks used by either people or machines to process speech or text: segmentation and adaptation. Tasks requiring intelligence for language processing can be simplified by subdividing them into smaller, similar tasks. To identify and interpret an utterance of any length, the machine or human tries to segment the speech into smaller units using some criterion. The natural units of segmentation are words and phrases, but linguists also propose phonemes, the fundamental sounds of speech; phonemes have a complex relationship to the acoustic speech signal, so reliable segmentation by either human or machine is difficult.
There are many commercial synthesizers in wide use, though only for the world's major languages; they can provide speech whose intelligibility approaches 100%, and naturalness is also improving. Microprocessors can handle the computation speed needed for many synthesizers running in software, but the requirements of waveform concatenation systems strain the capacities of some practical systems. The development of the computer made realistic synthetic speech possible, and increasingly inexpensive memory led to the use of large inventories of speech units to overcome the coarticulation problem. In ASR, stochastic methods involve simple automata but require large amounts of training data and memory to accommodate the great variability among speakers; in the future they may be enhanced with knowledge-based methods [O'Shaughnessy 2003].
Research in Recognition of Noisy Speech
Automatic speech recognition forms the foundation of a natural and easy-to-use method of communication between human and computer. The major limitation in this area arises whenever human speech is superimposed with background noise. Consider, for example, the interior of a car, a popular application field for speech recognizers that allow hands-free operation of text messaging or the center console while driving: the noise produced has a major impact on the speech recognition system. This is a major problem not only in cars but in all major applications of human-computer interaction with speech recognition.
Much research is under way to increase the performance of speech recognition in noisy surroundings. As a first step, filtering or spectral subtraction can be applied to improve the signal before speech features are extracted; well-known examples of this approach are the Advanced Front-End (AFE) feature extraction and Unsupervised Spectral Subtraction (USS). Next, feature patterns based on auditory modeling are extracted from the speech signal to allow a reliable distinction between the phonemes or word classes in the recognizer's vocabulary; widely used features are Mel-Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction (PLP) coefficients, both effective methods of speech representation. The third stage enhances the obtained features to remove the effect of noise, using normalization methods such as Cepstral Mean Subtraction (CMS), Mean and Variance Normalization (MVN), or Histogram Equalization (HEQ), techniques that reduce the distortion of the frequency-domain representation of speech.
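The normalization stage is simple enough to sketch directly. Below is a minimal NumPy implementation of CMS and MVN applied to a matrix of per-frame features (rows are frames, columns are coefficients); the random matrix in the usage stands in for real MFCCs, which would come from a front-end feature extractor:

```python
import numpy as np

def cms(features):
    """Cepstral Mean Subtraction: remove the per-coefficient mean over the
    utterance, cancelling stationary convolutive channel effects."""
    return features - features.mean(axis=0, keepdims=True)

def mvn(features, eps=1e-8):
    """Mean and Variance Normalization: additionally scale each coefficient
    to unit variance across the utterance (eps avoids division by zero)."""
    centred = cms(features)
    return centred / (features.std(axis=0, keepdims=True) + eps)
```

After MVN, every coefficient has approximately zero mean and unit variance over the utterance, which removes much of the static shift a noisy or mismatched channel introduces into the features.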
There are also alternative, model-based feature enhancement approaches applied to compensate for the effect of background noise. One uses a Switching Linear Dynamic Model (SLDM) to capture the dynamic behavior of speech and a Linear Dynamic Model (LDM) to describe the additive noise; such joint speech-and-noise modeling strategies aim to estimate the clean speech features from the noisy signal.
The derivation of the speech model is the next stage in the design of a speech recognizer. Hidden Markov Models (HMMs) are the most commonly used speech models, but numerous alternatives have been developed in recent years, such as Hidden Conditional Random Fields (HCRFs), Switching Autoregressive Hidden Markov Models (SAR-HMMs), and more general models such as Dynamic Bayesian Network structures. Extending the SAR-HMM to an Autoregressive Switching Linear Dynamical System (AR-SLDS), which includes an explicit noise model, increases noise robustness compared with the SAR-HMM.
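The core computation behind HMM-based recognition, scoring an observation sequence against a model, can be sketched with the standard forward algorithm. This toy uses a discrete-observation HMM with made-up parameters, not a real acoustic model; practical recognizers use Gaussian-mixture or neural emission densities and work in the log domain throughout:

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Forward algorithm for a discrete-observation HMM.
    pi : (N,) initial state probabilities
    A  : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B  : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    obs: sequence of observed symbol indices
    Returns log P(obs | model)."""
    alpha = pi * B[:, obs[0]]          # forward variables at t = 0
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, then emit
    return np.log(alpha.sum())
```

In a recognizer, each word (or phoneme) has its own HMM; the word whose model gives the highest likelihood for the observed feature sequence wins.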
A speech model can be adapted to noisy conditions by training the recognizer on noisy material, but this presumes the noise conditions of the test phase are known a priori, and identical noise properties in training and testing hardly ever occur in reality. However, a recognizer is usually built for a certain field of application; in a car, for example, the noise can be estimated from information such as the speed of the vehicle. Many research approaches have therefore been introduced to increase recognition performance in noisy surroundings, and one experiment, an isolated digit and spelling recognition task performed in a car, compares the performance of different speech models under noisy conditions [Schuller et al. 2009].
KEY TECHNIQUES IN FUTURE APPLICATION SYSTEM
In the future, the following techniques will be important in the field of speech recognition and understanding, and crucial for the application domains described below.
Substantially Robust Systems
Current systems lack robustness across a wide range of speakers and environments: recognition performance degrades not only because of acoustic-phonetic variation but also because of the speech characteristics of children, elderly people, and people with disabilities. Such machines are installed in public spaces that contain unwanted noise sources, so these systems require much greater robustness to provide guidance, shopping assistance, information retrieval, and other services. Because the variation is essentially unexpected, conventional training techniques will be less effective.
Multimodal Interaction Systems
Multimodal systems will become the mainstream of the human-machine interface. Most such systems employ humanized agents as metaphors of the system, which encourages the user to speak to the computer. One direction for multimodal systems is to combine them with an expert system already in practical use, thereby improving the human-computer interface; if visual input/output and audio input/output are added to speech, these more sophisticated functions can be used in areas such as consultation systems, training and education systems, and many more.
Multilingual Spoken Dialogue Systems
This category includes speech-to-speech translation systems, but other systems supporting multiple languages will also be useful. It is also possible for part of a software module to be distributed and reused in a system for another language, whereas current system implementation starts with gathering real speech samples; HMM-based recognition, for example, is applicable to another language, but it needs speech samples and the corresponding transcriptions in that language. It would be effective to use a common transcription unit set, such as the IPA symbol set, and to define a distance measure between this set and each individual language's transcription set.
In all the systems described above, it is difficult to recognize every utterance of an individual speaker, so the system recognizes utterances with the help of additional information such as dialogue context or another modality. Phrase spotting techniques can also draw on the acoustic domain, blind source separation, the multimodal domain, the conversational domain, and more.
2. Phonetic Symbol Distance Calculation in the Symbolic Domain
Computing the distance between phonetic sequences in the symbolic domain is more efficient than in the acoustic domain, which makes it effective for large-vocabulary processing: predicting word candidates, estimating the difficulty of speech recognition for a given vocabulary, and phrase spotting. The distance calculation is composed of two steps. First, the two phoneme sequences are converted into sub-phonetic unit sequences; then the distance between the two sequences is estimated by dynamic programming, where the sub-phonetic unit distances are defined using the HMM of each unit, so that the symbolic distance stays closely related to the acoustic-phonetic domain distance.
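The dynamic-programming step above is essentially an edit distance with a substitution-cost table. A minimal sketch, where the cost values in the usage are illustrative stand-ins for the HMM-derived sub-phonetic distances the paragraph mentions:

```python
def phoneme_distance(seq_a, seq_b, sub_cost, ins_del_cost=1.0):
    """Edit distance between two phoneme sequences by dynamic programming.
    sub_cost(a, b) returns the substitution cost for a pair of units
    (0 when they are identical)."""
    n, m = len(seq_a), len(seq_b)
    # d[i][j] = distance between seq_a[:i] and seq_b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * ins_del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_del_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + ins_del_cost,      # delete from seq_a
                d[i][j - 1] + ins_del_cost,      # insert from seq_b
                d[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),
            )
    return d[n][m]
```

With acoustically close phonemes given a low substitution cost (e.g. /p/ versus /b/), confusable words score as closer than unrelated words, which is what makes the measure useful for predicting word candidates and estimating vocabulary difficulty.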
In conventional adaptation, the model structure and the target are basically fixed and the categories of the training samples are known; the critical issue is what samples must be provided for training or learning. In other words, the system requires the ability to assess the value of training samples and to estimate and structure the model by itself.
4. Software Modularization / Multi-agent Systems
In the future, systems will have complex configurations, and advanced programming techniques such as controlling multi-agent systems will accordingly be required; the units of the knowledge base and the software modules will need to be more compact for maintenance and reuse in several environments.
These are the next major techniques in speech recognition technology [Tanaka 1998].
Analysis - applications of speech recognition
Depending on the object to be recognized, recognition tasks can be divided into three categories: isolated word recognition, keyword spotting, and continuous speech recognition. The task of isolated word recognition is to identify isolated words known in advance, such as "boot" or "shutdown"; the task of continuous speech recognition is to identify any continuous speech, such as a sentence or a passage; and keyword spotting also targets continuous speech, but it does not recognize all the words, only detecting where known keywords occur, for example retrieving the words "river" and "lake" in a sentence.
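The contrast between full transcription and keyword spotting can be illustrated at the word-sequence level. The toy sketch below operates on an already-recognized token sequence rather than raw audio (real spotters work on the acoustic signal with filler models), and simply reports where known keywords occur while ignoring everything else:

```python
def spot_keywords(tokens, keywords):
    """Return (position, word) pairs for every occurrence of a known
    keyword in a recognized word sequence; all other words are ignored."""
    keyword_set = set(keywords)
    return [(i, w) for i, w in enumerate(tokens) if w in keyword_set]
```

For the sentence from the example above, spotting {"river", "lake"} yields only those two hits, while a continuous recognizer would have to transcribe every word.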
According to the speakers covered, speech recognition technology can be divided into speaker-dependent and speaker-independent recognition: the former can only identify the speech of one or a few people, while the latter can identify the speech of anybody. Clearly, speaker-independent recognition is better adapted to actual needs, but it is more difficult than recognition aimed at specific speakers.
Speech recognition technology is widely used. Common applications include speech input systems, which, compared with keyboard input, better accord with people's daily habits and are more natural and more efficient; speech control systems, which control devices by spoken commands, faster and more convenient than manual control; and natural, user-friendly database retrieval services driven by the user's speech, such as family services, hotel services, travel agency services, reservation systems, medical services, banking services, stock inquiries, and so on.
Although many problems in speech recognition algorithms remain unresolved, the technology has begun to enter the practical stage. Speech recognition has been developed for information service and query systems, where people can look up information through a speech-driven computer system, and the feedback has been very good. According to a recent survey, more than 80% of users speak highly of services based on speech recognition technology.
Speech recognition chips with small-vocabulary dictionaries have reached the stage at which they can be applied in practice. Research institutes in this field are receiving increasing funding to develop speech recognition chips with dialogue ability, and research and development investment has also increased significantly.
A speech recognition application-specific integrated circuit (ASIC), that is, a speech recognition chip, together with related chips, can make up a complete speech recognition computer system. Such a system should have speech prompting and speech playback functions in order to form a good human-computer interface and to verify the identification. It is usually a real-time system: once the user finishes speaking, the system completes the identification and responds to the user, which demands advanced technology. The system must also be small, highly reliable, low in power consumption, and low in cost.
In recent years, speech recognition chips have been widely applied in computers; typical applications include the following.
(1) Speech control of automobiles. While driving, the driver's hands must stay on the steering wheel, so if the driver wants to know the traffic conditions on certain streets, he can ask a computer equipped with a speech recognition chip instead of stopping the car to check. The control of the windows, doors, and air conditioner can be achieved in the same way, and the approach applies equally to airplanes and other vehicles.
(2) Industrial control and medical fields. When an operator's eyes and hands are occupied and additional operations are needed, the best solution is to extend the human-computer interface with assistive devices; a computer system with a speech recognition chip is ideal for such operators, and the computer can also respond to them with speech.
(3) The speech interface of personal digital assistants (PDAs). Because of the small size of the PDA, the human-computer interface has been one of the bottlenecks for its application. Since keyboard-based input is inconvenient, a PDA with a speech recognition chip is the best solution, and with the development of speech recognition technology, speech will become the main medium of the human-computer interface in PDAs.
(4) Remote control of home appliances. People can control the television, air conditioner, or curtains through a computer system with a speech recognition chip, making any operation of home appliances easier to carry out.
As noted before, some problems in speech recognition algorithms have not been resolved, but the application of the technology has nevertheless begun to enter the practical stage. Speech recognition has been deployed in public services such as information service systems, in-vehicle speech control systems, and manufacturing, and these applications have received good feedback. According to a recent survey, the majority of users think highly of services based on speech recognition technology.
This advanced speech recognition technology in human-computer interaction will certainly be widely used in the future, in national defense, medicine, vehicle control, personal digital assistants, public information systems, and more. Suppose disabled people want to use a computer but cannot use the keyboard or the mouse: a computer with speech recognition can do them a great favor. They simply say what they want the computer to do, and the computer recognizes the voice, interprets the meaning, and reacts to serve the user.
The twenty-first century is a new era for advanced and emerging technology, and also for the integration of technologies from different fields, especially multi-perception in human-computer interaction, such as integrated multi-perception chips and applications. Researchers have already begun work on chips integrating such technologies. Suppose a computer system based on speech recognition is offered to disabled people who cannot speak: how can it work? Gaze recognition technology is not yet mature and supports only basic operations, so we propose a chip integrating these two technologies to help such people. A computer system built on such a chip could help more people and make modern life more convenient than before. Creation and innovation are among the most important factors for human-computer interaction technology.
It is people's dream to communicate with a computer that can understand what they say and react to meet their needs. Speech recognition technology is the advanced technology that enables the computer to transcribe the speech signal into text or commands through recognition and understanding processes.
With the rapid expansion of the computer's influence in modern society, high intelligence, high performance, and high usability are generally regarded as the main trends in current computer science. With developments in computing, communication, and display technology, the limitations of traditional human-computer interaction based on traditional input devices become more and more apparent in new display technology and virtual reality computers. The aim of research in speech recognition technology is to solve the problems of high intelligence and high usability of computer devices and to create a harmonious and natural human-computer interaction environment. The key point is to make the computer precisely perceive the different ways humans express themselves in natural spoken language. Speech recognition, as one of the important research areas of human-computer interaction, has attracted more and more interest in the HCI community.
The aim of speech recognition is to provide an efficient and accurate method of transcribing speech into something the computer can recognize and act upon to meet the needs of users, so that communication between disabled people and the computer can be more convenient.
The research on speech recognition has many applications, for example:
1) Speech recognition makes communication between disabled people and computers possible.
2) From the cognitive point of view, research on the mechanisms of understanding human spoken language can improve the computer's intelligence in understanding human language.
3) Agents in virtual reality can be controlled by speech.
4) Demonstration learning for robots.
5) Multi-modal interface in virtual reality and augmented reality.
In summary, research on speech recognition not only has theoretical value but also wide areas of application.