
Computer Application for Speech Recognition

2235 words (9 pages) Essay in Information Technology

18/05/20 Information Technology

Disclaimer: This work has been submitted by a student.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.

Speech is the most basic, common and efficient method people use to communicate with one another. Speech recognition is a technology that captures the words spoken by a person through a microphone. These words are then processed by a recognizer, and finally the system outputs the recognized words. To understand how speech recognition works, it helps to know something about speech itself and which of its features are used in the recognition process. In the human brain, thoughts are constructed into sentences, and the nerves control the shape of the vocal tract (jaws, tongue, mouth, vocal cords, etc.) to produce the desired sound (Vimala). The resulting sounds, called phones, are the building blocks of speech. Voiced sounds contain a fundamental frequency and its harmonics, and the shape the vocal tract takes for each phoneme makes certain frequencies resonate, so the sound has high energy at those frequencies (Vimala). The first three of these resonance peaks carry significantly high energy and are known as formant frequencies. Each phoneme has a characteristic pattern of formant frequencies, and it is this feature that enables each phoneme to be identified at the recognition stage. In general, speech recognition systems store reference templates of phonemes or words, compare the input speech with them, and output the closest word or phoneme. Since it is the energy at particular frequencies that must be compared, the spectra of the input and of the reference templates are compared rather than the actual waveforms.
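To make the template-comparison idea concrete, here is a minimal Python sketch, assuming synthetic single-frame signals rather than real recorded speech. The function names, the template labels `phone_a` and `phone_i`, and their peak frequencies are hypothetical and only loosely resemble real vowel formants. It computes a magnitude spectrum with a naive discrete Fourier transform, picks the reference template with the nearest normalized spectrum, and shows why spectra rather than raw waveforms are compared: a phase-shifted copy of the same sound has a very different waveform but the same magnitude spectrum.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for one short demo frame)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def normalize(vec):
    """Scale to unit length so overall loudness does not dominate the comparison."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tone(freqs_hz, rate=8000, n=256, phase=0.0):
    """Synthetic frame: a sum of sinusoids standing in for a phone's energy peaks.

    With rate=8000 and n=256 every frequency used below falls exactly on a DFT
    bin, so shifting the phase leaves the magnitude spectrum unchanged.
    """
    return [sum(math.sin(2 * math.pi * f * t / rate + phase) for f in freqs_hz) for t in range(n)]


# Hypothetical reference templates; the peak frequencies are illustrative only.
TEMPLATES = {
    "phone_a": normalize(magnitude_spectrum(tone([750, 1250]))),
    "phone_i": normalize(magnitude_spectrum(tone([250, 2250]))),
}


def recognize(frame, templates=TEMPLATES):
    """Return the label of the template whose spectrum is closest to the frame's."""
    spectrum = normalize(magnitude_spectrum(frame))
    return min(templates, key=lambda label: distance(spectrum, templates[label]))


shifted = tone([750, 1250], phase=1.0)
print(recognize(shifted))                    # phone_a
print(distance(tone([750, 1250]), shifted))  # large: the waveforms themselves differ
```

Real recognizers work on many short overlapping frames and usually on compact spectral features (for example cepstral coefficients) rather than whole raw spectra; the naive transform here is chosen only for readability.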

Speech recognition systems can be divided into a number of classes depending on the kinds of utterances they are able to recognize and the size of the vocabulary they support. Common classes are isolated speech, connected speech, continuous speech and spontaneous speech (Toledano). Isolated-word systems usually require a pause between two utterances; this does not mean that they accept only a single word, but rather that they require one utterance at a time. Connected-word (connected-speech) systems are similar to isolated-word systems but allow separate utterances to be run together with only minimal pauses between them. Continuous speech systems let the user speak almost naturally while the computer determines the content; this is essentially computer dictation. At a basic level, spontaneous speech can be thought of as speech that sounds natural and is not rehearsed. An ASR system for spontaneous speech should be able to handle a variety of natural speech features, for example words being run together, "ums" and "ahs", and even slight stutters (Douglas).
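Isolated-word systems can exploit the pause between utterances to find where each word starts and ends. The sketch below is one simple way to do that, assuming a short-time energy threshold; it is a swapped-in illustration of endpoint detection, not a method from the cited sources, and `segment_utterances` together with its default frame length, threshold and minimum gap are hypothetical values chosen for the demo.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_utterances(samples, frame_len=80, threshold=0.01, min_gap=3):
    """Return (first_frame, last_frame) pairs of speech separated by >= min_gap silent frames."""
    energies = frame_energies(samples, frame_len)
    segments, start, silent_run = [], None, 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i  # speech begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:  # pause long enough: close the utterance
                segments.append((start, i - silent_run))
                start, silent_run = None, 0
    if start is not None:  # utterance still open when the signal ends
        segments.append((start, len(energies) - 1 - silent_run))
    return segments


rate = 8000
word = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(400)]  # 5 frames of "speech"
pause = [0.0] * 400                                                       # 5 silent frames
short_word = word[:240]                                                   # 3 frames

print(segment_utterances(word + pause + short_word))          # [(0, 4), (10, 12)]
print(segment_utterances(word + [0.0] * 160 + short_word))    # [(0, 9)]
```

With a long pause the two bursts come out as separate utterances, while a short gap, as in connected speech, leaves them merged into one; continuous and spontaneous speech offer no reliable pauses at all, which is why those classes need more sophisticated models.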

The idea of speech recognition dates back to the 1940s; the first speech recognition program was demonstrated in 1952 at Bell Labs and recognized a single digit in a noise-free environment (Vimala). In the 1970s, medium vocabularies (on the order of 100-1000 words) were recognized using simple template-based pattern recognition methods. In the 1980s, large vocabularies (1000 words to unlimited) came into use, and speech recognition problems were approached with statistical methods (Vimala), together with a wide range of networks for handling language structures. The key inventions of this era were the hidden Markov model (HMM) and the stochastic language model, which together enabled powerful new methods for handling the continuous speech recognition problem efficiently and with high performance (Scheffer). In the 1990s, the key technologies developed were methods for stochastic language understanding, statistical learning of acoustic and language models, and methods for implementing large-vocabulary speech understanding systems (Vimala). After five decades of research, speech recognition technology has finally reached the marketplace, benefiting users in a variety of ways. The challenge of designing a machine that truly functions like an intelligent human, however, remains a major one. Many readers will already have used such a system, for example Google's speech recognition on a mobile phone.
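The paragraph names the hidden Markov model without showing what it computes. As a minimal sketch, assuming discrete acoustic symbols (0, 1, 2) in place of real acoustic features and two hypothetical word models, "yes" and "no", with made-up probabilities, the forward algorithm below scores an observation sequence against each word's HMM, and the recognizer returns the word whose model gives the highest likelihood.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete-output HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[s] = probability of the observations seen so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, left-to-right word models over symbols 0, 1, 2.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]),
}


def recognize_word(obs, models=MODELS):
    """Pick the word whose HMM assigns the observation sequence the highest likelihood."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


print(recognize_word([0, 0, 2, 2]))  # yes
print(recognize_word([1, 1, 0, 0]))  # no
```

Practical systems typically use continuous output densities, work with log probabilities or scaling to avoid numerical underflow on long utterances, and combine these acoustic scores with a stochastic language model.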

Speech recognition can be used for different purposes by different groups of people in many fields, including medical, military and educational applications. People with disabilities can benefit from speech recognition programs. Speech recognition is particularly beneficial for people who have difficulty using their hands; for them, speech recognition programs are very helpful for operating computers. Speech recognition is also used in deaf telephony, for example voicemail-to-text. It is significant from a military point of view as well: in the Air Force, speech recognition has clear potential for reducing pilot workload. Beyond the Air Force, such programs can also be trained for use in helicopters, battle management and other applications (Douglas). In education, people with learning disabilities who have problems with thought-to-paper communication can benefit from the software. Some other application areas of speech recognition technology are described next. Command and Control: ASR systems that are designed to perform functions and actions on the system are defined as Command and Control systems (Douglas). Utterances like "Open Netscape" and "start a new browser" will do just that. Telephony: some voicemail systems allow callers to speak directly instead of pressing buttons to send specific tones. Many people have difficulty typing because of physical limitations such as repetitive strain injuries (RSI), muscular dystrophy and many others. Similarly, people with hearing difficulties could use a system connected to their telephone to convert the caller's speech to text (Douglas).
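A Command and Control system ultimately has to map whatever the recognizer outputs onto an action. The following sketch assumes the recognizer has already produced a text string; the `COMMANDS` table, the action names such as `launch_browser`, and the "close window" phrase are hypothetical, included only to show that several phrasings can trigger the same action.

```python
from typing import Optional

# Hypothetical command vocabulary: different phrasings may map to one action.
COMMANDS = {
    "open netscape": "launch_browser",
    "start a new browser": "launch_browser",
    "close window": "close_window",
}


def normalize_utterance(text: str) -> str:
    """Lower-case, trim surrounding spaces and punctuation, and collapse inner whitespace."""
    return " ".join(text.lower().strip(" .!?").split())


def dispatch(recognized_text: str) -> Optional[str]:
    """Return the action name for a recognized utterance, or None if it is not a command."""
    return COMMANDS.get(normalize_utterance(recognized_text))


print(dispatch("Open Netscape"))            # launch_browser
print(dispatch("  Start a NEW browser! "))  # launch_browser
print(dispatch("play some music"))          # None
```

An exact-match table keeps the example small; a real system would constrain the recognizer's grammar to the command phrases so that near misses are less likely in the first place.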

There are several areas of application for speech recognition technology: speech-controlled appliances and toys, speech-assisted computer games, speech-assisted virtual reality, telephone assistance systems, speech recognition systems and speech-to-text translation (Schuller). One advantage is that people who are unable to write or see can use such applications to carry out tasks such as transactions. A further benefit claimed for this approach is a reduced risk of security passwords being copied, because passwords do not need to be typed, so the whole process can be completed without that worry. Voice is preferred as an input because it does not require training and is much faster than other input methods. In addition, information can be entered while the person is engaged in other work, voice is a very natural way to interact, and it is not necessary to sit at a keyboard or work with a remote control (Lewis). Because commands and information can be given through a microphone, voice input also offers greater mobility. For students, one advantage of speech recognition is that they do not have to use a keyboard to enter information. It increases efficiency, can help with reading and proofreading, can help people who have difficulty using their hands, can help people who have cognitive disabilities, and has long-term benefits for students. The software spells every word it recognizes correctly. Students can compose as fast as they talk, at 100+ words per minute. Students can produce a large amount of writing, which they can then edit. Students can write papers without being held back by spelling or keyboarding problems. The software will read back to students what they have written, which helps with editing. The use of voice recognition software is expanding rapidly, and both the Windows and Macintosh operating systems have voice recognition built in.
Among the disadvantages of speech recognition is that the product must be trained to recognize the user's voice (Scheffer). This is done by reading passages supplied by the program. Users must speak clearly for the software to work well. If the student has non-standard speech, tends to run words together, or mumbles, the training process may be long. Some editing is still required: although the software spells every word it recognizes correctly, it typically recognizes 5-20% of words incorrectly (Scheffer). It cannot distinguish homonyms, and if users talk quickly, what they produce will most likely be disorganized and grammatically incorrect. Users must also monitor their own work: they can easily get words written down, but there is considerably more involved in writing a paper than simply putting words on the page. Speech recognition uses a lot of memory, and the software has specific hardware requirements. It can be hacked with prerecorded voice messages, needs an initial period of adjusting to each user's voice, is less accurate when there is background noise, and can be distracting in a cubicle environment (Dong).
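An error figure such as "5-20% of words recognized incorrectly" is commonly expressed as a word error rate (WER): the minimum number of substituted, deleted and inserted words needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A minimal sketch using word-level edit distance follows; the example sentences are made up, and the homonym pair shows how "too" for "to" simply counts as a substitution.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Two homonym confusions in a six-word sentence.
print(round(word_error_rate("i want to buy two tickets", "i want too buy to tickets"), 3))  # 0.333
```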

Using speech recognition can help you fine-tune your pronunciation. When you practise speaking on your own, it can be hard to hear what you are doing wrong, so an application that can pinpoint problems is a helpful way to find out what you need to work on (Lewis). Moreover, speech recognition is not intimidating. Since you are not working with a real person, you will not be embarrassed if you make a mistake. Practising with an application can make you more confident in your skills and eventually remove some of the intimidation of talking with real native speakers (Lewis). Compared with evaluation of spoken English that emphasizes vocabulary and grammar, a multi-parameter evaluation model based on accuracy, speed, rhythm and intonation can reflect the level of spoken English more accurately (Yanping). Finally, speech recognition applications offer a flexible study option. You can get speaking practice whenever you want, without needing to schedule time with a conversation partner. That means you can fit in some extra speaking practice whether you are waiting in line at the store, relaxing in bed or even in the shower (just make sure your phone is waterproof first) (Yuen).

It is evident that there is still a very long way to go before text-to-speech synthesis, especially high-level synthesis, is fully acceptable. However, development is progressing steadily, and in the long run the technology seems to advance faster than we can imagine. Therefore, when building a speech synthesis system, we may use almost all available resources, because in a few years today's high-end resources will be available in every PC (Meysam). Regardless of how fast development proceeds, speech synthesis, whether used in low-cost calculators or state-of-the-art multimedia solutions, probably has the most promising future. If speech recognition systems someday reach a generally acceptable level, we may build, for example, a communication system that first analyzes the speaker's voice and its characteristics, transmits only the character string with some control symbols, and finally synthesizes the speech with an individual-sounding voice at the other end (Meysam). Even translation from one language to another may become possible. Clearly, however, we will have to wait a long time, perhaps decades, until such systems are feasible and commonly available.
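The proposed communication system would transmit only a character string instead of the speech signal, and a back-of-the-envelope calculation shows why that is attractive. The figures are assumptions for illustration, not values from the cited sources: 8 kHz, 8-bit telephone-quality audio (the 64 kbit/s rate of standard digital telephony), a speaking rate of 150 words per minute, and about six characters per word including the space, each sent as one byte. The estimate ignores the control symbols and the one-off description of the speaker's voice that the scheme would also need.

```python
# Illustrative assumptions only (not from the essay's sources).
SAMPLE_RATE_HZ = 8000   # telephone-quality sampling rate
BITS_PER_SAMPLE = 8     # 8-bit PCM samples
WORDS_PER_MINUTE = 150  # assumed conversational speaking rate
CHARS_PER_WORD = 6      # assumed average, including the trailing space
BITS_PER_CHAR = 8       # one byte per character


def audio_bitrate() -> int:
    """Bits per second needed to send the raw sampled speech."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def text_bitrate() -> float:
    """Bits per second needed to send only the recognized character string."""
    return WORDS_PER_MINUTE * CHARS_PER_WORD * BITS_PER_CHAR / 60


print(audio_bitrate())  # 64000
print(text_bitrate())   # 120.0
print(audio_bitrate() / text_bitrate())
```

Under these assumptions the character string needs more than 500 times less bandwidth than the audio, which is the appeal of recognizing at one end and resynthesizing at the other.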

Bibliography

  • Dong, Meng, et al. "Unsupervised Speech Recognition through Spike-Timing-Dependent Plasticity in a Convolutional Spiking Neural Network." PLoS ONE, vol. 13, no. 11, Nov. 2018, pp. 1–19. EBSCOhost, doi:10.1371/journal.pone.0204596.
  • Lewis, Dawna, et al. "Effects of Noise on Speech Recognition and Listening Effort in Children with Normal Hearing and Children with Mild Bilateral or Unilateral Hearing Loss." Journal of Speech, Language & Hearing Research, vol. 59, no. 5, Oct. 2016, pp. 1218–1232. EBSCOhost, doi:10.1044/2016_JSLHR-H-15-0207.
  • Mohamad pour, Meysam, and Fardad Farokhi. "An Advanced Method for Speech Recognition." World Academy of Science, Engineering and Technology, vol. 49, 2009.
  • O'Shaughnessy, Douglas. "Interacting with Computers by Speech: Automatic Speech Recognition and Synthesis." Proceedings of the IEEE, vol. 91, no. 9, Sept. 2003.
  • Scheffer, C., et al. "A Comparative Evaluation of Neural Networks and Hidden Markov Models for Monitoring Turning Tool Wear." Neural Computing & Applications, vol. 14, no. 4, Oct. 2005, pp. 325–336. EBSCOhost, doi:10.1007/s00521-005-0469-9.
  • Schuller, Björn W. "Speech Emotion Recognition: Two Decades in a Nutshell, Benchmarks, and Ongoing Trends." Communications of the ACM, vol. 61, no. 5, May 2018, pp. 90–99. EBSCOhost, doi:10.1145/3129340.
  • Toledano, Doroteo T., et al. "Multi-Resolution Speech Analysis for Automatic Speech Recognition Using Deep Neural Networks: Experiments on TIMIT." PLoS ONE, vol. 13, no. 10, Oct. 2018, pp. 1–24. EBSCOhost, doi:10.1371/journal.pone.0205355.
  • Vimala, C., and V. Radha. "A Review on Speech Recognition Challenges and Approaches." World of Computer Science and Information Technology, pdfs.semanticscholar.org/04c8/b7668bc09eebcb56d54ba221a26d8fd174d7.pdf.
  • Yuen, Kevin Chi Pun, et al. "The Mandarin Spoken Word-Picture Identification Test in Noise-Adaptive (MAPID-A) Measures Subtle Speech-Recognition-in-Noise Changes and Spatial Release from Masking in Very Young Children." PLoS ONE, vol. 14, no. 1, Jan. 2019, pp. 1–23. EBSCOhost, doi:10.1371/journal.pone.0209768.
  • Zhang, Yanping, and Limin Liu. "Using Computer Speech Recognition Technology to Evaluate Spoken English." Educational Sciences: Theory & Practice, vol. 18, no. 5, Oct. 2018, pp. 1341–1350. EBSCOhost, doi:10.12738/estp.2018.5.033.