The system I am developing, called Sinhala Text To Speech (STTS), is a research project. This document briefly describes the functionality of STTS and highlights the importance and benefits of the project. The system allows a user to enter Sinhala text and converts it internally into its pronunciation form once the user selects the "convert to voice" option. In short, the system accepts characters in the Sinhala language (Sinhala fonts) and turns them into sound waves that can be played through speakers. The user can select a preferred voice type from three options: child voice, female voice and adult (male) voice. The system offers several benefits to its users. People who cannot read Sinhala but can understand it when spoken are encouraged to use the system, because it lets them overcome that problem easily. Anyone who needs to work with documents containing Sinhala text can also use the system for that purpose. To my knowledge, no system for the Sinhala language currently offers what I intend to develop.
Table of Contents
SINHALA TEXT TO SPEECH
3. STUDY PROBLEM
4. RELEVANCE OF THE PROJECT
5. LITERATURE REVIEW
6. SPECIFIC STUDY OBJECTIVES
7. PROPOSED APPROACH
7.1 User
7.2 Data
7.3 Input
7.4 Processes
7.5 Output
8. RESEARCH METHODOLOGY AND TECHNOLOGIES
8.1 Database Technology
9. PROJECT PLAN
9.1 ARCHITECTURE
9.1.1 Design Architecture
9.1.2 Text process Architecture
9.1.3 Voice Tag Selection Process
9.1.4 Voice Control Process
11.1 SPEECH ANALYSIS AND SYNTHESIS
11.2 SPEECH CODING
SINHALA TEXT TO SPEECH
"Sinhala Text To Speech" is the system I am hoping to develop as my final research project. As a post graduate student I selected a research project that will convert the Sinhala input text into a verbal form.
The term "text-to-speech" (TTS) refers to the conversion of input text into a spoken utterance. The input is Sinhala text, which may consist of words, sentences, paragraphs, numbers and abbreviations. The TTS engine should identify these without ambiguity and generate the corresponding speech sound wave with acceptable quality. The output should be understandable to an average listener without much effort, which means it should be as close as possible to natural speech.
Speech is produced when air is forced from the lungs through the vocal cords (glottis) and along the vocal tract. Speech can be modelled as a rapidly varying excitation signal shaped by a slowly varying filter. The envelope of the power spectrum carries the vocal tract information.
My system will provide a few main features. After entering the text, the user will be able to select one of three voice qualities: a woman's voice, a man's voice or a child's voice. The user will also be able to vary the speed of the voice.
My project will offer a few main benefits to the people who intend to use it.
The basic architecture of my project is outlined below.
Figure: Basic architecture (inputs: text in Sinhala; voice and speed).
The aim is to develop a system that can read text in Sinhala and convert it into spoken Sinhala. The system will also be able to alter the sound wave, meaning that the user can select the voice quality according to his/her preference. There will be three voice selections: a woman's voice, a man's voice and a child's voice. The user can also change the speed of the voice, so anyone who needs to hear slower or faster speech can adjust it to suit their requirements.
Before starting this project I searched the Internet and collected information about this field. First of all, I have to provide a facility for entering Sinhala text into the computer. To address this I intend to use Unicode. When we pronounce Sinhala text, we sometimes need to combine the sounds of two characters.
That is, to create the voice for some characters we need to combine the voices of two other characters. To produce voices, a voice should therefore be stored in the voice database for each and every character. Voices are then retrieved from the voice database according to the text entered. After the text is entered, it is internally divided into different groups.
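The grouping idea above can be illustrated with a minimal sketch. It assumes that the text is stored as Unicode, that Sinhala characters fall in the Unicode Sinhala block (U+0D80 to U+0DFF), and that one reasonable grouping is a base letter together with the vowel signs that follow it. The function names are hypothetical, and conjuncts formed with a zero-width joiner are not handled.

```python
import unicodedata

SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF  # Unicode Sinhala block


def is_sinhala(ch: str) -> bool:
    """Return True if the character lies in the Unicode Sinhala block."""
    return SINHALA_START <= ord(ch) <= SINHALA_END


def split_units(text: str) -> list[str]:
    """Group each base letter with the combining marks that follow it."""
    units: list[str] = []
    for ch in text:
        # Categories starting with "M" (Mn, Mc) cover vowel signs and al-lakuna.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


# The word "kiri" (milk): ka + i-sign, ra + i-sign.
word = "\u0d9a\u0dd2\u0dbb\u0dd2"
units = split_units(word)
```

Here `units` holds two pronounceable groups, each made of two code points, which is the kind of unit a voice database could be keyed on.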
RELEVANCE OF THE PROJECT
The idea of developing a Sinhala Text To Speech (STTS) engine began when I considered the opportunities available for Sinhala-speaking users to benefit from Information and Communication Technology (ICT). In Sri Lanka more than 75% of the population speaks Sinhala, but Sinhala software and Sinhala ICT materials are very rare in the market. This directly affects the development of ICT in Sri Lanka.
At present a few Sinhala text-to-speech programs are available, but they have problems with sound quality, font schemes, pronunciation and so on. Because of these problems, developers are reluctant to use those STTS engines in their applications. My focus is on developing an engine that converts Sinhala words in digitized form into Sinhala pronunciation without errors. This engine will help in developing applications.
Some applications where STTS can be used:
Document reader. Reads an already digitized document (e.g. e-mails, e-books, newspapers) or a conventional document that has been scanned and passed through an optical character recognizer (OCR).
Aid for handicapped persons. The vision-impaired or voice-impaired community can use computer-aided devices to communicate directly with the world. A vision-impaired person can receive information through an STTS system. A voice-impaired person can communicate with others using a keypad and an STTS system.
Talking books and toys. Producing talking books and toys will boost the toy market and education.
Help assistant. A help assistant that speaks Sinhala, like the MS Office help assistant, can be developed.
Automated newscasting. An entirely new breed of television networks, with programs hosted by computer-generated characters, becomes possible.
Sinhala SMS reader. SMS messages contain many abbreviations. A system that reads those messages aloud will help the recipients.
Language education. A high-quality TTS system combined with a computer-aided device can be used as a tool for learning a new language. Such tools help the learner improve quickly, since he/she has access to the correct pronunciation whenever needed.
Travelers' guide. A system located inside a vehicle or mobile device that gives information about the current location and other relevant information, in combination with GPRS.
Alert systems. Alert systems can incorporate a TTS system to attract the attention of the people they address, since humans are used to having their attention drawn by voice.
In particular, a country like Sri Lanka, which is still struggling to harvest the benefits of ICT, can use a Sinhala TTS engine as a means of conveying information effectively. Users who can get the information they need in their native language (i.e. by converting the text to native-language text) will naturally turn their thoughts to the achievable benefits and will be encouraged to use information technology more frequently.
Therefore the development of a TTS engine for Sinhala will bring users personal benefits from a social perspective (e.g. aid for the handicapped, language learning) and a definite financial benefit in economic terms (e.g. virtual television networks, toy manufacturing).
LITERATURE REVIEW
"Text to speech" is a very popular area of computer science, and a good deal of research has been carried out in it. Most of that research concerns how to produce more natural speech for a given text. Freely available text-to-speech packages exist, but most software is developed for the most common languages, such as English, Japanese and Chinese. Some software companies also distribute text-to-speech development tools for English. The Microsoft Speech SDK is one example of a freely distributed toolkit, developed by Microsoft for the English language.
Nowadays several universities and research labs are running research projects on text to speech. Carnegie Mellon University focuses research on text to speech (TTS) and provides open source speech software, toolkits, related publications and important techniques to undergraduate students and software developers. The TCTS Lab is also doing research in this area. They introduced a simple but general functional diagram of a TTS system [Ref. 2].
Image Credit: Thierry Dutoit.
Figure 5.1. A simple but general functional diagram of a TTS system
SPECIFIC STUDY OBJECTIVES
Produce a verbal format for the input Sinhala text.
Input Sinhala text, which may be typed by the user or taken from a given text document, will be transformed into sound waves, which are then output through speakers. Disabled people will therefore be among the main beneficiaries of the Sinhala Text to Speech system. Undergraduates and researchers who need to consult many references can also send the text to my system, listen, and pick out what they need.
Make the output as close as possible to natural speech.
The human voice is a complex acoustic signal, generated by an air stream expelled through the mouth, the nose or both. Important characteristics of the speech sound are speed, silence, accentuation and the level of energy output. The tongue and lips, with the help of other articulators in the vocal system, control the air stream appropriately. The speaker's vocal system produces many variations in the speech signal in order to convey meaning and emotion to the receiver, who then understands the message. Many other characteristics, which lie in the receiver's hearing system, are also involved in identifying what is being said.
Identify an efficient way of translating Sinhala text into verbal form.
By developing this system I will be able to identify and propose the most suitable algorithm for translating Sinhala text into verbal form quickly and efficiently.
Control the voice speed and the type of voice (e.g. man's, woman's or child's voice).
Users will be able to select the quality of the sound wave they want. They will also be allowed to reset the speed of the output as needed. People who would like to learn Sinhala as a second language can learn elocution properly by changing the speed (reducing and increasing it). This will improve their listening capabilities.
Small children can be encouraged to learn the language by varying the speed and voice types.
Propose ways in which the current system can be extended to meet future needs.
This system provides only the basic functions. It can be enhanced further to satisfy users' changing requirements. It can be embedded in toys to improve children's listening and elocution abilities, and so broaden their speaking capacity.
PROPOSED APPROACH
The main function of my system is to read digitized Sinhala characters and speak the words in a voice as close as possible to a natural human voice.
7.1 User
My basic idea is to develop a system that caters to all kinds of users, that is, both those who can operate a computer very well and those who are beginners. Users only need to enter text in Sinhala.
7.2 Data
In my database I hope to store voice tags, Sinhala characters and pronunciation rules. I also wish to introduce efficient algorithms for searching the database for the relevant voice tags.
7.3 Input
The proposed system takes digitized Sinhala characters and a voice selection as input.
7.4 Processes
The system gets a sentence from the user, identifies the end of the sentence by the full stop, and separates words by the space between them. The words are broken down into smaller parts. The system then retrieves the relevant voice tags according to the rules I have defined and merges them. Finally, it takes the voice selections made by the user and applies the corresponding sound effects.
7.5 Output
The system produces the Sinhala voice for the text given by the user, according to Sinhala pronunciation rules and the voice selection made by the user.
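The first two process steps (finding the end of a sentence at the full stop and separating words at spaces) can be sketched as follows. This is only an illustration under simple assumptions: the function names are hypothetical, the sample words are arbitrary, and a full stop is always treated as a sentence end.

```python
def split_sentences(text: str) -> list[str]:
    """Treat each full stop as the end of a sentence; drop empty pieces."""
    return [part.strip() for part in text.split(".") if part.strip()]


def split_words(sentence: str) -> list[str]:
    """Separate words at the spaces between them."""
    return sentence.split()


AMMA = "\u0d85\u0db8\u0dca\u0db8\u0dcf"  # "amma" (mother)
KIRI = "\u0d9a\u0dd2\u0dbb\u0dd2"        # "kiri" (milk)

# Two sample sentences: "amma kiri." and "kiri."
text = f"{AMMA} {KIRI}. {KIRI}."
words_per_sentence = [split_words(s) for s in split_sentences(text)]
```

Each inner list in `words_per_sentence` holds the words of one sentence, ready to be broken into smaller parts and matched against voice tags.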
RESEARCH METHODOLOGY AND TECHNOLOGIES
8.1 Database Technology
I hope to use object-oriented (OO) methodologies and a relational database management system (Microsoft® SQL Server™ 2005) to develop a centralized database on the main server. A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data [Ref. 3]. SQL Server 2005 is designed to work as a data storage engine for thousands of concurrent users who connect over a network; it is also capable of working as a stand-alone database directly on the same computer as an application [Ref. 4]. A DBMS provides several important functions. Applications are independent of data representation, storage and location (data and location independence). A DBMS can scan through millions of records and retrieve data efficiently (efficient data access). A DBMS enforces integrity constraints and security permissions on the data (data integrity and security). A DBMS provides facilities for administering data and keeping it efficiently accessible (data administration). A DBMS schedules concurrent access to the data in such a manner that users can think of the data as being accessed by one user at a time. Further, a DBMS protects users from the effects of system failures (concurrent access and crash recovery). I therefore hope to use Microsoft® SQL Server™ 2005 to develop the voice and text information database.
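A possible shape for the voice tag table is sketched below. The sketch uses Python's built-in sqlite3 module purely as a stand-in for SQL Server 2005 so that it runs without a server. The table name, column names and file paths are all hypothetical and are not a final schema.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voice_tag (
        unit       TEXT NOT NULL,  -- Sinhala letter, possibly with a vowel sign
        voice_type TEXT NOT NULL,  -- 'male', 'female' or 'child'
        audio_path TEXT NOT NULL,  -- hypothetical path to the recorded clip
        PRIMARY KEY (unit, voice_type)  -- indexed, so lookups avoid a full scan
    )
    """
)
conn.executemany(
    "INSERT INTO voice_tag (unit, voice_type, audio_path) VALUES (?, ?, ?)",
    [
        ("\u0d9a\u0dd2", "female", "female/ki.wav"),
        ("\u0d9a\u0dd2", "male", "male/ki.wav"),
        ("\u0dbb\u0dd2", "female", "female/ri.wav"),
    ],
)


def find_voice_tag(unit: str, voice_type: str) -> Optional[str]:
    """Return the clip path for a unit and voice type, or None if missing."""
    row = conn.execute(
        "SELECT audio_path FROM voice_tag WHERE unit = ? AND voice_type = ?",
        (unit, voice_type),
    ).fetchone()
    return row[0] if row else None
```

Keying the table on the pair (unit, voice type) is one way to keep the search for a relevant voice tag efficient, because the database can use the primary key index for every lookup.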
9.1.1 Design Architecture
Figure: Design architecture. Inputs: text in Sinhala; voice and speed selection. Flow: array of text (Sinhala); process the text (process in detail); get the voice tags according to the text and merge them.
Figure: Text process architecture. Steps: detect full stops, commas, brackets, etc.; separate out numbers; separate the text into sentences; group the text according to letters; give each letter a unique number and store it in an array; send the array of letter values to the voice tag selection process.
This process takes a text as its input. It detects full stops, commas and other punctuation to avoid confusion. Any numbers in the text are separated out, and the text is partitioned into sentences. The letters in each sentence are then grouped, each letter is given a unique number, and the numbers are stored in an array. This array is sent to the next process.
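The text process described above might look like the sketch below. It makes two assumptions that are not fixed by the design: the "unique number" of a letter is taken to be its Unicode code point, and punctuation other than the full stop is simply discarded. The names `process_text`, `NUMBER` and `PUNCTUATION` are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")      # integers and simple decimals
PUNCTUATION = set(",;:!?()[]{}\"'")        # marks dropped from the letter array


def process_text(text: str) -> tuple[list[str], list[list[int]]]:
    """Separate out numbers, split into sentences, and number each letter."""
    numbers = NUMBER.findall(text)
    remaining = NUMBER.sub(" ", text)
    sentences = [s.strip() for s in remaining.split(".") if s.strip()]
    letter_values = [
        # Assumption: the Unicode code point serves as the unique letter number.
        [ord(ch) for ch in sentence if not ch.isspace() and ch not in PUNCTUATION]
        for sentence in sentences
    ]
    return numbers, letter_values


KIRI = "\u0d9a\u0dd2\u0dbb\u0dd2"        # "kiri" (milk)
AMMA = "\u0d85\u0db8\u0dca\u0db8\u0dcf"  # "amma" (mother)

numbers, letter_values = process_text(f"{KIRI} 25. {AMMA}.")
```

The numbers are returned separately so that a later step can decide how to read them aloud, while `letter_values` holds one array of letter numbers per sentence.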
9.1.3 Voice Tag Selection Process
Figure: Voice tag selection process. Inputs: array of letter values; selected voice type. Steps: get voice tags from the voice database; merge the voice tags in order; send the merged voice tag to the voice control process.
This process takes the array produced by the text process and the voice selection as inputs. Using these inputs, it gets the voice tag for each letter and merges them. The merged voice tags are sent to the voice control process.
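A minimal sketch of selecting and merging voice tags is given below. It assumes that a voice tag is a short list of audio samples and that merging means concatenating the tags in letter order. The in-memory dictionary stands in for the voice database, and the sample values are made up for illustration.

```python
# Hypothetical voice database: (letter value, voice type) -> audio samples.
VoiceDb = dict[tuple[int, str], list[float]]


def select_and_merge(
    letter_values: list[int], voice_type: str, voice_db: VoiceDb
) -> list[float]:
    """Look up the voice tag of each letter and join the tags in order."""
    merged: list[float] = []
    for value in letter_values:
        tag = voice_db.get((value, voice_type))
        if tag is None:
            raise KeyError(f"no voice tag for U+{value:04X} ({voice_type})")
        merged.extend(tag)
    return merged


voice_db: VoiceDb = {
    (0x0D9A, "female"): [0.1, 0.2],  # made-up samples for the letter ka
    (0x0DD2, "female"): [0.3],       # made-up samples for the i vowel sign
}
merged = select_and_merge([0x0D9A, 0x0DD2], "female", voice_db)
```

Raising an error for a missing tag is one design choice; a real system might instead fall back to a default voice or skip the letter.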
9.1.4 Voice Control Process
Figure: Voice control process. Inputs: merged voice tag; voice speed. Steps: store the voice text array; control the speed; speak the voice array.
This process takes the merged voice tags and the voice speed selection as inputs. It arranges the merged voice tags according to the selected speed and then speaks out each voice tag.
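One simple way to apply the selected speed is to resample the merged samples, as sketched below. This is an assumption rather than the chosen method: plain resampling also shifts the pitch, so a real system would more likely use a time-stretching technique that preserves pitch. The function name and the sample values are hypothetical.

```python
def change_speed(samples: list[float], speed: float) -> list[float]:
    """Resample by linear interpolation; speed > 1 is faster, < 1 is slower."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out_len = max(1, round(len(samples) / speed))
    out: list[float] = []
    for i in range(out_len):
        pos = i * speed                  # position in the original signal
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - int(pos)            # distance between left and right
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


samples = [0.0, 1.0, 2.0, 3.0]
fast = change_speed(samples, 2.0)   # half as many samples
slow = change_speed(samples, 0.5)   # twice as many samples
```

With this sketch, a speed of 2.0 halves the playing time and a speed of 0.5 doubles it, which matches the idea of letting the user choose a faster or slower voice.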
[Ref. 1] Building Synthetic Voices, [Online] http://www.festvox.org/festvox/
[Ref. 2] An Introduction to Text-to-Speech Synthesis, [Online] http://tcts.fpms.ac.be/synthesis/introtts.html
[Ref. 3] Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, Third edition, 2001, McGraw-Hill
[Ref. 4] SQL Server Books, [Online] 1988-2005 Microsoft Corporation.