Data Visualization Model for Speech Articulators


Abstract-The proposed work describes the study and development of a visualization tool for investigating speech perception from the internal articulator movements of Tamil speech sounds. To improve the accuracy of speech production, a computer aided articulator interface is developed. The interface contains front and side views of an animated face model that displays the various possible movements of inner articulators such as the tongue and lips. The tongue and lip models play a vital role in displaying the place of articulation, which improves speech intelligibility for early language learners as well as in speech therapy for subjects with hearing impairments or articulation disorders. In this work, both the tongue and the lips are modeled as sets of polygons. The tongue consists of two layers, each containing 49 control points arranged in a 7 x 7 grid; of these 49 control points, seven major control points have been identified to parameterize the tongue. The lips are modeled as a 6 x 7 grid of 42 control points, of which six major control points parameterize the lip shape. To reconstruct the position of the tongue and lips, the seven tongue control points and the six lip control points are extracted from mid-sagittal Magnetic Resonance Imaging (MRI) images captured during the articulation of each phoneme. The focus of the proposed method is to develop an interface that is usable without prior training or instruction and that improves speech.

Keywords-Speech perception, Internal articulators, Speech production, Computer aided articulator interface, Speech intelligibility, Speech therapy, MRI

Introduction

A computer aided data visualization system provides an interesting tool for investigating the gain of visual information, which can enhance speech intelligibility. The visualization of speech production helps subjects know the place of the inner articulators and control their speech organs. Because of hidden articulators and other social issues, visual speech perception is a complex task [1]. Perceptual research has been informative to a certain degree about how visual speech is represented and processed, but improvements in visual speech synthesis need to be driven much more by detailed studies of how real humans produce speech, because human speech production is a very complex process.

To explain articulatory processes, speech therapists use static pictures of articulator positions from a mid-sagittal or frontal view. Such static pictures do not capture the coarticulatory interaction of real speech trajectories [2]. To convey co-articulation, a talking head with a three dimensional model of the vocal tract is important. The dynamic information from a talking head used as a speech trainer (e.g. for language acquisition or speech therapy) offers the possibility of showing the internal articulators to explain the production of different speech sounds.

The ultimate goal of research on the data visualization model is to develop an interface that includes an animated face model with visible inner articulators (tongue, lips and jaw). Speech production is shown from both the front and the side view of the animated face model. The interface contains a control panel that shows the various possible movements of each articulator. Our computer aided data visualization system is intended to train and improve the speech intelligibility of second language learners and hearing impaired subjects in the Tamil language.

The rest of this paper is organized as follows. Section II reviews related work in this context. Section III explains the implementation details and techniques used. Section IV presents some of the experimental results obtained. Finally, Section V discusses future extensions of this system.

Related Work

This section discusses related work in the fields of speech perception, computer aided articulator interfaces and articulator modeling.

Subjects with hearing impairments suffer from a lack of auditory feedback and have difficulty acquiring speech production. Because of these difficulties, most such subjects do not learn to speak properly despite a fully functional speech production system. Speech therapy can improve their speech intelligibility dramatically. Speech training systems provide visual feedback of the vocal tract shape, which has been found useful for learning the correct place of articulation. There is a wide range of computer based speech training (CBST) systems used as therapy aids for subjects with hearing and speech impairments. Some of these CBST systems are SpeechViewer, Box of Tricks, Indiana Speech Training Aid, Speech Illumina Mentor, and the Speech Training, Assessment and Remediation system [10]. These systems are extensively used and acknowledged.

In previous work [1], an interface named the Computer Aided Articulatory Tutor (CAAT) was developed using computer graphics and MRI techniques to animate the inner articulatory movements of an animated tutor. Three dimensional vocal tract articulatory models were developed with the polygon modeling technique, the most commonly used method for building three dimensional models. Polygon models are relatively easy to create and can be deformed easily; however, the smoothness of the surface is directly related to the complexity of the model.

The tongue model in the CAAT interface was built as a set of polygons made up of 50 control points, of which five were identified as major points for constructing the entire three dimensional shape of the tongue. To perform articulation for phonemes, the five major control points of the tongue were extracted from mid-sagittal MRI images and stored along with the corresponding phoneme. Speech articulation was performed using key framing and interpolation techniques: key framing defined the base and target positions of the tongue [8], and interpolation computed the intermediate deformation values.

System Design

The embedded modules are the articulator modeling module, the visual articulation module, and the control interface module. The articulator modeling module develops the animated face model, tongue model, lip model and lower jaw model. The visual articulation module generates a series of parameter settings for the virtual articulators; for each speech sound it provides a co-articulated target position which is held for a fixed fraction of the speech duration. The control interface module assigns controls to each articulator, such as the tongue and lips, which allow the following possible movements (a sketch of this control assignment follows the list below):

Lip opening and closing

Lip rounding

Tongue body raise

Tongue contact with palate

Tongue front and back

Tongue tip raise
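As an illustration of how the control interface module might assign these movements to articulators, consider the following Java sketch; the class, enum and map names are illustrative assumptions of ours, not the system's actual code.

```java
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch: mapping each articulator to its possible movements.
public class ControlInterfaceSketch {

    // The six movements listed in the system design above.
    enum Movement {
        LIP_OPEN_CLOSE, LIP_ROUNDING,
        TONGUE_BODY_RAISE, TONGUE_PALATE_CONTACT,
        TONGUE_FRONT_BACK, TONGUE_TIP_RAISE
    }

    // Each articulator is assigned the subset of movements it supports.
    static final Map<String, EnumSet<Movement>> CONTROLS = Map.of(
        "lip", EnumSet.of(Movement.LIP_OPEN_CLOSE, Movement.LIP_ROUNDING),
        "tongue", EnumSet.of(Movement.TONGUE_BODY_RAISE,
                             Movement.TONGUE_PALATE_CONTACT,
                             Movement.TONGUE_FRONT_BACK,
                             Movement.TONGUE_TIP_RAISE)
    );

    public static void main(String[] args) {
        CONTROLS.forEach((articulator, movements) ->
            System.out.println(articulator + " -> " + movements));
    }
}
```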

Data Acquisition

Data acquisition is the first step in constructing an articulatory model. Three dimensional models are based on geometry, typically a polygon mesh that is deformed to produce animation. To develop an initial mesh, its geometry data has to be acquired through a suitable method or technique. There are many methods for acquiring data on the inner articulators: Magnetic Resonance Imaging (MRI), and kinematic data from Electropalatography (EPG) and Electromagnetic Articulography (EMA), have been used [12]. Each of these methods in isolation can provide useful information [9], but MRI is the dominant measurement method for three dimensional imaging and provides detailed three dimensional data of the entire vocal tract and tongue [6]. MRI is amenable to computerized three dimensional modeling and provides excellent structural differentiation. Due to technical advances, it is possible to collect full three dimensional data without requiring subjects to sustain the articulation artificially. Moreover, MRI does not pose any known health risks to subjects. Therefore MRI can be used for large corpus studies and for repeated measurements with the same subjects.

Tongue Modeling

The tongue is an important organ in human speech production. Realistic speech animation requires a tongue model with clearly visible tongue motion, which is an important cue in speech reading. Our tongue model is implemented as a set of polygons, consisting of 98 control points joined by 86 polygons that make up a polygon mesh. The surface of the tongue is rendered using illumination and shading, giving it a smooth appearance. The tongue model is realized as two layers, each containing 49 control points arranged in a 7 x 7 grid. To parameterize the tongue, seven major control points have been identified: the tongue center, three control points along the right lateral median, and three lower control points along the vertical center line. All seven control points are located on the top layer of the tongue (Fig. 1).

Fig. 1 Tongue model

Our tongue model can perform the following deformations: tongue tip raise, tongue body raise, tongue forward and backward movement, and tongue contact with the palate. The deformations include rotation, scaling, translation and pull. These deformations are applied to control points in a defined area of influence, creating the possible movements of the tongue model.
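A minimal Java sketch of such a localized deformation follows, assuming the two-layer 7 x 7 control grid described above and a linear falloff over the area of influence; the coordinates and the falloff weighting are illustrative assumptions, not the model's exact calculations.

```java
// Sketch of the two-layer 7x7 tongue control grid (98 points) and a
// localized "raise" deformation applied within an area of influence.
public class TongueModelSketch {

    static class Point3 {
        double x, y, z;
        Point3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // Two layers, each a 7x7 grid: 98 control points in total.
    Point3[][][] grid = new Point3[2][7][7];

    TongueModelSketch() {
        // Placeholder geometry; real coordinates come from the MRI data.
        for (int layer = 0; layer < 2; layer++)
            for (int row = 0; row < 7; row++)
                for (int col = 0; col < 7; col++)
                    grid[layer][row][col] =
                        new Point3(col * 0.1, row * 0.1, layer * -0.05);
    }

    // Raise control points near (rowC, colC) on the top layer, with a
    // linear falloff so only the defined area of influence is deformed.
    void raise(int rowC, int colC, double amount, int radius) {
        for (int row = 0; row < 7; row++) {
            for (int col = 0; col < 7; col++) {
                double dist = Math.max(Math.abs(row - rowC), Math.abs(col - colC));
                if (dist <= radius) {
                    double weight = 1.0 - dist / (radius + 1.0);
                    grid[0][row][col].y += amount * weight;   // top layer only
                }
            }
        }
    }

    public static void main(String[] args) {
        TongueModelSketch tongue = new TongueModelSketch();
        tongue.raise(3, 3, 0.2, 2);   // e.g. a "tongue body raise" gesture
        System.out.printf("center y after raise: %.3f%n", tongue.grid[0][3][3].y);
    }
}
```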

Lip Modeling

The lips consist of two portions, the upper lip and the lower lip, and animating them requires moving the upper and lower lip in parallel. Our lips are modeled using the polygon modeling technique, which results in smoother shapes and structure. A benefit of polygon modeling is that the changing shapes of the polygon models can be calculated much faster, and the desired lip shapes can be achieved directly.

Our lip model consists of 42 control points forming 28 polygons, each with 5 vertices. The entire three dimensional lip model is arranged in a 6 x 7 control grid. The model has three basic groups of lines in both the upper and lower lip (inner, median and outer lines). Each line comprises seven control points, giving 21 control points for the upper lip and the remaining 21 for the lower lip. To parameterize the lips, six major control points have been identified: the lip center, two control points above the center point along the horizontal line, and three control points to the right of the center point along the vertical line. All six control points are located on the upper lip (Fig. 2).

Fig. 2 Lip model

Six types of deformation can take place in our lip model: lip rounding, lip protrusion, upper lip raise, lower lip depression, upper lip retraction and lower lip retraction.
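The following Java sketch illustrates the 6 x 7 lip control grid and one of these deformations, lip rounding, implemented here as a simple scaling of each point toward the vertical center line; the initial geometry and the scaling rule are assumptions for demonstration only.

```java
// Sketch of the 6x7 lip control grid: three lines of seven points each for
// the upper lip (rows 0-2) and the lower lip (rows 3-5).
public class LipModelSketch {

    double[][][] grid = new double[6][7][3];   // [line][point][x, y, z]

    LipModelSketch() {
        // Placeholder geometry; real coordinates come from the MRI data.
        for (int line = 0; line < 6; line++)
            for (int p = 0; p < 7; p++) {
                grid[line][p][0] = (p - 3) * 0.1;   // x: spread around center
                grid[line][p][1] = (line < 3 ? 0.05 : -0.05) * (3 - line % 3);
                grid[line][p][2] = 0.0;
            }
    }

    // Lip rounding: scale every point's x toward the vertical center line.
    void round(double factor) {
        for (double[][] line : grid)
            for (double[] point : line)
                point[0] *= (1.0 - factor);
    }

    public static void main(String[] args) {
        LipModelSketch lips = new LipModelSketch();
        lips.round(0.3);   // 30% rounding
        System.out.printf("corner x after rounding: %.3f%n", lips.grid[1][0][0]);
    }
}
```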

Visual Articulation Interface

The visual articulator interface displays a double view of the animated face, from the front and the side. The surface of the face is made semitransparent to display the inner articulators such as the tongue, lips and jaw (Fig. 3). This capability is especially useful for explaining non-visible articulation in language learning situations. The animated face is described as a polygonal mesh that is deformed through parameterization. The visual articulator interface is developed in the Java programming language and incorporates the necessary controls. To perform articulation, the user selects a speech sound from a drop down list box. Once a speech sound is chosen, the corresponding picture of the speech sound is displayed in the interface, and the corresponding co-articulation is animated in the front and side views of the animated face.

Fig. 3 Interface of visual articulator
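A minimal Java sketch of the speech sound selection step described above follows; the Swing components, the listener wiring and the sample sound labels other than "THA" are assumptions for illustration.

```java
import javax.swing.*;

// Hypothetical sketch: choosing a speech sound from the drop down list
// would trigger display of its picture and the corresponding animation.
public class SoundSelectorSketch {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            String[] sounds = { "THA", "PA", "KA" };   // illustrative list
            JComboBox<String> selector = new JComboBox<>(sounds);

            // In the real interface this would load the stored control
            // points and start the front/side-view animation.
            selector.addActionListener(e ->
                System.out.println("animate: " + selector.getSelectedItem()));

            JFrame frame = new JFrame("Speech sounds");
            frame.add(selector);
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```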

The visual articulator interface uses key framing and interpolation techniques to achieve speech articulation. Interpolation is the most common method of animating three dimensional models. The basic principle is to first define the key frames for the base and target positions of the articulators; once the key frames are identified, the in-between frames can be determined by interpolation. In our work, to perform articulation of different speech sounds, the seven major control points of the tongue and the six major control points of the lips are extracted from the mid-sagittal MRI images captured during the articulation of each speech sound. These values are stored in a database along with the corresponding speech sound. The points corresponding to the base position of the tongue are shown in Table 1 and those of the lips in Table 2.
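A minimal Java sketch of this key framing step is given below; it linearly interpolates a control point between a base and a target key frame, with placeholder values standing in for the MRI-derived coordinates held in the database.

```java
// Sketch of key framing with linear interpolation between a base and a
// target articulator position. Placeholder values are used; in the actual
// system the key frames come from the stored MRI control points.
public class KeyFrameSketch {

    // Linearly interpolate each coordinate of each control point.
    static double[][] interpolate(double[][] base, double[][] target, double t) {
        double[][] frame = new double[base.length][base[0].length];
        for (int p = 0; p < base.length; p++)
            for (int c = 0; c < base[0].length; c++)
                frame[p][c] = base[p][c] + t * (target[p][c] - base[p][c]);
        return frame;
    }

    public static void main(String[] args) {
        double[][] base   = { { 0.00, 0.00, 0.00 } };   // placeholder base frame
        double[][] target = { { 0.00, 0.10, -0.20 } };  // placeholder target frame

        // Generate in-between frames at 25% steps of the transition.
        for (double t = 0.0; t <= 1.0; t += 0.25) {
            double[] point = interpolate(base, target, t)[0];
            System.out.printf("t=%.2f  y=%.3f  z=%.3f%n", t, point[1], point[2]);
        }
    }
}
```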

Table 1. Coordinates for the tongue's base position

Point No.   X-Coordinate   Y-Coordinate   Z-Coordinate
1           -0.0298804     -0.8214912     -1.9494
2            0.0859375     -0.8214912     -1.9494
3            0.1950554     -0.8214912     -1.9494
4            0.3038823     -0.8214912     -1.9494
5           -0.0298804     -0.9450541     -1.9494
6           -0.0298804     -1.0559431     -1.9494
7           -0.0298804     -1.1666168     -1.9494

Table 2. Coordinates for the lip's base position

Point No.   X-Coordinate   Y-Coordinate   Z-Coordinate
1           -0.02988       -1.56662       -1.99494
2            0.014519      -1.46662       -1.99494
3            0.024519      -1.38662       -1.99494
4            0.03452       -1.32662       -1.99494
5           -0.02988       -1.56662       -1.83494
6           -0.02988       -1.50662       -1.80494

The visual articulation process is depicted in Fig. 4. The major control points extracted from MRI for each speech sound are stored in a database. To reconstruct the tongue and vocal tract models, the coordinates of the seven major tongue control points and the six major lip control points are retrieved from the database, and the complete tongue and lip shapes are plotted using correction factors and the corresponding calculations.

Fig. 4 Flow diagram for visual articulation
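The retrieval step in Fig. 4 might look as follows in Java; an in-memory map stands in for the database, the correction factors are computed simply as target minus base offsets for each major point, and only two of the seven tongue points are shown (values taken from Tables 1 and 3). The storage scheme and the exact correction calculation are not specified in the text, so these are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of looking up stored major control points for a speech sound and
// deriving per-point correction factors relative to the base position.
public class ArticulationLookupSketch {

    static final Map<String, double[][]> DB = new HashMap<>();

    static {
        // Major tongue control points per speech sound (x, y, z); points 1
        // and 5 only, the remaining five points are elided here.
        DB.put("base", new double[][] {
            { -0.0298804, -0.8214912, -1.9494 },   // point 1 (Table 1)
            { -0.0298804, -0.9450541, -1.9494 }    // point 5 (Table 1)
        });
        DB.put("THA", new double[][] {
            { -0.0298804, -0.8214912, -1.9494 },   // point 1 (Table 3)
            { -0.0298804, -0.9450541, -1.8594 }    // point 5 (Table 3)
        });
    }

    // Correction factor for each stored point: target minus base.
    static double[][] corrections(String sound) {
        double[][] base = DB.get("base"), target = DB.get(sound);
        double[][] delta = new double[base.length][3];
        for (int p = 0; p < base.length; p++)
            for (int c = 0; c < 3; c++)
                delta[p][c] = target[p][c] - base[p][c];
        return delta;
    }

    public static void main(String[] args) {
        double[] d = corrections("THA")[1];   // offset of point 5
        System.out.printf("point 5 offset: (%.4f, %.4f, %.4f)%n", d[0], d[1], d[2]);
    }
}
```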

Control Panel Interface

The interface comprises front and side views of the animated face with visible inner articulators, along with a set of controls for each articulator (tongue, lips and jaw). The controls are used to show the different possible movements of each articulator. The tongue model has four controls enabling the following movements: tongue body raise, tongue contact with palate, tongue forward and backward movement, and tongue tip raise. The lip model has three controls for the movements lip open and close, lip rounding and lip protrusion. The control panel interface is shown in Fig. 5.

Fig. 5 Interface of control panel

Our control panel interface is designed for two main reasons. First, to train subjects who have difficulty producing particular speech sounds: by showing the movement of the articulators as the controls are moved, subjects can understand precisely how a particular speech sound is articulated. Second, to perform the articulation process for a new speech sound, the major control points for the tongue and vocal tract can be obtained from the control panel interface instead of from MRI.
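As an illustration, a single control of such a panel might be wired up as in the Java Swing sketch below; the paper does not name its GUI toolkit or parameter ranges, so both are assumptions here.

```java
import javax.swing.*;

// Hypothetical sketch: one control-panel slider driving an articulator
// parameter. On each change the value would be forwarded to the articulator
// model (e.g. a tongue tip raise deformation).
public class ControlPanelSketch {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Articulator controls");
            JSlider tongueTipRaise = new JSlider(0, 100, 0);   // 0-100% raise

            tongueTipRaise.addChangeListener(e ->
                System.out.println("tongue tip raise: "
                    + tongueTipRaise.getValue() + "%"));

            frame.add(new JLabel("Tongue tip raise", SwingConstants.CENTER), "North");
            frame.add(tongueTipRaise, "Center");
            frame.setSize(300, 100);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```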

Experimental Results

Using our visualization tool for speech articulators, we performed the articulation process for a frequently used Tamil speech sound. The articulation of the letter "THA" results in target locations for the tongue and lips. To reconstruct the place of articulation for this speech sound, the seven major control points for the tongue (Table 3) and the six major control points for the lips (Table 4) were extracted from the MRI images. Using these points, the entire tongue and lip models were plotted.

Table 3. Coordinates for the tongue's position for sound THA

Point No.   X-Coordinate   Y-Coordinate   Z-Coordinate
1           -0.0298804     -0.8214912     -1.9494
2            0.0859375     -0.8214912     -1.9494
3            0.1950554     -0.8214912     -1.9494
4            0.3038823     -0.8214912     -1.9494
5           -0.0298804     -0.9450541     -1.8594
6           -0.0298804     -1.0559431     -1.7894
7           -0.0298804     -1.1666168     -1.7294

Table 4. Coordinates for the lip's position for sound THA

Point No.   X-Coordinate   Y-Coordinate   Z-Coordinate
1           -0.02988       -1.56662       -2.06494
2            0.014519      -1.46662       -2.06494
3            0.024519      -1.38662       -2.06494
4            0.03452       -1.32662       -2.01494
5           -0.02988       -1.56662       -2.22494
6           -0.02988       -1.50662       -2.25494
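As a worked check on the table data, the short Java program below computes the z displacement of tongue points 5-7 between the base position (Table 1) and the "THA" target (Table 3); the displacement grows toward point 7, which suggests the tongue tip region moves the most for this sound.

```java
// Displacement of tongue points 5-7 from base to the "THA" target.
// Values are copied directly from Tables 1 and 3.
public class ThaDisplacementSketch {
    public static void main(String[] args) {
        double[] baseZ = { -1.9494, -1.9494, -1.9494 };   // points 5-7, Table 1
        double[] thaZ  = { -1.8594, -1.7894, -1.7294 };   // points 5-7, Table 3
        for (int i = 0; i < baseZ.length; i++)
            System.out.printf("point %d: dz = %+.4f%n", i + 5, thaZ[i] - baseZ[i]);
    }
}
```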

Discussion

This visualization tool is aimed at helping hearing impaired subjects and second language learners acquire speech sounds. The interface was developed to improve the realism and accuracy of visible speech production. It comprises an animated head with modeled tongue and lips, along with visual cues that help the user perceive the position and manner of each speech sound. The interface serves two purposes: first, as a speech therapy aid that shows the articulation process of each speech sound; and second, as a speech articulation trainer and a means of acquiring control points that can be used as input to perform articulation. The developed interface is user friendly and can be used without prior training or instruction.
