Analysis of theoretical approaches to speech perception


Speech perception is vital to the language used in day-to-day life. When someone speaks, air pressure fluctuates, the resulting sound waves strike the listener's ears, and the listener somehow turns these sound waves into a meaningful understanding of what the speaker is saying. Speech perception is therefore essential for human communication (Smith 2007). The central problem in speech perception is to explain how words are perceived correctly despite the inconsistent information provided by the speech signal. The scale of the achievement is shown by the fact that humans can perceive as many as fifty phonemes per second in a language in which they are fluent, which suggests that speech is perceived with marked rapidity. By contrast, when the material is unfamiliar (e.g. a foreign language or non-speech sounds), a listener can perceive only about two thirds of a single phone per second (Sternberg 2009:351). Over the last five decades, researchers in the field have focused on establishing the relationship between the properties of the acoustic signal and linguistic components (phonemes and distinctive features). This relationship has turned out to be more complex than expected, and how humans perceive speech has still not been ascertained. The extensive search for an explanation has given rise to several important theoretical perspectives on speech perception (ncbi 2009). The purpose of this essay is to critically evaluate these theoretical approaches. The most influential theories of speech perception include motor theory, cohort theory, and the TRACE model (Eysenck 1995:280). The essay first scrutinizes the main claims of motor theory, then analyzes the TRACE model, and finally examines cohort theory and its critique. The conclusion drawn is that, among these theories, cohort theory is the strongest: it is the most scientific, and there is good evidence that it offers a reasoned account of speech perception (n.d.).

Motor theory, proposed by Liberman et al. (1967), was the first and the most influential theory of speech perception. It rests on three claims: (1) speech perception is the perception of articulatory gestures; (2) the 'phenomenon of speech perception as special' (Sternberg 2009:352); and (3) the motor system is involved in the perception of speech (ncbi 2009).

The first claim is that speech perception is the perception of articulatory phonetic gestures: 'Listeners engage in a certain amount of mimicking of articulatory movements of the speaker' (Eysenck 1995:280). This is a controversial claim. It states that the acoustic signals carried through the air are not the objects of speech perception; instead, the gestures of the speaker's vocal tract are the real objects of speech perception (ncbi 2009). The evidence cited for the view that the object of speech perception is gestural rather than acoustic is a study that found 'the hand motor system to be activated by linguistic tasks, most notably pure linguistic perception but not by auditory or visuospatial processing' (ebscohost 2003).

The second claim is the 'phenomenon of speech perception as special' (Sternberg 2009:352). This claim is difficult to evaluate because the term 'special' is itself ambiguous. It admits at least three readings: speech perception is special with respect to audition in general; speech perception is special in that it recruits the motor system; or speech is produced and processed by a specialized neural pathway (ncbi 2009). There is insufficient evidence to support this claim, so it should probably be set aside (talkingbrains 2008).

The third claim of motor theory is that the motor system is recruited in the perception of speech. This claim has some strength: there is now evidence that perceiving speech involves neural activity in the motor system. One study found activation of speech-related muscles when the motor cortex was stimulated by transcranial magnetic stimulation. Recent research has shown 'that when listeners hear utterances that include lingual consonants, they show enhanced muscle activity in the tongue' (Fadiga et al. 2002 cited in ncbi 2009). Other researchers found 'that both while listening to speech and while seeing speech-related lip movements, people show enhanced muscle activity in the lips' (Pulvermüller et al. 2006 cited in ncbi 2009). These findings suggest that the motor system is involved in speech perception (ncbi 2009). Of the three claims, then, only one is well supported. Given the inadequate research evidence and the marked ambiguity in motor theory, it cannot be relied upon as an account of speech perception.

Among the other theories of speech perception, the TRACE model and cohort theory are in direct competition. The two theories are similar in that 'several sources of information combine interactively to achieve word recognition' (Eysenck 1995:283). The TRACE model is a hierarchically structured interactive activation network proposed by McClelland and Elman (1986) and McClelland (1991), and it works on connectionist principles. It is based on the theoretical assumption that 'there are individual processing units or nodes at three different levels they are features, phonemes and words'. These processing units are connected to one another: connections between nodes at the same level are inhibitory, and connections between levels are facilitatory (Eysenck 1995:283). Processing proceeds by applying an idealized spectral representation to the feature level (Allopenna et al. 1998). Interactivity is the hallmark of the TRACE model, and its activation is radical. The model has important advantages over cohort theory: it has been well studied, a clear computational implementation is available, and direct tests of the model's behavior are therefore comparatively easy to carry out. Although the TRACE model is well explored, it has distinct weaknesses, chief among them that its architecture is psychologically implausible (Jusczyk 2002). It does not take into account parameters such as accent, ambient acoustic characteristics, and rate of speaking. Importantly, priming is not possible, because there is no link between the activation of a given set of nodes and the nodes representing the same features elsewhere in the utterance. In a nutshell, these disadvantages arise because there is no single, unique representation for a given linguistic structure (Hawaii 2006).
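
The two connection types described above (inhibition within a level, facilitation between levels) can be illustrated with a small Python sketch. It is not the TRACE implementation of McClelland and Elman: the feature level and the duplication of nodes across time slices are left out, letters stand in for phonemes, all words are assumed to have the same length as the input, and the function name and every parameter value are assumptions chosen only for illustration.

```python
def run_interactive_activation(heard, lexicon, steps=30, ext=0.2,
                               up=0.1, down=0.05, inhibit=0.3, decay=0.1):
    """Toy two-level network (phonemes and words); returns word activations.

    Hypothetical sketch: letters stand in for phonemes, the feature level is
    omitted, and all parameter values are illustrative assumptions.
    """
    length = len(heard)
    # Simplification: only words as long as the input take part.
    words = [w for w in lexicon if len(w) == length]
    symbols = sorted({s for w in words for s in w} | set(heard))
    phon = {(i, s): 0.0 for i in range(length) for s in symbols}
    word = {w: 0.0 for w in words}

    def clip(value):
        # Keep activations inside the range [0, 1].
        return min(1.0, max(0.0, value))

    for _ in range(steps):
        new_phon = {}
        for (i, s), act in phon.items():
            external = ext if heard[i] == s else 0.0  # input from the signal
            # Between levels (word -> phoneme): facilitatory.
            top_down = down * sum(word[w] for w in words if w[i] == s)
            # Same level, same position: inhibitory.
            rivals = sum(phon[(i, r)] for r in symbols if r != s)
            new_phon[(i, s)] = clip(
                act + external + top_down - inhibit * rivals - decay * act)
        new_word = {}
        for w, act in word.items():
            # Between levels (phoneme -> word): facilitatory.
            bottom_up = up * sum(phon[(i, w[i])] for i in range(length))
            # Same level: words inhibit one another.
            rivals = sum(word[v] for v in words if v != w)
            new_word[w] = clip(act + bottom_up - inhibit * rivals - decay * act)
        phon, word = new_phon, new_word  # synchronous update of both levels
    return word


LEXICON = ["cat", "cap", "dog"]
activations = run_interactive_activation("cat", LEXICON)
print(max(activations, key=activations.get))
```

In this sketch the word that matches the input at every position gathers the most bottom-up support and suppresses its partly matching rival, while a word that shares nothing with the input never becomes active.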

Cohort theory, in brief, was proposed by Marslen-Wilson and Tyler (1980). It is based on an interactive approach rather than a serial processing approach, and its activation is constrained. There is no lexical competition via lateral inhibition (Jusczyk 2002). The other basic ideas of cohort theory are as follows. On hearing the start of a word, all neighborhood candidates (called the cohort) become activated. Words are then progressively dropped from the cohort, because they no longer match the incoming speech or are inconsistent with the context, until only one remains; this is the 'recognition point' of the word (Eysenck 1995). Behavioral and modeling evidence suggests that words compete during auditory word recognition and that phonological similarity drives this (cohort) competition (Desroches et al. 2008). Cohort theory has several strengths: it uses a distributed representational approach, it handles sequential input by employing a recurrent network, and its activation flow is feed-forward (Hawaii 2006). Information flows from features to words, and the bottom-up process is both facilitative and inhibitory. The theory proposes an 'explicit mechanism for the effect of context on word recognition' and focuses more attention on the 'temporal dynamics of spoken word recognition' (Jusczyk 2002:13). One study used event-related potentials (ERPs) to examine the temporal dynamics of spoken word (cohort) competition. Its findings support cohort theory: phoneme-level and lexical information are engaged concurrently during auditory word recognition, which permits both bottom-up and top-down competition effects (Desroches et al. 2008).
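
The narrowing of the cohort down to a recognition point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: letters stand in for phonemes, the small lexicon is invented, the effect of context is ignored, and the function name `recognise` is hypothetical rather than part of any published model.

```python
def recognise(spoken, lexicon):
    """Return (word, recognition_point), or (None, None) if nothing is recognised.

    Hypothetical sketch: letters stand in for phonemes and context is ignored.
    """
    cohort = list(lexicon)
    for n in range(1, len(spoken) + 1):
        heard_so_far = spoken[:n]
        # Drop every candidate that no longer matches the input heard so far.
        cohort = [w for w in cohort if w.startswith(heard_so_far)]
        if len(cohort) == 1:
            # Recognition point: only one candidate is left.
            return cohort[0], n
        if not cohort:
            return None, None
    # The input ended while several candidates were still active.
    return None, None


LEXICON = ["candle", "candy", "canal", "captain", "table"]  # invented example
print(recognise("captain", LEXICON))
print(recognise("candy", LEXICON))
```

With this invented lexicon, "captain" is singled out after its third segment, whereas "candy" is only distinguished from "candle" at its fifth, which shows why the recognition point depends on the other words in the cohort.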

Although the theory is widely recognized, it has some drawbacks. The initial part of the word is given undue importance. Cohort theory also states that 'spoken word recognition proceeds in extremely efficient fashion', whereas actual processing is less efficient (Eysenck 1995:282). One study found that, under unfavorable listening conditions, subjects made use of subsequent rather than preceding context to identify words (Shillcock and Altmann 1988 cited in Eysenck 1995).

Conclusion: To sum up, cohort theory is more convincing than the TRACE model and motor theory because of its strengths and the evidence that supports it. Even fifty years after Liberman's motor theory, there has been no decisive advance in understanding how speech is perceived, and despite great advances in technology, research progress in the field of speech perception remains modest. This essay has identified the gaps in these theories and has indicated which theories need further development and which should be set aside.