Analog And Digital Signals Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Digital Signal Processing (DSP) is concerned with the representation, transformation and manipulation of signals on a computer. Nowadays, DSP has become an important field, and has penetrated a wide range of application systems, such as consumer electronics, digital communications, medical imaging and so on. With the dramatic increase of the processing capability of signal processing microprocessors, it is the expectation that the importance and role of DSP is to accelerate and expand.

DSP stands for Digital Signal Processing - the basis of many areas of technology, from mobile phones to modems and multimedia PCs. A signal in this context can mean a number of different things. The term "digital" comes from "digit", meaning a number so "digital" literally means numerical. Digital Signal Processing is the science of using computers to understand these types of data. A signal is any variable that carries information.

Image processing is important in modern data storage and data transmission especially in progressive transmission of images, video coding (teleconferencing), digital libraries, and image database, remote sensing. It has to do with manipulation of images done by algorithm to produce desired images. Digital Signal Processing (DSP) improve the quality of images taken under extremely unfavourable conditions in several ways: brightness and contrast adjustment, edge detection, noise reduction, focus adjustment, motion blur reduction etc .The advantage is that image processing allows much wider range of algorithms to be applied to the input data in order to avoid problems such as the build-up of noise and signal distortion during processing.


The signal is initially generated is in the form of an analog electrical voltage or current, produced for example by a microphone or some other type of transducer. The output from the readout system of a CD (compact disc) player, the data is already in digital form. An analog signal must be converted into digital form before DSP techniques can be applied. An analog electrical voltage signal, for example, can be digitized using an electronic circuit called an analog-to-digital converter or ADC. This generates a digital output as a stream of binary numbers whose values represent the electrical voltage input to the device at each sampling instant.

Digital signal processing (DSP)-digital representation of signals and the use of digital processors to analyze, modify, or extract information from signals. Many signals in DSP are derived from analogue signals which have been sampled at regular intervals and converted into digital form. The key advantages of DSP over analogue processing are

Guaranteed accuracy (determined by the number of bits used)

Perfect reproducibility

No drift in performance due to temperature or age

Takes advantage of advances in semiconductor technology

Greater flexibility (can be reprogrammed without modifying hardware)

Superior performance (linear phase response possible, and filtering

algorithms can be made adaptive)

Sometimes information may already be in digital form.

There are however (still) some disadvantages

Speed and cost (DSP design and hardware may be expensive, especially with high bandwidth signals)

Finite word length problems (limited number of bits may cause degradation).

Application areas of DSP are considerable:

Image processing (pattern recognition, robotic vision, image enhancement, facsimile, satellite weather map, animation)

Instrumentation and control (spectrum analysis, position and rate control, noise reduction, data compression)

Speech and audio (speech recognition, speech synthesis, text to speech, digital audio, equalisation)

Military (secure communication, radar processing, sonar processing, missile guidance)

Telecommunications (echo cancellation, adaptive equalisation, spread spectrum, video conferencing, data communication)

Biomedical (patient monitoring, scanners, EEG brain mappers, ECG analysis, X-ray storage and enhancement).


As a new convenient means of human-machine interaction, speech recognition is widely applied to many portable embed speech products. The ultimate aim of speech recognition is to make machine understand natural language. It is of great significance not only in practical application but scientific research. The research on speech recognition technology mainly concentrates on two aspects. One is the software running on computer, the other is embedded

systems. The advantages of Embedded systems are high-performance, convenience, cheap and they have huge potential for development.

Voice/Speech is one of the natural forms of communication. Recent development has made it possible to use this in the security system. In speaker identification, the task is to use a voice sample to select the identity of the person that produced the voice from among a population of speakers. In speaker verification, the task is to use a voice sample to test whether a person who claims to have produced the voice has in fact done so. This technique makes it possible to use the speakers' voice to verify their identity and control access to services such as voice dialling, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

Processing of the speech signal in the autocorrelation domain in the context of robust feature extraction is based on the following two properties: 1) pole preserving property (the poles of a given (original) signal are preserved in its autocorrelation function), and 2) noise separation property (the autocorrelation function of a noise signal is confined to lower lags, while the speech signal contribution is spread over all the lags in the autocorrelation function, thus providing a way to eliminate noise by discarding lower-lag autocorrelation coefficients).

There is a long history of feature extraction algorithms for Automatic Speech Recognition (ASR) that use techniques based on processing in the autocorrelation domain. The attraction of autocorrelation domain processing can be illustrated easily by taking a simple example, where a speech signal for a vowel frame (/iy/) is corrupted by additive white noise. Figures 1, 2 and 3 show the autocorrelation sequences of clean speech, white noise, and corrupted speech, respectively, for this example. It is well known that if the speech signal and white noise sequence are uncorrelated, the autocorrelation of their sum is equal to the sum of their autocorrelations. Furthermore, as seen from Fig. 2, the autocorrelation sequence of the white noise sequence is an impulse-like signal. These properties combine to show that the contribution from white noise in the autocorrelation sequence is neatly contained in the zero-lag coefficient, while the contribution from the information carrying speech signal is spread over a broad range of lag indexes (Fig. 1). When we consider more realistic coloured noises occurring in real life (such as car noise, babble noise, etc.), their contribution to the autocorrelation sequence may spread away to lags greater than zero (as shown in Fig. 4), but it is still confined to relatively lower lags. Therefore, noise-robust spectral estimates should be possible through algorithms that focus on higher lag autocorrelation coefficients. The autocorrelation sequence has another important property, which states that it preserves in it the poles of the original signal sequence, as illustrated by McGinn and Johnson. Assuming the original signal to be an all-pole sequence generated by an all-pole model that has been excited by a single impulse, they showed that the poles of the autocorrelation sequence are the same as the poles of the original sequence. The concept of the pole preserving property was extended by Mansour and Juang to include an impulse train excitation and a white Gaussian noise excitation. These are better approximations of the excitation source for voiced and unvoiced speech signals, respectively. The pole preserving property is important when processing in the autocorrelation domain. It means spectral estimates made with the autocorrelation sequence will show poles in the same place as estimates made with the original time domain signal, thus the autocorrelation domain processing will provide information about the signal similar to that obtained from the original signal directly.

A number of techniques have been proposed in the literature based on autocorrelation domain processing. The first technique proposed in this area was based on the use of High-Order Yule- Walker Equations (HOYWE), where the autocorrelation coefficients that are involved in the equation set exclude the zerolag coefficient. Other similarmethods have been used that either avoid the zero-lag coefficient, or reduce the contribution from the first few coefficients. All of these methods are based on linear prediction (LP) approach and provide some robustness to noise, but their recognition performance for clean speech is much worse than the unmodified or conventional LP approach.

A potential source of error in using LP methods to estimate the power spectrum of a varying SNR signal is highlighted by Kay. He showed that the model order is not only dependent on the AR process, but also on the prevailing SNR condition.

Figure 1: Autocorrelation sequence of vowel /iy/.

Figure 2: Autocorrelation sequence of white Gaussian noise.

Figure 3: Autocorrelation sequence of corrupted /iy/ with white Gaussian noise.

Figure 4: Autocorrelation sequence of car noise.


Speaker recognition methods can be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of somebody's speech which show up irrespective of what one is saying. In a text-dependent system, on the other hand, the recognition of the speaker's identity is based on his or her speaking one or more specific phrases, like passwords, card numbers, PIN codes, etc. Every technology of speaker recognition, identification and verification, whether text-independent and text dependent, each has its own advantages and disadvantages and may require different treatments and techniques. The choice of which technology to use is application-specific. At the highest level, all speaker recognition systems contain two main modules feature extraction and feature matching.

Voice Recognition Basics

Fig.1 shows the Voice recognition algorithm flow. A typical speech recognition system starts with the Mel Frequency Cepstrum Coefficient (MFCC) feature analysis stage, which is composed of the following items: 1) Pre-emphasis. 2) Divide the speech signal into frames. 3) Apply the hamming window. 4) Compute the MFCC feature. The second stage is vector quantization stage. In this stage, codebook is used to quantize the MFCC feature and get MFCC feature vector. The codebook is generated on compute via LBG arithmetic, and is downloaded to ROM. The last stage is recognition, which is performed by using a set of statistical models i.e. hidden Markov models (HMM). In this stage, the probability.

Voice Recognition Algorithm Flow

The MFCC processor

A block diagram of the structure of an MFCC processor is given in Figure 1. The speech input is recorded at a sampling rate of 22050Hz. This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion process. Figure 2. shows the block diagram of an MFCC processor .

Figure 2 Block diagram of the MFCC processor

Mel-frequency wrapping

The speech signal consists of tones with different frequencies. For each tone with an actual

Frequency, f, measured in Hz, a subjective pitch is measured on the 'Mel' scale. The mel-frequency scale is a linear frequency spacing below 1000Hz and a logarithmic spacing above 1000Hz. As a reference point, the pitch of a 1kHz tone, 40dB above the perceptual hearing threshold, is defined as 1000 mels. Therefore we can use the following formula to compute the mels for a given frequency f in Hz:

mel(f)= 2595*log10(1+f/700) ……….. (1)

One approach to simulating the subjective spectrum is to use a filter bank, one filter for each desired melfrequency component. The filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval.


In the final step, the log mel spectrum has to be converted back to time. The result is called the mel frequency cepstrum coefficients (MFCCs). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients are real numbers(and so are their logarithms), they may be converted to the time domain using the Discrete Cosine Transform (DCT). The MFCCs may be calculated using this equation:

where n=1,2,….K

The number of mel cepstrum coefficients, K, is typically chosen as 20. The first component, c0 , is excluded from the DCT since it represents the mean value of the input signal which carries little speaker specific information. By applying the procedure described above, for each speech frame of about 30 ms with overlap, a set of mel-frequency cepstrum coefficients is computed. This set of coefficients is called an acoustic vector. These acoustic vectors can

be used to represent and recognize the voice characteristic of the speaker. Therefore each

input utterance is transformed into a sequence of acoustic vectors. The next section describes how these acoustic vectors can be used to represent and recognize the voice characteristic of a speaker.

MFCC Feature analysis

Figure 2 shows the process of creating MFCC features. The first step is to be taken the Discrete Fourier Transform (DFT) of each frame. Certain amount of 0s are added to the end of Time-domain signal s(n) of each frame, in order to form the sequence of N-length. And then the DFT of each frame is taken to get the linear spectrum X (k) . In the second step, linear spectrum X (k) is multiplied by the Mel frequency filter banks and converted to Mel spectrum. Mel frequency filter banks are several band pass filtersH (k) m , and each band pass filter is defined as follows:

Where 0 ≤ m < M , M is the number of the band pass filters, and f (m) is the central frequency. The third step is to be taken the logarithm of Mel spectrum to get logarithmic spectrum S(m) .

Thus, the transfer function from linear spectrum X (k) to logarithmic spectrum S(m) is In the last step, logarithmic spectrum S(m) is transformed into cepstrum frequency by Discrete cosine Transform (DCT)in order to yield MFCC feature.

MEL Frequency Cepstral Coefficients (MFCC) are used extensively in Automatic Speech Recognition (ASR). MFCC features are derived from the FFT magnitude spectrum by applying a filter bank which has filters evenly spaced on a warped frequency scale. The logarithm of the energy in each filter is calculated and accumulated before a Discrete Cosine Transform (DCT) is applied to produce the MFCC feature vector. The frequency warping scale used for filter spacing in MFCC is the Mel (Melody) scale. The Mel scale is a perceptually motivated scale that was first suggested by Stevens and Volkman in 1937. The scale was devised through human perception experiments where subjects were asked to adjust a stimulus tone to perceptually half the pitch of a reference tone. The resulting scale was one in which 1 Mel represents one-thousandth of the pitch of 1 kHz and a doubling of Mels produces a perceptual doubling of pitch. The Bark scale provides an alternative perceptually motivated scale to the Mel scale. Speech intelligibility perception in humans begins with spectral analysis performed by the basilar membrane (BM). Each point on the BM can be considered as a bandpass filter having a bandwidth equal to one critical bandwidth or one Bark. The bandwidth of several auditory filters were empirically observed and used to formulate the Bark scale. It will be shown in this paper that an MFCC like feature, based on the Bark scale and referred to as BFCC, yields similar performance in speech recognition experiments as MFCC. The performance of MFCC and BFCC features are also compared to Uniform Frequency Cepstral Coefficients (UFCC). It will be shown that the scale used to space the filter bank provides little advantage, especially when the training and testing conditions match.

Vector Quantization

Due to the discrete hidden markov model is used, it is necessary to transform continuous MFCC feature which has been yielded into discrete MFCC feature. Vector quantization is to map one K dimensional vector X € X~ € RK to another K dimensional quantize vector , in where X is input vector, Y is quantize vector or codeword, X~ is source space, N Y~ is output space, N is the size of codebook, and RK is K dimensional Euclidean space.

The process of quantizing vector X is to search a codeword which is the nearest one from the vector X in codebook N Y~ .Square distortion measure is applied to calculate distortion, which is defined as

Fourier Transform

In short, A bridge between the time domain and the frequency domain of a physical process. Fourier and Laplace transforms, are widely used in solving problems in science and engineering. The Fourier transform is used in linear systems analysis, antenna studies, optics, random process modeling, probability theory, quantum physics, and boundary-value problems and image processing tool which is used to decompose an image into its sine and cosine components. The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression.

Properties of the Fourier transform

The Fourier transform, in essence, decomposes or separates a waveform or function into sinusoids of different frequency which sum to the original waveform. It identifies or distinguishes the different frequency sinusoids and their respective amplitudes.

The Fourier transform of f(x) is defined as

Applying the same transform to F(s) gives

If f(x) is an even function of x, that is f(x) = f(-x), then f(w) = f(x). If f(x) is an odd function of x, that is f(x) = -f(-x), then f(w) = f(-x). When f(x) is neither even nor odd, it can often be split into even or odd parts.

To avoid confusion, it is customary to write the Fourier transform and its inverse so that they exhibit reversibility:

so that

as long as the integral exists and any discontinuities, usually represented by multiple integrals of the form ½[f(x+) + f(x-)], are finite. The transform quantity F(s) is often represented as and the Fourier transform is often represented by the operator

There are functions for which the Fourier transform does not exist; however, most physical functions have a Fourier transform, especially if the transform represents a physical quantity. Other functions can be treated with Fourier theory as limiting cases. Many of the common theoretical functions are actually limiting cases in Fourier theory.

Usually functions or waveforms can be split into even and odd parts as follows


and E(x), O(x) are, in general, complex. In this representation, the Fourier transform of f(x) reduces to

It follows then that an even function has an even transform and that an odd function has an odd transform. Additional symmetry properties are shown in below table



real and even

real and even

real and odd

imaginary and odd

imaginary and even

imaginary and even

complex and even

complex and even

complex and odd

complex and odd

real and asymmetrical

complex and asymmetrical

imaginary and asymmetrical

complex and asymmetrical

real even plus imaginary

odd real

real odd plus imaginary

even imaginary





Symmetry Properties of the Fourier Transform


Back ground theory of Fast Fourier transform

The Fourier transform, named after the French mathematician Jean Baptise Joseph, Baron de Fourier (1768-1830), is a mathematical tool to convert a time-domain signal into a frequency-domain signal. It has been extensively used by communication engineers, physicists and statisticians. It can simplify the mathematics, and also it makes the phenomenon of interest easier to understand.

In mathematics, the Fourier transform (often abbreviated FT) is an operation that transforms one complex-valued function of a real variable into another. In such applications as signal processing, the domain of the original function is typically time and is accordingly called the time domain. That of the new function is frequency, and so the Fourier transform is often called the frequency domain representation of the original function. It describes which frequencies are present in the original function. This is in a similar spirit to the way that a chord of music can be described by notes that are being played. In effect, the Fourier transform decomposes a function into oscillatory functions. The term Fourier transform refers both to the frequency domain representation of a function and to the process or formula that "transforms" one function into the other.

The Fourier transform and its generalizations are the subject of Fourier analysis. In this specific case, both the time and frequency domains are unbounded linear continuations. It is possible to define the Fourier transform of a function of several variables, which is important for instance in the physical study of wave motion and optics. It is also possible to generalize the Fourier transform on discrete structures such as finite groups, efficient computation of which through a fast Fourier transform is essential for high-speed computing.

A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory; this article gives an overview of the available techniques and some of their general properties, while the specific algorithms are described in subsidiary articles linked below.

A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields (see discrete Fourier transform for properties and applications of the transform) but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing a DFT of N points in the obvious way, using the definition, takes O(N 2) arithmetical operations, while an FFT can compute the same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions-in practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N/log(N). This huge improvement made many DFT-based algorithms practical; FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.

The most well known FFT algorithms depend upon the factorization of N, but (contrary to popular misconception) there are FFTs with O(N log N) complexity for all N, even for prime N. Many FFT algorithms only depend on the fact that is an Nth primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms.

HMM Recognition

Very efficient programs for searching a text for a combination of words are available on many computers. The same methods can be used for searching for patterns in biological sequences, but often they fail. This is because biological 'spelling' is much more sloppy than English spelling: proteins with the same function from two different organisms are almost certainly spelled differently, that is, the two amino acid sequences differ. It is not rare that two such homologous sequences have less than 30% identical amino acids. Similarly in DNA many interesting signals vary greatly even within the same genome. Some well-known examples are ribosome binding sites and splice sites, but the list is long. Fortunately there are usually still some subtle similarities between two such sequences, and the question is how to detect these similarities. The variation in a family of sequences can be described statistically, and this is the basis for most methods used in biological sequence analysis, for a presentation of some of these statistical approaches. For pairwise alignments, for instance, the probability that a certain residue mutates to another residue is used in a substitution matrix, such as one of the PAM matrices. For finding patterns in DNA, e.g. splice sites, some sort of weight matrix is very often used, which is simply a position specific score calculated from the frequencies of the four nucleotides at all the positions in some known examples. Similarly, methods for finding genes use, almost without exception, the statistics of codons or dicodons in some form or other.

A hidden Markov model (HMM) is a statistical model, which is very well suited for many tasks in molecular biology, although they have been mostly developed for speech recognition since the early 1970s.The most popular use of the HMM in molecular biology is as a 'probabilistic profile' of a protein family, which is called a profile HMM. From a family of proteins (or DNA) a profile HMM can be made for searching a database for other members of the family. These profile HMMs resemble the profile and weight matrix methods and probably the main contribution is that the profile HMM treats gaps in a systematic way. The HMM can be applied to other types of problems. It is particularly well suited for problems with a simple 'grammatical structure,' such as gene finding. In gene finding several signals must be recognized and combined into a prediction of exons and introns, and the prediction must conform to various rules to make it a reasonable gene prediction. An HMM can combine recognition of the signals, and it can be made such that the predictions always follow the rules of a gene.

Introduction to Hidden Markov Models

A hidden Markov model is defined by specifying five things:

Q = the set of states = {q1; q2..., qn}

V = the output alphabet = {v1; v2...., vm}

Ï€(i) = probability of being in state qi at time t = 0 (i.e., in initial states)

A = transition probabilities = {aij},where aij = Pr[entering state qj at time t+ 1 j in state qi at time t]: Note that the probability of going from state i to state j does not depend on the previous states at earlier times; this is the Markov property.

B = output probabilities = {bj(k)},where bj(k) = Pr[producing vk at time tj in state qj at time t]

Figure 4: A Hidden Markov Model

As an example, two biased coins, which are flipping, and an observer is seeing the results of our coin flips (not which coin we're flipping). The process is describing in Figure 4. Here, the states of the HMM are q1 and q2 (the coins), the output alphabet is {H; T}, and the transition and output probabilities are as labelled. If we let π(q1) = 1 and (q2) = 0 then the following is a example of a possible transition sequence and output sequence for the HMM in Figure 1:

We can easily calculate probabilities for the following events.

1. The probability of the above state transition sequence:

Pr[q1q1q1q2q2q1q1] = π(q1)a11a11a12a22a21a11 ~0.025

2. The probability of the above output sequence given the above transition sequence:

3. The probability of the above output sequence and the above transition sequence:

Pr [(HHTTTTH) ^ (q1q1q1q2q2q1q1)]= (0:025) . (0:023)~ 5:7 * 10-4

While we just considered the case where we knew both the results of the coin flips, and which coin was being flipped, in general, we consider the case where we do not know which coins are being flipped. That is, while we know the underlying model is as described in Figure 1, and we observe the output symbol sequence, the state sequence is \hidden" from us. In this case, we can also figure out the answers to the following questions:

1. What is the probability of the observed data O1;O2;......OT given the model? That is, calculate Pr(O1;O2;.....OT model).

2. At each time step, what state is most likely? It is important to note that the sequence of states computed by this criterion might be impossible. Thus more often we are interested in what single sequence of states has the largest probability. That is, and the state sequence q1; q2; : : : ; qT such that Pr(q1; q2; .......; qT (O1;O2; : : :OT ; model) is maximized.

3. Given some data, how do we \learn" a good hidden Markov model to describe the data? That is, given the topology of a HMM, and observed data, how do we and the model which maximizes Pr (observations| model)? To answer the first two questions, we can use dynamic programming techniques. To compute the answer to the last question, we can use the Baum-Welch method.

Building HMM-Profiles

Let's see how we can use HMMs to model them. Here are the some of the profile for the alignment:





Ignoring the background frequencies for now, a profile for this alignment can be viewed as trivial HMMs with one match state for each column, where consecutive match states are separated by transitions of probability 1. To define output probabilities for each of these match states; these come from the probability of observing a particular amino acid in the corresponding column (i.e., these are identical to the probabilities we compute per column for the original profile method).We also introduce dummy" begin and end states which emit no output symbols. This trivial HMM is illustrated in Figure 2.

Figure 5: A profile-HMM

Now let's extend our model to handle insertions. Insertions are portions of sequences that do not match anything in the above model. We will introduce insert states Ij , which will model inserts after jth column in our alignment. See Figure 6. Typically, the output probabilities for insert states are set equal to the background probabilities. Note that we can have different probabilities for entering different insert states, and this model the fact that insertions may be less well-tolerated in certain portions of the alignment. Also, for any particular insert state, we may have different transition probabilities for entering it for the first time vs. staying in the insert state; this models affine gap penalties.

Figure 6: Insert States

One could model deletions is as in Figure 7. However, arbitrarily long gaps introduces

lots of transitions in the model. Instead, we will introduce delete" states that do not emit any symbols (see Figure 8). The structure of the complete HMM with both inserts and deletes is shown in Figure98.

Figure 7: Possible deletions

Figure 8: Deletions

Figure 9: The complete HMM formation

We have just given the overall topology of a profile-HMM, but we still need to decide how many states our HMM has, what the transition probabilities are, etc. As an example, let's build an HMM profile for following multiple sequence alignment:








How do we pick the length of the HMM (i.e., how many match states do we have in the profile?). One heuristic is to only include those columns that have amino acids in at least half of the sequences. For example, in the above alignment, there would be match states for each column except for the fourth and fifth columns. How do we pick output probabilities for match states? We will use the same technique as was used with building our non-HMM profiles. For example, for the first match state we have:

In reality, we must connect for zero frequency case. As mentioned in the last lecture, a common way of doing this by adding a small amount to every frequency (e.g., the add-one" rule). Using the add-one rule, our probabilities would be:

How do we pick transition probabilities? We let the transition probability of going from state k to state l akl be equal to:

number of times go from state k to state l

the number of times go from state k to any other state

So, to calculate the probability of transition from match state 1 to match state 2, we count the number of times we get a match (=6) in the second column, as well as the number of gaps (=1). (Note, using the initial alignment and our model, we only have insertions after the third match state.)

Again, using the add-one rule, we correct our probabilities to be:

The rest of the parameters are calculated analogously. In the next lecture, we will address the question of how we use a given profile model of a protein family to figure out whether a new protein sequence is a member of that family.

We can actually build profiles from unaligned sequences using the Baum-Welch procedure,

but we won't have time to go over this. Note, however, the topology of the HMM is fixed before learning. We might not always know the number of relevant positions in the family (i.e., the number of conserved positions). One heuristic to get around this is as follows. First, guess the number of states by choosing an initial length. Then, after learning, if more than half of the paths of sequences choose the delete state at a particular position, do \model surgery" and remove the whole position. If more than half of the paths choose an insert state in a position, then add insert states at this position, with the number of new states equal to the average number of inserts. In practice, methods for learning HMMs are prone to getting caught in local minima, and, so it is best to start with a good initial guess (in our case, an alignment), and to start with different starting parameters. In fact, mostly HMM-profiles are built just from alignments, similar to the way we have just described via our example (i.e., without trying to learn the parameters).

The role of HMM Recognition is to find out the maximum probability of the HMM which has generated the feature vector, according to the given feature vector. The given HMM parameters λ = {π , A, B} ( { }, { }, { }) i ij jk π = π A = a B = b , and the observation sequence O= O1 ,O2 , ,OT , in where N is the number of HMM states, ( j) t δ is the highest probability along with a single path, at time t , which accounts for the first observations and ends in state j , ( j) tϕ is the HMM state at time t .

The detailed algorithm is defined as follow:

Algorithm improving

In practice, π , Aand B are decimal fractions between 0 and 1. It is not conducive for FPGA to implement decimal fraction operation, because decimal fraction multiplication may cause the problem of gross underflow when T is larger than a threshold. So it is important to take the logarithm of π , A and B before operation. When π , A and B are transformed to logarithmic probability π ' , A' and B' , floating point numbers multiply operation is transformed to integer addition operation. In addition, considering taking out the sign bit before operation, (4) and (5) should be changed to


Matlab is a commercial "Matrix Laboratory" package which operates as an interactive programming environment. It is a mainstay of the Mathematics Department software lineup and is also available for PC's and Macintoshes and may be found on the CIRCA VAXes. Matlab is well adapted to numerical experiments since the underlying algorithms for Matlab's builtin functions and supplied m-files are based on the standard libraries LINPACK and EISPACK.

Matlab program and script files always have filenames ending with ".m"; the programming language is exceptionally straightforward since almost every data object is assumed to be an array. Graphical output is available to supplement numerical results.

IMREAD Read image from graphics file.

A = IMREAD(FILENAME,FMT) reads a grayscale or color image from the file

specified by the string FILENAME. If the file is not in the current directory, or in a directory on the MATLAB path, specify the full pathname. The text string FMT specifies the format of the file by its standard file extension. For example, specify 'gif' for Graphics Interchange Format files. To see a list of supported formats, with their file extensions, use the IMFORMATS function. If IMREAD cannot find a file named FILENAME, it looks for a file named FILENAME.FMT.

IMFINFO Information about graphics file.

INFO = IMFINFO(FILENAME,FMT) returns a structure whose fields contain information about an image in a graphics file. FILENAME is a string that specifies the name of the graphics file, and FMT is a string that specifies the format of the file. The file must be in the current directory or in a directory on the MATLAB path. If IMFINFO cannot find a file named FILENAME, it looks for a file named FILENAME.FMT.

IMWRITE Write image to graphics file.

IMWRITE(A,FILENAME,FMT) writes the image A to the file specified by

FILENAME in the format specified by FMT.

A can be an M-by-N (grayscale image) or M-by-N-by-3 (color image) array. A cannot be an empty array. If the format specified is TIFF, IMWRITE can also accept an M-by-N-by-4 array containing color data that uses the CMYK color space.

FILENAME is a string that specifies the name of the file.

SIZE Size of array.

D = SIZE(X), for M-by-N matrix X, returns the two-element row vector D = [M,N] containing the number of rows and columns in the matrix. For N-D arrays, SIZE(X) returns a 1-by-N vector of dimension lengths. Trailing singleton dimensions are ignored.

[M,N] = SIZE(X) for matrix X, returns the number of rows and columns in

X as separate output variables.

IMSHOW Display image in Handle Graphics figure.

IMSHOW (I) displays the grayscale image I. IMSHOW(I,[LOW HIGH]) displays the grayscale image I, specifying the display range for I in [LOW HIGH]. The value LOW (and any value less than LOW) displays as black, the value HIGH (and any value greater than HIGH) displays as white. Values in between are displayed as intermediate shades of gray, using the default number of gray levels. If you use an empty matrix ([]) for [LOW HIGH], IMSHOW uses [min(I(:)) max(I(:))]; that is, the minimum value in I is displayed as black, and the maximum value is displayed as white.

MEAN Average or mean value.

For vectors, MEAN(X) is the mean value of the elements in X. For matrices, MEAN(X) is a row vector containing the mean value of each column. For N-D arrays, MEAN(X) is the mean value of the elements along the first non-singleton dimension of X.

MIN Smallest component.

For vectors, MIN(X) is the smallest element in X. For matrices, MIN(X) is a row vector containing the minimum element from each column. For N-D arrays, MIN(X) operates along the first non-singleton dimension.

MAX Largest component.

For vectors, MAX(X) is the largest element in X. For matrices, MAX(X) is a row vector containing the maximum element from each column. For N-D arrays, MAX(X) operates along the first non-singleton dimension.

DOUBLE Convert to double precision.

DOUBLE(X) returns the double precision value for X. If X is already a double precision array, DOUBLE has no effect.

DOUBLE is called for the expressions in FOR, IF, and WHILE loops if the expression isn't already double precision. DOUBLE should be overloaded for all objects where it makes sense to convert it into a double precision value.

RAND Uniformly distributed pseudo-random numbers.

R = RAND(N) returns an N-by-N matrix containing pseudo-random values drawn from a uniform distribution on the unit interval.

RAND(M,N) or RAND([M,N]) returns an M-by-N matrix.

RAND(M,N,P,...) or RAND([M,N,P,...]) returns an M-by-N-by-P-by-... array.

RAND with no arguments returns a scalar.

RAND(SIZE(A)) returns an array the same size as A.