Special Form Of Dimensionality Reduction Biology Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The goal of feature extraction is to characterize an object by measurements whose values are very similar for objects in the same class and very different for objects in different classes. Besides providing discriminatory information (Kostka P et al, 2004), one of the most important functions of feature extraction is to reduce the dimensionality of the data. The classification algorithm described here extracts several features from respiratory signals and uses them for apnea detection. Feature extraction plays a very important role (Hasselgren M et al) because the classification is based entirely on the values of the extracted features. Feature extraction is carried out with signal processing techniques. Signal processing encompasses the theory, algorithms, architecture, implementation and applications involved in processing information (Padhy P. K et al, 2011) contained in many different formats broadly designated as signals.

In signal processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm are too large to be processed and are suspected to be highly redundant (e.g., the same measurement given in both feet and meters), the input data are transformed into a reduced set of features (also called a feature vector). This transformation of the input data into a set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information in the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

Feature extraction is a general term for methods that construct combinations of the variables to get around the problems of large, redundant inputs while still describing the data with sufficient accuracy. Feature extraction creates new features as functions of the original features, whereas feature selection returns a subset of the original features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).
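To make the distinction concrete, the following minimal Python sketch contrasts the two operations on a toy data matrix. The column layout, the sample values and the helper names `select_features` and `extract_features` are illustrative assumptions, not part of the method described in this work; the redundant feet/meters columns mirror the example given above.

```python
FEET_TO_METERS = 0.3048

# Hypothetical samples: [length_ft, length_m, other_reading]
samples = [
    [10.0, 3.048, 0.9],
    [20.0, 6.096, 1.1],
    [30.0, 9.144, 1.0],
]


def select_features(rows, keep):
    """Feature selection: return a subset of the original columns."""
    return [[row[j] for j in keep] for row in rows]


def extract_features(rows):
    """Feature extraction: build new features as functions of the originals.

    The two redundant length columns are merged into one length in meters
    (their average after unit conversion); the other reading is kept.
    """
    out = []
    for ft, m, reading in rows:
        length_m = (ft * FEET_TO_METERS + m) / 2.0
        out.append([length_m, reading])
    return out


selected = select_features(samples, keep=[1, 2])  # drop the feet column
extracted = extract_features(samples)             # one merged length feature
```

Both results have two columns, but only the selected set consists of original measurements; the extracted length is a new variable computed from two of them.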

Feature selection techniques provide these main benefits when building predictive models:

Improved model interpretability

Shorter training times

Enhanced generalization by reducing overfitting

Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction and how these features are related.


Signal processing deals with operations on, and analysis of, signals in both discrete and continuous time; it is an area of mathematics, electrical engineering and systems engineering. Signals of interest include sensor data, sound, time-varying measurement values and images. ECGs, telecommunication transmission signals and control system signals are some examples of signals widely processed in this area, ECGs being an example of biological data. Signals are time-varying or spatially varying physical quantities that carry information.

Second order autoregressive modeling (Iyer V.K et al, 1989) is used here to estimate the parameters. Because the amplitude and variability of the respiratory signal are not constant, the features cannot be obtained directly from the raw signal. The respiratory signal is therefore modeled with a second order autoregressive equation, the model coefficients are calculated, and the features of the respiratory signal are extracted from these coefficients.



Feature extraction is concerned with reducing the amount of resources required to describe a large set of data accurately. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or it may cause a classification algorithm to overfit the training samples and generalize poorly to new samples. The fundamental features of the respiratory signal provide numerical values (Lee J et al, 2010) that are compared with threshold values to produce the classification results. Finally, the data sets are created from the classified features.

3.3.1 Second Order Autoregressive Modelling

A model based on both the inputs and the outputs of a system is called an autoregressive moving-average (ARMA) model. A model that depends only on the previous outputs of the system is called an autoregressive (AR) model, and a model that depends only on the inputs of the system (Lu S et al, 2001) is called a moving-average (MA) model. In an AR model, the previous outputs are therefore the main factor in determining the present output. In this work, AR modeling is used for respiratory signal analysis.

The autoregressive model predicts the output y[n] of a system from its previous outputs (y[n-1], y[n-2], ...); it belongs to the family of linear prediction formulas.

In the filter design community it is known as an infinite impulse response (IIR) or all-pole filter, and in physical applications it is known as a maximum entropy model.

The definition used here is

$$x_t = \sum_{i=1}^{N} a_i \, x_{t-i} + \varepsilon_t \tag{3.1}$$

where a_i are the autoregression coefficients, x_t is the series under investigation and N is the order of the filter, which is generally much smaller than the length of the series. The noise term or residue, ε, is usually assumed to be Gaussian white noise. The current term of the series is thus estimated as a linearly weighted sum of the previous terms, and the weights are the autoregression coefficients. The AR model has several useful properties:
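The AR definition above can be read directly as a one-step predictor. The sketch below is a minimal illustration in Python under that reading: `ar_predict` evaluates the weighted sum of the N previous samples (dropping the zero-mean noise term), and `ar_simulate` generates a series by adding a supplied noise sequence. Both function names and the coefficients in the usage lines are hypothetical.

```python
def ar_predict(history, coeffs):
    """One-step AR prediction: sum of a_i * x[t-i] for i = 1..N.

    history: past samples in time order (oldest first), at least N of them.
    coeffs:  [a_1, ..., a_N]; a_1 weights the most recent sample.
    The zero-mean white-noise term is omitted from the prediction.
    """
    if len(history) < len(coeffs):
        raise ValueError("need at least N past samples")
    return sum(a * history[-i] for i, a in enumerate(coeffs, start=1))


def ar_simulate(coeffs, noise, initial):
    """Generate x_t = sum_i a_i x_{t-i} + eps_t, one value per noise sample."""
    x = list(initial)
    for eps in noise:
        x.append(ar_predict(x, coeffs) + eps)
    return x


# Usage: a hypothetical AR(2) model with a_1 = 0.5 and a_2 = -0.25.
print(ar_predict([1.0, 2.0], [0.5, -0.25]))              # 0.75
print(ar_simulate([0.5, -0.25], [0.0, 0.0], [1.0, 2.0]))  # [1.0, 2.0, 0.75, -0.125]
```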

Each type of process (MA, AR and ARMA) can be converted to the other types. Unlike the others, an AR model can be found by solving a linear set of equations. An AR spectrum calculated from a signal of duration N·T (N samples at sampling interval T) can have much better frequency resolution than the 1/(N·T) of classical estimators. Under certain circumstances, an AR model for the power spectrum Pss(ω) maximizes entropy. An AR model can have far fewer coefficients than the corresponding MA model, just as a Butterworth IIR filter has far fewer coefficients than an FIR filter of similar performance.

AR spectral estimation gives a significant improvement in frequency resolution compared with the traditional periodogram method implemented with the FFT. The estimated AR spectrum is a continuous function of frequency and can be evaluated numerically at any number of frequencies, uniformly spaced or otherwise, in the interval 0 ≤ f ≤ 0.5fs, where fs is the sampling frequency.
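The following sketch evaluates an AR spectrum on a uniform grid over 0 ≤ f ≤ 0.5fs and returns the frequency of its largest value. It assumes the sign convention of the AR definition above, x[n] = Σ a_k x[n-k] + e[n], so the spectrum is σ² / (fs·|1 - Σ a_k e^(-j2πfk/fs)|²); the function names, the unit noise variance used for the peak search and the 2001-point grid are illustrative choices, not taken from the source.

```python
import cmath
import math


def ar_psd(coeffs, noise_var, fs, freqs):
    """AR power spectral density at arbitrary frequencies (Hz).

    P(f) = noise_var / (fs * |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2),
    for the model x[n] = sum_k a_k x[n-k] + e[n].
    """
    psd = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        denom = 1.0 - sum(a * cmath.exp(-1j * w * k)
                          for k, a in enumerate(coeffs, start=1))
        psd.append(noise_var / (fs * abs(denom) ** 2))
    return psd


def peak_frequency(coeffs, fs, num_points=2001):
    """Evaluate the spectrum on a uniform grid over [0, fs/2] and return
    the grid frequency with the largest value."""
    freqs = [0.5 * fs * i / (num_points - 1) for i in range(num_points)]
    psd = ar_psd(coeffs, 1.0, fs, freqs)
    best = max(range(num_points), key=lambda i: psd[i])
    return freqs[best]


# Usage: an AR(2) model with poles at radius 0.95 and 2 Hz, fs = 10 Hz.
r, fs = 0.95, 10.0
theta = 2 * math.pi * 2.0 / fs
coeffs = [2 * r * math.cos(theta), -(r * r)]
f_peak = peak_frequency(coeffs, fs)   # close to 2 Hz
```

The grid density is a free choice here because the AR spectrum is defined at every frequency; it does not require more data samples.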

The AR model framework assumes that the signal under consideration is generated by an all-pole linear filter (Vanderschoot J et al, 1991) driven by a white noise signal. The AR model therefore specifies the shape of the signal's spectrum, and it is used to analyze stationary stochastic processes in applications such as radar, geophysics and economics.

AR coefficients can be computed by a number of techniques, including:

Method of moments (through the Yule-Walker equations)

Least squares method

Burg method

A commonly used approach is based on the Yule-Walker equations, which involve estimating the autocorrelation or covariance of the signal. In some special cases, however, Yule-Walker estimation leads to poor parameter estimates, even for moderately sized data samples, and it may lead to an unstable model.
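For a second order model the Yule-Walker equations reduce to a 2×2 linear system in the autocorrelation values r(0), r(1) and r(2). The sketch below solves that system in closed form, using biased autocorrelation estimates of the mean-removed signal (an assumption; other estimators exist), and checks the result on a simulated AR(2) process whose coefficients 0.5 and -0.3 are chosen arbitrarily for illustration.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r(0..max_lag) of the mean-removed x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def yule_walker_ar2(x):
    """Solve the Yule-Walker system for x[n] = a1 x[n-1] + a2 x[n-2] + e[n]:

        [r0 r1] [a1]   [r1]
        [r1 r0] [a2] = [r2]
    """
    r0, r1, r2 = autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2


# Usage: simulate an AR(2) process with known coefficients, then re-estimate.
rng = random.Random(0)
true_a1, true_a2 = 0.5, -0.3
x = [0.0, 0.0]
for _ in range(20000):
    x.append(true_a1 * x[-1] + true_a2 * x[-2] + rng.gauss(0.0, 1.0))
est_a1, est_a2 = yule_walker_ar2(x)
```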

3.3.2 Burg's Method

Burg developed a spectrum estimation technique known as the maximum entropy method. Part of this method involves finding an all-pole model for the data: it estimates the AR coefficients by determining the reflection coefficients k that minimize the sum of the forward and backward prediction residuals (Sankur B et al, 1994). In the extension of the algorithm to multiple segments, the reflection coefficients are estimated by minimizing the sum of the forward and backward residuals of all segments taken together. A newer weighted Burg algorithm allows segments of different amplitudes to be combined.

The Burg algorithm finds a set of all-pole model parameters that minimizes the sum of the squares of the forward and backward prediction errors. To ensure that the model is stable, however, this minimization is performed sequentially with respect to the reflection coefficients, one model order at a time. Because the Burg algorithm does not apply a window to the data, its estimates of the autoregressive parameters are more accurate than those obtained with the autocorrelation method. Burg's technique yields a stable AR model with high frequency resolution and is computationally efficient.

The AR coefficient calculation using Burg's algorithm is given in the appendix.
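For orientation, a compact Python sketch of the Burg recursion follows. It is a generic textbook-style implementation written for illustration, not the appendix version used in this work; the function name `burg` and its return values are assumptions. It returns coefficients in the sign convention of the AR definition above (for order 2, x[n] = a1 x[n-1] + a2 x[n-2] + e[n]) and is checked on a simulated AR(2) process.

```python
import random


def burg(x, order):
    """Burg estimate of AR coefficients for x[n] = sum_k a_k x[n-k] + e[n].

    Returns (a, refl, err): a = [a_1, ..., a_p], refl = reflection
    coefficients k_1..k_p, err = final prediction-error power estimate.
    """
    n = len(x)
    f = [float(v) for v in x]   # forward prediction errors
    b = [float(v) for v in x]   # backward prediction errors
    c = [1.0]                   # error filter 1 + c_1 z^-1 + ... + c_m z^-m
    err = sum(v * v for v in f) / n
    refl = []
    for m in range(1, order + 1):
        # Reflection coefficient minimising forward + backward error power.
        num = sum(f[t] * b[t - 1] for t in range(m, n))
        den = sum(f[t] * f[t] + b[t - 1] * b[t - 1] for t in range(m, n))
        k = -2.0 * num / den
        refl.append(k)
        # Levinson recursion for the error-filter coefficients.
        c_ext = c + [0.0]
        c = [c_ext[i] + k * c_ext[m - i] for i in range(m + 1)]
        # Update both error sequences; go backwards so b[t - 1] is still old.
        for t in range(n - 1, m - 1, -1):
            f_old = f[t]
            f[t] = f_old + k * b[t - 1]
            b[t] = b[t - 1] + k * f_old
        err *= 1.0 - k * k
    a = [-ci for ci in c[1:]]   # convert to x[n] = sum a_k x[n-k] + e[n]
    return a, refl, err


# Usage: re-estimate a simulated AR(2) process with a1 = 0.5, a2 = -0.3.
rng = random.Random(1)
x = [0.0, 0.0]
for _ in range(20000):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
a, refl, err = burg(x, 2)
```

Each reflection coefficient satisfies |k| ≤ 1, because 2|f·b| ≤ f² + b² term by term in the ratio above; this is how the sequential minimization keeps the model stable.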


The fundamental features of respiratory signals (Chowdhury S. K et al, 1981) are given below:

1. Energy Index (EI)

2. Respiration frequency (FZX)

3. Dominant frequency estimated by AR modeling (FAR)

4. Strength of the dominant frequency estimated by AR modeling (STR)

Energy Index (EI)

The energy index measures the amount of energy present in the signal.

Given a continuous-time signal f(t), the energy contained over a finite time interval and the total energy are defined as follows:

$$E = \int_{T_1}^{T_2} |f(t)|^2 \, dt \tag{3.2}$$

$$E = \int_{-\infty}^{\infty} |f(t)|^2 \, dt \tag{3.3}$$

Equation (3.2) defines the energy contained in the signal over the time interval from T1 to T2, whereas equation (3.3) defines the total energy contained in the signal. If the total energy of a signal is a finite, non-zero value, the signal is classified as an energy signal. Aperiodic signals of finite duration are typically energy signals.

The energy index is computed as

$$EI = \frac{1}{N} \sum_{n=1}^{N} x^2(n) \tag{3.4}$$

where N is the total number of data samples.
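A minimal discrete-time sketch of these energy computations follows, assuming EI is the mean square (1/N)Σx²(n), which matches the later use of √EI as the crossing baseline for FZX. The helper names are hypothetical: `energy` is the sampled analogue of (3.2) over an index range (or of (3.3) over the whole record), and `energy_index` divides the total by N.

```python
import math


def energy(x, n1=0, n2=None):
    """Sampled analogue of (3.2): sum of x[n]^2 for n1 <= n < n2.

    With the defaults it sums over the whole record, the finite-length
    counterpart of (3.3).
    """
    if n2 is None:
        n2 = len(x)
    return sum(v * v for v in x[n1:n2])


def energy_index(x):
    """EI as the mean square: total energy divided by the number of samples N."""
    return energy(x) / len(x)


# Usage: one full period of a unit-amplitude sine has EI close to 0.5.
sine = [math.sin(2 * math.pi * k / 100) for k in range(100)]
ei = energy_index(sine)
baseline = math.sqrt(ei)   # the baseline later used for FZX
```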

Energy is defined as the ability to do work. In digital signal processing, however, physical units are routinely discarded and the signals are renormalized.

When x is real, another common description of signal energy is the mean square value.

Respiration Rate (FZX)

Respiration rate is defined as the number of breaths a person takes in one minute. The average respiratory rate of a healthy adult at rest is 12-18 breaths per minute. Average respiratory rates for different age groups are given in Table 3.1.

Table 3.1 Average Respiratory Rate

Age group
Less than 1 year
1-3 years
3-6 years
6-12 years
12-17 years


Respiration frequency (FZX) was determined by counting the number of times that x(n) crosses a baseline defined as the square root of EI. In this way, the baseline was automatically adjusted for different sections of the respiration recording to alleviate artifacts.

Zero Crossing Algorithm

In mathematical terms, a zero crossing is a point where the sign of a function changes from positive to negative or vice versa; it appears as a crossing of the zero axis in the graph of the function. Counting zero crossings (Harper R et al, 1987) is a method used in speech processing to estimate the fundamental frequency of speech. The interval between successive zero crossings gives a good estimate of the signal's frequency. A zero crossing is illustrated in Fig 3.1.
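The sketch below counts level crossings in Python: with level 0 it is the zero crossing count described here, and with level √EI it gives the baseline crossing used for FZX. Converting the count to a frequency assumes two crossings per breathing cycle; that conversion and the function names are assumptions of this sketch, not a formula stated in the source.

```python
import math


def count_crossings(x, level=0.0):
    """Count sign changes of x[n] - level.

    Samples lying exactly on the level are skipped, so touching the level
    without passing through it is not counted as a crossing.
    """
    count = 0
    prev_above = None
    for v in x:
        d = v - level
        if d == 0:
            continue
        above = d > 0
        if prev_above is not None and above != prev_above:
            count += 1
        prev_above = above
    return count


def crossing_frequency(x, fs, level=0.0):
    """Frequency in Hz from a crossing count, assuming two crossings per cycle."""
    duration_s = len(x) / fs
    return count_crossings(x, level) / (2.0 * duration_s)


# Usage: 60 s of a 0.25 Hz sine (15 breaths/min) sampled at 250 Hz.
fs = 250.0
x = [math.sin(2 * math.pi * 0.25 * n / fs) for n in range(int(60 * fs))]
ei = sum(v * v for v in x) / len(x)                  # mean-square energy index
fzx_zero = crossing_frequency(x, fs)                 # zero crossing estimate
fzx_base = crossing_frequency(x, fs, math.sqrt(ei))  # baseline = sqrt(EI)
```

A moving baseline, as described in the threshold scheme, would recompute `ei` per segment and pass the new level for each one.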


Figure 3.1 Representation of a zero crossing

Dominant Frequency (FAR)

To obtain the features FAR and STR, the coefficients of a second order AR model have to be estimated (Vannuccini L et al, 2000). The respiration signal is modeled as a second order autoregressive process as follows:

$$x(n) = a_1 \, x(n-1) + a_2 \, x(n-2) + e(n) \tag{3.6}$$

where e(n) is the prediction error, i.e., the difference between the actual value and the predicted value, and {a1, a2} are the AR model coefficients.

The dominant frequency of a signal is the frequency at which the signal has the most power. Using the second order autoregressive model coefficients, the dominant frequency and the signal regularity strength can be determined as follows:

$$Freq_{AR} = \frac{f_s}{2\pi} \arctan\left(\frac{a_1}{a_2}\right) \tag{3.7}$$

where fs is the sampling frequency and arctan gives the arc tangent of a1/a2, taking into account the quadrant in which the point (a1, a2) lies. A sampling frequency of 250 Hz was used for the analysis. In addition to the features derived above, the average energy of each respiration segment is also calculated.

Strength of Dominant Frequency (STR)

The AR coefficients are used to form a second order auxiliary polynomial, and FAR and STR are determined from the location of its pair of complex-conjugate roots, that is,

FAR = sampling frequency × angle / 360 (angle of the root in degrees)

STR = distance of the root from the origin
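Assuming the AR(2) model is written as x(n) = a1 x(n-1) + a2 x(n-2) + e(n), the auxiliary polynomial is z² - a1 z - a2, and the root-based rule above can be sketched as follows. The function name and the choice to return None when the roots are real (no complex-conjugate pair) are assumptions of this sketch.

```python
import cmath
import math


def far_str_from_poles(a1, a2, fs):
    """FAR (Hz) and STR from the complex-conjugate pole pair of an AR(2) model.

    Assumes x(n) = a1 x(n-1) + a2 x(n-2) + e(n), so the poles are the roots
    of z^2 - a1 z - a2 = 0. FAR = fs * angle / 360 with the angle of the
    upper-half-plane root in degrees; STR = distance of that root from the
    origin. Returns None when the roots are real.
    """
    disc = a1 * a1 + 4.0 * a2
    if disc >= 0:
        return None                        # no complex-conjugate pair
    root = (a1 + cmath.sqrt(disc)) / 2.0   # root with positive imaginary part
    angle_deg = math.degrees(cmath.phase(root))
    return fs * angle_deg / 360.0, abs(root)


# Usage: poles at radius 0.99 and 0.25 Hz for fs = 250 Hz.
fs = 250.0
theta = 2 * math.pi * 0.25 / fs
far, strength = far_str_from_poles(2 * 0.99 * math.cos(theta), -(0.99 ** 2), fs)
```

For a stable model the poles lie inside the unit circle, so STR computed this way is below 1, and values close to 1 indicate a sharp, regular spectral peak.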

Basically, FAR and STR serve the same purpose as a power spectrum usually does, indicating the dominant frequency and its corresponding power level. The classification of the signal is based on the parameters derived above together with the other features extracted by a modified zero crossing algorithm, and the thresholds must be properly initialized to allow accurate classification.

The signal strength (Mag_AR) is given by

$$Mag_{AR} = \sqrt{a_1^2 + a_2^2} \tag{3.8}$$

where a1 and a2 are the AR coefficients calculated by Burg's algorithm, and Mag_AR is the magnitude of the AR coefficients.
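The description of (3.7) and (3.8) can be read as the polar coordinates of the point (a1, a2): an angle found with a quadrant-aware arctangent and scaled by fs/2π, and the distance from the origin. This is a different computation from the root-based rule above. The sketch below implements that reading; treating the arctangent as Python's `math.atan2(a1, a2)` is an interpretation of the text, and the function name is hypothetical.

```python
import math


def freq_mag_ar(a1, a2, fs):
    """Freq_AR and Mag_AR from the point (a1, a2), as read from (3.7), (3.8).

    math.atan2(a1, a2) is the arctangent of a1/a2 with the quadrant of
    (a1, a2) taken into account, returned in radians in [-pi, pi].
    """
    freq = fs / (2.0 * math.pi) * math.atan2(a1, a2)
    mag = math.hypot(a1, a2)   # sqrt(a1**2 + a2**2)
    return freq, mag


# Usage with arbitrary illustrative coefficients.
freq, mag = freq_mag_ar(1.0, 1.0, 250.0)   # angle pi/4 -> fs / 8
```

Because `math.atan2` returns angles in [-π, π], Freq_AR computed this way can be negative.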

Previously, only two features were used to classify respiratory signals. As a result, motion artifacts could not be identified correctly, and many samples containing artifacts were ignored. To identify motion artifacts accurately, two more features are introduced in this work: the dominant frequency (FAR) and the strength of the dominant frequency (STR), both discussed above.

3.3.4 Threshold Scheme

To make these algorithms more robust, the threshold parameters for classification can be obtained as follows.

The required thresholds can also be determined automatically by the following approach: divide a typical normal respiration signal of interest into smaller segments and compute the average energy index.

Then 33% and 150% of the calculated average energy index can be used as the low and the high energy thresholds, respectively. The normal breathing frequency of a human being is usually between 0.2 and 0.3 Hz, and the maximum frequency is unlikely to exceed 0.7-0.8 Hz. Hence 0.2 Hz and 0.7 Hz are used as the minimum and maximum thresholds for the respiration rate.
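The automatic threshold rule can be sketched as follows, assuming non-overlapping segments of a user-chosen length and the mean-square form of EI. The function names, the segment handling and the constant names are illustrative assumptions; only the 33%/150% fractions and the 0.2 Hz/0.7 Hz limits come from the text.

```python
def average_energy_index(signal, segment_len):
    """Mean EI over consecutive, non-overlapping segments of a normal recording.

    Each segment's EI is its mean square; a trailing partial segment shorter
    than segment_len is ignored.
    """
    starts = range(0, len(signal) - segment_len + 1, segment_len)
    eis = [sum(v * v for v in signal[i:i + segment_len]) / segment_len
           for i in starts]
    if not eis:
        raise ValueError("signal is shorter than one segment")
    return sum(eis) / len(eis)


def energy_thresholds(avg_ei, low_frac=0.33, high_frac=1.50):
    """Low and high EI thresholds: 33% and 150% of the average EI."""
    return low_frac * avg_ei, high_frac * avg_ei


# Respiration-rate thresholds stated in the text, in Hz.
FZX_MIN_HZ, FZX_MAX_HZ = 0.2, 0.7

# Usage: two segments with EI 1.0 and 4.0; the trailing 5.0 is ignored.
avg = average_energy_index([1.0, -1.0, 2.0, -2.0, 5.0], 2)   # 2.5
```

For example, with the average EI of 0.0379 reported in Table 3.3, `energy_thresholds(0.0379)` gives approximately 0.0125 and 0.0569.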

Using the square root of the energy index as the baseline value for zero crossing, the number of times the signal crosses the baseline is recorded and the respiratory frequency is derived from it. A moving baseline is used to accommodate changes in the average respiration level.

The calculated energy index, respiration rate, dominant frequency and signal strength are compared with the threshold values, and the signal is classified as normal respiration, apnea or respiration with artifact. The threshold values were determined experimentally and are given in Table 3.2.

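Table 3.2 Threshold values used for classification of respiratory signal

Feature: low & high thresholds
Energy index (EI): 33% & 150% of average energy
Respiration rate (FZX): 0.2 Hz & 0.7 Hz of average rate
Dominant frequency (FAR): 50% & 150% of dominant frequency
Strength of dominant frequency (STR): 75% & 95% of average strength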

The average energy, respiration rate, dominant frequency and average strength of the dominant frequency are calculated using equations (3.4), (3.5), (3.7) and (3.8), and are given in Table 3.3.

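Table 3.3 Threshold values for a real time respiratory signal

Feature: low & high thresholds
Energy index (EI): 33% & 150% of 0.0379
Respiration rate (FZX): 0.2 Hz & 0.7 Hz of 0.1948
Dominant frequency (FAR): 50% & 150% of -41.4606
Strength of dominant frequency (STR): 75% & 95% of 1.9822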

3.3.5 Data sets

The datasets are created from the features described above, extracted from the respiratory signal. Sixteen PSG recordings from 60 male subjects with or without sleep apnea syndrome, available in the MIT-BIH (Massachusetts Institute of Technology and Boston's Beth Israel Hospital) Polysomnographic Database (Goldberger A.L et al, 2000), are selected. The mean age of the subjects is 40 (range: 32-56). Recording durations range from 2 to 7 h depending on the patient. The respiratory signals are digitized at a sampling rate of 250 Hz with 12 bits/sample. The polysomnography waveforms are displayed on a CRT display and edited using a program called WAVE (Waveform Analyzer, Viewer and Editor), developed at the Massachusetts Institute of Technology.

The signal used in this work is a direct respiratory signal obtained from a nasal thermistor, i.e., a thermistor placed in the airflow of the nasal passages. This is used as the reference respiration signal. The database provides 15000 samples of each recording.

The available signals from the database are then given as input to the feature extraction code. The four features are extracted from the respiratory signal using AR modeling with Burg's algorithm. From these features, datasets comprising 300, 600, 2400 and 15000 samples are created. These datasets are then given to the neural classifier. Several training algorithms are tried, and the best-performing one is chosen. All these measures are calculated in MATLAB version 7.9.


A feature extraction algorithm is proposed based on the energy of the signal and autoregressive modeling. The complete algorithm for the proposed framework is as follows:

The human respiratory signal is taken as samples from the MIT-BIH Polysomnographic Database.

The respiratory signal is classified using the program designed in MATLAB 7.9.0.

The samples of the respiratory signals are given as input to the classifying program.

The mathematical model of the input respiratory signal is formed using second order autoregressive modeling.

Four features are extracted from the signal to determine the threshold values.

Of these four features, the energy index and the respiratory rate are calculated using (3.4) and (3.5).

The other two features, the dominant frequency and the strength of the dominant frequency, are calculated using Burg's algorithm, as given in (3.7) and (3.8).

Threshold values are calculated from the features obtained above, following Table 3.2.

For each cycle of the input respiratory signal, the four features are extracted again and compared with the threshold values.

Based on this comparison, the respiratory signal is classified.

The respiratory classification flowchart is shown in Fig 3.2.

[Flowchart: compute EI, FZX, FAR and STR; decision nodes eiy<EI_LOW, fzx1<0.2, str1>STR_HIGH, 0.2<fzx1<0.7, far1>0.2, 0.2>far1>0 and 0.7<FZX; outcomes include respiration with artifacts]

Figure 3.2 Flowchart


Step 1: Start the program.

Step 2: Calculate EI, FZX, FAR and STR.

Step 3: Check whether eiy is greater than EI_LOW.

If yes, check the condition str1>STR_HIGH, 0.2<fzx1<0.7 and far1>0.2.

Otherwise, check fzx1<0.2.

Step 4: If the condition fzx1<0.2 is true, the signal is apnea.

Otherwise, it is unclassified.

Step 5: If str1>STR_HIGH, 0.2<fzx1<0.7 and far1>0.2 are all satisfied, the signal is normal.

Otherwise, check the next condition: str1>STR_LOW and eiy>EI_HIGH.

Step 6: If the above condition is true, check the condition 0.2>far1>0 and 0.7<FZX; if it holds, the signal is a respiratory signal with artifacts.

Otherwise, it is an artifact.

Step 7: If eiy>EI_HIGH and str1<STR_LOW, the signal is an artifact.

If not, check eiy<EI_LOW and fzx1<0.2.

Step 8: If eiy<EI_LOW and fzx1<0.2 is true, the signal is apnea; otherwise, it is normal.
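Steps 3 to 8 can be transcribed into a single Python function, shown below as an illustrative sketch. The feature and threshold names follow the steps; how the thresholds are set (for example from Table 3.2) is left to the caller, and the threshold values in the usage lines are arbitrary placeholders. The transcription follows the steps literally, including the order in which the conditions are tested.

```python
def classify(eiy, fzx1, far1, str1, EI_LOW, EI_HIGH, STR_LOW, STR_HIGH):
    """Literal transcription of Steps 3-8; returns a label string."""
    if eiy > EI_LOW:                                        # Step 3
        if str1 > STR_HIGH and 0.2 < fzx1 < 0.7 and far1 > 0.2:
            return "normal"                                 # Step 5
        if str1 > STR_LOW and eiy > EI_HIGH:                # Step 5, otherwise
            if 0.2 > far1 > 0 and fzx1 > 0.7:               # Step 6
                return "respiration with artifacts"
            return "artifact"
        if eiy > EI_HIGH and str1 < STR_LOW:                # Step 7
            return "artifact"
        # Step 8: as written, this point is reached only when eiy > EI_LOW,
        # so the apnea test below cannot succeed here.
        if eiy < EI_LOW and fzx1 < 0.2:
            return "apnea"
        return "normal"
    if fzx1 < 0.2:                                          # Step 4
        return "apnea"
    return "unclassified"


# Usage with placeholder thresholds (not values from this work).
limits = dict(EI_LOW=0.5, EI_HIGH=2.0, STR_LOW=0.5, STR_HIGH=0.8)
label = classify(eiy=1.0, fzx1=0.3, far1=0.25, str1=0.9, **limits)  # "normal"
```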


This work focused on extracting the fundamental features of the respiratory signal, namely the energy index, respiration frequency, dominant frequency and strength of the dominant frequency, using Burg's method for second order autoregressive modeling. The dominant frequency and the strength of the dominant frequency are the two main features used to classify the respiratory signals. Samples from the MIT-BIH database are used to create the datasets on which neural networks are trained. In this way, respiratory signals are classified for the detection of sleep apnea and motion artifacts, with the classification carried out by neural networks.