Looking At Real Data Computer Science Essay


To quantify the performance of the new clutter rejection filters, Doppler data were downloaded from the Ultrasound Research Interface (URI) and converted into Doppler IQ data using MATLAB (MathWorks, Inc., Natick, MA). The URI and the URI offline processing tools (URI-OPT) consist of software and sample data. In this work we concentrate on URI-OPT, a MATLAB-based program for reading and processing the RF data acquired from a URI-equipped Antares system. URI-OPT can display several Doppler imaging modes. The mode of most interest here is spectral Doppler mode, which displays the Doppler spectrum of the RF data. The flow velocity information within the Doppler range gate is displayed as gray-scale intensities on a time versus velocity plot.

The data used are Doppler spectrum data collected from URIDmode. The data were first loaded into the program to display the spectrum, and then extracted and stored in MATLAB. A MATLAB program was developed to read the saved data and generate the Doppler in-phase/quadrature (IQ) data, which are used to test the proposed clutter rejection filters and to compare them with other types of clutter filters. The parameters used to generate the Doppler IQ data are listed in Table 6.1. The generated Doppler IQ data form a complex matrix X of size 100 x 7923.

Table 6.1. Parameters used to generate Doppler IQ

Columns: Data parameters, First value, Last value, Range gate start, Range gate size, Vector group, Real group

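The complex data matrix X obtained can be expressed as:

$$X = \begin{bmatrix} x(1,1) & x(1,2) & \cdots & x(1,N) \\ x(2,1) & x(2,2) & \cdots & x(2,N) \\ \vdots & \vdots & \ddots & \vdots \\ x(M,1) & x(M,2) & \cdots & x(M,N) \end{bmatrix} \qquad (6.1)$$

where M is the number of pulses and N is the number of axial sample volumes. Each column of X is a vector of length M.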

The input sample vector to the clutter rejection filter at depth index n can be written as:

$$\mathbf{x}_n = [x(1,n), x(2,n), \ldots, x(M,n)]^T, \quad n = 1, \ldots, N \qquad (6.2)$$
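The layout of X and of the per-depth input vectors can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions: the function names, the toy dimensions, and the random I and Q samples are hypothetical and are not taken from the URI data.

```python
import numpy as np

def make_iq_matrix(i_part, q_part):
    """Combine in-phase and quadrature samples into a complex M-by-N matrix."""
    i_part = np.asarray(i_part, dtype=float)
    q_part = np.asarray(q_part, dtype=float)
    return i_part + 1j * q_part

def slow_time_vector(X, n):
    """Return column n of X (1-based, as in the text) as a length-M vector."""
    return X[:, n - 1]

# Hypothetical toy sizes; the matrix described in the text is 100 x 7923.
M, N = 4, 3
rng = np.random.default_rng(0)
X = make_iq_matrix(rng.standard_normal((M, N)), rng.standard_normal((M, N)))

x_2 = slow_time_vector(X, 2)   # input vector to the filter at depth index n = 2
print(X.shape, x_2.shape)
```

Each call to `slow_time_vector` returns the M pulse samples at one depth, which is the vector that a clutter rejection filter operates on.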

6.2 Signal Model

The generated Doppler signal originates not only from blood flow but also from different tissue regions with different motion patterns, so the clutter Doppler signal is a sum of contributions from several regions. Figure 6-1 shows the spectrum of the Doppler signal. We assume that the resulting signal consists of a blood component b, originating from the echoes reflected by the moving red blood cells; a clutter component c, originating from surrounding and moving tissue; and white noise n, originating from the electronics or any other source. The signal can be modeled as:

$$\mathbf{x} = \mathbf{b} + \mathbf{c} + \mathbf{n} \qquad (6.3)$$
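The signal is characterized by its correlation matrix [3]. The correlation matrix Rx is given by:

$$R_x = E\{\mathbf{x}\,\mathbf{x}^{*T}\} \qquad (6.4)$$

where E{·} denotes the expectation and *T the conjugate transpose.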
In our case the correlation matrix can be expressed as

$$R_x = R_c + R_b + \sigma_n^2 I \qquad (6.5)$$

where Rc is the clutter correlation matrix, Rb is the blood correlation matrix, σn² is the noise variance, and I is the identity matrix.
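The additive model and the decomposition of the correlation matrix can be checked numerically. The sketch below is a minimal illustration, assuming synthetic components: the Doppler frequencies, amplitudes, and matrix sizes are hypothetical values chosen for the example, and the expectation is replaced by an average over the N columns.

```python
import numpy as np

def sample_correlation(X):
    """Estimate R_x = E{x x^H} by averaging over the N columns of X (M-by-N)."""
    n_cols = X.shape[1]
    return (X @ X.conj().T) / n_cols

rng = np.random.default_rng(1)
M, N = 8, 5000                      # hypothetical sizes
m = np.arange(M)[:, None]           # pulse index as a column vector

def random_amp(n):
    """Unit-variance complex Gaussian amplitudes, one per column."""
    return (rng.standard_normal((1, n)) + 1j * rng.standard_normal((1, n))) / np.sqrt(2)

# Hypothetical components: strong slow clutter, weaker fast blood, white noise.
c = 10.0 * random_amp(N) * np.exp(2j * np.pi * 0.01 * m)
b = 1.0 * random_amp(N) * np.exp(2j * np.pi * 0.25 * m)
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

X = b + c + noise
R_x = sample_correlation(X)
R_sum = sample_correlation(c) + sample_correlation(b) + sample_correlation(noise)

# With independent components the cross terms average out, so R_x is close to R_sum.
rel_diff = np.linalg.norm(R_x - R_sum) / np.linalg.norm(R_x)
print(rel_diff)
```

The relative difference shrinks as N grows, because the cross terms between independent components average toward zero.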

The three components originate from different sources and are assumed to be statistically independent. The proposed methods can therefore be used to determine basis vectors that are statistically independent [142].

Figure 6-1. Doppler signal spectra

The Doppler IQ data were prepared for the proposed ICA- and PCA-based clutter rejection methods by some preprocessing steps: applying the discrete Fourier transform, computed with the fast Fourier transform (FFT), and then taking the absolute value so that the data become real-valued. Assume that the input signal f(x,y) is a function of 2-D space defined over an x-y plane. The two-dimensional FFT takes a complex array of size M-by-N and is expressed in the following form:

$$F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)} \qquad (6.6)$$
A small window was taken for testing the clutter rejection filters. The resulting Doppler IQ signal is illustrated in Figure 6-2; only 8 signals are shown for simplicity.
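The preprocessing described above (a 2-D FFT followed by the absolute value) can be sketched with NumPy's `fft2`, which uses the same sign convention and no normalization. The test signal, its size, and the function name are hypothetical and serve only to show where the spectral peak appears.

```python
import numpy as np

def spectrum_magnitude(X):
    """2-D FFT of a complex array followed by the absolute value (real, non-negative)."""
    return np.abs(np.fft.fft2(X))

# Hypothetical test signal: one complex exponential along the row (pulse) axis,
# constant along the column (depth) axis.
M, N = 16, 4
k0 = 3
m = np.arange(M)[:, None]
X = np.exp(2j * np.pi * k0 * m / M) * np.ones((1, N))

S = spectrum_magnitude(X)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(S), S.shape))
print(peak)   # index (u, v) of the largest spectral magnitude
```

For this input all the energy falls into a single bin at u = k0, v = 0, with magnitude M·N, which is a quick way to confirm the indexing of the transform.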

Figure 6-2. The generated Doppler IQ signal for simulation

The Doppler data preparation and clutter rejection process is illustrated in Figure 6-3. In the data preparation stage, the Doppler data are generated and prepared for clutter rejection. In the clutter rejection stage, the Doppler signal with two peaks (a clutter peak and a flow peak) is applied to the filter, and the spectrum of the filtered signal is then calculated so that, ideally, only the blood flow spectrum remains.

Figure 6-3. Data preparation and clutter rejection process with different filters

6.3 Clutter Rejection with PCA

Principal component analysis (PCA) is a technique that transforms a set of correlated variables into a smaller number of uncorrelated variables known as principal components (PCs). The PCs are calculated as the eigenvectors of the covariance matrix of the data [26], and the variances along these eigenvectors are the corresponding eigenvalues. PCA is one of the most useful tools in modern data analysis because it is a simple, non-parametric method for extracting useful information from confusing data sets. PCA uses a vector space transform to reduce the dimensionality of large data sets and to de-noise them. This is particularly useful when a data set with many variables lies, in actuality, close to a two-dimensional plane [19, 131]. PCA helps to identify the most meaningful basis in which to re-express the data set; this new basis filters out the noise and reveals hidden structure.

The input data X is an M-by-N matrix with observations (samples) in its columns and variables in its rows. The analysis starts by averaging the data to expose its underlying structure; errors due to zero-mean noise tend to cancel when the mean is calculated. The mean of each row of the data matrix is calculated by:

$$\bar{x}_m = \frac{1}{N} \sum_{n=1}^{N} x(m,n), \quad m = 1, \ldots, M \qquad (6.7)$$
The mean of each measurement is subtracted from the original data matrix X, so that each entry is replaced by its difference from the mean of its row. This produces data with zero mean. The covariance is then calculated from the resulting matrix to measure the degree of linear relationship between each pair of variables. A large positive value indicates positive correlation and a large negative value indicates negative correlation. Since the mean-subtracted matrix consists of one row vector per variable, with each vector containing all samples of that variable, the covariance can be expressed as a matrix product [50]:

$$C = \frac{1}{N-1}\, D D^T \qquad (6.8)$$

where D is the matrix obtained by subtracting the mean from the original data and T denotes the transpose.

The result is a square, symmetric M-by-M matrix. Its diagonal terms are the variances of the individual measurements, and its off-diagonal terms are the covariances between measurements.
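The mean subtraction and covariance steps can be sketched as follows. This is a minimal example on a small made-up real-valued matrix; the 1/(N-1) normalization is an assumption (it matches NumPy's `np.cov` default), and the function names are illustrative.

```python
import numpy as np

def center_rows(X):
    """Subtract each row's mean (one mean per variable) from that row."""
    mean = X.mean(axis=1, keepdims=True)
    return X - mean, mean

def covariance(D):
    """C = D D^T / (N - 1) for zero-mean D with variables in rows."""
    n_samples = D.shape[1]
    return (D @ D.T) / (n_samples - 1)

# Hypothetical data: M = 3 variables (rows), N = 4 samples (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0],
              [5.0, 5.0, 4.0, 2.0]])

D, mean = center_rows(X)
C = covariance(D)
print(C)
```

The diagonal of `C` holds the sample variance of each row, and a negative off-diagonal entry (here between the first and third rows) indicates that the two variables move in opposite directions.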

Since the covariance matrix is square (M-by-M), its eigenvectors and eigenvalues can be calculated. The eigenvalues generally differ considerably in magnitude, and the eigenvectors with the highest eigenvalues represent the principal components of the data set.

After the eigenvectors of the covariance matrix are obtained, they are ordered by eigenvalue, from highest to lowest. Ignoring the less significant components loses some information, but if their eigenvalues are small, little information is lost. Leaving out these components reduces the dimension of the data set.

A matrix of vectors (the feature vector) is formed by taking some of the eigenvectors from the ordered list and arranging them as columns. Finally, to obtain the PCA-filtered version of the data set X, the mean-adjusted data of each axial line are projected onto the selected basis functions, as described by

$$Y = P X \qquad (6.9)$$

where Y represents the final filtered data set, P is the feature-vector matrix transposed so that the eigenvectors are in its rows, with the most significant eigenvector at the top, and X is the mean-adjusted data arranged (transposed if necessary) with one variable per row.
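The eigenvector ordering and projection steps above can be sketched in NumPy. This is an illustrative sketch, not the essay's MATLAB implementation: the synthetic data, the function name `pca_project`, and the choice to keep one component are assumptions, and which components correspond to clutter or to blood in real data is not decided here.

```python
import numpy as np

def pca_project(X, n_components):
    """Project mean-adjusted data onto the leading eigenvectors: Y = P D."""
    D = X - X.mean(axis=1, keepdims=True)          # mean-adjusted data, variables in rows
    C = (D @ D.T) / (D.shape[1] - 1)               # covariance matrix (M-by-M)
    eigvals, eigvecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # re-order from highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components].T                # selected eigenvectors as rows
    return P @ D, P, eigvals

# Hypothetical data: three strongly correlated variables plus a little noise.
rng = np.random.default_rng(2)
t = rng.standard_normal(200)
X = np.vstack([3.0 * t, 1.5 * t, -2.0 * t]) + 0.01 * rng.standard_normal((3, 200))

Y, P, eigvals = pca_project(X, n_components=1)
explained = eigvals[0] / eigvals.sum()
print(Y.shape, explained)
```

Because the three rows are nearly proportional to one another, almost all of the variance falls on the first eigenvalue, so discarding the other two components loses very little information.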

6.4 Clutter Rejection with ICA

Several transformation methods, such as PCA, have been proposed for analyzing multivariate data and finding a suitable representation of it. A more recently developed transform is independent component analysis (ICA), which minimizes the statistical dependence between the components of the representation. Our goal is to use ICA to estimate the original signals from data in which blood flow is mixed with clutter and noise; in other words, to separate the clutter from the blood flow data. This is referred to as the blind source separation (BSS) problem [20, 136, 137].

ICA is based on non-Gaussianity and uses higher-order statistics rather than second-order statistics to separate the signal from the clutter [20, 138]. Besides non-Gaussianity, ICA assumes that the components are independent [139]. This is a powerful and attractive set of assumptions; however, ICA treats the observed signal as a set of random variables and does not consider the dependency between adjacent time points.

Since ICA uses higher-order statistics rather than second-order moments to determine basis vectors that are as statistically independent as possible, it can be considered an extension of PCA [138, 140]. This often allows ICA to give better separation results. FastICA for MATLAB is a program package that implements ICA with a fast fixed-point algorithm [20, 140]. The first step in ICA is whitening (sphering) the data. After centering and before applying ICA, the observed vector is transformed linearly to obtain a new vector that is white: its components are uncorrelated and their variances equal unity, so that the covariance matrix of the new vector equals the identity matrix. This is expressed as:

$$E\{\tilde{\mathbf{x}}\,\tilde{\mathbf{x}}^T\} = I \qquad (6.10)$$
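Several methods have been proposed for whitening; the most popular is the eigenvalue decomposition (EVD) of the covariance matrix

$$E\{\mathbf{x}\,\mathbf{x}^T\} = E D E^T \qquad (6.11)$$

where x is the observed vector, E is the orthogonal matrix of eigenvectors of E{xxᵀ}, and D is the diagonal matrix of its eigenvalues. With x̃ denoting the new (whitened) vector, the whitening is expressed by:

$$\tilde{\mathbf{x}} = E D^{-1/2} E^T \mathbf{x} \qquad (6.12)$$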
Dimension reduction was performed along with whitening by discarding the small eigenvalues, as is done in PCA. Three conventional methods can be used for utilizing the high-order information. The projection pursuit technique was used to find linear combinations of maximum non-Gaussianity. The central limit theorem shows that the distribution of a sum of independent random variables tends toward a Gaussian distribution. Thus, a sum of two independent random variables usually has a distribution that is closer to Gaussian than either of the two original random variables. Solving the ICA problem therefore requires a measure of non-Gaussianity, and several have been proposed. The classical measure is kurtosis, the fourth-order cumulant. Kurtosis is zero for a Gaussian random variable and nonzero for most non-Gaussian random variables; it can be positive or negative. The FastICA algorithm is applied to maximize the kurtosis-based criterion and so estimate the independent components.
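The whitening step and the kurtosis measure can be illustrated with a short sketch. It does not reproduce the FastICA package; it only whitens a synthetic mixture through the EVD of its covariance matrix and evaluates kurt(y) = E{y⁴} - 3(E{y²})² on sample data. The sources, the mixing matrix A, and the function names are hypothetical.

```python
import numpy as np

def whiten(X):
    """Center X (variables in rows) and whiten it with V = E D^{-1/2} E^T."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = (Xc @ Xc.T) / Xc.shape[1]
    d, E = np.linalg.eigh(cov)                     # eigenvalues d, eigenvectors E
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T        # whitening matrix
    return V @ Xc, V

def excess_kurtosis(y):
    """kurt(y) = E{y^4} - 3 (E{y^2})^2, estimated from a centered sample."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(3)
n = 20000
s = np.vstack([rng.uniform(-1.0, 1.0, n),          # sub-Gaussian source
               rng.laplace(0.0, 1.0, n)])          # super-Gaussian source
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                         # hypothetical mixing matrix
X = A @ s

Z, V = whiten(X)
cov_Z = (Z @ Z.T) / Z.shape[1]                     # should equal the identity matrix
print(np.round(cov_Z, 6))
print(excess_kurtosis(s[0]), excess_kurtosis(s[1]))
```

The uniform source gives a negative kurtosis and the Laplacian source a positive one, which illustrates why the sign alone is not used as the criterion and a magnitude-based measure is maximized instead.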

6.5 Clutter Rejection with Non-adaptive Filters

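The non-adaptive filters used for clutter rejection of the Doppler signals, namely finite impulse response (FIR), infinite impulse response (IIR), and polynomial regression (PR) filters, were designed using the parameters presented in Table 6.2. The filters were designed to have the same characteristics.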

Table 6.2. FIR, IIR and PR filters design parameters

Columns: Filter type, Cutoff frequency, Maximum dp, Minimum ds
Entries: 0.09π, -80, 0.2π
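As an illustration of one non-adaptive approach, the sketch below removes a low-order polynomial trend from the slow-time signal. It assumes that PR denotes a polynomial regression filter; the filter order, the synthetic clutter and blood signals, and the function name are hypothetical and are not taken from Table 6.2.

```python
import numpy as np

def polynomial_regression_filter(x, order):
    """Remove the least-squares polynomial fit of the given order from x.

    x is a slow-time vector of length M, or an M-by-N matrix of such columns.
    """
    x = np.asarray(x)
    M = x.shape[0]
    t = np.linspace(-1.0, 1.0, M)
    B = np.vander(t, order + 1, increasing=True)   # columns 1, t, t^2, ...
    Q, _ = np.linalg.qr(B)                         # orthonormal polynomial basis
    return x - Q @ (Q.conj().T @ x)                # subtract the projection onto the basis

M = 32
m = np.arange(M)
clutter = 5.0 * (1.0 + 0.02 * m)                   # hypothetical slowly varying clutter
blood = np.exp(2j * np.pi * 0.3 * m)               # hypothetical flow component

y = polynomial_regression_filter(clutter + blood, order=1)
residual = np.linalg.norm(y - blood) / np.linalg.norm(blood)
print(residual)
```

A linear trend is removed exactly by a first-order filter, while the faster flow component passes with only a small distortion; higher orders remove more clutter but also more of the low-velocity flow.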








6.6 Clutter Filter Evaluation

The proposed clutter rejection methods for Doppler signals were compared with existing clutter rejection methods. The existing filters were designed using the parameters in Table 6.2 so that they have the same characteristics. The root mean square deviation (RMSD), or root mean square error (RMSE), and the error are frequently used measures of the difference between the values predicted by a model or estimator and the values actually observed, and both serve here as measures of accuracy. The accuracy of each method was computed, and the results of the proposed methods were compared with those of the existing methods.

The error was calculated by subtracting the output signal of the clutter filter, g, from the input signal to the clutter filter, f, using the following expression:

$$E = f - g \qquad (6.13)$$

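RMSE was computed using the following expression:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| f(i,j) - g(i,j) \right|^2} \qquad (6.14)$$

where f is the input matrix to the clutter filter, g is the output from the clutter filter, and the mean square error (MSE) is the mean of the squared differences.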
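The two measures can be computed in a few lines. The matrices below are made-up values used only to show the arithmetic, and the function names are illustrative.

```python
import numpy as np

def filter_error(f, g):
    """Element-wise error between filter input f and filter output g."""
    return np.asarray(f) - np.asarray(g)

def rmse(f, g):
    """Root mean square error: square root of the mean squared magnitude of f - g."""
    e = filter_error(f, g)
    return float(np.sqrt(np.mean(np.abs(e) ** 2)))

# Hypothetical input and output matrices.
f = np.array([[1.0, 2.0],
              [3.0, 4.0]])
g = np.array([[1.0, 1.0],
              [3.0, 2.0]])

print(filter_error(f, g))
print(rmse(f, g))
```

Here the squared differences are 0, 1, 0 and 4, so the MSE is 1.25 and the RMSE is its square root. Taking the magnitude before squaring keeps the same function usable for complex IQ data.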

Besides the error, a performance rating was also used to evaluate the clutter rejection filters. Performance was rated from 1 to 5: the filter with the highest performance gives the lowest error, and the filter with the lowest performance gives the highest error.