Multivariate Control Charts And Statistical Methods Accounting Essay


2.0 Introduction

This chapter is organized in five sections. The first section introduces the main studies on multivariate control charts and concentrates on the statistical method used to monitor several variables simultaneously, namely the Hotelling control chart. The second section reviews the publications on the main issue of the thesis, the robust Hotelling control chart; it explains these publications in detail and relates them to the current study, for instance through the methods, procedures, or robust estimators that this study uses. The third section presents the robust location estimators, with subsections on the α-trimmed mean, the modified one-step M-estimator (MOM), the winsorized mean, the median, and the Hodges-Lehmann estimator. The fourth section covers the robust scale estimators in four subsections: the median absolute deviation (MAD), two further robust scale estimators, and the α-trimmed standard deviation. Finally, the fifth section discusses quality control charts.

2.1. Multivariate Control Charts

Multivariate statistical process control charts (MSPCC) are one of the most rapidly advancing areas of statistical process control. The field dates from 1947, when Harold Hotelling published his paper, which is important as the first attempt to treat the subject. Hotelling's T² control chart (Hotelling, 1947, 1951) is the most frequently used control chart in the multivariate setting; it monitors multiple quality characteristics simultaneously.

Jackson (1985) stated that an MSPCC tool performs well when it is able to:

1) Ensure that the process is in control by considering all the measured characteristics simultaneously.

2) Take into account the interdependency among the process characteristics.

3) Control the Type I error.

Despite these advantages, MSPCC has some practical drawbacks, such as the difficulty of identifying the specific causes that lead to a signal. For example, when there is an out-of-control signal, it is not always obvious which characteristic or set of characteristics is responsible for it (Jackson, 1985). According to Alt (1985) and Ryan (1989), the Bonferroni inequality can be used to determine which of the variables is responsible. Another drawback of the Hotelling control chart is that it does not distinguish between shifts in the mean and shifts in the variance. The remedy is to run control charts for the mean and for the covariance simultaneously (Healy, 1987; Hawkins, 1991).

Sepulveda and Nicholas (1997) proposed a simulation-based multivariate control chart, the minimax control chart, in which the upper control limit depends on the maximum value of the correlated standardized mean vector and the lower control limit depends on its minimum value. They judged the performance of the minimax control chart by comparing its average run length (ARL) with that of the chi-square control chart. The results showed that the ARL of the minimax control chart was better than that of the chi-square control chart.

The multivariate Hotelling statistic is very versatile. It can be used not only in phase I operations to identify outliers in the historical data set, but also in phase II operations to detect process shifts using new incoming observations. The statistic has many other applications in multivariate control charts; for instance, procedures exist for improving its sensitivity to small process shifts and for adjusting it to handle observation vectors that are correlated over time (Mason and Young, 1999). A useful general survey of Hotelling's statistic for multivariate processes can be found in Mason and Young (2001), who extended the use of the statistic to multivariate subgroup processes and showed its worth in monitoring such processes through phase I and phase II. Phase I concentrates on detecting and eliminating outliers, and phase II concentrates on monitoring future observations by applying the Hotelling statistic with known or estimated in-control means and covariance matrices.
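The two-phase use of the statistic can be illustrated with a short Python sketch. It assumes individual observations, the ordinary sample mean vector and sample covariance matrix estimated from the phase I data, and NumPy; the function name `hotelling_t2` is hypothetical, and no control limit is computed because the appropriate limit depends on the phase and on the estimator used.

```python
import numpy as np

def hotelling_t2(reference, new_obs):
    """Hotelling T^2 of each row of new_obs, using the sample mean vector
    and sample covariance matrix estimated from the reference data."""
    reference = np.asarray(reference, dtype=float)
    new_obs = np.atleast_2d(np.asarray(new_obs, dtype=float))
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)       # unbiased estimate (n - 1 divisor)
    diff = new_obs - mean
    # Solve S z = (x - mean) rather than inverting S explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)  # row-wise quadratic form

# Phase I: score the historical observations against their own estimates.
historical = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
t2_phase1 = hotelling_t2(historical, historical)

# Phase II: score a new incoming observation against the phase I estimates.
t2_phase2 = hotelling_t2(historical, [[3.0, 1.0]])
```

In phase I, unusually large values would flag candidate outliers to be removed before the estimates are recomputed; in phase II, the same estimates are held fixed and each new observation is scored against them.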

Likewise, Woodall (2004) noted that control charting of a production process passes through two distinct phases. Phase I checks whether the process was in control when the first subgroups were drawn, whereas in phase II the control charts are used to check whether the process remains in control for future subgroups. To achieve the best process improvement, the process must therefore pass through both phase I and phase II.

Mason and Young (2002) reviewed the Hotelling statistic in their textbook, which consists of eleven chapters, each covering one aspect of Hotelling's statistic: its foundations and concepts, its use, the two phases (phase I and phase II) in which it is applied to a production process, how to improve its sensitivity, and finally its use with subgroup data during the two phases.

Furthermore, Chang and Bai (2004) proposed constructing Hotelling control charts using the weighted standard deviation estimator for skewed populations. They compared the Type I error and the out-of-control average run length (ARL) of the proposed control charts with those of the standard control charts for the multivariate Weibull, lognormal and gamma distributions, and concluded that the proposed charts gave improved results for skewed distributions.

Williams, Sullivan, and Birch (2009) used multivariate Hotelling control charts based on Hotelling's statistic to detect outliers and shifts in the mean vector. They used the successive-differences covariance matrix estimator, which is regarded as effective for detecting both outliers and shifts in the mean vector. Since the exact distribution of Hotelling's statistic is unknown when the successive-differences estimator is used, they derived the largest value of Hotelling's statistic under the successive-differences covariance matrix estimator.

On the other hand, because Hotelling's control chart is constructed from the sample covariance matrix, Sullivan and Woodall (1996) studied the case of individual observations, in which the sample covariance matrix of the historical data set leads to poor properties in detecting shifts in the mean vector. They found that the statistic based on the usual sample variance-covariance matrix estimator was not only less effective in detecting a shift in the mean vector, but that its power to detect the shift actually decreased as the shift increased. They therefore proposed several estimators of the covariance matrix and concluded that the estimators based on successive differences are more effective than the others in detecting a process shift.

Nevertheless, according to Mason et al. (2003), the values of Hotelling's statistic may show trends, cycles, or autocorrelation that result from common causes; such patterns can exist in the historical data set even in the absence of special causes of variation. To address this problem, statisticians began to propose new estimators, in particular robust estimators, since with this type of estimator there is no need to know in advance whether outliers are present.

Consequently, Williams et al. (2006) suggested that the choice of estimator for the sample covariance matrix is very important for successfully detecting special causes of variation in a multivariate historical data set. Moreover, many studies showed that the usual choice of the sample covariance matrix impairs the detection of a sustained step shift in the mean vector when individual observations are used. They therefore proposed an alternative estimator based on the multivariate successive differences of the individual observations. Indeed, this estimator increased the probability of Type I error as the size of the step shift increased. The distribution of the statistic has not been determined when the successive-differences estimator is used, so they demonstrated several properties of the statistic based on this estimator and gave a more accurate approximate distribution for calculating the upper control limit (UCL) for individual observations. Moreover, when the in-control observations are independent and identically distributed (i.i.d.), using the successive-differences covariance matrix estimator makes the false-alarm probability of the control chart more precise.

The following section reviews the previous studies that dealt with robust Hotelling charts.

2.2. Robust Multivariate Control Charts

The most common location estimator is the arithmetic mean, and the most common scale estimator is the sample variance-covariance matrix. In the traditional Hotelling's statistic, the sample mean vector represents the center of the quality characteristics, and the variance-covariance matrix represents the dispersion of the data around the sample mean vector. However, both are known to be very sensitive to outliers and are greatly influenced by their presence. Singh (1982) and Johnson (1987) found that the traditional Hotelling's statistic cannot resist departures from the normal distribution. Moreover, Croiser (1988) noted that robustness against multiple outliers is essential in multivariate quality control; he defined multiple outliers as the observations that cause a large value of Hotelling's statistic. Likewise, Brooks (1985) observed that such data errors become more frequent as manufacturing systems develop, because of the huge amount of data collected. For all these reasons, robust multivariate control charts are a vital practical need. To deal with this problem, researchers have developed modified Hotelling control charts as alternatives to the traditional Hotelling's statistic, using robust location and scale estimators that are insensitive to outliers. Many studies have proposed ways to construct robust multivariate control charts, such as:

Alloway and Raghavachari (1990) proposed a robust multivariate control chart based on the trimmed mean and the trimmed variance-covariance matrix. They confirmed that the proposed Hotelling statistic is robust and resistant to contaminated observations when the distribution is symmetrical. They used the Mahalanobis distance to carry out the trimming: their method trims the pair of data vectors with the two largest Mahalanobis distances and replaces them with the two vectors that have the third and fourth largest distances. This thesis considers the robust Hotelling's statistic under symmetric trimming, so the Mahalanobis-distance method of Alloway and Raghavachari (1990) is suitable. What is new in this research is that the trimming is based on a percentage of the observations rather than on only the two observations with the largest Mahalanobis distances. Moreover, this study uses a modified Mahalanobis distance formula, in which the sample mean is replaced by the median and the sample covariance matrix is replaced by the three trimmed variance-covariance matrices of the robust scale estimators.

Likewise, Alloway and Raghavachari (1991) proposed a control chart based on the robust Hodges-Lehmann location estimator associated with the Wilcoxon signed-rank statistic. The proposed control chart is nonparametric and thus maintains the specified nominal Type I error. When the data come from a Gaussian distribution, this approach compares favorably with the Shewhart control chart. For moderate sample sizes from long-tailed symmetric distributions, its performance is better than that of the traditional approach. These properties make it well suited to early production runs of limited size, when the distribution of the process statistic is unknown. They concluded from the results that there is little difference between the trimmed and untrimmed methods in the case of heavier tails.

Surtihadi (1994) used the median as a robust location estimator. He constructed a robust bivariate control chart based on the bivariate sign tests of Blumen and Hodges, and found that this control chart needs fewer assumptions than the traditional control chart. It does require the underlying distribution to be continuous and symmetric; as a result, the chart gives good protection in the presence of extreme data errors.

Abu-Shawiesh and Abdullah (2001) proposed a new robust Hotelling's statistic for bivariate data, based on the Hodges-Lehmann and Shamos-Bickel-Lehmann estimators. To judge the performance of the new robust chart, they used the contaminated normal distribution with 10% and 20% of outliers in the data. They applied the robust chart to subgroup data with 20 subgroups and, to study the charts further, took several values of the sample size for each subgroup. They confirmed that the robust Hotelling statistic performs well under symmetrical contamination. There is little difference in performance between the Hotelling statistic and the proposed robust Hotelling's statistic for slightly heavier tails, but the proposed robust method becomes superior as the tails become heavier. Similarly, this thesis uses the Hodges-Lehmann estimator; the new idea in this research is to use the Hodges-Lehmann estimator together with the three robust scale estimators.

Vargas (2003) proposed a control chart based on robust estimators of location and dispersion, using the minimum volume ellipsoid (MVE) estimators. He used individual multivariate observations in phase I, calculated six alternative robust Hotelling control charts, and showed that the probability of detection can be improved by using robust estimators. Simulation studies showed that the robust Hotelling statistics that use the minimum volume ellipsoid (MVE) estimators are efficient in detecting multiple outliers and can deal with the masking effect.

Jensen, Birch and Woodall (2006) studied high-breakdown estimation methods based on the two most popular robust estimators, the minimum volume ellipsoid (MVE) and the minimum covariance determinant (MCD). They determined which of the two is better to use in robust control charts for detecting multiple outliers, since previous studies had not made clear which is more suitable in the domain of control charts. Their simulation study therefore gives guidance on when each of them is the suitable choice for robust control charts.

Alfaro and Ortega (2008) proposed a new robust alternative to the traditional Hotelling control chart. To construct the alternative robust Hotelling statistic, they replaced the sample mean vector in the traditional Hotelling statistic by the trimmed mean vector and replaced the variance-covariance matrix by the trimmed variance-covariance matrix. They concluded that the new robust Hotelling statistic is more effective than the traditional Hotelling statistic, especially in detecting outliers.

Alfaro and Ortega (2009) developed four robust alternatives to the traditional Hotelling chart. These proposed control charts used the minimum volume ellipsoid (MVE) estimator, the minimum covariance determinant (MCD) estimator, a reweighted estimator, and the trimmed mean estimator. To evaluate the performance of these robust Hotelling charts, they used several types of upper control limit: a quantile of the Snedecor F distribution, a quantile of the beta distribution, the chi-square distribution, and a simulation study for the case in which the distribution of the Hotelling statistic is unknown and the data sets are small. Their study was based on simulation and went through phases I and II: in phase I they calculated the traditional and robust estimators, whereas in phase II they generated new observations and calculated the value of the Hotelling statistic for each new observation. They concluded that the robust alternative Hotelling charts behaved better than the traditional Hotelling chart in the presence of outliers. Furthermore, they recommended using the robust Hotelling charts based on the trimmed mean and on the reweighted estimator when the number of outliers is small, and the other two robust Hotelling statistics, based on the MVE and MCD estimators, when the detection of outliers is more important.

Chenouri, Variyath and Steiner (2009) proposed multivariate robust Hotelling charts for multiple individual observations, using the reweighted minimum covariance determinant (RMCD) estimators of location and scale. These estimators are highly robust and more efficient than the ordinary location and scale estimators. To obtain the control limit formulas, they calculated empirical quantiles using Monte Carlo simulations. They evaluated the performance of the robust Hotelling statistic by monitoring the probability of Type I error and the probability of detecting outliers. They concluded that the proposed robust control charts perform similarly to the traditional control charts in terms of Type I error, while being better and more efficient than the traditional chart in detecting process shifts and outliers.

Midi, Shabbak, Talib and Hassan (2009) proposed two robust Hotelling statistics for multiple subgroup observations, using the efficient robust estimators minimum covariance determinant (MCD) and minimum volume ellipsoid (MVE). Their study showed that the robust Hotelling statistics and the traditional Hotelling statistic perform equally well in a clean process, but that the robust Hotelling statistics are more efficient than the traditional Hotelling statistic in detecting outliers.

This study suggests some robust location and scale estimators that appear to be efficient and to have high breakdown points, to be integrated into Hotelling's statistic. It therefore proposes nine modified robust Hotelling's statistics using robust location and scale estimators. The following sections give some characteristics and aspects of these robust location and scale estimators.

2.3. Robust Location Estimators

By far the most common location estimator is the arithmetic mean. In the Hotelling statistic, the sample mean vector represents the center of the quality characteristics. However, the mean is known to be very sensitive to outliers and is greatly influenced by their presence. To alleviate this problem, researchers have sought alternatives to the traditional Hotelling statistic using robust location estimators, which are insensitive to outliers. The main goal of these estimators is to give reasonably high efficiency over a range of distributional shapes, such as the normal and longer-tailed distributions, in other words for both symmetrical and asymmetrical distributions. Robust location estimators have traditionally been considered for symmetrical distributions. For asymmetrical distributions, however, there are no general arguments to indicate which aspect of the distribution should be studied. Ansell and Margaret (2009) therefore listed three main properties of the combined distribution that are of potential interest:

The mean or median of the original uncontaminated distribution.

The mean of the contaminated distribution.

The median of the contaminated distribution.

The following sections present some important types of robust location estimators, chosen with respect to the shape of the distribution: one depends on symmetrical trimming, another on asymmetrical trimming, and the third does not depend on trimming at all. The trimmed mean is a robust location estimator based on symmetrical trimming. The modified one-step M-estimator (MOM) is efficient when the trimming is asymmetrical, while the Hodges-Lehmann estimator does not depend on trimming. In this study, these three types of robust location estimators replace the sample mean in the traditional Hotelling's statistic.

2.3.1 α-Trimmed Mean

Trimming is the process of removing the extreme values from each tail of a distribution. The amount trimmed is usually between 10% and 25% of the order statistics at each end. Rocke, Downs and Rocke (1982) concluded that the best percentage is 20%-25% for symmetric distributions. Rosenberger and Gasko (1983) and Wilcox (1994b, 1995, 1995a) suggested trimming 20% from each tail of the order statistics. Othman, Keselman, Wilcox and Fradette (2004) confirmed that Type I error control is best when the percentage of trimming is moderate, namely 10%-15%. According to Pie ChenWu (2007), trimming raises two concerns, one practical and one theoretical. The first is the proper amount of trimming: many statisticians suggest 20% on the basis of simulation, but this percentage is unsatisfactory in terms of Type I error for asymmetric distributions with heavy tails. Under the second concern, when the data are assumed to be symmetric, this percentage is satisfactory.

The α-trimmed mean is determined by removing 100α% of the observations from each end of the order statistics and then calculating the mean of the remaining observations. The trimmed mean is a simple and flexible location estimator that is easy to compute and understand. If the assumption of symmetry is violated, the trimmed mean estimates a quantity that depends on the amounts trimmed on the left and right sides (Hogg, 1974; Mehrotra et al., 1991). When the sample size n is fairly large, the trimmed mean estimator has an approximately normal distribution (Bickel, 1965). The trimmed mean is most valuable when outliers are present in the data.

Another reason for using the trimmed mean is that one may feel uneasy when forced to choose between the sensitivity of the sample mean and the robustness of an insensitive estimator such as the sample median (Siegel, 1988). The trimmed mean was developed as a compromise between these two estimators. It is better than the sample mean when outliers occur in the data, although it may lose some efficiency when there are no extreme outliers. Its combination of efficiency and robustness also makes the trimmed mean more attractive than the median. Control charts based on efficient estimators are better able to detect outliers and other departures from a stable process or a state of control. The breakdown point of the trimmed mean is determined by the amount of trimming: if α% of the observations are trimmed from each side, the estimator can tolerate up to α% of contaminated observations, so its breakdown point is α%.

The trimmed mean of the n observations, after the g smallest and the g largest observations are removed from the order statistics, is defined by the following formula:

$$\bar{x}_t = \frac{1}{n - 2g} \sum_{i=g+1}^{n-g} x_{(i)} \qquad [2.1]$$

where

$x_{(i)}$: is the i-th ordered observation among the individual observations.

$g = \lfloor \alpha n \rfloor$ denotes the greatest integer less than or equal to $\alpha n$, where $0 \le \alpha < 0.5$.
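As a minimal sketch of the α-trimmed mean described above, the following Python function sorts the observations, drops the g smallest and g largest, where g is the greatest integer not exceeding αn, and averages the rest. The function name `trimmed_mean` and the symbol `g` are illustrative choices, not notation taken from the thesis.

```python
import math

def trimmed_mean(values, alpha):
    """Alpha-trimmed mean: drop g = floor(alpha * n) observations from each
    end of the ordered sample and average the remaining n - 2g values."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    ordered = sorted(values)
    n = len(ordered)
    # The small tolerance guards against products such as 2.9999999999999996.
    g = math.floor(alpha * n + 1e-9)
    kept = ordered[g:n - g]
    return sum(kept) / len(kept)

sample = [1, 2, 3, 4, 100]
plain_mean = trimmed_mean(sample, 0.0)   # no trimming: the ordinary mean
robust_mean = trimmed_mean(sample, 0.2)  # one observation removed from each end
```

With 20% trimming the single extreme value in `sample` is discarded, so the estimate stays near the bulk of the data, whereas the untrimmed mean is pulled toward the outlier.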

In the multivariate case, Alloway and Raghavachari (1990) examined three methods for trimming multiple outliers. The first method trims the largest and smallest values of each variable individually and then calculates the trimmed mean and the trimmed variance-covariance matrix from the remaining data. The problem with this method lies not in calculating the trimmed mean but in calculating the variance-covariance matrix: the covariance depends on paired values of each two random variables, and after variable-by-variable trimming this pairing can no longer be guaranteed. The second trimming method is based on which variables are more important than the others, or on which variable has more outliers. The third trimming method uses all the sample information, such as the covariances, the variances, and the distance of the data from the center. This method is based on the Mahalanobis squared distance, which selects the data pairs among the individual observations to be trimmed and winsorized. The Mahalanobis squared distance is written as follows:

$$D_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$$

where the sample mean vector $\bar{x}$ and the sample variance-covariance matrix $S$ are computed from the original data. The method can be summarized as follows:

Trim the two data pairs that have the largest values of the Mahalanobis squared distance.

Replace these two pairs with the data pairs that have the third and fourth largest values of the Mahalanobis squared distance, to construct the winsorized sample.

This thesis uses this method to trim the outliers. However, since the sample sizes are large because individual observations are used, a percentage of the observations is trimmed instead of the two pairs of the original method; the percentage considered in the thesis is the 40% of observations with the largest Mahalanobis distances.
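The percentage-based trimming can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the ordinary sample mean vector and sample covariance matrix as center and scatter (the thesis also describes a modified distance based on the median and robust scale estimators, which is not reproduced here), it relies on NumPy, and the function names `mahalanobis_sq` and `trim_by_distance` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(data, center, cov):
    """Squared Mahalanobis distance of each row of data from center."""
    diff = data - center
    solved = np.linalg.solve(cov, diff.T).T   # avoids forming the inverse of cov
    return np.einsum("ij,ij->i", diff, solved)

def trim_by_distance(data, fraction):
    """Drop the given fraction of rows with the largest squared distances.

    Center and scatter are the sample mean vector and sample covariance
    matrix of the original data (an assumption of this sketch)."""
    data = np.asarray(data, dtype=float)
    center = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    d2 = mahalanobis_sq(data, center, cov)
    n = len(data)
    n_trim = int(np.floor(fraction * n + 1e-9))  # tolerance for float round-off
    order = np.argsort(d2)                        # ascending distances
    keep = np.sort(order[:n - n_trim])            # preserve the original row order
    return data[keep], d2

observations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                         [1.0, 1.0], [10.0, 10.0]])
kept, distances = trim_by_distance(observations, 0.2)
```

Setting `fraction=0.4` would correspond to the 40% trimming mentioned in the text. Because the mean and covariance are themselves affected by outliers, a non-robust center and scatter can mask some outlying points, which is one motivation for the robust replacements discussed in this chapter.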

2.3.2. Modified one step M-estimator (MOM)

The trimmed mean suffers from two practical concerns. First, the amount of trimming is fixed a priori. With 20% trimming, the efficiency is good relative to the mean under normality, but when sampling from a sufficiently heavy-tailed distribution the efficiency can be poor compared with using more trimming. Second, the usual trimmed mean assumes symmetric trimming, meaning that equal proportions of observations are trimmed from each end. Symmetric trimming is suitable when sampling from a symmetric distribution, but asymmetric trimming becomes more suitable as the degree of skewness increases (Wilcox, 2003).

Unlike the trimmed mean, the modified one-step M-estimator trims the data empirically: it may trim nothing, or it may trim different amounts on the two sides of the ordered observations (Wilcox and Keselman, 2003). The trimmed mean also has the drawbacks of a lower breakdown point and of a trimming amount that is usually fixed prior to data analysis. Given these two concerns, how can we determine the trimming amount that ensures good Type I error control? One solution is to use the modified one-step M-estimator (MOM), which determines empirically whether each observation should be trimmed, allowing for no trimming as well as for different amounts of trimming on the left and on the right (Wilcox and Keselman, 2003). Its breakdown point is high, close to 0.5 (Pei-chen, 2002).

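Let $x_{1j}, x_{2j}, \ldots, x_{n_j j}$ be a random sample from any distribution for the j-th characteristic variable, $j = 1, \ldots, p$, where p is the number of random variables.

The MOM estimator suggested by Wilcox and Keselman (2003) is defined as

$$\hat{\theta}_j = \frac{1}{n_j - i_1 - i_2} \sum_{i=i_1+1}^{n_j - i_2} x_{(i)j} \qquad [2.2]$$

where $x_{(i)j}$ = the i-th order statistic in the j-th characteristic variable.

$i_1$: number of $x_{ij}$ that satisfy the criterion $(x_{ij} - M_j) < -K \cdot \mathrm{MAD}_{nj}$

$i_2$: number of $x_{ij}$ that satisfy the criterion $(x_{ij} - M_j) > K \cdot \mathrm{MAD}_{nj}$

$n_j$: number of observations in each variable, and $M_j$ is the sample median of the j-th variable.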

Here the scale estimator is $\mathrm{MAD}_n = \mathrm{MAD}/0.6745$. The constant K = 2.24 is chosen to give the estimator good efficiency when sampling from a normal distribution (Othman et al., 2004). This study also uses the modified one-step M-estimator (MOM) to trim outliers when the data are asymmetric, but modifies the trimming criterion by using the two other robust scale estimators in addition to the median absolute deviation estimator. Othman et al. (2004) and Wilcox and Keselman (2003) found that the correction factor K = 2.24 gives the MOM high efficiency when it is used with $\mathrm{MAD}_n$. Before the MOM formula is modified to use the same correction factor K = 2.24 with the two other robust scale estimators, the efficiency of the modified estimators must be calculated again; only if it is also high can the trimming criterion be modified. For this reason, a program was written in MATLAB version 7.8 (2009a) to calculate the efficiency of the MOM when $\mathrm{MAD}_n$ is replaced by each of the two other robust scale estimators. The efficiency, measured as the standard error of the sample mean divided by the standard error of the MOM, remains high when the correction factor K = 2.24 is used with the two other scale estimators: it is 0.9207 with one of them and 0.913 with the other, for sample size n = 20. Since the efficiencies are high for all three robust scale estimators when the constant K = 2.24 is used in the trimming criterion, the modification of the MOM was carried out. Yahaya Syed et al. (2006) showed that these estimators are able to control the Type I error even under extreme violations of the assumptions. They also modified the criterion for choosing the sample values for the modified one-step M-estimator (MOM).
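The MOM estimator and the efficiency check described above can be sketched in Python. This is not the MATLAB program mentioned in the text: it is an illustrative version that assumes NumPy, the MADn scale of Wilcox and Keselman (2003), and K = 2.24; the names `madn`, `mom` and `relative_efficiency` are hypothetical, and the `scale` argument stands for any alternative scale estimator one wishes to substitute.

```python
import numpy as np

def madn(x):
    """Median absolute deviation rescaled by 0.6745 (MADn)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x))) / 0.6745

def mom(x, k=2.24, scale=madn):
    """Modified one-step M-estimator: the mean of the observations whose
    distance from the median does not exceed k times the scale estimate."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    keep = np.abs(x - median) <= k * scale(x)
    return x[keep].mean()

def relative_efficiency(n=20, reps=5000, seed=0, scale=madn):
    """Monte Carlo ratio: standard error of the sample mean divided by the
    standard error of MOM, for standard normal samples of size n."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    means = samples.mean(axis=1)
    moms = np.array([mom(row, scale=scale) for row in samples])
    return means.std(ddof=1) / moms.std(ddof=1)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
estimate = mom(data)   # the value 100 is flagged and excluded from the average
```

Passing a different function as `scale` to `relative_efficiency` gives a rough check of whether the constant 2.24 still yields a high ratio with that scale estimator, which is the kind of comparison the paragraph above reports.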

2.3.3 Winsorized Modified one step M-estimator (MOM)

The winsorized modified one-step M-estimator (MOM) is a measure of central tendency: it is the mean obtained after the outliers at each end, identified by the MOM trimming criterion, are replaced by the nearest smallest and largest values among the remaining data (Maher and Al-Khazaleh, 2009). According to Wilcox (1997), the mean is one of the most popular location estimators, but its weakness is that the tails of the distribution control its value, as is clear from its unbounded influence function, its breakdown point of 0, and its lack of qualitative robustness. One solution is to give the extreme values less weight and to pay more attention to the values nearer the center. Using a robust measure of center such as the sample winsorized MOM in place of the sample mean is therefore another, easier way to tackle the sensitivity of the sample mean. The sample winsorized MOM is calculated with respect to the trimmed outliers; the winsorized sample is constructed as follows (Wilcox, 1997, p. 35).

$i_1$: denotes the number of the smallest outliers in the data.

$i_2$: denotes the number of the largest outliers in the data.

Consequently, the winsorized sample is constructed according to the following formula:

$$W_i = \begin{cases} X_{(i_1+1)}, & X_i \le X_{(i_1+1)} \\ X_i, & X_{(i_1+1)} < X_i < X_{(n-i_2)} \\ X_{(n-i_2)}, & X_i \ge X_{(n-i_2)} \end{cases}$$ [2.3]

The winsorized sample is obtained from the remaining observations: the $i_1$ smallest observations are pulled up to the order statistic $X_{(i_1+1)}$ and the $i_2$ largest observations are pulled down to the order statistic $X_{(n-i_2)}$. Finally, the estimate of the population winsorized mean is $\bar{X}_w = \frac{1}{n}\sum_{i=1}^{n} W_i$.
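The winsorizing step described above can be sketched as follows. This is a minimal Python illustration, assuming that outliers are flagged with the MOM criterion based on MADn and K = 2.24; the function name `winsorized_mom` and the variable names `i1` and `i2` (the counts of small and large outliers) are chosen for the example.

```python
from statistics import mean, median


def winsorized_mom(x, k=2.24):
    """Return the winsorized MOM and the winsorized sample.

    Outliers are flagged with the MOM criterion |x_i - median| > k * MADn,
    then replaced by the nearest remaining order statistic on their side.
    """
    xs = sorted(x)
    n = len(xs)
    m = median(xs)
    s = 1.4826 * median([abs(v - m) for v in xs])   # MADn
    i1 = sum(1 for v in xs if v - m < -k * s)       # number of small outliers
    i2 = sum(1 for v in xs if v - m > k * s)        # number of large outliers
    low, high = xs[i1], xs[n - i2 - 1]              # X_(i1+1) and X_(n-i2)
    w = [min(max(v, low), high) for v in xs]        # pull outliers inwards
    return mean(w), w


estimate, sample = winsorized_mom([2, 3, 4, 5, 6, 7, 8, 50])
print(sample)     # the value 50 is replaced by 8
print(estimate)   # 43 / 8 = 5.375
```

Unlike trimming, winsorizing keeps the sample size at n, so the outlier still contributes one observation, but only with the value of its nearest non-outlying neighbour.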

2.3.4 Median

The median is the most popular of the robust measures of center, and it was used early on in statistical process control charts. Bluman (2009, p. 109) defines it as "the midpoint of the data array". Mathematically, it is defined as follows:

$$M = \begin{cases} X_{((n+1)/2)}, & n \text{ odd} \\ \frac{1}{2}\left(X_{(n/2)} + X_{(n/2+1)}\right), & n \text{ even} \end{cases}$$

where $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ are the sample order statistics (Chernobai and Rachev, 2005).

The main properties of the sample median can be summarized as follows:

Its breakdown point is 0.5, the maximum possible value (Geyer, 2006; Hampel, 2000).

It is easy to calculate, since it is simply the middle value of the ordered data (Bluman, 2009, p. 116).

It is not affected by outliers or by the behavior of the tails of the distribution.

Its efficiency under the normal distribution decreases as the sample size increases, approaching about 64%.

Because the median depends only on the middle value of the ordered data, it discards much of the information in the sample, which can be misleading for long-tailed distributions (Janacek and Meikle, 1997).

It is difficult to use in mathematical equations.

It is used when dealing with open-ended distributions (Bluman, 2009, p. 116).
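The figure of about 64% for the efficiency of the median can be traced to standard large-sample results. The short derivation below is only a sketch, assuming independent normal observations with mean θ and variance σ²; the symbols f (the density), θ and σ are introduced here for illustration.

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},
\qquad
\operatorname{Var}(M) \approx \frac{1}{4 n f(\theta)^2} = \frac{\pi \sigma^2}{2n}
\quad \text{since } f(\theta) = \frac{1}{\sigma\sqrt{2\pi}},
\qquad
\frac{\operatorname{Var}(\bar{X})}{\operatorname{Var}(M)} \to \frac{2}{\pi} \approx 0.637 .
```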

2.3.5 Hodges-Lehmann estimator

The Hodges-Lehmann estimator belongs to the class of R-estimators. It was proposed in 1963 by Joseph Hodges and Erich Lehmann and, independently, by Pranab Kumar Sen. It is used to estimate the location parameter in one sample or in two samples. Brown and Kildea (1978) define the Hodges-Lehmann estimator as follows:

Suppose that $X_j = \theta + \varepsilon_j$, $j = 1, 2, \dots, n$, where the $\varepsilon_j$ are i.i.d. random variables symmetric about zero, with distribution function G and continuous bounded density g. The H-L estimator of θ is the median of $\{(X_i + X_j)/2,\ 1 \le i \le j \le n\}$, and an asymptotically equivalent estimator is the median of $\{(X_i + X_j)/2,\ 1 \le i < j \le n\}$.

The main properties of the sample Hodges-Lehmann estimator are as follows:

Its breakdown point is 29% (Hampel, 2000).

It assumes a distribution that is symmetric about the parameter θ (Hodges, 1967).

It is robust against gross errors, and its asymptotic relative efficiency (ARE) is the same as that of the Wilcoxon signed rank test (Hodges, 1967).

Its ARE is 0.955 when the underlying distribution is normal, and it is often greater than unity when the underlying distribution is non-normal (Lehmann, 1983).

It has some "robust" properties (Bickel, 1965).

It is costly for large samples: a direct computation forms all pairwise averages and therefore needs on the order of $n^2$ operations (Antille, 1974).
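The one-sample estimator can be computed directly from its definition as the median of the pairwise averages. The following Python sketch is only an illustration (the function name `hodges_lehmann` is chosen for the example); it uses the pairs with i ≤ j and makes the quadratic cost visible, since it builds all n(n + 1)/2 averages.

```python
from statistics import median


def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: median of (x_i + x_j) / 2, i <= j."""
    n = len(x)
    pairwise = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(pairwise)


print(hodges_lehmann([1, 2, 3, 4, 100]))   # 3.0, while the sample mean is 22.0
```

A single gross error barely moves the estimate here, whereas it pulls the sample mean far away from the bulk of the data.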

2.4 Robust Scale Estimators

The sample standard deviation, s, is the most commonly used estimator of scale. It is the most efficient estimator when the data come from a normal distribution, but it does not keep this efficiency under non-normal distributions, and it is very sensitive to outliers. A statistic has robustness of efficiency when it is highly efficient not just in one situation but across different situations. Efficient here means that the estimate is close to the optimal estimate for the distribution of the data of interest. Moreover, since the scale estimate represents the dispersion of the quality characteristics and is an important part of the Hotelling T2 statistic, it deserves particular attention. To overcome this problem, researchers have proposed robust versions of the Hotelling T2 statistic that use robust scale estimators, which are insensitive to outliers.

There are several robust scale estimators; in this thesis the median absolute deviation (MADn), Sn and Tn are used. The robust scale estimators Sn and Tn were proposed by Rousseeuw and Croux (1993). All of these estimators have the highest breakdown point and a bounded influence function, are easy to compute, have reasonable efficiency when the observations come from a normal distribution, and are positive definite.

The following subsections give more information about the construction and the main properties of these estimators:

2.4.1 Median Absolute Deviation (MADn)

The median absolute deviation (MADn) is regarded as a robust alternative to the standard deviation. It was first promoted by Hampel (1974) and is given by the following formula:

$$\mathrm{MAD}_n = 1.4826 \, \operatorname{med}_i \left| x_i - \operatorname{med}_j x_j \right|$$

where the constant 1.4826 makes MADn consistent with the usual scale parameter σ of a normal distribution. The main properties of MADn, as investigated by Rousseeuw and Croux (1993), are as follows:

1. The breakdown point of MADn is 50% (Hampel, 2000).

2. In the case of the standard normal distribution, its influence function is bounded.

3. Its gross-error sensitivity is 1.167, the smallest value attainable by any scale estimator at the normal distribution.

4. It takes a symmetric view of dispersion about the sample median: it finds the symmetric interval around the median that contains 50% of the data.

5. It is easy to compute.

6. Its efficiency at the normal distribution is 37%.

7. It requires a location estimate of the data, namely the median.

8. It is positive definite, as shown by the simulation discussed above.

MADn is not a natural choice when the distribution is asymmetric, because it takes a symmetric view of dispersion. It depends on a measure of central tendency, the estimated median, and is built from the distances between the observations and the median. Observations lie above the median in some cases and below it in others, yet their absolute distances are treated alike, so the approach is unnatural for asymmetric distributions (Rousseeuw and Croux, 1993; Yahaya, Othman, and Keselman, 2006).
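The sensitivity of the standard deviation and the stability of the median absolute deviation can be seen on a small example. The sketch below is only an illustration in Python; the function name `mad_n` and the data are chosen for the example, and the constant 1.4826 is the consistency factor mentioned above.

```python
from statistics import median, stdev


def mad_n(x):
    """1.4826 times the median of the absolute deviations from the median."""
    m = median(x)
    return 1.4826 * median([abs(v - m) for v in x])


clean = [2, 3, 4, 5, 6, 7, 8, 9]
contaminated = [2, 3, 4, 5, 6, 7, 8, 50]   # last value replaced by an outlier

# The standard deviation inflates strongly; MADn stays the same here.
print(stdev(clean), stdev(contaminated))
print(mad_n(clean), mad_n(contaminated))
```

In this example the outlier changes only the largest absolute deviation, so the median of the deviations, and hence the estimate, is unchanged.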


2.4.2 Tn

Another scale estimator, proposed by Rousseeuw and Croux (1992), is the robust scale estimator Tn. Its formula and main properties are as follows:

$$T_n = 1.3800 \, \frac{1}{h} \sum_{k=1}^{h} \left\{ \operatorname{med}_{j \ne i} \left| x_i - x_j \right| \right\}_{(k)}$$

such that $h = \lfloor n/2 \rfloor + 1$. This estimator has the highest breakdown point (50%), a continuous and bounded influence function and an efficiency of 52%. It is positive definite and has a simple, explicit formula. It is a useful estimator for asymmetric distributions.
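A minimal sketch of this estimator in Python is given below, assuming it takes the usual Tn form of Rousseeuw and Croux: 1.3800 times the mean of the h = floor(n/2) + 1 smallest values of the medians of |x_i − x_j| over j ≠ i. The function name `t_n` and the use of the ordinary sample median for the inner medians are assumptions made for the example.

```python
from statistics import median


def t_n(x):
    """Sketch of Tn: 1.3800 * mean of the h smallest inner medians."""
    n = len(x)
    h = n // 2 + 1
    # For each observation, the median distance to all the other observations.
    inner = sorted(
        median([abs(x[i] - x[j]) for j in range(n) if j != i])
        for i in range(n)
    )
    return 1.3800 * sum(inner[:h]) / h


print(t_n([1, 2, 3, 4, 100]))   # the inner median of the outlier (97.5) is not used
```

Because only the h smallest inner medians enter the average, an isolated outlier does not affect the result in this example.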


2.4.3 Sn

Let $x_1, x_2, \dots, x_n$ be a sample of data; then the robust scale estimator Sn is defined as follows:

$$S_n = c \, \operatorname{med}_i \left\{ \operatorname{med}_j \left| x_i - x_j \right| \right\}$$

where c is a correction factor that makes Sn unbiased for finite samples. The formula reads as follows: for each i we compute the median of $\{|x_i - x_j|;\ j = 1, \dots, n\}$, which yields n numbers, and the median of these numbers gives the estimate (Rousseeuw and Croux, 1993).

The Sn estimator is very similar in form to MADn and is used as an alternative to it; the only difference is that the $\operatorname{med}_j$ operation is moved outside the absolute value. The estimator is a simple mixture of medians and absolute values. Sn measures the distances between the observations themselves, while the median absolute deviation measures the distances between the observations and the median.

The main properties of the Sn estimator, as investigated by Rousseeuw and Croux (1993), are as follows:

The Sn estimator has the maximal breakdown point of 50%.

In the case of the standard normal distribution, the influence function of the Sn estimator is bounded.

Its efficiency at the normal distribution is 58%, which is considerably higher than the 37% of MADn.

The Sn estimator measures a typical distance between observations, rather than how far the observations lie from a central value, which remains valid for asymmetric distributions.

It does not depend on symmetry, because it only measures the distances between the values in the data set.

It is an affine equivariant estimator: if the observations $x_i$ are transformed to $a x_i + b$, then the value of Sn is multiplied by $|a|$.

It is positive definite.

Sn is regarded as suitable for both symmetric and asymmetric distributions because it measures the distances between values, while the standard deviation and MADn measure the distances of the values from a central location.
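The nested medians behind Sn, and the way the estimate scales under a linear transformation of the data, can be sketched as follows. This is a simplified Python illustration, not the exact algorithm of Rousseeuw and Croux: it assumes the ordinary sample median for both the inner and the outer median and the constant c = 1.1926, which they use for consistency at the normal distribution; the function name `s_n` is chosen for the example.

```python
from statistics import median


def s_n(x, c=1.1926):
    """Sketch of Sn: c * med_i { med_j |x_i - x_j| } with ordinary medians."""
    n = len(x)
    # For each i, the median distance from x_i to every observation.
    inner = [median([abs(x[i] - x[j]) for j in range(n)]) for i in range(n)]
    return c * median(inner)


data = [1, 2, 3, 4, 100]
print(s_n(data))                             # 1.1926 * 2 = 2.3852
print(s_n([-2 * v + 1 for v in data]))       # multiplied by |a| = 2
```

No location estimate is needed at any step, which is why the estimator does not rely on the symmetry of the distribution.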

2.4.4 α-Trimmed Covariance Matrix

As the previous sections have shown, the variance-covariance matrix is sensitive to extreme data. It is therefore necessary to find an alternative estimator to use when outliers are present. The trimmed variance-covariance matrix is one of the robust estimators that can be used instead of the traditional variance-covariance matrix. Its calculation depends on the winsorized variance-covariance matrix: the formula combines the total number of observations, the number of observations that remain after trimming, and the winsorized variance-covariance matrix. The winsorized variance-covariance matrix is calculated in the same way as the traditional variance-covariance matrix, but from the winsorized sample.

The trimmed mean, as the robust location estimator, and the trimmed variance-covariance matrix were used to construct the modified Hotelling control charts. The outliers are trimmed by means of a modified Mahalanobis distance, in which the median vector replaces the sample mean vector and the robust covariance matrices based on the three scale estimators MADn, Sn and Tn replace the variance-covariance matrix. The trimming and replacement of data by the modified Mahalanobis distance depend on the trimming percentage, and not on the two largest Mahalanobis distance values as in Alloway and Raghavachari (1990). Since the α-trimmed variance-covariance matrix depends on the winsorized variance-covariance matrix, its properties depend on those of the winsorized matrix. According to Mingxin Wu (2006), the properties of the winsorized variance-covariance matrix are as follows:

It provides the highest breakdown point and is bounded for different distributions.

It is more efficient for light-tailed symmetric models.

It is highly efficient for heavy-tailed or skewed distributions.

When the contaminating points are concentrated around the center, it has the best performance among the high breakdown point estimators.

2.5 The Quality Control Charts

Quality denotes any excellent achievement, such as a product, a service or anything else, that is considered to exceed expectations. For instance, a customer expects a different performance from a plain steel washer than from a chrome-plated steel washer; when the performance of a product exceeds our expectations, we consider it a quality product. Another definition is given in ISO 9000:2000, where quality is the degree to which a set of inherent characteristics fulfils requirements. The degree can be expressed by adjectives such as poor, good and excellent, whereas inherent means existing in something as a permanent characteristic. The characteristics may be qualitative or quantitative (Besterfield, 2004, p. 2).

Control charts, in turn, are used to monitor the performance and capability of a process, which are affected by a shifted mean, by outliers and by deviations from the in-control distribution. Control charts may be univariate or multivariate. Since this study is concerned with multivariate control charts, the most important requirements of multivariate control charts according to Sepulveda and Nicholas (1997) are as follows:

The Type I error must remain fixed, unaffected by changes in the number of random variables.

A judgment criterion must be available to interpret the signals in the control charts.

The computational effort should be modest enough to analyze each sample when the process is not automated.

Quality control, on the other hand, is defined as the use of techniques and activities to improve products. According to Besterfield (2004, p. 2), these techniques and activities consist of the following related steps:

Specification of what is required.

Design of the product or service to meet the specifications.

Production to meet the intent of the specifications.

Inspection to determine conformance to the specifications.

Review of usage to provide information about the specification of requirements.

Through these activities, customers can obtain the best products or services at the lowest prices or costs.