Tolerance intervals are widely used in clinical and industrial applications, including quality control, environmental modelling, pharmaceutical studies and manufacturing. The two-sided frequentist tolerance interval, say (L, U), contains at least a specified proportion $\beta$ of the population with a specified confidence $\gamma$. Here L and U are called the lower and upper tolerance limits, respectively. The formal definitions of one- and two-sided tolerance intervals are given below:
Let $X$ be a continuous random variable with cumulative distribution function (c.d.f.) $F(x;\theta)$, where $\theta$ is a possibly vector-valued unknown parameter. Let $L$ and $U$ be respectively the lower and upper limits of a tolerance interval such that $L < U$, and let $P$ denote the probability set function.
The one-sided $(\beta,\gamma)$ tolerance interval associated with the lower tolerance limit $L$, of the form $[L, \infty)$, is required to satisfy the condition
$$P_L\{P_X(X \ge L \mid L) \ge \beta\} = \gamma.$$
The one-sided $(\beta,\gamma)$ tolerance interval associated with the upper tolerance limit $U$, of the form $(-\infty, U]$, is required to satisfy the condition
$$P_U\{P_X(X \le U \mid U) \ge \beta\} = \gamma.$$
The two-sided $(\beta,\gamma)$ tolerance interval $[L,U]$ satisfies
$$P_{L,U}\{P_X(L \le X \le U \mid L, U) \ge \beta\} = \gamma.$$
The construction of two-sided tolerance intervals is more challenging than that of its one-sided counterpart.
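As a small illustration of the content requirement in the definitions above, the following Python sketch computes the proportion of a population captured by a fixed interval and checks it against a target content. The normal population and the function names `normal_cdf`, `content` and `meets_content` are illustrative assumptions and do not come from the original text.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """C.d.f. of N(mu, sigma^2), written with the complementary error function."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def content(lower, upper, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma^2) population lying in [lower, upper]."""
    return normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)


def meets_content(lower, upper, beta, mu=0.0, sigma=1.0):
    """True when the interval captures at least a proportion beta of the population."""
    return content(lower, upper, mu, sigma) >= beta


if __name__ == "__main__":
    print(round(content(-1.96, 1.96), 4))   # content of a fixed interval under N(0, 1)
    print(meets_content(-1.5, 1.5, 0.95))   # a narrower interval falls short of beta = 0.95
```

In practice $L$ and $U$ are random because they are computed from a sample, so the confidence is the probability, over repeated samples, that `meets_content` returns true.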
The choice of prior distribution is the most critical and criticized point of Bayesian analysis. Undeniably, selecting the prior distribution which is the key to Bayesian inference is a challenging task. According to Ghosh et al. (2008), with sufficient information from past experience, expert opinion or previously collected data, subjective priors are ideal, and indeed should be used for inferential purposes. However, we can use Bayesian techniques efficiently even without adequate prior information with some default or objective priors. A specific objectivity criterion for such priors which has found appeal to both frequentists and Bayesians is the probability matching criterion. Based on Datta and Sweeting (2005), a probability matching prior (PMP) is a prior distribution under which the posterior probabilities of certain regions coincide with their coverage probabilities, either exactly or approximately. Probability matching priors are applied in the construction of tolerance intervals which has important applications in industry.
Ong and Mukerjee (2011) developed two-sided Bayesian tolerance intervals, with approximate frequentist validity, for a future observation in balanced one-way and two-way nested random effects models using probability matching priors (PMP). On the other hand Krishnamoorthy and Lian (2012) examined closed-form approximate tolerance intervals by the modified large sample (MLS) approach. The objective of this work is to evaluate and perform a comparative study via Monte Carlo simulation between the PMP and MLS tolerance intervals for both normal and non-normal error distributions when the balanced one-way random effects models are of concern. The non-normal error distributions which are applied include the t-distribution, skew-normal (see Azzalini, 1985) and the generalized lambda distribution (see Karian and Dudewicz, 2000). Both t- and skew-normal distributions have heavier tails than the normal distribution while the generalized lambda distribution is a flexible four parameter distribution which is able to produce distributions with various shapes and skewness.
The second part of the research aims at developing two-sided tolerance intervals in a fairly general framework of parametric models. Higher order asymptotics are developed to obtain explicit analytical formulae for these intervals in both Bayesian and frequentist setups. These lead to a characterization of probability matching priors that ensure approximate frequentist validity of two-sided tolerance intervals. For cases where probability matching priors are difficult to obtain, we develop purely frequentist tolerance intervals. We also apply these intervals to real data.
2. Literature review
The construction of tolerance intervals for continuous distributions has been studied extensively since the pioneering work of Wilks (1941, 1942). Guttman (1970) and Hahn and Meeker (1971) provided informative reviews of the literature up to their respective dates, while Krishnamoorthy and Mathew (2009) gave an excellent and up-to-date account of tolerance intervals. Several authors explored tolerance intervals for the one-way random effects model in both balanced and unbalanced cases. Sahai and Ojeda (2004) gave a detailed study of fixed, random and mixed analysis of variance (ANOVA) models.
One-sided tolerance intervals for these models were investigated by, among others, Mee and Owen (1983), Mee (1984), Vangel (1992), Bhaumik and Kulkarni (1996), Krishnamoorthy and Mathew (2004) and Liao et al. (2005). The study of two-sided tolerance intervals is more challenging than that of their one-sided counterparts. Mee (1984) extended the procedures in Mee and Owen (1983) to find two-sided tolerance intervals; see Beckman and Tietjen (1989) for further results in this direction. Wolfinger (1998) presented a Bayesian simulation approach which handles different types of Bayesian tolerance intervals. Hoffman and Kringle (2005) constructed two-sided tolerance intervals for general random effects models in both balanced and unbalanced cases. Rebafka, Clémencon and Feinberg (2007) derived a new nonparametric bootstrap approach for two-sided mean coverage and guaranteed coverage tolerance limits for a balanced one-way random effects model.
Recently, Ong and Mukerjee (2011) derived two-sided Bayesian tolerance intervals with approximate frequentist validity for a future observation in balanced one-way and two-way nested random effects models using probability matching priors (PMP). Works which discuss probability matching priors include Datta and Mukerjee (2004), Datta and Sweeting (2005) and Ghosh et al. (2008). Krishnamoorthy and Lian (2012) studied closed-form approximate tolerance intervals based on the modified large sample (MLS) approach, which Krishnamoorthy and Mathew (2009) applied to tolerance intervals. The MLS approach is based on the procedure of Graybill and Wang (1980) for finding upper confidence limits for a linear combination of variance components. Krishnamoorthy and Lian (2012) also compared the MLS tolerance intervals with the tolerance intervals constructed using the generalized variable approach, which was introduced by Liao et al. (2005). In this work, the PMP and MLS intervals are applied to non-normal errors; the distributions of interest are the t-distribution, the skew-normal distribution (Azzalini, 1985) and the generalized lambda distribution, which was studied extensively by Karian and Dudewicz (2000).
In the second part of the study, we develop two-sided Bayesian and frequentist tolerance intervals for a general framework of parametric models. Probability matching priors for one-sided tolerance intervals were characterized in Mukerjee and Reid (2001). The tolerance intervals which will be studied involve the normal, Weibull and inverse Gaussian distributions.
Young (2010) provided a useful R package for obtaining tolerance intervals for discrete and continuous distributions as well as regression tolerance intervals. Krishnamoorthy and Mathew (2009) discussed tolerance intervals for non-normal distributions such as the log-normal, gamma, two-parameter exponential, Weibull and other related distributions. For the Weibull distribution, tolerance limits were constructed using the generalized variable method. Statistical problems concerning the Weibull distribution are not simple because the maximum likelihood estimators of the parameters are not available in closed form and must be computed numerically. Approximate methods that do not require simulation were proposed for constructing one-sided tolerance intervals in the Weibull case; these include Mann and Fertig (1975, 1977), Engelhardt and Bain (1977) and Bain and Engelhardt (1981). Krishnamoorthy and Mathew (2009) discussed Monte Carlo procedures for computing one-sided tolerance limits, estimating a survival probability and constructing lower limits for the stress-strength reliability involving the Weibull distribution. Tang and Doug (1994) proposed one-sided tolerance limits for the inverse Gaussian model and carried out Monte Carlo simulations to evaluate these limits in terms of coverage probability and average values.
We apply the two-sided tolerance intervals to real data. For the Weibull tolerance interval, we consider the shelf life data in Gacula and Kubala (1975). For the inverse Gaussian case, Chhikara and Folks (1989) note that the inverse Gaussian model fits the ball bearing failure data of Lieblein and Zelen (1956).
3. Two-sided tolerance intervals for the balanced one-way random effects model
Research methodology and outputs
3.1 The balanced one-way random effects model
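The balanced one-way random effects model is as follows:
$$Y_{ij} = \mu + \tau_i + e_{ij}, \quad i = 1,\dots,n, \; j = 1,\dots,t. \qquad (3.1)$$
Here $Y_{ij}$ denotes the $j$th observation in the $i$th class, $\mu$ is the population mean, $\tau_i$ is the random effect for the $i$th class and $e_{ij}$ is the observational error associated with $Y_{ij}$. The $\tau_i$ and $e_{ij}$ are independent, with $\tau_i \sim N(0,\sigma_\tau^2)$ and $e_{ij} \sim N(0,\sigma_e^2)$. Then the maximum likelihood estimator (MLE) of $\theta = (\mu, \sigma_\tau^2, \sigma_e^2)$ is given by $\hat\theta = (\hat\mu, \hat\sigma_\tau^2, \hat\sigma_e^2)$, where
$$\hat\mu = \bar Y, \quad \hat\sigma_\tau^2 = \frac{(1 - n^{-1})\,MSB - MSW}{t}, \quad \hat\sigma_e^2 = MSW.$$
In the above, $\bar Y$ is the grand mean of the $Y_{ij}$'s, while MSW and MSB are the usual mean squares within and between classes, i.e.,
$$MSW = \frac{1}{n(t-1)}\sum_{i=1}^{n}\sum_{j=1}^{t}(Y_{ij} - \bar Y_i)^2, \quad MSB = \frac{t}{n-1}\sum_{i=1}^{n}(\bar Y_i - \bar Y)^2,$$
$\bar Y_i$ being the mean of the $i$th class. In the following sections, we consider asymptotics as $n \to \infty$ so as to ensure the consistency of these MLEs.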
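The quantities in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: classes are indexed by i = 1,...,n with t observations each, the variance-component estimates use the standard ML expressions for a balanced design and assume the between-class estimate is positive, and the function names are hypothetical.

```python
import random


def simulate_one_way(n, t, mu, sigma_tau, sigma_e, rng):
    """Draw Y[i][j] = mu + tau_i + e_ij for n classes with t observations each."""
    data = []
    for _ in range(n):
        tau = rng.gauss(0.0, sigma_tau)          # one random effect per class
        data.append([mu + tau + rng.gauss(0.0, sigma_e) for _ in range(t)])
    return data


def anova_summary(data):
    """Return (grand mean, MSB, MSW) for balanced data given as n lists of length t."""
    n, t = len(data), len(data[0])
    class_means = [sum(row) / t for row in data]
    grand_mean = sum(class_means) / n
    ssb = t * sum((m - grand_mean) ** 2 for m in class_means)
    ssw = sum((y - m) ** 2 for row, m in zip(data, class_means) for y in row)
    return grand_mean, ssb / (n - 1), ssw / (n * (t - 1))


def variance_component_mles(data):
    """ML estimates (sigma_tau^2, sigma_e^2), assuming the first one comes out positive."""
    n, t = len(data), len(data[0])
    _, msb, msw = anova_summary(data)
    return ((1.0 - 1.0 / n) * msb - msw) / t, msw


if __name__ == "__main__":
    rng = random.Random(0)
    sample = simulate_one_way(n=25, t=2, mu=0.0, sigma_tau=1.0, sigma_e=1.0, rng=rng)
    print(anova_summary(sample))
    print(variance_component_mles(sample))
```

For the toy data `[[1, 3], [5, 7]]` the summary is a grand mean of 4, MSB = 16 and MSW = 2, which is easy to confirm by hand.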
3.2 The Bayesian tolerance interval with approximate frequentist validity
Following Ong and Mukerjee (2011), we consider a prior $\pi(\theta)$ (> 0) which is twice continuously differentiable. Let
, and , (3.2)
where and . Let be the common t-variate normal density of , where . For , define
The matrix is positive definite and by writing , we have in particular,
Under the balanced one-way random effects model given in (3.1), each $Y_{ij}$ is marginally distributed as $N(\mu, \sigma_\tau^2 + \sigma_e^2)$. A Bayesian tolerance interval under a prior $\pi$, for a future observation having the same distribution, is considered. According to Ong and Mukerjee (2011), such a tolerance interval, which is $\beta$-content with posterior credibility level approximately $\gamma$, is given by:
where , (is a standard normal cdf),
The interval in (3.3) has approximate frequentist validity, i.e., it is $\beta$-content with frequentist confidence level approximately $\gamma$, when $\pi$ is taken as a PMP. Following Ong and Mukerjee (2011), such a prior is given by
In our comparisons, we will consider the interval (3.3) based on the PMP as specified by (3.4) and (3.5).
3.3 Modified large sample (MLS) tolerance intervals
The MLS method was first proposed by Graybill and Wang (1980) in constructing confidence intervals. Krishnamoorthy and Mathew (2009) applied this method to construct two-sided tolerance intervals for some linear models.
According to Krishnamoorthy and Lian (2012), to construct the tolerance intervals for a distribution, let and , so that . Let
and (From (3.2)).
, , and
, and are mutually independent.
Note that , , and .
The construction of the tolerance interval simplifies to the construction of an upper confidence limit for and Krishnamoorthy and Lian (2012) provide a detailed discussion on the modified large sample tolerance intervals.
The MLS tolerance interval is given by:
where the MLS upper confidence limit for is given by
3.4 Monte Carlo simulation study
We perform a Monte Carlo simulation study to compare the performance of the two-sided Bayesian PMP tolerance interval and the MLS tolerance interval for the balanced one-way random effects model, with the experimental error following either the standard normal distribution or a non-normal distribution (the t-distribution, the skew-normal distribution or the generalized lambda distribution). In all cases the PMP and MLS tolerance intervals were computed as if the normality assumption held, even when the data came from another distribution. Our purpose is to see the effect on the expected width and the confidence level when the distribution generating the data deviates from the normal.
The two-sided PMP and MLS tolerance intervals were constructed for content $\beta$ and nominal confidence level 0.95, for data from both normal and non-normal experimental error distributions. For each simulated interval, the content was calculated as $F(U) - F(L)$, where $F$ is the c.d.f. of the distribution generating the data and $U$ and $L$ respectively represent the upper and lower bounds of the tolerance interval. This process was repeated 2500 times for various combinations of $(n, t)$ and $\rho$, the intra-class correlation coefficient, where $\rho = \sigma_\tau^2/(\sigma_\tau^2 + \sigma_e^2)$. The confidence level, i.e. the proportion of times the content of the simulated intervals was at least $\beta$, was then computed. The confidence level depends on the variance components through $\rho$. We do not vary the mean $\mu$ in the model (3.1) as it has no impact on the results.
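The simulation loop described above can be sketched as follows. This is only an outline of the procedure: a naive plug-in interval (grand mean plus or minus a normal quantile times the estimated total standard deviation) stands in for the PMP and MLS intervals, whose formulae are not reproduced here, and the content 0.90, the constant `Z_90` and all function names are assumptions made for illustration.

```python
import math
import random

Z_90 = 1.6449  # upper 5% point of N(0, 1); +/- Z_90 covers 90% of a normal population


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simulate_one_way(n, t, rho, rng):
    """n classes, t observations each; mu = 0 and total variance 1, so sigma_tau^2 = rho."""
    sd_tau, sd_e = math.sqrt(rho), math.sqrt(1.0 - rho)
    data = []
    for _ in range(n):
        tau = rng.gauss(0.0, sd_tau)
        data.append([tau + rng.gauss(0.0, sd_e) for _ in range(t)])
    return data


def plug_in_interval(data, z=Z_90):
    """Naive stand-in interval, NOT the PMP or MLS interval of the text."""
    n, t = len(data), len(data[0])
    class_means = [sum(row) / t for row in data]
    grand_mean = sum(class_means) / n
    msb = t * sum((m - grand_mean) ** 2 for m in class_means) / (n - 1)
    msw = sum((y - m) ** 2 for row, m in zip(data, class_means) for y in row) / (n * (t - 1))
    total_var = (msb + (t - 1) * msw) / t      # unbiased for sigma_tau^2 + sigma_e^2
    half_width = z * math.sqrt(total_var)
    return grand_mean - half_width, grand_mean + half_width


def simulated_confidence(n, t, rho, beta=0.90, reps=2500, seed=0):
    """Proportion of simulated intervals whose true content is at least beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lower, upper = plug_in_interval(simulate_one_way(n, t, rho, rng))
        # The population of a future observation is N(0, 1) under this setup.
        if normal_cdf(upper) - normal_cdf(lower) >= beta:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(simulated_confidence(n=25, t=2, rho=0.4))
```

The plug-in interval ignores estimation error, so its simulated confidence level falls well short of 0.95; that gap is what properly constructed tolerance intervals are designed to close.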
Tables 1 and 2 give the confidence levels and expected widths for various $\rho$ and some combinations of $n$ and $t$ when the error distribution is standard normal. Both PMP and MLS tolerance intervals are conservative in terms of confidence level for small and moderate values of $\rho$, but the PMP method is slightly less conservative for moderate $\rho$. The MLS tolerance interval seems to work well for smaller sample sizes and shows slight conservatism as the number of classes increases. The PMP tolerance interval appears to be more accurate for larger values of $\rho$ and has a confidence level close to the nominal value 0.95 when the number of classes is around 25 to 50 with $t$ fixed at 2. It is necessary to maintain the balance between $n$ and $t$ to achieve a confidence level close to 0.95; for the PMP case this requires a ratio $n:t$ of approximately 12.5:1 to 25:1. The expected widths of the MLS tolerance interval are larger than those of the PMP interval. These larger widths enable the MLS interval to attain a confidence level closer to 0.95 for smaller sample sizes, but make it conservative as the number of classes increases.
We study data generated with experimental error following the t-distribution with degrees of freedom 3, 5, 10, 15 and 25. When, the confidence level for both PMP and MLS tolerance intervals gets closer to 0.95 as the degrees of freedom increase from 15 onwards. It seems that the confidence level happens to be close to the nominal level 0.95 for degrees of freedom as small as 3 for both cases as and 0.999. The expected widths for both instances are comparable to the standard normal case.
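The probability density function (pdf) for the skew-normal distribution (Azzalini, 1985) is
$$f(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right), \quad -\infty < x < \infty,$$
where $\phi$ and $\Phi$ are the standard normal pdf and c.d.f., and $\xi$, $\omega$ and $\alpha$ are the location, scale and shape parameters respectively.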
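A minimal Python sketch of the skew-normal density and of one way to generate skew-normal errors is given below. The sampler uses the well-known stochastic representation of a skew-normal variate as a weighted sum of a half-normal and an independent normal variate; whether the original study generated its errors this way is not stated, so this choice, like the function names, is an assumption.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, location=0.0, scale=1.0, shape=0.0):
    """Azzalini's skew-normal density; shape = 0 gives the normal density."""
    z = (x - location) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)


def skew_normal_draw(rng, location=0.0, scale=1.0, shape=0.0):
    """One draw via Z = delta*|U0| + sqrt(1 - delta^2)*U1 with U0, U1 iid N(0, 1)."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z = delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1
    return location + scale * z


if __name__ == "__main__":
    rng = random.Random(0)
    errors = [skew_normal_draw(rng, shape=3.0) for _ in range(5)]
    print(errors)
```

With location 0 and scale 1 the mean of the draws is $\delta\sqrt{2/\pi}$, where $\delta = \alpha/\sqrt{1+\alpha^2}$, which gives a quick sanity check on simulated errors.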
The data generated has eij following a skew normal distribution with , and shape=. The error distribution follows a standard normal when . We study the tolerance intervals for the skew-normal distribution whose tail is heavier than the normal distribution involving different shape parameters. Both PMP and MLS tolerance intervals appear to have confidence levels close to 0.95 when is small i.e. 0.40 for. The results become conservative as increases and are still acceptable for. However, the results involving the expected widths and confidence levels tend to be comparable to that of the standard normal case when and 0.999. We did not report the results for the negative shape parameters here because they were very similar to the positive shape parameters.
When the experimental error follows the generalized lambda distribution (GLD), we use its percentile function in the Ramberg and Schmeiser parameterization:
$$Q(u) = \lambda_1 + \frac{u^{\lambda_3} - (1-u)^{\lambda_4}}{\lambda_2},$$
where $0 \le u \le 1$. $\lambda_1$ and $\lambda_2$ are respectively the location and scale parameters while $\lambda_3$ and $\lambda_4$ jointly determine the shape (with $\lambda_3$ mostly affecting the left tail and $\lambda_4$ the right tail).
We use the normal approximation parameters, GLD (0, 0.1975, 0.1349, 0.1349) suggested by Karian and Dudewicz (2000).
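Because the GLD is defined through its percentile function, errors can be generated by evaluating that function at uniform random numbers. The sketch below assumes the Ramberg and Schmeiser form $Q(u) = \lambda_1 + \{u^{\lambda_3} - (1-u)^{\lambda_4}\}/\lambda_2$ and uses the parameter values quoted above; the function names are illustrative.

```python
import random

NORMAL_LIKE = (0.0, 0.1975, 0.1349, 0.1349)  # GLD parameters quoted in the text


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Ramberg-Schmeiser percentile function of the generalized lambda distribution."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def gld_draw(rng, params=NORMAL_LIKE):
    """Inverse-c.d.f. sampling: evaluate the percentile function at a uniform variate."""
    return gld_quantile(rng.random(), *params)


if __name__ == "__main__":
    # Compare a few GLD percentiles with the familiar standard normal ones.
    for u in (0.5, 0.9, 0.975):
        print(u, round(gld_quantile(u, *NORMAL_LIKE), 3))
    rng = random.Random(0)
    print([round(gld_draw(rng), 3) for _ in range(5)])
```

With these parameters the 97.5th percentile comes out close to the standard normal value of 1.96, which is why this parameter set serves as a normal approximation.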
The results for these estimates are comparable with the standard normal case. We study the performance of the GLD experimental error by varying the parameters. The results are conservative for and more accurate for and 0.999. For and 1.00 where , the distribution is no longer symmetrical. The PMP and MLS tolerance intervals seem to be comparable with the normal case when and 0.999.
Table 1. Simulated confidence levels for two-sided tolerance intervals using the probability matching prior (PMP) method and modified large sample (MLS) procedure where the error distribution follows the standard normal distribution.
Table 2. Expected widths and their respective standard errors (bracketed) of the tolerance intervals where the error distribution follows the standard normal distribution using the PMP and MLS methods.
4. Two-sided Bayesian and frequentist tolerance intervals
Research methodology and outputs
4.1 Two-sided Bayesian tolerance intervals
We shall consider independent and identically distributed scalar-valued observations $X_1,\dots,X_n$ from a population specified by a density $f(x;\theta)$, where $\theta = (\theta_1,\dots,\theta_p)'$ is an unknown parameter that belongs to the p-dimensional Euclidean space or some open subset thereof.
Let $\pi(\theta)$ (> 0) be a smooth prior on $\theta$. We work under the assumptions of Johnson (1970). For the frequentist calculations, we require the Edgeworth assumptions of Bhattacharya and Ghosh (1978). These two sets of assumptions hold under wide generality for models from exponential and curved exponential families and also for many other models such as Cauchy, Student's t and so on (Datta and Mukerjee, 2004).
Let be the c.d.f. corresponding to and let be the th quantile of the population represented by .The tolerance interval [,], where
,(> 0) satisfy =, covers a proportion of the population.
For notational simplicity, we write = and = which leads to the Bayesian tolerance interval:
where= is the maximum likelihood estimator of based on the data , and
where , are functions of X (these may as well involve the prior). To choose and so that (4.1) is -content with posterior credibility level , namely,
where. being the posterior probability measure under the prior .
The following notation will be used:
is the per observation observed information matrix at . Write = .
Let and be vectors with sth elements given by and respectively. We assume that is non-null for every , which implies that is also non-null and that, as a result, the quantity M = is positive.
Let == , =,
=, = . (4.4)
While defining the quantity M and other quantities in this work, the summation convention is followed, with implicit sums on repeated sub- or superscripts in a product ranging over 1… p.
Theorem 1. The tolerance interval [,] is -content with posterior credibility level +, i.e., (4.3) holds, provided and in the expression (4.2) for satisfy = and
where q is the -th quantile of the standard univariate normal distribution, and
=, =. (4.5)
In Theorem 1, is free from the prior, while involves the prior only though the term. With and as in Theorem 1, it is easy to find satisfying (4.2). All the symbolic computation such as partial derivatives for and , as evaluated at and x = or can be readily obtained via MATLAB symbolic computation. It is easy to write a program in MATLAB to compute the tolerance intervals.
4.2 Frequentist tolerance interval via probability matching prior
The prior under which it is -content not only with posterior credibility level but also with frequentist confidence level is referred to as a probability matching prior for a two-sided tolerance interval .
Let I = denote the per observation expected Fisher information matrix at , and write =, where . Note that because of our assumption that the vector is nonnull for every. Then the following result, characterizing probability matching priors in the present context, holds.
Theorem 2. The Bayesian tolerance interval in Theorem 1 is -content with frequentist confidence level if and only if the prior satisfies the partial differential equation
= 0. (4.6)
If p = 1, i.e., $\theta$ is a scalar, then both and I are scalars. Since is assumed non-null, is either positive or negative for all $\theta$. Thus,
The matching condition reduces to with the unique solution ; Jeffreys' prior. Thus we obtain a probability matching property of Jeffreys' prior for two-sided tolerance intervals in the case of scalar .
Alternative choices for
In general, if a probability matching prior, say is available, a -content with frequentist level two-sided tolerance interval is simply constructed as
Simulation studies help us to determine the appropriate choice of .
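Example 1: Consider the Weibull model $f(x;\theta) = (\theta_2/\theta_1)(x/\theta_1)^{\theta_2 - 1}\exp\{-(x/\theta_1)^{\theta_2}\}$, $x > 0$, where $\theta = (\theta_1,\theta_2)'$ and $\theta_1 > 0$, $\theta_2 > 0$.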
Here = and =, where and .
Hence one can check that
and, the assumption that is non-null for every holds. In this example, , and , and hence by (4.7), is a constant free from . As a result, emerges as a solution to the matching condition (4.6).
Solutions for the matching conditions are readily available for the Weibull example. However, there are cases such as the inverse Gaussian model where finding a solution to (4.6) can be difficult. This is because such models do not admit analytical expressions for and , and hence do not allow us to write (4.6) explicitly. Thus, it is not always possible to obtain a two-sided frequentist tolerance interval using a matching prior in Theorem 1. A direct method is required to construct two-sided frequentist tolerance intervals. Interestingly, even for purpose, the Bayesian approach continues to be useful.
4.3 Purely frequentist two-sided tolerance intervals
The Bayesian tolerance interval depends on the prior only through the term , of order , in the expression for. This motivates us to consider a purely frequentist tolerance interval of the same form, with replaced appropriately by a term which is also of order but does not involve any prior
We write to make explicit the dependence of on , and define = and , where , . Also, let = , where =, with
and defined similarly, replacing by in (4.8).
Theorem 3. The tolerance interval [,],
where =, = and
with, , as in (7), and
= , (4.9)
is -content with frequentist confidence level +.
A simple interpretation of in (4.9) can be written as + where
The form (4.9) has the advantage that it allows calculation of even when analytical expressions for and are not available, because one needs only and for this purpose.
With and as in Theorem 3, there are numerous choices of . These include:
==, == and ==. (4.11)
Simulation studies enable us to choose the suitable . For the inverse Gaussian model, the choice =entails a fast convergence of the simulated frequentist confidence level to the target. We notice the similarity between in (4.10) and matching condition (4.6). This shows that the Bayesian form is useful even in proving Theorem 3 which is a purely frequentist result.
4.4 Simulation study and application to real data
(A) Weibull Model
For the Weibull model in Example 1, a closed-form expression for the MLE $\hat\theta = (\hat\theta_1,\hat\theta_2)'$ is not available. Hence, we calculate it numerically from a given data set using standard statistical software such as MATLAB. Here p = 2, and it can be seen that
=, =, =,
where =( j = 1, 2, 3) and =.
(B) Inverse Gaussian Model
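The inverse Gaussian model is specified by
$$f(x;\theta) = \left(\frac{\theta_2}{x^3}\right)^{1/2}\phi\!\left\{\frac{\theta_2^{1/2}(x - \theta_1)}{\theta_1 x^{1/2}}\right\}, \quad x > 0,$$
where $\phi$ is the standard univariate normal density, and $\theta = (\theta_1,\theta_2)'$ with $\theta_1 > 0$, $\theta_2 > 0$.
Here p = 2, and $\hat\theta_1 = \bar X$, $\hat\theta_2 = (\bar X_h^{-1} - \bar X^{-1})^{-1}$, where $\bar X$ and $\bar X_h$ are the arithmetic and harmonic means, respectively, of $X_1,\dots,X_n$.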
=, = 0, =,
=, =, =0, =,
, , , ,
, , .
The expression for is simplified to some extent for the inverse Gaussian model when.
4.4.1 Simulation study
We perform a simulation study to examine the finite sample implications of our results by examining the simulated frequentist confidence levels for the following tolerance intervals:
(I) The Bayesian-cum-frequentist interval for the Weibull model under the matching prior .
(II) The purely frequentist interval for the inverse Gaussian model.
We take= 0.05, = 0.05, i.e., = 0.90, and = 0.90 and 0.95 based on 10000 iterations. Both intervals (I) and (II) are -content with frequentist confidence level . We also display the merits of the naïve interval , where = as in Theorem 1 or 3 for comparative purposes. The naïve interval is based on simpler asymptotics. It is -content with frequentist confidence level rather than .
Results show that the convergence of the simulated frequentist confidence level to the target is quite fast in Table 3 and slightly slower for the inverse Gaussian case in Table 4. The tables also show that the convergence to the target is much slower for the naive intervals. Hence, the higher order asymptotics approach entails significant gains.
Table 3. Simulated confidence levels for higher order asymptotic interval (top entry) and the naive interval (bottom entry) for the Weibull model;
= 0.05, = 0.05
Table 4. Simulated confidence levels for higher order asymptotic interval (top entry) and the naive interval (bottom entry) for the inverse Gaussian model;
= 0.05, = 0.05
4.4.2 Application to real data
Weibull tolerance interval
The following data from Gacula and Kubala (1975) represent shelf life in days of a refrigerated food product:
24 24 26 26 32 32 33 33 33 35 41 42 43
47 48 48 48 50 52 54 55 57 57 57 57 61
The Weibull model fits the data well. Therefore, we apply our results to this data set under the framework of the Weibull model. For this data, n = 26. We take = 0.05, = 0.05, i.e., = 0.90, and = 0.90 and 0.95. Then, under the Weibull model, for the present data set, = 47.2816 and = 4.3329.
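The numerical computation of the Weibull MLE for these data can be sketched in plain Python. The approach below solves the profile likelihood equation for the shape parameter by bisection and then obtains the scale in closed form; it is offered as one possible route, not as the procedure used in the original study (which mentions MATLAB), and it assumes that the first reported estimate is the scale and the second the shape.

```python
import math

shelf_life = [24, 24, 26, 26, 32, 32, 33, 33, 33, 35, 41, 42, 43,
              47, 48, 48, 48, 50, 52, 54, 55, 57, 57, 57, 57, 61]


def shape_score(shape, xs):
    """Profile likelihood equation for the Weibull shape; it is zero at the MLE."""
    logs = [math.log(x) for x in xs]
    powers = [x ** shape for x in xs]
    weighted_log_mean = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
    return weighted_log_mean - 1.0 / shape - sum(logs) / len(xs)


def weibull_mle(xs, lo=0.05, hi=50.0, tol=1e-10):
    """Return (scale, shape); the score is increasing in the shape, so bisection works."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shape_score(mid, xs) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return scale, shape


if __name__ == "__main__":
    scale_hat, shape_hat = weibull_mle(shelf_life)
    print(round(scale_hat, 4), round(shape_hat, 4))
```

Run on the 26 shelf-life observations, the estimates should land close to the values reported above.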
Based on Example 1, the prior meets the matching condition (4.6). Thus, the two-sided Bayesian tolerance interval in Theorem 1 is also frequentist. We take = where the simulation study earlier show to work well for the Weibull model. The Bayesian-cum-frequentist tolerance interval for the Weibull model is given by .
= 60.9067 and = 23.8223. Therefore, using (3.5) and the facts in (A), we get
M = 0.2425, = 0.7191, = 2.0224, =0.8436, = - 0.0219,
Hence, by Theorem 1, for = 0.90 and 0.95, the pair and the associated Bayesian-cum-frequentist tolerance interval as indicated above turn out to be as follows:
= 0.90: = (15.9195, 35.2285), tolerance interval = [19.0037, 65.7253].
= 0.95: = (20.4324, 42.7722), tolerance interval = [17.7811, 66.9480].
Thus, we can say that with about 90% confidence, at least 90% of the refrigerated food products have a shelf life between 19.0037 and 65.7253 days. A similar interpretation applies to the interval [17.7811, 66.9480] obtained at the 0.95 confidence level.
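The figures in this example can be cross-checked with a short calculation. Assuming the reported estimates 47.2816 and 4.3329 are the Weibull scale and shape, the plug-in 5th and 95th percentiles reproduce the values 23.8223 and 60.9067 quoted above, and each reported tolerance interval widens that percentile range by the same margin at both ends. The variable names below are illustrative.

```python
import math


def weibull_quantile(p, scale, shape):
    """p-th quantile of the Weibull distribution with the given scale and shape."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


scale_hat, shape_hat = 47.2816, 4.3329           # estimates reported in the text
lower_q = weibull_quantile(0.05, scale_hat, shape_hat)
upper_q = weibull_quantile(0.95, scale_hat, shape_hat)

# Tolerance intervals reported in the text, keyed by confidence level.
reported = {0.90: (19.0037, 65.7253), 0.95: (17.7811, 66.9480)}

if __name__ == "__main__":
    print(round(lower_q, 4), round(upper_q, 4))
    for level, (lo, hi) in reported.items():
        # Margin by which each limit lies outside the plug-in percentile range.
        print(level, round(lower_q - lo, 4), round(hi - upper_q, 4))
```

The margins are about 4.82 days at the 0.90 level and about 6.04 days at the 0.95 level: a higher confidence level is paid for with a wider interval.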
Inverse Gaussian tolerance interval
The following data, originally from Lieblein and Zelen (1956), represent the number of million revolutions before failure for each of 23 ball bearings:
17.88 28.92 33.00 41.52 42.12 45.60
48.48 51.84 51.96 54.12 55.56 67.80
68.64 68.64 68.88 84.12 93.12 98.64
105.12 105.84 127.92 128.04 173.40
The inverse Gaussian model fits the data well. Thus, we apply the purely frequentist two-sided tolerance interval [,] as given by Theorem 3, taking =
For the purpose of comparison, we also report the Bayesian tolerance interval [,] as given by Theorem 1, taking = and using the prior .
Here n = 23. We take = 0.05, = 0.05, i.e., = 0.90, and = 0.90 and 0.95. Then, under the inverse Gaussian model, for the present data set, = 72.2243 and = 231.6741, so that = 150.1856 and = 26.9034. Therefore, using (4.5), (4.9) and the facts noted in (B) above, we get
M = 0.2397, = 1.0385, = 0.9493,
= 1.7643, = 0.8377, = - 0.0098,
Hence, by Theorems 3 and 1, for = 0.90 and 0.95, the pairs and , and the associated frequentist and Bayesian tolerance intervals as indicated above turn out to be as follows:
= 0.90: = (32.9318, 75.2455), = (32.9318, 72.9541),
Frequentist tolerance interval = [13.7880, 163.3009],
Bayesian tolerance interval = [14.1417, 162.9473].
= 0.95: = (42.2675, 91.2664), =(42.2675, 88.9750),
Frequentist tolerance interval = [10.8721, 166.2168],
Bayesian tolerance interval = [11.1951, 165.8938].
Interestingly, even though we did not work with a matching prior in the present context, the Bayesian tolerance interval comes quite close to the frequentist one for both = 0.90 and 0.95.
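The inverse Gaussian estimates used in this example can be reproduced with the closed-form MLEs (the arithmetic mean, and the reciprocal of the difference between the reciprocal harmonic and arithmetic means). The sketch below also inverts the inverse Gaussian c.d.f. by bisection to obtain the plug-in 5th and 95th percentiles. The data list contains all 23 observations of the ball bearing data set as it is commonly reproduced in the literature; this should be checked against the original source, and the function names are illustrative.

```python
import math

revolutions = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60,
               48.48, 51.84, 51.96, 54.12, 55.56, 67.80,
               68.64, 68.64, 68.88, 84.12, 93.12, 98.64,
               105.12, 105.84, 127.92, 128.04, 173.40]


def inverse_gaussian_mle(xs):
    """Closed-form MLEs (mean parameter, second parameter lambda)."""
    n = len(xs)
    mean = sum(xs) / n
    harmonic = n / sum(1.0 / x for x in xs)
    lam = 1.0 / (1.0 / harmonic - 1.0 / mean)
    return mean, lam


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_gaussian_cdf(x, mean, lam):
    root = math.sqrt(lam / x)
    return (normal_cdf(root * (x / mean - 1.0))
            + math.exp(2.0 * lam / mean) * normal_cdf(-root * (x / mean + 1.0)))


def inverse_gaussian_quantile(p, mean, lam, tol=1e-9):
    """Invert the c.d.f. by bisection on (0, 100 * mean)."""
    lo, hi = 1e-9, 100.0 * mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inverse_gaussian_cdf(mid, mean, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    mean_hat, lam_hat = inverse_gaussian_mle(revolutions)
    print(round(mean_hat, 4), round(lam_hat, 4))
    print(round(inverse_gaussian_quantile(0.05, mean_hat, lam_hat), 4),
          round(inverse_gaussian_quantile(0.95, mean_hat, lam_hat), 4))
```

The arithmetic mean of the 23 values equals the first reported estimate, 72.2243, which supports reading the two reported estimates as the mean parameter and the second parameter of the model, in that order.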