Markov Chain Monte Carlo Imputation Method


A gentle general introduction to imputation methods is provided. The focus of this report is the Markov chain Monte Carlo (MCMC) imputation method. Initial discussion covers the underlying mathematics of the procedure. The procedure is then implemented with the aid of SAS programs in order to compare and contrast multiple and single imputation, and further discussion of the two imputation methods follows.

1.2 Introduction

Multiple imputation (MI) is an iterative statistical technique for analysing data with missing values. The procedure replaces each missing value with a vector of M ≥ 2 plausible values drawn from a predictive distribution, completing the set so that the usual complete-data statistical inference can be applied. [1] After M sets of analyses under a specific model are complete, their results are combined to form one inference. This idea was first postulated by Rubin and has proved very valuable for researchers who must treat missing data from surveys. It is imperative to note at this stage that modelling missing data carries uncertainties and assumptions. While there are many imputation techniques that can be adopted, the method of choice depends on various factors such as the nature of the data collected and the causes of the missingness. Some of these methods are discussed in this report, but the focus of the report is on one specific imputation method.

1.3 Types of Missingness of Data and Imputation Methods

Missing completely at random (MCAR): the missing values are completely randomly distributed across the survey. In practice this is hard to verify, although one can carry out statistical tests such as a t-test comparing the records with missing data against the complete records for significant differences.

Missing at random (MAR) is when missing values are randomly distributed within one or more sub-samples and not across all observations. This is a more common case than the former.

There are various methods that can be used to impute values for missing values and the decision on the choice depends on the type of missing data.

A data set with variables (Y1, Y2, ..., Ym) has a monotone missing pattern if, whenever a variable Yj is observed for a record, all earlier variables Yk, k < j, are also observed for that record.

In this case a parametric regression method assuming multivariate normality or a nonparametric method utilizing propensity scores is appropriate.

For arbitrary missing data, a Markov Chain Monte Carlo (MCMC) method that assumes multivariate normality is ideal.

When you have a monotone missing data pattern, you have greater flexibility in your choice of strategies. For example, you can implement a regression model without the iterations that MCMC involves. When you have an arbitrary missing data pattern, you can often use the MCMC method, which creates multiple imputations by using simulations from a Bayesian predictive distribution for normal data. Another way to handle a data set with an arbitrary missing data pattern is to use the MCMC approach to impute just enough values to make the missing data pattern monotone, and then use a more flexible imputation method. In this report I will focus on MCMC.
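To make the distinction between the two patterns concrete, here is a minimal Python sketch (with hypothetical data; None marks a missing entry) that tests whether a data set's missing pattern is monotone:

```python
# Sketch: checking whether a missing-data pattern is monotone.
# A pattern is monotone if, whenever a value is observed in a record,
# every earlier variable in that record is also observed - i.e. once a
# record goes missing it stays missing for all later variables.

def is_monotone(rows):
    """Return True if the missing-data pattern is monotone."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # an observed value after a missing one breaks monotonicity
                return False
    return True

monotone_data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, None],
    [6.0, None, None],
]
arbitrary_data = [
    [1.0, None, 3.0],  # Y2 missing but Y3 observed
    [4.0, 5.0, 6.0],
]
print(is_monotone(monotone_data))   # True
print(is_monotone(arbitrary_data))  # False
```

An arbitrary pattern like the second example is exactly the case for which the MCMC method is needed.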

1.4 Use Of Bayesian Theorem in Multiple Imputation

At this point I will not dwell into the specifics of the Bayesian theorem as a standalone concept but will instead explain the tailored version used in MI implementation. The appendix can be referred to for a more detailed introduction and explanation of the Bayesian Posterior Distribution.

For simplicity I will focus on a problem with two parameters γ1, γ2 and data y. Under Bayes' theorem these parameters have the joint posterior distribution

p(γ1, γ2 | y) ∝ p(y | γ1, γ2) p(γ1, γ2). (1)

Focusing on γ2 and partitioning the posterior,

p(γ1, γ2 | y) = p(γ2 | γ1, y) p(γ1 | y), (2)

it can be deduced that the marginal posterior for γ2 can be expressed as

p(γ2 | y) = ∫ p(γ2 | γ1, y) p(γ1 | y) dγ1. (3)

In particular, the posterior mean and variance for γ2 are given by:

E(γ2 | y) = E[ E(γ2 | γ1, y) | y ], (4)

Var(γ2 | y) = E[ Var(γ2 | γ1, y) | y ] + Var[ E(γ2 | γ1, y) | y ]. (5)

With the aid of empirical moments, approximations of those formulas can be deduced.

Let γ1^(1), ..., γ1^(M), m = 1, ..., M, be draws from the marginal posterior distribution of γ1.

Let

E(γ2 | y) ≈ (1/M) Σ_m E(γ2 | γ1^(m), y). (6)

The variance can also then be approximated to be:

Var(γ2 | y) ≈ (1/M) Σ_m Var(γ2 | γ1^(m), y) + (1/(M-1)) Σ_m [ E(γ2 | γ1^(m), y) - Ê(γ2 | y) ]². (7)

Finally, putting everything into perspective, this model can be applied to MI procedures by letting γ2 represent the parameters of the substantive model and γ1 represent the missing data.
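The Monte Carlo approximations (6) and (7) can be checked numerically. In the toy model below (an assumption purely for illustration), γ2 | γ1, y ~ N(γ1, 1) and the draws of γ1 come from N(2, 0.5²), so the marginal posterior mean should be near 2 and the variance near 1 + 0.5² = 1.25:

```python
# Sketch of equations (6)-(7): approximating the marginal posterior mean
# and variance of gamma2 from M draws of gamma1. Toy model (illustrative
# assumption): gamma2 | gamma1, y ~ N(gamma1, 1), so the conditional mean
# is gamma1 itself and the conditional variance is 1.
import random

random.seed(0)
M = 10_000
gamma1_draws = [random.gauss(2.0, 0.5) for _ in range(M)]  # stand-in for p(gamma1 | y)

cond_means = [g for g in gamma1_draws]    # E(gamma2 | gamma1, y) = gamma1
cond_vars = [1.0 for _ in gamma1_draws]   # Var(gamma2 | gamma1, y) = 1

post_mean = sum(cond_means) / M                                      # eq. (6)
between = sum((cm - post_mean) ** 2 for cm in cond_means) / (M - 1)
post_var = sum(cond_vars) / M + between                              # eq. (7)

print(round(post_mean, 2))  # close to 2.0
print(round(post_var, 2))   # close to 1.25
```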

2.0 The Markov Chain Monte Carlo Imputation Method.

A Markov chain is a sequence of random variables in which the distribution of each element depends only on the value of the previous one. Examples include simple random walks. Markov chain Monte Carlo (MCMC) is a collection of methods for simulating random draws from nonstandard distributions via Markov chains. This is particularly useful when dealing with data whose missingness follows a nontrivial pattern. Here MCMC is used to produce several independent replacements of the missing data from a predictive distribution, and these results are then used for multiple-imputation inference. Because of its efficiency, MCMC requires only relatively few draws and iterations for plausible imputation analysis to take place.
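To make the idea concrete, here is a minimal random-walk Metropolis sketch: the chain's next state depends only on its current state, yet after a burn-in period its draws behave like samples from a target density known only up to a constant (a standard normal here, purely for illustration):

```python
# A minimal random-walk Metropolis sampler: a Markov chain whose draws
# converge to an unnormalised target density. Target and tuning values
# are illustrative assumptions.
import math
import random

random.seed(1)

def target(x):
    """Unnormalised target density (standard normal kernel)."""
    return math.exp(-0.5 * x * x)

chain = [0.0]
for _ in range(20_000):
    current = chain[-1]
    proposal = current + random.gauss(0.0, 1.0)  # depends only on current state
    if random.random() < min(1.0, target(proposal) / target(current)):
        chain.append(proposal)
    else:
        chain.append(current)

burned = chain[5_000:]  # discard burn-in before using the draws
mean = sum(burned) / len(burned)
print(round(mean, 2))   # near 0.0, the target's mean
```

Discarding the early iterations is exactly the "burn-in" step used in the SAS runs later in this report.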

Here the posterior distribution from which the draws are made follows Bayes' theorem:

p(θ | Yobs) ∝ p(Yobs | θ) p(θ), (8)

where θ = (µ, ∑) collects the mean vector and the covariance matrix.
One key assumption for using the MCMC method is that the dataset follows a multivariate normal distribution.

Imputation can then be carried out by repeating the following steps:

2.1 Imputation Stage:

With the aid of an estimated covariance matrix ∑ and mean vector µ, missing values are imputed for each record, drawn from the conditional distribution of Ymis given the observed values Yobs.

Let µ = (µ1, µ2) be the mean vector partitioned according to Yobs and Ymis respectively.

Additionally, it is important that ∑ be partitioned conformably,

∑ = [ ∑11  ∑12 ; ∑'12  ∑22 ], (9)

where ∑11 = covariance matrix for Yobs,

∑22 = covariance matrix for Ymis,

and ∑12 = covariance matrix between Yobs and Ymis.

In order to calculate the conditional covariance matrix of Ymis with Yobs controlled, I apply the sweep operator [2] on the pivots of ∑11 to obtain

∑22.1 = ∑22 - ∑'12 ∑11^(-1) ∑12. (10)

Here the matrix of regression coefficients is

B = ∑11^(-1) ∑12. (11)

For clarity, it follows then that the conditional distribution of Ymis given Yobs = y1 is multivariate normal with mean vector

µ2 + B'(y1 - µ1) (12)

and covariance matrix ∑22.1.
2.2 Applying Bayes Theorem To Estimate Covariance Matrix And The Mean Vector

Letting Y = (y'1, y'2, ..., y'n)' be an (n × b) matrix of n independent (b × 1) vectors yi, each with mean zero and covariance matrix ∑, it follows by the theorem that

A = Y'Y = Σi yi y'i is Wishart distributed, W(n, ∑). (13)

Intuitively, from (13) it follows that

A^(-1) ~ W^(-1)(n, ∑^(-1)), (14)

n being the degrees of freedom and ∑^(-1) the precision matrix.

From equations (9) and (12), and holding the multivariate normal distribution assumption, we can take as prior [3] an inverted Wishart distribution for these parameters as follows:

∑ ~ W^(-1)(m, Ψ), (15)

µ | ∑ ~ N(µ0, ∑ / r), (16)

where r > 0 is a fixed number. [4] The posterior distribution is then

∑ | Y ~ W^(-1)( n + m, (n - 1)S + Ψ + (nr/(n + r)) (ȳ - µ0)(ȳ - µ0)' ),

µ | (∑, Y) ~ N( (n ȳ + r µ0)/(n + r), ∑/(n + r) ),

where (n - 1)S is the CSSCP [5] matrix.
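A minimal sketch of the corresponding P-step under a noninformative (Jeffreys-style) prior, for which the posterior reduces to ∑ | Y ~ W^(-1)(n - 1, (n - 1)S) and µ | ∑, Y ~ N(ȳ, ∑/n); the data, dimensions and seed are illustrative assumptions:

```python
# Sketch of the P-step: drawing (mu, Sigma) from their posterior under a
# noninformative prior. The Wishart draw is built as a sum of outer
# products of normal vectors, which is valid for integer degrees of freedom.
import numpy as np

rng = np.random.default_rng(2)

# Toy "observed" data: n records on p variables (illustrative assumption).
n, p = 50, 3
data = rng.multivariate_normal([0.0, 1.0, 2.0], np.eye(p), size=n)

ybar = data.mean(axis=0)
s = np.cov(data, rowvar=False)  # sample covariance S

def draw_wishart(df, scale, rng):
    """Draw from W(df, scale) via outer products of N(0, scale) vectors."""
    x = rng.multivariate_normal(np.zeros(len(scale)), scale, size=df)
    return x.T @ x

# Sigma ~ inverse-Wishart(n-1, (n-1)S): draw W(n-1, ((n-1)S)^-1), invert.
precision_scale = np.linalg.inv((n - 1) * s)
sigma_draw = np.linalg.inv(draw_wishart(n - 1, precision_scale, rng))

# mu | Sigma, Y ~ N(ybar, Sigma / n)
mu_draw = rng.multivariate_normal(ybar, sigma_draw / n)

print(mu_draw)
```

Alternating this P-step with the I-step above is exactly the data-augmentation chain that PROC MI runs.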

2.3 Combining Inferences Of Multiple Imputations.

After imputing m times, standard data analysis can be applied to each completed set, producing m sets of inferences containing information such as variances. These can now be combined to produce point and variance estimates for a parameter Q. Let Q̂i and Ûi be the point and variance estimates from the ith augmented data set, i = 1, 2, ..., m. Then the combined point estimate is the average of the m complete-data estimates:

Q̄ = (1/m) Σi Q̂i. (17)

Let Ū, the average of the m complete-data variance estimates, be the within-imputation variance:

Ū = (1/m) Σi Ûi, (18)

with B representing the between-imputation variance,

B = (1/(m - 1)) Σi (Q̂i - Q̄)². (19)

Applying Rubin's idea, the total variance is thus given by:

T = Ū + (1 + m^(-1)) B. (20)

Combining (17) and (20), the standardised quantity (Q - Q̄) T^(-1/2) has an approximate t-distribution [6] with vm degrees of freedom, where

vm = (m - 1) [ 1 + Ū / ((1 + m^(-1)) B) ]². (21)

The ratio r = (1 + m^(-1)) B / Ū is called the relative increase in variance due to nonresponse and is zero if there is no missing information about Q. When either m is large or r is small, the degrees of freedom vm will be large and (Q - Q̄) T^(-1/2) will be approximately normally distributed.

Equally important is the insight into the fraction [7] of missing information regarding Q:

λ̂ = ( r + 2/(vm + 3) ) / ( r + 1 ). (22)

These quantities are imperative in assessing the contribution the missing data makes to the uncertainty about Q.
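Rubin's combining rules are mechanical enough to sketch directly. The fragment below pools m = 5 hypothetical complete-data estimates (all numbers illustrative) into the pooled estimate, the within-, between- and total variances, the degrees of freedom and the fraction of missing information:

```python
# Sketch of Rubin's combining rules, eqs (17)-(22), on illustrative numbers.
import math

q = [10.2, 9.8, 10.5, 10.0, 9.9]    # point estimates from m imputed datasets
u = [0.50, 0.55, 0.48, 0.52, 0.50]  # their complete-data variances
m = len(q)

qbar = sum(q) / m                                # pooled estimate, eq. (17)
ubar = sum(u) / m                                # within-imputation variance, eq. (18)
b = sum((qi - qbar) ** 2 for qi in q) / (m - 1)  # between-imputation variance, eq. (19)
t = ubar + (1 + 1 / m) * b                       # total variance, eq. (20)

r = (1 + 1 / m) * b / ubar           # relative increase in variance
v = (m - 1) * (1 + 1 / r) ** 2       # degrees of freedom, eq. (21)
lam = (r + 2 / (v + 3)) / (r + 1)    # fraction of missing information, eq. (22)

se = math.sqrt(t)
print(qbar, se, round(v, 1), round(lam, 4))
```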

2.4 Multiple Imputation Efficiency.

Rubin postulated that the more imputations one takes, the smaller the variance of the resulting inferences will be.

It follows that carrying out infinitely many imputations would be the best way to minimise the estimator variance. However, this is neither practical nor economical. One can only compute several imputations, and their relative efficiency (RE) can be calculated as a function of m and λ:

RE = (1 + λ/m)^(-1). (23)
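A quick numerical check of the relative-efficiency formula RE = (1 + λ/m)^(-1) shows why only a handful of imputations is usually enough; λ = 0.3 below is an illustrative fraction of missing information:

```python
# Relative efficiency of m imputations versus infinitely many, eq. (23).
lam = 0.3  # illustrative fraction of missing information

for m in (2, 5, 10, 20):
    re = 1 / (1 + lam / m)
    print(m, round(re, 4))
```

Even with 30% missing information, m = 10 imputations are already about 97% as efficient as an infinite number, which motivates the choice of m = 10 in the SAS runs below.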
3.0 Implementation of the MCMC in SAS.

In this section I am going to carry out multiple imputations on a sample of data (restrnt.txt) [8] by programming with SAS©. This will allow me to carry out standard analysis of the data set taking into account the significance of any systematic reasons as to why the data is missing in the first place.

Additionally, I will carry out single imputation of the data and again perform standard analysis on the "new complete data". This will form the basis of my discussion comparing multiple and single imputation as choices for dealing with missing data.

Please note the full code and all output are included in the appendix for a detailed overview of the execution of the programs I wrote. Also included there is a brief manual explaining the workings of the commands employed.

3.1 Single And Multiple Imputation Models.


Model Information

Method                              MCMC
Multiple Imputation Chain           Single Chain
Initial Estimates for MCMC          EM Posterior Mode
Start                               Starting Value
Prior                               Jeffreys
Number of Imputations               10
Number of Burn-in Iterations        200
Number of Iterations                100
Seed for random number generator    1305417

Figure 1. Model Information for Multiple Imputation.

Figure 1 is the SAS output of the code, showing that the multiple imputation was carried out ten times (m = 10) and that the method applied was MCMC. To get around the fact that the Markov chain reaches its stationary distribution only after several iterations, I have "burnt in" the first 200 iterations, discarding the dependence in the early part of the chain.

As detailed in the appendix, the report also shows the missing-data patterns. Here each variable is analysed and the frequency of missing data in each is noted. This may be crucial information from a data collector's standpoint, as it helps highlight the nature and source of the missing data. Analysing those patterns, it is quickly apparent that the data are not missing monotonically, as discussed above.

Figure 1.1 shows the model information for the single imputation model. Note that only one imputation took place, and the assumption that no prior information is known about the covariance matrix and the mean is preserved by using the "Jeffreys" option on the prior call. For consistency, the random number generator was given the same seed (1305417).

The MI Procedure

Model Information


Method MCMC

Multiple Imputation Chain Single Chain

Initial Estimates for MCMC EM Posterior Mode

Start Starting Value

Prior Jeffreys

Number of Imputations 1

Number of Burn-in Iterations 200

Number of Iterations 100

Seed for random number generator 1305417

Figure 1.1

3.2 Variance Information of the Models.


Variable Between Within Total DF

SALES 47.688718 1499.058115 1551.515705 257.05

NEWCAP 0.244187 2.678453 2.947059 203.1

VALUE 67.740387 2657.343936 2731.858362 261.73

COSTGOOD 0.132985 0.658865 0.805148 123.29

WAGES 0.044648 0.428854 0.477967 191.37

ADS 0.006896 0.055431 0.063017 174.1

TYPEFOOD 0.000058083 0.002505 0.002569 263.33

SEATS 0.113954 13.736092 13.861442 271.86

OWNER 0.000096182 0.003242 0.003348 258.68

FT_EMPL 0.014620 1.182980 1.199062 269.87

PT_EMPL 0.015868 0.712927 0.730382 263.95

SIZE 0.000075479 0.002382 0.002465 257.14

Figure 2.1. Information on the Variance of the Variables in MI.

Figure 2.2. Missing Values' Influence on Variance.

Variable    Relative Increase in Variance    Fraction Missing Information    Relative Efficiency

SALES 0.034994 0.034056 0.996606

NEWCAP 0.100284 0.092817 0.990804

VALUE 0.028041 0.027437 0.997264

COSTGOOD 0.222024 0.187623 0.981583

WAGES 0.114520 0.104850 0.989624

ADS 0.136857 0.123201 0.987830

TYPEFOOD 0.025506 0.025005 0.997506

SEATS 0.009126 0.009061 0.999095

OWNER 0.032635 0.031819 0.996828

PT_EMPL 0.024483 0.024022 0.997604

SIZE 0.034858 0.033927 0.996619

Figures 2.1 and 2.2 highlight the within-imputation, between-imputation and total variances, the latter resulting from the complete-data analysis. Figure 2.2 is explicit on the effect the missing values have on increasing the variance. Despite this, the relative efficiencies of all the variables are high, at more than 98%, the lowest being COSTGOOD at 98.1583% efficient. RE is discussed in greater detail in previous sections. It is also inferable that imputing that same variable (COSTGOOD) brings about the largest relative increase in variance. It can be concluded that the uncertainty associated with the missing data for this variable is of relatively higher significance than for the other variables.

Of course this does not apply to SI as there is only one set of imputations.


Figure 3.1. Parameter Estimates for Multiply Imputed Data.

Multiple Imputation Parameter Estimates

Variable Mean Std Error 95% Confidence Limits DF

SALES 324.900360 39.389284 247.3336 402.4671 257.05

NEWCAP 12.697842 1.716700 9.3130 16.0827 203.1

Multiple Imputation Parameter Estimates

t for H0:

Variable Minimum Maximum Mu0 Mean=Mu0 Pr > |t|

SALES 314.607914 332.600719 0 8.25 <.0001

NEWCAP 12.046763 13.683453 0 7.40 <.0001

Figure 3.2. Parameter Estimates for Multiply Imputed Data.

Multiple Imputation Parameter Estimates

Variable Mean Std Error 95% Confidence Limits DF

VALUE 344.361871 52.267182 241.4442 447.2796 261.73

COSTGOOD 45.296763 0.897301 43.5207 47.0729 123.29

WAGES 24.924820 0.691351 23.5612 26.2885 191.37

ADS 3.914029 0.251032 3.4186 4.4095 174.1

TYPEFOOD 1.864748 0.050684 1.7650 1.9645 263.33

SEATS 71.805036 3.723096 64.4753 79.1348 271.86

OWNER 2.126259 0.057859 2.0123 2.2402 258.68

FT_EMPL 8.014748 1.095017 5.8589 10.1706 269.87

PT_EMPL 12.798921 0.854624 11.1162 14.4817 263.95

SIZE 1.674460 0.049648 1.5767 1.7722 257.14

t for H0:

Variable Minimum Maximum Mu0 Mean=Mu0 Pr > |t|

VALUE 326.848921 353.402878 0 6.59 <.0001

COSTGOOD 44.899281 46.093525 0 50.48 <.0001

WAGES 24.507194 25.241007 0 36.05 <.0001

ADS 3.787770 4.050360 0 15.59 <.0001

TYPEFOOD 1.848921 1.874101 0 36.79 <.0001

SEATS 71.388489 72.312950 0 19.29 <.0001

OWNER 2.115108 2.143885 0 36.75 <.0001

FT_EMPL 7.877698 8.233813 0 7.32 <.0001

PT_EMPL 12.593525 12.949640 0 14.98 <.0001

SIZE 1.661871 1.690647 0 33.73 <.0001

Figures 3.1 and 3.2 show the results of the analysis of the parameter estimates.

There is great variation in the 95% confidence limits of the variables computed under MCMC. This may not necessarily be inherent to the model itself but due to the fact that each variable may have an independent relationship with the response variable "outlook", which in turn results in a nontrivial pattern in the imputed data. The standard errors of the variables are relatively low, bar the imputed values of the "value" variable, which shows the largest sampling uncertainty under this model.

With the aid of the MIANALYZE procedure of SAS I was able to carry out several additional complete-data analyses on the imputed data. A comparison of the covariances and means of the imputed data was carried out [9]. It is clear from these results that MI in this case offers a method with a smaller variance increase with each imputation. However, it is important to note that these results use adjusted degrees of freedom, which could be an influencing factor in the relatively higher uncertainty variance under the SI method.

After carrying out several analyses, I recommend at this point the use of MI over SI for imputing this particular dataset. MI's better performance could be attributed to the highly multivariate nature of the data and the fact that most of the variables had missing data. This does not mean, however, that MI is strictly better than SI, but rather that the decision on the usage of either should be made on a case-by-case basis.

4.0 Advantages And Disadvantages Of MCMC Method.

By definition, MI completes missing data, allowing for complete-data methods of analysis. Useful inferences can then be deduced from otherwise unusable data. This is particularly important for survey results.

4.1 Advantages Of MCMC Multiple Imputation

The most important advantage MCMC has is its ability to impute data sets that do not conform to trivial missingness patterns. This is due to the fact that it utilises Markov chains, which are themselves random sequences in nature.

Also, due to its relative efficiency, MCMC multiple imputation requires only a few draws to carry out plausible imputations. This saves time and resources and makes it an attractive method of use.

Imputed values are calculated based on the record's own values on the other variables.

A unique set of predictors can be used for each variable with missing data, ensuring patterns are preserved as far as practically possible in such situations.

4.2 Disadvantages.

The method's estimates work best when the missing data pattern is monotone, but in reality arbitrary patterns are common.

Reinforces existing relationships and reduces generalizability. [10] 

Must have sufficient relationships among variables to generate valid predicted values

Understates variance unless error term added to imputed value.

Replacement values may be out of range [11] .

5.0 Single Imputation Comparison.

In single imputation (SI), one value is generated for each missing value; this is the most commonly implemented method for handling item nonresponse in modern survey practice.

5.1 Advantages Of Single Imputation:

Just as with MI, once the values have been imputed, standard statistical inferences can be drawn from the data with relative accuracy.

In many cases MI proves not to be necessary, as the inference obtained is similar to that which SI would have produced. In those cases, using SI is advantageous for efficiency and time-saving purposes.

This is of importance as other methods of dealing with missing data may need extra input before the data can undergo standard analysis. Hence SI is an efficient method of dealing with nonresponse.

As SI can be carried out by the data collector, the procedure will benefit from having someone privy to source information carrying out the analysis.

5.2 Disadvantages Of Single Imputation.

The first obvious problem with SI is that, because only one set of imputed values is produced, it inevitably treats the missing data as known, without adjusting for sampling variability or for the uncertainty brought about by using that specific model to do the imputation in the first place.

6.0 Conclusion.

Multiple imputation is a principled way of dealing with missing data compared to the alternatives. In addition to allowing complete-data analysis on such data, more importantly it does not disregard the possibility of systematic reasons why the data might be missing in the first place. Simply ignoring missing data may lead to wrong inferences through omitting trends suppressed by the missingness. In conclusion, there is no single imputation method that is appropriate for every scenario, so the burden is on the researcher to exercise caution and make a valid judgment as to the choice of imputation method on a case-by-case basis. Some of the imperative factors to consider have been discussed in this report.


Hair, J.F., W.C. Black, B.J. Babin, R.E. Anderson & R.L. Tatham. Multivariate Data Analysis. New Jersey: Pearson Education, Inc., 2006.

Little, Roderick J.A. & Donald B. Rubin. Statistical Analysis with Missing Data. New Jersey: John Wiley & Sons Inc., 2002.

Schafer, J.L. Analysis of Incomplete Multivariate Data (Monographs on Statistics and Applied Probability). Florida: Chapman & Hall/CRC, 1997.