# Theoretical Framework of Latent Class Analysis


The origins of latent variable modeling can be traced back to the early twentieth century and the study of human abilities. Spearman (1904) observed that people, especially children, who perform well on one test of mental ability also tend to do well on others. This observation led to the idea that some underlying general ability, which might be summarized as intelligence, lies behind performance on these tests. However, the scores on different tests were not perfectly correlated, which could be explained by adding further factors to account for the variation in performance from one item to another. The combination of these factors then produces the actual performance.

Using latent variables enables us to explain observed facts through unobserved causes, which may be either continuous or discrete. Since much of social science relies on categorical variables, latent structure analysis provides a notably flexible approach to classifying data. McCutcheon (1987) argues that the basic aim of latent structure analysis is quite straightforward: the covariation actually observed among the manifest variables is due to each manifest variable's relationship to the latent variable. In other words, the latent variable "explains" the relationships among the observed variables. McCutcheon gives a religious example. Religious commitment is an unobserved variable, but we can measure how frequently people attend church and how frequently they pray. These observed variables reflect the latent variable to some extent, so covariation among the observed measurements is expected. By studying the pattern of interrelationships among the manifest variables, we can understand the underlying causes.

Other authors also regard latent variable analysis as a rational statistical tool. Bartholomew and Knott (1999), for example, discuss two merits of latent variable analysis. The first is that a latent variable model provides a way to condense many variables with as little loss of information as possible; in other words, latent variable analysis is a technique for reducing dimensionality. When the information contained in the interrelationships of many variables can be transferred to a smaller set, our ability to understand and characterize the structure of the data improves. Large-scale statistical investigations, such as social surveys, generate a great deal of information that needs to be summarized. For a sample survey containing 100 questions and 1,000 respondents, elementary statistical methods would usually summarize the data by examining frequency distributions or correlation coefficients. However, with so many variables it is difficult to see the underlying pattern in the data, because our ability to visualize relationships between variables is limited to two or three dimensions. This motivates reducing the dimensionality of the data with as little loss of structure as possible. Dimension reduction is reasonable because most questions in a sample survey overlap to some extent. For example, one's views on personal health care provision and on tax levels for higher earners might both be regarded as reflections of a basic political position. Condensing variables is therefore rational, and so is latent structure analysis.

Moreover, latent structure analysis is necessary. This is especially true in social science research, where no direct instrument exists to measure certain concepts. For example, Bartholomew and Knott (1999) note that business confidence is spoken of as a real variable, changes in which influence share prices or the value of the currency. Yet business confidence is an ill-defined concept, since it is a complex of beliefs and attitudes. It is impossible to theorize about such a social phenomenon without introducing a hypothetical variable. The problem for the statistician is therefore to establish a theoretical framework within which these quantities can be represented by numbers. One way to do this is to define a set of indicators that can be measured, for example answers to a set of yes/no questions, and then find the common response patterns to those questions.

## 3.2 Latent structure analysis

Generally, data can be classified into four types: nominal, ordinal, interval and ratio. Bartholomew and Knott (1999) adopt a twofold classification instead: metrical and categorical. Metrical variables take values in a set of real numbers and include both discrete and continuous variables. Categorical variables, on the other hand, assign individuals to one of a set of categories, which may be ordered or unordered. Bartholomew and Knott (1999) also suggest a fourfold classification of latent variable methods, shown below:

Table 3.1. Classification of latent variable methods

(reproduced from Bartholomew and Knott, 1999)

| Latent variables | Manifest variables: metrical | Manifest variables: categorical |
| --- | --- | --- |
| Metrical | Factor analysis | Latent trait analysis |
| Categorical | Latent profile analysis | Latent class analysis |

Much of the early work on latent variables used factor analysis, a common technique of multivariate analysis that seeks to understand metrical latent variables by analyzing metrical observed variables. McCutcheon (1987) also notes that the popularity of regression analysis contributed to the development of factor analysis: because factor analysis can transform several observed variables into a few latent variables, the predicted factor scores can then be used in regression analyses.

Other researchers share similar views on the classification of latent variable methods. Lazarsfeld (1968) classifies factor analysis as a latent structure method for continuous latent variables based on continuous manifest variables. Green (1952) discusses latent class analysis, which can be considered a qualitative method for identifying categorical latent variables from two or more categorical observed variables. Two further methods complete the array of latent structure methods: latent trait analysis characterizes metrical latent variables from categorical manifest variables, and latent profile analysis characterizes categorical latent variables from metrical manifest variables.

Recently, research with categorical latent variables has been recognized as a major topic in the social sciences, because most variables in social research, both observed and unobserved, are categorical; such variables are usually measured on nominal or ordinal scales. Latent class analysis, as a method for studying categorical data, has therefore become a vital topic. In this chapter we present the theoretical framework, and in chapter 5 we examine an application of latent class analysis.

## 3.3 Basic latent class models

In modern terminology, latent structure analysis is usually framed in terms of finite mixture models. The model was originally proposed by Lazarsfeld (1950) under the name "latent structure analysis." Although it has been in use for more than 50 years, its real boost in popularity has come over the last decade, owing to the tremendous increase in available computing power. The basic latent class model is a finite mixture model in which the component distributions are assumed to be multi-way cross-classification tables with all variables mutually independent (Linzer and Lewis, 2010).

A straightforward example given by Lazarsfeld and Henry (1968) illustrates latent class analysis. A group of 1,000 people are asked whether they have read the latest issue of magazine A and of magazine B; their responses can easily be presented in a fourfold table (see Table 3.2). Leaving aside statistical significance, there is clearly a visible association between the two variables.

Table 3.2. Readership of Two Magazines

(reproduced from Lazarsfeld and Henry, 1968)

| | Read A | Do not read A | Total |
| --- | --- | --- | --- |
| Read B | 260 | 240 | 500 |
| Do not read B | 140 | 360 | 500 |
| Total | 400 | 600 | 1000 |

But someone may argue that this association is spurious, because it might be due to some third factor Y to which both variable A and variable B are related, such as the education level of the respondents. We therefore examine the relation between A and B in the presence and absence of variable Y: for example, divide the respondents into two groups, High-Education and Low-Education. This gives Table 3.3.

Table 3.3 Readership of Magazines, Controlling for Education

(reproduced from Lazarsfeld and Henry, 1968)

| High-Ed | Read A | Do not read A | Total |
| --- | --- | --- | --- |
| Read B | 240 | 60 | 300 |
| Do not read B | 160 | 40 | 200 |
| Total | 400 | 100 | 500 |

| Low-Ed | Read A | Do not read A | Total |
| --- | --- | --- | --- |
| Read B | 20 | 80 | 100 |
| Do not read B | 80 | 320 | 400 |
| Total | 100 | 400 | 500 |

Obviously the original association has disappeared. More precisely, the condition for accepting education as an explanatory variable is that the cross product in both sub-tables of Table 3.3 be zero. Written in terms of probabilities, this cross product, symbolized by $[AB]$, is

$$[AB] = p_{AB} - p_A \, p_B$$

(Given numbers $a$, $b$, $c$ and $d$, where the table is

| $a$ | $b$ | $a+b$ |
| --- | --- | --- |
| $c$ | $d$ | $c+d$ |
| $a+c$ | $b+d$ | $a+b+c+d$ |

we have $ad - bc = (a+b+c+d)\,a - (a+b)(a+c)$. In the special case where $a+b+c+d = 1$, this reduces to the probability statement given above.)

For the cross product to equal zero, the joint probability of A and B must equal the product of their individual probabilities. So in the magazine-reading example,

$$P(A \cap B \mid H) = P(A \mid H)\,P(B \mid H)$$

and

$$P(A \cap B \mid L) = P(A \mid L)\,P(B \mid L)$$

where H denotes the high-education group and L the low-education group.
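The vanishing cross products can be verified directly from the counts in Tables 3.2 and 3.3. A minimal sketch in Python (the function name is my own):

```python
# Check the cross product ad - bc for the 2x2 tables in the text.

def cross_product(a, b, c, d):
    """Return ad - bc for a 2x2 table [[a, b], [c, d]]."""
    return a * d - b * c

# Table 3.2: pooled readership -- a clear association remains.
print(cross_product(260, 240, 140, 360))  # 260*360 - 240*140 = 60000

# Table 3.3: within each education group the association vanishes.
print(cross_product(240, 60, 160, 40))    # High-Ed: 240*40 - 60*160 = 0
print(cross_product(20, 80, 80, 320))     # Low-Ed: 20*320 - 80*80 = 0
```

The non-zero value for the pooled table and the zero values within each education group reproduce the argument above in frequencies rather than probabilities.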

## 3.3.1 Terminology and model definition

Suppose $J$ categorical variables (the manifest variables) are observed, the $j$th of which has $K_j$ possible outcomes, for individuals $i = 1, \dots, N$. The variables may have different numbers of outcomes, hence the indexing by $j$. The observed values of the $J$ manifest variables are denoted by $Y_{ijk}$, such that $Y_{ijk} = 1$ if respondent $i$ gives the $k$th response to the $j$th variable, and $Y_{ijk} = 0$ otherwise, where $j = 1, \dots, J$ and $k = 1, \dots, K_j$. Let $\pi_{jrk}$ denote the class-conditional probability that an observation in class $r = 1, \dots, R$ produces the $k$th outcome on the $j$th variable. Within each class, for each manifest variable, $\sum_{k=1}^{K_j} \pi_{jrk} = 1$. The $R$ mixing proportions are denoted by $p_r$; they provide the weights in the weighted sum of the component tables, with $\sum_{r=1}^{R} p_r = 1$. The probability that an individual $i$ in class $r$ produces a particular set of $J$ outcomes on the manifest variables, assuming local independence, is the product

$$f(Y_i; \pi_r) = \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\pi_{jrk})^{Y_{ijk}} \qquad (3.1)$$

The probability density function across all classes is the weighted sum

$$P(Y_i \mid \pi, p) = \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\pi_{jrk})^{Y_{ijk}} \qquad (3.2)$$

The parameters estimated by the latent class model are $p_r$ and $\pi_{jrk}$. Given estimates $\hat{p}_r$ and $\hat{\pi}_{jrk}$ of $p_r$ and $\pi_{jrk}$ respectively, the posterior probability that each individual belongs to each class, conditional on the observed values of the manifest variables, can be calculated using Bayes' formula:

$$\hat{P}(r_i \mid Y_i) = \frac{\hat{p}_r \, f(Y_i; \hat{\pi}_r)}{\sum_{q=1}^{R} \hat{p}_q \, f(Y_i; \hat{\pi}_q)} \qquad (3.3)$$

Recall that $f(Y_i; \hat{\pi}_r)$ is the estimated probability of the observed response pattern conditional on class $r$. It is important to remain aware that the number of independent parameters estimated by the latent class model increases rapidly with $R$, $J$, and $K_j$: given these values, the number of parameters is $R \sum_{j=1}^{J} (K_j - 1) + (R - 1)$. Linzer and Lewis (2010) also note that if this number exceeds either the total number of observations, or one fewer than the total number of cells in the cross-classification table of the manifest variables, then the latent class model will be unidentified.
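The quantities in Equations 3.1 through 3.3 can be sketched numerically. The following Python fragment is purely illustrative: the toy class-conditional probabilities and mixing proportions are invented, not estimates from any real data.

```python
import numpy as np

# Toy model: J = 2 binary manifest variables, R = 2 latent classes.
# pi[r, j, k] is the class-conditional probability of outcome k on item j.
pi = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # class 1
    [[0.2, 0.8], [0.3, 0.7]],   # class 2
])
p = np.array([0.6, 0.4])         # mixing proportions, sum to 1

def class_density(y, r):
    """Equation 3.1: product of pi[r, j, y_j] over the J manifest variables."""
    return np.prod([pi[r, j, y[j]] for j in range(len(y))])

def mixture_density(y):
    """Equation 3.2: weighted sum of the class-conditional densities."""
    return sum(p[r] * class_density(y, r) for r in range(len(p)))

def posterior(y):
    """Equation 3.3: Bayes' formula for posterior class membership."""
    f = np.array([class_density(y, r) for r in range(len(p))])
    return p * f / (p * f).sum()

y = [0, 0]            # a respondent giving the first outcome on both items
print(posterior(y))   # heavily favours class 1

# Free parameters: R * sum_j (K_j - 1) + (R - 1) = 2*2 + 1 = 5
```

With $R = 2$, $J = 2$ and $K_j = 2$ the model has five free parameters, comfortably below the $2^2 - 1 = 3$-cell bound only if more items were added; this is exactly the identifiability concern noted above.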

## 3.3.2 Parameter estimation

Parameters can be estimated by maximizing the log-likelihood function

$$\ln L = \sum_{i=1}^{N} \ln \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\pi_{jrk})^{Y_{ijk}} \qquad (3.4)$$

with respect to $p_r$ and $\pi_{jrk}$.

However, closed-form maximum likelihood (ML) estimates are unavailable for many statistical models, and the likelihood function is typically maximized using iterative methods. Pepe and Janes (2007) derive an analytic expression for the parameter estimates in a three-test latent class model, but in general iterative methods are needed to find the maximum likelihood estimates of the parameters of a latent class model. The main algorithms in current use are the expectation-maximization (EM) algorithm and the Newton-Raphson (NR) method.

## Expectation-Maximization algorithm

Snellman (2008) describes the EM algorithm as an iterative method for finding ML estimates of parameters when the data are incomplete. The method had been used for many years in special contexts before it was generalized and named by Dempster, Laird and Rubin in 1977.

Dempster et al. (1977) show that maximum likelihood estimates can be obtained from incomplete data via the EM algorithm. The EM algorithm applies naturally to latent class model estimation, because each individual's class membership is unknown and may be treated as missing data (McLachlan and Krishnan, 1997).

The EM algorithm alternates between two steps, expectation and maximization. Linzer and Lewis (2010) give a straightforward illustration. Start from arbitrary initial values of $\hat{p}_r$ and $\hat{\pi}_{jrk}$, denoted $\hat{p}_r^{\,old}$ and $\hat{\pi}_{jrk}^{\,old}$. In the expectation step, the "missing" class-membership probabilities $\hat{P}(r_i \mid Y_i)$ are calculated using Equation 3.3, substituting in $\hat{p}_r^{\,old}$ and $\hat{\pi}_{jrk}^{\,old}$. In the maximization step, the parameter estimates are updated by maximizing the log-likelihood function given these posteriors, with

$$\hat{p}_r^{\,new} = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(r_i \mid Y_i) \qquad (3.5)$$

as the new prior probabilities and

$$\hat{\pi}_{jr}^{\,new} = \frac{\sum_{i=1}^{N} Y_{ij}\,\hat{P}(r_i \mid Y_i)}{\sum_{i=1}^{N} \hat{P}(r_i \mid Y_i)} \qquad (3.6)$$

as the new class-conditional outcome probabilities (Everitt and Hand, 1981). In Equation 3.6, $\hat{\pi}_{jr}$ is the vector of length $K_j$ of class-$r$ conditional outcome probabilities for the $j$th manifest variable, and $Y_{ij}$ is the $N \times K_j$ matrix of observed outcomes on that variable. The EM algorithm repeats these steps, assigning the new estimates to the old, until the log-likelihood reaches a maximum.
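These E- and M-steps can be sketched for the special case of binary manifest variables, where Equation 3.6 reduces to a posterior-weighted mean of the observed outcomes. Everything below (the simulated data, the starting values, the iteration count) is an illustrative assumption, not part of any cited source.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, R = 500, 4, 2

# Simulate data from a known two-class model so the sketch has something to fit.
true_p = np.array([0.5, 0.5])
true_pi = np.array([[0.9] * J, [0.2] * J])        # P(outcome 1 | class)
z = rng.choice(R, size=N, p=true_p)
Y = (rng.random((N, J)) < true_pi[z]).astype(int)

# Arbitrary starting values.
p = np.array([0.4, 0.6])
pi = rng.uniform(0.3, 0.7, size=(R, J))

for _ in range(200):
    # E-step (Eq. 3.3): posterior class-membership probabilities.
    f = np.ones((N, R))
    for r in range(R):
        f[:, r] = np.prod(pi[r] ** Y * (1 - pi[r]) ** (1 - Y), axis=1)
    post = p * f
    post /= post.sum(axis=1, keepdims=True)

    # M-step: Eq. 3.5 updates the priors, Eq. 3.6 the outcome probabilities
    # (for binary items, a posterior-weighted mean of the observations).
    p = post.mean(axis=0)
    pi = (post.T @ Y) / post.sum(axis=0)[:, None]

print(np.round(p, 2), np.round(pi, 2))   # estimates near the true values,
                                         # up to label switching
```

With each sweep the log-likelihood cannot decrease, which is the key practical property of EM noted below.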

## Newton-Raphson method

The Newton method, or the Newton-Raphson (NR) method as it is often called, is also an iterative method. McCutcheon (2002) explains that the algorithm begins with a set of initial parameter values. On each iteration, the gradient vector of the log-likelihood function, denoted $g(\theta)$, and the matrix of second-order derivatives, called the Hessian matrix and denoted $H$, are calculated. The parameter values at the $(r+1)$th iteration are obtained from the gradient vector and the inverse of the Hessian matrix at the previous iteration $r$ as follows:

$$\theta^{(r+1)} = \theta^{(r)} - H^{-1}\!\left(\theta^{(r)}\right) g\!\left(\theta^{(r)}\right) \qquad (3.7)$$

where $-H^{-1} g$ is called the Newton step.

Seber and Wild (2003) do not recommend this unmodified Newton-Raphson method for maximizing the likelihood function, for two reasons. First, if the starting values are not close enough to the final solution, the Newton step may be too long and can even decrease the likelihood. Second, the Newton-Raphson method does not require the Hessian to be negative definite at each iteration, and therefore does not guarantee an increase in the likelihood at each Newton step; it is then possible for the method to converge to a local minimum. Instead of the unmodified Newton method described above, one should perhaps use a modified version of it, the quasi-Newton method.
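As a sketch of the update in Equation 3.7 (applied to a simple concave toy objective rather than a latent class likelihood), the following fragment computes the gradient and Hessian by finite differences and takes Newton steps; all names and the objective are my own illustrative choices.

```python
import numpy as np

def log_lik(theta):
    # Toy concave objective standing in for a log-likelihood;
    # its maximizer is (1, -0.5).
    return -(theta[0] - 1.0) ** 2 - 2.0 * (theta[1] + 0.5) ** 2

def gradient(f, theta, h=1e-5):
    """Central-difference gradient vector g(theta)."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

def hessian(f, theta, h=1e-4):
    """Central-difference Hessian matrix H(theta)."""
    n = len(theta); H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

theta = np.array([3.0, 2.0])            # deliberately poor starting values
for _ in range(5):
    g = gradient(log_lik, theta)
    H = hessian(log_lik, theta)
    theta = theta - np.linalg.inv(H) @ g   # the Newton step of Eq. 3.7
print(np.round(theta, 3))
```

Because the toy objective is quadratic and concave, the method converges immediately; the dangers Seber and Wild describe arise only for less well-behaved likelihood surfaces.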

Basically, each of these algorithms has advantages and disadvantages. Snellman (2008) notes that, compared to the Newton-Raphson algorithm, the EM algorithm is relatively less sensitive to the choice of starting values. On the other hand, the Newton-Raphson algorithm is faster when close to the maximum, and it can also generate standard errors for the parameter estimates as a by-product, by inverting the Hessian matrix. Several software packages therefore start the estimation process with the EM algorithm and switch to the Newton-Raphson method when approaching the maximum likelihood estimates. In this way the starting values for Newton-Raphson are not too far from the final solution, and the method will most likely converge.

Another problem, discussed by McCutcheon (2002), is that the log-likelihood function may have multiple local maxima, and both the Newton-Raphson and EM algorithms may converge to a local maximum instead of the global maximum. The algorithm stops when a maximum is reached, but it cannot distinguish the global maximum from a local one. One should repeat the estimation procedure from different starting values to check that the same parameter estimates are reached each time.

## 3.3.3 Model selection

Compared to cluster analysis, more tools are available for assessing latent class model fit and determining an appropriate number of latent classes. Generally speaking, the researcher may begin by fitting the complete "independence" model with R = 1, and then iteratively increase the number of latent classes by one until a suitable fit is achieved. Adding a class to a latent class model will always improve the fit, but at the risk of fitting noise, and at the expense of estimating further, possibly unnecessary, model parameters. Several criteria seek to strike a balance between over-fitting and under-fitting by penalizing the log-likelihood with a function of the number of parameters being estimated. The two most widely used parsimony measures are the Bayesian information criterion, or BIC (Schwarz, 1978), and the Akaike information criterion, or AIC (Akaike, 1973). Let $\Lambda$ represent the maximum log-likelihood of the model and $\Phi$ the total number of estimated parameters. Then,

$$\mathrm{AIC} = -2\Lambda + 2\Phi \qquad (3.10)$$

and

$$\mathrm{BIC} = -2\Lambda + \Phi \ln N \qquad (3.11)$$

The BIC will usually be more appropriate for basic latent class models because of their relative simplicity (Forster, 2000).
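Equations 3.10 and 3.11 translate directly into code. The log-likelihood values and parameter counts below are made-up numbers for illustration, not results from any fitted model.

```python
import math

def aic(log_lik, n_params):
    """Equation 3.10: AIC = -2*log-likelihood + 2*parameters."""
    return -2 * log_lik + 2 * n_params

def bic(log_lik, n_params, n_obs):
    """Equation 3.11: BIC = -2*log-likelihood + parameters * ln(N)."""
    return -2 * log_lik + n_params * math.log(n_obs)

# Hypothetical comparison of a 2-class vs a 3-class fit on N = 500 cases:
print(aic(-1200.0, 9), bic(-1200.0, 9, 500))     # 2-class model
print(aic(-1190.0, 14), bic(-1190.0, 14, 500))   # 3-class model
```

With these invented numbers the AIC favours the larger model while the heavier BIC penalty favours the smaller one, illustrating how the two criteria can disagree.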

## 3.4 Latent class regression models

Another useful latent variable method is the latent class regression model, also known as the latent class segmentation model. It generalizes the basic latent class model by permitting the inclusion of covariates to predict individuals' latent class membership (Hagenaars and McCutcheon, 2002). It is used to predict a dependent variable as a function of predictors, including an R-category latent variable, each category representing a homogeneous population (class, or segment). In latent class regression analysis, a separate regression is estimated for each population (each latent segment), and cases are classified into segments while the regression models are developed, simultaneously.

Compared to traditional regression models, latent class regression has several advantages. First, statistical criteria are available to determine the number of classes, which is important in model diagnosis. Second, Vermunt and Magidson (2005) note that the method relaxes the traditional assumption that the same model holds for all cases, in other words that there is only one class, as in the traditional regression model; it allows separate regressions to be developed to target each segment. Compared to the basic latent class model, on the other hand, the latent class regression model includes covariates, which help to classify each case into its most likely class.

Linzer and Lewis (2010) describe the latent class regression model as a "one-step" technique for estimating the effects of covariates, because the coefficients on the covariates are estimated as part of the latent class model. An alternative estimation procedure is the "three-step" approach: estimate the basic latent class model, calculate the predicted posterior class-membership probabilities using Equation 3.3, and then use these values as the dependent variable(s) in a regression model with the desired covariates. The two methods have been compared by several authors; Bolck et al. (2004) show that the three-step procedure produces biased coefficient estimates. It is therefore preferable to estimate the entire latent class regression model in one step.

## 3.4.1 Terminology and model definition

Denote the mixing proportions in the latent class regression model by $p_{ri}$; it is still the case that $\sum_{r=1}^{R} p_{ri} = 1$ for each individual. A logit link function is introduced for the effects of the covariates on the priors (Agresti, 2002). Let $X_i$ represent the observed covariates for individual $i$, and select one latent class arbitrarily as a "reference" class. Assume that the log-odds of the latent-class membership priors with respect to this reference class are linear functions of the covariates. Let $\beta_r$ denote the vector of coefficients corresponding to the $r$th latent class; because the first class is used as the reference, $\beta_1 = 0$ by definition. Then,

$$\ln(p_{2i}/p_{1i}) = X_i \beta_2$$

$$\ln(p_{3i}/p_{1i}) = X_i \beta_3$$

$$\vdots$$

$$\ln(p_{Ri}/p_{1i}) = X_i \beta_R \qquad (3.12)$$

Following some simple algebra, these equations yield the general result that

$$p_{ri} = p_r(X_i; \beta) = \frac{e^{X_i \beta_r}}{\sum_{q=1}^{R} e^{X_i \beta_q}} \qquad (3.13)$$

The parameters estimated by the latent class regression model are the coefficients $\beta_r$ and the class-conditional outcome probabilities $\pi_{jrk}$ from the basic latent class model. Given estimates $\hat{\beta}_r$ and $\hat{\pi}_{jrk}$, the posterior class-membership probabilities can be calculated by combining Equation 3.3 and Equation 3.13:

$$\hat{P}(r_i \mid X_i, Y_i) = \frac{p_r(X_i; \hat{\beta})\, f(Y_i; \hat{\pi}_r)}{\sum_{q=1}^{R} p_q(X_i; \hat{\beta})\, f(Y_i; \hat{\pi}_q)} \qquad (3.14)$$
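The multinomial-logit priors of Equations 3.12 and 3.13 amount to a softmax over the linear predictors, with the reference class fixed at zero. A small sketch with invented covariate and coefficient values:

```python
import numpy as np

def priors(X_i, betas):
    """Equation 3.13: individual-specific mixing proportions p_ri.
    betas[0] must be the zero vector (class 1 is the reference)."""
    logits = np.array([X_i @ b for b in betas])   # X_i * beta_r for each class
    expo = np.exp(logits - logits.max())          # numerically stable softmax
    return expo / expo.sum()

X_i = np.array([1.0, 2.5])                        # intercept plus one covariate
betas = [np.zeros(2), np.array([0.5, -0.4])]      # reference class, class 2
p_i = priors(X_i, betas)
print(np.round(p_i, 3))                           # priors sum to 1 by construction
```

The resulting $p_{ri}$ would then be substituted for the constant $\hat{p}_r$ in Equation 3.14 to obtain covariate-dependent posterior class memberships.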

## 3.4.2 Application in marketing research

Vermunt and Magidson (2005) discuss three main categories of typical marketing applications of latent class regression analysis. The first is customer satisfaction studies, which identify the particular determinants of customer satisfaction relevant to each customer segment. Second, conjoint studies can be implemented through latent class regression analysis, identifying the mix of product attributes that appeals to each market segment. The third is a more general usage: identifying segments that differ from each other with respect to some dependent variable. In chapter 5 of this paper, an alternative marketing application, buyer behaviour, is introduced: customers' attitudes toward a product are fitted by the basic latent class model, and demographic covariates are considered in the latent class regression model. This provides a new approach to using latent class models in marketing research.

As an extension of the basic latent class model, the latent class regression model can incorporate "other" variables, such as background characteristics, into latent class analysis, thereby enhancing one's insight into the meaning of the classes discovered. This is especially important in the social sciences, because accounting for demographic variables adds considerably to the value of the results. Latent class regression analysis is therefore a useful model that is worth further study.