College of Natural Sciences
 

AMOS FAQ #5: Handling Missing Data using AMOS

Question:

I am using AMOS to perform a structural equation modeling analysis of a database that has missing data. How does AMOS deal with missing data? Can I manually replicate the way AMOS handles missing data?

Answer:

This FAQ assumes that you understand the assumptions of structural equation models (SEM) and can specify and test SEMs using AMOS. If not, see our AMOS tutorial.

AMOS uses a procedure known as Full Information Maximum Likelihood (FIML, also known as "Raw Maximum Likelihood") to handle missing data. A number of investigators have shown that FIML outperforms most common methods of handling missing data, including listwise and pairwise data deletion, mean substitution, and the Similar Response Pattern Imputation (SRPI) procedure implemented in LISREL 8.30 and higher (Jöreskog & Sörbom, 1993). An article by Enders and Bandalos (2001) compares various methods of handling missing data in the structural equation modeling context. A more general discussion of missing data handling methods can be found in General FAQ #25: Handling missing or incomplete data.

This FAQ assumes you are familiar with the contents of General FAQ #25, including the assumptions underlying FIML, most notably that model residuals are normally distributed, that the fitted model is correct, and that data are missing at random (MAR). It also assumes that you are familiar with basic model specification in AMOS and know how to perform a multiple group analysis using AMOS. If you do not, see AMOS FAQ #3: Multiple group analysis.

Data analysts typically ask two types of questions when they fit structural equation models to data. The first question is: "Does the specified model fit the data on a global basis?" This question is usually addressed using a chi-square test of overall model fit. The second question is a follow-up to the first question, assuming the model cannot be rejected by the chi-square test of overall model fit: "What are the parameter estimates?"

To address the first question, that of overall model fit to the data, AMOS computes a log likelihood value for each individual case, using only the data points that are available for that case. These casewise log likelihood values are then summed to form an overall log likelihood value for the whole sample, and AMOS chooses the parameter estimates that maximize this sum (equivalently, that minimize the corresponding discrepancy function). The formula for this function, along with a brief description of the minimization process, can be found online at: http://www.smallwaters.com/whitepapers/longmiss/Longitudinal and multi-group modeling with missing data.pdf.
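As a rough illustration of the casewise idea, the sketch below computes the log likelihood contribution of a single case for a two-variable normal model. It is not AMOS's implementation. The function names `casewise_loglik` and `sample_loglik` are hypothetical, X is assumed to be always observed, and a missing Y value is represented by `None`.

```python
import math

def casewise_loglik(x, y, mu_x, mu_y, var_x, var_y, cov_xy):
    """Log likelihood of one case under a bivariate normal model.

    x is assumed always observed; y may be None (missing). A case with
    missing y contributes only the marginal density of x.
    """
    # Marginal contribution of x: univariate normal log density.
    ll = -0.5 * (math.log(2 * math.pi * var_x) + (x - mu_x) ** 2 / var_x)
    if y is None:
        return ll
    # Conditional contribution of y given x (regression of y on x).
    # Marginal plus conditional equals the full bivariate normal log density.
    slope = cov_xy / var_x
    resid_var = var_y - cov_xy ** 2 / var_x
    resid = y - (mu_y + slope * (x - mu_x))
    ll += -0.5 * (math.log(2 * math.pi * resid_var) + resid ** 2 / resid_var)
    return ll

def sample_loglik(cases, **params):
    """Sum the casewise log likelihoods over the whole sample."""
    return sum(casewise_loglik(x, y, **params) for x, y in cases)

# Two illustrative cases: one with Y missing, one complete.
cases = [(100.0, None), (70.0, 25.0)]
params = dict(mu_x=53.73, mu_y=49.00, var_x=830.60, var_y=790.50, cov_xy=402.79)
total = sample_loglik(cases, **params)
```

An estimation routine would search for the parameter values that maximize `sample_loglik`; that search is omitted here.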

AMOS performs this operation for two distinct models: the first model is a saturated model in which the number of estimated parameters is equal to the number of known inputs: means, variances, and covariances. For example, if you had a database containing four measured variables, it would have four variances, four means, and six covariances - 14 inputs. The saturated model would estimate the values of each of the aforementioned quantities.
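The count of known inputs follows directly from the number of measured variables. A minimal sketch, using a hypothetical helper name `saturated_input_count`:

```python
def saturated_input_count(p):
    """Number of means, variances, and covariances for p measured variables."""
    means = p
    variances = p
    covariances = p * (p - 1) // 2  # one per distinct pair of variables
    return means + variances + covariances

print(saturated_input_count(4))  # 4 means + 4 variances + 6 covariances = 14
```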

The second model is the model you have specified, your structural equation model of interest. AMOS then computes the difference between the log likelihood values for the two models, expressed on the -2 log likelihood scale. This difference can be interpreted as a chi-square statistic with degrees of freedom equal to the difference between the numbers of parameters estimated by the two models. Because AMOS computes the log likelihood values for both models, their difference, and the appropriate degrees of freedom, the chi-square test output by AMOS properly tests the overall goodness of fit of your proposed model.
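The comparison can be sketched as follows, assuming the usual likelihood-ratio convention in which the statistic is -2 times the difference in log likelihoods. The function name and the numeric inputs are hypothetical and are not AMOS output.

```python
def likelihood_ratio_chi_square(loglik_saturated, loglik_model, k_saturated, k_model):
    """Chi-square statistic and degrees of freedom for a nested model.

    loglik_* are maximized log likelihoods; k_* are the numbers of
    freely estimated parameters in each model.
    """
    chi_square = -2.0 * (loglik_model - loglik_saturated)
    df = k_saturated - k_model
    return chi_square, df

# Hypothetical log likelihood values, for illustration only.
chi_square, df = likelihood_ratio_chi_square(
    loglik_saturated=-100.0, loglik_model=-101.5, k_saturated=5, k_model=3
)
print(chi_square, df)  # 3.0 2
```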

To compute appropriate parameter estimates and standard errors, AMOS uses a variant of the Muthén, Kaplan, & Hollis (1987) multiple group structural equation modeling approach, in which each distinct pattern of missing data is treated as a separate group in a multiple group structural equation model. By imposing equality constraints so that the estimates of the variances, covariances, means, and intercepts are the same across the different patterns of missing data, AMOS obtains appropriate parameter estimates and standard errors.

An example may help to clarify how this process works. Suppose you have two variables, X and Y, with the sample data shown below.

X      Y
100.00 93.00
98.00  89.00
90.00  75.00
88.00  66.00
86.00  54.00
84.00  35.00
83.00  80.00
78.00  55.00
77.00  70.00
75.00  53.00
70.00  25.00
65.00  55.00
64.00  78.00
62.00  65.00
55.00  88.00
51.00  35.00
49.00  40.00
49.00  20.00
48.00  35.00
47.00  88.00
38.00   5.00
35.00  25.00
33.00  50.00
21.00  70.00
19.00  12.00
17.00   6.00
12.00  35.00
 7.00  64.00
 6.00  20.00
 5.00   8.00

The mean for X is 53.73; its variance is 830.60. The mean for Y is 49.80 and its variance is 711.23. The covariance between X and Y is 448.41. (Note: All computations in this example divide by N rather than N-1 because the methods described here are intended for computing asymptotic statistics.)
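These descriptive statistics can be checked with a few lines of Python. The sketch below uses the 30 cases listed above and divides by N, as the note specifies; the helper names are hypothetical.

```python
x = [100, 98, 90, 88, 86, 84, 83, 78, 77, 75,
     70, 65, 64, 62, 55, 51, 49, 49, 48, 47,
     38, 35, 33, 21, 19, 17, 12, 7, 6, 5]
y = [93, 89, 75, 66, 54, 35, 80, 55, 70, 53,
     25, 55, 78, 65, 88, 35, 40, 20, 35, 88,
     5, 25, 50, 70, 12, 6, 35, 64, 20, 8]

def mean(values):
    return sum(values) / len(values)

def covariance_n(a, b):
    """Covariance with division by N (not N-1); covariance_n(a, a) is the variance."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

mean_x, mean_y = mean(x), mean(y)
var_x, var_y = covariance_n(x, x), covariance_n(y, y)
cov_xy = covariance_n(x, y)
print(round(mean_x, 2), round(var_x, 2), round(mean_y, 2), round(var_y, 2), round(cov_xy, 2))
```

Rounded to two decimals, the printed values should agree with the figures quoted in the text.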

Suppose you now delete the Y values for the first 10 cases and fit the model using AMOS with FIML missing data handling activated. Under this scenario, the statistics for X remain unchanged, yet the mean of Y becomes 49.00 and its variance becomes 790.50. The covariance between X and Y is now 402.79. This model is saturated: the number of estimated parameters equals the number of known inputs to the analysis, so the chi-square test of model fit output by AMOS is 0.
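For this particular pattern (X fully observed, Y missing for some cases), the saturated-model maximum likelihood estimates under normality can also be obtained in closed form: estimate the X parameters from all 30 cases, regress Y on X in the 20 complete cases, and recombine. The sketch below does this as a check on the numbers above. It is a swapped-in shortcut for this simple pattern, not the iterative algorithm AMOS uses, and all names are hypothetical.

```python
x = [100, 98, 90, 88, 86, 84, 83, 78, 77, 75,
     70, 65, 64, 62, 55, 51, 49, 49, 48, 47,
     38, 35, 33, 21, 19, 17, 12, 7, 6, 5]
# Y is missing for the first 10 cases; these are the 20 observed values.
y_observed = [25, 55, 78, 65, 88, 35, 40, 20, 35, 88,
              5, 25, 50, 70, 12, 6, 35, 64, 20, 8]
x_complete = x[10:]  # X values for the cases that also have Y

def mean(values):
    return sum(values) / len(values)

def covariance_n(a, b):
    """Covariance with division by N."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

# Step 1: X parameters come from all 30 cases.
mu_x = mean(x)
var_x = covariance_n(x, x)

# Step 2: regression of Y on X, using only the 20 complete cases.
slope = covariance_n(x_complete, y_observed) / covariance_n(x_complete, x_complete)
intercept = mean(y_observed) - slope * mean(x_complete)
resid_var = covariance_n(y_observed, y_observed) - slope * covariance_n(x_complete, y_observed)

# Step 3: recombine into the Y mean, the X-Y covariance, and the Y variance.
mu_y = intercept + slope * mu_x
cov_xy = slope * var_x
var_y = resid_var + slope ** 2 * var_x
print(round(mu_y, 2), round(var_y, 2), round(cov_xy, 2))
```

The Y estimates shift away from the complete-case values because the 10 cases with missing Y still carry information about X, and X is correlated with Y.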

You can set up a second model that imposes restrictions on the model described above so that you obtain a non-saturated model. Suppose you want to test simultaneously whether the mean of X is equal to the mean of Y and whether the variance of X is equal to the variance of Y. Since two constraints are imposed on the saturated model, the resulting chi-square goodness of fit test for the second model has two degrees of freedom. The chi-square test value output by AMOS is .439 with a p-value of .803. The common mean estimate for X and Y is 52.43 and the common variance estimate is 849.65. Their covariance estimate is now 471.08.
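The degrees of freedom and p-value reported here can be checked by hand. The sketch below counts the free parameters in each model and uses the closed-form upper-tail probability of the chi-square distribution, which is available for even degrees of freedom. The function name is hypothetical, and the restriction to even df is a simplification made for this sketch.

```python
import math

def chi_square_p_value_even_df(chi_square, df):
    """Upper-tail p-value of a chi-square statistic; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = chi_square / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

saturated_params = 5    # 2 means + 2 variances + 1 covariance
constrained_params = 3  # 1 common mean + 1 common variance + 1 covariance
df = saturated_params - constrained_params
p_value = chi_square_p_value_even_df(0.439, df)
print(df, round(p_value, 3))  # 2 0.803
```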

So far, the numbers obtained above were generated using AMOS's FIML algorithm. For this example, it is possible to obtain the same numbers using the approach documented by Muthén et al. (1987). To do this, perform the following steps:

1. Split the original data file shown above into two distinct data files. The first data file contains the last twenty cases from the original sample that have complete data for both X and Y. The second data file contains the first ten cases with observed values for X and missing data points for Y. Note that both databases have X and Y variables so that AMOS recognizes that Y has missing data for the first ten cases.

2. In AMOS, set up a multiple group analysis. For the first group (n = 20), draw the model as usual: Include a rectangle for X and a rectangle for Y, allow their means and variances to be freely estimated, and draw a double-headed covariance arrow connecting the X and Y rectangles. Before you define the second group, select View/Set, then Interface Properties, then click on the Misc tab. Select the radio button labeled Allow different path diagrams for different groups. When you click the OK button, AMOS will display a warning message notifying you that once you make this change you cannot delete a second group. Accept this warning.

3. Define a second group, then draw a single rectangle and name it X. Link each group to its respective database and give each mean and variance a unique name. For example, for group 1, the mean value for X might be named "mean_x1" and the variance value could be named "var_x1" whereas group 2's mean for X could be "mean_x2" and its variance could be called "var_x2".

4. Define a second model that is nested under the model you just defined. In this new model, constrain the mean of X for group 1 to be equal to the mean of X for group 2. Similarly, constrain the variance of X for group 1 to be equal to the variance of X for group 2. This model is equivalent to the saturated baseline model that AMOS fits using the FIML algorithm. Notice that when you fit this model, you obtain the same estimates for the means and variances of X and Y as you did under the saturated FIML model: The mean of X is 53.73 while the variance is 830.60. The mean of Y is 49.00 and its variance is 790.50. The covariance of X and Y is 402.79. Unlike the FIML model, however, AMOS outputs a chi-square value of 35.888 with 2 degrees of freedom for this model.

5. Define a third model that is nested under the second model described in the previous paragraph. For this model, set the mean of X equal to the mean of Y, and also set the variance of X equal to the variance of Y in the first group. The common mean value of X and Y for this model is 52.43 while the shared variance value is 849.65. The covariance between X and Y is estimated at 471.08. The chi-square model fit statistic for this model is 36.312 with 4 degrees of freedom.
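Step 1 of the procedure above, splitting the cases by missing-data pattern, can be sketched in Python before the data are brought into AMOS. The function name `split_by_missing_pattern` is hypothetical, and a missing value is assumed to be coded as `None`.

```python
from collections import defaultdict

x = [100, 98, 90, 88, 86, 84, 83, 78, 77, 75,
     70, 65, 64, 62, 55, 51, 49, 49, 48, 47,
     38, 35, 33, 21, 19, 17, 12, 7, 6, 5]
# Y is missing (None) for the first 10 cases.
y = [None] * 10 + [25, 55, 78, 65, 88, 35, 40, 20, 35, 88,
                   5, 25, 50, 70, 12, 6, 35, 64, 20, 8]

def split_by_missing_pattern(rows):
    """Group rows by which variables are observed (True) or missing (False)."""
    groups = defaultdict(list)
    for row in rows:
        pattern = tuple(value is not None for value in row)
        groups[pattern].append(row)
    return dict(groups)

groups = split_by_missing_pattern(list(zip(x, y)))
for pattern, rows in groups.items():
    print(pattern, len(rows))
```

Each resulting group corresponds to one data file in the multiple group analysis: the `(True, True)` group holds the complete cases and the `(True, False)` group holds the cases with X only.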

The nested model comparison of the third model to the second model simultaneously tests whether the mean of X is equal to the mean of Y and whether the variance of X is equal to the variance of Y. The chi-square difference value is .424 (36.312 - 35.888) with 2 degrees of freedom (4 - 2). Although the parameter estimates from the Muthén et al. (1987) method are identical to those produced by AMOS automatically under FIML missing data handling, the chi-square values differ slightly because of how they are computed: the FIML chi-square value is .439 rather than .424. Multiplying the Muthén et al. (1987) method chi-square value by (N/(N-1)) yields a rough approximation of the FIML chi-square: (30/(30-1))*.424 approximates .439.
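The arithmetic of this nested comparison is easy to reproduce. The sketch below uses only the chi-square values and degrees of freedom reported above; the variable names are hypothetical.

```python
n_cases = 30
chi_square_model2, df_model2 = 35.888, 2  # X mean and variance equal across groups
chi_square_model3, df_model3 = 36.312, 4  # plus X and Y means and variances equal

# Chi-square difference test between the nested models.
chi_square_diff = chi_square_model3 - chi_square_model2
df_diff = df_model3 - df_model2

# Rough rescaling toward the FIML chi-square reported by AMOS.
rescaled = chi_square_diff * n_cases / (n_cases - 1)
print(round(chi_square_diff, 3), df_diff, round(rescaled, 3))  # 0.424 2 0.439
```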

For more information on missing data handling methods in the structural equation modeling context, see the following publications:

Arbuckle, J., & Wothke, W. (1999). AMOS 4.0 User's Guide. Chicago: Smallwaters Corporation.

Enders, C.K., & Bandalos, D.L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430-457.

Jöreskog, K.G., & Sörbom, D.L. (1993). PRELIS 2 user's reference guide. Chicago: Scientific Software International.

Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.

If you have further questions, send E-mail to stats@ssc.utexas.edu.