Mplus for Windows: An Introduction
This document introduces you to Mplus for Windows. It is primarily
aimed at first-time users of Mplus who have prior experience
with either exploratory factor analysis (EFA) or confirmatory
factor analysis (CFA) and structural equation modeling
(SEM). The document is organized into six sections. The first
section provides a brief introduction to Mplus and describes
how to obtain access to Mplus. The second section briefly reviews
SEM assumptions and describes important and useful model fitting
features that are unique to Mplus. The third section describes
how to get started with Mplus, how to read data from an external
data file, and how to obtain descriptive sample statistics.
The fourth section explains how to fit exploratory factor analysis
models for continuous and categorical outcomes using Mplus.
The fifth section of this document demonstrates how you can
use Mplus to test confirmatory factor analysis and structural
equation models. The sixth section presents examples of two
advanced models available in Mplus: multiple group analysis
and multilevel SEM. By the end of the course you should be able
to fit EFA and CFA/SEM models using Mplus. You will also gain
an appreciation for the types of research questions well-suited
to Mplus and some of its unique features.
2. Introduction to EFA, CFA, SEM, and Mplus
Exploratory factor analysis (EFA) is a method of data
reduction in which you may infer the presence of latent factors
that are responsible for shared variation in multiple measured
or observed variables. In EFA each observed variable in the
analysis may be related to each latent factor contained in the
analysis. By contrast, confirmatory factor analysis (CFA)
allows you to stipulate which latent factor is related to any
given observed variable. Structural equation modeling
(SEM) is a more general form of CFA in which latent factors
may be regressed on one another. Mplus can fit EFA, CFA, and
SEM models.
To effectively use and understand the course material, you should
already know how to conduct a multiple linear regression analysis
and compute descriptive statistics such as frequency tables
using SAS, SPSS, or a similar general statistical software package.
You should also understand how to interpret the output from
a multiple linear regression analysis. This document also assumes
that you are familiar with the statistical assumptions of EFA,
CFA, and SEM, and you are comfortable using syntax-based software
programs such as SAS. If you do not have prior experience with
exploratory factor analysis, see the usage note Factor
Analysis Using SAS PROC FACTOR. If you do not have experience
with CFA or SEM, see our AMOS
tutorial for more information about SEM. Finally, you should
understand basic Microsoft Windows navigation operations: opening
files and folders, saving your work, recalling previously saved
work, etc.
3. Accessing Mplus
You may access Mplus in one of three ways:
- License a copy from Muthén & Muthén for your own personal computer.
- Mplus is available to faculty, students, and staff at
the University of Texas at Austin via the STATS Windows
terminal server. To use the terminal server, you must obtain
an ITS computer account (an IF or departmental account)
and then validate the account for Windows NT Services. You
then download and configure client software that enables
your PC, Macintosh, or UNIX workstation to connect to the
terminal server. Finally, you connect to the server and
launch Mplus by double-clicking on the Mplus for Windows
program icon located in the STATS terminal server program
group. Details on how to obtain an ITS computer account,
account use charges, and downloading client software and
configuration instructions may be found in General
FAQ #36: Connecting to published statistical and mathematical applications
on the ITS Windows Terminal Server.
- Download the free demonstration version of Mplus from the Muthén
& Muthén Web site for your own personal computer. If your models
of interest are small, the free demonstration version may
be sufficient to meet your needs. For larger models, you
will need to purchase your own copy of Mplus or access the
ITS shared copy of the software through the campus network.
The latter option is typically more cost effective, particularly
if you decide to access the other software programs available
on the server (e.g., SAS, SPSS, AMOS, etc.).
4. Getting Help with Mplus
If you have difficulties accessing Mplus on the Windows
Terminal Server, call the ITS helpdesk at 512-475-9400 or send
e-mail to help@its.utexas.edu.
If you are able to log in to the Windows Terminal Server
and run Mplus, but have questions about how to use Mplus or
interpret output, call the ITS helpdesk at 512-475-9400 to schedule an appointment
with an SSC statistical consultant or send e-mail to stats@ssc.utexas.edu.
Important note: Both services are available to University
of Texas faculty, students, and staff only. See our Web
site at http://ssc.utexas.edu/consulting/free_consulting.html for more details about consulting services, as well as
frequently asked questions and answers about EFA, CFA/SEM, Mplus, and other
topics. Non-UT and UT Mplus users will find the Muthén & Muthén Web
site to be a useful resource; see the Mplus
Discussion forum for frequently asked questions and answers.
You may also post your own questions in this forum.
The Mplus User's Guide is available for checkout from
the PCL general circulation desk. Alternatively, you may order
a copy from the Muthén & Muthén Web site.
Section 2: Latent Variable Modeling using Mplus
1. Overview of SEM Assumptions for Continuous Outcome Data
Before specifying and running a latent variable model, you
should give some thought to the assumptions underlying latent
variable modeling with continuous outcome variables. Several
of these assumptions are shown below:
- A theoretical basis for model specification
- A reasonable sample size
- Identified model equations
- Complete data or appropriate handling of incomplete data
- Continuously and normally distributed endogenous variables
These assumptions apply equally to all EFA and CFA/SEM software
programs. The details of these assumptions can be found in our AMOS
tutorial, but they may be summarized as follows: Recommendations
for sample size vary depending upon the complexity of the specified
model, but typical figures range from 5 to 15 cases per estimated
parameter, with an overall sample size that preferably exceeds
N = 200 cases. Furthermore, any model you consider should have
a theoretical basis, and substantive inferences should be drawn
based upon your ability to rule out alternative explanations
for findings, rather than on statistical considerations alone.
Like AMOS, Mplus features Full Information Maximum Likelihood
(FIML) handling of missing data, an appropriate, modern method
of missing data handling that enables Mplus to make use of all
available data points, even for cases with some missing responses.
For more details on missing data handling methods, including
FIML, see General
FAQ #25: Handling missing or incomplete data and AMOS
FAQ #5: Handling Missing Data using AMOS. One added missing
data handling feature that is unique to Mplus is its ability
to generate model modification indices for databases that are
incomplete.
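The sample-size guideline summarized above can be turned into a quick arithmetic check. The sketch below is one possible reading of that guideline, not an official rule: the function name, its parameters, and the 20-parameter model are hypothetical, and treating N = 200 as a floor is an assumption.

```python
def cases_suggested(n_parameters, per_parameter=(5, 15), preferred_floor=200):
    """Return (low, high) suggested case counts for a model.

    Each bound is cases-per-parameter times the number of estimated
    parameters, raised to the preferred overall floor when it falls below it.
    """
    low = max(per_parameter[0] * n_parameters, preferred_floor)
    high = max(per_parameter[1] * n_parameters, preferred_floor)
    return low, high


# Hypothetical model with 20 estimated parameters.
print(cases_suggested(20))
```

A range such as this is only a starting point; the text above stresses that model complexity and theory matter as well.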
2. Categorical Outcomes and Categorical Latent Variables
Where Mplus diverges from most other SEM software packages is
in its ability to fit latent variable models to databases that
contain ordinal or dichotomous outcome variables. Note that
Mplus will not yet fit models to databases with nominal outcome
variables that contain more than two levels. Nonetheless, the
ability to fit models to variables that contain ordinal and
dichotomous categorical outcome variables is very useful. Furthermore,
Mplus will fit latent class analysis (LCA) models that
contain categorical latent variables and fit mixture models
that generate expected classifications of observations based
upon the characteristics of your specified model.
Should you use Mplus to perform EFA, CFA, and SEM analyses on
your data? In order to facilitate rapid access to both simple
and complex latent variable models, the Mplus developers have
built a streamlined set of data import and model specification
commands. All Mplus commands are specified using command syntax,
though a syntax generator is under development at the time of
this writing. If you are not comfortable with reading data and
specifying statistical models using command syntax, Mplus may
not be the optimal choice for you. On the other hand, if you
prefer to work with command syntax when you use statistical
software programs or you do not mind learning software syntax
to perform data analysis, you will probably find it useful to
learn Mplus. This is particularly true when you consider some
of the features unique to Mplus:
- The ability to build models with dichotomous and ordered
categorical outcome variables
- The capacity to build models that contain categorical
latent variables
- Optimal full information maximum likelihood (FIML) missing
data handling for EFA as well as CFA and SEM
models
- Modification index output, even when you invoke FIML
missing data handling
- The ability to fit multilevel or hierarchical CFA and
SEM models
Section 3: Using Mplus
1. Launching Mplus
If you are using a personal or demonstration copy of Mplus,
locate the Mplus entry in the Program Files subsection
of the Microsoft Windows Start menu. If you are using
the STATS Windows terminal server, locate the Mplus for Windows
icon in the Citrix Program Neighborhood and double-click on
it to launch Mplus. You will then be prompted to enter your
Windows NT Services account name, your password, and the domain
name, WNT. After you have entered this information, click OK
to launch Mplus. Once you have launched Mplus, you will see
the following window appear on your computer's desktop:
2. The Input and Output Windows
The window shown above is the input window. You write
Mplus syntax in this window to read the data to be analyzed
and to specify your model of interest. You then save your Mplus
syntax and select Run Mplus from the Mplus
menu to submit your syntax to the Mplus engine for processing:
Mplus > Run Mplus
Note: If you are using Mplus on the STATS terminal server,
do not save your work to the default Mplus directory. Instead,
save your work on one of the client disk drives or your allocated
WNTDISK server space. The server space is mounted as drive U
and has the advantage of being available to you whenever you
log in to the terminal server. There is, however, a nominal
fee associated with using this space for file storage. You may
also save your files on a local disk drive. Each local drive
is preceded by a $ (e.g., $C:) in the list of available disk
drives shown in the Save As menu option in the File menu.
Once Mplus has finished processing your command syntax, it replaces
the input window with the output window. The output window
first displays your Mplus syntax. Below the Mplus syntax are
the Mplus model results. If there is an error in your Mplus
syntax or you want to modify your Mplus syntax in any way (e.g.,
to fit a different model to the data), you must return to the
appropriate command file by selecting that file's name from
the File menu's list of recently accessed files.
That action returns the input window's contents to the screen
and you can then modify the previous commands, save the modified
command file, and run Mplus once again to obtain new output.
3. Reading Data and Outputting Sample Statistics
After you have launched Mplus, you may build a command file.
There are nine Mplus commands: TITLE, DATA (required),
VARIABLE (required), DEFINE, SAVEDATA, ANALYSIS, MODEL,
OUTPUT, and MONTECARLO.
The most commonly used Mplus commands are described in this
document. According to the Mplus User's Guide, "The Mplus
commands may come in any order. The DATA and VARIABLE
commands are required for all analyses. All commands must begin
on a new line and must be followed by a colon. Semicolons separate
command options. There can be more than one option per line.
The records in the input setup must be no longer than 80 columns.
They can contain upper and/or lower case letters and tabs."
(page 1).
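The 80-column limit quoted above is easy to violate with long file paths. The following sketch, written in Python rather than Mplus syntax, flags over-long records in a command file before you submit it. The function name and the example text are hypothetical, and counting a tab as one column is an assumption.

```python
def overlong_records(command_text, max_columns=80):
    """Return (line_number, length) pairs for records longer than max_columns."""
    problems = []
    for number, record in enumerate(command_text.splitlines(), start=1):
        # A tab is counted as a single character here (an assumption).
        if len(record) > max_columns:
            problems.append((number, len(record)))
    return problems


# Hypothetical command text whose FILE record is too long.
example = "DATA:\n  FILE IS " + "x" * 90 + " ;\n"
print(overlong_records(example))
```

If the list is not empty, shorten the offending record or break the option across lines at a space.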
A description of the Mplus defaults appears in Mplus
FAQ #3: Mplus Defaults. You should review these defaults
carefully and be sure that you understand them fully prior to
analyzing data with Mplus.
The first Mplus syntax to appear in the command file is typically
a TITLE command. The TITLE command allows you
to specify a title that Mplus will print on each page of the
output file.
Following the TITLE command is the DATA command.
The DATA command specifies where Mplus will locate the
data and the format of the data. At
present, Mplus will read the following file formats: tab-delimited
text, space-delimited text, and comma-delimited text. The input
data file may contain records in free field format or fixed
format. If you are using data stored in another form (e.g.,
SAS, SPSS, or Excel), you will need to convert it to one of
the formats with which Mplus can work before you read it into
Mplus. See our FAQs
for information on how to convert common statistical data file
formats to plain, comma-delimited, or tab-delimited text files.
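As a rough illustration of the conversion step described above, the Python sketch below turns comma-delimited text that has a header row into tab-delimited text without one, since the data file itself should hold numbers only. The function name, the sample rows, and the choice of -9 as a missing-value code are assumptions made for the example.

```python
import csv
import io


def to_mplus_text(csv_text, missing_code="-9"):
    """Convert CSV text with a header row to headerless tab-delimited text.

    Returns the variable names (for the NAMES list) and the data text.
    """
    reader = csv.reader(io.StringIO(csv_text))
    names = next(reader)  # header goes to NAMES, not into the data file
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Replace empty cells with an explicit numeric missing-value code.
        writer.writerow([cell.strip() or missing_code for cell in row])
    return names, out.getvalue()


# Hypothetical two-variable export with one empty cell.
names, data_text = to_mplus_text("visperc,cubes\n33,\n30,24\n")
print(names)
print(data_text)
```

The returned names can be pasted into the NAMES list of the VARIABLE command, and the data text saved as the file named in the DATA command.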
The next command is the VARIABLE command. The VARIABLE
command names the columns of data that Mplus reads using the
DATA command.
Following the VARIABLE command is the ANALYSIS
command. The ANALYSIS command tells Mplus what type of
analysis to perform. Many analysis options are available; a
number of these are shown in the examples that appear in this
document.
Consider the following example database: In 1939 Karl Holzinger
and Frances Swineford administered 26 aptitude tests to 145
students in the Grant-White School. Of the 26 tests, six are
used here: visual perception, cubes, lozenges, paragraph comprehension,
sentence completion, and word meaning. An additional variable,
gender, is included in the database, but not used in this example.
This database is available in SPSS format as one of the example
datasets used by the AMOS SEM software package. AMOS and its
example program files and datasets are available on the STATS
terminal server; a free student version of AMOS containing this
database may be downloaded from the Smallwaters
Corporation Web site. The SPSS file's name is grant.sav.
You can download this file in tab-delimited text format as grant.dat.
Then you can write the following Mplus syntax to read the data
from the file.
TITLE:
  Grant-White School: Summary Statistics
DATA:
  FILE IS U:\Projects\Documentation\Mplus\grant.dat ;
  FORMAT IS free ;
VARIABLE:
  NAMES ARE visperc
            cubes
            lozenges
            paragrap
            sentence
            wordmean
            gender ;
  USEVARIABLES ARE visperc
                   cubes
                   lozenges
                   paragrap
                   sentence
                   wordmean ;
ANALYSIS:
  TYPE = basic ;
In this sample program, the DATA command uses the FILE
subcommand to tell Mplus where to locate the relevant data file.
In this case, the file's location is U:\Projects\Documentation\Mplus\grant.dat.
The FORMAT subcommand uses the default free
option to let Mplus know that the data points appear in order
in the data file with the data points separated by commas, tabs,
or spaces. Alternatively, you can use FORTRAN format statements
to read data when data are in fixed columns. FORTRAN-formatted
input is recommended for large databases because it is more
efficient than the default free data field input;
see the Mplus manual for a detailed description of how to specify
FORTRAN input formats.
The next command shown is the VARIABLE command. The VARIABLE
command uses the NAMES subcommand to list the variables
contained in the Grant-White database. While it is possible
to have more than one variable name on a row of the command
file, this example lists one variable per
line because that layout is easy to read. Because Mplus allows variable names to
have a maximum width of eight characters, the variable name
"paragraph" is shortened to paragrap.
Following the NAMES subcommand is the USEVARIABLES
subcommand. USEVARIABLES enables you to specify a
particular subset of variables to be used in the data analysis.
A similar subcommand, USEOBS, allows you to select subsets
of cases to be used in a particular analysis. For example, if
you wanted to limit the analysis to female participants, you
could include the subcommand
USEOBS gender EQ 1 ;
where a gender value of 1 designated female cases in
the database.
The ANALYSIS command specifies the TYPE of analysis
to be performed by Mplus. In this example the type is basic.
The basic model type does not have Mplus fit any model to the
sample data; instead Mplus will compute sample statistics only.
Using basic as the analysis type is useful during the initial
phase of building your command file because you can use the
Mplus sample statistics output to compare Mplus results to results
you obtained using SAS, SPSS, Excel, or other statistical software
programs to verify that Mplus is reading your input data correctly.
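If no other statistical package is at hand, a few lines of Python can serve as the independent check described above. This is a minimal sketch: the function name and the toy scores are hypothetical, and because the text does not say whether Mplus divides by N or by N - 1 when computing covariances, the divisor is left as a parameter.

```python
from statistics import mean


def sample_mean_and_cov(x, y, ddof=0):
    """Return (mean_x, mean_y, covariance) for two equal-length columns.

    ddof=0 divides the cross-product sum by N; ddof=1 divides by N - 1.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, cross / (n - ddof)


# Toy scores (hypothetical, not taken from grant.dat).
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(sample_mean_and_cov(x, y, ddof=0))
print(sample_mean_and_cov(x, y, ddof=1))
```

Running both divisors against the Mplus output shows which convention matches, after which the remaining values can be compared directly.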
It is worth noting that Mplus has many default settings that
enable you to write compact syntax, which results in brief command
files. Once you understand the Mplus defaults fully, you may
take advantage of them to write shorter command files. For instance,
the first example shown above may be simplified:
TITLE:
  Grant-White School: Summary Statistics
DATA:
  FILE IS U:\Projects\Documentation\Mplus\grant.dat ;
VARIABLE:
  NAMES ARE visperc
            cubes
            lozenges
            paragrap
            sentence
            wordmean
            gender ;
  USEVARIABLES ARE visperc - wordmean ;
ANALYSIS:
  TYPE = basic ;
The FORMAT is free statement has been omitted because
the default format is free-field data input. The USEVARIABLES
statement also shows a handy Mplus feature, the variable list
option. The variable list option enables you to conveniently
refer to a list of variables using a dash to separate the first
and last variables in the contiguous series of variables.
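To make the behavior of the variable list option concrete, the Python sketch below mimics the expansion of a dash-separated range against the order given in NAMES. It illustrates the idea only and is not how Mplus is implemented; the function name is hypothetical.

```python
def expand_variable_list(spec, names):
    """Expand 'first - last' into the contiguous run of names it covers."""
    first, last = (part.strip() for part in spec.split("-"))
    start, stop = names.index(first), names.index(last)
    if start > stop:
        raise ValueError("the first variable must precede the last in NAMES")
    return names[start:stop + 1]


names = ["visperc", "cubes", "lozenges", "paragrap",
         "sentence", "wordmean", "gender"]
print(expand_variable_list("visperc - wordmean", names))
```

Because the range follows the NAMES order, gender is excluded here only because it is listed after wordmean.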
The output from the basic analysis appears below. Although Mplus
initially returns a copy of the input command file, that portion
of the output has been omitted here in the interest of saving
space.
SUMMARY OF ANALYSIS
Mplus VERSION 1.04
Holzinger and Swineford Grant-White School Summary Statistics

Number of groups                                 1
Number of observations                         145
Number of y-variables                            6
Number of x-variables                            0
Number of continuous latent variables            0

Observed variables in the analysis
  VISPERC  CUBES  LOZENGES  PARAGRAP  SENTENCE  WORDMEAN

Estimator                                       ML
Maximum number of iterations                  1000
Convergence criterion                     .500D-04

Input data file(s)
  U:\Projects\Documentation\Mplus\grant.dat
Input data format  FREE

RESULTS FOR BASIC ANALYSIS
SAMPLE STATISTICS

Means/Intercepts/Thresholds
                  1         2         3         4         5
           ________  ________  ________  ________  ________
 1           29.579    24.800    15.966     9.952    18.848

Means/Intercepts/Thresholds
                  6
           ________
 1           17.283

Covariances/Correlations/Residual Correlations
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC      47.801
CUBES        10.012    19.758
LOZENGES     25.798    15.417    69.172
PARAGRAP      7.973     3.421     9.207    11.393
SENTENCE      9.936     3.296    11.092    11.277    21.616
WORDMEAN     17.425     6.876    22.954    19.167    25.321

Covariances/Correlations/Residual Correlations
           WORDMEAN
           ________
WORDMEAN     63.163

Mplus initially identifies the number of groups and observations
in the analysis, followed by the number of X (predictor) and
Y (outcome) variables and the sample (input) covariances, variances,
and means. Once you have verified that these values are correct,
you can turn your attention to fitting your model(s) of interest.
The next section continues with the same example database, but
describes how to perform an exploratory factor analysis of the
continuous variables in the Grant-White database using Mplus.
Section 4: Exploratory Factor Analysis
1. Exploratory Factor Analysis with Continuous Variables
Once you have read the data into Mplus and verified that the
sample statistics show that the data have been read correctly,
you can perform exploratory factor analysis using Mplus by altering
the ANALYSIS command as follows:
ANALYSIS:
  TYPE = efa 1 2 ;
  ESTIMATOR = ml ;
This syntax instructs Mplus to perform an exploratory factor
analysis of the Grant-White database. The efa keyword tells
Mplus to perform an exploratory factor analysis. The 1 and 2
following the efa specification tell Mplus to
generate all factor solutions between and including
1 and 2. In this instance, one- and two-factor solutions will
be produced by the analysis. Finally, the ESTIMATOR = ml
option has Mplus use the maximum likelihood estimator to
perform the factor analysis and compute a chi-square goodness
of fit test of whether the number of hypothesized factors is sufficient
to account for the correlations among the six variables in the
analysis. This optional specification overrides the default
unweighted least-squares (uls) estimator.
If your data are not joint multivariate normally distributed,
you may want to replace ml with either the
mlm or mlmv estimator. One useful
feature of Mplus is its ability to handle non-normal input data.
Recall that the ml estimator assumes that
the input data are distributed joint multivariate normal. If
you have reason to believe that this assumption has not been
met and your sample is reasonably large (e.g., N = 200),
you may substitute mlm or mlmv in
place of ml on the ESTIMATOR = line. The
mlm option provides a mean-adjusted chi-square
model test statistic whereas the mlmv option produces
a mean and variance adjusted chi-square test of model fit. SEM
users who are familiar with Bentler's EQS software program should
also note that the mlm chi-square test and standard
errors are equivalent to those produced by EQS in its ML;ROBUST
method.
You may also add the OUTPUT command following the ANALYSIS
command. The OUTPUT command is used to specify optional
output. For this example the keyword sampstat
tells Mplus to include sample statistics as part of its printed
output.
OUTPUT: sampstat ;
Mplus produces the sample correlations, eigenvalues, and the
chi-square test of the fit of the one-factor model to the sample data.
As you can see from the results, shown below, the chi-square
test is statistically significant, so the null hypothesis that
a single factor fits the data is rejected; more factors are
required to obtain a non-significant chi-square. Since the chi-square
test is sensitive to sample size (such that large samples often
return statistically significant chi-square values) and to non-normality
in the input variables, Mplus also provides the Root Mean
Square Error of Approximation (RMSEA) statistic.
The RMSEA is not as sensitive to large sample sizes. According
to Hu and Bentler (1999), RMSEA values below .06 indicate satisfactory
model fit. The RMSEA yielded a result of .162, which was consistent
with the chi-square result in suggesting that the one-factor
model does not fit the data adequately.
CONTINUOUS VARIABLE CORRELATION MATRIX
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC
CUBES          .326
LOZENGES       .449      .417
PARAGRAP       .342      .228      .328
SENTENCE       .309      .159      .287      .719
WORDMEAN       .317      .195      .347      .714      .685

Grant-White School: Exploratory Factor Analysis
EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :

EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1         2         3         4         5
           ________  ________  ________  ________  ________
 1            3.009     1.225      .656      .530      .311

EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6
           ________
 1             .270

EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :
CHI-SQUARE VALUE           43.241
DEGREES OF FREEDOM              9
PROBABILITY VALUE           .0000
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .162 ( .115 .212)
PROBABILITY RMSEA LE .05 IS .000
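The RMSEA point estimate shown above can be reproduced by hand from the chi-square value, its degrees of freedom, and the sample size. The Python sketch below uses one common form of the formula, with N rather than N - 1 in the denominator. That choice is an assumption made because it reproduces the printed value; some programs use N - 1. The function name is hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate: sqrt(max(chi_square - df, 0) / (df * n))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


# One-factor solution reported above: chi-square 43.241, 9 df, N = 145.
print(round(rmsea(43.241, 9, 145), 3))  # 0.162
```

Whenever the chi-square value is smaller than its degrees of freedom, the numerator is set to zero and the estimate is exactly zero.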
Mplus next produces the estimated factor loadings and error
variances. Notice that the visperc, cubes, and
lozenges factor loadings are low relative to the other
factor loadings displayed below. See the document
Factor Analysis using SAS PROC
FACTOR for more information on interpreting factor loadings.
ESTIMATED FACTOR LOADINGS
                  1
           ________
VISPERC        .415
CUBES          .272
LOZENGES       .415
PARAGRAP       .865
SENTENCE       .818
WORDMEAN       .827

ESTIMATED ERROR VARIANCES
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
 1             .828      .926      .828      .252      .330

ESTIMATED ERROR VARIANCES
           WORDMEAN
           ________
 1             .316
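For a standardized one-factor solution, each error variance is one minus the squared loading. The Python sketch below applies that relationship to the loadings printed above. It is a check on the arithmetic, not a description of how Mplus computes the values, and the function name is hypothetical. Small differences in the third decimal are expected because the printed loadings are rounded.

```python
def error_variances(loadings):
    """Return 1 - loading**2 for each variable (standardized, one factor)."""
    return {name: 1.0 - value ** 2 for name, value in loadings.items()}


# One-factor loadings as printed in the output above.
loadings = {"visperc": .415, "cubes": .272, "lozenges": .415,
            "paragrap": .865, "sentence": .818, "wordmean": .827}

for name, value in error_variances(loadings).items():
    print(f"{name:9s}{value:6.3f}")
```

The low loadings for visperc, cubes, and lozenges translate directly into error variances above .8, meaning that the single factor explains little of their variance.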
The estimated correlation matrix is the correlation matrix reproduced
by Mplus under the assumption that a single factor is sufficient
to explain the sample correlations. From the model fit results
shown above, this is not the case, so it is not surprising that
this implied or model-based correlation matrix differs substantially
from the sample correlation matrix reported above.
ESTIMATED CORRELATION MATRIX
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC       1.000
CUBES          .113     1.000
LOZENGES       .172      .113     1.000
PARAGRAP       .359      .235      .359     1.000
SENTENCE       .339      .223      .340      .708     1.000
WORDMEAN       .343      .225      .343      .715      .677

ESTIMATED CORRELATION MATRIX
           WORDMEAN
           ________
WORDMEAN      1.000
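Under a one-factor model with standardized loadings, the implied correlation between two variables is the product of their loadings. The Python sketch below recomputes a few entries of the matrix above from the loadings reported earlier (.415, .272, .415, .865, .818, .827). The function name is hypothetical, and third-decimal differences reflect rounding in the printed loadings.

```python
from itertools import combinations


def implied_correlations(loadings):
    """Return {(a, b): loading_a * loading_b} for every pair of variables."""
    return {(a, b): loadings[a] * loadings[b]
            for a, b in combinations(loadings, 2)}


loadings = {"visperc": .415, "cubes": .272, "lozenges": .415,
            "paragrap": .865, "sentence": .818, "wordmean": .827}
implied = implied_correlations(loadings)

print(round(implied[("visperc", "cubes")], 3))
print(round(implied[("paragrap", "wordmean")], 3))
```

The product for visperc and cubes is far below their sample correlation of .326, which is the kind of discrepancy the residual matrix summarizes.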
The residuals matrix represents the difference between the sample
correlation matrix and the implied correlation matrix. As noted
above, since the model did not fit the observed data particularly
well, there are some values in this matrix that are non-trivial
in size. In particular, the cubes-visperc, lozenges-visperc,
and lozenges-cubes residual values are high relative
to the other values in the matrix.
RESIDUALS OBSERVED-EXPECTED
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC        .000
CUBES          .213      .000
LOZENGES       .276      .304      .000
PARAGRAP      -.017     -.007     -.031      .000
SENTENCE      -.030     -.063     -.053      .011      .000
WORDMEAN      -.026     -.030      .004      .000      .009

RESIDUALS OBSERVED-EXPECTED
           WORDMEAN
           ________
WORDMEAN       .000
The Root Mean Square Residual (RMR) is another descriptive model
fit statistic. According to Hu and Bentler (1999), RMR values
should be below .08 with lower values indicating better model
fit. The value of .1225 shown below for the one factor solution
indicates unacceptably poor model fit.
ROOT MEAN SQUARE RESIDUAL IS .1225
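The RMR value can be approximated from the printed residuals. The Python sketch below assumes that the RMR is the square root of the mean squared off-diagonal residual, taken over the 15 distinct variable pairs. That definition is an assumption, adopted because it agrees with the printed value to rounding; the function name is hypothetical.

```python
import math


def root_mean_square_residual(residuals):
    """Square root of the mean of the squared residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Off-diagonal residuals of the one-factor solution, as printed above.
one_factor = [.213,
              .276, .304,
              -.017, -.007, -.031,
              -.030, -.063, -.053, .011,
              -.026, -.030, .004, .000, .009]

print(round(root_mean_square_residual(one_factor), 4))
```

The result lands within about .0001 of the reported .1225; the remaining gap comes from the residuals being printed to three decimals. Three residuals among visperc, cubes, and lozenges account for nearly all of the sum of squares.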
In short, the one-factor solution was a poor fit to the data.
In particular, the model did not account well for the correlations
among the visperc, cubes, and lozenges
variables. What about the two-factor solution? Mplus reports
the two-factor solution following the single-factor model. The
chi-square test of model fit is non-significant, indicating
that the null hypothesis that the model fits the data cannot
be rejected (the model fits the data well). This finding is
corroborated by the RMSEA: Its estimate is zero, and its 90% confidence
interval has an upper bound value of .055, which is below the
Hu and Bentler (1999) recommended cutoff value of .06. The RMSEA
estimate and its upper bound confidence interval value should
both fall below .06 to ensure satisfactory model fit.
EXPLORATORY ANALYSIS WITH 2 FACTOR(S) :
CHI-SQUARE VALUE            1.079
DEGREES OF FREEDOM              4
PROBABILITY VALUE           .8976
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .000 ( .000 .055)
PROBABILITY RMSEA LE .05 IS .944
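The degrees of freedom in the one- and two-factor outputs follow from the usual count for an exploratory factor model with p observed variables and m factors, df = ((p - m)^2 - (p + m)) / 2. The Python sketch below applies that formula to the six variables analyzed here; the function name is hypothetical.

```python
def efa_degrees_of_freedom(p, m):
    """Degrees of freedom for an m-factor EFA of p observed variables."""
    return ((p - m) ** 2 - (p + m)) // 2


for m in (1, 2):
    print(m, efa_degrees_of_freedom(6, m))
```

Moving from one factor to two costs five degrees of freedom (9 versus 4), which is why the two chi-square values are judged against different reference distributions.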
For exploratory factor analysis solutions with two or more factors,
Mplus reports varimax rotated loadings and promax
rotated loadings. Varimax loadings assume the two factors
are uncorrelated whereas promax loadings allow the factors to
be correlated. Directly below the promax loadings is the factor
intercorrelation matrix.
In this example the two factors are correlated .480. With even
a modest correlation between the two factors, you should choose
to interpret the promax rotated loadings. The loadings show
that the visperc, cubes, and lozenges variables
load onto the first factor whereas the remaining variables load
onto the second factor.
VARIMAX ROTATED LOADINGS
                  1         2
           ________  ________
VISPERC        .547      .250
CUBES          .550      .092
LOZENGES       .728      .196
PARAGRAP       .241      .830
SENTENCE       .174      .816
WORDMEAN       .247      .788

PROMAX ROTATED LOADINGS
                  1         2
           ________  ________
VISPERC        .540      .112
CUBES          .585     -.063
LOZENGES       .755     -.001
PARAGRAP       .046      .841
SENTENCE      -.025      .846
WORDMEAN       .063      .794

PROMAX FACTOR CORRELATIONS
                  1         2
           ________  ________
 1            1.000
 2             .480     1.000
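Rotation changes the loadings but not the correlations the model implies. The Python sketch below illustrates this for visperc and cubes with the loadings printed above: the varimax loadings are combined with a factor correlation of zero, and the promax loadings with the reported factor correlation of .480. The function name is hypothetical, and the formula is the standard two-factor expression for the common-factor part of a correlation.

```python
def implied_correlation(load_a, load_b, factor_corr):
    """Implied correlation between two variables under a two-factor model."""
    a1, a2 = load_a
    b1, b2 = load_b
    return a1 * b1 + a2 * b2 + factor_corr * (a1 * b2 + a2 * b1)


# visperc and cubes under each rotation, loadings as printed above.
varimax = implied_correlation((.547, .250), (.550, .092), 0.0)
promax = implied_correlation((.540, .112), (.585, -.063), .480)
print(round(varimax, 3), round(promax, 3))  # 0.324 0.324
```

Both rotations imply the same correlation, so the choice between varimax and promax affects interpretation of the loadings, not model fit.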
Mplus next reports estimated error variances for each observed
variable, the estimated correlation matrix, and the residual
correlation matrix. Notice that unlike the preceding one factor
solution, this dual factor solution's estimated correlation
matrix is very close in value to the original sample correlation
matrix. Accordingly, the residual correlation matrix has all
values close to zero and the RMR value of .0092 is well below
the Hu and Bentler (1999) recommended cutoff of .08.
ESTIMATED ERROR VARIANCES
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
 1             .638      .689      .431      .253      .304

ESTIMATED ERROR VARIANCES
           WORDMEAN
           ________
 1             .318

ESTIMATED CORRELATION MATRIX
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC       1.000
CUBES          .324     1.000
LOZENGES       .448      .419     1.000
PARAGRAP       .339      .209      .338     1.000
SENTENCE       .299      .170      .286      .719     1.000
WORDMEAN       .332      .208      .334      .714      .686

ESTIMATED CORRELATION MATRIX
           WORDMEAN
           ________
WORDMEAN      1.000

RESIDUALS OBSERVED-EXPECTED
            VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
           ________  ________  ________  ________  ________
VISPERC        .000
CUBES          .002      .000
LOZENGES       .001     -.002      .000
PARAGRAP       .002      .019     -.010      .000
SENTENCE       .010     -.011      .000      .000      .000
WORDMEAN      -.015     -.013      .013      .001     -.001

RESIDUALS OBSERVED-EXPECTED
           WORDMEAN
           ________
WORDMEAN       .000

ROOT MEAN SQUARE RESIDUAL IS .0092
This example assumes that the Grant-White database is complete.
In other words, no case in the Grant-White
database has a missing value. What if some cases had missing values? Often databases
have cases with incomplete data. The next section describes
a feature unique to Mplus: exploratory factor analysis of a
database with incomplete cases.
2. Exploratory Factor Analysis with Missing Data
Suppose you altered the Grant-White database so that cases with
visperc scores that exceed 34 have missing cubes
scores and that cases with wordmean scores of 10 or below
have missing sentence values. In this instance the missing
cubes and sentence completion data are said to be missing
at random (MAR) because the patterns of missing data are
explainable by the values of other variables in the database,
visual perception and word meaning. Ordinarily, if you do not
specify a missing data analysis in Mplus, Mplus performs
listwise or casewise deletion of cases with any missing
data. That is, any case with one or more missing data points
is omitted entirely from analyses. However, for exploratory
factor analysis, confirmatory factor analysis, and structural
equation modeling with continuous variables, Mplus features
a missing data option that outperforms the default listwise
deletion method. The optional method that offers superior performance
is called full information maximum likelihood (FIML); details
on FIML can be found in
General FAQ #25: Handling missing or incomplete Data and in
AMOS FAQ #5: Handling missing data using AMOS.
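The alteration rule and listwise deletion described above can be sketched in a few lines of Python. The three cases below are invented for illustration and are not taken from the Grant-White data. The function names are hypothetical, and -9 is used as the missing-value code.

```python
MISSING = -9


def apply_missing_rule(rows):
    """Set cubes missing when visperc > 34 and sentence missing when wordmean <= 10."""
    altered = []
    for row in rows:
        row = dict(row)  # copy so the original case is left untouched
        if row["visperc"] > 34:
            row["cubes"] = MISSING
        if row["wordmean"] <= 10:
            row["sentence"] = MISSING
        altered.append(row)
    return altered


def listwise_complete(rows):
    """Keep only cases with no missing value on any variable."""
    return [row for row in rows if MISSING not in row.values()]


# Three hypothetical cases.
cases = [
    {"visperc": 33, "cubes": 22, "sentence": 17, "wordmean": 10},
    {"visperc": 36, "cubes": 25, "sentence": 20, "wordmean": 18},
    {"visperc": 28, "cubes": 24, "sentence": 19, "wordmean": 15},
]
altered = apply_missing_rule(cases)
print(len(listwise_complete(altered)), "of", len(altered), "cases are complete")
```

Only the third case survives listwise deletion, even though each of the other two is missing a single score. FIML, by contrast, uses the observed scores from all three cases.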
Regardless of whether you choose to use FIML or listwise
deletion to handle missing data, if you have missing data in
your input database, you must tell Mplus how the missing values
for each variable are represented in the database. You use the
MISSING subcommand of the VARIABLE command to accomplish
this task. In this example, missing values for cubes and sentence
are represented by -9, so the MISSING subcommand reads:
MISSING ARE all (-9) ;
The all keyword tells Mplus that all variables
in the analysis use -9 to represent missing values. If your
database contains blanks to represent missing values, you may
use the specification
MISSING = blank ;
Similarly, you may use
MISSING ARE . ;
if your database contains period symbols to represent missing
values. Other missing value specifications are available; see
the Mplus User's Guide for specifics.
If you insert the MISSING syntax into the previous exploratory
factor analysis program and specify that Mplus use the newly-created
database that contains cases with missing values, grant-missing.dat,
Mplus will perform listwise deletion of the cases with incomplete
data. The Mplus command file follows:
TITLE:    Grant-White School: EFA with Missing Data
DATA:     FILE IS U:\Projects\Documentation\Mplus\grant-missing.dat ;
VARIABLE: NAMES ARE visperc cubes lozenges paragrap
                    sentence wordmean gender ;
          USEVARIABLES ARE visperc - wordmean ;
          MISSING ARE all (-9) ;
ANALYSIS: TYPE = efa 1 2 ;
          ESTIMATOR = ml ;
Selected output from the analysis appears below.
Grant-White School: Exploratory Factor Analysis with Missing Data
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                            79
Number of y-variables                              6
Number of x-variables                              0
Number of continuous latent variables              0
Notice that Mplus considers the database to contain 79 usable
cases rather than the original 145 cases.
EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :
CHI-SQUARE VALUE               14.651
DEGREES OF FREEDOM                  9
PROBABILITY VALUE               .1009
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .089 ( .000 .169)
PROBABILITY RMSEA LE .05 IS .199
For the 79 usable cases, the chi-square test does not reject the
one factor solution (p = .1009). This finding stands in direct
contrast to the example in the previous section where all 145
cases had complete data and the one factor model was rejected.
Clearly the reduction of N from 145 to 79 has resulted in a
substantial loss of statistical power to reject false hypotheses.
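The RMSEA point estimate printed in the output above can be checked
by hand from the chi-square value, its degrees of freedom, and the
sample size. The sketch below uses the commonly cited formula
sqrt(max(chi-square - df, 0) / (df * N)). Using N rather than N - 1
in the divisor is an assumption, chosen here because it reproduces
the printed estimate; the function name rmsea is hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate.

    The divisor df * n is an assumption; some texts use df * (n - 1).
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


# One factor model under listwise deletion: chi-square 14.651, df 9, N 79
print(round(rmsea(14.651, 9, 79), 3))  # 0.089
```

Because the chi-square value enters the formula relative to N, the
same degree of misfit yields a weaker test when fewer cases are
retained.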
Fortunately, you can use Mplus's FIML missing data handling
option to rectify the problem. Add the keyword missing to the
TYPE subcommand of the ANALYSIS command, like this:
ANALYSIS: TYPE = missing efa 1 2 ;
          ESTIMATOR = ml ;
Run the analysis and consider the results, shown below.
Grant-White School: Exploratory Factor Analysis with Missing Data
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                           145
Number of y-variables                              6
Number of x-variables                              0
Number of continuous latent variables              0
Mplus now uses all 145 cases in its computations.
SUMMARY OF DATA
Number of patterns                                 4
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value   .100
PROPORTION OF DATA PRESENT
          Covariance Coverage
              VISPERC   CUBES     LOZENGES  PARAGRAP  SENTENCE
              ________  ________  ________  ________  ________
 VISPERC        1.000
 CUBES           .697      .697
 LOZENGES       1.000      .697     1.000
 PARAGRAP       1.000      .697     1.000     1.000
 SENTENCE        .821      .545      .821      .821      .821
 WORDMEAN       1.000      .697     1.000     1.000      .821
Mplus further recognizes that there are four distinct patterns
of missing data contained in the database and it displays the
proportion of cases used to generate each input covariance for
the analysis. From the missing data coverage matrix, you can see
that the cubes-sentence covariance has the lowest coverage, with
just under 55% of cases available to build the covariance.
Mplus requires a minimum coverage value of 10% per covariance,
though you can override this default if you wish.
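To make the two summaries concrete, the following Python sketch
counts missing data patterns and computes covariance coverage, the
proportion of cases in which both variables of a pair are observed.
It is an illustration rather than Mplus's own routine: the five
records are invented, None stands for a missing value, and the
function names missing_patterns and covariance_coverage are
hypothetical.

```python
from itertools import combinations_with_replacement


def missing_patterns(cases, variables):
    """Return the set of distinct observed/missing patterns (True = observed)."""
    return {tuple(c[v] is not None for v in variables) for c in cases}


def covariance_coverage(cases, variables):
    """Proportion of cases with both variables observed, for every pair."""
    n = len(cases)
    coverage = {}
    for a, b in combinations_with_replacement(variables, 2):
        both = sum(1 for c in cases if c[a] is not None and c[b] is not None)
        coverage[(a, b)] = both / n
    return coverage


# Invented toy data; None marks a missing value.
variables = ["visperc", "cubes", "sentence"]
cases = [
    {"visperc": 36, "cubes": None, "sentence": 17},
    {"visperc": 30, "cubes": 22, "sentence": None},
    {"visperc": 28, "cubes": 24, "sentence": 20},
    {"visperc": 38, "cubes": None, "sentence": None},
    {"visperc": 25, "cubes": 19, "sentence": 14},
]
print(len(missing_patterns(cases, variables)), "patterns")
for pair, value in covariance_coverage(cases, variables).items():
    print(pair, round(value, 3))
```

In the Mplus output above, the cubes-sentence coverage of .545 is
the same quantity: only cubes and sentence have missing values, so
the 79 cases with both scores present, out of 145, are exactly the
cases that survived listwise deletion earlier.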
EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :
CHI-SQUARE VALUE               29.732
DEGREES OF FREEDOM                  9
PROBABILITY VALUE               .0005
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .126 ( .078 .178)
PROBABILITY RMSEA LE .05 IS .007
Unlike the example that used listwise deletion of cases with
missing data, the chi-square test of model fit for the one factor
solution rejects the one factor model. Using FIML missing data
handling, you conclude that one factor is not sufficient to
explain the pattern of correlations among the six input variables,
just as you did in the first example from the preceding section
where Mplus used the complete database containing 145 cases.
As with the complete dataset, the two factor solution fits the
data well using the FIML method with the incomplete dataset:
EXPLORATORY ANALYSIS WITH 2 FACTOR(S) :
CHI-SQUARE VALUE                 .578
DEGREES OF FREEDOM                  4
PROBABILITY VALUE               .9655
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .000 ( .000 .000)
PROBABILITY RMSEA LE .05 IS .982
3. Exploratory factor analysis with categorical outcomes
So far, the examples shown here contained continuous outcomes.
If you have observed outcome variables that have ten or fewer
categories, and the variables' responses are dichotomous or
ordered categories, you may elect to have Mplus treat these
variables as categorical indicators. This type of model is often
sensible for analyzing Likert scale items because while the
items themselves typically are coarsely categorized on a 1 to
5 or 1 to 7 scale, the items often attempt to measure an individual's
standing on a continuous underlying unobserved variable.
For the purposes of illustration, suppose that you recode each
of the six variables into a replacement variable in which values
at or below the median are assigned a categorical value of 1.00
and values above the median are assigned a value of 2.00. Mplus
recodes the lowest value to zero with subsequent values increasing
in units of 1.00. While the two underlying latent factors remain
continuous, the six categorical observed variables' response
values are now ordered dichotomous categories.
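The median split described above can be sketched in Python as
follows. The scores are invented for illustration, the function
name median_split is hypothetical, and the sketch assumes that each
variable is split at its own median.

```python
from statistics import median


def median_split(values):
    """Code values at or below the median as 1 and values above it as 2."""
    cut = median(values)
    return [1 if v <= cut else 2 for v in values]


scores = [23, 31, 28, 35, 19, 28, 40]  # invented scores; the median is 28
print(median_split(scores))  # [1, 2, 1, 2, 1, 1, 2]
```

When Mplus reads such a variable as categorical, the codes 1 and 2
are treated as 0 and 1, per the recoding rule noted above.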
To analyze the modified database using Mplus, you may use the
syntax that appeared in the initial exploratory factor analysis
example, with the following modifications, and the new data
file that contains the categorical variables,
grantcat.dat, as shown below.
TITLE:    Grant-White School: EFA with categorical outcomes
DATA:     FILE IS U:\Projects\Documentation\Mplus\grantcat.dat ;
VARIABLE: NAMES ARE viscat cubescat lozcat
                    paracat sentcat wordcat ;
          USEVARIABLES ARE viscat - wordcat ;
          CATEGORICAL ARE viscat - wordcat ;
ANALYSIS: TYPE = efa 1 2 ;
          ESTIMATOR = wlsmv ;
OUTPUT:   sampstat ;
First, you must change the names of the variables in the NAMES
and USEVARIABLES subcommands of the VARIABLE command. Next, you
tell Mplus which variables are categorical with the CATEGORICAL
subcommand of the VARIABLE command, like this:
CATEGORICAL ARE viscat - wordcat ;
You should also change the ESTIMATOR option for the ANALYSIS
command. The default is unweighted least-squares (uls), which is
fast and is useful for exploratory work, but a better choice for
categorical outcomes, based on the work of Muthén, DuToit, and
Spisic (1997), is weighted least-squares with mean and variance
adjustment, wlsmv.
ANALYSIS: TYPE = efa 1 2 ;
          ESTIMATOR = wlsmv ;
Selected output from the analysis appears below. Notice that
the categorical nature of the data precludes computation of
the descriptive model fit statistics such as the RMSEA, though
Mplus does produce the familiar chi-square test of overall model
fit.
EXPLORATORY ANALYSIS WITH 2 FACTOR(S) :
CHI-SQUARE VALUE                2.823
DEGREES OF FREEDOM                  4
PROBABILITY VALUE               .5875
The chi-square result for the two factor model is not significant,
which indicates that two factors are sufficient to explain the
intercorrelations among the six observed variables. The varimax
and promax rotated factor loadings appear below. The pattern
and values obtained from this analysis are consistent with the
results of the first exploratory factor analysis of the completely
continuous data discussed previously.
VARIMAX ROTATED LOADINGS
                  1         2
              ________  ________
 VISCAT          .571      .332
 CUBESCAT        .700      .117
 LOZCAT          .667      .244
 PARACAT         .473      .642
 SENTCAT         .235      .847
 WORDCAT         .206      .858
PROMAX ROTATED LOADINGS
                  1         2
              ________  ________
 VISCAT          .559      .159
 CUBESCAT        .777     -.137
 LOZCAT          .698      .022
 PARACAT         .347      .550
 SENTCAT         .005      .876
 WORDCAT        -.031      .899
PROMAX FACTOR CORRELATIONS
                  1         2
              ________  ________
 1              1.000
 2               .557     1.000
Although Mplus does not produce the RMSEA descriptive model
fit statistic for categorical outcomes, it does output the root
mean square residual (RMR):
ROOT MEAN SQUARE RESIDUAL IS   .0310
The value of .031 suggests an excellent fit of the two factor
model to the observed data.
There are several notes worth keeping in mind when you perform
exploratory factor analysis with categorical outcome variables.
- Although one or more of the observed variables may be
categorical, any latent variables in the model are assumed
to be continuous (this is a property of the exploratory
factor analysis model; confirmatory factor analysis models
with categorical latent variables may be fit as mixture
models using Mplus; see the Mplus User's Guide for
more information about mixture models).
- FIML missing data handling is not available with the
analysis of categorical outcomes.
- The analysis specification and interpretation of the
output is the same whether one, a subset, or all observed
variables are categorical.
- Categorical observed variables may be dichotomous or