College of Natural Sciences
 
FAQs

HLM FAQ #4: R-squared in a Hierarchical Model

Question:

I have a two-level model in which students are level-1 units nested within schools, which are my level-2 units. My model has a random level-1 intercept. Is it possible to obtain an R-squared value for my hierarchical model?

Answer:

It isn't possible to obtain a true R-squared value in HLM; however, there are statistics that estimate the proportion of variance explained by the model, and they are often referred to as R-squared or pseudo R-squared values. HLM does not display these values in its standard output. However, you can compare the error terms in an unrestricted model and a restricted model to obtain the proportion of variance explained by your model. An unrestricted model, or null model, is one that contains only a dependent variable and a random level-1 intercept; it does not contain any independent variables. A restricted model adds one or more independent variables to it. One formula, suggested by Kreft and de Leeuw (1998) and Singer (1998), that can be used to obtain the within- and between-unit variance explained is the following:

(unrestricted error – restricted error) / unrestricted error

Applied to the level-1 error terms, the formula gives the within-unit variance explained, a measure of how well the independent variables in the model explain the outcome variable. Applied to the level-2 error terms, it gives the between-unit measure, the amount of variance between level-2 units that is accounted for by the predictors in the model.
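The proportional-reduction formula is simple enough to wrap in a small helper. The following is a minimal sketch in Python; the function name and the numbers passed to it are hypothetical and chosen only to show the arithmetic, not taken from any HLM output.

```python
def proportion_variance_explained(unrestricted_error, restricted_error):
    """Proportional reduction in an error variance between two models.

    Pass the level-1 error terms for the within-unit measure, or the
    level-2 error terms for the between-unit measure.
    """
    if unrestricted_error <= 0:
        raise ValueError("unrestricted error variance must be positive")
    return (unrestricted_error - restricted_error) / unrestricted_error


# Hypothetical variance components, used only to illustrate the arithmetic.
example = proportion_variance_explained(unrestricted_error=10.0, restricted_error=8.0)
print(round(example, 2))
```

The same function serves both levels; only the pair of error terms passed to it changes.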

Some alternatives to the above formula are described by Snijders and Bosker (1999). They suggest the following formula for computing within-unit variance explained:

1 - ((level-1 restricted error + level-2 restricted error) / (level-1 unrestricted error + level-2 unrestricted error))

They suggest the following formula for computing between-unit variance explained. In this formula, n is the number of individuals in each level-2 unit. Because level-2 units rarely contain equal numbers of individuals, Snijders and Bosker (1999) suggest using either a reasonable value or the harmonic mean for n:

1 - (((level-1 restricted error / n) + level-2 restricted error) / ((level-1 unrestricted error / n) + level-2 unrestricted error))
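The two Snijders and Bosker measures can be sketched in Python as follows. Both are written here as 1 minus the ratio of restricted to unrestricted error, so that each returns a proportion of variance explained. The function names, the school sizes, and the variance components are all hypothetical, chosen only to illustrate the calculation; the harmonic mean comes from Python's standard statistics module.

```python
from statistics import harmonic_mean


def sb_level1_r2(l1_restricted, l2_restricted, l1_unrestricted, l2_unrestricted):
    """1 - (total restricted error / total unrestricted error)."""
    return 1 - (l1_restricted + l2_restricted) / (l1_unrestricted + l2_unrestricted)


def sb_level2_r2(l1_restricted, l2_restricted, l1_unrestricted, l2_unrestricted, n):
    """Same idea for unit means: each level-1 error is divided by the unit size n."""
    restricted = l1_restricted / n + l2_restricted
    unrestricted = l1_unrestricted / n + l2_unrestricted
    return 1 - restricted / unrestricted


# Hypothetical school sizes; real data would supply one count per level-2 unit.
school_sizes = [20, 30, 60]
n = harmonic_mean(school_sizes)  # 30 for these three sizes

# Hypothetical variance components, for illustration only.
print(round(sb_level1_r2(8.0, 2.0, 10.0, 5.0), 3))
print(round(sb_level2_r2(8.0, 2.0, 10.0, 5.0, n), 3))
```

The harmonic mean is smaller than the arithmetic mean whenever unit sizes differ, so small units carry more weight in the choice of n.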

These formulas can be illustrated using the hsb.ssm file that is available in the HLM Examples directory. If you are using HLM on the University of Texas terminal server, you can find the Examples directory in the following path: N:\Program Files\Hlm404\Examples. The first step in obtaining the R-squared values is to run the unrestricted model. As previously stated, this model contains only a random intercept and no independent variables. In the dialog box below, the model has an outcome variable, mathach, but no independent variables:

Unrestricted Model

The error terms for both the level-1 and level-2 models that you will use to obtain an R-squared are in the Final estimation of variance components section at the bottom of the output:

Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        2.93501       8.61431   159    1660.23264    0.000
  level-1,       R         6.25686      39.14831
 -----------------------------------------------------------------------------

You can see that the level-1 error term is 39.15 in this model and the level-2 error term is 8.61.
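If you save the HLM output to a text file, the variance components can also be pulled out programmatically rather than copied by hand. The following Python sketch assumes the table layout shown above (a label, a comma, an effect symbol, then the standard deviation and variance component); the regular expression and function name are illustrative, and other layouts, such as labels containing spaces, would need a different pattern.

```python
import re

# Rows copied from the "Final estimation of variance components" table.
output = """\
 INTRCPT1,       U0        2.93501       8.61431   159    1660.23264    0.000
  level-1,       R         6.25686      39.14831
"""

# Groups: label, effect symbol, standard deviation, variance component.
row = re.compile(r"^\s*(\S+),\s+(\S+)\s+([\d.]+)\s+([\d.]+)")


def variance_components(text):
    """Return {label: variance component} for each random-effect row."""
    components = {}
    for line in text.splitlines():
        match = row.match(line)
        if match:
            label, _symbol, _sd, variance = match.groups()
            components[label] = float(variance)
    return components


parsed = variance_components(output)
print(parsed)
```

Header and ruler lines contain no "label, symbol" pair, so the pattern skips them.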

The next step is to obtain the same statistics from a restricted model. This can be illustrated by adding an independent variable to the above model. In the dialog box below, the level-1 independent variable ses, each student's socioeconomic status, is added to the level-1 model:

Restricted Model

Note that the ses slope is fixed in the above model (it has no level-2 error term), while the intercept remains random. Again, look at the Final estimation of variance components section at the bottom of the output to obtain the error terms:

Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        2.18361       4.76815   159    1037.09077    0.000
  level-1,       R         6.08559      37.03440
 -----------------------------------------------------------------------------

In this model, the level-1 error term is 37.03 and the level-2 error term is 4.77. These values can now be used to calculate the within- and between-unit variance explained. First, consider the within-unit formula, which is a measure of how well socioeconomic status explains math achievement scores:

(39.15 - 37.03) / 39.15 = .05

Thus, the model containing socioeconomic status explains 5% of the explainable within-unit variance using this formula. Using the formula recommended by Snijders and Bosker (1999), which compares the sum of the level-1 and level-2 error terms, the following value is obtained:

1 - ((37.03 + 4.77) / (39.15 + 8.61)) = .12

Next, it is often useful to examine the amount of between-unit variance explained. Here, using the formula provided by Kreft and de Leeuw (1998) and Singer (1998) with the level-2 variances, the following values are obtained:

(8.61 - 4.77) / 8.61 = .45

Or, using the Snijders and Bosker (1999) method with the harmonic mean of 41.03, the following values are obtained:

1 - (((37.03/41.03) + 4.77) / ((39.15/41.03) + 8.61)) = 1 - .59 = .41

Socioeconomic status explains 45% of the explainable between-unit variance in this model using the first formula and 41% using the second formula. Thus, it appears that socioeconomic status contributes greatly to explaining variation between schools, but does not explain much of the variance in math achievement scores within schools.
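The arithmetic in this example can be checked with a short Python script. It uses only the rounded variance components and the harmonic mean quoted in the text; the variable names are illustrative. Note that the Snijders and Bosker between-unit measure is 1 minus the error ratio: the ratio itself is about .59, so the proportion explained is about .41.

```python
# Variance components from the two outputs, rounded as in the text.
L1_NULL, L2_NULL = 39.15, 8.61   # unrestricted model
L1_SES, L2_SES = 37.03, 4.77     # restricted model containing ses
N_HARMONIC = 41.03               # harmonic mean quoted in the text

# Proportional reduction in each error term.
within_reduction = (L1_NULL - L1_SES) / L1_NULL
between_reduction = (L2_NULL - L2_SES) / L2_NULL

# Snijders and Bosker measures: 1 minus restricted error over unrestricted error.
sb_level1 = 1 - (L1_SES + L2_SES) / (L1_NULL + L2_NULL)
sb_ratio = (L1_SES / N_HARMONIC + L2_SES) / (L1_NULL / N_HARMONIC + L2_NULL)
sb_level2 = 1 - sb_ratio

results = [
    ("within, proportional reduction", within_reduction),
    ("within, Snijders and Bosker", sb_level1),
    ("between, proportional reduction", between_reduction),
    ("between, Snijders and Bosker ratio", sb_ratio),
    ("between, Snijders and Bosker", sb_level2),
]
for name, value in results:
    print(f"{name}: {value:.2f}")
```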

It should be noted that there are some potential problems with the methods described above. One is that the level-1 variance can be larger in the restricted model than in the unrestricted model, which would produce a negative R-squared value. In addition, Kreft and de Leeuw (1998) point out that the formula may not apply to models that contain random slopes. This is especially true for computing the between-unit variance explained, as there is not a single level-2 error term in models containing random slopes.
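A simple guard can flag the negative case before a value is reported. The Python sketch below applies the proportional-reduction formula to each variance component and marks any negative result; the function name and the variance components are hypothetical, constructed so that the level-1 term grows in the restricted model.

```python
def explained_by_component(unrestricted, restricted):
    """Proportional reduction for each variance component, keyed by name."""
    return {
        name: (unrestricted[name] - restricted[name]) / unrestricted[name]
        for name in unrestricted
    }


# Hypothetical components in which the level-1 term grows after adding a predictor.
unrestricted = {"level-1": 40.0, "level-2": 8.0}
restricted = {"level-1": 41.0, "level-2": 6.0}

result = explained_by_component(unrestricted, restricted)
for name, value in result.items():
    note = "  <- negative: not interpretable as variance explained" if value < 0 else ""
    print(f"{name}: {value:.3f}{note}")
```

A negative value here signals that the formula does not fit the model, not that the predictor removed explanatory power.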

References

Snijders, T., & Bosker, R. (1999). Multilevel Analysis: an introduction to basic and advanced multilevel modeling. London: Sage Publications.

Kreft, I., & de Leeuw, J. (1998). Introducing Multilevel Modeling. London: Sage Publications.

Singer, J. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23(4), 323-355.

If you have further questions, send E-mail to stats@ssc.utexas.edu.