College of Natural Sciences
 
FAQs

General FAQ #28: How to compare sample correlation coefficients drawn from the same sample

Question:

I would like to compare two sample correlation coefficients, but they are drawn from the same sample.  Is there a method of testing for a significant difference between them that takes their dependence into account?

Note: If you have two correlation coefficients computed from two different samples, please consult General FAQ #26.

Answer:

Yes, there is, and it involves a choice between two methods.  The first method relies on a formula found in Cohen & Cohen (1983) on page 57.  The formula yields a t-statistic with n - 3 degrees of freedom.  As written below, the formula tests whether the correlation between variables X and Y differs significantly from the correlation between variables V and Y:

    t = (rxy - rvy)*sqrt((n-1)(1 + rxv))/(sqrt(2((n-1)/(n-3))|R| + ((rxy + rvy)/2)^2(1-rxv)^3))

where

    n   = sample size
    rxy = correlation coefficient between variables x and y
    rvy = correlation coefficient between variables v and y
    rxv = correlation coefficient between variables x and v

and |R| = (1 - rxy^2 - rvy^2 - rxv^2 + (2*rxy*rxv*rvy)), the determinant of the correlation matrix for X, Y, and V.
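As a cross-check on the formula above, here is a minimal Python sketch that evaluates it term by term. The function name dependent_corr_t and the example values (rxy = 0.5, rvy = 0, rxv = 0, n = 103) are hypothetical and chosen only for illustration; the sketch returns the t-statistic and its n - 3 degrees of freedom and leaves the lookup of the p-value to a t table or a statistics package.

```python
import math


def dependent_corr_t(r_xy, r_vy, r_xv, n):
    """t statistic and df for H0: corr(X, Y) == corr(V, Y), same sample.

    Hypothetical helper that transcribes the formula in the text.
    """
    if n <= 3:
        raise ValueError("n must exceed 3 so that n - 3 is positive")
    # |R|: determinant of the 3 x 3 correlation matrix for X, Y, and V.
    det_r = 1 - r_xy**2 - r_vy**2 - r_xv**2 + 2 * r_xy * r_xv * r_vy
    # Average of the two correlations being compared.
    r_bar = (r_xy + r_vy) / 2
    numerator = (n - 1) * (1 + r_xv)
    denominator = 2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r_xv) ** 3
    t = (r_xy - r_vy) * math.sqrt(numerator / denominator)
    return t, n - 3


# Illustrative values only; they do not come from any data set in this FAQ.
t_stat, df = dependent_corr_t(r_xy=0.5, r_vy=0.0, r_xv=0.0, n=103)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

When the two correlations being compared are equal, the numerator term (rxy - rvy) is zero and so is t, which is a quick sanity check on any implementation.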
 

Unfortunately, the above method is not available as an option in any of the statistical procedures in either SPSS or SAS.  However,  SPSS users can adapt the following syntax to perform the test:

The above syntax will generate an active dataset which will appear in the data editor window. In the Variable View, change the number of decimals to the desired setting to display appropriate results.

Notice that the method above is limited to the three-variable case (e.g., X & Y, and V & Y).  The second method represents a more flexible approach to the problem.  With this method, one would use a statistical software package capable of estimating covariance structure models (e.g., SAS, AMOS, LISREL) to compare an observed correlation matrix to an estimated correlation matrix which includes restrictions that represent a null hypothesis. Steiger (1980) discusses this approach in more detail.

For example, suppose you have a set of four variables: X, Y, Z, and Q, and you want to test whether the correlation between X and Y is the same as that between Z and Q.  The observed correlation matrix would be symmetric and look like the one presented below,

         X    Y    Z    Q
    X    a    b    c    d
    Y    b    e    f    g
    Z    c    f    h    i
    Q    d    g    i    j

where capital letters represent variables and lower case letters represent correlation coefficients.

In the estimated correlation matrix, one would impose the constraint that b = i  in order to test the hypothesis that the correlation between X and Y is the same as that between Z and Q.  One could then test how well this estimated matrix fits the data using the standard output of any of the statistical packages mentioned above.  If it turns out that the restricted correlation matrix provides a reasonable fit of the data given a previously specified level of statistical significance, then this finding would be equivalent to retaining the null hypothesis that the correlation between X and Y is the same as that between Z and Q.  On the other hand,  if the restricted correlation matrix does not fit the data well, then this would be equivalent to rejecting the null hypothesis.  Provided that one is cognizant of the problem of making multiple inferences and the sample data conform to the assumptions necessary to perform a covariance structure analysis (e.g., sufficient sample size, joint multivariate normality of the population distribution of the input variables, etc.), one could test a series of hypotheses in this fashion given virtually any combination of variables.

The SAS program below demonstrates how PROC CALIS can be used to test the hypothesis that COV(X,Y) = COV(Z,Q).  In the example below, the covariance matrix rather than the correlation matrix is used, since the maximum likelihood procedure for confirmatory factor analysis is theoretically derived for covariance matrices.  However, one may interpret results from this program as applying also to the correlation matrix for a set of variables.
 

The key element of the program can be found in the COV statement, which specifies the estimated covariance matrix for the null hypothesis.  In order to impose the constraint that COV(X,Y) = COV(Z,Q), the name "cov_v1v2" is given to both the covariance between X and Y and the covariance between Z and Q in the COV statement.  The other covariances, on the other hand, will be freely estimated because each has been given a unique name.  The relevant output from this program can be found below.

                                      The CALIS Procedure
                  Covariance Structure Analysis: Maximum Likelihood Estimation

                  Fit Function                                         0.0638
                  Goodness of Fit Index (GFI)                          0.9710
                  GFI Adjusted for Degrees of Freedom (AGFI)           0.7104
                  Root Mean Square Residual (RMR)                      1.2634
                  Parsimonious GFI (Mulaik, 1989)                      0.1618
                     Chi-Square                                        1.2124
                     Chi-Square DF                                     1
                     Pr > Chi-Square                                   0.2709
                  Independence Model Chi-Square                        7.0050
                  Independence Model Chi-Square DF                     6
                  RMSEA Estimate                                       0.1057
                  RMSEA 90% Lower Confidence Limit                       .
                  RMSEA 90% Upper Confidence Limit                     0.6298
                  ECVI Estimate                                        1.3495
                  ECVI 90% Lower Confidence Limit                        .
                  ECVI 90% Upper Confidence Limit                      1.8356
                  Probability of Close Fit                             0.2822
                  Bentler's Comparative Fit Index                      0.7887
                  Normal Theory Reweighted LS Chi-Square               1.1334
                  Akaike's Information Criterion                      -0.7876
                  Bozdogan's (1987) CAIC                              -2.7833
                  Schwarz's Bayesian Criterion                        -1.7833
                  McDonald's (1989) Centrality                         0.9947
                  Bentler & Bonett's (1980) Non-normed Index          -0.2680
                  Bentler & Bonett's (1980) NFI                        0.8269
                  James, Mulaik, & Brett (1982) Parsimonious NFI       0.1378
                  Z-Test of Wilson & Hilferty (1931)                   0.6121
                  Bollen (1986) Normed Index Rho1                     -0.0385
                  Bollen (1988) Non-normed Index Delta2                0.9646
                  Hoelter's (1983) Critical N                         62
 

In the table shown above, the chi-square results for the test of the null hypothesis that COV(X,Y) = COV(Z,Q) are indented for emphasis. Most of the other results are general goodness-of-fit measures that do not apply to this limited use of covariance structure models, so they may be ignored.  The results of the chi-square test on this particular set of data indicate that we should retain the null hypothesis (chi-square = 1.2124, 1 degree of freedom, p = .27).  If we had observed a p-value below the chosen significance level (e.g., < .05), then we would have rejected the null hypothesis and concluded that COV(X,Y) does not equal COV(Z,Q).
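The p-value in the output can be reproduced by hand from the chi-square statistic and its degrees of freedom. The sketch below is a minimal illustration using only the Python standard library; the function name chi_square_p_value is hypothetical, and it deliberately handles only 1 or 2 degrees of freedom, where the upper-tail probability has a simple closed form. The .05 cutoff is the conventional example level mentioned above, not a recommendation.

```python
import math


def chi_square_p_value(chi_square, df):
    """Upper-tail p-value of a chi-square statistic (df of 1 or 2 only).

    Hypothetical helper: df = 1 uses the complementary error function,
    df = 2 uses the exponential survival function.
    """
    if chi_square < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(chi_square / 2))
    if df == 2:
        return math.exp(-chi_square / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# Chi-Square and Chi-Square DF as reported in the CALIS output above.
p = chi_square_p_value(1.2124, df=1)
reject_null = p < 0.05  # conventional example threshold
print(f"p = {p:.4f}, reject null hypothesis: {reject_null}")
```

For models with more degrees of freedom, a statistics library's chi-square distribution function would be the more general choice.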

Note that this method allows you to test the equality of multiple sets of correlation coefficients within the same matrix simultaneously. For instance, if you had a correlation matrix consisting of 10 variables, you could easily test the v1-v2 with v3-v4 correlation equality at the same time you tested the v5-v6 and v7-v8 correlation equality. The resulting chi-square test statistic would have two degrees of freedom; it would test the joint hypothesis that the v1-v2 correlation is equal to the v3-v4 correlation and that the v5-v6 correlation is equal to the v7-v8 correlation.
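The degrees of freedom quoted above follow from counting parameters: every distinct element of the covariance matrix is one piece of information, and every distinct parameter name in the model uses one up. The Python sketch below illustrates that bookkeeping under the assumption that all covariances and variances not named in a constraint are freely estimated. The function model_df and the parameter names (cov_shared, cov_a, cov_b) are hypothetical and only mimic the idea of giving two covariances the same name.

```python
from itertools import combinations_with_replacement


def model_df(variables, shared_names):
    """Distinct covariance-matrix elements minus distinct parameter names.

    shared_names maps a pair of variables (in the order they appear in
    `variables`) to a parameter name; pairs that share a name are
    constrained to be equal. All other elements get a unique name.
    """
    # Lower triangle including the diagonal: p * (p + 1) / 2 elements.
    elements = list(combinations_with_replacement(variables, 2))
    names = set()
    for a, b in elements:
        default_name = f"cov_{a}_{b}"  # unique, hence freely estimated
        names.add(shared_names.get((a, b), default_name))
    return len(elements) - len(names)


# Four variables, one equality constraint: X-Y equals Z-Q.
four = ["X", "Y", "Z", "Q"]
df_one = model_df(four, {("X", "Y"): "cov_shared", ("Z", "Q"): "cov_shared"})

# Ten variables, two equality constraints tested jointly.
ten = [f"v{i}" for i in range(1, 11)]
df_two = model_df(ten, {
    ("v1", "v2"): "cov_a", ("v3", "v4"): "cov_a",
    ("v5", "v6"): "cov_b", ("v7", "v8"): "cov_b",
})
print(df_one, df_two)
```

With four variables there are 10 distinct elements and 9 distinct names, giving the single degree of freedom seen in the output; with ten variables and two shared names the count is 55 minus 53.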
 

References

If you have further questions, send E-mail to stats@ssc.utexas.edu.