I would like to compare two sample correlation coefficients, but they are drawn from the same sample. Is there a method of testing for a significant difference between them that takes their dependence into account?
Note: If you have two correlation coefficients computed from two different samples, please consult General FAQ #26.
Yes, and there are two methods to choose from. The first relies on a formula found in Cohen & Cohen (1983, p. 57). The formula yields a t-statistic with n - 3 degrees of freedom. As written below, it tests for a significant difference between the correlation of X with Y and the correlation of V with Y:
t = (rxy - rvy) * sqrt((n - 1)(1 + rxv)) / sqrt(2((n - 1)/(n - 3))|R| + ((rxy + rvy)/2)^2 (1 - rxv)^3)
where
rxy = correlation coefficient between variables X and Y
rxv = correlation coefficient between variables X and V
rvy = correlation coefficient between variables V and Y
and |R| = 1 - rxy^2 - rvy^2 - rxv^2 + 2*rxy*rxv*rvy,
the determinant of the correlation matrix for X, Y, and V.
Unfortunately, the above method is not available as an option in any of the statistical procedures in either SPSS or SAS. However, SPSS users can adapt the following syntax to perform the test:
* Dependent Correlation Comparison Program.
* Compares correlation coefficients from the same sample.
* See Cohen & Cohen (1983), p. 57.
DATA LIST free
/rxy rvy rxv.
BEGIN DATA.
.50 .32 .65
END DATA.
* Define the sample size.
COMPUTE n =50.
COMPUTE diffr = rxy - rvy.
COMPUTE detR = (1 - rxy **2 - rvy**2 - rxv**2)+ (2*rxy*rxv*rvy).
* Calculate the mean of rxy and rvy.
COMPUTE rbar = (rxy + rvy)/2.
* Calculate numerator of t statistic.
COMPUTE tnum = diffr * sqrt((n-1)*(1 + rxv)).
* Calculate denominator of t statistic.
COMPUTE tden = sqrt(2*((n-1)/(n-3))*detR + (rbar**2)*((1-rxv)**3)).
COMPUTE t= (tnum/tden).
COMPUTE df = n - 3.
* Evaluate the t statistic for statistical significance against.
* a t distribution with n - 3 degrees of freedom.
COMPUTE p_1_tail = 1 - CDF.T(abs(t),df).
COMPUTE p_2_tail = (1 - CDF.T(abs(t),df))*2.
EXECUTE.
The above syntax will generate an active dataset which will appear in the data editor window. In the Variable View, change the number of decimals to the desired setting to display appropriate results.
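For readers without SPSS, the same calculation can be sketched in Python. This is only a minimal sketch and not part of the original program: the function name dependent_corr_t is hypothetical, it uses only the standard library, and it returns the t statistic and degrees of freedom without a p-value (which would require a t-distribution CDF from a statistics library).

```python
import math


def dependent_corr_t(rxy, rvy, rxv, n):
    """Return (t, df) for H0: corr(X,Y) == corr(V,Y) in one sample of size n.

    Implements the Cohen & Cohen (1983, p. 57) formula as written above.
    The function name is hypothetical, chosen for illustration only.
    """
    if n <= 3:
        raise ValueError("n must exceed 3 so that df = n - 3 is positive")
    # Determinant of the 3x3 correlation matrix for X, Y, and V.
    det_r = 1 - rxy**2 - rvy**2 - rxv**2 + 2 * rxy * rxv * rvy
    # Mean of the two correlations being compared.
    rbar = (rxy + rvy) / 2
    t_num = (rxy - rvy) * math.sqrt((n - 1) * (1 + rxv))
    t_den = math.sqrt(2 * ((n - 1) / (n - 3)) * det_r + rbar**2 * (1 - rxv) ** 3)
    return t_num / t_den, n - 3


# Same inputs as the SPSS example: rxy = .50, rvy = .32, rxv = .65, n = 50.
t, df = dependent_corr_t(0.50, 0.32, 0.65, 50)
print(f"t = {t:.3f}, df = {df}")
```

With the example values, t comes out at roughly 1.70 on 47 degrees of freedom, which can then be looked up in a t table or passed to a t-distribution routine.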
Notice that the method above is limited to the three-variable case, in which the two correlations share a variable (e.g., X & Y and V & Y). The second method is more flexible. With this method, one uses a statistical software package capable of estimating covariance structure models (e.g., SAS, AMOS, LISREL) to compare an observed correlation matrix to an estimated correlation matrix that includes restrictions representing a null hypothesis. Steiger (1980) discusses this approach in more detail.
For example, suppose you have a set of four variables: X, Y, Z, and Q, and you want to test whether the correlation between X and Y is the same as that between Z and Q. The observed correlation matrix would be symmetric and look like the one presented below,
     X    Y    Z    Q
X    a    b    c    d
Y    b    e    f    g
Z    c    f    h    i
Q    d    g    i    j
where capital letters represent variables and lower case letters represent correlation coefficients.
In the estimated correlation matrix, one would impose the constraint that b = i in order to test the hypothesis that the correlation between X and Y is the same as that between Z and Q. One could then test how well this estimated matrix fits the data using the standard output of any of the statistical packages mentioned above. If it turns out that the restricted correlation matrix provides a reasonable fit of the data given a previously specified level of statistical significance, then this finding would be equivalent to retaining the null hypothesis that the correlation between X and Y is the same as that between Z and Q. On the other hand, if the restricted correlation matrix does not fit the data well, then this would be equivalent to rejecting the null hypothesis. Provided that one is cognizant of the problem of making multiple inferences and the sample data conform to the assumptions necessary to perform a covariance structure analysis (e.g., sufficient sample size, joint multivariate normality of the population distribution of the input variables, etc.), one could test a series of hypotheses in this fashion given virtually any combination of variables.
The SAS program below demonstrates how PROC CALIS can be used to test the hypothesis that COV(X,Y) = COV(Z,Q). In the example below, the covariance matrix rather than the correlation matrix is used, since the maximum likelihood procedure for confirmatory factor analysis is theoretically derived for covariance matrices. However, one may interpret results from this program as applying also to the correlation matrix for a set of variables.
The key element of the program is the COV statement, which specifies the estimated covariance matrix under the null hypothesis. To impose the constraint that COV(X,Y) = COV(Z,Q), the same name, "cov_v1v2", is given to both the covariance between X and Y and the covariance between Z and Q in the COV statement. The other covariances are freely estimated because each has been given a unique name. The relevant output from this program can be found below.
The CALIS Procedure
Covariance Structure Analysis: Maximum Likelihood Estimation
Fit Function 0.0638
Goodness of Fit Index (GFI) 0.9710
GFI Adjusted for Degrees of Freedom (AGFI) 0.7104
Root Mean Square Residual (RMR) 1.2634
Parsimonious GFI (Mulaik, 1989) 0.1618
    Chi-Square 1.2124
    Chi-Square DF 1
    Pr > Chi-Square 0.2709
Independence Model Chi-Square 7.0050
Independence Model Chi-Square DF 6
RMSEA Estimate 0.1057
RMSEA 90% Lower Confidence Limit .
RMSEA 90% Upper Confidence Limit 0.6298
ECVI Estimate 1.3495
ECVI 90% Lower Confidence Limit .
ECVI 90% Upper Confidence Limit 1.8356
Probability of Close Fit 0.2822
Bentler's Comparative Fit Index 0.7887
Normal Theory Reweighted LS Chi-Square 1.1334
Akaike's Information Criterion -0.7876
Bozdogan's (1987) CAIC -2.7833
Schwarz's Bayesian Criterion -1.7833
McDonald's (1989) Centrality 0.9947
Bentler & Bonett's (1980) Non-normed Index -0.2680
Bentler & Bonett's (1980) NFI 0.8269
James, Mulaik, & Brett (1982) Parsimonious NFI 0.1378
Z-Test of Wilson & Hilferty (1931) 0.6121
Bollen (1986) Normed Index Rho1 -0.0385
Bollen (1988) Non-normed Index Delta2 0.9646
Hoelter's (1983) Critical N 62
In the table shown above, the chi-square results for the test of the null hypothesis that COV(X,Y) = COV(Z,Q) are indented for emphasis. Most of the other results are general goodness-of-fit measures that do not apply to this limited use of covariance structure models, so they may be ignored. The chi-square test on this particular set of data indicates that we should retain the null hypothesis (chi-square = 1.21, df = 1, p = .27). Had we observed a p-value below the chosen significance level (e.g., p < .05), we would have rejected the null hypothesis and concluded that COV(X,Y) does not equal COV(Z,Q).
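The reported p-value can be checked by hand. As a minimal sketch (standard library only; the function name chi2_sf is hypothetical), the upper-tail probability of a chi-square statistic has a simple closed form for 1 and 2 degrees of freedom, which covers the single-constraint test above and a test with two equality constraints.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, df in {1, 2}.

    Hypothetical helper for illustration; other df values are not handled.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return math.exp(-x / 2)
    raise NotImplementedError("closed form sketched only for df = 1 or 2")


# Chi-square and df taken from the PROC CALIS output above.
p = chi2_sf(1.2124, 1)
print(f"p = {p:.4f}")
```

For the output above this reproduces the Pr > Chi-Square value of about 0.27.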
Note that this method allows you to test the equality of multiple sets of correlation coefficients within the same matrix simultaneously. For instance, if you had a correlation matrix consisting of 10 variables, you could test whether the v1-v2 correlation equals the v3-v4 correlation at the same time that you test whether the v5-v6 correlation equals the v7-v8 correlation. The resulting chi-square test statistic would have two degrees of freedom; it would test the joint hypothesis that the v1-v2 correlation is equal to the v3-v4 correlation and that the v5-v6 correlation is equal to the v7-v8 correlation.
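The degrees-of-freedom count can be checked with simple arithmetic. The sketch below assumes a model in which every variance and covariance is a free parameter and the equality constraints are the only restrictions, each one tying together a distinct pair of covariances; the function name constraint_df is hypothetical.

```python
def constraint_df(n_vars, n_equalities):
    """Chi-square df when equality constraints are the only restrictions.

    Hypothetical helper: assumes each constraint ties a distinct pair
    of otherwise free covariances.
    """
    # Distinct variances and covariances in a symmetric matrix.
    n_moments = n_vars * (n_vars + 1) // 2
    # Each equality constraint merges two free parameters into one.
    n_free = n_moments - n_equalities
    return n_moments - n_free


print(constraint_df(4, 1))   # four variables, one constraint
print(constraint_df(10, 2))  # ten variables, two constraints
```

Under these assumptions the degrees of freedom equal the number of equality constraints: 1 for the four-variable example, matching the Chi-Square DF in the output above, and 2 for the ten-variable joint test.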
References
If you have further questions, send E-mail to stats@ssc.utexas.edu.