College of Natural Sciences
 
FAQs

General FAQ #24: Contrast coding

Question:

I've run a balanced-data ANOVA with one between-subjects factor (GROUP) and one within-subjects factor (TIME). Group has three levels; time has three levels. My dependent variable is anxiety, measured at three equally spaced intervals.

I now want to run a contrast analysis. I want to compare group 1 to group 2 across all three measurement occasions of anxiety. How can I determine what my contrast weights should be?

Answer:

One widely used method is first to specify your hypothesis in terms of your design's cell means. You then reexpress the hypothesis in terms of the model parameters used by your software. Finally, you match the weights found in this expression to the syntax required by your software.

1. Specify your hypothesis in terms of your design's cell means.

To identify the population cell means in your GROUP by TIME study, it's helpful to use a table, like this, where the first digit after Mu is the group and the second is the measurement occasion:

Population Means

          Time 1   Time 2   Time 3
Group 1   Mu11     Mu12     Mu13
Group 2   Mu21     Mu22     Mu23
Group 3   Mu31     Mu32     Mu33

Once you identify the cell means, your next step is to specify your null hypothesis as an equality among various combinations of these means.

Your hypothesis, stated in null hypothesis form, reads: "The population mean for group 1 equals the population mean for group 2, when this mean is taken across all measurement occasions of anxiety." Each of these means is the average of three cell means, so the common factor of 1/3 cancels from both sides. Translating this natural language hypothesis into a statement about equality of population means, you get:

Mu11+Mu12+Mu13 = Mu21+Mu22+Mu23
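
As a minimal sketch of this step, the hypothesis can be written as a table of weights on the cell means: +1 for every group 1 cell, -1 for every group 2 cell, and 0 for every group 3 cell. The Python below is illustrative only. The names cell_weights and contrast_value are hypothetical, and the cell means are made-up numbers, not results from any study.

```python
# Weights on the 3x3 table of cell means (rows = groups, columns = times).
# Group 1 cells get +1, group 2 cells get -1, group 3 cells get 0.
cell_weights = [
    [1, 1, 1],     # Mu11 Mu12 Mu13
    [-1, -1, -1],  # Mu21 Mu22 Mu23
    [0, 0, 0],     # Mu31 Mu32 Mu33
]


def contrast_value(means, weights):
    """Return the weighted sum of cell means for a contrast."""
    return sum(
        w * m
        for w_row, m_row in zip(weights, means)
        for w, m in zip(w_row, m_row)
    )


# Made-up cell means, for illustration only (not real data).
example_means = [
    [10.0, 12.0, 14.0],
    [11.0, 12.0, 13.0],
    [20.0, 25.0, 30.0],
]

# The null hypothesis says this quantity is zero in the population.
print(contrast_value(example_means, cell_weights))
```

In this made-up example the group 1 and group 2 rows have the same total, so the weighted sum is zero. Group 3 has no influence on the result because all of its weights are zero.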

2. Reexpress the hypothesis in terms of the model parameters.

For the standard balanced-data two-way ANOVA model, the relationship between a cell mean and the model parameters is widely known. In our case, this relationship for Mu11 is:

Mu11 = I+G1+T1+GT11

That is, each individual population mean is composed of an intercept term (abbreviated I), a main effect term due to group (abbreviated G), another main effect term due to time (abbreviated T), and a group by time interaction term (abbreviated GT). The digits after each abbreviation give the level of group, the level of time, or both.

When we substitute these expressions into the null hypothesis formula shown above, we get:

(I+G1+T1+GT11)+(I+G1+T2+GT12)+(I+G1+T3+GT13) =
(I+G2+T1+GT21)+(I+G2+T2+GT22)+(I+G2+T3+GT23)

While this formula may seem intimidating, it is easily simplified. There are three intercept terms (I) before the equals sign and three intercept terms after it, so all intercept terms drop out of the formula. The T1, T2, and T3 terms also drop out, because each appears once on each side. This leaves us with:

(G1+GT11)+(G1+GT12)+(G1+GT13) = (G2+GT21)+(G2+GT22)+(G2+GT23)

We now continue simplifying by collecting terms. We have three G1 terms and three G2 terms, giving us:

3G1 + GT11 + GT12 + GT13 = 3G2 + GT21 + GT22 + GT23

Notice that what we have left equates group 1's main effect and group by time interaction terms with those of group 2. We're collapsing across our time variable, which agrees with our hypothesis. However, you may be surprised by the interaction terms, since our hypothesis doesn't explicitly mention them. We'll say more about this later.
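
The substitution and cancellation in this step can also be done mechanically. The sketch below is one hypothetical way to do it in Python: it expands each cell mean into its four parameters (I, G, T, and GT) and adds up the weights. The function names cell_mean_terms and expand are made up for this illustration and are not part of any statistics package.

```python
from collections import Counter


def cell_mean_terms(g, t):
    """Parameters that make up cell mean Mu{g}{t}: I + Gg + Tt + GTgt."""
    return ["I", f"G{g}", f"T{t}", f"GT{g}{t}"]


def expand(cell_weights):
    """Turn weights on cell means into weights on model parameters."""
    coef = Counter()
    for (g, t), w in cell_weights.items():
        for term in cell_mean_terms(g, t):
            coef[term] += w
    return dict(coef)


# Left side minus right side of the hypothesis:
# +1 on each group 1 cell, -1 on each group 2 cell.
hypothesis = {(1, t): 1 for t in (1, 2, 3)}
hypothesis.update({(2, t): -1 for t in (1, 2, 3)})

coefficients = expand(hypothesis)

# Terms whose weights cancel (I, T1, T2, T3) end up with coefficient 0.
nonzero = {term: w for term, w in coefficients.items() if w != 0}
print(nonzero)
```

The surviving terms are G1 and G2 with weights 3 and -3, plus the six interaction terms for groups 1 and 2 with weights 1 and -1, which matches the hand derivation.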

3. Translate this expression into the form required by the software's syntax.

Most software requires the expression to equate to a constant (usually zero), so we subtract the right-hand side from both sides to get:

3G1 + GT11 + GT12 + GT13 - 3G2 - GT21 - GT22 - GT23 = 0

Then we need to arrange our terms in the order used by our software. For SAS or SPSS, the order in this case would be:

3G1 - 3G2 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 = 0

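Finally, for every effect that appears in the contrast, we need to include a term for each of its parameters, even those with a weight of zero. An effect can be left out entirely only if all of its parameters have a weight of zero, as is the case for time here. Our equation becomes: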

3G1 - 3G2 + 0G3 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 + 0GT31 + 0GT32 + 0GT33 = 0

We can now read off the contrast weights, which are just the coefficients of the effect terms.
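
Reading off the weights amounts to listing one coefficient per parameter, in the software's order, and filling in zero for any parameter that does not appear. A minimal Python sketch, assuming the parameter order used in the equation above (group levels in order, and time varying fastest within each group for the interaction); the variable names are hypothetical:

```python
# Nonzero coefficients from the simplified hypothesis equation.
coefficients = {
    "G1": 3, "G2": -3,
    "GT11": 1, "GT12": 1, "GT13": 1,
    "GT21": -1, "GT22": -1, "GT23": -1,
}

levels = (1, 2, 3)

# One weight per group parameter, in level order; missing terms get 0.
group_weights = [coefficients.get(f"G{g}", 0) for g in levels]

# One weight per interaction parameter, with time varying fastest within
# each group (GT11, GT12, GT13, GT21, ...), matching the equation's order.
interaction_weights = [
    coefficients.get(f"GT{g}{t}", 0) for g in levels for t in levels
]

print(group_weights)        # [3, -3, 0]
print(interaction_weights)  # [1, 1, 1, -1, -1, -1, 0, 0, 0]
```

Check the parameter order your own package expects before relying on a list like this, since a different ordering of the interaction terms would require the weights to be rearranged.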

Although the exact specification of the contrast statement varies from package to package, its general form is as follows:

"contrast-name" variable-name weights

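where "contrast-name" is a quoted string that identifies the contrast on the software's output, "variable-name" is the name of the variable (e.g., GROUP), and "weights" are the contrast weights you've generated. When a contrast involves more than one effect, the variable-name and weights pair is repeated for each effect.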

Let's put our contrast weights into this framework:

"my contrast" group 3 -3 0 group*time 1 1 1 -1 -1 -1 0 0 0

Contrast coding can be a challenging exercise. It is easy to produce contrast weights which do not test your hypothesis unless you follow a systematic method such as the one described here. Be sure to check carefully the contrast results produced by your software. These results should be consistent with the usual descriptive information (e.g., cell means, standard deviations, and standard errors) you should run before you perform a contrast analysis. If you are uncertain about the validity of your contrast results, contact a consultant at the E-mail address shown below for assistance.

For instance, you might have expected the interaction terms to have dropped out. This would be appropriate in a model employing the usual side conditions that the interaction terms within a level sum to zero. However, both SAS and SPSS use the "overparameterized" ANOVA model, which does not assume such restrictions.
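
One way to convince yourself that the interaction weights are needed in the overparameterized model is a quick numerical check. The Python sketch below assigns made-up values to every parameter, with no sum-to-zero restrictions, and compares the contrast computed from the parameters with the hypothesis computed directly from the cell means. All parameter values are arbitrary illustrations, not estimates from any data.

```python
levels = (1, 2, 3)

# Arbitrary parameter values with no side conditions imposed.
I = 2.0
G = {1: 1.0, 2: -0.5, 3: 4.0}
T = {1: 0.0, 2: 1.5, 3: 3.0}
GT = {
    (1, 1): 0.5, (1, 2): -1.0, (1, 3): 2.0,
    (2, 1): 1.0, (2, 2): 0.0, (2, 3): -0.5,
    (3, 1): 3.0, (3, 2): 2.5, (3, 3): -4.0,
}


def mu(g, t):
    """Cell mean as the sum of its model parameters."""
    return I + G[g] + T[t] + GT[(g, t)]


# The hypothesis stated directly on the cell means.
target = sum(mu(1, t) for t in levels) - sum(mu(2, t) for t in levels)

# Contrast weights from the derivation: 3 -3 0 on group,
# and 1 1 1 -1 -1 -1 0 0 0 on the interaction.
group_w = {1: 3, 2: -3, 3: 0}
gt_w = {(g, t): {1: 1, 2: -1, 3: 0}[g] for g in levels for t in levels}

main_only = sum(group_w[g] * G[g] for g in levels)
full = main_only + sum(gt_w[key] * GT[key] for key in GT)

print(abs(full - target) < 1e-9)       # weights with interaction terms
print(abs(main_only - target) < 1e-9)  # group weights alone
```

With these values the full set of weights reproduces the cell-mean hypothesis, while the group weights alone do not, because the interaction terms for group 1 and group 2 do not sum to the same total.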

For more information on generating and specifying contrast codes, see the online SAS manual at http://support.sas.com/documentation/. Under SAS Product Documentation, click on SAS/STAT. Click on SAS OnlineDoc under SAS/STAT 9.1.3; scroll down and click on SAS/STAT and then click on SAS/STAT User's Guide. Scroll down to The GLM Procedure; the Syntax section discusses the CONTRAST statement.

If you have further questions, send E-mail to stats@ssc.utexas.edu.