I am predicting my dependent variable y from independent variables a and b. How can I calculate the interaction term a*b for use in my regression analysis?
Opinions differ on how to compute the interaction term for use in an analysis. Some researchers simply compute the product of a and b and enter this product into their regression model, like so:
For SPSS, use the dialog boxes to compute the new interaction variable:
In the Data View window, click Transform and then Compute.
In the Target Variable box, type the name of the new interaction variable, e.g. ab.
In the Numeric Expression box, enter a*b. Click OK.
This computes the interaction term, ab, and adds it to the dataset.
Enter the variables a, b, and ab as independent variables in the regression model.
For SAS:
/* Add the product term ab to the data set */
DATA origdata;
SET origdata;
ab = a*b ;
RUN;
/* Regress y on a, b, and the product term */
PROC REG DATA = origdata;
MODEL y = a b ab ;
RUN ;
Other researchers advocate "centering" the a and b predictors before computing the interaction term. Centering a variable means subtracting the variable's mean from each case's value on that variable. The result is known as a "deviation score". The SPSS steps and SAS code shown below can be used to create centered variables.
For SPSS:
In the Data View window, click Transform and then Compute.
Type the variable name, breakvar, in the Target Variable box. Enter a value of 1 in the Numeric Expression box. Click OK.
This creates a new variable, breakvar, with a value equal to 1. This variable is necessary for calculating the means of variables a and b.
Click Data, then Aggregate.
Click breakvar into the box labeled Break Variables.
Click on a and b to put them into the box labeled Aggregated Variables.
Make sure the function shown in the Summaries of Variables box is the mean of each variable. Make sure the default option, "Add aggregated variables to active dataset", is selected. Click OK.
This will add the mean of a and the mean of b as two new columns in the dataset, a_mean and b_mean, respectively.
Click Transform, then Compute.
Type the centered variable name, acen, in the Target Variable box.
Enter a - a_mean in the Numeric Expression box. Click OK. This creates the centered version of a.
Click Transform, then Compute again. Type bcen in the Target Variable box, enter b - b_mean in the Numeric Expression box, and click OK. This creates the centered version of b.
Click Transform, then Compute.
Type the variable name, abcen, in the Target Variable box.
Enter acen*bcen in the Numeric Expression box. Click OK.
This creates the interaction term, abcen, based on the centered variables of a and b.
Use the centered terms acen, bcen, and abcen in the regression model instead of a, b, and ab.
For SAS:
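/* Center a and b: MEAN = 0 subtracts each variable's mean; with no STD= option the scale is unchanged */
PROC STANDARD DATA = origdata
OUT = centdata
MEAN = 0
PRINT;
VAR a b;
RUN;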
/* a and b in centdata now hold centered values, so ab is the centered interaction term */
DATA centdata;
SET centdata;
ab = a*b;
RUN;
PROC REG DATA = centdata;
MODEL y = a b ab;
RUN;
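For readers working outside SPSS or SAS, the same centering steps can be sketched in a few lines of Python. This is only an illustration: the data values are made up, the helper name center is hypothetical, and the variable names acen, bcen, and abcen simply mirror the SPSS steps above.

```python
from statistics import mean


def center(values):
    """Return deviation scores: each value minus the variable's mean."""
    m = mean(values)
    return [v - m for v in values]


# Made-up illustration data (not from the original text).
a = [60.0, 62.0, 64.0, 66.0, 68.0]
b = [120.0, 150.0, 140.0, 170.0, 160.0]

acen = center(a)
bcen = center(b)

# Interaction term built from the centered predictors.
abcen = [x * z for x, z in zip(acen, bcen)]

print(acen)
print(bcen)
print(abcen)
```

The centered predictors each sum to zero, and acen, bcen, and abcen would then replace a, b, and ab in the regression model.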
The centered and non-centered approaches yield identical overall regression model statistics and tests for the interaction effect (this assumes that the interaction effect is the last entered into the regression model, as is generally the case in this type of analysis).
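The equivalence can be checked numerically. The sketch below is a toy illustration, not a substitute for PROC REG or the SPSS regression procedure: it fits both models by ordinary least squares with a small hand-written solver, on made-up data, and the function names ols and residual_ss are hypothetical.

```python
from statistics import mean


def ols(columns, y):
    """Return least-squares coefficients [intercept, b1, b2, ...].

    Solves the normal equations (X'X)beta = X'y by Gauss-Jordan elimination.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def residual_ss(coefs, columns, y):
    """Residual sum of squares for a fitted model."""
    total = 0.0
    for i, yi in enumerate(y):
        fitted = coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
        total += (yi - fitted) ** 2
    return total


# Made-up illustration data (not from the original text).
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 9.0, 7.0]
y = [3.1, 2.9, 7.2, 6.8, 13.5, 12.1, 24.0, 20.3]

a_mean, b_mean = mean(a), mean(b)
acen = [v - a_mean for v in a]
bcen = [v - b_mean for v in b]

raw_cols = [a, b, [p * q for p, q in zip(a, b)]]
cen_cols = [acen, bcen, [p * q for p, q in zip(acen, bcen)]]

raw = ols(raw_cols, y)   # [intercept, a, b, ab]
cen = ols(cen_cols, y)   # [intercept, acen, bcen, abcen]

print("interaction coefficient:", raw[3], cen[3])
print("coefficient for a:      ", raw[1], cen[1])
print("residual sum of squares:",
      residual_ss(raw, raw_cols, y), residual_ss(cen, cen_cols, y))
```

Up to floating-point rounding, the interaction coefficient and the residual sum of squares should agree across the two fits, while the coefficient for a will generally differ.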
Which approach should you use to compute your interaction term? The chief advantages of centering are that it (1) reduces multicollinearity (a high correlation) between the a and b predictors and the a*b interaction term and (2) can render more meaningful interpretations of the regression coefficients for a and b.
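The first advantage can be illustrated by comparing the correlation between a predictor and the product term before and after centering. The sketch below uses made-up data and a hypothetical helper named pearson. How much the correlation drops depends on the data, so treat the result as an illustration rather than a guarantee.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up illustration data (not from the original text).
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 9.0, 7.0]

# Raw product term.
ab = [p * q for p, q in zip(a, b)]

# Centered predictors and their product.
acen = [v - mean(a) for v in a]
bcen = [v - mean(b) for v in b]
abcen = [p * q for p, q in zip(acen, bcen)]

r_raw = pearson(a, ab)
r_cen = pearson(acen, abcen)

print("correlation of a with a*b (raw):     ", round(r_raw, 3))
print("correlation of a with a*b (centered):", round(r_cen, 3))
```

For these particular values the raw correlation is above 0.9, and the centered correlation is close to zero.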
The regression coefficient for a*b will be the same for both approaches, but the coefficients for a and b will differ depending on which method you use. This is because in the non-centering method, the coefficient for a estimates the relationship between a and y where b equals zero. In the centering method, the coefficient for a estimates the relationship between a and y where b equals its average. In many situations, the predictors will not have a meaningful zero point, so a centering approach may be warranted.
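A short derivation shows why only the coefficients for a and b change. The notation is an assumption introduced here for illustration: the betas are the coefficients of the non-centered model, the gammas are those of the centered model, and the barred symbols are the sample means of a and b.

```latex
\begin{align*}
\text{Non-centered:}\quad
y &= \beta_0 + \beta_1 a + \beta_2 b + \beta_3\, ab \\[4pt]
\text{Centered:}\quad
y &= \gamma_0 + \gamma_1 (a - \bar{a}) + \gamma_2 (b - \bar{b})
     + \gamma_3 (a - \bar{a})(b - \bar{b}) \\
  &= (\gamma_0 - \gamma_1 \bar{a} - \gamma_2 \bar{b} + \gamma_3 \bar{a}\bar{b})
     + (\gamma_1 - \gamma_3 \bar{b})\, a
     + (\gamma_2 - \gamma_3 \bar{a})\, b
     + \gamma_3\, ab \\[4pt]
\text{Matching terms:}\quad
\beta_3 &= \gamma_3, \qquad
\gamma_1 = \beta_1 + \beta_3 \bar{b}, \qquad
\gamma_2 = \beta_2 + \beta_3 \bar{a}
\end{align*}
```

In the non-centered model the slope of y on a at a given value of b is beta_1 + beta_3 b. Setting b = 0 gives beta_1, and setting b equal to its mean gives gamma_1, which is the coefficient for a in the centered model.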
Leona Aiken and Stephen West provide an example of this type of situation in their text titled Multiple regression: Testing and interpreting interactions (1991, Sage Publications, Newbury Park, Chapter 3).
As an example, suppose you are predicting athletes' strength levels (y) from height (a) and weight (b) measurements. Under the non-centering approach, the regression coefficient for height (a) estimates the relationship between height (a) and strength (y) where b = 0, that is, where weight equals zero pounds. No athlete we know of has a weight of zero pounds!
Centering provides one remedy for this situation: in the centered model, the regression coefficient for height (a) estimates the relationship between height (a) and strength (y) where weight (b) is equal to the mean weight in the data set instead of zero.
Aiken and West devote an entire chapter of their book to the topic of centering (chapter 3). This book is available from the Physics-Math-Astronomy library on campus. See http://catalog.lib.utexas.edu/.
If you have further questions, send E-mail to stats@ssc.utexas.edu.