How can I replace missing values with a mean value in SAS? Is there an easy way to do this for many variables?
SAS has a procedure called PROC STANDARD that standardizes some or all of the variables in a SAS data set to a given mean and/or standard deviation and produces a new SAS data set containing the standardized values. Its REPLACE option substitutes the variable mean for every missing value of that variable. When REPLACE is used without the MEAN= or STD= options, nonmissing values are left unchanged, so the procedure performs mean substitution only. If the MEAN=mean-value option is also specified, the variable is standardized to that mean and missing values are set to the user-specified mean-value instead.
The following SAS code demonstrates the use of PROC STANDARD for mean substitution.
DATA raw ;
INPUT v1-v10 ;
CARDS;
1 1 1 1 1 . 1 1 1 1
2 2 2 . 2 . 2 2 2 2
3 3 3 3 3 3 . . 3 3
4 4 4 . . 4 4 4 4 4
5 5 5 5 5 5 5 5 . .
;
PROC STANDARD DATA=raw OUT=stnd REPLACE PRINT;
VAR v1-v10;
RUN;
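As a minimal sketch of the MEAN= interaction described above, the step below reuses the data set raw and writes to a data set named centered (a name chosen here only for illustration). With MEAN=0, PROC STANDARD shifts the nonmissing values of each variable so that the variable's mean is 0, and REPLACE sets the missing values to 0 rather than to the original mean. The PROC MEANS step is simply one way to check that no missing values remain.

```sas
/* Sketch only: assumes the data set raw from the example above exists. */
/* "centered" is a hypothetical output data set name.                   */
PROC STANDARD DATA=raw OUT=centered MEAN=0 REPLACE;
  VAR v1-v10;
RUN;

/* NMISS should be 0 for every variable after replacement. */
PROC MEANS DATA=centered N NMISS MEAN;
  VAR v1-v10;
RUN;
```

If the goal is plain mean substitution with the original values left untouched, omit MEAN= as in the example above.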
The following SAS code demonstrates another way of substituting mean values for missing values, using PROC MEANS and a DATA step with arrays.
DATA raw ;
INPUT v1-v10;
CARDS;
1 1 1 1 1 . 1 1 1 1
2 2 2 . 2 . 2 2 2 2
3 3 3 3 3 3 . . 3 3
4 4 4 . . 4 4 4 4 4
5 5 5 5 5 5 5 5 . .
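;
/* Compute the mean of each variable and save the means as m1-m10 */
PROC MEANS DATA=raw NOPRINT;
  VAR v1-v10;
  OUTPUT OUT=meandat(DROP=_TYPE_ _FREQ_) MEAN=m1-m10;
RUN;
/* Display the means to verify them */
PROC PRINT DATA=meandat;
RUN;
/* Replace each missing value with the mean of its variable */
DATA meansub (DROP=m1-m10 i);
  /* Read the one observation of means once; its values are retained */
  IF _N_ = 1 THEN SET meandat;
  SET raw;
  ARRAY old(10) v1-v10;
  ARRAY means(10) m1-m10;
  DO i = 1 TO 10;
    IF old(i) EQ . THEN old(i) = means(i);
  END;
RUN;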
In the first DATA step, the data set raw is created with 10 variables, v1 through v10. Notice that there are one or more missing values (periods) for each observation in the data records.
PROC MEANS produces a new data set, meandat, containing a single observation whose variables m1 through m10 hold the means of v1 through v10. PROC PRINT displays meandat so that the means can be verified.
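The second DATA step performs the substitution, creating a final data set called meansub. The statement IF _N_ = 1 THEN SET meandat reads the single observation of means on the first iteration only; because variables read with a SET statement are retained across iterations, m1 through m10 remain available for every observation of raw. The step then defines two arrays: old represents v1 through v10 and means represents m1 through m10. A DO loop moves through the array elements, checking each value of array old to see if it is missing. If it is missing, the value is set to the corresponding value from the array means; this is the mean substitution.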
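One way to confirm that the two approaches agree is to compare their output data sets with PROC COMPARE. This sketch assumes that both stnd (from the PROC STANDARD example) and meansub (from the DATA step above) exist in the same SAS session. The CRITERION= value is an arbitrary tolerance chosen for illustration, not a recommended setting.

```sas
/* Sketch only: assumes stnd and meansub were both created earlier. */
/* CRITERION= allows for tiny floating-point differences in means.  */
PROC COMPARE BASE=stnd COMPARE=meansub CRITERION=1E-8;
  VAR v1-v10;
RUN;
```

The comparison report lists any variables whose values differ between the two data sets by more than the tolerance.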
For more information on handling missing data, please see General FAQ #25: Handling missing or incomplete data.
For additional information on SAS procedures, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.
If you have further questions, send E-mail to stats@ssc.utexas.edu.