College of Natural Sciences
 
FAQs

SAS FAQ #6: Removing duplicate observations from a dataset using SAS

Question:

How can I remove duplicate observations from my SAS dataset?

Answer:

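Use PROC SORT with the NODUPLICATES option to remove unwanted duplicate observations from your SAS dataset. The following sample code creates a small dataset in which one observation appears twice, prints it, sorts it into a new dataset with the duplicate removed, and prints the result.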

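/* Create a sample dataset; the last observation repeats the first */
DATA test ;
   INPUT id varone vartwo ;
   CARDS ;
1 23 45
2 35 98
3 83 45
1 23 45
;
RUN ;

/* Print the original data, including the duplicate */
PROC PRINT DATA=test ;
RUN ;

/* Sort by every variable and drop exact duplicate observations */
PROC SORT DATA=test OUT=test2 NODUPLICATES ;
   BY _ALL_ ;
RUN ;

/* Print the de-duplicated data */
PROC PRINT DATA=test2 ;
RUN ;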

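The BY _ALL_ statement matters because NODUPLICATES compares each observation only with the observation immediately before it in the sorted output. Sorting by all variables guarantees that identical observations end up next to each other, so every exact duplicate is removed. If you sort by fewer variables, duplicate observations that are not adjacent after sorting can remain in the output.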
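The short sketch below illustrates why the BY variables matter. It uses a hypothetical dataset, demo, in which the two identical observations are separated by a third observation with the same id. Because PROC SORT keeps observations with equal BY values in their original order by default, sorting BY id alone leaves the duplicates apart, and NODUPLICATES may keep both; sorting BY _ALL_ places them next to each other so that one is removed. The dataset names demo, byid, and byall are illustrative only.

```sas
/* Hypothetical data: the duplicate observations are not next to each other */
DATA demo ;
   INPUT id varone vartwo ;
   CARDS ;
1 23 45
1 50 60
1 23 45
;
RUN ;

/* Sorting by id alone keeps the duplicates apart, so both may survive */
PROC SORT DATA=demo OUT=byid NODUPLICATES ;
   BY id ;
RUN ;

/* Sorting by every variable makes the duplicates adjacent, so one is removed */
PROC SORT DATA=demo OUT=byall NODUPLICATES ;
   BY _ALL_ ;
RUN ;

PROC PRINT DATA=byid ;
RUN ;
PROC PRINT DATA=byall ;
RUN ;
```

Comparing the two printouts shows whether the sort order left the duplicate in byid; byall should contain only the two distinct observations.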

If you want to remove duplicates based on specific BY variables only (rather than on all variables, as shown above), replace the NODUPLICATES keyword with NODUPKEY and replace the _ALL_ keyword with the specific BY variable(s) to check (e.g., vartwo). NODUPKEY keeps the first observation for each distinct value of the BY variables and deletes the rest, even when the deleted observations differ in other variables.
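As a sketch of that substitution, the code below applies NODUPKEY to the sample dataset with vartwo as the BY variable. It also adds the DUPOUT= option, which writes the deleted observations to a separate dataset so you can check what was removed; the dataset names test3 and dups are illustrative. Note that the observation with id 3 is deleted even though it is not an exact duplicate, because its vartwo value (45) matches an earlier observation.

```sas
DATA test ;
   INPUT id varone vartwo ;
   CARDS ;
1 23 45
2 35 98
3 83 45
1 23 45
;
RUN ;

/* Keep the first observation for each value of vartwo;
   write the observations that NODUPKEY deletes to dups */
PROC SORT DATA=test OUT=test3 DUPOUT=dups NODUPKEY ;
   BY vartwo ;
RUN ;

PROC PRINT DATA=test3 ;
RUN ;
PROC PRINT DATA=dups ;
RUN ;
```

Here test3 should contain one observation for vartwo=45 and one for vartwo=98, and dups should hold the other two observations with vartwo=45.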

For more information, click Help in the SAS menu bar and select SAS Help and Documentation.

If you have further questions, send E-mail to stats@ssc.utexas.edu.