
SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
Data Manipulation [3-18]


Removing Duplicate Observations
Duplicate observations can distort your analysis results, for example by inflating counts and totals.

You can remove them by using the NODUPKEY option in Proc Sort. 

Example

The SUPERMARKET data set contains 3 variables:
  • Product
  • Price
  • DemandPerWeek

Pringles is a hot seller. However, its observation appears more than once in the data set.

We can use the NODUPKEY option in Proc Sort to remove the duplicate observations.

Example

Proc Sort Data=Supermarket Out=Supermarket2 NODUPKEY;
By Product Price DemandPerWeek;
Run;

The data set is sorted with the duplicated observation (Pringles) removed.
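
When a data set has many variables, listing each one in the By statement gets tedious. The sketch below shows one possible alternative, assuming the goal is to drop only observations that match on every variable and that the SUPERMARKET data set from the example already exists. The special variable list _ALL_ stands for all variables in the data set. The output name Supermarket3 is hypothetical and not part of the original example.

```sas
/* Sketch: remove observations that are identical on every variable. */
/* Supermarket3 is a hypothetical output data set name.              */
Proc Sort Data=Supermarket Out=Supermarket3 NODUPKEY;
By _ALL_;   /* _ALL_ refers to every variable in the data set */
Run;
```

For SUPERMARKET this should behave like listing Product, Price and DemandPerWeek explicitly, since those are its only three variables.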

The Log window also shows a note about the removal of the duplicated observation.

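Note: the NODUPKEY option should be used with caution. It keeps only the first observation for each combination of By-variable values, so observations that match on the By variables but differ in other variables are removed as well. Triple-check the duplicates before you remove them from the data set.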
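
One way to do that check is sketched below, assuming the SUPERMARKET data set from the example already exists. The DUPOUT= option of Proc Sort writes the observations that NODUPKEY removes to a separate data set, so they can be inspected before the result is trusted. The data set name Supermarket_Dups is hypothetical.

```sas
/* Sketch: keep a copy of whatever NODUPKEY removes so it can be reviewed. */
/* Supermarket_Dups is a hypothetical data set name.                       */
Proc Sort Data=Supermarket Out=Supermarket2 NODUPKEY DUPOUT=Supermarket_Dups;
By Product Price DemandPerWeek;
Run;

/* List the removed observations. */
Proc Print Data=Supermarket_Dups;
Run;
```

If Supermarket_Dups contains observations you expected to keep, revisit the By statement before relying on Supermarket2.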

Exercise

Copy and run the INCOME data set from the yellow box below.

The INCOME data set contains 4 variables:
  • HouseHoldID
  • NumMembers
  • HomeOwner
  • Income

Remove any duplicate observation(s) from the INCOME data set.

Need some help? 


HINT:
It is highly recommended to create a new data set (with the Out= option) when removing duplicates. Keep the original data set intact in case you need to go back to the source data.


SOLUTION:
Proc Sort Data=Income Out=Income2 NODUPKEY;
By HouseHoldID NumMembers HomeOwner Income;
Run;

