Sentry Page Protection
Data Manipulation [10-18]
Dealing with unmatched observations
More often than not, you will have to merge data sets that are not perfectly matching.
Let's take a look at an example.
Example
The DS1 data set contains a list of ID from 9001 to 9005.
The DS2 data set, on the other hand, contains the list of ID from 9003 to 9007.
Now, let's merge the two data sets.
Example
proc sort data=ds1;
by id;
run;
proc sort data=ds2;
by id;
run;
data ds;
merge ds1 ds2;
by id;
run;
Unmatched observations cause missing values and incorrect analysis results.
You must ensure the unmatched observations are handled properly when merging the data sets.
Identifying unmatched observations
In SAS, there is a special technique that allows you to identify unmatched observations.
Let's take a look at an example.
Example
data ds;
merge ds1 (in=a) ds2 (in=b);
by id;
if a and b then i = 1;
else if a then i = 2;
else if b then i = 3;
run;
The code above creates a new variable called "i" that flags the three type of observations:
Common observations (i=1)
The ID (9003, 9004 and 9005) appear in both the DS1 and DS2 data sets.
The observations associated with these ID are flagged as 1.
DS1-only observations (i=2)
The ID (9001 and 9002) appear in only DS1 but not DS2.
These observations are flagged as 2.
DS2-only observations (i=3)
Similarly, the ID (9006 and 9007) appear in only DS2 but not DS1.
These are flagged as 3.
This technique makes use of the (in=a) option as well as the conditional statements.
This will be explained in details in the next session.
Exercise
Copy and run the TRANSAC and TAX data sets from the yellow box below:
Copy and run the TRANSAC and TAX data sets from the yellow box below:
The TRANSAC data set contains 3 variables:
- ORDER: Order number
- PROVINCE: The province of the shipping destination
- TOTAL: Total sales amount
The sales tax on each transaction depends on the province of the shipping destination.
For example, a sales tax of 13% should be added to orders shipping to Ontario. A tax of 5% should be added to orders shipping to Alberta.
The sales tax information can be found from the TAX data set.
Task 1
Merge the TRANSAC and TAX data set by matching the province of the shipping destination. Flag the observations that don't match.
Task 2
Calculate the after-tax total for each transaction.
Need some help?
HINT:
The PROVINCE is the merging variable.
SOLUTION:
proc sort data=transac;
by province;
run;
proc sort data=tax;
by province;
run;
data transac2;
merge transac (in=a) tax (in=b);
by province;
if a and b then i=1;
else if a then i=2;
else if b then i=3;
Total2 = Total * (1 + tax/100);
run;
Fill out my online form.