Sentry Page Protection
Data Manipulation [11-18]
(In=) Option
The (in=) option is a special technique that allows you to identify observations when merging data sets.
Again, let's take a look at the DS1 and DS2 data sets.
Again, we are going to merge the DS1 and DS2 data sets.
Example
proc sort data=ds1;
by id;
run;
proc sort data=ds2;
by id;
run;
data ds;
merge ds1 (in=a) ds2 (in=b);
by id;
run;
(In=a) Option
The (in=a) option creates a variable called "a".
The variable "a" has an interesting characteristics:
It does not exist in the data set.
You cannot see it in the data set.
However, it exists in the data step.
That is, you can refer to this variable in your program.
Is there a way to show "a" in the data set?
No, the variable "a" is not created in the data set.
However, you can create a carbon-copy of "a" in the data set.
Example
data ds;
merge ds1 (in=a) ds2 (in=b);
by id;
var1 = a;
run;
The variable VAR1 is a carbon-copy of the variable "a".
It contains exactly the same set of data as "a".
By looking at VAR1, we can tell the values of "a".
Default values of "a"
By default, the variable "a" contains either 1's or 0's (binary).
It is 1's if the observation is contributed by DS1 (the data set "a" is associated with).
The ID's (9001, 9002, 9003, 9004, 9005) are all contributed by DS1.
They are all flagged as "1".
On the other hand, the observations that are NOT contributed by DS1 (i.e. 9006 and 9007) are flagged as "0".
Identifying observations
Now that we know the values of "a" (and "b"), identifying the matching and unmatched observations becomes much easier.
Example
data ds;
merge ds1 (in=a) ds2 (in=b);
by id;
if a=1 and b=1 then i=1;
else if a=1 then i=2;
else if b=1 then i=3;
run;
When both "a" and "b" are 1's, the observations are contributed by both DS1 and DS2.
These observations are flagged as 1 (i=1).
When the observations are not matching, and they are contributed by only DS1, the observations are flagged as 2 (i=2).
Similarly, when the observations are contributed by only DS2, they are flagged as 3 (i=3).
Overall, the observations are flagged into 3 groups:
- Matched observations (i=1)
- Unmatching observations from DS1 only (i=2)
- Unmatching observations from DS2 only (i=3)
How come ... the code looks different than before?
Please note the code below essentially achieves the exact same results:
data ds;
merge ds1 (in=a) ds2 (in=b);
by id;
if a and b then i=1;
else if a then i=2;
else if b then i=3;
run;
The (if a and b) statement is the same as (if a=1 and b=1).
Exercise
Copy and run the ALLPOP and DROPOUT data sets from the yellow box below:
Copy and run the ALLPOP and DROPOUT data sets from the yellow box below:
The ALLPOP data set contains a list of 10 patients enrolled in a clinical study.
While the clinical study has been ongoing, some patients dropped out of the study due to various reasons.
The list of dropped out patients are stored in the DROPOUT data set.
Task
Create a new variable that flags the patient:
- 1 for current patient
- 2 for dropped out patient
Create any additional data set or variable if needed.
Need some help?
Fill out my online form.