Sentry Page Protection
Data Manipulation [9-18]
Identifying the Merging Variable(s)
Whenever you merge data sets, you must first identify the merging variable(s).
Again, let's take a look at the PROFILE and EXAM data sets.
When merging the PROFILE and EXAM data sets, the merging variable is Student (e.g. SID0001, SID0002).
Merging the data sets by the Student variable ensures each of the students' profile is matched with their corresponding exam result.
That is, the profile of SID0001 is matched with SID0001's exam result, not SID0008's.
Once you identify the merging variable, you can follow the two steps below in merging the data sets:
Step 1: sort all of the input data sets by the merging variables.
Example
Proc Sort Data=Profile;
By Student;
Run;
Proc Sort Data=Exam;
By Student;
Run;
Note: this step is crucial.
SAS would not be able to merge data sets that are not sorted by the merging variables.
Step 2: merge the data sets using the MERGE and BY statements
Example
Data Combined;
Merge Profile Exam;
By Student;
Run;
The MERGE statement tells SAS to merge the data sets.
The BY statement tells SAS to merge them by matching the (merging) variable.
Entire Code
Proc Sort Data=Profile;
By Student;
Run;
Proc Sort Data=Exam;
By Student;
Run;
Data Combined;
Merge Profile Exam;
By Student;
Run;
Exercise
Copy and run the POPULATION and UBER data sets from the yellow box below:
Copy and run the POPULATION and UBER data sets from the yellow box below:
The POPULATION data set contains the population information related to 8 metropolitan cities across North America.
The UBER data set contains the number of Uber drivers in each of the metropolitan cities.
You are interested in analyzing the ratio of Uber driver against the city's population.
Merge the POPULATION and UBER data set and calculate the Driver to Population ratio:
Ratio = Number of Uber driver / Population.
Which city has the highest Uber driver to population ratio?
Need some help?
HINT:
Make sure merging variables have consistent variable name.
SOLUTION:
proc sort data=population;
by country city;
run;
proc sort data=uber;
by country cities;
run;
data ratio;
merge population uber (rename=(cities=city));
by country city;
ratio = NumDriver/population;
run;
The city of Cancun has the highest uber driver to population ratio.
Fill out my online form.