Sentry Page Protection
Data Manipulation [7-18]
Two (2) common issues when concatenating data sets
When combining data sets, you must pay attention to the two common issues below:
1. Inconsistent Variable Name
Both the REPORT1 and REPORT2 data sets contain the patient ID, gender, and age.
Let's combine them using the SET statement.
Example
Data Report;
Set Report1 Report2;
Run;
The PatientID and PID are not combined into one column because of the different variable names.
In order to solve this problem, one of the variable names must be changed.
This can be done by using a RENAME statement.
Example
Data Report;
Set Report1 Report2 (rename=(PID=PatientID));
Run;
The variable name PID was renamed as PatientID.
As a result, the PID and PatientID are combined into one column.
However, there is another problem when the two were combined.
2. Data Truncation
Data truncation is another common problem when concatenating data set.
For example, the data from the PID variable is truncated after combining the two data sets:
This is due to the inconsistent length assigned to the two ID variables.
The length of PatientID is 8:
However, the PID variable has a length of 12:
When combining the two data sets, SAS assigns the length based on the first data set in the SET statement:
In our example, REPORT1 is listed first in the SET statement.
As a result, the length of PatientID follows the length defined in REPORT1, which is 8.
The length of 8 can only store up to 8 characters.
The data from PID, which contain more than 8 characters, is truncated after concatenating the two data sets:
This issue can be resolved by using a LENGTH statement.
Example
Data Report;
Length PatientID $12;
Set Report1 Report2 (rename=(PID=PatientID));
Run;
The Length statement assigns the length of 12 to the PatientID variable.
This allows the variable to capture all of the data point without being truncated.
Note: the LENGTH statement must be added before the SET statement, which is where the variables attributes are defined.
Once the length has been defined, it cannot be modified.
A LENGTH statement added after the SET statement will not work.
Exercise
Copy and run the DESK1 and DESK2 data set from the yellow box below:
Copy and run the DESK1 and DESK2 data set from the yellow box below:
The DESK1 and DESK2 data sets contain a list of IKEA products.
Concatenate the two (2) data sets.
Create any data set or variable if needed.
Need some help?
HINT:
Ensure consistent variable name and length when combining the two data sets!
SOLUTION:
Data Desks;
Length Brand $12 Category $30;
Set Desks1 Desks2 (rename=(Product=Brand Type=Category WGT=width DPT=depth HGT=height));
Run;
Fill out my online form.