Advanced Data Manipulation [1-5]
Proc Transpose Transposing a data set is a new topic for the SAS certification exam.
It has been discussed in data manipulation [12-18]. In this module, we will provide additional training on this topic that will help you to prepare for the exam. Copy and run the code from the yellow box below: The DEMO data set contains five columns:
Let's first get familiar with the data set. The first two columns are PATIENT and VISITNO. Both are character variables. These columns are used to identify the unique rows in the data set. For example, let's look at the first observation: In the first observation, the PATIENT is ID1001, and the VISITNO is "Visit_1". This uniquely identifies that it is patient ID1001's first visit to the clinic. The other columns are:
These are all numeric variables containing the measurements at each visit. Now, let's do a simple proc transpose on this data set: proc transpose data=demo out=t_demo; run; This example is very simple. It is simply a TRANSPOSE procedure with the input data set (i.e. DEMO) and output data set (T_DEMO). The default behavior of proc transpose is to transpose all the numeric columns into rows. Yes, numeric columns only. For example, the SBP column is a numeric column. It was transposed from a column into a row in the output data set T_DEMO: The character columns, such as the PATIENT and VISITNO, are not transposed by default. All three numeric columns (SBP, DBP and PULSE) are transposed into rows in the T_DEMO data set: Transposing Character Columns If you want to transpose a character column, you must specify it in the VAR statement. Let's look at an example. proc transpose data=demo out=t_demo; var visitno; run; The VAR statement tells SAS to transpose only the VISITNO column. The output data set has one row for VISITNO: Now, let's go back to the original data set for a moment. The very first row is the data for ID1001's first visit to the clinic. The SBP, DBP and Pulse measurements are 92, 87, 62, respectively. Let's run the proc transpose step on the SBP, DBP and PULSE columns: proc transpose data=demo out=t_demo; var sbp dbp pulse; run; The step above transposed the SBP, DBP and PULSE columns. There is one issue. In the output data set, we cannot identify which patient and visit the transposed values belong to. For example, below is the COL1 column in the output data set: In this column, the SBP, DBP and PULSE are 92, 87 and 62, respectively. We know that these values belong to ID1001's first visit to the clinic. However, there is nothing in the output data set that indicates this! The two row-identifying values, PATIENT and VISITNO, must be added back to the output data set. Fortunately, this can be done using:
These are explained below. ID Statement The ID statement allows you to use the identifier values as the variable names. Without the ID statement, the variable names are automatically assigned as COL1, COL2, etc. When we use the ID statement, the identifier values are used in place of the "COL" prefix. Let's look at an example. proc transpose data=demo out=t_demo; var sbp dbp pulse; id patient visitno; run; The PATIENT and VISITNO column are specified in the ID statement. We now have variable names that are more descriptive: The columns created are ID1001Visit_1, ID1002Visit_2, etc. These are more descriptive. We now know which patient and visit each measurement belongs to! BY Statement The BY statement tells SAS to transpose the data set for each by-group. Let's look at an example. proc sort data=demo; by patient visitno; run; proc transpose data=demo out=t_demo; var sbp dbp pulse; by patient visitno; run; In this example, the PATIENT and VISITNO columns are specified in the BY statement. The transpose is done for each of the PATIENT and VISITNO by-groups. The SBP, DBP and PULSE are transposed for each patient and visit: By using the BY statement, the row values (i.e. SBP, DBP and PULSE) can now be identified by the patient and the visit no. Combining the use of the BY statement and ID statement It is more common to use both the ID and BY statement together when transposing a data set. Let's look at an example below. proc sort data=demo; by patient; run; proc transpose data=demo out=t_demo; var sbp dbp pulse; by patient; id visitno; run; In this example, we have a BY statement for the PATIENT column, and an ID statement for the VISITNO column. The transpose is done for each patient by-group. The VISITNO values is assigned as the variable name (e.g. VISIT_1, VISIT_2, etc.). Exercise
The SASHELP.CLASS data set contains five columns:
Transpose SEX, AGE, HEIGHT and WEIGHT columns into rows for each student. What is the name and type (numeric or character) of the column that contains the transposed value? Need some help?
HINT: SOLUTION: proc transpose data=sashelp.class out=t_class; The column created is called COL1 and it is a character column.
Fill out my online form.
|