Sentry Page Protection
Proc SQL II [4-17]
Inner Join with Summary Function
Let's assume you'd like to compute the summary statistics over a set of data.
However, the data has been split into different tables.
With data step, you will have to first merge or concatenate the multiple data sets before summarizing the data using a proc step such as Proc Means or Proc Freq.
The same task can be achieved more efficiently within a single proc step using Proc SQL.
Let's look at an example.
Copy and run the code from the yellow box below:
However, the data has been split into different tables.
With data step, you will have to first merge or concatenate the multiple data sets before summarizing the data using a proc step such as Proc Means or Proc Freq.
The same task can be achieved more efficiently within a single proc step using Proc SQL.
Let's look at an example.
Copy and run the code from the yellow box below:
The HR table contains a list of 10 patients and their associated heart rate.
The GENDER table contains the same list of patients and their gender:
We are now going to compute the average heart rate for each gender.
With the data step, we will have to first merge the HR and GENDER data sets before computing the average heart rate by gender.
This will require multiple steps to sort, merge and summarize the data.
With Proc SQL, this can be done within a single proc step.
Below is an example.
Proc sql;
select gender, mean(hrate) as avghrate
from HR as h, Gender as g
where h.patient = g.patient
group by gender;
quit;
select gender, mean(hrate) as avghrate
from HR as h, Gender as g
where h.patient = g.patient
group by gender;
quit;
This will give you the average heart rate for each gender:
Let's look into how the code is processed.
SAS first performs an inner join between the HR table and the GENDER table.
The table, although not shown in the output, would look like this:
The summary function in the SELECT clause then computes the summary statistics on the heart rate for each gender:
The heart rate is computed for both the male and female patients.
Note: below is how you would achieve the same results using the data step:
proc sort data=HR;
by patient;
run;
proc sort data=Gender;
by patient;
run;
Data HR2;
merge HR Gender;
by patient;
Run;
Proc means data=HR2 mean;
class gender;
var hrate;
run;
by patient;
run;
proc sort data=Gender;
by patient;
run;
Data HR2;
merge HR Gender;
by patient;
Run;
Proc means data=HR2 mean;
class gender;
var hrate;
run;
This is significantly longer than using the Proc SQL method.
Exercise
Copy and run the code from the yellow box below:
Copy and run the code from the yellow box below:
The code above creates two tables:
The SUPPORT table contains two columns:
- ID: the phone support ID
- RATING: the rating that was given with regard to the phone support
The RECORD table also contains two columns:
- ID: the phone support ID
- AGENT: the agent who is providing the phone support
You are the customer services manager and you're looking to find out which agent receives the lowest rating among the three agents.
Write a Proc SQL step to compute the average rating for each customer service agent.
Which agent receives the lowest rating among the three?
Need some help?
HINT:
Perform an inner join on the two tables by matching the ID column before summarizing the rating column.
SOLUTION:
Proc sql;
select agent, mean(rating) as mean
from support as s, record as r
where s.id = r.id
group by agent
order by mean;
quit;
Agent 3 has the lowest rating among the three with an average rating of 2.04.
Fill out my online form.