Sentry Page Protection
Proc SQL II [2-17]
Inner Joins
An inner join combines the SAS tables horizontally and keeps only the matching rows between the two (or more) tables.
Let's look at an example.
Copy and run the code from the yellow box below:
The DS1 and DS2 tables each contain information for three customers.
Note: Customers ID0002 and ID0003 are the common customers between the two tables.
Table: DS1
Table: DS2
Below is an inner join that combines the two tables horizontally:
Proc Sql;
select *
from DS1, DS2
where ds1.custid = ds2.custid;
quit;
select *
from DS1, DS2
where ds1.custid = ds2.custid;
quit;
The resulting table is displayed in the output:
With the Proc SQL inner join, only the matching customer IDs (i.e. ID0002 and ID0003) are displayed.
Now, let's look into how the inner join is processed in SAS.
Conceptually, when the Proc SQL step is executed, SAS first creates a Cartesian product from the DS1 and DS2 tables:
The Cartesian product is not shown in the output.
However, below is what the Cartesian product would look like:
Once the Cartesian product is created, SAS will filter the table using the joining condition specified in the WHERE clause.
The WHERE clause tells SAS to match the CUSTID column across the DS1 table and DS2 table.
Only the two matching rows are displayed in the output:
Now, let's look at the WHERE clause one more time:
You will notice that the two CUSTID columns in the WHERE clause are prefixed by the table names (e.g. ds1.custid).
Using the prefix is necessary when the columns that are being compared (e.g. CUSTID) have the same name in the two input tables.
Matching Columns with Different Names
You might have already noticed that, unlike the MERGE statement in the data step, the Proc SQL join does not require the matching columns to have the same name.
Let's look at an example.
Copy and run the code from the yellow box below:
The BRAND1 table contains two columns:
- REGION
- SALES
The BRAND2 table also contains two columns:
- AREA
- STAFF
Note: the REGION column from the BRAND1 table contains the same information as the AREA column from the BRAND2 table.
They both identify the cities which the data is associated with.
Now, let's do an inner join by matching the REGION column with the AREA column.
proc sql;
select *
from brand1, brand2
where region = area;
quit;
select *
from brand1, brand2
where region = area;
quit;
The inner join is performed by matching the REGION column with the AREA column.
The two tables are joined without having to rename any of the matching columns:
Exercise
Copy and run the code from the yellow box below:
Copy and run the code from the yellow box below:
The DRIVER table contains a list of Uber rides and the trip fares.
The RATING table contains the same list of Uber rides and the riders' rating.
Write a Proc SQL step to inner join the DRIVER and RATING tables by matching the Trip ID.
Need some help?
HINT:
Since the matching column has the same name between the two tables, you must prefix the column by the table names in the WHERE clause.
SOLUTION:
Proc sql;
select *
from driver, rating
where driver.tripid = rating.tripid;
quit;
Fill out my online form.