Sentry Page Protection
Proc SQL II [5-17]
Outer Joins
There are three types of outer joins that you can perform using Proc SQL:
In this section, we will focus on the full outer join
- Full outer join
- Left outer join
- Right outer join
In this section, we will focus on the full outer join
Full Outer Joins
Full Outer Joins join two tables by including all of the matching rows between the tables, as well as all of the non-matching rows from each input table.
Let's look at an example.
Copy and run the code from the yellow box below:
The MATH and ENGLISH tables contain the exam results for the math and English classes.
The MATH table has the exam records for:
- Ada
- Jon
- Arya
The ENGLISH table has the exam records for:
- Ada
- Jon
- Gary
Ada and Jon appear in both tables.
We are now going to perform a full outer join on the two tables.
Proc sql;
select *
from math as m full join english as e
on m.name = e.name;
quit;
select *
from math as m full join english as e
on m.name = e.name;
quit;
The syntax for the outer join is a little different than the inner join.
The keywords full join are specified between the two tables in the FROM clause:
The joining condition(s) is specified after the ON keyword.
In our example, we are matching the students' name across the two tables.
The output table contains the two matching rows (i.e. Ada and Jon):
It also contains the non-matching rows from each input table:
Please note that Proc SQL joins do not overlap columns that have the same name.
In our example, we have two incomplete columns that contain the list of names from each input table:
Each column contains one missing value and it doesn't look great.
The two columns can be combined using the COALESCE function.
Example
Proc sql;
select
coalesce(m.name, e.name) as name,
mathscore, engscore
from math as m full join english as e
on m.name = e.name;
quit;
select
coalesce(m.name, e.name) as name,
mathscore, engscore
from math as m full join english as e
on m.name = e.name;
quit;
The COALESCE function overlays the two NAME columns.
The output table will have a single and complete NAME column:
Exercise
Copy and run the code from the yellow box below:
Copy and run the code from the yellow box below:
The code above creates two tables:
The PRODUCT table contains a list of five televisions. It has three columns:
- SKU: the SKU that represents the product
- MODEL: the model number of the television
- DESCRIPTION: the description of the television
The PRICELIST table contains the product SKU and the prices.
Write a Proc SQL step to do a full outer join on the two tables by matching the product SKU (SKU column) across the two tables. Keep only one SKU column in the output.
Need some help?
HINT:
Use the COALESCE function to overlay the SKU column.
SOLUTION:
Proc sql;
select coalesce(pd.sku, pr.sku) as sku,
model, description, price
from product as pd full join pricelist as pr
on pd.sku = pr.sku;
quit;
Fill out my online form.