Proc SQL II [1-17]
Cartesian Product
A Proc SQL join is very different from the MERGE statement we have learned in the past.
Whether it is an inner join or outer join, a Proc SQL join always starts with a Cartesian product.
Let's look at an example.
Whether it is an inner join or outer join, a Proc SQL join always starts with a Cartesian product.
Let's look at an example.
The T1 and T2 tables each contain one column:
Table: T1
Table: T2
Let's look at an example where we join the two tables using Proc SQL.
Example
Proc Sql;
select *
from T1, T2;
quit;
select *
from T1, T2;
quit;
The Proc SQL step above joins the T1 and T2 tables and creates what we call a Cartesian product of the two tables:
The Cartesian product matches each row from the T1 table with every row from the T2 table.
For example, the first row from T1 is matched with the first row from T2:
The value 'A' is then matched with the second row from T2:
The T1 and T2 tables each contain two rows of data.
The Cartesian product will have 2 x 2 = 4 rows of data.
Let's look at another example.
Copy and run the code from the yellow box below:
The MEN and WOMEN tables each contain three names of people who are participating in a speed dating event.
Each person will get to spend time with every person of the opposite sex.
We will now perform a Proc SQL join and create a Cartesian product from the MEN and WOMEN tables.
Proc sql;
select *
from women, men;
quit;
select *
from women, men;
quit;
The resulting table will have 3 x 3 = 9 rows of data:
In practice, it is uncommon that you have to create only the Cartesian product for your data.
However, understanding the Cartesian product is essential when learning how SAS processes the more complex joins with Proc SQL.
In the next section, we will look at how you can perform an inner join using the SQL procedure in SAS.
Exercise
Copy and run the code from the yellow box below:
Copy and run the code from the yellow box below:
The FACEBOOKAD table contains a list of three Facebook advertisements.
These advertisements are to be shown to three different groups of audiences captured in the AUDIENCES table.
A marketing agency would like to find out which combination of advertisement and audience generates the highest amount of revenue.
Write a Proc SQL step to create a Cartesian product from the two tables.
Need some help?
HINT:
You can simply list the two tables in the FROM clause separated by a comma to create the Cartesian product.
SOLUTION:
Proc sql;
select *
from facebookad, audiences;
quit;
Fill out my online form.