Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
Task [3-4]


Now that we have a good understanding of the data set, we can look into how we can create the macro that aggregates the newly collected sales data into the online_retail data set.

Download the france_sales_201110 data set from the link below:
france_sales_201110.sas7bdat
File Size: 196 kb
File Type: sas7bdat
Download File


Save it in the same directory folder as the online_retail data set so that it can be accessed in the same library on SAS Studio:
Picture
Picture

The france_sales_201110 data set contains the data from the sales department in France.

We need to aggregate it into the master online_retail data set.

In order for us to combine the two data sets, the france_sales_201110 must be in the same structure as our master sales data set.

The two data sets should have the same number of columns and data structure.

Is this the case with the france_sales_201110 data set? We can find out by using the COMPARE procedure to compare the two data sets.
proc compare
 base = a.online_retail
 compare = a.france_sales_201110
 listvar
 novalues;
run; 

The COMPARE procedure compares the online_retail data set with the france_sales_201110 data set.

There are a few things we noticed from the first table created:
Picture

There are 7 variables in common between the two data sets, which is good.

However, there is one variable in the France data set but not in the online_retail data set.

There are also three variables in the online_retail data set that are not in the France data set.

These variables have to get cleaned up.

In addition, there is one common variable in the two data sets that have conflicting type.

This means the variable is a numeric in one data set but character in the other.

Let's look at the second table and find out more details about the inconsistencies:
Picture

From the second table, we find out that INVOICEDATE variable appears only in the online_retail data set but not the France data set.

Instead of having the INVOICEDATE variable, the FRANCE data set has the four variables below:
  • INVOICE_MON
  • INVOICE_DAY
  • INVOICE_YEAR
  • INVOICE_TIME

The France data set essentially contains the same invoice information as the online_retail data set. 

It just has the information split into four variables.

Now, we can verify the information by just looking at the data portion of each data set.

Below is the INVOICEDATE column from the online_retail data set:
Picture

The France data set contains the same information but split into the four columns:
Picture

The four columns have to be combined into one column before we can aggregate the France data set to the online_retail data set.

Now, the second table from the COMPARE procedure also displays the variable that has conflicting type between the two data sets:
Picture

The CUSTOMERID variable is a numeric in online_retail but a character in the France data set.

This also has to be cleaned up before we combine the two data sets into one.
Go to task 4
Do you need help?
Already a member? Go to member's area.