Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
SAS Project Training 1
​[5-6]


The German data center has also submitted their sales data record to you for October, 2011.

Your job again is to update the online_retail data set to include the new German sales records.

​Unlike France, the Germans have provided you with two data sets, GERMAN_SALES_201110 and GERMANY_PRODUCTS. 

Download the two files below and save them in the same directory folder as the online_retail data set.

germany_products.sas7bdat
File Size: 196 kb
File Type: sas7bdat
Download File

germany_sales_201110.sas7bdat
File Size: 196 kb
File Type: sas7bdat
Download File


You should be able to access these two data sets in the same library we previously defined:
Picture

The German sales records are split into two data sets.

The germany_sales_201110 data set contains the following variables:
  • INVOICENO
  • STOCKCODE
  • QUANTITY
  • INVOICEDATE
  • CUSTOMERID
  • COUNTRY
Picture

The germany_products data set contains:
  • STOCKCODE
  • DESCRIPTION
  • UNITPRICE
Picture

We will merge the two data sets and combine the data into one data set.
proc sort data=a.germany_sales_201110 out=germany_201110_sort;
 by stockcode;
run;

proc sort data=a.germany_products out=germany_sort;
 by stockcode;
run; 

data germany_201110_combine;
 length match_flag $20;
 merge germany_201110_sort(in=a) germany_sort(in=b);
  by stockcode;
  
 if a and b then match_flag='in both';
 else if a then match_flag='sales only';
 else if b then match_flag='product list only';
 
run; 

The data step above merges the sales data set with the product data set that contains the product description and unit price.

It also creates a flag that identifies whether the observations are found:
  1. In both data sets.
  2. In the sales data set only.
  3. ​In the product list data set only.
Picture

Let's find out the match rates between the two data sets using the FREQ procedure.
proc freq data=germany_201110_combine;
 tables match_flag;
run; 

The FREQ procedure breaks down the number of matched and unmatched observations:
Picture

There are 1,378 matched observations.

There are also 1,115 observations that are found only in the product list data set.

These are likely to be the products that were discontinued, and are no longer for sale. These observations can be ignored.

​Finally, there are 56 observations that are found in the sales data set but not the product list data set.

These are the sales of the products that are not defined in the product list data set. We will also ignore them for now.


Now, we are going to ignore the unmatched observations and ​create a data set that contains only the matched observations.
data germany_201110_clean;
 set germany_201110_combine;
  where match_flag='in both';
  
 drop match_flag; 
run;  

The germany_201110_clean data set now contains only matched observations.​
Picture

Our goal, again, is to load the sales data from the German data set to the online_retail data set.

Before we do that, we must ensure the germany_201110_clean data set has the same structure as the online_retail data set.

​Let's run a COMPARE procedure to compare the two data sets.
proc compare
 base = a.online_retail
 compare = germany_201110_clean
 listvar
 novalues;
run; 

The result shows that the two data sets are consistent in data set structure and variable attributes.
Picture

Now, we can update the online_retail_new data set to include the new sales data from Germany.
data online_retail_new;
 set a.online_retail germany_201110_clean;
run; 

A new online_retail data set (i.e. online_retail_new) again is created.

It combines the original online_retail data set with the germany_201110_clean data set.
Picture
Next
Fill out my online form.
Already a member? Go to member's area.