Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
SAS Project Training 1
​[4-6]


From the previous exercises, we know that we have to:
  • Consolidate the invoice, day, month, year and time into one variable.
  • Convert the CUSTOMERID variable from character to numeric.

Run the following code in your SAS Studio:
data france_sales_201110_clean;
 set a.france_sales_201110(rename=(customerID=CustomerID_char));
 
 CustomerID=input(CustomerID_char,best32.);
 format customerid best12.;
 
 invoice_date=mdy(invoice_mon,invoice_day,invoice_year);
 
 invoicedate=dhms(invoice_date,0,0,invoice_time);
 format invoicedate datetime.;
 
 drop CustomerID_char invoice_date invoice_mon invoice_day invoice_year invoice_time;

run;

The code above creates a new data set called france_sales_201110_clean.

It fixes the two issues we had with the original France data set.

​You can find it in the WORK library.
Picture

Let's run a CONTENTS procedure on this data set.
proc contents data=france_sales_201110_clean;
run;

Scroll down and look at the list of variables in the data set.

There are only eight variables in the data set, which is good.

There is only one INVOICEDATE variable containing the invoice information, and CUSTOMERID is a numeric variable.
Picture

The issues are fixed!
​
Let's do a final comparison between the online_retail data set and the restructured France data set using the COMPARE procedure.
proc compare
 base = a.online_retail
 compare = france_sales_201110_clean
 listvar
 novalues;
run; 

Most of our previous issues are now cleaned up.

However, there are two variables that have differing attributes:
Picture

The differing attributes could be a problem. 

The second table from the COMPARE procedure displays the differing attributes in more detail:
Picture

It turns out that the two variables differ in the INFORMAT attribute.​
Picture

This will not affect the data values or any analysis results, so it can be ignored.​


Now that the two data sets contain the same data structure, we can concatenate the ​france_sales_201110_clean data set with the online_retail data set.

​A simple SET statement will do.

data online_retail_new;
 set a.online_retail france_sales_201110_clean;
run; 

A new online_retail (i.e. online_retail_new) data set is created.

It combines the original online_retail data set with the france_sales_201110_clean data set.
Picture

​The new online_retail data set now contains 371,829 observations.
Picture
Next
Fill out my online form.
Already a member? Go to member's area.