SAS Project Training 1
[4-6]
[4-6]
From the previous exercises, we know that we have to:
- Consolidate the invoice, day, month, year and time into one variable.
- Convert the CUSTOMERID variable from character to numeric.
Run the following code in your SAS Studio:
data france_sales_201110_clean;
set a.france_sales_201110(rename=(customerID=CustomerID_char));
CustomerID=input(CustomerID_char,best32.);
format customerid best12.;
invoice_date=mdy(invoice_mon,invoice_day,invoice_year);
invoicedate=dhms(invoice_date,0,0,invoice_time);
format invoicedate datetime.;
drop CustomerID_char invoice_date invoice_mon invoice_day invoice_year invoice_time;
run;
set a.france_sales_201110(rename=(customerID=CustomerID_char));
CustomerID=input(CustomerID_char,best32.);
format customerid best12.;
invoice_date=mdy(invoice_mon,invoice_day,invoice_year);
invoicedate=dhms(invoice_date,0,0,invoice_time);
format invoicedate datetime.;
drop CustomerID_char invoice_date invoice_mon invoice_day invoice_year invoice_time;
run;
The code above creates a new data set called france_sales_201110_clean.
It fixes the two issues we had with the original France data set.
You can find it in the WORK library.
Let's run a CONTENTS procedure on this data set.
proc contents data=france_sales_201110_clean;
run;
run;
Scroll down and look at the list of variables in the data set.
There are only eight variables in the data set, which is good.
There is only one INVOICEDATE variable containing the invoice information, and CUSTOMERID is a numeric variable.
The issues are fixed!
Let's do a final comparison between the online_retail data set and the restructured France data set using the COMPARE procedure.
proc compare
base = a.online_retail
compare = france_sales_201110_clean
listvar
novalues;
run;
base = a.online_retail
compare = france_sales_201110_clean
listvar
novalues;
run;
Most of our previous issues are now cleaned up.
However, there are two variables that have differing attributes:
The differing attributes could be a problem.
The second table from the COMPARE procedure displays the differing attributes in more detail:
It turns out that the two variables differ in the INFORMAT attribute.
This will not affect the data values or any analysis results, so it can be ignored.
Now that the two data sets contain the same data structure, we can concatenate the france_sales_201110_clean data set with the online_retail data set.
A simple SET statement will do.
data online_retail_new;
set a.online_retail france_sales_201110_clean;
run;
set a.online_retail france_sales_201110_clean;
run;
A new online_retail (i.e. online_retail_new) data set is created.
It combines the original online_retail data set with the france_sales_201110_clean data set.
The new online_retail data set now contains 371,829 observations.
Fill out my online form.