Fill out my online form.
Task [4-4]
From the previous exercises, we know that we have to:
- Consolidate the invoice, day, month, year and time into one variable.
- Convert the CUSTOMERID variable from character to numeric.
Run the following code on your SAS Studio:
data france_sales_201110_clean;
set a.france_sales_201110(rename=(customerID=CustomerID_char));
CustomerID=input(CustomerID_char,best32.);
format customerid best12.;
invoice_date=mdy(invoice_mon,invoice_day,invoice_year);
invoicedate=dhms(invoice_date,0,0,invoice_time);
format invoicedate datetime.;
drop CustomerID_char invoice_date invoice_mon invoice_day invoice_year invoice_time;
run;
set a.france_sales_201110(rename=(customerID=CustomerID_char));
CustomerID=input(CustomerID_char,best32.);
format customerid best12.;
invoice_date=mdy(invoice_mon,invoice_day,invoice_year);
invoicedate=dhms(invoice_date,0,0,invoice_time);
format invoicedate datetime.;
drop CustomerID_char invoice_date invoice_mon invoice_day invoice_year invoice_time;
run;
The code above restructures the France data set into the france_sales_201110_clean daa set.
It consolidates the invoice information into one variable.
It also converts the CUSTOMERID variable into a numeric variable using the INPUT function.
Let's run a CONTENTS procedure on the newly created france_sales_201110_clean daa set.
proc contents data=france_sales_201110_clean;
run;
run;
There are only eight variables in the data set, which is good.
There is only one INVOICEDATE variable containing the invoice information.
The CUSTOMERID variable is a character variable.
Let's compare the online_retail data set with the france_sales_201110_clean again using the COMPARE procedure.
proc compare
base = a.online_retail
compare = france_sales_201110_clean
listvar
novalues;
run;
base = a.online_retail
compare = france_sales_201110_clean
listvar
novalues;
run;
Both of the data sets have eight variables.
Two variables contain differing attributes:
The differing attributes could be a problem.
The second table from the COMPARE procedure displays the differing attributes in more details:
It turns out that the two variables are differing in the INFORMAT attribute.
This will not affect the data values or any analysis results and it can be ignored.
Now that the two data sets contain the same data structure, we can aggregates the france_sales_201110_clean data set into the online_retail data set.
data online_retail_new;
set a.online_retail
france_sales_201110_clean;
run;
set a.online_retail
france_sales_201110_clean;
run;
The online_retail_new data set is created.
It combines the original online_retail data set with the france_sales_201110_clean data set.