The first task is to get a basic overview of the data and conduct a preliminary data quality assessment.
- Use PROC CONTENTS to examine the contents of the data set:
(a) How many records are in the data set?
(b) How many variables are in the data set?
(c) Which variable has the longest length?
- Use PROC FORMAT to create formats that classify the values of character and numeric variables as either "missing" or "present". Then use PROC FREQ to determine the proportion of missing versus non-missing values for each of the eight variables in the data set.
- Use PROC FREQ to determine the number of purchases by country.
- Use PROC MEANS to determine the minimum, maximum, mean, median and mode of UnitPrice and Quantity, rounded to two decimal places.
What issue do you notice with the descriptive statistics for Quantity? How would you use this information to change your analysis for question 3?
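The steps above might be sketched as a single SAS program along the following lines. This is a minimal sketch, not a model answer. The data set name WORK.RETAIL, the variable name Country, and the format names $missfmt and missfmt are assumptions for illustration; only UnitPrice and Quantity come from the task text. The country step counts records, so it treats each record as one purchase. If a purchase should instead mean one invoice, the counts would need to be based on distinct invoice numbers.

```sas
/* Assumption: the data set is stored as WORK.RETAIL (hypothetical name). */

/* 1. Overview: the attributes table reports the number of observations
      and variables, and the variable list shows each variable's Len. */
proc contents data=work.retail;
run;

/* Optional: save the variable metadata and list it longest first. */
proc contents data=work.retail out=work.meta(keep=name type length) noprint;
run;

proc sort data=work.meta;
    by descending length;
run;

proc print data=work.meta noobs;
run;

/* 2. Missing vs. present: one format for character values, one for
      numeric values. A blank string matches ' ', and '.' is the
      standard numeric missing value. */
proc format;
    value $missfmt ' '   = 'Missing'
                   other = 'Present';
    value  missfmt  .    = 'Missing'
                   other = 'Present';
run;

/* MISSING counts missing values as a level, so the percentages
   include them; NOCUM suppresses the cumulative columns. */
proc freq data=work.retail;
    format _character_ $missfmt. _numeric_ missfmt.;
    tables _character_ _numeric_ / missing nocum;
run;

/* 3. Records per country, most frequent first.
      Assumption: the variable is named Country. */
proc freq data=work.retail order=freq;
    tables Country / nocum;
run;

/* 4. Requested statistics, displayed to two decimal places. */
proc means data=work.retail min max mean median mode maxdec=2;
    var UnitPrice Quantity;
run;
```

MAXDEC=2 only controls how many decimal places PROC MEANS displays; it does not change the stored values. Without the MISSING option, PROC FREQ would report missing values as a separate "Frequency Missing" count and exclude them from the percentages, even though the format labels them.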