Churn Modeling [3-10]
We also need to deal with missing values.
Let's check which columns have missing values:
proc means data=import nmiss min mean max;
run;
Again, we look at the output listing.
None of the columns have missing values except AGE1 and AGE2 (i.e., the ages of the first and second household members).
There are 1,235 customers whose age information is missing.
Since no other values are missing for these 1,235 customers, it would be a waste to remove them from our data set.
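To see how many rows are affected before deciding how to handle them, a query along the following lines could be used. This is only a sketch: it assumes the table is named IMPORT, as in the PROC MEANS step above, and the column alias n_missing_age is made up for illustration.

```sas
/* Sketch: count rows where either age column is missing.   */
/* Assumes the input table is IMPORT; n_missing_age is a    */
/* hypothetical alias chosen for this example.              */
proc sql;
  select count(*) as n_missing_age
  from import
  where missing(age1) or missing(age2);
quit;
```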
One way to deal with missing values is to replace them with the average of the variable.
Let's first compute the mean of AGE1 and AGE2 and save them into a SAS data set:
proc sql;
create table mean_age as
select mean(age1) as mean_age1,
mean(age2) as mean_age2
from import;
quit;
The mean ages are stored in the MEAN_AGE table:
Now, we will replace the missing ages with mean_age1 and mean_age2:
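data import_mv_removed;
/* Read the one-row MEAN_AGE table once; its values are retained for every row */
if _n_ = 1 then set mean_age;
set import;
if age1 = . then age1 = mean_age1;
if age2 = . then age2 = mean_age2;
drop mean_age1 mean_age2;
run;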
Now we will run PROC MEANS again to confirm that the AGE1 and AGE2 columns no longer have missing values:
proc means data=import_mv_removed nmiss min mean max;
run;
The missing values from the AGE1 and AGE2 columns are gone.
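As an aside, the same mean replacement can likely be done in a single step with PROC STDIZE, which is part of SAS/STAT. The sketch below assumes SAS/STAT is licensed and that the input table is IMPORT. The output name IMPORT_MV_ALT is hypothetical, chosen so that it does not overwrite the data set created above.

```sas
/* Sketch: replace missing AGE1/AGE2 with the column means in one step. */
/* REPONLY replaces missing values only and does not standardize the    */
/* data. IMPORT_MV_ALT is a hypothetical output name.                   */
proc stdize data=import out=import_mv_alt reponly method=mean;
  var age1 age2;
run;
```

The two-step approach used in this section makes the computed means visible in their own table, which is convenient when you want to inspect or reuse them. The PROC STDIZE version is shorter when that is not needed.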
In the next section, we will explore different ways to reduce the number of variables for our analysis.