Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
Time Series Modeling
​[3-10]


Before we fit any time series model on our data, let's properly create the time variable for the x-axis.

If you haven't created the SALES data set, copy and run the code from the yellow box below:

The data about the time are captured in the MONTH, WEEK and DAY columns:
Picture

We can clearly identify the time point of each observations.

However, in order to perform any type of time series modeling, we need to create a single time column that captures the sequential order of each time point.

One way to create such a column is to convert the DAY, WEEK and MONTH columns into a SAS date column and use it as our x-axis.

Below is the code that we run to create the TIME column:
data sales2;
set sales;
cumweek = week + 4*(month-1);
time = day + 7*(cumweek-1);
run;

data day_temp;
do time = 4 to 90;
output;
end;
run;

data sales3;
merge day_temp sales2;
by time;
drop cumweek;
run;

The breakdown of the code will be explained shortly.

​Let's open the SALES3 data set and look at the TIME column that we created:
Picture

The initial time point in the TIME column is set as 4.

This represents the Day 4 (Wednesday) of the first week of the month.
Picture

Let's look at the time on a calendar:

Note: the initial time point does not matter. The time column works as long as it is in sequential order that reflects the weekly pattern from the data.

Let's look at the first four observations.

These are the sales record in week 1:
Picture

Important:

We also include additional rows for the weekends and holidays as well:
Picture

The sales for the weekends (and holidays) are set as missing.

Including the weekends and holidays in the TIME column ensures that weekly pattern is reflected in the x-axis.

Except for the very first week, each week has exactly seven observations:
Picture


Now, let's look at the breakdown of the code to create this TIME column.

The code contains three data steps.

In the first data step, we created the TIME column:
data sales2;
set sales;
cumweek = week + 4*(month-1);
time = day + 7*(cumweek-1);
run;

The initial value of the TIME column is set as 4. 

This represents Day 4 (Wednesday) in the first week:
Picture

On the next day, the TIME value is set as 5. 

This reflects that fact that day 5 in the first week is one day after the day 4 in the same week.
Picture
Picture

Time 29 is a Monday:
Picture

In the previous section, we learned that there is a seasonal spike (i.e. higher sales) on Monday. 

When modeling the data, we would expect that there is also a spike at time 34 (which is five periods after day 29):
Picture

However, due to the missing values, time 34 is actually a Wednesday!
Picture

Treating each observation as an individual time point changes the seasonal pattern of the data.

This will affect the model that we are going to build.

A better way to create the time variable is to create a sequential date column that matches the weekday from (1=Sunday to 7=Saturday).

Let's look at an example.

The breakdown of the code can be found here.

The TIME column created begins at time 4.
Picture

It corresponds to the time point at Day 4 of the very first week.
Picture

Data are collected starting on a Wednesday.

Time 4-7 represents the sales record from Wednesday to Saturday in the very first week:
Picture

Note: Saturday is a holiday and the totalsales are set as missing.

​This row is also included in the data set:
Picture

Similarly, time 8-14 represents the second week of the data. 

It goes from (1=Sunday) to (7=Saturday):
Picture

Similarly, additional rows are added for the weekends:
Picture

The data set is now structured by weeks of 7 days (except for the very first week).

Let's scroll down to the weeks that contain holiday(s).

Time 43-49 represents the third week of month two:
Picture

The Friday in this week is a holiday. The store is closed.

​However, an additional row is added to the data set as well:
Picture

The TIME column is structured in a group of 7 days (except for the very first week).

Whether the week contains weekends or holidays, there are 7 rows of data for the week.

The benefits of creating the TIME column in such a way is that the seasonal pattern can be modeled every seven periods.

The seasonal pattern is now incorporated into the TIME column.

Exercise

Are there any significant difference in sales between the different week of the month?

Create a frequency table for the total sales for each week of the month.

In addition, fit an ANOVA model and test the difference in sales between five weeks of the month.
Next

Need some help?
Get Hint
Get Solution
Fill out my online form.
Already a member? Go to member's area.