Project 2 section 4 (temp 2)


Time Series Modeling
[4-10]

Time series forecasting is about using past data to predict the future.

However, do past values really have any power to forecast the future?

If sales on Tuesday are $300, should we expect sales on Wednesday to be roughly the same?

What about the sales on Thursday?

In order to build a good model for sales forecasting, we must learn more about the relationship between sales at different time points.

Let's look at an example.

data ds;
input time stockprice;
datalines;
1 99.7
2 100.22
3 101.56
4 102.12
5 100.2
6 99.05
7 106.14
8 107.83
9 105.24
10 105.41
11 105.81
12 104.89
13 105.66
14 104.31
15 101.73
16 106.65
17 103.11
18 103.97
19 104.21
20 106.58
21 111.98
22 111.62
23 114.54
24 109.98
25 109.02
26 102.44
27 94.85
28 96.63
29 94.99
30 94.43
31 92.92
32 93.09
33 91.32
34 89.51
35 90.61
36 92.94
37 89.21
38 93.56
39 97.23
40 99.94
41 100.55
42 105.18
43 102.01
44 101.59
45 101.93
46 102.01
47 105.81
48 104.36
49 103.93
50 102.61
51 103.58
52 99.75
53 96.53
54 96.16
55 94.08
56 90.36
57 86.7
58 83.91
59 86.05
60 84.34
61 82.62
62 86.92
63 88.3
64 91.18
65 91.34
66 95.39
67 93.75
68 90.07
69 91.67
70 95.22
71 95.94
72 92.57
73 88.8
74 85.82
75 85.73
76 87.23
77 83.77
78 77.65
79 78.63
80 78.51
81 78.45
82 78.55
83 72.12
84 78.7
85 73.25
86 69.72
87 74.79
88 72.33
89 66.67
90 65.62
91 65.79
92 62.57
93 62.74
94 63.78
95 69.14
96 66.38
97 61.73
98 58.27
99 52.92
100 54.48
101 56.72
102 54.75
103 49.26
104 47.06
105 49.44
106 45.85
107 46.1
108 50.08
109 52.13
110 51.11
111 48.23
112 46.24
113 42.66
114 43.3
115 45.19
116 42.17
117 41.3
118 40.68
119 39.77
120 39.02
121 35.63
122 39.27
123 35.19
124 41.82
125 45.59
126 44.32
127 46.03
128 48.45
129 46.84
130 50.99
131 50.18
132 51.9
133 54.85
134 60.54
135 57.92
136 58.72
137 58.76
138 60.64
139 61.77
140 58.13
141 53.72
142 50.66
143 50.54
144 48.8
145 52.06
146 55.04
147 56
148 55.21
149 57.66
150 53.7
151 58.93
152 61.13
153 59.11
154 54.15
155 58.01
156 61.01
157 58.59
158 54.26
159 51.51
160 55.47
161 54.78
162 56.53
163 56.64
164 59.18
165 63.72
166 67.1
167 67.53
168 72.9
169 72.19
170 70.39
171 69.87
172 67.98
173 68.63
174 66.42
175 66.03
176 60.27
177 68.48
178 68.51
179 68.96
180 70.9
181 76.8
182 70.73
183 73.77
184 74.17
185 74.41
186 77.01
187 74.56
188 79.35
189 73.22
190 75.93
191 78.08
192 77.72
193 73.55
194 65.63
195 68.82
196 70.17
197 66.2
198 60.86
199 60.31
200 63.78
201 63.75
202 69.15
203 74.53
204 71.88
205 73.13
206 71.86
207 74.14
208 75.14
209 73.13
210 75.62
211 73.32
212 69.26
213 70.36
214 70.93
215 69.45
216 66.19
217 59.51
218 60.81
219 64.28
220 60.97
221 59.47
222 57.94
223 56.01
224 56.13
225 58.27
226 62.33
227 65.16
228 69.09
229 61.81
230 63.37
231 62.54
232 59.83
233 57.56
234 57.51
235 55.62
236 58.87
237 61.35
238 63.13
239 67.37
240 66.57
241 63.09
242 60.88
243 64.16
244 66.09
245 67.96
246 64.37
247 63.95
248 66.81
249 65.9
250 64
251 66.56
252 68.39
253 70.97
254 76.88
255 76.15
256 76.84
257 74.61
258 74.46
259 76.98
260 78.24
261 73.09
262 75.49
263 74.5
264 68.16
265 67.63
266 67.32
267 69.32
268 69.51
269 67.11
270 69.52
271 72.28
272 77.24
273 83.66
274 83.99
275 85.26
276 84.82
277 86.35
278 84.22
279 79.1
280 80.19
281 77.89
282 79.73
283 75.55
284 80.35
285 83.52
286 83.64
287 80.4
288 79.28
289 75.08
290 75.05
291 72.76
292 72.86
293 72.06
294 71.96
295 64.82
296 61.88
297 65.73
298 66.29
299 67.06
300 62.74
;
run;

The DS data set contains the price of a stock over a period of 300 days.

Now, we'd like to look at the relationship between the stock prices at two consecutive time points.

We will create another column called PREV which contains the stock price on the previous day.

The LAG function is used:

data ds2;
set ds;
prev = lag(stockprice);
run;
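To check the new column, you can print the first few rows of DS2. This is just a quick inspection step, not part of the original code. PREV is missing in the first row because there is no earlier price; from the second row on, it holds the previous row's STOCKPRICE.

```sas
/* Show the first three rows: PREV lags STOCKPRICE by one row */
proc print data=ds2 (obs=3);
   var time stockprice prev;
run;
```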

For example, the stock price at time 2 is $100.22. The previous day's price (i.e. time 1) is $99.7.

We want to look at how the prices on two consecutive days are correlated with each other.

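We will treat STOCKPRICE and PREV as two separate columns and measure how strongly they are correlated. Because PREV is simply STOCKPRICE shifted by one time period, this is the correlation of the series with itself, which is why it is called the autocorrelation.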
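As a first look, you can compute the ordinary Pearson correlation between the two columns with PROC CORR. This is a quick sketch rather than part of the original code. PROC CORR uses only the 299 rows where both values are present and centers each column on its own mean, so its result will be close to, but usually not identical to, the autocorrelation computed by the macro below.

```sas
/* Pearson correlation between today's price and yesterday's price */
proc corr data=ds2 nosimple;
   var stockprice prev;
run;
```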

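Base SAS does not have a simple built-in function for the autocorrelation. (SAS/ETS procedures such as TIMESERIES do compute it, as the ACF plot later in this section shows.) Here we will compute it ourselves.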

Below is the macro that does the calculation:

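%macro autocorr (in=,var=,lag=);

/* Store the series mean in macro variable AVERAGE.         */
/* FORMAT=BEST32. keeps full precision (default is BEST8.). */
proc sql noprint;
select mean(&var) format=best32. into :average trimmed from &in;
quit;

data autocorr;
set &in end=eof;
a = &var;
c = lag&lag.(a);      /* value from &lag time periods earlier */
a1 = (a-&average);
a2 = (c-&average);
a3 = a1*a2;           /* missing for the first &lag rows       */
a4 = (a-&average)**2;
s3 + a3;              /* SUM statement skips missing values    */
s4 + a4;
r = s3/s4;            /* running ratio; last row = autocorr    */
if eof = 1;           /* keep only the last observation        */
keep r;
run;

/* Display the result */
proc print data=autocorr noobs;
run;

%mend autocorr;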
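For reference, the DATA step in the macro implements the standard sample autocorrelation at lag $k$, where $x_t$ is the value at time $t$, $\bar{x}$ is the overall mean stored in AVERAGE, and $n$ is the number of observations:

$$
r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}
$$

S3 accumulates the numerator (A1 and A2 are the two deviations; the SUM statement skips the first $k$ rows, where the lagged value is missing), and S4 accumulates the denominator over all $n$ rows. Unlike the Pearson correlation of STOCKPRICE and PREV, both deviations here use the same overall mean.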

Note: defining the macro does not produce any result by itself. It simply stores the code so that it can be executed whenever the macro is called.

Now, let's call the AUTOCORR macro that we have just created.

%autocorr(in=ds, var=stockprice, lag=1);

Calling the macro is easy!

You just have to specify the three things below:

IN: the input data set. In our example, it is the DS data set.
VAR: the variable of interest. In our example, it is the variable STOCKPRICE.
LAG: the lag (the number of time periods between the two values) at which to compute the autocorrelation.

Note: the macro is quite versatile. It can calculate the autocorrelation at any lag, not just lag 1. We will look at some examples shortly.
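For instance, a small wrapper macro can call AUTOCORR once per lag and collect the results in one table. This is a minimal sketch: ACF_TABLE, MAXLAG, ACF_ALL, _ACF_ROW and LAGNUM are hypothetical names introduced here, not part of the original code, and it assumes the AUTOCORR macro above has already been defined.

```sas
/* Hypothetical helper (not from the original text): runs  */
/* %autocorr for lags 1 to MAXLAG and stacks the results   */
/* in data set ACF_ALL. Assumes %autocorr is defined.      */
%macro acf_table(in=, var=, maxlag=);
   %local k;

   /* Start from an empty result table */
   %if %sysfunc(exist(work.acf_all)) %then %do;
      proc delete data=work.acf_all;
      run;
   %end;

   %do k = 1 %to &maxlag;
      /* Creates the one-row data set AUTOCORR with variable R */
      %autocorr(in=&in, var=&var, lag=&k);

      /* Tag the result with its lag */
      data _acf_row;
         set autocorr;
         lagnum = &k;
      run;

      /* Add the row to ACF_ALL (created on the first pass) */
      proc append base=acf_all data=_acf_row;
      run;
   %end;

   proc print data=acf_all noobs;
      var lagnum r;
   run;
%mend acf_table;

/* Autocorrelations at lags 1 through 5 for the stock price */
%acf_table(in=ds, var=stockprice, maxlag=5);
```

Stacking one row per lag keeps the original macro unchanged; the trade-off is that the data set is re-read once per lag, which is fine for a series of this size.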

The macro computes the autocorrelation between the current day's stock price and the previous day's stock price.

The autocorrelation is 0.9807.

That's a very high correlation between the stock prices on two consecutive days!

This is important.

For forecasting, this is a hint that the previous day's stock price carries a lot of information about the current price you want to forecast.

Now, what about the correlation between the stock price and the price from two days prior?

Maybe the price from two days ago is also correlated with the current day's price?

This is worth investigating.

We will use the AUTOCORR macro to compute the autocorrelation between the current stock price and the price from two days prior.

Simply set LAG=2 in the third parameter of the AUTOCORR macro:

%autocorr(in=ds, var=stockprice, lag=2);

The autocorrelation between the current stock price and the price from two days prior is 0.9612.

This is still a very strong correlation, although it is slightly weaker than the autocorrelation at lag 1 (r=0.9807).

This suggests that, when forecasting, we should put more weight on recent values than on values from further in the past.

This is the basic idea behind building a forecasting model.

We have just computed the autocorrelation at lag 1 (r=0.9807) and lag 2 (r=0.9612).

However, are these values statistically significant?

Or could these positive autocorrelation estimates have arisen by chance?

Fortunately, we can perform statistical tests to find out whether these values are significant.
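One such test is the Ljung-Box portmanteau test. As a sketch, assuming SAS/ETS is available (it is also needed for PROC TIMESERIES), the IDENTIFY statement of PROC ARIMA prints the sample autocorrelations, which should agree with the macro up to rounding, together with an "Autocorrelation Check for White Noise" table. That table tests whether the first 6, 12, ... autocorrelations are jointly zero; a small p-value (Pr > ChiSq) suggests they are not all due to chance. NLAG=12 is an arbitrary choice for illustration.

```sas
proc arima data=ds;
   /* Print the ACF up to lag 12 and the white-noise  */
   /* (Ljung-Box) chi-square checks at lags 6 and 12  */
   identify var=stockprice nlag=12;
run;
quit;
```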

ACF Plot

The ACF (autocorrelation function) plot shows the autocorrelation at each lag.

You can create such a plot using the TIMESERIES procedure.

proc timeseries data=ds plots=acf;
var stockprice;
run;

The procedure above creates the ACF plot below:

If you look at the bar at lag 1 and lag 2, the results match our calculations earlier.

The autocorrelation is 0.9807 at lag 1 and 0.9612 at lag 2.

The ACF plot also shows the confidence interval.
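As a rough guide, a common approximation for the 95% interval around zero, assuming the series is white noise, is shown on the left below. Bartlett's approximation for the standard error at lag $k$ (right), which some software uses, widens the band at higher lags when the earlier autocorrelations are large:

$$
\pm \frac{1.96}{\sqrt{n}} = \pm \frac{1.96}{\sqrt{300}} \approx \pm 0.113,
\qquad
\operatorname{SE}(r_k) \approx \sqrt{\frac{1 + 2\sum_{j=1}^{k-1} r_j^2}{n}}
$$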

If a bar stays within the confidence interval, we cannot conclude that the autocorrelation at that lag is significant, even if it looks fairly large.

For the stock price, however, the autocorrelations at lag 1 (r=0.9807) and lag 2 (r=0.9612) are far larger than the confidence limits for a series of 300 observations.

The current price is significantly correlated with its recent lagged values.

Let's go back to the SALES3 data set and plot the ACF plot for the TOTALSALES column.

proc timeseries data=sales3 plots=acf;
id time interval=day format=best.;
var totalsales;
run;

The ACF shows a spike at lag 11:

We can also look at the standardized ACF plot:

The plot shows a spike that is significant at lag 11.

This indicates that today's sales are significantly correlated with the sales from 11 days ago.

This is quite unexpected.

This information will be useful when we build our final model.

In the next section, we will look at a few more examples of the ACF plots.

Exercise

Is there any significant difference in sales between the different weeks of the month?

Create a frequency table for the total sales for each week of the month.

In addition, fit an ANOVA model and test for differences in sales between the five weeks of the month.

Need some help?

Fill out my online form.