Time Series Modeling
[11-15]

Differencing or transformation does not always remove all of the autocorrelation from a time series.

Let's look at an example.

data ts2;
input time x;
datalines;
1 97.79
2 102.49
3 97.16
4 98.58
5 98.92
6 97.14
7 99.9
8 97.49
9 104.5
10 101.84
11 106.56
12 102.18
13 105.92
14 106.13
15 102.98
16 104.09
17 101.54
18 105.23
19 106.37
20 104.6
21 105.62
22 105.79
23 105.94
24 108.55
25 111.3
26 115.53
27 106.94
28 113.2
29 108.29
30 109.9
31 109.87
32 105
33 103.44
34 101.66
35 102.73
36 106.79
37 106
38 104.92
39 105.46
40 103.77
41 106.44
42 109.46
43 111.31
44 113.89
45 116.72
46 121.46
47 119.77
48 120.27
49 126.41
50 118.15
51 125.72
52 125.63
53 128.17
54 126.09
55 128.81
56 126.37
57 127.9
58 133.16
59 128.05
60 128.46
61 132.67
62 127.48
63 125.25
64 126.65
65 127.35
66 127.4
67 126.68
68 125.12
69 122.32
70 123.12
71 126.5
72 127.19
73 131.12
74 132.8
75 130.58
76 132.45
77 131.66
78 128.71
79 129.23
80 129.9
81 127.42
82 128.98
83 127.55
84 127.13
85 129.91
86 127.65
87 128.9
88 128.44
89 128.87
90 130.7
91 135.42
92 131.34
93 129.28
94 132.42
95 127.24
96 129.64
97 131.15
98 128.63
99 129.05
100 123.89
101 128.32
102 126.65
103 129.73
104 128.5
105 129.9
106 130.17
107 126.65
108 129.36
109 129.09
110 126.64
111 128.89
112 127.02
113 130.26
114 127.71
115 123.65
116 129.12
117 122.59
118 125.1
119 124.02
120 129.94
121 130.11
122 136.67
123 127.2
124 138.2
125 130.43
126 129.84
127 127.28
128 131.51
129 128.82
130 132.89
131 130.25
132 130.24
133 135.7
134 130.17
135 126.27
136 124.96
137 124.97
138 125.8
139 128.81
140 126.7
141 128.11
142 129.04
143 127.34
144 131.82
145 125.81
146 123.46
147 120.6
148 120.06
149 117.9
150 117.11
151 116.01
152 120.32
153 119.65
154 124.74
155 126.25
156 124.38
157 128.1
158 120.84
159 121.05
160 116.19
161 115.67
162 112.68
163 113.51
164 113.64
165 117.96
166 113.39
167 111.34
168 115.5
169 111.08
170 117
171 115.49
172 120.1
173 119.85
174 121.71
175 117.41
176 120.63
177 123.44
178 121.5
179 122.41
180 121.35
181 124.86
182 118.84
183 125.04
184 121.58
185 122.74
186 121.42
187 127.66
188 122.86
189 124.39
190 128.08
191 127.37
192 132.5
193 132.47
194 133.83
195 132.44
196 132.71
197 128
198 130.14
;
run;

The TS2 data set is another non-stationary time series.

Let's plot the X column on a time series plot:

proc arima data=ts2;
   /* series plot, ACF/PACF and white-noise check for the raw series */
   identify var=x;
run;
quit;

Let's look at the results with one order of differencing.

proc arima data=ts2;
   identify var=x(1);   /* x(1) requests the first difference of x */
run;
quit;
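
To make the differencing step concrete, the same first difference can be computed by hand in a DATA step. This is a minimal sketch that assumes the TS2 data set created above; the names ts2_diff and dx are made up for illustration.

```sas
/* Compute the first difference that x(1) asks PROC ARIMA to use */
data ts2_diff;
   set ts2;
   dx = dif(x);   /* x at time t minus x at time t-1; missing on the first row */
run;

/* Show the first few rows to compare x with its difference */
proc print data=ts2_diff(obs=5);
run;
```

On the second row, for example, dx is 102.49 - 97.79 = 4.70.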

The time series plot looks good. The wandering level of the original series has been removed:

However, the ACF (after differencing) shows spikes at lags 1 and 2:

The PACF also shows a spike at lag 1:

Let's look at the Ljung-Box test result.

The p-values at all four lags are less than 0.0001. We reject the null hypothesis that the residuals are white noise.

There is autocorrelation left in the residuals that our model does not explain.
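
For reference, the Ljung-Box statistic behind these p-values is commonly written as follows. This is the textbook form rather than something read off the SAS output, and the symbols are introduced here: n is the number of observations in the differenced series, m is the highest lag tested, and r_k is the sample autocorrelation at lag k.

```latex
Q(m) = n\,(n + 2) \sum_{k=1}^{m} \frac{r_k^{2}}{n - k}
```

Under the null hypothesis of white noise, Q(m) approximately follows a chi-square distribution. Large autocorrelations at low lags, such as the spikes at lags 1 and 2, inflate Q(m) and drive the p-value toward zero.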

Now, what are we going to do next?

The PACF plot of the residual shows a spike at lag 1.

This indicates that the residual at time t is correlated with the residual at time t-1.
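
One rough way to see this correlation directly is to pair each differenced value with the one before it and compute an ordinary correlation. This is a minimal sketch that assumes the TS2 data set above; the names ts2_lag, dx and dx_lag1 are made up for illustration, and the Pearson correlation from PROC CORR only approximates the lag-1 autocorrelation that PROC ARIMA reports.

```sas
/* Pair each differenced value with the previous one */
data ts2_lag;
   set ts2;
   dx      = dif(x);    /* first difference of x */
   dx_lag1 = lag(dx);   /* the same difference, one time step earlier */
run;

/* Correlation between the differenced series and its lag-1 copy */
proc corr data=ts2_lag;
   var dx dx_lag1;
run;
```

At lag 1 the partial autocorrelation equals the ordinary autocorrelation, so this single number corresponds to the first spike in both the ACF and the PACF.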

We can try adding an AR term to the model and see if it improves the residual.

Brief Introduction to the AR Model

An AR (autoregressive) model predicts the current value of the time series, X_t, from its own previous values (e.g. X_{t-1}).

With just one AR term, it has the following forecasting equation, where W_t is a white-noise error term:

X_t = µ + ϕ_1 X_{t-1} + W_t

With two AR terms, it has an additional term in the equation:

X_t = µ + ϕ_1 X_{t-1} + ϕ_2 X_{t-2} + W_t

Let's add an AR term to the model and see if it removes the autocorrelation in the residuals.

proc arima data=ts2;
   identify var=x(1);
   estimate p=1;   /* fit one AR term to the differenced series */
run;
quit;

We have added an ESTIMATE statement with the p=1 option.

This specifies the number of AR terms (p).
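
The p= option accepts either an order or a list of lags in parentheses, and the two forms mean different things. The sketch below is only an illustration of the syntax, not a model recommended for TS2.

```sas
proc arima data=ts2 plots=none;
   identify var=x(1) noprint;
   estimate p=2;     /* AR terms at lags 1 and 2, same as p=(1 2) */
   estimate p=(2);   /* a single AR term at lag 2 only */
run;
quit;
```

For a single AR term at lag 1, p=1 and p=(1) are equivalent.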

There are many results generated.

Let's look at each of them individually.

The first set of tables shows the Ljung-Box test results before adding the AR term.

We have seen these results earlier:

There are a number of tables that have important statistics.

Let's look at the Conditional Least Squares Estimation table:

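It shows the estimates for MU and AR1,1. AR1,1 is ϕ1 in the forecasting equation, which is applied here to the differenced series:

X_t = µ + ϕ_1 X_{t-1} + W_t

Note that PROC ARIMA reports MU as the estimated mean of the differenced series, not as the constant µ in this equation. The constant is MU × (1 - ϕ1), which SAS prints separately as the Constant Estimate.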

The p-value for AR1,1 is less than 0.0001. This indicates that the parameter estimate is significantly different from zero.

The next table shows the AIC and SBC for the model.

AIC and SBC are used for model comparison purposes.

The lower the AIC and/or SBC, the better.

We will look at some examples shortly.

The table below shows the Autocorrelation Check of Residuals:

These are the Ljung-Box test results after adding the AR term.

The p-values at each lag are above 0.05, so we fail to reject the null hypothesis that the residuals are white noise.

We're done! We conclude that the residuals after adding the AR term are now just white noise.

The model we have identified is ARIMA (1 1 0).

It has one AR term with one order of differencing.
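
Written out, ARIMA (1 1 0) is the one-term AR equation applied to the differenced series. In the sketch below X_t denotes the original TS2 series, µ is the constant of the forecasting equation, W_t is white noise, and Y_t is a symbol introduced here for the differenced series.

```latex
\begin{align*}
Y_t &= X_t - X_{t-1} \\
Y_t &= \mu + \phi_1 Y_{t-1} + W_t \\
X_t &= \mu + (1 + \phi_1)\,X_{t-1} - \phi_1 X_{t-2} + W_t
\end{align*}
```

The last line follows by substituting the first into the second, and shows that a forecast of the original series depends on its two most recent values.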

Now, how do we know the ARIMA (1 1 0) model is better than the ARIMA (0 1 0) model that we had earlier?

There are a number of ways to perform a model comparison.

One common way to select a model is to compare the AIC and SBC values.

The model with the lower AIC and SBC is usually the better model.

Let's look at the AIC and SBC from the ARIMA (0 1 0) model:

proc arima data=ts2 plots=none;
   identify var=x(1) noprint;
   estimate;   /* no AR or MA terms: ARIMA (0 1 0) */
run;
quit;

The AIC and SBC for ARIMA (0 1 0) are 1048.36 and 1051.644, respectively.

Now, let's look at the AIC and SBC for the ARIMA (1 1 0) model:

proc arima data=ts2 plots=none;
   identify var=x(1) noprint;
   estimate p=1;   /* one AR term: ARIMA (1 1 0) */
run;
quit;

The AIC and SBC are 992.0088 and 998.5752, respectively.
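
As a quick sanity check on the reported figures for both models, the usual definitions of the two criteria can be compared with the output. This sketch assumes the standard forms below, with symbols introduced here: L is the likelihood, k the number of estimated parameters, and n the number of residuals (197, since differencing drops one of the 198 observations).

```latex
\begin{align*}
\mathrm{AIC} &= -2\ln L + 2k, \qquad \mathrm{SBC} = -2\ln L + k\ln n \\
\mathrm{SBC} - \mathrm{AIC} &= k\,(\ln n - 2) \\
\text{ARIMA (0 1 0)},\ k = 1:&\quad 1051.644 - 1048.36 = 3.284 \approx 1 \times (\ln 197 - 2) \approx 3.283 \\
\text{ARIMA (1 1 0)},\ k = 2:&\quad 998.5752 - 992.0088 = 6.5664 \approx 2 \times (\ln 197 - 2) \approx 6.566
\end{align*}
```

The gaps match the formula. They also show that SBC penalizes each extra parameter more heavily than AIC whenever ln n exceeds 2.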

Based on the AIC and SBC, ARIMA (1 1 0) is the better of the two models.
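
Instead of reading the values off two separate runs, the statistics can be written to data sets and printed side by side. This is a sketch that assumes the OUTSTAT= data set stores each statistic as a row, with its name in _STAT_ and its value in _VALUE_; the names stat010, stat110, compare and model are made up for illustration.

```sas
proc arima data=ts2 plots=none;
   identify var=x(1) noprint;
   estimate outstat=stat010 noprint;       /* ARIMA (0 1 0) */
   estimate p=1 outstat=stat110 noprint;   /* ARIMA (1 1 0) */
run;
quit;

/* Stack the two sets of statistics and keep only AIC and SBC */
data compare;
   length model $12;
   set stat010(in=from010) stat110;
   if from010 then model = 'ARIMA(0 1 0)';
   else model = 'ARIMA(1 1 0)';
   if _stat_ in ('AIC', 'SBC');
run;

proc print data=compare noobs;
   var model _stat_ _value_;
run;
```

If the variable names in your SAS release differ, run PROC CONTENTS on stat010 first to check them.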

Another issue with the ARIMA (0 1 0) model, as we have mentioned before, is that its residuals still show autocorrelation that is not explained by the model.

This indicates that there is a systematic pattern that could be explained by additional AR / MA terms.

In the next section, we will look at an example where adding an MA term could improve the model for the stationarized time series.

Exercise

Copy and run the ORDER data set below:

data order;
input day numord;
datalines;
1 201
2 201
3 202
4 203
5 204
6 206
7 207
8 210
9 208
10 209
11 215
12 212
13 218
14 219
15 215
16 219
17 209
18 218
19 214
20 217
21 217
22 219
23 217
24 221
25 220
26 219
27 220
28 223
29 228
30 227
31 228
32 226
33 228
34 228
35 229
36 230
37 229
38 229
39 231
40 228
41 226
42 230
43 223
44 226
45 225
46 227
47 221
48 227
49 225
50 225
51 227
52 224
53 222
54 221
55 225
56 227
57 226
58 229
59 225
60 233
61 230
62 232
63 230
64 229
65 228
66 230
67 229
68 229
69 229
70 230
71 228
72 230
73 228
74 233
75 224
76 224
77 231
78 225
79 235
80 238
81 238
82 236
83 237
84 238
85 242
86 238
87 240
88 245
89 244
90 245
91 243
92 242
93 243
94 241
95 243
96 239
97 239
98 240
99 244
100 238
;
run;

The ORDER data set contains the number of orders for a period of 100 days.

Perform the necessary steps to identify one ARIMA model where the residuals are purely white noise.
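
If you are unsure where to begin, the workflow from this section can be reused as a starting point. This is only a sketch: whether one order of differencing and one AR term suit the ORDER data is an assumption to check against the plots and the Ljung-Box results, not a stated solution.

```sas
proc arima data=order;
   identify var=numord;      /* raw series: check for non-stationarity */
   identify var=numord(1);   /* first difference: check ACF, PACF and white-noise test */
   estimate p=1;             /* candidate only; adjust based on the diagnostics */
run;
quit;
```

Compare the Autocorrelation Check of Residuals and the AIC and SBC across the candidates you try before settling on a model.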
