Project 2 section 6

Time Series Modeling
[6-15]

The partial autocorrelation function (PACF) is another concept that is essential to understanding time series analysis.

It is defined as the correlation between a variable and its own value at a given lag that is not explained by the values at the shorter lags in between.

This definition can be confusing at first. Let's look at the following example.

We will first simulate a series of data for variable X.

At time = 0

Let the initial value of X be 100.

X0 = 100

At time = 1

X at time 1 will be a random step away from X at time 0.

X1 = X0 + N1

where N1 is a random value generated from the normal distribution with µ=0 and σ=3.

For example, since X0 is 100, X1 is 100 + a randomly generated value based on the normal distribution with zero mean and a standard deviation of 3.

At time = 2

X at time 2 will also be based on X at time 1.

X2 = X1 + N2

X2 is again a random step away from X1.

At time = t

In general, you can define Xt as:

Xt = Xt-1 + Nt
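
Unrolling this recurrence back to the starting value shows what kind of series we are about to simulate. The short derivation below is a sketch that assumes the steps N1, ..., Nt are independent draws from the same normal distribution:

```latex
\[
X_t = X_{t-1} + N_t = X_0 + \sum_{i=1}^{t} N_i, \qquad N_i \sim \mathcal{N}(0,\, 3^2)
\]
\[
\mathrm{E}[X_t] = X_0 = 100, \qquad \mathrm{Var}(X_t) = 9t
\]
```

Under that assumption, X is a random walk: its expected value stays at 100, while its spread grows with time.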

Now, let's run the code below to simulate X for 200 observations.

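/* Simulate a random walk: X starts at 100 and takes a normal(0, 3) step at each time point */
data ex2;
    retain X 100;            /* initial value X0 = 100 */
    call streaminit(222);    /* fix the seed so the results are reproducible */
    do time = 1 to 200;
        X + round(rand('normal', 0, 3), 0.01);    /* sum statement: adds the step to X */
        output;              /* write one observation per time point */
    end;
run;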

X is simulated for 200 observations:

Let's plot the time series as well as the ACF.

proc timeseries data=ex2 plots=(series acf);
    var x;
run;

The time series plot shows the movement of X:

X starts off at 100 and moves between roughly 90 and 150 over a period of 200 time points.

Let's look at the ACF plot:

Unlike the ACF plot that we saw in the last section, the ACF for X decreases slowly toward zero as the lag increases.

Let's focus on the autocorrelation at lag 1 for now.

The autocorrelation at lag 1 is the second bar in the plot. It is very close to 1:

This indicates that X at time (t) is highly correlated with X at time (t-1).

This makes perfect sense considering how our data is simulated.

Xt = Xt-1 + Nt

X at time (t) is derived from X at the previous time point plus a small random error.

The values of X at two consecutive time points are definitely highly correlated.

Now, let's look at the autocorrelation at lag 2.

The autocorrelation at lag 2 is also very high.

This indicates that X at time (t) is highly correlated with X at time (t-2).

This is a little strange!

Our Xt is derived from:

Xt = Xt-1 + Nt

Xt is derived directly from Xt-1. Xt-2 does not appear in the equation at all.

At first glance, the high autocorrelation at lag 2 does not quite make sense.

It turns out that the high correlation between Xt and Xt-2 arises indirectly, through Xt-1.

We have:

Xt = Xt-1 + Nt

and

Xt-1 = Xt-2 + Nt-1

As a result, Xt is indirectly correlated with Xt-2.
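
The indirect link can be made explicit by substituting the second equation into the first. This is a short worked step that uses only the two equations above:

```latex
\[
X_t = X_{t-1} + N_t = (X_{t-2} + N_{t-1}) + N_t = X_{t-2} + N_{t-1} + N_t
\]
```

So Xt equals Xt-2 plus two random steps, and the influence of Xt-2 reaches Xt only by way of Xt-1.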

Partial Autocorrelation Function (PACF)

The PACF shows the partial autocorrelation of a variable with itself at lag k, that is, the correlation between Xt and Xt-k that is NOT explained by the shorter lags.

Let's first plot the PACF plot using the TIMESERIES procedure.

proc timeseries data=ex2 plots=(pacf);
    var x;
run;

The PACF plot is shown below:

The partial autocorrelation at lag 1 is still very high.

There is a high correlation between Xt and Xt-1, as we have seen from the ACF plot.

However, unlike the ACF plot, the partial autocorrelation drops off immediately at lag 2:

This indicates that the correlation between Xt and Xt-2 that is NOT explained by the correlation between Xt and Xt-1 is very small.

Xt is correlated with Xt-2 but it is mostly because of Xt-1.
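
One way to check this numerically is to regress X on its first two lags. The coefficient on the second lag, with the first lag already in the model, is a regression-based estimate of the lag 2 partial autocorrelation. The code below is a minimal sketch: the data set name EX2_LAGS and the variable names X_LAG1 and X_LAG2 are made up for illustration, and the estimate may differ slightly from the value that PROC TIMESERIES plots, since the procedure may use a different estimation method.

```sas
/* Create lagged copies of X (names are illustrative assumptions) */
data ex2_lags;
    set ex2;
    x_lag1 = lag(x);     /* X at time t-1 */
    x_lag2 = lag2(x);    /* X at time t-2 */
run;

/* Regress X on both lags; rows with missing lags are dropped automatically */
proc reg data=ex2_lags;
    model x = x_lag1 x_lag2;
run;
quit;
```

If the reasoning above holds, we would expect the estimate for X_LAG1 to be close to 1 and the estimate for X_LAG2 to be close to 0.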

Taken together, the ACF and PACF plots give us a clearer picture of how the variable is correlated with itself at different time points.

Further reading on PACF:

Tutorial 1

Exercise

If you haven't created the SALES3 data set, copy and run the code from the yellow box below:

data sales;
    retain month;
    input Week Day TotalSales OrdertypeA OrdertypeB OrdertypeC;
    if _n_ <= 19 then month = 1;
    else if _n_ <= 39 then month = 2;
    else month = 3;
    datalines;
1 4 539.577 61.543 175.586 302.448
1 5 224.675 38.058 56.037 130.58
1 6 129.412 21.826 25.125 82.461
2 2 317.12 41.542 113.294 162.284
2 3 210.517 37.679 56.618 116.22
2 4 207.364 30.792 50.704 125.868
2 5 263.043 43.304 66.371 153.368
2 6 248.958 38.584 85.961 124.413
3 2 344.291 33.973 148.274 162.044
3 3 248.428 36.399 43.306 168.723
3 4 281.42 45.706 111.036 124.678
3 5 243.568 43.851 66.277 133.44
3 6 308.178 43.339 136.434 128.405
4 2 363.402 46.241 120.865 196.296
4 3 336.872 56.519 136.709 143.644
4 4 246.992 56.167 78.101 112.724
4 5 308.88 51.66 92.272 164.948
4 6 233.126 47.717 71.474 113.935
5 2 404.38 59.135 157.681 187.564
1 3 298.56 90.476 80.509 127.575
1 4 229.249 42.904 43.962 142.383
1 5 236.304 47.331 72.444 116.529
1 6 297.174 32.077 127.358 137.739
2 2 409.401 58.721 139.034 211.646
2 3 231.035 36.017 75.813 119.205
2 4 238.826 35.576 79.997 123.253
2 5 235.598 54.401 75.613 105.584
2 6 242.112 37.656 59.907 144.549
3 2 490.79 57.81 236.248 196.732
3 3 289.657 43.359 89.382 156.916
3 4 298.459 45.555 148.718 104.186
3 5 323.603 45.55 120.548 157.505
4 3 616.453 67.884 267.342 281.227
4 4 346.035 70.376 154.242 121.417
4 5 307.645 71.068 100.544 136.033
4 6 253.847 64.137 109.062 80.648
5 2 530.944 118.178 260.632 152.134
5 3 333.359 51.199 124.66 157.5
5 4 306.356 47.002 99.892 159.462
1 6 416.83 109.888 131.165 175.777
2 2 415.187 77.388 154.863 182.936
2 3 268.002 46.295 96.87 124.837
2 4 234.503 53.366 69.15 111.987
2 5 234.724 47.399 77.61 109.715
2 6 230.064 48.081 72.826 109.157
3 2 357.394 59.042 130.098 168.254
3 3 259.246 44.809 99.072 115.365
3 4 244.235 39.025 110.74 94.47
3 5 402.607 39.6 240.922 122.085
3 6 255.061 57.467 88.462 109.132
4 2 342.606 41.418 135.189 165.999
4 3 268.64 34.193 115.536 118.911
4 4 188.601 32.653 81.576 74.372
4 5 202.022 51.985 51.93 98.107
4 6 213.509 36.748 71.353 105.408
5 2 316.849 59.131 92.639 165.079
5 3 286.412 54.224 115.746 116.442
5 4 303.447 58.378 142.382 102.687
5 5 304.95 76.763 96.478 131.709
5 6 331.9 107.568 121.152 103.18
;
run;

data sales2;
    set sales;
    cumweek = week + 4*(month-1);
    time = day + 5*(cumweek-1)-3;
run;

data day_temp;
    do time = 1 to 63;
        output;
    end;
run;

data sales3;
    merge day_temp sales2;
    by time;
    drop cumweek;
run;

proc delete lib=work data=sales sales2 day_temp;
run;

Plot the PACF for the TOTALSALES column.

Do you see any spikes on the PACF plot?
