
I have a sample of 1000 data points which I used as the training sample for a time-series forecast. My lecturer suggested comparing the ACF with its critical values (upper and lower) numerically rather than just looking at the graph.

Here are my ACF values:

[image of ACF values; not transcribed]

Question: How do I come up with the upper and the lower critical values for the ACF? Is there any function in R to yield these values?

Richard Hardy
Shieryn
  • You mean the range()? – Pascal Dec 07 '15 at 08:35
  • @Pascal No. I mean, how do I determine whether an ACF value indicates a significant cut-off or not? – Shieryn Dec 07 '15 at 08:53
  • So it is not upper and lower. Please edit. –  Dec 07 '15 at 08:54
  • @Pascal What is it called? – Shieryn Dec 07 '15 at 08:57
  • Are you looking for critical values for the ACF so that you could determine the statistical significance of each lag? As in a graph where you have ACF bars and a line representing the 95% critical value; the bars that stick out are statistically significant. – Richard Hardy Dec 07 '15 at 09:32
  • @RichardHardy I mean, "the lines give the values beyond which the autocorrelations are (statistically) significantly different from zero." Is this the same as what you mean? – Shieryn Dec 07 '15 at 09:46
  • Yes, that's what I meant. I am trying to understand what your question is. – Richard Hardy Dec 07 '15 at 10:05
  • @RichardHardy Yes, that's it. Could you help me? What R syntax is needed to obtain the line representing the 95% critical value? – Shieryn Dec 07 '15 at 11:46
  • I edited your question according to what I learned from your answers in the comments. You should look for the null distribution of the ACF. It is pretty simple, I just do not currently remember what it is. It should be possible to find the answer in econometrics textbooks. – Richard Hardy Dec 07 '15 at 20:47

2 Answers


Based on this source, it looks like under the null hypothesis of no autocorrelation, the sample autocorrelation at lag $d$ scaled by $\sqrt{T-d}$ is asymptotically standard normal. The 5% critical values of the autocorrelation at any given lag $d$ ($d \neq 0$) are therefore

$$\pm \frac{1.96}{\sqrt{T-d}}$$

where $T$ is the sample size.

In your case, $T=1000$, so the critical values for lag 1 are $\pm \frac{1.96}{\sqrt{1000-1}} \approx 0.06201$, for lag 2 are $\pm \frac{1.96}{\sqrt{1000-2}} \approx 0.06204$, and so on.
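There is no built-in R function that returns these lag-dependent bounds directly, but they are a one-liner. A minimal sketch (variable names are my own), which also shows the flat band that R's own acf() plot draws, explaining why the line in the graph looks constant:

```r
# Sketch: 5% critical values for the sample ACF under the white-noise null,
# using the formula above with T = 1000 as in the question.
T <- 1000
lags <- 1:20
crit <- 1.96 / sqrt(T - lags)   # lag-dependent critical values
round(crit[1:3], 5)             # 0.06201 0.06204 0.06207

# For comparison, plot(acf(x)) draws a single flat band at
# qnorm(0.975) / sqrt(T), which is why the line in the graph is constant:
qnorm(0.975) / sqrt(T)          # ~ 0.06198
```

For $T = 1000$ the lag-dependent values and the flat band agree to about four decimal places over the first few lags, so the distinction only matters for small samples or very distant lags.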

Note also this caveat from another source:

Additionally, in small sample conditions ... this test may be overly conservative such that the null hypothesis is rejected (residuals indicated as non-white) less often than indicated by the chosen significance level (Lutkepohl, 2006).

However, it is not likely to be relevant for a sample as large as 1000.


Related question: "How is the confidence interval calculated for the ACF function?".

Richard Hardy
  • Are the critical values different for lag 1, lag 2, lag 3 and further? I thought the critical value in one graph is the same, because in my experience the line in the graphs I have made is constant. Is there any syntax in R to help me get the critical value? – Shieryn Dec 08 '15 at 01:24
  • Yeah, that is a bit puzzling since the line in ACF graphs is typically flat. The critical values are almost the same, at least when the sample size is large and you do not consider very distant lags (just the first few). – Richard Hardy Dec 08 '15 at 07:43
  • How large does a sample have to be to count as large? In R the line is flat; which formula gives that flat critical value? And how about the PACF? Do you have any references? – Shieryn Dec 11 '15 at 09:49

Since the standard deviation of the ACF is only approximately $1/\sqrt{NOB}$ (where $NOB$ is the number of observations), the approximation is so crude that it is practically useless for large sample sizes. If your reason for obtaining critical values is to automatically identify the form of the ARIMA model, you can stop right now! Identification of a reasonable starting model for the ARIMA structure is better conducted via approaches like the inverse autocorrelation function (http://www.jstor.org/stable/2982488?seq=1#page_scan_tab_contents), which is the basis of how AUTOBOX (a piece of software that I have helped develop) effectively solves the riddle.

IrishStat
  • Regarding your first sentence, isn't the approximation working better when the sample size is large than when the sample size is small? I thought it works fine for large samples (e.g. 1000 observations). – Richard Hardy Dec 07 '15 at 21:37
  • I don't think so, after a long history of analyzing simulated data and observing/cataloging the over-modelling suggested by this "rule of thumb". A larger number of observations normally leads to more correct estimates, but not in this case, as there is a vast overstatement of the possible significance of sample ACFs and PACFs. – IrishStat Dec 07 '15 at 21:46
  • There is nonsense and there is nonsense, but the most nonsensical thing of all is statistical nonsense! – IrishStat Dec 07 '15 at 21:50