
I am using Python's statsmodels.tsa.stattools.acf on a series, specifying alpha: acf_vals, confint = acf(x, alpha=0.05). After this I'm using the plot_acf function with the same alpha specified to get a graphical version of the same thing (or so I thought...): plot_acf(x, alpha=0.05). The beginning of the plot looks like this:

ACF Plot

The output of acf returns a couple of things to me: 1) the autocorrelation function, and 2) "Confidence intervals for the ACF", since I specified alpha.

The first two values of the ACF are [1., 0.29301261]. This is expected and conforms to the graph. The first two confidence intervals (returned by the same acf function) are [1., 1.] and [0.25381333, 0.33221189]. This is not expected. My value of approximately 0.29 at the first lag falls within this interval, whereas in the plot it is well outside of the blue range. And, the confidence interval does not match up with the blue cone in the graphic.

What am I misunderstanding about the confidence interval returned by acf, and how am I misinterpreting these results?

  • It is unclear what is going on, because the values you report don't come close to matching what your plot shows. The plot suggests you are working with a tiny series, so it should be possible to show us the details of your calculations. – whuber Jan 11 '23 at 13:29
  • Although your plot is too small to read clearly, it looks like the lag-1 coefficient is around 0.28 to 0.29. That's right in the middle of the second interval you report. That's perfectly consistent. A confidence interval is intended to include the true value. On the other hand, the blue band in the plot shows where the coefficients would lie if the data were independent. Those are two very different things! – whuber Aug 26 '23 at 14:10

2 Answers

2

I will attempt a different approach to explaining this.

  • Let's call the grey/blue area in the graph the exclusion region.
  • Let's call the intervals returned by the acf function the confidence intervals.

So, the null hypothesis of the ACF test is that the autocorrelation at a given lag is 0. Alpha controls the significance level. In your case alpha=0.05, so the exclusion region is the area in which a coefficient is not significantly different from 0 at the 5% level. A coefficient that falls outside that area is significantly different from 0: if the true autocorrelation were 0, there would be only a 5% chance of landing outside it, so this is unlikely to be a coincidence and the null hypothesis is rejected.

Now, for the second part: the confidence intervals give, for your data, the range of plausible values of the true coefficient at the confidence level defined by alpha. In your case, with alpha=0.05, this means that for the second value (first lag) we have 95% confidence that the true autocorrelation lies within [0.25381333, 0.33221189]. As a sanity check, for the first value (lag 0) the interval is [1, 1], because a series is always perfectly correlated with itself, so there is no uncertainty.

Lastly, what might help in this case is the link between the two: the +/- width of the exclusion region and the confidence intervals given by acf. For your first lag:

(0.33221189 - 0.25381333) / 2 = 0.03919928 is the +/- half-width of your exclusion region in the graph.
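The arithmetic above can be checked with a few lines of plain Python. This is only a sketch using the numbers quoted in the question; the variable names are made up for illustration, and it assumes (as this answer does) that the band drawn in the plot has the same half-width as the returned interval, just centred on zero.

```python
# Values reported in the question for lag 1.
acf_lag1 = 0.29301261
ci_low, ci_high = 0.25381333, 0.33221189

# Half-width and midpoint of the interval returned by acf.
half_width = (ci_high - ci_low) / 2
center = (ci_high + ci_low) / 2

# Assumption: the plotted band is the same half-width, centred on zero.
band = (-half_width, half_width)

print("half-width:", round(half_width, 8))
print("interval midpoint:", round(center, 8))
print("lag-1 value inside zero-centred band:", band[0] <= acf_lag1 <= band[1])
```

The midpoint of the returned interval is the lag-1 estimate itself, while the same half-width placed around zero does not reach 0.29. That is why the value sits inside the confidence interval yet outside the blue area.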

In conclusion, the width of the range, at say 95% confidence, remains the same, as stated by @Sextus Empiricus, because it is based on the statistics of the given sample. What changes is the centre of the range and the perspective:

  • Are we asking whether the coefficient is significantly different from some value (e.g. zero)? Then the observed value must fall outside the region centred on that value for us to reject the hypothesis.
  • Do we want to know the region inside which the actual value should fall with a certain confidence (e.g. 95%)? Then we take the +/- interval around the estimate that represents this confidence.

Think of it like a z-score for a normal distribution: +/- 1.96*std, for example, gives 95% confidence. The interval is computed from your sample's mean and std, and you either use it to test a hypothesis or to display a certain confidence (95% in this case) that the actual value lies around your sample's mean.
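The z-score analogy can be sketched as follows. The helper normal_interval is a hypothetical name, not part of statsmodels, and the standard error is simply backed out from the half-width quoted in the question under a normal approximation, which is an assumption about how the interval was built.

```python
from statistics import NormalDist

def normal_interval(center, std_err, alpha=0.05):
    """Return center +/- z * std_err for a two-sided normal interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return center - z * std_err, center + z * std_err

# Two-sided 95% critical value, roughly 1.96.
z95 = NormalDist().inv_cdf(0.975)

# Assumption: back out the standard error from the half-width in the question.
std_err = 0.03919928 / z95

# Same width, two different centres.
null_band = normal_interval(0.0, std_err)        # hypothesis test around zero
conf_int = normal_interval(0.29301261, std_err)  # interval around the estimate

print("band around zero:", null_band)
print("interval around estimate:", conf_int)
```

Both intervals come from the same +/- z*std recipe; only the centre differs, which is the change of perspective described above.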

I hope this helps, tried to explain with many comparisons. Sorry if I repeated a few things.

1

And, the confidence interval does not match up with the blue cone in the graphic.

This is because that blue area is not the same as a confidence interval, instead it is a critical region.

  • The blue area/range in the graph relates to the range of observations where a null hypothesis test would be insignificant. It is the critical region for deciding whether the test is significant or not.

  • The confidence interval is the range of hypothetical values for which a hypothesis test would be insignificant (with the given data).

The two are a bit related. See for example:

Confidence interval / p-value duality: don't they use different distributions?

or

Can we reject a null hypothesis with confidence intervals produced via sampling rather than the null hypothesis?

In your image that area is drawn specifically for a null hypothesis test, assuming that the hypothetical true value of the acf is zero. But, aside from that null hypothesis, you could test hypotheses for other values (different from zero), and each of those alternative hypotheses will relate to a different blue critical region.

The confidence interval that you computed, [ 0.25381333, 0.33221189], is the range of hypotheses (not null, but different values) for which your observed value would fall inside the blue critical region of insignificance.
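A rough numerical illustration of this idea: scan a grid of hypothesized values, draw a region of insignificance around each one, and keep the values whose region contains the observation. This is a sketch, not how statsmodels computes anything. It assumes the region has the same half-width (taken from the question's numbers) for every hypothesized value, which is a simplification, and the function name accepts is made up.

```python
observed = 0.29301261    # lag-1 value from the question
half_width = 0.03919928  # assumed constant half-width of each region

def accepts(rho0):
    """True if the observation falls inside the region centred on rho0."""
    return abs(observed - rho0) <= half_width

# Hypothesized true values from -1 to 1 in steps of 0.0001.
grid = [i / 10000 for i in range(-10000, 10001)]
accepted = [rho0 for rho0 in grid if accepts(rho0)]

print("zero is accepted:", accepts(0.0))
print("accepted range:", min(accepted), "to", max(accepted))
```

Up to the grid resolution, the accepted range reproduces [0.25381333, 0.33221189], while zero itself is not accepted, matching what the plot shows.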

  • I'm sorry, but I cannot see any "gray area" in the plot at all. – whuber Aug 26 '23 at 14:11
  • @whuber corrected. – Sextus Empiricus Aug 26 '23 at 14:13
  • Thank you. But in your final statement I cannot recognize any correct characterization of a confidence interval. It seems to be asserting (a) the "blue area" indicates "significant" values (relative to some hypothesis) and (b) were the lag-1 ACF within this interval, the observed ACF would lie within the blue area (that is, be almost zero). – whuber Aug 26 '23 at 14:16
  • @whuber corrected again – Sextus Empiricus Aug 26 '23 at 14:20
  • I'm sorry, but it remains confusing: is the "blue area" an area of "significance" or "insignificance"? It can't be both! And, since you allude to various hypotheses, which one(s) are the relevant ones? Finally, what might you mean by "special outliers"? – whuber Aug 26 '23 at 14:31
  • The blue area is a region of insignificance for a specific hypothesis test. Inside the region the observation is insignificant, outside the region the observed value is significant. While a single region is drawn (for the null hypothesis) there is no single blue region, and instead there can be many, depending on the hypothesized true value. The confidence interval is the range of hypothetical true values for which the observation will fall inside that blue region. – Sextus Empiricus Aug 26 '23 at 14:41
  • I'm afraid I have to disagree with that last statement: the confidence interval is a range of hypothetical values, but not values for which the observation will fall within the blue region. – whuber Aug 26 '23 at 14:55
  • @whuber one may have some exotic confidence intervals that may not follow the typical method of construction, but in the case here this doesn't seem to be the case and the confidence interval is the region of hypothetical values for which the hypothesis test isn't negative. – Sextus Empiricus Aug 26 '23 at 15:02
  • I am not thinking of anything exotic. I believe you haven't correctly communicated the concept in this context -- and that is the crux of the present question. The confidence interval for the lag-1 coefficient here gives some bounds on its likely value. Its construction has nothing to do with the "blue area" in the plot (which is the source of the OP's confusion). – whuber Aug 26 '23 at 15:06
  • @whuber in my answer I state that this blue area can have different positions depending on the hypothetical true value. In the plot it relates to a null hypothesis, for which the hypothetical value is zero, but one might use different values instead. The confidence interval here is the range of the hypothetical values for which the blue area contains the observed value. – Sextus Empiricus Aug 26 '23 at 15:09
  • I'm afraid that is both vague and confusing. The plot in the original post displays a "blue area" related to the null hypothesis. The question concerns a confusion between that and confidence intervals. Referring vaguely to other possible "blue areas" threatens to add to the confusion instead of clarifying it. – whuber Aug 26 '23 at 15:12
  • @whuber do you oppose the idea that I am using here, or is it just that you find that idea confusing and not helpful? – Sextus Empiricus Aug 26 '23 at 15:15
  • I have absolutely no doubt that you know what you're writing about -- I am only trying to suggest the writing could be made clearer. – whuber Aug 26 '23 at 18:04