
I have the following dataset:

results <- data.frame(Date = c("A", "B", "C", "D", "E", "F", "G", "H","I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S"),
                  P1 = c(0.43, 0.45, 0.57, 0.15, 0.5, 0.33, 0.26, 0.81, 0.43, 0.48, 0.14, 0.26,-0.21, 0.27, 0.37, 0.33, 0.68, 0.15, 0.44))

I want to know whether there are statistically significant changes in my observations.

So I thought I would use changepoint analysis.

First I used the bcp package, with the following code:

library(bcp)
c <- bcp(results$P1)
plot(c)

However, there are no changepoints according to this plot.

Then I used the "changepoint" package and the following code:

var=cpt.var(results$P1, method="PELT")
plot(var)

Here I get three possible changepoints, but not where I expected them to be (for example, not at M, but at N).

Can anybody explain why? Or is there another way to show whether the values changed significantly from one observation to another?

feder80
  • You have only 19 data values, they exhibit essentially no autocorrelation, and are reasonably close to normally distributed: in short, unless you have specific, quantitative hypotheses that were developed before seeing the data, about the only defensible conclusion you can draw is that these data look random. Your initial analysis with bcp got a good result. The second analysis--which amounts to a bit of "data snooping" and overfitting--effectively confirmed that result by producing untenable results. If any further analysis produces "significant" changepoints, don't trust it. – whuber May 16 '14 at 14:42
  • (+1) to above. If you just plot data, by my eye at least there is no obvious change point. With only 19 points I would not trust a method that suggested there was a change. – charles May 16 '14 at 14:58
  • The CUSUM test and the Quandt likelihood ratio test (Andrews' $\sup F$), in the strucchange package, suggest there is no break in mean (see the sketch below). – Khashaa Dec 28 '14 at 12:11
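
For reference, a minimal sketch of the checks mentioned in these comments (the autocorrelation and normality of the series, and the strucchange tests); P1 ~ 1 specifies a constant mean, so the tests look for a break in the mean:

library(strucchange)
acf(results$P1)           # essentially no autocorrelation, per the comments
shapiro.test(results$P1)  # roughly consistent with normality
# OLS-based CUSUM test for a break in the mean
sctest(efp(P1 ~ 1, data = results, type = "OLS-CUSUM"))
# Andrews' sup-F (Quandt likelihood ratio) test
sctest(Fstats(P1 ~ 1, data = results))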

3 Answers


IrishStat is correct that you are trying to identify a change in mean, not a change in variance. Thus in the changepoint package you should be using cpt.mean(results$P1, method="PELT") instead. As with the bcp package, this gives no changes in mean.
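
A minimal sketch of that call (assuming results is defined as in the question):

library(changepoint)
m <- cpt.mean(results$P1, method="PELT")
cpts(m)  # estimated changepoint locations; empty here, i.e. no change in mean
plot(m)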

The cpt.var function gave 3 changes in variance because of the ratios of the neighbouring segment variances. The variance of each segment can be calculated as follows:

segvar <- param.est(var)$variance
segvar
# [1] 0.02126190 0.09944762 0.00080000 0.07043333

# ratio of each segment's variance to that of the preceding segment
segvar[-1]/segvar[-length(segvar)]
# [1]  4.677267637  0.008044436 88.041666667

Typically, changes in variance are detected with roughly 80% power or more if the ratio of neighbouring variances is greater than 3 (or less than 1/3). The ratios of these variances clearly fit this paradigm, which is why the changes were detected, though not necessarily in the places where you would have expected a change in mean.

Note that this is all based on a penalty that only penalizes the number of changepoints. This is why segments of only 2 or 3 observations are detected. If the application suggests that such short segments are implausible, then I would use a penalty that also penalizes segment length, or set a minimum segment length (see the sketch below).
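
A minimal sketch of the minimum-segment-length option, assuming a version of the changepoint package whose cpt.var accepts the minseglen argument:

library(changepoint)
# require each segment to contain at least 5 observations
v <- cpt.var(results$P1, method="PELT", minseglen=5)
cpts(v)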

See the introductory references at www.changepoint.info for more background on changepoint analysis. There is also a list of open-source changepoint software packages there.

adunaic
  • The problem with fitting many straight lines (segments) is that you have to pre-specify how many segments you want. This is a fitting process, not a modelling process. Furthermore, one-time anomalies distort these line segments and, more importantly, autocorrelation is ignored, not to mention seasonal pulses that may be present. Overall it appears to me that this is an inadequate treatment. Do I have this correct? – IrishStat May 21 '14 at 09:00
  • I think every analysis has its own assumptions, and changepoint detection (Killick et al., 2013) assumes i.i.d. sampling from a specific distribution (usually normal). So it would be wrong to say it is wrong because it ignores autocorrelation. – KH Kim Oct 01 '14 at 13:58
  • The changepoint package doesn't require specification of the number of segments. Some other packages do require this. – adunaic Oct 02 '14 at 18:35
  • The autocorrelation is a separate issue. KH Kim is correct in that independence is a modelling assumption. If you can't make this assumption then other cost functions (not iid Normal) can be used, but they are not currently available in the package. – adunaic Oct 02 '14 at 18:37

Change point analysis can mean detecting changes in parameters, changes in error variance, or changes in the expected value. Your example is simply the latter. Detecting change points in the expected value is called intervention detection (http://www.unc.edu/~jbhill/tsay.pdf). This has been significantly enhanced in a product called AUTOBOX (http://www.autobox.com/cms/), which I helped develop and which even incorporates user-specified predictor series. I took your 19 values, and AUTOBOX detected change points at M, H, and Q (the 13th, 8th, and 17th values).

[figure: AUTOBOX intervention-detection output for the 19 values]

Note well that in general ARIMA structure has to be taken into account in time series analysis. In this case there is no provable ARIMA structure (autoregressive memory). It appears to me that your free software has a cost.

Exploratory Data Analysis, as championed by Tukey, Box, et al., speaks to extracting hypotheses and information from the actual data, such as the form of the ARIMA model, the detection of anomalies, changes in parameters, etc., none of which require specific quantitative hypotheses promulgated before the data arrive. If one is lucky enough to have such hypotheses handy, one can of course test them. Modern analytics actually learn from the data. What you knew (visually or scientifically) to be unusual was indeed detectable. I would say that the Tsay reference is an example of a trustworthy procedure.

IrishStat
  • I confess I am unable to see how any of Tukey's EDA methods would identify changepoints for these data. Which one(s) were you referring to? – whuber May 16 '14 at 16:16
  • I was referring to the general concept of exploring data and developing thoughts and structure by investigation, rather than anything specific. I didn't mean to imply that JT had done anything with time series in this regard. – IrishStat May 16 '14 at 20:14

For this 19-value series, some inconsistency exists among the methods, as expected when it comes to statistics (e.g., no changes detected by bcp; no breakpoint by the CUSUM and Quandt likelihood ratio tests in strucchange; no change in mean by changepoint; and three pulse-like changes by AUTOBOX). Let me give my two cents and offer some admittedly biased interpretations, which could be very wrong. Overall, I think the different results are not directly comparable, for reasons such as differences in assumptions, model spaces, and test statistics.

Purely from an information-theoretic point of view, here is the ranking of several alternative models (no autocorrelation assumed) in terms of BIC (the smaller, the better):

M1. BIC = -12.42 for the 3-pulse model from AUTOBOX (pulses at the 8th, 13th, and 17th points)
M2. BIC = -9.10 for a 2-pulse model with changes at the 8th and 13th points
M3. BIC = 1.601 for a mean-shift model at the 11th point
M4. BIC = 1.837 for a mean model without any changepoint

So, in terms of delta BIC, the evidence favoring the AUTOBOX model (M1) over the no-change model (M4) is very strong. If we limit ourselves to the space of mean-shift models, the best model (again in terms of BIC) appears to be the one-changepoint model M3 rather than the no-change model M4, although the evidence favoring M3 over M4 is rather weak. Again, this is all under a single information-theoretic criterion; the results will surely differ under classical tests.
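
A rough sketch of how such a BIC comparison can be set up with lm(); this is an illustration only, since the exact values depend on how the error variance and the changepoint location are counted as parameters, so it will not reproduce the numbers above:

y <- results$P1
m4 <- lm(y ~ 1)              # M4: constant mean, no changepoint
shift <- seq_along(y) >= 11  # assumed: the new segment begins at the 11th point
m3 <- lm(y ~ shift)          # M3: one mean shift
BIC(m4)
BIC(m3)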

Jumping into the Bayesian world, I interpret the bcp result a little differently from others. Below is the bcp result. At first look there is no salient peak in the probability curve, but plotted on a finer scale there are some small peaks, and the fitted curve differs from a flat line, apparently due to the model-averaging nature of the Bayesian method. More specifically, according to the bcp output, the average number of changepoints is 0.98, close to 1. So my interpretation is that bcp indeed suggests an overall change in mean, but there is no strong model evidence to pinpoint an exact location, so the probability curve is spread out.
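
A sketch of how that average can be read off the bcp object, assuming its posterior.prob component holds the per-position posterior probabilities of change:

library(bcp)
fit <- bcp(results$P1)
# summing the per-position posterior change probabilities approximates
# the expected number of changepoints (about 1 for these data)
sum(fit$posterior.prob, na.rm=TRUE)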

[figure: bcp output: fitted (posterior mean) curve and posterior probability of a changepoint at each position]

Here are also some extra results from another Bayesian changepoint detection package, Rbeast, which my team developed (available in R, Python, and Matlab). Below is the result with a full Bayesian model, giving an average of 0.56 changepoints.

[figure: Rbeast full Bayesian model output]

Next is the result with an empirical Bayesian model based on BIC:

library(Rbeast)
y <- c(0.43, 0.45, 0.57, 0.15, 0.5, 0.33, 0.26, 0.81, 0.43, 0.48, 0.14, 0.26, -0.21, 0.27, 0.37, 0.33, 0.68, 0.15, 0.44)
o <- beast(y, season='none', method='bic')
plot(o)
# posterior probabilities of 0, 1, 2, or 3 trend changepoints
barplot(o$trend$ncpPr, names.arg=0:3)

On average, there are 0.746 changepoints, with the most likely location being the 11th point (the dashed line in the figure below); this corresponds to the best one-changepoint model M3 obtained by optimizing BIC.

[figure: Rbeast empirical Bayesian (BIC) result, with the most likely changepoint at the 11th point]

To mimic AUTOBOX, here is the result when spike-like outliers are allowed but mean changepoints are not (i.e., tcp.minmax=c(0,0)).

o <- beast(y, season='none', tcp.minmax=c(0,0), hasOutlier=TRUE)
plot(o)
# posterior probabilities of 0 or 1 pulse-like (outlier) changepoints
barplot(o$outlier$ncpPr, names.arg=0:1)

In the Pr(tcp) subplot, the probability curve for a mean changepoint is zero-valued (forced by the prior tcp.minmax=c(0,0): no mean changepoints allowed), and the mean number of pulse-like changepoints is 0.81, with the most likely locations being the 8th and 13th points. This corresponds to the AUTOBOX model M1, except that the 17th point is missing; its absence is due to an implicit prior imposed inside the Rbeast program.

[figure: Rbeast outlier-only model output]

In some cases, even if the data are generated as an independent random normal sample, the methods may identify some "artificial" changepoints. On one hand, this is a seeming "failure" of the method or the test; on the other hand, and more importantly, the sample size is not large enough to offer good statistical power to distinguish between alternative models/hypotheses.
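
A quick sketch of this point: repeating the PELT variance analysis on pure noise of the same length; some replicates may still flag changepoints even though none exist by construction (seed chosen arbitrarily):

library(changepoint)
set.seed(42)  # arbitrary seed for reproducibility
flagged <- replicate(100, {
  y <- rnorm(19)  # i.i.d. normal noise, same length as the data
  length(cpts(cpt.var(y, method="PELT")))
})
table(flagged)  # distribution of the number of spurious changepoints found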

zhaokg