How to select P and Q in ARIMA

Question

I am new to ARIMA and would like to know how to determine the p and q values of an ARIMA model from the ACF and PACF plots. Here is a plot of the raw data, which are human glucose measurements. The blue data are the training set and the yellow data are the test set.

These are the plots of the first and second differences of the raw data.

It seems to me that d = 1 is a good choice for ARIMA. Is that right?

And these are the ACF and PACF plots of the first-differenced series.

What p and q should I choose based on these ACF and PACF plots? I have tried setting the parameters to (10, 1, 5), but the result does not look good. What should I do?

In addition, I would like to know whether my understanding of ARIMA is correct. I think ARIMA is an adaptive regression process that does not actually select features (unlike, e.g., a random forest or a neural network). Instead, the first thing to do is to eliminate the unwanted features and keep only the values of the original data. In other words, for ARIMA the only feature is time: it looks for the relationship between the values of the data over time to make the final prediction. I ask because my original data is actually a matrix with 23 features, where each set of 23 features corresponds to one value. When I use ARIMA, I pass only the values into the model for training, and I want to know whether my understanding is correct. Thanks for your help!

The raw data do not seem to have a unit root component, so differencing them does not make sense. By doing that you increase the error variance and run into the problem of overdifferencing. "For ARIMA, the only feature is time" is not quite right. The features are lagged values of the dependent variable and lagged errors (latent). More generally, I wonder whether the ARIMA class of models is appropriate for this series. – Richard Hardy, Nov 21 '21 at 19:57
Sorry, did you mean I shouldn't difference the data before fitting the model? I'm not familiar with ARIMA, but why does differencing not make sense unless the raw data have a unit root component? And for ARIMA, the features are the lagged values and their errors. Does that mean ARIMA learns from the history and makes the prediction? What I mean is that, compared with other regression models, ARIMA does not rely on several features to train the model and get a prediction; it only learns what the label (value) looked like in the history (lags). I think that is similar to what you said? – yuyang sun, Nov 21 '21 at 21:57
And I would like to use ARIMA to learn the trend in glucose changes. Do you think it is suitable for this series? – yuyang sun, Nov 21 '21 at 21:57
These are such broad and basic questions that I would recommend reading a time series textbook. We have a list of them here. And briefly, yes, you should not take the difference before fitting the model. – Richard Hardy, Nov 22 '21 at 06:07
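
To illustrate the overdifferencing point from the comments, here is a minimal sketch on simulated data (not the glucose series). It assumes a hypothetical trend-stationary series, a linear trend plus white noise, and compares removing the trend by regression with differencing. In theory, differencing white noise roughly doubles its variance and induces a lag-1 autocorrelation near -0.5; the names `detrended` and `differenced` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.arange(n)

# Hypothetical trend-stationary series: deterministic linear trend plus white noise.
noise = rng.normal(scale=1.0, size=n)
y = 0.01 * t + noise


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))


# Option 1: remove the deterministic trend by least-squares regression on time.
coeffs = np.polyfit(t, y, deg=1)
detrended = y - np.polyval(coeffs, t)

# Option 2: difference the series even though it has no unit root.
differenced = np.diff(y)

print("variance, detrended:       ", detrended.var())
print("variance, differenced:     ", differenced.var())
print("lag-1 autocorr, detrended:  ", lag1_autocorr(detrended))
print("lag-1 autocorr, differenced:", lag1_autocorr(differenced))
```

The negative lag-1 autocorrelation in the differenced series is an artifact of differencing, not a property of the original data, which is one reason a large MA or AR order can appear to be needed after an unwarranted difference.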

score 0 · Answer 1 · answered Nov 22 '21 at 11:25

Before fitting anything, I would check whether a Box-Cox transformation is suitable. Next, test for a unit root against the alternative of a deterministic time trend to select the d parameter. If d = 0, then fit a quadratic, linear, or whatever time function is appropriate, with ARMA errors.
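
As a rough sketch of the unit-root step, the code below runs a plain (non-augmented) Dickey-Fuller regression with a constant and a linear trend on two simulated series. This is only an illustration under stated assumptions: the data are simulated, `dickey_fuller_tstat` is a hypothetical helper, and the critical value of about -3.41 is the approximate asymptotic 5% value for the constant-plus-trend case. In practice a library routine such as `adfuller` from statsmodels with `regression="ct"` would typically be used, since it also adds lagged differences.

```python
import numpy as np


def dickey_fuller_tstat(y):
    """t-statistic on gamma in: diff(y)_t = a + b*t + gamma*y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = dy.size
    # Regressors: constant, linear time trend, lagged level.
    X = np.column_stack([np.ones(n), np.arange(1, n + 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[2] / np.sqrt(cov[2, 2]))


rng = np.random.default_rng(1)
n = 1000
t = np.arange(n)

# Simulated trend-stationary series: linear trend plus AR(1) noise.
e = rng.normal(size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.5 * u[i - 1] + e[i]
trend_stationary = 0.05 * t + u

# Simulated unit-root series: random walk with drift.
random_walk = np.cumsum(0.05 + rng.normal(size=n))

CRIT_5PCT = -3.41  # approximate asymptotic 5% critical value (constant + trend)

ts_stat = dickey_fuller_tstat(trend_stationary)
rw_stat = dickey_fuller_tstat(random_walk)
print("trend-stationary: t =", ts_stat, "reject unit root:", ts_stat < CRIT_5PCT)
print("random walk:      t =", rw_stat, "reject unit root:", rw_stat < CRIT_5PCT)
```

If the unit root is rejected, d = 0 and the trend is modelled as a deterministic function of time; if it is not rejected, differencing is the usual choice.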

user341381

1

score -1 · Answer 2 · answered Nov 21 '21 at 20:03

In the following link you can find a previous answer on how to determine the correct specification of an ARIMA model (the p, d, q values). If your goal is to obtain a stationary time series, differencing the series is a good option (the integration order d could then be 1 or 2, depending on how many times you need to difference to obtain a stationary series).

In your example, the series obtained after taking the first difference still shows serial correlation at lag 1, which indicates that the autocorrelations at higher lags are effectively explained by the autocorrelation at lag 1. Regarding your understanding of ARIMA, maybe you have come to the right conclusion from the wrong background: ARIMA does not eliminate unwanted features (and what would an unwanted feature be anyway?), but describes the predominant processes in the series through autoregressive and moving-average terms.
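
To make the reading of the correlograms concrete, here is a small sketch that computes the sample ACF and PACF of a simulated AR(1) series (not the glucose data) and compares them with the usual approximate 95% band of ±1.96/√n. The helpers `sample_acf` and `sample_pacf` are hypothetical names written for this illustration; the PACF uses the Durbin-Levinson recursion. For an AR(p) process the PACF is expected to cut off after lag p, while for an MA(q) process the ACF is expected to cut off after lag q.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array(
        [1.0] + [np.dot(x[k:], x[:-k]) / denom for k in range(1, nlags + 1)]
    )


def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi_prev = np.array([rho[1]])
    pacf[1] = rho[1]
    for k in range(2, nlags + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) coefficients.
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


# Simulated AR(1) series with coefficient 0.7 (an assumption for illustration).
rng = np.random.default_rng(2)
n = 5000
e = rng.normal(size=n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.7 * x[i - 1] + e[i]

acf = sample_acf(x, nlags=10)
pacf = sample_pacf(x, nlags=10)
band = 1.96 / np.sqrt(n)  # approximate 95% band under no autocorrelation

print("ACF lags outside band: ", [k for k in range(1, 11) if abs(acf[k]) > band])
print("PACF lags outside band:", [k for k in range(1, 11) if abs(pacf[k]) > band])
```

A slowly decaying ACF combined with a PACF that is large only at lag 1 points to p = 1 and q = 0 as a starting candidate, which can then be compared with neighbouring orders using an information criterion.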

Apparently the original series you are showing has a polynomial trend, which you could remove before fitting an ARIMA model.
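
A minimal sketch of removing a polynomial trend, assuming a quadratic is adequate and using a simulated stand-in for the series (the coefficients and the AR(1) noise below are made up for illustration). Time is rescaled to [0, 1] to keep the least-squares fit well conditioned.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
t = np.linspace(0.0, 1.0, n)  # rescaled time index

# Hypothetical stand-in for the series: quadratic trend plus AR(1) noise.
e = rng.normal(size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.6 * u[i - 1] + e[i]
y = 100.0 + 20.0 * t - 15.0 * t**2 + u

# Fit the quadratic trend by ordinary least squares and subtract it.
coeffs = np.polyfit(t, y, deg=2)
trend = np.polyval(coeffs, t)
remainder = y - trend

print("fitted coefficients (t^2, t, const):", coeffs)
print("variance before detrending:", y.var())
print("variance after detrending: ", remainder.var())
```

The remainder is what would then be checked for stationarity and modelled with ARMA terms. Fitting the trend and the ARMA errors jointly in one model is generally preferable to this two-step approach, because the trend estimate then accounts for the autocorrelation in the errors.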

Hope it helps!

bastian.abaleiv

69

Thanks a lot. But let me ask a question: what is a polynomial trend, and why do I need to deal with it before fitting ARIMA? Maybe it is a simple problem, but I'm really new to statistics. Many thanks. – yuyang sun Nov 21 '21 at 21:45
ARIMA models require data to be stationary. A trending series is not stationary. In my experience, for the trend shown in the graph, a quadratic polynomial would suffice. After removing the trend from the series, you could assess its stationarity again. Trend removal is carried out first as it is a relatively simple pattern to identify. You could then fit a model to the remainder obtained after eliminating the trend. – bastian.abaleiv Nov 21 '21 at 22:40
-1. Sorry, but unwarranted differencing will not yield stationarity and is asking for trouble. Differencing is warranted in the presence of a unit root but generally not otherwise. – Richard Hardy Nov 22 '21 at 06:12
I agree that unjustified differencing can lead to problems (as covered in the reference link added). In my response I did not say that he should difference as many times as he needs; I only mentioned that it is a useful procedure when it comes to obtaining a stationary series. Thanks for your comment! Best regards! – bastian.abaleiv Nov 23 '21 at 12:49
