I am analyzing some tree physiology data (transpiration) in relation to a number of environmental variables (many of which are predictors such as temperature, PAR and vapour pressure deficit).
I have fine-scale (30 min intervals) data of these various measurements, and there are two objectives I am trying to achieve:
- Use the various predictors (GLM?) to see which of them explain the most variation in transpiration. However, since there is clear autocorrelation at this scale (i.e., transpiration at time $t$ is highly correlated with transpiration at $t+1$, etc.), I am looking to use ARIMA models with regressors.
- Construct a final predictive ARIMA model, chosen from the different candidate models, that explains the most variation in transpiration.
So far, I have noticed that `ccf` plots show negative lags between transpiration and a number of the variables (as expected, since e.g. temperature at time $t$ should influence transpiration at $t+1$).
My questions are:
- How do you perform an ARIMA with transpiration as the response variable and several regressors?
- How do you know which one of the regressors to leave out? Does this have to be done manually in R (as in, add each regressor to the model, and inspect the resulting AIC)?
- Is `auto.arima` the best way to determine the differencing term (etc.)? (E.g., `auto.arima(trans, xreg=temp+vpd+......)`.)
- How do you account for the lag between response variable at time $t$ and predictors at $t-1$?
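
To make the setup concrete, here is a minimal sketch of the kind of model being asked about, on simulated data rather than the real measurements. It uses base R's `arima` (not `auto.arima`) so that it runs without extra packages. The one-step lag, the AR(1) error order, the coefficient values and the helper `lag_by` are all assumptions made for illustration, not a recommended specification.

```r
# Simulated stand-in data (NOT real measurements): transpiration responds
# to the previous step's temperature and VPD, plus autocorrelated noise.
set.seed(1)
n     <- 500
temp  <- as.numeric(arima.sim(list(ar = 0.8), n = n))
vpd   <- as.numeric(arima.sim(list(ar = 0.7), n = n))
noise <- 0.5 * as.numeric(arima.sim(list(ar = 0.5), n = n))

# Hypothetical helper: shift x by k steps so position t holds x at t - k.
lag_by <- function(x, k) c(rep(NA, k), head(x, -k))

temp_l1 <- lag_by(temp, 1)
vpd_l1  <- lag_by(vpd, 1)
trans   <- 2 * temp_l1 + 1 * vpd_l1 + noise  # first value is NA

# Drop the rows lost to lagging, then bind the regressors into a matrix
# (xreg expects one column per regressor, not a sum of them).
keep <- complete.cases(trans, temp_l1, vpd_l1)
y    <- trans[keep]
X    <- cbind(temp_l1 = temp_l1[keep], vpd_l1 = vpd_l1[keep])

# Regression with AR(1) errors; the order c(1, 0, 0) is assumed here.
fit_both <- arima(y, order = c(1, 0, 0), xreg = X)
fit_temp <- arima(y, order = c(1, 0, 0),
                  xreg = X[, "temp_l1", drop = FALSE])

# Compare candidates fitted to the same response; lower AIC is preferred.
aic <- c(both = AIC(fit_both), temp_only = AIC(fit_temp))
print(aic)
```

The regressors are lagged by hand before fitting, so the model itself only ever sees predictors already aligned with the response at time $t$. Candidate regressor sets are then compared by refitting with different columns of `X`.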
http://stats.stackexchange.com/questions/77285/estimate-single-arima-for-multiple-timeseries
http://stackoverflow.com/questions/20225181/applying-models-to-multiple-time-series
http://stats.stackexchange.com/questions/23036/estimating-same-model-over-multiple-time-series
– Mox Dec 18 '14 at 19:58