I'm interested in modelling a time series of temperature data across several years. The data are hourly observations, so I have variables for year, month, day, and time.
I found a great example of this by Gavin Simpson (found here). The blog post only considers correlation within year, whereas I have to deal with correlation within year and within day.
How can I best account for this correlation with gamm? Gavin uses the following code:
modar2 <- gamm(apparentTemperature ~ s(month, bs = "cc", k = 12) + s(time, k = 20),
               data = timetemp,
               correlation = corARMA(form = ~ 1 | year, p = 2),
               control = ctrl)
Where should I pass variables to account for correlation within day?
For reference, here is a sample of my data:
tibble::tribble(
~created_at, ~time, ~month, ~year, ~apparentTemperature,
"2014-01-03 09:30:28", 9.5, 1, 2014, -17.87,
"2014-01-03 10:13:43", 10.2166666666667, 1, 2014, -17.87,
"2014-01-03 12:19:32", 12.3166666666667, 1, 2014, -16.14,
"2014-01-03 12:44:04", 12.7333333333333, 1, 2014, -20.24,
"2014-01-03 13:09:38", 13.15, 1, 2014, -20.24,
"2014-01-03 13:39:00", 13.65, 1, 2014, -20.44
)
corAR1()) rather than an ARMA, as that is much more efficient. Then, if that fits, look at the normalized residuals to see whether you still have remaining autocorrelation. You could also fit without the AR and check that model's residuals. If you go that route, see bam() in mgcv. – Gavin Simpson Apr 20 '18 at 03:09
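The workflow suggested in the comment (fit with an AR(1), then inspect the normalized residuals) might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame `dat` is simulated stand-in data rather than the real temperature series, the basis sizes `k` are arbitrary, and the AR(1) is nested within year only, so it does not by itself settle how within-day correlation should be specified.

```r
library(mgcv)  # gamm(); attaches nlme, which provides corAR1()

set.seed(1)

# Hypothetical stand-in data: two years, 200 ordered observations per year.
n_per_year <- 200
idx <- seq_len(n_per_year)
dat <- data.frame(
  year  = rep(c(2014, 2015), each = n_per_year),
  month = rep(1 + 11 * (idx - 1) / (n_per_year - 1), times = 2),
  time  = rep((idx * 7) %% 24, times = 2)  # hour of day, cycling through 0..23
)

# Simulated AR(1) noise within each year (an assumption for illustration only).
noise <- c(arima.sim(list(ar = 0.6), n = n_per_year),
           arima.sim(list(ar = 0.6), n = n_per_year))
dat$apparentTemperature <- -10 * cos(2 * pi * (dat$month - 1) / 11) +
  3 * sin(2 * pi * dat$time / 24) + as.numeric(noise)

# GAMM with AR(1) errors nested within year; rows must be in time order
# within each year because the form uses the observation order.
m_ar1 <- gamm(apparentTemperature ~ s(month, bs = "cc", k = 8) + s(time, k = 10),
              data = dat,
              correlation = corAR1(form = ~ 1 | year))

# Normalized residuals account for the fitted correlation structure;
# remaining structure in their ACF suggests the AR(1) is not enough.
res <- resid(m_ar1$lme, type = "normalized")
acf_res <- acf(res, plot = FALSE)

# Estimated AR(1) coefficient on its natural scale.
phi <- coef(m_ar1$lme$modelStruct$corStruct, unconstrained = FALSE)
```

Calling `plot(acf_res)` (and `pacf(res)`) on the real data would show whether autocorrelation remains after the AR(1) term.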