
I am new to working with time series and have tried several methods on my data, including SARIMAX, Croston and forecastHybrid. The most accurate result I've gotten so far is with stlf(), based on "Explain the croston method of R" and https://laurentlsantos.github.io/forecasting/seasonal-and-trend-decomposition-with-loess-forecasting-model-stlf.html

For reproducibility, the data for this project can be found here: https://github.com/Rtse716/Time-Series-Data/blob/main/test_ts.csv

Without going into detail, the project uses daily data to predict observations of an event within a geographical grid cell. The data provided is for one of the cells; all of the cells have sporadic observations and long periods with zero observations. The columns in the CSV include the date, the number of observations and the Arctic sea ice extent.

Here is the code for my project:

Convert the data to a daily time series:

test_ts <- read.csv("test_ts.csv")   # columns: Date, number of observations, sea ice extent
g1_ts2 <- ts(test_ts, frequency = 365)
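Because ts() passes a data frame through data.matrix(), a character Date column ends up as integer codes inside the series. As a sketch on simulated data (the data frame and the name g1_daily are hypothetical, not the question's data), one can keep only the numeric columns and give the series a calendar start. Note that frequency = 365 ignores leap days, so the calendar alignment drifts slightly over long spans.

```r
# Hypothetical data frame shaped like test_ts.csv (Date, observations,
# extent); all values are simulated for illustration only.
set.seed(42)
df <- data.frame(
  Date = format(seq(as.Date("2010-01-01"), by = "day", length.out = 730)),
  observations = rpois(730, lambda = 0.2),
  extent = 10 + sin(2 * pi * (1:730) / 365)
)

# Keep only numeric columns so the Date string is not turned into
# integer codes, and start the series at (year, day-of-year)
start_date <- as.Date(df$Date[1])
g1_daily <- ts(
  df[, c("observations", "extent")],
  frequency = 365,
  start = c(as.numeric(format(start_date, "%Y")),
            as.numeric(format(start_date, "%j")))
)
```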

Use 80% of the data as training data:

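# Chronological split: the first 80% of rows train, the last 20% test
data_split <- rsample::initial_time_split(g1_ts2, prop = 0.80)
train <- rsample::training(data_split)
test  <- rsample::testing(data_split)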
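For time series, the training data should precede the test data in time: rsample::initial_split() samples rows at random, whereas rsample::initial_time_split() keeps the original order. A base-R sketch of a chronological 80/20 split with window(), on simulated zero-heavy daily counts (not the question's data):

```r
# Simulated zero-heavy daily counts standing in for the observation column
set.seed(11)
y <- ts(rpois(1000, lambda = 0.3) * rbinom(1000, 1, 0.5), frequency = 365)

# Chronological 80/20 split: every test point comes after every training point
n_train <- floor(0.8 * length(y))
train_part <- window(y, end = time(y)[n_train])
test_part  <- window(y, start = time(y)[n_train + 1])
```

Because window() keeps the ts attributes, the training part can be passed straight to stlf() and the test part lines up in time with the forecast horizon.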

Apply fANCOVA::loess.as(), which selected a span of 0.7700653:

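# Local regression of observations (column 2) on sea ice extent (column 3);
# user.span = NULL lets loess.as() choose the span automatically
LoessOptim <- fANCOVA::loess.as(train[, 3], train[, 2], user.span = NULL, plot = FALSE)
LoessOptim$pars$span   # selected span: 0.7700653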
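fANCOVA::loess.as() picks the span automatically; per its documentation the criterion is AICc by default, with generalized cross-validation (GCV) available via criterion = "gcv". The sketch below illustrates the GCV idea with base stats::loess() on simulated data, minimising GCV(span) = n * RSS / (n - tr(H))^2 over the span with optimize(). It is not loess.as()'s actual code, and the search interval c(0.1, 1) and degree = 1 are assumptions.

```r
# Simulated predictor/response pair (hypothetical stand-ins for extent
# and observations)
set.seed(7)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# GCV score for a given span: n * RSS / (n - trace of the hat matrix)^2
gcv <- function(span) {
  fit <- loess(y ~ x, span = span, degree = 1)
  n <- length(y)
  rss <- sum(residuals(fit)^2)
  n * rss / (n - fit$trace.hat)^2
}

# One-dimensional search for the span with the lowest GCV score
best <- optimize(gcv, interval = c(0.1, 1))
best_span <- best$minimum
fit <- loess(y ~ x, span = best_span, degree = 1)
```

optimize() only guarantees a local minimum, so with a bumpy GCV curve a coarse grid search over spans first can be a safer starting point.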

Check residuals:

forecast::checkresiduals(LoessOptim$residuals)
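checkresiduals() reports a Ljung-Box test alongside its plots. The same test is available in base R as Box.test(); the sketch below contrasts simulated white noise with simulated AR(1) residuals (lag = 10 is an arbitrary choice, not taken from the question):

```r
set.seed(3)
white_noise <- rnorm(300)                               # uncorrelated residuals
ar_resid <- arima.sim(model = list(ar = 0.7), n = 300)  # autocorrelated residuals

lb_white <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_ar    <- Box.test(ar_resid, lag = 10, type = "Ljung-Box")

# A small p-value means the residuals still contain autocorrelation,
# i.e. structure the model has not captured
c(white = lb_white$p.value, ar1 = lb_ar$p.value)
```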

[Image: checkresiduals() diagnostic plots for the loess residuals]

Model:

library(forecast)
stlf_model <- stlf(ts(train[, 2], frequency = 365), s.window = 365, robust = TRUE,
                   t.window = 0.7700653,  # loess.as() span; stl() expects an odd number of lags here
                   method = "arima")

stlf_model$mean <- pmax(stlf_model$mean, 0)   # clip negative forecasts to zero
fc_stlf <- forecast(stlf_model, robust = TRUE)

accuracy(fc_stlf[["mean"]], g1_ts2[, 2])
summary(fc_stlf)

Accuracy came out to be 0.6712329. Summary results are:

Error measures:
                   ME     RMSE      MAE MPE MAPE      MASE      ACF1
Training set 10.88896 11745.37 4076.881 NaN  Inf 0.7001504 0.2055367
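The NaN and Inf entries come from MPE and MAPE dividing each error by the actual value, which is zero on many days in this data. A minimal sketch of the standard definitions on made-up numbers reproduces that pattern. Here MASE is scaled by the in-sample MAE of a one-step naive forecast; forecast's accuracy() may use a seasonal naive scale for seasonal series, so its numbers can differ.

```r
# Made-up actual values (with zeros, as in the question's data) and fitted values
actual <- c(0, 0, 3, 5, 0, 2)
fitted <- c(-0.5, 1, 2, 4, 0, 2)
e  <- actual - fitted          # errors
pe <- 100 * e / actual         # percentage errors: x/0 gives Inf, -Inf or NaN

naive_mae <- mean(abs(diff(actual)))  # in-sample MAE of a one-step naive forecast
measures <- c(
  ME   = mean(e),
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MPE  = mean(pe, na.rm = TRUE),       # drops 0/0, but Inf + (-Inf) is NaN
  MAPE = mean(abs(pe), na.rm = TRUE),  # any Inf makes the mean Inf
  MASE = mean(abs(e)) / naive_mae
)
measures
```

Scale-based measures such as MAE, RMSE and MASE stay finite with zero observations, which is why they are usually preferred for intermittent series like this one.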

Here is what the forecast plot looks like (the true values are in red):

autoplot(fc_stlf)+autolayer(g1_ts2[,2])

[Image: autoplot() of the stlf forecast, with the true values overlaid in red]

What can I do to improve the forecast? Is there a way to include exogenous variables, such as the extent values in the CSV, in stlf()?
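According to the forecast package documentation, forecast.stl() (which stlf() calls, passing extra arguments through) accepts xreg and newxreg when method = "arima"; it is worth checking ?forecast.stl for the installed version, and future extent values would have to be supplied for the whole horizon. The base-R sketch below shows the same structure on simulated data: STL decomposition, a regression with ARIMA errors on the seasonally adjusted series, then re-seasonalising with the last seasonal cycle. The fixed AR(1) order, the monthly-style frequency of 12 and the hypothetical extent series are assumptions, not the question's setup.

```r
# Simulated data; 'extent' is a hypothetical stand-in for sea ice extent
set.seed(1)
freq <- 12; n_train <- 120; h <- 24; n <- n_train + h
extent <- 10 + sin(2 * pi * seq_len(n) / 50) + rnorm(n, sd = 0.1)
season <- rep(sin(2 * pi * (1:freq) / freq), length.out = n)
y <- 5 + 2 * extent + 3 * season + rnorm(n)

y_train  <- ts(y[1:n_train], frequency = freq)
x_train  <- extent[1:n_train]
x_future <- extent[(n_train + 1):n]   # regressor values over the forecast horizon

# 1. Robust STL decomposition of the training series
dec <- stl(y_train, s.window = "periodic", robust = TRUE)
seasonal <- dec$time.series[, "seasonal"]
y_sa <- y_train - seasonal            # seasonally adjusted series

# 2. Regression with AR(1) errors on the seasonally adjusted series
fit <- arima(y_sa, order = c(1, 0, 0), xreg = x_train)

# 3. Forecast the adjusted series from the future regressor values
pred_sa <- predict(fit, n.ahead = h, newxreg = x_future)$pred

# 4. Re-seasonalise by repeating the last observed seasonal cycle
last_cycle <- tail(as.numeric(seasonal), freq)
fc <- as.numeric(pred_sa) + rep(last_cycle, length.out = h)
fc <- pmax(fc, 0)                     # counts cannot be negative
```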

I would greatly appreciate any input. Thank you.
