I am new to working with time series and have tried several methods on my data, including SARIMAX, Croston and forecastHybrid. The most accurate result I've gotten so far is with stlf(), based on the question "Explain the croston method of R" and https://laurentlsantos.github.io/forecasting/seasonal-and-trend-decomposition-with-loess-forecasting-model-stlf.html
For reproducibility, the data for this project can be found here: https://github.com/Rtse716/Time-Series-Data/blob/main/test_ts.csv
Without getting into detail, the project contains daily data for the purpose of predicting observations of an event within a geographical grid cell. The data provided is for one of the cells, all of which have sporadic rates of observations and long periods of zero observations. The columns in the CSV include the date, the number of observations and the Arctic sea ice extent.
Here is the code for my project:
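Load the packages and convert the data to a daily time series:
library(forecast)  # stlf(), forecast(), accuracy(), checkresiduals(), autoplot()
library(rsample)   # initial_split(), training(), testing()
# Assumes the CSV linked above was saved locally as test_ts.csv
test_ts <- read.csv("test_ts.csv")
g1_ts2 <- ts(test_ts, frequency = 365)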
Choose 80% of the data as the training data:
data_split <- initial_split(g1_ts2, prop = 0.80)
train <- training(data_split)
test <- testing(data_split)
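One thing worth noting about the split: initial_split() from rsample assigns rows to the training set at random, so the training and test sets are interleaved in time. A minimal sketch of a chronological 80/20 split in base R is below, assuming the rows are already sorted by date. The series `y` is synthetic placeholder data, and the names `n_train`, `train_y` and `test_y` are illustrative, not part of the project code.

```r
# Synthetic daily counts standing in for the observation column (placeholder data)
set.seed(1)
y <- ts(rpois(730, lambda = 2), frequency = 365)

# Number of observations that go into the training set (first 80% in time order)
n_train <- floor(0.8 * length(y))

# window() cuts the series by time index, so the ts attributes are preserved
train_y <- window(y, end = time(y)[n_train])
test_y  <- window(y, start = time(y)[n_train + 1])

# The two pieces cover the whole series and do not overlap in time
length(train_y) + length(test_y) == length(y)
```

With a split like this, the forecast horizon can be set to length(test_y) so that the forecasts line up with the held-out period.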
Apply the loess.as() function, which returned 0.7700653:
LoessOptim <- fANCOVA::loess.as(train[, 3], train[, 2],
                                user.span = NULL, plot = FALSE)
Check residuals:
forecast::checkresiduals(LoessOptim$residuals)
Model:
stlf_model <- stlf(ts(train[, 2], frequency = 365), s.window = 365, robust = TRUE,
                   t.window = 0.7700653, method = "arima")
stlf_model$mean <- pmax(stlf_model$mean,0)
fc_stlf <- forecast(stlf_model, robust=TRUE)
accuracy(fc_stlf[["mean"]], g1_ts2[,2])
summary(fc_stlf)
Accuracy came out to be 0.6712329. Summary results are:
Error measures:
                   ME     RMSE      MAE MPE MAPE      MASE      ACF1
Training set 10.88896 11745.37 4076.881 NaN  Inf 0.7001504 0.2055367
Here is what the forecast plot looks like (true values are in red):
autoplot(fc_stlf) + autolayer(g1_ts2[, 2])
What can I do to improve the forecast? Is there a way to include exogenous variables with stlf (such as the extent values in the csv)?
I would greatly appreciate any input. Thank you.

