
My dataset consists of a time series (40 points) with multiple variables:

                    A      B       C       D     ...   Target
    Release Date                                            
    2022-06-01    0.008   15490   69600   16950  ...  1.044659
    2022-05-01    0.007   14500   78920   19874  ...  1.035948
        ...

My goal is to forecast the target value using all of the features combined.

I tried to predict the last column with a simple linear regression, but I got a fluctuating score with each run.

Is it more correct to treat this problem as a Multivariate Time Series / Temporal Convolutional Neural Network?

  • What do you mean by "I got a fluctuating score with each run"? Do you mean that you refit your model on the same data multiple times and got different results? That should not happen. Or did you refit with different data and got different results? That is normal, what else would you expect? Also, do you know your features for the forecast period, or do you need to forecast them as well? Finally, it looks like your data are monthly, is that correct? Do you have gaps in your time series? – Stephan Kolassa Jul 17 '22 at 16:18
  • I refit with the same data while re-running the train/test split. I don't know my features for the forecast period. I only need a forecast for the target value, but the features need to be the input of the forecast, not the historical values of the target. Yes, the data is monthly, and I don't have any gaps in the time series. – mourad kchaw Jul 17 '22 at 17:02
  • The train/test split is typically done randomly, so if you do not use the same RNG seed each time you split, you will get different splits, so your model is trained on different data. This is expected. If your model fluctuates a lot, then you are probably overfitting. I will post an answer. – Stephan Kolassa Jul 17 '22 at 17:04
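The comments above attribute the run-to-run variation to the random split. A minimal sketch of that effect, using synthetic data (not the asker's) and plain least squares via NumPy: shuffled splits with different seeds give different scores, while a chronological split (train on the earliest rows, test on the most recent) gives the same score every run and never trains on months that come after the test months. The helper names are hypothetical. The sketch assumes rows are sorted oldest first; the printout in the question is newest first, so with pandas you would call `df.sort_index()` beforehand. In scikit-learn, `train_test_split(..., random_state=0)` fixes the seed and `shuffle=False` gives the chronological version.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, solved via numpy.linalg.lstsq."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def random_split_score(X, y, test_size, seed):
    """Shuffle rows with the given seed, hold out test_size rows (ignores time order)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, train = idx[:test_size], idx[test_size:]
    return mae(y[test], ols_fit_predict(X[train], y[train], X[test]))

def chronological_split_score(X, y, test_size):
    """Train on the earliest rows, test on the most recent test_size rows."""
    cut = len(y) - test_size
    return mae(y[cut:], ols_fit_predict(X[:cut], y[:cut], X[cut:]))

# Synthetic stand-in for 40 monthly rows with 4 features (NOT the asker's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([0.5, -0.2, 0.0, 0.1]) + rng.normal(scale=0.3, size=40)

print([round(random_split_score(X, y, 8, s), 3) for s in range(5)])  # changes with the seed
print(round(chronological_split_score(X, y, 8), 3))                   # same on every run
```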

2 Answers


In a multivariate time series, each variable depends not only on its own past values but also on the other variables. This dependency can be used to forecast future values, so if A, B, C, ... are related to each other, you should use a multivariate time series model.

To check whether they are related, you can plot a heatmap of the correlations between the features.
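A minimal sketch of the correlation check, assuming the features are columns of a NumPy array with one row per month; the data and the `correlation_table` helper are hypothetical. It prints the matrix as text; the heatmap mentioned above would be a plot of this same matrix (for example with matplotlib's `imshow` or seaborn's `heatmap`). Note that Pearson correlation only measures linear, same-month association, so it will not reveal a feature that affects the target with a lag.

```python
import numpy as np

def correlation_table(data, names):
    """Pearson correlation between the columns of a 2-D array (rows = time points)."""
    corr = np.corrcoef(data, rowvar=False)
    lines = [" " * 8 + "".join(f"{n:>8}" for n in names)]
    for name, row in zip(names, corr):
        lines.append(f"{name:>8}" + "".join(f"{v:8.2f}" for v in row))
    return corr, "\n".join(lines)

# Hypothetical features: B is built from A, C is unrelated noise.
rng = np.random.default_rng(1)
a = rng.normal(size=40)
b = 2 * a + rng.normal(scale=0.5, size=40)
c = rng.normal(size=40)
corr, table = correlation_table(np.column_stack([a, b, c]), ["A", "B", "C"])
print(table)
```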

For example, if you live in a country whose currency depends on the dollar, then when you want to predict the price of anything you have to include the dollar price, because everything depends on the dollar.

Date     Egg-Price (local currency)   Dollar-Price
01-06              500                    10
01-07              600                    12
01-08              ???                     9

If you try to predict the egg price for 01-08 without using the dollar price, your prediction will be wrong: without the dollar price the pattern is always up, but the dollar price went down, so the egg price will go down too.

I hope I explained this clearly.


40 data points is not a lot to go on, so you should use some sort of regularization. I would recommend running a lasso regression of your target on the predictors. Once you have that, you can forecast your predictors and feed those forecasts into the lasso model to get forecasts for your target.
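A minimal sketch of the lasso step, written with NumPy only (cyclic coordinate descent on standardized predictors, minimizing the same objective as scikit-learn's `Lasso`, applied to standardized columns) so every line is visible. In practice you would more likely use `sklearn.linear_model.Lasso`, or `LassoCV` with `cv=TimeSeriesSplit(...)` to choose the penalty. The penalty `alpha=0.05`, the helper names and the synthetic data are assumptions, not part of the answer. The key point is the last step, where *forecasts* of the predictors, not their observed values, are fed into the fitted model.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, alpha, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))||y - b0 - Xs @ b||^2 + alpha * ||b||_1,
    where Xs is X standardized column by column."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                      # constant columns become all-zero after centering
    Xs = (X - mu) / sd
    n, p = Xs.shape
    col_sq = (Xs ** 2).sum(axis=0) / n     # 1 for ordinary columns, 0 for constant ones
    b = np.zeros(p)
    resid = y - y.mean()                   # intercept is mean(y) because Xs is centered
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = b[j]
            rho = Xs[:, j] @ resid / n + col_sq[j] * old
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            if b[j] != old:
                resid -= Xs[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return {"mu": mu, "sd": sd, "intercept": y.mean(), "coef": b}

def lasso_predict(model, X):
    Xs = (np.asarray(X, float) - model["mu"]) / model["sd"]
    return model["intercept"] + Xs @ model["coef"]

# Synthetic stand-in: 40 months, 5 predictors, only the first two matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=40)
model = lasso_fit(X, y, alpha=0.05)
print(np.round(model["coef"], 3))  # irrelevant predictors shrink toward or to 0

# Forecasting step: plug in *forecasts* of the predictors for the coming months.
X_future_forecast = rng.normal(size=(3, 5))  # placeholder for real predictor forecasts
print(lasso_predict(model, X_future_forecast))
```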

To forecast the predictors, I recommend using something simple. In R, I would recommend forecast::ets() for automatic state space exponential smoothing, but I believe this is not available in Python. An auto_arima fit may work (be sure to tell Python your data are monthly, so auto_arima can pick up on any seasonality).
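The tools named above (`forecast::ets()`, `auto_arima`) are not reproduced here. As a stand-in, this sketch uses simple exponential smoothing, the simplest member of the exponential smoothing family, with the smoothing weight chosen by grid search on one-step-ahead squared errors. It gives a flat forecast and ignores trend and seasonality, so treat it only as a placeholder; for monthly data with seasonality, something like `pmdarima.auto_arima(y, seasonal=True, m=12)` is closer to what is described (`m=12` is how you tell it the data are monthly). Helper names and data are hypothetical.

```python
import numpy as np

def ses_sse(y, alpha):
    """Run simple exponential smoothing; return one-step-ahead SSE and the final level."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        err = obs - level
        sse += err ** 2
        level = level + alpha * err
    return sse, level

def ses_forecast(y, horizon, alphas=np.linspace(0.01, 0.99, 99)):
    """Pick alpha with the lowest one-step SSE; forecast is the final level, repeated."""
    y = np.asarray(y, float)
    best_alpha = min(alphas, key=lambda a: ses_sse(y, a)[0])
    _, level = ses_sse(y, best_alpha)
    return np.full(horizon, level), float(best_alpha)

# Hypothetical predictor history: 40 months of 3 random-walk series.
rng = np.random.default_rng(3)
X_hist = np.cumsum(rng.normal(size=(40, 3)), axis=0)

# Forecast each predictor separately, then stack into a (horizon, n_predictors) matrix
# that could be fed into the regression model for the target.
X_future = np.column_stack([ses_forecast(X_hist[:, j], horizon=6)[0]
                            for j in range(X_hist.shape[1])])
print(X_future.shape)  # (6, 3)
```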

You might be able to improve on the forecasts by fitting another time series model on the residuals from the lasso fit, forecasting this out and adding it to the lasso forecast. Again, use auto_arima on the residuals, and be sure to define your data as monthly.
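A minimal sketch of the residual correction, assuming an AR(1) model for the residuals as a simple stand-in for `auto_arima`: fit the regression, fit e[t] = c + phi * e[t-1] to its in-sample residuals, iterate that forward, and add the result to the regression forecast. Plain least squares stands in for the lasso to keep the block self-contained, and the data are synthetic with deliberately autocorrelated errors; all names are hypothetical.

```python
import numpy as np

def ar1_fit(e):
    """Least-squares fit of e[t] = c + phi * e[t-1] + noise."""
    e = np.asarray(e, float)
    A = np.column_stack([np.ones(len(e) - 1), e[:-1]])
    (c, phi), *_ = np.linalg.lstsq(A, e[1:], rcond=None)
    return c, phi

def ar1_forecast(e, horizon):
    """Iterate the fitted AR(1) forward from the last observed residual."""
    c, phi = ar1_fit(e)
    out, last = [], float(np.asarray(e)[-1])
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return np.array(out)

# Synthetic data: regression signal plus AR(1) errors; last h rows are the "future".
rng = np.random.default_rng(4)
n, h = 40, 3
X = rng.normal(size=(n + h, 2))
err = np.zeros(n + h)
for t in range(1, n + h):
    err[t] = 0.7 * err[t - 1] + rng.normal(scale=0.3)
y = 2.0 + X @ np.array([1.0, -0.5]) + err

A = np.column_stack([np.ones(n), X[:n]])
beta, *_ = np.linalg.lstsq(A, y[:n], rcond=None)
resid = y[:n] - A @ beta
A_future = np.column_stack([np.ones(h), X[n:]])  # known here; in practice, forecast predictors
combined = A_future @ beta + ar1_forecast(resid, h)
print(combined)
```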

If your predictors might also influence each other, a vector autoregression might be called for. Or perhaps a model on lagged values of the predictors. Your knowledge of the underlying process should guide you here.
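A minimal sketch of a first-order vector autoregression fitted by least squares: each series is regressed on the previous month's values of *all* series, so the matrix `A` captures how the predictors influence each other. The lag order of 1, the helper names and the synthetic coefficients are assumptions for illustration; statsmodels provides a full implementation (`statsmodels.tsa.api.VAR`, with lag selection via `fit(maxlags=..., ic="aic")`). Keep the sample size in mind: a VAR(1) on k series has k(k+1) coefficients, which grows quickly relative to 40 observations.

```python
import numpy as np

def simulate_var1(A, c, T, noise_sd, seed):
    """Generate T steps of y[t] = c + A @ y[t-1] + Gaussian noise (synthetic data only)."""
    rng = np.random.default_rng(seed)
    A, c = np.asarray(A, float), np.asarray(c, float)
    Y = np.zeros((T, len(c)))
    for t in range(1, T):
        Y[t] = c + A @ Y[t - 1] + rng.normal(scale=noise_sd, size=len(c))
    return Y

def var1_fit(Y):
    """Least-squares VAR(1) on Y of shape (T, k): returns intercepts c and k x k matrix A."""
    Y = np.asarray(Y, float)
    Z = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])  # (T-1, k+1): [1, y[t-1]]
    B, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)       # (k+1, k)
    c, A = B[0], B[1:].T  # A[i, j]: effect of series j at t-1 on series i at t
    return c, A

def var1_forecast(Y, horizon):
    """Iterate the fitted VAR(1) forward from the last observation."""
    c, A = var1_fit(Y)
    last = np.asarray(Y, float)[-1]
    out = []
    for _ in range(horizon):
        last = c + A @ last
        out.append(last)
    return np.array(out)

A_true = np.array([[0.5, 0.2], [0.0, 0.3]])  # series 2 at t-1 feeds into series 1 at t
c_true = np.array([1.0, 0.5])
Y = simulate_var1(A_true, c_true, T=40, noise_sd=0.1, seed=5)
c_hat, A_hat = var1_fit(Y)
print(np.round(A_hat, 2))
print(var1_forecast(Y, horizon=3))
```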

Whatever you do, benchmark your forecasts of the target against some extremely simple alternatives, e.g., the overall historical mean of the target, or a simple auto_arima model applied to the target alone, without any predictors. Such benchmarks may already be extremely hard to beat; in time series forecasting simple models often outperform more complex ones.
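A minimal sketch of the benchmarking step, assuming a single holdout of the last `horizon` months and mean absolute error as the score; both choices, the helper names and the data are hypothetical. It scores the historical mean plus a naive last-value forecast, and optionally your own model's forecast for the same months. An `auto_arima` fit on the target alone would be another benchmark to add.

```python
import numpy as np

def mae(actual, forecast):
    return float(np.mean(np.abs(np.asarray(actual, float) - np.asarray(forecast, float))))

def benchmark_maes(y, horizon, candidate_forecast=None):
    """Hold out the last `horizon` points and score simple benchmarks on them."""
    y = np.asarray(y, float)
    train, test = y[:-horizon], y[-horizon:]
    scores = {
        "historical_mean": mae(test, np.full(horizon, train.mean())),
        "naive_last_value": mae(test, np.full(horizon, train[-1])),
    }
    if candidate_forecast is not None:
        scores["candidate"] = mae(test, candidate_forecast)
    return scores

# Synthetic stand-in for the Target column.
rng = np.random.default_rng(6)
target = 1.0 + np.cumsum(rng.normal(scale=0.01, size=40))
print(benchmark_maes(target, horizon=6))
```

If a more complex model cannot beat these numbers on the same holdout, its extra complexity is not paying off.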

Stephan Kolassa
  • I only chose 40 points because I wanted all the features to cover the same date period. The historical data of a few features (40 points) is much shorter than that of the others (150 points). Is it possible to use all 150 points while filling in the missing values?

    Also, in my case the predictors might influence each other. In that case, should a vector autoregression be used instead of auto_arima to forecast the predictors?

    – mourad kchaw Jul 17 '22 at 17:50
  • You can always try backcasting predictors. But again, I would strongly recommend you start with a simple model. Fit an ARIMA model to your target (even better, switch to R and use ets), and once you have that, think about more complex models. You will be surprised how hard it will be to improve on the simple models - especially if you have to impute, backcast or forecast your predictors. – Stephan Kolassa Jul 17 '22 at 17:59
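One way to backcast, sketched under the assumption that the series behaves similarly when time is reversed: flip the series, fit an AR(1) by least squares, forecast the flipped series, and flip the result back. The `backcast` helper, the AR(1) choice and the data are hypothetical. As the comment warns, the filled-in values are model output rather than observations, and their errors grow the further back you go.

```python
import numpy as np

def backcast(y, n_back):
    """Estimate n_back values before y[0] by forecasting the time-reversed series with AR(1)."""
    r = np.asarray(y, float)[::-1]  # reversed: y[0] is now the last observation
    A = np.column_stack([np.ones(len(r) - 1), r[:-1]])
    (c, phi), *_ = np.linalg.lstsq(A, r[1:], rcond=None)
    out, last = [], r[-1]
    for _ in range(n_back):
        last = c + phi * last
        out.append(last)
    return np.array(out[::-1])  # back to forward time order: earliest value first

# Hypothetical short predictor (40 points) padded to match a 150-point series.
rng = np.random.default_rng(7)
short = 10 + np.cumsum(rng.normal(size=40))
filled = np.concatenate([backcast(short, 110), short])
print(filled.shape)  # (150,)
```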