I'm using SARIMAX to predict a cost variable (Y) from its own past values. I chose SARIMAX because I also have additional predictor variables X that help predict Y. For now I'm working on a synthetic dataset that I generated randomly, so it has no seasonality.
My goal is to build a framework that I will reuse for my future analysis on the real data, so I'm not interested in the final results here: on randomly generated data they will be meaningless.
My starting dataset has 2000 observations and 9 variables. Each row is a daily subscription to the service, so the dataframe index runs from 1 January 2021 to 31 December 2023 and can contain duplicate dates.
The test dataset on which we want to make predictions, on the other hand, corresponds to the year 2024, so we will have 365 observations.
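Since the index can contain duplicate dates, the data probably needs to be collapsed to one row per day before a time series model can use it. Below is a minimal sketch of that step on made-up data. The column names, the choice of averaging rows that share a date, the interpolation of days with no rows, and the hold-out of 2023 as a validation year are all assumptions for illustration, not part of my real pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the subscription-level data: 2000 rows, dates can repeat.
rng = np.random.default_rng(42)
all_days = pd.date_range("2021-01-01", "2023-12-31", freq="D")
pos = rng.integers(0, len(all_days), size=2000)
raw = pd.DataFrame({
    "date": all_days[pos],
    "Y": rng.normal(100, 10, size=2000),   # hypothetical cost values
    "X1": rng.normal(size=2000),           # hypothetical predictor
})

# One row per calendar day: average all rows that share a date (assumption;
# a sum may be more appropriate for a cost).
daily = raw.groupby("date")[["Y", "X1"]].mean()

# Put the series on a complete daily calendar; days without rows become NaN,
# then get filled by linear interpolation (again an assumption).
daily = daily.reindex(all_days).interpolate(limit_direction="both")

# Chronological split: never shuffle a time series.
train = daily.loc[:"2022-12-31"]
valid = daily.loc["2023-01-01":]
```

The validation year plays the role that 2024 will play later, so the same code path can be exercised before the real test data is used.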
How do I parameterize SARIMAX during training?
Code:
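import pmdarima as pm

# `data` is the training DataFrame described above (2021-2023).
FEATURES = ['X1', 'X2', 'X3', 'X4', 'X5',
            'Lag1', 'Lag2']
TARGET = 'Y'

SARIMAX_model = pm.auto_arima(
    data[TARGET],
    X=data[FEATURES],      # older pmdarima releases named this argument `exogenous`
    start_p=1, start_q=1,
    test='adf',            # ADF test is used to choose d
    max_p=3, max_q=3,
    m=366,                 # seasonal period: daily data, yearly cycle
    start_P=0, seasonal=True,
    d=None, D=1,           # d is estimated; one seasonal difference is forced
    trace=False,
    error_action='ignore',
    suppress_warnings=True,
    stepwise=True,
)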
I set m=366 because my data is daily, but with this setting the training is very slow.
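A commonly suggested workaround for long seasonal periods is to drop the seasonal ARIMA part and instead describe the yearly cycle with a few Fourier (sine/cosine) terms passed as extra exogenous regressors. The sketch below only builds those terms with numpy and pandas. The helper name `fourier_terms`, the period of 365.25 days, the number of harmonics and the origin date are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def fourier_terms(index, period=365.25, order=3, origin="2021-01-01"):
    """Return sin/cos pairs for harmonics 1..order of a cycle of `period` days.

    Time is counted in days from a fixed `origin`, so terms built for the
    training range and for the forecast range stay in phase.
    """
    t = np.asarray((index - pd.Timestamp(origin)).days, dtype=float)
    cols = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols[f"sin_{k}"] = np.sin(angle)
        cols[f"cos_{k}"] = np.cos(angle)
    return pd.DataFrame(cols, index=index)

train_idx = pd.date_range("2021-01-01", "2023-12-31", freq="D")
test_idx = pd.date_range("2024-01-01", "2024-12-31", freq="D")

F_train = fourier_terms(train_idx)
F_test = fourier_terms(test_idx)

# These columns would then be joined to the FEATURES columns and passed as
# exogenous regressors, with the seasonal search switched off, e.g.
#   pm.auto_arima(y_train, X=X_train.join(F_train), seasonal=False, ...)
# and the matching F_test rows supplied at prediction time.
```

With `seasonal=False` the search only covers the non-seasonal orders, which should be much cheaper than fitting models with a 366-step seasonal lag. Whether a few harmonics are enough would have to be checked on the real data.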
In conclusion, at this stage I'm not interested in the results themselves. I want a framework that I can reuse on my real data, where I will check whether there is seasonality or not.
In general, my goal is to predict the cost of a service over the next year (e.g. the predicted cost of this service in 2024, for a given type of person in a given age range, equals Y).
To make this prediction, I want to use the past values of Y and of my predictors. In addition, I want to analyze whether these predictors actually affect the prediction (for example, whether age is important for predicting the cost of this service). I have also used XGBoost.
– Alessandro Pio Budetti Aug 08 '23 at 08:40