
I am looking to fit Logistic Regression (LR) and Neural Network (NN) models to predict whether there will be an avalanche on a given day (0/1 dependent variable) based on meteorological variables (independent variables). However, I create 100+ secondary features (e.g. Tmax_24h, Tmax_48h, Tmin_24h, Tmin_48h, Rain_24h/48h/72h, Snow_24h/48h/72h, etc.) from 3 raw meteo variables (air temperature, precipitation and wind speed). Doing so introduces a lot of collinearity between my features. From what I understand, having many collinear features is problematic for NN and LR (the fit cannot converge when high collinearity exists between the features).
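Roughly, the feature construction looks like this (a simplified sketch, not my actual script; the file and column names are placeholders):

```python
import pandas as pd

# Raw observations indexed by timestamp; file and column names are placeholders
df = pd.read_csv("meteo.csv", parse_dates=["date"], index_col="date")

features = pd.DataFrame(index=df.index)
for window in ("24h", "48h", "72h"):
    features[f"Tmax_{window}"] = df["air_temp"].rolling(window).max()
    features[f"Tmin_{window}"] = df["air_temp"].rolling(window).min()
    features[f"Rain_{window}"] = df["precip"].rolling(window).sum()
    features[f"WindMax_{window}"] = df["wind_speed"].rolling(window).max()
# ...plus derived features such as daily temperature range, freeze-thaw flags, etc.
```

The overlapping windows are exactly what introduces the strong collinearity between the features.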

So I am wondering: what is the best way to fit an LR or NN model on a dataset that contains multicollinearity? I tried the L1 (lasso) and L2 (ridge) penalty hyperparameters in sklearn's LogisticRegression, but it is still not able to converge. I thought about recursive feature elimination (RFE in sklearn), but that is a backward elimination method: it starts with all features, and with all features the model cannot converge. Is there any "go-to" method to fit an LR or NN model on collinear data?

Here is the convergence warning that I get in Python when I try to fit an LR directly on the 100+ variables: "ConvergenceWarning: The max_iter was reached which means the coef_ did not converge".
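For reference, the fit I am running looks roughly like this (a sketch, not my exact code; X is the 100+ column feature matrix and y the 0/1 avalanche labels):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_days, 100+) feature matrix, y: 0/1 avalanche indicator (assumed to exist)
model = make_pipeline(
    StandardScaler(),              # regularization works best with features on comparable scales
    LogisticRegression(
        penalty="l1",              # lasso; "l2" for ridge
        solver="saga",             # saga supports both L1 and L2 penalties
        C=1.0,                     # inverse of regularization strength
        max_iter=1000,             # raising this alone has not removed the warning
    ),
)
model.fit(X, y)
```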

  • The lasso would be my first approach to this kind of problem. It's a bit strange that it does not converge. Have you tried increasing the maximum number of iterations, or cutting down on your number of variables (how much more helpful will the temperature from 7 days before be if you already have the temperatures from 1, 2, ..., 6 days before)? – Stephan Kolassa Dec 17 '21 at 14:22
  • Yes, I have tried increasing the maximum number of iterations. But up to how many iterations is it still acceptable (500, 1,000, 10,000...)? As for the number of variables, I only go back 5 days. But I have more funky variables like daily temperature range, significant freeze-thaw, etc., which bring me to around 100 features. – Boocaj Dec 17 '21 at 14:55
  • Stop training the model when the model stops improving. https://stats.stackexchange.com/questions/231061/how-to-use-early-stopping-properly-for-training-deep-neural-network – Sycorax Dec 17 '21 at 15:23
  • Are you scaling the data before using lasso/ridge? How are you choosing the shrinkage magnitude? – Sycorax Dec 17 '21 at 16:07
  • Yes, I am scaling the data beforehand. What do you mean by shrinkage magnitude? Are you talking about the penalty strength, i.e. the C hyperparameter in the sklearn package (C corresponds to the "inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization")? – Boocaj Dec 17 '21 at 20:42
  • How are you scaling the data? Yes, I am asking how you are choosing C. – Sycorax Dec 19 '21 at 16:48
  • I am scaling the data with StandardScaler() in Python (it standardizes features by removing the mean and scaling to unit variance). As for the C parameter, I just realized that when I decrease it (e.g. 0.01 instead of 1), I no longer get the convergence warning. I think that is because stronger regularization is applied and fewer features are used in the model (more feature coefficients are set to 0). But I am now wondering HOW I should choose the C value. – Boocaj Dec 19 '21 at 17:53
  • The typical approach is to treat C as a hyperparameter and tune it using a hold-out set or cross-validation. Here's a search to get you started: https://stats.stackexchange.com/search?q=how+to+choose+%5Blasso%5D+answers%3A1+score%3A3 If you're looking for an authoritative reference, you could read Elements of Statistical Learning. https://hastie.su.domains/ElemStatLearn/ – Sycorax Dec 20 '21 at 18:07
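A minimal sketch of that cross-validation approach, assuming the scaler + penalized logistic regression pipeline from the question (TimeSeriesSplit is used here only because the rows are consecutive days; any sensible CV scheme works):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("lr", LogisticRegression(penalty="l1", solver="saga", max_iter=10_000)),
])

# Search C on a log grid; smaller C = stronger regularization (more coefficients pushed to 0)
grid = GridSearchCV(
    pipe,
    param_grid={"lr__C": np.logspace(-3, 2, 11)},
    scoring="neg_log_loss",
    cv=TimeSeriesSplit(n_splits=5),   # rows are consecutive days, so avoid shuffled folds
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```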

0 Answers