I am looking to fit logistic regression (LR) and neural network (NN) models to predict whether there will be an avalanche on a given day (binary 0/1 dependent variable) based on meteorological variables (independent variables). However, I create 100+ secondary features (e.g. Tmax_24h, Tmax_48h, Tmin_24h, Tmin_48h, Rain_24h/48h/72h, Snow_24h/48h/72h, etc.) from 3 raw meteorological variables (air temperature, precipitation, and wind speed). Doing so introduces a lot of collinearity between my features. From what I understand, having many collinear features is problematic for NN and LR (the fit cannot converge when there is high collinearity between features).
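To make the setup concrete, here is a minimal sketch of how such rolling-window features can be built with pandas (the data and column names are invented for illustration; they are not the actual dataset). The overlapping windows are exactly what makes the resulting features highly collinear:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly raw observations: temperature, precipitation, wind speed.
rng = np.random.default_rng(0)
n = 24 * 10  # ten days of hourly data
df = pd.DataFrame({
    "temp": rng.normal(-5, 3, n),
    "precip": rng.exponential(1.0, n),
    "wind": rng.exponential(5.0, n),
})

# Rolling aggregates over 24h, 48h, and 72h trailing windows. Because the
# 48h window contains the 24h window, these columns are strongly correlated.
for hours in (24, 48, 72):
    df[f"Tmax_{hours}h"] = df["temp"].rolling(hours).max()
    df[f"Tmin_{hours}h"] = df["temp"].rolling(hours).min()
    df[f"Rain_{hours}h"] = df["precip"].rolling(hours).sum()

# Inspect the near-1 correlations between the overlapping windows:
print(df.filter(like="Tmax").corr())
```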
So I am wondering: what is the best way to fit an LR or NN model on a dataset that contains multicollinearity? I tried the L1 (Lasso) and L2 (Ridge) penalty hyperparameters in sklearn's LogisticRegression, but it is still not able to converge. I thought about recursive feature elimination (RFE in sklearn), but that method is a backward-elimination analysis, so it starts with all the features and the model cannot converge. Is there any "go-to" method for fitting an LR or NN model on collinear data?
Here is the convergence warning that I get in Python when I try to fit an LR directly on the 100+ variables: `ConvergenceWarning: The max_iter was reached which means the coef_ did not converge`
C. – Sycorax Dec 19 '21 at 16:48

Regarding the C parameter: I just realized that when I decrease it (e.g. 0.01 instead of 1), I no longer get the convergence warning. I think that is because stronger regularization is applied and fewer features are used in the model (more feature coefficients are set to 0). But I am actually wondering: HOW should I choose the C value? – Boocaj Dec 19 '21 at 17:53

Treat C as a hyperparameter and tune it using a hold-out set or cross-validation. Here's a search to get you started: https://stats.stackexchange.com/search?q=how+to+choose+%5Blasso%5D+answers%3A1+score%3A3 If you're looking for an authoritative reference, you could read Elements of Statistical Learning: https://hastie.su.domains/ElemStatLearn/ – Sycorax Dec 20 '21 at 18:07
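The tuning approach suggested above can be sketched with scikit-learn's `LogisticRegressionCV`, which selects C by k-fold cross-validation over a grid; the `saga` solver supports the L1 penalty, which drives redundant collinear columns to zero (the data below is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic collinear design: 50 features built from 3 latent signals.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(400, 50))
y = (latent[:, 0] > 0).astype(int)

# LogisticRegressionCV tries Cs candidate values and keeps the one with the
# best cross-validated score, so C is tuned rather than hand-picked.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="saga", max_iter=5000),
)
model.fit(X, y)
print(model.named_steps["logisticregressioncv"].C_)  # the selected C
```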