I'm looking for a robust way to gradually build up a regression model -- namely, I have a linear base model with a robust set of predictors for which I'm fairly certain I have near-optimal weights, and I want to incrementally add features/predictors without degrading performance. In a sense, I'm trying to devise a good way to keep noisy/uninformative predictors from entering my pipeline.
With a single model like ridge or lasso, the larger predictor set (which includes the original predictors) does not perform as well as the original predictors alone. Ideally I'd like to avoid changing basis (i.e. no PCA) so that I retain some explainability.
What I've looked into
I was looking for ways to combine models and came across @whuber's fantastic response to Question on how to normalize regression coefficient, which shows how a multiple regression can be expressed as a sequence of single-variable regressions. I was able to replicate this, but I noticed that using one multiple regression degrades performance: it forces the model to sort out the cross-term interactions, and I don't have enough data for it to find the correct weights.
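For concreteness, here is roughly the idea I replicated, on toy data (the data and names below are illustrative only, not my actual pipeline): each multiple-regression coefficient can be recovered from a single-variable regression of the target on that predictor after it has been residualized against the others.

```python
# Toy reproduction of the residualization idea: the multiple-regression
# coefficient of each predictor equals a single-variable regression of y on
# that predictor residualized against the other predictors.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=n)

def ols(A, b):
    """Least-squares coefficients of b regressed on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

beta_full = ols(X, y)                 # ordinary multiple regression

beta_seq = np.empty(p)
for j in range(p):
    others = np.delete(X, j, axis=1)
    x_resid = X[:, j] - others @ ols(others, X[:, j])    # residualize x_j
    beta_seq[j] = (x_resid @ y) / (x_resid @ x_resid)    # simple regression slope

print(np.allclose(beta_full, beta_seq))   # True, up to numerical error
```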
I've also looked into step-wise regression, though I've not attempted to implement it as it has some drawbacks.
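By step-wise regression I mean something along the lines of forward selection with cross-validated scoring, e.g. via scikit-learn's SequentialFeatureSelector. This is just a sketch with synthetic data; the estimator choice and feature counts are placeholders.

```python
# Sketch of forward step-wise selection with cross-validated scoring.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=2000, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)

selector = SequentialFeatureSelector(
    RidgeCV(alphas=np.logspace(-3, 3, 13)),
    n_features_to_select=10,          # or 'auto' with a tol threshold
    direction="forward",
    scoring="neg_root_mean_squared_error",
    cv=5,
)
selector.fit(X, y)
print(selector.get_support(indices=True))   # indices of the retained predictors
```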
I've also considered stacking, which will probably be my last resort -- ideally I'd like to find a way to figure out optimal weights for those weak/uninformative predictors, if that's a possibility.
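If I do end up stacking, I picture something like giving each linear model its own slice of the predictors and letting a meta-learner combine their out-of-fold predictions. A rough sketch on synthetic data (the column splits and model choices are placeholders):

```python
# Sketch of stacking several linear models that each see a different slice of
# the predictors; a meta-learner combines their out-of-fold predictions.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=3000, n_features=60, n_informative=15,
                       noise=10.0, random_state=0)

def on_columns(cols, model):
    """Restrict a model to a subset of columns."""
    return make_pipeline(ColumnTransformer([("keep", "passthrough", cols)]), model)

stack = StackingRegressor(
    estimators=[
        ("base",   on_columns(list(range(0, 20)),  RidgeCV())),   # 'strong' block
        ("extra1", on_columns(list(range(20, 40)), LassoCV())),   # candidate block
        ("extra2", on_columns(list(range(40, 60)), LassoCV())),
    ],
    final_estimator=RidgeCV(),   # linear combiner; could be non-linear instead
    cv=5,
)
stack.fit(X, y)
```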
Where I need help
I'm currently thinking of fitting multiple linear models that each have their own distinct 'flavour', but I need to find a way to combine them. If I combine them by sequential matching, as in @whuber's answer, I'll just end up with multiple regression, which I've already seen perform worse than my original predictor set.
What I think is happening is that the errors from one model cancel out the correct predictions from other models, and since there are many more weak models than strong ones, the errors overwhelm the stronger models' predictions.
Because I'm working with linear models, one option I've been considering is to introduce some kind of non-linearity when combining them -- but then what kind of non-linearity?
Another thing I've been wondering: if I know my original set of predictors is near-optimally weighted, should I be freezing those coefficients when fitting the other predictors?
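Concretely, by freezing I mean something like the sketch below: keep the base model's coefficients fixed and let a sparse model explain only the leftover residual using the new predictors. The arrays and the split here are synthetic stand-ins for my data.

```python
# Sketch of 'freezing' the base model: keep its coefficients fixed and fit a
# sparse correction on the residuals using the candidate predictors.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 27000
X_base = rng.normal(size=(n, 75))     # original, well-understood predictors
X_new = rng.normal(size=(n, 725))     # candidate predictors, mostly noise
y = X_base @ rng.normal(size=75) + 0.1 * X_new[:, 0] + rng.normal(size=n)

Xb_tr, Xb_te, Xn_tr, Xn_te, y_tr, y_te = train_test_split(
    X_base, X_new, y, test_size=0.1, random_state=0)

# 1) Fit (or load) the base model and freeze it.
base = LinearRegression().fit(Xb_tr, y_tr)

# 2) Fit a sparse correction on the residuals only; lasso should zero out
#    most of the uninformative new predictors.
resid_tr = y_tr - base.predict(Xb_tr)
correction = LassoCV(cv=5, n_jobs=-1).fit(Xn_tr, resid_tr)

# 3) Final prediction = frozen base + residual correction.
pred_te = base.predict(Xb_te) + correction.predict(Xn_te)
print(np.sqrt(np.mean((y_te - pred_te) ** 2)))   # hold-out RMSE
```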
At any rate, I would greatly appreciate any guidance on this.
Edit
Performance is evaluated on hold-out data from a 90/10 split, where the hold-out set has approximately 1,000 samples.
I'm evaluating performance using out-of-sample RMSE, the percentage of prediction errors that exceed a variable threshold, and the correlation between predictions and the realized target. These metrics generally agree on the strength of my predictions.
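In code, the three metrics are roughly as follows (y_true, y_pred, and the threshold are placeholders):

```python
# The three evaluation metrics, roughly.
import numpy as np

def evaluate(y_true, y_pred, threshold):
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                 # out-of-sample RMSE
    pct_exceed = np.mean(np.abs(err) > threshold)     # share of large errors
    corr = np.corrcoef(y_pred, y_true)[0, 1]          # prediction/target correlation
    return rmse, pct_exceed, corr
```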
Regarding the sample set, the original dataset has ~27,000 samples with 75 predictors. In this experiment, I'm looking to scale up to ~800 predictors (many of which should have zero or near-zero weight) with the same number of samples. There's quite a bit of collinearity and noise in the data, and the predictors are all z-score-like (odd, unit-variance, etc.). I've attempted to orthogonalize these and have gotten mixed results (I'm not sure whether noise is a contributing factor in the orthogonalization).
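For reference, one common way to orthogonalize without changing basis via PCA is a QR/Gram-Schmidt pass over the standardized design matrix, as in the rough sketch below; I'm not wedded to this particular procedure, and X here is just a stand-in for my predictors.

```python
# One reading of 'orthogonalize': QR-decompose the standardized predictor
# matrix so each column is replaced by its part orthogonal to earlier columns.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score-like columns

Q, R = np.linalg.qr(X)                     # columns of Q are orthonormal
X_orth = Q * np.sqrt(X.shape[0])           # rescale back to roughly unit variance
```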