Imagine working on a linear multiple regression problem with a design matrix $\Phi$ built from some independent variables $x_k$, $k\in[1, r]$. The goal is to find an equation that explains the "true" relationship between a dependent variable $y$ and the independent variables $x_k$, e.g. from a recorded physical experiment. The model is defined as \begin{equation} \bar{y}=\Phi\theta \end{equation} with $\theta$ being the regression coefficients and $\bar{y}$ being the predictions of the estimator. Why do I need penalized regression or subset selection?
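To make the setup concrete, here is a minimal sketch of what I mean (the candidate terms, sample size, and noise level are arbitrary choices of mine, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1, 1, size=n)

# Candidate library of terms built from the independent variable;
# some columns may turn out to be unnecessary.
Phi = np.column_stack([x, np.sin(x), x**2])
y = 2.0 * x + rng.normal(scale=0.1, size=n)  # "true" model uses only x

theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares fit
y_bar = Phi @ theta                              # predictions of the estimator
print(theta)
```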
From what I have read, additional unnecessary columns in $\Phi$ add variance to the estimated parameters $\theta$ but no bias (Kennedy, *A Guide to Econometrics*, p. 94). (I assume this is because any correlation between an unnecessary column and the relevant variables will result in some non-zero coefficient for that term, and coefficients can act in opposing directions, as with $\sin(x)$ and $x$ at small $x$. Is that correct? If you could help me understand this through the covariance matrix of the estimator, that would be great.)
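As far as I understand, under homoskedastic noise with variance $\sigma^2$ the covariance of the least-squares estimator is $\operatorname{Var}(\hat\theta)=\sigma^2(\Phi^\top\Phi)^{-1}$, and this is where the extra variance should show up. Here is a small simulation that I believe illustrates the effect; the number of junk columns and how strongly I correlate them with $x_1$ are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, theta_true = 50, 2000, 2.0
est_small, est_big = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    # unnecessary columns, deliberately correlated with x1
    junk = 0.7 * x1[:, None] + 0.7 * rng.normal(size=(n, 5))
    y = theta_true * x1 + rng.normal(size=n)

    # estimate of theta_1 with and without the unnecessary columns
    est_small.append(np.linalg.lstsq(x1[:, None], y, rcond=None)[0][0])
    est_big.append(np.linalg.lstsq(np.column_stack([x1, junk]), y, rcond=None)[0][0])

# both means are close to 2.0 (no bias), but the variance grows with junk columns
print("only x1:   mean %.3f, var %.4f" % (np.mean(est_small), np.var(est_small)))
print("with junk: mean %.3f, var %.4f" % (np.mean(est_big), np.var(est_big)))
```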
Why am I not able to remove this variance by using cross-validation and averaging the coefficients of the best subsets? Isn't their mean free of bias? I read about this approach in Brunton & Kutz, *Data-Driven Science & Engineering* (http://databookuw.com/databook.pdf) but could not find it anywhere else. The book also suggests thresholding small coefficients.
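This is the procedure I have in mind, sketched in code (the function name, the fold splitting, and the threshold value are all my own hypothetical choices, not taken from the book):

```python
import numpy as np

def cv_averaged_threshold(Phi, y, n_splits=5, threshold=0.1, seed=0):
    """Fit ordinary least squares on each training fold, average the
    coefficients over folds, then zero out the small ones (hard thresholding,
    as the book suggests)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    coefs = []
    for k in range(n_splits):
        # train on all folds except the k-th
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        theta_k, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
        coefs.append(theta_k)
    theta_avg = np.mean(coefs, axis=0)
    theta_avg[np.abs(theta_avg) < threshold] = 0.0  # hard thresholding
    return theta_avg
```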
I read https://stats.stackexchange.com/questions/472202/when-to-use-regularization-vs-cross-validation and understand that cross-validation does not do the same thing as regularization. But shouldn't the least-squares estimate converge to the true parameters as the amount of data goes to infinity?
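This is the convergence I mean, in a quick experiment of my own (the coefficient values and sample sizes are arbitrary): the estimates of the unnecessary coefficients seem to shrink toward zero as $n$ grows, so I don't see where penalization is still needed.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([2.0, 0.0, 0.0])  # one relevant term, two unnecessary ones

for n in [50, 500, 5000, 50000]:
    Phi = rng.normal(size=(n, 3))
    y = Phi @ theta_true + rng.normal(size=n)
    theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    print(n, np.round(theta_hat, 3))  # approaches [2, 0, 0] as n grows
```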
If you know of a good piece of literature covering this, please share it. I am a mechanical engineering student and this topic goes well beyond anything we have covered in class. Thanks in advance!