I am comparing the MAE of a LASSO regression on multiple features with the MAE of linear regressions fit on each feature individually. I am having trouble understanding why the LASSO MAE can be worse than some of the single-feature MAEs, even on the training set (where one single feature gave a lower MAE than LASSO).
As I understand it, LASSO is a linear regression with regularization that drives the weights of "un-useful" features to zero while minimizing MSE (which I would expect to be reflected in a lower MAE as well). Why, then, did LASSO choose multiple features that give a higher error, rather than keeping only a single feature or a few features that give a lower error?
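
For reference, this is the objective I have in mind, written in a common parameterization (the same form used by scikit-learn's `Lasso`, which I am assuming here). The symbols are my own notation: $X$ is the feature matrix, $y$ the target, $w$ the weight vector, $n$ the number of training samples, and $\alpha \ge 0$ the regularization strength.

$$
\hat{w} = \arg\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

With $\alpha = 0$ this reduces to ordinary least squares. For $\alpha > 0$ the quantity being minimized is the squared-error term plus the penalty, not the training MSE alone.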