I'm trying to build intuition for how individual coefficients change as the regularization penalty is increased (for both ridge and lasso). This is what I understand the curves of the $\ell_1$ and $\ell_2$ complexity penalties to look like:
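(For reference, the two curves are just $|w|$ and $w^2$ plotted as functions of a single coefficient $w$; a minimal sketch to reproduce them:)

```python
import numpy as np
import matplotlib.pyplot as plt

# Penalty contribution of a single coefficient w
w = np.linspace(-2, 2, 401)
plt.plot(w, np.abs(w), label=r"$\ell_1$: $|w|$")
plt.plot(w, w**2, label=r"$\ell_2$: $w^2$")
plt.xlabel(r"coefficient value $w$")
plt.ylabel("penalty")
plt.legend()
plt.show()
```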
Given that the complexity penalty for lasso is the $\ell_1$ norm, I would expect increasing $\lambda$ to reduce all coefficients equally (by the same absolute amount), and this is indeed what I see when using a modified version of this sklearn example:
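For concreteness, here is a minimal sketch of the kind of script I mean; the near-orthogonal random design and the coefficient values are made-up stand-ins, not the data behind my actual plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path

# Made-up, near-orthogonal design: columns are roughly uncorrelated,
# so the path should stay "well behaved"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([5.0, 3.0, 1.0, -2.0, -4.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# lasso_path computes the coefficients over a decreasing grid of alphas
alphas, coefs, _ = lasso_path(X, y)
for i, coef in enumerate(coefs):
    plt.plot(alphas, coef, label=f"beta_{i + 1}")
plt.xscale("log")
plt.xlabel(r"$\lambda$ (sklearn's alpha)")
plt.ylabel("coefficient value")
plt.legend()
plt.show()
```

This matches the "equal reduction" intuition: in the exactly orthonormal case, lasso reduces to soft-thresholding, so every coefficient shrinks by the same absolute amount until it hits zero.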
Given that the complexity penalty for ridge is the $\ell_2$ norm, I would expect increasing $\lambda$ to prioritize reducing the largest coefficients. I initially thought they would start decreasing earlier (at smaller $\lambda$ values) than the smaller coefficients, but it seems they all start shrinking around the same time, with the larger ones decreasing at a more aggressive slope, which I think also makes sense:
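And the ridge version of the same sketch, sweeping sklearn's `alpha` (playing the role of $\lambda$) over a log grid on the same made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# Same made-up, near-orthogonal design as in the lasso sketch
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([5.0, 3.0, 1.0, -2.0, -4.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Refit ridge for each alpha on a log grid and collect the coefficients
alphas = np.logspace(-2, 4, 100)
coefs = [Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_ for a in alphas]

plt.plot(alphas, coefs)
plt.xscale("log")
plt.xlabel(r"$\lambda$ (sklearn's alpha)")
plt.ylabel("coefficient value")
plt.show()
```

Here the "steeper slope for larger coefficients" also follows from the orthonormal case, where ridge scales every coefficient by the same factor $1/(1+\lambda)$, so larger coefficients lose more in absolute terms.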
However, when I go outside these idealized examples and look at some charts "in the wild", I see all kinds of regularization paths, for example, this one for lasso (from here):
Or this one for ridge (from here):
I notice they are both on a different x-axis scale (and the last one is flipped), but it really seems like they follow different paths than the idealized examples I posted. What makes these paths different from the idealized scenario above? Is the model somehow trying to use the features differently as $\lambda$ increases? Or is my intuition wrong about what should happen, and is it only in the idealized cases that we can expect the coefficients to shrink in a well-behaved manner?