I am new to this site and come from economics (though I am interested in high-dimensional statistics). I am reading *Statistics for High-Dimensional Data* by Bühlmann and van de Geer, and I struggle to get an intuition for the compatibility constant. As far as I understand, it links the L1 norm and L2 norm of a vector. But when they compute the upper prediction bound for the LASSO, I get stuck. In the noiseless case they go from $$ \| X\hat{\beta}-X\beta^0 \|^2_n + 2 \lambda \| \hat{\beta}_{S_0^c} \|_1 \leq 2\lambda \| \hat{\beta}_{S_0}-\beta^0 \|_1 $$
to $$ \text{left-hand side} \leq \frac{2\lambda \| X\hat{\beta}-X\beta^0 \|_n}{\phi(S;L)}. $$
They use the definition $$ \phi(S;L) = \min\Big\{ \| X\beta \|_n \;:\; \|\beta_S\|_1 = 1,\ \|\beta_{S^c}\|_1 \leq L \Big\}, $$ where (as I read it) the minimum is over all vectors $\beta$ satisfying the two constraints.
Here $S$ is an active set, $S^c$ its complement, and $S_0$ the true active set.
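To build some intuition I also tried to evaluate the definition numerically. The sketch below is entirely my own attempt (the design matrix $X$, the set $S$, and the constant $L$ are toy values I made up, not from the book): it approximates $\phi(S;L)$ by Monte Carlo sampling of vectors $\beta$ in the constraint set $\{\|\beta_S\|_1 = 1,\ \|\beta_{S^c}\|_1 \leq L\}$ and taking the smallest $\|X\beta\|_n$ found.

```python
import numpy as np

# Toy setup -- all values here are my own assumptions, not from the book.
rng = np.random.default_rng(0)
n, p = 50, 10                       # sample size and dimension
X = rng.standard_normal((n, p))     # toy design matrix
S = np.array([0, 1, 2])             # a toy "active" set S
Sc = np.setdiff1d(np.arange(p), S)  # its complement S^c
L = 1.0                             # cone constant L

def norm_n(v):
    # Empirical norm ||v||_n = sqrt(v'v / n), as used in the book.
    return np.sqrt(np.sum(v**2) / n)

# Approximate phi(S; L) = min{ ||X b||_n : ||b_S||_1 = 1, ||b_{S^c}||_1 <= L }
# by random sampling over the constraint set.
phi_est = np.inf
for _ in range(20000):
    b = np.zeros(p)
    bS = rng.standard_normal(S.size)
    b[S] = bS / np.abs(bS).sum()                      # enforce ||b_S||_1 = 1
    bSc = rng.standard_normal(Sc.size)
    b[Sc] = bSc / np.abs(bSc).sum() * (L * rng.uniform())  # ||b_{S^c}||_1 <= L
    phi_est = min(phi_est, norm_n(X @ b))

print(f"Monte Carlo estimate (an upper bound) of phi(S; L): {phi_est:.4f}")
```

Since the sampling only visits finitely many feasible $\beta$, the value printed is an upper bound on the true minimum, but it still let me see that $\phi(S;L)$ is the smallest "stretch" $\|X\beta\|_n$ achievable on the cone once $\|\beta_S\|_1$ is normalized to one.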
It seems obvious to them that the second display follows from the first by this definition, but it is not obvious to me. Does anyone have an intuition or explanation to offer? Sorry if I sound a bit unclear; this is not my domain and I feel a bit lost. Don't hesitate to ask me to clarify :)
Thanks for reading, and good evening (which only really makes sense if you are on the Greenwich meridian)!