I am new to this site and come from economics, although I am interested in high-dimensional statistics. I am reading Statistics for High-Dimensional Data by Bühlmann and van de Geer, and I struggle to get an intuition for the compatibility constant. As far as I understand, it links the $\ell_1$ norm and the $\ell_2$ norm of a vector. However, I do not understand how it is used when they derive the upper bound on the prediction error of the LASSO. In the noiseless case they go from $$ \| X\hat{\beta}-X\beta^0 \|^2_n + 2 \lambda \| \hat{\beta}_{S_0^c} \|_1 \leq 2\lambda \| \hat{\beta}_{S_0}-\beta^0 \|_1 $$

to $$ \text{left-hand side} \leq \frac{2\lambda \| X\hat{\beta}-X\beta^0 \|_n}{\phi(S;L)} $$

using the definition $$ \phi(S;L)= \min \{ \| X\hat{\beta}-X\beta^0 \|_n : \|\beta_S\|=1 ,\ \|\beta_{S^c}\| \leq L \} $$

where $S$ is the active set, $S^c$ its complement, and $S_0$ the true active set.

To them it seems obvious that the second line follows from the first by definition, but it is not obvious to me. Does anyone have an intuition or an explanation? Sorry if I am a bit unclear: this is not my field and I feel a bit lost. Don't hesitate to ask me to clarify :)

Thanks for reading and good evening (which has no real meaning if you are not on the Greenwich meridian)!

A.Barra

1 Answer

In the noiseless case, we have $$\|X\hat{\beta}-X\beta^0\|_n^2+2\lambda\|\hat{\beta}\|_1\leq2\lambda\|\beta^0\|_1 $$ $$\Rightarrow\|X\hat{\beta}-X\beta^0\|_n^2+2\lambda\|{\hat{\beta}}_{\!S_0^c}\|_1\leq2\lambda\|\beta^0\|_1-2\lambda\|{\hat{\beta}}_{\!S_0}\|_1\leq2\lambda\|\beta^0-{\hat{\beta}}_{\!S_0}\|_1.\quad(\ast)$$ It can be directly obtained from $(\ast)$ that

(1) $\ \|X\hat{\beta}-X\beta^0\|_n^2+2\lambda\|{\hat{\beta}}_{\!S_0^c}\|_1\leq2\lambda\|{\hat{\beta}}_{\!S_0}-\beta^0\|_1;\quad$ (the first conclusion)

(2) $\ \|{\hat{\beta}}_{\!S_0^c}\|_1\leq\|{\hat{\beta}}_{\!S_0}-\beta^0\|_1$.
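For readers who want the steps behind $(\ast)$, (1) and (2) written out, here is one way to see them. It uses only the triangle inequality and the fact that $\beta^0$ vanishes outside $S_0$, and it assumes $\lambda>0$:

$$\begin{aligned}
\|\hat{\beta}\|_1 &= \|{\hat{\beta}}_{\!S_0}\|_1+\|{\hat{\beta}}_{\!S_0^c}\|_1 && \text{(split the sum over } S_0 \text{ and } S_0^c\text{)},\\
\|\beta^0\|_1-\|{\hat{\beta}}_{\!S_0}\|_1 &\leq \|\beta^0-{\hat{\beta}}_{\!S_0}\|_1 && \text{(reverse triangle inequality)},\\
2\lambda\|{\hat{\beta}}_{\!S_0^c}\|_1 &\leq \|X\hat{\beta}-X\beta^0\|_n^2+2\lambda\|{\hat{\beta}}_{\!S_0^c}\|_1 \leq 2\lambda\|{\hat{\beta}}_{\!S_0}-\beta^0\|_1 && \text{(drop the squared term in (1))}.
\end{aligned}$$

Dividing the last line by $2\lambda$ gives (2).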

Define $s_0=|S_0|$ as the number of indices in the true active set $S_0$. Note that the definition of $\phi(S,L)$ given in the question contains an error. For $S=S_0$ (so that $|S|=s_0$) it should read $$\phi(S,L)=\min_\beta\{\sqrt{s_0}\|X\beta_S-X\beta_{S^c}\|_n:\|\beta_S\|_1=1,\|\beta_{S^c}\|_1\leq L\}.$$

Now we explain the second conclusion. Since $\beta^0_{\!S_0^c}=0$, (2) can be written as $$\|{\hat{\beta}}_{\!S_0^c}-\beta^0_{\!S_0^c}\|_1\leq\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1.$$ If $\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1=0$, this forces $\hat{\beta}=\beta^0$ and there is nothing to prove. Otherwise we can divide by it: $$\left\|-\,\frac{{\hat{\beta}}_{\!S_0^c}-\beta^0_{\!S_0^c}}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1}\right\|_1=\frac{\|{\hat{\beta}}_{\!S_0^c}-\beta^0_{\!S_0^c}\|_1}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1}\leq1.$$

Let, with $S=S_0$, $$\delta_S=\frac{{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1},\quad \delta_{S^c}=-\,\frac{{\hat{\beta}}_{\!S_0^c}-\beta^0_{\!S_0^c}}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1}.$$ Then $$\|\delta_S\|_1=1,\quad\|\delta_{S^c}\|_1\leq1.$$

According to the definition of $\phi(S_0,1)$, we have $$\sqrt{s_0}\|X\delta_S-X\delta_{S^c}\|_n\geq\phi(S_0,1)$$ $$\Rightarrow \sqrt{s_0}\left\|\frac{X({\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0})+X({\hat{\beta}}_{\!S_0^c}-\beta^0_{\!S_0^c})}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1}\right\|_n=\sqrt{s_0}\frac{\|X{\hat{\beta}}-X\beta^0\|_n}{\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1}\geq\phi(S_0,1)$$ $$\Rightarrow \|{\hat{\beta}}_{\!S_0}-\beta^0\|_1=\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1\leq\frac{\sqrt{s_0}\|X{\hat{\beta}}-X\beta^0\|_n}{\phi(S_0,1)}.\ (\text{the second conclusion})$$

Plugging the second conclusion into (1) gives $$\|X\hat{\beta}-X\beta^0\|_n^2+2\lambda\|{\hat{\beta}}_{\!S_0^c}\|_1\leq\frac{2\lambda\sqrt{s_0}\,\|X{\hat{\beta}}-X\beta^0\|_n}{\phi(S_0,1)},$$ which is the second line of the question, up to the factor $\sqrt{s_0}$ that comes from the corrected definition.
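As a rough numerical sanity check of (1), (2) and the second conclusion, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration and not taken from the book: the Gaussian design, the sizes `n` and `p`, the value of `lam`, and the helper names (`lasso_noiseless`, `sq_norm_n`, `soft`, `l1`). The solver is plain coordinate descent on $\|X\beta-X\beta^0\|_n^2+2\lambda\|\beta\|_1$, started at $\beta^0$ so that the objective never exceeds its value at $\beta^0$, which is all the basic inequality needs.

```python
import random

def l1(v):
    """L1 norm of a list of floats."""
    return sum(abs(x) for x in v)

def sq_norm_n(X, b):
    """Squared empirical norm ||X b||_n^2 = (1/n) * sum_i (x_i . b)^2."""
    return sum(sum(x * c for x, c in zip(row, b)) ** 2 for row in X) / len(X)

def soft(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_noiseless(X, beta0, lam, sweeps=200):
    """Coordinate descent on ||X(b - beta0)||_n^2 + 2*lam*||b||_1, started at beta0."""
    n, p = len(X), len(beta0)
    b = list(beta0)
    r = [0.0] * n  # residual X beta0 - X b, zero at the start because b = beta0
    c = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # exact minimisation over coordinate j, all others held fixed
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + c[j] * b[j]
            new = soft(rho, lam) / c[j]
            step = new - b[j]
            for i in range(n):
                r[i] -= X[i][j] * step
            b[j] = new
    return b

# Illustrative setup (assumed): S0 is the first three coordinates.
random.seed(0)
n, p, lam = 30, 8, 0.1
s0 = 3
beta0 = [2.0, -1.5, 1.0] + [0.0] * (p - s0)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

beta_hat = lasso_noiseless(X, beta0, lam)
diff = [bh - b0 for bh, b0 in zip(beta_hat, beta0)]
pred_sq = sq_norm_n(X, diff)   # ||X beta_hat - X beta0||_n^2
off_l1 = l1(beta_hat[s0:])     # ||beta_hat_{S0^c}||_1
on_l1 = l1(diff[:s0])          # ||beta_hat_{S0} - beta0_{S0}||_1

lhs = pred_sq + 2 * lam * off_l1
rhs = 2 * lam * on_l1
print("(1):", lhs, "<=", rhs)
print("(2):", off_l1, "<=", on_l1)
# By the definition of phi(S0, 1), this ratio is an upper bound on it.
print("ratio >= phi(S0,1):", (s0 * pred_sq) ** 0.5 / on_l1)
```

The printed ratio $\sqrt{s_0}\|X\hat{\beta}-X\beta^0\|_n/\|{\hat{\beta}}_{\!S_0}-\beta^0_{\!S_0}\|_1$ only bounds $\phi(S_0,1)$ from above for this particular direction; computing the compatibility constant itself would require minimising over all admissible $\beta$, which this sketch does not attempt.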