I was trying to understand the geometric interpretation of regularization and came across the following statement here:

$$\text{Mean Square Error}\; E(y,\hat{y})=\frac{1}{n}\lVert\hat{y}-y\rVert^2$$
$$=\frac{1}{n}(b^TX^TXb-2b^TX^Ty+y^Ty)$$

Since $X^TX$ is positive semidefinite, we know that $b^TX^TXb\geq0$. Furthermore, we know that (from vector calculus) it will be a paraboloid (bowl-shaped surface) in the $(E,b_1,b_2)$ space. The following diagram depicts this situation. [image]

I am not able to follow the passage above. Specifically, I didn't understand "positive semidefinite" and "we know from vector calculus that it will be a paraboloid". How / why is the mean square error surface bowl-shaped? Is there an intuition (visual or geometrical, if possible) behind it?

  • The definition is explicitly proportional to a sum of squares: that's what $||\hat y - y||^2$ means. The graphs of sums of squares are paraboloids. Try it with one or two variables where you can plot the graph and see it. "Bowl-shaped surface" is not quite correct, however: the "bowl" can be like a curved sheet of paper (similar to the first plot in my post at https://stats.stackexchange.com/a/7629/919) and it need not be circularly symmetric. – whuber Nov 30 '22 at 20:32
  • Lots of helpful information in https://stats.stackexchange.com/questions/224005/why-are-symmetric-positive-definite-spd-matrices-so-important – Sycorax Nov 30 '22 at 20:32
  • @whuber Is it just that $x^2$ in 2D is a parabola, so in 3D it's a paraboloid? – Mahesha999 Nov 30 '22 at 20:56
  • Yes. There are a limited number of basic shapes of the graphs of quadratic forms. Linear algebra shows how to find the shape by diagonalizing the form: in $n$ dimensions there are coordinates $(x_1,x_2,\ldots,x_n)$ in which the shape is the graph of $x_1^2+\cdots+x_r^2-(x_{r+1}^2+\cdots+x_{r+s}^2)$ where $0\le r+s\le n.$ The numbers $r,s,n$ determine the shape, which is called a "paraboloid" when either $r=0$ or $s=0.$ That this shape is unique is called Sylvester's Law of Inertia. – whuber Nov 30 '22 at 21:15

1 Answer

Suppose $b$ is one-dimensional. Then the loss function has the form

$$ a_1 \cdot b^2 + a_2 \cdot b + a_3 $$

which is a parabola. If $a_1 > 0$, it is a "smiling" / "convex" / "upward facing" parabola.

This generalizes to higher dimensions:

$$b^T A_1b + b^T a_2 + a_3 $$
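To connect this with the expansion quoted in the question, the coefficients can be matched term by term. This assumes $\hat{y}=Xb$, which the question does not state outright but which its expansion implies:

$$A_1=\frac{1}{n}X^TX,\qquad a_2=-\frac{2}{n}X^Ty,\qquad a_3=\frac{1}{n}y^Ty$$

With that identification, the quadratic term is $b^TA_1b=\frac{1}{n}\lVert Xb\rVert^2\geq 0$ for every $b$, which is exactly what "positive semidefinite" says about $X^TX$.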

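Only now, for it to be "smiling" in every direction we require $A_1 \succ 0$, i.e. $A_1$ must be positive definite. If $A_1$ is only positive semidefinite, the surface is still convex, but it can be flat along some directions instead of curving upward.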
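A quick numerical check can make the difference between the two cases concrete. The following is only a sketch on made-up random data: the names `mse`, `quadratic_part`, `X_full` and `X_flat` are hypothetical, and $A_1$ is taken to be $X^TX/n$ on the assumption that $\hat{y}=Xb$. It computes the eigenvalues of $A_1$ for a design matrix with independent columns and for one whose second column is a multiple of the first.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def mse(X, y, b):
    """Mean square error (1/n) * ||X b - y||^2."""
    residual = X @ b - y
    return residual @ residual / len(y)


def quadratic_part(X):
    """The matrix A_1 = X^T X / n that appears in b^T A_1 b."""
    return X.T @ X / X.shape[0]


n = 50
y = rng.normal(size=n)

# Case 1: two unrelated columns, so X has full column rank.
X_full = rng.normal(size=(n, 2))

# Case 2: the second column is twice the first, so X b = 0 for b = (2, -1).
x = rng.normal(size=n)
X_flat = np.column_stack([x, 2.0 * x])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_full = np.linalg.eigvalsh(quadratic_part(X_full))
eig_flat = np.linalg.eigvalsh(quadratic_part(X_flat))
print("full rank eigenvalues:     ", eig_full)
print("rank deficient eigenvalues:", eig_flat)

# Moving along the null direction (2, -1) should leave the error unchanged.
b0 = np.array([0.3, -0.7])
flat_direction = np.array([2.0, -1.0])
errors_along_flat = [
    mse(X_flat, y, b0 + t * flat_direction) for t in (-5.0, 0.0, 5.0)
]
print("MSE along the flat direction:", errors_along_flat)
```

In the full rank case both eigenvalues are strictly positive, so the surface curves upward in every direction. In the rank deficient case one eigenvalue is zero up to rounding, and the error does not change along the corresponding direction, which gives the "curved sheet of paper" shape mentioned in the comments.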

Maverick Meerkat