
I am wondering, for the Gaussian kernel

$$k(x_n,x_m) = \exp(- \frac{\theta}{2} \lVert x_n-x_m\rVert^2)$$

whether we really need the exponential.

What would be the consequences of just using

$$k(x_n,x_m) = \frac{\theta}{2} \lVert x_n-x_m\rVert^2$$

instead?

1 Answer


One pretty big consequence is that it's no longer a positive semidefinite kernel, so that the mathematical assumptions of almost all kernel methods don't apply. (Relevant discussion focused on Gaussian Processes here.)

Your proposed kernel has $k(x, x) = 0$: all points have zero "self-similarity." Thus for any $x_1 \ne x_2$, the kernel matrix of the two points is of the form $\begin{bmatrix}0 & c \\ c & 0\end{bmatrix}$ with $c > 0$, which is not positive semidefinite: its eigenvalues are $\pm c$, and one of them is negative.
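A quick numerical check of this (a minimal sketch using NumPy; the two points and the value of $\theta$ are arbitrary illustrations, not part of the original answer):

```python
import numpy as np

theta = 1.0
x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])

def proposed_kernel(x, y, theta=theta):
    # k(x, y) = (theta / 2) * ||x - y||^2  -- the kernel proposed in the question
    return 0.5 * theta * np.sum((x - y) ** 2)

# 2x2 kernel (Gram) matrix for the two points
K = np.array([[proposed_kernel(x1, x1), proposed_kernel(x1, x2)],
              [proposed_kernel(x2, x1), proposed_kernel(x2, x2)]])

print(np.linalg.eigvalsh(K))  # one eigenvalue is negative => not positive semidefinite
```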


Note that $$ \frac{\theta}{2} \lVert x - y \rVert^2 = \frac{\theta}{2} \left( \lVert x \rVert^2 + \lVert y \rVert^2 - 2 \langle x, y \rangle \right) .$$ A vaguely similar kernel is given by $\theta \langle x, y \rangle$, but this is just the linear kernel.
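To see the expansion numerically (a small sketch; the vectors and $\theta$ are arbitrary):

```python
import numpy as np

theta = 2.0
x, y = np.random.randn(3), np.random.randn(3)

lhs = 0.5 * theta * np.sum((x - y) ** 2)
rhs = 0.5 * theta * (x @ x + y @ y - 2 * (x @ y))
print(np.isclose(lhs, rhs))  # True: the squared distance splits into norms and an inner product

# The cross term on its own, theta * <x, y>, is just a scaled linear kernel.
print(theta * (x @ y))
```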

Another vaguely similar and actually useful kernel is the "distance kernel": $$ k(x, y) = \lVert x - O \rVert + \lVert y - O \rVert - \lVert x - y \rVert ,$$ where $O$ is any fixed point in space, e.g. the origin. This is implicitly the kernel used by the famous "energy distance" / "distance covariance" when viewed as a kernel MMD (paper).
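For contrast with the proposed kernel, here is a sketch that builds the distance kernel's Gram matrix for a few random points and checks numerically that it is positive semidefinite; the random data and the choice of $O$ as the origin are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # 5 points in R^2
O = np.zeros(2)               # fixed reference point (here: the origin)

def distance_kernel(x, y, O=O):
    # k(x, y) = ||x - O|| + ||y - O|| - ||x - y||
    return (np.linalg.norm(x - O) + np.linalg.norm(y - O)
            - np.linalg.norm(x - y))

K = np.array([[distance_kernel(xi, xj) for xj in X] for xi in X])
print(np.linalg.eigvalsh(K))  # all eigenvalues >= 0 (up to numerical error)
```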

Danica
  • A Gaussian-kernel SVM is in a sense similar to a k-nearest-neighbor approach. I was asking more from a practical perspective, as one difference between the two is the exponential. – Cagdas Ozgenc May 07 '18 at 04:47