I'm reading this article and I can't really grasp the idea of this so-called kernel trick.
So far, what is presented is:
$\Phi(x)^T \Phi(y) = \sum_{i,j} x_i x_j y_i y_j$ and
$k(x, y) = (x^T y)^2 = \sum_{i,j} x_i x_j y_i y_j$
I don't see the difference. The "trick" is presumably that the two expressions are equal, so it is sufficient to compute the second one. But where exactly is the "trick"? Where is the computational effort saved, given that I would have to compute the sum in both cases anyway?
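To make the comparison concrete, here is a minimal sketch of how I understand the two computations. It assumes that $\Phi$ maps $x$ to the vector of all pairwise products $x_i x_j$ (my reading of the formulas, the article may define it differently); the helper names `phi`, `dot` and `kernel` are my own and purely illustrative.

```python
def phi(x):
    # Assumed explicit feature map: all n*n pairwise products x_i * x_j.
    return [xi * xj for xi in x for xj in x]

def dot(a, b):
    # Plain inner product of two equal-length vectors.
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel(x, y):
    # k(x, y) = (x^T y)^2, evaluated in the original n-dimensional space.
    return dot(x, y) ** 2

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

# First formula: build two n^2-dimensional vectors, then take their inner product.
explicit = dot(phi(x), phi(y))

# Second formula: one n-dimensional inner product, followed by a single squaring.
implicit = kernel(x, y)

print(explicit, implicit)  # 1024.0 1024.0
```

In this sketch both values agree, but the first route sums over $n^2$ terms (9 here), while the second only sums over $n$ terms (3 here) before squaring. Is that difference in the number of terms what the "trick" refers to?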