Periodicity of random kitchen sink feature mappings

Question

In various papers, e.g. Random Features for Large-Scale Kernel Machines, Rahimi and Recht introduce the now-popular methodology wherein a "low-rank" approximation to a stationary, PSD kernel $K(x,y) = \langle\phi(x),\phi(y)\rangle$ is constructed by randomly sampling a Fourier basis from the spectral density $p$ of the kernel $K$, such that $K(x,y) \simeq \langle Z(x),Z(y)\rangle$, where $$Z(x) = \sqrt{\frac{2}{d}}\,[\cos(w_{1}^Tx + b_{1}),\cos(w_{2}^Tx + b_{2}),\ldots,\cos(w_{d}^Tx + b_{d})],$$ and $w_i \sim p$, $b_i \sim U(0,2\pi)$.

I am confused by the fact that the potentially non-periodic feature mapping $\phi$ is now approximated by a vector of periodic features. Thinking of this as a Fourier series, I would expect someone to specify an interval over which this approximation is valid, but no such discussion appears in the literature, so I think I'm missing something. Should I think of this method as approximating the kernel (i.e. the inner product between approximate feature mappings) rather than as approximating the feature mapping itself?

score 1 · Answer 1 · answered Jan 29 '20 at 20:36


Indeed, this approximation is only valid over some region near the origin. I don't have time to do so right now, but it might be informative to simply plot the approximation to $K(0, x)$ for $x \in \mathbb R$, using a moderate number of random features and a wide region.
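A minimal sketch of that check, assuming a one-dimensional Gaussian kernel with bandwidth `sigma` (so that the spectral density is $N(0, 1/\sigma^2)$). The kernel choice, the number of features `d`, the grid, and the helper name `rff_features` are all illustrative assumptions, not something fixed by the question. It compares $\langle Z(0), Z(x)\rangle$ with the exact $K(0,x)$ on a narrow and a wide interval:

```python
import numpy as np

def rff_features(x, w, b):
    """Offset-cosine random Fourier features for 1-D inputs.

    x: (n,) inputs, w: (d,) frequencies, b: (d,) offsets -> (n, d) features.
    """
    d = w.shape[0]
    return np.sqrt(2.0 / d) * np.cos(np.outer(x, w) + b)

rng = np.random.default_rng(0)
d = 200        # number of random features (assumed value)
sigma = 1.0    # Gaussian kernel bandwidth (assumed value)

# Spectral density of exp(-(x - y)^2 / (2 sigma^2)) is N(0, 1 / sigma^2).
w = rng.normal(0.0, 1.0 / sigma, size=d)
b = rng.uniform(0.0, 2.0 * np.pi, size=d)

xs = np.linspace(-50.0, 50.0, 2001)
z0 = rff_features(np.array([0.0]), w, b).ravel()   # Z(0), shape (d,)
approx = rff_features(xs, w, b) @ z0               # <Z(0), Z(x)> for each x
exact = np.exp(-xs**2 / (2.0 * sigma**2))          # K(0, x)
abs_err = np.abs(approx - exact)

print("max |error| for |x| <= 5 :", abs_err[np.abs(xs) <= 5].max())
print("max |error| for |x| <= 50:", abs_err.max())
```

Plotting `approx` and `exact` against `xs` (for instance with matplotlib) would show the behaviour directly; re-running with different seeds and values of `d` gives a feel for how the error depends on the width of the region.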

I'd dispute, though, that "no such discussion appears in the literature." Even in the original paper, the approximation theorem is for a compact region, and contains an error term based on the diameter of that region.

This plot from my 2015 paper demonstrates the increase in empirical error; here $\breve z$ gives the embedding you used above and $\tilde z$ the (better) embedding which uses sin and cos of half as many frequencies, rather than cos with an offset.
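For concreteness, here is a rough sketch of the two embeddings side by side, again assuming a Gaussian kernel. The function names `z_breve` and `z_tilde`, the data, and all sizes are hypothetical choices made for illustration; this is not the code behind the plot. Both embeddings have the same output dimension `D`, but $\tilde z$ uses only `D / 2` frequencies:

```python
import numpy as np

def z_breve(X, w, b):
    """Cosine-with-offset embedding: D frequencies, D features."""
    D = w.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ w.T + b)

def z_tilde(X, w_half):
    """Sin/cos embedding: D/2 frequencies, D features."""
    D = 2 * w_half.shape[0]
    proj = X @ w_half.T
    return np.sqrt(2.0 / D) * np.hstack([np.sin(proj), np.cos(proj)])

rng = np.random.default_rng(0)
n, p, D, sigma = 300, 3, 500, 1.0   # assumed sizes and bandwidth

X = rng.normal(size=(n, p))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma**2))   # exact Gaussian kernel matrix

w = rng.normal(scale=1.0 / sigma, size=(D, p))   # frequencies from N(0, I / sigma^2)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Zb = z_breve(X, w, b)
Zt = z_tilde(X, w[: D // 2])

err_breve = np.abs(Zb @ Zb.T - K).max()
err_tilde = np.abs(Zt @ Zt.T - K).max()
print("max abs error, cos with offset:", err_breve)
print("max abs error, sin and cos    :", err_tilde)
```

One structural difference is visible without running anything: $\langle \tilde z(x), \tilde z(x)\rangle = 1$ exactly, because $\sin^2 + \cos^2 = 1$ for every frequency, whereas $\langle \breve z(x), \breve z(x)\rangle$ is only equal to 1 in expectation.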

Sriperumbudur and Szabó (2015) gave a bound with the (tighter) optimal-rate dependence on the radius of the approximation set.


Danica


Thanks for the spot-on comment and paper links.
I'll look more into your paper, and thanks for pointing out the error bound in the original paper. Although I imagine I could possibly work it out for myself, I'm somewhat more interested in the range of validity and approximation errors for the feature mapping $\phi$ itself, rather than its associated kernel. Do you know of any work on this?

Alternatively, I'm wondering if I could simply rescale inputs to accommodate the fact that the kernel approximation decays from the origin.
– blackbird Jan 30 '20 at 18:38
@user272232 I'm not sure what you mean by a "range of validity" for $\phi$ – it's not like the random Fourier features directly approximate the infinite-dimensional feature map in any way beyond matching inner products, as far as I know. Rescaling inputs would be equivalent to rescaling the kernel bandwidth; this may or may not make sense in your setting. – Danica Jan 30 '20 at 21:51
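To spell out the rescaling remark, here is a short worked identity. It assumes a Gaussian kernel with bandwidth $\sigma$ and a scalar rescaling factor $c > 0$; both are assumptions made for illustration, since the thread does not fix a kernel. Rescaling the inputs by $c$ gives $$\cos\big(w^T(cx) + b\big) = \cos\big((cw)^Tx + b\big), \qquad \exp\left(-\frac{\|cx - cy\|^2}{2\sigma^2}\right) = \exp\left(-\frac{\|x - y\|^2}{2(\sigma/c)^2}\right).$$ If $w \sim N(0, \sigma^{-2} I)$, then $cw \sim N(0, (\sigma/c)^{-2} I)$, so random features computed on the rescaled inputs are exactly random features for the same kernel with bandwidth $\sigma/c$ on the original inputs.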
