In various papers, e.g. Random Features for Large-Scale Kernel Machines, Rahimi and Recht introduce the now popular methodology wherein a "low rank" approximation to a stationary, PSD, kernel $K(x,y) = \langle\phi(X),\phi(y)\rangle$ is constructed by randomly sampling a fourier basis from the spectral density $p$ of the kernel K such that: $K(x,y) \simeq \langle Z(x),Z(y)\rangle$, where $$Z(x) = \frac{2}{\sqrt{d}}[\cos(w_{1}^Tx + b_{1}),\cos(w_{2}^Tx + b_{2}),...,\cos(w_{d}^Tx + b_{d})],$$ and $w_i \sim p, b \sim U(0,2\pi)$.
I am confused about the potential fact that the approximation to the potentially non periodic feature mapping $\phi$ is now approximated by a vector of periodic features. Thinking of this as a fourier series, I would think someone might specify an interval over which this approximation is valid, though no such discussion appears in the literature so I think I'm missing something. Should I be thinking of this method as approximating the kernel (i.e. the inner product between approximate feature mappings) rather than as well approximating the actual feature mapping itself?

I'll look more into your paper and thanks for pointing out the error bound in the original paper...although I imagine I could possibly work it out for myself, I'm somewhat more interested in the range of validity and approximation errors for the feature mapping Phi itself, rather than its associated kernel. Do you know of any work on this?
Alternatively, I'm wondering if I could simply rescale inputs to accomodate the fact that the kernel approximation decays from the origin.
– blackbird Jan 30 '20 at 18:38