A necessary (but not sufficient) condition for $f_2$ to be a temporally scaled version of $f_1$ is that a spectral representation of $f_1$ on a logarithmic frequency scale (such as the constant-Q transform) is a translation of the same representation of $f_2$.
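
To see why scaling turns into translation, here is a short derivation. It assumes the convention $f_2(t) = f_1(at)$ with $a > 0$, and writes $F_1$, $F_2$ for the Fourier transforms and $u = \log \omega$ for log-frequency; these symbols are introduced here for illustration only.

$$
f_2(t) = f_1(at)
\;\Longrightarrow\;
|F_2(\omega)| = \frac{1}{a}\,\left|F_1\!\left(\frac{\omega}{a}\right)\right|
\;\Longrightarrow\;
|F_2(e^{u})| = \frac{1}{a}\,\left|F_1\!\left(e^{\,u - \log a}\right)\right|
$$

So, up to the gain $1/a$, the log-frequency magnitude spectrum of $f_2$ is that of $f_1$ shifted by $\log a$.
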
Practically, given two signals, you can perform the test and estimate $a$ by computing the CQT of $f_1$ and $f_2$, cross-correlating them along the log-frequency axis, and looking at the location of the peak. The strength of the peak gives you an idea of the spectral similarity of the two signals irrespective of their temporal scale, and the position of the peak gives you the temporal scaling factor: a shift of $k$ bins at $B$ bins per octave corresponds to a factor of $2^{k/B}$.
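
A minimal sketch of this procedure in Python, using NumPy only. Instead of a true CQT it sums FFT magnitudes into log-spaced bands, which is a crude stand-in that is good enough for stationary test tones. The function names, the parameters (`fmin`, `n_octaves`, `bins_per_octave`), the test signal, and the convention $f_2(t) = f_1(at)$ are all assumptions made for illustration.

```python
import numpy as np

def log_freq_spectrum(x, sr, fmin=110.0, n_octaves=5, bins_per_octave=48):
    """Sum FFT magnitudes into log-spaced bands (crude stand-in for a CQT)."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    n_bins = n_octaves * bins_per_octave
    # Band edges sit half a bin on either side of each log-spaced centre.
    edges = fmin * 2.0 ** ((np.arange(n_bins + 1) - 0.5) / bins_per_octave)
    band_mag, _ = np.histogram(freqs, bins=edges, weights=mag)
    return band_mag

def estimate_time_scale(f1, f2, sr, bins_per_octave=48):
    """Return (a, strength), assuming f2(t) = f1(a * t)."""
    s1 = log_freq_spectrum(f1, sr, bins_per_octave=bins_per_octave)
    s2 = log_freq_spectrum(f2, sr, bins_per_octave=bins_per_octave)
    # Remove the mean and normalise so the peak height is at most 1.
    s1 = s1 - s1.mean()
    s2 = s2 - s2.mean()
    s1 = s1 / (np.linalg.norm(s1) + 1e-12)
    s2 = s2 / (np.linalg.norm(s2) + 1e-12)
    xcorr = np.correlate(s2, s1, mode="full")
    # Index len(s1) - 1 of the full cross-correlation is zero lag.
    lag = int(np.argmax(xcorr)) - (len(s1) - 1)
    # A shift of `lag` bins is a frequency ratio of 2 ** (lag / bins_per_octave).
    a = 2.0 ** (lag / bins_per_octave)
    return float(a), float(xcorr.max())

def tone(t, f0=220.0):
    """Hypothetical test signal: five harmonics with 1/k amplitudes."""
    return sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 6))

sr = 16000
t = np.arange(sr) / sr          # one second of samples
f1 = tone(t)
f2 = tone(1.5 * t)              # f2(t) = f1(1.5 t): played 1.5 times faster
a_hat, strength = estimate_time_scale(f1, f2, sr)
print(a_hat, strength)
```

The estimate of $a$ is quantised to steps of $2^{1/B}$, so a larger `bins_per_octave` gives a finer answer at the cost of needing better frequency resolution. For real, non-stationary recordings you would likely want an actual CQT (averaged over time) rather than this single-frame approximation.
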

This kind of representation, robust to temporal scaling, is useful in music signal modeling, where the different notes produced by a musical instrument are, to a very rough approximation, temporally scaled versions of one another.