I am working on a classification program using an SVM with an RBF kernel. To find the best parameters C and gamma, I ran a grid search and got the image below. What confuses me is that when gamma varies from 0.3 to 3, the accuracy changes very rapidly. I wonder what happens in this region.

[Image: grid-search accuracy over C and gamma, unscaled features]

I think the good models should be found on the diagonal: either a higher C with a lower gamma, or a lower C with a higher gamma.

Could anyone help explain the variation in performance when gamma is between 0.3 and 3?

I didn't do feature scaling for the first image, so the features are not scaled. I suspect that could be a reason, because when I normalized the data, the image turned out like this:

[Image: grid-search accuracy over C and gamma, normalized features]

The yellow part between gamma = 1 and 10 is gone, and the accuracy seems to decrease slightly.

Shiyu

1 Answer

For simplicity, first scale your data $X$ so that $\mathrm{median}\, \|X_i - X_j\| \approx 1$: half the neighbors are < 1 away and half are > 1 away, on average.
What $e^{-\gamma\, \mathrm{dist}^2}$ does is down-weight (attenuate) more distant neighbors. By how much? Make a little table:

dist:                  [0    .5  1   2    3]
                       ---------------------
exp( - 0.3 * dist^2 ): [100  93  74  30   7] %
exp( -   1 * dist^2 ): [100  78  37   2   0] %
exp( -   3 * dist^2 ): [100  47   5   0   0] %

So $\gamma = 3$ down-weights the more distant half of the points (dist > 1) to between 5 % and 0,
$\gamma = 1$ to between 37 % and 0,
and $\gamma = 0.3$ attenuates them even less. (The range 0.3 .. 3 is way too big.)
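
The little table above can be reproduced with a few lines of Python. This is only a sketch: the helper name `kernel_weight_table` is made up for illustration, and the distances and gamma values are the ones used in the table.

```python
import math


def kernel_weight_table(gammas, dists):
    """Return {gamma: [exp(-gamma * dist^2) as a rounded percentage]}."""
    return {
        g: [round(100 * math.exp(-g * d * d)) for d in dists]
        for g in gammas
    }


dists = [0, 0.5, 1, 2, 3]
table = kernel_weight_table([0.3, 1, 3], dists)

print("dist:      ", dists)
for g, row in table.items():
    # Each row shows how strongly a neighbor at that distance still counts.
    print(f"gamma={g:<5}", row)
```

Adding more gamma values (say 0.5 or 2) to the list shows how quickly the weight at dist = 1 falls off between 0.3 and 3.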

A simple rule of thumb: start with $\gamma = 3$ for distances scaled to median 1.
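
One possible way to get "distances scaled to median 1" is to divide every feature by the median pairwise distance of the sample. The sketch below assumes that interpretation. The function names and the random toy data are hypothetical, and this is a single global rescaling, not per-feature standardization.

```python
import math
import random
import statistics
from itertools import combinations


def pairwise_distances(X):
    """Euclidean distance for every unordered pair of rows in X."""
    return [math.dist(a, b) for a, b in combinations(X, 2)]


def scale_to_median_one(X):
    """Divide all features by the median pairwise distance.

    Assumes the rows are not all identical (median distance > 0).
    """
    med = statistics.median(pairwise_distances(X))
    return [[v / med for v in row] for row in X]


# Toy data standing in for the real feature matrix (an assumption).
random.seed(0)
X = [[random.gauss(0, 5), random.gauss(0, 5)] for _ in range(200)]

X_scaled = scale_to_median_one(X)
median_after = statistics.median(pairwise_distances(X_scaled))
print(median_after)  # very close to 1.0 by construction
```

With the data rescaled this way, the dist = 1 column of the table corresponds to a "typical" pair of points, which is what makes a fixed starting value for gamma meaningful.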

Could you try $\gamma = 2, 3, 4$ for your scaled data?
Also, plotting the sample distributions of $\mathrm{dist} = \|X_i - X_j\|$ and $e^{-\gamma\, \mathrm{dist}^2}$ might be useful.
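
As a stand-in for a plot, the distributions can be summarized by their quartiles. The following is a rough sketch on random toy data (an assumption, not the asker's data), with a hypothetical `summarize` helper. Swapping in the real feature matrix for `X` would give the numbers that matter.

```python
import math
import random
import statistics
from itertools import combinations


def summarize(values):
    """Five-number summary: min, quartiles, max."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values)}


# Toy data: 150 points with 5 features each (an assumption).
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(150)]

# Pairwise distances, rescaled so that their median is 1.
dists = [math.dist(a, b) for a, b in combinations(X, 2)]
med = statistics.median(dists)
dists = [d / med for d in dists]
dist_summary = summarize(dists)
print("dist:", {k: round(v, 3) for k, v in dist_summary.items()})

# Distribution of kernel values exp(-gamma * dist^2) for a few gammas.
weight_summaries = {}
for gamma in (0.3, 1, 3):
    weights = [math.exp(-gamma * d * d) for d in dists]
    weight_summaries[gamma] = summarize(weights)
    print(f"gamma={gamma}:",
          {k: round(v, 3) for k, v in weight_summaries[gamma].items()})
```

Because the kernel is a monotone function of distance, the median kernel value is about $e^{-\gamma}$ once the median distance is 1, which matches the dist = 1 column of the table.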

denis
  • Thanks for the answer, denis. However, the features for this image are not scaled. (I tried both scaled and unscaled features, and the latter gave me better prediction accuracy.) When I tried the Gaussian SVM with scaled data, the part with gamma from 1 to 10 was all black, while the left part didn't change much. Also, I don't understand why the median distance is 1 for scaled data; is there any reference? – Shiyu Jun 28 '16 at 09:41
  • @Shiyu, instead of scaling, just vary C and gamma over a small white range in your plot, e.g. gamma [.005, .01, .02] and C [10, 100]. See also http://stats.stackexchange.com/questions/81537/gridsearch-for-svm-parameter-estimation . – denis Jun 29 '16 at 08:57
  • Thank you @denis, I did as you said and found another region where the accuracy varied from 60% to 80%. The linked question you suggested is also helpful. – Shiyu Jun 29 '16 at 11:35
  • You're welcome. Another link, with plots: https://yunhaocsblog.wordpress.com/2014/07/27/the-effects-of-hyperparameters-in-svm/ – denis Jun 29 '16 at 16:07