
I'm using libSVM for binary classification, and my training data is highly unbalanced (90% labelled −1, 10% labelled +1). According to libSVM's documentation, it is better to set different penalties for the positive and negative classes, so the SVM problem becomes:

$\min\limits_{w,b,\xi} \frac{1}{2}{\bf w^Tw} + C^+\sum\limits_{y_i=1} \xi_i + C^-\sum\limits_{y_i=-1} \xi_i$

My question is: which penalty should be larger, and why?
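For reference, libSVM exposes these per-class penalties through its `-wi` flag (e.g. `svm-train -w1 9 -w-1 1`), which multiplies the base `C` for class `i`. A minimal sketch of the same idea using scikit-learn's `SVC` (a wrapper around libSVM); the weight of 9 is an assumption that mirrors the 90%/10% imbalance, not a tuned value:

```python
# Sketch: per-class penalties via scikit-learn's SVC (wraps libSVM).
# class_weight={1: 9, -1: 1} sets C+ = 9*C and C- = 1*C.
import numpy as np
from sklearn.svm import SVC

# Toy imbalanced data standing in for the real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 1.8, 1, -1)  # roughly 10% positives

clf = SVC(C=1.0, kernel="rbf", class_weight={1: 9, -1: 1})
clf.fit(X, y)
```

Whether a larger `C^+` actually helps should still be checked empirically, as discussed in the answer below.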

user11869

1 Answer


The larger the penalty, the more a training error on a pattern of that class (which is what $\xi_i$ measures) influences the model. So if you have more negative patterns than positive ones, you probably want to make $C^+$ larger than $C^-$. In my experience, a class-imbalance problem usually also means that the costs of false-positive and false-negative errors are not the same, and the relative costs of those errors are an important criterion for adjusting the penalties. I would suggest using cross-validation to estimate the expected loss and choosing the penalties to minimise it.
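The cross-validation procedure suggested above can be sketched as follows. The misclassification costs here (5 per false negative, 1 per false positive) are hypothetical placeholders; in practice they come from the application:

```python
# Sketch: choose the positive-class weight by cross-validation, scoring with
# an asymmetric loss. The costs c_fn=5, c_fp=1 are assumed for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def expected_cost(y_true, y_pred, c_fn=5.0, c_fp=1.0):
    # Average cost per example, with false negatives 5x as costly.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()
    return (c_fn * fn + c_fp * fp) / len(y_true)

# Synthetic 90%/10% imbalanced data standing in for the real training set.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
y = np.where(y == 1, 1, -1)  # relabel to {-1, +1} as in the question

scorer = make_scorer(expected_cost, greater_is_better=False)
best = None
for w in [1, 3, 9, 27]:
    clf = SVC(C=1.0, class_weight={1: w, -1: 1})
    # cross_val_score negates the loss (greater_is_better=False), so negate back.
    cost = -cross_val_score(clf, X, y, cv=5, scoring=scorer).mean()
    if best is None or cost < best[1]:
        best = (w, cost)
print("best positive-class weight:", best[0])
```

As the comments below note, different weights may well give indistinguishable CV performance, so the selected value should not be over-interpreted.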

Dikran Marsupial
  • A small addition: In practice, I have found that often in unbalanced problems, different penalties yield the same CV performance. So, OP, keep in mind that changing penalties will not necessarily improve your results in CV. – Bitwise Oct 25 '12 at 19:27
  • @Bitwise, what would you do in this case then? – user11869 Oct 25 '12 at 19:39
  • @user11869 there is not much to do; this is just another parameter to play with to try and improve your results (with proper CV, of course). – Bitwise Oct 25 '12 at 19:52