
Does anyone know what the difference is (theoretically speaking) between

  • undersampling

  • oversampling

  • weight-based classifiers

when dealing with highly imbalanced datasets (1:1000, 1:10000)?
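
To make the three options concrete, here is a minimal sketch of how I understand them, using made-up labels at a 1:1000 ratio. The toy data, the variable names, and the inverse-frequency weighting formula are my own assumptions for illustration, not taken from any particular library or paper:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy labels: 10,000 negatives and 10 positives (1:1000).
y = [0] * 10_000 + [1] * 10
neg_idx = [i for i, label in enumerate(y) if label == 0]
pos_idx = [i for i, label in enumerate(y) if label == 1]

# Undersampling: keep every positive and draw an equal number of
# negatives without replacement, discarding the rest.
under_idx = pos_idx + random.sample(neg_idx, k=len(pos_idx))

# Oversampling: keep every negative and redraw positives with
# replacement until both classes have the same count.
over_idx = neg_idx + random.choices(pos_idx, k=len(neg_idx))

# Weighting: keep every row, but weight each one inversely to its
# class frequency so both classes carry the same total weight.
counts = Counter(y)
class_weight = {c: len(y) / (len(counts) * n) for c, n in counts.items()}
sample_weight = [class_weight[label] for label in y]
```

In this sketch, undersampling trains on 20 rows, oversampling on 20,000 rows (with each positive repeated many times), and weighting on all 10,010 original rows with per-row weights passed to the classifier's loss.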

When is it recommended to use each one?

Is there a theoretical justification for any of these methods, or do they just "work pretty well"?

YinnonM
  • 21
  • You might find this paper by He and Garcia (2009) to be helpful: http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf – jld Mar 13 '17 at 14:49

0 Answers