Questions tagged [oversampling]

Sampling cases with differential probability, so that classes that occur rarely in the population occur more often in the training data. It does not address the problems posed by unbalanced classes.

Unbalanced classes do pose problems, but contrary to a common misunderstanding, these are merely due to low sample size (high variance of predictors), not the unbalancedness per se. As such, oversampling will not help.

See "Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?" and the links there.
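To make the description concrete, here is a minimal sketch of random oversampling using only the Python standard library. The function name `random_oversample` and the toy data are hypothetical, chosen for illustration; this is not any particular library's implementation. It keeps every original row and draws extra rows with replacement from each smaller class until all classes match the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: add resampled copies of rows from smaller
    classes until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Draw the missing rows with replacement (nothing is drawn
        # for the largest class, where target - n == 0).
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * (target - n))
    return out_rows, out_labels


# Toy data (made up): 95 rows of class 0 and 5 rows of class 1.
rows = [(i, i * 0.5) for i in range(100)]
labels = [0] * 95 + [1] * 5

os_rows, os_labels = random_oversample(rows, labels)
balanced_counts = Counter(os_labels)
distinct_minority = {r for r, l in zip(os_rows, os_labels) if l == 1}

print(balanced_counts)         # class sizes after oversampling
print(len(distinct_minority))  # distinct rare-class rows actually present
```

After resampling, the class counts are equal, but the rare class still consists of the same five distinct rows. The sample size of genuinely different rare cases has not changed, which is the point made above.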

112 questions
3 votes, 2 answers

What if I factor the training set?

In practice, it is common that we don't have enough observations to build our desired models. An idea that comes to my mind is that the population can be factored: in other words, we can simply duplicate every observation, making (say) 5 copies of every…
Metariat
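As a rough illustration of what making 5 copies of every observation does, the sketch below fits a simple linear regression on made-up data, once on the original 50 points and once on the same points repeated 5 times. The helper `simple_ols` and the simulated data are assumptions for this example only and are not taken from the question.

```python
import math
import random


def simple_ols(x, y):
    """Least-squares slope of y on x (with intercept) and its usual
    standard error, computed from the textbook formulas."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom.
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


# Simulated data (an assumption): y = 2x + noise, 50 observations.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

slope, se = simple_ols(x, y)
# "Factor" the data: list repetition yields 5 copies of every pair.
slope5, se5 = simple_ols(x * 5, y * 5)

print(slope, slope5)  # the two slope estimates
print(se5 / se)       # ratio of reported standard errors
```

Under these assumptions the slope estimate is unchanged by the duplication, while the reported standard error shrinks by the factor sqrt((n - 2) / (5n - 2)), here sqrt(48 / 248). The apparent gain in precision comes only from the inflated row count, not from any new information.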