Does anyone know what the difference is (theoretically speaking) between
undersampling
oversampling
weight-based classifiers
when dealing with highly imbalanced datasets (1:1000, 1:10000)?
When is it recommended to use each one?
Is there a theoretical justification for any of these methods, or do they just "work pretty well"?
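
To make the three options concrete, here is a minimal sketch of what each one might look like on a toy 1:1000 dataset, using only the Python standard library. The function names and the toy data are hypothetical, and the inverse-frequency weighting is just one common heuristic assumed here for illustration, not the definition of "weight-based".

```python
import random

def undersample(majority, minority, rng):
    """Randomly drop majority examples until both classes have equal size."""
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    return kept, list(minority)

def oversample(majority, minority, rng):
    """Randomly duplicate minority examples (with replacement) up to majority size."""
    drawn = [rng.choice(minority) for _ in range(len(majority))]
    return list(majority), drawn

def class_weights(majority, minority):
    """Keep all data; weight each class inversely to its frequency (assumed heuristic)."""
    n = len(majority) + len(minority)
    return {0: n / (2 * len(majority)), 1: n / (2 * len(minority))}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
majority = [("neg", i) for i in range(10000)]  # toy majority class
minority = [("pos", i) for i in range(10)]     # toy minority class, ratio 1:1000

u_maj, u_min = undersample(majority, minority, rng)
o_maj, o_min = oversample(majority, minority, rng)
w = class_weights(majority, minority)

print(len(u_maj), len(u_min))  # class sizes after undersampling
print(len(o_maj), len(o_min))  # class sizes after oversampling
print(w)                       # per-class weights, data left untouched
```

In this sketch all three end up giving the two classes equal total influence on a loss that sums over examples: undersampling does it by discarding majority data, oversampling by repeating minority data, and weighting by scaling each example's contribution.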