I am using logistic regression for classification purpose. For reduction of features and better precision I am using Weight of evidence technique. Also I need to use python for this. As there is no readily available algorithm for binning, I was searching for the rules of binning and I came across this:
http://www.m-hikari.com/ams/ams-2014/ams-65-68-2014/zengAMS65-68-2014.pdf
This paper says that:
A good binning algorithm should follow the following guidelines:
Missing values are binned separately.
Each bin should contain at least 5% of observations.
No bins have 0 accounts for good or bad
I don't understand what is the necessity of second condition i.e. each bin should contain at least 5% of observations? why is it necessary to have at least 5% observation in each bin? Can't I have at least 2% in each bin or at least 10% in each bin.
Someone told me that there will be more points if we consider 5% in each bin. Why is it necessary to have more points when you want to make already continuous data into categorical data?
It would be neglectful of me to not mention that binning itself is not viewed as a good technique by experienced data scientists and statisticians. For continuous features, it is less efficient at improving goodness of fit than using splines to effect a basis expansion. For categorical features, unless based on prior subject matter expertise, it is less principled than a good regularization strategy.
– Matthew Drury Feb 24 '17 at 04:46