SPSS has an optimal binning function that helps categorize continuous predictors into meaningful intervals when a binary response variable exists. I was looking for an equivalent function in R, but I can't find one. I'm not sure that using bins derived by CART or CTREE would be equivalent.
2 Answers
3
There is now a package called "smbinning" (short for Optimal Binning for Scoring Modeling), available since early 2015. It gives you the optimal cut points for a numeric variable, more precisely by optimizing the information value. It can handle categorical variables and missing values as well.
For example:
smbinning(df, y, x, p = 0.05)
- df: data frame
- y: binary dependent variable (column name)
- x: numeric independent variable (column name)
- p: percentage of records per bin
It returns a list that contains the information value, the information value table, and other items. You can find details in the documentation on CRAN or at http://www.scoringmodeling.com/
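To make the "information value table" concrete, here is a minimal base R sketch of how weight of evidence (WoE) and information value (IV) are usually computed once cut points are known. It is not smbinning's code: the function name iv_table, the simulated data, and the cut points 0.33 and 0.66 are all assumptions made for illustration, and the sketch assumes every bin contains both classes.

```r
# Hypothetical helper (not part of smbinning): WoE/IV table for fixed cut points.
iv_table <- function(x, y, cuts) {
  # Bin x at the given interior cut points.
  bin <- cut(x, breaks = c(-Inf, cuts, Inf))
  # Count bad (y == 0) and good (y == 1) records per bin.
  counts <- table(bin, factor(y, levels = c(0, 1)))
  bad  <- as.vector(counts[, "0"])
  good <- as.vector(counts[, "1"])
  # Share of all goods / all bads that falls into each bin.
  pct_good <- good / sum(good)
  pct_bad  <- bad / sum(bad)
  # Standard definitions: WoE = ln(%good / %bad), IV term = (%good - %bad) * WoE.
  woe <- log(pct_good / pct_bad)
  data.frame(bin = levels(bin), good = good, bad = bad,
             woe = woe, iv = (pct_good - pct_bad) * woe)
}

# Simulated data (assumption): the probability of y == 1 increases with x.
set.seed(1)
x <- runif(1000)
y <- rbinom(1000, 1, plogis(-1 + 3 * x))

tab <- iv_table(x, y, cuts = c(0.33, 0.66))
print(tab)
cat("Total IV:", sum(tab$iv), "\n")
```

The total IV is the sum of the per-bin terms, so comparing it across candidate sets of cut points is one way to judge a binning.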
Anthony Lei
To be honest, I'm not the biggest fan of the smbinning package. I haven't coded anything better, but the coding in the package feels "amateurish", and it fails in many of the test cases I tried. I don't recommend smbinning at v0.2. – xiaodai Nov 24 '15 at 02:51
2
You can try the discretization package and its cutPoints function: http://cran.r-project.org/web/packages/discretization/discretization.pdf.
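As far as I know, cutPoints is based on an entropy criterion with an MDL stopping rule. The base R sketch below shows only the core idea, a single split that minimizes the class entropy of the two sides; it is not the package's code, it has no stopping rule, and the names entropy and best_cut are made up for illustration.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Hypothetical helper: the single cut point on x that minimizes the
# size-weighted entropy of y on the two sides of the cut.
best_cut <- function(x, y) {
  ux <- sort(unique(x))
  # Candidate cut points: midpoints between consecutive distinct x values.
  candidates <- (head(ux, -1) + tail(ux, -1)) / 2
  score <- sapply(candidates, function(cp) {
    left  <- y[x <= cp]
    right <- y[x > cp]
    (length(left) * entropy(left) + length(right) * entropy(right)) / length(y)
  })
  candidates[which.min(score)]
}

# Toy data (assumption): the classes separate cleanly between 4 and 10.
x <- c(1, 2, 3, 4, 10, 11, 12, 13)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
cp <- best_cut(x, y)
print(cp)  # 7
```

Applying such a split recursively to each side, with a rule for when to stop, gives a supervised discretization in the same spirit.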
See the cut function, and in the documentation of ?hist you can find info about algorithms that choose an "optimal" number of bins for a histogram. See also http://stats.stackexchange.com/questions/163778/how-do-you-find-a-cutting-point-strong-slope-within-one-dimensional-data/163787#163787 – Tim Oct 17 '15 at 07:57
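A small base R sketch of that suggestion, using simulated data as an assumption. Note that these rules are unsupervised: they look only at the distribution of x and ignore the binary response, so they are not a direct substitute for supervised optimal binning.

```r
# Simulated predictor (assumption, for illustration only).
set.seed(42)
x <- rnorm(500)

# Rules for the number of histogram classes, as described in ?hist.
k_sturges <- nclass.Sturges(x)  # ceiling(log2(n) + 1)
k_fd      <- nclass.FD(x)       # Freedman-Diaconis
k_scott   <- nclass.scott(x)    # Scott

# hist() treats the rule as a suggestion and returns "pretty" break points.
breaks_fd <- hist(x, breaks = "FD", plot = FALSE)$breaks

# Apply those breaks with cut() to turn x into a factor of intervals.
bins <- cut(x, breaks = breaks_fd, include.lowest = TRUE)
print(table(bins))
```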