Binning continuous predictors: what is the best way?

Question

I want to run a binary logistic regression to understand (model) the factors affecting nest-site selection in a bird species. I think it would be better to transform the continuous variables into categorical variables because, for example, nest height above the ground can have different effects in different ranges (intervals): 0-200 cm might have a negative effect on nest-site selection while 200-250 cm has a positive effect, and the sign may change again in later ranges (+ - - + + - +). After running the binary logistic regression, we would then have a coefficient for each category instead of a single coefficient for the variable.

I am looking for a method in which the cut points do not have to coincide with observed data values. I think it should be based on the distribution and density of the variable. An equal-probability histogram combined with kernel density estimation seems like a good method for this purpose, particularly since I can run a pairwise comparison of means to check whether each category was built properly, but that approach relies on visual interpretation. I am looking for a more rigorous method that I can cite in my article. Thanks.

Is "+ - - + + - +" the pattern of the effect of height on selection? Why do you think this happens? Is it several different effects interacting? — Henry, Feb 11 '23 at 22:41
It's not exactly like that; "+ - - + + - +" was just an example. But I sensed something like this in the environmental data. — Mostafa Ahmadi, Feb 11 '23 at 22:47
@MostafaAhmadi Why not? The accepted answer gives strong rationale about why this kind of binning is a poor strategy and how you can get the desired nonlinear behavior (the $++-+-+$ you desire) using alternative strategies. — Dave, Feb 12 '23 at 01:16
Did you try to spline it? See https://stats.stackexchange.com/questions/122212/logistic-regression-with-regression-splines-in-r — kjetil b halvorsen, Feb 12 '23 at 04:04
I think it is pretty much the consensus here and elsewhere among statisticians that binning is bad practice (especially "optimal" binning, which can very quickly devolve into p-hacking), and that splines are much better to capture nonlinearities. I would very much recommend you do look into splines, and if you then still believe you need binning, then please do edit this question and explain why splines are not the answer. Of course, if you have any questions about splines, you can post them here, or search in the [tag:splines] tag. — Stephan Kolassa, Feb 12 '23 at 06:13
Thank you all. You are right about using the spline method. But I am a new R user, so I want to start learning about it now. — Mostafa Ahmadi, Feb 12 '23 at 20:02
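To make the spline suggestion from the comments concrete, here is a minimal sketch of a logistic regression in which nest height enters through a cubic spline basis instead of bins. Everything in it is an assumption for illustration: the data are simulated, the sign-changing "true" effect is made up, the knots are simply placed at the quartiles, and `cubic_spline_basis`, `fit_logistic` and `log_likelihood` are hypothetical helper names written out by hand with numpy. In R, the usual route would be something like `glm(y ~ splines::ns(height, df = 4), family = binomial)` rather than hand-written fitting code.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic spline basis (no intercept column).

    Columns: x, x^2, x^3, then one (x - k)_+^3 term per knot.
    """
    x = np.asarray(x, dtype=float)
    cols = [x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Logistic regression by Newton-Raphson; X must include the intercept.

    The tiny ridge term only stabilises the linear solve.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of a fitted logistic model."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


# Simulated data (assumption): heights in cm and a made-up effect whose
# sign changes along the height range (+, then -, then +).
rng = np.random.default_rng(0)
n = 1000
height_cm = rng.uniform(0.0, 400.0, n)
x = height_cm / 400.0                      # rescale to [0, 1] for stability
true_logit = 1.5 * np.sin(3.0 * np.pi * x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Knots at the quartiles of the predictor (a common default, not a rule).
knots = np.quantile(x, [0.25, 0.5, 0.75])

X_spline = np.column_stack([np.ones(n), cubic_spline_basis(x, knots)])
X_linear = np.column_stack([np.ones(n), x])

beta_spline = fit_logistic(X_spline, y)
beta_linear = fit_logistic(X_linear, y)

ll_spline = log_likelihood(X_spline, y, beta_spline)
ll_linear = log_likelihood(X_linear, y, beta_linear)
print("log-likelihood, linear term only:", ll_linear)
print("log-likelihood, cubic spline:    ", ll_spline)

# Fitted selection probability along a grid of heights: a smooth curve that
# can rise and fall, with no cut points to choose.
grid = np.linspace(0.0, 1.0, 9)
X_grid = np.column_stack([np.ones(grid.size), cubic_spline_basis(grid, knots)])
p_grid = 1.0 / (1.0 + np.exp(-(X_grid @ beta_spline)))
for h, p in zip(grid * 400.0, p_grid):
    print(f"height {h:5.0f} cm -> fitted P(selected) = {p:.2f}")
```

The spline model is still an ordinary logistic regression; the only change is that height contributes several smooth basis columns, so the fitted effect can change direction across the range without the jumps and arbitrary boundaries that bins introduce. The number and placement of knots is a modelling choice, and the quartiles used here are just one simple option.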
