ML newbie here. I'm preparing my data for a binary classification to predict whether a person has an account or not. In total I have 8 variables: 2 numeric (age and household size) and 6 categorical. Is it advisable to convert my numeric variables, age and household size, to categorical ones (e.g. age brackets), or is it better to keep them as discrete numeric values?

Thanks for your help and advice :)

adam g.
  • Someone will post an answer or link to an existing answer about why this is a poor approach (you might get to learn of someone named Frank Harrell), but what advantage(s) do you see to binning your numerical variables? – Dave Sep 10 '22 at 00:48
  • Probably depends on the model you are using. It might help to prevent overfitting, but you might not know how fine those intervals should be to keep the information – Alberto Sep 10 '22 at 01:05

1 Answer

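Yes, you can do it, but be careful about what the resulting categories mean. Any variable can be converted from a fine scale (numeric) to a coarse scale (categorical brackets), but the conversion loses detail: every value inside a bracket is treated as the same, and the model can no longer tell them apart.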
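To make the loss of detail concrete, here is a minimal sketch in plain Python. The bracket edges, labels, and the helper `age_bracket` are made up for illustration and are not taken from the question's data.

```python
from bisect import bisect_right

# Hypothetical bracket edges and labels, chosen only for illustration.
AGE_EDGES = [18, 30, 45, 65]
AGE_LABELS = ["<18", "18-29", "30-44", "45-64", "65+"]

def age_bracket(age):
    """Map a numeric age to the label of the bracket it falls into."""
    # bisect_right counts how many edges are <= age, which is the bracket index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

ages = [17, 18, 29, 30, 44, 64, 65]
brackets = [age_bracket(a) for a in ages]

for a, b in zip(ages, brackets):
    print(a, "->", b)
```

With these edges, ages 18 and 29 end up in the same category even though they are 11 years apart, while 29 and 30 end up in different categories even though they are one year apart. Whether that trade-off is acceptable depends on the model and on how the brackets are chosen.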

utobi