My lecturer said something about this, but I can't find out why it is the case. I have to turn a continuous variable into a categorical one, and the data is left-skewed. Is it more important to have equal ranges or equal amounts of data in each range? Why?

Kev.D

1 Answer

My answer is: it depends.

It depends on why you are discretizing the variable. If you subsequently want to build a statistical classifier from the discretized variable(s), you need to choose a binning (meaning the full set of thresholds that define the interval bins) that is optimal for discerning the categories. If your purpose is merely to display the distribution of the variable in a histogram, you will often prefer a uniform binning, where every bin has the same width.
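To make the two options from the question concrete, here is a minimal sketch that bins the same left-skewed sample both ways and counts how many points land in each bin. Everything in it is an assumption made for illustration: the data is synthetic (10 minus an exponential draw), the choice of 4 bins is arbitrary, and the helper names `equal_width_edges`, `equal_frequency_edges` and `bin_counts` are hypothetical, not from any library.

```python
import random
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Inner thresholds that split [min, max] into bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Inner thresholds taken at evenly spaced ranks of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_counts(values, inner_edges):
    """Count how many values fall into each bin defined by the thresholds."""
    counts = [0] * (len(inner_edges) + 1)
    for v in values:
        counts[bisect_right(inner_edges, v)] += 1
    return counts


# Synthetic left-skewed sample (assumption): long tail towards low values.
rng = random.Random(0)
data = [10.0 - rng.expovariate(1.0) for _ in range(1000)]

width_counts = bin_counts(data, equal_width_edges(data, 4))
freq_counts = bin_counts(data, equal_frequency_edges(data, 4))

print("equal width    :", width_counts)
print("equal frequency:", freq_counts)
```

With equal-width bins, most of the points pile up in the top bin and the low bins are nearly empty, which is fine for showing the shape of the distribution but leaves some categories with very little data. Equal-frequency bins give each category roughly the same number of points, at the cost of very unequal ranges. Neither is the "optimal for discerning the categories" binning mentioned above; that would additionally need the class labels.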