Questions tagged [binning]

Binning means grouping a continuous variable into discrete categories. It is particularly used in reference to histograms, but could also be used more generally in the sense of coarsening.

Various rules have been proposed for choosing the number of bins in a histogram. As is often the case, there is a tradeoff: with too many bins, the histogram is bumpy and depends heavily on the particular data set; with too few, necessary detail is lost. This is discussed in this thread.
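As a rough illustration of such rules, here is a minimal sketch of two well-known ones, Sturges' rule and the Freedman-Diaconis rule, using only the Python standard library. The helper names and the synthetic sample are assumptions made for this example; they do not come from the tag description.

```python
import math
import random
import statistics

def sturges_bins(n):
    """Sturges' rule: number of bins k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)

def bins_from_width(data, width):
    """Convert a bin width into a bin count that covers the data range."""
    return max(1, math.ceil((max(data) - min(data)) / width))

random.seed(0)  # synthetic data for illustration only
sample = [random.gauss(0, 1) for _ in range(1000)]

k_sturges = sturges_bins(len(sample))
k_fd = bins_from_width(sample, freedman_diaconis_width(sample))
print("Sturges:", k_sturges, "Freedman-Diaconis:", k_fd)
```

The two rules will generally suggest different bin counts for the same data, which is one way to see the tradeoff described above.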

One problem with histograms is that different binning can result in histograms that appear quite different.
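A small sketch of this effect: the same eight values are counted with the same bin width but two different bin origins. The data and the `bin_counts` helper are made up for this illustration.

```python
import math

def bin_counts(data, origin, width, n_bins):
    """Count values per bin for bins [origin + i*width, origin + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        idx = math.floor((x - origin) / width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

data = [1.9, 2.1, 2.2, 3.8, 3.9, 4.1, 4.2, 5.9]  # illustrative values

counts_origin_0 = bin_counts(data, origin=0.0, width=2.0, n_bins=4)
counts_origin_1 = bin_counts(data, origin=1.0, width=2.0, n_bins=4)
print(counts_origin_0)  # [1, 4, 3, 0]
print(counts_origin_1)  # [3, 4, 1, 0]
```

Shifting the origin by half a bin width moves the apparent mass from the right of the peak to the left, even though the data are unchanged.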

263 questions
3
votes
0 answers

Predicting binary outcomes for observations given statistics on binned data

SAT Verbal scores range from 200 to 800 in increments of 10. MIT says that for the class of 2023, the acceptance rates for various score ranges were:
750-800  10% = 677/6504
700-740  06% = 312/5039
650-690  03% = 87/2614
600-640  01% = 11/1091
200-590…
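The rounded percentages quoted in this excerpt can be checked against the admitted/applied counts with a few lines of Python. Only the four fully quoted score bands are included; the truncated band is left out, and the variable and function names are assumptions for this sketch.

```python
# (admitted, applied) counts per SAT Verbal band, as quoted in the excerpt
bands = {
    "750-800": (677, 6504),
    "700-740": (312, 5039),
    "650-690": (87, 2614),
    "600-640": (11, 1091),
}

def rate_percent(admitted, applied):
    """Acceptance rate in percent, rounded to the nearest whole number."""
    return round(100 * admitted / applied)

rates = {band: rate_percent(a, n) for band, (a, n) in bands.items()}
for band, pct in rates.items():
    print(f"{band}: {pct}%")
```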
2
votes
1 answer

Difference between equal frequency and quantile binning

Equal-frequency binning divides the data set into bins that all have the same number of samples. Quantile binning assigns the same number of observations to each bin. What is the difference between the two methods? It seems to me that both do the same…
joni
  • 21
2
votes
0 answers

References for the effect of binning methods?

There are two main ways to bin a numerical signal into discrete categorical values:
  • Quintile Binning: each bin will have the same size
  • Histogram Binning: a normal distribution will be used to bin the signal
When should I use one or the other?
user46925
1
vote
1 answer

Discretization Score test vs Training set

I'm in the process of training a NB model based on continuous features that need Equal Frequency Discretization to be used. The question I'm facing is whether discretization needs to be performed separately for the train and score sets, appending…
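One common approach to this situation, sketched here under the assumption that the score set should be mapped onto bins learned from the training set only, is to compute the equal-frequency cut points once and reuse them. The function names and data are hypothetical and are not taken from the question.

```python
import bisect
import statistics

def fit_equal_frequency_cuts(train_values, k):
    """Return the k-1 cut points that split the training data into k quantile bins."""
    return statistics.quantiles(train_values, n=k)

def assign_bins(values, cuts):
    """Map each value to a bin index in 0..len(cuts) using fixed cut points."""
    return [bisect.bisect_right(cuts, v) for v in values]

train = list(range(1, 101))       # illustrative training feature
score = [-5, 10, 30, 60, 500]     # illustrative score set, including out-of-range values

cuts = fit_equal_frequency_cuts(train, k=4)  # learned from training data only
train_bins = assign_bins(train, cuts)
score_bins = assign_bins(score, cuts)        # same cut points reused
print(score_bins)  # [0, 0, 1, 2, 3]
```

Values outside the training range simply fall into the first or last bin, so a given bin index means the same thing in both sets.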
1
vote
1 answer

When to use equal frequency binning and when equal width binning?

When transforming numerical variables into categorical variables, I'm not sure when I should use equal frequency binning and when equal width binning. It seems that each of them has its own advantages, but I can't tell them apart.
al2
  • 13
  • 1
  • 3
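To make the contrast in this question concrete, here is a minimal standard-library sketch of both schemes applied to a small skewed sample. The data and helper names are invented for the illustration, and real implementations may treat ties and edge cases differently.

```python
def equal_width_edges(data, k):
    """k bins of identical width spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k + 1)]

def counts_per_bin(data, edges):
    """Count values per bin; bins are left-closed, and the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            if x < edges[i + 1] or i == last:
                counts[i] += 1
                break
    return counts

def equal_frequency_bins(data, k):
    """Split the sorted data into k groups whose sizes differ by at most one."""
    ordered = sorted(data)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # skewed: one large outlier

width_counts = counts_per_bin(values, equal_width_edges(values, 2))
freq_groups = equal_frequency_bins(values, 2)
print(width_counts)                   # [9, 1]
print([len(g) for g in freq_groups])  # [5, 5]
```

With the outlier present, equal width binning puts nearly everything in one bin, while equal frequency binning keeps the counts balanced at the cost of bins with very different widths.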
1
vote
0 answers

Comparing two sets of equal width binned data

I have two distinct sets of ~1500 intervals of differing lengths - let’s call them Interval Set 1 and Interval Set 2. I have another set of data that contains values that fall within each of the sets of intervals - I’ll call it Data Values. I want to…
1
vote
0 answers

Find bins for observations where each bin has similar mean

I have 100000 observations with two variables on each, age on the range of 18-80 and interactions on the range of 1-1500. I want to find 4 bins based on the age variable, where the number of observations in each bin is roughly similar (any bin can…
pir
  • 5,056
0
votes
0 answers

Discretization of skewed data (time durations)

I have data that describes how long a person views a webpage. The durations vary widely and, in the context in which I gathered the data, were very skewed. People mostly spent short amounts of time on a webpage but sometimes spent a…
Paul
  • 121
0
votes
0 answers

Binning continuous predictors, What is the best way?

I want to run a binary logistic regression to understand (model) the factors affecting nest-site selection in a bird species. I think it is better if I transform continuous variables to categorical variables, because for example Nest Height from…
0
votes
0 answers

How to do binning correctly

I have a question regarding how to do binning correctly. I have 500 rows with different values (wages) which vary a lot. For example: in row number 20 the wage is $600, whereas in row 40 the wage is $50000.
0
votes
0 answers

Binning by boundaries

What happens when a value is equidistant from the upper and lower boundaries when binning by boundaries? Take the example {26, 28, 30, 34}. Does 30 get converted to 26 or to 34?
TheGoat
  • 639
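The tie in this example can be made explicit in code. The sketch below replaces each value in a bin by the nearer of the bin's minimum and maximum, and exposes the tie-breaking choice as a parameter. The `tie` parameter and the function name are assumptions of this sketch, not an established convention.

```python
def smooth_by_boundaries(bin_values, tie="lower"):
    """Replace each value by the closer bin boundary (min or max of the bin).

    `tie` decides what happens to values exactly halfway between the
    boundaries: "lower" maps them to the minimum, "upper" to the maximum.
    """
    lo, hi = min(bin_values), max(bin_values)
    smoothed = []
    for x in bin_values:
        dist_lo, dist_hi = x - lo, hi - x
        if dist_lo < dist_hi:
            smoothed.append(lo)
        elif dist_hi < dist_lo:
            smoothed.append(hi)
        else:
            smoothed.append(lo if tie == "lower" else hi)
    return smoothed

bin_values = [26, 28, 30, 34]
print(smooth_by_boundaries(bin_values, tie="lower"))  # [26, 26, 26, 34]
print(smooth_by_boundaries(bin_values, tie="upper"))  # [26, 26, 34, 34]
```

Either choice is defensible as long as it is applied consistently and stated alongside the results.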