
I have 100,000 observations with two variables each: age, ranging from 18 to 80, and interactions, ranging from 1 to 1500. I want to split the age variable into 4 bins such that the number of observations in each bin is roughly equal (any bin may deviate by up to 20% from the mean bin size, which is 100000/4 = 25000).

Many binnings satisfy this constraint. Among them, I would like to find the one where the mean of interactions is as similar as possible across bins. More formally: for each bin $b$, I define $\mu_{b}$ as the mean of interactions over the observations in that bin, and $\mu_{b,all}$ as the collection of the values $\mu_{b}$ over all bins. Then I want to:

$$\text{minimize} \quad \operatorname{var}(\mu_{b,all})$$

The bin ranges must not overlap and should span the entire interval. For instance, the 4 bins could be:

  • 18-30
  • 31-45
  • 46-57
  • 58-80

Is there a method that would easily do this? Or any suggestions on how to do it?
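
For reference, the search described above can be written as a brute-force sketch. It assumes ages are integers, so there are at most 63 distinct values and only a few tens of thousands of ways to place 3 cut points. The function name `best_bins` and its parameters are hypothetical, and the variance is taken over the unweighted bin means.

```python
from itertools import combinations

import numpy as np


def best_bins(age, interactions, n_bins=4, tol=0.20):
    """Exhaustive search over contiguous age bins (hypothetical helper).

    Returns (variance_of_bin_means, [(low, high), ...]) with inclusive
    age ranges, or None if no binning satisfies the size constraint.
    Assumes integer-valued ages.
    """
    age = np.asarray(age)
    interactions = np.asarray(interactions, dtype=float)

    # Per-age counts and sums of interactions, ordered by age.
    values = np.unique(age)
    idx = np.searchsorted(values, age)
    counts = np.bincount(idx, minlength=len(values))
    sums = np.bincount(idx, weights=interactions, minlength=len(values))

    # Prefix sums let us get any bin's count and total in O(1).
    ccount = np.concatenate(([0], np.cumsum(counts)))
    csum = np.concatenate(([0.0], np.cumsum(sums)))

    target = len(age) / n_bins
    lo, hi = (1 - tol) * target, (1 + tol) * target

    best = None
    # A cut at position c means a new bin starts at values[c].
    for cuts in combinations(range(1, len(values)), n_bins - 1):
        edges = [0, *cuts, len(values)]
        n = np.diff(ccount[edges])
        if n.min() < lo or n.max() > hi:
            continue  # bin sizes deviate too much from the mean size
        means = np.diff(csum[edges]) / n
        v = means.var()
        if best is None or v < best[0]:
            bins = [(int(values[a]), int(values[b - 1]))
                    for a, b in zip(edges[:-1], edges[1:])]
            best = (v, bins)
    return best
```

Calling `best_bins(age, interactions)` on the two columns would return the smallest variance found together with the four age ranges, or `None` if no binning keeps every bin within 20% of the mean size.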

Nick Cox
pir
  • Understanding a "bin" to be a contiguous range of values, the answer is already determined by the requirement that there be 4 bins with the same numbers of values. In general you cannot hope to satisfy the additional requirements of a "similar mean" (presumably of interactions) within each bin. Since you seem to be asking for something impossible, perhaps it would be more useful to edit your post to explain why you're trying to do this, so people could help you figure out an effective approach. – whuber Jun 30 '15 at 20:49
  • I've updated my post, making it more understandable and less "impossible" :) – pir Jun 30 '15 at 21:10
  • Now it's clear! (I am assuming by "variation across $\mu_i$" you really mean the variation of interactions in the bin relative to $\mu_i$). You are asking for a one-dimensional constrained clustering algorithm. It is especially helpful that you have quantified your constraints. – whuber Jun 30 '15 at 21:11
  • No, that is not what I mean. I've tried to update my post one more time. – pir Jun 30 '15 at 22:01

0 Answers