binary classification of real numbers

Question

Is there an efficient (linear or sub quadratic) algorithm to find the division point for real numbers? For example, I have a bunch of real number on the Real line and I am looking for a point which divides the line into left and right intervals? For example [-10,-9, -6, 1, 2, 3, -11, 5, 6, 8] what point is optimal to divide this into two 'separate' group. I am open to any suitable objective function.

A median will not work for the following examples [-100, 1, 2, 3, 4, 5] should have a division point something like -50. This is not an outlier detection problem either. For example [-100, -99, -98, 1, 2, 3, 4, 5, 6, 7, 8, 9] will not admit median as an answer, but something like -50 which clearly divided the point into two clusters.

The question is really about when clustering in 1D, is there an efficiency trick to do 'general' clustering in linear or sub-quadratic time, which is equivalent to finding a point on line that divides the given points into two groups.

To get useful answers, you'll need to [edit] to describe what problem you're trying to solve and what criterion makes a chosen number useful for dividing the real line. For example, is 7 a good choice to divide the real line? Why or why not? — Sycorax, Oct 03 '23 at 18:33
@manav, I encourage you to clarify your question :). Once edited, it should be reopened! Please let us know if you are referring to order statistics, change point detection or something else. Some people here are happy to help, but need to understand better what's being asked. This site is different from most forums because of a few rules designed to keep it as Q&A (as opposed to a regular forum). It might sound unwelcoming at first, but that's not the intention: you may see it as just a way to keep the resulting knowledge base in order — Rafael, Oct 03 '23 at 20:22
appreciate this response! I did not realize that a closed question can be opened. I will update this with some examples and then delete the duplicate. Thanks. — manav, Oct 03 '23 at 21:23
In your example [-100, 1, 2, 3, 4, 5] you propose the solution is "something like -50." Why is "something like -50" correct? The value -50 is "something like" half-way between 5 and -100. Are you asking about some kind of a mean? Or are you asking about density estimation? Or outlier detection? Mixture modeling? — Sycorax, Oct 03 '23 at 21:28
when we cluster point in N-dimension, do we ask for 'median' or mean? are we asking for density estimation, or outlier detection, mixture modeling? The question is really about when done in 1D is there an efficiency trick to do 'general' clustering, which is equivalent to finding a point on line that divides the given point into two groups. As I state in the question " I am open to any suitable objective function." I updated the question to clarify this point. Thanks for the feedback again. — manav, Oct 04 '23 at 16:17
This is valuable context to [edit] into your question. Yes, people use k-means and k-mediods for clustering in $N$ dimensions. Is "how do I cluster data in 1 dimension?" your question? Or "how can I make clustering in 1 dimension more efficient?" — Sycorax, Oct 04 '23 at 16:23
I'm struggling to understand how the answer to this could be something other than the median or the mean. (I suppose there are exotic measure of location.) — Dave, Oct 04 '23 at 16:26
See https://stats.stackexchange.com/questions/67571/how-can-i-group-numerical-data-into-naturally-forming-brackets-e-g-income for what seems to be (a version of) this problem — Nick Cox, Oct 04 '23 at 16:27

binary classification of real numbers

0 Answers0