Suppose I have a scatterplot in a box with 0 < x < x1 and y1 < y < y2.

Let 0 < prp < 1.

Is there an efficient way to find a line that passes through the origin, and that has prp proportion of the points below or on the line and (1-prp) proportion of the points above the line?

NOTE: If you are thinking of recommending quantile regression, don't. It addresses a different problem.

Argent

1 Answer

A simple 1D search over the slope of the line will do the job:

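set.seed(0)
d <- data.frame(x1 = runif(100), x2 = runif(100))
p <- 0.3  # target proportion of points on or below the line

# Absolute gap between the proportion of points on or below
# the line y = k * x and the target p
obj_fun <- function(k) {
  abs(sum(d$x2 <= k * d$x1) / nrow(d) - p)
}

# Evaluate the objective on a grid of candidate slopes
ks <- seq(0, 10, by = 0.1)
obj <- sapply(ks, obj_fun)

plot(ks, obj, type = 'b')
grid()

# Plot the data and the line whose slope minimises the objective
plot(d)
grid()
abline(0, ks[which.min(obj)])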

Here are the objective function and the solution plot:

[image: objective function and solution plot]

Haitao Du
  • Your solution is essentially a brute-force search. However, even when I increase the number of rows in the data frame to 1e5 and the search sequence to a length of 2001, so that the objective really gets close to 0, it's very fast. So this is a case where brute-force search does the job! Thanks. – Argent Apr 26 '20 at 08:17
  • Why not just use the quantile function in R, or at least suggest an efficient search method that will scale well? – whuber Apr 26 '20 at 12:56
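
One possible reading of the quantile suggestion in the last comment, offered as a sketch rather than as the commenter's own code: when every x is positive (as in the question's box, 0 < x < x1), a point lies on or below the line y = k*x exactly when y/x <= k, so the slope can be taken as an empirical quantile of the per-point slopes y/x. The function name slope_for_proportion is made up for illustration, and type = 1 (the inverse empirical CDF) is my assumption about which quantile definition fits the "below or on" wording.

```r
# Slope-quantile sketch; assumes every x > 0 (the question's box has 0 < x < x1).
# slope_for_proportion is a hypothetical helper name, not from the thread.
slope_for_proportion <- function(x, y, prp) {
  stopifnot(all(x > 0), prp > 0, prp < 1)
  # type = 1 inverts the empirical CDF: it picks an observed slope k such that
  # at least a proportion prp of the per-point slopes y / x are <= k.
  unname(quantile(y / x, probs = prp, type = 1))
}

# Same simulated data as in the answer
set.seed(0)
d <- data.frame(x1 = runif(100), x2 = runif(100))
k <- slope_for_proportion(d$x1, d$x2, prp = 0.3)

# Proportion of points on or below the line, checked on the slopes themselves
# so that the point lying exactly on the line is not lost to rounding
prop_below <- mean(d$x2 / d$x1 <= k)
```

With n points and no tied slopes, prop_below should be at least prp and exceed it by less than 1/n. The cost is a single pass to form the slopes plus the quantile computation, with no grid to choose. Drawing the line is the same as in the answer: plot(d); abline(0, k).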