I have the following data, which produce the plot below.

```r
# Three simulated clusters: a wide horizontal band, a narrow vertical band near x = 20, and 100 diffuse points
Predictor_Variable <- c(rnorm(1000, 0, 25), rnorm(1000, 20, 2), rnorm(100, 20, 10))
Response_Variable <- c(rnorm(1000, 0, 2), rnorm(1000, 0, 25), rnorm(100, 0, 10))
Data_Frame <- data.frame(Predictor_Variable = Predictor_Variable, Response_Variable = Response_Variable)
plot(Response_Variable ~ Predictor_Variable, Data_Frame, main = 'Example Plot', xlab = 'Predictor Variable', ylab = 'Response Variable')
```

[Figure: Example Plot, Response Variable vs. Predictor Variable]

I need to fit a vertical line through the points on the plot that extend vertically up and down from the horizontal axis. This vertical band is not perfectly centered, so I can't simply take the average of the predictor values.
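To make the point about the plain average concrete, here is a small sketch in R (the question's own language). It regenerates the example data with an added `set.seed(1)`, which the question does not use, and compares the mean of all predictor values with the mode of a kernel density estimate of the predictor restricted to points far from the horizontal axis. The cut-off `abs(Response_Variable) > 10` is an assumption picked relative to the simulated horizontal band (standard deviation 2) and would need tuning for real data.

```r
# set.seed() is added only so this sketch is reproducible; the question does not use it
set.seed(1)
Predictor_Variable <- c(rnorm(1000, 0, 25), rnorm(1000, 20, 2), rnorm(100, 20, 10))
Response_Variable  <- c(rnorm(1000, 0, 2), rnorm(1000, 0, 25), rnorm(100, 0, 10))

# Plain mean of all predictor values: pulled toward the horizontal band's centre
naive_centre <- mean(Predictor_Variable)

# Assumed cut-off: keep only points well above or below the horizontal axis
far_from_axis <- abs(Response_Variable) > 10

# Mode of a kernel density estimate of the predictor for those points
d <- density(Predictor_Variable[far_from_axis])
robust_centre <- d$x[which.max(d$y)]

c(naive = naive_centre, robust = robust_centre)
```

With these simulated components the plain mean lands roughly halfway between the two bands (around 10), whereas the restricted density mode should sit near the vertical band's centre of 20.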

Is there a nice, programmatic way of finding this vertical line?
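One hedged, programmatic option is to scan along the predictor and look for where the response is most spread out, since the vertical band is where the response ranges far above and below the axis. The sketch below bins `Predictor_Variable`, computes the median absolute deviation (`mad()`) of `Response_Variable` in each bin, and places the vertical line at the midpoint of the bin with the largest spread. The bin width of 1, the minimum of 20 points per bin, and the `set.seed(1)` call are illustrative assumptions, not part of the question.

```r
# set.seed() is added only so this sketch is reproducible; the question does not use it
set.seed(1)
Predictor_Variable <- c(rnorm(1000, 0, 25), rnorm(1000, 20, 2), rnorm(100, 20, 10))
Response_Variable  <- c(rnorm(1000, 0, 2), rnorm(1000, 0, 25), rnorm(100, 0, 10))

# Hypothetical tuning choices: bin width along x and minimum points per bin
bin_width <- 1
min_count <- 20

# Cut the predictor into equal-width bins covering its whole range
breaks <- seq(floor(min(Predictor_Variable)),
              ceiling(max(Predictor_Variable)) + bin_width, by = bin_width)
bins <- cut(Predictor_Variable, breaks = breaks, include.lowest = TRUE)

# Robust spread (MAD) of the response in each bin; empty bins give NA
spread <- tapply(Response_Variable, bins, mad)
counts <- as.vector(table(bins))
spread[counts < min_count] <- NA   # ignore sparsely populated bins

# Place the vertical line at the midpoint of the most spread-out bin
best <- which.max(spread)
centre <- (breaks[best] + breaks[best + 1]) / 2
centre
```

The result is only as precise as the bin width, so a finer grid or a smoothed spread curve may be needed for real data.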

  • https://stats.stackexchange.com/questions/33078 has answers. So does https://stats.stackexchange.com/questions/581275. And even https://stats.stackexchange.com/questions/375787/. – whuber Apr 28 '23 at 01:12
  • If you're "fitting" a response in terms of a predictor, you're presumably trying to fit something like the conditional mean $E(Y \mid X=x)$. The wide spread of Y when x is close to 0 doesn't make the conditional mean go 'vertical'. It might pay to investigate what caused this behavior, which probably should start with looking at what the response measures and how the response is obtained (e.g. such an appearance is not uncommon when Y is a ratio of two things that may both take values close to 0). It may be that a model for the spread is needed ..., – Glen_b Apr 28 '23 at 02:06
  • ... or it may be that you need to look at the variables involved in a different way. Wanting to fit the "vertical" part would involve departing from the "fitting Y given X=x" perspective. – Glen_b Apr 28 '23 at 02:08
  • Your data is simulated from a mixture of three multi-normal distributions. Do you know that your actual data is a sample from a mixture of three (approximately multi-normal) distributions? Why are your two variables called "predictor" and "response" when in your simulation they are completely independent of each other? – Roland Apr 28 '23 at 05:35
  • @whuber - thank you for the excellent tip - the Hough Transform sounds like the perfect solution – David Moore Apr 28 '23 at 12:29
  • @Roland - I simulated data for this reproducible example, but my actual data don't necessarily follow a normal distribution, nor are the predictor and response variables completely independent – David Moore Apr 28 '23 at 12:30
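Following the Hough transform suggestion in the comments, here is a minimal base-R sketch of the standard line Hough transform. Each point votes for every line through it in normal form, $\rho = x\cos\theta + y\sin\theta$; the accumulator cell with the most votes among near-vertical angles gives the line. The helper `hough_lines()` and its arguments `n_theta` (1-degree angle steps by default) and `rho_width` are hypothetical names chosen for this example, and the 5-degree "near-vertical" window and `set.seed(1)` are assumptions.

```r
# set.seed() is added only so this sketch is reproducible; the question does not use it
set.seed(1)
Predictor_Variable <- c(rnorm(1000, 0, 25), rnorm(1000, 20, 2), rnorm(100, 20, 10))
Response_Variable  <- c(rnorm(1000, 0, 2), rnorm(1000, 0, 25), rnorm(100, 0, 10))

# Hypothetical helper: line Hough transform, rho = x * cos(theta) + y * sin(theta).
# Each point votes once per angle, in the rho bin of the line through it at that angle.
hough_lines <- function(x, y, n_theta = 180, rho_width = 1) {
  theta <- seq(-pi / 2, pi / 2, length.out = n_theta + 1)[-(n_theta + 1)]
  rho <- outer(x, cos(theta)) + outer(y, sin(theta))      # points x angles
  n_rho <- ceiling((max(rho) - min(rho)) / rho_width) + 1
  rho_breaks <- min(rho) + rho_width * (0:n_rho)
  rho_bin <- findInterval(as.vector(rho), rho_breaks)      # values 1..n_rho
  theta_idx <- rep(seq_len(n_theta), each = length(x))     # angle index of each rho
  cell <- (theta_idx - 1) * n_rho + rho_bin                # column-major cell index
  votes <- matrix(tabulate(cell, nbins = n_rho * n_theta), nrow = n_rho)
  list(votes = votes, theta = theta,
       rho_mid = rho_breaks[-(n_rho + 1)] + rho_width / 2)
}

h <- hough_lines(Predictor_Variable, Response_Variable)

# Assumed window: only consider near-vertical lines (|theta| <= 5 degrees)
near_vertical <- which(abs(h$theta) <= 5 * pi / 180)
sub <- h$votes[, near_vertical, drop = FALSE]
peak <- which(sub == max(sub), arr.ind = TRUE)[1, ]
theta_hat <- h$theta[near_vertical[peak[["col"]]]]
rho_hat <- h$rho_mid[peak[["row"]]]

# x position where the detected line crosses y = 0 (equals rho when theta = 0)
x_at_axis <- rho_hat / cos(theta_hat)
c(theta_degrees = theta_hat * 180 / pi, x_at_axis = x_at_axis)
```

The transform works on raw coordinates, so the detected angle depends on the relative scales of the two axes; if the variables are on very different scales, rescaling them first (and converting back) may be worth considering. For a strictly vertical line ($\theta = 0$), $\rho$ is just $x$, so the $\theta = 0$ column of the accumulator reduces to a histogram of the predictor.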

0 Answers