4

I'm reading the textbook "Introduction to Econometrics" by Stock and Watson. Chapter 11 discusses regression with a binary response and mostly teaches the logit/probit models. These seem to make sense when the plots show an S-shaped relationship between X and Y, but what if I'm working with a dataset where the probability takes the shape of a "hill"?

For example, what if my continuous predictor X ranges from 0 to 100 and my response Y is only equal to 1 for values of X between 30 and 70, with the 1s becoming more concentrated around X=50? So let's say around X=50 we could have a 40% probability of Y=1, and at X=40 and X=60 we'd have a 25% probability of Y=1, with the probability of Y=1 dissipating toward X=30 and X=70. I can't find any models for such a relationship through Google. Are there any models that are more appropriate than logit/probit for this type of relationship?

VK2022
  • 43
  • 6
    Use a basis expansion, such as [tag:polynomial] terms or [tag:splines]. Polynomials and splines are not restricted to modeling monotonic relationships. – Sycorax Nov 01 '22 at 23:48
  • Essentially the same question: https://stats.stackexchange.com/questions/63978/do-statisticians-assume-one-cant-over-water-a-plant-or-am-i-just-using-the-wro/63987#63987 – kjetil b halvorsen Sep 29 '23 at 09:01

3 Answers

15

Logistic regression is a fine approach even in this case. The problem is that a simple linear relationship (on the log odds scale) is insufficient to model this relationship. You need to introduce non-linearity into the conditional mean, and this is most easily done with a spline.

Rather than let

$$\operatorname{logit}(p) = \beta_0 + \beta_1 x $$

let the linear predictor be

$$\operatorname{logit}(p) = \beta_0 + f(x)$$

where $f(x)$ is some spline function. Because splines can be represented as linear combinations of non-linear basis functions, we can estimate this using the same machinery we do for linear models.
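Written out, and with the basis functions $b_j$ and their number $K$ introduced here only for illustration, the linear predictor becomes

$$\operatorname{logit}(p) = \beta_0 + \sum_{j=1}^{K} \beta_j \, b_j(x)$$

Each $b_j(x)$ is a fixed non-linear function of $x$ (for example, a piecewise cubic), so the model is still linear in the coefficients and can be fit like an ordinary logistic regression with $K$ predictors.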

Shown below is an example. In red is the true conditional mean, which was used to generate the black dots. In black is the estimated conditional mean using a natural spline.

[Figure: simulated data (black dots), true conditional mean (red), and natural-spline estimate of the conditional mean (black)]
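A simulation along these lines can be sketched in Python as follows. This is not the code behind the figure: the data-generating curve, the knot locations, and the helper names `natural_spline_basis`, `fit_logistic`, and `predict` are all assumptions made for illustration, and the fit uses a plain Newton-Raphson loop in NumPy rather than a packaged routine.

```python
import numpy as np

def natural_spline_basis(z, knots):
    """Natural cubic spline basis: intercept, linear term, and K-2 curved terms.
    The fitted function is linear beyond the first and last knots."""
    knots = np.asarray(knots, dtype=float)
    last = knots[-1]

    def d(k):
        num = np.maximum(z - knots[k], 0.0) ** 3 - np.maximum(z - last, 0.0) ** 3
        return num / (last - knots[k])

    cols = [np.ones_like(z), z]
    cols += [d(k) - d(len(knots) - 2) for k in range(len(knots) - 2)]
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)      # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Hypothetical "hill": P(Y=1) peaks at 0.40 near x = 50 and falls to about 0.02 in the tails.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=5000)
p_true = 0.02 + 0.38 * np.exp(-((x - 50) / 15) ** 2)
y = rng.binomial(1, p_true).astype(float)

# Knots are placed on the rescaled predictor x / 100 (an arbitrary choice).
knots = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
beta = fit_logistic(natural_spline_basis(x / 100, knots), y)

def predict(x_new):
    """Estimated P(Y=1 | x) from the spline logistic fit."""
    z = np.atleast_1d(np.asarray(x_new, dtype=float)) / 100
    eta = natural_spline_basis(z, knots) @ beta
    return 1.0 / (1.0 + np.exp(-eta))

grid = [10, 30, 50, 70, 90]
for x0, p0 in zip(grid, predict(grid)):
    print(f"x = {x0:3d}  estimated P(Y=1) = {p0:.3f}")
```

The estimated probabilities should rise toward x = 50 and fall away on either side, which a single linear term on the log odds scale cannot reproduce.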

  • 1
    +1 for the good explanation and for the illustrative simulation. Just wondering if this may be a reasonable alternative: 1. Estimate the probability density of each class separately (using kernel density estimation) 2. Find the probability of the positive class at each point along the x axis as the ratio of the positive class pdf and total pdf of both classes. What are your thoughts on this approach? – KishKash Nov 02 '22 at 10:59
  • @KishKash. This is kind of what approaches like linear discriminant analysis do, though not exactly. The process as you've written it is probably not going to work. – Demetri Pananos Nov 02 '22 at 13:09
  • Thanks @Demetri Pananos – KishKash Nov 03 '22 at 08:06
2

What you are describing suggests that X does not fully determine Y. If only 40% of the observations with X equal to 50 have Y=1, then there is some variance in Y that X does not explain.

That means this simple model is not enough; some other variable (or variables) is needed to properly predict Y.

0

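The logit/probit model passes the linear predictor through a cdf (the logistic or the standard normal). So by construction, if the predictor is a linear function of some variable, say x, with a non-zero slope, the probability is monotone in x and must approach 1 at one extreme of x.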
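A quick sketch of why, for the logit case and assuming a single predictor $x$ with intercept $\beta_0$ and slope $\beta_1$:

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad \frac{dp}{dx} = \beta_1 \, p(x)\,\bigl(1 - p(x)\bigr)$$

Since $p(x)\,(1 - p(x)) > 0$ everywhere, the derivative always has the sign of $\beta_1$, so $p(x)$ can only rise or only fall and has no interior maximum. The probit case works the same way, with the normal density evaluated at $\beta_0 + \beta_1 x$ in place of $p(x)\,(1 - p(x))$.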

Now, you can add some sort of non-linearity into your model of the conditional mean. That will allow you to have a "hill". Note that it will not truly be a hill; rather, it will be an S-shaped curve in a curved space, so that when you "flatten" out the curved space and look at it from that "flat" perspective, it doesn't look S-shaped any more. A very simple and fairly standard way to do this is to add polynomial terms (for example, $x^2$) to the linear predictor. At that point, you are relying on the fact that a polynomial of high enough degree can approximate any sufficiently smooth function, in the spirit of a Taylor expansion.
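As a rough illustration of the polynomial route, the sketch below fits a logit with a quadratic term to simulated data. Everything in it is an assumption for the example: the data are generated from a quadratic log odds chosen so that P(Y=1) is 0.40 at X=50 and 0.25 at X=40 and X=60 (the numbers from the question), the helper names `fit_logistic` and `predict` are made up, and the fit is a plain Newton-Raphson loop in NumPy.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)      # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def design(x):
    """Columns 1, z, z^2 with z = (x - 50) / 50 (rescaled for numerical stability)."""
    z = (np.atleast_1d(np.asarray(x, dtype=float)) - 50) / 50
    return np.column_stack([np.ones_like(z), z, z ** 2])

# Hypothetical data: log odds are quadratic in x, peaking at x = 50.
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=5000)
eta_true = np.log(0.4 / 0.6) - np.log(2) * ((x - 50) / 10) ** 2
y = rng.binomial(1, 1 / (1 + np.exp(-eta_true))).astype(float)

beta = fit_logistic(design(x), y)

def predict(x_new):
    """Estimated P(Y=1 | x) from the quadratic logit fit."""
    return 1.0 / (1.0 + np.exp(-(design(x_new) @ beta)))

# Vertex of the fitted parabola, mapped back to the original x scale.
x_peak = 50 + 50 * (-beta[1] / (2 * beta[2]))
print("coefficients (1, z, z^2):", np.round(beta, 3))
print("estimated peak location:", round(float(x_peak), 1))
print("estimated P(Y=1) at x = 40, 50, 60:", np.round(predict([40, 50, 60]), 3))
```

A negative coefficient on the squared term is what produces the hill: the log odds form a downward-opening parabola, and the probability is still the usual S-shaped transform of that parabola.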

Generally, we don't see people do that in an inferential setting as this complicates the marginal interpretation of the coefficients, which is already very messy with discrete choice models due to the non-linearities from the probability distribution.

Ryan
  • 229