
Any recommendations for models (or applications in papers) to test theoretical predictions about non-continuous effects, e.g., based on a threshold, rather than marginal effects? Some examples of the sort of threshold theoretical predictions I'm interested in testing:

The simplest example would be where the threshold is known. e.g.,

\begin{equation} Y_i=1 \ \text{if} \ X_i>0.5 \ \text{and}\ Y_i=0 \ \text{otherwise} \ \ \ \ \ \ \ \text{(1)} \end{equation}

A more useful example would be where we theorize a threshold but do not know its value, and do not even assume that it is constant across individuals.

\begin{equation} Y_i=1 \ \text{if} \ X_i>T_i \ \text{and}\ Y_i=0 \ \text{otherwise} \ \ \ \ \ \ \ \text{(2)} \end{equation}

where $T_i$ is a threshold that is allowed to differ across units indexed by $i$.

I don't just want to test whether $X$ has a positive effect on $Y$ (e.g., $Y$~$X$), but rather I want to test for evidence for the threshold prediction.

Dr. Beeblebrox
  • This kind of setup is not familiar to me. Can you describe an application, by any chance? It seems like we would need additional assumptions on the $T_i$ in order for this to be feasible. – John Madden Jun 13 '22 at 15:31
  • Theoretically, Stark's (1991) theory of "target earner" migration is an example: i.e., migrants return to their home country after accumulating $T_i$ in savings in the destination country. Empirically, I haven't confronted a situation like this before. I could derive testable predictions myself, but first I want to see if there are existing models with testable predictions like this. I've found work by Tsay on testing threshold models in time-series data (https://tinyurl.com/2p9bea5y), but I'm hoping to find something for cross-sectional data. – Dr. Beeblebrox Jun 13 '22 at 16:30

2 Answers


I think you're describing change point detection, and you can find an enormous amount of information on that topic once armed with the right term to search for. By the way, this kind of thresholding does not constitute "non-linear dynamics", since there is no non-linearity, and no dynamics over time.

There are lots of existing tools for doing change point detection, but here's a simple approach using a noisy version of your data:

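library(tidyverse)
set.seed(1)  # for reproducibility
x = 1:100
changepoint = 50
# The mean of y jumps from 0 to 1 at the change point
y_mean = ifelse(x < changepoint, 0, 1)
# Add Gaussian noise around the step function
y = rnorm(length(x), y_mean, .5)
plot(x, y)
abline(v=changepoint)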

[Figure: scatter plot of y against x, with a vertical line at the true change point]

#' Log-likelihood of x under the best-fitting parameters
gaussian_loglik = function(x){
  m = mean(x)
  s = sd(x)
  dnorm(x, m, s, log = T) %>% sum()
}

evaluate_changepoint = function(proposed_changepoint){
  y_pre = y[x < proposed_changepoint]
  y_post = y[x >= proposed_changepoint]
  gaussian_loglik(y_pre) + gaussian_loglik(y_post)
}

changepoint_liks = map_dbl(x, evaluate_changepoint)
estimated_changepoint = x[which.max(changepoint_liks)]
plot(x, changepoint_liks, 'l', xlab = 'Value', ylab = 'Log lik. of changepoint')
abline(v=estimated_changepoint)

[Figure: log-likelihood of each candidate change point, with a vertical line at the estimated change point]
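If you also want a rough check of whether any change point is supported at all, one option is a permutation test on the best log-likelihood gain. The following is only a minimal sketch in base R, not a standard named procedure: `max_gain`, `min_seg`, and the number of permutations are illustrative choices, and the test assumes the observations are exchangeable when there is no change. Note that it only tests "a step" against "no change whatsoever"; it does not distinguish a step from a smooth trend in `x`.

```r
set.seed(3)
x <- 1:100
y <- rnorm(length(x), ifelse(x < 50, 0, 1), 0.5)  # same design as the simulation above

# Gaussian log-likelihood at the maximum-likelihood mean and sd
gaussian_loglik <- function(v) {
  m <- mean(v)
  s <- sqrt(mean((v - m)^2))
  sum(dnorm(v, m, s, log = TRUE))
}

# Largest gain in log-likelihood from splitting y at any candidate point,
# keeping at least `min_seg` observations in each segment (an assumption
# made here to avoid degenerate one- or two-point segments).
max_gain <- function(y, min_seg = 5) {
  n <- length(y)
  candidates <- (min_seg + 1):(n - min_seg + 1)
  split_ll <- sapply(candidates, function(k) {
    gaussian_loglik(y[seq_len(k - 1)]) + gaussian_loglik(y[k:n])
  })
  max(split_ll) - gaussian_loglik(y)
}

observed <- max_gain(y)

# Null distribution: shuffling y removes any dependence on the ordering by x
null_gains <- replicate(500, max_gain(sample(y)))

# Permutation p-value (the observed value is counted among the permutations)
p_value <- mean(c(null_gains, observed) >= observed)
p_value
```

Permuting is used here because the maximum over candidate change points does not follow the usual chi-squared reference distribution, so a naive likelihood ratio p-value would be too optimistic.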

Eoin
  • Thank you, I wasn't aware of change point detection and the code is very helpful. Am I understanding it correctly that this is testing for a single change point in a dataset, rather than testing if there are unit-specific change points? – Dr. Beeblebrox Jun 21 '22 at 17:01

You could model this with probit or logistic regression. Then you model the probability

$$\mathbb{P}(Y_i = y) = \begin{cases} p & \qquad \text{if $y=1$}\\ 1-p & \qquad \text{if $y=0$} \end{cases}$$

This $p = \mathbb{P}(Y_i = 1)$ equals the probability $\mathbb{P}(T_i < X_i) = F_{T_i}(X_i)$. That is, the probability of observing $Y_i = 1$ is the cumulative distribution function of $T_i$ evaluated at the point $X_i$.

When $T_i$ is normally distributed, you have a probit model. When $T_i$ follows a logistic distribution, you have a logistic model. (Personally, I feel that a probit model is more natural here, and that logistic regression is more appropriate in classification problems.)
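As a rough illustration of this equivalence, here is a small simulation sketch. The sample size and the values `mu_T` and `sigma_T` are assumptions chosen purely for the example. If $T_i \sim N(\mu, \sigma^2)$, then $\mathbb{P}(Y_i = 1) = \Phi\left((X_i - \mu)/\sigma\right)$, so a probit fit has intercept $-\mu/\sigma$ and slope $1/\sigma$, and both parameters of the threshold distribution can be read off the coefficients.

```r
# Sketch: simulate unit-specific thresholds and recover their distribution.
# All names and parameter values are illustrative assumptions.
set.seed(1)
n       <- 5000
mu_T    <- 0.5   # assumed mean of the thresholds T_i
sigma_T <- 0.2   # assumed standard deviation of the thresholds T_i

x   <- runif(n)                  # observed covariate X_i
t_i <- rnorm(n, mu_T, sigma_T)   # latent thresholds (unobserved in practice)
y   <- as.integer(x > t_i)       # Equation (2): Y_i = 1 if X_i > T_i

# P(Y = 1 | X) = pnorm((X - mu_T) / sigma_T), i.e. a probit model with
# intercept -mu_T / sigma_T and slope 1 / sigma_T.
fit <- glm(y ~ x, family = binomial(link = "probit"))
b   <- coef(fit)

sigma_hat <- unname(1 / b["x"])                  # estimated sd of T_i
mu_hat    <- unname(-b["(Intercept)"] / b["x"])  # estimated mean of T_i
c(mu_hat = mu_hat, sigma_hat = sigma_hat)
```

With a sample this large, the estimates should land close to the values used in the simulation. A steeper fitted curve corresponds to thresholds that vary less across units.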

  • Yeah, I really couldn't tell from the question which kind of problem was being asked about here, but I think between our answers we must have it. – Eoin Jun 20 '22 at 16:40
  • Thank you, this is helpful to imagine modeling the outcome in this way, but I'm not clear what the test statistic would be. What is the test of whether the effect of $X$ on $Y$ occurs through the threshold function I describe in Equation (2), as opposed to a continuous marginal effect of $X$ on $Y$? – Dr. Beeblebrox Jun 21 '22 at 17:06
  • @Dr.Beeblebrox If you do a probit or logistic regression, then you are modeling a sigmoid function that describes the relative frequency of $Y=1$ and $Y=0$ as a function of $X$. Whether or not $X$ has a relationship with $Y$ can be tested with a likelihood ratio test. The difference between your situations 1 and 2 is less easy to test. If you have situation 1, then the classes $Y = 1$ and $Y = 0$ are perfectly separated and the fitting is difficult. – Sextus Empiricus Jun 21 '22 at 17:31
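The likelihood ratio test and the separation problem mentioned in the last comment can be sketched as follows. This is a minimal example on simulated data: the sample size and threshold distribution are assumptions, and logistic regression is used for both fits.

```r
set.seed(2)
n <- 500
x <- runif(n)
y <- as.integer(x > rnorm(n, 0.5, 0.2))  # situation (2): thresholds vary across units

fit0 <- glm(y ~ 1, family = binomial)    # null model: no effect of X
fit1 <- glm(y ~ x, family = binomial)    # sigmoid in X

# Likelihood ratio statistic and its chi-squared p-value (1 degree of freedom)
lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Situation (1): a single known threshold separates the classes perfectly.
# The maximum-likelihood slope diverges, and glm() warns that fitted
# probabilities numerically 0 or 1 occurred.
y_sep   <- as.integer(x > 0.5)
fit_sep <- suppressWarnings(glm(y_sep ~ x, family = binomial))

coef(fit1)["x"]     # finite slope under situation (2)
coef(fit_sep)["x"]  # very large slope under perfect separation
```

The first test only establishes that $X$ is related to $Y$. The exploding slope in the second fit is a symptom of a sharp common threshold, but it is a diagnostic rather than a formal test.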