Two-part model: is it necessary to use the same regressors in both parts?

Question

I am implementing a two-part model where the first part is a probit/logit and the second part an OLS. Is it necessary to use the same regressors for both parts or can I use different variables?

In particular, I only have one regressor, say $x$, to predict the probability that my dependent variable is $>0$ (the probit part) while I have a bunch of other regressors (different to $x$) to estimate the OLS part.

I have theoretical reasons to believe that the regressors I use in the second part have no effect on the probability that my dependent variable is $>0$.

score 2 · Answer 1 · answered Apr 08 '20 at 14:38

It sounds like you are building a "hurdle" model: one process determines whether a zero is observed, and a separate process generates the non-zero values. Given that definition, you should be able to use different covariates for the two processes. For example, one variable may determine whether the event can occur at all, which you would use in your logit, while other covariates relate to the magnitude of the event in the second model.
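To make that concrete, here is a sketch of why the two parts can be specified separately. The notation is mine, not from the question: $d_i = 1[y_i > 0]$, $x_i$ are the first-part regressors, $z_i$ the second-part regressors, $F$ is the probit or logit link, and $f$ is whatever density you assume for the positive values.

$$
\Pr(y_i > 0 \mid x_i) = F(x_i'\gamma), \qquad E[y_i \mid y_i > 0, z_i] = z_i'\beta
$$

$$
\ell(\gamma, \beta) = \sum_{i} \Big[ d_i \log F(x_i'\gamma) + (1 - d_i) \log\big(1 - F(x_i'\gamma)\big) \Big] + \sum_{i:\, d_i = 1} \log f(y_i \mid z_i, \beta)
$$

The first sum involves only $\gamma$ and the second only $\beta$, so each part can be estimated on its own, and nothing in this structure requires $x_i$ and $z_i$ to contain the same variables. The implied overall mean is $E[y_i \mid x_i, z_i] = F(x_i'\gamma)\, z_i'\beta$.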

Here's an example in Stata where different variables were used to model the separate processes: https://www.stata.com/stata14/hurdle-models/

I would also add that I am not sure OLS is appropriate here if your data lie on the range $[0, \infty)$. Poisson or negative binomial distributions are often used for hurdle models, but if your data are continuous, a Gamma distribution may be more appropriate.
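As a rough illustration of fitting the two parts with different regressors, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the variable names, the coefficients, and the choice of a logit fitted by Newton-Raphson with plain NumPy (in practice you would use your statistics package's probit/logit and OLS routines). The second part uses OLS as in the question; a Gamma GLM could be substituted there.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data; all names and coefficients are made up for illustration.
x = rng.normal(size=n)        # affects only whether y > 0
z = rng.normal(size=(n, 2))   # affect only the size of y when y > 0

p_pos = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
is_pos = rng.random(n) < p_pos
y_if_pos = 20.0 + z @ np.array([2.0, -1.0]) + rng.normal(size=n)
y = np.where(is_pos, y_if_pos, 0.0)


def fit_logit(X, d, n_iter=25):
    """Logit maximum likelihood via Newton-Raphson."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (d - p)                       # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        coef = coef + np.linalg.solve(hess, grad)
    return coef


# Part 1: P(y > 0) modelled with x only, using all observations.
d = (y > 0).astype(float)
X1 = np.column_stack([np.ones(n), x])
gamma_hat = fit_logit(X1, d)

# Part 2: OLS of y on z only, using the observations with y > 0.
pos = y > 0
X2 = np.column_stack([np.ones(pos.sum()), z[pos]])
beta_hat, *_ = np.linalg.lstsq(X2, y[pos], rcond=None)

# Combine: E[y | x, z] = P(y > 0 | x) * E[y | y > 0, z].
p_hat = 1.0 / (1.0 + np.exp(-X1 @ gamma_hat))
mu_hat = np.column_stack([np.ones(n), z]) @ beta_hat
ey_hat = p_hat * mu_hat

print("part 1 (logit) coefficients:", gamma_hat)
print("part 2 (OLS) coefficients:  ", beta_hat)
```

The two fits never share a design matrix: the first uses `x` on the full sample, the second uses `z` on the positive subsample, and they only meet when the predictions are multiplied together.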

Hi Emma, thank you for your answer. If I remember correctly from the papers I have read, the term "hurdle models" is often used in the literature for count data, while "two-part models" is the counterpart for continuous data (at least in economics), but the underlying logic is the same. I cannot use a Poisson or NB distribution because my dependent variable is continuous (it has decimals). I could indeed use a Gamma distribution, but since my dependent variable is continuous, I think I can stick with OLS, notably for its simplicity of interpretation. - Nicolas L, Apr 08 '20 at 17:51
