Model selection aimed at making "misfit" statistically insignificant

Question

I am working with a model that can be described roughly as

$$ \left\{ \begin{array}{rcl} y^* & = & \beta_0 + x'\beta + \epsilon_{\{x,v\} } \\ w^* & = & \gamma_0 + v'\gamma + \delta_{\{x,v\} } \\ y & = & 1[y^* >0 ] \\ w & = & 1[w^* >0 ] \end{array} \right. $$ although one or the other of $y$ or $w$ can be ordinal. (I am fitting this in generic terms with the Stata cmp package by Roodman (2011).)

When the sets of regressors $x$ and $v$ are empty, $y^*$ and $w^*$ are strongly correlated (and thus so are their discrete versions $y$ and $w$). It is this correlation that I am aiming my model selection at: I need to make it small, which I know I can do, as there are common influences on $y$ and $w$ from the explanatory variables $x$ and $v$ (which I kept distinct in my notation, but they will likely have a lot of variables in common in practice).

My questions are:

1. What are the ways to conduct model selection here? So far, I am using a greedy search algorithm that adds one variable at a time to $x$ and/or $v$, checks how the magnitude (or, better, significance) of the $\epsilon$-$\delta$ correlation is affected, and chooses the variable that provides the best improvement.
2. Do I need to worry about the excluded variables needed for identification? (A somewhat longer, and more accurate, story is that the $y$ equation is a selection equation, and the $w$ equation is the response equation that only applies to cases with $y=1$, in a variation of the Heckman model. Whether the selection actually depends on $w$, as is implied in the labor supply model used by Heckman, is anybody's guess, but it does not need to, in the context of my application.)
3. What other questions should I be asking?
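
For concreteness, the greedy search described in the first question can be sketched as follows. This is only an illustration under stated assumptions: `greedy_search`, `score`, and `toy_score` are hypothetical names, and `score(x, v)` stands in for whatever refits the two-equation model (for example with cmp) and returns the criterion to be made small, such as the absolute estimated error correlation or its test statistic.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Score = Callable[[FrozenSet[str], FrozenSet[str]], float]


def greedy_search(candidates: Iterable[str], score: Score):
    """Forward selection over two regressor sets, x and v.

    `score(x, v)` is a user-supplied (hypothetical) function that refits the
    model with regressors x in the y equation and v in the w equation and
    returns the criterion to be minimized.
    """
    names = list(candidates)
    x: FrozenSet[str] = frozenset()
    v: FrozenSet[str] = frozenset()
    best = score(x, v)
    path: List[Tuple[FrozenSet[str], FrozenSet[str], float]] = [(x, v, best)]
    while True:
        # Every way of adding one variable to x, to v, or to both.
        trials = []
        for name in names:
            if name not in x:
                trials.append((x | {name}, v))
            if name not in v:
                trials.append((x, v | {name}))
            if name not in x and name not in v:
                trials.append((x | {name}, v | {name}))
        if not trials:
            break
        scored = [(score(tx, tv), tx, tv) for tx, tv in trials]
        s, tx, tv = min(scored, key=lambda t: t[0])
        if s >= best:  # no single addition improves the criterion
            break
        x, v, best = tx, tv, s
        path.append((x, v, best))
    return x, v, best, path


def toy_score(x, v):
    # Stand-in criterion (an assumption for the demo): "a" in x and "b" in v
    # each reduce the correlation; every included variable costs a little.
    return 1.0 - 0.5 * ("a" in x) - 0.3 * ("b" in v) + 0.01 * (len(x) + len(v))


x_sel, v_sel, crit, path = greedy_search(["a", "b", "c"], toy_score)
print(sorted(x_sel), sorted(v_sel), round(crit, 2))
```

The search stops as soon as no single addition lowers the criterion, which is exactly what makes it mechanical: it says nothing about the sampling behavior of the criterion at the selected model.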

A somewhat similar problem of modifying your model until a certain test statistic becomes insignificant is faced in structural equation modeling, and I don't think they have an answer that I would find satisfactory.

I don't see any other answers here, which is a pity, since the question is interesting. I would like to give the matter some more thought. Can you please clarify/verify the data series that make up your sample? — Alecos Papadopoulos, Aug 19 '13 at 23:29
I was hoping Frank Harrell would take it up, but he hasn't yet. The data, as I said, are of selection + outcome format. A similar structure can be found in, say, Peress 2010, although his selection variable $y$ is ordinal rather than binary. — StasK, Aug 20 '13 at 13:04

Alecos Papadopoulos · Answer 1 · 2013-08-13T20:33:57.290

I will propose a method based on the assumption that the regressor matrix is identical for both equations; call it $Z$. Then, what do you seek? To "remove the correlation" between $y$ and $w$, because you believe it comes from the regressors. How will you know? As far as I can tell, you will have to consider "what is left unexplained" in the two dependent variables after regressing them on $Z$, and check whether these two "unexplained" parts have become uncorrelated. These two "unexplained" parts are the residuals from the two regressions. Ideally, you would want them to be totally uncorrelated, i.e. to obtain $$\text{Corr}(\hat \epsilon, \hat \delta) = 0 \Rightarrow \text{Cov}(\hat \epsilon, \hat \delta) = 0 \Rightarrow E (\hat \epsilon \hat \delta) - E (\hat \epsilon )E (\hat \delta) = 0$$ As residuals, they will have zero mean by construction (given that $Z$ includes a constant). So you are left with the ideal target of

$$E (\hat \epsilon \hat \delta) = 0$$ i.e. that the two residual series are orthogonal. This is a moment condition, and you can minimize its sample analogue, namely

$$\min \frac 1n \left|\sum_i\hat \epsilon_i \hat \delta_i\right| = \min \frac 1n \left|\mathbf {\hat \epsilon'}\mathbf {\hat \delta}\right|$$

where the prime denotes the transpose, so that $\mathbf {\hat \epsilon'}$ is a row vector. Now denote by $M_z = I-Z(Z'Z)^{-1}Z'$ the "annihilator" or "residual maker" $n\times n$ matrix. Its second name comes from the fact that $M_z\mathbf y = \mathbf {\hat \epsilon}$ and, likewise, $M_z\mathbf w = \mathbf {\hat \delta}$. This matrix is symmetric and idempotent, $M_z' = M_z,\; M_zM_z = M_z$. Then you can write your objective function as

$$\min \frac 1n \left|\mathbf {\hat \epsilon'}\mathbf {\hat \delta}\right| = \min \frac 1n \left|\mathbf y' M_z' M_z\mathbf w\right| = \min \frac 1n \left|\mathbf y' M_z\mathbf w\right|$$

Minimize over the set of regressors, of course. So you set up a routine to evaluate this product over various combinations of regressors. The combination that gives you the lowest value of your objective function is the one that will make your two dependent variables as uncorrelated as possible, after removing the effect of the said regressors.

For your routine, note that the objective function is equivalent to the inner product of the residual from one regression with the other dependent variable

$$\mathbf y' M_z\mathbf w = \mathbf y'\mathbf {\hat \delta} = \mathbf {\hat \epsilon'}\mathbf w$$
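
As a quick numerical check of this identity, here is a minimal NumPy sketch on simulated data. The data, the coefficient values, and the helper name `residuals` are assumptions made purely for illustration, and $y$ and $w$ are treated as continuous here.

```python
import numpy as np


def residuals(Z, t):
    """Least-squares residuals of t on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, t, rcond=None)
    return t - Z @ coef


rng = np.random.default_rng(0)
n = 200
# Common regressor matrix Z, including a constant column.
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = Z @ np.array([0.5, 1.0, -1.0, 0.0]) + rng.normal(size=n)
w = Z @ np.array([-0.2, 0.5, 0.0, 2.0]) + rng.normal(size=n)

eps_hat = residuals(Z, y)
delta_hat = residuals(Z, w)

# Explicit residual maker, built only to verify the algebra.
M = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

forms = [eps_hat @ delta_hat, y @ delta_hat, eps_hat @ w, y @ M @ w]
identical = np.allclose(forms, forms[0])
print(identical)
```

In a search routine there is no need to form the $n\times n$ matrix $M_z$ at all: one least-squares fit per candidate $Z$ gives the residual vector, and a single dot product gives the objective.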

ADDITION: If you want to minimize directly the (absolute value of the) correlation coefficient between the two residual series, you will also have to take into account their estimated standard deviations. In such a case your objective function becomes

$$\min |\mathbf y' M_z\mathbf w|\left(\mathbf y' M_z\mathbf y\mathbf w' M_z\mathbf w\right)^{-\frac 12}$$
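
A routine of the kind described above might look like the following sketch, which evaluates this normalized objective over all column subsets up to a given size. The function names, the simulated data, and the choice of an exhaustive search are assumptions for illustration only. It also treats $y$ and $w$ as continuous, so with binary or ordinal outcomes it would be working with linear-probability residuals, not the latent errors.

```python
from itertools import combinations

import numpy as np


def abs_residual_corr(Z, y, w):
    """|y' M_z w| / sqrt(y' M_z y * w' M_z w), via least-squares residuals."""
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    d = w - Z @ np.linalg.lstsq(Z, w, rcond=None)[0]
    return abs(e @ d) / np.sqrt((e @ e) * (d @ d))


def best_subset(X, y, w, max_size):
    """Exhaustive search over column subsets of X (a constant is always kept)."""
    n, k = X.shape
    const = np.ones((n, 1))
    best_cols, best_val = (), abs_residual_corr(const, y, w)
    for size in range(1, max_size + 1):
        for cols in combinations(range(k), size):
            Z = np.hstack([const, X[:, list(cols)]])
            val = abs_residual_corr(Z, y, w)
            if val < best_val:
                best_cols, best_val = cols, val
    return best_cols, best_val


# Simulated example: column 0 drives both outcomes, the rest are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] + rng.normal(size=500)
w = 3.0 * X[:, 0] + rng.normal(size=500)

raw = abs_residual_corr(np.ones((500, 1)), y, w)
cols, val = best_subset(X, y, w, max_size=2)
print(f"constant only: {raw:.3f}; best subset {cols}: {val:.3f}")
```

Because the reported value is a minimum over many candidate sets, it will tend to be optimistically small in-sample, so it is better read as a ranking device than as an estimate of the remaining correlation.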

Alecos, that's a reasonable answer, but it does not help my particular problem -- as I said in my point 1, I know how to devise a greedy search (and that's what I am doing now), but that's a mechanical solution, not a statistical one. Besides, the residuals are not really observed, as these are limited dependent variable models, as I explained in the update. I also want to keep the regressors separate, as that's what the existing evidence on the fit of the separate equations points to. — StasK, Aug 15 '13 at 12:19
I don't see how an objective function that is the feasible implementation of a moment condition is not a statistical solution. The point is not the search, it is how one evaluates the search. UPDATE: and I answered before seeing the update. — Alecos Papadopoulos, Aug 15 '13 at 12:21
