Take a look at McCullagh and Nelder (1989) Generalized Linear Models, 2nd ed, Section 2.5 (pp 40-43), on iteratively reweighted least squares.
Let $y$ be the 0/1 outcome and let $\eta = g(\mu)$ be the link function. You never calculate $g(y)$ directly, but work with an adjusted dependent variable
$$z = \hat{\eta}_0 + (y-\hat{\mu}_0) \left(\frac{d\eta}{d\mu}\right)_0$$
where $\hat{\eta}_0$ is the current estimate of the linear predictor, $X\hat{\beta}_0$, and $\hat{\mu}_0 = g^{-1}(\hat{\eta}_0)$. So that avoids the problem with $g(0)$ and $g(1)$ being $\pm \infty$.
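In case it's not obvious where that comes from: $z$ is just the first-order Taylor expansion of $g(y)$ about the current fit $\hat{\mu}_0$, with the derivative evaluated at $\hat{\mu}_0$ rather than at $y$:
$$g(y) \approx g(\hat{\mu}_0) + (y - \hat{\mu}_0)\left(\frac{d\eta}{d\mu}\right)_0 = \hat{\eta}_0 + (y - \hat{\mu}_0)\left(\frac{d\eta}{d\mu}\right)_0 = z$$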
For the logit link, $\eta = \ln[\mu / (1-\mu)]$, you'll find that $d\eta/d\mu = 1/[\mu(1-\mu)]$ and so you would have
$$z = \hat{\eta}_0 + \frac{y-\hat{\mu}_0}{\hat{\mu}_0 (1 - \hat{\mu}_0)}$$
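If you want to check that derivative yourself:
$$\frac{d\eta}{d\mu} = \frac{d}{d\mu}\left[\ln \mu - \ln(1-\mu)\right] = \frac{1}{\mu} + \frac{1}{1-\mu} = \frac{1}{\mu(1-\mu)}$$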
You further calculate weights
$$w_0^{-1} = \left(\frac{d\eta}{d\mu}\right)^2_0 v_0$$
where $v_0 = V(\hat{\mu}_0)$ comes from the mean/variance relationship, which for the binary case would be $V(\mu) = \mu(1-\mu)$. For the logit link, since $d\eta/d\mu = 1/[\mu(1-\mu)]$, you end up with weights $w_0 = \hat{\mu}_0(1-\hat{\mu}_0)$.
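Written out for the logit link, so you can see the cancellation:
$$w_0^{-1} = \left(\frac{1}{\hat{\mu}_0(1-\hat{\mu}_0)}\right)^{\!2} \hat{\mu}_0(1-\hat{\mu}_0) = \frac{1}{\hat{\mu}_0(1-\hat{\mu}_0)}, \qquad \text{so} \qquad w_0 = \hat{\mu}_0(1-\hat{\mu}_0)$$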
A key concern is the starting points. You might look at the R source code to see what they do. I wrote down in a notebook to start with
$\tilde{\mu} = 1/4$ if $y = 0$ and $\tilde{\mu} = 3/4$ if $y=1$, but I didn't include a source.
To spell out the iterative algorithm a bit more, focusing on the logit link (a short code sketch implementing these steps follows the two lists below):
At the start you do the following:
- Start with initial "fitted" values, say $\hat{\mu}^{(0)}_i = 1/4$ or $3/4$ according to whether $y_i = 0$ or $1$
- Calculate $\hat{\eta}^{(0)}_i = \ln[\hat{\mu}^{(0)}_i/(1-\hat{\mu}^{(0)}_i)]$
- Calculate $z^{(0)}_i = \hat{\eta}^{(0)}_i + [y_i-\hat{\mu}^{(0)}_i]/[\hat{\mu}^{(0)}_i (1 - \hat{\mu}^{(0)}_i)]$
- Calculate the weights $w^{(0)}_i = \hat{\mu}^{(0)}_i (1-\hat{\mu}^{(0)}_i)$
- Regress the $z^{(0)}_i$ on $X$ using weights $w^{(0)}_i$, to get initial estimates $\hat{\beta}^{(0)}$
Then, at each iteration, you do the following:
- Calculate $\hat{\eta}^{(s)}_i = (X \hat{\beta}^{(s-1)})_i$
- Calculate $\hat{\mu}^{(s)}_i = \exp(\hat{\eta}^{(s)}_i)/[1+\exp(\hat{\eta}^{(s)}_i)]$
- Calculate $z^{(s)}_i = \hat{\eta}^{(s)}_i + [y_i-\hat{\mu}^{(s)}_i]/[\hat{\mu}^{(s)}_i (1 - \hat{\mu}^{(s)}_i)]$
- Calculate the weights $w^{(s)}_i = \hat{\mu}^{(s)}_i (1-\hat{\mu}^{(s)}_i)$
- Regress the $z^{(s)}_i$ on $X$ using weights $w^{(s)}_i$, to get revised estimates $\hat{\beta}^{(s)}$
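If it helps to see the bookkeeping in code, here's a minimal sketch of those two lists in Python/numpy (the function name, the convergence check, and the simulated data are mine; real GLM fitting routines add various safeguards on top of this):

```python
import numpy as np

def logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Fit a logistic regression by IRLS; X should include an intercept column."""
    # initial "fitted" values: 1/4 or 3/4 according to whether y_i is 0 or 1
    mu = np.where(y == 1, 0.75, 0.25)
    eta = np.log(mu / (1 - mu))                    # eta^(0) = logit(mu^(0))
    beta = None
    for _ in range(max_iter):
        z = eta + (y - mu) / (mu * (1 - mu))       # adjusted dependent variable
        w = mu * (1 - mu)                          # IRLS weights for the logit link
        # weighted least squares of z on X: solve (X'WX) beta = X'Wz
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if beta is not None and np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
        eta = X @ beta                             # updated linear predictor
        mu = 1 / (1 + np.exp(-eta))                # updated fitted values
    return beta

# simulated example: true intercept 0.5, true slope 1.5
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x))))
X = np.column_stack([np.ones_like(x), x])
print(logistic_irls(X, y))     # estimates should be near (0.5, 1.5)
```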
This is all just for regular logistic regression. For the local logistic regression version, there is some discussion in Chapter 4 of Loader (1999) Local Regression and Likelihood (but frankly, I didn't really follow it).
A Google search for "local logistic regression IRLS" revealed these notes from Patrick Breheny, which say (pg 8):
> The weight given to an observation $i$ in a given iteration of the IRLS algorithm is then a product of the weight coming from the quadratic approximation to the likelihood and the weight coming from the kernel ($w_i = w_{1i} w_{2i}$).
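So, as best I understand it, for local logistic regression you run the same IRLS loop separately at each target point $x_0$, but multiply the likelihood-based weight $w_{1i} = \hat{\mu}_i(1-\hat{\mu}_i)$ by a kernel weight $w_{2i}$ that downweights observations far from $x_0$. Here's a rough sketch of a single local fit, assuming a Gaussian kernel and a local-linear basis (the function name and the bandwidth handling are mine, and I haven't checked this against Loader's or Breheny's code):

```python
import numpy as np

def local_logistic_at(x, y, x0, h, max_iter=25):
    """One local-linear logistic fit at target point x0, with bandwidth h."""
    X = np.column_stack([np.ones_like(x), x - x0])   # local linear basis centered at x0
    w2 = np.exp(-0.5 * ((x - x0) / h) ** 2)          # w_{2i}: Gaussian kernel weights
    mu = np.where(y == 1, 0.75, 0.25)                # same starting values as before
    eta = np.log(mu / (1 - mu))
    for _ in range(max_iter):
        z = eta + (y - mu) / (mu * (1 - mu))         # adjusted dependent variable
        w = mu * (1 - mu) * w2                       # w_i = w_{1i} * w_{2i}
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))
    # the intercept is the estimated logit at x0; return it as a probability
    return 1 / (1 + np.exp(-beta[0]))
```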