
To simplify things, I will ask my question in the case of simple logistic regression but I am also interested in the case with multiple explanatory variables.

Let $\vec{x} \in \mathbb{R}^N$ be the observed values of the explanatory variable and $\vec{y} \in \{0,1\}^N$ be the corresponding observed values of the binary response variable.

We want to fit a model of the form $p_{\beta_0,\beta_1}(x) = \sigma(\beta_1 x+\beta_0)$, where $\sigma(t) = 1/(1+e^{-t})$ is the logistic function, by minimizing the logistic loss:

$$ \ell(\beta_0,\beta_1) = -\Big[\vec{y}\cdot \log\big( p_{\beta_0,\beta_1}(\vec{x})\big) + (\vec{1} - \vec{y}) \cdot \log\big(\vec{1} - p_{\beta_0,\beta_1}(\vec{x})\big)\Big] $$

(with $p_{\beta_0,\beta_1}$ and $\log$ applied componentwise).

Letting

$$ X = \begin{bmatrix} \vec{1} & \vec{x}\end{bmatrix} \qquad \vec{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} $$

we have

$$ \nabla \ell = X^\top (p_{\vec{\beta}}(\vec{x}) - \vec{y}) $$
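
In code, the loss and gradient above can be written as follows (a minimal NumPy sketch; the function and variable names are my own, chosen only to mirror the notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(beta, x, y):
    """Negative log-likelihood and its gradient for simple logistic regression.

    beta : array of shape (2,), (beta_0, beta_1)
    x    : array of shape (N,), explanatory variable
    y    : array of shape (N,), binary responses in {0, 1}
    """
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    p = sigmoid(X @ beta)                       # fitted probabilities p_beta(x)
    loss = -(y @ np.log(p) + (1 - y) @ np.log(1 - p))
    grad = X.T @ (p - y)                        # matches  grad l = X^T (p - y)
    return loss, grad
```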

I understand that if the data is separable (i.e. if the convex hull of $\{ x_i : y_i = 0\}$ is disjoint from the convex hull of $\{x_i: y_i = 1\}$) then the loss has no global minimum: it can be made arbitrarily small only by letting $\|\vec{\beta}\| \to \infty$. If the data is not separable then there is a unique global minimum.

Are there any easy bounds on the parameters in the non-separable case?

Let the intersection of those convex hulls be the interval $[x_{min},x_{max}]$. I would guess that if the data is not separable then the minimizer satisfies $x_{min} < \frac{-\beta_0}{\beta_1} < x_{max}$: graphically, this says that the inflection point of the fitted curve lies in the region of 'overlap'. It would be nice to have a proof of this.
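
Absent a proof, the conjecture is easy to probe numerically. Here is a minimal sketch (my own illustrative code: it simulates overlapping classes and uses scikit-learn's `LogisticRegression` with a very large `C` to approximate unpenalized maximum likelihood):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Overlapping (non-separable) classes on the real line.
x0 = rng.normal(loc=0.0, scale=1.0, size=50)   # responses y = 0
x1 = rng.normal(loc=1.0, scale=1.0, size=50)   # responses y = 1
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Intersection of the two convex hulls (assumes the intervals do overlap).
x_min = max(x0.min(), x1.min())
x_max = min(x0.max(), x1.max())

# Very large C ~ (almost) unpenalized maximum likelihood.
fit = LogisticRegression(C=1e10).fit(x.reshape(-1, 1), y)
beta0, beta1 = fit.intercept_[0], fit.coef_[0, 0]
crossing = -beta0 / beta1

print(f"overlap interval: [{x_min:.3f}, {x_max:.3f}]")
print(f"-beta0/beta1    : {crossing:.3f}")
print("inside overlap? ", x_min < crossing < x_max)
```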

Can we give any bounds on $\beta_1$ in terms of $x_{min}$ and $x_{max}$? If we can, that would give a nice elementary argument that the global minimum exists and is unique: a minimum would exist by compactness, and uniqueness could be established by computing the Hessian and observing that it is positive definite (provided the $x_i$ are not all equal). It might also be valuable for computational reasons: a good first guess for a solver might be the center of the rectangle in parameter space.
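
For reference, differentiating the gradient above gives the Hessian

$$ \nabla^2 \ell = X^\top W X, \qquad W = \operatorname{diag}\big(p_{\vec{\beta}}(x_i)\,(1 - p_{\vec{\beta}}(x_i))\big), $$

which is positive semidefinite in general and positive definite whenever the $x_i$ are not all equal (so that $X$ has full column rank), since every diagonal entry of $W$ is strictly positive.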

  • But we already know that logistic regression is convex: https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting. And logistic regression can be solved using iteratively reweighted least squares, so you don't need a first guess at all. – jbowman Jul 25 '23 at 17:44
  • @jbowman I know that it is known that logistic regression is convex. It would seem that IRLS would also need an initial guess. While that initial guess is ultimately irrelevant, starting with something at least in the right ballpark might offer some speedup? – Steven Gubkin Jul 25 '23 at 17:49
  • @jbowman My main interest is theoretical rather than practical though. – Steven Gubkin Jul 25 '23 at 17:51
  • Often you don't gain much speedup even from a great initial guess: in such problems, provided the first guess isn't completely ridiculous, usually the first step of the iteration accomplishes the most and everything else is just polishing. With simple logistic regression I believe you're right, there must be simple bounds, but it's difficult to see how that could be generalized to multiple explanatory variables. – whuber Jul 25 '23 at 17:54
  • @whuber For actually practical optimized solvers I am sure you are right. I implemented a very naive gradient descent solver and it took a lot of iterations. Initial guess was important for me. It would be nice to see those bounds in the case of simple logistic regression, even if it isn't so easy for multiple! – Steven Gubkin Jul 25 '23 at 17:57
  • IRLS doesn't need an initial guess; you just start with all the weights equal to one. – jbowman Jul 25 '23 at 19:56
  • @jbowman That is a particular initial guess. I am willing to let this point die: I have already conceded that any benefit is probably extremely minimal. I am sad that an offhand comment at the end of my question has come to dominate the comments. I just want to know about these bounds! – Steven Gubkin Jul 25 '23 at 20:00
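
(For completeness, here is a minimal sketch of the Newton/IRLS iteration discussed in the comments, started at $\vec{\beta} = 0$, i.e. with all working weights equal. It is my own illustrative code, not an optimized solver.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def irls(x, y, n_iter=25, tol=1e-10):
    """Newton/IRLS for simple logistic regression, started at beta = (0, 0)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1 - p)                    # diagonal of the weight matrix W
        grad = X.T @ (p - y)
        hess = X.T @ (w[:, None] * X)      # X^T W X
        step = np.linalg.solve(hess, grad)
        beta -= step                       # Newton step: beta <- beta - H^{-1} grad
        if np.max(np.abs(step)) < tol:
            break
    return beta
```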

0 Answers