To simplify things, I will ask my question in the case of simple logistic regression but I am also interested in the case with multiple explanatory variables.
Let $\vec{x} \in \mathbb{R}^N$ be the observed values of the explanatory variable and $\vec{y} \in \{0,1\}^N$ be the corresponding observed values of the binary response variable.
We want to fit a model of the form $p_{\beta_0,\beta_1}(x) = \sigma(\beta_1x+\beta_0)$, where $\sigma(t) = 1/(1+e^{-t})$ is the logistic function, by minimizing the logistic loss (the negative log-likelihood):
$$ \ell(\beta_0,\beta_1) = -\left[\vec{y}\cdot \log\big( p_{\beta_0,\beta_1}(\vec{x})\big) + (\vec{1} - \vec{y}) \cdot \log\big(\vec{1} - p_{\beta_0,\beta_1}(\vec{x})\big)\right], $$
with $p_{\beta_0,\beta_1}$ and $\log$ applied componentwise.
Letting
$$ X = \begin{bmatrix} \vec{1} & \vec{x}\end{bmatrix} \qquad \vec{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} $$
we have
$$ \nabla \ell = X^\top (p_{\vec{\beta}}(\vec{x}) - \vec{y}) $$
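(For anyone who wants to experiment numerically: here is a minimal numpy sketch of the loss and gradient above, checked against finite differences. The toy data is made up purely to exercise the formulas.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(beta, X, y):
    # Negative log-likelihood of the logistic model.
    p = sigmoid(X @ beta)
    return -(y @ np.log(p) + (1 - y) @ np.log(1 - p))

def grad(beta, X, y):
    # The gradient above: X^T (p - y).
    return X.T @ (sigmoid(X @ beta) - y)

# Made-up toy data.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = (x + 0.5 * rng.normal(size=20) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta = np.array([0.3, -0.7])

# Central finite differences should match the analytic gradient.
eps = 1e-6
fd = np.array([(loss(beta + eps * e, X, y) - loss(beta - eps * e, X, y)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(fd, grad(beta, X, y), atol=1e-5))  # True
```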
I understand that if the data is separable (i.e. if the convex hull of $\{ x_i : y_i = 0\}$ is disjoint from the convex hull of $\{x_i: y_i = 1\}$) then the loss has no global minimum: its infimum is $0$, but it is approached only as the parameters diverge. If the data is not separable then there is a unique global minimum.
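(As a sanity check on the separable case, here is a small gradient-descent sketch on made-up, perfectly separable data; $\beta_1$ keeps growing while the loss decreases toward $0$ without ever attaining it.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Perfectly separable toy data: all x < 0 have y = 0, all x > 0 have y = 1.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
for step in range(1, 100001):
    beta -= 0.1 * X.T @ (sigmoid(X @ beta) - y)
    if step % 25000 == 0:
        p = sigmoid(X @ beta)
        nll = -(y @ np.log(p) + (1 - y) @ np.log(1 - p))
        print(step, beta, nll)
# beta_1 grows without bound (roughly logarithmically in the step count)
# and the loss tends to 0, so no minimizer exists.
```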
Are there any easy bounds on the parameters in the non-separable case?
Let the intersection of those convex hulls be the interval $[x_{min},x_{max}]$. I would guess that if the data is not separable then the minimizer satisfies $x_{min} < \frac{-\beta_0}{\beta_1} < x_{max}$: graphically, this says that the inflection point of the fitted curve (the point where $p_{\vec{\beta}} = 1/2$) lies in the region of overlap. It would be nice to have a proof of this.
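(A quick numerical check of this guess, using scipy on made-up non-separable data whose class hulls overlap on $[1,2]$:)

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(beta, X, y):
    p = sigmoid(X @ beta)
    return -(y @ np.log(p) + (1 - y) @ np.log(1 - p))

# Non-separable toy data: hull of class 0 is [0, 2], hull of class 1 is
# [1, 3.5], so the overlap interval is [x_min, x_max] = [1, 2].
x = np.array([0.0, 0.5, 1.0, 2.0, 1.0, 2.0, 3.0, 3.5])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

res = minimize(loss, x0=np.zeros(2), args=(X, y), method="BFGS")
b0, b1 = res.x
# Print the fitted crossing point so it can be compared with [1, 2].
print("-b0/b1 =", -b0 / b1, "   overlap = [1, 2]")
```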
Can we give any bounds on $\beta_1$ in terms of $x_{min}$ and $x_{max}$? If we can, that would give a nice elementary argument that the global minimum exists and is unique: a minimum would exist by compactness, and uniqueness would follow from computing the Hessian and observing that it is positive definite (hence the loss is strictly convex) whenever the $x_i$ are not all equal. It might also be valuable for computational reasons: a good first guess for a solver might be the center of the rectangle in parameter space.
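(For reference, the Hessian here is $\nabla^2 \ell = X^\top W X$ with $W = \mathrm{diag}\big(p_i(1-p_i)\big)$. A sketch of the positive-definiteness check, on the same made-up data as above:)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(beta, X):
    # Hessian of the loss: X^T W X with W = diag(p_i * (1 - p_i)).
    p = sigmoid(X @ beta)
    w = p * (1 - p)
    return X.T @ (w[:, None] * X)

x = np.array([0.0, 0.5, 1.0, 2.0, 1.0, 2.0, 3.0, 3.5])
X = np.column_stack([np.ones_like(x), x])

# The eigenvalues are strictly positive at any beta, since every w_i > 0
# and X has full column rank (the x_i are not all equal).
beta = np.array([2.0, -1.2])  # arbitrary test point
print(np.linalg.eigvalsh(hessian(beta, X)))
```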