1

In scikit-learn's LogisticRegression docs, they write:

This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers

Logistic regression doesn't have a closed-form solution, so it must use some iterative optimization algorithm like gradient descent or Adam. In that case, all we need are the partial derivatives and we should be good to go. So what are these "solvers", and where do they fit into the picture?
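For concreteness, here is a minimal sketch (synthetic data, everything else left at defaults) of where that argument shows up - the solver choice is just passed to LogisticRegression, and each solver should produce very similar (not necessarily identical) coefficients:

```python
# Minimal sketch: `solver` selects the numerical optimizer used to minimize
# the regularized logistic loss; the model being fitted is the same.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

for solver in ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]:
    clf = LogisticRegression(solver=solver, max_iter=5000).fit(X, y)
    print(solver, clf.coef_.round(3))
```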

  • You will get better answers here: https://or.stackexchange.com/ – mhdadk Apr 05 '23 at 16:54
  • I see. I just wanted a friendly introduction to it from an ML practitioner's perspective. – statnoob Apr 05 '23 at 17:00
  • If there is one overarching message in an introductory Calculus course -- the one thing you ought to remember after you have forgotten all the details -- it is that you can optimize by solving equations: namely, you can find maxima and minima of (differentiable) functions by finding zeros of their derivatives. – whuber Apr 05 '23 at 18:07
  • @whuber yes that is called the closed form solution, right? Logistic regression does not have this. So I don't think the "solver" in my question actually corresponds to "solving equations" in that sense. – statnoob Apr 05 '23 at 18:09
  • Let's not confuse the concepts with the computing techniques. In most applications it doesn't matter whether a "closed form solution" exists. A standard example concerns roots of univariate polynomials: even for cubics and quartics, it's rare to employ the closed-form solutions (known since the early 1500s); one uses a numerical solver even in those situations. Gradient descent usually is the very worst possible method to apply in any case; and frequently, it's better not to have the partial derivatives. – whuber Apr 05 '23 at 18:17

1 Answer

1

There are actually various approaches to finding the minimum of your loss function - the "classical" gradient descent algorithm is only one such approach. Gradient descent uses a linear approximation to the function (the gradient is the slope of that linear approximation); another notable example is Newton's method, which uses a quadratic approximation (involving the Hessian matrix). So in the context you ask about, a "solver" is simply the particular numerical method used to seek the minimal loss.
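To make that distinction concrete, here is a rough NumPy sketch (not scikit-learn's actual implementation; synthetic data, no regularization) of the two kinds of update for the logistic loss:

```python
# Rough sketch: gradient descent vs. Newton's method on the logistic
# negative log-likelihood, using synthetic data.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = (X @ true_w + rng.normal(size=200) > 0).astype(float)

w = np.zeros(3)
for _ in range(25):
    p = expit(X @ w)                         # predicted probabilities
    grad = X.T @ (p - y)                     # gradient of the negative log-likelihood
    # Gradient descent would take a small step along the linear approximation:
    #   w = w - 0.01 * grad
    # Newton's method instead uses the quadratic approximation via the Hessian:
    H = X.T @ (X * (p * (1 - p))[:, None])   # Hessian of the negative log-likelihood
    w = w - np.linalg.solve(H, grad)

print(w)  # roughly recovers the direction of true_w
```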

There's a very nice description of all of these solvers at this thread:

https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-definitions#:~:text=The%20SAGA%20solver%20is%20a,theoretical%20convergence%20compared%20to%20SAG.

  • 1
    A footnote to my answer above: Newton's method involves some trade-offs versus gradient descent. For example, Newton's method can often converge faster than gradient descent because it uses second-derivative information. In fact, for ordinary linear regression with the OLS loss function, "vanilla" Newton's method converges to the optimal fit in a single iteration, because the quadratic approximation Newton's method builds is exactly equal to the loss function (see the sketch after these comments). One downside of Newton's method is that computing the Hessian matrix is expensive. – Eamonn Tweedy Apr 06 '23 at 17:34
  • Using Newton's method for OLS linear regression also requires the design matrix to be full rank. The case in which it isn't coincides exactly with the case in which there is no unique optimal fit. See also this thread: https://stats.stackexchange.com/questions/514095/optimizing-ols-with-newtons-method – Eamonn Tweedy Apr 06 '23 at 17:37
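As a small numerical check of the one-iteration claim in the comments above (synthetic data; assumes the design matrix has full column rank):

```python
# For OLS the loss 0.5 * ||Xw - y||^2 is exactly quadratic, so a single
# Newton step from any starting point lands on the closed-form solution.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100)

w0 = np.zeros(3)                        # arbitrary starting point
grad = X.T @ (X @ w0 - y)               # gradient of the OLS loss at w0
H = X.T @ X                             # Hessian (constant for a quadratic loss)
w_newton = w0 - np.linalg.solve(H, grad)

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_newton, w_ols))     # True: one Newton step reaches the OLS fit
```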