I'm trying to get the same linear SVM model using Scikit-Learn's SVC, LinearSVC and SGDClassifier classes. I managed to do so (see the code below), but only by manually tweaking the alpha hyperparameter of the SGDClassifier class.
Both SVC and LinearSVC have the regularization hyperparameter C, but SGDClassifier uses the regularization hyperparameter alpha instead. The documentation says that C = n_samples / alpha, so I set alpha = n_samples / C, but with this value the SGDClassifier ends up being a very different model from the SVC and LinearSVC models. If I tweak the value of alpha manually, I can get all three models to be approximately the same, but there should be a simple equation to find alpha given C. What is it?
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

C = 5
alpha = len(X) / C  # alpha == 20, following the documented formula

# max_iter replaces the old n_iter parameter (removed in Scikit-Learn 0.21)
sgd_clf1 = SGDClassifier(loss="hinge", alpha=alpha, max_iter=10000, random_state=42)
sgd_clf2 = SGDClassifier(loss="hinge", alpha=0.0007, max_iter=10000, random_state=42)  # alpha tweaked by hand
svm_clf = SVC(kernel="linear", C=C)
lin_clf = LinearSVC(loss="hinge", C=C)

# Center the inputs so that LinearSVC matches SVC (see the note below)
X_scaled = StandardScaler().fit_transform(X)
sgd_clf1.fit(X_scaled, y)
sgd_clf2.fit(X_scaled, y)
svm_clf.fit(X_scaled, y)
lin_clf.fit(X_scaled, y)

print("SGDClassifier(alpha=20): ", sgd_clf1.intercept_, sgd_clf1.coef_)
print("SGDClassifier(alpha=0.0007): ", sgd_clf2.intercept_, sgd_clf2.coef_)
print("SVC: ", svm_clf.intercept_, svm_clf.coef_)
print("LinearSVC: ", lin_clf.intercept_, lin_clf.coef_)
This code outputs:
SGDClassifier(alpha=20): [-0.46597258] [[ 0.0283698 -0.03634389]]
SGDClassifier(alpha=0.0007): [ 0.0422716] [[ 0.79608868 -1.48847539]]
SVC: [ 0.04569242] [[ 0.79788013 -1.48716383]]
LinearSVC: [ 0.04556911] [[ 0.79762806 -1.4866854 ]]
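A side note on how I'm comparing these: since the decision boundary w·x + b = 0 is unchanged when (w, b) is rescaled, the boundary's slope and intercept are easier to compare than the raw weights. Here is a minimal sketch reusing the fitted models above (the loop and labels are just mine, for illustration):

# A linear boundary w1*x1 + w2*x2 + b = 0 can be rewritten as
# x2 = -(w1/w2) * x1 - b/w2, which is invariant to rescaling (w, b).
for name, clf in [("SGDClassifier(alpha=20)", sgd_clf1),
                  ("SGDClassifier(alpha=0.0007)", sgd_clf2),
                  ("SVC", svm_clf),
                  ("LinearSVC", lin_clf)]:
    w1, w2 = clf.coef_[0]
    b = clf.intercept_[0]
    print(f"{name}: slope={-w1 / w2:.4f}, intercept={-b / w2:.4f}")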
Note: to make the LinearSVC class output the same result as the SVC class, you have to center the inputs (e.g., using the StandardScaler), since it regularizes the bias term (weird). You also need to set loss="hinge", since the default is "squared_hinge" (weird again).
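If I read the LinearSVC docstring correctly, the bias regularization happens because LinearSVC appends a synthetic constant feature equal to intercept_scaling (default 1) to every sample and learns the bias as that feature's weight, so the bias gets penalized like any other weight. Increasing intercept_scaling should lessen the effect; a quick sketch of that workaround (my own experiment, not something the code above relies on):

# The intercept is learned as intercept_scaling * w_synthetic, so a larger
# intercept_scaling reduces the regularization pressure on the bias term.
lin_clf2 = LinearSVC(loss="hinge", C=C, intercept_scaling=10)
lin_clf2.fit(X_scaled, y)
print("LinearSVC(intercept_scaling=10): ", lin_clf2.intercept_, lin_clf2.coef_)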
So my question is: how does alpha really relate to C in Scikit-Learn? Looking at the equations (written out below, as I understand them), the documentation should be right, but in practice it is not. What's going on?
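For reference, here are the two objectives as I understand them (my reading of the docs, so corrections are welcome; I'm ignoring LinearSVC's extra regularization of the bias term mentioned in the note above). SVC and LinearSVC with loss="hinge" solve

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\max\left(0,\ 1 - y_i(w^\top x_i + b)\right)$$

while SGDClassifier with loss="hinge" and penalty="l2" solves

$$\min_{w,b}\ \frac{\alpha}{2}\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n}\max\left(0,\ 1 - y_i(w^\top x_i + b)\right)$$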