I am new to sklearn and I am trying to learn how to use cross-validation to choose the best SVM model. I found this example: How to split the dataset for cross validation, learning curve, and final evaluation? and I tried to understand how it works. Here are some lines that I am not sure I have understood.
from sklearn.svm import SVC
from sklearn.learning_curve import learning_curve  # sklearn.model_selection in newer versions
import matplotlib.pyplot as plt

title = r'Learning Curves (SVM, linear kernel, $\gamma=%.6f$)' % classifier.best_estimator_.gamma
estimator = SVC(kernel='linear', gamma=classifier.best_estimator_.gamma)
plot_learning_curve(estimator, title, X_train, y_train, cv=cv)
plt.show()
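For context, here is a hedged sketch of how the `classifier` object in the snippet above is typically produced in examples like the linked one: a GridSearchCV over SVC hyperparameters, fitted on the training split. The dataset, parameter grid, and variable names here are my assumptions, not taken from the original post.

```python
# Hypothetical setup (assumption): `classifier` is a fitted GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"gamma": [1e-3, 1e-2, 1e-1], "C": [1, 10]}
classifier = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
classifier.fit(X_train, y_train)

# best_estimator_ is the model refit on all of X_train with the winning
# hyperparameters; its gamma is what the quoted code reads back.
print(classifier.best_estimator_.gamma)
```

With this setup, `classifier.best_estimator_.gamma` is simply the gamma value that won the grid search.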
1) What is the estimator object here? Is it a clone of the best model returned by the cross-validation? I do not think so!
2) Will this function plot_learning_curve apply the cross-validation selection again? I think so, because it takes a cross-validation iterator.
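On question 2: the underlying sklearn learning_curve function does run its own cross-validation internally, refitting a fresh clone of the estimator on each fold at several training-set sizes. A minimal sketch, assuming the iris dataset and modern import paths:

```python
# learning_curve re-fits clones of the (unfitted) estimator on each CV
# split at several training-set sizes -- it does its own cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
train_sizes, train_scores, valid_scores = learning_curve(
    SVC(kernel="linear"), X, y, cv=5, train_sizes=[0.3, 0.6, 1.0]
)
# One score per (size, fold): 3 sizes x 5 folds here.
print(train_scores.shape)
```

So the estimator passed in is only a template; the CV splitting and refitting happen inside the function.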
classifier.score(X_test, y_test)
3) Which model produces this score? Is it the best model selected in section 5) of the previous link?
classifier.fit(X,y)
4) What is the utility of this operation?
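On questions 3 and 4, a hedged sketch of the usual pattern (dataset and names are my assumptions): `classifier.score(X_test, y_test)` evaluates the best estimator found by the search on held-out data, and `classifier.fit(X, y)` then re-runs the search and refit on the full dataset so the final model sees every labelled example.

```python
# Sketch: evaluate the selected model on a held-out test set, then refit
# on all available data for the final model.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifier = GridSearchCV(SVC(), {"gamma": [1e-2, 1e-1]}, cv=3)
classifier.fit(X_train, y_train)

test_accuracy = classifier.score(X_test, y_test)  # mean accuracy, in [0, 1]
classifier.fit(X, y)  # re-runs the whole search on the full dataset
print(test_accuracy)
```

Note that for a fitted GridSearchCV, `score` delegates to the best estimator found during the search.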
fit takes a training set X_train and labels y_train. Then it has a predict method that takes a test set X_test. The function score takes a validation set with labels, X_validation and y_validation; it computes the mean accuracy of the predictions on X_validation against the ground-truth labels y_validation. The function plot_learning_curve is explained above. What else? – Vince.Bdn Feb 19 '16 at 13:20
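The fit / predict / score API described in the comment above can be sketched in a few lines; the dataset and variable names here are illustrative assumptions.

```python
# Minimal sketch of the fit / predict / score API on a sklearn classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, random_state=0
)

model = SVC(kernel="linear")
model.fit(X_train, y_train)                        # learn from the training set
predictions = model.predict(X_validation)          # labels for unseen samples
accuracy = model.score(X_validation, y_validation) # mean accuracy
print(accuracy)
```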