In scikit-learn, should classifiers be re-instantiated after every fold?

Question

In scikit-learn, I don't see any classifier "unfit" or "unlearn" method similar to the untrain method of the classifiers in pyMVPA:

http://www.pymvpa.org/generated/mvpa2.clfs.svm.LinearCSVMC.html#mvpa2.clfs.svm.LinearCSVMC

When I was using pyMVPA, it made sense to me to call the untrain method after each cross-validation fold. The code would look something like this:

clf = someClassifier  # can be initialized outside of the loop
for fold in range(numRuns):
    clf.train(trainingDataset[fold])
    clf.predict(testingDataset[fold])
    clf.untrain()  # reset and prepare for the next fold

I don't see any sort of untrain method in the scikit-learn classifiers. Is it safe to simply call clf.fit() and clf.predict() on the same classifier object repeatedly, without explicitly resetting the classifier in between? Or should the classifier be instantiated fresh inside the for-loop rather than once outside it?

Lastly, I do understand that scikit-learn has some other functions meant to do all the cross validating stuff automatically and has a pipeline function that does each of the steps for you. However, I would also like to have the ability to program a valid analysis without requiring the use of those functions.

Thank you!

score 3 · Accepted Answer · answered Feb 09 '18 at 05:35

Yes, in sklearn you should create a new object for each fold of your cross validation. My general pattern looks like this (approximately):

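import numpy as np

# Model, cv_folds, n_folds and _loss are placeholders for your own estimator,
# fold iterator, fold count and error function.
models = []
train_errors = np.empty(shape=n_folds)
test_errors = np.empty(shape=n_folds)
for idx, (train, test) in enumerate(cv_folds):
    model = Model()  # fresh, unfitted object for every fold
    model.fit(train)
    models.append(model)  # keep the fitted model for later inspection
    train_errors[idx] = model._loss(train)
    test_errors[idx] = model._loss(test)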

I think this is a better pattern than "untrain"ing: it avoids mutable state, keeps the API simpler, and allows you to hold onto the trained models to fuss with later.

You won't hear any noise from me about rolling your own cross validation in sklearn; it's much more flexible this way.
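For a version that actually runs, here is a minimal sketch of the same per-fold pattern using real scikit-learn pieces. The synthetic dataset, the choice of LogisticRegression, and the variable names (template, train_scores, test_scores) are my own assumptions for illustration, not something from the question. It uses sklearn.base.clone to get a new unfitted copy of an estimator with the same hyperparameters, which is one way to "create a new object" without repeating the constructor arguments.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data, used only to make the sketch runnable (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

n_folds = 5
cv = KFold(n_splits=n_folds, shuffle=True, random_state=0)

# Unfitted "template" estimator; it is never fitted itself.
template = LogisticRegression(max_iter=1000)

models = []
train_scores = np.empty(n_folds)
test_scores = np.empty(n_folds)
for idx, (train_idx, test_idx) in enumerate(cv.split(X)):
    # New unfitted copy with the same hyperparameters as the template.
    model = clone(template)
    model.fit(X[train_idx], y[train_idx])
    models.append(model)  # keep each fitted model for later inspection
    train_scores[idx] = model.score(X[train_idx], y[train_idx])
    test_scores[idx] = model.score(X[test_idx], y[test_idx])

print("mean test accuracy:", test_scores.mean())
```

As far as I know, calling fit() again on the same scikit-learn estimator also discards the previous fit unless warm_start is enabled, but a fresh clone per fold avoids relying on that and lets you hold onto every fitted model.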
