
I am sorry, I have a simple question that I am confused about (I AM STILL A BEGINNER):

When I create a model, say a decision tree, and specify random_state=integer to get reproducible outputs, then run cross-validation (say k-fold with k=5) where I also specify random_state=integer to get reproducible splits, and finally take the average R^2 across my k folds, is this enough to give me a clue about how good my model is?

from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor
import numpy as np

new_model = DecisionTreeRegressor(max_depth=9, min_samples_split=2, random_state=0)

crossvalidation_Decision_Trees = KFold(n_splits=5, random_state=0, shuffle=True)
model2 = new_model.fit(X_normalized, y_for_normalized)

scores_D_Trees = cross_val_score(model2, X_normalized, y_for_normalized,
                                 scoring='r2', cv=crossvalidation_Decision_Trees, n_jobs=1)

print("\n\nDecision Trees" + ": R^2 for every fold: " + str(scores_D_Trees))

print('\033[1m' + "Decision Trees" + '\033[1m' + ": Average R^2 for all the folds: "
      + str(np.mean(scores_D_Trees)) + '\033[0m' + ", STD: " + str(np.std(scores_D_Trees)))

OR: Should I remove the random_state from my decision tree model AND from my CV, let the code take different training and testing splits every time I run it, repeat that many times (say 5 iterations), and at the end average the per-run average R^2 of my k folds over those 5 iterations as an indicator of my model's performance? Would this be a better evaluation of my model?

new_model = DecisionTreeRegressor(max_depth=9, min_samples_split=2)

crossvalidation_Decision_Trees = KFold(n_splits=5, shuffle=True)
model2 = new_model.fit(X_normalized, y_for_normalized)

scores_D_Trees = cross_val_score(model2, X_normalized, y_for_normalized,
                                 scoring='r2', cv=crossvalidation_Decision_Trees, n_jobs=1)

print("\n\nDecision Trees" + ": R^2 for every fold: " + str(scores_D_Trees))

print('\033[1m' + "Decision Trees" + '\033[1m' + ": Average R^2 for all the folds: "
      + str(np.mean(scores_D_Trees)) + '\033[0m' + ", STD: " + str(np.std(scores_D_Trees)))
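(For reference, a minimal sketch of this second, repeated-CV idea using scikit-learn's RepeatedKFold, which repeats k-fold CV with a different random split on each repetition; n_repeats=5 here stands in for the 5 iterations mentioned above, and X_normalized / y_for_normalized are the same variables as in my snippets.)

from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor
import numpy as np

new_model = DecisionTreeRegressor(max_depth=9, min_samples_split=2)

# 5 folds, repeated 5 times, each repetition with a different shuffle
repeated_cv = RepeatedKFold(n_splits=5, n_repeats=5)

# 25 scores in total (5 folds x 5 repetitions)
scores = cross_val_score(new_model, X_normalized, y_for_normalized,
                         scoring='r2', cv=repeated_cv, n_jobs=1)

print("Average R^2 over all folds and repetitions:", np.mean(scores))
print("STD:", np.std(scores))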

OR: Is either of these approaches acceptable?

Note: Let's ignore hyperparameter tuning for now.

Z47

1 Answer


The scikit-learn team has written extensive tutorials on how to do cross-validation well. You might want to give GridSearchCV a try: you can use it to cross-validate one model or many. This will also make your code straightforward to extend when you want to use cross-validation for model selection/hyperparameter tuning, as in your previous question.

Selecting dimensionality reduction with Pipeline and GridSearchCV
Cross-validation on diabetes Dataset Exercise
Comparing randomized search and grid search for hyperparameter estimation

from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
model = DecisionTreeRegressor()

# Default parameters
params = {}

# or

# Hard-coded parameters
params = {
    "max_depth": [9],
    "min_samples_split": [2],
}

n_splits = 10

# The KFold default is no shuffling,
# so we explicitly turn shuffling on.
cv = GridSearchCV(
    model,
    params,
    scoring="r2",
    cv=KFold(n_splits, shuffle=True),
)
cv.fit(X, y)

# Average R-squared across the k folds
cv.cv_results_["mean_test_score"]

# Standard deviation of the R-squared
cv.cv_results_["std_test_score"]

dipetkov
  • Thank you. I've used RandomizedSearchCV and GridSearchCV with CV in each, but I haven't included that part of my code in the question. I was hoping for more of a straightforward answer to my question, like which approach, 1 or 2 or both, is acceptable? – Z47 Aug 11 '22 at 22:39
  • I think GridSearchCV is a better option than either approach 1 or 2, so that's why I wrote the answer. – dipetkov Aug 11 '22 at 22:45
  • Let's assume GridSearchCV is not an option; may I know which approach you would go with, and why? – Z47 Aug 11 '22 at 22:47
  • As the comment says, this is the average R-squared across the 10 folds. You can also get the scores for each fold. You will need to actually read the documentation to learn how to use these tools. – dipetkov Aug 12 '22 at 00:53
  • Thank you. I've figured that out; that's why I've deleted my comment. The thing is, now my R^2 is -216, and using k-fold it was 0.42! – Z47 Aug 12 '22 at 01:00
  • Oversight on my part: I didn't show you how to shuffle the data, as scikit-learn KFold doesn't shuffle by default. You have to know how to turn it on. The most efficient way to learn this is reading the documentation. – dipetkov Aug 12 '22 at 19:25
  • I think I have shuffle=True in the code I posted. – Z47 Aug 12 '22 at 19:33
  • It matters where the shuffle=True is added. Anyway, I think the best use of your time will be to learn enough scikit-learn. And the best way to achieve this is to read the docs and do the tutorials. – dipetkov Aug 12 '22 at 19:38
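A minimal sketch of that last distinction, reusing model, params, X and y from the answer's snippet above: passing an integer to cv makes GridSearchCV use a plain, unshuffled KFold internally, whereas passing a KFold object is where shuffle=True actually takes effect.

# Integer cv: GridSearchCV builds an unshuffled KFold internally,
# which can hurt badly on ordered data.
unshuffled = GridSearchCV(model, params, scoring="r2", cv=10).fit(X, y)

# KFold object: the shuffle happens inside the CV splitter itself.
shuffled = GridSearchCV(
    model, params, scoring="r2",
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
).fit(X, y)

print(unshuffled.cv_results_["mean_test_score"],
      shuffled.cv_results_["mean_test_score"])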