
I'm running an ML algorithm on some data, and I noticed that if I change the random state inside the train_test_split function, the accuracy score changes over a fairly wide range.

For example, with random state = 4, I get an accuracy score that varies from 0.78 to 0.8 (depending on the seed in the algorithm). With another value, like 42, it drops to 0.65 - 0.69.

I don't have duplicates in the dataset, and the task is a multi-class text classification.

I really don't understand this behaviour. Is there an explanation?

Thanks.

You have $560$ total observations. Harrell has noted that splitting the data the way you have often leads to instability in the performance metrics until the sample size reaches $20000$ (this is probably in his Regression Modeling Strategies textbook with references to the primary literature, and he has written this sort of comment on here [1, 2] and on his blog [3], too). Consequently, your results, even if disappointing, are not surprising.

You might be curious to run your code through something like this:

import numpy as np

# Loop over 500 seeds
for i in range(500):
    # Set a new seed
    np.random.seed(i)

    # Run the rest of your code to split the data, train the model,
    # and report the performance for seed i

This will try many seeds and will likely show a strong dependence of accuracy on the seed. The interpretation is that your performance in production is subject to major variability, and the single number you get from one seed is not trustworthy.
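As a concrete sketch of what such a loop tends to reveal, here is a self-contained NumPy-only version. It builds a small synthetic 560-observation multi-class dataset, re-splits it 80/20 with a different seed each iteration, and scores a trivial nearest-centroid classifier. The dataset and classifier are hypothetical stand-ins for your pipeline; only the seed-to-seed spread of the accuracy is the point.

```python
import numpy as np

# Hypothetical synthetic dataset: 560 observations, 3 classes,
# weakly separable features (stand-in for your real text features).
rng_data = np.random.default_rng(0)
n, n_classes = 560, 3
y = rng_data.integers(0, n_classes, size=n)
X = rng_data.normal(size=(n, 5)) + y[:, None]

accuracies = []
for seed in range(100):  # 100 seeds to keep the sketch quick
    # Re-split the data with this seed (80% train, 20% test)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(0.8 * n)
    train, test = idx[:cut], idx[cut:]

    # "Train": one mean vector (centroid) per class on the training split
    centroids = np.stack([X[train][y[train] == c].mean(axis=0)
                          for c in range(n_classes)])

    # "Predict": assign each test point to the nearest centroid
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None], axis=2)
    pred = dists.argmin(axis=1)
    accuracies.append((pred == y[test]).mean())

accuracies = np.array(accuracies)
print(f"min={accuracies.min():.3f}  max={accuracies.max():.3f}  "
      f"sd={accuracies.std():.3f}")
```

Even with a model this simple, the accuracy differs noticeably between seeds at this sample size, which is exactly the instability Harrell describes.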

Dave