Say we have a model and a hyper-parameter with L candidate values, and our goal is model selection. k-fold CV outputs L accuracies (each one an average over the k fold accuracies). The best model is the one with the highest accuracy.
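To make the setup concrete, here is a minimal sketch of the selection procedure (the data, the threshold "classifier", and the L candidate values are all made up for illustration; any real model with an actual training step would follow the same pattern):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary classification data: label depends on the
# first feature plus noise.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

def fold_indices(n, k):
    """Split range(n) into k roughly equal folds."""
    return np.array_split(np.arange(n), k)

def cv_accuracy(threshold, X, y, k=5):
    """Mean accuracy over k folds for a toy threshold classifier on
    the first feature; `threshold` plays the role of the
    hyper-parameter with L candidate values."""
    accs = []
    for test_idx in fold_indices(len(y), k):
        # This toy model needs no fitting, so we only score the
        # held-out fold; a real model would be trained on the rest.
        pred = (X[test_idx, 0] > threshold).astype(int)
        accs.append(np.mean(pred == y[test_idx]))
    return np.mean(accs)  # average over the k fold accuracies

thresholds = [-1.0, -0.5, 0.0, 0.5, 1.0]   # the L candidate values
scores = [cv_accuracy(t, X, y) for t in thresholds]
best = thresholds[int(np.argmax(scores))]  # highest mean accuracy wins
print(best, max(scores))
```

The question is about the last two lines: we take the argmax over the L mean accuracies without ever asking whether the differences between them are statistically significant.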
When comparing the L accuracies in order to select the best model, why don't we do any significance testing?
My guess is that there is no cost to wrongly rejecting the null hypothesis (i.e. declaring one model better than another when they are in fact equivalent), so there is no harm in selecting the model with the highest accuracy. The worst that can happen is that all L models are equivalent, but even in that case we still have to pick one, so choosing the one with the highest accuracy is harmless (it is as arbitrary a choice as any other).