I am evaluating a semantic segmentation model with the following setup:
- Batch size of the test set: 4
- Shape of each target mask / predicted mask: (1, 512, 512)
- Number of batches in the test set: 30
- Dice score formula used: sum of [(2 * (target_batch * pred_batch).sum() + 1e-8) / (target_batch.sum() + pred_batch.sum() + 1e-8)]
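
For reference, the per-batch term inside that sum can be sketched in NumPy as follows. This is only an illustration: the function name `batch_dice`, the `EPS` constant, and the random masks are hypothetical, and it assumes binary masks stacked into an array of shape (4, 1, 512, 512).

```python
import numpy as np

EPS = 1e-8  # smoothing term taken from the formula above


def batch_dice(target_batch: np.ndarray, pred_batch: np.ndarray) -> float:
    """Dice score pooled over every pixel in the batch (one number per batch)."""
    intersection = float((target_batch * pred_batch).sum())
    total = float(target_batch.sum() + pred_batch.sum())
    return (2.0 * intersection + EPS) / (total + EPS)


# Hypothetical batch: 4 binary masks of shape (1, 512, 512)
rng = np.random.default_rng(0)
target = (rng.random((4, 1, 512, 512)) > 0.5).astype(np.float64)
pred = target.copy()  # a perfect prediction, for illustration only

print(batch_dice(target, pred))  # 1.0 for identical masks
```

Note that the sums run over the whole batch at once, so this yields one score per batch rather than one score per image.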
To evaluate the model's performance with the Dice coefficient, which of the following is the proper approach?
- Option 1: Compute the summed Dice score (DL, according to the formula above) and divide it by 30 (the number of batches in the test set) to get an averaged final Dice score for the model.
- Option 2: Set the batch size of the test set to 1 (a single sample) and compute the summed Dice score (DL, according to the formula above). Then divide it by the number of samples in the test set to get the final Dice score for the model.
I would also like to know why that approach is the proper one.
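
To make the difference between the two options concrete, here is a minimal sketch on toy data. The helper names (`dice`, `mean_batch_dice`, `mean_sample_dice`) and the tiny 2x2 masks are hypothetical and chosen only so the numbers are easy to check by hand; the formula is the one given above.

```python
import numpy as np

EPS = 1e-8  # smoothing term from the formula above


def dice(target: np.ndarray, pred: np.ndarray) -> float:
    """Dice score pooled over all pixels in the given arrays."""
    intersection = float((target * pred).sum())
    total = float(target.sum() + pred.sum())
    return (2.0 * intersection + EPS) / (total + EPS)


def mean_batch_dice(targets: np.ndarray, preds: np.ndarray, batch_size: int) -> float:
    """Option 1: one pooled score per batch, averaged over the batches."""
    scores = [
        dice(targets[i:i + batch_size], preds[i:i + batch_size])
        for i in range(0, len(targets), batch_size)
    ]
    return sum(scores) / len(scores)


def mean_sample_dice(targets: np.ndarray, preds: np.ndarray) -> float:
    """Option 2: one score per sample (batch size 1), averaged over the samples."""
    return mean_batch_dice(targets, preds, batch_size=1)


# Two toy samples of shape (1, 2, 2):
# sample 0 has a large mask that is predicted perfectly,
# sample 1 has a one-pixel mask that is missed entirely.
targets = np.array([[[[1, 1], [1, 1]]], [[[1, 0], [0, 0]]]], dtype=np.float64)
preds = np.array([[[[1, 1], [1, 1]]], [[[0, 0], [0, 1]]]], dtype=np.float64)

option_1 = mean_batch_dice(targets, preds, batch_size=2)
option_2 = mean_sample_dice(targets, preds)
print(option_1, option_2)
```

In this toy case the pooled batch score is about 0.8 while the per-sample mean is about 0.5. The two options are therefore not equivalent in general: pooling over a batch lets samples with larger masks dominate the score, and the result depends on which samples happen to share a batch.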