It seems like something related to McFadden's pseudo $R^2$ might do what you want.
The usual $R^2$ in linear regression compares the square loss incurred by the model of interest ($L_1$) to the square loss incurred by a model that predicts the overall mean $\bar y$ every time ($L_0$).
$$
R^2 = \dfrac{L_0 - L_1}{L_0}
$$
I think of it this way: I start out with a loss of $L_0$ and wind up with a loss of $L_1$, so how much of the original loss $L_0$ has been accounted for? Writing the calculation as in $R^2$ puts this in terms of a proportion of the original loss $L_0$.
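As a toy illustration of that arithmetic, here is a minimal Python sketch with simulated data, where `y_hat` stands in for a fitted model's predictions (the names and data are made up for the example):

```python
import numpy as np

# Simulated data; y_hat stands in for a fitted model's predictions.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
y_hat = 2 * x

L1 = np.mean((y - y_hat) ** 2)     # square loss of the model of interest
L0 = np.mean((y - y.mean()) ** 2)  # square loss of the always-predict-the-mean baseline
r_squared = (L0 - L1) / L0
print(r_squared)
```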
McFadden's pseudo $R^2$ takes the same stance but uses crossentropy loss instead of square loss. In your case, you know the crossentropy loss of your model of interest, $L_1 = 0.04$ after those $20$ training epochs. You can calculate the $L_0$ loss by training an intercept-only model. Then you do the same calculation.
$$
R^2_{\text{McFadden}} = \dfrac{L_0 - L_1}{L_0}
$$
(If you want to do out-of-sample assessments, my argument here for basing $L_0$ on the training data holds in that situation, too.)
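Here is a minimal Python sketch of that calculation for a binary classifier, with made-up labels and predicted probabilities; in your situation $L_1$ would just be your reported training loss of $0.04$, and the intercept-only baseline amounts to predicting the overall positive-class rate for every observation (which is what an intercept-only logistic regression fits):

```python
import numpy as np

def cross_entropy(y, p):
    """Mean binary crossentropy of predicted probabilities p for labels y."""
    p = np.clip(p, 1e-15, 1 - 1e-15)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical labels and model probabilities for illustration only.
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
p_model = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.95, 0.05, 0.85, 0.6])

L1 = cross_entropy(y, p_model)

# Intercept-only baseline: predict the overall positive rate every time.
p_baseline = np.full_like(y, y.mean(), dtype=float)
L0 = cross_entropy(y, p_baseline)

mcfadden_r2 = (L0 - L1) / L0
print(mcfadden_r2)
```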
Overall, $R^2_{\text{McFadden}} = \dfrac{L_0 - L_1}{L_0}$ could be seen as measuring the percent decrease in crossentropy loss compared to this kind of baseline "must-beat" model.
None of this is related to the usual classification accuracy, but accuracy turns out to be a surprisingly problematic measure of performance, anyway.