This is a clearer version of an earlier question I asked.
I have two different cases: one where I have only the overall accuracy of each classifier, and another where the four outcome types (true/false positives and negatives) are split out. After each case I include a short sketch of how I would convert the percentages into counts.
Case 1 (overall accuracy only):
Dataset size: 78,000
Classifier 1: 51% accuracy
Classifier 2: 64% accuracy
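For reference, here is a minimal sketch of how I would turn these accuracies back into correct/incorrect counts, under the assumption that both classifiers were evaluated on the same 78,000 examples, since I expect any test to work with counts rather than percentages:

```python
# Case 1: convert overall accuracy into correct / incorrect counts.
# Assumption: both classifiers were evaluated on the same 78,000 examples.
N = 78_000

accuracies = {"classifier_1": 0.51, "classifier_2": 0.64}

case1_counts = {
    name: {"correct": round(acc * N), "incorrect": round((1 - acc) * N)}
    for name, acc in accuracies.items()
}

print(case1_counts)
# {'classifier_1': {'correct': 39780, 'incorrect': 38220},
#  'classifier_2': {'correct': 49920, 'incorrect': 28080}}
```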
Case 2 (with the four outcome types):
Dataset size: 78,000
| Outcome        | Classifier 1 | Classifier 2 |
|----------------|--------------|--------------|
| True Positive  | 21%          | 15%          |
| True Negative  | 30%          | 49%          |
| False Positive | 25%          | 31%          |
| False Negative | 24%          | 5%           |
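And the same sketch for Case 2, converting each confusion-matrix cell into a count, again assuming the percentages are fractions of the full 78,000 examples so that each classifier's four cells sum to 100%:

```python
# Case 2: convert the confusion-matrix percentages into counts per outcome.
# Assumption: each percentage is a fraction of the same 78,000 examples.
N = 78_000

confusion_pct = {
    "classifier_1": {"TP": 0.21, "TN": 0.30, "FP": 0.25, "FN": 0.24},
    "classifier_2": {"TP": 0.15, "TN": 0.49, "FP": 0.31, "FN": 0.05},
}

confusion_counts = {
    name: {cell: round(p * N) for cell, p in cells.items()}
    for name, cells in confusion_pct.items()
}

print(confusion_counts)
# {'classifier_1': {'TP': 16380, 'TN': 23400, 'FP': 19500, 'FN': 18720},
#  'classifier_2': {'TP': 11700, 'TN': 38220, 'FP': 24180, 'FN': 3900}}
```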
For each of these cases, what statistical tests are available to determine whether the difference between the two classifiers is significant?
For each of these tests:
- What are the assumptions of the test?
- What extra information do I need to perform the test?
- What is the formula for performing the test?
- How should I interpret the results of the test?