Suppose we have the following contingency matrix:
   | T1 | T2 | T3
---------------------
C1 | 0  | 53 | 10
C2 | 0  | 1  | 60
C3 | 0  | 16 | 0
Source: How to calculate purity?
As mentioned in the source, the purity is computed as:
Purity = (53 + 60 + 16) / 140 = 0.92142
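For reference, here is a minimal Python sketch of how I understand that calculation. It assumes NumPy, and it assumes that the rows (C1 to C3) are clusters and the columns (T1 to T3) are true classes. The variable names are mine, not from the source.

```python
import numpy as np

# Assumption: rows are clusters C1..C3, columns are true classes T1..T3
contingency = np.array([
    [0, 53, 10],
    [0,  1, 60],
    [0, 16,  0],
])

# For each cluster, take the count of its most frequent class
majority_counts = contingency.max(axis=1)  # 53, 60, 16

# Purity = sum of those majority counts / total number of observations
purity = majority_counts.sum() / contingency.sum()
print(purity)
```

Taking the maximum per row reproduces the 53 + 60 + 16 in the formula above, which is why I read the rows as clusters.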
This seems contradictory to me, because from what I understood:
- If the purity is equal to 1, then the confusion matrix must be diagonal (which means the clustering accuracy is 100%). However, here the purity is close to 1, yet reading the confusion matrix above, it seems clear that the accuracy of the clustering method is low (given the total number of observations and the first column being all zeros).
My questions:
- Am I understanding the concept of purity correctly? If not, could someone clarify the situation I presented from the source mentioned above?
Thank you in advance for your help!