I am trying to find the accuracy for 'H', 'E' and 'C' when comparing a predicted string (myPrediction) to the original (mySS). An 'H' represents an alpha helix, an 'E' represents a beta sheet and a 'C' represents a coil. To find the accuracy for alpha helices, for example, I need the number of true positives (the symbol at index i is 'H' in both strings), true negatives (the symbol at index i is an 'E' or a 'C' in both strings), false positives (the symbol in mySS is 'H' but the symbol in myPrediction is an 'E' or 'C') and false negatives (the symbol in mySS is an 'E' or 'C' but the symbol in myPrediction is an 'H'). My code runs but does not give the desired answer.
myPrediction = 'HEEEEEEEEEEEEEEEEEHHHHHHHHCCCCEEEEEEEEEECHEEEEEEEEEEEEEEEEEEEEEEHEEEEEEEEEEEEEEEEEEHHHHHHHHHHHHHHHHHHHHHHHHHHHH'
mySS = 'CEEEEEEEEEEEEEEECCCCEEEEHHHCCCCEEEEEEEECCCCEEEEEEEECCCCCEEEEEEECCCCCCECCCCCEEEEECCCCEEEEEECCHHHHHHHHHHHHHHHHHHC'
# Counters for alpha helices ('H')
atp = 0  # true positives (correctly identified calls)
atn = 0  # true negatives (correctly missed no-calls)
afp = 0  # false positives (incorrectly identified no-calls)
afn = 0  # false negatives (incorrectly missed calls)
# Counters for beta sheets ('E')
etp = 0  # true positives (correctly identified calls)
etn = 0  # true negatives (correctly missed no-calls)
efp = 0  # false positives (incorrectly identified no-calls)
efn = 0  # false negatives (incorrectly missed calls)
# Counters for coils ('C')
ctp = 0  # true positives (correctly identified calls)
ctn = 0  # true negatives (correctly missed no-calls)
cfp = 0  # false positives (incorrectly identified no-calls)
cfn = 0  # false negatives (incorrectly missed calls)
for index in range(len(mySS)):
    i = 0
    for sym in mySS[i]:
        if sym == 'H':
            if sym in myPrediction == 'H':
                atp += 1
            else:
                afp += 1
        elif sym == 'E':
            if sym in myPrediction == 'H':
                afn += 1
            else:
                atn += 1
        elif sym == 'C':
            if sym in myPrediction == 'H':
                afn += 1
            else:
                atn += 1
        i += 1
        if sym == 'E':
            if sym in myPrediction == 'E':
                etp += 1
            else:
                efp += 1
        elif sym == 'H':
            if sym in myPrediction == 'E':
                efn += 1
            else:
                etn += 1
        elif sym == 'C':
            if sym in myPrediction == 'E':
                efn += 1
            else:
                etn += 1
        i += 1
        if sym == 'C':
            if sym in myPrediction == 'C':
                ctp += 1
            else:
                cfp += 1
        elif sym == 'E':
            if sym in myPrediction == 'C':
                cfn += 1
            else:
                ctn += 1
        elif sym == 'H':
            if sym in myPrediction == 'C':
                cfn += 1
            else:
                ctn += 1
        i += 1
print ("True Positive for alpha = ", atp)
print ("True Negative for alpha = ", atn)
print ("False Positive for alpha = ", afp)
print ("False Negative for alpha = ", afn)
print ("Accuracy of alpha helices= ", (float(atp + atn) * 100 / (atp + atn + afp + afn)))
print ("True Positive for beta = ", etp)
print ("True Negative for beta = ", etn)
print ("False Positive for beta = ", efp)
print ("False Negative for beta = ", efn)
print ("Accuracy of beta sheets= ", (float(etp + etn) * 100 / (etp + etn + efp + efn)))
print ("True Positive for coil = ", ctp)
print ("True Negative for coil = ", ctn)
print ("False Positive for coil = ", cfp)
print ("False Negative for coil = ", cfn)
print ("Accuracy of coils= ", (float(ctp + ctn) * 100 / (ctp + ctn + cfp + cfn)))
The current output is:
True Positive for alpha = 0
True Negative for alpha = 111
False Positive for alpha = 0
False Negative for alpha = 0
Accuracy of alpha helices= 100.0
True Positive for beta = 0
True Negative for beta = 111
False Positive for beta = 0
False Negative for beta = 0
Accuracy of beta sheets= 100.0
True Positive for coil = 0
True Negative for coil = 0
False Positive for coil = 111
False Negative for coil = 0
Accuracy of coils= 0.0
I am expecting different numbers. Specifically, I know that the true positive count for alpha should be 20, as there are 20 positions where the letter at index i is the same ('H') in both strings.
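
Two things in the loop above appear to explain the output. First, i is reset to 0 on every pass of the outer loop, so mySS[i] is always the first character, 'C'. Second, `sym in myPrediction == 'H'` is a chained comparison: Python reads it as `(sym in myPrediction) and (myPrediction == 'H')`, which is False for any prediction string longer than one character. Below is a minimal sketch of a position-by-position count, not a definitive implementation. The names `confusion_counts`, `accuracy`, `toy_ss` and `toy_prediction` are hypothetical and the toy strings are not the sequences from the question. It assumes both strings have the same length, and it uses the conventional labels (false positive: predicted the label where the original has something else; false negative: the original has the label but the prediction missed it), which is the reverse of the wording in the question. Accuracy comes out the same either way.

```python
# Why the original test never fires: `in` and `==` chain like `a < b < c`.
demo = 'HEC'
print('H' in demo == 'H')  # False: evaluated as ('H' in demo) and (demo == 'H')


def confusion_counts(truth, predicted, label):
    """Return (tp, tn, fp, fn) for one label, compared position by position."""
    tp = tn = fp = fn = 0
    # zip pairs the characters at the same index; it stops at the shorter
    # string, so this assumes both strings have the same length.
    for actual, guess in zip(truth, predicted):
        if actual == label and guess == label:
            tp += 1  # label present in both strings
        elif actual != label and guess != label:
            tn += 1  # label absent from both strings
        elif guess == label:
            fp += 1  # predicted the label, original has something else
        else:
            fn += 1  # original has the label, prediction missed it
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Percentage of positions classified correctly for one label."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


# Toy data (hypothetical), short enough to check by hand.
toy_ss = 'HHEECC'
toy_prediction = 'HEEECH'

for label, name in (('H', 'alpha helices'), ('E', 'beta sheets'), ('C', 'coils')):
    tp, tn, fp, fn = confusion_counts(toy_ss, toy_prediction, label)
    print(name, "TP =", tp, "TN =", tn, "FP =", fp, "FN =", fn,
          "accuracy =", accuracy(tp, tn, fp, fn))
```

Calling `confusion_counts(mySS, myPrediction, 'H')` would give the four counts for alpha helices on the real strings. Passing the label as a parameter also removes the need for three nearly identical if/elif chains.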