
I am trying to measure the accuracy for 'H', 'E' and 'C' when comparing a predicted string (myPrediction) to the original (mySS). An 'H' represents an alpha helix, an 'E' represents a beta sheet and a 'C' represents a coil. To find the accuracy for alpha helices, for example, I need the number of true positives (the symbol at index i is 'H' in both strings), true negatives (the symbol is either 'E' or 'C' in both strings), false positives (the symbol in mySS is 'H' but the symbol in myPrediction is 'E' or 'C') and false negatives (the symbol in mySS is 'E' or 'C' but the symbol in myPrediction is 'H'). My code runs but does not give the desired answer.

myPrediction = 'HEEEEEEEEEEEEEEEEEHHHHHHHHCCCCEEEEEEEEEECHEEEEEEEEEEEEEEEEEEEEEEHEEEEEEEEEEEEEEEEEEHHHHHHHHHHHHHHHHHHHHHHHHHHHH'
mySS = 'CEEEEEEEEEEEEEEECCCCEEEEHHHCCCCEEEEEEEECCCCEEEEEEEECCCCCEEEEEEECCCCCCECCCCCEEEEECCCCEEEEEECCHHHHHHHHHHHHHHHHHHC'



atp = 0  # alpha helix ('H'): true positives
atn = 0  # alpha helix ('H'): true negatives
afp = 0  # alpha helix ('H'): false positives
afn = 0  # alpha helix ('H'): false negatives
etp = 0  # beta sheet ('E'): true positives
etn = 0  # beta sheet ('E'): true negatives
efp = 0  # beta sheet ('E'): false positives
efn = 0  # beta sheet ('E'): false negatives
ctp = 0  # coil ('C'): true positives
ctn = 0  # coil ('C'): true negatives
cfp = 0  # coil ('C'): false positives
cfn = 0  # coil ('C'): false negatives

for index in range(len(mySS)):
    i = 0
    for sym in mySS[i]:
        if sym == 'H':
            if sym in myPrediction == 'H':
                atp += 1
            else: 
                afp += 1       
        elif sym == 'E':
            if sym in myPrediction == 'H':
                afn += 1
            else: 
                atn += 1
        elif sym == 'C':
            if sym in myPrediction == 'H':
                afn += 1
            else: 
                atn += 1
        i += 1
        if sym == 'E':
            if sym in myPrediction == 'E':
                etp += 1
            else: 
                efp += 1       
        elif sym == 'H':
            if sym in myPrediction == 'E':
                efn += 1
            else: 
                etn += 1
        elif sym == 'C':
            if sym in myPrediction == 'E':
                efn += 1
            else: 
                etn += 1
        i += 1       
        if sym == 'C':
            if sym in myPrediction == 'C':
                ctp += 1
            else: 
                cfp += 1       
        elif sym == 'E':
            if sym in myPrediction == 'C':
                cfn += 1
            else: 
                ctn += 1
        elif sym == 'H':
            if sym in myPrediction == 'C':
                cfn += 1
            else: 
                ctn += 1
        i += 1       
        

print ("True  Positive for alpha = ", atp) 
print ("True  Negative for alpha = ", atn)
print ("False Positive for alpha = ", afp)
print ("False Negative for alpha = ", afn)
print ("Accuracy of alpha helices= ", (float(atp + atn) * 100 / (atp + atn + afp + afn)))

print ("True  Positive for beta = ", etp)
print ("True  Negative for beta = ", etn)
print ("False Positive for beta = ", efp)
print ("False Negative for beta = ", efn)
print ("Accuracy of beta sheets= ", (float(etp + etn) * 100 / (etp + etn + efp + efn)))
    
print ("True  Positive for coil = ", ctp)
print ("True  Negative for coil = ", ctn)
print ("False Positive for coil = ", cfp)
print ("False Negative for coil = ", cfn)
print ("Accuracy of coils= ", (float(ctp + ctn) * 100 / (ctp + ctn + cfp + cfn)))

The current output is:

True  Positive for alpha =  0
True  Negative for alpha =  111
False Positive for alpha =  0
False Negative for alpha =  0
Accuracy of alpha helices=  100.0
True  Positive for beta =  0
True  Negative for beta =  111
False Positive for beta =  0
False Negative for beta =  0
Accuracy of beta sheets=  100.0
True  Positive for coil =  0
True  Negative for coil =  0
False Positive for coil =  111
False Negative for coil =  0
Accuracy of coils=  0.0

I am expecting other numbers. Specifically, I know that the true positive count for alpha should be 20, as there are 20 positions where the letter at index i is 'H' in both strings.
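One possible way to get such counts is to walk both strings in lockstep with zip and treat each class in turn as the "positive" label. The following is only a minimal sketch, not the code from the question: the function names confusion_counts and accuracy and the short example strings are hypothetical, and it assumes both strings have the same length. It follows the usual convention in which a false positive is a position predicted as the class but different in the reference. The question's wording labels false positives and false negatives the other way round, which does not change the accuracy.

```python
def confusion_counts(reference, predicted, label):
    """Count (tp, tn, fp, fn) for one label, position by position."""
    if len(reference) != len(predicted):
        raise ValueError("strings must have the same length")
    tp = tn = fp = fn = 0
    # zip pairs the symbols at the same index in both strings
    for ref_sym, pred_sym in zip(reference, predicted):
        if pred_sym == label and ref_sym == label:
            tp += 1  # predicted the label, and it is the label
        elif pred_sym == label:
            fp += 1  # predicted the label, but it is something else
        elif ref_sym == label:
            fn += 1  # it is the label, but something else was predicted
        else:
            tn += 1  # neither string has the label here
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Percentage of positions classified correctly for one label."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


# Hypothetical short strings, standing in for mySS and myPrediction
reference = "HHEEC"
predicted = "HEEHC"
for label in "HEC":
    tp, tn, fp, fn = confusion_counts(reference, predicted, label)
    print(label, tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
# H 1 2 1 1 60.0
# E 1 2 1 1 60.0
# C 1 4 0 0 100.0
```

Calling confusion_counts(mySS, myPrediction, 'H') would apply the same counting to the strings from the question, provided they are the same length.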

  • What is the current output, and what is the desired one? – Marius ROBERT May 30 '22 at 07:42
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community May 30 '22 at 07:44
  • Explain the problem in the question. It is unclear what you are trying to achieve. – Raytheon_11 May 30 '22 at 07:48
  • What are alpha, beta and coil? Please explain what each of them is, e.g. what is an alpha true positive, or a coil false negative? – Raytheon_11 May 30 '22 at 07:52
  • Does this answer your question? [Find the similarity metric between two strings](https://stackoverflow.com/questions/17388213/find-the-similarity-metric-between-two-strings) – Areza May 30 '22 at 07:54
  • Please _do not_ explain alpha, beta and coil; protein biochemistry is not necessary to understand your string comparison. Please do explain what counts as a "false positive", etc. Normally in string comparison you have matches, mismatches (optional), insertions and deletions. – alexis May 30 '22 at 07:57
  • Hint: what on earth is `if sym in myPrediction == 'H'` supposed to do? Try debugging this statement and you will probably find your solution. – wovano May 30 '22 at 07:59
  • I'm new to Python; I figured that the code would do what it says. To me, `if sym in myPrediction == 'H'` implies that if the symbol in the sequence myPrediction is 'H', then the code will run. Am I wrong? – porgyporridge May 30 '22 at 08:05
  • @porgy, if you don't know what a programming language construct will do, don't guess; test it. I'm not even sure what that three-part expression will do. – alexis May 30 '22 at 08:21
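
The three-part expression discussed in the comments can be tested directly. Python chains comparison operators, and `in` is one of them, so `sym in myPrediction == 'H'` is evaluated as `(sym in myPrediction) and (myPrediction == 'H')`, which is False unless the whole string equals 'H'. A small sketch, assuming a hypothetical short string named prediction in place of myPrediction:

```python
sym = 'H'
prediction = 'HEC'  # hypothetical short stand-in for myPrediction

# Chained comparison: both halves must hold for the result to be true
chained = sym in prediction == 'H'
expanded = (sym in prediction) and (prediction == 'H')
print(chained, expanded)  # False False

# Comparing the symbol at one position needs an index instead
index = 0
print(prediction[index] == 'H')  # True
```

This is consistent with the output in the question, where none of the `if sym in myPrediction == ...` branches is ever taken.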
