I have started learning bioinformatics. Some of the problems involve finding an expected value, and I am quite weak at calculating such things.
As expected value is related to statistics, its explanation is skipped in bioinformatics. So, I am posting it here.
Question:
Suppose, I have 500 strings, each having length 1000.
Now, I have to calculate the expected number of occurrences of a given substring of length exactly 9 across these strings.
Note that each string contains only the four letters A, T, G, C, each with the same probability (0.25).
Another thing to note: overlapping occurrences should be counted.
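To make the overlapping convention concrete, here is a minimal Python sketch of how I understand the counting. The function name `count_overlapping` is my own and only illustrative; it is not taken from the course.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    k = len(pattern)
    # Check every start position; a match at i does not skip position i + 1.
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


print(count_overlapping("AAAA", "AA"))  # 3: matches start at positions 0, 1 and 2
print("AAAA".count("AA"))               # 2: str.count only counts non-overlapping matches
```

So a string of length 1000 has 1000 - 9 + 1 = 992 start positions at which a 9-letter substring could occur.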
My Approach:
The probability that a given 9-length substring occurs at a given position = $ (0.25)^9 $
The expected number of occurrences of that substring in one string of length 1000 = $ (1000-9+1) \times (0.25)^9 $
With 500 such strings, the expected number of occurrences would be = $ 500 \times (1000-9+1) \times (0.25)^9 $
But I went wrong somewhere, maybe in an assumption or in the calculation.
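As a quick numeric check of the last formula, here is a minimal Python sketch. It assumes every position is drawn independently and uniformly from the four letters; the function and parameter names are my own, chosen for illustration.

```python
def expected_occurrences(num_strings: int, length: int, k: int,
                         alphabet_size: int = 4) -> float:
    """Expected count of one fixed k-letter substring, overlaps included."""
    positions = length - k + 1            # possible start positions per string
    p_match = (1 / alphabet_size) ** k    # chance of a match at one position
    # Linearity of expectation: sum the match probability over all positions
    # in all strings, even though overlapping positions are not independent.
    return num_strings * positions * p_match


print(round(expected_occurrences(500, 1000, 9), 7))  # 1.8920898
```

The sum over positions is valid for overlapping occurrences too, because linearity of expectation does not require the indicator variables to be independent.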
Could you please guide me to get the actual solution?
Source:
This problem is part of the Bioinformatics course track on Coursera.
Accuracy:
Allowable error = 0.0001
As the allowable error is 0.0001, the calculation above serves the purpose and gives a good approximation. It was my mistake: I entered too few digits and the answer was marked wrong.
The answer is: $1.8920898$
However, this answer gives an approximation of the probability. When it is converted to an expected value by multiplying, it becomes slightly inaccurate and does not serve the purpose: it gives $1.8885179$. whuber's calculation in the comments gave $1.895678$, which also does not serve the purpose.
… A never appears. The expected count of AAAAAAAAA therefore is zero. In another model, all strings start and end with 100 A's. The expected count of AAAAAAAAA therefore is at least 184. – whuber Aug 19 '16 at 18:22
… $1.8885179$ which does not pass. My solution is $1.8920898$ which passes. – Enamul Hassan Aug 19 '16 at 18:42
… $1.895678$ from Henry's answer? However, your calculated answer also did not pass. – Enamul Hassan Aug 19 '16 at 19:09