To calculate the retention index for a phylogenetic tree, we use the following formula:

$$\frac{\text{maximum number of steps on tree} - \text{number of steps on the tree}}{\text{maximum number of steps on tree} - \text{minimum number of steps in the data}}$$
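As a rough sketch, the formula can be written as a small Python function. The name `retention_index` and its parameter names are hypothetical labels chosen for illustration, not part of any library, and the numbers in the usage line are made up rather than taken from a real tree.

```python
def retention_index(max_steps: int, observed_steps: int, min_steps: int) -> float:
    """Return (max_steps - observed_steps) / (max_steps - min_steps)."""
    denominator = max_steps - min_steps
    if denominator == 0:
        # Happens when no character can show homoplasy; the ratio is undefined.
        raise ValueError("retention index is undefined when max_steps equals min_steps")
    return (max_steps - observed_steps) / denominator


# Illustrative values only (not derived from any dataset in this thread).
example_ri = retention_index(max_steps=10, observed_steps=6, min_steps=5)
print(example_ri)  # 0.8
```

A value of 1 means the tree needs no more steps than the minimum the data allow, and a value of 0 means it needs as many steps as the maximum.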

To calculate the maximum number of steps for a character, we count how many taxa show each observed character state and take the lowest count. We do this for every character and then sum the results across all characters.

Take the following dataset (columns are characters, rows are taxa):

          123456
Outgroup  000000
Taxa A    100001
Taxa B    110001
Taxa C    111001
Taxa D    111101
Taxa E    111111
Taxa F    111110

The maximum number of steps for the first character is 1, since the character state 0 is attested once and character state 1 occurs six times.
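The same count can be made for every column of the matrix above. This is a minimal sketch that assumes all characters are binary (only the states 0 and 1 occur); the function name `max_steps_per_character` is a hypothetical helper, not from any phylogenetics package.

```python
from collections import Counter

# The dataset from the question: one string of character states per taxon.
matrix = {
    "Outgroup": "000000",
    "Taxa A": "100001",
    "Taxa B": "110001",
    "Taxa C": "111001",
    "Taxa D": "111101",
    "Taxa E": "111111",
    "Taxa F": "111110",
}


def max_steps_per_character(rows):
    """For each column, return the smaller of the counts of state 0 and state 1."""
    result = []
    for column in zip(*rows):  # transpose: iterate over characters, not taxa
        counts = Counter(column)
        # Assumption: binary characters. A constant column contributes 0.
        result.append(min(counts.get("0", 0), counts.get("1", 0)))
    return result


per_character = max_steps_per_character(matrix.values())
print(per_character)       # [1, 2, 3, 3, 2, 2]
print(sum(per_character))  # 13
```

Under this rule the first character contributes 1, and the maximum number of steps summed over all six characters is 13.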

What I don't understand is why we do not use 6 for the maximum number of changes for the first character.

Namenlos
  • Could you explain what the matrix is? Each column is a feature, but what does 0 or 1 represent, a step? Perhaps I would understand better if you could show the retention index of this dataset. – llrs Oct 25 '18 at 07:57
  • The retention index is used to gain insight into how well the data conform to the most parsimonious tree. 0 and 1 represent the character state. – Kohl Kinning Oct 25 '18 at 14:37
  • Agreed, RI is used for a parsimony tree. The minimum number of steps requires searching the total tree space of the data and is not trivial to obtain without an algorithm. The maximum number of steps would be computationally derived. – M__ Dec 28 '18 at 21:16
  • Found this very useful. @Kohl, I was wondering if you could help: how do I go about working out the maximum steps if I have a matrix with not only 0s and 1s but also other integers, e.g.
    taxonA 11000203044
    taxonB 11001102034
    Thanks for your help; I hope you see this even though the question has been closed for so long.
    – user8918 May 26 '20 at 10:28

1 Answer

For a binary character, the maximum number of steps (or changes) is the number of taxa with state 0 or the number of taxa with state 1, whichever is smaller. Conceptually, this is the number of character changes needed if each taxon with the rarer state evolved that state independently of the other taxa.

For example, if for a given character state 0 occurred twice and state 1 occurred five times, the maximum number of changes would be 2.
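One way to see why the smaller count is used: assume a completely unresolved "star" tree in which every taxon descends directly from a single ancestor. Parsimony still picks the ancestral state that requires the fewest changes, so the more common state is treated as ancestral and only the taxa with the rarer state need a change. The sketch below counts the changes for each candidate ancestral state; the function name `changes_by_ancestral_state` and the example column are hypothetical, and the star-tree framing is an assumption made for illustration.

```python
from collections import Counter


def changes_by_ancestral_state(states):
    """Map each candidate ancestral state to the number of taxa that differ from it.

    Assumes a star tree: every taxon that differs from the ancestor
    needs exactly one independent change.
    """
    counts = Counter(states)
    total = len(states)
    return {state: total - n for state, n in counts.items()}


# Hypothetical column: state 0 occurs twice, state 1 occurs five times.
costs = changes_by_ancestral_state("0011111")
print(costs)  # {'0': 5, '1': 2}

# Parsimony keeps the cheaper explanation, so the maximum number of steps is 2.
max_steps = min(costs.values())
print(max_steps)  # 2
```

Five changes would be needed only if state 0 were forced to be ancestral; with state 1 as the ancestral state, the same distribution is explained by two changes, and that smaller figure is the maximum that enters the formula.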

Kohl Kinning
  • Thank you for this. Could you just edit your answer to explain why in the scenario you sketch above the five-time occurrence of character state 1 is not used in calculating the maximum number of steps? What I find confusing is that if character state 1 occurs five times, could that not represent five independent (i.e., homoplasious) character state changes? – Namenlos Oct 25 '18 at 15:54