
Ratios (e.g. $Z = Y/X$) are frequently used, for example as fold-changes in mRNA or protein expression or as body mass index (BMI). Many people advise that variables coded as ratios (e.g. fold-changes) should be log transformed because they are heavily skewed to the right. However, ratios ($Y/X$) are relative changes, and ratio distributions are generally not normal (en.wikipedia.org/wiki/Ratio_distribution). If both $X$ and $Y$ are lognormal, then log($Y/X$) is normal (but is $Y/X$ itself lognormal once retransformation bias is taken into account?).
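
As a quick check of that lognormal special case (a minimal simulation sketch; the independence of $X$ and $Y$ and all parameter values are illustrative assumptions, not part of the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative parameters; X and Y are assumed independent lognormal here.
x = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)
y = rng.lognormal(mean=0.3, sigma=0.5, size=100_000)

z = y / x  # the ratio Z = Y/X

# log(Z) = log(Y) - log(X) is exactly normal in this setup, so Z is lognormal;
# the raw ratio itself is right-skewed.
print("skewness of Z:      ", stats.skew(z))
print("skewness of log(Z): ", stats.skew(np.log(z)))
print("mean of Z:          ", z.mean())
print("exp(mean of log Z): ", np.exp(np.log(z).mean()))  # geometric mean, below the mean
```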

Comparisons between log-transformed ratios are therefore relative changes of the relative changes (i.e., of the ratios). Moreover, the necessity of log transformation for right-skewed variables ($Y$) has been questioned. For example, a recent paper (http://www.ncbi.nlm.nih.gov/pubmed/22806695) cautions against misuses of log transformation. Among its points: log($Y$) is guaranteed to be normally distributed only if $Y$ is lognormal, so the transformation does not guarantee normality merely because $Y$ is right-skewed. Moreover, the anti-log of E(log($Y$)) is the geometric mean (GM) of $Y$, which is never greater than E($Y$), and a test of a difference in GMs is not the same as a test of a difference in E($Y$). Finally, the GM is neither more robust nor less affected by outliers.
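
The geometric-mean point can be made precise in one line (standard facts, added for completeness): because $\log$ is concave, Jensen's inequality gives

$$\mathrm{GM}(Y) = e^{\mathrm{E}[\log Y]} \le e^{\log \mathrm{E}[Y]} = \mathrm{E}[Y],$$

with equality only when $Y$ is constant. In the exactly lognormal case $Y \sim \mathrm{Lognormal}(\mu, \sigma^2)$ the gap is explicit: $\mathrm{E}[Y] = e^{\mu + \sigma^2/2}$ while $\mathrm{GM}(Y) = e^{\mu}$, and the factor $e^{\sigma^2/2}$ is the retransformation bias mentioned below.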

Another paper (http://econtent.hogrefe.com/doi/10.1027/1614-2241/a000110) showed that t-tests on the raw variables perform well even for lognormally distributed variables. A third paper (http://link.springer.com/article/10.1023%2FB%3AEEST.0000011364.71236.f8) showed that the t-test on the ratios and the t-test on the log-transformed ratios perform similarly.
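
As a sketch of that kind of comparison (the simulation design and parameter choices below are mine, for illustration only, not those of the cited papers): simulate two groups of ratios under the null and compare rejection rates of the t-test on the raw ratios versus the log ratios.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n, alpha = 5000, 20, 0.05
reject_raw = reject_log = 0

for _ in range(n_sim):
    # Two groups of ratios drawn from the same (lognormal) distribution,
    # so the null hypothesis of no group difference is true.
    r1 = rng.lognormal(0.0, 0.8, n)
    r2 = rng.lognormal(0.0, 0.8, n)
    reject_raw += stats.ttest_ind(r1, r2, equal_var=False).pvalue < alpha
    reject_log += stats.ttest_ind(np.log(r1), np.log(r2), equal_var=False).pvalue < alpha

# Rejection rates to compare against the nominal 5% level.
print("type I error, t-test on raw ratios:", reject_raw / n_sim)
print("type I error, t-test on log ratios:", reject_log / n_sim)
```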

Thus, the question becomes which quantity is the outcome of interest. Because log($Z$) has to be back-transformed to the original units to be meaningful, and because of retransformation bias, I think that tests of E($Z$) are more meaningful.

Fortunately, parametric tests (e.g. t-tests) are robust to violations of the normality assumption once heteroscedasticity is accounted for (e.g. with Welch's t-test). For example, this paper (http://www.ncbi.nlm.nih.gov/pubmed/24738055) advises using ANOVA to test differences among raw fold-changes in immunoblotting.
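
For concreteness, a minimal sketch of that workflow with made-up fold-change values (the numbers and group names are purely illustrative):

```python
import numpy as np
from scipy import stats

# Made-up fold-changes, each already normalized to the same control
control = np.array([1.02, 0.95, 1.10, 0.98, 1.01])
treat_a = np.array([1.45, 1.60, 1.38, 1.72, 1.50])
treat_b = np.array([0.80, 0.75, 0.90, 0.70, 0.85])

# One-way ANOVA on the raw fold-changes (assumes equal variances)
print(stats.f_oneway(control, treat_a, treat_b))

# Welch's t-test (equal_var=False) for a single pairwise comparison,
# i.e. the heteroscedasticity-robust test mentioned above
print(stats.ttest_ind(treat_a, control, equal_var=False))
```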

So my question is: If my goal is to test the absolute change of the ratios, can I compare the ratios directly without log transformation?

Reference: In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

KuJ
  • Am I missing something? If $X$ and $Y$ are log-normal, then surely $X/Y$ is also log-normal...? – M Turgeon Jun 29 '16 at 16:15
  • @Turgeon: Yes, log(Y/X) is normal. But I am not sure if Y/X is lognormal when retransformation bias is taken into account. I think that the Wikipedia page has to elaborate on this idea. – KuJ Jun 29 '16 at 17:04
  • "Can I compare the ratios directly without log transformation?" In at least the following case you are implicitly doing a "comparison" of ratios: when you compute the $\chi^2$ statistic of a contingency table. One way to write its formula is $\sum_{ij} O_{ij} G_{ij} - N$, where $O_{ij}$ is the observed frequency in a cell and $G_{ij}$ is its ratio to the expected frequency there. Therefore, when you compute the squared chi-square distance between rows $i$ and $i'$ of the table, you are computing differences between the ratios: $d^2_{ii'} = \frac{1}{N} \sum_j O_{+j} (G_{ij} - G_{i'j})^2$. – ttnphns Jun 30 '16 at 12:15
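
A quick numerical check of the identities in the comment above (the 2×3 table below is made up for illustration):

```python
import numpy as np
from scipy import stats

# Illustrative table of observed frequencies
O = np.array([[20., 30., 50.],
              [30., 30., 40.]])

N = O.sum()
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / N   # expected frequencies
G = O / E                                        # ratios of observed to expected

# Chi-square statistic written as sum_ij O_ij * G_ij - N
chi2_from_ratios = (O * G).sum() - N
chi2_scipy = stats.chi2_contingency(O, correction=False)[0]
print(chi2_from_ratios, chi2_scipy)              # the two values agree

# Squared chi-square distance between the two rows, via the same ratios
d2 = (O.sum(axis=0) * (G[0] - G[1]) ** 2).sum() / N
print(d2)
```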

3 Answers


Not only do distributions of untransformed ratios have odd shapes that do not match the assumptions of traditional statistical analysis, but there is also no good interpretation of a difference of two ratios. As an aside, if you can find an example where the difference of two ratios is meaningful, when the ratios do not represent proportions of a whole, please describe such a situation.

As variables used in statistical analysis, ratios have the significant problem of being asymmetric measures, i.e., it matters greatly which value is in the denominator. This asymmetry makes it almost meaningless to add or subtract ratios. Log ratios are symmetric and can be added and subtracted.
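
A small worked example of that asymmetry (my own illustration): a ratio of 2 and its reciprocal 0.5 are not equidistant from the null value 1 on the raw scale, but their logs are symmetric about 0:

$$\frac{Y}{X} = 2 \;\Rightarrow\; \frac{X}{Y} = 0.5, \qquad |2 - 1| = 1 \ne 0.5 = |0.5 - 1|, \qquad \log 2 = -\log 0.5 .$$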

One can spend a good deal of time worrying about what distribution a test statistic has, or correcting for the distribution's "strangeness", but it is important first to choose an effect measure that has the right mathematical and practical properties. Ratios are almost always meant to be compared by taking the ratio of ratios, or its log (i.e., the double difference in the logs of the original measurements).
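
Written out (standard algebra, added here for clarity), that comparison is

$$\log\frac{Y_1/X_1}{Y_2/X_2} = (\log Y_1 - \log X_1) - (\log Y_2 - \log X_2),$$

which is exactly the double difference in logs referred to above.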

Frank Harrell
  • Dear Professor Frank Harrell: Thank you for your kind answer. I have revised the question.

    Two examples:

    1. BMI is not normal (http://www.ncbi.nlm.nih.gov/pubmed/26973438) and may or may not be lognormal. If it is, then log(BMI) is normal. If it is not, then log(BMI) is not normal. However, log(BMI) is rarely used.
    2. Fold-changes ($Y_1/X$, $Y_2/X$) of proteins or genes in two experimental groups ($Y_1$, $Y_2$) are compared to a control group ($X$). Thus, the difference of the two ratios is meaningful, but the relative difference is not, because both experimental groups are normalized by a common control.
    – KuJ Jun 28 '16 at 11:18
  • That logic is not correct. Assuming BMI is the dependent variable, it will behave better when logged than when not logged, with respect to linear model assumptions, although better would be to model weight adjusted for height and initial weight. The fact that fold changes in protein expression are compared doesn't mean you subtract two fold changes. The more proper measure would be to take the ratio of fold changes. Normalization is another issue altogether. The practice of separate normalization steps is not good statistically because it assumes controls are measured without error. – Frank Harrell Jun 28 '16 at 13:38
  • Do you mean that the GM is better than the arithmetic mean of the ratios or fold-changes (even though the ratios or fold-changes may not be lognormal, and despite the cautions issued by the first paper)? This paper (http://link.springer.com/article/10.1023%2FB%3AEEST.0000011364.71236.f8) showed that the t-test on the ratios and the t-test on the log-transformed ratios perform similarly. Thank you. – KuJ Jun 28 '16 at 15:34
  • No, I don't mean that. The basic idea is that it is improper to subtract measures that are asymmetric. – Frank Harrell Jun 28 '16 at 20:06
  • I agree that it is improper to subtract measures that are not symmetric. However, ratio distributions are neither normal nor lognormal (https://en.wikipedia.org/wiki/Ratio_distribution). Thus, log($Y/X$) is not normal, and the first paper said that it is not more symmetric than $Y/X$. It follows that the difference log($Y_1/X$) - log($Y_2/X$) is also improper. – KuJ Jun 29 '16 at 01:57
  • That paper is dramatically incorrect then. log(Y/X) is mathematically a symmetric function, and the distribution of log ratios is much more symmetric than the distribution of ratios. – Frank Harrell Jun 29 '16 at 04:11
  • I would underline what is not part of this excellent advice. Whether ratios are exactly or even approximately lognormal before transformation, and exactly or even approximately normal after it, can't be predicted in advance or in general, and fortunately is quite secondary. The key point is that ratios of positive numbers are often so extraordinarily skewed (given that $X < Y$ maps to $0 < X/Y < 1$ and $X > Y$ maps to $X/Y > 1$) that plotting untransformed data, using them in models, and thinking about them is almost always much more awkward than working with their logarithms. – Nick Cox Jun 30 '16 at 11:46
  • (contd) Worry about precisely which $t$ test is best is misplaced; if that's really problematic, there are alternatives. – Nick Cox Jun 30 '16 at 11:47