
My response variable is a metric calculated from the normalised ratio of two variables.

It is calculated as (a-b)/(a+b), which gives continuous values bounded between -1 and +1. My data contain no values that are exactly -1 or +1, but the distribution is heavily skewed towards values below 0.

Is it possible to transform the response variable to fit a GLMM? Or are there other mixed effects models I could consider?

CM3
  • Transform your $y$ to $(y + 1)/2$ and it's a fraction to be treated by fractional logit. – Nick Cox Apr 07 '16 at 17:10
  • If the ratio that you describe is bounded by -1 and 1, it seems likely that the individual $a$ and $b$ values are either all positive or all negative. Is that the case? – EdM Apr 09 '16 at 17:38
  • Sorry, I should have given more detail in my original question. That's correct, all a and b values are positive. They are measures of the power spectral density of two frequency bands in sound recordings. – CM3 Apr 09 '16 at 22:28

1 Answer


Following up on the suggestion from Nick Cox, this problem takes a very simple form, given that $a$ and $b$ are strictly positive.

If $$y=\frac{a-b}{a+b}$$ then $$\frac{y+1}{2}=\frac{a}{a+b}$$ and $$\text{logit}\left(\frac{a}{a+b}\right)=\log\left(\frac{a}{b}\right)=\log a -\log b.$$
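The identity above is easy to check numerically. Here is a minimal Python sketch (the thread itself works in R; this is only an illustration). The function names and the example $(a, b)$ pairs are made up for the check and are not real power spectral density values.

```python
import math

def normalised_ratio(a, b):
    """The metric from the question, y = (a - b) / (a + b), for a, b > 0."""
    return (a - b) / (a + b)

def logit(p):
    """Log-odds of a fraction p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical (a, b) pairs, chosen only to illustrate the identity.
pairs = [(0.2, 1.5), (3.0, 4.0), (10.0, 0.5)]

for a, b in pairs:
    y = normalised_ratio(a, b)
    fraction = (y + 1.0) / 2.0               # should equal a / (a + b)
    assert math.isclose(fraction, a / (a + b))
    # logit of the rescaled ratio equals the difference of logs
    assert math.isclose(logit(fraction), math.log(a) - math.log(b))
```

The check relies on $a$ and $b$ being strictly positive, as confirmed in the comments; with a zero in either band the logit would be undefined.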

This form simplifies thinking about the nature of error terms of this transformed response, in terms of the variances of the log-transformed $a$ and $b$ and their covariance.
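To make that explicit, the variance of the transformed response follows from the standard rule for the variance of a difference (a general identity, not something specific to these data):

$$\operatorname{Var}(\log a - \log b) = \operatorname{Var}(\log a) + \operatorname{Var}(\log b) - 2\operatorname{Cov}(\log a, \log b)$$

So if $\log a$ and $\log b$ are positively correlated, for instance because both rise and fall with the overall level of a recording, the difference is less variable than the two terms would be if they were independent.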

For your application to acoustic power spectral densities, this is similar to examining differences on a decibel-type scale. You might consider using this form instead for your response variable, as it includes all the information in your original proposal, has a simple interpretation, and will have much better properties for statistical testing, including use in mixed models as your mention of GLMM suggests.
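To show what "decibel-type scale" means here, the sketch below assumes the usual convention for power quantities, a level difference of $10\log_{10}(a/b)$ dB, and recovers it from the original ratio $y$ alone. The function names are hypothetical and the inputs are illustrative.

```python
import math

def db_difference(a, b):
    """Level difference between two power quantities, in decibels."""
    return 10.0 * math.log10(a / b)

def db_from_ratio(y):
    """Same quantity computed from y = (a - b) / (a + b), with -1 < y < 1."""
    fraction = (y + 1.0) / 2.0                      # a / (a + b)
    log_odds = math.log(fraction / (1.0 - fraction))  # log a - log b
    return (10.0 / math.log(10.0)) * log_odds       # natural log -> 10*log10

# Illustrative values only: a is twice b.
a, b = 2.0, 1.0
y = (a - b) / (a + b)
assert math.isclose(db_difference(a, b), db_from_ratio(y))
```

In other words, the logit of the rescaled ratio and the decibel difference differ only by the constant factor $10/\ln 10$.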

Added in response to comment:

The above shows that using $(\log a -\log b)$ as a response variable in an ordinary linear regression does much the same thing as analyzing your $\frac{a-b}{a+b}$ ratio, transformed as suggested by @NickCox, as a response variable via a logit in a generalized linear model (GLM). Whatever you are trying to measure with that ratio can also be measured by the difference in log power spectral densities, in a way that provides a good deal more flexibility and possibly power.

Using the difference in log power spectral densities as the response variable, you might be able to perform ordinary linear regression and avoid use of a generalized model completely, unless you need to examine random effects or repeated measures. Those would make this a mixed model, which might need analysis as a GLMM (generalized linear mixed model) depending on the experimental design.

Going a step further, working in the log scale of all the individual power spectral density measurements as the response variable might make it easier to examine and explain influences on the joint distribution of $a$ and $b$ and their absolute levels, something that is lost in the ratio that you propose.

For example, say you had a predictor variable $X$ that might be related both to the absolute levels of $a$ and $b$ and to the ratio that you originally wished to examine. If $PSD$ stands for measurements of power spectral density, indexed by a binary factor variable $band$ that represents whether a measurement is in the $a$ or the $b$ band, you could examine the following type of model:

$$\log PSD = \beta_0 + \beta_b band + \beta_X X + \beta_{bX}(band:X)$$

In default R coding, $\beta_0$ would be the $\log PSD$ at the baseline values of your predictor variables, $\beta_b$ the difference in $\log PSD$ between the 2 bands at the baseline value of $X$, and $\beta_X$ the influence of $X$ on $\log PSD$ at the baseline value of $band$. $\beta_{bX}$ is the interaction between $band$ and $X$, which can be thought of as the influence of variable $X$ on the $\log PSD$ difference between the two bands.

Testing significance of that interaction term would be functionally the same as looking for a relation of your ratio to $X$ in a GLM (perhaps with some differences in the model of the error variances), but in a way that also takes into account the influence of $X$ on $a$ and $b$ jointly, via $\beta_X$. Even if you need to use a GLMM, you are probably better off developing a model of the individual $a$ and $b$ measurements in this way, rather than throwing away so much information by first taking the ratio that you propose.
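A small simulation can illustrate the link between the interaction term and the log difference. This is a Python sketch under stated assumptions, not your data or an R workflow: the data are simulated, each recording contributes one $a$ and one $b$ value at the same $X$, the band is coded 0 for $b$ and 1 for $a$, and the model is fitted by plain least squares with a hand-written solver to keep the example self-contained. All names and coefficient values are made up.

```python
import random

def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)

# Simulated log PSD values (hypothetical slopes, intercepts and noise).
random.seed(1)
x_values = [0.5 * i for i in range(20)]
log_a = [1.0 + 0.30 * x + random.gauss(0, 0.2) for x in x_values]
log_b = [2.0 + 0.10 * x + random.gauss(0, 0.2) for x in x_values]

# Long format: one row per measurement, columns 1, band, X, band:X.
design, response = [], []
for x, la, lb in zip(x_values, log_a, log_b):
    design.append([1.0, 0.0, x, 0.0])   # band b is the baseline
    response.append(lb)
    design.append([1.0, 1.0, x, x])     # band a
    response.append(la)
beta_0, beta_b, beta_X, beta_bX = ols(design, response)

# Wide format: regress the log difference (log a - log b) on X.
diff = [la - lb for la, lb in zip(log_a, log_b)]
intercept_d, slope_d = ols([[1.0, x] for x in x_values], diff)

# The interaction coefficient matches the slope of the log difference on X,
# and the band coefficient matches its intercept.
assert abs(beta_bX - slope_d) < 1e-8
assert abs(beta_b - intercept_d) < 1e-8
```

With this paired layout the point estimates agree, but the standard errors generally will not: the long-format fit as written treats the two measurements from one recording as independent, which is one reason a random effect for recording (a mixed model) may be worth considering.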

EdM
  • Thank you for the detailed response. Forgive me, but I'm very new to this. When specifying a binomial distribution with a logit link for a GLMM in R, a prior weights argument is required. I am not sure what this would be in my case? Also in terms of justifying my model choice, is a fractional logit model more appropriate than something like a beta regression? – CM3 Apr 13 '16 at 10:25
  • You may be throwing away a lot of information by using this ratio. I think you might be better off modeling the individual PSD measurements in a log scale, as my addition to the answer indicates, including the band as a predictor variable, and looking for interactions of band with other predictors. You might also be able to use ordinary linear regression. If you use a GL(M)M, 'weights' arguments are typically optional, with a default of no different weights among cases. An appropriate model is one that fits well without overfitting and gives well behaved error terms. That depends on your data. – EdM Apr 13 '16 at 16:22