
I have reaction times as the dependent variable, which I log-transform using the natural logarithm so that they are approximately normally distributed. I then run a regression on the log-transformed data and translate the estimates back to the original scale using the exponential function.

My question: how should I handle the standard errors? Should I report their values after back-transforming, or not?

Your suggestions and help are much appreciated. Thank you!
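A minimal sketch of the workflow described above, assuming a single 0/1 predictor (the question does not say which predictors are in the model) and using only the Python standard library. The slope b estimated on log(RT) back-transforms to a multiplicative effect exp(b). Rather than exponentiating the standard error itself, a common practice is to exponentiate the endpoints of the log-scale confidence interval, which gives an asymmetric interval on the millisecond scale; the delta-method SE, exp(b)·SE(b), is shown only as an approximation. The data, the helper names `ols_simple` and `back_transform`, and the use of a normal rather than a t critical value are all illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean

def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, se_b)."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b = math.sqrt(sigma2 / sxx)
    return a, b, se_b

def back_transform(b, se_b, level=0.95):
    """Express a log-scale slope on the original (multiplicative) scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # large-sample normal approximation
    ratio = math.exp(b)                          # multiplicative effect on RT per unit of x
    ci = (math.exp(b - z * se_b), math.exp(b + z * se_b))  # exponentiate the CI endpoints
    se_delta = ratio * se_b                      # approximate (delta-method) SE of exp(b)
    return ratio, ci, se_delta

# Hypothetical data: condition coded 0/1, reaction times in ms.
x = [0, 0, 0, 0, 1, 1, 1, 1]
rt = [412, 455, 390, 501, 520, 610, 548, 575]
log_rt = [math.log(v) for v in rt]

a, b, se_b = ols_simple(x, log_rt)
ratio, (lo, hi), se_delta = back_transform(b, se_b)
print(f"log-scale slope b = {b:.4f} (SE {se_b:.4f})")
print(f"ratio exp(b) = {ratio:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], delta-method SE {se_delta:.3f}")
```

With a 0/1 predictor, exp(a) is the geometric mean RT in the baseline condition and exp(b) is the ratio of geometric means between the two conditions. With samples this small, a t critical value with n - 2 degrees of freedom would be more appropriate than the normal one.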

Richard Hardy
Sharon
  • Why did you log-transform your dependent variable? There is no requirement (at least in multiple regression) for your dependent variable to be normally distributed; only your errors must be. By transforming your dependent variable, you may be making your task more difficult than it needs to be. What other variables, if any, are included in your model? – StatsStudent Aug 30 '20 at 14:07
  • @Stats Work going back to Weber and Fechner in the mid-19th century established the basis for expressing reaction times in terms of logarithms: this is the default, just as chemists by default use log concentrations for expressing their measurements. – whuber Aug 30 '20 at 14:09
  • Interesting -- you can tell I'm out of my domain of expertise here. ;-)

    Thanks @whuber.

    At any rate, this post may be helpful to the OP: https://stats.stackexchange.com/questions/123514/calculating-standard-error-after-a-log-transform

    – StatsStudent Aug 30 '20 at 14:10
  • Thank you @whuber :) – Sharon Aug 30 '20 at 16:40
  • Thank you @StatsStudent - looks very helpful :) – Sharon Aug 30 '20 at 16:40
  • @StatsStudent Technically, one of the assumptions of OLS regression is that $y_i \sim N(\mu_i,\sigma^2)$. This means that the observations are assumed to be normally distributed with unknown parameters $\mu_i$ and $\sigma$. The only way to check this is to fit a model first so that we have parameter estimates. Then, using the residuals $(y_i-\hat{y}_i)$, the assumption $y_i \sim N(\mu_i,\sigma^2)$ can be evaluated. So it is true that a histogram of the raw values does not need to follow a normal distribution, but we do assume that the observations are normally distributed around unknown parameters ... – Stefan Aug 30 '20 at 16:58
  • ... I just wanted to mention this because it always confused me why the residuals are tested and why they should be normally distributed. This may well have been clear to everyone except me - please ignore this comment if that's the case! – Stefan Aug 30 '20 at 16:58
  • Thanks all :) So, do you think that I should be using:
    exp(mean(log(x))) * (sd(log(x))/sqrt(n-1)) in order to transform the standard error into the original (i.e., unlogged) scale?
    – Sharon Aug 30 '20 at 17:16
  • @Stefan, hence my comment about multiple regression. But the classical multiple regression model makes mention only of the distribution of the error terms. Testing the $y_i$ alone for normality only makes sense when your model has one independent variable. – StatsStudent Aug 30 '20 at 20:03
  • @StatsStudent I never suggested testing the raw $y_i$'s for normality. What is assumed to be normally distributed is the dependent variable around some unknown $\mu_i$ and $\sigma^2$. And that is the same whether there is one or more predictors in the model... – Stefan Aug 30 '20 at 21:02
  • ... So saying $y_i=\beta_0+\beta_1x_{i1}+\dots+\beta_kx_{ik}+\epsilon_i$ where $\epsilon_i\sim N(0,\sigma^2)$ is the same as saying $y_i\sim N(\mu_i, \sigma^2)$ where $\mu_i=\beta_0+\beta_1x_{i1}+\dots+\beta_kx_{ik}$, or, plugged in, $y_i\sim N(\beta_0+\beta_1x_{i1}+\dots+\beta_kx_{ik}, \sigma^2)$. The probabilistic view highlights that the $y_i$ are independent normal random variables with means $\mu_i$ and a common variance $\sigma^2$; they are not identically distributed, because it is the errors $\epsilon_i$ that are independent and identically distributed (i.i.d.). – Stefan Aug 30 '20 at 21:03
  • Ahhh. Sorry. Yes. Sorry I misread your previous comments -- before I had my cup of coffee ;-) – StatsStudent Aug 30 '20 at 21:15
  • @StatsStudent no worries! I know that feeling ;) – Stefan Aug 30 '20 at 22:38
  • @whuber - as you have correctly stated, "work going back to Weber and Fechner in the mid 19th century established the basis for expressing reaction times in terms of logarithms: this is the default, just as chemists by default use log concentrations for expressing their measurements". Would you say that eye measurements (e.g., fixation durations) should be treated the same way? – Sharon Sep 15 '20 at 11:06
  • @Sharon I'm not so sure about eye measurements. One might expect Fitts's Law to hold for saccades, but that's at best a starting point for investigation. As far as the duration of fixations goes, that's a different matter. In one dataset I have analyzed, the logarithm of the raw duration data had a skewed distribution, but it is likely that the durations of the fixations (times within each cluster) would have a much less skewed distribution. – whuber Sep 15 '20 at 11:24
  • Thank you very much @whuber :-) – Sharon Sep 19 '20 at 06:59
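For the single-sample formula proposed in Sharon's comment, one standard way to back-transform is the delta method: if m is the mean of log(x) with standard error s/sqrt(n), then exp(m), the geometric mean, has approximate standard error exp(m)·s/sqrt(n). The sketch below assumes that approach and uses hypothetical reaction times. It compares the delta-method value with the proposed formula, which divides by sqrt(n - 1) instead of sqrt(n); because the sample SD already uses n - 1, the proposed value comes out larger by a factor of sqrt(n/(n - 1)). The helper name `geometric_mean_summary` and the normal critical value are illustrative choices, not anything stated in the thread.

```python
import math
from statistics import NormalDist, fmean, stdev

def geometric_mean_summary(x, level=0.95):
    """Summarise positive data (e.g. reaction times) on the log scale, then back-transform."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    m = fmean(logs)                      # mean of log(x)
    se_m = stdev(logs) / math.sqrt(n)    # SE of that mean (stdev already uses n - 1)
    gm = math.exp(m)                     # geometric mean on the original scale
    se_gm = gm * se_m                    # delta-method approximation to SE(exp(m))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.exp(m - z * se_m), math.exp(m + z * se_m))  # asymmetric around gm
    return gm, se_gm, ci

rt = [412, 455, 390, 501, 520, 610, 548, 575]   # hypothetical reaction times (ms)
gm, se_gm, (lo, hi) = geometric_mean_summary(rt)

n = len(rt)
logs = [math.log(v) for v in rt]
# The formula proposed in the comment, for comparison.
proposed = math.exp(fmean(logs)) * (stdev(logs) / math.sqrt(n - 1))

print(f"GM = {gm:.1f} ms, delta-method SE = {se_gm:.1f}, proposed = {proposed:.1f}")
print(f"95% CI [{lo:.1f}, {hi:.1f}]")
```

Reporting the exponentiated interval, which is asymmetric on the millisecond scale, is often more informative than a single back-transformed SE.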
