
I am planning a regression analysis in Stata of the financial performance of private equity funds. In my descriptive statistics, I saw high levels of kurtosis and skewness. I reduced these by winsorizing the outliers. However, the levels are still high, especially for kurtosis.

  1. Would it be correct to do log transformations on all of the variables (dependent and control) in order to decrease kurtosis, or is this method usually only used for skewness?

  2. If yes, is it okay that I do the log transformation for the variables that have excess kurtosis, even if they have normal ranges of skewness?

  3. I have researched online and have seen that -2 to +2 are the "normal" ranges for kurtosis and skewness...is this correct for both?

Please see below an image of my data! If you can't see the picture, to summarize, I have kurtosis levels ranging from 2.50 to 22 and skewness levels ranging from 1 to 4 on the dependent and control variables.

Lucy
  • Welcome to Cross Validated! What do you want to “fix” by altering the kurtosis? – Dave Sep 07 '23 at 11:42
  • Thank you for your answer, happy to be here! I am a beginner in statistics and I assumed it's better to lower the kurtosis to address the non-normality, and thus to log-transform the dependent and control variables. – Lucy Sep 07 '23 at 12:16
  • It is a mathematical fact that taking the logarithm will decrease (make more negative) the skewness of the distribution. It might as a byproduct decrease the kurtosis -- but that's a secondary effect and should not be a consideration in your analysis. // There is no such thing as "normal" ranges of skewness and kurtosis. I have seen reasonable, realistic datasets with kurtosis into the hundreds and thousands. // For a principled approach to identifying transformations in regression, see https://stats.stackexchange.com/a/35717/919. Also see https://stats.stackexchange.com/a/60455/919. – whuber Sep 07 '23 at 12:20
  • Thank you for your answer whuber! In that case, would you say that I could just leave the kurtosis and skewness as is and perform an OLS or robust regression? If it is possible, I would rather not do other transformations to the data if not needed, as I am only a beginner in statistics currently! – Lucy Sep 07 '23 at 12:54
  • The entire discussion becomes moot when you Winsorize first, because now you are trying to model some sort of truncated distribution that has no connection to the processes you are trying to model. – BigBendRegion Sep 08 '23 at 11:45
  • Thank you for your comment! Can you elaborate on what you mean and what you would suggest then? – Lucy Sep 08 '23 at 12:03
  • Winsorizing changes the nature of the process you study. It is no longer the same process; instead it is a truncated process that is not reality as it exists. Winsorizing is just a very bad practice, one that made some sense in the ancient history when computing and available methods were limited. My suggestion is to scrap the Winsorizing altogether, then you can consider other, more sensible models. Transformation may be relevant, but so might be quantile regression or ML with heavy-tailed distributions. But it makes no sense at all to consider transforms after Winsorizing. – BigBendRegion Sep 09 '23 at 00:26

1 Answer

First, if you are doing OLS regression, it does not assume anything about the distribution of the variables. It makes assumptions about the errors, which are approximated by the residuals.
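To illustrate the point about errors versus variables, here is a minimal sketch using simulated data (not the asker's data). The lognormal predictor, the coefficients 2.0 and 0.5, and the helper names `skewness` and `kurtosis` are all assumptions chosen for the example. The predictor and the response are strongly skewed, yet the OLS residuals look close to normal because the simulated errors are normal.

```python
import random

def skewness(values):
    """Moment-based sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def kurtosis(values):
    """Moment-based sample kurtosis (not excess); about 3 for normal data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

random.seed(0)
n = 20000

# Hypothetical setup: a heavily right-skewed predictor and normal errors.
x = [random.lognormvariate(0.0, 1.0) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

# Closed-form OLS for a single predictor.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of x:        ", round(skewness(x), 2))
print("skewness of y:        ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
print("kurtosis of residuals:", round(kurtosis(residuals), 2))
```

In practice, the analogous check is to fit the model first and then inspect the residuals, rather than the raw descriptive statistics of each variable.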

Second, in my opinion, you should not transform variables in order to make the data fit a model; rather, you should use a model that fits the data. Two such models are robust regression (actually a collection of methods) and quantile regression. You should transform for substantive reasons. Taking logs of money variables often makes substantive sense.

Third, I'm not even sure logs will reduce kurtosis. Kurtosis behaves in counterintuitive ways.
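As a hedged illustration that logs need not reduce kurtosis, consider values spread evenly over (0, 1). This is an artificial case chosen for the example, not financial data, and the `kurtosis` helper is defined here for the sketch. The raw values have low kurtosis, while their logarithms follow a (negated) exponential shape with a long left tail and much higher kurtosis.

```python
import math

def kurtosis(values):
    """Moment-based kurtosis (not excess); about 3 for normal data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

# Evenly spaced points strictly inside (0, 1), standing in for uniform data.
n = 100000
u = [(i + 0.5) / n for i in range(n)]
log_u = [math.log(v) for v in u]

k_raw = kurtosis(u)      # close to 1.8, the kurtosis of a uniform distribution
k_log = kurtosis(log_u)  # far higher: the log creates a long left tail

print("kurtosis before log:", round(k_raw, 2))
print("kurtosis after log: ", round(k_log, 2))
```

Whether a log raises or lowers kurtosis depends on the shape of the data, so it is worth computing it before and after rather than assuming the direction.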

Peter Flom
  • Many thanks for your message, Peter! All the dependent variables and one control variable ("final close size USD") are monetary variables. Would it make sense, then, to log-transform only these and then run, for instance, a robust regression? Or, as a second option, is the degree of kurtosis and skewness okay to leave as it is, and to just continue with a robust regression instead of an OLS regression? – Lucy Sep 07 '23 at 12:29
  • The reason that I think logs make sense for monetary variables is that we often think of money in multiplicative terms rather than additive ones.

    If you make $30,000 a year, a $5,000 raise is huge. If you make $300,000 a year, it's not even cost of living.

    – Peter Flom Sep 07 '23 at 13:16
  • Right, I see what you are saying. Then would you recommend that I take logs of the monetary variables because they are monetary, not because of kurtosis or skewness, and then carry out an OLS or robust regression? – Lucy Sep 07 '23 at 13:20
  • Well, that's up to you, as it is a substantive question. Probably. But, in your application, would you think about the variables multiplicatively or additively? – Peter Flom Sep 07 '23 at 13:42
  • Yes, they would be in multiplicative terms! – Lucy Sep 07 '23 at 13:54
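
The salary example in the comments above can be checked with a few lines of arithmetic. This is only a sketch of the multiplicative reading of logs; the function name `log_change` is made up for the illustration. The same $5,000 raise is a large step on the log scale at $30,000 and a small one at $300,000.

```python
import math

def log_change(before, after):
    """Difference in natural logs, roughly the proportional change."""
    return math.log(after) - math.log(before)

# The same additive raise at two salary levels (figures from the comment).
low = log_change(30000, 35000)
high = log_change(300000, 305000)

print("log change at $30,000: ", round(low, 4))
print("log change at $300,000:", round(high, 4))
```

On the raw scale both raises are identical; on the log scale they differ by roughly a factor of nine, which matches the intuition that money is judged in relative terms.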