I am trying to determine whether my data is skewed, since I plan to report the median and the interquartile range instead of the mean and the standard deviation if my data shows substantial skew.

However, I am left confused regarding the methods to determine skewness.

First, it seems to me that there are three possible coefficients of skewness: Fisher, Pearson, and Fisher-Pearson. However, I cannot find information on how to interpret the resulting values. Is there a publication that gives clear thresholds for classifying skewness with these coefficients?

Second, there seems to be the option of testing whether the skewness differs from that of a normal distribution. I have heard, however, that tests for normality are not very powerful for small samples like mine (n=14).

Or third, can skewness best be determined through graphical displays?

EDIT:

For example, the Fisher-Pearson coefficient of skewness of a sample as noted on this page:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html

gives me a value of 0.85 for one characteristic, which I interpreted as substantial skew (although I found no clear threshold there).
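A minimal sketch of what that function computes, assuming made-up values (`densities`) in place of the real 14 measurements. The helper `moment_skewness` is hypothetical and only there to show the moment formula; with its default `bias=True`, `scipy.stats.skew` returns the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), and `bias=False` applies the small-sample adjustment.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for 14 graphs (illustrative values, not the real data).
densities = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.13, 0.16,
                      0.22, 0.19, 0.35, 0.17, 0.41, 0.13, 0.15])


def moment_skewness(x):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 from central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment (biased variance)
    m3 = np.mean(d ** 3)  # third central moment
    return m3 / m2 ** 1.5


n = densities.size
g1 = stats.skew(densities)               # default bias=True: g1 as above
G1 = stats.skew(densities, bias=False)   # adjusted for small-sample bias

print(f"manual g1 = {moment_skewness(densities):.3f}")
print(f"scipy  g1 = {g1:.3f}, adjusted G1 = {G1:.3f}")
```

With only 14 observations the adjusted value can differ noticeably from the unadjusted one, so it is worth stating which variant a reported coefficient refers to.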

However, a skewness test computed with the following function:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewtest.html

gives a z-score of 1.6 and a p-value of 0.11.

Thus, statistically speaking, it seems that I would fail to reject H0 (normality) in favor of H1 (non-normality due to skewness) at any alpha of 0.10 or lower. However, this test is meant to assess normality of the population distribution, and normality tests are not very powerful at my sample size.
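The decision described above can be sketched as follows, again assuming made-up values in place of the real data and an assumed significance level `alpha`. `scipy.stats.skewtest` needs at least 8 observations and returns a z-statistic with a two-sided p-value.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for 14 graphs (illustrative values, not the real data).
densities = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.13, 0.16,
                      0.22, 0.19, 0.35, 0.17, 0.41, 0.13, 0.15])

alpha = 0.05  # assumed significance level, chosen only for illustration

# H0: the population skewness equals that of a normal distribution (zero).
result = stats.skewtest(densities)
z, p = result.statistic, result.pvalue

# A large p-value means "fail to reject H0", not "H0 is true".
reject_h0 = p < alpha
print(f"z = {z:.2f}, p = {p:.3f}, reject H0 at alpha={alpha}: {reject_h0}")
```

A non-significant result here says little with n=14: the test may simply lack the power to detect a skew of the size observed.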

And is a test about the population even relevant if I just want to decide, based on skewness, whether to report the mean and standard deviation or the median and IQR? I thought this would depend only on the sample skewness.

Would it already be enough to argue from the coefficient of 0.85?
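For comparison, a minimal sketch that computes both pairs of summaries side by side, assuming made-up values in place of the real data. The sample standard deviation uses `ddof=1`, and the quartiles use NumPy's default linear interpolation, which is only one of several conventions.

```python
import numpy as np

# Hypothetical densities for 14 graphs (illustrative values, not the real data).
densities = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.13, 0.16,
                      0.22, 0.19, 0.35, 0.17, 0.41, 0.13, 0.15])

mean = densities.mean()
sd = densities.std(ddof=1)          # sample standard deviation (n - 1 divisor)

median = np.median(densities)
q1, q3 = np.percentile(densities, [25, 75])
iqr = q3 - q1                       # interquartile range

print(f"mean   = {mean:.3f}, SD  = {sd:.3f}")
print(f"median = {median:.3f}, IQR = {iqr:.3f} (Q1 = {q1:.3f}, Q3 = {q3:.3f})")
```

When a few large values pull the mean well above the median, as they do in these illustrative numbers, reporting both pairs makes the asymmetry visible without committing to a threshold.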

Eddyvm
  • These considerations are all valid, but all are beside the point: what matters is whether and how much skewness might affect the ultimate results of your work. Please consider explaining what your planned analysis is so that any answers you get might be appropriate for it. – whuber Feb 27 '22 at 18:05
  • Welcome to CV, Eddyvm! – Alexis Feb 27 '22 at 19:20
  • Thank you for your answers! My planned analysis is to determine how "dense" one would expect a specific kind of network graph to be, based on empirical data from 14 graph samples. I plan to analyze this using descriptive statistics. However, I am unsure which of the following best indicates the central tendency and variation of the computed graph metrics: Mean + Standard Deviation or Median + Interquartile Range. – Eddyvm Feb 27 '22 at 19:36
  • There are dozens of measures of skewness and Pearson had a hand in several of them (at least four, I think). Can you clarify which ones you mean by "Pearson" and "Fisher" and "Fisher-Pearson" please? – Glen_b Feb 27 '22 at 23:03
  • I added more information to my post above and hope that it clarifies my question. – Eddyvm Feb 28 '22 at 11:05
  • I doubt that 14 observations is enough to say much useful about skewness! – kjetil b halvorsen Feb 28 '22 at 12:36
  • In this case, would it still make sense to focus on the mean and standard deviation if skewness cannot be determined? I learned that the standard deviation, for example, is not a good measure of dispersion if the data is skewed. – Eddyvm Feb 28 '22 at 12:53
  • SD is a perfectly fine measure of dispersion. When data are skewed, though, it gives only an average indication of the dispersion: large skewness means the amount of dispersion on one side differs appreciably from the amount on the other side. In such cases, a useful description of the data distribution requires another summary statistic (such as some kind of skewness coefficient!). – whuber Feb 28 '22 at 13:14
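
The point in the last comment, that dispersion can differ on the two sides of the center, can be sketched with quartile distances. This assumes made-up values in place of the real data, and it uses the quartile (Bowley) skewness as just one possible extra summary statistic, not necessarily the one the commenter had in mind.

```python
import numpy as np

# Hypothetical densities for 14 graphs (illustrative values, not the real data).
densities = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.13, 0.16,
                      0.22, 0.19, 0.35, 0.17, 0.41, 0.13, 0.15])

q1, median, q3 = np.percentile(densities, [25, 50, 75])

lower_spread = median - q1   # dispersion below the median
upper_spread = q3 - median   # dispersion above the median

# Quartile (Bowley) skewness: 0 when both sides are equally spread,
# positive when the upper side is wider; always between -1 and 1.
bowley = (upper_spread - lower_spread) / (q3 - q1)

print(f"lower spread = {lower_spread:.4f}, upper spread = {upper_spread:.4f}")
print(f"quartile skewness = {bowley:.2f}")
```

Reporting the two spreads separately shows the asymmetry that a single standard deviation averages away.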
