From Figure 2 of Ferreira et al. (2016) "Graphical representation of chemical periodicity of main elements through boxplot", we can see the taxonomy of some common cases of symmetrical and asymmetrical distributions, with their corresponding boxplots:

[image: Figure 2 of Ferreira et al. (2016)]

Now, if the box of the boxplot is symmetric (i.e. the same distance from the median to both hinges), but one whisker is longer than the other, as in the picture below, are boxplots (A) and (B) asymmetric/skewed?

[image: boxplots (A) and (B), each with a symmetric box and one whisker longer than the other]

Ommo
  • 3
    Nothing stops a distribution being approximately symmetric in its middle half but asymmetric overall. In fact, the tongue-in-cheek generalization that distributions tend to be approximately Gaussian in the middle is attributed to C.P. Winsor. – Nick Cox Dec 07 '23 at 11:51
  • 1
    The bottom plot is consistent with the idea that the data are integers. If they are, box plots can fail to be informative because tied values can lead to small artefacts. – Nick Cox Dec 07 '23 at 11:53
  • Thanks a lot @NickCox! From your statement "Nothing stops a distribution being approximately symmetric in its middle half but asymmetric overall." I understand that a distribution represented by the boxplot B in my question can be considered as skewed (at least by eye), even though the central half is symmetric, right or wrong? (Maybe the skewness of boxplot A might be unclear) – Ommo Dec 07 '23 at 13:35
  • About the line of numbers at the very bottom, I used integers just to have a rough reference, but my question would apply to any real numbers (i.e. the current line, where integers are shown, would need to be replaced by a line of real numbers) – Ommo Dec 07 '23 at 13:39
  • 2
    Statements about skewness should generally refer to the entire distribution. The point is that a central box that appears symmetric itself says nothing about the entire distribution. It is often true that a distribution that is strongly skewed will have a skewed box as well, but that is not guaranteed. – Nick Cox Dec 07 '23 at 13:51
  • 1
    We often get confused questions here about box plots that arise from not thinking carefully about the implications of data that have only a few distinct values, such as small counts or ordinal grades. Pleased to note that isn't an issue in this thread. – Nick Cox Dec 07 '23 at 13:53
  • I am very grateful for your additional comments, which bring further clarity to my question, thanks a lot! – Ommo Dec 07 '23 at 13:58
  • This is a simplified version of the question addressed at https://stats.stackexchange.com/questions/96553, which I believe (therefore) answers your question. – whuber Dec 07 '23 at 15:58
  • The question addressed at stats.stackexchange.com/questions/96553 does not answer my question at all (I read it before). My question is not a simplified version of that one. Btw, both Nick Cox and Peter Flom answered my question. Thanks to both of them. – Ommo Dec 07 '23 at 16:11
  • 1
    You might get some value from the cautionary example I gave here, which shows four data sets with very different histograms$^\dagger$ and yet identical, completely symmetric boxplots (constructed to emulate the examples in Choonpradub & McNeil, q.v.). One of the example histograms is quite distinctly asymmetric. $\dagger$: however, a little caution is required in interpreting the histograms as well; they are themselves a little misleading for some of the data sets, such as two distinct peaks hiding within one "box" in one of the histograms. – Glen_b Dec 07 '23 at 22:59
  • Thanks a lot for your comment @Glen_b! – Ommo Dec 08 '23 at 11:26

1 Answer

This will depend on your definition and measure of skewness. There are several different ones. They can be grouped into measures that use all the data and measures that use only part of the data. For the former, B will be skewed. For the latter, it may not be, depending on the particular choice of measure. (A shows relatively little skewness, so it may be harder to detect.)

The most common measure that uses all the data is Fisher-Pearson skewness.

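$g_1=\frac{\sum_{i=1}^{N}(Y_i-\bar Y)^3/N}{s^3}$

where $\bar Y$ is the mean, $s$ is the standard deviation and $N$ is the number of observations. It is sometimes given with an adjustment for sample size.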

The most common measure that uses only part of the data is probably Bowley's measure:

$\frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$

where $Q_1$, $Q_2$ and $Q_3$ are the quartiles. But this can be varied to use any pair of quantiles placed symmetrically about the median.
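
To make the contrast concrete, here is a minimal Python sketch of both measures. The data are made up for illustration (a symmetric middle half with a long upper tail, roughly like boxplot B), the function names are my own, and the quantile rule (linear interpolation between order statistics) is an assumption; boxplot software may compute hinges slightly differently. The standard deviation in $g_1$ uses divisor $N$, i.e. no sample-size adjustment.

```python
import math

def fisher_pearson_skewness(y):
    """g1 = (sum((y_i - mean)^3) / N) / s^3, with s computed using divisor N."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n  # second central moment (s^2)
    m3 = sum((v - mean) ** 3 for v in y) / n  # third central moment
    return m3 / m2 ** 1.5

def quantile(y, p):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    ys = sorted(y)
    pos = (len(ys) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(ys) - 1)
    return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])

def quantile_skewness(y, p=0.25):
    """Bowley's measure for p = 0.25; other p use the quantiles at p and 1 - p."""
    q_lo, q_mid, q_hi = quantile(y, p), quantile(y, 0.5), quantile(y, 1 - p)
    return (q_lo + q_hi - 2 * q_mid) / (q_hi - q_lo)

# Made-up data: symmetric middle half, long upper tail
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 20]

print("Fisher-Pearson g1:", fisher_pearson_skewness(data))
print("Bowley (quartiles):", quantile_skewness(data))
print("Same idea, 10%/90%:", quantile_skewness(data, p=0.1))
```

On these made-up numbers the quartiles are 3.5, 6 and 8.5, so Bowley's measure is exactly 0, while $g_1$ and the 10%/90% version are both positive because they see the long upper tail.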

Nick Cox
Peter Flom
  • 3
    $g_1$ is usually scaled by the cube of the SD to give a unit-free measure. Otherwise its sign is informative, but the magnitude just depends on the unit of measurement. Pleased to see the attribution to Bowley. Literature often credits Galton with this measure, but he didn't use it, or Yule and Kendall, who did use it, but much later. – Nick Cox Dec 07 '23 at 11:48
  • 1
    Thanks a lot @Peter Flom! With "This will depend on your definition and measure of skewness" I understand the following: if we use the Fisher-Pearson measure, boxplot B is positively skewed (asymmetric), while if we use Bowley's measure (for the central half of the distribution), boxplot B is symmetric. Is this what you wanted to say? Did I understand correctly? – Ommo Dec 07 '23 at 13:42
  • 2
    More or less, yeah. The thing is, "symmetry" isn't a yes/no proposition, unless you mean exactly symmetric, but even if the population is distributed perfectly normally, a sample from it may not be. So, how non-symmetric is it? And in what ways? – Peter Flom Dec 07 '23 at 14:23
  • 1
    Note that (mean $-$ median) / SD is also sometimes used. It has a feature which has often surprised people: like Bowley's measure it varies in $[-1, 1]$. Note that the mean CAN equal the median in a skewed distribution, in which case the measure fails absolutely to quantify skewness, just as Bowley's measure can give a misleading sign. Always look at a graph too! – Nick Cox Dec 07 '23 at 15:10
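
The last comment's warning can be illustrated with a small sketch. The data set below is invented for the purpose (it is not from the thread), and the function names are hypothetical: the mean equals the median, so (mean $-$ median) / SD is zero, yet the third central moment is clearly negative.

```python
import statistics

def mean_median_skew(y):
    """(mean - median) / SD, using the population SD (divisor N)."""
    return (statistics.fmean(y) - statistics.median(y)) / statistics.pstdev(y)

def third_central_moment(y):
    """Average cubed deviation from the mean; its sign matches the sign of g1."""
    m = statistics.fmean(y)
    return sum((v - m) ** 3 for v in y) / len(y)

# Invented example: mean and median are both 0, but the lower tail is longer
data = [-4, 0, 0, 0, 1, 3]

print("(mean - median) / SD:", mean_median_skew(data))
print("third central moment:", third_central_moment(data))
```

Here the cubed deviations sum to $-64 + 1 + 27 = -36$, so the moment-based measure is negative even though the mean-minus-median measure reports no skewness at all, which is one more reason to look at a graph as well.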