5

The distribution of height of students in a class has a bell shape. Moreover:

  • the average height is 68 inches
  • and 16% of students were taller than 71 inches

What percentage of students in this class are taller than 65 inches?

Since it's bell-shaped, and the distances of 65 and 71 from 68 are both 3 inches, can we assume that 16% of students are shorter than 65 inches because 16% are taller than 71 inches, leaving 100% - 16% = 84% of the class taller than 65 inches?

Can we assume this graphically?

Or do we have to use Chebyshev's rule to find $k$ and then plug it into $\text{mean} - k \cdot \text{standard deviation} = \text{lower limit}$?

  • 3
    Depends on if "bell-shape" means Gaussian distribution or not. – qwr Nov 13 '23 at 17:58
  • 1
    With the data you provided, I can see that this bell looks more like a sewing needle. – Tran Khanh Nov 14 '23 at 03:29
  • 1
    @qwr No, it only depends on the PDF being symmetrical, and that's to my understanding implied for any bell-shaped distribution. – Igor F. Nov 14 '23 at 06:27
  • 1
    @IgorF. symmetry is common but not necessary. Also, it depends on whether we assume a continuous distribution or not. Height measurements are typically discrete. – Sextus Empiricus Nov 14 '23 at 08:35
  • 1
    Seems an implausible scenario, unless it's a single-sex class. One would expect the distribution to be bimodal. – Michael Kay Nov 14 '23 at 09:12

4 Answers

11

You've got the answer right by noticing the question is about heights an equal distance from the mean. You are given information about students being taller than 3 inches above the mean and asked about students who are 3 inches below the mean. You know that 84% of students are shorter than 3 inches above the mean. By symmetry of the gaussian, this means 84% are taller than 3 inches below the mean.

The full computation is shown below.


Let $F(x) = P(X \leq x)$ be the CDF for a gaussian with mean $\mu$. Note that

$$ F(\mu+x) + F(\mu-x) = 1$$

by symmetry of the gaussian. You're given info about individuals 3 inches above the mean, so you know

$$ 1 - F(\mu+x) = 0.16 $$

Rearrange the first equation and substitute into the second to find

$$ F(\mu-x) = 0.16 $$

Here $\mu=68$ and $x=3$. That means 16% of students are shorter than 65 inches, or equivalently that 84% of students are taller than 65 inches.
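
The symmetry argument above can be checked numerically. This is a minimal sketch that assumes the heights are exactly normal; the list `SIGMA_VALUES` and the name `sigma_hat` are illustrative choices, not part of the question. It first confirms that $F(\mu+x) + F(\mu-x) = 1$ holds whatever the standard deviation is, then recovers the standard deviation implied by the 16% figure.

```python
from statistics import NormalDist

MU = 68.0  # mean height in inches (from the question)
X = 3.0    # distance from the mean in inches

# Arbitrary illustrative standard deviations (an assumption, not from the question).
SIGMA_VALUES = [1.0, 2.5, 3.0, 10.0]

for sigma in SIGMA_VALUES:
    F = NormalDist(mu=MU, sigma=sigma).cdf
    # Symmetry identity: F(mu + x) + F(mu - x) = 1 for every sigma.
    assert abs(F(MU + X) + F(MU - X) - 1.0) < 1e-12

# Under normality, P(height > 71) = 0.16 pins down sigma:
# (71 - mu) / sigma must equal the 84th percentile of the standard normal.
sigma_hat = X / NormalDist().inv_cdf(0.84)
heights = NormalDist(mu=MU, sigma=sigma_hat)

p_taller_than_71 = 1.0 - heights.cdf(71.0)
p_taller_than_65 = 1.0 - heights.cdf(65.0)

print(f"sigma          = {sigma_hat:.3f}")
print(f"P(height > 71) = {p_taller_than_71:.4f}")
print(f"P(height > 65) = {p_taller_than_65:.4f}")
```

The standard deviation is only needed for the last step; the 84% answer itself follows from symmetry alone.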

6

If the distribution is symmetric about the mean, then $P(h< \mu-3)=P(h>\mu+3)$. We're given that $P(h>\mu+3) =0.16$, so $P(h \geq \mu -3)=0.84$. However, the question asks for the percentage of children with height greater than $65$ inches, not the percentage greater than or equal to $65$. To convert $\geq$ to $>$, we have to assume that no student has a height of exactly $65$.

The term "Bell shaped" usually means Normal/Gaussian, which is symmetric, and even when it's used more generally, it usually refers to symmetric distributions.

Chebyshev's rule does little for us, since we don't know the standard deviation. We could get a lower bound for the standard deviation from the fact that $P(h>\mu+3) =0.16$, but that tells us nothing beyond the starting fact itself (remember that Chebyshev's rule only bounds, in terms of the standard deviation, how much probability mass can lie far from the mean, and we already know directly that at least 0.16 of the probability mass is more than $3$ from the mean).
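
For reference, the lower bound on the standard deviation mentioned above can be sketched as follows. The symbol $\sigma$ for the unknown standard deviation is introduced here for illustration; no symmetry is assumed. Chebyshev's inequality with distance 3 from the mean reads

$$ P(|h-\mu| \geq 3) \leq \frac{\sigma^2}{3^2} $$

and the event $h > \mu+3$ is contained in the event $|h-\mu| \geq 3$, so

$$ 0.16 = P(h>\mu+3) \leq P(|h-\mu| \geq 3) \leq \frac{\sigma^2}{9} \quad\Longrightarrow\quad \sigma \geq \sqrt{9 \times 0.16} = 1.2 $$

This constrains $\sigma$ only; on its own it says nothing about $P(h > \mu-3)$.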

  • @Accumulation „assuming a continuous distribution” the distribution of heights of students in a class is an empirical distribution. That can't be continuous. – Sextus Empiricus Nov 14 '23 at 09:46
  • @SextusEmpiricus If we accept your characterization, then there is no role for continuous distributions to describe any data arising from a sample. – Sycorax Nov 16 '23 at 03:51
  • @Sycorax you can use continuous distributions as approximations but a continuity correction would be needed. $P(X>x) = 1 - P(X < x) - k/n$ where $n$ is the sample size and $k$ the number of people that have the height of 65 inches. – Sextus Empiricus Nov 16 '23 at 06:11
  • @Sycorax To be fair, the question asks about how many actual students were of particular height range, not what distribution that statistic arose from. But there's still the question of whether we view the heights as being rounded to the nearest inch, or treat them as being real numbers. – Acccumulation Nov 16 '23 at 13:59
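
The role of ties discussed in these comments can be illustrated with a small empirical sample. This is only a sketch: the list `heights` is made up for illustration and is not data from the question. It shows how students measured at exactly 65 inches separate "taller than" from "at least as tall as".

```python
# Hypothetical class of 10 students, heights rounded to whole inches.
# These numbers are invented for illustration; they are not from the question.
heights = [63, 65, 65, 66, 67, 68, 69, 70, 71, 72]
x = 65
n = len(heights)

k = sum(1 for h in heights if h == x)                # students exactly x inches tall
p_less = sum(1 for h in heights if h < x) / n        # empirical P(X < x)
p_greater = sum(1 for h in heights if h > x) / n     # empirical P(X > x)
p_at_least = sum(1 for h in heights if h >= x) / n   # empirical P(X >= x)

# With ties at x, "taller than" and "at least as tall" differ by exactly k/n.
print(f"P(X > {x})  = {p_greater:.2f}")
print(f"P(X >= {x}) = {p_at_least:.2f}")
print(f"k/n        = {k / n:.2f}")
```

In a continuous model the tie term vanishes, which is why the distinction only matters for rounded or empirical data.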
3

Your intuition and reasoning are correct.

Demetri has explained the more general math behind some of this. You don't need me to repeat that.

In your case the question is specifically talking about a Normal Distribution. As Demetri pointed out, the Normal Distribution curve is symmetric, so the percentage area under the curve (which is the same as the total probability of the random value being within that range of X) is equal on both sides of the mean and on equivalent intervals on either side. So whether you work with 0.16 (16%), with 0.84 (84%), or with the fact that the two sum to 1, you will arrive at the correct answer.

You should note, though, that the term "Bell Shape" is not the same thing as "Normal Distribution". The Normal Distribution (which is a reasonably big deal in statistics) is a specific kind and shape of bell curve with specific parameters and properties. The Standard Normal Distribution is slightly more restrictive than a general Normal Distribution; the Standard Normal is best used when you have made the necessary transformations on your data so that the mean is 0 and the standard deviation is 1.

You can use this understanding in a lot of similar problems in your statistics class.

THill3
  • 1
    „You should note, though, that the term "Bell Shape" is not the same thing as "Normal Distribution" ... The Standard Normal Distribution is slightly more restrictive than a general Normal Distribution;” You are starting the paragraph discussing variations from the normal distribution family, but halfway you speak about variations within the normal distribution family (the standard versus the general). This makes it confusing what the point is. It addresses something about the standard normal distribution, but is the question about that? – Sextus Empiricus Nov 13 '23 at 17:57
2

Or do we have to use Chebyshev's rule for finding k and then input it into mean-k*standard deviation=lower limit?

If you don't want to use the symmetry of the distribution* then we need something like Chebyshev's inequality. But it is not going to give you the answer you hoped for (without knowledge of the variance these inequalities are not very powerful).

Say we consider:

  • a single mode distribution
  • where the density away from the mode is a strictly decreasing function

The quantile function could look something like this:

[Figure: example quantile function]

  • The red area below zero and the green area above zero need to be equal in order for the mean of the distribution to be zero.

  • The 85th percentile goes through $\mu+3$ as given/defined in the problem.

  • The question is about the possible values of $p$ where the quantile function can cross $\mu-3$.

This value can be anything between 0 and 0.85. We can imagine part of a sigmoid curve that passes through two points $(p, x)$: the given point $(0.85, \mu+3)$ and the point $(p_l, \mu-3)$, for any value $0 \leq p_l < 0.85$. The condition that the green and red areas must be equal can be 'taken care of' by the tails of the distribution.

The limits on quantiles don't tell us much about the rest of the distribution. Something similar is described here: Why is the Median Less Sensitive to Extreme Values Compared to the Mean? Quantiles are not very sensitive to outliers, and because the outliers are left unconstrained, the inequalities cover a very wide range. (Restrictions on the outliers could give narrower upper and lower limits.)


If you restrict yourself to a normal distribution then, given any two quantiles, all the others are fixed. For example, the mean (which for the normal distribution is also the median, the 50% quantile) at 68 and the 85% quantile at 71 fix the entire distribution, and you can deduce the other points of the distribution (like you did in the first part of your question). See for example this question: How do I determine parameters of normal & lognormal distribution given two points?
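
As a rough sketch of that last point, two quantiles of a normal distribution give two linear equations in its mean and standard deviation. The helper `normal_from_two_quantiles` below is a hypothetical name written for this illustration, and the 85% figure follows the rounding used in this answer.

```python
from statistics import NormalDist

def normal_from_two_quantiles(p1, x1, p2, x2):
    """Return the NormalDist whose p1-quantile is x1 and whose p2-quantile is x2."""
    z1 = NormalDist().inv_cdf(p1)  # standard normal quantile for p1
    z2 = NormalDist().inv_cdf(p2)  # standard normal quantile for p2
    # Solve x1 = mu + sigma * z1 and x2 = mu + sigma * z2 for mu and sigma.
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return NormalDist(mu=mu, sigma=sigma)

# Median (= mean) at 68 inches and 85% quantile at 71 inches.
fitted = normal_from_two_quantiles(0.50, 68.0, 0.85, 71.0)
p_below_65 = fitted.cdf(65.0)

print(f"mu = {fitted.mean:.3f}, sigma = {fitted.stdev:.3f}")
print(f"P(height < 65) = {p_below_65:.4f}")
```

Any other pair of distinct quantiles would work the same way; the mean is simply the convenient special case where the standard normal quantile is zero.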


*E.g. because the term 'bell shape' is not very clear. It could be referring to a normal distribution but it might also be something else. Often people use the term 'bell shape' when something is not exactly a normal distribution, so we shouldn't assume that a normal distribution is meant unless specifically stated. (If my histogram shows a bell-shaped curve, can I say my data is normally distributed?)

  • This is a good answer, but I think the inference that OP meant "bell shape" to mean gaussian is fine. The question is similar enough to introductory level statistics courses where the gaussian is of primary interest due to its tractability. – Demetri Pananos Nov 13 '23 at 18:46
  • @DemetriPananos my answer is indeed a bit pedantic. At first I had only a comment that criticised the use of the term 'bell shape' but when I noticed that this question became a 'Hot Network Question' I thought that it might deserve a more spicy answer than one for just a very basic introductory course question. Also, the question does mention the use of more advanced techniques (Chebyshev's rule) than plainly fitting the CDF of a normal distribution to the two given points to get the value for the third unknown point. – Sextus Empiricus Nov 13 '23 at 20:12
  • I understand, and it's nice to have this perspective. That being said, when you hear hooves you should think horses and not zebras. – Demetri Pananos Nov 13 '23 at 20:20
  • In this case I am strongly against the idea of answering the question as if it is about horses. When we hear hooves and the question is not about hooves and depends on whether it is a zebra or a horse, then we should first make sure that the question is about horses. The simple level might indicate that the question is about horses, but horses are only a tiny portion of all animals with hooves. And aside from the possibility that the OP is about something other than horses, there may also be many other people who land on this question and are asking about cows, pigs, sheep, goats or others. – Sextus Empiricus Nov 13 '23 at 20:37
  • If this was a homework question then the teacher or study book writer did a very bad job by referring to a hooved animal, which is a very bad (ambiguous) term for referring to a horse. – Sextus Empiricus Nov 13 '23 at 20:42
  • An interesting deviation could be to regard the distribution as a discrete distribution. Measurements are only reported in whole inches (or other discrete values). Then there are differences between 'taller than' and 'at least as tall'. For example if $$\begin{array}{rcl}X &\sim& \text{Binom}(n=58,p=0.5) \\ \text{height} &:=& X+39\end{array}$$ then the mean of the height is 68 and $P(\text{height}>71) = 0.18$ (or more precisely 0.1790717). But $P(\text{height}>65) = 0.74$ and not $0.82$. – Sextus Empiricus Nov 13 '23 at 21:01
  • So the question has several snags. Heights are typically not normal distributed (e.g. negative values are impossible) and often statements like 'greater than 65 inches' are based on discrete measurements. So if this question tried to be about an exercise with normal distributions, then it chose a bad example. – Sextus Empiricus Nov 13 '23 at 21:05
  • Given that this is very likely a homework problem -- contrived to be about something relatable and ambiguous enough so that a sophomore may be able to solve it using tools available to them -- who do you think benefits from such a pedantic approach? Agreed that OP should clarify the context, but based on the information available here, it's very likely OP is a student. To insist otherwise for the sake of being thorough or "spicy", I would argue, is to ignore what little context is already provided in favor of unlikely alternatives. – Demetri Pananos Nov 13 '23 at 22:12
  • I know statisticians are champions of the phrase "it depends", but this answer is bordering on confusing rather than clarifying due to its superfluous detail. – Demetri Pananos Nov 13 '23 at 22:14
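
The discrete example raised in the comments above (height equal to 39 plus a Binomial(58, 0.5) variable) can be checked with exact arithmetic. This is a small sketch using only the standard library; the helper names `binom_pmf` and `prob_height_greater_than` are illustrative.

```python
from math import comb

N, P = 58, 0.5  # binomial parameters taken from the comment
SHIFT = 39      # height := X + 39

def binom_pmf(k, n=N, p=P):
    """Probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_height_greater_than(h):
    """P(height > h), i.e. P(X > h - SHIFT), summed over the upper tail."""
    return sum(binom_pmf(k) for k in range(h - SHIFT + 1, N + 1))

mean_height = SHIFT + N * P
p_gt_71 = prob_height_greater_than(71)
p_gt_65 = prob_height_greater_than(65)

print(f"mean height    = {mean_height}")
print(f"P(height > 71) = {p_gt_71:.7f}")
print(f"P(height > 65) = {p_gt_65:.4f}")
print(f"1 - P(height > 71) = {1 - p_gt_71:.4f}")
```

The gap between the last two printed values is the probability mass sitting on the whole-inch values 65 through 71 that the strict inequalities treat asymmetrically.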