3

When making histograms to compare two things, say the home run distances of two baseball players, do you have to use the same numerical scale (range) on the x-axis for each, even if their home run distances do not start at the same distance? And should the bars touch?

bob
  • 31
  • I think a comparison of distributions would be easier with two frequency polygons drawn on the same graph. – David Lane Feb 18 '17 at 21:42
  • @DavidLane Upvoted. That would be the only thing I did not say in my answer below and only because smoothed curves are less misleading than polygons, and are better because they are generalized histograms. – Carl Feb 19 '17 at 01:42

2 Answers

3

If the purpose of the exercise is comparison, then it makes sense to show the histograms on the same range. For example, suppose we want to compare the home run distances of two players, Babe and Ruth (I randomly generated this data).

[Figure: histograms of Babe's and Ruth's home run distances, each plotted on its own x-axis range]

With each histogram drawn on its own range, you might conclude visually that the two players have fairly similar distributions of home run distances. However, if the ranges of the two histograms are the same, you will conclude something very different: nearly all of Babe's home run distances sit above Ruth's.

[Figure: the same two histograms plotted on a common x-axis range]

Thus, for comparing the two histograms, showing the data on the same range is important.
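As a minimal sketch of that idea, the Python below bins two samples against one shared set of bin edges, so the counts line up bin for bin. The data here is randomly generated for illustration (it is not the data behind the figures), and the helper names `shared_edges` and `bin_counts` are hypothetical.

```python
import random

def shared_edges(a, b, n_bins=10):
    """Return n_bins + 1 equally spaced bin edges spanning both samples."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def bin_counts(data, edges):
    """Count values per bin; the last bin also includes its right edge."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    counts = [0] * n_bins
    for x in data:
        i = int((x - lo) / (hi - lo) * n_bins)
        counts[min(i, n_bins - 1)] += 1  # clamp the maximum into the last bin
    return counts

# Hypothetical home run distances (feet), generated only for this sketch.
random.seed(0)
babe = [random.gauss(420, 15) for _ in range(200)]
ruth = [random.gauss(380, 15) for _ in range(200)]

edges = shared_edges(babe, ruth)
babe_counts = bin_counts(babe, edges)
ruth_counts = bin_counts(ruth, edges)

# Text table: left bin edge, then each player's count in that bin.
for left, b, r in zip(edges, babe_counts, ruth_counts):
    print(f"{left:6.1f}  Babe {b:3d}  Ruth {r:3d}")
```

Because both players are counted against the same edges, a shift between the two distributions shows up directly as counts concentrated in different rows.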

As to whether the bars should touch: a histogram is a discrete representation of the frequency of observed values of an underlying random variable. The observed values are assigned to interval bins of a given size. Because the bins are contiguous, the bars of neighboring non-empty bins touch. If there are no observations in a particular interval, there is no bar to show there, and a gap appears. For example, in the above example, Ruth's histogram has a gap. You should also remember that the shape of a histogram depends heavily on how you assign values to the interval bins.
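To illustrate how much the binning matters, here is a small sketch that counts the same values with two different bin widths. The distances and the helper `counts_by_width` are made up for illustration; the narrow bins leave empty intervals (gaps), while the wide bins do not.

```python
def counts_by_width(data, start, width):
    """Count values in consecutive bins [start + k*width, start + (k+1)*width)."""
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

# Hypothetical home run distances (feet), chosen only for this sketch.
distances = [380, 385, 390, 392, 410, 415, 418, 440]

narrow = counts_by_width(distances, start=380, width=10)
wide = counts_by_width(distances, start=380, width=30)

print(narrow)  # [2, 2, 0, 3, 0, 0, 1]  (zeros are gaps between bars)
print(wide)    # [4, 3, 1]              (no gaps: every bar touches its neighbor)
```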

enter image description here

Keep in mind that histograms are a good way to get an initial visual feel for the distribution of data, but there are many better methods for concretely comparing two distributions.

statsplease
  • 2,808
1

Q1: Do you have to use the same numerical scale (range) on the x axis for each even if their home run distances do not start at the same distance? You do not have to do anything, but I would use the same x-axis scale so that the distributions of distances can be better compared.

Q2: ...and should the bars touch? First, you can use smoothed curves instead of bars. If you do use bars, each player's bars should be represented independently, since the distances do not add between players. That is, where they intersect, they should merely change color or pattern, or alternatively, they can be offset slightly so that they are not seen to overlap.
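One common way to get a smoothed curve is a Gaussian kernel density estimate; the sketch below assumes that technique, which may not be the exact smoothing intended here. The data, the bandwidth of 10 feet, and the helper name `gaussian_kde` are all illustrative assumptions.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return density

# Hypothetical home run distances (feet), chosen only for this sketch.
babe = [405, 412, 418, 421, 425, 430, 436, 441]
ruth = [365, 372, 378, 381, 384, 390, 396, 402]

# The bandwidth is an assumed smoothing parameter, not a recommended value.
babe_density = gaussian_kde(babe, bandwidth=10.0)
ruth_density = gaussian_kde(ruth, bandwidth=10.0)

# Evaluate both curves on the same x grid so they can share one axis.
for x in range(340, 461, 20):
    print(x, round(babe_density(x), 4), round(ruth_density(x), 4))
```

Evaluating both curves on one grid keeps the two players separate (nothing is stacked or added) while still putting them on a common x-axis.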

Carl
  • 13,084