
Chapter 1 of "Machine Learning - A Probabilistic Perspective" by Kevin Patrick Murphy gives this figure (fig_1),


and says

this is a pairwise scatter plot on iris dataset. The diagonal plots the marginal histograms of the 4 features.

This post gives this figure (fig_2) to illustrate "marginal histograms".


I am trying to plot a histogram of the first feature of the iris dataset, sepal length, using the approach that produced fig_2.

Here is the code:

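import matplotlib.pyplot as plt
from sklearn import datasets
import seaborn as sns

# Seaborn is used only for styling; it does not change the data or the bins
sns.set(style="white", color_codes=True)
iris = datasets.load_iris()
# Histogram of the first feature (sepal length) using 11 equal-width bins
_ = plt.hist(iris.data[:, 0], bins=11)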

which produced this figure (fig_3).


It looks different from the top-left panel of fig_1.

I've searched the whole site and found this and this, but neither explains what a marginal histogram is.

How do I plot a marginal histogram correctly? Why is my plot different from the one in the textbook?

JJJohn

1 Answer


There are subtle differences between the two histograms, but we cannot infer much from them, and they do not necessarily mean that either plot is incorrect. Histograms of the same data can differ if the bin widths or bin intervals differ, and it is quite possible that this is what happened here.
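To make the point about bins concrete, here is a minimal sketch showing that one set of values gives different counts when the number of bins or the bin range changes. It uses a synthetic, seeded sample as a stand-in for a single feature column, not the iris data, and the variable names, the alternative bin count of 8, and the 0.2 range padding are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for one feature column (NOT the iris data):
# 150 draws from a normal distribution, seeded for reproducibility.
rng = np.random.default_rng(0)
values = rng.normal(loc=5.8, scale=0.8, size=150)

# Same data, two different numbers of equal-width bins.
counts_11, edges_11 = np.histogram(values, bins=11)
counts_8, edges_8 = np.histogram(values, bins=8)

# Same number of bins, but a slightly wider range shifts every bin edge.
padded_range = (values.min() - 0.2, values.max() + 0.2)
counts_shifted, edges_shifted = np.histogram(values, bins=11, range=padded_range)

print("11 bins:         ", counts_11)
print("8 bins:          ", counts_8)
print("11 bins, shifted:", counts_shifted)
```

All three results summarise the same 150 values, yet the bar heights differ. `plt.hist` accepts the same `bins` and `range` arguments, so reproducing a published figure would require knowing the bin edges it used, which the question does not state.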

mkt