I am trying to plot the output of an ML model's predict step: the Target has the classes 1 and 0, and there is a Score. Because the dataset is imbalanced, there are few 1's.

When I plot a simple displot with Target in the hue parameter, the plot is useless for describing the 1's:

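import seaborn as sns
import matplotlib.pyplot as plt

# df holds the model output with the columns 'Score' and 'Target'
sns.set_theme()
sns.set_palette(sns.color_palette('rocket', 3))
sns.displot(df, x='Score', hue='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
plt.show()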

I want to change the scale for the 1's in the same plot, with a second y-axis on the right using twinx.

I have tried the following code, which works around the problem with two plots, but I need a single plot. I could not get twinx to work.

g = sns.displot(df, x='Score', col='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,400)
plt.show()

g = sns.FacetGrid(df, hue='Target')
g = g.map(sns.displot, 'Score', bins=30, linewidth=0, height=3, kde=True, aspect=1.6)

A reproducible example using the titanic dataset:

df_ = sns.load_dataset('titanic')
sns.displot(df_, x='fare', hue='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)

g = sns.displot(df_, x='fare', col='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,150)
plt.show()

Trenton McKinney
  • Did you look into the `common_norm=False` option of displot and related functions? – JohanC Oct 10 '21 at 09:22
  • I think [seaborn histplot and displot output doesn't match](https://stackoverflow.com/q/68865538/7758804) is relevant, but not a duplicate. – Trenton McKinney Oct 10 '21 at 15:49

2 Answers

To compare the shape of distributions with different numbers of observations, you can normalize them by setting `stat="density"`. By default, this normalizes each distribution using the same denominator, but you can normalize each one independently by setting `common_norm=False`:

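import seaborn as sns

titanic = sns.load_dataset('titanic')

# Normalize each hue group independently so the shapes are comparable
sns.displot(
    titanic, x='fare', hue='survived',
    bins=30, linewidth=0, kde=True,
    stat="density", common_norm=False,
    height=5, aspect=1.6
)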

The peak of the two distributions is not at the same y value, but that is a real feature of the data: the population of survivors is spread over a wider range of fares and is less clustered at the lower end. Having two independent y axes and scaling them to equalize the height of each distribution's peak would be misleading.
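
To make the difference between the two normalizations concrete, here is a minimal NumPy-only sketch of the arithmetic as I understand it. It is not seaborn's implementation; the two samples and the names `majority`, `minority`, `density`, and `area` are made up for illustration. With independent normalization each group's bars integrate to 1, while with a common denominator each group's area equals its share of the observations.

```python
import numpy as np

# Hypothetical imbalanced samples (stand-ins for two hue groups).
rng = np.random.default_rng(0)
majority = rng.exponential(scale=20.0, size=900)
minority = rng.exponential(scale=50.0, size=100)

# Shared bin edges covering both groups, as when plotting with hue.
edges = np.histogram_bin_edges(np.concatenate([majority, minority]), bins=30)
widths = np.diff(edges)


def density(values, edges):
    """Bar heights normalized so that this group alone integrates to 1."""
    heights, _ = np.histogram(values, bins=edges, density=True)
    return heights


def area(heights):
    """Total area under the bars."""
    return float(np.sum(heights * widths))


n_total = majority.size + minority.size

# Independent normalization (the common_norm=False case).
independent = {
    "majority": density(majority, edges),
    "minority": density(minority, edges),
}

# Common normalization: each group is scaled by its share of all observations.
common = {
    "majority": independent["majority"] * majority.size / n_total,
    "minority": independent["minority"] * minority.size / n_total,
}

for name in ("majority", "minority"):
    print(name, round(area(independent[name]), 3), round(area(common[name]), 3))
```

Under the common denominator the minority group's bars are squeezed toward zero, which is the effect seen in the original plot; independent normalization removes it without introducing a second axis.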

mwaskom
  • Thanks! I'm just starting to use sns, and your solution makes a lot of sense, it's not what I was originally looking for, but it's a better approach. – Napoleón Cortés Oct 10 '21 at 22:50

I am not sure, but is this what you are looking for?

import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df_ = sns.load_dataset('titanic')
sns.histplot(df_[df_['survived']==1]['fare'], bins=30, linewidth=0, kde=True, color='red')
ax2 = plt.twinx()
sns.histplot(df_[df_['survived']==0]['fare'], bins=30, linewidth=0, kde=True, ax=ax2, color='blue')

max12525k