77

I'm trying to do two things with a Seaborn bar chart, both using values that are in the dataframe but not plotted in the graph.

1) I'd like to display the values of one field in a dataframe while graphing another. For example, below, I'm graphing 'tip', but I would like to place the value of 'total_bill' centered above each of the bars (i.e. 325.88 above Friday, 1778.40 above Saturday, etc.).

2) Is there a way to scale the colors of the bars, with the lowest value of 'total_bill' having the lightest color (in this case Friday) and the highest value of 'total_bill' having the darkest? Obviously, I'd stick with one color (e.g. blue) when I do the scaling.

Thanks! I'm sure this is easy, but I'm missing it.

While I see that others think that this is a duplicate of another problem (or two), I am missing the part about how to use a value that is not in the graph as the basis for the label or the shading. How do I say "use total_bill as the basis"? I'm sorry, but I just can't figure it out based on those answers.

Starting with the following code,

import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()
g=sns.barplot(x='day',y='tip',data=groupedvalues)

I get the following result:

[image: bar chart of tip by day]

Interim Solution:

for index, row in groupedvalues.iterrows():
    g.text(row.name,row.tip, round(row.total_bill,2), color='black', ha="center")


On the shading, using the example below, I tried the following:

import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()

pal = sns.color_palette("Greens_d", len(groupedvalues))
rank = groupedvalues.argsort().argsort() 
g=sns.barplot(x='day',y='tip',data=groupedvalues)

for index, row in groupedvalues.iterrows():
    g.text(row.name,row.tip, round(row.total_bill,2), color='black', ha="center")

But that gave me the following error:

AttributeError: 'DataFrame' object has no attribute 'argsort'

So I tried a modification:

import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()

pal = sns.color_palette("Greens_d", len(groupedvalues))
rank=groupedvalues['total_bill'].rank(ascending=True)
g=sns.barplot(x='day',y='tip',data=groupedvalues,palette=np.array(pal[::-1])[rank])

and that leaves me with

IndexError: index 4 is out of bounds for axis 0 with size 4

  • See [How to plot and annotate grouped bars in seaborn](https://stackoverflow.com/q/63220741/7758804). [Adding value labels on a matplotlib bar chart](https://stackoverflow.com/q/28931224/7758804) applies to seaborn axes level plots. – Trenton McKinney Aug 25 '21 at 00:05

8 Answers

94

New in matplotlib 3.4.0

There is now a built-in Axes.bar_label to automatically label bar containers:

  • For single-group bar plots, pass the single bar container:

    ax = sns.barplot(x='day', y='tip', data=groupedvalues)
    ax.bar_label(ax.containers[0])
    

    seaborn bar plot labeled

  • For multi-group bar plots (with hue), iterate the multiple bar containers:

    ax = sns.barplot(x='day', y='tip', hue='sex', data=df)
    for container in ax.containers:
        ax.bar_label(container)
    

    seaborn grouped bar plot labeled
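By default bar_label prints each bar's own height. To show a different column instead, such as total_bill above the tip bars as the question asks, the labels parameter of Axes.bar_label can be given one string per bar. A minimal sketch, assuming matplotlib >= 3.4; the small `grouped` frame is typed in by hand as a stand-in for groupedvalues, and apart from the two total_bill figures quoted in the question its numbers are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hand-typed stand-in for groupedvalues (illustrative numbers)
grouped = pd.DataFrame({
    "day": ["Fri", "Sat", "Sun", "Thur"],
    "tip": [51.96, 260.40, 247.39, 171.83],
    "total_bill": [325.88, 1778.40, 1627.16, 1096.33],
})

fig, ax = plt.subplots()
sns.barplot(x="day", y="tip", data=grouped, ax=ax)

# one label per bar, taken from a column that is not plotted
labels = [f"{value:.2f}" for value in grouped["total_bill"]]
texts = ax.bar_label(ax.containers[0], labels=labels)
```

The labels are matched to the bars by position, so the frame passed to barplot and the column used for the labels should be in the same row order.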



Color-ranked version

Is there a way to scale the colors of the bars, with the lowest value of total_bill having the lightest color (in this case Friday) and the highest value of total_bill having the darkest?

  1. Find the rank of each total_bill value:

    • Either use Series.sort_values:

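      ranks = groupedvalues.total_bill.sort_values().index
      # Int64Index([1, 0, 3, 2], dtype='int64')
      # Note: this is the sort order (argsort); it coincides with the
      # ranks for this data, but not for every ordering of the values.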
      
    • Or condense Ernest's Series.rank version by chaining Series.sub:

      ranks = groupedvalues.total_bill.rank().sub(1).astype(int).array
      # [1, 0, 3, 2]
      
  2. Then reindex the color palette using ranks:

    palette = sns.color_palette('Blues_d', len(ranks))
    ax = sns.barplot(x='day', y='tip', palette=np.array(palette)[ranks], data=groupedvalues)
    

    seaborn bar plot color-ranked
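As a variation that does not depend on how barplot interprets the palette argument, the bars can also be recolored after plotting. This is only a sketch under assumptions: the hand-typed `grouped` frame stands in for groupedvalues (illustrative numbers), and rank(method="first") is chosen here just so that tied values would still receive distinct integer ranks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hand-typed stand-in for groupedvalues (illustrative numbers)
grouped = pd.DataFrame({
    "day": ["Fri", "Sat", "Sun", "Thur"],
    "tip": [51.96, 260.40, 247.39, 171.83],
    "total_bill": [325.88, 1778.40, 1627.16, 1096.33],
})

# 0-based rank of each row's total_bill: 0 = smallest, 3 = largest
ranks = grouped["total_bill"].rank(method="first").sub(1).astype(int).to_numpy()

# "Blues" runs from light to dark, so palette[0] is the lightest shade
palette = sns.color_palette("Blues", len(grouped))

fig, ax = plt.subplots()
sns.barplot(x="day", y="tip", data=grouped, color="lightgray", ax=ax)

# recolor each bar according to the rank of its total_bill
for bar, rank in zip(ax.containers[0], ranks):
    bar.set_facecolor(palette[rank])
```

Because the colors are assigned bar by bar, the row order of the frame must match the order in which the bars are drawn.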

tdy
  • Use the `labels` parameter in `ax.bar_label()` if the values to be displayed are different from those used to plot the axis. – tfad334 Oct 21 '21 at 02:58
  • Awesome new functionality – igorkf Nov 30 '21 at 13:29
  • Matplotlib >= 3.4 is available for Python >= 3.7. See [Matplotlib API changes](https://matplotlib.org/stable/api/prev_api_changes/api_changes_3.4.0.html#development-changes) for more info. – CDuvert Dec 23 '21 at 12:10
70

Works with single ax or with matrix of ax (subplots)

from matplotlib import pyplot as plt
import numpy as np

def show_values_on_bars(axs):
    def _show_on_single_plot(ax):        
        for p in ax.patches:
            _x = p.get_x() + p.get_width() / 2
            _y = p.get_y() + p.get_height()
            value = '{:.2f}'.format(p.get_height())
            ax.text(_x, _y, value, ha="center") 

    if isinstance(axs, np.ndarray):
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)

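fig, axs = plt.subplots(1, 2)
# example bars so there is something to label (values are made up)
axs[0].bar(["a", "b"], [1.5, 2.25])
axs[1].bar(["c", "d"], [3.0, 0.75])
show_values_on_bars(axs)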
Sharon Soussan
  • awesome! any idea how to automatically update the y-axis limits to provide enough space for the text? – zeawoas Mar 26 '20 at 10:59
  • Great solution... Change the 8th line with this: _y = p.get_y() + p.get_height() + (p.get_height() *0.01) and it will look a bit better – Oeyvind Jun 23 '21 at 08:47
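On the question of leaving room for the text above the tallest bar, one option is to widen the y-axis margin with Axes.margins. A minimal sketch with made-up bar heights; the 10% value is an arbitrary choice, not something from the answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
heights = [1.5, 2.25, 0.75]  # made-up example values
ax.bar(["a", "b", "c"], heights)

# label each bar with its height, centered just above the bar
for p in ax.patches:
    ax.text(p.get_x() + p.get_width() / 2, p.get_height(),
            f"{p.get_height():.2f}", ha="center", va="bottom")

# pad the y-axis by 10% of the data range so the top label has room
ax.margins(y=0.1)
top = ax.get_ylim()[1]
```

Larger fonts or rotated labels may need a bigger margin than this.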
60

Let's stick to the solution from the linked question (Changing color scale in seaborn bar plot). You want to use argsort to determine the order of the colors to use for colorizing the bars. In the linked question argsort is applied to a Series object, which works fine, while here you have a DataFrame. So you need to select one column of that DataFrame to apply argsort on.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = sns.load_dataset("tips")
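# sum only the numeric columns that are needed; the categorical columns cannot be summed in recent pandas
groupedvalues=df.groupby('day')[['total_bill', 'tip']].sum().reset_index()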

pal = sns.color_palette("Greens_d", len(groupedvalues))
rank = groupedvalues["total_bill"].argsort().argsort() 
g=sns.barplot(x='day',y='tip',data=groupedvalues, palette=np.array(pal[::-1])[rank])

for index, row in groupedvalues.iterrows():
    g.text(row.name,row.tip, round(row.total_bill,2), color='black', ha="center")

plt.show()



The second attempt works fine as well; the only issue is that the rank returned by rank() starts at 1 instead of 0, so one has to subtract 1 from the array. Also, indexing needs integer values, so we need to cast it to int.

rank = groupedvalues['total_bill'].rank(ascending=True).values
rank = (rank-1).astype(int)
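The two ways of getting ranks can be compared on a small Series. This sketch uses hand-typed numbers as a stand-in for the total_bill column (illustrative values) and also shows why a single argsort is not enough in general:

```python
import pandas as pd

# stand-in for groupedvalues['total_bill'] (illustrative numbers)
total_bill = pd.Series([325.88, 1778.40, 1627.16, 1096.33])

# rank() is 1-based and returns floats, so 4.0 would overrun a 4-color palette
one_based = total_bill.rank(ascending=True)
zero_based = (one_based - 1).astype(int).to_numpy()

# applying argsort twice gives the same 0-based ranks
double_argsort = total_bill.argsort().argsort().to_numpy()

# a single argsort is the sort order, which differs from the rank in general
values = pd.Series([30, 10, 20])
order = values.argsort().to_numpy()
rank = values.argsort().argsort().to_numpy()
```

Here `order` says which position holds the smallest, second smallest and largest value, while `rank` says where each value itself stands; only the latter is suitable for indexing the palette.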
Community
ImportanceOfBeingErnest
51

In case anyone is interested in labeling a horizontal bar plot, I modified Sharon's answer as below:

def show_values_on_bars(axs, h_v="v", space=0.4):
    def _show_on_single_plot(ax):
        if h_v == "v":
            for p in ax.patches:
                _x = p.get_x() + p.get_width() / 2
                _y = p.get_y() + p.get_height()
                value = int(p.get_height())
                ax.text(_x, _y, value, ha="center") 
        elif h_v == "h":
            for p in ax.patches:
                _x = p.get_x() + p.get_width() + float(space)
                _y = p.get_y() + p.get_height()
                value = int(p.get_width())
                ax.text(_x, _y, value, ha="left")

    if isinstance(axs, np.ndarray):
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)

Two parameters explained:

h_v - Whether the barplot is horizontal or vertical. "h" represents the horizontal barplot, "v" represents the vertical barplot.

space - The space between value text and the top edge of the bar. Only works for horizontal mode.

Example:

show_values_on_bars(sns_t, "h", 0.3)


Secant Zhang
  • Works like a charm! Only thing I'd add is a va="center" for horizontal barplots. – S. R. Feb 14 '20 at 15:13
  • Might be useful to add: `value = 0 if (np.isnan(p.get_height())) else int(p.get_height())` and `value = 0 if (np.isnan(p.get_width())) else int(p.get_width())` for how the label values are computed, in case the bars are null sized – Andrew Mo Jun 04 '20 at 21:25
  • How do I get my `ax`? – Climbs_lika_Spyder Jun 17 '20 at 16:38
  • This also works great on Seaborn countplot(), but does not work on horizontal barplots with hues. I also added a `+ float(space)` when `h_v = "v"` so that it adjusts the spacing on the vertical ones. I'll see if I can fix the hue thing. Edit: It does in fact work for hues if you use @AndrewMo's suggestion, although vertical alignment needs a bit of tweaking by default. Thanks! – TheProletariat Jul 14 '20 at 03:37
  • for h_v="h", we need `_y = p.get_y() + p.get_height() * 3/4` to label on bar center, and need `_x = p.get_x() + p.get_width()` without the space to avoid too wide apart labels – Frank Oct 10 '20 at 14:13
  • amazing! I added va="bottom" for horizontal barplots, which made the numbers vertically centered at the edge of each bar (not exactly sure why this worked better than va="center," which seems more intuitive) – R-Peys Dec 08 '21 at 02:12
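Several of the comments above (vertical centering, NaN-sized bars) can be folded into a small helper for the horizontal case. This is a sketch, not the answer's code: `label_horizontal_bars`, `pad` and `fmt` are hypothetical names, and the bar values are made up.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def label_horizontal_bars(ax, pad=0.0, fmt="{:.0f}"):
    """Write each horizontal bar's width just past its end, centered on the bar."""
    texts = []
    for p in ax.patches:
        width = p.get_width()
        if math.isnan(width):
            width = 0.0  # treat missing bars as zero-length
        x = p.get_x() + width + pad
        y = p.get_y() + p.get_height() / 2  # vertical middle of the bar
        texts.append(ax.text(x, y, fmt.format(width), ha="left", va="center"))
    return texts


fig, ax = plt.subplots()
ax.barh(["a", "b", "c"], [3, 7, 5])  # made-up example values
texts = label_horizontal_bars(ax, pad=0.1)
```

Anchoring the text at the bar's mid-height with va="center" avoids the manual 3/4 offset; `pad` is in data units, so it should be scaled to the range of the x-axis.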
14
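import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15,10))
# replace the placeholder column names and dataframe_name with your own
graph = sns.barplot(x='name_column_x_axis', y='name_column_y_axis', data=dataframe_name, color="salmon")
for p in graph.patches:
    # center the label horizontally on the bar, just above its top
    graph.annotate('{:.0f}'.format(p.get_height()),
                   (p.get_x() + p.get_width() / 2, p.get_height()),
                   ha='center', va='bottom',
                   color='black')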
user3663280
4

Hope this helps for item #2: a) You can sort by total bill, then reset the index. b) Use palette="Blues" to scale your chart from light blue to dark blue (for dark blue to light blue, use the reversed palette, palette="Blues_r").

import pandas as pd
import seaborn as sns
%matplotlib inline

df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()
groupedvalues=groupedvalues.sort_values('total_bill').reset_index()
g=sns.barplot(x='day',y='tip',data=groupedvalues, palette="Blues")
jose_bacoy
  • Here you still apply the palette in the order of the bars appearing in the plot ([the leftmost bar has the lightest color](https://i.stack.imgur.com/Gy5vC.png)). The idea (which is also put forward in the linked question) would be to sort the colors in the same order as the sorted "total_bill" column, such that the column with the largest total bill has the darkest color. – ImportanceOfBeingErnest Apr 04 '17 at 22:58
  • Yes, you are right. I did not realize that the question was different from how I understood it until I saw your post. Thanks – jose_bacoy Apr 04 '17 at 23:24
4

A simple way to do so is to add the below code (for Seaborn):

for p in splot.patches:
    splot.annotate(format(p.get_height(), '.1f'), 
                   (p.get_x() + p.get_width() / 2., p.get_height()), 
                   ha = 'center', va = 'center', 
                   xytext = (0, 9), 
                   textcoords = 'offset points') 

Example :

splot = sns.barplot(x=df['X'], y=df['Y'])  # keyword arguments; recent seaborn no longer accepts x and y positionally
# Annotate the bars in plot
for p in splot.patches:
    splot.annotate(format(p.get_height(), '.1f'), 
                   (p.get_x() + p.get_width() / 2., p.get_height()), 
                   ha = 'center', va = 'center', 
                   xytext = (0, 9), 
                   textcoords = 'offset points')    
plt.show()
Dharman
Sarthak Rana
3
import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(figsize = (12, 8))
ax = plt.subplot(111)

ax = sns.barplot(x="Knowledge_type", y="Percentage", hue="Distance", data=knowledge)

for p in ax.patches:
    ax.annotate(format(p.get_height(), '.2f'), (p.get_x() + p.get_width() / 2., p.get_height()), 
       ha = 'center', va = 'center', xytext = (0, 10), textcoords = 'offset points')