
I've spent hours trying to do what I thought was a simple task: adding labels to an XY plot while using seaborn.

Here's my code:

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df_iris=sns.load_dataset("iris") 

sns.lmplot(x='sepal_length', # Horizontal axis
           y='sepal_width', # Vertical axis
           data=df_iris, # Data source
           fit_reg=False, # Don't fit a regression line
           height=8, # Figure height in inches (named `size` before seaborn 0.9)
           aspect=2) # Width = aspect * height

plt.title('Example Plot')
# Set x-axis label
plt.xlabel('Sepal Length')
# Set y-axis label
plt.ylabel('Sepal Width')

I would like to label each dot on the plot with the text from the "species" column.

I've seen many examples using matplotlib but not using seaborn.

Any ideas? Thank you.

Trexion Kameha
  • Can you provide an example data frame? Does `z` contain label information for both X and Y axes? Do you want to label the entire axis, or axis tick marks? Seaborn uses Matplotlib under the hood - are you saying that you do not want to use `plt` methods but `sns` methods only to label your plots? – andrew_reece Sep 03 '17 at 20:53
  • added sample data set. Sorry – Trexion Kameha Sep 03 '17 at 23:05

4 Answers


One way you can do this is as follows:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

df_iris=sns.load_dataset("iris") 

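g = sns.lmplot(x='sepal_length', # Horizontal axis
           y='sepal_width', # Vertical axis
           data=df_iris, # Data source
           fit_reg=False, # Don't fit a regression line
           height=10, # Figure height in inches (named `size` before seaborn 0.9)
           aspect=2) # Width = aspect * height; lmplot returns a FacetGrid, not an Axes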

plt.title('Example Plot')
# Set x-axis label
plt.xlabel('Sepal Length')
# Set y-axis label
plt.ylabel('Sepal Width')


def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x']+.02, point['y'], str(point['val']))

label_point(df_iris.sepal_length, df_iris.sepal_width, df_iris.species, plt.gca())  


Scott Boston
  • Thank you Scott. It does plot but for me the string that's plotted looks weird. Each point says something along the following: "species: setosa, Name: 3, dtype: object" Any idea how to fix that? – Trexion Kameha Sep 04 '17 at 00:08

Here's a more up-to-date answer that doesn't suffer from the string issue described in the comments.

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df_iris=sns.load_dataset("iris") 

plt.figure(figsize=(20,10))
p1 = sns.scatterplot(x='sepal_length', # Horizontal axis
       y='sepal_width', # Vertical axis
       data=df_iris, # Data source
       size = 8,
       legend=False)  

for line in range(0,df_iris.shape[0]):
     p1.text(df_iris.sepal_length[line]+0.01, df_iris.sepal_width[line], 
     df_iris.species[line], horizontalalignment='left', 
     size='medium', color='black', weight='semibold')

plt.title('Example Plot')
# Set x-axis label
plt.xlabel('Sepal Length')
# Set y-axis label
plt.ylabel('Sepal Width')


Eric Aya
compBio
  • This logic assumes (by looping an iterator line through data[x][line]) that the dataframe has an incrementing index without any gaps. This will not be true, for example, with filtered dataframes. The function will raise a KeyError. – defraggled Apr 23 '21 at 12:25
  • User can workaround this problem by passing `df.reset_index(drop=True)` instead of the raw df. – defraggled Apr 23 '21 at 12:31
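
A minimal sketch of one way around the index issue raised in the comments above: iterate over the column values with zip instead of looking rows up by an integer counter, so the index labels never matter. The helper name label_points, its dx offset, and the three-row frame are made up for illustration, and plain ax.scatter stands in for sns.scatterplot here. Since sns.scatterplot returns a matplotlib Axes, the same helper should accept that return value unchanged.

```python
import matplotlib.pyplot as plt
import pandas as pd


def label_points(ax, data, x, y, text_column, dx=0.01):
    """Write data[text_column] next to each (x, y) point on ax.

    Values are read with zip, so the index labels of `data` are never used.
    (Hypothetical helper, not part of seaborn or matplotlib.)
    """
    for xi, yi, label in zip(data[x], data[y], data[text_column]):
        ax.text(xi + dx, yi, str(label), horizontalalignment='left')
    return ax


# Made-up rows with a gappy index, as you would get after filtering
df = pd.DataFrame(
    {'sepal_length': [5.1, 4.9, 6.3],
     'sepal_width': [3.5, 3.0, 3.3],
     'species': ['setosa', 'setosa', 'virginica']},
    index=[0, 7, 42],
)

fig, ax = plt.subplots()
ax.scatter(df['sepal_length'], df['sepal_width'])
label_points(ax, df, 'sepal_length', 'sepal_width', 'species')
```

With the range-based loop, the same frame would fail on the lookup for row 1 because no row carries that index label.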

Thanks to the 2 other answers, here is a function scatter_text that makes it possible to reuse these plots several times.

import seaborn as sns
import matplotlib.pyplot as plt

def scatter_text(x, y, text_column, data, title, xlabel, ylabel):
    """Scatter plot with a text label next to each (x, y) point
       Based on this answer: https://stackoverflow.com/a/54789170/2641825"""
    # Create the scatter plot
    p1 = sns.scatterplot(x=x, y=y, data=data, size = 8, legend=False)
    # Add text besides each point
    for line in range(0,data.shape[0]):
         p1.text(data[x][line]+0.01, data[y][line], 
                 data[text_column][line], horizontalalignment='left', 
                 size='medium', color='black', weight='semibold')
    # Set title and axis labels
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    return p1

Use the function as follows:

df_iris=sns.load_dataset("iris") 
plt.figure(figsize=(20,10))
scatter_text('sepal_length', 'sepal_width', 'species',
             data = df_iris, 
             title = 'Iris sepals', 
             xlabel = 'Sepal Length (cm)',
             ylabel = 'Sepal Width (cm)')

See also this answer on how to have a function that returns a plot: https://stackoverflow.com/a/43926055/2641825

Paul Rougieux
  • This logic assumes (by looping an iterator `line` through `data[x][line]`) that the dataframe has an incrementing index without any gaps. This will not be true, for example, with filtered dataframes. The function will raise a KeyError. – defraggled Apr 23 '21 at 12:24
  • User can workaround this problem by passing `df.reset_index(drop=True)` instead of the raw df. – defraggled Apr 23 '21 at 12:31

Below is a solution that does not iterate over the rows of the data frame with the dreaded for loop.

There are many issues regarding iterating over a data frame.

The answer is don't iterate! See this link.

The solution below relies on a function (plotlabel) within the petalplot function, which is called by df.apply.

Now, I know readers will comment on the fact that I use scatterplot and not lmplot, but that is a bit beside the point.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df_iris=sns.load_dataset("iris") 

def petalplot(df): 
    
    def plotlabel(xvar, yvar, label):
        ax.text(xvar+0.002, yvar, label)
        
    fig = plt.figure(figsize=(30,10))
    ax = sns.scatterplot(x = 'sepal_length', y = 'sepal_width', data=df)

    # The magic starts here:
    df.apply(lambda x: plotlabel(x['sepal_length'],  x['sepal_width'], x['species']), axis=1)

    plt.title('Example Plot')
    plt.xlabel('Sepal Length')
    plt.ylabel('Sepal Width')
    
petalplot(df_iris)
Martien Lubberink