Annotating every data point in scatter plot in matplotlib python

Question

The scatter plot is not showing the name of the person next to every data point in the plot.

I am trying to draw a scatter plot of salary and bonus. The only thing that is missing is the name of each employee at every data point in the plot.

I am receiving a TypeError: cannot concatenate 'str' and 'tuple' objects

fig, ax = plt.subplots()
my_scatter_plot = ax.scatter(df["salary"], df["bonus"])
ax.set_xlabel("Salary")
ax.set_ylabel("Bonus")
ax.set_title("Enron Employees Salary and Bonus Scatter Plot")

for _, row in df[["Names", "salary", "bonus"]].iterrows():
    xy = row[["salary", "bonus"]]
    xytext = xy + (0.02, 5)
    ax.annotate(row["Names"], xy, xytext)

plt.show()


TypeError: cannot concatenate 'str' and 'tuple' objects

I expect to see each employee's name next to the corresponding data point.

Possible duplicate of [matplotlib scatter plot with different text at each data point](https://stackoverflow.com/questions/14432557/matplotlib-scatter-plot-with-different-text-at-each-data-point) — Sheldore, May 04 '19 at 17:58
Closing the question. Several duplicates exist on this topic — Sheldore, May 04 '19 at 17:58
I tried the approach from https://stackoverflow.com/questions/14432557/matplotlib-scatter-plot-with-different-text-at-each-data-point but had no luck. I first converted bonus, salary, and names into lists, since the suggested solution uses lists, but got this error: ValueError: cannot convert float NaN to integer – Rizwan Ahmed, May 04 '19 at 23:41

Zaccharie Ramzi · Accepted Answer · 2019-05-05T08:33:58.820

I think the problem is that your "salary" and "bonus" columns are interpreted as strings. Hence, when you evaluate `xy + (0.02, 5)`, Python thinks you are trying to concatenate a string (`xy`) with a tuple. I think you should try casting those columns to floats or integers, depending on your case.

edited May 05 '19 at 08:33

answered May 04 '19 at 16:07


Thanks, @Zaccharie Ramzi, for your reply. I can now see the plot and the data points, but all the names are cluttered in the bottom-left corner of the plot. The code I am using is this (changed lines marked with **):

fig, ax = plt.subplots()
my_scatter_plot = ax.scatter(df["salary"], df["bonus"])
ax.set_xlabel("Salary")
ax.set_ylabel("Bonus")
ax.set_title("Enron Employees Salary and Bonus Scatter Plot")
for _, row in df[["Names","salary","bonus"]].iterrows():
    **xy = row[["salary", "bonus"]].astype(float)**
    **xytext= (0.02, 5)**
    ax.annotate(row["Names"], xy, xytext)
plt.show()

– Rizwan Ahmed May 05 '19 at 01:17
After trying it myself, it does seem that you need the addition. To simplify things, I would convert everything to NumPy arrays (`import numpy as np`): `xytext = np.array(xy) + np.array([0.04, 0.02])`, and also make the offset a bit smaller, as I did here. One more thing: you may need to set the x and y limits yourself. Finally, it would have been much easier to help you with a minimal working example; try to provide one next time (I had to write one myself): https://stackoverflow.com/help/mcve . – Zaccharie Ramzi May 05 '19 at 08:33
@RizwanAhmed can you accept the answer if it solves your problem (https://stackoverflow.com/help/someone-answers) ? – Zaccharie Ramzi May 05 '19 at 11:37
Thanks, @Zaccharie Ramzi, it worked. I also found that NaNs in my data were part of the problem. I appreciate your help. – Rizwan Ahmed May 06 '19 at 00:21
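Putting the pieces from this thread together, here is a rough sketch rather than the asker's final code. The helper name `annotated_scatter`, its `offset` parameter, and the sample data are all hypothetical. It casts the columns to numbers, drops rows with NaN, and places each label at the point plus a small offset in data units, as suggested in the comments:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def annotated_scatter(df, offset=(0.04, 0.02)):
    """Scatter salary vs. bonus and label each point with the employee name."""
    # Coerce to numbers and drop rows that cannot be plotted (NaN).
    data = df[["Names", "salary", "bonus"]].copy()
    for col in ["salary", "bonus"]:
        data[col] = pd.to_numeric(data[col], errors="coerce")
    data = data.dropna(subset=["salary", "bonus"])

    fig, ax = plt.subplots()
    ax.scatter(data["salary"], data["bonus"])
    ax.set_xlabel("Salary")
    ax.set_ylabel("Bonus")

    for _, row in data.iterrows():
        xy = np.array([row["salary"], row["bonus"]], dtype=float)
        xytext = xy + np.array(offset)  # label position = point + offset (data units)
        ax.annotate(row["Names"], xy=xy, xytext=xytext)
    return fig, ax


# Hypothetical sample data; the third row is dropped because salary is not a number.
sample = pd.DataFrame({
    "Names": ["Alice", "Bob", "Carol"],
    "salary": ["1000", "2000", "unknown"],
    "bonus": ["100", "250", "300"],
})
```

Calling `annotated_scatter(sample)` followed by `plt.show()` displays the figure. Because the offset is in data units, it may need tuning to the scale of the axes; passing `textcoords="offset points"` to `ax.annotate` is an alternative that offsets labels by a fixed screen distance instead.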
