Pyplot Label Scatter Plot with Coincident Points / Overlapping Annotations

Question

This question is a follow-on to this. How can I lay out the annotations so they remain readable when the labeled points are exactly or nearly coincident? I need a programmatic solution; hand-tuning the offsets is not an option. Sample with ugly labels:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
N = 10
data = np.random.random((N, 4))
data[1, :2] = data[0, :2]
data[-1, :2] = data[-2, :2] + .01
labels = ['point{0}'.format(i) for i in range(N)]
plt.subplots_adjust(bottom = 0.1)
plt.scatter(
    data[:, 0], data[:, 1], marker = 'o', c = data[:, 2], s = data[:, 3]*1500,
    cmap = plt.get_cmap('Spectral'))
for label, x, y in zip(labels, data[:, 0], data[:, 1]):
    plt.annotate(
        label,
        xy = (x, y), xytext = (-20, 20),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

plt.show()

I'm pretty sure matplotlib won't do this for you automatically. You have to iteratively choose angles for the labels so that they don't intersect. That's a difficult task, especially since a valid solution isn't guaranteed to exist: if a point is surrounded by other points, you can't put its label anywhere within a given radius. I believe you have to design a heuristic first, which you can then implement. — Andras Deak -- Слава Україні, Oct 13 '16 at 12:59
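A minimal sketch of the kind of heuristic the comment describes, under my own assumptions (the function name `choose_label_offsets`, the 8 candidate directions and the 30-point radius are hypothetical choices, not anything matplotlib provides): place labels greedily, one at a time, and for each point pick the candidate direction whose label position has the most clearance from every point and every label placed so far. It treats each label as a single position and ignores the size of the text box, so it reduces overlap rather than guaranteeing there is none. The positions must be in the same units as `radius`, i.e. typographic points if the result is fed to `xytext` with `textcoords='offset points'`. One way to get there is `ax.transData.transform(data[:, :2]) * 72 / fig.dpi` (data coordinates to display pixels, then pixels to points); do this only after the axis limits are final (set them explicitly or call `fig.canvas.draw()` first), because the conversion depends on them.

```python
import math


def choose_label_offsets(points, radius=30.0, n_angles=8):
    """Greedy label placement (hypothetical helper).

    points: sequence of (x, y) positions, assumed to be in the same units
            as `radius` (e.g. typographic points).
    Returns one (dx, dy) offset per point.
    """
    # Candidate offsets: n_angles directions evenly spaced on a circle.
    candidates = [
        (radius * math.cos(2 * math.pi * k / n_angles),
         radius * math.sin(2 * math.pi * k / n_angles))
        for k in range(n_angles)
    ]
    placed = []   # label positions chosen so far
    offsets = []  # chosen (dx, dy) for each point, in input order
    for px, py in points:
        # A new label should stay away from every point and every placed label.
        obstacles = list(points) + placed
        best, best_clearance = None, -1.0
        for dx, dy in candidates:
            lx, ly = px + dx, py + dy
            # Clearance = distance from this candidate position to the nearest obstacle.
            clearance = min(math.hypot(lx - ox, ly - oy) for ox, oy in obstacles)
            if clearance > best_clearance:  # greedy: keep the roomiest candidate
                best, best_clearance = (dx, dy), clearance
        offsets.append(best)
        placed.append((px + best[0], py + best[1]))
    return offsets
```

Usage, assuming `fig, ax = plt.subplots()` and `pts` computed as above: `for (x, y), label, off in zip(data[:, :2], labels, choose_label_offsets(pts)): ax.annotate(label, xy=(x, y), xytext=off, textcoords='offset points', ha='center', va='center')`. Because placement is greedy, the result depends on the order of the points.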

score 2 · Answer 1 · answered Oct 13 '16 at 13:15

Compute the distance to the previous point, set a distance threshold, and flip the xytext offset when the two points are closer than the threshold. See below.

import numpy as np 
import matplotlib.pyplot as plt

np.random.seed(0)
N = 10
data = np.random.random((N, 4))
data[1, :2] = data[0, :2]
data[-1, :2] = data[-2, :2] + .01
labels = ['point{0}'.format(i) for i in range(N)]
plt.subplots_adjust(bottom = 0.1)
plt.scatter(
    data[:, 0], data[:, 1], marker = 'o', c = data[:, 2], s = data[:, 3]*1500,
    cmap = plt.get_cmap('Spectral'))

old_x = old_y = 1e9  # start "far away" so the first point is never flagged as close
thresh = .1  # distance threshold, in data units

for label, x, y in zip(labels, data[:, 0], data[:, 1]):
    # Euclidean distance to the previously labeled point
    d = ((x - old_x)**2 + (y - old_y)**2)**.5

    # if closer than thresh, put the label on the opposite side, twice as far out
    flip = 1
    if d < thresh:
        flip = -2

    plt.annotate(
        label,
        xy = (x, y), xytext = (-20*flip, 20*flip),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
    old_x = x
    old_y = y

plt.show()

which results in:

I think that's definitely in the right direction. Your closeness test only works if the nearby points are consecutive in the data array. For example, change data[1, :2] = data[0, :2] to data[5, :2] = data[0, :2]. This solution doesn't have to be bulletproof; if there are 10 points on top of each other, it will look like a hot mess no matter what. — user2133814, Oct 13 '16 at 13:24
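One way to address that objection while keeping the flip idea from Answer 1 is to compare each point against every earlier point instead of only the previous one. A minimal sketch (the helper name `flip_factors` is hypothetical):

```python
import math


def flip_factors(xs, ys, thresh=0.1):
    """Return one flip factor per point (hypothetical helper).

    1  -> default offset
    -2 -> the point lies within `thresh` of ANY earlier point,
          not just the immediately preceding one.
    """
    flips = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Check the distance to all points that were labeled before this one.
        close = any(math.hypot(x - xs[j], y - ys[j]) < thresh for j in range(i))
        flips.append(-2 if close else 1)
    return flips
```

Use it as `flips = flip_factors(data[:, 0], data[:, 1])` and then `xytext=(-20 * flip, 20 * flip)` inside the annotation loop. It is O(N²), which is fine for a handful of points; a spatial index such as `scipy.spatial.cKDTree` scales better. It still offers only two label positions, so a third coincident point lands on the second one's label.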

score 0 · Answer 2 · answered Oct 13 '16 at 15:33

Here's what I ended up with. It's not perfect for all situations, and it doesn't even work smoothly for this example problem, but I think it's good enough for my needs. Thanks, Dan, for your answer pointing me in the right direction.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree


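def get_label_xy(tree, thresh, data, i):
    # Return an (x, y) text offset in points for label i, pointing away from
    # the centroid of all points within `thresh` of point i.
    neighbors = tree.query_ball_point([data[i, 0], data[i, 1]], thresh)
    if len(neighbors) == 1:
        # isolated point (only itself in range): default offset, up-left
        xy = (-30, 30)
    else:
        mean = np.mean(data[:, :2][neighbors], axis=0)

        if mean[0] == data[i, 0] and mean[1] == data[i, 1]:
            # point sits exactly on the neighbor centroid (e.g. coincident
            # points): the highest-index point goes down-right, the rest up-left
            if i < np.max(neighbors):
                xy = (-30, 30)
            else:
                xy = (30, -30)
        else:
            # direction from the centroid to the point, snapped to a diagonal
            angle = np.arctan2(data[i, 1] - mean[1], data[i, 0] - mean[0])

            if angle > np.pi / 2:
                xy = (-30, 30)    # up-left
            elif angle > 0:
                xy = (30, 30)     # up-right
            elif angle > -np.pi / 2:
                xy = (30, -30)    # down-right
            else:
                xy = (-30, -30)   # down-left
    return xy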


def labeled_scatter_plot(data, labels):
    plt.subplots_adjust(bottom = 0.1)
    plt.scatter(
        data[:, 0], data[:, 1], marker = 'o', c = data[:, 2], s = data[:, 3]*1500,
        cmap = plt.get_cmap('Spectral'))

    tree = cKDTree(data[:, :2])
    thresh = .1

    for i in range(data.shape[0]):
        xy = get_label_xy(tree, thresh, data, i)

        plt.annotate(
            labels[i],
            xy = data[i, :2], xytext = xy,
            textcoords = 'offset points', ha = 'center', va = 'center',
            bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
            arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))


np.random.seed(0)
N = 10
data = np.random.random((N, 4))
data[1, :2] = data[0, :2]
data[-1, :2] = data[-2, :2] + .01
data[5, :2] = data[4, :2] + [.05, 0]
data[6, :2] = data[4, :2] + [.05, .05]
data[7, :2] = data[4, :2] + [0, .05]
labels = ['point{0}'.format(i) for i in range(N)]

labeled_scatter_plot(data, labels)

plt.show()
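The core of `get_label_xy` is "push the label away from the centroid of its neighbors", snapped to four diagonals. A hypothetical variant, sketched here in plain Python with a brute-force neighbor search instead of `cKDTree` (the name `label_offset`, the 30-point radius and the 135° starting angle are my assumptions), keeps the continuous angle and fans coincident points out evenly around a circle, so three or more identical points no longer have to share two positions:

```python
import math


def label_offset(points, i, thresh=0.1, radius=30.0):
    """Return a (dx, dy) text offset, in points, for the label of points[i].

    Hypothetical variant of get_label_xy: continuous angle, brute-force
    neighbor search (O(N) per call) instead of a KD-tree.
    """
    x, y = points[i]
    # Indices of all points within `thresh` of point i (always includes i itself).
    neighbors = [j for j, (qx, qy) in enumerate(points)
                 if math.hypot(qx - x, qy - y) <= thresh]
    # Centroid of the neighborhood.
    cx = sum(points[j][0] for j in neighbors) / len(neighbors)
    cy = sum(points[j][1] for j in neighbors) / len(neighbors)
    dx, dy = x - cx, y - cy
    # Assumption: data of roughly unit magnitude, so 1e-12 means "on the centroid".
    if math.hypot(dx, dy) < 1e-12:
        # No usable direction (isolated point, or coincident points sitting on
        # their centroid): fan labels out by rank, starting up-left at 135 degrees.
        rank = neighbors.index(i)
        angle = math.radians(135) + 2 * math.pi * rank / len(neighbors)
    else:
        # Point the label straight away from the neighbor centroid.
        angle = math.atan2(dy, dx)
    return (radius * math.cos(angle), radius * math.sin(angle))
```

It can be dropped into `labeled_scatter_plot` as `xy = label_offset(data[:, :2], i)`, since NumPy rows unpack the same way tuples do. The brute-force search makes the whole loop O(N²); for many points, the `cKDTree` neighbor query used above is the better choice.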
