
I read a paper by John Hershberger with this title: "Minimizing the sum of diameters efficiently". That paper proposes a simple algorithm that finds a bipartition of a set of points $S$ in the plane into clusters $C_1$ and $C_2$ minimizing the sum of the diameters of the clusters.

Hershberger's paper first gives some definitions:

Let $S$ be a set of $n$ points in the plane, let the distance between two points $p$ and $q$ be $d(p, q)$, and let the diameter of a set $X$ be denoted by $Diam(X) = \max\{d(p, q)\mid p, q \in X\}$. The closed disk with radius $r$ centered on a point $p$ is denoted by $D(p, r)$. For brevity, we refer to the sum of the diameters of the components of a bipartition as the diameter sum of the bipartition. We say that any bipartition with diameter sum less than $Diam(S)$ is good. Our algorithm finds the best of the good partitions, if any good partitions exist.
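For concreteness, these definitions can be mirrored in code (a small Python sketch of my own, not from the paper, with points as coordinate tuples):

```python
from math import dist
from itertools import combinations

def diameter(X):
    """Diam(X) = max pairwise Euclidean distance; 0 for a singleton set."""
    return max((dist(p, q) for p, q in combinations(X, 2)), default=0.0)

def in_disk(p, center, r):
    """Is point p inside the closed disk D(center, r)?"""
    return dist(p, center) <= r
```

For example, `diameter([(0, 0), (3, 4)])` is `5.0`, and a singleton has diameter `0.0`.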

After that, the paper states a straightforward lemma as follows:

Lemma 3.1. Let $a$ and $b$ be a diametral pair of $S$. In any good bipartition of $S$, the subsets are contained in two disjoint disks centered on $a$ and $b$.

Proof. Let $\delta = d(a, b)$. Any good bipartition of $S$ must have $a$ and $b$ in different sets, which we denote by $S_a$ and $S_b$. Let $r$ be the distance from $a$ to the point in $S_a$ farthest from it. The set $S_a$ lies in the disk $D(a, r)$. Because $Diam(S_a)\geq r$, we must have $Diam(S_b) < \delta - r$, and hence $S_b\subset D(b, \delta - r - \epsilon)$ for some positive $\epsilon$. This proof remains valid even when $r = 0$, i.e., $\mid S_a\mid= 1$.

Next, it describes the algorithm as follows:

The algorithm begins by computing a diametral pair $a$ and $b$ of $S$, which takes $O(n \log n)$ time [18]. Next, it sorts the points of $S\setminus\{a, b\}$ into two lists $L_a$ and $L_b$, one sorted by increasing distance from $a$ and the other by increasing distance from $b$. Lemma 3.1 implies that for any good bipartition, the points in $S_a$ must be a prefix of $L_a$ and a suffix of $L_b$ (their order may differ in the two lists). The algorithm identifies all prefixes of $L_a$ whose elements form a suffix of $L_b$. To do this, it first marks each element of $L_a$ with its rank in $L_b$, then prepares an empty array corresponding to the list $L_b$. The algorithm marches through the elements of $L_a$, at each step marking the array entry given by the element's rank in $L_b$. Whenever a suffix of the array is marked, the algorithm detects it using union-find. This takes $O(n)$ total time, or $O(n \log n)$ time using a simpler algorithm based on a static binary tree.
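The marking step can be sketched like this (my own Python sketch, not the paper's code; instead of union-find it keeps a running minimum of the marked ranks, which also gives $O(n)$ total time):

```python
def valid_prefix_sizes(L_a, L_b):
    """Sizes i >= 1 such that the first i elements of L_a are exactly
    the last i elements of L_b, as sets (their order may differ)."""
    n = len(L_a)
    rank_in_b = {p: j for j, p in enumerate(L_b)}  # each element's rank in L_b
    valid = []
    min_rank = n  # smallest L_b-rank among the elements marked so far
    for i, p in enumerate(L_a, start=1):
        min_rank = min(min_rank, rank_in_b[p])
        # The i marked ranks are distinct and all >= min_rank, so they
        # occupy exactly the suffix {n-i, ..., n-1} iff min_rank == n - i.
        if min_rank == n - i:
            valid.append(i)
    return valid
```

For example, `valid_prefix_sizes(['p','q','r'], ['q','p','r'])` returns `[3]` (only the full prefix works), while `valid_prefix_sizes(['p','q','r'], ['r','q','p'])` returns `[1, 2, 3]`.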

Now my problem starts from there and I can't understand the continuation of the algorithm:

To compute the diameter sum for all potentially good partitions, we use Bentley's logarithmic method, which maintains the diameter under a sequence of insertions. We insert the elements of $L_a$ into $S_a$ in order, recording the diameter of the set as it changes. We do the same for $L_b$. For each prefix of $L_a$ whose elements form a suffix of $L_b$, we add the two corresponding diameters. The minimum sum gives the best partition. Using the farthest-point Voronoi diagram as the basis for the logarithmic method, we get a data structure with query time and amortized insertion time both equal to $O(\log^2 n)$.
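Putting the pieces together, here is how I understand the final step end to end (a brute-force Python stand-in of my own: it recomputes each prefix diameter from scratch rather than maintaining it with Bentley's logarithmic method over the farthest-point Voronoi diagram, so it runs in roughly $O(n^3)$ instead of $O(n \log^2 n)$, but it follows the same plan; it assumes distinct points, and ties in distance are broken arbitrarily):

```python
from math import dist, inf
from itertools import combinations

def diam(X):
    # Brute-force diameter; the paper instead maintains this incrementally
    # in O(log^2 n) amortized time per insertion.
    return max((dist(p, q) for p, q in combinations(X, 2)), default=0.0)

def best_good_bipartition(S):
    # Diametral pair a, b (the paper computes this in O(n log n) time).
    a, b = max(combinations(S, 2), key=lambda pq: dist(*pq))
    rest = [p for p in S if p != a and p != b]
    n = len(rest)
    L_a = sorted(rest, key=lambda p: dist(p, a))
    L_b = sorted(rest, key=lambda p: dist(p, b))
    rank_in_b = {p: j for j, p in enumerate(L_b)}

    # Prefix diameters: diam_a[i] = Diam({a} plus first i elements of L_a).
    diam_a = [diam([a] + L_a[:i]) for i in range(n + 1)]
    diam_b = [diam([b] + L_b[:j]) for j in range(n + 1)]

    best_sum, best_i = inf, None
    min_rank = n  # smallest L_b-rank among the first i elements of L_a
    for i in range(n + 1):
        if i > 0:
            min_rank = min(min_rank, rank_in_b[L_a[i - 1]])
        # The first i elements of L_a form a suffix of L_b iff the
        # smallest marked rank equals n - i (i == 0 is trivially valid).
        if i == 0 or min_rank == n - i:
            s = diam_a[i] + diam_b[n - i]
            if s < best_sum:
                best_sum, best_i = s, i
    S_a = [a] + L_a[:best_i]
    S_b = [b] + L_b[:n - best_i]
    return best_sum, S_a, S_b
```

A good bipartition exists only when the returned sum is less than $Diam(S) = d(a, b)$; for two well-separated pairs such as `[(0,0), (1,0), (10,0), (11,0)]`, the sketch returns the sum `2.0` with each pair in its own cluster.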

Can anyone help me understand this last part of the algorithm and how it works?

  • What part(s) specifically within the last part don't you understand? E.g. is it Bentley's logarithmic method (see e.g. here or here)? The last part computes for each prefix of $L_a$ the diameter of that prefix, and likewise for $L_b$, in total time $O(n\log^2 n)$. By the observation from the previous part, this is enough to answer the question. – Neal Young Feb 21 '22 at 22:58
  • @NealYoung What does "their order may differ in the two lists" mean? I can't understand this sentence very well. I think it says that the elements that form a prefix of $L_a$ should be a suffix of $L_b$, but I can't understand "their order may differ in the two lists". – ErroR Feb 22 '22 at 03:59
  • 2
    In $L_a$ they are ordered by distance from $a$. In $L_b$ they are ordered by distance from $b$. These orders are of course different, but also the reverse of these orders can be different. Try some examples. This needs to be taken into account when applying the previous observation, that the partition will consist of some prefix of $L_a$ and some prefix of $L_b$ such that prefix of $L_a$ and the prefix of $L_b$ exactly partition the set of points. (For example, it is not the case that, for each prefix of $L_a$, the points in the remaining suffix of $L_a$ will form a prefix of $L_b$.) – Neal Young Feb 22 '22 at 13:25
  • @NealYoung Thank you. Also, I can't understand why this algorithm finds an optimal solution. Can you please explain the proof of correctness and optimality of the algorithm? I have read the paper many times, but I can't figure it out. – ErroR Feb 23 '22 at 04:47
  • 2
    Jut, the proof in the paper seems correct, and relatively accessible. My general advice would be to find some courses in algorithms that you can take, where you can get some help learning to understand and work with proofs of correctness. – Neal Young Feb 23 '22 at 14:00
  • @NealYoung Can you point out where the proof is in the paper? Thank you. I can't find it. – ErroR Feb 28 '22 at 08:21
  • Lemma 3.1 and the following two paragraphs. It is informal, but complete. – Neal Young Feb 28 '22 at 13:07
  • @NealYoung Thank you, I really appreciate your help. If you don't mind, can you read my post again? I edited it and encountered another problem. – ErroR Mar 26 '22 at 15:09
  • But there might not be any minimum-cost solution where the two parts have the same size. Or do you mean, among clusterings into two equal-size clusters, find one of minimum cost? This seems like a nice question, but it's a new question, so you'd have to make a new post. – Neal Young Mar 27 '22 at 17:03
  • @NealYoung Thank you for your suggestion. I posted the new question; my goal is to partition the points into equal-size clusters. You can see it at https://cstheory.stackexchange.com/questions/51279/partition-points-into-two-equal-clusters – ErroR Apr 01 '22 at 17:48
  • Any help would be greatly appreciated. – ErroR Apr 01 '22 at 18:48
