27

It is known that given an $n$-point subset of $\ell_2^d$ (that is, given $n$ points in ${\mathbb R}^d$ with Euclidean distance) it is possible to embed them isometrically in $\ell^{n\choose 2}_1$.

Is the isometry computable in (possibly, randomized) polynomial time?

Since there are finite-precision issues, the precise question is

Given a set $X$ of $n$ points in ${\mathbb R}^d$ and $\epsilon >0$, is there a mapping $f: X \to {\mathbb R}^{n\choose 2}$ computable (possibly, using randomness) in time polynomial in $n$ and logarithmic in $1/\epsilon$ such that for every $x,y\in X$ we have

$|| f(x)-f(y)||_1 \leq ||x-y||_2 \leq (1+ \epsilon) \cdot || f(x)-f(y) ||_1$

(Note: I am aware that a mapping with distortion $(1+\epsilon)$ can be found with high probability in time polynomial in $n$ and $1/\epsilon$ by projecting on $O(\epsilon^{-2} \cdot \log n)$ random lines, but I am not sure if the number of dimensions can be constructively reduced to $n\choose 2$ or even $O(n^2)$ when $1/\epsilon$ is much larger than $n$, and I don't know if there is a polynomial time method to handle the case in which $1/\epsilon$ is exponential in $n$.)

Luca Trevisan
  • 4,912
  • 28
  • 34
  • 1
    this is a very nice question. @Luca, do you suspect it might be hard ? (of course my first thought was to look over 'Hamming meets Euclid', and then I saw the identity of the questioner :) – Suresh Venkat Apr 02 '11 at 04:54
  • I would guess that an algorithm exists, but a lot of seemingly easy problems are hard in L1. For example, if you are given the Euclidean distances between a finite set of points, then you can find an embedding that realizes those distances using SDP, but apparently it is hard to do so in L1. (In fact, I am told that it is NP-hard to check if a finite metric can be realized as L1.) – Luca Trevisan Apr 02 '11 at 05:38
  • 1
    This reference appears to be related: Pjotr Indyk, "Uncertainty principles, extractors, and explicit embeddings of l2 into l1", Proc. STOC'07. – Martin Schwarz Apr 02 '11 at 07:16
  • There's something weird about your variables: you're using n to mean number of points some times, and Euclidean dimension some times. It's easy to embed n d-dimensional Euclidean points in $L_1^{\binom{n}{d}}$: just use one $L_1$ dimension for each of the $\binom{n}{d}$ ways of partitioning the points by a hyperplane, where the points on one side of the partition get coordinate value zero and the points on the other get coordinate = probability that a random hyperplane forms that partition. But you seem to be saying it can be reduced to $\binom{n}{2}$ or $\binom{d}{2}$. Which one? – David Eppstein Apr 02 '11 at 19:19
  • 2
    @David: $n$ is the number of points, I corrected the place where I used $n$ for the dimension. That $n$ points in Euclidean space (of any dimension) can be embedded isometrically in $\ell_1^{n \choose 2}$ is proved here: http://www-math.mit.edu/~goemans/18409-2006/lec1.pdf but rather non-constructively.(Carathéodory's theorem to go from finite but large dimension to dimension ${n \choose 2}$ with arbitrarily small error, and a compactness argument to go from arbitrarily small error to zero error.) – Luca Trevisan Apr 02 '11 at 20:26
  • 1
    @Martin: thanks for the reference. Piotr's paper deals with the harder problem of mapping all of $\ell_2^d$ (not just a fixed set of $n$ points) to $\ell_1^m$. For this problem, I believe that it is an open problem even to achieve, constructively, $m = poly(d,1/\epsilon)$ and distortion $(1+\epsilon)$. (Piotr gets $m = d^{O(\log d)}$ and $\epsilon = 1/d$.) – Luca Trevisan Apr 02 '11 at 20:41
  • 1
    @LucaTrevisan: re: the hardness of embedding in l1, this is true (it's mentioned in Chapter 1 or 2 of the Deza and Laurent book - I think via MAX CUT) – Suresh Venkat Apr 02 '11 at 21:12
  • Thanks for the pointer to Goemans' notes. I think the argument I gave for $\ell_2^d$ to $\ell_1^{\binom{n}{d}}$ together with Caratheodory gives an isometric embedding in $\ell_1^{\binom{n}{2}}$ directly, with no approximation and no need to use compactness. – David Eppstein Apr 02 '11 at 22:07
  • @David, @Luca: If I understand correctly, in the David's construction, we need to specify how to introduce a probability distribution over all hyperplanes. This is, I think, what an argument in Goemans' lecture notes essentially does. I'm not sure if David's construction yields no epsilon. By the way, the Deza-Laurent book gives another method by taking a d-dimensional sphere tangent to the d-space in which the points lie (in the (d+1)-dimensional ambient space), projecting each point on the sphere in the inverse stereographic way, taking a probability distribution in a suitable way, ... – Yoshio Okamoto Apr 03 '11 at 03:22
  • There is a unique (up to scale) measure on the space of hyperplanes that is invariant under Euclidean congruences. I was sloppy when I referred to it earlier as a probability since it is unbounded. Under this measure, the length of any curve is proportional to the measure of the hyperplanes that cross it (counting multiplicity for the hyperplanes that cross more than once); in particular the distance between two points is the measure of the set of hyperplanes that split them. The coordinate value for each partition of the points is just the measure of the hyperplanes forming that partition. – David Eppstein Apr 03 '11 at 03:40
  • @David: "Under this measure, the length of any curve is proportional to the measure of the hyperplanes that cross it." That sounds a bit off. Surely you should only count perpendicular crossings? – Per Vognsen Apr 03 '11 at 11:13
  • 1
    No, all crossings. The perpendicular crossings have measure zero. This is standard integral geometry. – David Eppstein Apr 03 '11 at 15:40
  • since the measure is invariant under Euclidean congruences, wouldn't every partition have the same coordinate value assigned to it ? – Suresh Venkat Apr 03 '11 at 15:54
  • 1
    No. For instance, consider the points a=(0,0), b=(1,0), c=(0,1), and d=(1/2+ε,1/2+ε). The partition {a}/{b,c,d} is a lot bigger than the partition {a,b,c}/{d}. Even simpler, for Euclidean points in one dimension, we get an L1 coordinate for each gap between consecutive points: the points earlier than the gap get coordinate value zero and the points after the gap get coordinate value equal to the gap length. – David Eppstein Apr 03 '11 at 16:04
  • The measure of the hyperplanes cutting a set of points to form a particular partition is the same as the measure of the points in a particular cell of the hyperplane arrangement dual to the points (for the right choice of duality). – David Eppstein Apr 03 '11 at 16:10
  • Ah I see. the dual argument makes it a little clearer – Suresh Venkat Apr 04 '11 at 08:11
  • 1
    I was browsing this question as an "unanswered" one, and it seems like @DavidEppstein answered it ! could you post your assembled comments as an answer ? – Suresh Venkat Sep 04 '11 at 16:21
  • Ok, I'll do that. – David Eppstein Sep 04 '11 at 23:37

1 Answers1

8

Suresh asked me to assemble my comments above into an answer, so here it is. I'm not really sure it's an answer to the original question, though, since it's not obvious how to make it polynomial time when the dimension of the input Euclidean space is non-constant. It does at least have the advantage of avoiding any problem with large $1/\epsilon$ as the original question asks, because it doesn't involve any approximation, and it looks polynomial for constant $d$.

Anyway: from integral geometry, there is a standard measure on sets of hyperplanes in $d$-dimensional Euclidean space that is invariant under Euclidean congruences. It has the property that the length of any bounded-length curve $C$ is proportional to the measure of the hyperplanes that cross $C$ (with multiplicity, meaning that if a hyperplane crosses $C$ twice then it contributes twice to the total measure of the hyperplanes crossing $C$). In particular if $C$ is a line segment then the multiplicity complication doesn't arise and we can normalize the measure on the hyperplanes crossing $C$ to be exactly the length of $C$. (The hyperplanes containing $C$ have measure zero, so don't worry about infinite multiplicity.)

Now, given a set of n points in d-dimensional space, make an $\ell_1$ coordinate for each of the partitions of the points into two subsets induced by a hyperplane that doesn't pass through any of the points. Give the points on one side of the partition coordinate value zero and the points on the other side of the partition coordinate value equal to the measure of the set of hyperplanes inducing that partition.

If $p$ and $q$ are any two of the $n$ points, let $K$ be the set of hyperplanes crossing line segment $pq$, and let $K_i$ be the subsets of $K$ formed by each possible hyperplane partition that has $p$ on one side and $q$ on the other. Then $K$ is the disjoint union of the $K_i$, and the coordinate differences between $p$ and $q$ are just the measures of the subsets $K_i$. Therefore, the $\ell_1$ distance between the coordinatizations of $p$ and $q$ (the sum of the measures of the $K_i$) is the measure of $K$, which is just the original $\ell_2$ distance between $p$ and $q$.

For computational geometers, an alternative description of the same construction may be helpful: use projective duality to transform the $n$ input points into $n$ hyperplanes, and separating hyperplanes into points. The integral geometry measure on sets of hyperplanes is then transformed into a more standard measure on sets of points, the distance between $p$ and $q$ dualizes to the measure of the double wedge between two hyperplanes, and the hyperplane arrangement partitions this double wedge into smaller cells. The coordinate value for a point is either the measure of one of the cells in the arrangement (if the dual hyperplane is below that coordinate's cell) or zero (if the dual hyperplane is above the cell). Therefore, the $\ell_1$ distance between $p$ and $q$ is just the sum of the measures of the cells in the double wedge, which is the same as the measure of the whole double wedge. This dual point of view also makes it easy to calculate the dimension of the embedding found in this way: it's just the number of cells in the hyperplane arrangement, which is $O(n^d)$, or more precisely at most $\sum_{i=0}^d{n\choose i}$.

So far, this gives a completely deterministic and exact embedding in $\ell_1^{O(n^d)}$. But we wanted a smaller dimension, $\ell_1^{n\choose 2}$. Here's where Luca's comment about Carathéodory's theorem comes in. The set of $\ell_1$-representable metrics forms a polyhedral cone in the $n\choose 2$-dimensional space of all functions from unordered pairs of points to real numbers, and the geometric argument above says that the Euclidean metric belongs to this cone. The points on the extreme rays of the cone are one-dimensional $\ell_1$ pseudometrics (in which the points are split into two sets, all distances within a single set are zero, and all distances across the split are equal), and Carathéodory says that any point within the cone (including the one we care about) can be represented as a convex combination of points on extreme rays whose number is at most the dimension of the ambient space, $n\choose 2$. But a convex combination of at most $n\choose 2$ one-dimensional $\ell_1$ metrics is a $\ell_1^{n\choose 2}$ metric.

Finally, how can we go about actually computing the $n\choose 2$-dimensional embedding? At this point we have not just a point in the $n\choose 2$-dimensional convex cone of $\ell_1$ metrics (the distance metric we started with), but we also have a set of $O(n^d)$ extreme points of the cone (corresponding to the partitions of the input into two subsets induced by hyperplanes) such that our metric is a convex combination of these extreme points — for small $d$, this is a big improvement over the $2^n-2$ extreme rays that the cone has overall. Now all we need to do is apply a greedy algorithm that gets rid of extreme points from our set, one by one, until only $n\choose 2$ of them are left. At each step, we need to maintain as an invariant that our metric is still inside the convex hull of the remaining extreme points, which is just a linear programming feasibility problem, and if we do this Carathéodory will ensure that there always remains a set of $n\choose 2$ extreme points whose convex hull contains the input metric.

David Eppstein
  • 50,990
  • 3
  • 170
  • 278