7

Assume I have a matrix $(d_{ij})$ of distances between points $i$ and $j$. These distances could be anything fulfilling the triangle inequality.

Now I would like to find coordinates $(x_i, y_i)$ for each $i$ such that the Euclidean distances are always less than or equal to the real ones:

$$ \varepsilon_{ij} = d_{ij} - \sqrt{(x_i-x_j)^2 + (y_i-y_j)^2} \geq 0$$

Furthermore, the sum $\sum_{ij} \varepsilon_{ij}$ should be minimal.

How could one approach such a non-linear problem?

EDIT: I would also be interested in other, similar objective functions that somehow minimize the $\varepsilon_{ij}$. What is important for me is that the "approximation" fulfils the inequalities stated above.
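For concreteness, the quantities involved are easy to evaluate numerically. Here is a minimal NumPy sketch (the function name is hypothetical) that computes the matrix of $\varepsilon_{ij}$ for candidate coordinates:

```python
import numpy as np

def epsilon_matrix(coords, d):
    """eps_ij = d_ij - ||p_i - p_j||_2 for candidate coordinates.

    coords: (n, 2) array of points (x_i, y_i); d: (n, n) distance matrix.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # pairwise differences
    eucl = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distances
    return d - eucl

# A candidate is feasible iff (epsilon_matrix(coords, d) >= 0).all();
# the objective to minimize is epsilon_matrix(coords, d).sum().
```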

J Fabian Meier
  • How large a problem (dimension) do you need to solve? How quickly do you need to solve it? Would you settle for a local optimum which may or may not be globally optimal? – Mark L. Stone May 22 '20 at 17:34
  • The matrix has a maximum size of $200\times 200$. To be useful, it should solve in under one minute. A heuristic that finds good solutions would also be of interest. – J Fabian Meier May 22 '20 at 21:04
  • This looks a lot like a Euclidean Distance Matrix Completion problem (see https://www.lix.polytechnique.fr/~liberti/dgp-siam.pdf for a review). Solving it exactly is equivalent to a low-rank optimization problem, which is currently intractable, but at size 200x200 you can usually get pretty good performance from taking a semidefinite relaxation and minimizing the trace of the Gram matrix. – Ryan Cory-Wright May 23 '20 at 18:18
  • If you care about finding the underlying points in addition to the Gram matrix then this paper by Biswas and Ye might also be a good starting point http://www.optimization-online.org/DB_FILE/2006/07/1436.pdf – Ryan Cory-Wright May 23 '20 at 18:19

2 Answers

4

Your problem seems to be similar to Multidimensional Scaling (MDS). In MDS, the goal is to represent multidimensional data as well as possible in fewer dimensions.

If high-dimensional distances are represented in $\mathbb{R}^2$, classical MDS corresponds to the problem $$\min_{x,y} \sum_{ij} \varepsilon_{ij}^2,$$ using your definition of $\varepsilon_{ij}$. This problem allows for a closed-form solution based on eigendecomposition.
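For reference, that closed-form solution double-centers the squared distance matrix and takes the top eigenpairs; a minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical MDS: embed an (n, n) distance matrix into R^k."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    lam = np.clip(vals[idx], 0.0, None)   # clip small negative eigenvalues
    return vecs[:, idx] * np.sqrt(lam)    # (n, k) coordinates
```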

In your case, there are two differences. First, you have the additional constraints $\varepsilon_{ij} \ge 0$, which are convex in terms of $x$ and $y$. Second, you have a different objective, which is not convex in $x$ and $y$. I don't know if there are MDS generalizations that specifically address these differences, but it could be a good starting point.
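Absent such a generalization, and since the comments indicate that a local optimum or heuristic would be acceptable, one pragmatic route is to hand the constrained problem directly to a general nonlinear solver. A sketch with SciPy's SLSQP (function names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def fit_coords(d, seed=0):
    """Heuristic: locally minimize sum_ij eps_ij subject to eps_ij >= 0.

    The problem is non-convex, so only a local optimum is guaranteed;
    restarting from several random seeds is advisable.
    """
    n = d.shape[0]
    iu = np.triu_indices(n, k=1)  # one constraint per pair i < j

    def eps(flat):
        p = flat.reshape(n, 2)
        diff = p[:, None, :] - p[None, :, :]
        eucl = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # smooth at zero
        return (d - eucl)[iu]

    # a tightly clustered start keeps all eps_ij ~ d_ij >= 0, i.e. feasible
    x0 = 1e-3 * np.random.default_rng(seed).normal(size=2 * n)
    res = minimize(lambda f: eps(f).sum(), x0, method="SLSQP",
                   constraints={"type": "ineq", "fun": eps})  # eps >= 0
    return res.x.reshape(n, 2), res
```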

Kevin Dalmeijer
  • Actually, I would also be interested in similar other objectives (like the one you gave). The conditions need to be as they were stated. – J Fabian Meier May 21 '20 at 16:41
1

Your constraint defines a second-order cone, which means it is convex. You can solve the problem with a specialized SOCP solver (for example, one listed on the Wikipedia page), although any convex solver might work if the instance is not too hard.

The equivalent SOCP formulation would be: \begin{align}\min& \quad\sum_{ij} \varepsilon_{ij}\\\text{s.t.}& \quad\lVert X_i - X_j \rVert_2 \leq d_{ij} - \varepsilon_{ij} \quad \forall i,j\end{align} where $X_i = (x_i, y_i)$.

EDIT: There is nothing in the above formulation that forces $\varepsilon_{ij}$ to be equal to $d_{ij} - \lVert X_i - X_j \rVert_2$, so it does not match the original problem. Such an equality constraint would make the problem non-convex. Thanks to Mark L. Stone and Paul Rubin for pointing this out in the comments.
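The flaw is easy to reproduce. A hypothetical CVXPY sketch of the formulation above (with $\varepsilon_{ij} \ge 0$, without which it is unbounded) returns the trivial solution noted in the comments:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 5
d = rng.uniform(1.0, 2.0, (n, n))  # values in [1, 2] automatically
d = (d + d.T) / 2                  # satisfy the triangle inequality
np.fill_diagonal(d, 0.0)

X = cp.Variable((n, 2))
eps = cp.Variable((n, n), nonneg=True)
cons = [cp.norm(X[i] - X[j]) <= d[i, j] - eps[i, j]
        for i in range(n) for j in range(i + 1, n)]
cp.Problem(cp.Minimize(cp.sum(eps)), cons).solve()
print(np.abs(eps.value).max())  # ~0: X = 0, eps = 0 is optimal
```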

Ggouvine
  • No. Although the constraints are (convex) SOCP constraints, the objective (as currently proposed) is concave. That makes this a Concave Programming problem. There are some specialized solvers, such as KNITRO, which have special processing for SOCP constraints, even for problems which overall are not convex. I don't know whether such specialized treatment of SOCP constraints would help for this problem. – Mark L. Stone May 22 '20 at 17:32
  • Thanks for your comment. I don't understand your assertion that the objective is non-convex: it seems to me that minimizing $\sum_{ij} \varepsilon_{ij}$ makes the objective linear (and therefore convex). – Ggouvine May 22 '20 at 17:38
  • $\epsilon_{ij}$ is linear minus a norm. A norm is convex. That makes $\epsilon_{ij}$ concave. – Mark L. Stone May 22 '20 at 17:50
  • Isn't a linear expression (of any sign) plus a norm the definition of an SOCP? The trick is that any norm combined in a linear constraint defines a convex set. – Ggouvine May 22 '20 at 17:57
  • The objective is CONCAVE. Combined with convex constraints, that makes it a Concave Programming problem. – Mark L. Stone May 22 '20 at 18:13
  • I edited to add the formulation. Could you please elaborate on how the LINEAR objective is an issue, or on how the constraint does not define a second-order cone? – Ggouvine May 23 '20 at 07:49
  • The objective function is NOT LINEAR in the decision (optimization) variables. The $\epsilon_{ij}$ are nonlinear, and in fact concave, functions of the decision variables $x_i, y_j$. – Mark L. Stone May 23 '20 at 11:44
  • Ok I understand where you're coming from. Indeed, $\varepsilon = d - |x|$ would be concave. You need to consider $\varepsilon_{ij}$ as decision variables as well. Then the constraint $\varepsilon \geq d - |x|$ defines a convex cone, on which we minimize a linear objective. – Ggouvine May 23 '20 at 12:22
  • NO!!!! You cannot remove the non-convexity like that. Making $\epsilon_{ij}$ decision variables would make the objective linear, but you would need to add non-convex constraint(s) to relate $\epsilon_{ij}$ to the (other) decision variables $x_i, y_j$. Your proposed constraint (i.e., epigraph formulation) $\epsilon \ge d - |x|$ is a non-convex constraint, so it does NOT define a convex cone. The inequality is going in the wrong direction to be convex. – Mark L. Stone May 23 '20 at 13:59
  • I don't think this formulation is equivalent to the original problem. The $\epsilon$ variables must be nonnegative (else this version is unbounded). That being the case, $X=0$, $\epsilon=0$ is a trivial optimal solution here, but it does not approximate the given distances. – prubin May 23 '20 at 14:25
  • Thanks for pointing this out! Indeed, this does not match the original problem; an equality constraint seems to be needed, which is not convex. – Ggouvine May 23 '20 at 18:01