1

I need to find out to what extent two shapes are similar. I mean I've got two vectors of points - and just that, no shadows, color or whatever - simplest case. Two triangles are the perfect example. Of course this has to work regardless of what the angle shift is and regardless of their scale (only aspect ratio matters).

I know there is SIFT method but this seems too complicated for this simple use case. Is there more suitable method?

Updated question

The problem is I don't know the optimal measure too. I mean I have to check if the two shapes are similar not in terms of size or angle but shape. So in terms of those triangles what matters is angle. Of course these shapes not need to be so trivial - they can consist of curves where no angles are present. So far I have found that I can use Procrustes algorithm to eliminate the offset, size and angle but what then? There is Fréchet distance but it seems it won't work for any composite shapes - a square with both diagonals is a good example. I mean this isn't going to work when there are several ways of drawing a figure, is it?

I was also wondering if those shapes can be represented by graphs (so finite number of points) and then I could see what is the length of minimal path going through both points (hamiltonian cycle?) - but I'm not sure if it would work in all scenarios.

While Procrutes algorithm will work with any shape I can think of, Frechet distance is more problematic.

It works well for any shape which can be drawn in one stroke so there are no two ways of doing this. A good example is a triangle

enter image description here

the problem is visible here

enter image description here

as there are more than 1 way of ordering the pixels to get the desired shape. Is there a way to handle this as well with Frechet algorithm? If not, what algorithm should be used?

kboom
  • 133
  • 3
  • We're not likely to be able to answer this question without a notion of what you mean by similarity. There are many possible similarity measures; without specifying one, it's not clear what you are looking for. Have you looked at RANSAC? – D.W. Apr 08 '15 at 05:40
  • Please see my updated question. I'll look into it. – kboom Apr 08 '15 at 06:18
  • One way to think about similarity is to think about what kinds of transformations you think you want to allow without any penalty to the similarity metric. (Translation? Rotation? Scaling?) You can also think about what the "error model" is, i.e., what errors you expect might routinely happen and that shouldn't be penalized too badly (e.g., a point is deleted? a point is moved?). This might help you narrow down the metric. Another way to think about it is to think about how you will use the results of the metric, or look at examples of shapes you do/don't want to consider similar. – D.W. Apr 08 '15 at 18:38
  • Can you give an example to illustrate where Procrustes analysis and Procrustes distance is not enough? – D.W. Apr 08 '15 at 18:41
  • I would like to disregard translation, rotation and scaling and focus on the proportions of the shapes. The question is what data structure should I use to store shapes in the first place. A list of points (x and y coordinates) won't do any good since the order of the drawing does not matter. How many points should be there to ensure no corners are deformed? If it is a set of points I guess one or two points different are unlikely to happen and if so they should be left out - the user draws curves, not points so if a shape is malformed it is not a matter of single points. – kboom Apr 08 '15 at 18:47
  • Please see me updated question. Procrustes distance and analysis should work well, but the problem is Frechet won't – kboom Apr 08 '15 at 18:57
  • OK. I'm a bit confused, then. If Procrustes distance should work well, does that mean you've solved your own problem? If so, you might write an answer (it's OK to self-answer your question if you managed to solve it yourself). If there's a problem with Procrustes distance, what is the problem? I confess I'm also a bit confused about what the problem is that the triangle-square example is supposed to illustrate. Do you want the triangle to be considered similar to the square or not? What do you consider the vector of points to be, for the square? Seems to me the diagonal is irrelevant. – D.W. Apr 08 '15 at 19:03

1 Answers1

0

One approach is to apply Procrustes analysis to superimpose the shapes, then use the sum of the squares distances between each pair of corresponding points (i.e., a L2 norm). This is apparently sometimes called Procrustes distance. Here you will use the set of "corners" of the shape as your set of points.

One disadvantage of this approach is that it is not robust in case one or a few points are deleted: maybe shape $S_2$ is the same as shape $S_1$ except that it is missing a single point. If you want to be robust to this kind of situation, you could combine RANSAC with Procrustes distance. Pick a random sample of 3 consecutive points from $S_1$, and 3 consecutive points from $S_2$; use Procrustes analysis to find the best map $M$ to superimpose those points; apply $M$ to $S_1$ to get $M(S_1)$, and use some kind of edit distance to measure the similarity between the vector of points of $M(S_1)$ and vector of points of $S_2$ (you might assign a large cost $c_\text{del}$ to deleting a point, a large cost $c_\text{ins}$ to inserting a point, and make the cost of matching a point $p$ to a point $q$ be the squared distance $||p-q||^2$ between the two points, then apply standard dynamic programming algorithms to compute the edit distance). Now repeatedly apply this sampling procedure. Keep the best map $M$ you find, where you can determine what you consider best (e.g., the lowest edit distance; or some other metric).

There are probably many other possible distances. For instance, instead of the sum of squared distances, another possibility is to compute $\theta_i$ (the angle between $p_i,p_{i+1},p_{i+2}$ from the first shape) and $\theta'_i$ (the angle between $q_i,q_{i+1},q_{i+2}$ from the second shape), then some norm on these angles: e.g., $\sum_i (\theta_i - \theta'_i)^2$. This too could be made robust to a few mismatched points using RANSAC. There isn't enough information in the question to select between these, so without more knowledge, the best I can suggest is to simply try these on a set of examples and see which one seems to give results that most closely match what you are looking for.

None of this will work if your shapes $S_1$ and $S_2$ look "similar" but they have a very different number of points. For instance, you mention the case where $S_1$ is a perfect square (with crisp edges where the angle between all 4 pairs of 3 points is exactly 90 degrees) made out of 20 points total and $S_2$ is a user-drawn square consisting of 300 points without crisp corners and no 3 consecutive points which align at 90 degree angle. The methods above will not work at all for that kind of situation.

Incidentally, you mention the issue of shapes that can be drawn in one vs those that cannot. I think your representation of a shape as a vector of points implicitly assumes that the shape can be drawn in one stroke (since the points appear in the vector in the order that they're visited by the stroke, and each point is assumed to be connected to the one after it), so if you want to deal with shapes that cannot be drawn in a single stroke, you'll probably need to find a different representation of the shape.

D.W.
  • 159,275
  • 20
  • 227
  • 470
  • The second approach seems okay for my needs but what happens when there is a perfect square (with crisp edges where the angle between all 4 pairs of 3 points is exactly 90 degrees) made out of 20 points total as a reference and the user-drawn square consisting of 300 points which won't have crisp corners and we cannot pick 3 consecutive points which align at 90 degree angle. Will it work also for this scenario? I mean in the second case there is no specific curves to base Procrustes distance on – kboom Apr 08 '15 at 20:45
  • @kboom, that's a good example. The methods outlined in my answer won't work for that kind of situation: they'll totally break. If you expect that kind of situation to be common, we could look at metrics that might work well for that case -- this kind of example is helpful in guiding the design of metrics. – D.W. Apr 08 '15 at 21:35