
I need a similarity measure between two arrays of data. Call it whatever you like: similarity, difference, correlation, or something else.

For example:

 1, 2, 3, 4, 5 < Series 1
 2, 3, 4, 5, 6 < Series 2

These should be far more similar to each other than the following two series:

 1, 2, 3, 4, 5 < Series 1
 1, 1, 5, 8, 7 < Series 2

Any suggestions?

Is there source code available for it?

Rob Kennedy
EBAG
  • This has nothing to do with C++ and everything to do with math. – Nikolai Fetissov Dec 03 '11 at 21:17
  • Maybe better on [Stats.SE](http://stats.stackexchange.com/). – dmckee --- ex-moderator kitten Dec 03 '11 at 21:17
  • EBAG: this is better than your [last question](http://stackoverflow.com/questions/8370857/efficient-algorithm-to-calculate-correlation-between-two-arrays), but still hard to answer precisely. Maybe try looking [here](http://en.wikipedia.org/wiki/Category:String_similarity_measures). The problem is "similarity" is a human concept, not a technical one. To choose an algorithm you need to be more specific about the data, the use of the similarity algorithm, and your expectations. – tenfour Dec 03 '11 at 21:20
  • @NikolaiNFetissov: I think he wants an answer in C++ – Daniel Dec 03 '11 at 21:25

3 Answers

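You can calculate the sample Pearson product-moment correlation coefficient: "The above formula suggests a convenient single-pass algorithm for calculating sample correlations". Write a loop to calculate sum(xi), sum(yi), sum(xi^2), sum(yi^2), and sum(xi*yi). Then insert these sums into the formula, where n is the number of elements in each series: r = (n*sum(xi*yi) - sum(xi)*sum(yi)) / (sqrt(n*sum(xi^2) - sum(xi)^2) * sqrt(n*sum(yi^2) - sum(yi)^2)).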
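
The loop described above could be sketched in C++ roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `pearson`, the use of `std::vector<double>`, and the choice to throw on unusable input are all assumptions made for the example. The single-pass form can also lose precision when the values are large relative to their spread.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: single-pass sample Pearson correlation of two
// equally long series, using the five running sums named in the answer.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2) {
        throw std::invalid_argument("series must have equal length >= 2");
    }
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    // n*sum(x^2) - sum(x)^2 is n^2 times the variance of x; it is zero
    // for a constant series, where the correlation is undefined.
    const double vx = n * sxx - sx * sx;
    const double vy = n * syy - sy * sy;
    if (vx <= 0.0 || vy <= 0.0) {
        throw std::invalid_argument("correlation is undefined for a constant series");
    }
    return (n * sxy - sx * sy) / std::sqrt(vx * vy);
}
```

For the two pairs in the question, `pearson({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})` gives 1 and `pearson({1, 2, 3, 4, 5}, {1, 1, 5, 8, 7})` gives about 0.91, so the first pair does score higher, although the second pair is still strongly correlated.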

kol
  • Would you mind going into more detail? Perhaps with an example? Explain to me like I'm five ;) – Anders Nov 06 '13 at 12:14
  • Found this excellent answer with code, although C#: http://stackoverflow.com/questions/17447817/correlation-of-two-arrays-in-c-sharp – Anders Nov 06 '13 at 12:29

Another way to do this is to calculate the mutual information between the two series. There is a toolbox for this, for MATLAB and C: http://www.cs.man.ac.uk/~pococka4/MIToolbox.html
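
To make the idea concrete without depending on the toolbox, here is a rough C++ sketch of the plain "plug-in" estimate of mutual information for two integer series, treating each distinct value as a discrete symbol. It does not use the MIToolbox API; the function name `mutual_information_bits` and the restriction to `int` data are assumptions for illustration only. On short series where most values are distinct, this estimate is dominated by sample size, so treat the numbers with caution.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: empirical mutual information (in bits) between
// two equally long integer series, pairing elements by position.
double mutual_information_bits(const std::vector<int>& x,
                               const std::vector<int>& y) {
    if (x.size() != y.size() || x.empty()) {
        throw std::invalid_argument("series must be non-empty and equally long");
    }
    // Each observation contributes weight 1/n to the empirical distributions.
    const double w = 1.0 / static_cast<double>(x.size());
    std::map<int, double> px, py;                   // marginal probabilities
    std::map<std::pair<int, int>, double> pxy;      // joint probabilities
    for (std::size_t i = 0; i < x.size(); ++i) {
        px[x[i]] += w;
        py[y[i]] += w;
        pxy[{x[i], y[i]}] += w;
    }
    // I(X;Y) = sum over observed (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    double mi = 0.0;
    for (const auto& kv : pxy) {
        const double joint = kv.second;
        mi += joint * std::log2(joint / (px[kv.first.first] * py[kv.first.second]));
    }
    return mi;
}
```

With the question's data, the pair `{1, 2, 3, 4, 5}` and `{2, 3, 4, 5, 6}` yields about 2.32 bits, and the pair `{1, 2, 3, 4, 5}` and `{1, 1, 5, 8, 7}` yields about 1.92 bits.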

cody

If your definition of similarity is how many elements the two series have in common, you can use set intersection:

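#include <algorithm>
#include <iterator>
#include <set>

std::multiset<int> Series1 = { 1, 2, 3, 4, 5 };
std::multiset<int> Series2 = { 2, 3, 4, 5, 6 };
std::multiset<int> Intersection;

// std::multiset has no push_back, so std::back_inserter does not compile here;
// std::inserter calls insert() instead.
std::set_intersection(Series1.begin(), Series1.end(),
                      Series2.begin(), Series2.end(),
                      std::inserter(Intersection, Intersection.begin()));

int similarity = static_cast<int>(Intersection.size()); // = 4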
Daniel