
I have a set of features F and a set of items X. Each item x_i is a vector:

$x_i = (w_{i,1}, \dots, w_{i,n})$

where $w_{i,j}$ is the weight of feature $f_j$ in item $x_i$.

For each item the sum of the weights is arbitrary, but each individual weight lies in the range [0, 1] (so a vector may have none of the features, or all of the features at maximum weight).

Graphically it could be shown as:

[figure: features with the weights of a red and a green vector]

(purple are features, red/green - vectors, red/green spots - weights)

For now I compute the weight of the vector s_i as the sum of its components. That gives a bias towards heavy but unbalanced vectors. What metric should I use to prefer more balanced vectors (covering more features) over heavy vectors (covering few features), i.e. green over red?
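To make the bias concrete, here is a minimal sketch of the sum-based score. The two example vectors and the helper `coverage` are my own illustrative assumptions, not data from the actual problem:

```python
def sum_score(weights):
    """Current score: plain sum of the feature weights."""
    return sum(weights)


def coverage(weights):
    """Hypothetical helper: number of features with a non-zero weight."""
    return sum(1 for w in weights if w > 0)


# Made-up vectors over five features.
red = [1.0, 1.0, 0.0, 0.0, 0.0]        # heavy, but covers only two features
green = [0.25, 0.25, 0.25, 0.25, 0.25]  # lighter, but covers all five

print(sum_score(red), coverage(red))      # sum score and coverage of red
print(sum_score(green), coverage(green))  # sum score and coverage of green
```

With these numbers the red vector gets the higher sum even though the green one covers more features, which is the opposite of the ordering I want.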

In other words, I want heavy features not to contribute so much to the final result.
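As an illustration of the kind of behaviour I am after (not a method I have settled on), one could pass each weight through a concave function before summing, so that large weights are damped. The choice of the square root and the example vectors are purely assumptions for this sketch:

```python
import math


def damped_score(weights):
    """Illustrative only: sum of square roots, so heavy weights count for less.

    The square root is an arbitrary concave choice, assumed for this sketch.
    """
    return sum(math.sqrt(w) for w in weights)


# Made-up vectors over five features.
red = [1.0, 1.0, 0.0, 0.0, 0.0]        # heavy, unbalanced
green = [0.25, 0.25, 0.25, 0.25, 0.25]  # lighter, balanced

print(damped_score(red))
print(damped_score(green))
```

Under this damping the balanced green vector scores higher than the heavy red one, whereas the plain sum ranks them the other way round. Whether this is a sensible metric is exactly what I am asking.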

P.S. Initially I compared vectors using the sum, over all features, of the smaller of the two items' weights for that feature:

$\rho(s_n, s_m) = \sum_{f_i \in F} \min(\omega_i^n, \omega_i^m)$

where $\omega_i^n$ is the weight of feature $f_i$ in item $s_n$.

Now I run into the situation where two items match just because they share one heavy feature, and I don't want that. E.g. if a car is heavily red and an apple is heavily red, it doesn't mean that they are very similar. More matching features should mean higher similarity.
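The problem with the overlap measure $\rho$ can be reproduced with a small sketch. The feature names and all weights below are made up for illustration:

```python
def min_overlap(a, b):
    """rho: sum over features of the smaller of the two weights."""
    return sum(min(wa, wb) for wa, wb in zip(a, b))


# Hypothetical features: red, round, edible, has_wheels
car = [1.0, 0.0, 0.0, 1.0]
apple = [1.0, 0.5, 1.0, 0.0]

# Two made-up items that share all four features at a moderate weight.
item_a = [0.25, 0.25, 0.25, 0.25]
item_b = [0.25, 0.25, 0.25, 0.25]

print(min_overlap(car, apple))      # overlap from the single shared heavy feature
print(min_overlap(item_a, item_b))  # overlap from four shared moderate features
```

Both pairs get the same overlap, although the car and the apple share only "red" while the second pair matches on every feature. I would like the second pair to come out as more similar.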

  • 1
    Please use math typesetting. It will make your question easier to read and more likely to attract answers. http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference – Sycorax Sep 06 '16 at 19:05
  • 1
    @GeneralAbrial Tnx for sharing. Will migrate all formulas to it. – Denis Kulagin Sep 06 '16 at 19:13
  • Could you explain what a "heavy" vector or attribute is and how you measure the "balance" of a vector? – whuber Sep 07 '16 at 13:11

0 Answers