How to choose a more "balanced" metric?

Question

I have a set of features F and a set of items X. Each item x_i is a vector:

(w_i_1, ..., w_i_n)

where w_i_j is the weight of feature f_j in item x_i.

The sum of weights for each item is arbitrary, but each weight lies in the range [0, 1] (e.g. some vectors have none of the features, and others have all of the features at maximum weight).

Graphically it could be shown as:

(purple are features, red/green - vectors, red/green spots - weights)

For now I compute the weight of the vector s_i as the sum of its components. That gives a bias towards heavy but unbalanced vectors. What metric should I use to prefer more balanced vectors (covering more features) over heavy vectors (covering few features), i.e. green over red?

In other words, I want heavy features not to contribute so much to the final result.
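To make the bias concrete, here is a minimal sketch of the current scoring. The two vectors and the name `total_weight` are made up for illustration; they are not taken from my real data.

```python
def total_weight(vector):
    """Current score: plain sum of the feature weights."""
    return sum(vector)


# Hypothetical items over four features, each weight in [0, 1].
heavy = [1.0, 1.0, 0.0, 0.0]      # covers two features at maximum weight ("red")
balanced = [0.5, 0.5, 0.5, 0.25]  # covers all four features moderately ("green")

print(total_weight(heavy))     # 2.0
print(total_weight(balanced))  # 1.75
```

The heavy vector scores higher even though it covers only half of the features, which is the ordering I would like to reverse.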

P.S. Initially I am comparing vectors by summing, over all features, the minimum of the two items' weights for that feature:

$\rho(s_n, s_m) = \sum_{f_i \in F} \min(\omega_i^n, \omega_i^m)$

where $\omega_i^n$ is the weight of feature $f_i$ in item $s_n$.

Now I have a situation where two items that share a single heavy feature match strongly, and I don't want that. E.g. if a car is heavily red and an apple is heavily red, that doesn't mean they are very similar. More matching features should mean higher similarity.
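A small sketch of that comparison, assuming four hypothetical features (red, round, edible, has wheels) and invented weights, shows the effect. The function name `min_overlap` is just a label for the sum-of-minimums formula.

```python
def min_overlap(a, b):
    """Sum over features of the smaller of the two weights."""
    return sum(min(wa, wb) for wa, wb in zip(a, b))


# Hypothetical feature order: (red, round, edible, has_wheels)
car = [1.0, 0.0, 0.0, 1.0]
apple = [1.0, 0.5, 1.0, 0.0]
cherry = [0.25, 0.25, 0.25, 0.0]

print(min_overlap(car, apple))     # 1.0  (one shared heavy feature: red)
print(min_overlap(cherry, apple))  # 0.75 (three shared moderate features)
```

With these made-up numbers, the car and the apple match more strongly on "red" alone than the cherry and the apple do on three shared features, which is the opposite of what I want.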

Please use math typesetting. It will make your question easier to read and more likely to attract answers. http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference — Sycorax, Sep 06 '16 at 19:05
@GeneralAbrial Tnx for sharing. Will migrate all formulas to it. — Denis Kulagin, Sep 06 '16 at 19:13
Could you explain what a "heavy" vector or attribute is and how you measure the "balance" of a vector? — whuber, Sep 07 '16 at 13:11
