I have a set of features F and a set of items X. Each item x_i is a vector:
(w_i_1, ..., w_i_n)
where w_i_j is the weight of feature f_j in item x_i.
For each item the sum of the weights is arbitrary, but each weight lies in the range [0, 1] (e.g. some vectors have none of the features, and some have all of the features at maximum weight).
Graphically it could be shown as:
(purple: features; red/green: vectors; red/green spots: weights)
For now I compute the weight of the vector s_i as the sum of its components. That gives a bias towards heavy but unbalanced vectors. What metric should I use to prefer more balanced vectors (covering more features) over heavy vectors (covering few features), i.e. green over red?
In other words, I want heavy features not to contribute so much to the final result.
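To make the bias concrete, here is a minimal sketch of the sum-based score. The function name `sum_score` and both example vectors are made up for illustration; they are not taken from my real data.

```python
def sum_score(weights):
    """Current score: plain sum of the feature weights of one item."""
    return sum(weights)


# Hypothetical items over five features (values invented for illustration).
heavy_unbalanced = [1.0, 1.0, 0.0, 0.0, 0.0]  # "red": two features at max
balanced = [0.3, 0.3, 0.3, 0.3, 0.3]          # "green": every feature, low weight

print("heavy/unbalanced:", sum_score(heavy_unbalanced))
print("balanced:        ", sum_score(balanced))
```

With these values the heavy vector gets the higher score even though it covers only two of the five features, which is the opposite of the ranking I want.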
P.S. Initially I am comparing vectors using
sum(over all features i: min(weight of feature i in s_m, weight of feature i in s_n))
$\rho(s_n, s_m) = \sum_{f_i \in F} \min(\omega_i^n, \omega_i^m)$
Now I have a situation where two items that share a single heavy feature match, and I don't want that. E.g. if a car is heavily red and an apple is heavily red, that doesn't mean they are very similar. More matching features should mean higher similarity.
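The problem can be reproduced with a small sketch of the min-based comparison. The function name `overlap`, the feature order and all weights are hypothetical, chosen only to mirror the car/apple example.

```python
def overlap(a, b):
    """Sum over all features of the smaller of the two weights."""
    return sum(min(x, y) for x, y in zip(a, b))


# Hypothetical feature order: [red, round, edible, sweet]
car = [0.9, 0.0, 0.0, 0.0]
apple = [0.9, 0.6, 0.8, 0.5]
pear = [0.0, 0.25, 0.25, 0.25]

print("car vs apple: ", overlap(car, apple))   # one shared heavy feature
print("pear vs apple:", overlap(pear, apple))  # three shared lighter features
```

Here the car matches the apple more strongly than the pear does, although the car and the apple share only one feature while the pear and the apple share three.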
