
I am trying to find a similarity measure for a very specific problem.

I have a list of cities with certain characteristics, for example a bilingual school, a fire station, etc. Each city is represented as a binary vector in which each element corresponds to one characteristic.

On the other side, I have prospective citizens. Each citizen gives every characteristic a score between 0 and 1 indicating how important that characteristic is to them.

So I could have:

Citizen 1 = {0.83, 0.3, 0, 1}

City 1 = {1, 0, 0, 1}

City 2 = {1, 0, 1, 1}

I would like to find the best city for each citizen. What matters is finding cities that have a 1 in the characteristics the citizen cares about. We don't mind if a city has a 0 in a characteristic the citizen does not care about, or cares little about: such characteristics should get little or no weight.

I have tried some distances (e.g., Hamming, cosine), but they give weight to characteristics where both the citizen and the city have a zero.

What would be a good similarity measure to use in this case?

Thanks

DroppingOff

1 Answer


I assume by "similarity" you mean some score that measures how well a given city matches a given citizen. Think of each weight as the extent to which a citizen is happy when a city has the corresponding feature, or unhappy when it doesn't. Assuming happiness and unhappiness are both proportional to the weight, here's one possible score you could use:

Let binary vector $x = [x_1, \dots, x_n]$ represent a city, where $x_i \in \{0, 1\}$ denotes whether or not the city has feature $i$. Let weight vector $w = [w_1, \dots, w_n]$ represent the preferences of a citizen, where $w_i \in [0, 1]$ denotes how much the citizen cares about the presence of feature $i$. For each feature, we add happiness $w_i$ when feature $i$ is present, and subtract happiness $w_i$ when feature $i$ is absent. The happiness of the citizen with the city is then given by the dot product:

$$h(x, w) = w \cdot (2x - 1)$$
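As a quick illustration, here is a minimal Python sketch of $h$ applied to Citizen 1 and the two cities from the question. The function name `happiness` and the dictionary layout are hypothetical choices for this example, not part of the original answer.

```python
import math

def happiness(x, w):
    """h(x, w) = w . (2x - 1): add w_i if feature i is present, subtract w_i if absent."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * (2 * xi - 1) for xi, wi in zip(x, w))

# Data from the question.
citizen_1 = [0.83, 0.3, 0.0, 1.0]
cities = {"City 1": [1, 0, 0, 1], "City 2": [1, 0, 1, 1]}

scores = {name: happiness(x, citizen_1) for name, x in cities.items()}
best = max(scores, key=scores.get)  # first city with the highest score

# The cities differ only in feature 3, which Citizen 1 weights at 0,
# so that feature has no influence and the two scores coincide.
tie = math.isclose(scores["City 1"], scores["City 2"])
print(scores, best, tie)
```

Both cities score about 1.53, which shows the intended property: a feature with weight 0 contributes nothing, whether or not the city has it.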

Of course, $2x-1$ is simply a copy of $x$ with the zeros replaced by negative ones. Since $h$ is a simple dot product, you could easily relate it to Euclidean or cosine distance if you were so inclined.
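One way to spell out that relation is the following short derivation, added here for illustration. Write $s = 2x - 1$. Every $s_i \in \{-1, 1\}$, so $\|s\|^2 = n$, and

$$\|w - s\|^2 = \|w\|^2 - 2\, w \cdot s + \|s\|^2 = \|w\|^2 + n - 2\,h(x, w)$$

and, for $w \neq 0$,

$$\cos(w, s) = \frac{w \cdot s}{\|w\|\,\|s\|} = \frac{h(x, w)}{\sqrt{n}\,\|w\|}.$$

For a fixed citizen, $\|w\|$ and $n$ do not depend on the city. Ranking cities by decreasing $h$ therefore gives the same order as ranking them by increasing Euclidean distance between $w$ and $2x - 1$, or by decreasing cosine similarity between them. These are distances to $2x - 1$, not to the original $x$.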

user20160
  • Hi, I have tried cosine and Euclidean, but they take into account characteristics to which the citizen gives zero or low importance. If the citizen did not give importance to a characteristic, I would like to give it no weight, or only a little weight, in the final outcome. I want it to be ignored, or more or less ignored, whether or not the city has it. – DroppingOff Jun 15 '18 at 10:02
  • $h$ as I described it should have the properties you want. What I meant about Euclidean/cosine distance is not that you should use the Euclidean/cosine distance between the original $x$ and $w$ (clearly that won't have the desired properties). Rather, because all of these measures contain dot products between $w$ and $x$, it's easy to derive expressions that relate them. – user20160 Jun 15 '18 at 10:18
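To score many citizens against many cities at once, $h$ can be computed as a single matrix product. This is a sketch that assumes NumPy is available. The helper name `score_matrix` and "Citizen 2" are hypothetical and were added only for illustration. The final assertion checks numerically that the squared Euclidean distance between $w$ and $2x - 1$ equals $\|w\|^2 + n - 2h(x, w)$, where $n$ is the number of features.

```python
import numpy as np

def score_matrix(W, X):
    """Return H with H[c, k] = h(X[k], W[c]) = W[c] . (2 X[k] - 1).

    W: (n_citizens, n_features) importance weights in [0, 1].
    X: (n_cities, n_features) binary feature indicators.
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    signs = 2.0 * X - 1.0   # absent feature -> -1, present feature -> +1
    return W @ signs.T      # shape (n_citizens, n_cities)

W = np.array([[0.83, 0.3, 0.0, 1.0],   # Citizen 1 from the question
              [0.0, 0.0, 1.0, 0.2]])   # hypothetical Citizen 2
X = np.array([[1, 0, 0, 1],            # City 1
              [1, 0, 1, 1]])           # City 2

H = score_matrix(W, X)
best_city = H.argmax(axis=1)   # index of the highest-scoring city for each citizen

# Sanity check: ||w - (2x - 1)||^2 == ||w||^2 + n - 2 h(x, w),
# so smaller distance to 2x - 1 means larger h.
signs = 2.0 * X - 1.0
sq_dist = ((W[:, None, :] - signs[None, :, :]) ** 2).sum(axis=2)
n = X.shape[1]
assert np.allclose(sq_dist, (W ** 2).sum(axis=1)[:, None] + n - 2 * H)
```

`best_city` holds one city index per citizen. The hypothetical Citizen 2 cares mostly about the third feature, so it is assigned City 2 (index 1).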