I am trying to cluster Facebook users based on their likes.
I have two problems. First, since Facebook has no "dislike", all I have are likes (1) for some items; for the rest of the items the value is unknown, not necessarily 0 (which would correspond to a dislike). If I use 0 for the unknowns, I think my clusters will be biased. Any suggestions?
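To make the concern concrete, here is a minimal sketch with two made-up users (the vectors and variable names are hypothetical, not my real data). It compares a measure that counts shared zeros as agreement (Hamming, i.e. simple matching) with one that ignores them (Jaccard), using SciPy:

```python
import numpy as np
from scipy.spatial.distance import hamming, jaccard

# Hypothetical like vectors over 10 items: True = liked, False = unknown
# (coded as 0, but not necessarily a dislike).
user_a = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
user_b = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)

# Hamming distance is the fraction of all items on which the users differ,
# so the six shared zeros count as agreement: 4 mismatches / 10 items.
hamming_dist = hamming(user_a, user_b)

# Jaccard distance only considers items liked by at least one of the two
# users, so shared zeros are ignored: 4 mismatches / 4 such items.
jaccard_dist = jaccard(user_a, user_b)

print(hamming_dist, jaccard_dist)
```

If this reasoning is right, coding unknowns as 0 may hurt less with an asymmetric measure like Jaccard than with a symmetric one, although it still cannot distinguish "unknown" from "dislike".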
Second, suppose I assign 0 to the unknown items and cluster the users with a hierarchical clustering method and a binary distance measure such as Jaccard, Tanimoto, ...
How can I evaluate the clustering results? The within-cluster and between-cluster SSE are not appropriate for binary data. If I use median centers, I'm afraid most of them will be all zeros, since my feature matrix is sparse. So what would be a good way to evaluate the clusters?
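For reference, this is roughly the setup I have in mind, as a sketch on synthetic data (the matrix `likes`, the two planted groups, the average linkage, and the cut into two clusters are all assumptions for illustration). The silhouette score on the precomputed Jaccard distances is just one candidate I am aware of that does not need cluster centers; I am not sure it is the right choice here:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical data: 40 users x 30 items, with two planted groups that
# like disjoint blocks of items.
likes = np.zeros((40, 30), dtype=bool)
likes[:20, :15] = rng.random((20, 15)) < 0.6
likes[20:, 15:] = rng.random((20, 15)) < 0.6
# Make sure every user has at least one like, so no row is all zeros.
likes[:20, 0] = True
likes[20:, 15] = True

# Condensed vector of pairwise Jaccard distances between users.
dist = pdist(likes, metric="jaccard")

# Agglomerative clustering with average linkage, cut into at most 2 clusters.
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")

# Silhouette computed from the distance matrix itself (no centers, no SSE).
score = silhouette_score(squareform(dist), labels, metric="precomputed")
print(labels, score)
```

Because the score only uses pairwise distances, it avoids the problem of median centers collapsing to zero on sparse data, but it inherits whatever bias the 0-coding puts into the distances.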
Let's pretend our data is high-dimensional, sparse, and binary, but the data and our objectives are completely unrelated to SNA. We want to cluster objects based on a large number of binary features. What's the approach, and is it possible to address the OP's concern about confusing 0s with NaNs?
– Aman Dec 03 '12 at 18:29