
As part of my studies, I’m trying to cluster co-occurrences of URLs and tags in data from Delicious. I found a promising method for this in the paper “Emergent Semantics from Folksonomies: A Quantitative Study” (pages 6-13). It uses a Separable Mixture Model (SMM, described in the paper “Statistical Models for Co-occurrence Data”, pages 2-4) to model the data, together with an adapted EM algorithm to fit the model to the observed data.

I coded the algorithm in Java and ran it on a small piece of real data from Delicious. Unfortunately, the results did not seem correct: every tag had the same probability of belonging to each concept (although that constant probability varied from tag to tag).

Now, while this problem could have come from me simply coding the adapted EM algorithm wrong, I would also like to rule out the possibility of incorrectly initialized variables. Since I didn’t know any better way to do it, I simply initialized all the $R_{r\alpha}$ (the variables denoting the probability that co-occurrence $r$ arose from concept $\alpha$) to be equal, $1/K$ ($K$ being the number of concepts).
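
For concreteness, the initialization I used looks roughly like this (a minimal sketch, not my actual code; `L`, `K` and `R` are my own variable names, not from the papers):

```java
// Minimal sketch of the flat initialization described above.
// L = number of co-occurrences, K = number of concepts.
int L = 1000;  // example value
int K = 10;    // example value
double[][] R = new double[L][K];           // R[r][a]: responsibility of concept a for co-occurrence r
for (int r = 0; r < L; r++) {
    java.util.Arrays.fill(R[r], 1.0 / K);  // every concept starts out equally responsible
}
```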

My question is two-fold:

1. Could the flat results come from the flat variable initialization?
2. How should I initialize the variables for the EM algorithm in this case?

DJ Pirtu

2 Answers


Whenever I implement a new algorithm, I get myself an easy, interpretable dataset on which I can try it out. This has several advantages, such as short runtimes (use a small dataset) and easy visualization (keep whatever you want to plot in dimension 2 or 3).

Of course, the behavior you see could come from the dataset itself, but that seems unlikely to me. A standard way to initialize variables is to randomize them uniformly in an interval like [-1, 1] or [-0.1, 0.1]; I have also seen draws from a standard normal.
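
Since your $R_{r\alpha}$ are responsibilities, a random start additionally has to be non-negative and sum to one over the concepts for each co-occurrence. A minimal sketch in Java (untested; `L`, `K` and `R` are stand-ins for your variables):

```java
// Sketch: random initialization of responsibilities R[r][a].
// Each row must be a probability distribution over the K concepts,
// so draw positive uniform values and normalize per co-occurrence.
java.util.Random rng = new java.util.Random(42);  // fixed seed for reproducibility
double[][] R = new double[L][K];
for (int r = 0; r < L; r++) {
    double sum = 0.0;
    for (int a = 0; a < K; a++) {
        R[r][a] = rng.nextDouble() + 1e-6;        // small offset avoids exact zeros
        sum += R[r][a];
    }
    for (int a = 0; a < K; a++) {
        R[r][a] /= sum;                           // row now sums to 1
    }
}
```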

I am not sure whether it works for SMMs, but for Gaussian mixture models and PCA mixtures it makes sense to run a few iterations of K-Means before going into EM (and use the resulting cluster assignments as responsibilities). Maybe you want to try that.

bayerj
  • Well, I do not know how to fit K-Means, which assumes the observations are d-dimensional real vectors, to co-occurrence data. Also, since we're talking about probabilities here, random initialization can be a bit of a pain, even though I did end up using it. But doing some table testing with a dataset of just 5 observations did clear things up for me. Other than that, good general advice. – DJ Pirtu Apr 19 '11 at 07:57

To answer the first part of my question (does a flat initial guess lead to flat results?): yes. Not only does a flat guess flatten the result, it also makes it unchanging from one iteration to the next (a fact I had missed thanks to a small error in my algorithm).

Here's a proof:
Assume that $\langle R_{r\alpha} \rangle^{(t)} = x$ for all $r$ and $\alpha$.
$\hat{\pi}_{\alpha}^{(t)} = \frac1{L}\sum_{r=1}^{L}\langle R_{r\alpha} \rangle^{(t)} = \frac{Lx}{L} = x$ for each $\alpha$.

$\hat{p}_{i|\alpha}^{(t)} = \frac1{L\hat{\pi}_\alpha^{(t)}} \sum_{r:i(r)=i} \langle R_{r\alpha} \rangle^{(t)} = \frac{n_ix}{Lx} = \frac{n_i}{L}$ for each $\alpha$, where $n_i$ is the number of co-occurrences $r$ with $i(r)=i$.

$\hat{q}_{j|\alpha}^{(t)} = \frac1{L\hat{\pi}_\alpha^{(t)}} \sum_{r:j(r)=j} \langle R_{r\alpha} \rangle^{(t)} = \frac{n_jx}{Lx} = \frac{n_j}{L}$ for each $\alpha$, where $n_j$ is the number of co-occurrences $r$ with $j(r)=j$.

From these, we can calculate
$\langle R_{r\alpha} \rangle^{(t+1)} = \frac{\hat{\pi}_{\alpha}^{(t)}\hat{p}_{i(r)|\alpha}^{(t)}\hat{q}_{j(r)|\alpha}^{(t)}}{\sum_{v=1}^K\hat{\pi}_{v}^{(t)}\hat{p}_{i(r)|v}^{(t)}\hat{q}_{j(r)|v}^{(t)}} = \frac{x\,n_{i(r)}n_{j(r)}/L^2}{Kx\,n_{i(r)}n_{j(r)}/L^2} = \frac1{K}$ for all $r$ and $\alpha$.

And thus, $\langle R_{r\alpha} \rangle$ is again constant for the next round of iteration.
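
The fixed point is also easy to check numerically: running a single update of the equations above from a flat start on a tiny toy dataset returns every responsibility as $1/K$. A rough sketch of such a check (the toy data and all names are made up, not my actual Delicious code):

```java
// Sketch: one EM update of the SMM from a flat start, to check that the
// responsibilities stay at 1/K. Toy data; i(r) and j(r) are stored in arrays.
int K = 3;                               // concepts
int[] iOf = {0, 0, 1, 1, 2};             // i(r): URL index of each co-occurrence
int[] jOf = {0, 1, 1, 2, 2};             // j(r): tag index of each co-occurrence
int L = iOf.length;
int nUrls = 3, nTags = 3;

double[][] R = new double[L][K];
for (double[] row : R) java.util.Arrays.fill(row, 1.0 / K);   // flat start

// M-step: pi_a, p_{i|a}, q_{j|a} from the current responsibilities
double[] pi = new double[K];
double[][] p = new double[nUrls][K], q = new double[nTags][K];
for (int a = 0; a < K; a++) {
    for (int r = 0; r < L; r++) pi[a] += R[r][a] / L;
    for (int r = 0; r < L; r++) p[iOf[r]][a] += R[r][a] / (L * pi[a]);
    for (int r = 0; r < L; r++) q[jOf[r]][a] += R[r][a] / (L * pi[a]);
}

// E-step: new responsibilities; with a flat start every entry comes out as 1/K
for (int r = 0; r < L; r++) {
    double norm = 0.0;
    for (int a = 0; a < K; a++) norm += pi[a] * p[iOf[r]][a] * q[jOf[r]][a];
    for (int a = 0; a < K; a++) R[r][a] = pi[a] * p[iOf[r]][a] * q[jOf[r]][a] / norm;
    System.out.println(java.util.Arrays.toString(R[r]));      // each row ≈ [0.333, 0.333, 0.333]
}
```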

DJ Pirtu