Proving that K-means corresponds to an EM algorithm?

Question

Just wanted to make sure that my proof is correct and that I am not missing anything in the process. Any thoughts?

" To demonstrate mathematically that the K-means algorithm corresponds to an algorithm of Expectation Maximization, we need to show that the K-means algorithm can be derived from the EM algorithm for a specific choice of the likelihood function and the prior distribution.

The K-means algorithm can be derived from the EM algorithm for a mixture of K Gaussians with equal covariance matrix. In this case, the likelihood function is given by:

$$p(x_n|z_n=k,\mu_k,\Sigma)=\frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)$$

where $x_n$ is the $n$-th data point, $z_n$ is the latent variable indicating the cluster assignment of $x_n$, $\mu_k$ is the mean of the $k$-th cluster, $\Sigma$ is the covariance matrix, and $\sigma^2$ is the common variance.

The prior distribution is given by:

$$p(z_n=k)=\frac{1}{K}$$

which assumes that each cluster is equally likely.

The EM algorithm for this mixture of Gaussians is given by:

Initialize the means $\mu_k$ randomly.
E-step: Compute the posterior probability of the cluster assignment $z_n$ given the data $x_n$ and the current means $\mu_k$:

$$r_{nk}=\frac{\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)}{\sum_{j=1}^K\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_j\|^2\right)}$$

where $r_{nk}$ is the responsibility of the $k$-th cluster for the $n$-th data point.

M-step: Update the means $\mu_k$ using the responsibilities:

$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}$$

Repeat steps 2 and 3 until convergence.

Now, let's show that the K-means algorithm corresponds to this EM algorithm. The K-means algorithm is given by:

Initialize the means $\mu_k$ randomly.
Assignment step: Assign each data point to the closest cluster mean:

$$z_n=\arg\min_k\|x_n-\mu_k\|^2$$

Refitting step: Update the means $\mu_k$ to the average of the data points assigned to them:

$$\mu_k=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$

where $N_k$ is the number of data points assigned to the $k$-th cluster.

We can see that the K-means algorithm corresponds to the EM algorithm for the mixture of Gaussians with equal covariance matrix by setting $\sigma^2=\frac{1}{2}$ and $r_{nk}=1$ if $z_n=k$ and $r_{nk}=0$ otherwise. In this case, the likelihood function reduces to:

$$p(x_n|z_n=k,\mu_k)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\|x_n-\mu_k\|^2\right)$$

which is the Gaussian distribution with unit variance. The prior distribution is given by:

$$p(z_n=k)=\frac{1}{K}$$

which assumes that each cluster is equally likely.

The E-step of the EM algorithm reduces to:

$$r_{nk}=\begin{cases}1 & \text{if }z_n=k\\0 & \text{otherwise}\end{cases}$$

which is equivalent to the assignment step of the K-means algorithm.

The M-step of the EM algorithm reduces to:

$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$

which is equivalent to the refitting step of the K-means algorithm.

Therefore, we have shown that the K-means algorithm corresponds to an algorithm of Expectation Maximization for a specific choice of the likelihood function and the prior distribution."

Please state exactly what is the statement you want to prove along with the necessary assumptions. Only then it's possible to prove or disprove the result. — utobi, Dec 18 '23 at 17:27
the literal prompt i was given is "prove that k-means corresponds to an EM algorithm". — Naomi Pomella, Dec 18 '23 at 22:18

score 3 · Answer 1 · answered Jan 11 '24 at 15:25

3

The proof is wrong because $\sigma^2 = 1/2$ allows $r_{nk}$ to take on values between 0 and 1. For the correct proof see Deriving K-means algorithm as a limit of Expectation Maximization for Gaussian Mixtures

answered Jan 11 '24 at 15:25

Tom Minka

7,060

Proving that K-means corresponds to an EM algorithm?

1 Answers1