
Just wanted to make sure that my proof is correct and that I am not missing anything in the process. Any thoughts?

" To demonstrate mathematically that the K-means algorithm corresponds to an algorithm of Expectation Maximization, we need to show that the K-means algorithm can be derived from the EM algorithm for a specific choice of the likelihood function and the prior distribution.

The K-means algorithm can be derived from the EM algorithm for a mixture of $K$ Gaussians that share a fixed isotropic covariance matrix $\Sigma = \sigma^2 I$. In this case, the likelihood function is given by:

$$p(x_n|z_n=k,\mu_k,\Sigma)=\frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)$$

where $x_n$ is the $n$-th data point, $z_n$ is the latent variable indicating the cluster assignment of $x_n$, $\mu_k$ is the mean of the $k$-th cluster, $\Sigma = \sigma^2 I$ is the shared covariance matrix, $\sigma^2$ is the common variance, and $D$ is the dimensionality of the data.

The prior distribution is given by:

$$p(z_n=k)=\frac{1}{K}$$

which assumes that each cluster is equally likely.

The EM algorithm for this mixture of Gaussians is given by:

  1. Initialize the means $\mu_k$ randomly.
  2. E-step: Compute the posterior probability of the cluster assignment $z_n$ given the data $x_n$ and the current means $\mu_k$:

$$r_{nk}=\frac{\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)}{\sum_{j=1}^K\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_j\|^2\right)}$$

where $r_{nk}$ is the responsibility of the $k$-th cluster for the $n$-th data point.

  3. M-step: Update the means $\mu_k$ using the responsibilities:

$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}$$

  4. Repeat steps 2 and 3 until convergence (a short NumPy sketch of these updates follows below).
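A minimal NumPy sketch of these updates, assuming a fixed known $\sigma^2$, a uniform prior, and learning only the means (the function name `em_isotropic_gmm` and its arguments are illustrative, not from any particular library):

```python
import numpy as np

def em_isotropic_gmm(X, K, sigma2=1.0, n_iter=100, seed=0):
    """EM for a mixture of K isotropic Gaussians with a shared, fixed
    variance sigma2 and uniform prior p(z_n = k) = 1/K; only the means
    are re-estimated, matching the updates above."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    mu = X[rng.choice(N, size=K, replace=False)].copy()  # step 1: random init

    for _ in range(n_iter):
        # E-step: responsibilities r_{nk}; the uniform prior cancels out.
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
        log_r = -sq_dist / (2.0 * sigma2)
        log_r -= log_r.max(axis=1, keepdims=True)  # for numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: means as responsibility-weighted averages of the data.
        mu = (r[:, :, None] * X[:, None, :]).sum(axis=0) / r.sum(axis=0)[:, None]
    return mu, r
```

Note that $\sigma^2$ and the mixing proportions stay fixed throughout; this matches the setup above, where only the means are updated.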

Now, let's show that the K-means algorithm corresponds to this EM algorithm. The K-means algorithm is given by:

  1. Initialize the means $\mu_k$ randomly.
  2. Assignment step: Assign each data point to the closest cluster mean:

$$z_n=\arg\min_k\|x_n-\mu_k\|^2$$

  3. Refitting step: Update the means $\mu_k$ to the average of the data points assigned to them:

$$\mu_k=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$

where $N_k$ is the number of data points assigned to the $k$-th cluster.

  4. Repeat steps 2 and 3 until convergence (a corresponding sketch follows below).
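The corresponding K-means loop, again as an illustrative sketch (here empty clusters are simply left unchanged; practical implementations often re-seed them):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means using the assignment and refitting steps above."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    mu = X[rng.choice(N, size=K, replace=False)].copy()  # step 1: random init

    for _ in range(n_iter):
        # Assignment step: z_n = argmin_k ||x_n - mu_k||^2.
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
        z = sq_dist.argmin(axis=1)

        # Refitting step: mu_k = mean of the points assigned to cluster k.
        for k in range(K):
            if np.any(z == k):  # leave mu_k unchanged if the cluster is empty
                mu[k] = X[z == k].mean(axis=0)
    return mu, z
```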

We can see that the K-means algorithm corresponds to the EM algorithm for this mixture of Gaussians by setting $\sigma^2=\frac{1}{2}$, and $r_{nk}=1$ if $z_n=k$ and $r_{nk}=0$ otherwise. In this case, the likelihood function reduces to:

$$p(x_n|z_n=k,\mu_k)=\frac{1}{\pi^{D/2}}\exp\left(-\|x_n-\mu_k\|^2\right)$$

which is an isotropic Gaussian with variance $\sigma^2=\frac{1}{2}$. The prior is still the uniform distribution $p(z_n=k)=\frac{1}{K}$, so each cluster remains equally likely a priori.

The E-step of the EM algorithm reduces to:

$$r_{nk}=\begin{cases}1 & \text{if }z_n=k\\0 & \text{otherwise}\end{cases}$$

which is equivalent to the assignment step of the K-means algorithm.

The M-step of the EM algorithm reduces to:

$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$

which is equivalent to the refitting step of the K-means algorithm.
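This reduction is easy to check numerically: with one-hot responsibilities, the EM weighted-mean update and the K-means per-cluster average coincide. A toy check with made-up data:

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
z = np.array([0, 0, 1])                      # hard assignments
r = np.eye(2)[z]                             # one-hot responsibilities (N, K)

# EM update with one-hot r vs. plain per-cluster means.
em_mu = (r[:, :, None] * X[:, None, :]).sum(axis=0) / r.sum(axis=0)[:, None]
km_mu = np.stack([X[z == k].mean(axis=0) for k in range(2)])
assert np.allclose(em_mu, km_mu)             # both give [[0.5, 0], [10, 10]]
```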

Therefore, we have shown that the K-means algorithm corresponds to an Expectation-Maximization algorithm for a specific choice of the likelihood function and the prior distribution."

  • Please state exactly what is the statement you want to prove along with the necessary assumptions. Only then it's possible to prove or disprove the result. – utobi Dec 18 '23 at 17:27
  • The literal prompt I was given is "prove that k-means corresponds to an EM algorithm". – Naomi Pomella Dec 18 '23 at 22:18

1 Answer


The proof is wrong because $\sigma^2 = \frac{1}{2}$ still allows $r_{nk}$ to take values strictly between 0 and 1; the responsibilities only collapse to hard 0/1 assignments in the limit $\sigma^2 \to 0$. For the correct proof, see Deriving K-means algorithm as a limit of Expectation Maximization for Gaussian Mixtures.
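A quick numerical illustration of this point (toy numbers of my own, not from the linked derivation): with $\sigma^2 = \frac{1}{2}$ the responsibilities stay strictly soft, and they only harden into 0/1 assignments as $\sigma^2 \to 0$.

```python
import numpy as np

def resp(x, mus, sigma2):
    """Responsibilities r_{nk} for a single point x under a uniform prior."""
    log_r = -np.array([np.sum((x - m) ** 2) for m in mus]) / (2.0 * sigma2)
    log_r -= log_r.max()                     # for numerical stability
    r = np.exp(log_r)
    return r / r.sum()

x, mus = np.array([0.4]), [np.array([0.0]), np.array([1.0])]
print(resp(x, mus, sigma2=0.5))    # ~[0.55, 0.45] -- soft, not 0/1
print(resp(x, mus, sigma2=1e-4))   # ~[1.00, 0.00] -- hard only in the limit
```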

Tom Minka