Just wanted to make sure that my proof is correct and that I am not missing anything in the process. Any thoughts?
" To demonstrate mathematically that the K-means algorithm corresponds to an algorithm of Expectation Maximization, we need to show that the K-means algorithm can be derived from the EM algorithm for a specific choice of the likelihood function and the prior distribution.
The K-means algorithm can be derived from the EM algorithm for a mixture of $K$ Gaussians that share a fixed isotropic covariance matrix $\Sigma=\sigma^2 I$. In this case, the likelihood function is given by:
$$p(x_n|z_n=k,\mu_k,\Sigma)=\frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)$$
where $x_n$ is the $n$-th data point, $z_n$ is the latent variable indicating the cluster assignment of $x_n$, $\mu_k$ is the mean of the $k$-th cluster, $D$ is the dimensionality of the data, and $\sigma^2$ is the variance shared by all clusters and all dimensions. The isotropy is what makes the exponent a plain squared Euclidean distance.
The prior distribution is given by:
$$p(z_n=k)=\frac{1}{K}$$
which assumes that each cluster is equally likely.
The EM algorithm for this mixture of Gaussians is given by:
- Initialize the means $\mu_k$ randomly.
- E-step: Compute the posterior probability of the cluster assignment $z_n$ given the data $x_n$ and the current means $\mu_k$:
$$r_{nk}=\frac{\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)}{\sum_{j=1}^K\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_j\|^2\right)}$$
where $r_{nk}$ is the responsibility of the $k$-th cluster for the $n$-th data point.
- M-step: Update the means $\mu_k$ using the responsibilities:
$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}$$
- Repeat the E-step and the M-step until convergence.
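These E- and M-steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name, defaults, and random initialization are my own choices, and only the means are estimated (the variance $\sigma^2$ and the uniform priors stay fixed, matching the model above).

```python
import numpy as np

def em_isotropic(X, K, sigma2=1.0, n_iter=50, seed=0):
    """EM for a mixture of K isotropic Gaussians with a shared, fixed
    variance sigma2 and uniform priors. Only the means are learned."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)].astype(float)  # random data points as initial means
    for _ in range(n_iter):
        # E-step: soft responsibilities r_nk ∝ exp(-||x_n - mu_k||^2 / (2 sigma2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)           # for numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each mean becomes a responsibility-weighted average
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r
```

Note that the uniform prior $1/K$ cancels in the E-step, which is why it does not appear in the code.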
Now, let's show that the K-means algorithm corresponds to this EM algorithm. The K-means algorithm is given by:
- Initialize the means $\mu_k$ randomly.
- Assignment step: Assign each data point to the closest cluster mean:
$$z_n=\arg\min_k\|x_n-\mu_k\|^2$$
- Refitting step: Update the means $\mu_k$ to the average of the data points assigned to them:
$$\mu_k=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$
where $N_k$ is the number of data points assigned to the $k$-th cluster.
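For comparison, the two K-means steps map directly onto code. Again a minimal sketch under the same assumptions as above; the empty-cluster guard is a practical detail that the derivation ignores.

```python
import numpy as np

def kmeans(X, K, n_iter=50, seed=0):
    """Plain K-means: alternate hard assignment and mean refitting."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)].astype(float)  # random initial means
    for _ in range(n_iter):
        # Assignment step: z_n = argmin_k ||x_n - mu_k||^2
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        # Refitting step: mu_k = average of the points assigned to cluster k
        for k in range(K):
            if np.any(z == k):          # skip empty clusters
                mu[k] = X[z == k].mean(axis=0)
    return mu, z
```

The structure is identical to the EM loop: the assignment step plays the role of the E-step and the refitting step plays the role of the M-step.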
We can see that the K-means algorithm corresponds to this EM algorithm in the limit $\sigma^2\to 0$ (sometimes called hard EM). No fixed positive value of $\sigma^2$ makes the responsibilities exactly binary; it is the limit that does. In the responsibility
$$r_{nk}=\frac{\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_k\|^2\right)}{\sum_{j=1}^K\exp\left(-\frac{1}{2\sigma^2}\|x_n-\mu_j\|^2\right)}$$
the term with the smallest squared distance $\|x_n-\mu_j\|^2$ dominates the denominator as $\sigma^2\to 0$: every other term is smaller by a factor $\exp\left(-\frac{\Delta_j}{2\sigma^2}\right)$, where $\Delta_j=\|x_n-\mu_j\|^2-\min_l\|x_n-\mu_l\|^2>0$, and this factor vanishes in the limit. The uniform prior $p(z_n=k)=\frac{1}{K}$ cancels between the numerator and the denominator, so it does not affect the assignments.
The E-step of the EM algorithm reduces to:
$$r_{nk}=\begin{cases}1 & \text{if }z_n=k\\0 & \text{otherwise}\end{cases}$$
which is equivalent to the assignment step of the K-means algorithm.
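This hardening of the responsibilities as $\sigma^2$ shrinks is easy to check numerically. A small self-contained sketch (the values are illustrative):

```python
import numpy as np

def responsibilities(x, mu, sigma2):
    """Soft E-step responsibilities of one point x against the means mu."""
    d2 = ((x - mu) ** 2).sum(axis=1)
    logits = -d2 / (2.0 * sigma2)
    logits -= logits.max()              # numerical stability
    r = np.exp(logits)
    return r / r.sum()

x = np.array([1.0, 0.0])
mu = np.array([[0.0, 0.0], [3.0, 0.0]])   # x is closer to mu[0]
for s2 in (10.0, 1.0, 0.01):
    print(s2, responsibilities(x, mu, s2))
# As sigma2 shrinks, the responsibility vector approaches the one-hot
# assignment (1, 0), i.e. the K-means assignment step.
```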
The M-step of the EM algorithm reduces to:
$$\mu_k=\frac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}=\frac{1}{N_k}\sum_{n:z_n=k}x_n$$
which is equivalent to the refitting step of the K-means algorithm.
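The equality between the two mean updates can also be verified directly with one-hot responsibilities (the data and assignments below are illustrative):

```python
import numpy as np

# With one-hot responsibilities, the EM mean update collapses to the
# K-means refitting step: a plain average over each cluster's points.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
z = np.array([0, 0, 1])                       # hard assignments
r = np.eye(2)[z]                              # one-hot responsibilities, shape (N, K)

mu_em = (r.T @ X) / r.sum(axis=0)[:, None]    # EM M-step
mu_km = np.stack([X[z == k].mean(axis=0) for k in range(2)])  # K-means refit

assert np.allclose(mu_em, mu_km)
```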
Therefore, we have shown that the K-means algorithm corresponds to an Expectation-Maximization algorithm: it is the small-variance limit $\sigma^2\to 0$ of EM for a mixture of isotropic Gaussians with uniform priors, i.e. a specific choice of the likelihood function and the prior distribution."