
This is probably an elementary question about kernel k-means clustering, but since I am not an expert in machine learning, can someone tell me how the cluster centres are calculated in kernel k-means?

From what I know, in ordinary k-means we take the mean of all points in a cluster. In kernel k-means, however, we would need to take the mean of the points in feature space, which can be infinite-dimensional, and for a given kernel the feature map is generally not known explicitly. How, then, can the cluster centres be computed in kernel k-means?

Nick Cox
pg2455

2 Answers


I think I found an answer. All you need to do in kernel k-means is to compute $$ C^{(t+1)}(i) = \operatorname*{arg\,min}_k \left\{ K(x_i, x_i) - \frac{2}{N_k}\sum_{l \in C^{(t)}_k} K(x_i, x_l) + \frac{1}{N_k^2}\sum_{l, l' \in C^{(t)}_k} K(x_l, x_{l'}) \right\} \qquad (1) $$

So this is the only operation that needs to be done. One does not need to know the cluster centres in the high-dimensional space. Just compute $(1)$ repeatedly until the algorithm converges.

Algorithm:

Step 1: Assign each point to a random cluster (known as the cluster map $C(i) := \{k : i \rightarrow k\}$, i.e. point $i$ is assigned to cluster $k$).

Step 2: For each point, evaluate $(1)$ above and assign the new $C(i)$.

To be clear about this step:

$\rightarrow$ After running this step for the $(t-1)^{th}$ iteration, you get a new $C^{(t)}(i)$, which is used in $(1)$ again to calculate $C^{(t+1)}(i)$.

$\rightarrow$ So each iteration assigns a new $C(i)$; hence $C^{(t)}(i)$ keeps changing, and this assignment map plays the role of the cluster means.

Step 3: Repeat Step 2 until the point assignments no longer change, or until some error metric of your choice stabilises. (I am not sure which error metric should be used.)

New Point:

Each new point will be classified according to $(1)$ above.
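The update $(1)$ and the three steps above can be sketched in NumPy. This is a minimal illustration, not part of the original answer: the RBF kernel, the `gamma` value, and the two-ring toy data are my own assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2).
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    # K: precomputed n x n kernel matrix. Returns cluster assignments C(i).
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    C = rng.integers(n_clusters, size=n)       # Step 1: random assignment
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for k in range(n_clusters):
            mask = C == k
            Nk = mask.sum()
            if Nk == 0:                        # empty cluster: never chosen
                dist[:, k] = np.inf
                continue
            # The three terms of (1); K(x_i, x_i) is constant over k but
            # kept for clarity.
            dist[:, k] = (np.diag(K)
                          - 2.0 / Nk * K[:, mask].sum(axis=1)
                          + K[np.ix_(mask, mask)].sum() / Nk**2)
        C_new = dist.argmin(axis=1)            # Step 2: reassign via (1)
        if np.array_equal(C_new, C):           # Step 3: stop when stable
            break
        C = C_new
    return C

# Toy data: two concentric rings, a classic case where an RBF kernel helps.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 60)
X = np.vstack([np.c_[np.cos(t[:30]), np.sin(t[:30])],
               3 * np.c_[np.cos(t[30:]), np.sin(t[30:])]])
labels = kernel_kmeans(rbf_kernel(X, X, gamma=2.0), n_clusters=2)
```

A new point $x$ would be assigned by evaluating the same three-term expression with $K(x, x)$ and $K(x, x_l)$ against the final assignments, exactly as the answer says.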

pg2455
  • I have also heard that this is correct, but did you find the logic behind it? I mean, why don't we have to compute the cluster means again? – roni Jul 08 '16 at 14:28

You can't compute the cluster centers. If we let $\Phi\::\: \mathbb{R}^p \to F$ be the map into the unknown feature space, then the center for a cluster $C_m$ is $$ \mu_m^\Phi = \frac{1}{n_m}\sum_{x_i \in C_m} \Phi(x_i),$$

where $n_m$ is the number of observations in cluster $C_m$. But, we don't know what $\Phi$ is (which is why we are using a kernel), and therefore we don't know $\mu_m^\Phi$.

The reason that we don't need to know $\mu_m^\Phi$ when performing kernel k-means is that during each iteration, a point $x$ is assigned to the cluster $C_m$ which minimizes

$$ \begin{align} ||\Phi(x) - \mu_m^\Phi||^2 &= \Phi(x)^\top\Phi(x) - \frac{1}{n_m}\sum_{x_i \in C_m}\Phi(x_i)^\top\Phi(x) - \frac{1}{n_m}\sum_{x_i \in C_m}\Phi(x)^\top\Phi(x_i) \\ &\phantom{=} + \frac{1}{n_m^2}\sum_{x_i \in C_m}\sum_{x_j \in C_m}\Phi(x_i)^\top\Phi(x_j) \\ &= K(x, x) - \frac{2}{n_m}\sum_{x_i \in C_m}K(x, x_i) + \frac{1}{n_m^2}\sum_{x_i \in C_m}\sum_{x_j \in C_m}K(x_i, x_j), \end{align} $$ assuming $K$ is a Mercer kernel.
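A quick numerical sanity check of this expansion (my own illustration, not from the answer): for the linear kernel $K(x, y) = x^\top y$, the feature map $\Phi$ is the identity, so $||\Phi(x) - \mu_m^\Phi||^2$ can be computed both explicitly and via the kernel-only expression, and the two must agree.

```python
import numpy as np

rng = np.random.default_rng(0)
cluster = rng.normal(size=(5, 3))   # points x_i in cluster C_m
x = rng.normal(size=3)              # point being assigned
n_m = len(cluster)

# Explicit squared distance to the cluster mean (Phi = identity here).
explicit = np.sum((x - cluster.mean(axis=0))**2)

# The same quantity via the three kernel terms in the derivation above.
K = lambda a, b: a @ b              # linear kernel K(x, y) = x^T y
kernel_form = (K(x, x)
               - 2.0 / n_m * sum(K(x, xi) for xi in cluster)
               + sum(K(xi, xj) for xi in cluster for xj in cluster) / n_m**2)

print(np.isclose(explicit, kernel_form))  # → True
```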

jdb
  • "But, we don't know what $\phi$ is (which is why we are using a kernel)" this is incorrect. We know what $\phi$ is, at least analytically. We use a kernel since it doesn't require us to use $\phi$ (which may map to an infinite dimensional space). – helperFunction Oct 27 '21 at 13:17
  • @helperFunction could you provide a reference for this? Admittedly there is a bit of oversimplification in my answer since I'm not quite sure on the deeper parts, so it would be nice to get more information on $\phi$ – jdb Apr 02 '22 at 15:52
  • we usually know $\phi$ analytically. There's rich theory related to when we can find an analytical $\phi$ for a given kernel (read about Reproducing Kernel Hilbert Spaces etc). We use a kernel because it allows us to operate implicitly in the high-dimensional space given by $\phi$. For example, the RBF kernel has a $\phi$ which maps to an "infinite-dimensional" space, but the associated kernel is simple enough. – helperFunction Apr 02 '22 at 17:02
  • Yes, my question is moreso: for the RBF kernel, what is $\phi$, then? – jdb Apr 03 '22 at 05:51
  • have a look at this answer: https://stats.stackexchange.com/a/122633 – helperFunction Apr 03 '22 at 08:28