The red crosses represent the cluster centers and the black points represent the data points. In this hypothetical scenario, K-means seems to produce a bad clustering. Why would this happen, and is there any way to fix it, for example by standardizing the data? I asked this question before and it was closed due to lack of information, but please bear with me because I am a beginner and just trying to learn. Thanks in advance.
-
If you are able to squeeze the vertical dimension so that the two clusters are about spherical - then the problem is likely solved: k-means will uncover the clusters. But this implies you already know how much and in what direction to squeeze the data cloud. In your example you can know this, because the clusters are clearly visible and there are few dimensions. Standardizing your data will hardly help because the total variances along both axes are already roughly equal. – ttnphns Oct 02 '20 at 20:39
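The squeeze described in the comment above can be tried on made-up data. What follows is only a minimal sketch, not the OP's picture: the data, the helper names (`make_data`, `kmeans`, `standardize`, `squeeze`, `agreement`) and the squeeze factor are all assumptions for illustration. The two clusters are elongated along the diagonal rather than the vertical, so that the per-axis variances are roughly equal by construction and standardizing has little to work with. The k-means here is a plain Lloyd's loop in standard-library Python with a few random restarts.

```python
import math
import random


def make_data(n=200, seed=1):
    """Two hypothetical clusters: long along the diagonal (sd 5), thin across it (sd 0.3)."""
    rng = random.Random(seed)
    r = math.sqrt(0.5)
    points, truth = [], []
    for label, offset in ((0, 1.5), (1, -1.5)):
        for _ in range(n):
            s = rng.gauss(0.0, 5.0)           # coordinate along the long direction
            t = offset + rng.gauss(0.0, 0.3)  # coordinate across it (separates the clusters)
            points.append(((s + t) * r, (s - t) * r))  # rotate into (x, y)
            truth.append(label)
    return points, truth


def kmeans(points, k=2, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the labels of the lowest-inertia restart."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)
        labels = None
        for _ in range(n_iter):
            new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                          for p in points]
            if new_labels == labels:
                break
            labels = new_labels
            for j in range(k):
                members = [p for p, l in zip(points, labels) if l == j]
                if members:  # keep the old center if a cluster goes empty
                    centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def standardize(points):
    """Center each axis and divide it by its standard deviation."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]


def squeeze(points, factor=0.06):
    """Rotate onto the (assumed known) long/short directions and shrink the long one."""
    r = math.sqrt(0.5)
    return [((x + y) * r * factor, (x - y) * r) for x, y in points]


def agreement(labels, truth):
    """Fraction of points matching the true clusters, up to swapping the two labels."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)


points, truth = make_data()
acc_raw = agreement(kmeans(points), truth)
acc_std = agreement(kmeans(standardize(points)), truth)
acc_squeezed = agreement(kmeans(squeeze(points)), truth)
print("raw:", acc_raw, "standardized:", acc_std, "squeezed:", acc_squeezed)
```

On data like this, the raw and standardized runs should land near chance agreement, because k-means cuts each long cluster in half, while the squeezed run should recover the two clusters. Note that `squeeze` uses the direction and the factor that generated the data, which is exactly the prior knowledge the comment says you would need.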
-
Please do not repost a closed question: that requires the community to start all over evaluating it and understanding it without benefit of any previous comments. – whuber Oct 02 '20 at 20:42
-
@whuber I am trying to learn? What is this? – Sachihiro Oct 02 '20 at 20:42
-
See our help center for how SE sites work. I closed your question again because, as far as I can tell, it is fully answered already in another thread. If that doesn't do the trick for you, then edit this post rather than posting yet another version of your question. – whuber Oct 02 '20 at 20:45
-
Everyone knows that k-means "likes" approximately spherical clusters of similar size. If only it were always simple to discover, before clustering, what shape and size the clusters have! Since most of the time that is impossible, we have to apply different clustering methods and then compare their results for quality ("validate" the results). – ttnphns Oct 02 '20 at 20:48
-
@whuber, I'm not quite sure the Q is a duplicate of that specific older Q. That Q was about the gap statistic and the number of clusters, which is quite different from the present one. (Though I'm not saying the present Q is very good and clear.) – ttnphns Oct 02 '20 at 20:55
-
@ttnphns I am very willing to be persuaded about that and have invited the OP many times now, in this thread and their previous one, to clarify what they are looking for. – whuber Oct 02 '20 at 20:56
-
The Q is more or less clear in its outline: what to do with the data shown so that K-means can cope and uncover the clusters. That's how I understood it, and my two comments were written in that spirit. Maybe (?) the Q is worth reopening. – ttnphns Oct 02 '20 at 21:01
-
Sachihiro, I will depart at this point with a bit of advice: please read the help, because it explains that you can edit a closed question and, once edited, that question is brought to the attention of the whole community for review and reopening. @ttnphns One of the things the OP deleted in their original post was my request to indicate whether the apparent duplicate answered their question and, if not, to help us understand what help they need. I am still waiting for that information and you probably are too: let's not guess what somebody is trying to ask. – whuber Oct 02 '20 at 21:03
-
Sachihiro, perhaps you could make your question more articulate? And are you asking about these particular data, or about something else? – ttnphns Oct 02 '20 at 21:05
-
@ttnphns First of all, let me thank you very much for your help. I am reading your comments and trying to make sense of them, because I want to learn, for God's sake, and I am trying to do my part. The problem is completely hypothetical: I just drew some hypothetical clusters to try to understand different scenarios. – Sachihiro Oct 02 '20 at 21:09
