I have a school project on cluster analysis. My dataset is about heart disease patients. I standardized it, applied principal component analysis (PCA), and then clustered it with K-means. I chose 3 clusters because that number gave a high silhouette score during hyperparameter tuning.
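For readers who want to follow along, the steps described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the data is a synthetic stand-in, the range of cluster counts is arbitrary, and it assumes K-means was run on the two PCA scores rather than on the full standardized data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (NOT the real data):
# 300 rows, 9 numeric columns, drawn around three loose group centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 9))
X = np.vstack([c + rng.normal(size=(100, 9)) for c in centers])

# 1. Standardize so every column has mean 0 and unit variance.
X_std = StandardScaler().fit_transform(X)

# 2. Project each row onto the first two principal components.
scores = PCA(n_components=2).fit_transform(X_std)

# 3. Fit K-means for several cluster counts and record the silhouette score.
#    Assumption: clustering is done on the PCA scores, not on X_std.
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
    silhouettes[k] = silhouette_score(scores, labels)

# The cluster count with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(silhouettes)
print("best k:", best_k)
```

Each row of `scores` is one patient, so each point in a PC1 vs. PC2 scatter plot is a patient, not a variable.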

A scatter plot is provided below, but I am unsure how to interpret it. The PCA gave two principal components (PCs), with these variable loadings:

PC1:

age 0.319710

cp 0.307450

thalach 0.446012

exang 0.373741

oldpeak 0.439388

slope 0.422569

PC2:

age 0.385531

sex 0.391836

trestbps 0.455142

chol 0.452708
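As a point of reference, per-variable values like those listed above are usually read from the `components_` attribute of a fitted scikit-learn `PCA`. The sketch below assumes that is how they were obtained; the data is a synthetic stand-in and the column list is taken from the variable names in this question.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cols = ["age", "sex", "cp", "trestbps", "chol",
        "thalach", "exang", "oldpeak", "slope"]

# Synthetic stand-in data (NOT the real patient table).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, len(cols))), columns=cols)

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(df))

# components_ has shape (n_components, n_features); transpose it so that
# rows are variables and columns are PCs.
loadings = pd.DataFrame(pca.components_.T, index=cols, columns=["PC1", "PC2"])

# Order variables by the size of their PC1 loading, keeping the sign.
order = loadings["PC1"].abs().sort_values(ascending=False).index
print(loadings.reindex(order).round(3))

# Share of the total variance captured by each PC.
print(pca.explained_variance_ratio_)
```

The sign of a loading matters for interpretation: variables with opposite signs on the same PC move in opposite directions along that axis. If the values listed above are absolute values, that information has been lost.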

Here is the scatter plot:

[Image: scatter plot]

Definition of each variable:

age - Age of patient

sex - Gender of patient

cp - chest pain type

trestbps - Resting blood pressure (in mm Hg on admission to the hospital)

chol - Serum cholesterol in mg/dl

thalach - Maximum heart rate achieved

exang - Exercise induced angina (1 = yes; 0 = no)

oldpeak - ST depression induced by exercise relative to rest

slope - The slope of the peak exercise ST segment
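One possible way to relate the clusters back to the variables defined above is to compare per-cluster means in the original units. The sketch below is only an illustration under assumptions: the data is synthetic, 3 clusters are used because that is the number stated in this question, and K-means is assumed to run on the two PCA scores.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cols = ["age", "sex", "cp", "trestbps", "chol",
        "thalach", "exang", "oldpeak", "slope"]

# Synthetic stand-in data (NOT the real patient table).
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(150, len(cols))), columns=cols)

# Same pipeline as in the question: standardize, PCA, then K-means.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(df))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# Mean of every original (unstandardized) variable within each cluster.
profile = df.groupby(labels).mean()

# Number of patients assigned to each cluster.
sizes = pd.Series(labels).value_counts().sort_index()

print(profile.round(2))
print(sizes)
```

With real data, a 0/1 variable such as `exang` would have a cluster mean equal to the proportion of patients in that cluster with exercise-induced angina, which is often easier to describe than a position on a PC axis.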

Here is the link to my Google Colab notebook: K-Means-PCA-Standardization-ClusterAnalysis

I would like help understanding the scatter plot and interpreting the results of the PCA.

  • note that PCA with categorical variables (e.g. Sex) is meaningless. – utobi Jan 05 '23 at 08:45
  • Okay, this is noted. However, I still want to know the explanation behind this scatter plot based on the given PCs. – AK6000W Jan 05 '23 at 08:49
  • I am still having difficulty interpreting the results from the scatter plot generated by my clustering analysis project. I would appreciate any guidance or suggestions for how to begin understanding the plot. – AK6000W Jan 05 '23 at 08:57
  • Ok, so what do you understand so far? Tell us your interpretation of the plot, try. – user2974951 Jan 05 '23 at 09:00
  • Principal component analysis (PCA) was applied to my dataset, which was previously standardized, in order to identify the most important features. The scatter plot resulting from this analysis shows that certain attributes are present within clusters, indicating that they are similar to one another. In other words, the scatter plot illustrates the attributes that are most closely related or associated with each other in the dataset. However, I am still unsure how to interpret the results of the scatter plot beyond this basic understanding. – AK6000W Jan 05 '23 at 09:08
