I have a school project on cluster analysis. My dataset is about heart disease patients. I standardized the data and applied principal component analysis (PCA), and I used K-means for the clustering. I chose 3 clusters because that number gave a high silhouette score during hyperparameter tuning.
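For context, the steps described above can be sketched roughly as follows. This is a minimal sketch, not my exact notebook: it assumes scikit-learn, uses synthetic random data in place of the heart disease table, assumes K-means is run on the two PCA scores, and the 2 to 6 range of candidate cluster counts is an illustrative choice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the heart disease table (hypothetical data, 9 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))

# Standardize so every feature has mean 0 and unit variance before PCA.
X_std = StandardScaler().fit_transform(X)

# Project the standardized data onto the first two principal components.
scores = PCA(n_components=2).fit_transform(X_std)

# Compare candidate cluster counts by silhouette score (higher is better).
sil = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
    sil[k] = silhouette_score(scores, labels)

best_k = max(sil, key=sil.get)
print("best k:", best_k, "silhouette:", round(sil[best_k], 3))
```

The silhouette score always lies between -1 and 1, so values close to 0 would suggest overlapping clusters even for the best `k`.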
A scatter plot is provided below, but I am unsure how to interpret it. The PCA gave two principal components (PCs), with the following variable loadings:
PC1:
age 0.319710
cp 0.307450
thalach 0.446012
exang 0.373741
oldpeak 0.439388
slope 0.422569
PC2:
age 0.385531
sex 0.391836
trestbps 0.455142
chol 0.452708
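Loadings like the ones listed above are typically read from the fitted PCA object. The sketch below assumes scikit-learn's `PCA`, where each row of `components_` is one PC and each column corresponds to an input feature. The data here is synthetic and only the column names come from my dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

features = ["age", "sex", "cp", "trestbps", "chol",
            "thalach", "exang", "oldpeak", "slope"]

# Synthetic stand-in data (hypothetical values, real column names).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# Transpose so that rows are features and columns are PCs.
loadings = pd.DataFrame(pca.components_.T, index=features, columns=["PC1", "PC2"])

# Keep the sign: it shows whether a feature increases or decreases along the PC.
print(loadings.round(3))

# Share of total variance captured by each PC.
print(pca.explained_variance_ratio_)
```

A feature with a large absolute loading on a PC contributes strongly to that axis of the scatter plot, and the explained variance ratio indicates how much of the data's spread the two axes actually capture.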
Here is the scatter plot:
Definition of each variable:
age - Age of patient
sex - Gender of patient
cp - chest pain type
trestbps - Resting blood pressure (in mm Hg on admission to the hospital)
chol - Serum cholesterol in mg/dl
thalach - Maximum heart rate achieved
exang - Exercise induced angina (1 = yes; 0 = no)
oldpeak - ST depression induced by exercise relative to rest
slope - The slope of the peak exercise ST segment
Here is the link to my Google Colab notebook: K-Means-PCA-Standardization-ClusterAnalysis
I would like help understanding the scatter plot and interpreting the results of the PCA.
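One common way to read a cluster scatter plot in PC space is to go back to the original variables and compare their averages per cluster. The sketch below is only an illustration under assumptions: it uses scikit-learn and pandas, synthetic data with my column names, and K-means with 3 clusters fitted on the two PCA scores.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

features = ["age", "sex", "cp", "trestbps", "chol",
            "thalach", "exang", "oldpeak", "slope"]

# Synthetic stand-in data (hypothetical values, real column names).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)

X_std = StandardScaler().fit_transform(df[features])
scores = PCA(n_components=2).fit_transform(X_std)
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# Mean of each original variable per cluster: describes who is in each group.
profile = df.groupby("cluster")[features].mean()
print(profile.round(2))

# Mean position of each cluster on the two plot axes.
pcs = pd.DataFrame(scores, columns=["PC1", "PC2"])
centers = pcs.groupby(df["cluster"]).mean()
print(centers.round(2))
```

If one cluster sits far to the right on PC1, for example, its row in `profile` shows which original variables are driving that position.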
