My assignment question reads: "2. Which set of variables best predicts handgrip strength in women? a. Reduce the number of continuous variables before doing the analysis."
I do not really know how to reduce the number of continuous variables. The variables I have are:
- Vnr = subject number
- Sex = sex (0 = female, 1 = male)
- Lft = age (years)
- Leo = waist circumference (cm)
- Sbd1 = systolic blood pressure (mmHg)
- Dbd1 = diastolic blood pressure (mmHg)
- Gluc = glucose (mg/dl)
- Trig = triglycerides (mg/dl)
- Hdl = HDL cholesterol (mg/dl)
- Eberoepcal = energy expenditure during occupation (cal/wk)
- Esportcal = energy expenditure during sport (cal/wk)
- Eavtcal = energy expenditure during leisure time (cal/wk)
- PAL = physical activity level
- Tkijk = TV time (hrs/wk)
- BIAP = body fat percentage via bioelectrical impedance (%)
- RUSTP = resting heart rate, pulse (bpm)
- VO2A = max oxygen consumption, absolute (L/min)
- HGR = handgrip strength (kg)
So, first I probably need to drop the variables that are not sufficiently correlated with handgrip strength, and then run the actual analysis on the variables that remain.
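A minimal sketch of what such a correlation screen could look like, in plain Python. The helper names `pearson_r` and `screen_by_correlation`, the cutoff of 0.3, and the generated numbers are all assumptions for illustration; none of them come from the assignment or the real data set, and the rows are assumed to be already restricted to women (Sex = 0).

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen_by_correlation(columns, target, threshold=0.3):
    """Keep predictors whose |r| with the target is at least `threshold`.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    y = columns[target]
    kept = {}
    for name, values in columns.items():
        if name == target:
            continue  # the outcome is not a predictor of itself
        r = pearson_r(values, y)
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Made-up data for illustration only (NOT the real study data).
random.seed(0)
n = 200
vo2a = [random.gauss(2.0, 0.4) for _ in range(n)]       # related to HGR by construction
tkijk = [random.gauss(15.0, 5.0) for _ in range(n)]     # unrelated to HGR by construction
hgr = [10.0 + 8.0 * v + random.gauss(0.0, 2.0) for v in vo2a]

data = {"VO2A": vo2a, "Tkijk": tkijk, "HGR": hgr}
kept = screen_by_correlation(data, "HGR", threshold=0.3)
print(sorted(kept))
```

One caveat with this kind of screen: it looks at each variable on its own, so it may drop a variable that only becomes useful in combination with others.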
Secondly, which analysis would be best for obtaining 'the best set of predictors'?
I would approach the question as follows:
- Dimensionality reduction: Apply a dimensionality reduction technique to the continuous variables to reduce their number. -> Principal Component Analysis (PCA).
- Predictive modeling: Use a predictive modeling technique to determine which of the reduced set of variables best predicts handgrip strength (HGR). -> Multiple linear regression.
- Model evaluation: Evaluate the model using appropriate metrics (such as mean squared error for regression tasks) and cross-validation to check its predictive performance.
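The three steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function names (`pca_fit`, `pca_transform`, `cv_mse`), the number of folds, and the synthetic data are hypothetical, and the predictor matrix `X` is assumed to hold only the continuous predictors for women (Sex = 0), with `y` holding HGR.

```python
import numpy as np

def pca_fit(X, n_components):
    """Standardize X and return (mean, std, loadings) for the leading components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T

def pca_transform(X, mean, std, components):
    """Project X onto previously fitted components."""
    return ((X - mean) / std) @ components

def cv_mse(X, y, n_components, k=5, seed=0):
    """k-fold cross-validated MSE of linear regression on PCA scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit PCA on the training fold only, so the test fold stays unseen.
        mean, std, comps = pca_fit(X[train], n_components)
        A_train = np.column_stack(
            [np.ones(len(train)), pca_transform(X[train], mean, std, comps)]
        )
        A_test = np.column_stack(
            [np.ones(len(test)), pca_transform(X[test], mean, std, comps)]
        )
        # Ordinary least squares with an intercept column.
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((y[test] - A_test @ beta) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in data (NOT the real study data): 6 correlated predictors
# driven by 2 latent factors, and an outcome that depends on the first factor.
rng = np.random.default_rng(42)
n = 150
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(n, 6))
y = 25.0 + 3.0 * latent[:, 0] + rng.normal(scale=1.5, size=n)

for m in (1, 2, 3):
    print(m, "components, CV MSE:", cv_mse(X, y, m))
```

Comparing the cross-validated MSE across different numbers of components is one way to choose how many to keep. Note that each component is a weighted mix of the original variables, so this route identifies predictive components rather than individual variables.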
Can anyone confirm this approach, or suggest improvements if I am wrong?