
I have a dataset with 18 variables and 170 observations; 16 of the variables are continuous. I have calculated the proportion of observations that have metabolic syndrome (19%).

Now I have to determine whether "the variables in the dataset distinguish between people with and without metabolic syndrome".

I would do this by fitting a logistic regression with metabolic syndrome as the dependent variable and all the other (relevant) variables as independent variables. Then I would plot a ROC curve to assess the discriminatory power of the variables and the logistic regression model.
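A minimal sketch of that plan in Python with scikit-learn, not a definitive implementation. The data here are synthetic (170 rows, six predictors, roughly 19% positives) and stand in for the real dataset; the 5-fold cross-validation and all variable names are assumptions for illustration. Cross-validated predictions are used so that the AUC is not computed on the same rows the model was fitted to.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 170 observations,
# 6 continuous predictors, about 19% positive cases.
X, y = make_classification(
    n_samples=170,
    n_features=6,
    n_informative=4,
    n_redundant=2,
    weights=[0.81, 0.19],
    random_state=0,
)

# Standardize the predictors, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Out-of-fold predicted probabilities for the positive class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# Area under the ROC curve and the points of the curve itself.
auc = roc_auc_score(y, proba)
fpr, tpr, thresholds = roc_curve(y, proba)
print(f"Cross-validated AUC: {auc:.3f}")
```

With real data, `X` would hold the chosen predictors and `y` the metabolic syndrome indicator; `fpr` and `tpr` can be passed to any plotting library to draw the curve.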

But I am not sure this is the correct way to answer the question. Could anyone confirm this approach, improve on it, or suggest a better strategy?

1 Answer


That's certainly one good way. You might also consider classification trees, random forests, or other methods, but logistic regression is fine. Discriminant analysis is yet another possibility, but IIRC, it imposes a lot of assumptions.

However, if 19% of your 170 observations are positive, that's only about 30 cases. One rule of thumb (roughly 10 cases per independent variable) limits you to about 3 independent variables. If those variables are related, you might try factor analysis and keep, at most, 3 factors.
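The arithmetic behind that limit can be sketched as follows. This assumes the rule meant is the common "about 10 events per predictor" guideline; the answer does not name the rule, so the value 10 and the variable names are assumptions.

```python
n_observations = 170
prevalence = 0.19

# Assumed rule-of-thumb value; the answer does not state it explicitly.
events_per_variable = 10

# Number of positive cases (people with metabolic syndrome).
n_events = round(n_observations * prevalence)
n_non_events = n_observations - n_events

# The smaller outcome class is what limits model complexity.
limiting_class_size = min(n_events, n_non_events)
max_predictors = limiting_class_size // events_per_variable

print(n_events, max_predictors)  # prints "32 3"
```

The guideline is a rough guard against overfitting rather than a hard cutoff, so the result is best read as an order of magnitude.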

Peter Flom
  • Are there any important assumptions of logistic regression that I need to check before fitting it? Secondly, about your rule of thumb: I had to perform a literature review, and I came to the conclusion that I need 6 independent variables (all in my dataset) to identify the observations with metabolic syndrome. Are you suggesting that I use only 3 of them instead of all 6? In case it helps, the 6 variables are: Waist Circumference, Triglycerides, High-Density Lipoprotein, Systolic Blood Pressure, Diastolic Blood Pressure, and Glucose. – Nathan Vermaerke Dec 25 '23 at 20:57
  • The rule of thumb is to avoid overfitting.

    A list of assumptions is here: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/assumptions-of-logistic-regression/

    – Peter Flom Dec 26 '23 at 01:00