I apologize in advance if my questions seem incredibly dull. I'm trying to teach myself Stata before I get to grad school, as someone whose brain is very much not made for handling statistics.

I'm trying to understand how to perform a logistic regression in Stata, using this data set from Pew: https://www.pewresearch.org/social-trends/dataset/american-trends-panel-wave-68/.

I thought it might be interesting to see whether mask wearing can be explained by partisanship, knowledge of health risk (using the variables living in a metropolitan area, age, educational level, and how closely respondents follow the news), and income (because I'm thinking this would affect access to resources/masks?).

I recoded my variables of interest and ran the models. The Pseudo R2 value seems to imply that the explanatory power increases with each model, but when I check the Hosmer and Lemeshow goodness-of-fit test, my model is super not a good fit. Any tips? Am I setting up this model correctly?
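For reference, the Pseudo R2 that Stata prints after logit is McFadden's, which compares the maximized log likelihood of the fitted model with that of an intercept-only model. Written out, with L_M and L_0 as notation introduced here rather than anything taken from the output:

```latex
R^2_{\text{McFadden}} = 1 - \frac{\ln L_M}{\ln L_0}
% L_M : maximized likelihood of the fitted model
% L_0 : maximized likelihood of the intercept-only model
```

When models are nested and fit on the same observations, adding a predictor cannot lower ln L_M, so this value cannot fall as variables are added. A rising Pseudo R2 across such models is therefore expected and does not by itself show that the fit is good.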

Here is my code:

import spss using "C:\Users\NAME\Desktop\ATP W68.sav"

recode F_AGECAT (1/3 = 0) (4 = 1), gen(age)
label variable age "Age"
label define abracket 0 "18-64" 1 "65+"
label value age abracket
replace age = . if age == 99

recode F_INCOME_RECODE (3 = 0) (1/2 = 1), gen(income)
label variable income "Income Level"
label define irange 0 "<30,000" 1 "30,000+"
label value income irange
replace income = . if income == 99

recode COVIDFOL_W68 (3/4 = 0) (1/2 = 1), gen(news)
label variable news "News Following"
label define closely 0 "Not too closely/Not at all" 1 "Very/Fairly closely"
label value news closely
replace news = . if news == 99

recode F_EDUCCAT (3 = 0) (1/2 = 1), gen(edu)
label variable edu "College Degree?/Educational Attainment"
label define elevel 0 "HS or less" 1 "College+"
label value edu elevel
replace edu = . if edu == 99

recode COVIDMASK1_W68 (3/4 = 0) (1/2 = 1), gen(mask)
label variable mask "Mask Wearing Behaviour"
label define often 0 "Hardly/Never" 1 "All/Some of the Time"
label value mask often
drop if mask == 5
drop if mask == 99

recode F_PARTYSUM_FINAL (1 = 0) (2 = 1), gen(party)
label variable party "Party"
label define demrep 0 "Republican" 1 "Democrat"
label value party demrep
replace party = . if party == 9

recode F_METRO (1 = 1) (2 = 0), gen(metro)
label variable metro "Environment"
label define area 0 "Non-Metropolitan" 1 "Metropolitan"
label value metro area

tab1 metro party mask edu news income age

tabulate mask party, column V
tabulate mask metro, column V
tabulate mask edu, column V
tabulate mask news, column V
tabulate mask income, column V
tabulate mask age, column V

*logit model with risk factors and knowledge
logit mask metro age edu news
logit mask metro age edu news, or

*logit model with risk factors and knowledge, and income
logit mask metro age edu news income
logit mask metro age edu news income, or

*logit model with risk factors and knowledge, and income and party
logit mask metro age news income party
logit mask metro age news income party, or

estat classification

lroc

lfit, group(10) table
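One thing that may be worth checking: logit drops any row with a missing value in the variables of that particular model, so each of the models above may be fit on a different set of respondents. A minimal sketch of comparing the first two models on a common sample, assuming the recoded variables above exist. The names nmiss, m1 and m2 are hypothetical and not part of the original code:

```stata
* Count missing values per row across every variable used in the models
* (nmiss is a hypothetical helper variable)
egen nmiss = rowmiss(mask metro age edu news income party)

* Fit the two nested models on the same complete rows and store them
logit mask metro age edu news if nmiss == 0
estimates store m1
logit mask metro age edu news income if nmiss == 0
estimates store m2

* Likelihood-ratio test of the smaller model against the larger one
lrtest m1 m2

* AIC and BIC for both stored models, side by side
estimates stats m1 m2
```

The third model above leaves out edu, so it is not nested in the second one and is left out of the likelihood-ratio test here.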

Joel
  • Hosmer-Lemeshow is considered obsolete: https://stats.stackexchange.com/questions/273966/logistic-regression-with-poor-goodness-of-fit-hosmer-lemeshow – kjetil b halvorsen May 13 '21 at 02:24
  • Thanks so much for the link! Ah, so the way that I set up the logistic regression doesn't itself seem to have any logical errors? – Joel May 13 '21 at 05:34