I'm trying to understand the relationship based on survey data between one binary dependent variable (Did Give a Bribe for a Vaccine) and where the respondent lives (Urban, Peri-urban and Rural) - the respondent can only select one of these. What statistical tests would be most appropriate?

Below is a sample of the data (N=5200). I've split the independent variable into separate columns but can put it back into one if needed. Ideally I'd like to know the significance and strength of each predictor. Apologies if I've missed something here - I'm new to this. Thanks!!

Urban  Peri-Urban  Rural  Did Give a Bribe for a Vaccine (dependent)
0      0           1      0
1      0           0      0
0      0           1      0
1      0           0      0
0      0           1      0
0      0           1      0
0      1           0      1
0      0           1      1
0      0           1      0
0      0           1      0
0      0           1      0

Follow-up question: I assume I can't use linear regression, but what about logistic regression?

  • Welcome to Cross Validated! Logistic regression is making sense to me. How would you perform and interpret it? – Dave Oct 07 '22 at 14:14
  • Logistic regression is an option, but note that you would only two of the dummy variables (Urban, Peri-urban, Rural). You could also summarize the data into a table of counts (contingency table) with two dimensions: Bribe vs. Location, which would be a 2 x 3 table. You could then use a chi-square test of association or related test. – Sal Mangiafico Oct 07 '22 at 14:47
  • @SalMangiafico thank you! "you would only two of the": did you mean to add "use" in there? If so, why is that the case? – twright Oct 07 '22 at 15:08
  • @Dave it may be blasphemous here, but I'm just using Excel at the moment. I'm not sure how I'd interpret it yet! – twright Oct 07 '22 at 15:09
  • Also note that your sample size is quite large. You will likely find a small p-value for a test of the coefficient from logistic regression or a test of the contingency table. You will likely want to rely on some measure of effect size for interpretation. Maybe odds ratio. Or the proportion of Bribe Yes for each Location. – Sal Mangiafico Oct 07 '22 at 15:10
  • Yes, "use". The values for Urban and Peri-urban perfectly predict the value for Rural. So you can't put all three of them in the model. This is sometimes called the "dummy variable trap". (Maybe see, SE link). – Sal Mangiafico Oct 07 '22 at 15:13

1 Answer

Answer from comments:

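Logistic regression is an option, but note that you would use only two of the three dummy variables (Urban, Peri-urban, Rural). Because each respondent falls into exactly one category, the values of Urban and Peri-urban perfectly predict the value of Rural, so you can't put all three into a model that also has an intercept. This is sometimes called the "dummy variable trap" (see, for example, stats.stackexchange.com/questions/590741/about-dummy-variable-trap). The category you leave out becomes the reference level, and each remaining coefficient compares its category against that reference.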
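As a rough illustration of the reference coding described above, here is a minimal Python sketch. The counts are hypothetical and are not taken from the question's data, and `reference_coded_fit` is a name made up for this example. It relies on the fact that with a single categorical predictor the logistic model is saturated, so the maximum-likelihood coefficients can be read directly off the group counts without an iterative fit.

```python
import math

# Hypothetical counts for illustration only (NOT the asker's data):
# location -> (gave a bribe, did not give a bribe)
counts = {
    "Urban": (60, 1440),
    "Peri-urban": (90, 1110),
    "Rural": (250, 2250),
}

# The trap: every respondent has exactly one dummy equal to 1, so
# Urban + Peri-urban + Rural always equals the intercept column of 1s.
one_hot_rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
assert all(sum(row) == 1 for row in one_hot_rows)


def log_odds(yes, no):
    """Log odds of giving a bribe within one location."""
    return math.log(yes / no)


def reference_coded_fit(counts, reference):
    """Logistic coefficients when `reference` has no dummy of its own.

    With one categorical predictor the model is saturated, so the
    maximum-likelihood estimates follow directly from the group counts.
    """
    intercept = log_odds(*counts[reference])
    coefs = {"Intercept": intercept}
    for level, (yes, no) in counts.items():
        if level != reference:
            # Difference in log odds relative to the reference level.
            coefs[level] = log_odds(yes, no) - intercept
    return coefs


coefs = reference_coded_fit(counts, reference="Rural")
for name, value in coefs.items():
    print(f"{name:>10}: {value:+.3f}")
```

Here Rural is the reference only by choice; picking a different reference changes the individual coefficients but not the fitted probabilities.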

You could also summarize the data into a table of counts (a contingency table) with two dimensions, Bribe vs. Location, which would be a 2 x 3 table. You could then use a chi-square test of association or a related test.
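To make the contingency-table approach concrete, the sketch below computes the Pearson chi-square statistic for a 2 x 3 table using only the Python standard library. The counts are again hypothetical, and `chi_square_independence` is a name chosen for this example. For a 2 x 3 table the test has 2 degrees of freedom, and in that special case the upper-tail probability of the chi-square distribution is exactly exp(-x/2), which is what the p-value line uses.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if Bribe and Location were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts (NOT the asker's data).
# Rows: Bribe Yes, Bribe No. Columns: Urban, Peri-urban, Rural.
table = [
    [60, 90, 250],
    [1440, 1110, 2250],
]

stat, df = chi_square_independence(table)

# Closed-form p-value, valid only when df == 2.
assert df == 2
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```

For tables of other sizes the closed form no longer applies, and a statistics package would be needed for the p-value.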

Also note that your sample size is quite large. You will likely find a small p-value for a test of a coefficient from logistic regression or for a test of the contingency table, even if the differences between locations are small.

For interpretation, you will likely want to rely on some measure of effect size, such as the odds ratio or the proportion of Bribe Yes for each Location.
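The effect-size measures mentioned above could be computed along these lines. This is only a sketch with hypothetical counts, and `odds_ratio_ci` is a name invented for the example. The interval is the usual Wald interval on the log odds ratio, with standard error sqrt(1/a + 1/b + 1/c + 1/d), and Rural is used as the comparison group purely for illustration.

```python
import math

# Hypothetical counts for illustration only (NOT the asker's data):
# location -> (gave a bribe, did not give a bribe)
counts = {
    "Urban": (60, 1440),
    "Peri-urban": (90, 1110),
    "Rural": (250, 2250),
}


def proportion(yes, no):
    """Share of respondents in a location who gave a bribe."""
    return yes / (yes + no)


def odds_ratio_ci(yes, no, ref_yes, ref_no, z=1.96):
    """Odds ratio versus a reference group, with an approximate 95% Wald interval."""
    odds_ratio = (yes / no) / (ref_yes / ref_no)
    se_log_or = math.sqrt(1 / yes + 1 / no + 1 / ref_yes + 1 / ref_no)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


proportions = {loc: proportion(*c) for loc, c in counts.items()}
for loc, p in proportions.items():
    print(f"{loc:>10}: {p:.1%} gave a bribe")

ref_yes, ref_no = counts["Rural"]
odds_ratios = {
    loc: odds_ratio_ci(yes, no, ref_yes, ref_no)
    for loc, (yes, no) in counts.items()
    if loc != "Rural"
}
for loc, (or_, lower, upper) in odds_ratios.items():
    print(f"{loc} vs Rural: OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio below 1 means lower odds of giving a bribe than in the reference group, and an interval that excludes 1 corresponds roughly to a significant coefficient in the logistic regression.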

Sal Mangiafico