
I have a dataset that looks like this:

[Image: screenshot of the dataset]

I would like to use R to find out whether being assigned to group "A" rather than "B" (or vice versa) may depend on "age", "gender" or "school" (or on a combination of these).

Which kind of analysis should I conduct? Thank you very much!

Fabio
  • Do you want to know whether the assignment to $A$ or $B$ depends on those features or do you want to find a way to actually predict the assignment? – frank Apr 07 '22 at 11:18
  • Hi Frank, I would like to explore whether the assignment to A or B depends on those other features. Thank you! – Fabio Apr 07 '22 at 12:15
  • This is similar: https://stats.stackexchange.com/questions/73646/how-do-i-test-that-two-continuous-variables-are-independent – frank Apr 07 '22 at 12:54
  • Sorry, I may not have the theoretical background to see the similarity... I thought the variables in my dataset were not continuous; am I wrong? Thank you! – Fabio Apr 07 '22 at 13:19
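For a single categorical feature such as gender or school, one common way to check whether group assignment is independent of that feature is Pearson's chi-squared test of independence on a contingency table. This is a swapped-in technique, not something proposed in the thread. A minimal base-R sketch, assuming a hypothetical data frame `dat` with factor columns `group` and `gender`; the counts are invented for illustration:

```r
# Hypothetical example data: 40 people with group (A/B) and gender (F/M).
# The counts are made up purely for illustration.
dat <- data.frame(
  group  = factor(rep(c("A", "B", "A", "B"), times = c(15, 5, 8, 12))),
  gender = factor(rep(c("F", "F", "M", "M"), times = c(15, 5, 8, 12)))
)

# Cross-tabulate group against gender
tab <- table(dat$group, dat$gender)
print(tab)

# Pearson's chi-squared test of independence (R applies Yates'
# continuity correction to 2x2 tables by default).
# A small p-value suggests that group and gender are associated.
res <- chisq.test(tab)
print(res)
```

This handles one categorical variable at a time; a continuous variable such as age, or several variables together, calls for a model such as the logistic regression suggested in the answer.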

1 Answer


Nearly all algorithms need data in numerical form. If you have binary or categorical data, the usual approach is dummy or one-hot encoding, which codes the different categories as zeroes and ones. In practice, modern software often does this for you. For example, in R, if you use the factor data type (R may already be using it for your data; you can check with the is.factor function), the dummy-encoded representation of the data is produced under the hood when needed.
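To see the dummy encoding R builds from factors, `model.matrix()` can be called directly on a formula. A minimal sketch, assuming a hypothetical toy data frame `dat` with two categorical columns:

```r
# Hypothetical toy data with two categorical predictors
dat <- data.frame(
  gender = factor(c("F", "M", "F", "M")),
  school = factor(c("X", "Y", "Z", "X"))
)

# Check that the columns are stored as factors
is.factor(dat$gender)  # TRUE
is.factor(dat$school)  # TRUE

# model.matrix() shows the 0/1 dummy columns R builds for a model formula.
# With the default treatment contrasts, the first level of each factor
# ("F" and "X" here) is the reference and gets no column of its own.
X <- model.matrix(~ gender + school, data = dat)
print(X)
```

Here `genderM` is 1 for males and 0 for females, and `schoolY` / `schoolZ` mark schools Y and Z relative to school X. Strictly speaking this is dummy (reference) coding rather than full one-hot encoding: because the model has an intercept, one level per factor is dropped to avoid redundant columns.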

If you want to find out whether being assigned to group "A" rather than "B" (or vice versa) depends on "age", "gender" or "school" (or a combination of these), this sounds like a logistic regression problem, with group membership as the binary outcome and the other variables as predictors. Logistic regression works out of the box with factor data.
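A minimal sketch of such a logistic regression in base R, using `glm()` with `family = binomial`. The data below are simulated stand-ins for the real dataset (the column names `group`, `age`, `gender`, `school` are assumed from the question), generated so that group depends on age only:

```r
set.seed(1)  # reproducible simulated data
n <- 200

# Simulated (made-up) data standing in for the real dataset
dat <- data.frame(
  age    = round(runif(n, 18, 60)),
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  school = factor(sample(c("X", "Y", "Z"), n, replace = TRUE))
)
# Simulate group membership so that the chance of "B" rises with age
p_B <- plogis(-2 + 0.05 * dat$age)
dat$group <- factor(ifelse(runif(n) < p_B, "B", "A"), levels = c("A", "B"))

# Logistic regression: glm() models the probability of the second
# factor level ("B") versus the first ("A")
fit <- glm(group ~ age + gender + school, family = binomial, data = dat)
summary(fit)

# Likelihood-ratio test for dropping each predictor in turn
drop1(fit, test = "Chisq")

# "A combination of these": interactions can be added with *,
# then compared with the main-effects model
fit_int <- glm(group ~ age * gender + school, family = binomial, data = dat)
anova(fit, fit_int, test = "Chisq")
```

The `drop1()` tests ask whether each variable improves the fit given the others, which matches the goal of exploring what the assignment depends on rather than building a predictor.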

Tim
  • Hi Tim, thank you for your reply. Would you mind suggesting an R function that you think would work to run a logistic regression on my dataset, given my goal? Thank you very much! – Fabio Apr 07 '22 at 13:15
  • @Fabio as said above, most likely you don't need to do anything specific; it'd probably just work. If the variables are strings, use as.factor to transform them. If you're asking for general help with R, the question would be better suited to Q&A sites for R users. – Tim Apr 07 '22 at 13:17
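If the columns came in as strings (since R 4.0.0, `read.csv()` and `data.frame()` no longer convert strings to factors by default), a minimal sketch of converting every character column with `as.factor()` at once, assuming a hypothetical data frame `dat`:

```r
# Hypothetical data with string (character) columns
dat <- data.frame(
  group  = c("A", "B", "A", "B"),
  gender = c("F", "M", "M", "F"),
  age    = c(23, 35, 41, 29)
)
sapply(dat, is.factor)  # all FALSE in R >= 4.0.0

# Convert every character column to a factor with as.factor()
chr_cols <- sapply(dat, is.character)
dat[chr_cols] <- lapply(dat[chr_cols], as.factor)

sapply(dat, is.factor)  # group and gender TRUE, age FALSE
levels(dat$group)       # "A" "B"
```

Alternatively, passing `stringsAsFactors = TRUE` to `read.csv()` converts the string columns when the file is read.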