Assuming the outcomes (admitted or rejected) of all applicants are available, I'd like to design a regression or statistical test to examine whether admission decisions are discriminatory with respect to the gender or race of the applicants. The available explanatory variables include both continuous variables, such as SAT score, and categorical variables that describe the students' qualifications.

Any comments are much appreciated.

User1865345
  • 8,202
Watchung
  • 307

1 Answer

A logistic regression would be a good place to start: such a model predicts the probability of an event and leads to tests of factors that might influence that probability.

A reasonable model might include your usual predictors (GPA, exam scores, etc.), plus your gender and race variables. Also of interest might be interacting the gender and race variables with the usual predictors, to see how the effects of gender and race on admission probability change as the usual predictors change. For instance, maybe men with low test scores have a lower chance of admission than women with low test scores, yet men with high test scores have a higher chance of admission than women with high test scores. You might also consider interacting the usual predictors with each other, and/or higher-order interactions that combine gender, race, and the usual features (perhaps even with interactions between the usual features). This can result in a great many features, and it will be on you to balance the potential benefit of these higher-order interactions (improved predictive ability) against downsides like difficulty of interpretation and the risk of having more variables than the data can reasonably support (e.g., overfitting).
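As a rough illustration of the gender-by-score interaction described above, here is a minimal sketch using the `statsmodels` formula interface on simulated data. The column names (`sat`, `male`, `admitted`), the sample size, and the true coefficients are all assumptions made up for the example, not anything from real admissions data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated applicants; every name and coefficient here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 20000
df = pd.DataFrame({
    "sat": rng.normal(0.0, 1.0, n),   # standardized exam score
    "male": rng.integers(0, 2, n),    # 1 = male, 0 = female
})

# Assumed truth: men are disadvantaged at low scores and advantaged at high scores.
log_odds = -0.5 + 1.0 * df["sat"] - 0.3 * df["male"] + 0.6 * df["sat"] * df["male"]
prob = 1.0 / (1.0 + np.exp(-log_odds.to_numpy()))
df["admitted"] = rng.binomial(1, prob)

# "sat * male" expands to sat + male + sat:male (main effects plus interaction).
fit = smf.logit("admitted ~ sat * male", data=df).fit(disp=0)
print(fit.summary())

# The male-vs-female gap in log-odds at a given score is
#   b_male + b_inter * sat,
# so it changes sign at sat = -b_male / b_inter.
b_male = fit.params["male"]
b_inter = fit.params["sat:male"]
crossover = -b_male / b_inter
print(f"Estimated score at which the gender gap changes sign: {crossover:.2f}")
```

Here the interaction coefficient is what lets the gender gap differ between low and high scorers; no separate "high score" dummy is needed for that.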

You can then test individual parameters (e.g., the indicator for being male) or an entire variable together with all of its interactions, in a “chunk test”.
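One way such a chunk test could be carried out is a likelihood-ratio test that compares the full model with a reduced model dropping every term that involves gender. The sketch below uses simulated data again; the variable names and coefficients are assumptions for illustration, and a Wald test on the same set of coefficients would be a reasonable alternative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Simulated applicants; names and coefficients are illustrative assumptions.
rng = np.random.default_rng(1)
n = 20000
df = pd.DataFrame({
    "sat": rng.normal(0.0, 1.0, n),
    "male": rng.integers(0, 2, n),
})
log_odds = -0.5 + 1.0 * df["sat"] - 0.3 * df["male"] + 0.6 * df["sat"] * df["male"]
df["admitted"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds.to_numpy())))

# Full model: gender main effect plus its interaction with the score.
full = smf.logit("admitted ~ sat * male", data=df).fit(disp=0)
# Reduced model: every term involving gender removed.
reduced = smf.logit("admitted ~ sat", data=df).fit(disp=0)

# Test of a single parameter (Wald z-test reported by statsmodels).
print("p-value for the male indicator alone:", full.pvalues["male"])

# Chunk test: likelihood-ratio statistic, chi-squared with df equal to
# the number of parameters dropped (here male and sat:male, so 2).
lr_stat = 2.0 * (full.llf - reduced.llf)
df_diff = int(full.df_model - reduced.df_model)
p_value = chi2.sf(lr_stat, df_diff)
print(f"LR statistic = {lr_stat:.1f}, df = {df_diff}, p = {p_value:.3g}")
```

A small p-value here says that gender matters somewhere in the model (main effect, interaction, or both), which is a different question from whether the main-effect coefficient alone is nonzero.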

In addition to the variables described above, you might consider allowing the usual features like exam scores to have nonlinear behavior (e.g., splines) and including such features in the model and in interactions.
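A sketch of the nonlinear option: the formula interface accepts a B-spline basis via `bs(...)`, which can also be interacted with the gender indicator. The data are simulated with an assumed curved score effect, and the choice of `df=4` is arbitrary and would need to be justified on real data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated applicants with a curved score effect; all values are assumptions.
rng = np.random.default_rng(2)
n = 20000
df = pd.DataFrame({
    "sat": rng.normal(0.0, 1.0, n),
    "male": rng.integers(0, 2, n),
})
log_odds = -0.5 + 1.0 * df["sat"] - 0.4 * df["sat"] ** 2 - 0.3 * df["male"]
df["admitted"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds.to_numpy())))

# Linear-in-score model versus a cubic B-spline in the score (4 basis columns).
linear_fit = smf.logit("admitted ~ sat + male", data=df).fit(disp=0)
spline_fit = smf.logit("admitted ~ bs(sat, df=4) + male", data=df).fit(disp=0)

# Lower AIC suggests the extra flexibility is worth its added parameters.
print("AIC, linear score:", linear_fit.aic)
print("AIC, spline score:", spline_fit.aic)

# The spline terms can be interacted with gender in the same way as a linear term.
spline_inter_fit = smf.logit("admitted ~ bs(sat, df=4) * male", data=df).fit(disp=0)
print("Parameters in the spline-by-gender model:", len(spline_inter_fit.params))
```

With the spline interacted with gender, the gender effect is spread over several coefficients, so it is usually assessed with a chunk test rather than by reading any single coefficient.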

Dave
  • 62,186
  • Thank you very much for the explanation. One quick question on how to test whether "men with low test scores have a lower chance of admission than women with low test scores, yet men with high test scores have a higher chance of admission". Do I need a dummy for high test score, and a three-way interaction score x gender_indicator x high_score_indicator, to achieve the effect? Thank you very much in advance! – Watchung Jan 05 '23 at 19:26
  • No, that would come from the interaction between the gender indicator and the test score variable. If this is not obvious, I recommend posting a new question about it so others can benefit from good answers specific to that question. – Dave Jan 05 '23 at 19:34
  • I confused myself. The gender*score will do the work. – Watchung Jan 05 '23 at 19:59