
I have a set $A$. For example, the set of high school graduates.

I have a subset $B \subset A$. For example, the enrollees in STEM college majors.

I have a dummy variable $G$. For example, gender.

I want to test whether the share of females among STEM enrollees is lower than their share in the high school graduating class.

So, I want to test whether the mean or the median of $G$ is statistically different across the two groups.
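To make the setup concrete, here is a minimal sketch with made-up data (the ten records and the names `graduates`, `share_A`, and `share_B` are purely illustrative, not my real dataset). Because $G$ is a 0/1 dummy, its mean in a group is simply the proportion of females in that group, so comparing means amounts to comparing proportions; the median of a dummy can only be 0, 0.5, or 1, so it is probably less informative here.

```python
from statistics import mean

# Hypothetical toy data: one (G, in_B) pair per high school graduate in A.
# G = 1 for female, 0 for male; in_B is True if the graduate enrolled in STEM.
graduates = [
    (1, True), (1, False), (1, False), (1, False), (1, False),
    (0, True), (0, True), (0, True), (0, False), (0, False),
]

g_in_A = [g for g, _ in graduates]             # G over the whole set A
g_in_B = [g for g, in_b in graduates if in_b]  # G over the subset B

share_A = mean(g_in_A)  # proportion of females among all graduates
share_B = mean(g_in_B)  # proportion of females among STEM enrollees

print(f"female share in A: {share_A:.2f}, in B: {share_B:.2f}")
```

Note that every record in `g_in_B` also appears in `g_in_A`, which is exactly why I doubt the two samples are independent.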

Are the samples from the two groups independent?

What test should I use?

  • From the information you have, can you not calculate the %female and %male in both high school and STEM majors, thereby avoiding the superset/subset problem? – mkt Jul 06 '22 at 11:22
  • @mkt yes, it was to explain the dataset ... I'm not following you ... which test should I use to test whether the percentage of females in STEM is significantly different from the percentage of females among high school graduates? – robertspierre Jul 06 '22 at 12:17
  • This changes the question considerably, and it's worth editing your question to reflect that. You don't need a way to test the difference between a subset and a superset - you need a way to compare proportions in two groups (%female in high school and university STEM). There are many questions on this site about this. – mkt Jul 06 '22 at 13:01
  • @mkt I need a way to compare proportions in two groups ... but the two groups are not independent, because one is a subset of the other. Right? – robertspierre Jul 06 '22 at 13:54
  • Just think of it as comparing the proportion of female students in 2 groups (high school and university STEM). – mkt Jul 06 '22 at 14:03
  • @mkt but, again, the two samples are not independent, so I cannot use the tests that assume independent samples, right? – robertspierre Jul 06 '22 at 14:58
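
One possible way around the independence worry raised in these comments, offered only as a sketch under stated assumptions: since $B \subset A$, the female share in $A$ is a weighted average of the shares in $B$ and in the complement $A \setminus B$. So, provided $B$ is a non-empty proper subset, the share in $B$ is below the share in $A$ exactly when it is below the share in $A \setminus B$. The groups $B$ and $A \setminus B$ are disjoint, which gives an ordinary 2x2 table (STEM vs. non-STEM by female vs. male). The code below computes the one-sided Fisher exact p-value for such a table as a hypergeometric lower tail. The function name `hypergeom_lower_tail` and all counts are hypothetical, and whether this sampling model suits the real data is an assumption to check.

```python
from math import comb

def hypergeom_lower_tail(k, N, K, n):
    """P(X <= k) for X ~ Hypergeometric: n draws without replacement
    from a population of N that contains K 'successes' (here: females)."""
    lowest = max(0, n - (N - K))  # smallest feasible number of successes
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(lowest, k + 1))
    return favourable / comb(N, n)

# Hypothetical counts, not taken from the question:
N = 10  # graduates in A
K = 5   # females in A
n = 4   # STEM enrollees (size of B)
k = 1   # females among the STEM enrollees

# One-sided p-value: chance of seeing k or fewer females in B if B were
# a random draw of n graduates from A (no association with gender).
p_value = hypergeom_lower_tail(k, N, K, n)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value would suggest that females are under-represented in $B$ relative to $A \setminus B$, and hence relative to $A$. With large counts, a two-proportion z-test or a chi-squared test on the same 2x2 table would be the usual approximations.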
