
I would like first to mention that I am relatively new to the Machine Learning (ML) world, but I have a decent background in statistics and econometrics. I am working on a research paper focusing on the gender labor force participation gap. Using the language of econometrics,

  • The dependent (outcome) variable is a binary variable equal to 1 if the individual is in the labor force and 0 if they are not.
  • The primary variable of interest is a binary variable equal to 1 if the individual is female and 0 otherwise.
  • In addition, I have many control variables that, for convenience, I will stack in a matrix X.

Using a classical econometric approach, I would fit a probit model and then compute the difference between the probabilities of being in the labor force for males and females (i.e., Pr(LF=1|female=0, X=x) - Pr(LF=1|female=1, X=x)).

My question is: is there a machine-learning counterpart for such a method? In other words, is there a machine learning approach that allows me to compare the probability of success (success defined as being in the labor force) of two groups conditional on a set of controls?

I can provide further details if needed. Thanks!

Abdahrt
  • Have you considered using Naive Bayes for starters? It will allow you to have a "perfect explainability" early on. Then you can look at more fancy beasts. – usεr11852 Mar 15 '23 at 11:56
  • @usεr11852 Wouldn’t there be issues with that if features are correlated? – Dave Mar 15 '23 at 11:59
  • 1
    Obviously, some will be there, but that's a start, so the OP can have tractability. Then they move forward. (But it is not necessary those correlations will destroy any predictive performance.) – usεr11852 Mar 15 '23 at 12:03

1 Answer


You are looking for a probabilistic classifier. You can feed in your predictor data for the two groups (so the predictors will presumably only differ in the group membership) and get the success probability for each group, conditional on the other predictors.

Many classifiers can be taught to give probabilistic results, but unfortunately, many are implemented to yield "hard" 0-1 classifications, even though probabilistic classifications are much more useful.

This earlier thread specifically discusses Random Forests as probabilistic classifiers: How to make the randomforest trees vote decimals but not binary
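As a concrete sketch of this recipe (the data and all parameter choices below are illustrative assumptions, not from the thread): fit a probabilistic classifier on synthetic data, then score two counterfactual copies of the design matrix that differ only in the group indicator, and compare the averaged class-1 probabilities from `predict_proba`:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 2000

# Illustrative synthetic data: group indicator plus three controls,
# with a negative group effect on the log-odds of the outcome.
female = rng.integers(0, 2, n)
controls = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-(0.6 - 0.8 * female + controls[:, 0])))
in_lf = rng.binomial(1, p)

X = np.column_stack([female, controls])
clf = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
clf.fit(X, in_lf)

# Counterfactual copies: identical controls, only the group flag flipped.
X_male = X.copy()
X_male[:, 0] = 0
X_female = X.copy()
X_female[:, 0] = 1

# Average difference in predicted success probabilities across the sample.
gap = (clf.predict_proba(X_male)[:, 1].mean()
       - clf.predict_proba(X_female)[:, 1].mean())
print(f"Average predicted participation gap: {gap:.3f}")
```

Note that tree ensembles can produce poorly calibrated probabilities out of the box; if the probability levels (not just the gap's sign) matter, it is worth checking calibration, e.g. with scikit-learn's `CalibratedClassifierCV` or a calibration curve.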

Stephan Kolassa
  • Thank you Stephan for your answer and the useful link. I have a follow-up question: you mentioned that I can feed in my predictor data for each group and find the success probability conditional on X. I am worried this might not be suitable, since the decisions of females and males to join the labour force might be correlated, especially for married couples. Do you think I can still run separate routines for females and males given this potential correlation? – Abdahrt Mar 16 '23 at 06:28
  • Yes, that is a thorny problem. But then it really transcends statistics as such, because you can't meaningfully isolate traits like group membership from other predictors. It makes little sense to compare "a man that is just like the average human except for being a man" to "a woman that is just like the average human except for being a woman", because that "average human" does not exist - everyone is either a man or a woman. This is related to Miller & Chapman (2001). – Stephan Kolassa Mar 16 '23 at 07:28