
I'm trying to build an IPW (inverse probability weighting) model with an MLP as the base learner, yet from one run to the next, the propensity scores for each row in the DataFrame are distributed differently.

Could you please help me understand this behaviour, and why it doesn't happen when logistic regression is used as the base model for the same task?

Thanks in advance!

EDIT:

from causallib.datasets import load_nhefs
from causallib.estimation import IPW
from causallib.evaluation import PropensityEvaluator
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

data = load_nhefs()
data.X.join(data.a).join(data.y).head()

learner = MLPClassifier()
ipw = IPW(learner)
ipw.fit(data.X, data.a)

# per-treatment-group outcome estimates, as in the linked notebook
outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(outcomes[1], outcomes[0])

My usage follows the IPW example found here: https://github.com/IBM/causallib/blob/master/examples/ipw.ipynb

  • What does MLP mean? What software are you using and what command did you give it? – Noah Jun 14 '22 at 14:42
  • @Noah MLP = multi-layer perceptron, and I will edit the question – StrugglingResearcher Jun 14 '22 at 16:44
  • I see. Does this question actually have anything to do with propensity scores or IPW, or is this just a question about getting predicted probabilities from a regression model? – Noah Jun 14 '22 at 16:55
  • Yes, it does. I used the causallib library that I found online; to use IPW you need to supply the library with a sklearn learner, and then you can calculate the propensity scores. My main concern is that when I use MLP, the effect and the propensity scores keep changing, while with a logistic regression base learner they remain the same. – StrugglingResearcher Jun 14 '22 at 17:42
  • At its core, this question seems to be about why MLP as implemented in sklearn produces different sets of predicted probabilities when run multiple times. Is that correct? If so, everything about IPW is a distraction and will make it harder to answer your question. That's why I keep pressing you on this. My guess is that MLP simply has a stochastic component and logistic regression doesn't, but hopefully an MLP expert will weigh in, which is why I added the [tag:neural-networks] tag. No knowledge of propensity scores or IPW (which is what I do know about) will help answer this. – Noah Jun 14 '22 at 17:54 (a minimal demonstration of this point appears below the thread)
  • @Noah is right. MLP has a stochastic component because the objective function is non-convex, so there may be multiple minima; which one you converge to depends on your parameter initialization, hence the different results. Logistic regression has a strongly convex objective function if your input is full rank, if I remember correctly; hence, the solution is unique. To get around this, look into how you can set the random seed for the MLP in your library; it's standard practice for replicability in MLP/neural-network based work. – chang_trenton Jun 14 '22 at 17:58 (the convexity argument is written out below the thread)
  • So assuming I do set the random seed, how can I compare the two models as base learners for IPW, i.e. decide which works better in terms of the per-sample propensity score distributions they produce? – StrugglingResearcher Jun 14 '22 at 19:50
  • That is a separate question, but I have written a bit about it on this site and elsewhere. See here for example. The answer is to examine the covariate balance each model yields. – Noah Jun 15 '22 at 15:11 (a balance-checking sketch follows below)
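A minimal sketch of the stochasticity point above, using standalone sklearn on synthetic data rather than causallib (make_classification and the seed values are illustrative choices, not from the question): two unseeded MLPClassifier fits generally disagree on their predicted probabilities, while seeded MLP fits and logistic regression fits are identical from run to run.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# illustrative synthetic data standing in for data.X and data.a
X, a = make_classification(n_samples=500, n_features=10, random_state=0)

# two unseeded MLP fits: random weight initialization, so the
# "propensity scores" generally differ between runs
p1 = MLPClassifier(max_iter=500).fit(X, a).predict_proba(X)[:, 1]
p2 = MLPClassifier(max_iter=500).fit(X, a).predict_proba(X)[:, 1]
print("unseeded MLPs agree:", np.allclose(p1, p2))   # usually False

# fixing random_state pins the initialization: fully reproducible
p3 = MLPClassifier(max_iter=500, random_state=42).fit(X, a).predict_proba(X)[:, 1]
p4 = MLPClassifier(max_iter=500, random_state=42).fit(X, a).predict_proba(X)[:, 1]
print("seeded MLPs agree:", np.allclose(p3, p4))     # True

# logistic regression solves a convex problem: one optimum, every run
q1 = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
q2 = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
print("logistic fits agree:", np.allclose(q1, q2))   # True

The same fix carries over to the question's code: passing MLPClassifier(random_state=42) as the base learner to IPW should make the propensity scores and the estimated effect stop changing between runs.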
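For reference, the convexity argument in the comment above can be written out (a standard derivation, with $a_i$ denoting the treatment label and $\sigma$ the logistic function). The logistic regression negative log-likelihood is

$$\ell(\beta) = \sum_{i=1}^n \left[\log\left(1 + e^{x_i^\top \beta}\right) - a_i\, x_i^\top \beta\right],$$

with Hessian

$$\nabla^2 \ell(\beta) = X^\top W X, \qquad W = \operatorname{diag}\big(\sigma(x_i^\top \beta)\,(1 - \sigma(x_i^\top \beta))\big),$$

which is positive semidefinite everywhere and positive definite when $X$ has full column rank, so the fitted coefficients are unique no matter where the solver starts. An MLP's loss is non-convex in its weights, so gradient descent from a random initialization can land in different local minima, which is exactly the run-to-run variation in the propensity scores.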
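On the follow-up about choosing between the two base learners: a sketch of the covariate-balance check suggested in the last comment, assuming causallib's IPW.compute_weights method; the weighted_smd helper is written here for illustration and is not part of any library. Smaller absolute standardized mean differences (SMDs) after weighting indicate that a model balances the covariates better.

import numpy as np
import pandas as pd
from causallib.datasets import load_nhefs
from causallib.estimation import IPW
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def weighted_smd(X, a, w):
    """Absolute standardized mean difference of each covariate between
    treated (a == 1) and control (a == 0) after weighting by w."""
    a, w = np.asarray(a), np.asarray(w)
    out = {}
    for col in X.columns:
        x = X[col].to_numpy(dtype=float)
        t, c = a == 1, a == 0
        m1 = np.average(x[t], weights=w[t])
        m0 = np.average(x[c], weights=w[c])
        s = np.sqrt((x[t].var() + x[c].var()) / 2)  # pooled unweighted SD
        out[col] = abs(m1 - m0) / s if s > 0 else 0.0
    return pd.Series(out)

data = load_nhefs()
for name, learner in [("logistic", LogisticRegression(max_iter=1000)),
                      ("mlp", MLPClassifier(max_iter=1000, random_state=0))]:
    ipw = IPW(learner)
    ipw.fit(data.X, data.a)
    w = ipw.compute_weights(data.X, data.a)  # assumed causallib API
    print(name, "worst covariate |SMD|:",
          round(float(weighted_smd(data.X, data.a, w).max()), 3))

A common rule of thumb is that absolute SMDs below 0.1 indicate acceptable balance; the model whose weights push all covariates below that threshold is the safer choice, regardless of how the raw propensity score distributions look.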
