I have a dataset on which I want to run a propensity score analysis. Using the R package twang, I plan to estimate the propensity scores and use them for inverse probability of treatment weighting (IPTW). The variables I put into the model are those I believe are confounded with treatment selection.

My question is what to do about a confounder that is already balanced at baseline. Should I include it in my propensity score model? What good or harm does it do?

For example, my data have three treatments. I believe surg (0/1) is confounded with treatment selection and I want to control for it, but it already seems to be balanced across all three groups:

   tmt1 tmt2    var mean1 mean2 pop.sd std.eff.sz     p    ks ks.pval
15    1    2 surg:0 0.147 0.136  0.342      0.034 0.668 0.012   0.668
16    1    2 surg:1 0.853 0.864  0.342      0.034 0.668 0.012   0.668

33    1    3 surg:0 0.147 0.135  0.342      0.035 0.711 0.012   0.711
34    1    3 surg:1 0.853 0.865  0.342      0.035 0.711 0.012   0.711

51    2    3 surg:0 0.136 0.135  0.342      0.000 0.998 0.000   0.998
52    2    3 surg:1 0.864 0.865  0.342      0.000 0.998 0.000   0.998
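As a quick sanity check on the table, two of its columns can be recomputed from the printed means. This is a sketch under stated assumptions: it assumes std.eff.sz is the absolute difference in means divided by pop.sd, and it uses the fact that for a 0/1 variable the KS statistic reduces to the absolute difference in proportions. The helper names are hypothetical, not twang functions, and because the inputs are rounded to three decimals the results only approximately match the printed values.

```python
# Rough re-check of two balance-table columns for the binary covariate surg.
# Assumption: std.eff.sz = |mean1 - mean2| / pop.sd. Inputs are the rounded
# values printed in the table, so small discrepancies are expected.

def std_effect_size(mean1: float, mean2: float, pop_sd: float) -> float:
    """Absolute standardized difference in means (hypothetical helper)."""
    return abs(mean1 - mean2) / pop_sd


def ks_binary(p1: float, p2: float) -> float:
    """KS statistic for a 0/1 variable. The empirical CDFs differ only at 0,
    where they equal 1 - p1 and 1 - p2, so the largest gap is |p1 - p2|."""
    return abs(p1 - p2)


# Proportion with surg = 1 in each group, and pop.sd, copied from the table.
comparisons = {
    ("1", "2"): (0.853, 0.864, 0.342),
    ("1", "3"): (0.853, 0.865, 0.342),
    ("2", "3"): (0.864, 0.865, 0.342),
}

results = {}
for (t1, t2), (m1, m2, sd) in comparisons.items():
    es, ks = std_effect_size(m1, m2, sd), ks_binary(m1, m2)
    results[(t1, t2)] = (es, ks)
    print(f"tmt{t1} vs tmt{t2}: std.eff.sz ~ {es:.3f}, ks ~ {ks:.3f}")
```

All three comparisons give standardized differences below 0.05 and KS gaps of about one percentage point, consistent with reading surg as already balanced.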
StasK
RayVelcoro

1 Answer


Propensity scores are usually estimated with logistic regression, and we usually take a "kitchen sink" approach. I don't believe in running univariable analyses to decide which variables to include; you can easily have a power problem that keeps you from seeing a real imbalance. It is typical to adjust for the observed variables, imbalanced or not. I am liberal about using regression splines in the propensity model so as not to assume linearity (linearity in the logit corresponds to a shift in the means only; a quadratic effect would allow both means and variances to differ by treatment group).
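The parenthetical point about linearity can be made concrete with a textbook derivation. Assume, purely for illustration, a binary treatment $T$ with $\pi = P(T=1)$ and a single covariate that is normal within each group, $X \mid T=t \sim N(\mu_t, \sigma_t^2)$. By Bayes' rule the log-odds of treatment is

```latex
\operatorname{logit} P(T=1 \mid x)
  = \log\frac{\pi}{1-\pi} + \log\frac{\sigma_0}{\sigma_1}
    - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_0)^2}{2\sigma_0^2}
  = c + \left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}\right) x
      + \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right) x^2
```

where $c = \log\frac{\pi}{1-\pi} + \log\frac{\sigma_0}{\sigma_1} - \frac{\mu_1^2}{2\sigma_1^2} + \frac{\mu_0^2}{2\sigma_0^2}$. The $x^2$ coefficient vanishes exactly when $\sigma_0 = \sigma_1$, so a logit that is linear in $x$ corresponds to groups that differ only in their means, while a quadratic (or spline) term also lets the variances differ.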

Check that a weighted analysis is efficient compared with covariate adjustment using a regression spline in the logit of the propensity score.
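One rough way to see how much precision the weighting costs is Kish's effective sample size of the IPTW weights, $(\sum w)^2 / \sum w^2$. The sketch below assumes the propensity scores have already been estimated and that each unit's weight is one over its estimated probability of receiving the treatment it actually got (an ATE-style weight, which also covers the three-treatment case). The propensities are made-up numbers and the helper names are hypothetical. This is only a diagnostic, not the full comparison with spline adjustment for the logit of the propensity score that the answer recommends.

```python
# Sketch: IPTW weights and Kish's effective sample size (ESS).
# Assumption: p_received[i] is the estimated probability that unit i received
# the treatment it actually got; estimating it is outside this sketch.

def iptw_weights(p_received: list[float]) -> list[float]:
    """ATE-style weights 1 / P(T = t_i | x_i) (hypothetical helper)."""
    if any(not 0.0 < p <= 1.0 for p in p_received):
        raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in p_received]


def kish_ess(weights: list[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical propensities for six units, purely for illustration.
moderate = [0.5, 0.4, 0.6, 0.5, 0.45, 0.55]
# Same units, except one was very unlikely to get the treatment it received.
extreme = [0.5, 0.4, 0.6, 0.5, 0.05, 0.55]

ess_moderate = kish_ess(iptw_weights(moderate))
ess_extreme = kish_ess(iptw_weights(extreme))
print(f"ESS, moderate propensities: {ess_moderate:.2f} of {len(moderate)}")
print(f"ESS, one extreme propensity: {ess_extreme:.2f} of {len(extreme)}")
```

A single unit with a small probability of receiving its own treatment pulls the effective sample size from nearly 6 down to about 2, which is the kind of variance inflation that can make a weighted analysis less efficient than regression adjustment.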

Frank Harrell
  • Will we run into an issue with the kitchen-sink approach if we have many covariates? I know that you are a proponent of having 10-20 observations per covariate, but does this rule apply to creating propensity scores? – RayVelcoro Nov 12 '15 at 20:59
  • Best available evidence (we need more) is that you might be OK with f/4 observations per variable in a propensity model, where f = minimum of the two frequencies of the exposure (treatment). I wish I could remember where I read that. – Frank Harrell Nov 12 '15 at 21:12
  • @FrankHarrell, a better question for you may be, where you may have written that ;) – StasK Aug 13 '16 at 16:59
  • There is a current epub-ahead-of-print paper in Statistics in Medicine by Qingxia Chen that indirectly addresses that. – Frank Harrell Aug 13 '16 at 22:42