
I have a response variable (Yes/No) by visit with some missing values. I am considering imputing the underlying continuous variable in SAS using PROC MI. After this process, I will have, let's say, M imputed datasets. How should I combine them into one dataset?

In other words, how can I apply Rubin's rules if I need a simple summary at the end, such as the count of responders/non-responders at each visit for each treatment group? Or should Rubin's rules be used only when we need to compare treatments with, say, a relative risk, risk difference, or odds ratio?

Peter Mortensen
Kate
  • (Note: imputation, imputed, and imputing are *not* typos... (in this context) - "5. (transitive, statistics) To replace missing data with substituted values.". See also Imputation (statistics).) – Peter Mortensen Nov 11 '23 at 02:39
  • You ask: "I will have, let's say, M imputed datasets. How should I combine them into one dataset?" You do not combine the multiple datasets into one. You use Rubin's rules to combine the results of the same modeling approach on the multiple datasets. Stef van Buuren's Flexible Imputation of Missing Data is an excellent and freely available resource, starting from basic principles and working up from there. – EdM Nov 11 '23 at 16:05
  • @Kate Please register and/or merge your accounts (you can find information on how to do this in the My Account section of our help center); then you will be able to edit and comment on your own question. – Sycorax Nov 13 '23 at 14:10
  • @EdM Thanks for the book! Is my understanding correct that the multiple imputation technique and Rubin's rules are only for comparing treatments? What should I do when I have single-arm data with some missing values? Also, it is still not clear to me which steps I should follow, i.e., the first algorithm: 1. impute the underlying continuous endpoint; 2. dichotomize it to obtain a binary endpoint; 3. use Rubin's rules to combine the results; or the second algorithm: 1. impute the binary endpoint; 2. use Rubin's rules to combine the results. – Kate Nov 13 '23 at 12:37
  • You can use Rubin's rules for many types of comparisons. In principle, you can compare (imputed) binary outcomes as a function of treatment-group membership, modeled for example with logistic regression. In the log-odds scale of that model, Rubin's rules hold and you could compare log-odds/probabilities of the outcome among groups. I'm worried that your binary outcome, however, seems to come from dichotomizing a continuous variable. That's not usually wise. The discussion on this page applies to outcomes as well as to predictor variables. – EdM Nov 13 '23 at 14:51
  • @EdM I was asking about a single-arm trial, where you have only one treatment and no arm to compare. Could we use Rubin's rules for the proportion itself? As for using a binary instead of a continuous outcome: I know that I am losing information by using the binary variable, but my clinical team prefers it. – Kate Nov 14 '23 at 07:49
  • You can apply Rubin's rules to any estimate with an underlying asymptotic normal distribution. As the answer from @PBulls notes, in some circumstances the proportion itself can be close to normal, or you can work in the log-odds scale of the outcome as in logistic regression. – EdM Nov 14 '23 at 09:44

2 Answers


Rubin's rules combine point estimates and their standard errors, so they are only really valid if a normal approximation is appropriate for your statistic.
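For reference, here are Rubin's rules in their standard form. With $M$ imputed datasets, let $\hat Q_m$ be the estimate from the $m$-th dataset and $U_m$ its squared standard error:

$$
\bar Q = \frac{1}{M}\sum_{m=1}^{M}\hat Q_m,\qquad
\bar U = \frac{1}{M}\sum_{m=1}^{M}U_m,\qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat Q_m-\bar Q\right)^2,
$$

$$
T = \bar U + \left(1+\frac{1}{M}\right)B,\qquad
\nu = (M-1)\left(1+\frac{\bar U}{\left(1+1/M\right)B}\right)^2,
$$

Inference then treats $(\bar Q - Q)/\sqrt{T}$ as $t_\nu$-distributed. That reference distribution is only sensible when each $\hat Q_m$ is itself approximately normal.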

PROC FREQ provides an asymptotic standard error for proportions ($\sqrt{\hat p(1-\hat p)/n}$) which you can combine along with the proportion estimate using PROC MIANALYZE. This might be acceptable for larger samples with proportions that are not close to the boundaries (0 or 1).
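The following is a minimal Python sketch of the arithmetic PROC MIANALYZE performs here, written out by hand rather than calling SAS. It assumes that you already have the number of responders at one visit for one treatment arm in each of the M imputed datasets, and that the arm has no missing denominator (a fixed n). The counts below are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Combine M estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor M - 1)
    t = u_bar + (1 + 1 / m) * b   # total variance
    # Rubin's original degrees of freedom; infinite if the imputations agree exactly
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2 if b > 0 else math.inf
    return q_bar, t, df


def pool_proportion(responder_counts, n):
    """Pool one proportion across imputations using the Wald SE sqrt(p(1-p)/n)."""
    props = [k / n for k in responder_counts]
    wald_vars = [p * (1 - p) / n for p in props]
    return rubin_pool(props, wald_vars)


# Hypothetical: responders at one visit in one arm (n = 50) across M = 5 imputations
p_hat, total_var, df = pool_proportion([21, 23, 22, 24, 22], n=50)
print(f"pooled p = {p_hat:.3f}, SE = {math.sqrt(total_var):.4f}, df = {df:.1f}")
```

Because the pooled point estimate is simply the average across imputations, the pooled number of responders is n times the pooled proportion (equivalently, the average count). Rubin's rules go beyond simple averaging in the standard error, which adds the between-imputation variance to the average within-imputation variance.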

More commonly, you would run a logistic regression and combine the estimated parameters on the log-odds scale along with their estimated standard errors. The advantage is that a normal approximation is more appropriate on the logit scale than on the proportion scale, but you'll have to back-transform your combined statistic to the proportion scale. There are obviously many extensions or other models you could apply as well. The gist, however, is that anything you combine using Rubin's rules assumes an asymptotic normal distribution.
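The sketch below is an intercept-only version of that approach in Python, not SAS output. It relies on the fact that, for an intercept-only logistic model, the intercept MLE is logit(p) with variance 1 / (n p (1 - p)), so no model fitting is needed. For simplicity it uses a normal quantile instead of Rubin's t reference distribution, which is reasonable only when the degrees of freedom are large. The counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def pool_logit_proportion(responder_counts, n, level=0.95):
    """Pool a proportion on the log-odds scale, then back-transform to (0, 1).

    Intercept-only logistic model: estimate logit(p), variance 1 / (n p (1 - p)).
    """
    props = [k / n for k in responder_counts]
    if any(p <= 0 or p >= 1 for p in props):
        raise ValueError("logit is undefined when an imputed proportion is 0 or 1")
    est = [logit(p) for p in props]
    var = [1 / (n * p * (1 - p)) for p in props]
    m = len(est)
    q_bar, u_bar, b = mean(est), mean(var), variance(est)
    se = math.sqrt(u_bar + (1 + 1 / m) * b)   # Rubin's total variance, as an SE
    # Simplification: normal quantile instead of the t_nu reference distribution
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transform the point estimate and the interval endpoints
    return expit(q_bar), expit(q_bar - z * se), expit(q_bar + z * se)


# Hypothetical counts: one visit, one arm, n = 50, M = 5 imputations
p, lo, hi = pool_logit_proportion([21, 23, 22, 24, 22], n=50)
print(f"pooled p = {p:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Back-transforming the endpoints rather than the standard error gives an interval that is asymmetric on the proportion scale and cannot leave (0, 1), which is the practical benefit of working on the logit scale near the boundaries.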

PBulls
  • Is my understanding correct that the multiple imputation technique and Rubin's rules are for comparing treatments? What should I do when I have single-arm data with some missing values? Also, it is still not clear to me which steps I should follow, i.e., the first algorithm: 1. impute the underlying continuous endpoint; 2. dichotomize it to obtain a binary endpoint; 3. use Rubin's rules to combine the results; or the second algorithm: 1. impute the binary endpoint; 2. use Rubin's rules to combine the results. – Kate Nov 13 '23 at 12:29
  • There is no reason why you couldn't use Rubin's rules on a single proportion as I explained in the second paragraph, or an intercept-only logit model as in the third. How to best impute your data depends a lot on what exactly was collected, what you're trying to learn from it, and the assumptions you're willing to accept while imputing; Frank Harrell discusses some options in his answer. – PBulls Nov 13 '23 at 13:50

I assume that the default in SAS is not predictive mean matching (PMM). I recommend using PMM here, as it results in real values for imputed variables, i.e., zeros and ones if a variable is binary. With multiple imputation, the proportion of ones in the long run will be the probability of being in that category, but you stick with the 0/1 values when combining analyses. Note that for PMM it doesn't matter very much whether you use logistic regression or OLS to predict the binary variable, as PMM just uses ranks of predicted values.
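The following stripped-down Python sketch of predictive mean matching shows why the imputed values stay 0/1. Each missing case copies the observed value of a "donor" whose predicted value is close. Assumptions: a single numeric predictor, an OLS fit for the predicted values, random donor selection among the k closest observed cases, and hypothetical data. Proper MI implementations also draw the regression coefficients from their posterior distribution for each imputation. This sketch omits that step, so as written it would understate the between-imputation variability.

```python
import random
from statistics import mean


def pmm_impute_binary(x, y, k=5, rng=None):
    """Simplified PMM for a 0/1 outcome y (None = missing) with one predictor x.

    Omits the posterior draw of the regression coefficients used by proper MI.
    """
    rng = rng or random.Random()
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in obs]
    ys = [yi for _, yi in obs]
    # OLS fit of y on x among observed cases (a linear model for a 0/1 outcome)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs) / sxx
    intercept = y_bar - slope * x_bar
    pred_obs = [intercept + slope * xi for xi in xs]

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)          # observed values are never changed
            continue
        pred = intercept + slope * xi
        # The k observed cases whose predicted values are closest become donors
        donors = sorted(range(len(obs)), key=lambda j: abs(pred_obs[j] - pred))[:k]
        completed.append(ys[rng.choice(donors)])  # copy a real observed 0 or 1
    return completed


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.5, 5.5]   # hypothetical predictor
y = [0, 0, 0, 1, 1, 1, None, None]             # hypothetical 0/1 outcome
imputations = [pmm_impute_binary(x, y, k=3, rng=random.Random(seed)) for seed in range(5)]
print(imputations)
```

Each completed dataset would then be analyzed separately (for example, the proportion of ones per visit and arm), and those per-dataset results, not the datasets themselves, would be pooled with Rubin's rules.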

Frank Harrell