I recently used the term "imputation by zero", because the causes of loss to follow-up were well known in our study: they were failures. Somebody pointed out to me that the terminology is not correct, and that we speak of imputation only when we replace a missing value with something other than zero. When I read about the different methods of imputation, it is true that imputation by zero is not among the methods mentioned. I understood imputation to mean simply a replacement. Am I wrong?

  • Use of the term may be inconsistent in the literature. I'd personally not think that "imputation by zero" is a wrong use of terminology, but I wouldn't be surprised if some reserve the term for imputation rules that have at least some (if maybe weak) statistical basis and don't impute the same value everywhere. – Christian Hennig Jan 14 '23 at 11:32

1 Answer

Definitions of Imputation

I think their definition doesn't make any sense. Citing this paper on imputation methods:

[image: definition of imputation quoted from the paper]

Nowhere does it say that imputation is restricted by which value you use. The same holds for how it is defined in this article:

[image: definition of imputation quoted from the article]

So I'm not sure where this person got their assumption from, but it does not seem correct.
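
To make the point concrete, here is a minimal base R sketch of what "imputation by zero" amounts to. The vector followup and its values are hypothetical, standing in for an outcome where everyone lost to follow-up is assumed to be a failure (coded 0); it is an illustration of the idea, not the method used in any particular study.

```r
# Hypothetical outcome: 1 = success, 0 = failure, NA = lost to follow-up
followup <- c(1, 0, NA, 1, NA, 0, 1)

# Record which values were missing before they are replaced,
# so the imputed cases can still be identified afterwards
was_missing <- is.na(followup)

# Imputation by zero: every missing value is replaced with 0 (failure)
followup_imputed <- ifelse(was_missing, 0, followup)

followup_imputed
was_missing
```

Keeping the was_missing indicator is a design choice rather than a requirement: it makes it possible to report how many values were imputed and to run a sensitivity analysis later.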

An Example of MI Using R

A simple test of this is whether you can impute binary data with something like multiple imputation. A binary variable can only take the values 0 or 1, so any missing value necessarily has to be imputed as a 0 or a 1. As an example using R, we can create some binary data.

#### Create Binary Data ####
set.seed(123)
df <- data.frame(
  x = c(0,1,1,NA,0,NA,0,1,0,NA),
  y = rbinom(n=10,size=1,prob=.7)
)
df

Inspecting the data shows where the missing values are (NA):

    x y
1   0 1
2   1 0
3   1 1
4  NA 0
5   0 0
6  NA 1
7   0 1
8   1 0
9   0 1
10 NA 1

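If we then load the mice package and impute the data, we can extract a completed data frame to inspect. Note that complete() does not pool anything; by default it returns the first of the imputed datasets: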

#### Impute and Inspect Data ####
library(mice)
imp <- mice(df)
comp <- complete(imp)
comp

You will see that comp shows some missing x values that have been imputed as zeroes (Rows 6 and 10), while the one at Row 4 has been imputed as 1:

   x y
1  0 1
2  1 0
3  1 1
4  1 0
5  0 0
6  0 1
7  0 1
8  1 0
9  0 1
10 0 1
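
Since complete(imp) only returns the first imputed dataset, it can help to look at all of them. The following is a sketch under a few assumptions of mine: the seed value and the explicit m = 5 (which is the mice default) are arbitrary choices, so the imputed values will not necessarily match the output shown above.

```r
library(mice)

# Recreate the binary example data
set.seed(123)
df <- data.frame(
  x = c(0, 1, 1, NA, 0, NA, 0, 1, 0, NA),
  y = rbinom(n = 10, size = 1, prob = .7)
)

# Run the imputation with an explicit number of imputations and a seed
# (both values are assumptions chosen for reproducibility)
imp <- mice(df, m = 5, seed = 2023, printFlag = FALSE)

# Imputed values for x: one row per missing case (rows 4, 6, 10),
# one column per imputed dataset
imp$imp$x

# All five completed datasets stacked in long format,
# with .imp identifying the imputation number
long <- complete(imp, action = "long")
head(long)
```

Every entry of imp$imp$x is either 0 or 1, because the method mice uses by default for numeric columns (predictive mean matching) only borrows values that were actually observed. A zero is therefore a perfectly ordinary imputed value.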