Questions tagged [data-imputation]

Refers to a general class of methods used to "fill in" missing data. These methods are typically related to interpolation (http://en.wikipedia.org/wiki/Interpolation) and require assumptions about why the data is missing (e.g. "missing at random").

686 questions
9
votes
2 answers

Imputation to account for systematic error in survey responses

I have a large survey in which students were asked, among other things, their mother's level of education. Some skipped it, and some answered wrongly. I know this because a sub-sample of the initial respondents' mothers were later…
8
votes
2 answers

Advice on missing value imputation

I am working on insurance data in which a customer has a field named customer_no_dependent (the customer's number of dependents). It's coming out to be a significant variable (in that it has $p<0.0001$). This variable has almost 20% missing values. For…
ayush biyani
  • 1,617
5
votes
2 answers

Single-Imputation on Age Needed?

I am hoping someone can help me with this question. I did a single imputation on my data set for age (<5% missing). My adviser asked the following: "It’s strange to me to impute a demographic variable – can you really validly impute age based on their…
Shanna
  • 51
4
votes
1 answer

When performing imputation on categorical variables, does the data lose meaning?

Suppose I have a data set with several variables where one of my variables is categorical. For instance, a rating from 1 to 10. Suppose it has missing values. I want to impute this data via regression or via the mean from another column. However,…
3
votes
1 answer

Does it make sense to do imputations for each treatment group separately?

I would like to do just a simple imputation on my outcome. Not having many other covariates, and not wanting to make estimates on tens of datasets, I would like to avoid multiple imputation. Assuming that one is supposed to be more like the group to…
3
votes
0 answers

Bad data imputation: How to impute specific bad data and replace it with realistic ones?

I conducted an experiment with multiple human participants to analyze some air traffic scenarios. Some data has turned out to be very unrealistic. Take a look at the following pics, which show the time trace of aircraft positions. Each participant…
2
votes
0 answers

Doing multi-value imputation to maintain the same distribution

I am currently learning value imputation. The popular methods that I've seen, such as mean, median, arbitrary value, etc., impute all missing values with a single calculated value. Each of these methods can potentially alter the distribution of the…
2
votes
1 answer

Imputing missing outcome data

I saw the other link (Multiple imputation for outcome variables) discussing missing outcome data imputation for complete case analysis. However, I have missing outcome data as well as missing covariate data. Under what assumptions can I impute…
sma
  • 233
2
votes
1 answer

How to handle missing information simultaneously in training and testing set?

I would like to know how to handle missing data in predictive analysis. In my case, it has been decided not to omit missing information; however, certain predictive models, such as logistic regression and random forest, cannot handle missing data.…
user95902
  • 103
2
votes
0 answers

Multiple imputation for missing data

I'm using multiple imputation in my mediation analysis and was wondering if the variables that I use for the imputation have to precede the study variables. For example, if my mediator is self-esteem at age 10, do the variables used to develop missing…
PGB
  • 21
1
vote
0 answers

Is there any issue in imputing missing observations when the missing observations are related to each other?

Say that we have a cross-sectional dataset with two variables, A and B. Also suppose that A and B are related to each other in some way. Now, there are some rows for which only A is missing, and some for which only B is missing. There are no rows…
1
vote
1 answer

Definition of an imputation in statistics

I recently used the terminology imputation by zero, because the causes of the loss to follow-up were well known in our study, since they were failures. Somebody pointed out to me that the terminology is not correct, that we speak about imputation only…
1
vote
0 answers

Where and how to show imputed data in manuscript?

What are your thoughts on where or how to show (non-)imputed data? Please regard this question as a more general question. I am in the field of medical clinical research, where missing data is very common. It is most likely caused by mixed reasons…
Kim
  • 13
1
vote
0 answers

Methods for filling out the upper half of a similarity matrix

I have some data and a method of finding the similarity between any two points, but this method is expensive computationally and I do not need a precise answer as I want to use this matrix for clustering. Are there any known statistical methods for…
1
vote
0 answers

Data imputation for a dataset where all values are 1 or N/A

I have a dataset which contains relations between jobs and skills required for these jobs. It is a matrix with value 1 if a skill is required for a job, and N/A otherwise. N/A instead of zero because I believe that the dataset is not complete, e.g.…