Questions tagged [missing-data]

Use when the data contain gaps, i.e., are not complete. It is important to take this into account when performing an analysis or test.

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.

Tag wiki reference: Wikipedia

1616 questions

6 answers

Why do some people use -999 or -9999 to replace missing values?

I have a dataset with lots of missing values. In some columns, the missing values were replaced with -999, but in other columns they were marked as 'NA'. Why would we use -999 to replace missing values?

missing-data

asked Jul 22 '16 at 19:47

qqqwww
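A frequently given reason is that some storage formats or integer-typed columns have no built-in missing-value marker, so a numeric code far outside the plausible range is used instead. The risk is that software treats the code as a real value. A minimal sketch in plain Python, assuming a column held as a list and -999 as the sentinel (the name `decode_sentinel` is hypothetical), shows how the code distorts a mean until it is converted to an explicit missing marker:

```python
from statistics import mean

SENTINEL = -999  # assumed missing-value code for this column

def decode_sentinel(values, sentinel=SENTINEL):
    """Replace the sentinel code with None, the explicit missing marker used here."""
    return [None if v == sentinel else v for v in values]

raw = [12, 15, -999, 11, -999, 14]

# Treating the code as data drags the mean far below every real observation.
naive_mean = mean(raw)

# After decoding, only the genuinely observed values enter the mean.
decoded = decode_sentinel(raw)
observed = [v for v in decoded if v is not None]
clean_mean = mean(observed)

print(naive_mean, clean_mean)
```

Mixing both conventions in one file means every column has to be checked for its own code before any summary is computed.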

2 answers

80% of missing data in a single variable

One variable in my data has 80% missing data. The data are missing because of non-existence (i.e., how much bank loan the company owes). I came across an article saying that the dummy variable adjustment method is the solution for this…

missing-data

asked Jan 26 '11 at 02:53

lcl23
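The construction behind the dummy variable adjustment method, as it is commonly described, is short: add an indicator that is 1 where the value is missing, fill the missing entries with a constant, and enter both columns in the model. A minimal sketch, assuming the loan amount is stored as a list with None for companies that have no loan and that 0 is used as the fill constant (the names `dummy_adjust` and `loan` are hypothetical):

```python
def dummy_adjust(values, fill=0.0):
    """Return (filled, indicator) for the dummy variable adjustment.

    indicator[i] is 1 where values[i] was missing, else 0;
    filled[i] holds the observed value, or `fill` where it was missing.
    """
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator

# Hypothetical loan amounts; None marks a company with no bank loan.
loan = [None, 250.0, None, None, 1200.0]
loan_filled, loan_missing = dummy_adjust(loan)
print(loan_filled)
print(loan_missing)
```

In a linear model with both columns, the slope on the filled column is driven by the companies that have a loan, while the indicator's coefficient captures the shift for companies without one.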

3 answers

Distinguishing missing at random (MAR) from missing completely at random (MCAR)

I've had these two explained multiple times. They continue to cook my brain. Missing Not at Random makes sense to me, and Missing Completely at Random makes sense...it's the Missing at Random that doesn't as much. What gives rise to data that would…

missing-data

asked Feb 18 '12 at 21:02

Fomite

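A small simulation can make the distinction concrete. This is a sketch with made-up data (the variable names, the 0.5 drop probability, and the x > 0 rule are all illustrative choices): missingness in y that depends only on the fully observed x is MAR, while missingness unrelated to any data is MCAR.

```python
import random
from statistics import mean

def simulate(n=20000, seed=1):
    """Compare complete-case means of y under MCAR and MAR missingness.

    y depends on a fully observed covariate x. Under MCAR, each y is dropped
    with probability 0.5 regardless of any data. Under MAR, y is dropped
    whenever x > 0, so missingness depends only on the observed x.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]

    mcar_observed = [y for y in ys if rng.random() >= 0.5]
    mar_observed = [y for x, y in zip(xs, ys) if x <= 0.0]

    return mean(ys), mean(mcar_observed), mean(mar_observed)

full, mcar_cc, mar_cc = simulate()
print(f"all y: {full:.3f}  MCAR complete cases: {mcar_cc:.3f}  MAR complete cases: {mar_cc:.3f}")
```

Under MCAR the complete cases remain a random subsample, so their mean stays near the true value of 0. Under MAR, dropping the rows with x > 0 removes mostly large y values, so the complete-case mean is pulled down (its expected value here is minus the square root of 2/pi, about -0.8). Because that mechanism depends only on the observed x, the bias can in principle be addressed using x, for example through imputation or weighting.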

3 answers

Techniques for Handling Incomplete/Missing Data

My question is directed to techniques to deal with incomplete data during the classifier/model training/fitting. For instance, in a dataset w/ a few hundred rows, each row having let's say five dimensions and a class label as the last item, most…

missing-data

asked Aug 07 '10 at 05:07

doug

3 answers

When is it a good idea to just use the average for imputation?

Suppose we have a data set, test: 1 8 12 14 . . 19, where "." denotes a missing value. When would it be better to use the average of the non-missing values to impute the missing values rather than assuming that the data come from a normal distribution?

missing-data

asked Jun 25 '14 at 16:19

thoms
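The arithmetic for the question's example is small enough to sketch directly (None stands in for "."). It also shows a known drawback of mean imputation: the imputed values sit exactly at the mean, so the spread of the completed variable is understated, which makes standard errors too small if the completed data are then analysed as if fully observed.

```python
from statistics import mean, variance

# The question's data: 1 8 12 14 . . 19, with None standing in for ".".
test = [1, 8, 12, 14, None, None, 19]

observed = [x for x in test if x is not None]
fill = mean(observed)  # (1 + 8 + 12 + 14 + 19) / 5 = 54 / 5 = 10.8

imputed = [fill if x is None else x for x in test]

# Filling with the mean leaves the mean unchanged but adds points with zero
# deviation: the sum of squared deviations stays the same while n grows,
# so the sample variance shrinks.
print(fill, variance(observed), variance(imputed))
```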

2 answers

How to handle non existent (not missing) data?

I've never really found any good text or examples on how to handle 'non-existent' data for inputs to any sort of classifier. I've read a lot on missing data but what can be done about data that cannot or doesn't exist in relation to multivariate…

missing-data

asked Mar 01 '11 at 23:04

user3484

2 answers

Is listwise deletion / complete case analysis biased if data are not missing completely at random?

In the comments to the answer to my question I stated "Many rows have only 1 missing variable, so excluding the row, I think, leads to bias (they are not MCAR)" and in reply I was told "You're wrong, see Rubin's Statistical Analysis with Missing Data…

missing-data

asked Nov 08 '12 at 19:11

Joe King

2 answers

Is it ever okay to drop missing observations?

I have a dataset that looks at immigration applications and visa acceptances (granting of visas). The rates are calculated for "accepted" and "rejected" visa applications. However, the dataset also has values for cases that were closed.…

missing-data

asked Mar 25 '17 at 19:06

EJ16

4 answers

Is the method of mean substitution for replacing missing data out of date?

Is the method of mean substitution for replacing missing data out of date? Are there more sophisticated models that should be used? If so, what are they?

missing-data

asked May 23 '11 at 11:33

Melissa Duncombe

2 answers

MAR vs. MNAR: how can I decide?

I'm working with a big dataset (400,000 participants), and it has missing values in 4 variables: 2 are continuous, with 3% and 10% missing, and the other two are categorical, each with less than 5% missing. I…

missing-data

asked Mar 15 '17 at 14:14

user153182

1 answer

Pooling the results of random hot-deck imputation

I am using random hot-deck imputation on a repeated measures dataset. I am tempted to use Rubin's rules for pooling the results of multiple imputation, in particular for regression coefficients. Intuitively it seems the average of the coefficient…

missing-data

asked Jan 14 '13 at 12:27

Robert Long

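For reference, Rubin's rules as they are usually stated combine m completed-data estimates with their squared standard errors: the pooled estimate is the mean of the estimates, and the total variance is T = W + (1 + 1/m) B, where W is the mean within-imputation variance and B is the between-imputation variance of the estimates. Whether these rules are appropriate for random hot-deck imputation is what the question asks; this minimal sketch (the name `pool_rubin` and the numbers are hypothetical) only shows the arithmetic.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates (e.g. one regression coefficient) from each imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t

# Hypothetical coefficients and squared standard errors from m = 3 imputations.
q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t ** 0.5)  # pooled coefficient and its standard error
```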

1 answer

ISC exam - cheating or not

Background: I read this article on "hackaday" about alleged "large-scale cheating" on the ISC exam. It gives this as its source. Here is one of the images from the site: [image not included] Hackaday asks for speculation about the nature of the "cheating" that the…

missing-data

asked Jun 06 '13 at 16:36

EngrStudent

0 answers

What is the theoretical ideal when dealing with multiple causes of missing not at random (MNAR) data - TL;DR included

Background to problem: I am currently in the process of computing some quantitative data (questionnaire Likert scales), and there are clear differences in missing data on a specific item (~400 missing responses, compared to ~100 on the other 9 items).…

missing-data

asked Mar 15 '17 at 12:50

user153169

1 answer

How to cope with missing values in sequential data before applying moving averages (and in general)?

I have a set of datasets with sequential measurements. Since these sets are quite big (>80000 measurements), I decided to simplify them by applying a Simple Moving Average (SMA) and selecting the data every n measurements. Each set belongs to…

missing-data

asked Jan 12 '14 at 17:05

Bakaburg

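One simple option, sketched here under the assumption that missing measurements are stored as None: compute each window's average from the values that are present, return None when too few values are observed, and then keep every n-th smoothed value as the question describes. The function names and the `min_count` threshold are hypothetical choices for illustration; whether skipping gaps is acceptable depends on why the values are missing.

```python
def sma_skip_missing(values, window, min_count=1):
    """Trailing simple moving average that ignores None entries.

    Position i averages the observed values in values[i - window + 1 : i + 1].
    If fewer than `min_count` values are observed there, the result is None.
    The first window - 1 positions are None because the window is incomplete.
    """
    if window < 1 or min_count < 1:
        raise ValueError("window and min_count must be positive")
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(None)
            continue
        present = [v for v in values[i - window + 1 : i + 1] if v is not None]
        out.append(sum(present) / len(present) if len(present) >= min_count else None)
    return out

def every_nth(values, n):
    """Keep every n-th element, starting from the first."""
    return values[::n]

series = [1.0, 2.0, None, 4.0, None, None, 7.0, 8.0]
smoothed = sma_skip_missing(series, window=3)
print(smoothed)
print(every_nth(smoothed, 2))
```

Raising `min_count` trades coverage for reliability: windows that rest on a single observation are reported as missing instead of being smoothed.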

1 answer

Listwise deletion appropriate?

I have a data set with 440 responses. I have 11 people who did not answer any question on the survey. Then there are a couple of missing values here and there outside of those 11 full non-responses. Is listwise deletion my best option? In all, I…

missing-data

asked Aug 16 '22 at 17:11

Cindy
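The two kinds of missingness described in the question can be separated before deciding on listwise deletion. A minimal sketch, assuming each response is a tuple with None for an unanswered item (the helper names and the data are hypothetical): rows with every item missing are unit non-response, while complete-case (listwise) analysis additionally drops every row with any missing item.

```python
def is_all_missing(row):
    """True when every item in the row is unanswered (unit non-response)."""
    return all(v is None for v in row)

def drop_all_missing(rows):
    """Remove only respondents who answered nothing."""
    return [r for r in rows if not is_all_missing(r)]

def listwise_delete(rows):
    """Complete-case analysis: keep only rows with no missing items."""
    return [r for r in rows if all(v is not None for v in r)]

# Hypothetical responses to a three-item survey.
responses = [
    (4, 5, 3),
    (None, None, None),  # answered nothing
    (2, None, 4),        # one item skipped
    (5, 5, 5),
]

print(len(drop_all_missing(responses)), len(listwise_delete(responses)))
```

Comparing the two counts shows how many partially answered rows listwise deletion would discard beyond the 11 empty responses; whether discarding them biases the results depends on why those items are missing.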
