Missing at random vs missing not at random: What if it is both? (Does one imply the other?)

Question

My understanding is that:

Missing at random: Whether or not a variable's value is missing depends on the values of the other variables.
Missing not at random: The propensity for a variable's value to be missing depends on that value itself.

But what about when the variables are correlated, as they often are?

To make things more concrete, let us consider an experiment where we are collecting data on temperature, humidity, and CO2, and let us suppose that the relationship between these is T = H = C.

Say that we are missing all CO2 values below 50, because the sensor freezes.

In this case, it is both:

Missing at random: because the propensity of CO2 to be missing is dependent on the values of temperature and humidity.
Missing not at random: because all CO2 values below 50 are missing.

Since the variables are interlinked, missing at random => missing not at random.

Or have I made a mistake in my reasoning somewhere?

score 2 · Answer 1 · answered Jul 09 '20 at 15:46

Missing at random (MAR) means that whether a value is missing never depends on the value of the variable itself, only on other observed variables.

Therefore in your example the data would be Missing not at random (MNAR)!
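
A minimal sketch of the sensor example may make this concrete. It assumes, purely for illustration, that CO2 readings are spread uniformly between 0 and 100; the names `co2_true`, `co2_observed` and `seen` are hypothetical. The point is only that the missingness is decided by the CO2 value itself, so the values that survive are not representative of the full set.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical readings: CO2 values drawn uniformly from 0 to 100 (an assumption).
co2_true = [random.uniform(0, 100) for _ in range(10_000)]

# Mechanism from the question: the sensor freezes below 50, so whether a
# reading is missing (None) is determined by the reading itself.
co2_observed = [c if c >= 50 else None for c in co2_true]

# The values we actually get to see.
seen = [c for c in co2_observed if c is not None]

true_mean = statistics.mean(co2_true)
observed_mean = statistics.mean(seen)

print(f"true mean:               {true_mean:.1f}")
print(f"mean of observed values: {observed_mean:.1f}")
print(f"smallest observed value: {min(seen):.1f}")
```

Every surviving reading is at least 50, so the mean of the observed values sits above the mean of the full data, and no amount of averaging over the observed CO2 values alone recovers the lower half.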

Why is this distinction important?

Because when the data are MNAR we have to model the relationship between missingness and the value itself. If the data were truly MCAR, we could drop the NAs or impute them with simple methods like mean imputation, and if they were MAR, we could impute them from the other observed variables.

Correctly identifying whether data are MCAR, MAR, or MNAR is the only way to correctly decide how to deal with them!

But how can something be MAR and not MNAR?

Imagine you are asking senior citizens their birth month and also recording whether they have Alzheimer's or a similar memory impairment.

Birth month would likely be MAR: whether it is missing correlates with the Alzheimer's variable, yet that variable gives me no information about the actual birth month.

I can predict from other variables whether the value is missing, but not what the actual value is!
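
The birth month example can be sketched as a small simulation. All numbers here are assumptions chosen for illustration (30% of respondents impaired, a 60% chance of a missing answer when impaired and 5% otherwise), and the names `impaired`, `reported` and `missing_rate` are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 20_000
# Hypothetical population: birth month uniform over 1..12, and a memory
# impairment flag drawn independently of birth month (an assumption).
birth_month = [random.randint(1, 12) for _ in range(n)]
impaired = [random.random() < 0.3 for _ in range(n)]

# Assumed MAR mechanism: the chance of a missing answer depends only on the
# observed impairment flag, never on the birth month itself.
def p_missing(flag):
    return 0.6 if flag else 0.05

reported = [
    None if random.random() < p_missing(flag) else month
    for month, flag in zip(birth_month, impaired)
]

def missing_rate(flag):
    """Share of missing answers among respondents with the given flag."""
    rows = [r for r, f in zip(reported, impaired) if f == flag]
    return sum(r is None for r in rows) / len(rows)

rate_impaired = missing_rate(True)
rate_unimpaired = missing_rate(False)
mean_all = statistics.mean(birth_month)
mean_reported = statistics.mean([r for r in reported if r is not None])

print(f"missing rate, impaired:     {rate_impaired:.2f}")
print(f"missing rate, not impaired: {rate_unimpaired:.2f}")
print(f"mean birth month, everyone:      {mean_all:.2f}")
print(f"mean birth month, reported only: {mean_reported:.2f}")
```

Under these assumptions the impairment flag predicts whether an answer is missing, while the reported birth months still look like the full set of birth months.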

Thanks for the response. To clarify, your example (MAR but not MNAR) assumes, and requires, that age (with birthdate as a proxy) is not correlated with Alzheimer's? - Edward Garemo, Jul 17 '20 at 10:45
I gave birth month as an example because it is independent of age. That is, people of all ages are born in February, so knowing that Alzheimer's is present does not let me identify the birth month. - Fnguyen, Jul 17 '20 at 10:47
Home address could be a clearer example. Again, people with Alzheimer's are less likely to remember it, so there are more NAs, but we cannot glean the address from the Alzheimer's variable. Age, however, would be more likely to be MNAR. - Fnguyen, Jul 17 '20 at 10:52
