3

I have a dataset of survey responses with 10 variables, 4 of which were optional questions. About 900 participants have missing values for the 4 optional questions but informative data for the other questions. What is the best way to represent them in my descriptive analysis report?

Any leads are appreciated.

Jishan

2 Answers

4

You did not say why the data are missing. Are the values missing at random, or is the missingness informative? What fraction is missing? ... This information should be reported in addition to what the answer by @Galen suggests. You could then try (multiple?) imputation; see for instance "Imputation to account for systematic error in survey responses".

Then report the descriptive statistics twice, once computed only on the available, observed data, and then again on the completed/imputed data.
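
As a rough illustration of reporting the same statistics twice, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the variable name `q7`, the made-up values, and above all the imputation step, which is a simple random hot-deck draw from the observed answers repeated `m` times. That stands in for a proper imputation model and is only defensible if the values are missing completely at random.

```python
import random
import statistics

def observed_summary(values):
    """Mean and SD over the non-missing entries (None marks missing)."""
    obs = [v for v in values if v is not None]
    return {"n": len(obs), "mean": statistics.mean(obs), "sd": statistics.stdev(obs)}

def hot_deck_summary(values, m=20, seed=0):
    """Average the mean and SD over m randomly completed copies of the data."""
    rng = random.Random(seed)
    obs = [v for v in values if v is not None]
    means, sds = [], []
    for _ in range(m):
        # Fill each gap with a random draw from the observed answers.
        completed = [v if v is not None else rng.choice(obs) for v in values]
        means.append(statistics.mean(completed))
        sds.append(statistics.stdev(completed))
    return {"n": len(values), "mean": statistics.mean(means), "sd": statistics.mean(sds)}

# Hypothetical answers to one optional question on a 1-5 scale.
q7 = [4, None, 5, 3, None, 4, 2, None, 5, 4]
print("observed:", observed_summary(q7))
print("imputed: ", hot_deck_summary(q7))
```

Showing the two rows side by side lets readers see how much the summary depends on the imputation. With real survey data, an imputation model that uses the mandatory questions as predictors would be more appropriate than this hot-deck stand-in.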

  • 1
    +1. Unfortunately, not everyone in a given audience knows what imputation is. I wonder what some effective ways of tackling this issue are, in particular when you have limited time or space to report the data. – J-J-J Sep 28 '23 at 19:15
3

There are likely many ways to report missing values, but I will offer a really simple approach: directly report the response rate for each of those 4 questions, expressed as counts or as a percentage of the sample size. This will draw your audience's attention to potential sampling issues. I would also recommend discussing some of the potential reasons why people might choose to respond or not respond. Your readers will likely appreciate a nuanced appraisal, and will understand that your reported statistics for certain variables pertain only to a subset of the respondents.
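
The response-rate table described above could be computed along these lines. This is a minimal sketch: the column names (`age`, `q7`, `q8`), the records, and the helper `response_rates` are all hypothetical, and `None` is assumed to mark a skipped question.

```python
def response_rates(rows, questions):
    """Return {question: (answered_count, percent_of_sample)}."""
    n = len(rows)
    rates = {}
    for q in questions:
        answered = sum(1 for row in rows if row.get(q) is not None)
        rates[q] = (answered, round(100 * answered / n, 1))
    return rates

# Hypothetical records; None marks a skipped optional question.
rows = [
    {"age": 34, "q7": 4, "q8": None},
    {"age": 51, "q7": None, "q8": None},
    {"age": 29, "q7": 5, "q8": 2},
    {"age": 42, "q7": 3, "q8": None},
]
for q, (count, pct) in response_rates(rows, ["q7", "q8"]).items():
    print(f"{q}: {count}/{len(rows)} answered ({pct}%)")
```

Reporting both the count and the percentage next to each optional question makes it clear which subset of respondents each statistic describes.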

Galen