I'm currently working on an event study to examine abnormal returns.

In the first step, I've calculated abnormal returns for a certain type of company event; the sample consists of roughly 13,000 events across more than 4,000 firms.

In the second step, I intend to run a regression analysis with several (control?) variables to see whether some of the effect stems from certain aspects of the event.

So far so good. My issue is that I want to control for 5-6 factors such as market capitalization and total enterprise value, but I don't have every data point for every one of the 13,000 events. For example, for Event 1 I'm missing market capitalization, for Event 2 the total enterprise value, for Event 3 the M/B ratio, and so on.

Question: Can I still run a meaningful regression even though I have a significant number of NAs, or am I required to delete every event with incomplete data? Given the poor data availability for some variables (which I would still love to include), deleting incomplete cases would remove a very large number of events (>7,000).

mkt
LeCV

1 Answer

As dipetkov says, multiple imputation is the best solution here. Imputing binary and categorical variables is both reasonable and quite feasible. There is a vignette that shows how to do this in the R package Amelia, and a Google Scholar search turns up a number of papers about multiple imputation of binary variables, in case you would like to look at the primary literature on the subject.
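To make the last step of multiple imputation concrete: once the regression has been fitted on each of the m imputed datasets, the per-dataset coefficients and standard errors are usually combined with Rubin's rules. Below is a minimal Python sketch of that pooling step only, using the standard library. The function name `pool_rubin` and all numbers are hypothetical and purely illustrative; they are not taken from your data, and the imputation itself would still be done by a package such as Amelia.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical results for one control variable from m = 5 imputed datasets.
estimates = [0.10, 0.12, 0.08, 0.11, 0.09]
std_errors = [0.02, 0.02, 0.02, 0.02, 0.02]

pooled_coef, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled coefficient: {pooled_coef:.4f}, pooled SE: {pooled_se:.4f}")
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ across imputations, which is how the extra uncertainty from the missing values enters the final inference.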

mkt