I have data on the occurrence of data breaches at a set of companies for the period 2005-2018, and I am unsure which model I should use. I see two options:
Probit/Logit: I set the dependent variable to 1 if there has been a data breach in a given year. For example:
- Company A does not suffer a data breach in the period 2005-2007 --> variable Data Breach takes the value 0 for all the years 2005-2007
- Company A suffers a data breach in 2008 --> variable Data Breach takes value of 1 in 2008
- Company A does not suffer a data breach in the period 2009-2017 --> variable Data Breach takes the value 0 for 2009-2017
- Company A suffers again a data breach in 2018 --> variable Data Breach takes value of 1 in 2018
And with this data I run a probit/logit regression.
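To make the layout concrete, here is a minimal sketch of the company-year panel I have in mind for the probit/logit option. The names `breach_events`, `panel` and `data_breach` are just illustrative, and "Company B" is a made-up firm with no breaches, added only for contrast.

```python
# Hypothetical breach records: (company, year in which a breach occurred).
breach_events = [("Company A", 2008), ("Company A", 2018)]
companies = ["Company A", "Company B"]  # Company B is invented for contrast
years = range(2005, 2019)  # sample period 2005-2018, inclusive

breach_years = set(breach_events)

# One row per company-year; data_breach is 1 only in a year with a breach.
panel = [
    {"company": c, "year": y, "data_breach": int((c, y) in breach_years)}
    for c in companies
    for y in years
]

# Show the dependent variable for Company A, year by year.
for row in panel:
    if row["company"] == "Company A":
        print(row["year"], row["data_breach"])
```

Each company contributes 14 rows (one per year), so the dependent variable is observed at the company-year level.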
OLS:
I count the number of times each company has been the victim of a data breach during the sample period. So in the example above, the variable Data Breaches takes the value 2 for Company A. With this data I then run an OLS regression.
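For comparison, a minimal sketch of how the count variable for the OLS option would be built from the same kind of records. Again, `breach_events` and `data_breaches` are illustrative names and "Company B" is made up.

```python
from collections import Counter

# Hypothetical breach records: (company, year in which a breach occurred).
breach_events = [("Company A", 2008), ("Company A", 2018)]
companies = ["Company A", "Company B"]  # Company B is invented for contrast

# Count breaches per company over the whole sample period.
counts = Counter(company for company, _year in breach_events)

# Companies without any breach get an explicit 0.
data_breaches = {c: counts.get(c, 0) for c in companies}
print(data_breaches)  # {'Company A': 2, 'Company B': 0}
```

Note that this collapses the data to one observation per company, so the year-by-year variation in the independent variables would have to be aggregated somehow as well.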
Notes:
It is important to know that my independent variables are all numerical, taking values from 0 to 100, and that their values can differ from year to year. So, for example:
- Company A does not have a Cyber Committee in place for the period 2005-2008 --> Variable Cyber Committee takes 0 for all those years.
- Company A does have a Cyber Committee in place for the period 2009-2018 --> Variable Cyber Committee takes 1 for all those years.
This example uses an independent variable that only takes the values 0 and 1, but there are also independent variables such as the number of cyber-related jobs someone has had in the past.
I also want to include industry and year effects in the regression, because the chance of a data breach may be higher in certain years or industries.
Intuitively, I would choose the probit/logit model, but for this model it is difficult to implement the year and industry fixed effects.
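As far as I understand, year and industry effects would normally enter the model as sets of dummy variables, with one level of each left out as the reference category. Below is a minimal sketch of that encoding under my own assumptions: the helper `dummy_columns`, the observations and the industry labels are all hypothetical, and this only builds the regressor matrix, it does not fit a probit/logit.

```python
def dummy_columns(values, prefix):
    """Return (names, rows) of 0/1 dummies, dropping the first level as reference."""
    levels = sorted(set(values))
    kept = levels[1:]  # the first (sorted) level is the omitted reference category
    names = [f"{prefix}_{lvl}" for lvl in kept]
    rows = [[int(v == lvl) for lvl in kept] for v in values]
    return names, rows


# Hypothetical company-year observations.
observations = [
    {"year": 2005, "industry": "Retail", "cyber_committee": 0},
    {"year": 2006, "industry": "Retail", "cyber_committee": 0},
    {"year": 2005, "industry": "Finance", "cyber_committee": 1},
    {"year": 2007, "industry": "Finance", "cyber_committee": 1},
]

year_names, year_rows = dummy_columns([o["year"] for o in observations], "year")
ind_names, ind_rows = dummy_columns([o["industry"] for o in observations], "industry")

# Regressor matrix: constant, the explanatory variable, then the dummies.
columns = ["const", "cyber_committee"] + year_names + ind_names
design = [
    [1, o["cyber_committee"]] + yr + ind
    for o, yr, ind in zip(observations, year_rows, ind_rows)
]

print(columns)
for row in design:
    print(row)
```

With 14 years and a handful of industries this adds a moderate number of columns, and what I am unsure about is whether estimating a probit/logit with all of them is appropriate.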