I study coups d'etat. I would like to understand the relationship between leader/country characteristics and the likelihood of coup attempts.
I have a leader-country-year dataset with one entry for every year that a given leader in a given country was in power. I also have data on leader and country characteristics. Some of these variables do not vary over time (e.g., was the leader elected) and some do (e.g., whether the country is at war with another country). I've made a mock dataset below to illustrate what my data look like.
| leader_id | country_id | years_since_entering | elected | war | coup_attempt |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | 1 | 2 | 1 | 1 | 1 |
| 1 | 1 | 3 | 1 | 1 | 0 |
| 1 | 1 | 4 | 1 | 1 | 0 |
| 1 | 1 | 5 | 1 | 1 | 1 |
| 2 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 1 | 0 | 1 | 0 |
| 2 | 1 | 2 | 0 | 0 | 0 |
| 2 | 1 | 3 | 0 | 0 | 0 |
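For reproducibility, here is a minimal sketch that builds the mock dataset above as an R data frame. The object name `df` is just my placeholder; the values are copied from the table and are not real data.

```r
# Mock leader-country-year data, one row per year a leader was in power.
# Values mirror the illustrative table; they are not real observations.
df <- data.frame(
  leader_id            = c(rep(1, 6), rep(2, 4)),
  country_id           = rep(1, 10),
  years_since_entering = c(0:5, 0:3),
  elected              = c(rep(1, 6), rep(0, 4)),        # time-invariant within leader
  war                  = c(0, 1, 1, 1, 1, 1, 1, 1, 0, 0), # time-varying
  coup_attempt         = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0)  # event indicator
)
```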
I would like to use a Cox PH model to understand the effect that these variables have on a leader's survival time until a coup attempt is made. So, the Cox PH "event" is coup_attempt. The covariates are elected and war.
Some details I want to note:
- There can be multiple coup attempts in one leader's tenure. (leader_id == 1 in the mock dataset experiences two coup attempts)
- The same country can have multiple leaders, though they are never in the leadership position at the same time. (country_id == 1 in the mock dataset has two leaders)
I'm planning to use the survival package in R and run a model like this:

    library(survival)
    fit <- coxph(Surv(start, stop, coup_attempt) ~ elected + war +
                   strata(country_id) + cluster(country_id), data = df)

My questions (the second one explains the strata and cluster choices):
- I know that I need to reshape my data to have start and stop intervals to use with the coxph function. Can I make all of the start and stop intervals in my coxph dataset the same length? Specifically, can I make them one-year intervals so that they capture every change in the war variable? The dataset would look like this:
| leader_id | country_id | years_since_entering | elected | war | coup_attempt | start | stop |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 2 |
| 1 | 1 | 2 | 1 | 1 | 1 | 2 | 3 |
| 1 | 1 | 3 | 1 | 1 | 0 | 3 | 4 |
| 1 | 1 | 4 | 1 | 1 | 0 | 4 | 5 |
| 1 | 1 | 5 | 1 | 1 | 1 | 5 | 6 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 | 2 |
| 2 | 1 | 2 | 0 | 0 | 0 | 2 | 3 |
| 2 | 1 | 3 | 0 | 0 | 0 | 3 | 4 |
- I included strata(country_id) + cluster(country_id) in the coxph call to account for the correlation between leaders of the same country. There might be a country-specific effect (e.g., Iraq has a higher baseline likelihood of coups than Canada), which I aim to capture with strata(country_id). There may also be correlation among the observations of leaders of the same country (e.g., observations of Iraqi leaders are correlated with one another), which I aim to capture with cluster(country_id). Does this use of strata and cluster make sense? Or would you recommend another way to address these issues?
- A leader can experience multiple coup attempts. How do I account for the fact that a single unit can have multiple "events" in the Cox PH?
- What, if anything, should I do about data censoring? My data end in 2019, but that doesn't mean that all leaders leave office or stop experiencing coup attempts in 2019.
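To make the first question concrete, here is a rough sketch of how I would derive the one-year intervals from years_since_entering. It assumes the clock is time since the leader entered office and rebuilds only the columns it needs from the mock data; the helper objects `contiguous` and `events_per_leader` are names I made up for illustration.

```r
# Minimal subset of the mock data (only the columns needed here)
df <- data.frame(
  leader_id            = c(rep(1, 6), rep(2, 4)),
  years_since_entering = c(0:5, 0:3),
  coup_attempt         = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
)

# One-year counting-process intervals (start, stop],
# assuming time is measured from the year the leader entered office
df$start <- df$years_since_entering
df$stop  <- df$start + 1

# Sanity check: within each leader, every interval starts where the previous one ended
contiguous <- tapply(seq_len(nrow(df)), df$leader_id, function(i) {
  all(df$start[i][-1] == df$stop[i][-length(i)])
})

# Coup attempts per leader; leader 1 has more than one, which is the
# multiple-events situation I ask about in the third question
events_per_leader <- tapply(df$coup_attempt, df$leader_id, sum)
```

In this layout, a leader still in office in 2019 simply has a last row with coup_attempt equal to 0; whether that is adequate handling of censoring is what the fourth question asks.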
Any help with any of these questions would be very much appreciated. Thank you in advance!