Hopefully someone can guide me on the following. I have a large dataset (> 151k firms, with multiple observations per firm). The dataset looks at firm failure in counting-process format, and my independent variables are gender, age and nationality diversity, plus team size. These variables can vary over time (I wouldn't call them time-dependent yet, since time itself does not cause a team's composition to change, but a team can change as time passes).
When trying to assess the proportional hazards assumption, I'm a bit puzzled by the results.
Using Schoenfeld residuals, I see that the assumption is violated (p < 0.001), but graphically the line is horizontal for my independent variables (edit: the slope is slightly different from 0, e.g. 0.007).
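For reference, a minimal sketch of how this kind of check can be run in Stata. The variable names (firm_id, q_start, q_end, failed, gender_div, age_div, nat_div, team_size) are placeholders assumed for illustration, not the names in the actual dataset, and the model specification is only an assumption about what was fitted.

```stata
* Hypothetical variable names; replace with those in the actual dataset.
* Counting-process setup: one record per firm and interval (q_start, q_end].
stset q_end, id(firm_id) failure(failed) time0(q_start)

* Cox model with the four covariates described above.
stcox gender_div age_div nat_div team_size

* Schoenfeld-residual test: one test per covariate plus the global test.
estat phtest, detail

* Smoothed scaled Schoenfeld residuals for one covariate against time.
estat phtest, plot(gender_div)
```

The detail option reports a separate test for each covariate, which shows which variables drive a significant global test.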
However, when I run the Cox analysis on different subsamples in time (e.g. one subsample covering quarters 0 to 10 of the firms, the next covering quarters 11 to 20, and so on up to quarters 31 to 40), I see substantial differences between the hazard ratios, which makes me think the assumption does not hold.
As a note, the subsamples do differ in size as some firms do not survive up until the 10-year/40-quarter mark.
Sample 1: Cox analysis for quarters 0-10 --> N = 157K subjects, 23K failures
Sample 2: Cox analysis for quarters 11-20 --> N = 133K subjects, 49K failures
Sample 3: Cox analysis for quarters 21-30 --> N = 83K subjects, 28K failures
Sample 4: Cox analysis for quarters 31-40 --> N = 55K subjects, 18K failures
For example, for gender diversity the hazard ratio (HR) is 0.2 in quarters 0-10 but 0.98 in quarters 31-40.
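One way to make the period comparison in a single model, rather than in four separate subsample regressions, might look like the sketch below. It swaps in a different technique from the subsample approach described above: the records are split at quarters 10, 20 and 30, and gender diversity gets a separate coefficient in each period. All variable names (firm_id, q_start, q_end, failed, gender_div, age_div, nat_div, team_size, period) are hypothetical placeholders.

```stata
* Hypothetical variable names; replace with those in the actual dataset.
stset q_end, id(firm_id) failure(failed) time0(q_start)

* Split each firm's records at quarters 10, 20 and 30. The new variable
* period takes the values 0, 10, 20 and 30.
stsplit period, at(10 20 30)

* One model on the full sample, with a separate gender-diversity
* coefficient for each period.
stcox i.period#c.gender_div age_div nat_div team_size

* Joint test that the gender-diversity coefficients are equal across periods.
testparm i.period#c.gender_div, equal
```

Because all periods share one model, the period-specific hazard ratios can be tested against each other directly, which the separate subsample regressions do not allow.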
How should I interpret these results, and how should I handle this issue? Thanks in advance for your help and time!
Best regards, Laura
Edit: I forgot to mention that I am currently using Stata as my statistical software.