I have a dataset of a few thousand employees and want to compare time-to-terminate by their source of hire. The data is for a four year period. Out of the dataset about 15% have terminated, while the rest are still with the company.

I considered survival analysis, but the employees started at many different times. Is there any way to work around this? Or is there a more appropriate analysis?

Thanks!

EDIT: I found this paper which explains how to handle my problem of variable sample size: http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf

  • Why do you think you have a problem at all? – shadowtalker Feb 07 '15 at 16:50
  • Doesn't survival analysis require everyone to start at the same time, t0? Let's say I have data for the past two years and two groups of candidates: applied and recruited. In total there are 100 in the applied group and 150 in the recruited group, but starting at Feb 7 2013 (t0), only 10 applied and 15 recruited were working, with the others joining in the last two years. How do I include those who joined later than t0? – universltravlr Feb 07 '15 at 17:31
  • http://stats.stackexchange.com/a/4012/36229 and http://support.sas.com/documentation/onlinedoc/stat/ex_code/121/phrcpa1.html and http://www.ism.ac.jp/~eguchi/pdf/entry-bias-model.pdf – shadowtalker Feb 07 '15 at 19:52

1 Answer

You shouldn't need to "work around" different start times with survival analysis software; that's par for the course. The time axis is each employee's tenure, measured from their own hire date, not calendar time. You'll have a lot of right censoring (employees who are still there), which the software should also be able to handle.

You'll still have to use it properly and watch for things like left truncation (employees hired before your records start appear in the data only if they had not already quit), but if you set things up properly, it sounds like you've chosen the appropriate approach.
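As a rough illustration of the setup described above, here is a minimal sketch in plain Python. The records, the `STUDY_END` date, and the function names are all made up for the example. Each employee's duration is counted from their own hire date, and anyone still employed at the end of the window is marked as right-censored rather than dropped. The Kaplan-Meier estimate is hand-rolled here only to show the mechanics; in practice a survival package would do this step.

```python
from datetime import date

# Hypothetical records: (source of hire, hire date, termination date or None).
RECORDS = [
    ("applied",   date(2013, 2, 7),  date(2013, 8, 7)),
    ("applied",   date(2013, 9, 1),  None),
    ("applied",   date(2014, 3, 15), date(2014, 9, 15)),
    ("recruited", date(2013, 2, 7),  None),
    ("recruited", date(2014, 1, 10), date(2014, 11, 10)),
    ("recruited", date(2014, 6, 1),  None),
]
STUDY_END = date(2015, 2, 7)  # assumed last day of the observation window


def to_duration(hired, terminated, study_end=STUDY_END):
    """Return (days employed, event flag); the flag is False if right-censored."""
    end = terminated if terminated is not None else study_end
    return (end - hired).days, terminated is not None


def kaplan_meier(observations):
    """Kaplan-Meier estimate from (duration, event) pairs.

    Returns (time, survival probability) at each observed termination time.
    """
    curve = []
    survival = 1.0
    for t in sorted({d for d, event in observations if event}):
        # Everyone whose tenure reached t is at risk, whenever they were hired.
        at_risk = sum(1 for d, _ in observations if d >= t)
        events = sum(1 for d, event in observations if event and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def curves_by_source(records):
    """Group records by source of hire and estimate one curve per group."""
    groups = {}
    for source, hired, terminated in records:
        groups.setdefault(source, []).append(to_duration(hired, terminated))
    return {source: kaplan_meier(obs) for source, obs in groups.items()}


if __name__ == "__main__":
    for source, curve in curves_by_source(RECORDS).items():
        print(source, curve)
```

Note that the calendar date of hire never enters the estimate directly; only tenure and the censoring flag do. Comparing the groups formally (for example with a log-rank test or a Cox model) would be a separate step that this sketch does not cover.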

Wayne
  • Hi Wayne, thank you for your help.

    I thought that everyone in the model had to be present at time 0 in order to accurately compare the amount of time they stayed. If 10 were present at t0, and 5 left at t1, but 10 also joined at t1, wouldn't that affect the curve?

    – universltravlr Feb 07 '15 at 17:35
  • The technique isn't basing everything on the number of employees at $t_n$, if that's what you're afraid of. In your example, the fact that 10 employees are present at $t_0$ -- presumably hired before $t_0$ -- is more of a complicating factor than employees coming and going. – Wayne Feb 07 '15 at 20:38
  • Thank you, Wayne. I limited the data so that no one started before my t0; the earliest hire date in the data serves as t0. So it sounds like it doesn't matter whether 5% or 100% of the population was present at t0, because the software "corrects" for this.

    If you have any recommended articles or books on this, please let me know. Thanks!

    – universltravlr Feb 08 '15 at 17:07
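If employees hired before the records begin are kept rather than excluded, one common adjustment for the left truncation mentioned in the answer is delayed entry: such an employee joins the risk set only at the tenure they had already reached when the records start. The sketch below is a hand-rolled illustration with made-up numbers and a hypothetical function name, not the output of any particular package.

```python
def kaplan_meier_delayed_entry(observations):
    """Kaplan-Meier estimate with delayed entry (left truncation).

    observations: (entry, exit, event) tuples in days of tenure, where entry
    is the tenure at which the employee first became observable (0 if hired
    inside the observation window) and exit is the tenure at termination or
    censoring.
    """
    curve = []
    survival = 1.0
    for t in sorted({exit_ for _, exit_, event in observations if event}):
        # An employee is at risk at tenure t only once they are under
        # observation (entry < t) and have not yet left (t <= exit).
        at_risk = sum(1 for entry, exit_, _ in observations if entry < t <= exit_)
        events = sum(1 for _, exit_, event in observations if event and exit_ == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: two employees hired inside the window, and two who
# already had 300 days of tenure when the records begin.
OBS = [
    (0, 100, True),
    (0, 400, False),
    (300, 500, True),
    (300, 700, False),
]

adjusted = kaplan_meier_delayed_entry(OBS)
# Naive version: pretend everyone was observed from their first day.
naive = kaplan_meier_delayed_entry([(0, exit_, event) for _, exit_, event in OBS])

if __name__ == "__main__":
    print("adjusted:", adjusted)
    print("naive:   ", naive)
```

In this toy data the naive curve sits above the adjusted one, because the two long-tenured employees are counted as at risk during their first 300 days even though they could only appear in the data by having stayed that long.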