I have a somewhat unusual hypothesis that I am trying to validate. I don't have a strong statistical background, so I may not explain this very clearly; please bear with me. Here is my best attempt at explaining what I am trying to do.
Let's assume we have a censored dataset. I won't use my real use case here, but for the sake of description let's assume it is a drug test dataset. The event of interest is the patient's death; patients who are still alive are censored. The patients are grouped by year of birth, so each patient belongs to exactly one group: the year he/she was born in. A quick group-by gives the following counts:
YEAR  N. OF PATIENTS
1994  45
1995  42
1996  46
1997  49
1998  51
What interests me is the average lifespan of each group of patients by year of birth. Because some patients have died and others have not, I am using a survival analysis approach and calculating the survival function for each group (each year of birth). At this point I can also check whether there is a statistically significant difference between the groups' survival curves using a K-sample log-rank test. So far so good.
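To make the per-group step concrete, here is a minimal sketch of the Kaplan-Meier estimate for a single group, written in plain Python rather than with any particular survival library. The function names `kaplan_meier` and `median_survival` and the toy durations are made up for illustration and are not my real data; the log-rank test is not shown.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct time where a death occurred.

    durations: observed time per patient (to death, or to censoring)
    events:    1 if the death was observed, 0 if the patient is censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)   # deaths and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the curve never drops that far."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical toy group (years of follow-up); 0 marks a patient still alive
durations = [2, 3, 3, 5, 6, 7, 8, 8]
events    = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(durations, events)
median = median_survival(curve)
print(curve)
print("median survival:", median)
```

Note that the median is undefined (returned as `None` here) for any group whose estimated survival curve never falls to 0.5, which can happen when many patients in the group are still alive.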
My problem is with the next thing I want to calculate: I want to understand whether there is a correlation between the number of patients born in a specific year in my sample (not the entire population) and their expected lifespan. In other words, if my sample contains more people born in a specific year, do those people live longer (or shorter)? How can I compute a correlation between a number and a survival function in this case? Thank you.
EDIT: would correlating the number of patients in a group with that group's median overall survival time be statistically correct, and would it achieve what I am trying to do?
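What I have in mind for the EDIT is roughly the following sketch: a Spearman rank correlation between group size and each group's median survival time, with a p-value from an exact permutation test. The group sizes are the ones from the table above; the median survival values are HYPOTHETICAL placeholders, not results, and the helper names are my own. I am assuming a rank correlation is a reasonable choice here, which is part of what I am asking.

```python
from itertools import permutations
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_p_value(x, y):
    """Two-sided exact p-value: share of reorderings of y with |rho| at least as large."""
    observed = abs(spearman(x, y))
    perms = list(permutations(y))
    hits = sum(abs(spearman(x, list(p))) >= observed - 1e-12 for p in perms)
    return hits / len(perms)

group_sizes = [45, 42, 46, 49, 51]            # 1994..1998, from the table
median_survival = [6.1, 5.8, 6.0, 6.9, 7.2]   # HYPOTHETICAL medians, in years

rho = spearman(group_sizes, median_survival)
p = permutation_p_value(group_sizes, median_survival)
print("Spearman rho:", rho)
print("permutation p-value:", p)
```

With only five groups there are just 120 possible orderings, so a test like this can detect only a very strong relationship, and it treats each group's median as if it were known exactly rather than estimated.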