The setup I describe below is analogous to my actual problem.
Problem: I have millions of individuals in my dataset, and for each individual I have certain stats over time. For example, for one individual, Joe, I have monthly measurements that change through time (age, BMI, and other medical stats), as well as data on economic conditions (income of his area, proximity to the nearest hospital, etc.) that also change with time. Additionally, my training data records whether or not the individual survives each month.
A third party will select groups of 1000 individuals. The dataset contains past groupings. My goal: given a new group, I would like to predict the proportion of survivors in the future (for many months out).
Approach 1 (Logistic Regression/Trees): Because each individual outcome is independent, we could predict the probability of death for each individual in each month and aggregate up to the group level to estimate what proportion of the group survives.
Issues: While I know the economic and health conditions at each time period, I do not know the future state of these factors/features. This limits how far into the future I can make predictions using regressions.
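To make the aggregation step in Approach 1 concrete, here is a minimal sketch. It assumes some fitted model has already produced, for each individual, a predicted probability of death in each future month (conditional on being alive at the start of that month). The function names and the numbers are hypothetical and only illustrate the arithmetic; they do not address the missing-future-features issue.

```python
from itertools import accumulate
from operator import mul


def survival_curve(monthly_death_probs):
    """Cumulative probability of still being alive after each month.

    monthly_death_probs[t] is the (assumed) predicted probability of dying
    in month t, given the individual is alive at the start of month t.
    """
    return list(accumulate((1.0 - p for p in monthly_death_probs), mul))


def expected_group_survival(group_death_probs):
    """Expected proportion of the group alive after each month.

    Averages the individual survival curves, which is valid for the
    expectation whether or not outcomes are independent.
    """
    curves = [survival_curve(p) for p in group_death_probs]
    n = len(curves)
    horizon = len(curves[0])
    return [sum(c[t] for c in curves) / n for t in range(horizon)]


# Hypothetical predictions for a group of 3 individuals over 3 months.
group = [
    [0.10, 0.10, 0.10],
    [0.00, 0.50, 0.00],
    [0.20, 0.00, 0.00],
]
proportions = expected_group_survival(group)
print(proportions)
```

Each individual keeps their own survival curve, so the per-person interpretability is preserved while the group proportion is just the average of those curves.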
Approach 2 (Time Series + Regression): Another option is to make the prediction at the group level and engineer new features, such as averaging age and BMI across individuals to get the average age and BMI for the group, and computing the number of individuals remaining in the group at each month. Doing this allows me to consider time series techniques (the survival proportion for a particular group can be treated as a time series) as well as regression techniques that use the engineered features.
Issues: There is no longer a probability of survival for each individual, which would be helpful for interpretability. The distribution of BMIs is condensed into a mean and standard deviation, which loses information.
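The group-level feature engineering in Approach 2 could look roughly like the sketch below. The record schema (`month`, `age`, `bmi`, `alive`) and the function name are assumptions made for illustration, not the actual layout of my data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_monthly_features(records):
    """Collapse individual records into one feature row per month.

    records: iterable of dicts with hypothetical keys
    'month', 'age', 'bmi', 'alive'. Only individuals alive in a
    given month contribute to that month's summaries.
    """
    by_month = defaultdict(list)
    for r in records:
        if r["alive"]:
            by_month[r["month"]].append(r)

    features = {}
    for month in sorted(by_month):
        rows = by_month[month]
        ages = [r["age"] for r in rows]
        bmis = [r["bmi"] for r in rows]
        features[month] = {
            "n_alive": len(rows),      # individuals remaining in the group
            "mean_age": mean(ages),
            "mean_bmi": mean(bmis),
            "sd_bmi": pstdev(bmis),    # population SD of BMI within the group
        }
    return features


# Hypothetical group of 3 individuals observed for 2 months.
records = [
    {"month": 1, "age": 40, "bmi": 20.0, "alive": True},
    {"month": 1, "age": 50, "bmi": 25.0, "alive": True},
    {"month": 1, "age": 60, "bmi": 30.0, "alive": True},
    {"month": 2, "age": 40, "bmi": 20.0, "alive": True},
    {"month": 2, "age": 50, "bmi": 26.0, "alive": True},
    {"month": 2, "age": 60, "bmi": 30.0, "alive": False},
]
features = group_monthly_features(records)
print(features)
```

The resulting per-month rows (with `n_alive` divided by the initial group size as the target) are what a time series or regression model would consume. The information loss mentioned above is visible here: the whole BMI distribution is reduced to two numbers per month.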
I am mainly wondering whether anyone has encountered a similar problem before and which techniques would be helpful. Is this more naturally a time series problem or a regression problem?