
Similar to this self-answered question, I want to ask about possible approaches for modelling data with aggregated targets, i.e. things like

$$ \bar y_{j[i]} = \alpha + \beta x_i + \varepsilon_i $$

where $j[i]$ is the group to which the $i$-th observation belongs, and for each group $j$ of size $|j|$ the target is the average of all the $y_i$ observations within the group, $\bar y_{j[i]} = |j|^{-1} \sum_{k \in j[i]} y_k$. Of course, only the means are given and they cannot be disaggregated; this is the data we have.
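One way to read this setup, purely as an illustrative assumption and not something the data guarantees, is that an individual-level linear model $y_k = \alpha + \beta x_k + \varepsilon_k$ holds with independent errors of variance $\sigma^2$ (a symbol introduced here for illustration). Averaging that model within a group $j$ would then give

$$ \bar y_j = \frac{1}{|j|} \sum_{k \in j} \left( \alpha + \beta x_k + \varepsilon_k \right) = \alpha + \beta \bar x_j + \bar\varepsilon_j, \qquad \operatorname{Var}(\bar\varepsilon_j) = \frac{\sigma^2}{|j|} $$

so under these assumptions the group mean depends on the individual features only through their group average $\bar x_j$, and the noise in the mean shrinks with group size. If errors are correlated within a group, the variance expression no longer holds as written.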

An additional assumption that can be made here is that there is clustering within the $j[i]$ groups: the group assignment is not completely random, and the subjects within each group share some characteristics.

For example, imagine that you have data on the average test score per class (the thing to predict), together with student-level features, e.g. individual IQ scores (which should be highly, but not perfectly, predictive of exam scores), class-level features, and features at a higher level of aggregation (school level). I am interested in finding the factors that contributed to each individual test score, and in predicting those scores. The data is a random sample of classes, and the final predictions will be made for students from classes that were not observed in the training data.

Can we use such data to learn anything (approximately) about the unobserved individual-level targets?

What approaches are used for modelling such data? Can you give some references? Obviously, with aggregated data we lose precision, and the variance of the means $\bar y_{j[i]}$ is smaller than that of the individual observations $y_i$, so predicting the average target is not the same as predicting individual values. Is there any way to translate the predictions of the group averages into the possible variability between subjects?
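To make the variance point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the individual-level model $y = \alpha + \beta x + \varepsilon$ with independent noise, the equal group sizes, the parameter values, and all variable names. It shows that the group means vary less than the individual values, and that under these assumptions a regression of the group-mean target on the group-mean feature can recover the shared coefficients and a rescaled estimate of the individual-level noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 500 groups ("classes") of 25 subjects each.
n_groups, group_size = 500, 25
alpha, beta, sigma = 1.0, 2.0, 3.0  # assumed true parameters

# A group-level shift in x mimics clustering of subjects within groups.
group_shift = rng.normal(0.0, 1.0, size=(n_groups, 1))
x = group_shift + rng.normal(0.0, 1.0, size=(n_groups, group_size))

# Unobserved individual targets under the assumed linear model.
y = alpha + beta * x + rng.normal(0.0, sigma, size=x.shape)

# Only the group means of y are observed; x is known per subject.
y_bar = y.mean(axis=1)
x_bar = x.mean(axis=1)

# Least squares of the group-mean target on the group-mean feature.
design = np.column_stack([np.ones(n_groups), x_bar])
coef, *_ = np.linalg.lstsq(design, y_bar, rcond=None)
alpha_hat, beta_hat = coef

# Residual variance of the means is about sigma^2 / group_size,
# so multiplying by group_size gives an individual-level estimate.
resid = y_bar - design @ coef
sigma2_hat = group_size * resid.var(ddof=2)

var_individual = y.var()  # variance of individual targets
var_means = y_bar.var()   # variance of group means (smaller)

print(f"beta_hat={beta_hat:.3f}, sigma2_hat={sigma2_hat:.3f}")
print(f"var(y)={var_individual:.3f}, var(y_bar)={var_means:.3f}")
```

The rescaling step relies on equal group sizes and independent errors within a group; with unequal sizes a weighted fit would be the natural variant, and with within-group correlation the recovered variance would mix the two sources.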

Tim
  • 1) It looks like the 'raw' inputs/regressors are known, rather than just the group means (unlike the outputs). Is that right? 2) Are the parameters $\alpha, \beta$ shared across groups? 3) Not sure I understand the noise model. Is noise added to the unobserved outputs (before averaging), or is noise added after averaging? Or both? Does the noise distribution have any explicit dependence on group? 4) If you want to make predictions for out-of-sample data, are new points members of an existing group? Is group membership known?
  • – user20160 Oct 07 '19 at 04:24
  • @user20160 1 - yes, 2 - shared, but there may be some per-group features, 3 - see the example in my edit, 4 - group IDs are not meaningful here; I'd like to make predictions for groups that were not seen in the training data. – Tim Oct 07 '19 at 05:08