Background: Consider a response variable decomposed as:
$$ y = y_1 + y_2 + y_3 + y_4 + y_5 $$
The predictors in matrix $X$ are divided into subsets $X_1, X_2, \dots, X_5$. Importantly, for each subset $X_i$, only the corresponding component $y_i$ is non-zero.
Main Question: If I fit a separate model $y_i \sim X_i$ for each subset of predictors and then sum the predictions to estimate $y$, how does this approach compare to fitting a single model on all predictors, interacted with a categorical variable $G_j$ that indicates which subset each predictor belongs to:
$$ y \sim G_j + X + G_j \times X $$
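For reference, assuming standard treatment (dummy) coding with subset 1 as the baseline, and writing $D_k$ (hypothetical notation) for the indicator that a row belongs to subset $k$, the formula expands to:
$$ y = \beta_0 + \sum_{k=2}^{5} \gamma_k D_k + X\beta + \sum_{k=2}^{5} D_k X\delta_k + \varepsilon $$
so subset 1 has intercept $\beta_0$ and slopes $\beta$, while subset $k \ge 2$ has intercept $\beta_0 + \gamma_k$ and slopes $\beta + \delta_k$. Every subset gets its own free intercept and slope vector (the same parameter count as the five separate fits), but there is a single error term $\varepsilon$.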
Sub-questions/Clarifications:
- Are the two modeling approaches equivalent, particularly in terms of their predictions for $y$?
- What are the key distinctions or implications of choosing one approach over the other, especially regarding model assumptions and interpretation?
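A minimal simulation sketch of the comparison, assuming the subsets partition the observations (rows), every subset uses the same $p$ predictor columns, both approaches are fit by ordinary least squares with an intercept, and each separate model contributes predictions only for its own subset's rows (where $y_i$ is non-zero). All variable names, the simulated coefficients, and the unequal noise levels are hypothetical, chosen only to make the two fits comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups, n_per_group, p = 5, 40, 3
groups = np.repeat(np.arange(n_groups), n_per_group)  # subset label G for each row
n = groups.size
X = rng.normal(size=(n, p))

# Hypothetical truth: each subset has its own intercept, slopes, and noise level.
true_intercepts = rng.normal(size=n_groups)
true_slopes = rng.normal(size=(n_groups, p))
noise_sd = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # unequal error SD per subset
y = (true_intercepts[groups]
     + np.einsum("ij,ij->i", X, true_slopes[groups])
     + rng.normal(scale=noise_sd[groups]))


def ols(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Approach 1: separate fits y_i ~ X_i. Each model predicts only its own rows
# (zero elsewhere), so summing the five prediction vectors fills in every row once.
pred_separate = np.zeros(n)
resid_var_separate = np.zeros(n_groups)
for g in range(n_groups):
    rows = groups == g
    design = np.column_stack([np.ones(rows.sum()), X[rows]])
    pred_separate[rows] = design @ ols(design, y[rows])
    resid = y[rows] - pred_separate[rows]
    resid_var_separate[g] = resid @ resid / (rows.sum() - design.shape[1])

# Approach 2: one model y ~ G + X + G:X with treatment coding (subset 0 is baseline).
dummies = (groups[:, None] == np.arange(1, n_groups)).astype(float)
interactions = np.column_stack([dummies[:, [k]] * X for k in range(n_groups - 1)])
design_full = np.column_stack([np.ones(n), dummies, X, interactions])
pred_interacted = design_full @ ols(design_full, y)
resid_full = y - pred_interacted
resid_var_pooled = resid_full @ resid_full / (n - design_full.shape[1])

print(np.allclose(pred_separate, pred_interacted))  # fitted values agree
print(resid_var_separate.round(2), round(resid_var_pooled, 2))
```

In this OLS sketch the two designs span the same column space, so the fitted values coincide. The difference shows up in the error model: the single interacted fit estimates one pooled residual variance (here the average of the five per-subset estimates, since every subset has the same degrees of freedom), whereas the separate fits estimate one variance per subset, which changes the standard errors when the subsets' noise levels differ.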
I would appreciate any insights or references that clarify the relationship between these two modeling strategies.