2

My intuition tells me that the following is a straightforward question, but I could not find relevant answers when I searched for it. I assume that is because I don't know the relevant terms, so I would appreciate some guidance in this respect.

Let's say we have used a sample $\{y_i, x_i\}_{i=1}^n$ to estimate a linear model, and the regression coefficient of the variable of interest is $\hat\beta$.

Let's say our sample consists of two groups of individuals, $A, B$. I would like to decompose $\hat\beta$ into the relative contributions of $A$ and $B$. Note that this is not about groups of regression coefficients (for which there are many decomposition techniques), but about groups of individuals in the data.

To see what I mean, consider:

$\hat\beta = \frac{\sum_i (x_i-\bar x)(y_i-\bar y) }{\sum_i (x_i-\bar x)^2}$

I would like to end up with a formula along the lines of

$\hat\beta = \phi_A \hat\beta_A + \phi_B \hat\beta_B$

where $\phi_X$ is the relative share of group $X$ in the sample (if necessary), and $\hat\beta_X$ is the relevant contribution of group $X$ to $\hat\beta$.

If we expand the summation into two sums for each group, we are still left with the overall denominator and I am not sure what to do with the grand mean. At the same time, this should be a standard question and does not seem too hard intuitively. Anyone care to chime in to enlighten me? Any hints greatly appreciated!

coffeinjunky

3 Answers

2

This is what you get when you regress the outcome on your variable of interest, the group variable (appropriately coded, which your software might take care of under the hood), and an interaction between the variable of interest and the group variable.

You get some baseline group, to which the intercept and the coefficient on the variable of interest correspond. Think of this as the sub-regression on that group alone. Then the coefficient on the group variable tells you by how much the intercept of the other group is greater or lower than that of the baseline group. Likewise, the coefficient on the interaction between the variable of interest and the group variable tells you by how much the coefficient on the variable of interest in the other group is greater or lower than the corresponding coefficient in the baseline group.

Let’s look at the math.

$$ y = \beta_0+\beta_1x_{group}+\beta_2x_{interest}+\beta_3x_{group} x_{interest}+\epsilon $$

You would think of $\beta_0$ as the intercept of the baseline group (either picked by you or selected by some default setting in your software) and $\beta_2$ as the coefficient on the variable of interest in the baseline group. Then $\beta_0+\beta_1$ is the intercept for the other group, and $\beta_2+\beta_3$ is the coefficient on the variable of interest in the other group.
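As a quick numerical check, here is a minimal sketch on simulated data, using the coefficient order of the displayed equation ($\beta_1$ on the group dummy, $\beta_2$ on the variable of interest, $\beta_3$ on the interaction). The data-generating values, the seed, and the helper name `simple_ols` are assumptions made up for illustration. It suggests that the interacted model reproduces the two separate group regressions.

```python
import numpy as np

# Simulated data (all numbers here are illustrative assumptions)
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)   # 0 = baseline group, 1 = other group
x = rng.normal(size=n)               # variable of interest
y = 1.0 + 0.5 * group + 2.0 * x - 1.5 * group * x + rng.normal(scale=0.3, size=n)

# Design matrix in the order of the equation:
# intercept, group dummy, variable of interest, interaction
X = np.column_stack([np.ones(n), group, x, group * x])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

def simple_ols(xs, ys):
    """Hypothetical helper: intercept and slope of a one-regressor OLS fit."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return intercept, slope

# Separate regressions on each group
int0, slope0 = simple_ols(x[group == 0], y[group == 0])
int1, slope1 = simple_ols(x[group == 1], y[group == 1])

print("baseline group:", (b0, b2), "vs", (int0, slope0))
print("other group:   ", (b0 + b1, b2 + b3), "vs", (int1, slope1))
```

Note that this recovers the group-specific slopes, not the weights with which they combine into the pooled coefficient.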

Dave
1

This question continues to receive comments/answers even after 7 years have passed, so I thought I should highlight what answer I found and used in the end.

I found the answer I was looking for in a neat paper by Hertz (2008), which outlined that the regression coefficient $\hat\beta$ can be decomposed into the following:

$$\hat\beta = \sum_g \hat\pi_g \left( \hat\beta_g \frac{\hat\sigma^2_{x(g)}}{\hat\sigma^2_x} + \frac{(\bar x_g - \bar x)(\bar y_g - \bar y)}{\hat\sigma^2_x} \right) $$

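where $g=1, \dots, G$ indexes the groups in the data, $\hat\pi_g$ is group $g$'s share of the observations, $\hat\beta_g$ is the slope from the regression within group $g$, $\hat\sigma^2_{x(g)}$ is the variance of $x$ within group $g$, and $\hat\sigma^2_x$ is the variance of $x$ in the full sample. The first term in the parentheses is the within-group part, and the second is the between-group part, driven by how far the group means lie from the grand means.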
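The identity can be checked numerically. The following is a minimal sketch on simulated data with three groups; the data-generating values, the seed, and the helper name `slope` are assumptions for illustration, not taken from the paper. Variances are computed with divisor $n$ (NumPy's default `ddof=0`), which is what makes the identity hold exactly.

```python
import numpy as np

# Simulated data with three groups (all numbers are illustrative assumptions)
rng = np.random.default_rng(1)
n = 300
g = rng.integers(0, 3, size=n)
x = rng.normal(loc=g, scale=1.0 + 0.5 * g, size=n)   # group-specific mean and spread
y = 0.5 * g + (1.0 + g) * x + rng.normal(size=n)     # group-specific slope

def slope(xs, ys):
    """Hypothetical helper: OLS slope of ys on xs (with intercept)."""
    xc = xs - xs.mean()
    return (xc @ (ys - ys.mean())) / (xc @ xc)

beta_pooled = slope(x, y)
var_x = x.var()  # divisor n, not n - 1

contributions = {}
for k in np.unique(g):
    m = g == k
    pi_k = m.mean()                                    # share of group k
    within = slope(x[m], y[m]) * x[m].var() / var_x    # within-group term
    between = (x[m].mean() - x.mean()) * (y[m].mean() - y.mean()) / var_x
    contributions[int(k)] = pi_k * (within + between)

total = sum(contributions.values())
print(contributions)
print(total, beta_pooled)
```

The entries of `contributions` are the per-group summands, so each can be read as that group's contribution to the pooled coefficient.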

coffeinjunky
0

I am not really sure what kind of data you are handling or why you need to split a group into two, but you can run a multiple linear regression to address your problem. Since you are splitting one group in two, my intuition says that the two will be highly correlated, and hence splitting the data does not make sense. Normally we try to do the exact opposite: calculate correlations and, based on them, eliminate as many independent variables as possible. It would help if you could provide more context for your problem.

Raunak87
  • Well, I know that I could do that practically, but I want to analytically decompose the regression coefficient into the contributions of several groups to the coefficient. If you need a substantive example, let's say you estimate the effect of sunshine on ice cream sales in the US, and $\beta$ tells you that one more hour of sunlight is associated with 10 more pounds of ice cream sold. Now I want to decompose the overall effect (10) into the contributions of the southern US and northern US. There should be an analytical way of showing this... – coffeinjunky May 20 '15 at 15:39
  • Intuitively, what you are trying to accomplish does not make sense. In the example that you have given, you don't really know the relation between populations of Northern US and Southern US. You can't say with absolute certainty that they are independent or dependent (and how much dependent). – Raunak87 May 21 '15 at 07:19
  • Make a dummy variable for groups A/B, and include in the model interactions with that dummy. – kjetil b halvorsen Jun 07 '20 at 21:13