I am reading two recent papers studying between-firm and within-firm wage inequality, Barth et al 2016 (hereafter BBDF) and Song et al 2019 (hereafter SPGBV). I am confused by the different variance decomposition methods used in these two papers.

Both papers start from a simple decomposition of the variance of log wages.

BBDF: $$V(\ln w)=V(s)+V(\varphi)+2 \operatorname{Cov}(s, \varphi)+V(u)$$

SPGBV: $$\operatorname{var}\left(y_{t}^{i, j}\right)=\operatorname{var}\left(\theta^{i}\right)+\operatorname{var}\left(\psi^{j}\right)+2 \operatorname{cov}\left(\theta^{i}, \psi^{j}\right)+\operatorname{var}\left(\epsilon_{t}^{i, j}\right)$$

Here $s$ (or $\theta$) is the person effect, $\varphi$ (or $\psi$) is the firm effect, and $u$ (or $\epsilon$) is the match error.

However, both papers then rewrite this simple decomposition as a more elaborate one that distinguishes a between-firm component from a within-firm component, and they do so in somewhat different ways.

BBDF: $$V(\ln w) = \underbrace{V(s)(1-\rho)+V(u)}_{\text {Within-firm component }} + \underbrace{V(s)\left(\rho+2 \rho_{\varphi}\right)+V(\varphi)}_{\text {Between-firm component }},$$ where $\rho=\operatorname{Cov}(s, S) / V(s)$, $\rho_{\varphi}=\operatorname{Cov}(s, \varphi) / V(s)$, and $S$ is defined as the establishment's average level of the predicted wage from $s$.

SPGBV: $$\begin{aligned} \operatorname{var}\left(y_{t}^{i, j}\right)= \underbrace{\operatorname{var}\left(\theta^{i}-\bar{\theta}^{j}\right)+\operatorname{var}\left(\epsilon_{t}^{i, j}\right)}_{\text {Within-firm component }} +\underbrace{\operatorname{var}\left(\psi^{j}\right)+2 \operatorname{cov}\left(\bar{\theta}^{j}, \psi^{j}\right)+\operatorname{var}\left(\bar{\theta}^{j}\right)}_{\text {Between-firm component }}, \end{aligned}$$

Are these two decompositions the same thing written in different ways? I tried some calculations but failed to show that they are the same. Moreover, while it is clear how BBDF obtain their second decomposition (add and subtract $\operatorname{Cov}(s, S)$ in the first decomposition), it is unclear to me where the second formula in SPGBV comes from. In terms of interpretation, however, the SPGBV version seems a more intuitive way to explain the within- and between-firm components than the BBDF one. I also wonder what the principle is behind a decomposition that separates the between- and within-firm effects.

Alalalalaki

1 Answer

I'm not at home in this literature, so I cannot tell you whether the two decompositions are the same. I can only help you with the derivations.

For the BBDF expression, you can simply obtain the first from the second by substituting out $\rho$ and $\rho_\varphi$.
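As a quick sanity check of that substitution, here is a minimal numerical sketch. The simulated data, the sample sizes, and the names `firm`, `phi_firm`, and `cov` are illustrative assumptions and are not taken from BBDF. It only confirms that the within-firm and between-firm terms add up to the simple decomposition once $\rho$ and $\rho_\varphi$ are substituted out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: 50 establishments, 2000 workers.
n_firms, n_workers = 50, 2000
firm = rng.integers(0, n_firms, size=n_workers)        # establishment of each worker
phi_firm = rng.normal(0.0, 0.3, size=n_firms)          # establishment effects
phi = phi_firm[firm]                                   # establishment effect per worker
s = rng.normal(0.0, 0.5, size=n_workers) + 0.5 * phi   # person component, correlated with phi
u = rng.normal(0.0, 0.2, size=n_workers)               # residual


def cov(a, b):
    """Population covariance across workers (divides by N)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# S: establishment average of s, assigned back to every worker.
counts = np.bincount(firm, minlength=n_firms)
sums = np.bincount(firm, weights=s, minlength=n_firms)
S = (sums / np.maximum(counts, 1))[firm]

rho = cov(s, S) / cov(s, s)
rho_phi = cov(s, phi) / cov(s, s)

# BBDF's within-firm and between-firm components.
within = cov(s, s) * (1 - rho) + cov(u, u)
between = cov(s, s) * (rho + 2 * rho_phi) + cov(phi, phi)

# Right-hand side of the simple decomposition.
simple = cov(s, s) + cov(phi, phi) + 2 * cov(s, phi) + cov(u, u)

print(np.isclose(within + between, simple))
```

The check is purely algebraic: the $\rho V(s)$ terms cancel, so it holds for any data, whatever the correlation between $s$ and $\varphi$.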

For the SPGBV expression things are a bit more tricky. Let $i$ represent the unit and $g$ the group variable (which is $j$ in your equation).

I'm guessing that $$ \bar \theta^g = \dfrac{1}{n_g} \sum_{i \in g} \theta^i $$ where $n_g$ is the number of elements in group $g$. Let there be $N ( = \sum_g n_g)$ observations in total. Let $p_g = \dfrac{n_g}{N}$ be the probability of $\bar \theta^g$ (I take $\dfrac{1}{N}$ to be the probability of each unit $\theta^i$).

First, look at the expression $var(\theta^i - \bar \theta^g)$. We can expand it in the following way: $$ var(\theta^i- \bar \theta^g) = var(\theta^i) + var(\bar \theta^g) - 2 cov(\theta^i, \bar \theta^g). $$

Now we can expand the covariance term, writing $\bar \theta$ for the overall mean and using $\sum_{i \in g} (\theta^i - \bar \theta) = n_g (\bar \theta^g - \bar \theta)$: $$ \begin{align*} cov(\theta^i, \bar \theta^g) &= \frac{1}{N}\sum_{g}\sum_{i \in g} (\theta^i - \bar \theta)(\bar \theta^g - \bar \theta),\\ &= \frac{1}{N} \sum_g (\bar \theta^g - \bar \theta) \sum_{i \in g} (\theta^i - \bar \theta),\\ &= \frac{1}{N} \sum_g n_g (\bar \theta^g - \bar \theta)(\bar \theta^g - \bar \theta),\\ &= \sum_g p_g(\bar \theta^g - \bar \theta)^2,\\ &= var(\bar \theta^g). \end{align*} $$

This gives: $$ var(\theta^i - \bar \theta^g) = var(\theta^i) - var(\bar \theta^g). $$ Rearranging gives: $$ var(\theta^i) = var(\theta^i - \bar \theta^g) + var(\bar \theta^g). \tag{1} $$

Next, let's have a look at the covariance term $cov(\theta^i, \psi^g)$: $$ \begin{align*} cov(\theta^i, \psi^g) &= \frac{1}{N} \sum_g \sum_{i \in g} (\theta^i - \bar \theta)(\psi^g - \bar \psi),\\ &= \frac{1}{N} \sum_g (\psi^g - \bar \psi) \left( \sum_{i \in g} (\theta^i - \bar \theta)\right),\\ &= \frac{1}{N} \sum_g (\psi^g - \bar \psi) n_g(\bar \theta^g - \bar \theta),\\ &= \sum_g \frac{n_g}{N} (\bar \theta^g - \bar \theta) (\psi^g - \bar \psi),\\ &= \sum_g p_g (\bar \theta^g - \bar \theta)(\psi^g - \bar \psi),\\ &= cov(\bar \theta^g, \psi^g). \end{align*} $$

So: $$ cov(\theta^i, \psi^g) = cov(\bar \theta^g, \psi^g). \tag{2} $$

Substituting $(1)$ and $(2)$ into the first SPGBV decomposition gives the second one.
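Identities $(1)$ and $(2)$ can also be checked numerically. The sketch below is only an illustration on simulated data. The group sizes, the distributions, and the names `g`, `psi_g`, `theta_bar`, and `cov` are my own assumptions and not anything from SPGBV. All moments are taken across observations, so each group is implicitly weighted by $p_g$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulated data: 40 groups (firms), 3000 units (workers).
n_groups, n_units = 40, 3000
g = rng.integers(0, n_groups, size=n_units)            # group of each unit
psi_g = rng.normal(0.0, 0.3, size=n_groups)            # group effects
psi = psi_g[g]                                         # group effect per unit
theta = rng.normal(0.0, 0.5, size=n_units) + 0.4 * psi # unit effects, correlated with psi


def cov(a, b):
    """Population covariance across units (divides by N)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Group mean of theta, assigned back to every unit in the group.
counts = np.bincount(g, minlength=n_groups)
sums = np.bincount(g, weights=theta, minlength=n_groups)
theta_bar = (sums / np.maximum(counts, 1))[g]

# Identity (1): var(theta) = var(theta - theta_bar) + var(theta_bar)
lhs1 = cov(theta, theta)
rhs1 = cov(theta - theta_bar, theta - theta_bar) + cov(theta_bar, theta_bar)

# Identity (2): cov(theta, psi) = cov(theta_bar, psi)
lhs2 = cov(theta, psi)
rhs2 = cov(theta_bar, psi)

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))
```

Both identities hold exactly (up to floating-point error) whenever the moments are weighted by group size, which is what broadcasting `theta_bar` back to the unit level does here. With unweighted firm-level moments they would generally fail.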

tdm