
As I understand it, the Frisch-Waugh-Lovell theorem implies that if you regress y on x1 and x2, the coefficient on x2 is the same as the one you get by first regressing y on x1 and then regressing the residuals from that regression on x2.

But if you run the code below, the two coefficients are different. Why is this?

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Seed NumPy's generator; random.seed() has no effect on np.random
np.random.seed(10)
target = np.random.normal(0, 1, 1000)
x1 = np.random.normal(0, 1, 1000) + 0.1 * target
x2 = np.random.normal(0, 1, 1000) - 0.1 * target + 0.1 * x1

# Full model: target on const, x1 and x2
df_ = pd.DataFrame({'const': 1, 'x1': x1, 'x2': x2})
full_model = sm.OLS(target, df_).fit()
print('x2 coefficient in full model')
print(full_model.params['x2'])

# Partial model: residuals of target on (const, x1), regressed on x2
resid = sm.OLS(target, df_[['const', 'x1']]).fit().resid
partial_model = sm.OLS(resid, df_[['x2']]).fit()
print('x2 coefficient in partial model')
print(partial_model.params['x2'])
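For comparison, here is a minimal sketch of the partialling-out as the theorem is usually stated, where x2 is also residualized on the constant and x1 before the second regression. It uses plain NumPy least squares instead of statsmodels to stay self-contained. The helper `ols` and the names `beta_full`, `beta_fwl` and `beta_raw` are made up for illustration, and the random draws differ from the code above because a different generator is used.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
target = rng.normal(0, 1, n)
x1 = rng.normal(0, 1, n) + 0.1 * target
x2 = rng.normal(0, 1, n) - 0.1 * target + 0.1 * x1
const = np.ones(n)


def ols(y, X):
    """Return the least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Full regression: target on const, x1, x2; keep the x2 coefficient.
X_full = np.column_stack([const, x1, x2])
beta_full = ols(target, X_full)[2]

# Partial out const and x1 from BOTH target and x2.
X1 = np.column_stack([const, x1])
resid_y = target - X1 @ ols(target, X1)
resid_x2 = x2 - X1 @ ols(x2, X1)

# Residual-on-residual regression (no constant needed: both have mean zero).
beta_fwl = ols(resid_y, resid_x2[:, None])[0]

# The version from the question: residualized target on the raw x2.
beta_raw = ols(resid_y, x2[:, None])[0]

print('full model:            ', beta_full)
print('residual on residual:  ', beta_fwl)
print('residual on raw x2:    ', beta_raw)
```

In this sketch the first two numbers agree up to floating-point error, while the third is generally different, because the raw x2 still contains the part explained by the constant and x1.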

Galen
  • 8,442
tobmo
  • 71
  • This has been thoroughly explored elsewhere here on CV. See, for instance, https://stats.stackexchange.com/a/113207/919 (theoretical/geometrical) and https://stats.stackexchange.com/a/46508/919 (your approach), https://stats.stackexchange.com/questions/572623 (similar), https://stats.stackexchange.com/a/32237/919 (another example), etc. – whuber Feb 01 '23 at 19:11
