I am currently testing whether monthly fund characteristics (size, capital flows, age, risk, persistence, ...) explain funds' abnormal returns.

My data are set up as a panel of 1000 equity mutual funds over the period 2000-2017.

Therefore, I run the following regression

Alpha = intercept + B1(fund size) + B2(capital flows) + ... + u

Most past literature, such as Chen et al. (2004) and Carhart (1997), uses the Fama-MacBeth procedure to test such relationships.

However, since my dataset suffers from both time-series and cross-sectional correlation, I am better off (according to Petersen (2009)) using a fixed effects regression and clustering residuals by fund and time to adjust the standard errors.

Anyway, I run the regression using both models (fixed effect and Fama MacBeth procedure) and I get slightly different results.

I was just wondering which would be the better model for tackling this problem.

References: Chen, J., Hong, H., Huang, M. and Kubik, J.D., 2004. Does fund size erode mutual fund performance? The role of liquidity and organization. The American Economic Review, 94(5), pp.1276-1302.

Petersen, M.A., 2009. Estimating standard errors in finance panel data sets: Comparing approaches. The Review of Financial Studies, 22(1), pp.435-480.

user28909
  • If I understand properly, you have some procedure to develop yearly alpha estimates $\alpha_{it}$ and then you're regressing $\alpha_{it}$ on various fund level covariates? Is that correct? How are the $\alpha_{it}$ calculated? Or is it an $\alpha_i$ (i.e. that doesn't vary over time)? – Matthew Gunn Aug 24 '17 at 16:56
  • I calculate alpha following Carhart (1997): I regress excess returns on the four factors using the past three years and roll the regression forward on a monthly basis. For every month I obtain different estimates (b1, b2, b3, and b4), which I use to calculate expected returns. Then I calculate monthly alpha as actual return less expected return. – user28909 Aug 24 '17 at 18:20
  • By "slightly different," is the difference substantive? Or not really? – Matthew Gunn Aug 26 '17 at 18:41
  • @MatthewGunn Yes, the coefficient of one variable (fund family size) is positive according to Fama-MacBeth. It is, however, negative when I use a fixed effects panel. Both are significant at the 1% level. My dataset is unbalanced, however, because some funds were incorporated after 2000 and some died before 2017. Could the unbalanced panel be the reason I am getting inconsistent results? Thanks. – user28909 Aug 26 '17 at 20:36
  • How do you measure fund family size? eg. Is it in units of billions or do you do something like log size? – Matthew Gunn Aug 26 '17 at 20:39
  • @MatthewGunn I measure it as the natural log of assets under management of all funds in the family that the fund belongs to. – user28909 Aug 26 '17 at 20:49
  • If you add fund fixed effects, you're using within fund variation in abnormal returns to estimate the relationship between fund family size and abnormal returns. An obvious but important concern is that timing is quite precise. It's well known that past performance forecasts future fund flows, and hence prior abnormal returns and current mutual fund size are linked. Eg. if AUM is future AUM, that could create a positive relationship between AUM and abnormal returns. I don't know; it's hard for me to diagnose from afar. – Matthew Gunn Aug 26 '17 at 23:50
  • In theory, you could also have a situation like this picture where there's an overall positive relationship between $y$ and $x$ but once fixed effects are accounted for, it's a negative relationship. (Picture is from this question). – Matthew Gunn Aug 26 '17 at 23:56

1 Answer

A more apples-to-apples comparison would be between (i) the Fama-Macbeth procedure and (ii) clustering standard errors by date. Adding fixed effects is somewhat different.
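To see why fixed effects answer a somewhat different question, here is a minimal sketch on made-up numbers (the three funds, their values, and the helper names `ols_slope` and `within_slope` are all hypothetical). Across funds, higher $x$ goes with higher $y$, but within each fund the relationship is negative, so the pooled slope and the fund fixed effects slope have opposite signs.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a univariate OLS regression that includes an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def within_slope(x, y, fund):
    """Slope after subtracting each fund's own mean from x and y,
    which is equivalent to including fund fixed effects."""
    xd = x.astype(float).copy()
    yd = y.astype(float).copy()
    for f in np.unique(fund):
        m = fund == f
        xd[m] -= xd[m].mean()
        yd[m] -= yd[m].mean()
    return float(xd @ yd / (xd @ xd))

# Hypothetical toy panel: three funds observed for three periods each.
fund = np.repeat([0, 1, 2], 3)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([3.0, 2.0, 1.0, 6.0, 5.0, 4.0, 9.0, 8.0, 7.0])

pooled = ols_slope(x, y)           # uses between-fund and within-fund variation
within = within_slope(x, y, fund)  # uses within-fund variation only
print(f"pooled slope: {pooled:.2f}, fixed effects slope: {within:.2f}")
```

Neither number is "wrong" here; they summarise different sources of variation, which is why a sign change between a specification with fund fixed effects and one without is possible.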

Problem: cross-sectional correlation causes naively computed standard errors to be understated

Let $r_{it}$ denote the return of firm $i$ in month $t$. An important statistical issue is that firm returns are cross-sectionally correlated: $\operatorname{Cov}(r_{it}, r_{jt})$ is far from zero. This flows through to significant, positive cross-sectional correlation in the error terms for almost any regression specification. The end result is that you'll underestimate standard errors (overstate significance) unless you use standard errors that are consistent in the presence of cross-sectional correlation.
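A small Monte Carlo sketch can illustrate the problem. Everything in it is an assumption made for illustration: the data-generating process, the panel dimensions, and the function names `naive_ols_slope` and `simulate`. Both the error term and the regressor are given a common per-period component; in this setup that shared component is what makes the textbook standard error too small.

```python
import numpy as np

def naive_ols_slope(x, y):
    """Pooled OLS of y on a constant and x. Returns the slope and its
    textbook standard error, which assumes iid errors."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta[1], np.sqrt(sigma2 * XtX_inv[1, 1])

def simulate(n_sims=200, n_funds=50, n_periods=50, seed=0):
    """Compare the actual spread of slope estimates with the average naive SE."""
    rng = np.random.default_rng(seed)
    slopes, naive_ses = [], []
    for _ in range(n_sims):
        f = rng.standard_normal(n_periods)  # common component of x in each period
        g = rng.standard_normal(n_periods)  # common shock to errors in each period
        x = f[:, None] + rng.standard_normal((n_periods, n_funds))
        e = g[:, None] + rng.standard_normal((n_periods, n_funds))
        y = 1.0 * x + e                     # true slope is 1
        b, se = naive_ols_slope(x.ravel(), y.ravel())
        slopes.append(b)
        naive_ses.append(se)
    return float(np.std(slopes, ddof=1)), float(np.mean(naive_ses))

sd_of_estimates, avg_naive_se = simulate()
print(f"sd of slope across simulations: {sd_of_estimates:.4f}")
print(f"average naive standard error:   {avg_naive_se:.4f}")
```

If the naive standard error were reliable, the two printed numbers would be close. Under these assumptions the spread of the estimates is several times larger.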

What to do?

Two possible approaches are:

  1. The Fama-Macbeth procedure: run a cross-sectional regression each period and take a time-series average of those estimates.
  2. Running a single panel regression and clustering standard-errors by date.

Both methods rely on zero correlation between the error terms of non-contemporaneous periods. A difference is weighting. The Fama-Macbeth procedure weights each time period equally. A panel regression will effectively give greater weight to periods with more observations or greater variation in right hand side variables.

If your results are basically the same with either method, that's great. If they differ, you need to be careful about what your precise research question is.

Somewhat related is an old debate in the event study literature as to whether to equally weight events or calendar time periods. For example, initial public offerings (IPOs) tend to cluster in time. Weighting each IPO equally and forming portfolios that weight each month equally are quite different. Eg. Fama (1998) advocates for calendar time portfolios of abnormal returns. Critics (eg. Ritter) say this is inefficient.

Setup

Imagine you have the panel data model:

$$ y_{it} = \mathbf{x}_{it} \cdot \mathbf{b} + \epsilon_{it}$$

Let's assume:

  1. Error terms are cross-sectionally correlated: $\operatorname{E}[\epsilon_{it} \epsilon_{jt}] \neq 0$ for $i \neq j$.
  2. Error terms are uncorrelated over time: $\operatorname{E}[\epsilon_{it} \epsilon_{j\tau}] = 0$ for any $i$ and $j$ and $t \neq \tau$.

Review of Fama-Macbeth procedure

For each time period $t$, run a cross-sectional regression:

$$ y_{it} = \mathbf{x}_{it} \cdot \mathbf{b}_t + \epsilon_{it}$$

From this, you obtain a time-series of estimates $\hat{\mathbf{b}}_t$. Under the assumption that error terms are uncorrelated over time, we can then compute the overall estimate and standard-errors using the most basic, Stats 1 method. For any component of the vector $\mathbf{b}$ one would compute the estimate and standard-error as:

$$ \hat{\mathbf{b}} = \frac{1}{T} \sum_t \hat{\mathbf{b}}_t \quad \quad \mathit{SE} = \sqrt{\frac{\frac{1}{T} \sum_t \left( \hat{\mathbf{b}}_t - \hat{\mathbf{b}} \right)^2 }{T}}$$
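The two formulas above can be sketched in a few lines of NumPy. The function name `fama_macbeth` and the toy data are hypothetical, and the variance uses the $\frac{1}{T}$ normalisation written above (many implementations use $\frac{1}{T-1}$ instead).

```python
import numpy as np

def fama_macbeth(y, X, period):
    """Fama-Macbeth estimate and standard errors.

    y      : (n,) dependent variable
    X      : (n, k) regressors (add a column of ones if an intercept is wanted)
    period : (n,) time label of each observation
    """
    periods = np.unique(period)
    # One cross-sectional OLS regression per period
    b_t = np.array([
        np.linalg.lstsq(X[period == t], y[period == t], rcond=None)[0]
        for t in periods
    ])
    T = len(periods)
    b_hat = b_t.mean(axis=0)
    # Standard error of a time-series mean, with the 1/T variance used above
    se = np.sqrt(((b_t - b_hat) ** 2).mean(axis=0) / T)
    return b_hat, se, b_t

# Hypothetical data: the cross-sectional slope is exactly 1, 2, 3 in periods 1, 2, 3
period = np.array([1, 1, 2, 2, 3, 3])
x = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.0, 4.0, 3.0, 6.0])
b_hat, se, b_t = fama_macbeth(y, x[:, None], period)
print(b_hat, se)
```

With per-period slopes of 1, 2 and 3, the estimate is their average, 2, and the standard error is $\sqrt{(2/3)/3} = \sqrt{2}/3$.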

OLS regression and then clustering standard-errors by time

A more modern approach is to run a standard panel regression and then cluster on the date variable.

An advantage of the general panel setting is that it's reasonably straightforward to apply other kinds of corrections to standard errors if you so desired (eg. Hansen-Hodrick, Newey-West, two-way clustering, etc...)
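As a rough sketch of what clustering by date does, the function below computes the cluster-robust "sandwich" covariance by summing the scores $\mathbf{x}_{it} \hat{\epsilon}_{it}$ within each date before taking outer products. The name `ols_cluster_se` and the simulated panel are hypothetical, and no small-sample correction is applied, so the numbers will differ slightly from what most statistical packages report.

```python
import numpy as np

def ols_cluster_se(y, X, cluster):
    """Pooled OLS with one-way cluster-robust standard errors
    (no finite-sample adjustment)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(cluster):
        m = cluster == c
        s = X[m].T @ resid[m]   # score summed over all observations in the cluster
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Hypothetical panel with a common per-period component in both x and the error
rng = np.random.default_rng(0)
n_periods, n_funds = 40, 30
date = np.repeat(np.arange(n_periods), n_funds)
x = rng.standard_normal(n_periods)[date] + rng.standard_normal(date.size)
e = rng.standard_normal(n_periods)[date] + rng.standard_normal(date.size)
y = 0.5 * x + e
X = np.column_stack([np.ones_like(x), x])

beta, se_by_date = ols_cluster_se(y, X, date)
# Treating every observation as its own cluster gives White's
# heteroskedasticity-robust standard errors as a benchmark.
_, se_white = ols_cluster_se(y, X, np.arange(y.size))
print(beta[1], se_by_date[1], se_white[1])
```

Passing a fund identifier instead of `date` clusters by fund. One common way to build two-way clustered standard errors is to add the by-fund and by-date covariance matrices and subtract the heteroskedasticity-robust one.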

An instructive special case

If you have a balanced panel and no time-series variation in your right hand side variables (i.e. $\mathbf{x}_{it} = \mathbf{x}_i$ for all $i$ and $t$), then your estimates of $\mathbf{b}$ from a single ordinary least squares regression and from the Fama-Macbeth procedure are EXACTLY the same. The two approaches to estimating standard errors, though, may be quite different depending on the cross-sectional correlation in the errors.

If there is time-series variation in the right hand side variables, the two estimates will differ. What's happening? Fama-Macbeth equally weights each time period while a single OLS regression will effectively give greater weight to periods where $\mathbf{x}_{it}$ have greater variation.
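The balanced, time-invariant special case can be checked numerically. This is only a sketch on random, hypothetical data (the dimensions and variable names are assumptions): the same regressor matrix is used in every period, and the Fama-Macbeth average is compared with a single pooled OLS fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_funds, n_periods = 5, 4

# Regressors (constant plus one characteristic) that do not change over time
X_i = np.column_stack([np.ones(n_funds), rng.standard_normal(n_funds)])
# Arbitrary outcomes; row t holds y_it for all funds in period t
Y = rng.standard_normal((n_periods, n_funds))

# Fama-Macbeth: one cross-sectional regression per period, then average
b_t = np.array([np.linalg.lstsq(X_i, Y[t], rcond=None)[0] for t in range(n_periods)])
b_fm = b_t.mean(axis=0)

# Single pooled OLS: stack the same X_i once per period
X_pooled = np.tile(X_i, (n_periods, 1))
b_pooled = np.linalg.lstsq(X_pooled, Y.ravel(), rcond=None)[0]

print(b_fm, b_pooled)
```

The two coefficient vectors agree up to floating-point error, whatever the values in `Y`, because $(X'X)^{-1}X'$ is the same linear map in every period.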

Simple example showing how the Fama-Macbeth procedure differs on weighting:

Imagine we have the data: $$ \begin{array}{cccc} y & x & i & t \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \\ 2 & 2 & 2 & 2 \end{array} $$

Running the Fama-Macbeth procedure we get $b_1 = 0$ for time period 1 and $b_2 = 1$ hence our estimate of $b$ is $\frac{0 + 1}{2} = .5$.

A single pooled OLS regression (with an intercept) would estimate $b$ as $\frac{10}{11} \approx .91$.
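The four-observation example can be reproduced directly. This is a minimal sketch; the helper `slope_with_intercept` is a hypothetical name, and each regression is assumed to include an intercept.

```python
import numpy as np

y = np.array([0.0, 0.0, 0.0, 2.0])
x = np.array([0.0, 1.0, 0.0, 2.0])
t = np.array([1, 1, 2, 2])

def slope_with_intercept(x, y):
    """OLS slope from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Fama-Macbeth: slope in each period, then an equal-weighted average
b_t = [slope_with_intercept(x[t == p], y[t == p]) for p in (1, 2)]
b_fm = float(np.mean(b_t))

# Single pooled regression over all four observations
b_pooled = float(slope_with_intercept(x, y))

print(f"Fama-Macbeth: {b_fm:.3f}, pooled OLS: {b_pooled:.3f}")
```

The pooled estimate sits closer to the period-2 slope because period 2 has the larger spread in $x$ and therefore carries more weight in a single regression.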

References:

Fama, E.F., 1998. Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics, 49(3), pp.283-306.

Matthew Gunn
  • In your setup, error terms are cross-sectionally correlated and uncorrelated over time. Won't this mean that the Fama-Macbeth first-stage cross-sectional OLS is biased? – JOHN Feb 09 '18 at 21:07
  • @JOHN Thanks. If there's cross-sectional correlation in error terms, your usual standard errors are going to be inconsistent unless you correct for it. I may have had a wrong sentence in there, and I've changed it. – Matthew Gunn Feb 09 '18 at 21:24
  • I was trying to wrap my head around this. I am reading Petersen 2009, where he states that

    for example. Since the Fama-MacBeth procedure is designed to address a time effect, the Fama-MacBeth standard errors are unbiased

    but in your example, you use Fama-MacBeth to solve an issue where error terms are cross-sectionally correlated and uncorrelated over time. So, I got confused.

    – JOHN Feb 09 '18 at 21:57
  • @JOHN You can swap $i$ and $t$ and use Fama-Macbeth when error terms are cross-sectionally independent but correlated over time. You see people doing both. Clustering standard errors is arguably more modern and more general. You still see Fama-Macbeth all over finance though. – Matthew Gunn Oct 04 '18 at 13:44
  • Do you need to center X, i.e. make it mean zero, for the panel regression to be equal to Fama-Macbeth? – freddy888 Jul 06 '20 at 17:27
  • @freddy888 Fama Macbeth solves period by period $\hat{b}_t = (X_t'X_t)^{-1} X_t' y_t$ then $\hat{b} = \frac{1}{T} \sum_t \hat{b}_t$. If $X_t = X$ for all $t$ then $\hat{b} = \frac{1}{T} \sum_t (X'X)^{-1} X' y_t = (X'X)^{-1}X' \left( \frac{1}{T} \sum_t y_t \right)$ – Matthew Gunn Jul 06 '20 at 22:01
  • @freddy888 In the special case where your right hand side variables are the same for every time period ($X_t = X $ for all $t$) then the cross-sectional regression of time-series average $\bar{y} = \frac{1}{T} \sum_t y_t$ on $X$ produces the same estimate as Fama-Macbeth procedure. This argument doesn't depend on centering etc.... The original reason Fama and Macbeth used the procedure was to get the standard errors and t-statistics correct. – Matthew Gunn Jul 06 '20 at 22:04
  • @freddy888 In above $y_t$ is a vector and $X_t$ is a matrix. – Matthew Gunn Jul 06 '20 at 22:08
  • Yes, I see. Maybe I misphrased the question. Fama-Macbeth is used to test for the significance of alpha when returns are regressed on some factors. In the case of non-traded factors, such as fund size and fund flows, I am worried that the equation $y_{it} = a + \beta X_{it}$ won't give a meaningful alpha because X is not centered, e.g. if X is very high on average, alpha will become negative. Hence my intuition to center. – freddy888 Jul 07 '20 at 10:59
  • @freddy888 The simplest answer I can say is go ahead and center and standardize (or similar) your $X$. That's common, but the more precise answer is the FM intercept term isn't alpha under the academic finance definition of alpha! The FM intercept isn't alpha. Alpha is the intercept in a time-series regression on returns; you can show that alpha from a time-series regression on tradeable factors becomes the residual in the cross-sectional relationship examined by Fama-Macbeth. (See this answer: https://quant.stackexchange.com/questions/34091/calculating-alpha-and-its-meaning/34092#34092) – Matthew Gunn Jul 07 '20 at 14:06
  • @freddy888 The cross-sectional regression of Fama-Macbeth procedure is trying to estimate the linear relationship between some $x_{it}$ and expected returns (with consistent standard errors in the presence of cross-sectional correlation). In $R_{it} = \gamma_0 + \gamma_1 x_{it} + \epsilon_{it}$, it's just trying to estimate and do hypothesis testing on $\gamma_1$. – Matthew Gunn Jul 07 '20 at 14:13
  • In academic finance $\alpha_i$ is an error term: the difference between observed average returns and the expected returns implied by a classic, economic asset pricing model where expected returns are a function of covariance with some variable of hedging concern to investors. – Matthew Gunn Jul 07 '20 at 14:20
  • @freddy888 For each test asset $i$, run time-series regression $R_{it} - R^f_t = \alpha_i + \beta_i F_t + \epsilon_{it}$ where $F_t$ is some tradeable factor (eg. HML or RMRF) to get $\beta$s. Let's say we're interested in cross-sectional relationship between expected returns and $\beta$s: $E[R_{it}-R^f_t] = \gamma_0 + \gamma_1 \beta_i + u_i$. We could do Fama Macbeth OR if $F_t$ is tradeable, we can take expectations of both sides of time-series reg to obtain: $E[R_{it}-R^f_t] = E[F_t] \beta_i + \alpha_i$. Cochrane calls this the cross-sectional implications of the time-series regression. – Matthew Gunn Jul 07 '20 at 14:28
  • If a factor is tradeable, the time-series regression implies a cross-sectional relationship where $\gamma_0 = 0$ and $\gamma_1 = E[F_t]$ and $u_i = \alpha_i$. If $F_t$ is not a return, you can't do that. – Matthew Gunn Jul 07 '20 at 14:31