There is a lot of talk about regression diagnostics in tutorials on the web, but in economics research papers nobody actually reports residual plots, collinearity checks, etc. Is there any reason for this?
-
That's a good question. I assume many do, but fail to show the details of their diagnostics in their papers. This also boils down to the fact that many economics papers are not reproducible. There have been several discussions on this topic in economics and in other fields of science as well. – Mike J Jan 26 '21 at 14:19
2 Answers
To start with the question in the title:
Why don't economists do regression diagnostics?
People do regression diagnostics. I don't know of any respectable researcher who would not perform them, and virtually any paper will contain hints that regression diagnostics were performed. For example, in tables with regression results you will find notes that White or HAC standard errors were used to correct for heteroskedasticity, autocorrelation, or both, or references to corrections for cross-sectional dependence, etc.
Only very unscrupulous scholars would make claims about whether these issues are present without some testing. So scientists (or at least the good ones) always perform regression diagnostics.
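As a purely illustrative aside (mine, not part of the original answer; the data and variable names below are made up), this is roughly what those table notes correspond to in code, sketched here with Python's statsmodels: the same regression reported with classical, White (HC1), and Newey-West (HAC) standard errors.

```python
# Hypothetical example: fit one regression, then report classical, White (HC1),
# and Newey-West (HAC) standard errors side by side.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
e = rng.normal(size=n) * (1 + np.abs(x[:, 0]))   # heteroskedastic error term
y = 1.0 + x @ np.array([0.5, -0.3]) + e

X = sm.add_constant(x)
model = sm.OLS(y, X)

classical = model.fit()                                   # assumes spherical errors
white = model.fit(cov_type="HC1")                         # heteroskedasticity-robust
hac = model.fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # robust to autocorrelation too

print("classical SEs:", classical.bse.round(3))
print("White SEs:    ", white.bse.round(3))
print("HAC SEs:      ", hac.bse.round(3))
```

In a published table, all of this typically collapses into a single note such as "robust (HAC) standard errors in parentheses".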
To address the question in the body:
in economics research papers nobody actually reports residual plots, collinearity checks etc. Is there any reason for this?
Yes. Almost all scientific journals have a strict page limit, typically between 30 and 60 pages, with most journals around 40 pages. In addition, shorter articles are often preferred and attract a wider readership, because people usually prefer to read shorter papers. Also note that page limits are usually inclusive of the list of references, which can easily eat up another 1-5 pages, and of everything else in the paper; only online appendices are excluded from the page limit.
Now, documenting regression diagnostics can easily eat up 10 pages or more if you want to do it properly with all the plots. Moreover, regression diagnostics are not of much interest in themselves. You perform them in order to know how to properly specify your model or what identification strategy to use. Once you figure that out, you just use the appropriate model, so by themselves the diagnostics have little value for the reader, as they carry very little information about the research result. As mentioned in the first part of the answer, people will still note in their paper that there was autocorrelation or heteroskedasticity and how they corrected for it (and so on for other problems), so there is not much point in additionally spending space in the paper on showcasing all the auxiliary diagnostics. Any mistrustful researcher can simply request the data and rerun the diagnostics themselves.
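For concreteness, here is a rough sketch (my own illustration on simulated data, not from any particular paper) of the kind of auxiliary battery that usually stays in the replication files rather than in the text:

```python
# Hypothetical diagnostic battery: heteroskedasticity, residual normality,
# autocorrelation, and collinearity checks run after fitting the main model.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import jarque_bera, durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x = rng.normal(size=(400, 3))
y = 0.5 + x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=400)
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

bp_lm, bp_p, _, _ = het_breuschpagan(res.resid, X)        # Breusch-Pagan test
jb, jb_p, _, _ = jarque_bera(res.resid)                   # Jarque-Bera normality test
dw = durbin_watson(res.resid)                             # first-order autocorrelation
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]  # collinearity

print(f"BP p={bp_p:.3f}  JB p={jb_p:.3f}  DW={dw:.2f}  max VIF={max(vifs):.1f}")
```

Even this minimal battery would take a table or several pages of plots to document properly; in the paper it is usually compressed into a sentence about which problems were found and how they were corrected.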
Consequently, the reason is simply that there is not enough space for it in the paper, and that you always have to economize on the space given. If you really wanted to do it, it would usually end up eating a third of that precious space. You would be surprised how common a problem it is for researchers to fit their research within the page limit; often you will be forced to relegate even the main derivations to online appendices just to stay within it. In the end, anything that is not of great importance for supporting or interpreting the main result will simply not make the cut.
As mentioned in the comments, this can sometimes cause issues with reproducibility, but nowadays the solution is that journals require scholars to post the code used to derive the results (where you would also find the regression diagnostics) rather than report it all in the paper, for the reasons mentioned above.
-
Can you point to a paper where the code actually runs a couple of diagnostics on the main results (and I don't mean running different specifications as robustness checks)? – Papayapap Jan 26 '21 at 15:50
-
@Papayapap Yes, Kaplan et al. (2020), "Early Voting Laws, Voter Turnout, and Partisan Vote Composition: Evidence from Ohio," American Economic Journal: Applied Economics. The code that is online with the paper also includes code for placebo tests, which are the equivalent of regression diagnostics for the particular model they were using. I also always include it with the code I send to journals, but I don't want to dox myself. Not everyone does that; many people still send code just to replicate the basic results, but have a look over a couple of random recent code packages and you will see it being done. – 1muflon1 Jan 26 '21 at 16:25
-
Also, in Burde and Linden (2013), "Bringing Education to Afghan Girls: A Randomized Controlled Trial of Village-Based Schools," they actually use regression analysis, and in their do-files you can see residual diagnostics and even unadjusted results from auxiliary regressions before the main published results. – 1muflon1 Jan 26 '21 at 16:31
-
Thanks! I would not count placebo tests as diagnostics but as robustness checks. The residual and outlier checks in Burde and Linden definitely count, but even for them there would still be checks for linearity, misspecification, and multicollinearity to be done if one followed a checklist-like approach as in the tutorials: https://stats.idre.ucla.edu/stata/webbooks/reg/chapter2/stata-webbooksregressionwith-statachapter-2-regression-diagnostics/ It doesn't seem to me that in practice researchers follow such a strict diagnostic procedure, and I wanted to understand why, not criticize it. – Papayapap Jan 26 '21 at 16:58
-
@Papayapap I don't think those placebo tests count as robustness checks. They test the validity of the result (they are falsification tests), not how robust the estimates are. Also, it is often not necessary to run specific tests for multicollinearity, as it shows up in the regression results; if you don't see unusually large variances you can skip that. Next, in this case they were using a randomized controlled trial with a large number of controls, so OVB is not a big concern there. Finally, non-linearity would show up in their residual plots; as with multicollinearity, it can be tested further if – 1muflon1 Jan 26 '21 at 17:13
-
there are any signs of it. In addition, it is quite likely that they did more than they show. You usually edit and clean these files before sending them to the journal rather than sending raw code. Some people like to leave only what is necessary, because they have the mentality that they don't want others to benefit from code they put hard work into, and sometimes while cleaning the code you drop some extra robustness checks or tests to make it shorter and more readable. That's also why you see it split into different files instead of one big one. – 1muflon1 Jan 26 '21 at 17:17
-
I don't believe it is a matter of saving space. It takes only a few lines, at most, to report those statistics, or a few notes below the tables would be enough (e.g., Jarque-Bera test, max influence function, etc.). Also, if those diagnostics were important, papers would use more space for them. Econometric papers don't report them simply because those statistics are not considered important, and the OP's question makes sense. – chan1142 Feb 25 '21 at 01:22
-
@chan1142 It is simply not true that these are not considered important in the econometric literature; there are literally thousands of papers devoted to these topics, which you can easily verify by typing "regression diagnostics" into Google Scholar and seeing the rich econometric literature on the subject. Also, outside econometric research, in applied work you can see some tests mentioned under the line or in a footnote, or just a note that the authors found this or that issue and corrected it in this or that way. Additionally, not all papers just blindly use HAC errors, – 1muflon1 Feb 25 '21 at 01:44
-
which implies some testing had to be done, whether it's mentioned explicitly or not. – 1muflon1 Feb 25 '21 at 01:45
-
@1muflon1 The saving-pages argument doesn't make sense. Papers contain important things; if something is omitted, that's simply because it is not regarded as important. Yes, there are many papers about those diagnostics, I know that. But how many empirical papers report them in applications? You have presumably read Wooldridge's textbook, or Stock and Watson's. If those things were important, why wouldn't they have emphasized them and included them under every regression table? Also, I know the problems of HAC; please see my comments on my answer. – chan1142 Feb 25 '21 at 02:03
-
@1muflon1 The OP's question was why people don't report those diagnostics (I understand it as referring to empirical papers). You said that's to save space. I'm saying they are not reported because they are not regarded as important, and thus the OP's question makes sense. – chan1142 Feb 25 '21 at 02:08
-
@chan1142 Funny that you mention Wooldridge or Stock and Watson, since both textbooks stress the importance of residual diagnostics and devote a significant amount of space to them. Also, the main point of a paper is to communicate the research findings and the way those findings were derived; to understand that, it is not important to report the results of auxiliary tests in the paper itself, since it is enough to mention what sort of adjustment was made. However, that does not mean that the tests were not critical and important in choosing the right adjustment/model specification. – 1muflon1 Feb 25 '21 at 02:09
-
@chan1142 The OP asks different questions in the title and the body. The main question in the title asks why economists don't do these checks, and there the answer is that they do. The checks won't make the cut into the paper because there is more important stuff to put there, but that does not mean the tests were not done or that they are not important for the research. As mentioned in my answer, it's not just the raw page constraint but also generally wanting to economize on space in the paper. – 1muflon1 Feb 25 '21 at 02:12
-
@1muflon1 OK. So you took "do" as the key word, and I took "reports" as the key word. I get it now; that makes sense. Thanks for the clarification. – chan1142 Feb 25 '21 at 02:22
This is a very thoughtful question.
I think it is related to (i) the purpose and (ii) the sample size. Econometrics is very often concerned with causality (rather than prediction or forecasting). For causality, correct model specification, consistency, and valid standard errors are important. Things such as multicollinearity (high correlation among regressors), non-normality, etc., are irrelevant (especially with large data sets).
For example, multicollinearity typically leads to large standard errors but, importantly, no bias. If you drop some variables due to multicollinearity, it just means you fail to control for the variables you initially intended to control for; that is, your estimator is biased. A non-normality check (e.g., a normal Q-Q plot) is not important as long as the sample size is large, due to the central limit theorem. Outliers are data points just like any other; who gave you the authority to omit them at will? By dropping the 'outliers' you are just restricting the population in a fancy way, and you only invite the criticism that your estimator is biased. VIF? If you drop variables due to high VIF, it means you have an inconsistent (biased) estimator.
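To illustrate the multicollinearity point, a toy simulation of my own (not from the answer; all values are made up): with two almost-collinear regressors, OLS stays centered on the true coefficient but with a large spread, while dropping one of them tightens the estimates at the cost of omitted-variable bias.

```python
# Toy Monte Carlo: multicollinearity inflates the variance of OLS estimates
# but does not bias them; dropping the collinear control does bias them.
import numpy as np

rng = np.random.default_rng(42)
n, reps = 300, 2000
b1_full, b1_short = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)      # x2 almost collinear with x1
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

    X_full = np.column_stack([np.ones(n), x1, x2])
    X_short = np.column_stack([np.ones(n), x1])      # "drop the collinear variable"

    b1_full.append(np.linalg.lstsq(X_full, y, rcond=None)[0][1])
    b1_short.append(np.linalg.lstsq(X_short, y, rcond=None)[0][1])

print("full model:  mean %.2f, sd %.2f" % (np.mean(b1_full), np.std(b1_full)))
print("short model: mean %.2f, sd %.2f" % (np.mean(b1_short), np.std(b1_short)))
# Expected pattern: full model centered near 1.0 with a large sd,
# short model tightly centered near 1.95 (omitted-variable bias).
```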
Selecting a model based on the data is dangerous. It will be hard to defend a model chosen by lasso if you want to say something about causal effects, unless you experiment with bleeding-edge econometric techniques. Inference is to be done for a given model (specified by your own reasoning), not a model suggested by statistics (i.e., by a computer).
These days we don't even care much about testing for heteroskedasticity, because the sample size is large and we can always do HC inference. Autocorrelation is not an issue either, as it only complicates the standard errors, which you can fix with HAC.
If you are interested in prediction/forecasting, those things might be more useful. But even for that, the said diagnostics are too old-fashioned; people have already moved on to lasso and other machine learning techniques. I think the said diagnostics might survive in (non-econometric) textbooks and tutorials, but will eventually die out in econometric practice. If you have small samples, though, the story is different. It also happens that old things are found useful in completely different contexts; for example, influence functions (IFs) are very useful for computing standard errors.
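As a hedged sketch of the prediction workflow alluded to above (my own illustration on simulated data, not the answerer's), the "modern" alternative is to let a cross-validated penalty choose the predictors rather than walking through a diagnostic checklist:

```python
# Illustrative prediction workflow: lasso with a cross-validated penalty
# instead of a manual specification/diagnostic checklist.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))            # many candidate predictors
beta = np.zeros(50)
beta[:5] = 1.0                             # only a handful actually matter
y = X @ beta + rng.normal(size=1000)

model = LassoCV(cv=5).fit(X, y)            # penalty chosen by cross-validation
print("selected predictors:", int(np.sum(model.coef_ != 0)))
```

For causal questions, of course, the warning above still applies: a model selected this way is hard to defend without specialized techniques.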
-
There are some issues with the answer. "A non-normality check (e.g., a normal Q-Q plot) is not important as long as the sample size is large, due to the central limit theorem" - no, this is a common misconception, but it is not what the central limit theorem says. The central limit theorem is about the asymptotic distribution, and sure, in large samples you will often get an approximately normal distribution, but not always. In fact there are whole classes of models where normality will be violated even in regressions with a high number of observations (e.g. an Engle-Granger (EG) cointegration model). Furthermore, errors will likely be non-normal if you – 1muflon1 Feb 21 '21 at 18:42
-
are trying to fit a linear model when you should use a non-linear one. In fact this is why you often see people taking logs of variables - exactly to prevent non-normality in the errors - and this is all despite the central limit theorem. So that part is simply incorrect, although it is a very common misconception, so I guess you can be forgiven for believing it. In addition, it is not right to assert that heteroskedasticity or autocorrelation are not an issue just because you can apply HAC errors. – 1muflon1 Feb 21 '21 at 18:44
-
For example, HAC errors inflate standard errors more than White heteroskedasticity-consistent errors, so if you just blindly use HAC everywhere you will make wrong inferences in marginal cases. Moreover, those wrong inferences will mainly be type II errors - which is not good career-wise. – 1muflon1 Feb 21 '21 at 18:49
-
(i) Error normality has nothing to do with consistency. (ii) Unless we are very fussy (cases where the Lindeberg conditions are violated), the CLT can be applied and the estimator is asymptotically normal, and thus testing based on t-stats is asymptotically valid. (iii) Logs are a completely different issue; that's about model specification. (iv) Bad performance of HAC inference can be practically important, but in fact there are not many cases where HAC is relevant in cross-sectional analyses. (In panels, cluster.) (v) Time series is totally different; I focused on cross sections (and panel data). – chan1142 Feb 22 '21 at 01:32
-
(vi) Cointegration is a time-series topic. I'm not talking about it, and I don't think the OP is talking about it. – chan1142 Feb 22 '21 at 01:46
-
1. I never asserted that normality of the errors has anything to do with consistency, but it affects inference; the point estimates are useless if you can't fix that. Of course you can do that with bootstrapped standard errors, but again there are pros and cons to that, so you should actually test whether the assumption holds. 2. Yes, exactly, asymptotically; however, econometric studies show that this can still be a problem even in large samples in certain circumstances (I am using the narrow definition of $n \geq 30$ per $k$). Sure, if you have access to big data that would not be an issue – 1muflon1 Feb 22 '21 at 02:01
-
But it is generally agreed by many authors in the literature that you should test for normality at least when $n<200$, and some authors even argue $n<500$ or more. Once you get to datasets with thousands of observations it will not be an issue (assuming the model is well specified), but claiming that it's generally not an issue is not appropriate. 3. It is connected to this issue, since model misspecification shows up in the errors. You can use some normality tests as quick checks for proper specification before doing more serious testing. 4. Sure, that's why I said in marginal cases, but that – 1muflon1 Feb 22 '21 at 02:06
-
still means accepting unnecessary mistakes, and for what? To save 15-30 minutes on heteroskedasticity/autocorrelation testing? That is simply unprofessional in my opinion. 5. You never mention that in your answer, and panels with long $T$ have issues similar to those of pure time series. There are actually even more things to test, such as cross-sectional dependence, which is a serious issue. 6. The OP never states that they are interested only in cross-section or panel data. – 1muflon1 Feb 22 '21 at 02:13