
Suppose we have a standard regression model and want to check whether we have violated the assumptions of the model. Traditionally, we might use a significance test to determine whether (for example) the residuals are normally distributed (where the null model states that the residuals come from a normal distribution).

I think there's consensus that using hypothesis tests to evaluate model assumptions isn't the best idea. These tests are sensitive to sample size (failing to detect major departures when N is small, and detecting even minor discrepancies when N is large). Personally, I think the biggest problem with these tests is that they evaluate whether violated assumptions can be detected, not whether they are problematic.

When I evaluate assumptions, I exclusively use visuals. Unfortunately, visuals are more subjective (and they don't really tell you whether the observed violation is problematic). So, here's my question:

Are there any objective ways to evaluate statistical assumptions that do not rely on significance tests?

And, do these objective means of evaluating assumptions assess whether the assumption violation matters, rather than whether it can be detected?

I'd love to see references to articles that I can study, if you have any. Thanks!

Edit

Some of the comments/answers mention that it depends on what one's research goals are. Yes, I absolutely agree and appreciate you all pointing this out. I suppose what I'm asking for is some literature that says, "If your goal is A, index B is appropriate. If your goal is C, criterion D is appropriate."

dfife
  • Your bolded sentence seems to be missing a word (or perhaps two). One can of course discern the gist of what you intended but specific word choice might change an answer. – Glen_b May 21 '22 at 05:17
  • Good catch. I'll correct that. – dfife May 25 '22 at 13:52
  • I would offer a counterargument to your "visuals are more subjective." First, that requires some justification. If the answer is "people often misinterpret the graphics" or "people often disagree," then we might think to question the qualifications of such people rather than infer the graphics are to blame. Otherwise, appropriate graphics offer far more information about the nature of the data. When correctly interpreted, those graphics can be more useful and more objective than the results of some putatively objective test (that itself relies on yet more assumptions...). – whuber May 25 '22 at 14:15
  • For references, start with Tukey's EDA. Although that book says absolutely nothing about testing assumptions--it doesn't even talk about probability models or related assumptions--it shows how graphics can be deployed in quantitative, insightful ways to assess deeply hidden characteristics of data. Those techniques, in conjunction with modern simulation techniques (to avoid being fooled by apparent patterns), offer more than any battery of assumptions tests could. – whuber May 25 '22 at 14:18
  • @whuber--You're exactly anticipating the direction of the paper I'm writing. We're trying to argue that visuals are far superior to "tests," and offering suggestions on how to be more informed. This question is really to help us frame our solution. – dfife May 25 '22 at 14:21
  • Are you familiar with Diane Cook's "Statistical lineup" technique? It uses a set of graphics as a test. I have employed it in a few of my answers here on CV, most recently at https://stats.stackexchange.com/a/560502/919. – whuber May 26 '22 at 14:12
  • Yes! That too is part of that paper. – dfife May 27 '22 at 14:57

2 Answers


This is a good question, as it acknowledges that the issue is not whether model assumptions are fulfilled (they never are), but rather whether violations of the model assumptions matter (in terms of misleading conclusions).

Unfortunately I tend to answer this question with "no". A problem is this: The statistical problem of interest is well defined within the framework of the assumed model, but if model assumptions are violated, there is no unique objective way to define what it actually means to say that conclusions are misled.

Here is an illustration. Say you are running a test about the mean of a normal distribution, but the underlying distribution is in fact skew. Your sample size may be fairly large and there may be no indication that the second moments "explode", so the Central Limit Theorem may justify treating normal theory as approximately valid (the violation of normality could therefore be seen as unproblematic). However, in most skew distributions the mean, mode, and median are different, whereas in the normal distribution they coincide. So even though the CLT authorises inference about the mean, it isn't clear whether in your specific application you should instead be interested in the median or the mode as a summary of your distribution. Under the normal assumption this doesn't matter, but if the underlying distribution is in fact skew it usually does, and whether normal theory is fine or not depends on this.
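To make this concrete, here is a minimal simulation sketch (assuming an Exp(1) population and n = 100; the specific numbers are only illustrative, not taken from any paper): the t-interval for the mean keeps close to nominal coverage despite the skewness, yet the mean and the median it could be summarising are clearly different quantities.

    # Sketch: for a skewed (exponential) population, the t-interval for the mean
    # still has close to (slightly below) nominal coverage at n = 100, yet the
    # mean and the median are quite different summaries of the distribution.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, reps = 100, 5000
    true_mean, true_median = 1.0, np.log(2)   # Exp(1): mean = 1, median = log 2

    covered = 0
    for _ in range(reps):
        x = rng.exponential(scale=1.0, size=n)
        lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=stats.sem(x))
        covered += (lo <= true_mean <= hi)

    print(f"t-interval coverage for the mean: {covered / reps:.3f}")
    print(f"population mean = {true_mean:.2f}, median = {true_median:.2f}")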

Here is a paper I had a hand in that investigates the quality of formal misspecification tests of assumptions prior to model-based testing, along with related considerations. Overall we're not as negative about this approach as some others, even though we agree that in a good number of situations it does not work well. But sometimes it's fine. Unfortunately, this depends on a number of specifics of the situation.

M. I. Shamsudheen & C. Hennig: Should we test the model assumptions before running a model-based test? https://arxiv.org/abs/1908.02218

Two papers we cite that may be of interest:

Zimmerman DW (2011) A simple and effective decision rule for choosing a significance test to protect against non-normality. British Journal of Mathematical and Statistical Psychology 64:388-409 (Here an objective rule different from a formal assumptions test is proposed to choose between a t-test and a nonparametric test, namely to use the nonparametric test if the two give very different results; a simplified sketch of such a rule follows below.)

Spanos A (2018) Mis-specification testing in retrospect. Journal of Economic Surveys 32:541–577 (good and thoughtful survey paper even though I don't agree with everything)
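As a deliberately simplified sketch of the kind of decision rule Zimmerman discusses (this is only a rough rendering, not his exact procedure; the "very different" margin is arbitrary):

    # Simplified sketch of a Zimmerman-style rule (not his exact procedure):
    # run both the t-test and the Mann-Whitney test; if their p-values differ
    # by more than a pre-specified margin, report the nonparametric result.
    import numpy as np
    from scipy import stats

    def choose_test(x, y, margin=0.05):
        """Return the name and p-value of the test selected by the rule."""
        p_t = stats.ttest_ind(x, y).pvalue
        p_u = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        if abs(p_t - p_u) > margin:       # "very different results"
            return "Mann-Whitney", p_u
        return "t-test", p_t

    rng = np.random.default_rng(1)
    x = rng.lognormal(size=40)            # skewed sample 1
    y = 1.5 * rng.lognormal(size=40)      # skewed sample 2, different scale
    print(choose_test(x, y))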

I believe that developing decision rules for choosing between model-based and alternative procedures, with better performance than standard misspecification tests, is a very promising research area (even though there will always have to be some "subjective" decision making, see above), but as far as I know there isn't much work on this yet.

PS: Regarding the Zimmerman paper, I have seen a similar suggestion for linear regression somewhere, namely to run both a least-squares and a robust regression, and to use the least-squares fit if the two are reasonably in line (one can of course define this "objectively"), and otherwise to use the robust fit. I don't remember exactly where that was (somewhere in the robustness literature), and as far as I recall the rule was defined but little, if anything, was done to compare its quality to alternatives such as always running least squares or always running the robust fit.
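For concreteness, a minimal sketch of how such an agreement rule could look (purely illustrative: the agreement threshold is arbitrary and a Huber regression stands in for the robust fit, not the specific proposal from that literature):

    # Sketch of the least-squares vs. robust "agreement" rule described above
    # (illustrative only; the agreement threshold is arbitrary).
    import numpy as np
    from sklearn.linear_model import HuberRegressor, LinearRegression

    def fit_with_fallback(X, y, tol=0.5):
        """Use least squares unless its fit differs noticeably from a robust fit."""
        ols = LinearRegression().fit(X, y)
        rob = HuberRegressor().fit(X, y)
        # compare fitted values, scaled by the spread of y
        disagreement = np.max(np.abs(ols.predict(X) - rob.predict(X))) / np.std(y)
        return ols if disagreement < tol else rob

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, -2.0]) + rng.normal(size=100)
    y[:5] += 15                                     # a few gross outliers
    print(type(fit_with_fallback(X, y)).__name__)   # which fit was chosen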

  • Since you mentioned Spanos (2018) and misspecification testing, here is a related thread. – Richard Hardy May 20 '22 at 18:07
  • That looks like the exact sort of thing I'm looking for. Thanks! – dfife May 20 '22 at 18:22
  • @RichardHardy Thanks for this, hadn't seen it before. There is a comment of Spanos prompted by a presentation of mine here, with a long reply of mine. – Christian Hennig May 20 '22 at 19:22
  • The general stance of this I like. But -- as a point of detail -- it's easy to find skewed distributions in which mean, median and mode are identical, and they aren't pathological either. Binomial distributions with integer means provide examples. – Nick Cox May 25 '22 at 14:00
  • I don't regard subjectivity as a bugbear. There are judgments at all levels in statistical science. Some that I understand I might regard as good judgment and others bad judgment. I hope that's banal; and what matters is how far what someone else did is transparent so that we have scope to discuss it if we care. (These matters came up in your paper with Andrew Gelman.) – Nick Cox May 25 '22 at 14:05
  • Regarding your PS -- that's exactly what I suggest with my students. I call it a "sensitivity analysis." And, I'm sure there's a way to make it objective (e.g., if x% of your predictions from the regular model are more than .5 standard deviations away from the robust model's predictions, use the robust model). – dfife May 25 '22 at 14:25
  • @NickCox I edited my answer so that the statement about skew distributions is no longer general. Thanks! – Christian Hennig May 25 '22 at 21:28

This is not a complete answer, and it's not exactly "objective", but a useful tool is what is known as posterior predictive checks in a Bayesian context (but probably by other names elsewhere). In this, you simulate data from the distribution implied by your model, and compare it to your actual data, usually by plotting them side-by-side in whatever fashion makes sense for your problem. Mismatches then suggest assumptions that don't hold - e.g. your simulated data is symmetrical where the real data is skewed, or the real data has fatter tails.
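A minimal sketch of the idea (a plug-in version rather than a fully Bayesian one, with heavy-tailed errors standing in for the violated assumption; the histogram comparison is just one possible plot):

    # Sketch of a predictive check for a normal-error linear model (plug-in
    # version): simulate replicate data from the fitted model and compare the
    # replicates to the observed data by eye.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(3)
    n = 200
    x = rng.normal(size=n)
    y = 1 + 2 * x + rng.standard_t(df=3, size=n)   # true errors are heavy-tailed

    # fit the (wrongly) assumed normal-error model
    beta = np.polyfit(x, y, deg=1)
    resid = y - np.polyval(beta, x)
    sigma = resid.std(ddof=2)

    # plot the observed data next to a few replicates simulated from the fit
    fig, axes = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
    axes[0].hist(y, bins=30)
    axes[0].set_title("observed")
    for ax in axes[1:]:
        y_rep = np.polyval(beta, x) + rng.normal(scale=sigma, size=n)
        ax.hist(y_rep, bins=30)
        ax.set_title("simulated")
    plt.tight_layout()
    plt.show()   # the observed histogram has heavier tails than the replicates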

Eoin