1

The description of the tag on this website states that it

Refers to the conditions under which a statistics procedure yields valid estimates and/or inference. E.g., many statistical techniques require the assumption that the data are randomly sampled in some way. Theoretical results about estimators usually require assumptions about the data generating mechanism.

Is there an authoritative source to cite for a similar definition expressed in more formal terms?

Richard Hardy
Kuku
  • 3
    Could you elaborate on what you mean by "similar definition" and "more formal terms"? As it stands, your question reads like it would be answered by consulting any good dictionary. – whuber Jul 06 '23 at 14:14
  • 1
    As the title states, the context is of an assumption in a statistical model. I don't think this corresponds to a dictionary definition. Since it is a statistical concept I would imagine the definition on the tag must come from somewhere (hence, a similar definition), and if it's a mathematical statistics book and it's a fundamental component of valid inference, I would expect it to be defined somewhere in more than just words (hence, more formal terms). – Kuku Jul 06 '23 at 14:25
  • 1
    In what way do you believe this sense of "assumption" is anything other than an application of the usual dictionary meaning to a statistical setting? – whuber Jul 06 '23 at 14:28
  • 1
    In that the definition given in the tag states that the assumption is defined relative to a statistical procedure (namely, estimation and/or inference). Hence, the same condition may or may not be considered an assumption depending on the statistical model in question. Consider two steps in a causal problem: structure learning (where the likelihood of some causal relationships is being estimated) and the estimation of the causal effect (where the causal relationships are assumed true). The researcher says in their article that the causal structure was not assumed, because it was estimated 1/2 – Kuku Jul 06 '23 at 14:43
  • But the statistical model used for the estimation of the causal model assumed that the causal structure was correct. Even if it was 'learned' in a preliminary step. 2/2 – Kuku Jul 06 '23 at 14:44
  • Relatedly, would it be statistically correct to state that a point-mass degenerate Bayesian prior (incapable of being updated) is an assumption? Would there be no further sources to sustain such a claim other than a dictionary? – Kuku Jul 06 '23 at 14:46
  • I still see nothing in any of that which implies statisticians use the word "assumption" in any way other than the usual meaning. – whuber Jul 06 '23 at 14:47
  • Another example is the discussion by John Copas of the paper by Greenland (https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-985X.2004.00349.x), where he states that "We are so used to using conventional statistical methods which convert information about a sample $S$ into a conclusion $C$ about the population that we all too easily forget [...] Such induction is only possible if we make assumptions $A$, i.e. statistical inference is $(S,A) \rightarrow C$ and not $S \rightarrow C$." I would have imagined there would be a treatment of statistical inference that formalizes this mapping. – Kuku Jul 06 '23 at 14:56
  • Fine--but you're not addressing my question. Exactly how do you perceive such uses of "assumption" in statistical literature as departing in any way from the dictionary definition?? – whuber Jul 06 '23 at 14:58
  • In the previous quote, inference is stated to be a mapping from the sample and the assumptions into a conclusion. I interpret this to be equivalent to saying that inference is a mapping from data and a model into a conclusion. If 'assumptions' are a synonym for 'statistical model', this is not clear to me from the dictionary definition. And the large number of questions on this site asking about what constitutes a 'statistical model' tells me that this is not answered by a dictionary definition of 'assumptions'. Hence there is a distinction in use, for which I expected a formal treatment to exist. – Kuku Jul 06 '23 at 15:04
  • 1
    In what way is any statistical model not a set of assumptions? Regardless, if your question is about the meaning of a statistical model, then ask it specifically. Asking for a definition of "assumption" is likely to be fruitless, is all I'm saying. – whuber Jul 06 '23 at 15:36
  • Well if the terms are equivalent with no caveats, the question can be deleted/closed. Thanks. – Kuku Jul 06 '23 at 15:51
  • I'm sorry, but that misrepresents everything. Nobody, least of all myself, supposes that "statistical model" and "assumption" are completely equivalent terms. (Please note that "model" appeared in this thread only with your first comment: it's not even mentioned in the question itself.) That's why I'm requesting that you consider rephrasing your post to clarify what you're really trying to ask about. – whuber Jul 06 '23 at 16:01
  • Assumptions also occur in other fields of science and thought, perhaps check those as well, e.g. assumptions in physics, mathematics, philosophy. – Firebug Jul 07 '23 at 06:56
  • @whuber "Statistical model" appears in the title of the post; it did not appear first in my comment. In your previous comment you ask "in what way is any statistical model not a set of assumptions?", so I recognize the point that they are not explicitly equivalent (as one assumption is not a statistical model). But I read your comment as suggesting that the discussion of what assumptions are and the discussion of what a statistical model is (as in questions like https://stats.stackexchange.com/questions/63074/what-exactly-is-building-a-statistical-model) are equivalent, which would make this Q redundant. – Kuku Jul 07 '23 at 08:45
  • 1
    I can't find anything I wrote in this thread as possibly being read as asserting an equivalence. The first four comments don't even mention models. The fifth might be the one you are referring to, which suggests models are sets of assumptions. But that does not imply equivalence! Many assumptions aren't models or parts of models at all. Other assumptions are so general that they are never even explicitly considered as parts of models. So we are still stuck at my initial query: how do you conceive of "assumption" as having any specialized meaning in a statistical context? – whuber Jul 07 '23 at 13:26
  • @whuber The equivalence is in that the context for the question as stated in the title refers to "assumptions in a statistical model". Thus I am not asking about assumptions which aren't "part of models at all", nor about the use of 'assumptions' in fields outside statistics. But rather about the assumptions invoked by a statistical model. If then, models are sets of assumptions, there are no other "assumptions in a statistical model" other than the "statistical model" itself. Hence the reduction of my question into the other questions about what exactly constitutes statistical models. – Kuku Jul 08 '23 at 22:26

2 Answers

7

Most of the time when people write about 'assumptions' of statistical tests and the like, it is not the test that is making the assumptions. Consider for a moment a wood-chipper. Does it assume that anything fed into it by an operator is wood? Not at all, but it will happily chip it (or at least attempt to chip it). If you assume that anything coming out of the wood-chipper is wood-chips then that is your assumption, not that of the chipper. Statistical models do not make assumptions. It is the 'use' of a statistical test that entails assumptions: the assumptions that the test will yield results that are relevant to the task at hand.

It is fairly common to read that a particular test 'assumes' that the data (or errors) are normally distributed, but the statistical model does not really make assumptions. For example, Student's $t$-test can tell you how frequently random samples from a normal distribution will give a $t$ statistic at least as large as any value you specify. There is no assumption there. The result is correct in the absolute. However, if you use a Student's $t$-test in the analysis of your data in order to form an inference then you are assuming that the test result is relevant to that inference. You can peel apart the layers of that assumption to see that it has components that relate to the distribution of the notional population from which you notionally sampled, the nature of your sampling, stopping rules, et cetera.
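To make the two layers concrete, here is a minimal simulation sketch using only the Python standard library. The function names, the sample size of 10, the seed, and the choice of an exponential population as the "something else" are all illustrative assumptions, not anything taken from a particular analysis. It counts how often $|t|$ reaches 2.262 (the two-sided 5% critical value for 9 degrees of freedom) when samples really do come from a normal distribution, and then repeats the count for samples from a skewed population with the same true mean being tested.

```python
import math
import random

def t_statistic(sample, mu0):
    """One-sample t statistic for the hypothesised mean mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)

def tail_frequency(draw, mu0, threshold, n=10, reps=20000, seed=1):
    """Fraction of simulated samples whose |t| is at least `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) >= threshold:
            hits += 1
    return hits / reps

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
CRITICAL = 2.262

# Layer 1: the sampling-distribution statement, true by construction.
normal_rate = tail_frequency(lambda rng: rng.gauss(0.0, 1.0), 0.0, CRITICAL)

# Layer 2: the same recipe applied to a skewed (exponential, mean 1) population.
# This population is an illustrative assumption; the test runs regardless.
skewed_rate = tail_frequency(lambda rng: rng.expovariate(1.0), 1.0, CRITICAL)

print(f"normal population:      {normal_rate:.3f}")
print(f"exponential population: {skewed_rate:.3f}")
```

The first frequency should land near 0.05, because that is simply what the $t$ distribution says about normal samples. The second is whatever it turns out to be. The procedure does not object to either input, and judging whether the nominal 5% is relevant to your data is the user's assumption, not the test's.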

Changing the perspective from statistical model assumptions to the assumptions implied when forming real-world inferences based on the results of a model can be very helpful. It changes the task from a relatively mechanical one of choosing a test recipe into one that can be more thoughtful and inference-related.

Michael Lew
  • 2
    I have found it helps me, and sometimes others, to think of ideal conditions, not assumptions, for reasons like those stated here. – Nick Cox Jul 07 '23 at 06:44
  • In line with my comment on the original question about Copas suggesting that inference is a process that maps a sample and assumptions into a conclusion, $(S, A) \rightarrow C$: if I understand correctly, your response indicates that a statistic (such as a z-value) is part of the mapping, as it is a function of the sample, call it $g(S)$, but that it is not a function of the assumptions themselves, which are another component of the inference mapping. This is clarifying; I was wondering if you knew of any source that would expand on this reasoning? – Kuku Jul 07 '23 at 08:50
  • 1
    @Kuku using a mathematical expression like $(S, A) \rightarrow C$ might give the appearance as if the assumptions $A$ are some mathematical object with a formal definition, but I doubt that it is the case. – Sextus Empiricus Jul 07 '23 at 12:24
  • 1
    @Kuku Your mapping equation thing does not give me any insights and I cannot say if it is valid or useful to you. However, I have made an attempt to map the processes of inference here: https://link.springer.com/chapter/10.1007/164_2019_286/figures/8 – Michael Lew Jul 07 '23 at 20:40
2

We do not need a formal definition of 'assumption'. The definition of 'assumption' is not relevant for the formal mathematical treatment of a problem. The formal treatment treats the assumptions as given facts and doesn't care whether the facts are assumptions or not.

Take, for example, some computation in a dice problem where the assumption is made that the die is fair. E.g.

'given the fact that we roll a fair d6 die, what is the probability of rolling a six?'

The mathematics that computes the probabilities doesn't care whether the stated problem is assumed to be true or not. We will compute the answer 1/6, and for that computation it doesn't matter whether the given fact was reality or an assumption that might possibly be false.
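As a small illustration of that point, here is a sketch in Python. The function name `prob_of_face` and the loaded die are hypothetical, introduced only for this example. The computation takes whatever face probabilities it is handed as given and has no way of knowing, or caring, whether they describe a real die or an assumed one.

```python
from fractions import Fraction

def prob_of_face(face_probs, face):
    """Probability of `face` under the supplied distribution, taken as given."""
    if sum(face_probs.values()) != 1:
        raise ValueError("face probabilities must sum to 1")
    return face_probs[face]

# The premise "the d6 is fair", written down as exact fractions.
fair_d6 = {face: Fraction(1, 6) for face in range(1, 7)}

# A hypothetical loaded die (an assumption for illustration only):
# six comes up half the time, the other faces share the rest equally.
loaded_d6 = {face: Fraction(1, 10) for face in range(1, 6)}
loaded_d6[6] = Fraction(1, 2)

print(prob_of_face(fair_d6, 6))    # 1/6
print(prob_of_face(loaded_d6, 6))  # 1/2
```

Both answers are equally "correct" given their premises. Which premise matches the die on your table is a question the arithmetic never sees.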

Nick Cox
  • Does the same reasoning apply once we stop dealing with deduction (as in probability) and start dealing with induction (as in statistics and statistical inference), where an empirical element (the 'real world') comes into play? Doesn't the notion of 'validity' in statistics require a concept that links those computations to the real world? – Kuku Jul 07 '23 at 08:53
  • 1
    @Kuku that link is not part of a formal process. Or at least not in such a way that you get something different from the dictionary definition. – Sextus Empiricus Jul 07 '23 at 10:07