
The weak likelihood principle (WLP) has been summarized as: If a sufficient statistic computed on two different samples has the same value on each sample, then the two samples contain the same inferentially useful information. The WLP is usually described as a "widely accepted" or "very reasonable" statement, but not as a theorem. From this I infer that there exists no proof of the WLP.

My question is, why does the absence of a proof for the WLP not prompt skepticism about it? Or, if not skepticism, then at least pressure to find a proof (or a disproof) among statisticians? Or among mathematicians, for that matter? Why is the WLP not the subject of a Millennium Prize (or its statistical equivalent)? Why do we not regard it as the Fermat's Last Theorem of probability and statistics? (Maybe the parallel postulate would be a better analogy....)

As far as answers go, I'd appreciate either/both of two types: theoretical explanations ("We don't need to prove it because..." or "Actually, there is a theorem, see reference...") or historical explanations ("Early statisticians went through a phase when they tried to find a proof but ultimately settled for an axiom..." or "Fisher bullied people into it..."). (My own search turns up no evidence along any of these lines, but I'd welcome examples if they exist.)

Clarification based on input from responders: I'm calling this the WLP but some may prefer to identify it as the sufficiency principle (SP). I'm okay with that, because the SP implies the WLP. Alternatively, you could say the SP is the mathematical statement made by Fisher and proven by the factorization theorem--that the sufficient statistic contains all the parameter information in the sample, and that the sample conditional on a sufficient statistic is independent of the parameter--and that the WLP takes this a step further by insisting there is no non-likelihood information in the sample that's inferentially useful. I'm okay with that, too. Whether it's called the WLP or SP, and whether it involves only the likelihood function or includes the sampling distribution, both are empirical claims about the best possible estimate calculable on a sample in practice, and there seems to be no imperative for proving either.

Edit 2: I think an answer is materializing across both answers and both sets of comments. If someone agrees and wants to write this up, or modify further, I'll give it a checkmark. a) Statistics lacks a formal axiomatic system. b) Instead, statistics relies on these things called "principles," which are like axioms or postulates except they arise by convention and are adopted by consensus (implicit or explicit). c) No one expects or even hopes to turn these principles into theorems, because without a formal system of axioms, it may well be impossible, and in any event it's very hard to know how or where to start. d) Birnbaum's proof of the SLP is the exception that proves the rule, in that he was able to deduce a strong principle from two weaker ones (controversially). e) If someone were to prove or contradict the WLP, it would be another such exception.

virtuolie
  • In short (and consistent with the two answers given so far): Statistical inference is not an axiomatized mathematical theory. 1) There is no agreement about axioms. 2) Practice is not consistent with any axiomatic system so far tried. – kjetil b halvorsen Jan 23 '23 at 16:57
  • @kjetilbhalvorsen: Many proof/disproof methods are effectively axiom adjacent. For example, because this is an empirical claim, a single counterexample would do. Also, and this is admittedly vague speculation, but perhaps one could prove an "enumerative" theorem--i.e., if the WLP holds for all samples, then surely it holds for this one pair of samples, or pairs like this pair, etc. Even that would be progress. Maybe even extending it to something like the computer-assisted proof of the 4 color theorem, but proving it for classes of samples instead of classes of maps. – virtuolie Jan 23 '23 at 17:16
  • @kjetilbhalvorsen: That said, I'm not familiar with any attempts to axiomatize statistics. (Excepting Solomonoff's theory of inductive inference, though that seems more a parallel universe to probabilistic statistics, where sufficiency doesn't emerge naturally. Combinatorics or set theory or graph theory might work for permutation statistics...) Maybe the real question is, why have we not prioritized developing such a system? I'd guess it's because statistics has to be consistent with reality, like physics. – virtuolie Jan 23 '23 at 17:24
  • Of interest, given your Edit 2 reference to Birnbaum's proof, is that within 2 years of the proof's publication he had rejected both the SLP and his own conditionality principle (according to Giere 1977), although it took until 1969 for Birnbaum's rejection to appear in one of his own published articles. He rejected both principles because they were inconsistent with the frequentist confidence concept (see also his 1970 letter in Nature). – Graham Bornholt Jan 24 '23 at 22:05
  • @kjetilbhalvorsen Thanks! I'm familiar. I considered wording my question, Why is it that the SLP, which has a disputed proof, is controversial, but the WLP, which has no proof, isn't? (Didn't want to make people think the question was about the SLP, though.) From what I've learned from you all, the answer seems to be that SLP isn't controversial, only the alleged proof. The SLP as a convention or opinion or rule of thumb is ignorable. A proof implies ignoring it makes you (and your methods) wrong. Also interesting that the WLP is unnecessary for Birnbaum's proof. – virtuolie Jan 26 '23 at 22:43

2 Answers

7

Fermat's Last Theorem is a proposition of Number Theory, so you'd want to prove it from Peano's axioms; the parallel postulate is a proposition of Euclidean geometry, so you'd prove it from Euclid's other four postulates. But the Weak Likelihood Principle (a.k.a. the Sufficiency Principle) isn't a proposition of Probability Theory, so it's not obvious what you'd want to prove it from.

Birnbaum (1962) kicked off the approach of giving a formal account of the relationships between various "principles". He took the concept of evidential meaning as basic & the W.L.P. as axiomatic, & went on to derive the Strong Likelihood Principle from this & another axiom, the Conditionality Principle. His formal statement of the W.L.P. is that for inference about a parameter $\theta$ in an experiment $E$, where $T$ is a sufficient statistic for $\theta$, if $T(x) = T(y)$ for samples $x$ & $y$, then $\operatorname{Ev}(E, x) = \operatorname{Ev}(E, y)$; in which $\operatorname{Ev}(E, x) = \operatorname{Ev}(E, y)$ denotes "evidential equivalence" or your "containing the same inferentially useful information". This is not an empirical claim, or even a mathematical one, but purports to constrain (sensible) inferential procedures: if it's entailed by other foundational principles you hold dear, then all well & good; if not, then you may try to balance it against those or to eschew it altogether.
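
As a concrete handle on that formal statement, here's a minimal sketch in Python (hypothetical Bernoulli data chosen purely for illustration, not from Birnbaum's paper): two samples with the same value of the sufficient statistic $T(x) = \sum_i x_i$ have identical likelihood functions, so any likelihood-based summary of the evidence coincides.

```python
import numpy as np

# Two hypothetical Bernoulli samples of size 5 with the same sufficient
# statistic T = sum = 3, but in different orders.
x = np.array([1, 1, 1, 0, 0])
y = np.array([0, 1, 0, 1, 1])

theta = np.linspace(0.01, 0.99, 99)  # grid of parameter values

def likelihood(sample, theta):
    # L(theta; sample) = theta^T * (1 - theta)^(n - T), with T = sum(sample)
    t, n = sample.sum(), sample.size
    return theta**t * (1 - theta)**(n - t)

assert x.sum() == y.sum()                                       # same T
assert np.allclose(likelihood(x, theta), likelihood(y, theta))  # same likelihood
print("MLE from either sample:", theta[np.argmax(likelihood(x, theta))])  # 0.6
```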

The W.L.P. is part & parcel of Bayesian frameworks (e.g. Savage, 1954): the likelihood is all, from the data, that goes into the calculation of posterior probabilities. (Not necessarily so for the S.L.P.—see Do you have to adhere to the likelihood principle to be a Bayesian?.)
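
To spell that out (a standard textbook derivation, not anything specific to Savage): with prior $p(\theta)$,

$$p(\theta \mid x) = \frac{p(\theta)\, L(\theta; x)}{\int p(\theta')\, L(\theta'; x)\, d\theta'},$$

so if $L(\theta; x) = c\, L(\theta; y)$ for some constant $c > 0$ not depending on $\theta$, the $c$ cancels between numerator and denominator and $p(\theta \mid x) = p(\theta \mid y)$. When $T$ is sufficient, the factorization $L(\theta; x) = g(T(x), \theta)\, h(x)$ makes $T(x) = T(y)$ deliver exactly such proportionality.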

Perhaps more interestingly, purely frequentist desiderata tend to mandate the use of sufficient statistics in estimation & testing—consider the Rao–Blackwell Theorem & the Neyman–Pearson Lemma & their ramifications. On those occasions when a randomized estimator or test does enjoy some kind of optimality, that's more prone to be taken as evincing the need for the W.L.P. than as a counter-example. In complex situations different criteria often clash. For example, a solution to the Behrens–Fisher problem was posted here last year: an exact test with better power properties than several alternatives, whose only flaw is that it violates the W.L.P. (But note that in all such cases, it's a matter of 'padding out' the sufficient statistic with random noise, & it makes no odds whether the noise is real—from the ancillary part of the data—or synthetic—introduced by the statistician—see the first bullet point of @Sextus Empiricus' answer. There's no "non-likelihood information" being exploited.)
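
A minimal simulation sketch of the Rao–Blackwell point (a textbook Bernoulli example; the crude estimator and the numbers are illustrative choices of mine): conditioning an unbiased estimator on the sufficient statistic can only reduce its mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

samples = rng.binomial(1, p, size=(reps, n))

# Crude unbiased estimator of p: just the first observation.
crude = samples[:, 0].astype(float)

# Rao-Blackwellized version: E[X_1 | T] = T/n, the sample mean,
# a function of the sufficient statistic T = sum(X) alone.
rao_blackwell = samples.mean(axis=1)

print("MSE of crude estimator:       ", np.mean((crude - p) ** 2))          # ~ p(1-p)   = 0.21
print("MSE of Rao-Blackwellized one: ", np.mean((rao_blackwell - p) ** 2))  # ~ p(1-p)/n = 0.021
```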

Fiducial inferences may violate the W.L.P. in a different way—in cases where reduction of the data to a sufficient statistic positively discards information held to be pertinent. See Fraser (1963) for discussion & an example. In fact the difficulty isn't unique to fiducial approaches: the nub of the matter is that a premature reduction may conflate events you'd prefer to separate through conditioning (Kalbfleisch, 1975). (This is generally seen as calling for strictures on when to invoke the W.L.P. rather than for its abandonment.)


Birnbaum (1962), "On the Foundations of Statistical Inference", J. Am. Stat. Assoc., 57, 298

Fraser (1963), "On the Sufficiency & Likelihood Principles", J. Am. Stat. Assoc., 58, 303

Kalbfleisch (1975), "Sufficiency & Conditionality", Biometrika, 62, 2

Savage (1954), The Foundations of Statistics

  • In a rigorous system, having no obvious way to prove a fundamental principle is a reason to prioritize proving it. Consider the question, why are we satisfied for certain statements to be definitions or axioms, but insist on proofs for others? I'd argue the answer is that the former--definitions of a line, natural numbers, sampling distributions--are abstractions. But the WLP is an empirical statement, with direct, practical consequences. An empirical counterexample would disprove it. If it's wrong, it means we're leaving information unused. – virtuolie Jan 23 '23 at 16:16
  • But maybe I've misunderstood you. Are you saying that, historically, statisticians have avoided trying to prove the WLP/SP, or failed to recognize the value of doing so, *because* there's no obvious place to start? – virtuolie Jan 23 '23 at 16:19
  • Before anything else, I'm baffled by the notion that the W.L.P. could be empirically disconfirmed. Could you please elaborate? – Scortchi - Reinstate Monica Jan 23 '23 at 17:09
  • Sure. Keep the "weaker" SP, so a sufficient statistic conveys all probabilistic information in the sample. Then define a certain measure of non-probabilistic information that quantifies manifest or exact properties of the data. If one could show that sample-specific randomness in a probability sample (X, Y) is equivalent to, say, combinatorial information or conditional complexity of Y given X, then conditioning on a non-probabilistic measure would improve estimate accuracy and precision, not by increasing parameter information, but by decreasing randomness. – virtuolie Jan 23 '23 at 17:40
  • More succinctly, one need only show that probability information is not the only inferentially useful type of information in a sample. – virtuolie Jan 23 '23 at 17:49
  • Oh, that's not what I understood by empirical! Not sure I entirely follow, but in any case you seem to have come armed with measures of accuracy & precision that you think can trump the W.L.P. - would everyone agree that they do? Something I was trying to get at with the 3rd bullet point in my answer was that if you cleave to particular frequentist notions of the best estimators or tests you might be able to find a discrepancy with the W.L.P. & perhaps consider that akin to a disproof. Randomized tests would be an example - not a very exciting one, but all that comes to mind just now. – Scortchi - Reinstate Monica Jan 23 '23 at 19:17
  • I'm content to use conventional measures of accuracy and precision. Accuracy $=$ statistic $-$ parameter; precision $=$ SD of the sampling distribution (the SE). We could simplify to MSE, but of course everyone's free to choose a different risk function. – virtuolie Jan 23 '23 at 19:52
  • In that case you could improve your estimator by Rao-Blackwellizing it. – Scortchi - Reinstate Monica Jan 23 '23 at 21:59
2
  • There are theorems revolving around the sufficient statistic, for instance the Fisher-Neyman factorisation theorem. Its corollary is that the likelihood function is a sufficient statistic.

    If we have a sufficient statistic then we can replace the data generating process by a (hypothetically) equivalent split process: first, sample the sufficient statistic (the only step where the parameters are relevant); second, sample the rest of the data, which is produced independently based only on the value of the sufficient statistic (see the simulation sketch after this list).

    That view leads to descriptions such as Theorem 6.1 in Lehmann & Casella's Theory of Point Estimation (thanks to Scortchi for mentioning it):

    Theorem 6.1 Let $X$ be distributed according to $P_\theta \in \mathcal{P}$ and let $T$ be sufficient for $\mathcal{P}$. Then, for any estimator $\delta(X)$ of $g(\theta)$, there exists a (possibly randomized) estimator based on $T$ which has the same risk function as $\delta(X)$.

  • The strong likelihood principle states that the likelihood function is the only relevant information in inference, even when it stems from experiments with different distributions.

    There are many cases of frequentist and fiducial inference that are not consistent with this strong likelihood principle, because those methods take into account the probability distribution of the sufficient statistic along with the likelihood function. The probability distribution can differ while the likelihood function is the same.

  • If you weaken the likelihood principle enough, and require the sampling distribution of the sufficient statistic in two samples/experiments to be equal in order for the comparison in the LP to make sense, then it becomes equivalent to a definition of the 'sufficient statistic', and that's not much of a 'theorem'. It is a definition and a trivial matter of relating different definitions.

    "why there's no proof that only the sufficient statistic is inferentially relevant"

    Because a sufficient statistic is by definition a statistic that contains all inferentially relevant information. The statement requires no proof.
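
Here is a sketch of what the split-process view (and Lehmann & Casella's Theorem 6.1) looks like in a simulation, for the Bernoulli case; the estimator delta below is an arbitrary illustrative choice of mine. Given $T = t$, every arrangement of $t$ ones among $n$ places is equally likely whatever $p$ is, so a pseudo-sample regenerated from $T$ alone has the same distribution as the original data, and delta applied to it is a randomized estimator, based on $T$ only, with the same risk function.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 0.3, 100_000

def delta(x):
    # An arbitrary estimator of p that uses "the whole data":
    # a weighted mean privileging early observations.
    w = np.linspace(2.0, 1.0, x.shape[-1])
    return (x * w).sum(axis=-1) / w.sum()

# Stage 1: the original data, and the sufficient statistic T = sum(X)
# (the only quantity whose distribution involves p).
x = rng.binomial(1, p, size=(reps, n))
t = x.sum(axis=1)

# Stage 2: knowing only T, regenerate a pseudo-sample with the original
# sampling distribution; this step never looks at p.
x_new = np.zeros_like(x)
for i, ti in enumerate(t):
    x_new[i, rng.choice(n, size=ti, replace=False)] = 1

print("risk (MSE) of delta(X):    ", np.mean((delta(x) - p) ** 2))
print("risk (MSE) of delta(X_new):", np.mean((delta(x_new) - p) ** 2))
# The two agree up to simulation error: a (randomized) estimator based on
# T alone matches the risk of the estimator that used all the data.
```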

  • I think you're talking about the strong likelihood principle in your first paragraph. If you didn't accept the weak likelihood principle as normative, then purely frequentist considerations would usually (always?) still lead you to base inference on sufficient statistics. – Scortchi - Reinstate Monica Jan 23 '23 at 09:40
  • @Scortchi-ReinstateMonica I am not sure what the weak likelihood principle exactly weakens or how weak an LP we are speaking about. If it ends up making the models equivalent for the sufficient statistic of both data sets (and especially also if it makes the value of the sufficient statistic the same, as in the question) then it is not much about the likelihood principle anymore (which is more about results from different cases that give the same likelihood function). – Sextus Empiricus Jan 23 '23 at 09:57
  • Sorry - I posted the above without refreshing the page & you'd already addressed my main point in an edit. I believe there's no distinction to be made between the Weak Likelihood Principle & the Sufficiency Principle (same model; same observed value of the sufficient statistic: same inference). Whatever you call it, it's as you say "more like a part of a definition of what is and what is not inference". – Scortchi - Reinstate Monica Jan 23 '23 at 10:06
  • I take your point about the difference between parameter information (which involves the distribution) and the likelihood function (which does not). However, as the SP implies the WLP, I'm content to treat them as interchangeable for purposes of the wording of my specific definition in the question. It seems to me that going from "all parameter information" to "all inferentially useful information" is what takes the statement from definition to empirical claim. – virtuolie Jan 23 '23 at 16:35
  • Re: the meaning of "useful," that's the opposite of (purely) philosophical. If accounting for other attributes of the data renders the estimate more precise and at least as accurate as relying only on the parameter information, or vice versa, that is inferentially useful. Granted, you could find yourself in a position of having to make a tradeoff between the two--or, say, having to make certain model assumptions that would sometimes be untenable or undesirable. But it is easy to define unambiguously useful information, no? – virtuolie Jan 23 '23 at 16:42
  • I don't follow your wording. Could you express it mathematically, in order to make it a theorem or a conjecture? If that is not possible, then I wonder what this question is all about and what you mean by 'proof' or 'theorem'. My point in this answer is that the principle is not something like a mathematical proposition and does not relate to a theorem or something that requires proof. – Sextus Empiricus Jan 23 '23 at 16:43
  • "Whereof one cannot speak, thereof one must be silent." – Sextus Empiricus Jan 23 '23 at 16:50
  • See my edit to the question. As is, your answer 1) doesn't explicitly state that relying on reasonable-sounding, consensus principles is an "in-universe" convention. 2) The WLP isn't a definition. It makes a potentially falsifiable conjecture, which reduces to "The only inferentially useful information in a sample is Fisher information," an empirical claim. In other words, one can't, by definition, produce a line that has breadth; but one could, say, produce a measure closer to $\mu$ than $\bar{x}$ is, and with less variance. We've implicitly chosen to adopt a falsifiable conjecture as true. – virtuolie Jan 24 '23 at 22:15
  • "The only inferentially useful information in a sample is Fisher information" @virtuolie what is 'inferentially useful information' in mathematical terms? If you cannot express it in mathematics, then it is not going to be a mathematical theorem. – Sextus Empiricus Jan 26 '23 at 12:23
  • Fair point. "Inferentially useful" might be expressed as improving a statistic's performance on a specified risk function (or functions). The most commonly used is mean squared error, but one could also specify just standard error or just bias by themselves. The WLP doesn't specify, so we won't, either. Now, let $X$ be the sample, $T = T(X)$ a sufficient statistic computed on $X$, $U = U[X \mid T(X)]$ a function of the data conditional on $T(X)$, and $V$ a risk function increasing with risk. Then the mathematical expression is: $V(T) \le V(U)$ for all $V$ and all $U$. – virtuolie Jan 26 '23 at 22:15
  • Technically, WLP directly claims that U is independent of the parameter estimated by T(X). However, a measure need not depend on the parameter for it to improve inference as just defined. If (say) U measures the effect of random sampling error on T, then U would be independent of the parameter, but V(T|U) < V(T). – virtuolie Jan 26 '23 at 22:24
  • @virtuolie if you define 'inferentially useful' as improving performance, then your WLP turns into something like the Rao-Blackwell theorem and the Lehmann-Scheffé theorem, which state (more or less) that you can't beat the performance of the best estimator based on the sufficient statistic. – Sextus Empiricus Jan 26 '23 at 23:02
  • Your example "U measures the effect of random sampling error on T" is a variable that changes the distribution of T. For different values of U, the distribution of T will be different. That is not the WLP, but instead the SLP. E.g. a binomial distribution and a negative binomial distribution can give the same likelihood function, but different inference. The WLP assumes that we do not compare likelihood functions of different distributions. – Sextus Empiricus Jan 26 '23 at 23:08
  • Moreover, even when your loss function is non-convex, so that R.-B. & L.-S. don't apply, you need only know the observed value of the sufficient statistic to be able to simulate a new sample with the same distribution as the original, & hence the same risk function for any decision procedure. An admissible randomized estimator or uniformly most powerful randomized test will violate the W.L.P., but you can't say it makes use of information in the sample beyond that provided by the sufficient statistic. – Scortchi - Reinstate Monica Jan 26 '23 at 23:39
  • @SextusEmpiricus 1. The math of RB and LS states much less. RB just says $\operatorname{MSE}\{E[W(X) \mid T(X)]\} \le \operatorname{MSE}[W(X)]$. LS just says if $W(X)$ is also unbiased it's UMVUE. Neither says $\operatorname{MSE}[T(X)] \le \operatorname{MSE}[U(X)]$ for all possible $U$. For that interpretation, we'd need a proof of the WLP or SP. 2. As I say in my first edit to the question, I'm interested in the statement I begin the question with, whether that's the WLP or the Sufficiency Principle (SP). To say that $T(X)$ has all the sample's inferentially useful info is to implicitly compare it to every other statistic with any different distribution or likelihood function. – virtuolie Jan 27 '23 at 04:20
  • @Scortchi-ReinstateMonica You're begging the question (in the technical sense). The claim "you need only know the observed value of the sufficient statistic to be able to simulate a new sample with the same distribution as the original" is equivalent to saying all the inferentially useful attributes of a sample depend only on the sufficient statistic, which is a restatement of the SP. I.e., "Nothing has a better risk function than the sufficient statistic, because sufficient statistics capture all sample attributes useful for improving the risk function." – virtuolie Jan 27 '23 at 04:34
  • @virtuolie I see your U as making it SLP instead of WLP, because it effectively works as a parameter in the distribution of the sufficient statistic. – Sextus Empiricus Jan 27 '23 at 06:29
  • Actually, U would be a sample-specific quantity, the effect of sampling error on the observed value of the statistic in hand, with no long-run or population-level interpretation. Technically, it would be a non-probabilistic measure, and parameters can only be measured by probabilistic measures (except in the special case of a permutation statistic when the parameter = 0 and combinatorial frequencies and probabilities become equivalent). – virtuolie Jan 27 '23 at 06:51
  • @virtuolie "the effect of sampling error on the observed value of the statistic in hand" what do you actually mean by this, it is effectively a parameter that determines the distribution of the sufficient statistic or not? – Sextus Empiricus Jan 27 '23 at 06:55
  • If we have a sufficient statistic then we can replace the data generating process by a process that produces only the sufficient statistic, with the rest of the data independently produced based only on the value of the sufficient statistic (in this view it is already obvious why the rest of the data can never improve the estimator). Can we make an example with your U variable in this description of the sufficient statistic? In your example it seems like U is a variable that goes into the distribution of the sufficient statistic (an effect in the sampling distribution of the statistic). – Sextus Empiricus Jan 27 '23 at 06:58
  • For a hypothetical perfect U, every possible combination of values in X is associated with a different value of U. U would uniquely identify the direction and distance of X, and so T(X), from the parameter. If U is imperfect, say a discrete quantity for continuous X, then yes X|U technically partitions the sampling distribution into an ordered set of sub-distributions. But U is still random and therefore independent of the sampling distribution of T(X). – virtuolie Jan 27 '23 at 07:20
  • Seriously, though, even disproving this hypothetical has no bearing on whether the statement holds. It's not incumbent on me to provide a counterexample; rather, the principle must exclude all possible counterexamples, whether conceived or not. Also, the first 2 sentences of that last comment are just a conjecture--actually the conjecture at issue. My question literally is why there's no proof that only the sufficient statistic is inferentially relevant. (See comment to @Scortchi-ReinstateMonica re: begging the question.) A claim is not obvious just because its contradiction remains unproven. – virtuolie Jan 27 '23 at 07:34
  • Again, your question is not expressed as a mathematical statement. – Sextus Empiricus Jan 27 '23 at 07:38
  • "why there's no proof that only the sufficient statistic is inferentially relevant" Because a sufficient statistic is by definition a statistic that contains all inferentially relevant information. It is not something to prove. – Sextus Empiricus Jan 27 '23 at 08:05
  • One can prove sufficiency by the factorization theorem. Sufficiency just says that the proportion of the sample's Fisher information contained in a statistic is 1. The SP is distinct: it says there is no attribute of the data, other than Fisher information, that will reduce the risk function when accounted for. I expressed it mathematically, which you said was equivalent to RB or LS; I pointed out that those are much weaker claims about Fisher information only. You refuted my point by restating the SP as fact. I pointed out you were begging the question. You said my statement wasn't mathematical. We've come full circle. – virtuolie Jan 27 '23 at 08:44
  • @virtuolie in the end it comes down to writing down exactly what you mean by the WLP and give good and clear definitions of the different elements in the principle to allow a mathematical interpretation. I believe that a lot of confusion and philosophical discussion around the LP revolves around this. The elements are not clearly defined and this makes it difficult to prove it as a theorem, because it is unclear what the theorem actually means in mathematical terms. If you would describe the LP in some way in mathematical terms, then I guess you end up with some already proven theorem. – Sextus Empiricus Jan 27 '23 at 08:52
  • Your introduction of $U$ is confusing. I still don't know what this does and how to place it into the distribution function of the data where I would be supposed to factorize it out if there's a sufficient statistic. I believe that with this $U$ you turn it into the SLP and are not anymore talking about the WLP. – Sextus Empiricus Jan 27 '23 at 08:54
  • There's a proof in Lehmann & Casella (1998), Theory of Point Estimation, Theorem 6.1. You're begging the question in assuming that risk functions are the only valid way to evaluate estimators. A decision-theoretic framework can underpin both Bayesian & frequentist procedures - but is not beyond challenge (see the paper by Fraser I cited). – Scortchi - Reinstate Monica Jan 27 '23 at 08:58
  • If it's confusing, ignore it. To bastardize Mayo (2014): It must be remembered that the onus is not on someone who questions SP to provide suitable principles of evidence, however desirable it might be to have them. The onus is on others to show they can derive that x∗ and y∗ would have the identical inference implications concerning shared parameter theta. – virtuolie Jan 27 '23 at 09:00
  • @virtuolie I referred to the RB and LS theorems, but Theorem 6.1 in the reference from Scortchi does it much more efficiently. The proof is only two sentences long (because it is a bit trivial). RB and LS relate to slightly different/more complicated cases. – Sextus Empiricus Jan 27 '23 at 09:05
  • If short, can you paste here? I don't own that book. – virtuolie Jan 27 '23 at 09:08
  • Nevermind, found it online. The proof stipulates that the second statistic "depend[s] on the data only through T" (the sufficient statistic). In other words, it assumes the thing you want to use it to prove. – virtuolie Jan 27 '23 at 09:17
  • "The second statistic 'depend[s] on the data only through T'" @virtuolie that is the definition of the sufficient statistic. If it depended on something else, then T would not be sufficient. It is also not what you want to prove. The proof is about the fact that with T you can always construct a statistic that has the same risk function. – Sextus Empiricus Jan 27 '23 at 09:21
  • @virtuolie: No, you want to prove that the second statistic has the same risk function as the first statistic, which depends on the whole data. I found some lecture notes here that cover this (under Section 4). – Scortchi - Reinstate Monica Jan 27 '23 at 09:24
  • The question is not whether you can construct a statistic with the same risk function as T. The question is whether you can construct a measure of an attribute of the data not described by T, i.e., it conveys information lost when reducing the data to T, where information is anything that shrinks the risk function. All these theorems show is that the risk reduction due to T is also achieved by a statistic conditional on T. They don't address whether the function can ever be shrunk even further, i.e., "This is the minimum variance we can achieve via measures on an isolated sample." – virtuolie Jan 27 '23 at 22:19
  • Literally all I want a proof for is the statement in my last comment in quotes. Just because a statistic is called "minimum variance..." or whatever doesn't mean the math proves it has universally minimum variance. That's just a convenient label. What the math really says is that if W is a statistic that reduces risk solely through its Fisher information, W can have no less risk than a statistic conditional on the sufficient statistic. Whether there is any other way to reduce risk, using non-Fisher, non-probabilistic information in the sample, is untouched by statistical theorems (so it seems). – virtuolie Jan 27 '23 at 22:34
  • For example, none of these theorems rule out that some measure of Kolmogorov complexity (non-probabilistic info that hadn't been defined when SP was originally formulated) might measure some sample-specific attribute of the data (therefore not a parameter and so indescribable by Fisher information) which, when conditioned upon by sufficient T, reduces risk below the minimum achievable by Fisher information measures alone. But, the fact that we haven't anticipated the role of such information doesn't mean we can't show whether it exists through a non-constructive proof, or at least look for it. – virtuolie Jan 27 '23 at 22:48