
I am familiar with the notion that $\Pr(A) = \mathbb{E}[1_A(\omega)]$, given suitable measure-theoretic assumptions. I seem to recall a comment on a stats.stackexchange post citing a paper that explains how the p-value can be formally defined as the expectation of an indicator function. Unfortunately, I have since forgotten the user, the post, and indeed the paper. Neither this site's search engine nor Google Scholar gave useful results in the first few pages.

If someone knows of a paper with this description, please post a link to it as an answer.

Galen
  • Since any probability can be so expressed, this is an awfully vague question. https://stats.stackexchange.com/questions/422100 seems to fit the bill insofar as it relates p-values directly to expected indicators. – whuber Mar 22 '22 at 21:38
  • @whuber 100% agree. I will add details if I recall them. Unfortunately there is no better place to ask about a previous stats.SE question than here... Or should this be migrated to meta since it is ostensibly about an existing question somewhere on this site? – Galen Mar 22 '22 at 21:42
  • Chat would be the best place to initiate such a conversation. Meta is unsuitable--this issue isn't really about how the site works or its policies. We do accept questions of this sort that are ineluctably vague ("where did such-and-such a paper appear and who wrote it?," for instance). I have made this one CW because it explicitly invites multiple answers and there might not be any objectively "best" or "most correct" one. – whuber Mar 22 '22 at 21:45
  • @whuber Thank you for clarifying the purpose of Meta. That is all agreeable to me. – Galen Mar 22 '22 at 21:47

2 Answers


Suppose we have a family of distributions $\mathcal{P}=\{P_\theta\colon\ \theta\in\Theta\}$, where $\theta$ is the unknown parameter, possibly a vector, and $\Theta$ is the parameter space. Given each $\theta\in\Theta$, $P_\theta$ is a known distribution function.

For example, we may consider the family of normal distributions with variance one: $$\mathcal{P}=\{N(\theta,1)\colon\ \theta\in\Theta=\mathbb{R}\}.$$

Now suppose we believe a family $\mathcal{P}$ is appropriate for the upcoming data. We will observe i.i.d. random variables $X_1,...,X_n$ whose common distribution belongs to $\mathcal{P}$; that is, there exists an unknown true $\theta^*\in\Theta$ such that $P_{\theta^*}$ generates the data. We wish to test the hypothesis $$H_0\colon \theta^*\in\Theta_0\qquad \text{vs.}\qquad H_1\colon \theta^*\notin\Theta_0,$$ where $\Theta_0$ is a subset of $\Theta$, at significance level $\alpha\in(0,1)$.

Before actually seeing the realized values of $X_1,...,X_n$, we can already construct a decision rule based on a test statistic.

For example, for the normal family above and $H_0\colon \theta^*=0$ with $\alpha=0.05$, the usual decision rule is to "reject $H_0$ if and only if $\sqrt{n}|\bar{X}|>1.96$", where $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$ is the sample average. In this example, the test statistic is $\sqrt{n}|\bar{X}|$. A large value of $\sqrt{n}|\bar{X}|$ supports $H_1$.
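As a minimal sketch of this decision rule (not part of the original answer; the sample size, true mean, and seed are my own illustrative choices):

```python
# A minimal sketch of the decision rule above, assuming X_1, ..., X_n ~ N(theta*, 1);
# the sample size, true mean, and seed are illustrative choices, not from the answer.
import numpy as np

rng = np.random.default_rng(0)

n = 50
theta_star = 0.3                      # hypothetical true mean used to simulate data
x = rng.normal(theta_star, 1.0, size=n)

w = np.sqrt(n) * abs(x.mean())        # test statistic W(X) = sqrt(n) * |X-bar|
reject = w > 1.96                     # reject H0: theta* = 0 iff W exceeds the 0.975 normal quantile

print(f"W = {w:.3f}, reject H0: {reject}")
```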

Let's denote $\mathbf{X}=(X_1,...,X_n)$, which is a random vector. Now replace $\sqrt{n}|\bar{X}|$ by a general test statistic $W(\mathbf{X})$.

Definition 8.3.26 (Statistical Inference by Casella & Berger). A $p$-value $p(\mathbf{X})$ is a test statistic satisfying $0\le p(\mathbf{x})\le 1$ for every sample point $\mathbf{x}$. Small values of $p(\mathbf{X})$ give evidence that $H_1$ is true. A $p$-value is valid if, for every $\theta\in\Theta_0$ and every $0\le \alpha\le 1$, $$P_\theta(p(\mathbf{X})\le \alpha)\le \alpha.$$

Theorem 8.3.27 (Statistical Inference by Casella & Berger). Let $W(\mathbf{X})$ be a test statistic such that large values of $W$ give evidence that $H_1$ is true. For each sample point $\mathbf{x}$, define $$p(\mathbf{x})=\sup_{\theta\in\Theta_0}P_\theta(W(\mathbf{X})\ge W(\mathbf{x})).$$ Then $p(\mathbf{X})$ is a valid $p$-value.

When $\Theta_0$ is a single point $\{\theta_0\}$, Theorem 8.3.27 gives $$p(\mathbf{x})=P_{\theta_0}(W(\mathbf{X})\ge W(\mathbf{x})) = E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}})$$ and $p(\mathbf{X})$ is a valid $p$-value. Likewise, in many one-sided testing problems with $\Theta_0=(-\infty,\theta_0]$ or $\Theta_0=[\theta_0,\infty)$, the supremum is attained at the boundary point $\theta_0$, so the $p$-value can still be written as $E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}})$.
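For concreteness, here is a minimal sketch (my own illustration, not from Casella & Berger) that approximates the point-null $p$-value $E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}})$ by Monte Carlo for the normal example above and compares it with the closed form $2(1-\Phi(W(\mathbf{x})))$:

```python
# A minimal sketch (my own illustration, not part of Casella & Berger):
# approximate p(x) = E_{theta_0}[ 1{ W(X) >= W(x) } ] by Monte Carlo
# for the point null H0: theta* = 0 in the normal example, and compare
# with the closed form 2 * (1 - Phi(W(x))).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

n = 50
x_obs = rng.normal(0.3, 1.0, size=n)           # hypothetical observed sample
w_obs = np.sqrt(n) * abs(x_obs.mean())         # W(x), the observed statistic

# Estimate the expectation of the indicator under theta_0 = 0 by simulation.
reps = 100_000
x_null = rng.normal(0.0, 1.0, size=(reps, n))
w_null = np.sqrt(n) * np.abs(x_null.mean(axis=1))
p_mc = np.mean(w_null >= w_obs)                # E_{theta_0}[1{W(X) >= W(x)}]

p_exact = 2 * (1 - norm.cdf(w_obs))            # exact tail probability under N(0, 1)
print(f"Monte Carlo p = {p_mc:.4f}, exact p = {p_exact:.4f}")
```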

Min

This is not the paper I had in mind, but the following does take a resampled average of indicator functions to obtain a p-value:

$$\hat{p}^*(\hat{\tau}) \equiv \frac{1}{B}\sum_{j=1}^{B} I(\hat{\tau}_{j}^{*} > \hat{\tau}),$$

where $\hat{\tau}$ is the observed test statistic and $\hat{\tau}_{j}^{*}$, $j=1,\dots,B$, are its bootstrap replicates.

@article{Davidson2000,
  doi = {10.1080/07474930008800459},
  url = {https://doi.org/10.1080/07474930008800459},
  year = {2000},
  month = jan,
  publisher = {Informa {UK} Limited},
  volume = {19},
  number = {1},
  pages = {55--68},
  author = {Russell Davidson and James G. MacKinnon},
  title = {Bootstrap tests: how many bootstraps?},
  journal = {Econometric Reviews}
}

https://www.tandfonline.com/doi/abs/10.1080/07474930008800459
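As a minimal sketch of this idea (the data-generating setup, the t-type statistic, and the centering-based resampling scheme below are my own illustrative choices, not taken from the paper), a bootstrap p-value can be computed as the average of $B$ indicators:

```python
# A minimal sketch of a bootstrap p-value in the spirit of the formula above:
# average an indicator over B bootstrap test statistics. The data-generating
# setup, the t-type statistic, and the centering-based resampling scheme are
# my own illustrative choices, not taken from Davidson & MacKinnon (2000).
import numpy as np

rng = np.random.default_rng(2)

n, B = 40, 999
x = rng.normal(0.2, 1.0, size=n)               # hypothetical sample


def stat(sample):
    """t-type statistic for H0: mean = 0 (large values count against H0)."""
    return abs(sample.mean()) / (sample.std(ddof=1) / np.sqrt(len(sample)))


tau_hat = stat(x)

# Resample under the null by centering the data at its sample mean.
x_centered = x - x.mean()
tau_star = np.array(
    [stat(rng.choice(x_centered, size=n, replace=True)) for _ in range(B)]
)

p_boot = np.mean(tau_star > tau_hat)           # (1/B) * sum_j I(tau*_j > tau_hat)
print(f"bootstrap p-value = {p_boot:.3f}")
```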

Galen
  • This is not the article I am looking for, so please post an answer if the article you have in mind is distinct from this one. – Galen Mar 22 '22 at 19:35
  • While this paper presents a p-value as the expectation of an indicator function, this answer does not demonstrate that anyone previously linked to it on stats.SE. – Galen Mar 22 '22 at 21:58