Suppose we have a family of distributions $\mathcal{P}=\{P_\theta\colon\ \theta\in\Theta\}$, where $\theta$ is the unknown parameter (possibly a vector) and $\Theta$ is the parameter space. For each $\theta\in\Theta$, $P_\theta$ is a known distribution function.
For example, we may consider the family of normal distributions with variance one: $$\mathcal{P}=\{N(\theta,1)\colon\ \theta\in\Theta=\mathbb{R}\}.$$
Now suppose we believe the family $\mathcal{P}$ is appropriate for the upcoming data. We will observe i.i.d. random variables $X_1,...,X_n$ whose common distribution function lies in $\mathcal{P}$; that is, there exists an unknown true $\theta^*\in\Theta$ such that $P_{\theta^*}$ generates the data. We wish to test the hypothesis $$H_0\colon \theta^*\in\Theta_0\qquad \text{vs.}\qquad H_1\colon \theta^*\notin\Theta_0,$$ where $\Theta_0$ is a subset of $\Theta$, at significance level $\alpha\in(0,1)$.
Before actually seeing the realized values of $X_1,...,X_n$, we can already construct a decision rule based on a test statistic.
For example, for the normal family above and $H_0\colon \theta^*=0$ with $\alpha=0.05$, the usual decision rule is to
"reject $H_0$ if and only if $\sqrt{n}|\bar{X}|>1.96$", where $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$ is the sample average. In this example, the test statistic is $\sqrt{n}|\bar{X}|$. A large value of $\sqrt{n}|\bar{X}|$ supports $H_1$.
Let's denote $\mathbf{X}=(X_1,...,X_n)$, which is a random vector. Now replace $\sqrt{n}|\bar{X}|$ by a general test statistic $W(\mathbf{X})$.
Theorem 8.3.27 (Statistical Inference by Casella & Berger). Let $W(\mathbf{X})$ be a test statistic such that large values of $W$ give evidence that $H_1$ is true. For each sample point $\mathbf{x}$, define $$p(\mathbf{x})=\sup_{\theta\in\Theta_0}P_\theta(W(\mathbf{X})\ge W(\mathbf{x})).$$Then $p(\mathbf{X})$ is a valid $p$-value.
Definition 8.3.26 (Statistical Inference by Casella & Berger). A $p$-value $p(\mathbf{X})$ is a test statistic satisfying $0\le p(\mathbf{x})\le 1$ for every sample point $\mathbf{x}$. Small values of $p(\mathbf{X})$ give evidence that $H_1$ is true. A $p$-value is valid if, for every $\theta\in\Theta_0$ and every $0\le \alpha\le 1$, $$P_\theta(p(\mathbf{X})\le \alpha)\le \alpha.$$
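For the normal example with $\Theta_0=\{0\}$, we have $\sqrt{n}\bar{X}\sim N(0,1)$ under $\theta=0$, so $p(\mathbf{x})=2(1-\Phi(\sqrt{n}|\bar{x}|))$. Since $W(\mathbf{X})$ is continuous here, $p(\mathbf{X})$ is in fact uniform on $(0,1)$ under the null, and the validity condition holds with equality. A quick simulation sketch (assuming numpy and scipy; not part of Casella & Berger) checks this:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reps, alpha = 50, 100_000, 0.05

# Draw many datasets under the null theta* = 0 and compute each p-value.
X = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
W = np.sqrt(n) * np.abs(X.mean(axis=1))  # W(X) = sqrt(n)|Xbar|
p = 2 * (1 - norm.cdf(W))                # p(x) = P_0(W(X) >= W(x))

# Validity: P_0(p(X) <= alpha) <= alpha; here it is approximately equal.
print((p <= alpha).mean())  # ~ 0.05
```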
When $\Theta_0$ is a single point $\{\theta_0\}$, Theorem 8.3.27 gives $$p(\mathbf{x})=P_{\theta_0}(W(\mathbf{X})\ge W(\mathbf{x})) = E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}}),$$ and $p(\mathbf{X})$ is a valid $p$-value. Similarly, in some one-sided testing problems with $\Theta_0=(-\infty,\theta_0]$ or $\Theta_0=[\theta_0,\infty)$, the supremum is attained at the boundary point $\theta_0$, so we can still define the $p$-value as $E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}})$.
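When no closed form is available, the expectation $E_{\theta_0}(1_{\{W(\mathbf{X})\ge W(\mathbf{x})\}})$ suggests a Monte Carlo approximation: draw fresh samples from $P_{\theta_0}$ and average the indicator. Here is a sketch for the normal example (where the exact value $2(1-\Phi(\sqrt{n}|\bar{x}|))$ is available to compare against; `mc_p_value` is a hypothetical helper, not from the text):

```python
import numpy as np
from scipy.stats import norm

def mc_p_value(x_obs, reps=100_000, seed=2):
    """Estimate p(x) = E_{theta0}[ 1{W(X) >= W(x)} ] for H0: theta* = 0
    in the N(theta, 1) family, with W(X) = sqrt(n)|Xbar|."""
    rng = np.random.default_rng(seed)
    n = len(x_obs)
    W_obs = np.sqrt(n) * abs(np.mean(x_obs))
    X = rng.normal(loc=0.0, scale=1.0, size=(reps, n))  # fresh draws from P_{theta0}
    W = np.sqrt(n) * np.abs(X.mean(axis=1))
    return (W >= W_obs).mean()                          # average of the indicator

x_obs = np.random.default_rng(3).normal(loc=0.3, scale=1.0, size=50)
print(mc_p_value(x_obs))                                    # Monte Carlo estimate
print(2 * (1 - norm.cdf(np.sqrt(50) * abs(x_obs.mean()))))  # exact value
```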