My notes introduce the concept of minimal sufficient statistics as follows:
Definition
A sufficient statistic $T(\mathbf{Y})$ is called a minimal sufficient statistic if it is a function of any other sufficient statistic.
Remark
Except in a few very special cases, a minimal sufficient statistic always exists.
Assume the existence of a minimal sufficient statistic and consider partitioning the sample space $\Omega$, where $\mathbf{y}_1, \mathbf{y}_2 \in \Omega$ are assigned to the same equivalence class if and only if the likelihood ratio $L(\theta; \mathbf{y}_1)/L(\theta; \mathbf{y}_2)$ does not depend on $\theta$.
Define a statistic $T(\mathbf{Y})$ in such a way that $T(\mathbf{y}_1) = T(\mathbf{y}_2)$ if $\mathbf{y}_1$ and $\mathbf{y}_2$ belong to the same equivalence class and $T(\mathbf{y}_1) \neq T(\mathbf{y}_2)$ otherwise.
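To make this construction concrete (my own illustration, not from the notes), take an i.i.d. sample $Y_1, \dots, Y_n \sim \text{Bernoulli}(\theta)$. Then $$\dfrac{L(\theta; \mathbf{y}_1)}{L(\theta; \mathbf{y}_2)} = \dfrac{\theta^{\sum_j y_{1j}}(1-\theta)^{n - \sum_j y_{1j}}}{\theta^{\sum_j y_{2j}}(1-\theta)^{n - \sum_j y_{2j}}} = \left(\dfrac{\theta}{1-\theta}\right)^{\sum_j y_{1j} - \sum_j y_{2j}},$$ which is free of $\theta$ if and only if $\sum_j y_{1j} = \sum_j y_{2j}$. So the construction puts $\mathbf{y}_1$ and $\mathbf{y}_2$ in the same class exactly when their sums agree, and yields $T(\mathbf{Y}) = \sum_j Y_j$.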
Theorem 2
The statistic $T(\mathbf{Y})$ defined above is the minimal sufficient statistic for $\theta$.
Proof of Theorem 2 for the discrete case
First we show that $T(\mathbf{Y})$ is sufficient.
$$\begin{align} P_\theta (\mathbf{Y} = \mathbf{y} \mid T(\mathbf{Y}) = t) &= \dfrac{P_\theta(\mathbf{Y} = \mathbf{y}, T(\mathbf{Y}) = t)}{P_\theta(T(\mathbf{Y}) = t)} \\ &= \dfrac{P_\theta(\mathbf{Y} = \mathbf{y})}{\sum_{\mathbf{y}_i : T(\mathbf{y}_i) = t} P_\theta(\mathbf{Y} = \mathbf{y}_i)} \\ &= \dfrac{L(\theta; \mathbf{y})}{\sum_{\mathbf{y}_i : T(\mathbf{y}_i) = t} L(\theta; \mathbf{y}_i)} \\ &= \dfrac{1}{\sum_{\mathbf{y}_i : T(\mathbf{y}_i) = t} \dfrac{L(\theta; \mathbf{y}_i)}{L(\theta; \mathbf{y})}}. \end{align}$$
Since $T(\mathbf{y}) = T(\mathbf{y}_i) = t$, all $\mathbf{y}_i$ and $\mathbf{y}$ belong to the same equivalence class induced by $T(\mathbf{Y})$ and, therefore, the likelihood ratios $L(\theta; \mathbf{y}_i)/L(\theta; \mathbf{y})$ do not depend on $\theta$. Hence the conditional distribution of $\mathbf{Y}$ given $T(\mathbf{Y}) = t$ is free of $\theta$, so $T(\mathbf{Y})$ is sufficient.
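In the Bernoulli illustration above this works out explicitly: with $T(\mathbf{Y}) = \sum_j Y_j$ and $T(\mathbf{y}) = t$, $$P_\theta(\mathbf{Y} = \mathbf{y} \mid T(\mathbf{Y}) = t) = \dfrac{\theta^t (1-\theta)^{n-t}}{\binom{n}{t} \theta^t (1-\theta)^{n-t}} = \dfrac{1}{\binom{n}{t}},$$ which indeed does not depend on $\theta$.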
To prove the minimality of $T(\mathbf{Y})$, consider any other sufficient statistic $S(\mathbf{Y})$ and the corresponding partitioning of $\Omega$. Let $\mathbf{y}_1, \mathbf{y}_2 \in \Omega$ belong to the same equivalence class of that partition. According to the factorisation theorem, Theorem 1, $$\dfrac{L(\theta; \mathbf{y}_1)}{L(\theta; \mathbf{y}_2)} = \dfrac{g(S(\mathbf{y}_1), \theta)h(\mathbf{y}_1)}{g(S(\mathbf{y}_2), \theta)h(\mathbf{y}_2)} = \dfrac{h(\mathbf{y}_1)}{h(\mathbf{y}_2)},$$ which does not depend on $\theta$. By definition, $\mathbf{y}_1, \mathbf{y}_2$ then fall within the same equivalence class induced by $T(\mathbf{Y})$ as well and, therefore, $T(\mathbf{Y})$ is a function of $S(\mathbf{Y})$.
Theorem 1 is as follows:
Theorem 1 [Fisher-Neyman Factorisation Theorem] A statistic $T(\mathbf{Y})$ is sufficient for $\theta$ if, and only if, for all $\theta \in \Theta$, $$L(\theta; \mathbf{y}) = g(T(\mathbf{y}), \theta) \times h(\mathbf{y}),$$ where the function $g(\cdot)$ depends on $\theta$ and the statistic $T(\mathbf{Y})$, while the function $h(\cdot)$ does not depend on $\theta$.
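(To check my reading of Theorem 1 with the same Bernoulli illustration: $L(\theta; \mathbf{y}) = \theta^{\sum_j y_j}(1-\theta)^{n - \sum_j y_j}$, so one admissible factorisation takes $g(T(\mathbf{y}), \theta) = \theta^{T(\mathbf{y})}(1-\theta)^{n - T(\mathbf{y})}$ with $T(\mathbf{y}) = \sum_j y_j$ and $h(\mathbf{y}) \equiv 1$.)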
How does one conclude that $\dfrac{L(\theta; \mathbf{y}_1)}{L(\theta; \mathbf{y}_2)} = \dfrac{g(S(\mathbf{y}_1), \theta)h(\mathbf{y}_1)}{g(S(\mathbf{y}_2), \theta)h(\mathbf{y}_2)} = \dfrac{h(\mathbf{y}_1)}{h(\mathbf{y}_2)}$? Specifically, how do $g(S(\mathbf{y}_1), \theta)$ and $g(S(\mathbf{y}_2), \theta)$ cancel?