Below I quote the parameter identifiability definition from Section 4.6, Statistical Models, by A. C. Davison:
There must be a 1-1 mapping between models and elements of the parameter space, otherwise there may be no unique value of $\theta$ for $\hat{\theta}$ to converge to. A model in which each $\theta$ generates a different distribution is called identifiable.
This is a more heuristic paraphrase of the definition you cited in the question. The first sentence in the above quotation requires that the mapping $\theta \mapsto f_\theta$ from the parameter space $\Theta$ to the model space $\mathscr{P}_\Theta$ is a bijection (as the mapping is surjective by construction, it only remains to verify that it is injective). Therefore, the identifiability definition does not depend on the specific observed data and can be discussed from a purely probabilistic perspective, as other answers have already pointed out.
A few concrete examples will elucidate this concept. Consider the probit model in your question: for parameter $\theta = (\theta_1, \theta_2)' \in \Theta$, the distribution of the response variable $y$ given the explanatory variable $x = (x_1, x_2)'$ is
\begin{align}
f_\theta(y|x) = \Phi(\theta'x)^y(1 - \Phi(\theta'x))^{1 - y}. \tag{1}
\end{align}
Hence $f_{\theta^{(1)}}(y|x) = f_{\theta^{(2)}}(y|x)$ for all $x$ requires, in particular, $f_{\theta^{(1)}}(1|x) = f_{\theta^{(2)}}(1|x)$ for all $x$, i.e., $\Phi(\theta^{(1)\prime}x) = \Phi(\theta^{(2)\prime}x)$ for all $x$. Since $\Phi$ is strictly increasing, this implies $\theta^{(1)\prime}x = \theta^{(2)\prime}x$ for all $x$; taking $x$ to be the standard basis vectors $(1, 0)'$ and $(0, 1)'$ then forces $\theta^{(1)} = \theta^{(2)}$ componentwise. (Here $\theta^{(1)}, \theta^{(2)}$ denote two parameter vectors, not the components $\theta_1, \theta_2$ of a single $\theta$.) This shows that the mapping is injective, hence the probit model $(1)$ is identifiable.
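The basis-vector argument can be checked numerically. The sketch below (my own illustration, not from the reference; the names `p_success` and `theta` are hypothetical) recovers the parameter from the model's success probabilities at the standard basis vectors, using the fact that $\Phi$ is invertible:

```python
import numpy as np
from scipy.stats import norm

theta = np.array([0.7, -1.3])  # an arbitrary "true" parameter vector

def p_success(x, theta):
    """P(y = 1 | x) = Phi(theta'x) under the probit model (1)."""
    return norm.cdf(theta @ x)

# Evaluate at e1 = (1, 0)' and e2 = (0, 1)'. Because Phi is strictly
# increasing, applying its inverse Phi^{-1} = norm.ppf to these two
# probabilities returns theta itself: the density determines theta.
basis = np.eye(2)
recovered = np.array([norm.ppf(p_success(e, theta)) for e in basis])

print(recovered)  # recovers theta, up to floating-point error
```

Since the success probabilities at the basis vectors pin down $\theta$ exactly, no two distinct parameter vectors can generate the same distribution.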
Now consider a model that is non-identifiable. General mixture models are usually non-identifiable under the definition quoted above; one well-known source of this is the label switching problem. The two-component mixture model below (Exercise 4.6.1 from the same reference) is a very simple example:
Data arise from a mixture of two exponential populations, one with probability $\pi$ and parameter $\lambda_1$, and the other with probability $1 - \pi$ and parameter $\lambda_2$. The exponential parameters are both positive real numbers and $\pi$ lies in the range $[0, 1]$, so $\Theta = [0, 1] \times \mathbb{R}_+^2$, and
\begin{align}
f(y; \pi, \lambda_1, \lambda_2) = \pi\lambda_1e^{-\lambda_1y} + (1 - \pi)\lambda_2e^{-\lambda_2y}, \quad y > 0, 0 \leq \pi \leq 1, \lambda_1, \lambda_2 > 0. \tag{2}
\end{align}
There are many ways to show $(2)$ is non-identifiable. One way is to note that whenever $\lambda_1 = \lambda_2$, the model degenerates to a single exponential distribution regardless of the value of $\pi$: for example, $\theta^{(1)} = (0.5, 1, 1) \neq \theta^{(2)} = (0.2, 1, 1)$ give the same density $e^{-y}$. Another way exploits label switching: permuting the component labels, with the mixing weight adjusted accordingly, gives the same density. For example, $\theta^{(1)} = (0.2, 1, 2) \neq \theta^{(2)} = (0.8, 2, 1)$ both yield the density $0.2e^{-y} + 0.8 \times 2e^{-2y}$.
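Both failures of injectivity can be verified directly by evaluating density $(2)$ on a grid. A minimal sketch (the function name `mixture_pdf` is my own, not from the reference):

```python
import numpy as np

def mixture_pdf(y, pi, lam1, lam2):
    """Density (2): pi*lam1*exp(-lam1*y) + (1-pi)*lam2*exp(-lam2*y)."""
    return pi * lam1 * np.exp(-lam1 * y) + (1 - pi) * lam2 * np.exp(-lam2 * y)

y = np.linspace(0.01, 10, 500)  # grid of points in the support y > 0

# Degenerate case: lambda1 == lambda2 makes pi irrelevant, so
# (0.5, 1, 1) and (0.2, 1, 1) induce the same density e^{-y}.
assert np.allclose(mixture_pdf(y, 0.5, 1, 1), mixture_pdf(y, 0.2, 1, 1))

# Label switching: swapping the two components and replacing pi by
# 1 - pi leaves the density unchanged.
assert np.allclose(mixture_pdf(y, 0.2, 1, 2), mixture_pdf(y, 0.8, 2, 1))
```

Either pair of distinct parameter vectors mapping to the same density is enough to break injectivity, hence identifiability.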
For more examples and discussion of this topic, see the referenced section.