Briefly, your models are not the same. One is more confident that the observation belongs to the second class, and it should be rewarded for that confidence if the true label is the second category; conversely, it should be penalized more severely for that overconfidence if the true label is the first category. This is exactly what proper and strictly proper scoring rules do, and they tend to be preferred in statistics over measures like accuracy and $F_1$ score.
Log loss and Brier score are two standard statistics for assessing the probability outputs of machine learning models. Below, $y_i\in\{0,1\}$ are the true observations, $\hat y_i$ are the predicted probabilities, and $N$ is the sample size.
$$
\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[
y_i\log(\hat y_i) + (1 - y_i)\log(1 - \hat y_i)
\right]\\
\text{Brier Score} = \frac{1}{N}\sum_{i=1}^{N}\left(
y_i - \hat y_i
\right)^2
$$
If the true label for that $x_1$ feature vector in the original question is the second category, you will find both of these giving lower (better) values for model $f$. If the true label is the first category, you will find both of these giving lower values for model $g$.
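As a minimal sketch of this comparison (the probabilities $0.9$ for model $f$ and $0.6$ for model $g$ are hypothetical stand-ins for the predictions in the question), both metrics can be computed directly:

```python
import numpy as np

def log_loss(y, p):
    # mean negative log-likelihood of the true labels
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def brier_score(y, p):
    # mean squared error of the predicted probabilities
    return np.mean((y - p) ** 2)

# hypothetical single observation: both models favor the second class,
# but f is more confident (0.9) than g (0.6)
p_f = np.array([0.9])
p_g = np.array([0.6])

# if the true label is the second category, f scores better on both metrics
y = np.array([1])
print(log_loss(y, p_f), log_loss(y, p_g))
print(brier_score(y, p_f), brier_score(y, p_g))

# if the true label is the first category, g scores better on both metrics
y = np.array([0])
print(log_loss(y, p_f), log_loss(y, p_g))
print(brier_score(y, p_f), brier_score(y, p_g))
```

Either way, the two models receive different scores, which an accuracy-style metric with a $0.5$ threshold cannot provide.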
> But given that we choose the majority probability for prediction, they both choose $y=1$.
It is common to do this kind of thresholding, but doing so throws away a lot of information. First, it might be that a threshold of $0.5$ is wildly inappropriate for your task, such as if the consequences of mistaking a $0$ for a $1$ are much worse than the consequences of mistaking a $1$ for a $0$. Second, this removes any kind of "grey zone" where the best decision is not to make a decision and collect more data. Yes, a prediction of $0.51$ will be mapped to a particular categorical prediction, but I would like to know that, even if this is the likely outcome, I am on thin ice.
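To make the first point concrete, here is a small sketch of the standard decision-theoretic result: when the two kinds of mistakes have different costs, the cost-minimizing threshold is not $0.5$. The cost values below are made-up illustration numbers, not anything from the question.

```python
def optimal_threshold(cost_fp, cost_fn):
    # Predicting 1 risks a false positive with expected cost (1 - p) * cost_fp;
    # predicting 0 risks a false negative with expected cost p * cost_fn.
    # Predicting 1 is cheaper when p * cost_fn > (1 - p) * cost_fp,
    # i.e. when p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)

# symmetric costs recover the familiar 0.5 cutoff
print(optimal_threshold(cost_fp=1, cost_fn=1))  # 0.5

# if missing a 1 is nine times worse than a false alarm,
# we should flag anything with probability above 0.1
print(optimal_threshold(cost_fp=1, cost_fn=9))  # 0.1
```

A model that only emits hard $0/1$ labels cannot be re-thresholded like this; a model that emits probabilities can serve any cost structure.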
Frank Harrell of Vanderbilt University has two great blog posts that get into this in more detail:

- Classification vs. Prediction
- Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules