Consider the binary cross-entropy loss $L = -y \log f(x) - (1-y) \log (1-f(x))$ with binary labels $y \in \{0,1\}$, where our model produces predictions $f(x)$ from the features $x$. If we have $n$ observations with the same values $x$ for the features, but not all the same $y$ values, then the average loss for these $n$ samples is given by
$$
\frac{1}{n} \sum_{i=1}^{n} \left[ -y_i \log f(x) - (1 - y_i) \log(1 - f(x)) \right] = -\frac{k}{n} \log f(x) - \frac{n-k}{n} \log(1 - f(x))
$$
where $k = \sum_{i=1}^{n} y_i$ is the number of the $n$ samples with label $y = 1$.
So for the case of a probabilistic label $P(A) = 0.78$ and $P(\lnot A) = 0.22$, the loss is given by
$$
L = -0.78 \log f(x) - 0.22 \log(1 - f(x)),
$$
which matches the average loss for data arising from 78 observations of $A$ and 22 observations of $\lnot A$, where all 100 observations share the same feature vector $x$.
This shows that we can generalize cross-entropy loss with binary labels to a loss on probabilistic labels.
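As a quick numerical sanity check, here is a minimal Python sketch (the prediction value `f_x = 0.6` and the helper `bce` are illustrative, not part of the original argument): it averages the standard hard-label cross-entropy over 78 ones and 22 zeros and compares it to the soft-label loss above.

```python
import numpy as np

def bce(y, f_x):
    """Standard binary cross-entropy for prediction f_x and label y."""
    return -(y * np.log(f_x) + (1 - y) * np.log(1 - f_x))

f_x = 0.6  # illustrative prediction f(x) for the shared feature vector x
y = np.array([1] * 78 + [0] * 22)  # 78 observations of A, 22 observations of not-A

avg_hard = bce(y, f_x).mean()  # average hard-label loss over the 100 observations
soft = -(0.78 * np.log(f_x) + 0.22 * np.log(1 - f_x))  # loss with probabilistic label P(A) = 0.78

print(avg_hard, soft)
assert np.isclose(avg_hard, soft)  # the two quantities agree up to floating point
```

The same identity holds for any value of $f(x)$, since it follows directly from grouping the 78 identical $y = 1$ terms and the 22 identical $y = 0$ terms.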
We can reach a similar conclusion using maximum likelihood estimation: Machine Learning with Aggregated Frequency Data as Training
There is one caveat here: the probabilistic labels are not identical to modeling data in which $n$ trials result in $k$ successes (binomial data). The probabilistic labels suppress the number of trials $n$, whereas the binomial likelihood weights each group of observations by its $n$. The resulting models will differ whenever the $n$s in the data are unequal: with only two distinct feature vectors $x_1$ and $x_2$ and wildly differing sample sizes, it's easy to see by inspection that the likelihoods differ.
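To make the difference concrete, here is a sketch under assumed notation (these symbols do not appear above): suppose group $i$ consists of $n_i$ trials with $k_i$ successes at feature vector $x_i$, and write $p_i = k_i / n_i$. The binomial negative log-likelihood and the soft-label cross-entropy are then
$$
-\ell_{\text{binomial}} = \sum_i n_i \left[ -p_i \log f(x_i) - (1 - p_i) \log(1 - f(x_i)) \right] + \text{const},
\qquad
L_{\text{soft}} = \sum_i \left[ -p_i \log f(x_i) - (1 - p_i) \log(1 - f(x_i)) \right],
$$
so the binomial objective weights group $i$ by $n_i$ while the soft-label objective weights every group equally; the two criteria are proportional only when all the $n_i$ are equal.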
This caveat exposes an important nuance to the question: whether this weighted cross-entropy model is appropriate depends on how the data were collected and how the probabilistic labels were constructed.