Commonly, in statistics, we make a strong distributional assumption about where each data point comes from: we say that $P(y_i|\theta) = f(y_i|\theta)$ for some density $f$. If we also assume that the data are independent, then the joint probability of all our data is the product of the individual densities: $P(\{y_1, \ldots, y_N\}|\theta) = \prod_{i=1}^N f(y_i|\theta)$.
When it comes time to find out what the parameters $\theta$ are, a common approach is maximum likelihood: we simply maximize $P(\{y_1, \ldots, y_N\}|\theta)$, now viewed as a function of $\theta$:
$$
\hat\theta = \underset{\theta}{\textrm{argmax}} \prod_{i=1}^N f(y_i|\theta)
$$
So this is the first part of an answer to your question: if we are set on doing maximum likelihood, the product comes straight from our assumption of independence and basic probability rules.
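To make this concrete, here's a minimal R sketch (with a made-up data vector y and a normal model with known standard deviation, purely for illustration) of maximizing the likelihood numerically; in practice this is usually done on the log scale:

y <- c(4.8, 5.1, 5.3, 4.9, 5.0)   # hypothetical data, just for illustration
# log-likelihood: the log of the product is the sum of the log-densities
log_lik <- function(theta) sum(dnorm(y, mean = theta, sd = 1, log = TRUE))
# maximize numerically over a plausible range; the analytic answer here is mean(y)
optimize(log_lik, interval = c(0, 10), maximum = TRUE)$maximum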
But you ask: is it possible to consider the sum of the probabilities instead? This would involve abandoning the idea of maximum likelihood (but it could probably fall under the umbrella of M-Estimation):
$$
\hat\theta = \underset{\theta}{\textrm{argmax}} \sum_{i=1}^N f(y_i|\theta)
$$
One downside is that it's not as well motivated as the product formulation: the product is the probability of the entire sample, while the sum is just some function of the individual probabilities. Nevertheless, this objective is still monotonically increasing in each individual probability, so it's not a priori a ridiculous thing to try, and I imagine there may be circumstances in which it does reasonable things.
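Continuing the same made-up sketch from above (same y, same normal model with known standard deviation), the sum-based estimator would be computed like this:

# sum of the individual densities rather than their product
sum_crit <- function(theta) sum(dnorm(y, mean = theta, sd = 1))
optimize(sum_crit, interval = c(0, 10), maximum = TRUE)$maximum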
However, in the most common circumstances it's going to have inferior properties to the product formulation. Let's do a simple example. Imagine that we have some data and we want to fit a normal distribution to it with a fixed, known standard deviation, say $\sigma = 0.1$, so the only parameter left to find is the mean $\mu$:
$$
y_i \overset{iid}{\sim} N(\mu, 0.1^2)
$$
We can compute the individual probabilities, and then get the standard (product) likelihood versus the proposed sum functional.
Here's code for that in R:
N <- 10
x <- rnorm(N)       # simulate some data from a standard normal
sigma <- 0.1        # fixed (known) standard deviation
# vector of individual densities f(x_i | mu) for a given mu
indiv_lik <- function(mu) dnorm(x, mu, sigma)
mu_seq <- seq(-3, 3, length.out = 10000)  # grid of candidate means
par(mfrow = c(2, 1))
# top panel: sum of the individual densities; bottom panel: their product (the likelihood)
plot(mu_seq, sapply(mu_seq, function(mu) sum(indiv_lik(mu))), type = 'l')
plot(mu_seq, sapply(mu_seq, function(mu) prod(indiv_lik(mu))), type = 'l')
This is the plot produced; the top plot shows the sum formulation, the bottom the standard product formulation:

Even though summation and product-taking are conceptually similar, the plots look nothing alike! The sum criterion is highly multimodal and far from concave, while the product criterion is unimodal. And this is just for a normal likelihood with known variance!
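As a quick numerical check (reusing x, sigma, mu_seq, and indiv_lik from the code above), we can compare the grid maximizers of the two criteria; the product criterion lands essentially on the sample mean (the analytic MLE when $\sigma$ is known), while the sum criterion typically latches onto a single observation or a tight cluster of observations:

sum_vals  <- sapply(mu_seq, function(mu) sum(indiv_lik(mu)))
prod_vals <- sapply(mu_seq, function(mu) prod(indiv_lik(mu)))
mu_seq[which.max(prod_vals)]  # essentially mean(x)
mu_seq[which.max(sum_vals)]   # often near one data point rather than the mean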
The reason is that the normal likelihood is log-concave, a property that is preserved under product-taking: the log of the product is a sum of concave functions of $\mu$, which is again concave.
However, the property preserved under summation is not log-concavity but plain-ol' concavity, which the normal density does not possess (it is convex in its tails). The same is true of many other popular distributions: they are much friendlier when combined via a product than via a sum.
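One way to see the log-concavity point with the same simulated x (assuming the objects from the earlier code are still around): the log of the product criterion is a sum of terms that are each concave in $\mu$, so the whole curve is concave and has a single peak:

# log of the product criterion = sum of log-densities, each concave in mu
log_prod <- sapply(mu_seq, function(mu) sum(dnorm(x, mu, sigma, log = TRUE)))
plot(mu_seq, log_prod, type = 'l')  # a downward parabola in mu: concave, hence unimodal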
Now, all this being said, could I rule out some circumstance where summing the probabilities leads to better inference than taking their product? I certainly don't know of a proof ruling that out. But in most common situations, it will be inferior.