If I understand you correctly, you have eleven products with a mean rating for each, and you want to pick the "best" product, meaning the one with the highest rating. Still, you worry that the ratings may not be comparable because of the varying sample sizes.
You are correct that the sample size will affect how precise the means are. The standard error of a mean is $\sigma / \sqrt n$, so it shrinks in proportion to $1 / \sqrt n$ as the sample grows. This is how much the sample size alone affects the precision of the mean ratings.
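To make that concrete with the two products mentioned in the comments (437 and 31 ratings), and assuming purely for illustration that both products have the same standard deviation $\sigma$ (which we do not actually know), the ratio of the standard errors is

$$
\frac{SE_{10}}{SE_{1}} = \frac{\sigma / \sqrt{31}}{\sigma / \sqrt{437}} = \sqrt{\frac{437}{31}} \approx 3.75
$$

so under that assumption the mean rating of Product 10 is roughly 3.75 times less precise than the mean rating of Product 1.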
Do you know the standard deviations of the ratings as well? If you knew them, you could calculate confidence intervals for the means. If product $A$ has a higher mean rating than $B$, but $B$'s confidence interval contains the mean rating of $A$, then we cannot conclude that they are statistically different. This wouldn't tell you which product is best, but it will tell you which products aren't necessarily different from each other in terms of ratings.
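Here is a minimal sketch of that comparison in Python, using the two products from the comments. The standard deviation of 1.0 is a made-up placeholder (the real values are not given), `mean_ci` is a hypothetical helper, and the interval uses the normal approximation; with only 31 ratings a $t$-based interval would be slightly wider.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sd / math.sqrt(n)                     # standard error of the mean
    return mean - z * se, mean + z * se


ASSUMED_SD = 1.0  # ASSUMPTION: placeholder, the actual SDs are unknown

ci_1 = mean_ci(2.89, ASSUMED_SD, 437)   # Product 1: 437 ratings, mean 2.89
ci_10 = mean_ci(3.01, ASSUMED_SD, 31)   # Product 10: 31 ratings, mean 3.01

print("Product 1 :", ci_1)
print("Product 10:", ci_10)

# Does Product 10's interval contain Product 1's mean?
print("Product 10's CI contains 2.89:", ci_10[0] <= 2.89 <= ci_10[1])
```

Under that assumed standard deviation, Product 10's interval is several times wider than Product 1's and contains Product 1's mean, so the two would not be distinguishable. With the real standard deviations the conclusion could differ.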
The idea of considering uncertainty in optimization is not new. In fact, in Bayesian optimization, we use acquisition functions for that. One of the criteria used is the upper confidence bound, where we look at $\mu + c\sigma$ (where $c$ is some constant) to pick the most promising value to explore. Such an optimization strategy considers the exploration-exploitation trade-off and picks the value that is "best among uncertain", where the $c$ parameter controls how much weight you give to the uncertainty, or how eager you are to explore versus exploit.
The last example of Bayesian optimization shows an important problem here: what you would consider "best" is subjective. It will depend on how risk-averse you are. For the products with less data, you are simply more uncertain about the ratings, and the question is how much uncertainty you are willing to accept. If you can, gather more data; if you can't, it's your bet.
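A rough sketch of how the choice of $c$ changes the "winner", again using the two products from the comments. This is not a Bayesian optimization routine: as a simplification I use the standard error of the mean as the uncertainty term, the standard deviation of 1.0 is an invented placeholder, and `bound_score` and `best` are hypothetical helpers. A positive $c$ is the optimistic (upper bound) view, and a negative $c$ is the risk-averse (lower bound) view.

```python
import math

ASSUMED_SD = 1.0  # ASSUMPTION: placeholder, the actual SDs are unknown

# name -> (mean rating, number of ratings)
products = {
    "Product 1": (2.89, 437),
    "Product 10": (3.01, 31),
}


def bound_score(mean, n, c, sd=ASSUMED_SD):
    """mean + c * standard error; c > 0 is optimistic, c < 0 is pessimistic."""
    return mean + c * sd / math.sqrt(n)


def best(c):
    """Name of the product with the highest score for a given c."""
    return max(products, key=lambda name: bound_score(*products[name], c))


for c in (2.0, 0.0, -2.0):
    print(f"c = {c:+.1f}: {best(c)}")
```

With these placeholder numbers, an optimistic or neutral $c$ favours Product 10, while a pessimistic $c = -2$ favours Product 1, which is exactly the sense in which the answer depends on your attitude to risk.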
Finally, people sometimes "vote with their feet", so the number of ratings can be informative by itself.
E.g., Product 1 has 437 ratings and an average score of 2.89, while Product 10 has 31 ratings but an average score of 3.01.
Can we really say Product 10 is the better product? Does the difference in the sample size need to be considered?
– eightyfish Feb 05 '21 at 03:58