I would like to have a clear understanding of what exactly the likelihood function in Bayes' theorem is, why it isn't considered a probability, and what the distinction is between the likelihood in discrete and continuous distributions.
Bayes' Theorem:
$$ P(m|D) = \frac{P(D|m)\,P(m)}{P(D)} $$
where $m$ denotes the parameters of the model and $D$ the observed data set.
My understanding of likelihood is that it is simply a function of $m$ with the data set already given, i.e.:
$$L(m|D) = P(D|m)$$
I understand why the likelihood function is not a PDF, since integrating it over $m$ does not necessarily equal one.
What I don't understand is why it is not considered a probability: we still get a probability if we choose parameters for the model. Is the term only meant to mark that we are treating the function as a function of the model parameters (with a known data set)? In other words, is it the fact that we don't know the parameters of the model that makes $P(D|m)$ a likelihood, since it is now a function of those parameters, so that plugging in specific parameter values gives us a quantified probability $P(D|m)$?
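As a toy case of my own (a single Bernoulli observation $x = 1$), this is what I mean:

$$ L(p | x = 1) = P(x = 1 | p) = p, \qquad \int_0^1 p \, dp = \frac{1}{2} \neq 1 $$

Evaluating at a fixed $p$ gives a genuine probability of the data, yet the function of $p$ is not itself a probability distribution over $p$.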
Likelihood in continuous vs. discrete distributions
It seems to me that likelihood is used differently in discrete and continuous distributions.
For example, in the Bernoulli distribution (a discrete distribution) the likelihood function is given by:
$$ L(p | x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} $$
The intuition makes sense, since we are multiplying the probabilities of the observations given the unknown parameter $p$.
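To make this concrete, here is a small numerical sketch (the data and grid are illustrative choices of my own):

```python
import numpy as np

# Toy Bernoulli data: 7 ones and 3 zeros out of n = 10 trials.
x = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])

def bernoulli_likelihood(p, x):
    """L(p | x) = prod_i p^{x_i} * (1 - p)^{1 - x_i}."""
    return np.prod(p ** x * (1 - p) ** (1 - x))

# At a fixed p, the likelihood IS the probability of this exact data set.
print(bernoulli_likelihood(0.7, x))   # P(D | p = 0.7), about 0.00222

# But as a function of p it does not integrate to 1 over [0, 1].
grid = np.linspace(0.0, 1.0, 10001)
L = np.array([bernoulli_likelihood(p, x) for p in grid])
print(L.sum() * (grid[1] - grid[0]))  # ~ 7.58e-4 = 7! * 3! / 11!, not 1
```

So each value of the function is a probability, but the function itself is not a probability distribution over $p$.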
The confusion arises when we consider the likelihood of a continuous Gaussian distribution, which is:
$$ L(\mu, \sigma^2 | x_1, \ldots, x_n) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right) $$
Basically, this plugs the observations into the Gaussian PDF and multiplies the resulting density values together. However:
$$ P(x_i | \mu, \sigma^2) = 0 $$
in a continuous distribution, so why are we not integrating in the likelihood function?
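Here is what I mean, numerically (again a toy sketch with made-up data; I am using scipy.stats.norm for the density):

```python
import numpy as np
from scipy.stats import norm

# Toy continuous data (illustrative).
x = np.array([0.1, -0.2, 0.05])

def gaussian_likelihood(mu, sigma, x):
    """Product of Gaussian density values f(x_i | mu, sigma^2)."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Each factor is a density value, not a probability: for small sigma,
# individual factors (and hence the product) can exceed 1.
print(norm.pdf(0.1, loc=0.1, scale=0.1))  # ~ 3.99, already > 1
print(gaussian_likelihood(0.0, 0.1, x))   # ~ 4.6, also > 1

# An actual probability would require integrating the density over an
# interval, e.g. P(0.09 < X < 0.11) for X ~ N(0.1, 0.1^2):
print(norm.cdf(0.11, loc=0.1, scale=0.1) - norm.cdf(0.09, loc=0.1, scale=0.1))
```

The last line is a genuine probability (about 0.08), while the likelihood factors are density values that can exceed 1, which is exactly what confuses me.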
My presumption is that the lack of integration in the likelihood function doesn't matter, since we usually only use it to find the maximum of $P(m|D)$. But we still have $P(D|m)$, which should be a quantifiable probability.
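If that presumption is right, it should be easy to check numerically. Here is a quick sketch (same toy data as above; eps and the grid are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.1, -0.2, 0.05])  # same toy data as above

def gaussian_likelihood(mu, sigma, x):
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Turning each density value into an approximate interval probability,
# P(x_i - eps/2 < X < x_i + eps/2) ~ f(x_i) * eps, just rescales the
# likelihood by the constant eps**n, so the argmax over mu is unchanged.
eps = 1e-6
mus = np.linspace(-1.0, 1.0, 2001)
L_density = np.array([gaussian_likelihood(m, 0.1, x) for m in mus])
L_interval = L_density * eps ** len(x)

print(mus[np.argmax(L_density)])   # ~ -0.017
print(mus[np.argmax(L_interval)])  # identical argmax
print(x.mean())                    # ~ -0.0167, the sample mean
```

The maximizer is the same either way, which is why I suspect the distinction doesn't matter for finding the maximum; my question is about what $P(D|m)$ itself then means.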
Edit
This is not a duplicate of the question "What is the difference between 'likelihood' and 'probability'?".
The accepted answer to that question touches precisely on my point. It states: "In the continuous case the situation is similar with one important difference. We can no longer talk about the probability that we observed O given θ because in the continuous case P(O|θ)=0."
Why is it that in the continuous case we can no longer talk about the probability, yet we do in the discrete case?