I am trying to understand why likelihoods are not PDFs using links such as "What is the reason that a likelihood function is not a pdf". However, I am only getting more confused.
For instance, the likelihood of $N$ i.i.d. normals can be written as $$ \prod_{n=1}^{N} \mathcal N(x_n\mid\mu,\sigma^2) =\frac{1}{Z^N}\exp\!\Big(-\tfrac{1}{2}\,(X-\mu\mathbf 1)^\top (\sigma^2 I)^{-1}(X-\mu\mathbf 1)\Big), $$ where $X=(x_1,\dots,x_N)^\top$, $\mathbf 1$ is the vector of ones, and $Z=\sqrt{2\pi\sigma^2}$.
This is exactly the PDF of an MVN $\mathcal N(\mu\mathbf 1,\sigma^2 I)$ evaluated at $X$.
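For concreteness, here is a small numerical check of the identity above (my own sketch using NumPy/SciPy; the sample values and variable names are mine):

```python
# Sanity check: the product of N univariate normal densities N(x_n | mu, sigma^2)
# equals the multivariate normal density with mean mu*1 and covariance sigma^2 * I,
# evaluated at the stacked data vector X.
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)
N, mu, sigma = 5, 1.3, 0.7
x = rng.normal(mu, sigma, size=N)                 # i.i.d. sample x_1, ..., x_N

lhs = np.prod(norm.pdf(x, loc=mu, scale=sigma))   # product of univariate PDFs
rhs = multivariate_normal.pdf(x, mean=np.full(N, mu),
                              cov=sigma**2 * np.eye(N))  # MVN PDF with covariance sigma^2 I

print(lhs, rhs)  # the two values agree up to floating-point error
```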
This may also be why maximizing the product of univariate likelihoods or the MVN PDF of the data with covariance matrix $\sigma^2 I$ yields the same set of solutions (in particular for linear regression).
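To spell out the linear-regression case (my notation, using $A$ for the design matrix to avoid clashing with the $X$ above): with $y = A\beta+\varepsilon$, $\varepsilon\sim\mathcal N(0,\sigma^2 I)$, and $a_n^\top$ the $n$-th row of $A$, both ways of writing the log-likelihood reduce to the same expression, $$ \sum_{n=1}^{N}\log\mathcal N(y_n\mid a_n^\top\beta,\sigma^2) \;=\; \log\mathcal N(y\mid A\beta,\sigma^2 I) \;=\; -\frac{1}{2\sigma^2}\lVert y-A\beta\rVert^2-\frac{N}{2}\log(2\pi\sigma^2), $$ so the maximizer in $\beta$ is the least-squares solution either way.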
Please help me clarify this. Is the reason the likelihood is not considered a PDF simply one of context? (The likelihood is meant to be a function of the parameters that is maximized with respect to them.)
Is the equivalence between the likelihood and the MVN PDF above just a coincidence? In fact, I think any likelihood could be made to integrate to $1$ over the parameters with a suitable normalizing constant, unless the integral itself does not converge.
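As a concrete check of that last point in the running example (my own calculation, not from the linked thread): holding $\sigma^2$ fixed and viewing the likelihood as a function of $\mu$ only, $$ \int_{-\infty}^{\infty}\prod_{n=1}^{N}\mathcal N(x_n\mid\mu,\sigma^2)\,d\mu \;=\;(2\pi\sigma^2)^{-N/2}\,e^{-\frac{1}{2\sigma^2}\sum_{n}(x_n-\bar x)^2}\,\sqrt{\frac{2\pi\sigma^2}{N}}\;<\;\infty, $$ so in this case a normalizing constant does exist. I am asking whether this is guaranteed in general.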
