Likelihood is a function of $\theta$, given $x$, while $P$ is a function of $x$, given $\theta$.
Roughly like so (excuse the quick effort in MS paint):

In this sketch we have a single $x$ as our observation. Densities (functions of $x$ at some $\theta$) are in black running left to right and the likelihood functions (functions of $\theta$ at some $x$) are in red, running front to back (or rather back to front, since the $\theta$ axis comes 'forward' and somewhat to the left). The red curves are what you get when you 'slice' across the set of black densities, evaluating each at a given $x$. When we have some observation, it will 'pick out' a single red curve at $x=x_\text{obs}$.
The likelihood function is not a density (or pmf) in $\theta$. It is defined in terms of the density, but it takes its value from a different density at each value of $\theta$. In general it doesn't integrate (or sum) to 1 over $\theta$. It needn't even be normalizable.
Indeed, $\mathcal L$ may be continuous while $P$ is discrete (e.g. likelihood for a binomial parameter) or vice versa (e.g. likelihood for an Erlang distribution with unit rate parameter but unspecified shape).
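To make the "doesn't integrate to 1" point concrete, here is a minimal sketch for the binomial case. The values `n = 10`, `x_obs = 3` and `p = 0.3` are just illustrative choices of mine, and the integral over $p$ is a crude midpoint rule rather than anything clever.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x_obs = 10, 3  # illustrative values only

# Fix p and vary x: this is a pmf, so it sums to 1.
pmf_total = sum(binom_pmf(x, n, 0.3) for x in range(n + 1))

# Fix x and vary p: this is the likelihood.
# Midpoint-rule approximation to its integral over p in (0, 1).
m = 10_000
lik_area = sum(binom_pmf(x_obs, n, (i + 0.5) / m) for i in range(m)) / m

print(pmf_total)  # 1, up to floating-point rounding
print(lik_area)   # close to 1/(n + 1), which is not 1
```

The same formula is used in both sums; only which argument is held fixed changes. The discrete slice (over $x$) is a proper pmf, while the continuous slice (over $p$) has area $1/(n+1)$ here.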
Imagine a bivariate function of a single potential observation $x$ (say a Poisson count) and a single parameter (e.g. $\lambda$) -- in this example discrete in $x$ and continuous in $\lambda$ -- then when you slice that bivariate function of $(x,\lambda)$ one way you get $p_\lambda(x)$ (each slice gives a different pmf) and when you slice it the other way you get $\mathcal L_x(\lambda)$ (each a different continuous likelihood function).
(That bivariate function simply expresses the way $x$ and $\lambda$ are related via your model)
Conversely, with a discrete $\theta$ and a continuous $x$ the likelihood is discrete and the density continuous.
As soon as you specify $x$, you identify a particular $\mathcal L$, which we call the likelihood function of that sample. It tells you about $\theta$ for that sample -- in particular, which values had more or less likelihood of giving that sample.
Likelihood is a function that tells you about the relative chance that a given value of $\theta$ could produce your data, when comparing it to other values of $\theta$ (in that ratios of likelihoods can be thought of as ratios of probabilities of being in the interval from $x$ to $x+dx$).
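Here is a rough numerical check of that "ratio of probabilities" reading, assuming a normal model with unit standard deviation and unknown mean $\theta$. The model, the observation `x_obs = 1.2`, the two candidate means and the width `dx` are all assumptions I've picked for illustration.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(x, theta):
    """Density of N(theta, 1) at x."""
    return exp(-0.5 * (x - theta) ** 2) / sqrt(2 * pi)

def norm_cdf(x, theta):
    """CDF of N(theta, 1) at x."""
    return 0.5 * (1 + erf((x - theta) / sqrt(2)))

x_obs, dx = 1.2, 1e-4          # illustrative observation and a small width
theta_a, theta_b = 1.0, 0.0    # two candidate parameter values

# Ratio of likelihoods at the observed x.
lik_ratio = norm_pdf(x_obs, theta_a) / norm_pdf(x_obs, theta_b)

# Ratio of the probabilities of landing in [x_obs, x_obs + dx].
prob_a = norm_cdf(x_obs + dx, theta_a) - norm_cdf(x_obs, theta_a)
prob_b = norm_cdf(x_obs + dx, theta_b) - norm_cdf(x_obs, theta_b)
prob_ratio = prob_a / prob_b

print(lik_ratio, prob_ratio)  # the two agree more closely as dx shrinks
```

Each probability on its own goes to 0 as `dx` shrinks, but their ratio settles down to the likelihood ratio, which is why only relative likelihoods carry meaning.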