Let's call your historical data $D_0$ and your new data $D_1$. If you have access to the raw $D_0$ data, the principled Bayesian approach is to fit a Bayesian model to it and use the resulting posterior as the prior for the analysis of $D_1$. So if your parameter of interest is $\mu$, you would calculate the posterior as
$$\begin{align}
p(\mu | D_1, D_0) &\propto p(D_1 | \mu) \, p(\mu | D_0) \\
&\propto p(D_1 | \mu) \,p(D_0 | \mu) \,p(\mu) \\
&= p(D_1, D_0 | \mu) \, p(\mu)
\end{align}$$
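As you can see, with Bayesian updating this is the same as using all the data at once, assuming that $D_0$ and $D_1$ are conditionally independent given $\mu$ (the assumption behind factorizing $p(D_1, D_0 | \mu)$ as $p(D_1 | \mu) \, p(D_0 | \mu)$).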
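To make the "same as using all the data at once" point concrete, here is a minimal sketch assuming a conjugate Beta-Binomial model (a swapped-in example, not something specified in the question): $\mu$ is a success probability with a flat Beta(1, 1) prior, and the counts for $D_0$ and $D_1$ are made up for illustration.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + Binomial data -> Beta posterior."""
    return alpha + successes, beta + (trials - successes)

prior = (1, 1)  # flat Beta(1, 1) prior on mu (assumption)

# Hypothetical data: D0 = 12 successes in 40 trials, D1 = 9 successes in 25 trials.
# Sequential: the posterior after D0 becomes the prior for D1.
after_d0 = beta_binomial_update(*prior, successes=12, trials=40)
sequential = beta_binomial_update(*after_d0, successes=9, trials=25)

# Joint: analyse D0 and D1 pooled in a single step.
joint = beta_binomial_update(*prior, successes=12 + 9, trials=40 + 25)

print(sequential, joint)  # (22, 45) (22, 45)
```

Both routes end at the same Beta(22, 45) posterior, which is the algebra above in miniature.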
If you don't have the raw data, only summary statistics, or you don't want to be so strict, you could pick the prior less formally, based on the $D_0$ summary statistics. For example, if your prior is Gaussian, you could take its mean and standard deviation from the $D_0$ summary statistics.
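A minimal sketch of the summary-statistics route, under several assumptions not in the original: the data are Normal with a known standard deviation `sigma`, and the prior for $\mu$ is centred on the $D_0$ sample mean with the standard error $s_0/\sqrt{n_0}$ as its standard deviation (using the raw $D_0$ standard deviation instead would give a much wider, more cautious prior). The helper names and all numbers are hypothetical.

```python
import math

def normal_prior_from_summary(mean0, sd0, n0):
    """Hypothetical helper: Normal prior for mu from D0 summary statistics,
    using the standard error sd0 / sqrt(n0) as the prior standard deviation."""
    return mean0, sd0 / math.sqrt(n0)

def normal_normal_update(prior_mean, prior_sd, data, sigma):
    """Conjugate update for mu: Normal prior, Normal data with known sd sigma."""
    n = len(data)
    prior_prec = 1.0 / prior_sd**2          # precision = 1 / variance
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    sample_mean = sum(data) / n
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical D0 summary: mean 10.0, sd 2.0, n = 16  ->  prior N(10.0, 0.5^2)
m0, s0 = normal_prior_from_summary(10.0, 2.0, 16)

# Hypothetical D1, with sigma assumed known and equal to 2.0
d1 = [11.0, 12.0, 10.5, 11.5]
post_mean, post_sd = normal_normal_update(m0, s0, d1, sigma=2.0)
print(post_mean, post_sd)  # 10.25 and about 0.447
```

With a standard-error prior this reproduces what a flat-prior analysis of $D_0$ would give, so it is close in spirit to the principled route above.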
To do this, I understand I need both a prior distribution and a likelihood function. What are the next steps here to find the posterior with the given data?
If you have the prior $p(\theta)$ and the likelihood $p(X | \theta)$, all you need to do is apply Bayes' theorem
$$
p(\theta | X) = \frac{p(X | \theta) \, p(\theta)}{\int p(X | \theta) \, p(\theta) \, d\theta}
$$
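One way to see every piece of this formula at work is a grid approximation: evaluate likelihood times prior on a grid of $\theta$ values and divide by a numerical approximation of the integral. This sketch assumes a Binomial likelihood (7 successes in 10 trials, made-up numbers) and a Beta(2, 2) prior, chosen because the exact answer, a Beta(9, 5) posterior with mean $9/14$, is known for checking.

```python
import math

def binomial_likelihood(theta, k, n):
    """p(X | theta) for k successes in n Bernoulli trials."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def beta_pdf(theta, a, b):
    """Beta(a, b) prior density p(theta)."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / beta_fn

def grid_posterior(k, n, a, b, points=2001):
    """Approximate p(theta | X) on a grid of midpoints over (0, 1)."""
    step = 1.0 / points
    grid = [(i + 0.5) * step for i in range(points)]  # midpoints avoid 0 and 1
    unnorm = [binomial_likelihood(t, k, n) * beta_pdf(t, a, b) for t in grid]
    evidence = sum(unnorm) * step  # numerical version of the denominator integral
    return grid, [u / evidence for u in unnorm], step

grid, post, step = grid_posterior(k=7, n=10, a=2, b=2)
post_mean = sum(t * p for t, p in zip(grid, post)) * step
print(round(post_mean, 4))  # close to 9/14 = 0.6429
```

Grids work well for one or two parameters but scale badly with dimension, which is one reason the sampling methods mentioned next are popular.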
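In some cases you will have a conjugate prior and a closed-form solution; in others you won't, and you will need numerical methods. These include numerical integration of the denominator, or MCMC sampling, which sidesteps the integral entirely by drawing samples from the unnormalized posterior $p(X | \theta) \, p(\theta)$.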
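A minimal random-walk Metropolis sketch (one of many MCMC algorithms, swapped in here for illustration) shows why the denominator is not needed: the acceptance step only uses a ratio of unnormalized posteriors, so the evidence cancels. The model is assumed to be Binomial with a flat Beta(1, 1) prior and made-up data (7 of 10), so the exact posterior is Beta(8, 4) with mean $2/3$; the step size, burn-in, and seed are arbitrary choices.

```python
import math
import random

def log_unnormalised_posterior(theta, k, n, a, b):
    """log p(X | theta) + log p(theta), up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return (k + a - 1) * math.log(theta) + (n - k + b - 1) * math.log(1.0 - theta)

def metropolis(k, n, a, b, n_samples=20000, step=0.1, seed=42):
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalised_posterior(theta, k, n, a, b)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_unnormalised_posterior(proposal, k, n, a, b)
        # Accept with probability min(1, ratio): the normalising constant cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

samples = metropolis(k=7, n=10, a=1, b=1)
kept = samples[2000:]  # discard burn-in
mc_mean = sum(kept) / len(kept)
print(round(mc_mean, 3))  # should be near 2/3
```

In practice you would use an established sampler (for example Stan or PyMC) rather than hand-rolling one; this is only to show the mechanics.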
Do I have to estimate the distribution of the prior using the population data?
No, you shouldn't do that. If you used $D_1$ to build both the prior and the likelihood, you would be using the same data twice, which leads to an overconfident result. The prior is something you choose before seeing the data. In your case, however, you can pick the prior using $D_0$, as described above.
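The overconfidence from double counting is easy to see in a conjugate Beta-Binomial example (an assumed model, with made-up $D_1$ counts): building the prior from $D_1$ and then feeding $D_1$ through the likelihood again leaves the posterior mean almost unchanged but roughly halves the posterior variance, as if twice as much data had been observed.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

k, n = 9, 25  # hypothetical D1: 9 successes in 25 trials

# Correct: flat Beta(1, 1) prior, D1 used once in the likelihood.
once = (1 + k, 1 + n - k)

# Double counting: prior built from D1, then D1 used again in the likelihood.
prior_from_d1 = (1 + k, 1 + n - k)
twice = (prior_from_d1[0] + k, prior_from_d1[1] + n - k)

print(once, beta_variance(*once))    # (10, 17), about 0.0083
print(twice, beta_variance(*twice))  # (19, 33), about 0.0044
```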
Then, how would I use the new data $D$ to specify a likelihood function?
You don't pick the likelihood based on the data but on your understanding of the problem. For example, if you know that your data are counts of events in a fixed interval, occurring at a constant rate, you could choose the Poisson distribution as a model, because it describes exactly this scenario. This is, of course, a bit idealistic: in practice we do peek at the data and take its characteristics into account when choosing the model (likelihood and prior), as discussed by Gelman, Simpson, and Betancourt (2017).
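For the counting example, a minimal sketch could look like this. The Poisson log-likelihood follows directly from the scenario described; the Gamma(2, 1) prior on the rate (shape/rate parametrization) is an assumption chosen because it is conjugate, and the counts are made up.

```python
import math

def poisson_log_likelihood(rate, counts):
    """log p(counts | rate) for i.i.d. Poisson counts in equal, fixed-length intervals."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)

def gamma_poisson_update(shape, rate_param, counts):
    """Conjugate update: Gamma(shape, rate) prior + Poisson counts -> Gamma posterior."""
    return shape + sum(counts), rate_param + len(counts)

counts = [3, 5, 4, 2, 6]  # hypothetical counts per interval

# The likelihood alone peaks at the sample mean (4.0 here).
print(poisson_log_likelihood(4.0, counts) > poisson_log_likelihood(3.0, counts))  # True

post_shape, post_rate = gamma_poisson_update(2.0, 1.0, counts)
print(post_shape, post_rate, post_shape / post_rate)  # 22.0 6.0 3.666...
```

The posterior mean (about 3.67) sits between the prior mean (2.0) and the sample mean (4.0), weighted by how much information each carries.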