You are correct that selecting cities below a $p$-value threshold, like $p = .01$ in your example, and re-running the analysis on just those cities would give you an optimistic estimate of the treatment effect. The most common name for this effect is "selection bias," though other terms like "the winner's curse" are also used.
To see this quickly, imagine drawing a thousand samples from a normal distribution $X_i \sim N(0,1)$, keeping the 10 most positive observations, and averaging only those ten. This average will clearly be much larger than 0, the true mean.
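A minimal simulation of exactly this thought experiment (the seed is arbitrary) makes the bias concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 1,000 observations from N(0, 1), whose true mean is 0.
x = rng.normal(loc=0.0, scale=1.0, size=1_000)

# Keep only the 10 largest observations and average them.
top10 = np.sort(x)[-10:]

print(f"Mean of full sample:   {x.mean():+.3f}")      # close to 0
print(f"Mean of top 10 values: {top10.mean():+.3f}")  # roughly 3, far above 0
```

The full-sample mean is an unbiased estimate of 0, but the average of the selected tail sits several standard deviations above it, purely because the selection step and the estimation step used the same draws.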
Unbiased Estimation
Assuming you fit a model to the whole sample to select the cities, as you described, you can't get an unbiased estimate without making further assumptions, and the methods will be assumption-specific.
Taking a step back, you can get an unbiased estimate if you change your approach. The simplest way is to split your data into two data sets, use one to select the cities that get treatment, and estimate the treatment effect in the selected cities using the other. The point estimates will be unbiased, and you can construct confidence intervals with the correct coverage without any special methods. While simple, this isn't the most efficient use of all the data in the sample. There are numerous methods that build on this general approach but use the sample more efficiently, for example by repeating the procedure and combining the results in a way that preserves the unbiasedness property.
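Here is a hedged sketch of what the split buys you. Everything is simulated: the per-city effect distribution, the standard error, and the $z > 2.33$ cutoff (roughly one-sided $p < .01$) are all assumptions for illustration, not features of your data. Each city gets two independent noisy estimates, as if its units were randomly split into a "selection" half and an "estimation" half.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cities = 200
# Hypothetical true per-city effects; unknown in practice.
true_effect = rng.normal(loc=0.1, scale=0.2, size=n_cities)
se = 0.15  # assumed standard error of each half-sample estimate

# Two independent estimates per city, one from each half of the data.
est_select  = true_effect + rng.normal(scale=se, size=n_cities)
est_holdout = true_effect + rng.normal(scale=se, size=n_cities)

# Select cities that look significant on the selection half.
selected = est_select / se > 2.33

print(f"True mean effect in selected cities: {true_effect[selected].mean():+.3f}")
print(f"Naive estimate (selection half):     {est_select[selected].mean():+.3f}")   # biased upward
print(f"Split estimate (held-out half):      {est_holdout[selected].mean():+.3f}")  # unbiased
```

The naive estimate re-uses the very noise that got each city past the threshold, so it overshoots; the held-out estimate is independent of the selection event, which is exactly why its conditional bias is zero.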
Related Literature
Three veins of work you can look for on this problem are "sample splitting," "post-selection inference," and "double/debiased machine learning," though there are others. The literature on sample splitting is likely the least technical place to start, because the other two settings often involve complications like high-dimensional problems. The problem is related to type M error, as Sextus commented, but that concept is more for analyzing study designs, since it involves considering hypothetical effect sizes (and sample sizes, if done while planning a study).
Here are a few review papers:
Kuchibhotla AK, Kolassa JE, Kuffner TA. Post-selection inference. Annual Review of Statistics and Its Application. 2022 Mar 7;9:505-27.
Kennedy EH. Semiparametric doubly robust targeted double machine learning: a review. arXiv preprint arXiv:2203.06469. 2022 Mar 12.
Fithian W, Sun D, Taylor J. Optimal inference after model selection. arXiv preprint arXiv:1410.2597. 2014 Oct 9.
And an early paper on the subject that's incredibly short and readable:
Cox DR. A note on data-splitting for the evaluation of significance levels. Biometrika. 1975 Aug 1;62(2):441-4.