The two models aren't fundamentally different. The interaction term is 0.0072 in the first and 0.0070 in the second, so the practical magnitude of the interaction is almost identical. The difference is that the interaction term in the first model passed the arbitrary p < 0.05 criterion for "statistical significance" while the one in the second model didn't.
The best advice in this situation is to report both results honestly. See this page, among others on this site, for extensive discussion.
A few thoughts to help you understand what's going on and think this through further.
First, is that interaction term of ~0.007 important in practice? It means that for every unit increase in TMS_SUM_removed, the association of AMSScore with outcome increases by 0.007 units above its coefficient of 0.267 (the slope when TMS_SUM_removed is 0). Similarly, a unit increase in AMSScore increases the association of TMS_SUM_removed with outcome by 0.007 units above its coefficient of 0.048 (the slope when AMSScore is 0). Those values are hard to interpret without more information about what the variables mean and how they are coded, but they raise the possibility that the "statistically significant" result in the first model isn't very important in practice.
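To make that concrete, here is a minimal sketch of the conditional-slope arithmetic using the rounded coefficients reported in your output; the TMS_SUM_removed values plugged in are arbitrary assumptions, since I don't know how your variables are scaled.

    # Conditional slopes implied by an interaction model
    # slope(AMSScore) = b_ams + b_int * TMS_SUM_removed
    # Coefficients below are the rounded values from the question.
    b_ams = 0.267   # coefficient of AMSScore
    b_tms = 0.048   # coefficient of TMS_SUM_removed
    b_int = 0.007   # interaction coefficient (rounded)

    def slope_ams(tms_value):
        """Association of AMSScore with outcome at a given TMS_SUM_removed."""
        return b_ams + b_int * tms_value

    def slope_tms(ams_value):
        """Association of TMS_SUM_removed with outcome at a given AMSScore."""
        return b_tms + b_int * ams_value

    for tms in (0, 10, 20):   # hypothetical TMS_SUM_removed values
        print(f"TMS_SUM_removed = {tms:2d}: slope of AMSScore = {slope_ams(tms):.3f}")

Whether a shift of 0.07 or 0.14 in the AMSScore slope over that range matters is a subject-matter question, not a statistical one.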
Second, the coefficient p-values in the first model are presumably based on Wald statistics, which assume that the sampling distributions of the coefficient estimates (that is, their distributions over repeated data samples) are normal. A better test of the "statistical significance" of an interaction term can be a likelihood-ratio test comparing two models that are identical except for the interaction term. That, however, requires fitting both models, so it's not what's typically shown by default.
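If you can refit the models outside PROCESS, the comparison looks roughly like this. This is only a self-contained sketch with simulated data and an ordinary linear model; your outcome type, data, and software will differ.

    # Likelihood-ratio test for an interaction term (illustrative sketch).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Simulated stand-in data; replace with your own data frame.
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "AMSScore": rng.normal(size=n),
        "TMS_SUM_removed": rng.normal(size=n),
    })
    df["outcome"] = (0.3 * df["AMSScore"] + 0.05 * df["TMS_SUM_removed"]
                     + rng.normal(size=n))

    # Two models identical except for the interaction term.
    reduced = smf.ols("outcome ~ AMSScore + TMS_SUM_removed", data=df).fit()
    full = smf.ols("outcome ~ AMSScore * TMS_SUM_removed", data=df).fit()

    # LR statistic: twice the difference in log-likelihoods, referred to a
    # chi-squared distribution with 1 degree of freedom (one extra coefficient).
    lr_stat = 2 * (full.llf - reduced.llf)
    p_value = stats.chi2.sf(lr_stat, df=1)
    print(f"LR statistic = {lr_stat:.3f}, p = {p_value:.4g}")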
Third, bootstrapping has its own limitations. To get reliable estimates of extreme quantiles like the limits of a 95% confidence interval you need on the order of 1,000 or more bootstrap resamples. Even with 1,000 resamples, the 95% CI limits are determined by only the roughly 25 most extreme values at either end of the bootstrap distribution. See this page and its links. Unless the number of resamples is very large, the results can also depend on the random seed used to set up the resampling. It's not clear how many resamples were done in this situation.
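Here is a short sketch of a percentile bootstrap CI for a simple statistic (a mean of toy data), just to show how the 95% limits rest on the most extreme resampled values and are repeatable only when the seed is fixed.

    # Percentile bootstrap CI for a mean (toy illustration).
    import numpy as np

    rng = np.random.default_rng(seed=42)          # fixed seed: results repeatable
    data = rng.exponential(scale=2.0, size=100)   # assumed skewed toy data

    n_boot = 1000
    boot_means = np.array([
        rng.choice(data, size=data.size, replace=True).mean()
        for _ in range(n_boot)
    ])

    # With 1000 resamples, these limits are set by roughly the 25 most
    # extreme values in each tail of the bootstrap distribution.
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% percentile bootstrap CI: ({lower:.3f}, {upper:.3f})")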
Furthermore, the PROCESS FAQ says that the default bootstrap method since version 3 is the percentile bootstrap. That isn't always the best choice. See this page, for example, for extensive discussion. A "bias-corrected" bootstrap is available, but the "BCa" ("bias corrected and accelerated") bootstrap, which can be a good choice in some difficult situations, isn't even available in PROCESS.
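Outside of PROCESS, you can see how much the method choice matters for your own data. The sketch below uses scipy's bootstrap on the same toy data as above to compare percentile and BCa intervals; PROCESS exposes different options, so consult its documentation for what it actually offers.

    # Percentile vs. BCa bootstrap intervals (illustrative, not PROCESS itself).
    import numpy as np
    from scipy.stats import bootstrap

    rng = np.random.default_rng(seed=7)
    data = rng.exponential(scale=2.0, size=100)   # assumed skewed toy data

    for method in ("percentile", "BCa"):
        res = bootstrap((data,), np.mean, n_resamples=10_000,
                        confidence_level=0.95, method=method, random_state=rng)
        ci = res.confidence_interval
        print(f"{method:>10}: ({ci.low:.3f}, {ci.high:.3f})")

For skewed statistics the two methods can give noticeably different limits, which is the point of the discussion linked above.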
You might want to review the PROCESS help pages to see how to specify the random seed (so that you and others can repeat your results), to specify a large number of bootstrap samples, and perhaps to specify the bias-corrected bootstrap.