In previous threads, parametric bootstrapping was suggested as a method to test for differences between time series at specific time points. I have followed the method described in this answer (see below for specifics), but I am not sure which test to run on the parametric bootstrap samples drawn from the fitted models. I tried a two-sample t-test, but the results seem questionable (see below).
Specifics
To test whether there are significant differences at specific times between two time series of hourly air quality data, I fitted a Gaussian Process model to each of the two time series (using the GauPro() function from the GauPro package in R). I then used the sample() function from the same package to draw 30 000 data points from the probability distribution of each fitted model at a time point of interest. This is what the histogram of the drawn data points looks like:
[Figure: histogram of the draws from both fitted models at the time point of interest]
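At a single time point, the predictive distribution of a Gaussian Process with Gaussian noise is itself Gaussian, so drawing from each fitted model there amounts to sampling from two normal distributions. The sketch below illustrates that step in Python (standard library only). The means and standard deviations are hypothetical placeholders; in the actual analysis they would come from the fitted GauPro models.

```python
import random
import statistics

# Hypothetical predictive mean and SD of each fitted GP at the time point of
# interest (placeholders, not values taken from the fitted models).
GP1_MEAN, GP1_SD = 70.25, 3.5
GP2_MEAN, GP2_SD = 71.53, 3.5
N_DRAWS = 30_000

def draw_predictive(mean, sd, n, rng):
    """Draw n values from the Gaussian predictive distribution N(mean, sd^2)."""
    return [rng.gauss(mean, sd) for _ in range(n)]

rng = random.Random(1)  # fixed seed so the draws are reproducible
gp1_draws = draw_predictive(GP1_MEAN, GP1_SD, N_DRAWS, rng)
gp2_draws = draw_predictive(GP2_MEAN, GP2_SD, N_DRAWS, rng)

print(round(statistics.fmean(gp1_draws), 2), round(statistics.fmean(gp2_draws), 2))
```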
To test whether the means of the two distributions differ, I ran a Welch two-sample t-test on the bootstrapped data points, but the highly significant p-value seems unreasonable given the histogram above. Have I used an appropriate test for this comparison?
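The Welch statistic divides the difference in sample means by the standard error sqrt(var_x/n_x + var_y/n_y), which shrinks like 1/sqrt(n). With a fixed pair of distributions, the t-value (and therefore the p-value) is driven by how many draws were simulated, not by how much the distributions overlap. A minimal sketch, using hypothetical placeholder distributions that stay identical while only the number of draws changes:

```python
import math
import random
import statistics

def welch_t(x, y):
    """Welch two-sample t-statistic: (mean(x) - mean(y)) / sqrt(var(x)/n_x + var(y)/n_y)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se

# Hypothetical predictive distributions (placeholders): the same for every n,
# so only the number of simulated draws changes between rows.
MEAN1, MEAN2, SD = 70.25, 71.53, 3.5

rng = random.Random(2)
t_by_n = {}
for n in (30, 300, 3_000, 30_000):
    x = [rng.gauss(MEAN1, SD) for _ in range(n)]
    y = [rng.gauss(MEAN2, SD) for _ in range(n)]
    t_by_n[n] = welch_t(x, y)
    print(f"n = {n:>6}: t = {t_by_n[n]:7.1f}")
```

Under these placeholder values the expected t-value is roughly -(1.28 / 3.5) * sqrt(n / 2), so it grows about tenfold for every hundredfold increase in the number of draws.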
Test output:
Welch Two Sample t-test
data: GP1_nox_boot[, 2] and GP2_nox_boot[, 2]
t = -44.422, df = 59916, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.340702 -1.227391
sample estimates:
mean of x mean of y
70.24509 71.52913
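The numbers in this output can be unpacked with a little arithmetic. Since t = (mean of x - mean of y) / SE, the implied standard error follows directly, and with equal group sizes SE^2 = (var_x + var_y) / n, which gives the typical spread of a single draw. The sketch assumes 30 000 draws per model (consistent with df being close to 2 * 30 000 - 2).

```python
import math

# Numbers copied from the Welch t-test output above.
mean_x, mean_y = 70.24509, 71.52913
t_stat = -44.422
ci_low, ci_high = -1.340702, -1.227391
n_per_group = 30_000  # assumption: 30 000 draws from each model

diff = mean_x - mean_y               # estimated difference in means
se = diff / t_stat                   # standard error implied by t = diff / se
half_width = (ci_high - ci_low) / 2  # CI half-width, should be about 1.96 * se

# For equal n, se**2 = (var_x + var_y) / n, so the average per-draw variance is:
avg_var = se**2 * n_per_group / 2
pooled_sd = math.sqrt(avg_var)
std_diff = diff / pooled_sd          # difference in means in units of one draw's SD

print(f"se = {se:.4f}, half_width / se = {half_width / se:.2f}")
print(f"pooled sd ~ {pooled_sd:.2f}, standardized difference ~ {std_diff:.2f}")
```

This gives a standard error of about 0.029 and a per-draw SD of about 3.5, so the two means differ by only about a third of a standard deviation. The large t-value comes from the tiny standard error, not from a large separation between the distributions.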
Edit: The answers I have encountered mostly deal with the nonparametric version of this bootstrap test (i.e., comparing two observed datasets by resampling, instead of comparing two simulated datasets).
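One option for simulated (rather than resampled) draws, swapped in here for illustration and not taken from the linked answer, is to summarize the distribution of the difference directly instead of running a significance test. Pair the i-th draw of each model, take the differences, and report their central 95% interval and the share of differences below zero. This assumes the two models' draws are independent, which holds if the models were fitted separately; the draws below are hypothetical placeholders.

```python
import random
import statistics

def summarize_difference(draws_a, draws_b):
    """Summarize the distribution of a - b from paired, independent draws."""
    diffs = [a - b for a, b in zip(draws_a, draws_b)]
    cuts = statistics.quantiles(diffs, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return {
        "mean_diff": statistics.fmean(diffs),
        "interval_95": (cuts[0], cuts[-1]),  # central 95% interval of a - b
        "prob_a_below_b": sum(d < 0 for d in diffs) / len(diffs),
    }

# Hypothetical draws standing in for the two models' predictive samples.
rng = random.Random(3)
gp1 = [rng.gauss(70.25, 3.5) for _ in range(30_000)]
gp2 = [rng.gauss(71.53, 3.5) for _ in range(30_000)]
print(summarize_difference(gp1, gp2))
```

Unlike the t-test p-value, neither the interval nor the tail probability collapses as more draws are simulated; extra draws only make them more precise.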

Ignoring the arbitrarily large sample size, would you say a t-test is appropriate here?
– randomEcologist Nov 30 '23 at 08:40