It is not unusual for in-sample and out-of-sample data to differ in their class ratios simply through the chance involved in randomly allocating observations to the two sets (unless you make a point of stratifying to maintain the class ratio). However, your difference is so great that I have to think something is inherently different about the second time period that makes the first outcome much more likely there than in the first period. With a gap this large, you do not need many observations for the difference to be statistically significant.
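As a rough check on that last point, here is a minimal sketch of a two-sided, two-proportion z-test in plain Python. The counts are hypothetical (they are not taken from the question), and the function name `two_proportion_z_test` is just an illustrative helper, not a library call.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts of category 1: 30 of 100 in period 1, 10 of 100 in period 2.
z, p_value = two_proportion_z_test(k1=30, n1=100, k2=10, n2=100)
print(f"z = {z:.3f}, p = {p_value:.5f}")
```

With these made-up numbers, a hundred observations per period is already enough to reject equal class ratios at conventional levels. The normal approximation is itself an assumption; an exact test may be preferable for small counts.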
Consequently, by training in period $1$ and testing in period $2$, you are testing in an inherently different situation.
The likely reason you have better classification accuracy in the second period than in the first is that the imbalance in the first period leads the model to predict probabilities on the low side, probably quite a bit below $0.5$. When you make predictions for the second period, they will still tend to fall below $0.5$, so a threshold of $0.5$ assigns them to category $0$. Since category $0$ is much more common in this time period, you are more likely to get the right answer.
(Or maybe your majority class is coded as $1$, and your predicted probabilities tend to be quite a bit higher than $0.5$. Analogous logic applies.)
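To make the mechanism concrete, here is a small sketch with hypothetical data: category $1$ makes up 30% of period one and 10% of period two, and the "model" is deliberately uninformative, always outputting the period-one base rate of $0.3$. All names and numbers are assumptions for illustration only.

```python
def accuracy_at_threshold(probs, labels, threshold=0.5):
    """Fraction of correct labels after thresholding predicted probabilities."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(pred == y) for pred, y in zip(preds, labels)) / len(labels)

# Hypothetical labels: 30% of category 1 in period 1, 10% in period 2.
labels_p1 = [1] * 30 + [0] * 70
labels_p2 = [1] * 10 + [0] * 90

# An uninformative model that always predicts the period-1 base rate.
probs_p1 = [0.3] * len(labels_p1)
probs_p2 = [0.3] * len(labels_p2)

acc_p1 = accuracy_at_threshold(probs_p1, labels_p1)
acc_p2 = accuracy_at_threshold(probs_p2, labels_p2)
print(acc_p1, acc_p2)
```

Every prediction is thresholded to category $0$, so accuracy is just the share of category $0$ in each period: it rises from 70% to 90% even though the model has learned nothing about the individual observations.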
If you evaluate the probabilistic predictions in the two time periods, however, such as with log loss, Brier score, or even a ROC curve, you are likely to see better performance on the training data from period one than on the testing data from period two. I would consider the stronger performance in terms of accuracy to be something of a mirage.
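For reference, log loss and Brier score are simple to compute directly from the predicted probabilities. The following is a minimal sketch in plain Python; the example probabilities and labels are made up, and the clipping constant `eps` is an assumption to avoid taking the log of zero.

```python
import math

def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

# Hypothetical predictions and outcomes for a handful of observations.
probs = [0.9, 0.2, 0.6, 0.1]
labels = [1, 0, 0, 0]

ll = log_loss(probs, labels)
bs = brier_score(probs, labels)
print(f"log loss = {ll:.4f}, Brier score = {bs:.4f}")
```

Lower is better for both. Computing them separately on the period-one and period-two predictions gives the comparison described above without involving a threshold.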
If you have some way to predict the drift of the prior probability in each time period, you could perhaps calibrate your out-of-sample probabilities to reflect the class ratio in that period (instead of falsely assuming that the class ratio remains constant). If you have this ability, however, you might be more inclined to use those determinants of the prior probability in a time period as features in your classifier or probability prediction model.
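One common way to do such a recalibration is to rescale the predicted odds by the ratio of the new prior odds to the training prior odds. The sketch below assumes that only the class prior changes between periods (the feature distribution within each class stays the same), that the new prior is known or predicted somehow, and that all probabilities lie strictly between 0 and 1. The function name `adjust_for_prior_shift` is a hypothetical helper, not a library function.

```python
def adjust_for_prior_shift(p, train_prior, new_prior):
    """Rescale a predicted probability of category 1 from one class prior to another.

    Assumes only the prior changed, and that p and both priors are in (0, 1).
    """
    odds = p / (1 - p)
    prior_odds_ratio = (new_prior / (1 - new_prior)) / (train_prior / (1 - train_prior))
    new_odds = odds * prior_odds_ratio
    return new_odds / (1 + new_odds)

# Hypothetical priors: category 1 is 30% of period 1 and an assumed 10% of period 2.
for p in (0.3, 0.5, 0.8):
    adjusted = adjust_for_prior_shift(p, train_prior=0.3, new_prior=0.1)
    print(f"{p:.2f} -> {adjusted:.3f}")
```

A prediction equal to the training base rate maps to the new base rate, and predictions are unchanged when the two priors agree. If the drift involves more than the prior, this adjustment alone will not fix the calibration.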
If the prior probability changes from one time period to the next and you have no way to model how, then you are modeling a nonstationary process with no information about its dynamics. Of course your performance will be poor. (Again, your higher accuracy in the second time period is a mirage. If you evaluate the probabilities directly, you are likely to find worse performance in period two than in period one.)