I have an imbalanced time series dataset for a regression forecasting problem: given one video of 24 hours of data (144 7x7 images), forecast the next video of 24 hours of data (144 7x7 images).

I ran a test in which I filtered the training set so that only samples whose target mean was less than or equal to the mean of the validation targets were kept. This made the loss (Gradient Difference Loss + MAE) and the metrics (RMSE and MAE) on the training, validation and test sets noticeably better, starting from the very first epoch. My understanding is that the training data becomes more balanced this way, since its distribution becomes more similar to that of the validation data, and consequently to that of the test data. From what I have read, this is a sampling technique; I believe it would be called downsampling (undersampling).

Is this approach valid, and what other solutions are there? This is the Python code I use to downsample:
import numpy as np

# Scalar mean over all validation targets
mean_Y_val = np.mean(Y_val)
# Per-sample mean of each training target over the (frame, height, width) axes
mask_Y_train = np.mean(Y_train, axis=(1, 2, 3)) <= mean_Y_val
# Keep only the samples whose target mean does not exceed the validation mean
Y_train = Y_train[mask_Y_train]
X_train = X_train[mask_Y_train]
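For reference, here is a self-contained sketch of the same filtering on synthetic data. The shapes (N, 144, 7, 7) and the variable names X_ds / Y_ds are my assumptions for illustration, not part of my actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: N videos, each 144 frames of 7x7 images
X_train = rng.random((100, 144, 7, 7))
Y_train = rng.random((100, 144, 7, 7))
Y_val = rng.random((20, 144, 7, 7))

# Scalar mean over all validation targets
mean_Y_val = np.mean(Y_val)

# Boolean mask: True where a sample's target mean is <= the validation mean
mask = np.mean(Y_train, axis=(1, 2, 3)) <= mean_Y_val

# Downsampled training set (inputs and targets stay aligned via the same mask)
X_ds = X_train[mask]
Y_ds = Y_train[mask]
```

Because the same boolean mask indexes both arrays along axis 0, the input/target pairing is preserved after filtering.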