The topic is somewhat generic, but I will try to be as specific as possible.
Suppose we have a dataset of about 100k respondents that, being a survey, could be biased (geographically, by gender, ...).
The dataset is a sample of people with their characteristics and a survey response variable, "bike_buyer", that measures the propensity to buy bicycles.
Question
This dataset will be used to train an ML model. Prior to training the model:
would it be correct to apply weights so that the distribution of the variables in the dataset matches the correct theoretical distribution?
I have not seen publications on this type of methodology, so I wonder whether it is correct to do so.
I understand that this question is open-ended, since I cannot specify certain details, such as which weighting method would be used or how the weights would be obtained.
So it can be answered first in a generic way, with yes/no and why. Please also comment on, or point to, a methodology or article on the subject that demonstrates good practice.
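
To make the question more concrete, here is a minimal sketch of one weighting scheme I could imagine: each respondent gets a weight equal to the theoretical share of their group divided by the share observed in the sample. The toy gender values, the 50/50 target shares, and the function name are all hypothetical and only illustrate what I mean by weighting; I am not claiming this is the right method.

```python
from collections import Counter


def group_weights(groups, target_shares):
    """Return one weight per respondent: target share / sample share of their group.

    `groups` is a list with one group label per respondent.
    `target_shares` maps each label to its assumed theoretical share.
    """
    n = len(groups)
    counts = Counter(groups)

    # Every group seen in the sample needs a target share.
    missing = set(counts) - set(target_shares)
    if missing:
        raise ValueError(f"no target share for groups: {sorted(missing)}")

    return [target_shares[g] / (counts[g] / n) for g in groups]


# Hypothetical toy sample: women are over-represented (75% vs. an assumed 50%).
gender = ["F", "F", "F", "M"]
target = {"F": 0.5, "M": 0.5}  # assumed theoretical distribution

weights = group_weights(gender, target)
for g, w in zip(gender, weights):
    print(g, round(w, 3))
```

With weights of this kind, over-represented groups count less and under-represented groups count more, and the weights sum to the sample size. Many libraries accept such per-row weights at training time, for example through the `sample_weight` argument of `fit` in scikit-learn estimators that support it.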