
I have built a machine learning model that predicts whether a customer will buy a product. The model performs well under cross-validation. I now plan to deploy it in production to recommend the product to customers to whom the model assigns a high probability of buying.

I will continue to collect data to further improve the model. However, I suspect that collecting only the outcomes (bought or not bought) of the users to whom the product was recommended will introduce bias. This seems to violate the i.i.d. assumption, since the data will no longer be collected at random. My general question is: how should I continue collecting data once the machine learning model is in production? Should I collect data randomly, for example by recommending the product to randomly chosen users?
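To make the random-collection idea concrete, here is a minimal sketch of one way it might look. It assumes a small share of customers is shown the recommendation regardless of the model score, and that each logged record notes which path produced the decision, so the random slice can later be separated from the model-driven one. The names `choose_recommendation` and `log_record`, and the default values for `threshold` and `explore_rate`, are hypothetical and chosen only for illustration.

```python
import random


def choose_recommendation(score, threshold=0.8, explore_rate=0.1, rng=random):
    """Decide whether to recommend and record how the decision was made.

    score        -- the model's predicted purchase probability (assumed input)
    threshold    -- hypothetical cutoff for model-driven recommendations
    explore_rate -- hypothetical share of customers recommended at random
    Returns (recommend, source), where source is "explore" or "model".
    """
    if rng.random() < explore_rate:
        # Random slice: recommend regardless of the score, so outcomes
        # are also observed for customers the model would have skipped.
        return True, "explore"
    # Model-driven slice: recommend only when the score is high enough.
    return score >= threshold, "model"


def log_record(customer_id, score, threshold=0.8, explore_rate=0.1, rng=random):
    """Build the row to store alongside the eventual bought/not-bought outcome."""
    recommend, source = choose_recommendation(score, threshold, explore_rate, rng)
    return {
        "customer_id": customer_id,
        "score": score,
        "recommended": recommend,
        "source": source,  # lets the random slice be filtered out later
    }


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for cid, score in [(1, 0.95), (2, 0.40), (3, 0.10)]:
        print(log_record(cid, score, rng=rng))
```

Rows with `source == "explore"` come from customers picked independently of the model score, which is the property the question is asking about. Whether such a slice is affordable, and how large it should be, depends on the business setting.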

Sanyo Mn
You could look into active learning; see, for instance, https://stats.stackexchange.com/questions/422186/motivations-for-experiment-design-in-statistical-learning/422518#422518. Consider adding the tag [tag:active-learning]. – kjetil b halvorsen Aug 16 '22 at 18:00

0 Answers