
In my data, I have some items that customers purchase. I need to predict customers' behavior with respect to different items. But the test set contains some items that are not present in the training set. How can I deal with these?

Thank you.



Could you add a few more details? For example, do you have categories for the items (e.g., food, clothes)? Below, I wrote briefly about a few approaches that might help. I will update once I have a little more information about the task. :)

Domain Adaptation

Domain adaptation might be useful in a setting like this. Since you have access to the (unlabeled?) test data, you can use something like importance weighting, where you reweight the training (source) examples so that their distribution matches the test (target) distribution. This does not immediately solve the problem of predicting a customer's behavior with items never seen in training, but it may be useful. In practice, domain adaptation is a hard problem, and how well it works depends on a quantity called the "discrepancy" between the source and target distributions.
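As a minimal sketch of importance weighting, assume each item is described by a single categorical feature, its category (e.g. "food" or "clothes"). This is an assumption, since the question doesn't say so. Each training example is weighted by the estimated ratio p_test(x) / p_train(x) of category frequencies, and those weights can then be passed to any learner that accepts per-example weights. The function name and the additive smoothing are illustrative choices, not part of any library.

```python
from collections import Counter


def importance_weights(train_cats, test_cats, smoothing=1.0):
    """Return one weight per training example: w(x) = p_test(x) / p_train(x).

    Assumption: each example is summarized by one categorical value (e.g. the
    item's category). Frequencies use additive smoothing so that a category
    absent from the test set does not get an undefined ratio.
    """
    cats = set(train_cats) | set(test_cats)
    train_counts = Counter(train_cats)
    test_counts = Counter(test_cats)
    n_train = len(train_cats) + smoothing * len(cats)
    n_test = len(test_cats) + smoothing * len(cats)

    def ratio(c):
        p_train = (train_counts[c] + smoothing) / n_train
        p_test = (test_counts[c] + smoothing) / n_test
        return p_test / p_train

    # Only training examples get weights; test-only categories never appear here.
    return [ratio(c) for c in train_cats]


if __name__ == "__main__":
    train = ["food", "food", "food", "clothes"]
    test = ["food", "clothes", "clothes", "clothes"]
    print(importance_weights(train, test, smoothing=0.0))
```

With no smoothing, each "food" example gets weight 1/3 and the "clothes" example gets weight 3, so the weighted training set has the same category mix as the test set. A category that never occurs in training still receives no weight at all, which is why reweighting alone does not solve the unseen-item problem.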

Here is a very good (and current) review on domain adaptation: http://www.cs.nyu.edu/~mohri/pub/nsmooth.pdf

Online Learning

Are you able to turn this into an online learning problem? You could update the weights of your hypothesis as the true outcomes on the test data are revealed. This won't necessarily help the first time you see a new item, but once it has appeared, you can update the model accordingly.
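A minimal sketch of the online idea, assuming a binary outcome (e.g. purchased or not) and string features such as the hypothetical "item=<id>" and "cat=<category>": logistic regression updated with one stochastic gradient step per observed example. A brand-new item has no weight yet (it contributes 0), so the first prediction falls back on the bias and any other features; after a few labeled observations, the item has its own weight. The class and feature names are hypothetical.

```python
import math
from collections import defaultdict


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogistic:
    """Online logistic regression trained by SGD, one example at a time.

    Features are strings; a feature never seen before has weight 0.
    """

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w = defaultdict(float)
        self.bias = 0.0

    def predict_proba(self, features):
        # .get() avoids inserting unseen features into the weight table.
        z = self.bias + sum(self.w.get(f, 0.0) for f in features)
        return _sigmoid(z)

    def update(self, features, label):
        # Gradient of the log loss with respect to z is (p - y).
        err = self.predict_proba(features) - label
        self.bias -= self.lr * err
        for f in features:
            self.w[f] -= self.lr * err


if __name__ == "__main__":
    model = OnlineLogistic(lr=0.5)
    feats = ["item=new", "cat=food"]
    print(model.predict_proba(feats))  # 0.5: nothing learned yet
    for _ in range(20):
        model.update(feats, 1)  # outcome revealed after each prediction
    print(model.predict_proba(feats))
```

A hashed or dictionary-based feature table like this grows naturally as new items arrive, which is why it suits a setting where the item set is not fixed in advance.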

Thorough lecture notes and an extensive reading list: http://www.cims.nyu.edu/~mohri/amls/lecture_4.pdf

Structured Learning

Is there temporal structure in your data? Are you given the order in which items were purchased? If so, there may be temporal structure that you can exploit here (time of day, number of items purchased previously, etc.). Structured prediction algorithms have been developed extensively for NLP problems, but they can be extended to other tasks.
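To make the temporal features concrete, here is a sketch that assumes the purchases are available as hypothetical (customer_id, item_id, timestamp) tuples. It sorts them chronologically and derives, for each purchase, the hour of day, how many items the same customer bought before, and the previous item. This is only feature extraction, not a structured learner. The previous item is the kind of sequential context a structured (e.g. sequence) model would condition on, and the hour and purchase count do not depend on the item's identity, so they stay informative for items never seen in training.

```python
from datetime import datetime


def temporal_features(purchases):
    """purchases: list of (customer_id, item_id, timestamp) tuples, any order.

    Returns one feature dict per purchase, in chronological order.
    """
    rows = []
    history = {}  # customer_id -> items that customer bought earlier
    for customer, item, ts in sorted(purchases, key=lambda p: p[2]):
        prev = history.setdefault(customer, [])
        rows.append({
            "customer": customer,
            "item": item,
            "hour_of_day": ts.hour,
            "n_previous_purchases": len(prev),
            "previous_item": prev[-1] if prev else None,
        })
        prev.append(item)
    return rows


if __name__ == "__main__":
    # Illustrative data only.
    for row in temporal_features([
        ("c1", "milk", datetime(2024, 1, 1, 9)),
        ("c1", "bread", datetime(2024, 1, 1, 8)),
        ("c2", "milk", datetime(2024, 1, 2, 18)),
    ]):
        print(row)
```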

More lecture notes and a reading list: http://www.cims.nyu.edu/~mohri/amls/lecture_3.pdf