I am building product recommendations with the ALS method on retail transaction data. A simple question came to mind about applying the methodology to implicit ratings. In my case I am using the quantities of different products bought by each customer as the implicit rating. My question: if I normalize the quantities, should I normalize with respect to each product or each customer?

For example, there are two ways of normalizing that I am puzzled by. Which one is better suited to my scenario?

Case a: Rating of customer 1 on product 1 = quantity of product 1 bought by customer 1 / sum of quantities of all products bought by customer 1

Case b: Rating of customer 1 on product 1 = quantity of product 1 bought by customer 1 / sum of quantities of product 1 bought by all customers
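
To make the two cases concrete, here is a minimal sketch, assuming a small made-up quantity matrix in which rows are customers and columns are products. The names `quantities`, `normalize_by_customer`, and `normalize_by_product` are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical purchase quantities: rows are customers, columns are products.
quantities = np.array([
    [2.0, 0.0, 6.0],
    [1.0, 3.0, 0.0],
    [1.0, 1.0, 2.0],
])

def normalize_by_customer(q):
    """Case a: divide each entry by that customer's total quantity (row sum)."""
    row_totals = q.sum(axis=1, keepdims=True)
    # Leave rows with no purchases at zero instead of dividing by zero.
    return np.divide(q, row_totals, out=np.zeros_like(q), where=row_totals > 0)

def normalize_by_product(q):
    """Case b: divide each entry by that product's total quantity (column sum)."""
    col_totals = q.sum(axis=0, keepdims=True)
    # Leave columns with no purchases at zero instead of dividing by zero.
    return np.divide(q, col_totals, out=np.zeros_like(q), where=col_totals > 0)

case_a = normalize_by_customer(quantities)
case_b = normalize_by_product(quantities)
```

Under case a each customer's ratings sum to 1, so the rating expresses the share of that customer's basket taken by a product. Under case b each product's ratings sum to 1, so the rating expresses the customer's share of that product's total sales.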

P.S. I am fairly new to this community and this is my first question. :)

Aditi

1 Answer

Without knowing what you mean by ALS, I'd say the two features you are crafting answer two different questions:

The first feature answers the question: What did that customer think about that product in the past?

The second feature answers the question: What did all customers think about this product in the past?

First of all, when you have a choice of features, an easy way to figure out which ones are good and which are bad is to craft both, run the model, and assess its performance with respect to those features by some means: feature importance, for example, or evaluating the model four times (once without either feature, twice with only one of the two, and once with both). This is an 'automatic' approach that will guide you towards new questions and insights.
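
The four-way comparison can be sketched as follows. This is only an illustration: `compare_feature_subsets` is a hypothetical helper, and `evaluate` stands for whatever train-and-validate routine you already have, assumed here to take a tuple of feature names and return a single score.

```python
from itertools import combinations

def compare_feature_subsets(feature_names, evaluate):
    """Score every subset of `feature_names`, including the empty one.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that trains the model with the given tuple of features and returns a
    validation score.
    """
    results = {}
    for size in range(len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            results[subset] = evaluate(subset)
    return results
```

With two features this runs exactly four evaluations: none, the first only, the second only, and both. Comparing the four scores shows how much each feature contributes on its own and in combination.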

For this particular scenario, I would say whether feature 1 is useful depends on the 'usual' customer behaviour: do customers often buy the same product repeatedly? In that case feature 1 is particularly useful because it gives a personally tailored estimate of the 'value' of that product to that particular customer, while feature 2 only gives a rough estimate of the average value of the product to all customers and says nothing about how important the product is to the customer in question.

Or do customers often buy a certain product for the first time? Then you need either feature 2 or a more clever way of crafting that feature. Amazon has demonstrated this with 'customers who bought this were also interested in ...', and you could do something similar:

Create a vector of the products most bought by the current customer. Create such a vector for every other customer as well. Then check how much the customers whose past behaviour vectors are most similar to the current customer's liked the product in question.
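
The idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: cosine similarity is my choice of similarity measure, purchased quantity stands in for 'liking', and `quantities` and `neighbour_score` are hypothetical names with made-up data.

```python
import numpy as np

# Hypothetical purchase quantities: rows are customers, columns are products.
quantities = np.array([
    [4.0, 1.0, 0.0, 0.0],  # customer 0: the customer we want a score for
    [3.0, 1.0, 2.0, 0.0],  # customer 1: similar history, also bought product 2
    [0.0, 0.0, 0.0, 5.0],  # customer 2: unrelated history
])

def neighbour_score(q, customer, product, k=1):
    """Mean quantity of `product` among the k customers most similar to `customer`."""
    norms = np.linalg.norm(q, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for customers with no purchases
    unit = q / norms[:, None]
    similarity = unit @ unit[customer]  # cosine similarity to every customer
    similarity[customer] = -np.inf      # exclude the customer themselves
    nearest = np.argsort(similarity)[::-1][:k]
    return q[nearest, product].mean()

score = neighbour_score(quantities, customer=0, product=2, k=1)
```

Here customer 1 is the nearest neighbour of customer 0, so the score for product 2 is simply the quantity customer 1 bought. In practice you would use a larger `k` and possibly weight each neighbour by its similarity.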

Does that help?

  • Thanks for your detailed answer. It really does help. By ALS I meant the Alternating Least Squares matrix factorization algorithm, used to come up with product recommendations that can include new products as well as products already bought in the past. For now I have used case a and then ranked the products based on the scores provided by the algorithm. I was unsure whether I should have done the normalization differently to include more information. – Aditi Mar 17 '20 at 11:51