I'm still toying with things and just learning this, so please forgive any incorrect terminology.

My toy data set is a collection of recipes with a fairly significant overlap in ingredients. I'm using the ingredients as my features, and the Pearson squared distance described in Programming Collective Intelligence to measure how closely recipes are correlated. I'm not looking to "train" anything; for now I would just like to compare the recipes in my corpus and find the $X$ nearest and $Y$ farthest neighbors.
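
For illustration, here is a minimal sketch of that setup. It assumes each recipe is a 0/1 vector over a shared ingredient list and uses $1 - r$ (with $r$ the Pearson correlation) as the distance, which may differ from the exact variant in the book. The recipe names, the ingredient list, and the `neighbors` helper are all made up for the example.

```python
from math import sqrt

def pearson_distance(u, v):
    """Return 1 - r, where r is the Pearson correlation of two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        # Correlation is undefined for a constant vector; treating the pair
        # as uncorrelated is an assumption made for this sketch.
        return 1.0
    return 1.0 - cov / sqrt(var_u * var_v)

# Hypothetical toy corpus: 1 means the recipe uses that ingredient.
INGREDIENTS = ["flour", "sugar", "butter", "lemon", "chili"]
RECIPES = {
    "shortbread":        [1, 1, 1, 0, 0],
    "lemon_cake":        [1, 1, 1, 1, 0],
    "chili_lemon_sauce": [0, 0, 0, 1, 1],
}

def neighbors(name, recipes, distance):
    """Return the other recipe names sorted from nearest to farthest."""
    scored = [
        (distance(recipes[name], vec), other)
        for other, vec in recipes.items()
        if other != name
    ]
    return [other for _, other in sorted(scored)]

ranked = neighbors("shortbread", RECIPES, pearson_distance)
```

With a ranking like this, the $X$ nearest neighbors would be `ranked[:X]` and the $Y$ farthest `ranked[-Y:]`.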

I thought it'd be fun to expand this and include people's reviews/comments on the recipes as part of the distance calculation, leading to a closer association between recipes that were both described as "sweet" or "tart" or "sour". I don't know how to do this alongside the ingredient features, though; I've found examples of bags-of-words for similarity, but I can't seem to find any information about combining them with other features.

Is this a valid approach, and is there some terminology that I should be looking for? Thanks.

Peck
  • Cool idea! Welcome to CV. I bet any answers we might be able to offer here would go over well at Seasoned Advice too. That is, any answers here might provide many answers there too! – Nick Stauner Feb 18 '14 at 07:39
  • You could think of it this way: you have a distance $d_1$ between recipes in terms of the ingredients and a distance $d_2$ in terms of the reviews. You could weight them in different ways and define a combined distance $d_1 + \lambda d_2$. Then you have to choose an appropriate $\lambda$. You can either try things out until you get something reasonable or be more formal and use cross-validation to choose $\lambda$. See the chapter in Segaran on knn and the wines example where he does exactly this. – Flounderer Feb 18 '14 at 20:51
  • That seems so simple. In my case it'd be the less formal side, since I don't believe there would be a "right" answer, but this looks promising. As I'm using the Pearson correlation coefficient right now, I guess I need to look into other distance measures that are additive? – Peck Feb 18 '14 at 22:01
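
The combined distance $d_1 + \lambda d_2$ suggested in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: $d_1$ is taken to be $1 - r$ on 0/1 ingredient vectors, $d_2$ is a cosine distance on plain word counts from the review text (a choice made here, not something the thread specifies), and the recipes, reviews, and function names are hypothetical. Both distances are non-negative numbers, so they can be summed; $\lambda$ controls how much the reviews count relative to the ingredients.

```python
from collections import Counter
from math import sqrt

def pearson_distance(u, v):
    """Return 1 - r for two equal-length numeric vectors (r = Pearson correlation)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 1.0  # undefined correlation treated as "uncorrelated" (assumption)
    return 1.0 - cov / sqrt(var_u * var_v)

def bag_of_words(text):
    """Lower-case, whitespace-split word counts; a deliberately naive tokenizer."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def combined_distance(r1, r2, lam):
    """d1 + lam * d2: ingredient distance plus weighted review distance."""
    d1 = pearson_distance(r1["ingredients"], r2["ingredients"])
    d2 = cosine_distance(bag_of_words(r1["reviews"]), bag_of_words(r2["reviews"]))
    return d1 + lam * d2

# Hypothetical data: ingredient columns are flour, sugar, butter, lemon, chili.
RECIPES = {
    "lemon_tart": {"ingredients": [1, 1, 1, 1, 0], "reviews": "tart and sweet very tart"},
    "shortbread": {"ingredients": [1, 1, 1, 0, 0], "reviews": "sweet and buttery"},
    "lemonade":   {"ingredients": [0, 1, 0, 1, 0], "reviews": "tart sour and sweet"},
}

# lam = 0 ignores the reviews entirely; larger values weight them more heavily.
for lam in (0.0, 0.5, 2.0):
    d = combined_distance(RECIPES["lemon_tart"], RECIPES["lemonade"], lam)
    print(f"lambda={lam}: {d:.3f}")
```

Because the two distances can live on different scales, trying a few values of `lam` and inspecting the resulting neighbors is probably the practical way to tune it when there is no "right" answer to validate against.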

0 Answers