I had an assignment in which we had to classify a recipe's cuisine and also return the top 5 closest recipes based on a given input. I applied count vectorization (countVectorize.transformer()) to the following data and then used Jaccard distance to find the closest matches. Is this approach sound, or are there better distance metrics for my purpose?
Dataset: https://www.kaggle.com/c/whats-cooking/data
{ "id": 24717, "cuisine": "indian", "ingredients": [ "tumeric", "vegetable stock", "tomatoes", "garam masala", "naan", "red lentils", "red chili peppers", "onions", "spinach", "sweet potatoes" ] },
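
For reference, here is a minimal sketch of the approach described above, written in plain Python. It assumes each recipe is treated as a set of ingredient strings, which gives the same Jaccard distance as a binary (presence/absence) count matrix. The function names `jaccard_distance` and `top_k_recipes` are hypothetical, and apart from record 24717 the toy records are made up for illustration; this is not the exact code from the assignment.

```python
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def top_k_recipes(
    query: List[str], recipes: List[Dict], k: int = 5
) -> List[Tuple[int, str, float]]:
    """Rank recipes by Jaccard distance to the query ingredients (closest first)."""
    query_set = set(query)
    scored = [
        (r["id"], r["cuisine"], jaccard_distance(query_set, set(r["ingredients"])))
        for r in recipes
    ]
    # Sort by distance, breaking ties by id so the order is deterministic.
    scored.sort(key=lambda item: (item[2], item[0]))
    return scored[:k]


# Toy data in the same shape as the dataset; ids 1 and 2 are invented examples.
recipes = [
    {
        "id": 24717,
        "cuisine": "indian",
        "ingredients": [
            "tumeric", "vegetable stock", "tomatoes", "garam masala", "naan",
            "red lentils", "red chili peppers", "onions", "spinach", "sweet potatoes",
        ],
    },
    {"id": 1, "cuisine": "italian",
     "ingredients": ["tomatoes", "onions", "basil", "olive oil"]},
    {"id": 2, "cuisine": "indian",
     "ingredients": ["garam masala", "onions", "spinach", "paneer"]},
]

query = ["onions", "spinach", "garam masala", "tomatoes"]
results = top_k_recipes(query, recipes, k=5)
for recipe_id, cuisine, distance in results:
    print(recipe_id, cuisine, round(distance, 3))
```

Note that Jaccard distance ignores how often an ingredient appears and penalizes long recipes through the size of the union, so a short query against a 10-ingredient recipe can score as fairly distant even when every query ingredient matches.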