Sorry if this is a really simple question, but I'm new to this and wondered if there's an easy way to do what I'm picturing.
Imagine I've got a bunch of people and I'm asking them what they've eaten in the last week. Each person will have a list of foods they have eaten, and a count of how many times they've eaten each food. I want a single number that measures how similar each person's food list is to everyone else's. Probably no one will have eaten a food that nobody else has eaten, but probably no one will have eaten every kind of food either.
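One way to turn this into a single number, offered here as an assumption rather than anything the question specifies, is to store the data as a matrix with one row per person and one column per food (cell = number of times eaten), then score each person by their average cosine similarity to every other person. Cosine similarity compares the *mix* of foods and ignores how much someone ate overall; other proximity measures would give different scores. The helper name `mean_cosine_similarity_naive` is hypothetical, and this brute-force version loops over all pairs, so it is only a sketch for checking the idea on small data.

```python
import numpy as np

def mean_cosine_similarity_naive(counts) -> np.ndarray:
    """For each row (person), return the average cosine similarity of their
    food-frequency vector to every other row. O(n^2): small data only."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    norms = np.linalg.norm(counts, axis=1)
    scores = np.zeros(n)
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # don't compare a person with themselves
            if norms[i] == 0 or norms[j] == 0:
                continue  # assumption: similarity with an empty profile counts as 0
            total += counts[i] @ counts[j] / (norms[i] * norms[j])
        scores[i] = total / (n - 1)
    return scores

# Three people, three foods (made-up toy data).
print(mean_cosine_similarity_naive([[2, 0, 1], [1, 0, 1], [0, 3, 0]]))
```

Here the first two people eat similar mixes, so they score about 0.47 each, while the third shares no foods with anyone and scores 0.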
This is data wrangling for a logistic regression machine learning model, so what I need is a single continuous variable per person that I can use as a feature. My first idea was to count the number of different foods each person has eaten, but that's probably not very predictive.
My actual dataset has 500,000 "people" and 48 different "kinds of food".
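With 500,000 rows the all-pairs loop is far too slow (about 1.25 × 10^11 pairs), but the average cosine similarity doesn't actually need the pairs. If each row is scaled to unit length, the cosine similarity is a plain dot product, so the sum over everyone else is the row's dot product with the sum of all unit rows, minus the row's similarity with itself. That is O(n × k) work, here 500,000 × 48. This is a sketch under the same assumptions as before (cosine similarity as the chosen measure, empty profiles scoring 0); `mean_cosine_similarity` is a hypothetical name.

```python
import numpy as np

def mean_cosine_similarity(counts) -> np.ndarray:
    """Average cosine similarity of each row to all other rows, in O(n * k)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    # Scale every row to unit length; all-zero rows stay zero instead of dividing by 0.
    unit = np.divide(counts, norms, out=np.zeros_like(counts), where=norms > 0)
    total = unit.sum(axis=0)  # sum of all unit profiles, shape (k,)
    # unit_i . (total - unit_i) equals the sum of cos(x_i, x_j) over all j != i.
    self_sim = (unit * unit).sum(axis=1)  # 1 for non-empty rows, 0 for empty rows
    return (unit @ total - self_sim) / (n - 1)

# Hypothetical random data with the question's shape, just to show the call.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(500_000, 48))
scores = mean_cosine_similarity(counts)
print(scores.shape)
```

The trick depends on cosine similarity being a dot product of normalized vectors, so the sum can be pulled inside; it does not carry over unchanged to measures such as plain Euclidean distance.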
How would you do this please? Thanks!
"Various proximities" on my web-page. – ttnphns Mar 26 '23 at 18:36