I want to use ML to check whether Cuban cigars are overpriced. I plan to get the cigar data from this website: https://www.cigaraficionado.com/ratings/search?q=&brand= . It provides a blind-test score for every cigar, along with key characteristics such as length, type, price, and origin.
My current thinking is that I should build a model that predicts the cigar rating from the other variables. Then I can check whether the 'origin = Cuba' variable is a statistically significant predictor of the rating. If it is not, my understanding is that this would show that, with all other variables held equal (i.e. for two otherwise equivalent cigars, one from Cuba and one not), coming from Cuba does not make a cigar better.
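To make the idea concrete, here is a minimal sketch of the kind of model I have in mind: an ordinary least squares regression of rating on the other variables plus a 0/1 Cuba indicator. Everything in it is an assumption for illustration. The data is synthetic (generated with no Cuba effect at all, not scraped from the site), and the column names `length`, `price`, `is_cuban` and the helper `fit_ols` are hypothetical.

```python
import numpy as np

def fit_ols(features, target):
    """Ordinary least squares with an intercept.

    Returns (coefficients, standard errors, t-statistics),
    with the intercept in position 0.
    """
    X = np.column_stack([np.ones(len(features)), features])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ beta
    dof = X.shape[0] - X.shape[1]            # observations minus fitted parameters
    sigma2 = residuals @ residuals / dof     # unbiased residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic stand-in for the scraped data (assumption: NOT real cigar ratings).
rng = np.random.default_rng(0)
n = 500
length = rng.normal(6.0, 1.0, n)                  # hypothetical length in inches
price = rng.normal(12.0, 4.0, n)                  # hypothetical price in dollars
is_cuban = rng.integers(0, 2, n).astype(float)    # 1.0 if origin is Cuba, else 0.0

# Ratings are generated with a true Cuba coefficient of exactly 0.
rating = 85 + 0.5 * length + 0.2 * price + 0.0 * is_cuban + rng.normal(0, 2.0, n)

features = np.column_stack([length, price, is_cuban])
beta, se, t = fit_ols(features, rating)

for name, b, s, tv in zip(["intercept", "length", "price", "is_cuban"], beta, se, t):
    print(f"{name:>10}: coef={b:7.3f}  se={s:6.3f}  t={tv:6.2f}")
```

The `is_cuban` row is the one I would look at: its coefficient is the estimated rating difference between a Cuban and a non-Cuban cigar with the same length and price, and its t-statistic is what I would use to judge significance. Whether a small t-statistic there can really be read as "no Cuba effect" is part of what I am asking.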
Does this make sense? Is there a better / different way?