None of the tests you list will be valid. The reason is that your data are not independent, and all of those tests require independence: the ratings within the same group will be correlated with each other. The fact that your $N < 30$ is not the key issue. The Mann-Whitney U-test and the Wilcoxon signed-rank test are appropriate for smaller samples from non-normal populations (Mann-Whitney is for unpaired data and Wilcoxon is for paired data, so you would use the latter), but they don't address the within-group clustering.
As a result, you need to use some form of mixed-effects (multilevel) model. If the population from which the ratings were drawn were normal, you could use a linear mixed-effects model. If you aren't willing to assume normality (which seems prudent, given that these are ratings), you would probably do best to use a mixed-effects ordinal logistic regression model. I don't know what software you use, but in R this can be done using the ordinal package. These are somewhat advanced models to fit; if you aren't familiar with them, you should work with a statistical consultant.
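For what it's worth, here is a minimal sketch of what such a model could look like with `clmm()` from the ordinal package. The data are simulated, and the column names (`rating`, `phase`, `group`, `subject`) and the group sizes are placeholders I made up, so treat this as an illustration of the model structure rather than an analysis of your data. It also assumes each person gives one rating before and one after.

```r
# Sketch only: simulated data and hypothetical column names, not the asker's study.
library(ordinal)  # provides clmm() for cumulative link mixed models

set.seed(1)
n_groups  <- 4   # assumed number of groups
per_group <- 6   # assumed number of people per group

# One row per person per phase (before / after)
d <- expand.grid(member = seq_len(per_group),
                 group  = seq_len(n_groups),
                 phase  = c("before", "after"))
d$subject <- factor(paste(d$group, d$member, sep = "_"))
d$group   <- factor(d$group)
d$phase   <- factor(d$phase, levels = c("before", "after"))

# Latent score = shared group effect + person effect + phase shift + noise
g_eff  <- rnorm(n_groups, sd = 1)
s_eff  <- rnorm(nlevels(d$subject), sd = 1)
latent <- g_eff[as.integer(d$group)] +
  s_eff[as.integer(d$subject)] +
  0.8 * (d$phase == "after") +
  rlogis(nrow(d))

# Cut the latent score into an ordered rating (unused levels are dropped)
d$rating <- droplevels(cut(latent,
                           breaks = c(-Inf, -1, 0.5, 2, Inf),
                           labels = c("1", "2", "3", "4"),
                           ordered_result = TRUE))

# Random intercepts for group (clustering) and subject (before/after pairing)
fit <- clmm(rating ~ phase + (1 | group) + (1 | subject), data = d)
summary(fit)
```

The coefficient for `phase` is the before/after shift on the log-odds scale. The random intercept for `group` absorbs the within-group correlation, and the one for `subject` accounts for each person contributing two ratings. With only a handful of groups, the group variance will be estimated imprecisely, which is one more reason to get a consultant's input.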
On an unrelated note, be aware that your prototype is confounded with order. Unfortunately, if you conclude that the before and after ratings differ, you cannot tell whether that is because of the prototype or simply because ratings change over time.