I am completing a study that gathers subjective, quantitative ratings on a 0-10 Likert scale from patients with single-sided deafness, that is, normal hearing in one ear and impaired hearing with a cochlear implant in the other. They are asked to listen to an audio clip and rate several qualities on the Likert scale; they then repeat this for several audio clips. This is done first with the implanted ear ONLY, then with the normal ear ONLY.
I want to determine whether the mean ratings differ significantly between ears for each pairing of quality and clip, so I can tabulate the number of clips in which a quality was experienced differently between ears. This involves many comparisons: with 10 clips and 10 qualities, for example, there are 100 comparisons of means.
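To make the scale of the multiple-comparison problem concrete, here is a minimal arithmetic sketch. The per-test alpha of 0.05 is an assumption for illustration, and the family-wise figure assumes independent tests, which is not strictly true here because the same participants contribute to every comparison.

```python
# Sketch of the multiple-comparison arithmetic for the example design.
n_clips = 10
n_qualities = 10
alpha = 0.05  # assumed per-test significance level (not stated in the study)

n_tests = n_clips * n_qualities

# Expected number of false positives if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, assuming independent tests
# (an idealization; these tests share the same participants).
familywise_error = 1 - (1 - alpha) ** n_tests

# Per-test threshold under a simple Bonferroni correction.
bonferroni_alpha = alpha / n_tests

print(n_tests, expected_false_positives, familywise_error, bonferroni_alpha)
```

With only 11 participants, a threshold as strict as the Bonferroni one may be hard for any test to reach, which is part of why I would like to avoid so many comparisons.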
Since there are 100 pairs of means to compare, a parametric test like a paired t-test seems ill-advised, given that in many instances (I have checked quite a few with Shapiro-Wilk) the score differences between conditions are not normally distributed.
So I've considered something nonparametric, like a Wilcoxon signed-rank test. However, since the scoring is constrained to 0-10, ties are common, and they reduce my already small n of 11 much further. In testing, this produces seemingly aberrant findings of significance or non-significance that do not track with how the data appear.
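The loss of sample size described above can be sketched as follows. The ratings are hypothetical values made up for illustration, not my study data, and the helper names `midranks` and `signed_rank_exact` are my own. The sketch assumes the classic Wilcoxon procedure, which discards zero differences, and computes an exact two-sided p-value by enumerating every sign assignment of the tie-averaged ranks.

```python
from itertools import product

def midranks(values):
    """Rank values ascending; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks

def signed_rank_exact(x, y):
    """Return (effective n, W+, exact two-sided p) after dropping zero differences."""
    nonzero = [b - a for a, b in zip(x, y) if b != a]
    ranks = midranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    hits = 0
    total = 0
    # Under the null, each nonzero difference is equally likely to be + or -.
    for signs in product((0, 1), repeat=len(nonzero)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return len(nonzero), w_plus, hits / total

# Hypothetical 0-10 ratings from 11 participants for one clip/quality pair.
implant_ear = [3, 5, 4, 6, 2, 5, 7, 4, 3, 6, 5]
normal_ear = [5, 5, 7, 6, 4, 5, 8, 4, 6, 6, 7]

effective_n, w_plus, p_value = signed_rank_exact(implant_ear, normal_ear)
print(effective_n, w_plus, p_value)
```

In this made-up example, five of the 11 pairs are identical between ears, so only six differences enter the test. With so few usable pairs the exact null distribution has very few attainable p-values, which may be part of why the results look erratic.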
From simply looking at the data, the paired t-test results seem to track well, identifying differences in means where there is visibly a large disparity in scoring, but I am hesitant to use them given the normality issue.
Any recommendations on tests to use or other ways to more comprehensively elucidate these differences in quality by clip without so many comparisons?