I have data in which participants listened to a set of stimuli and rated each one on a scale from 1 to 5. Each stimulus contains multiple sentence-like units (intonational phrases), and each unit carries a boundary tone, a categorical variable with levels such as fall, rise, and fall-rise. So each stimulus has multiple tones but only one rating per participant.
We want to see whether and how the types of boundary tones used affected the participants' ratings (the dependent variable). What complicates things is that the meaning, attitude, or emotion a tone conveys depends on the sentence it occurs in; I could code the sentence as a random effect. It therefore makes sense to me to treat each intonational phrase (sentence) as a separate data point, but then we would be treating one response (rating) as if it were multiple responses. Does this make sense?
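To make the concern concrete, here is a minimal sketch of the layout I have in mind. All participant, stimulus, and sentence names and all values are made up for illustration; they are not my real data. It expands one rating per participant and stimulus into one row per intonational phrase, which is where the same rating ends up repeated.

```python
# Hypothetical toy data: every name and value here is invented for illustration.
# One rating (1 to 5) per participant per stimulus.
ratings = {
    ("p1", "s1"): 4,
    ("p1", "s2"): 2,
    ("p2", "s1"): 5,
    ("p2", "s2"): 3,
}

# Each stimulus contains several intonational phrases (sentences),
# and each phrase carries one boundary tone.
phrases = {
    "s1": [("sent_a", "fall"), ("sent_b", "rise")],
    "s2": [("sent_c", "fall-rise"), ("sent_d", "fall"), ("sent_e", "rise")],
}

# Long format: one row per participant x stimulus x phrase.
# The rating is copied onto every phrase of the stimulus it belongs to.
long_rows = [
    {"participant": p, "stimulus": s, "sentence": sent, "tone": tone, "rating": r}
    for (p, s), r in ratings.items()
    for sent, tone in phrases[s]
]

n_responses = len(ratings)  # number of ratings actually collected
n_rows = len(long_rows)     # number of rows after expanding by phrase

print(f"actual responses: {n_responses}, rows in long format: {n_rows}")
for row in long_rows:
    print(row)
```

In this toy layout there are more rows than ratings, so a model fitted to the rows would see each rating several times unless the grouping by participant and stimulus is accounted for somehow.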
What would be the best approach to deal with this data statistically?