Are there any issues with including only a subset of the population in the paired samples? For instance, say we are comparing pain scores (ordinal values 0, 1, 2, 3, 4, 5) before and after initiating treatment. The treatment is administered only to patients with an initial pain score of 4 or 5. Is it OK to run a Wilcoxon signed-rank test (the paired version of the Wilcoxon test) when the "before" scores are all 4 or 5?
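To make the setup concrete, here is a minimal sketch of the paired before/after comparison described above. It assumes SciPy is available and uses `scipy.stats.wilcoxon`, which is the signed-rank test for paired samples. The scores are invented purely for illustration and are not real data.

```python
from scipy.stats import wilcoxon

# Hypothetical pain scores (0-5) for ten treated patients.
# Every "before" score is 4 or 5 because only those patients get treatment.
before = [4, 5, 4, 5, 4, 4, 5, 5, 4, 5]
after = [2, 3, 4, 1, 3, 2, 5, 3, 2, 4]

# Paired test on the within-patient differences (before - after).
# Zero differences are dropped under the default zero_method="wilcox".
paired_result = wilcoxon(before, after, alternative="two-sided")

print("signed-rank statistic:", paired_result.statistic)
print("p-value:", paired_result.pvalue)
```

The test itself runs without complaint on such data. The question is whether the result can be attributed to the treatment when everyone was selected for a high starting score.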
ETA based on the super helpful response below: So I guess the better comparison would be to look at the change in pain score (delta pain) in those who took the medication versus those who didn't (or, ideally, took a placebo), and compare the two groups with a Mann-Whitney U test. Does that sound alright?
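A minimal sketch of that two-group comparison, assuming SciPy is available and using `scipy.stats.mannwhitneyu`. The delta values and the group names are hypothetical, made up only to show the shape of the analysis.

```python
from scipy.stats import mannwhitneyu

# Hypothetical change in pain score (after - before) per patient.
# Negative values mean the pain score went down.
delta_treated = [-2, -1, 0, -4, -1, -2, 0, -2, -2, -1]
delta_control = [0, -1, 1, 0, -1, 0, 0, 1, -1, 0]

# The two groups are independent, so this is an unpaired rank-based test.
# With tied values, as here, SciPy falls back to a normal approximation.
group_result = mannwhitneyu(delta_treated, delta_control, alternative="two-sided")

print("U statistic:", group_result.statistic)
print("p-value:", group_result.pvalue)
```

For the comparison to be fair, the untreated (or placebo) group should presumably also be limited to patients with an initial score of 4 or 5, so that both groups start from the same range.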