I have a task in which multiple coders can apply non-mutually exclusive categories to a list of subjects, i.e., more than one category can be applied to the same subject.
What would be the most appropriate method for assessing intercoder reliability for each category? Is Fleiss' kappa suitable in this case?
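
To make the question concrete, here is a minimal sketch of what I understand "Fleiss' kappa for each category" to mean: each category is treated as a separate yes/no decision per subject, and the standard Fleiss formula is applied to those binary ratings. The function names and the toy data are made up for illustration, and the sketch assumes every coder rates every subject. I am not claiming this is the right approach; that is exactly what I am asking.

```python
from typing import Dict, List, Sequence, Set


def fleiss_kappa_binary(yes_counts: Sequence[int], n_raters: int) -> float:
    """Fleiss' kappa for one category treated as a yes/no decision.

    yes_counts[i] is how many of the n_raters coders applied the
    category to subject i (assumes every coder rated every subject).
    """
    n_subjects = len(yes_counts)
    pairs = n_raters * (n_raters - 1)
    # Observed agreement: per subject, the share of coder pairs that agree
    # (both "yes" or both "no"), averaged over subjects.
    p_bar = sum(
        (y * (y - 1) + (n_raters - y) * (n_raters - y - 1)) / pairs
        for y in yes_counts
    ) / n_subjects
    # Chance agreement from the overall proportions of "yes" and "no".
    p_yes = sum(yes_counts) / (n_subjects * n_raters)
    p_e = p_yes ** 2 + (1 - p_yes) ** 2
    if p_e == 1.0:
        # Category applied to all subjects or to none: kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


def per_category_kappa(
    codings: List[List[Set[str]]], categories: Sequence[str]
) -> Dict[str, float]:
    """codings[c][s] is the set of categories coder c applied to subject s."""
    n_raters = len(codings)
    n_subjects = len(codings[0])
    result = {}
    for cat in categories:
        # Binarize: count the coders who applied this category to each subject.
        yes_counts = [
            sum(cat in coder[s] for coder in codings) for s in range(n_subjects)
        ]
        result[cat] = fleiss_kappa_binary(yes_counts, n_raters)
    return result


# Hypothetical toy data: 3 coders, 4 subjects, categories "A" and "B".
toy_codings = [
    [{"A", "B"}, {"A"}, {"B"}, set()],
    [{"A", "B"}, {"A"}, set(), set()],
    [{"A"}, {"A", "B"}, {"B"}, set()],
]
toy_kappas = per_category_kappa(toy_codings, ["A", "B"])
```

With this binarization each category gets its own kappa, so a category on which coders agree perfectly and one on which they agree poorly are reported separately rather than averaged together.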