I work with nonhuman primates and have had them (4 total) perform a series of 3 tasks. Each of the three tasks is a two-choice categorization task, so the data are correct/incorrect for each trial. There was not a set number of trials per task; instead, each subject had to meet a criterion of 85% correct over their last 120 trials. When the program begins, it creates an array of 60 1s and 60 0s, starting the participant at 50% correct. As the trials progress, the array is updated with a 1 or 0 for each trial: the first position in the array is replaced with a 1 or 0 depending on whether the subject was correct on trial 1, position 2 is replaced when trial 2 is completed, and so on, with trial 121 wrapping around to replace position 1 again. If they complete 160 trials, for example, the criterion is evaluated over trials 41-160. Because the array starts at 50% correct, they can meet criterion in fewer than 120 trials.
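For concreteness, here is a minimal Python sketch of that criterion logic as I understand it (the names `run_task` and `get_response`, the seeding order of the 1s and 0s, and the simulated subject are my own illustrative assumptions, not the actual program):

```python
import random

WINDOW = 120          # size of the rolling window
CRITERION = 0.85      # 85% correct required within the window

def run_task(get_response, max_trials=10_000):
    """Run trials until the last WINDOW trials reach CRITERION.

    `get_response` is a hypothetical callable returning 1 (correct)
    or 0 (incorrect) for each trial.
    """
    # Seed the window with 60 ones and 60 zeros -> starts at 50% correct.
    # (The actual ordering of the seed values is assumed here.)
    window = [1] * 60 + [0] * 60
    trial = 0
    while trial < max_trials:
        outcome = get_response()               # 1 = correct, 0 = incorrect
        window[trial % WINDOW] = outcome       # trial 121 overwrites position 1, etc.
        trial += 1
        if sum(window) / WINDOW >= CRITERION:  # 85% correct in the rolling last 120
            return trial                       # trials taken to reach criterion
    return None

# Example: a simulated subject that is correct 90% of the time.
print(run_task(lambda: 1 if random.random() < 0.9 else 0))
```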
So what I have from each participant is repeated-measures binomial data from 3 tasks of unequal length (for example, a primate may have finished task 1 in 306 trials, task 2 in 115 trials, and task 3 in 144 trials). For each primate, I want to compare performance across the tasks: was performance significantly different between tasks? However, I am finding this to be impossible. I'm unable to find a test that will take binomial data with unequal sample sizes. I've looked mostly into a variety of Chi-square options, but it seems those only work well with equal sample sizes. I used Cochran's Q test to look at early performance (the first 100 trials), but I also need to be able to look at overall performance.
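To make the data structure concrete, each task for a given primate reduces to a correct count out of a different total. The sketch below uses the trial totals from the example above; the correct counts are invented purely for illustration:

```python
# Hypothetical overall data for one primate: trial totals are from the
# example above, correct counts are made up for illustration only.
results = {
    "task1": {"correct": 240, "trials": 306},
    "task2": {"correct": 101, "trials": 115},
    "task3": {"correct": 124, "trials": 144},
}

for task, r in results.items():
    p_hat = r["correct"] / r["trials"]
    print(f"{task}: {r['correct']}/{r['trials']} = {p_hat:.2%} correct")
```

This is the shape of the question: three binomial proportions per primate, each based on a different number of trials, that I want to compare within subject.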