(1) If you can stratify on the delivery method, do so.
(2) If you can stratify on the targets, do so; come up with a meaningful stratification scheme that gives you mutually exclusive categories. (You would have to tell us more about how the categories overlap for us to give meaningful advice.)
(3) If you can stratify on both of these variables, stratify on the pair-wise cells.
Whatever strata you produce, consider proportional allocation between strata (unless you have reason to expect that some of the variables of interest behave differently across strata, for example that they are much more variable in some strata than in others). That is, the sampling rate should be the same within every stratum: if you decide you want a sample size of 50, which for a population of roughly 350 papers means sampling 1 in 7, then use a sampling rate of 1/7 in each stratum. This also means that each stratum should have at least 14 units in the population, so that you can sample at least 2 papers from it in order to be able to produce variance estimates (you can't do that with just 1 observation in a stratum). Thus you may have to combine strata that are too small.
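As a minimal sketch of the allocation above: the function below applies one sampling rate to every stratum and flags the strata that are too small to yield 2 sampled papers. The function name, the stratum labels, and the stratum sizes are hypothetical, made up only for illustration; plug in your own counts.

```python
import math
from fractions import Fraction


def proportional_allocation(stratum_sizes, rate, min_sample=2):
    """Apply the same sampling rate to every stratum.

    stratum_sizes: dict mapping a stratum label to its population size.
    rate: common sampling rate, e.g. Fraction(1, 7).
    Returns (allocation, too_small), where too_small lists the strata
    that cannot yield min_sample units at this rate and should be
    combined with a neighbouring stratum.
    """
    # Smallest population size that still gives min_sample sampled units.
    min_pop = math.ceil(min_sample / rate)
    allocation = {}
    too_small = []
    for stratum, size in stratum_sizes.items():
        if size < min_pop:
            too_small.append(stratum)
        else:
            allocation[stratum] = round(size * rate)
    return allocation, too_small


# Hypothetical (delivery method, target) cells and their sizes.
sizes = {
    ("online", "students"): 70,
    ("online", "faculty"): 28,
    ("print", "students"): 7,
}
allocation, too_small = proportional_allocation(sizes, Fraction(1, 7))
print(allocation)
print(too_small)
```

With these made-up counts, the two larger cells get 10 and 4 papers, and the cell with 7 papers is flagged for merging because it falls below the 14-unit minimum.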
The sample size is entirely up to you. It is a trade-off between accuracy and cost, and only you know how much weight to put on each. If you had just one target variable of interest, and an idea of the effect size you are trying to detect, you could run a power analysis to work out the sample size. If you have several response variables and no clue about how they behave in your population of papers, then basically you would want to sample until you get tired of reading these papers.
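To give a sense of what such a power analysis looks like, here is a rough sketch for one specific case, which is an assumption on my part and not something stated in the question: comparing the mean of a single variable between two groups of papers, with the effect size expressed in standard deviation units. It uses the usual normal approximation and ignores stratification, so treat the result as a ballpark figure only. The function name and default values are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    effect_size: difference in means divided by the common standard
    deviation (an assumed value you have to supply).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(n_per_group(0.5))  # medium-sized effect, 5% level, 80% power
```

For a standardized effect of 0.5 this gives 63 papers per group; smaller effects push the required sample size up quickly, since it scales with one over the effect size squared.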