
I have a set of about 350 papers that have been categorized by type of in vivo biological target. These categories are not mutually exclusive, and each contains a different number of papers. The papers can also be categorized by the method by which a drug against the target is administered; these categories are mutually exclusive.

I want to compare the merits of the biological target categories. Should I stratify the papers by biological target, or by method of drug administration, or should I just take a simple random sample?

Similarly, if I were to do the converse - compare the merits of the methods of administration - how should I sample?

Finally, bearing in mind that the method of assessing each category is laborious, what sample size should I be using?

user1447630
  • Clarifications: (1) Are the categorizations of the targets and delivery methods already available? Options: (a) yes, I already have them; (b) no, but I can quickly get them for all 350 papers by scanning the abstracts; (c) no, I have to read each paper in depth to get them. (2) What are the variables of interest? That is, for the papers that you select for in-depth analysis, what will you get out of that analysis? E.g., effect size, cost of the study, number of citations, etc. – StasK Jun 28 '12 at 12:24
  • (1) (a); (2) The variables or attributes of interest are specific measures, such as ratio of adverse events to patients. – user1447630 Jun 28 '12 at 22:55

1 Answer

(1) If you can stratify on the delivery method, do so.

(2) If you can stratify on the targets, do so; come up with a meaningful stratification strategy that would give you mutually exclusive categories. (You would have to tell more about how the categories overlap for us to provide meaningful advice.)

(3) If you can stratify on both of these variables, stratify on the pair-wise cells.

Whatever strata you produce, consider proportional allocation between strata (unless you have reasons to expect that some of the variables of interest behave differently across strata). That is, the sampling rate should be the same within every stratum: if you decide on a sample size of 50 out of your 350 papers, the sampling rate in each stratum would be 50/350 = 1/7. This also means that each stratum should contain at least 14 papers in the population, so that you can sample at least 2 papers from it and thus produce variance estimates (you can't do that with just 1 observation in a stratum). You may therefore have to combine strata that are too small.
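To make the allocation concrete, here is a minimal Python sketch of proportional allocation with small strata combined first. The stratum labels and their sizes are made up for illustration (they only add up to the 350 papers mentioned in the question), and the helper names `merge_small_strata` and `proportional_sample` are hypothetical, not part of any library.

```python
import random
from collections import defaultdict


def merge_small_strata(stratum_of, min_size, merged_label="merged"):
    """Relabel every stratum with fewer than min_size papers as merged_label."""
    counts = defaultdict(int)
    for label in stratum_of.values():
        counts[label] += 1
    return {
        paper: (label if counts[label] >= min_size else merged_label)
        for paper, label in stratum_of.items()
    }


def proportional_sample(stratum_of, rate, seed=0):
    """Draw a simple random sample at the same rate within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for paper, label in stratum_of.items():
        strata[label].append(paper)
    sample = {}
    for label, members in sorted(strata.items()):
        n = round(len(members) * rate)  # rounding can shift the total slightly
        sample[label] = rng.sample(sorted(members), n)
    return sample


# Hypothetical population: 350 paper ids with invented delivery-method labels.
sizes = {"oral": 140, "intravenous": 126, "topical": 70, "inhaled": 7, "implant": 7}
stratum_of = {}
paper_id = 0
for label, size in sizes.items():
    for _ in range(size):
        stratum_of[paper_id] = label
        paper_id += 1

# A rate of 1/7 needs at least 14 papers per stratum to yield 2 sampled papers.
merged = merge_small_strata(stratum_of, min_size=14)
sample = proportional_sample(merged, rate=1 / 7)
for label, chosen in sample.items():
    print(label, len(chosen))
```

With real data the rounded stratum sizes may not add up exactly to the target sample size, and which small strata to combine should be a substantive decision rather than the catch-all used here.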

The sample size is entirely up to you. It is a trade-off between accuracy and cost, and only you know how to weigh the two. If you had just one target variable of interest and an idea of the effect size you are trying to detect, you could run a power analysis to work out the sample size. If you have several response variables and no clue about how they behave in your population of papers, then basically you would want to sample until you get tired of reading these papers.
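For the single-variable case, a rough power calculation could look like the sketch below. It is only an illustration under strong assumptions: a two-sided comparison of two group means, a standardized effect size (difference in means divided by the common standard deviation), and the normal approximation. It ignores stratification and the fact that the population is only 350 papers. The function name `n_per_group` and the default values of `alpha` and `power` are my own choices.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate papers needed per group to detect a standardized
    difference in means (normal approximation, two-sided test)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Smaller effects require many more papers per group.
for d in (0.3, 0.5, 0.8):
    print(d, n_per_group(d))
```

Since the required size grows with the inverse square of the effect size, even a crude guess at the effect you care about tells you whether reading a few dozen papers is likely to be enough.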

StasK