
Why do some single-cell RNA-seq papers give the number of cells sampled as a fraction of the size of the organ, animal, or embryo, as opposed to just saying how many cells they sampled?

For instance, Wagner et al.'s paper "Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo" says "For different developmental stages, we sampled 0.17x to 0.97x of the total cells per embryo, sufficient to detect cell states as rare as 0.1-0.5% of all cells." A pollster would never write "We sampled 4 out of every 10,000 registered voters in the USA"; they'd just give the number. Statistical results on cluster detection (e.g. Disentangling Gaussians by Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant) are also phrased in terms of number of samples, not this ratio of sample size to organism size.

It's worth mentioning that these studies use low-yield techniques. To sequence 0.97 times the number of cells in a zebrafish embryo, Wagner et al. had to sequence at least 50-100 embryos. So it's more like sampling 97 of 100 cells with replacement than like sampling 97 of 100 cells without replacement.
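To make the with/without-replacement distinction concrete, here is a minimal sketch of the probability that a rare cell state is missed entirely under each model. The embryo size of 1,000 cells and the function names are hypothetical, chosen only for illustration; they are not figures or methods from Wagner et al.

```python
from math import comb


def p_miss_with_replacement(freq, n_sampled):
    """Chance that n_sampled independent draws all miss a state of frequency freq.

    Models pooling a few cells from each of many embryos, so every draw
    behaves like an independent draw from the same composition.
    """
    return (1.0 - freq) ** n_sampled


def p_miss_without_replacement(n_total, n_rare, n_sampled):
    """Chance that n_sampled distinct cells from one embryo miss all n_rare cells.

    Hypergeometric probability of drawing zero rare cells.
    """
    if n_sampled > n_total:
        raise ValueError("cannot sample more distinct cells than exist")
    return comb(n_total - n_rare, n_sampled) / comb(n_total, n_sampled)


if __name__ == "__main__":
    # Hypothetical embryo: 1,000 cells, 5 of them (0.5%) in the rare state,
    # with 970 cells sampled (a ratio of 0.97).
    n_total, n_rare, n_sampled = 1000, 5, 970
    print(p_miss_with_replacement(n_rare / n_total, n_sampled))
    print(p_miss_without_replacement(n_total, n_rare, n_sampled))
```

Under the with-replacement model the result depends only on the number of cells sampled and the frequency of the state, not on the size of the embryo; the ratio matters only in the without-replacement model.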

eric_kernfeld

1 Answer


The typical goal of scRNA-seq is to assess the possible cell types and states that exist in an organism, tissue, etc. Given that, it can be useful to know what portion of the possible cells has been assayed, since that sets bounds on the number and size of cell populations not assayed. If the goal of a study is instead to assess the variability within a single cell type, then you're correct that mentioning "0.8% of the population was sequenced" isn't actually informative. Note that "within a single cell type" is important there: if one is assaying multiple cell types, then enough cells of each must be assayed given their prevalence in the relevant population.

Practically speaking, numbers like this are only ever written in papers if they look impressive, so no one is going to write about sequencing 0.000001% of the cells in an organism, since the reader/reviewer will then start to question the relevance of the results.
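As a rough sketch of how such bounds could be written down, the arithmetic below assumes an idealized setting that does not hold for pooled, low-yield protocols: a single organism, with every sampled cell distinct. The function name and the example numbers are hypothetical.

```python
def unassayed_bounds(total_cells, sampled_cells, min_population_size):
    """Crude upper bounds on populations that could have been missed entirely.

    Assumes sampled_cells distinct cells were taken from one organism
    containing total_cells cells (an idealization).
    Returns (largest possible missed population, maximum number of missed
    populations that each contain at least min_population_size cells).
    """
    if not 0 <= sampled_cells <= total_cells:
        raise ValueError("sampled_cells must be between 0 and total_cells")
    if min_population_size < 1:
        raise ValueError("min_population_size must be at least 1")
    unsampled = total_cells - sampled_cells
    # A fully missed population must fit inside the unsampled cells.
    return unsampled, unsampled // min_population_size


if __name__ == "__main__":
    # Hypothetical organism of 1,000 cells with 970 distinct cells sampled,
    # assuming a population has at least 5 cells.
    print(unassayed_bounds(1000, 970, 5))
```

Any numbers obtained this way depend entirely on the assumed minimum population size, so they are indicative at best.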

Devon Ryan
  • Could I ask you for more specifics on one point? You say this proportion they give, e.g. the 0.97, "sets bounds on the number and size of cell populations not assayed." How would you compute those bounds? – eric_kernfeld Aug 22 '18 at 12:27
  • Normally you wouldn't compute those bounds, but if you really wanted to you'd have to assume some minimum size of a cell population and divide the total number of cells by it. Any numbers you'd get would be completely hand-wavy, of course, but that's how it goes in biology. – Devon Ryan Aug 22 '18 at 13:32
  • When you say "the total number of cells", you mean the total number of cells sequenced, right? If so, then this doesn't answer my question. My question is about why it is useful to express the number of cells sampled as a fraction of the size of the system under study. – eric_kernfeld Aug 22 '18 at 17:38
  • No, not the total number sequenced, the total number in the organism/tissue/etc. – Devon Ryan Aug 22 '18 at 19:33
  • It makes sense that you could use the total number in the organism/tissue to do what you're saying, but that is also not the ratio that I am asking about. I still don't see why Wagner et al.'s 0.97 is useful. – eric_kernfeld Aug 22 '18 at 20:12
  • It's one of the ratios you're asking about. The 0.97 value makes sense if the fish were essentially genetically identical and from the same brood and the cells sampled from a given embryo are truly randomly selected. – Devon Ryan Aug 22 '18 at 22:11
  • I'm sorry, I can't understand your answer at all and I am not sure how to ask more clearly. Thanks for trying. – eric_kernfeld Aug 23 '18 at 13:08
  • Your difficulty is in thinking that the discussion you linked to on Cross Validated is relevant to the typical goals of scRNA-seq. It's not. Tissues/organisms aren't a population that we want to estimate some mean and confidence interval from; that's implicitly what scRNA-seq is trying not to do. – Devon Ryan Aug 23 '18 at 14:46
  • That's very far from explaining why the 0.97 is useful, but it's a good point and I may edit my question to make reference to some more relevant statistical discussion. – eric_kernfeld Aug 23 '18 at 17:27