Is it possible to merge scRNAseq data from experiments with different design?

Question

I have 4 different single-cell RNA-seq experiments, each one representing a different sample of a cell-type population. I'd like to merge them into a single dataset. However, different cell types are enriched in each sample, and the samples can also represent different conditions (control and disease).

If I run a PCA on the merged dataset (without any processing of the data), I find that cells from the different samples cluster separately. This could be due to biological and technical (i.e. batch-effect) variation.

Is it possible to merge these different samples while retaining a good amount of information or should they be analyzed separately?

Are there any samples (controls, perhaps) in both experiments? What is your goal in merging them: to compare a cell line from experiment A with a cell line from experiment B? – llrs, Feb 19 '18 at 21:21
No controls unfortunately. My goal by merging them is basically to find different sub-populations in the data and markers by exploiting a larger dataset. This data was presented to me after sequencing, and I think that it is only possible to use each experiment separately, but I'd like to be proven wrong - at least in some specific analysis. — gc5, Feb 19 '18 at 21:26
I'm not familiar enough with the methods for scRNA-seq, sorry. But why do you think that increasing the number of cells (from different experiments) will improve your statistical power to find new or different sub-populations? (Perhaps it is specific to the methods, but I think they should already be present in each individual dataset.) – llrs, Feb 19 '18 at 21:28
I am thinking of sampling cells from different parts of the same organ as a proxy for spatial resolution, for instance. The ability to merge different datasets may give the opportunity to compare the different cell populations with added spatial heterogeneity. However, this is just an example; in my case I was wondering whether merging could add more information. – gc5, Feb 19 '18 at 22:59
How different is the experimental design? cellranger supports “reanalyse” across samples but I’m not sure this applies with different platforms generating the data. — Tom Kelly, Jun 02 '18 at 16:20
@TomKelly they are generated with the same platform on 10X, but they are enriched differently using markers from FACS, so I think it is not possible. — gc5, Jun 04 '18 at 16:10
@TomKelly which is the tool within cellranger to merge different samples? I was looking at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/reanalyze but it seems to me it cannot be used to merge samples — gc5, Jun 06 '18 at 20:46
I remember attending a seminar where scientists merged data from different types of single-cell experiments (I think from the same platform, though). It was long ago, so my memory might not be correct on this, but I do recall that this functionality was planned for Seurat, although I haven't yet had any luck finding documentation on https://satijalab.org/seurat/. In addition, I think that someone at the seminar mentioned using WGCNA to map batch effects. Since my memory is not clear on this subject, I don't dare post this as an answer though! – Kasper Thystrup Karstensen, Jun 07 '18 at 12:15
This can possibly be of interest: https://www.ncbi.nlm.nih.gov/pubmed/29608177 - Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors — Kasper Thystrup Karstensen, Jun 07 '18 at 12:18

score 2 · Accepted Answer · answered Jun 06 '18 at 21:47

It’s right there in the cellranger manual:

#aggregate results of counts for separate samples
cellranger aggr
#analyse the combined results
cellranger reanalyze

Note: I’m not sure this applies to the above case where batch effects could be an issue. However, this is how the analysis is performed for 10X data from multiple runs in principle.

Tom Kelly

Adding to this answer months later: it does not work with different designs, it just runs depth normalization but it can be severely affected by batch effects. – gc5 Mar 04 '19 at 21:03
Yes, batch effects must be considered. Since this question was specifically about combining datasets, that is what was described here. Aggregating is the first step towards adjusting for batch effects; there are several methods that do this on an aggregated dataset, and there are separate questions on this already (these methods assume some common cells between samples, or they will overfit). I recommend aggregating the raw data rather than downsampling. Depth normalisation does not remove batch effects or background noise in each sample. – Tom Kelly Mar 05 '19 at 00:42
I agree. As you said, methods like ComBat are not suitable for experiments with different design because they require a balanced design (some common cells between samples). Thanks. – gc5 Mar 05 '19 at 15:58
The 3.0 release of the Seurat R package (currently in alpha) will have extensive support for combining data from different experiments. – Tom Kelly Mar 05 '19 at 23:19
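
To illustrate the point made in these comments, that equalizing depth leaves gene-specific batch effects untouched, here is a minimal Python sketch. It is not cellranger's own procedure: cellranger aggr equalizes depth by subsampling reads, whereas this toy example uses simple per-cell scaling as a stand-in. The expression profile, the `gene_bias` vector and the `depth_normalize` helper are made-up assumptions for illustration only.

```python
import numpy as np

def depth_normalize(counts, target=10_000):
    """Scale each cell (row) so that its counts sum to `target`."""
    return counts / counts.sum(axis=1, keepdims=True) * target

# Hypothetical expression profile of one cell type over 4 genes.
profile = np.array([100.0, 200.0, 300.0, 400.0])

# Hypothetical gene-specific batch effect (assumption, not real data).
gene_bias = np.array([1.0, 1.0, 3.0, 0.5])

batch_1 = np.tile(profile, (3, 1))                     # reference batch
batch_deep = np.tile(profile * 2.0, (3, 1))            # same cells, 2x depth only
batch_biased = np.tile(profile * gene_bias * 2.0, (3, 1))  # 2x depth + gene bias

norm_1 = depth_normalize(batch_1)
norm_deep = depth_normalize(batch_deep)
norm_biased = depth_normalize(batch_biased)

# Per-gene ratio of mean normalized expression relative to batch 1.
ratio_depth_only = norm_deep.mean(axis=0) / norm_1.mean(axis=0)
ratio_biased = norm_biased.mean(axis=0) / norm_1.mean(axis=0)

print("depth-only difference after normalization:", ratio_depth_only)
print("gene-specific bias after normalization:   ", ratio_biased)
```

In this toy setting a pure depth difference disappears after normalization, while the per-gene ratios of the biased batch stay away from 1, which is why a dedicated batch-correction step is still needed on the aggregated data.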

score 2 · Answer 2 · answered Jun 07 '18 at 12:38

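See this paper from the Marioni group ("Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors", https://www.ncbi.nlm.nih.gov/pubmed/29608177), where they propose a method for correcting batch effects between single-cell sequencing experiments when each experiment contains different sub-populations: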

Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
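
The core idea in the abstract, pairing cells that are mutual nearest neighbours across batches, can be sketched in a few lines of Python. This is a toy illustration, not the published implementation: the function name `mutual_nearest_neighbors`, the three made-up populations X, Y and Z, and the simulated batch shift are all assumptions, and the single global correction vector used here is a deliberate simplification (the paper describes locally varying correction vectors).

```python
import numpy as np

def mutual_nearest_neighbors(batch_a, batch_b, k=3):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbours (Euclidean distance)."""
    # Pairwise squared distances, shape (n_a, n_b).
    d = ((batch_a[:, None, :] - batch_b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]    # for each A cell: k nearest B cells
    nn_ba = np.argsort(d, axis=0)[:k, :].T  # for each B cell: k nearest A cells
    pairs = []
    for i in range(d.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs

rng = np.random.default_rng(0)

def sample(center, n=5):
    """Draw n toy cells around a population centre (3 'genes')."""
    return np.asarray(center, dtype=float) + rng.normal(scale=0.1, size=(n, 3))

# Batch A holds populations X and Y; batch B holds X and Z.
# Only X is shared, mirroring the "different design" situation.
batch_a = np.vstack([sample([0, 0, 0]), sample([10, 0, 0])])
batch_effect = np.array([0.0, 0.0, 3.0])  # simulated technical shift
batch_b = np.vstack([sample([0, 0, 0]), sample([0, 10, 0])]) + batch_effect

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=3)

# Simplification: estimate one global batch vector from the MNN pairs.
shift = np.mean([batch_b[j] - batch_a[i] for i, j in pairs], axis=0)
corrected_b = batch_b - shift

print("MNN pairs (A index, B index):", pairs)
print("estimated batch vector:", shift)
```

In this toy data the first five cells of each batch belong to the shared population X, so the pairs link only those cells, and the batch-specific populations Y and Z do not distort the estimated shift. That is the property that makes the approach attractive when the samples were enriched for different cell types.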

score 1 · Answer 3 · answered Jun 27 '19 at 03:11

Have you tried Seurat v3? Here is the reference: https://satijalab.org/seurat/v3.0/pancreas_integration_label_transfer.html

Zhi-Ping Feng
