3

cellranger aggr can combine multiple libraries (samples), and it appends an integer suffix to each barcode (e.g. AGACCATTGAGACTTA-1). The sample identity itself is not recorded in the combined matrix.mtx file.

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/aggregate#gem_groups

After loading the data into R (e.g. with Seurat::Read10X()), how can we keep the sample information and assign it to each cell?

Peter

2 Answers

1

You have to keep track of the order in which you listed the files for aggr. The numeric suffix indicates which of the aggregated samples a barcode came from. You could store this information in a text file, load it into R independently of the Read10X function, and combine it with the cell names to get their sample names. From the page you link:

Gem Groups

This is an integer that is appended to each barcode in the gene-barcode matrix. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different libraries, despite having the same nucleotide sequence.

The numbering of the GEM groups will reflect the order that the libraries were provided in the Aggregation CSV.
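As a rough illustration of that mapping, here is a minimal base-R sketch. The sample names, the barcodes, and the column name `library_id` are assumptions made for the example; in practice the table would be read from your own aggregation CSV, whose header may differ.

```r
# Hypothetical stand-in for the aggregation CSV, with rows in the same
# order as they were given to cellranger aggr.
aggr <- data.frame(
  library_id = c("control", "treated"),
  stringsAsFactors = FALSE
)

# Example barcodes as they would appear in the aggregated matrix.
barcodes <- c("AGACCATTGAGACTTA-1", "AGACCATTGAGACTTA-2", "CCGTACTAGTTGCATC-1")

# Drop everything up to the last hyphen and read the suffix as an integer.
gem_group <- as.integer(sub(".*-", "", barcodes))

# Suffix i corresponds to row i of the aggregation CSV.
sample_of_cell <- setNames(aggr$library_id[gem_group], barcodes)
```

The result is a character vector named by barcode, which can be turned into a one-column data frame if a function expects per-cell metadata in that form.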

llrs
GWW
1

An example solution for Seurat:

Retrieve the sample IDs from the .csv used with cellranger aggr:

samples <- read.csv(file.path("/path/to/csv", "nameof.csv"), stringsAsFactors=FALSE)

Load the 10x dataset and initialize the Seurat object:

cells.data <- Read10X("path/to/filtered_gene_bc_matrices")

cells <- new("seurat", raw.data=cells.data)

Get the barcodes and their suffixes:

cellcodes <- data.frame(barcodes=colnames(cells@raw.data), stringsAsFactors=FALSE)
rownames(cellcodes) <- cellcodes$barcodes

# Read the suffix as an integer so that it indexes the rows of the .csv directly
cellcodes$libcodes <- as.integer(sub(pattern=".*-", replacement="", cellcodes$barcodes))
cellcodes$samples <- samples$library_id[cellcodes$libcodes]
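One caveat when looking up samples by suffix: the suffix is best treated as an integer rather than a factor. Indexing a vector with a factor uses the factor's internal codes, and those follow the sorted order of the levels as character strings, so "10" sorts before "2". A small base-R sketch with hypothetical sample names shows the difference once there are ten or more libraries:

```r
# Hypothetical sample names for 11 aggregated libraries.
library_id <- paste0("sample", 1:11)

# Hypothetical barcodes carrying suffixes 1, 2, 10 and 11.
barcodes <- c("AAAC-1", "AAAC-2", "AAAC-10", "AAAC-11")
suffix <- sub(".*-", "", barcodes)

# Factor indexing uses the level codes, which follow character sort order.
by_factor <- library_id[as.factor(suffix)]

# Integer indexing uses the suffix value itself.
by_integer <- library_id[as.integer(suffix)]

# TRUE where the two approaches disagree.
mismatch <- by_factor != by_integer
```

The same shift can occur with fewer libraries if one of them contributes no cells to the filtered matrix, because its suffix is then missing from the factor levels.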

Create a data frame for the meta.data argument and set up the object:

sampleidentity <- cellcodes["samples"]

cells <- Setup(cells,
               meta.data=sampleidentity,
               min.cells=3,
               min.genes=200,
               do.logNormalize=TRUE, total.expr=1e4, project="projectname")

The sample identity is now stored in the "samples" metadata column, so you can pass it to the group.by argument of various Seurat functions.

Peter