
I need to implement a parallel version of an R script because it takes forever to execute on one core. The function that takes the most time is scran::quickCluster(), applied to an integer sparse matrix (UMI_count) of dimensions 26000x66000. The code is presented below:

library(scran)
library(scater)
sce <- newSCESet(countData=UMI_count)
clusters <- quickCluster(sce)
sce <- computeSumFactors(sce, cluster=clusters)

I did not find much on parallel versions of the quickCluster() function, so will it actually execute in parallel if I set the number of cores to more than one? I am trying to normalize a heterogeneous dataset with multiple cell types, which is why I am using quickCluster().

I just need to speed it up enough to be able to compute it.

llrs
Nikita Vlasenko
  • Do you really have to use a sparse matrix? A matrix of that size with 64-bit integers is still <20 GB and would not have any of the annoyances of a sparse matrix. – Devon Ryan Jan 04 '18 at 15:30
  • But will it speed up the quickCluster() function in any way? I can convert it to a regular matrix on the cluster (which gives up to 120 GB of RAM). – Nikita Vlasenko Jan 04 '18 at 17:04
  • I suspect it will; it's generally easier to work with normal matrices. – Devon Ryan Jan 04 '18 at 18:36
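As a rough sanity check on the comment thread, the dense-matrix footprint can be estimated directly (a sketch; `UMI_count` is the matrix from the question, and the 8-bytes-per-entry figure assumes the default numeric storage of a base R matrix):

```r
## A dense numeric matrix stores 8 bytes per entry, so the question's
## 26000 x 66000 matrix needs roughly 26000 * 66000 * 8 bytes,
## i.e. about 13.7 GB -- comfortably below the 120 GB cluster node.
bytes <- 26000 * 66000 * 8
bytes / 1e9   # gigabytes

## The conversion itself would then be a one-liner:
## dense <- as.matrix(UMI_count)
```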

1 Answer


The scran::quickCluster() method has a BPPARAM argument (a BiocParallelParam object), so if one provides something like

clusters <- quickCluster(sce, BPPARAM=BiocParallel::MulticoreParam())

it will use all available cores (a specific number of workers can also be given). See the introductory vignette of BiocParallel for info on other options.
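On a shared cluster it is usually better to pin an explicit worker count rather than grabbing every core. A minimal sketch (the worker count of 4 is an arbitrary assumption; note that MulticoreParam relies on forking and so does not work on Windows, where SnowParam would be the substitute):

```r
library(BiocParallel)
library(scran)

## Explicitly request 4 forked workers, matching the cores reserved
## for the job; MulticoreParam() with no arguments uses all cores.
bp <- MulticoreParam(workers = 4)

## Only the clustering step is parallelized here, exactly as in the
## question's pipeline.
clusters <- quickCluster(sce, BPPARAM = bp)
sce <- computeSumFactors(sce, cluster = clusters)
```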

merv