
I need to implement a parallel version of an R script because it takes forever to execute on one core. The function that takes the most time is scran::quickCluster(), applied to an integer sparse matrix (UMI_count) of dimensions 26000x66000. The code is presented below:

library(scran)
library(scater)
sce <- newSCESet(countData=UMI_count)
clusters <- quickCluster(sce)
sce <- computeSumFactors(sce, cluster=clusters)

I did not find much on parallel versions of the quickCluster() function, so will it actually execute in parallel if I set the number of cores to more than one? I am trying to normalize a heterogeneous dataset with multiple cell types, which is why I am using quickCluster().

I just need to speed it up enough to be able to compute it.

llrs
Nikita Vlasenko
  • Do you really have to use a sparse matrix? A matrix of that size with 64-bit integers is still <20 GB and would not have any of the annoyances of a sparse matrix. – Devon Ryan Jan 04 '18 at 15:30
  • But will it speed up the quickCluster() function in any way? I can convert it to a regular matrix on the cluster (which gives up to 120 GB of RAM). – Nikita Vlasenko Jan 04 '18 at 17:04
  • I suspect it will; it's generally easier to work with normal matrices. – Devon Ryan Jan 04 '18 at 18:36
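As a rough sanity check on the comment thread, the dense-matrix footprint can be estimated directly (a sketch; `UMI_count` is the matrix from the question, and the 8-bytes-per-entry figure assumes the default numeric storage of a base R matrix):

```r
## A dense numeric matrix stores 8 bytes per entry, so the question's
## 26000 x 66000 matrix needs roughly 26000 * 66000 * 8 bytes,
## i.e. about 13.7 GB -- comfortably below the 120 GB cluster node.
bytes <- 26000 * 66000 * 8
bytes / 1e9   # gigabytes

## The conversion itself would then be a one-liner:
## dense <- as.matrix(UMI_count)
```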

1 Answer


The scran::quickCluster() method has a BPPARAM argument (a BiocParallelParam object), so if one provides something like

clusters <- quickCluster(sce, BPPARAM=BiocParallel::MulticoreParam())

it will use all available cores (a specific number of workers can also be given). See the introductory vignette of BiocParallel for info on other options.
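On a shared cluster it is usually better to pin an explicit worker count rather than grabbing every core. A minimal sketch (the worker count of 4 is an arbitrary assumption; note that MulticoreParam relies on forking and so does not work on Windows, where SnowParam would be the substitute):

```r
library(BiocParallel)
library(scran)

## Explicitly request 4 forked workers, matching the cores reserved
## for the job; MulticoreParam() with no arguments uses all cores.
bp <- MulticoreParam(workers = 4)

## Only the clustering step is parallelized here, exactly as in the
## question's pipeline.
clusters <- quickCluster(sce, BPPARAM = bp)
sce <- computeSumFactors(sce, cluster = clusters)
```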

merv