
I would like to use the SVD routine of CUDA 7.0 (cuSolver). I need to compute the SVD of every block into which I split the matrix: for example, if I divide the matrix into 2x2 blocks, I want to compute four SVDs in parallel. The idea is to call the cuSolver SVD routine once per block, like this:
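Before the per-block calls can run side by side, each block has to be located inside the full matrix. A minimal sketch, assuming the matrix is stored column-major (as cuSolver expects) with M rows and N columns, split into a grid of equal blocks: block (bi, bj) then starts at element offset bi*mb + bj*nb*M, and it can be passed to the SVD routine in place with leading dimension lda = M, so no copy is needed. `BlockView` and `blockView` are hypothetical helpers, not part of cuSolver.

```cpp
#include <cstddef>

// Hypothetical helper: describes one sub-block of a column-major matrix.
struct BlockView {
    std::size_t offset; // element offset of the block's top-left entry
    int rows;           // rows in the block (m for gesvd)
    int cols;           // columns in the block (n for gesvd)
    int lda;            // leading dimension: row count of the full matrix
};

// Block (bi, bj) of an M x N column-major matrix split into a
// gridRows x gridCols grid of equal blocks (M, N assumed divisible).
BlockView blockView(int M, int N, int gridRows, int gridCols, int bi, int bj) {
    const int mb = M / gridRows;  // rows per block
    const int nb = N / gridCols;  // columns per block
    BlockView v;
    // column-major storage: element (r, c) lives at r + c * M
    v.offset = static_cast<std::size_t>(bi) * mb
             + static_cast<std::size_t>(bj) * nb * static_cast<std::size_t>(M);
    v.rows = mb;
    v.cols = nb;
    v.lda = M;
    return v;
}
```

For an 8x6 matrix split 2x2, block (1, 1) starts at 4 + 3*8 = 28 and is 4x3 with lda = 8, so `d_A + 28` could be handed to `cusolverDnSgesvd` with m = 4, n = 3, lda = 8. Keep in mind that cuSolver's gesvd only supports m >= n, so the blocks must be at least as tall as they are wide.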

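for (int istart = 0; istart < nBlockRows; ++istart) {     // nBlockRows: blocks per column (illustrative name)
   for (int jstart = 0; jstart < nBlockCols; ++jstart) {  // nBlockCols: blocks per row (illustrative name)
       // call the cuSolver SVD routine on block (istart, jstart)
   }
}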

But this way the calls are issued one after another, not in parallel. Since cuSolver functions cannot be called from inside a kernel (device code), how can I run these calls in parallel?
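One way to let the calls overlap without calling cuSolver from device code is to keep issuing them from the host, but put each one into its own CUDA stream via `cusolverDnSetStream`, with a private workspace per stream. A minimal sketch, assuming single precision, a column-major M x N device matrix whose blocks are at least as tall as they are wide (gesvd only supports m >= n), and that only the singular values are needed; `svdBlocksOnStreams` is a hypothetical helper. Whether the SVDs actually run concurrently is not guaranteed: it depends on whether gesvd synchronizes internally and on whether one SVD's kernels already fill the GPU, so check the timeline with a profiler such as nvprof or Nsight.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Hypothetical helper: SVD of every block of an M x N column-major device
// matrix d_A, one stream + one cuSolver handle per block. d_A is overwritten
// (gesvd destroys its input). d_S receives nb singular values per block,
// blocks stored in row-major grid order. Returns 0 if all blocks converged.
int svdBlocksOnStreams(float* d_A, int M, int N, int gridRows, int gridCols,
                       float* d_S) {
    const int mb = M / gridRows, nb = N / gridCols;  // block size
    if (mb < nb) return -1;                          // gesvd needs m >= n
    const int nBlocks = gridRows * gridCols;

    std::vector<cudaStream_t> streams(nBlocks);
    std::vector<cusolverDnHandle_t> handles(nBlocks);
    std::vector<float*> work(nBlocks), rwork(nBlocks);
    int* d_info = nullptr;
    cudaMalloc(&d_info, nBlocks * sizeof(int));

    // All blocks share one shape, so they need the same workspace size.
    int lwork = 0;
    cusolverDnCreate(&handles[0]);
    cusolverDnSgesvd_bufferSize(handles[0], mb, nb, &lwork);

    for (int b = 0; b < nBlocks; ++b) {
        if (b > 0) cusolverDnCreate(&handles[b]);
        cudaStreamCreate(&streams[b]);
        cusolverDnSetStream(handles[b], streams[b]);  // bind handle to its own stream
        cudaMalloc(&work[b], lwork * sizeof(float));   // private workspace per stream
        cudaMalloc(&rwork[b], nb * sizeof(float));
    }

    // Enqueue every SVD; each goes into a different stream.
    for (int bi = 0; bi < gridRows; ++bi) {
        for (int bj = 0; bj < gridCols; ++bj) {
            const int b = bi * gridCols + bj;
            float* blockA = d_A + bi * mb + (std::size_t)bj * nb * M;  // lda = M
            // jobu = jobvt = 'N': U and VT are not computed or referenced
            cusolverDnSgesvd(handles[b], 'N', 'N', mb, nb, blockA, M,
                             d_S + (std::size_t)b * nb, nullptr, mb, nullptr, nb,
                             work[b], lwork, rwork[b], d_info + b);
        }
    }
    cudaDeviceSynchronize();  // wait for all streams

    std::vector<int> info(nBlocks);
    cudaMemcpy(info.data(), d_info, nBlocks * sizeof(int), cudaMemcpyDeviceToHost);
    int failed = 0;
    for (int b = 0; b < nBlocks; ++b) {
        if (info[b] != 0) failed = 1;
        cudaFree(work[b]);
        cudaFree(rwork[b]);
        cusolverDnDestroy(handles[b]);
        cudaStreamDestroy(streams[b]);
    }
    cudaFree(d_info);
    return failed;
}
```

One handle per stream keeps the streams fully independent; a single handle re-bound with `cusolverDnSetStream` before each call would also work, as long as every call still gets its own workspace. If a newer toolkit is an option, CUDA 9.0 and later also offer a batched Jacobi SVD (`cusolverDnSgesvdjBatched`) for many small matrices (up to 32x32), which avoids managing streams by hand.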

Robert Crovella
sim186
