
I have a data-processing pipeline consisting of 3 methods (let's say A(), B(), and C(), applied sequentially) for an input text file, but I have to repeat this pipeline for 10000 different files. So far I have used ad-hoc multithreading: creating 10000 threads and adding them all to a thread pool. Now I want to switch to Spark to get this parallelism.
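
For context, here is a simplified sketch of what I do now (the directory names are made up and the A/B/C bodies are placeholders for my real methods; I've also shown a bounded pool rather than literally 10000 threads):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class AdHocPipeline {

        // Placeholders for my real A(), B(), C(); each transforms the file's text.
        static String A(String text) { return text; }
        static String B(String text) { return text; }
        static String C(String text) { return text; }

        public static void main(String[] args) throws Exception {
            // Gather the ~10000 input files ("input-dir" is a made-up path).
            List<Path> files;
            try (Stream<Path> paths = Files.list(Paths.get("input-dir"))) {
                files = paths.collect(Collectors.toList());
            }
            Files.createDirectories(Paths.get("output-dir"));

            // One task per file submitted to a fixed-size pool.
            ExecutorService pool =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

            for (Path file : files) {
                pool.submit(() -> {
                    try {
                        String text = Files.readString(file);
                        String result = C(B(A(text)));  // run the three stages in order
                        Files.writeString(
                                Paths.get("output-dir", file.getFileName() + ".out"), result);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                });
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
        }
    }

My questions are: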

  1. If Spark can do a better job, please guide me through the basic steps, since I'm new to Spark (my rough guess at the Spark version is sketched after this list).
  2. If I keep the ad-hoc multithreading and deploy it on a cluster, how can I manage resources so that the threads are spread evenly across the nodes? I'm new to HPC systems too.
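
From reading the Spark docs, I imagine the whole pipeline collapses to something like this (a rough guess only; wholeTextFiles is the API I found for reading many small files, and the hdfs:// paths and A/B/C bodies again stand in for my real setup):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkPipeline {

        // Same placeholder stages as above.
        static String A(String text) { return text; }
        static String B(String text) { return text; }
        static String C(String text) { return text; }

        public static void main(String[] args) {
            // The master URL would come from spark-submit on the cluster.
            SparkConf conf = new SparkConf().setAppName("file-pipeline");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // wholeTextFiles yields one (path, content) pair per file and
                // partitions the 10000 files across the cluster's executors.
                JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///input-dir");

                // Apply the pipeline to each file's content and write the
                // (path, result) pairs out; Spark schedules all the tasks.
                files.mapValues(text -> C(B(A(text))))
                     .saveAsTextFile("hdfs:///output-dir");
            }
        }
    }

If I understand the model correctly, Spark would then decide how many of these per-file tasks run concurrently on each node, which is the resource management I'd otherwise have to do by hand (question 2).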

I hope I'm asking the right questions. Thanks!
