
I need to run a parameter sweep (grid search) for a model written in Python, where most of the heavy computation is implemented in Cython. I have thousands of input parameter sets to run through. I have a local desktop machine with 36 cores, each hyper-threaded, for a total of 72 parallel threads, and 128 GB of RAM. Each run can take up to 2 GB of RAM.

What is the most efficient way to run concurrent jobs to do the parameter sweep? I have done it in two different ways:

  1. Using Python multiprocessing:
from multiprocessing import Pool

if __name__ == "__main__":
    # up to 70 worker processes, each call runs my_model on one parameter set
    pool = Pool(processes=70)
    pool.map(my_model, sweep_params)
    pool.close()
    pool.join()

where my intent is to use up to 70 concurrent processes at a time, each running my_model with a different set of input parameters. sweep_params is a list of thousands of dictionaries, each containing the inputs required to run the model, including a unique identifier used to map the output back to the run that generated it (a simplified sketch is shown after this list).

  2. Using SLURM on a cluster, with the environment variable SLURM_ARRAY_TASK_ID used to launch the concurrent Python jobs. SLURM_ARRAY_TASK_ID simply indexes the list of input dictionaries used in method 1 (second sketch after this list). However, for any Python job I am limited to 50 parallel tasks on that cluster. I am hesitant to install SLURM on the 36-core (72-thread) local machine used in 1), as I am not clear on whether using <=70 tasks with SLURM locally would be more efficient than method 1).
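
For reference, here is a simplified, hypothetical sketch of how I build sweep_params and how my_model uses the unique identifier (the names run_id, alpha, beta and the JSON output are placeholders; my real dictionaries contain many more model inputs):

from itertools import product
import json

# placeholder parameter ranges; the real sweep has thousands of combinations
alphas = [0.1, 0.5, 1.0]
betas = [10, 20, 30]

# one dictionary per run, tagged with a unique run_id
sweep_params = [
    {"run_id": i, "alpha": a, "beta": b}
    for i, (a, b) in enumerate(product(alphas, betas))
]

def my_model(params):
    # ... the Cython model is called here; this toy computation stands in for it ...
    output = params["alpha"] * params["beta"]
    # the unique identifier names the output file, so results map back to their run
    with open(f"result_{params['run_id']}.json", "w") as f:
        json.dump({"run_id": params["run_id"], "output": output}, f)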
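
The SLURM variant (method 2) boils down to a small driver script along these lines (run_one.py and sweep_setup are placeholder names; the array is submitted with something like sbatch --array=0-N, where N+1 is the number of parameter sets):

# run_one.py, executed once per SLURM array task
import os

# sweep_setup is a hypothetical module holding the same sweep_params list
# and my_model function used in method 1
from sweep_setup import sweep_params, my_model

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; it selects one parameter set
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
my_model(sweep_params[task_id])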

Would SLURM run things more efficiently than method 1 if installed on my local machine?

Is there another more efficient way to perform the parameter sweep on my 72-thread machine?

I have seen the multiprocessing-based solution at "How to use multiprocessing for grid search (parameter optimization) in Python", which is from 7 years ago. Would it be equivalent to what I am already doing in method 1), or more efficient?
