Problem
I use Gurobi's Python interface (gurobipy) together with multiprocessing to solve multiple problems in parallel. Even though the problems really are solved concurrently, the solving time of each individual problem increases so much that there is no benefit from multiprocessing at all, which is strange.
Possible reasons that have been excluded
- The number of problems solved in parallel is much smaller than the number of CPU cores. So it is not a case of too many problems competing for too few cores.
- The problems really are solved concurrently. I can see that the log file of each problem is written to at the same time, and the number of busy CPU cores equals the number of problems. So it is not that the running code fails to achieve multiprocessing.
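For anyone who wants to verify the same thing on their own machine, the overlap can be checked directly by recording start/end wall-clock timestamps inside each worker. This is a minimal stdlib-only sketch; `busyWorker` is a hypothetical CPU-bound stand-in for one solve, not Gurobi code:

```python
import multiprocessing as mp
import os
import time


def busyWorker(_):
    """CPU-bound stand-in for one solve; returns its (start, end) wall-clock times."""
    start = time.monotonic()
    x = 0
    for i in range(1_000_000):  # burn some CPU so the span is measurable
        x += i * i
    return start, time.monotonic()


if __name__ == "__main__":
    # fewer tasks than cores, matching the situation in the question
    caseNum = min(4, os.cpu_count())
    with mp.Pool(processes=caseNum) as pool:
        spans = pool.map(busyWorker, range(caseNum))
    # if the runs really overlap, every span starts before the earliest span ends
    earliestEnd = min(end for _, end in spans)
    print('all tasks overlapped:', all(start < earliestEnd for start, _ in spans))
```

If this prints `True`, the pool is genuinely running the tasks at the same time, which matches what the log files and CPU usage already suggest.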
Code
To reproduce the problem, I solve the TSP (travelling salesman problem) example from the Gurobi tutorial in parallel. With some modifications, the TSP code (in tsp.py) is as follows (there is no need to study tsp.py in detail, since it is essentially the tutorial example and won't help solve the problem):
```python
#!/usr/bin/env python3.7

# Copyright 2022, Gurobi Optimization, LLC

# Solve a traveling salesman problem on a randomly generated set of
# points using lazy constraints. The base MIP model only includes
# 'degree-2' constraints, requiring each node to have exactly
# two incident edges. Solutions to this model may contain subtours -
# tours that don't visit every city. The lazy constraint callback
# adds new constraints to cut them off.

import math
import random
from itertools import combinations
import gurobipy as gp
from gurobipy import GRB
from datetime import datetime


# Callback - use lazy constraints to eliminate sub-tours
def subtourelim(model, where):
    if where == GRB.Callback.MIPSOL:
        vals = model.cbGetSolution(model._vars)
        # find the shortest cycle in the selected edge list
        tour = model._subtour(vals, model._n)
        if len(tour) < model._n:
            # add subtour elimination constr. for every pair of cities in tour
            model.cbLazy(gp.quicksum(model._vars[i, j]
                                     for i, j in combinations(tour, 2))
                         <= len(tour)-1)


# Given a tuplelist of edges, find the shortest subtour
def subtour(vals, n):
    # make a list of edges selected in the solution
    edges = gp.tuplelist((i, j) for i, j in vals.keys()
                         if vals[i, j] > 0.5)
    unvisited = list(range(n))
    cycle = range(n+1)  # initial length has 1 more city
    while unvisited:  # true if list is non-empty
        thiscycle = []
        neighbors = unvisited
        while neighbors:
            current = neighbors[0]
            thiscycle.append(current)
            unvisited.remove(current)
            neighbors = [j for i, j in edges.select(current, '*')
                         if j in unvisited]
        if len(cycle) > len(thiscycle):
            cycle = thiscycle
    return cycle


class TSP:
    def __init__(self, n, logFile, taskName, rndSeed, model) -> None:
        self.n = n
        self.logFile = logFile
        self.startTime = datetime.now()
        self.taskName = taskName
        self.rndSeed = rndSeed
        self.m = model

    def solveTSP(self):
        # with open(self.logFile, 'a') as file:
        #     file.write('TaskName: {}\nRandom Seed: {}\nStart time: {}\n\n'
        #                .format(self.taskName, self.rndSeed, self.startTime))

        # Create n random points
        random.seed(self.rndSeed)
        points = [(random.randint(0, 100), random.randint(0, 100))
                  for i in range(self.n)]

        # Dictionary of Euclidean distance between each pair of points
        dist = {(i, j):
                math.sqrt(sum((points[i][k]-points[j][k])**2 for k in range(2)))
                for i in range(self.n) for j in range(i)}

        self.m.setParam('Threads', 1)
        self.m.setParam('LogFile', self.logFile)
        self.m.setParam('LogToConsole', 0)
        self.m._subtour = subtour
        self.m._n = self.n

        # Create variables
        vars = self.m.addVars(dist.keys(), obj=dist, vtype=GRB.BINARY, name='e')
        for i, j in vars.keys():
            vars[j, i] = vars[i, j]  # edge in opposite direction

        # You could use Python looping constructs and self.m.addVar() to create
        # these decision variables instead. The following would be equivalent
        # to the preceding self.m.addVars() call...
        #
        # vars = tupledict()
        # for i, j in dist.keys():
        #     vars[i, j] = self.m.addVar(obj=dist[i, j], vtype=GRB.BINARY,
        #                                name='e[%d,%d]' % (i, j))

        # Add degree-2 constraint
        self.m.addConstrs(vars.sum(i, '*') == 2 for i in range(self.n))

        # Using Python looping constructs, the preceding would be...
        #
        # for i in range(n):
        #     self.m.addConstr(sum(vars[i, j] for j in range(n)) == 2)

        # Optimize model
        self.m._vars = vars
        self.m.Params.LazyConstraints = 1
        self.m.optimize(subtourelim)

        vals = self.m.getAttr('X', vars)
        tour = subtour(vals, self.n)

        with open(self.logFile, 'a') as file:
            file.write('TaskName: {}\nRandom Seed: {}\nOptimal tour: {}\n'
                       'Optimal cost: {}\nStart time: {}\nSolved time: {}\n\n'
                       .format(self.taskName, self.rndSeed, str(tour),
                               self.m.objVal, self.startTime, datetime.now()))
```
And I run it in parallel with the following script:
```python
from tsp import TSP  # import from tsp.py file defined above
import multiprocessing as mp
import gurobipy as gp


# function that every process calls
def tspSolve(params):
    n = params['n']
    logFile = params['logFile']
    taskName = params['taskName']
    rndSeed = params['rndSeed']
    with gp.Env() as env, gp.Model(env=env) as model:  # new env per process
        tspModel = TSP(n, logFile, taskName, rndSeed, model)
        tspModel.solveTSP()  # solve tsp


if __name__ == "__main__":
    caseNum = 8  # 8 problems in parallel
    processPool = mp.Pool(processes=caseNum)
    tspParamsList = []  # collect solving parameters of every problem
    for i in range(caseNum):
        newTspParams = {
            'n': 300,  # number of tsp nodes
            'logFile': '/home/littleqjy/workspace/code/myTSP{}.log'.format(i+1),
            'taskName': 'myTSP{}'.format(i+1),
            'rndSeed': i+1
        }
        tspParamsList.append(newTspParams)
    for tspParams in tspParamsList:
        processPool.apply_async(func=tspSolve, args=(tspParams,))  # submit problems
    processPool.close()
    processPool.join()  # wait for all problems to be solved
```
When I change caseNum from 4 to 8 (4 problems to 8 problems), the log files show that the solving time of a single problem roughly doubles, so there is no benefit from multiprocessing at all, which is strange. Does anybody know why this happens?
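One way to narrow this down is to repeat the experiment with a plain CPU-bound Python function instead of Gurobi. This is a diagnostic sketch only; `cpuTask` is a hypothetical stand-in for one solve. If its per-task time also grows when going from a pool of 4 to a pool of 8, the slowdown is not Gurobi-specific but comes from hardware-level contention (shared caches, memory bandwidth, turbo-frequency scaling, or hyperthreaded logical cores being counted as physical ones):

```python
import multiprocessing as mp
import time


def cpuTask(_):
    """Pure-Python CPU burner standing in for one solve; returns its own wall-clock time."""
    t0 = time.monotonic()
    x = 0
    for i in range(1_000_000):
        x += i * i
    return time.monotonic() - t0


def meanTaskTime(poolSize):
    """Average per-task wall-clock time when poolSize copies run concurrently."""
    with mp.Pool(processes=poolSize) as pool:
        times = pool.map(cpuTask, range(poolSize))
    return sum(times) / len(times)


if __name__ == "__main__":
    for poolSize in (1, 4, 8):
        print('pool size {}: {:.3f}s per task'.format(poolSize, meanTaskTime(poolSize)))
```

Comparing the per-task times across pool sizes on the same machine would show whether a generic CPU-bound workload exhibits the same doubling as the Gurobi solves.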
Use `with` to manage the model and env, or call `dispose()` at the end of `solveTSP()` (for both the env and the model). – David Torres Aug 12 '22 at 08:47