Why does PuLP call copy for addition and how can I avoid it?

Question

Building an expression by appending terms in a for loop appears to be much faster than summing all of the terms at once. The for loop uses __iadd__, which does not call copy. The other ways of building the expression make many calls to __add__, which does call copy and is quite slow.

Of the six methods below, "using_loop" is arguably the most difficult to read, but is by far the fastest.

Is there a best-practice way to build large constraints that is both readable and avoids the call to __add__ (and copy) in PuLP? If I edit the __add__ function in PuLP to remove the copy, are there side effects I should anticipate?
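To make the cost difference concrete, here is a minimal sketch using a toy class. ToyExpr, NoCopyExpr, build_with_add and build_with_iadd are hypothetical names invented for this illustration; this is not PuLP's actual implementation. It only assumes what is described above: __add__ copies the left operand before adding, while __iadd__ updates it in place. Python's built-in sum applies + repeatedly, so it goes through the copying path.

```python
class ToyExpr:
    """Hypothetical stand-in for a linear expression: {variable name: coefficient}."""

    copies = 0  # class-wide count of terms copied so far

    def __init__(self, terms=None):
        self.terms = dict(terms or {})

    def copy(self):
        # Copying costs one unit per term already in the expression.
        ToyExpr.copies += len(self.terms)
        return type(self)(self.terms)

    def __iadd__(self, other):
        # In place: only the incoming terms are touched.
        for name, c in other.terms.items():
            self.terms[name] = self.terms.get(name, 0) + c
        return self

    def __add__(self, other):
        # Out of place: copy the left operand, then add into the copy.
        result = self.copy()
        result += other
        return result


class NoCopyExpr(ToyExpr):
    """Same toy, but with the copy removed from __add__ (assumption for illustration)."""

    def __add__(self, other):
        return self.__iadd__(other)


def build_with_add(n):
    ToyExpr.copies = 0
    expr = ToyExpr()
    for i in range(n):
        expr = expr + ToyExpr({f"x{i}": 1.0})  # copies all i existing terms
    return expr, ToyExpr.copies


def build_with_iadd(n):
    ToyExpr.copies = 0
    expr = ToyExpr()
    for i in range(n):
        expr += ToyExpr({f"x{i}": 1.0})  # no copy
    return expr, ToyExpr.copies


_, add_copies = build_with_add(1000)    # n*(n-1)/2 = 499500 term copies
_, iadd_copies = build_with_iadd(1000)  # 0 term copies

# Side effect of dropping the copy: the left operand is mutated and aliased.
safe = ToyExpr({"x": 1.0})
safe_total = safe + ToyExpr({"y": 2.0})      # safe still has only "x"
shared = NoCopyExpr({"x": 1.0})
shared_total = shared + NoCopyExpr({"y": 2.0})  # shared now also contains "y"
```

In this toy, the copying path does quadratic work in the number of terms, and removing the copy makes a + b silently change a. If the real library behaves similarly, any expression that is reused in several constraints would be affected by such a change.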

import pulp
import numpy as np


def using_np_mat_mul(X, coef):
    x2d = np.atleast_2d(np.array(X))
    coef2d = np.atleast_2d(np.array(coef)).T
    expr = np.matmul(x2d, coef2d)
    return expr


def using_loop(X, coef):
    expr = 0
    for i in range(len(X)):
        expr += X[i]*coef[i]
    return expr


def using_sum_of_list(X, coef):
    expr = sum([X[i]*coef[i] for i in range(len(X))])
    return expr

def using_sum_mult(X, coef):
    expr = sum(np.array(X)*np.array(coef))
    return expr


def using_lpsum(X, coef):
    expr = pulp.lpSum(X[i]*coef[i] for i in range(len(X)))
    return expr

def using_dict_and_lpsum(X_dict, coef):
    expr = pulp.lpSum(X_dict[i]*coef[i] for i in X_dict.keys())
    return expr

if __name__ == "__main__":
    nx = 5000

    X = [pulp.LpVariable(str(i)) for i in range(nx)]

    coef = np.random.rand(nx)

    e1 = using_np_mat_mul(X, coef)
    e2 = using_loop(X, coef)
    e3 = using_sum_of_list(X, coef)
    e4 = using_sum_mult(X, coef)
    e5 = using_lpsum(X, coef)

    # build the same expression (X * coef) from a dict of variables
    X_dict = pulp.LpVariable.dicts('', range(nx))
    e6 = using_dict_and_lpsum(X_dict, coef)
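The script above builds the expressions but does not time them. A minimal timing harness might look like the sketch below. time_builders, demo_loop and demo_sum are hypothetical helpers introduced here, and the demo uses plain floats as stand-ins so that it runs without PuLP installed.

```python
import time


def time_builders(builders, *args, repeats=3):
    """Return {name: best wall-clock seconds} for each callable in builders."""
    results = {}
    for name, build in builders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            build(*args)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in builders operating on plain floats (not PuLP variables).
def demo_loop(xs, cs):
    total = 0.0
    for x, c in zip(xs, cs):
        total += x * c
    return total


def demo_sum(xs, cs):
    return sum(x * c for x, c in zip(xs, cs))


timings = time_builders({"loop": demo_loop, "sum": demo_sum}, [1.0, 2.0], [3.0, 4.0])
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.6f} s")
```

With PuLP available, the same helper could be called on the functions from the question, for example time_builders({"using_loop": using_loop, "using_lpsum": using_lpsum}, X, coef).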

Is this a PuLP question or a numpy question? (Would the same issue arise if X were not derived from PuLP variables?) If it's general numpy, you should ask on Stack Overflow instead. If it's specific to PuLP, then it's welcome here. - LarrySnyder610, Dec 23 '19 at 13:49
Since you're timing, I suggest testing another version: expr = pulp.lpSum(X[i]*coef[i] for i in range(len(X))). Also, check the effect of creating your variable X with the pulp.LpVariable.dicts() method. - EhsanK, Dec 24 '19 at 03:07
@EhsanK, thank you! Added. It has similar performance to the loop, but is much easier to read. - Charles Fox, Dec 24 '19 at 05:38

0 Answers