4

When trying to solve for assignments given a cost matrix, what is the difference between

  1. using SciPy's linear_sum_assignment function (which I think uses the Hungarian method), and

  2. describing the LP problem with an objective function over many boolean variables, adding the appropriate constraints, and sending it to a solver, such as scipy.optimize.linprog?

Is the latter method slower than the Hungarian method's $O(n^3)$, but able to accommodate additional constraints?
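
For concreteness, here is a sketch of the two approaches as I understand them (illustrative only; the LP side just encodes the standard assignment constraints):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, linprog

rng = np.random.default_rng(0)
n = 5
c = rng.integers(0, 100, size=(n, n))  # example cost matrix

# Approach 1: tailored algorithm (Hungarian-style).
row_ind, col_ind = linear_sum_assignment(c)
print(c[row_ind, col_ind].sum())

# Approach 2: LP over variables x_ij, flattened row-major.
# Constraints: each row assigned exactly once, each column exactly once.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1  # sum_j x_ij == 1
for j in range(n):
    A_eq[n + j, j::n] = 1           # sum_i x_ij == 1
res = linprog(c.ravel(), A_eq=A_eq, b_eq=np.ones(2 * n), bounds=(0, 1))
print(res.fun)
```

Both prints should agree: the assignment polytope has integral vertices, so the LP relaxation already yields a 0-1 solution.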

Athena Wisdom
  • The main difference between a mathematical model and a heuristic algorithm for a specific problem is that the former is more likely to prove optimality rather than just find feasibility. One can then decide which to select in order to satisfy one's needs. – A.Omidi Mar 20 '22 at 07:54
  • @A.Omidi the Hungarian method is an exact algorithm – fontanf Mar 20 '22 at 08:26
  • @fontanf, you are right. What I said was meant to compare exact and heuristic methods in general, and is not specific to the Hungarian algorithm. Thanks for your hint. – A.Omidi Mar 20 '22 at 10:32

1 Answer

7

The main difference is probably the somewhat large overhead you have to pay when solving the AP (assignment problem) as a linear program: you have to build an LP model and ship it to a solver. In addition, an LP solver is a generalist: it solves all LP problems, and development focuses on being fast on average across all LPs while remaining fast-ish in the pathological cases.

When using the Hungarian method, you do not build a model; you simply pass the cost matrix to an algorithm developed for this specific problem. Hence, it will most likely solve the problem faster, since it is a specialist.

So if you want to solve an AP, you should probably use the tailored algorithm. If you plan on extending your model to handle other, more general constraints as well, you might need the LP after all (see the sketch below).
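
As an illustration, consider a hypothetical side constraint saying that at most one of the assignments $(0,1)$ and $(2,3)$ may be used. The Hungarian method cannot express this, but the LP absorbs it as one extra row (a sketch using scipy.optimize.linprog; the constraint is made up for the example):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 4
c = rng.integers(250, 1000, size=(n, n))

# Standard assignment equalities over x_ij, flattened row-major.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1  # each row assigned once
for j in range(n):
    A_eq[n + j, j::n] = 1           # each column assigned once

# Hypothetical side constraint: x_01 + x_23 <= 1.
A_ub = np.zeros((1, n * n))
A_ub[0, [0 * n + 1, 2 * n + 3]] = 1

res = linprog(c.ravel(), A_ub=A_ub, b_ub=[1],
              A_eq=A_eq, b_eq=np.ones(2 * n), bounds=(0, 1))
print(res.fun)
```

One caveat: with side constraints the relaxation can turn fractional, in which case an integer solve (e.g. scipy.optimize.milp) is needed.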


Edit: From a simple test in Python, my assumption is confirmed in this specific setup (which, I believe, is to the advantage of the Hungarian method). The setup is as follows:

  1. A size $n\in \{5,10,\dots,500\}$ is chosen.
  2. A cost matrix is generated. Each coefficient $c_{ij}$ is generated as a uniformly distributed integer in the range $[250,999]$.
  3. The instance is solved both with linear_sum_assignment and as a linear program. The solution time is measured as wall-clock time, and only the time spent by linear_sum_assignment and the solve function is counted (not building the LP and not generating the instance).

For each size, I have generated and solved ten instances, and I report the average time only.
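
A sketch of what the timing loop can look like for the linear_sum_assignment side (a reconstruction of the setup above, not the exact script):

```python
import time
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng()
for n in range(5, 501, 5):                        # n in {5, 10, ..., 500}
    times = []
    for _ in range(10):                           # ten instances per size
        c = rng.integers(250, 1000, size=(n, n))  # costs in [250, 999]
        t0 = time.perf_counter()
        linear_sum_assignment(c)                  # only the solve is timed
        times.append(time.perf_counter() - t0)
    print(n, sum(times) / len(times))             # report the average only
```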

And then there is of course the "but". I am not a ninja in Python, and I have used Pyomo for modelling the LPs. I believe that Pyomo is known to be slow-ish when building models, hence I have only timed the solver.solve(model) part of the code, not building the model. There is, however, possibly a huge overhead cost coming from Pyomo translating the model to "gurobian" (I use Gurobi as the solver).
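
For reference, a minimal sketch of what the Pyomo side of such a test can look like, with only the solver.solve(model) call timed (illustrative; the actual model may differ in details):

```python
import time
import numpy as np
import pyomo.environ as pyo

rng = np.random.default_rng()
n = 100
c = rng.integers(250, 1000, size=(n, n))

model = pyo.ConcreteModel()
model.I = pyo.RangeSet(0, n - 1)
# LP relaxation suffices: the pure AP has an integral optimum.
model.x = pyo.Var(model.I, model.I, bounds=(0, 1))
model.obj = pyo.Objective(
    expr=sum(int(c[i, j]) * model.x[i, j] for i in model.I for j in model.I))
model.rows = pyo.Constraint(
    model.I, rule=lambda m, i: sum(m.x[i, j] for j in m.I) == 1)
model.cols = pyo.Constraint(
    model.I, rule=lambda m, j: sum(m.x[i, j] for i in m.I) == 1)

solver = pyo.SolverFactory('gurobi')
t0 = time.perf_counter()
solver.solve(model)  # only this call is timed
print(time.perf_counter() - t0)
```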

A plot of the computation time (in seconds) as a function of the instance size is given below, and it is pretty clear that the Hungarian algorithm implemented in linear_sum_assignment is a lot faster than the Pyomo+Gurobi approach:

[Plot: average computation time (seconds) vs. instance size $n$ for linear_sum_assignment and the Pyomo+Gurobi LP]

Sune
  • Do you have some benchmark results to support this claim? Intuitively, I would have thought that the Hungarian method would be much slower in practice – fontanf Mar 20 '22 at 07:39
  • @fontanf I only have anecdotal proof from past experiments. Maybe an LP solver can work faster for repeated solves, where you exploit that the model is already built and basis info is available. But I honestly don't know. – Sune Mar 20 '22 at 09:37
  • It might be the case that the Hungarian method is faster for small problems (due to the overhead Sune mentioned for setting up an LP model) while simplex (or dual simplex, or maybe barrier) might be faster for large models because the setup cost is "amortized" better. (I'm just speculating here.) – prubin Mar 20 '22 at 15:30
  • The Hungarian algorithm is, of course, O(n^3) for fully dense assignment problems. I don't know if there is a simplex bound explicitly for assignments. Simplex is exponential in the worst case and linear in variables plus constraints (n^2 + 2n here) in practice. But assignments are highly degenerate (n positive basics out of 2n rows). Dual simplex may fare better than primal. Hungarian is all-integer for integer costs, whereas a standard simplex code won't be unless it knows to detect that in preprocessing. That may lead to some overhead for linear algebra. Ha, an idea for a class project! – mjsaltzman Mar 20 '22 at 17:04
  • @mjsaltzman Yes, this could be a nice class project, indeed! I suspect that the winner will be a tailored assignment solver, as you have the often large overhead of passing information to the solver. Even if you have a live model ready to solve your assignment instance, you still need to pass the costs to the solver, giving you an overhead of at least $\mathcal{O}(n^2)$ – Sune Mar 21 '22 at 09:39
  • Just for the sake of completeness, here's the same with gurobipy instead of Pyomo. On my machine, all LPs (n = 500) are solved in less than a second, compared to roughly 4 seconds with Pyomo. – joni Mar 21 '22 at 16:21
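
Since joni's gurobipy code isn't shown, here is a sketch of what such a direct gurobipy model might look like (an assumption, not the code joni timed; requires a Gurobi installation and license):

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

rng = np.random.default_rng()
n = 500
c = rng.integers(250, 1000, size=(n, n))

m = gp.Model("assignment")
x = m.addVars(n, n, lb=0.0, ub=1.0)  # LP relaxation of the AP
m.setObjective(gp.quicksum(float(c[i, j]) * x[i, j]
                           for i in range(n) for j in range(n)), GRB.MINIMIZE)
m.addConstrs(x.sum(i, '*') == 1 for i in range(n))  # each row assigned once
m.addConstrs(x.sum('*', j) == 1 for j in range(n))  # each column assigned once
m.optimize()
```

Talking to the solver API directly avoids Pyomo's model-translation overhead, which is consistent with the speedup joni reports.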