6

My Problem

I am quite new to optimisation, so any advice is appreciated. I am currently trying to solve a problem as follows:
Given a pool of people, we want to create n teams such to find the optimal solution based on all players' preferences

As a test, I have been assuming there are 6 players, each of which selects a 1st and a 2nd preference for who they would like in their team. For now, I am looking to create 2 teams of 3 players.

How I have attempted it

I want to solve this using an open-source solver in Python, so I am currently trying the 'glpk' solver via Pyomo, however I am stuck. I created a matrix of preferences, whereby each row represents a given player's top 2 preferences (here, 2 means 1st pick) as follows:

preferenceMatrix =  [0 1 0 0 2 0]  # Player 1 would like players 5 (1st pick) and 2 (2nd pick)
                    [2 0 1 0 0 0]  # Player 2 would like players 1 (1st pick) and 3 (2nd pick)
                    [0 0 0 0 2 1]  # Player 3 would like players 5 (1st pick) and 6 (2nd pick)
                    [0 1 2 0 0 0]  # Player 4 would like players 3 (1st pick) and 2 (2nd pick)
                    [0 0 0 1 2 0]  # Player 5 would like players 5 (1st pick) and 4 (2nd pick)
                    [2 0 0 1 0 0]  # Player 6 would like players 1 (1st pick) and 4 (2nd pick)

Next, I multiply the preference matrix by a binary matrix (subject to a constraint of 2 players per row and column), and then maximise the sum over the entire matrix. An example of what the binary matrix could look like is:

binaryMatrix =   [0 1 1 0 0 0]
                 [1 0 1 0 0 0]
                 [1 1 0 0 0 0]
                 [0 0 0 0 1 1]
                 [0 0 0 1 0 1]
                 [0 0 0 1 1 0]

This would form 2 teams: Team 1) players 1,2,3, and Team 2) players 4,5,6 and the objective function (summing over rows) would be 1+3+0+0+1+1 = 6.

My questions

1) If I continue with this approach, then how could I constrain it to create exactly 2 teams? I originally posted this exact issue here

2) As I am finding it hard to approach the problem using glpk, is there a more appropriate open-source solver I could use instead?

3) Or, could I approach this entirely differently (e.g. using networkx where I specify that the problem should create 2 equal-sized connected groups)?

Jwem93
  • 333
  • 1
  • 8

1 Answers1

4
  1. If I continue with this approach, then how could I constrain it to create exactly 2 teams?

If you need two teams exactly, you could define a "preference cost" $p_{ij}$ betweeen each pair of players $(i,j)$. For example, you could define $$ p_{ij} = \left\{ \begin{array}{ll} 4 & \mbox{if $i$ and $j$ are each others first pick}\\ 3 & \mbox{if $i$ or $j$ is a first pick} \\ 2 & \mbox{if $i$ and $j$ are each others second pick} \\ 1 & \mbox{if $i$ or $j$ is a second pick} \\ 0 & \mbox{otherwise} \\ \end{array} \right. $$ Then use the following binary variables :

  • $x_{ij}^1 = 1$ if and only if players $i$ and $j$ end up in team $1$,
  • $x_{ij}^2 = 1$ if and only if players $i$ and $j$ end up in team $2$,
  • $w_{ij}=1$ if and only if players $i$ and $j$ end up together (whatever the team),
  • $y_i=1$ if and only if player $i$ is selected for team $1$ (and so $y_i=0$ if $i$ is selected for team $2$).

So you want to maximize the global preference : $$ \max \; \sum_{i,j} p_{ij}w_{ij} $$ subject to:

  • Each team must have $n/2$ players ($n$ denotes the total number of players): $$ \sum_{i}y_i = n/2 $$
  • $x_{ij}$ is only active if $i$ and $j$ are selected simultaneously: $$ x_{ij}^1 \le y_{i} \\ x_{ij}^1 \le y_{j} \\ x_{ij}^2 \le 1-y_{i} \\ x_{ij}^2 \le 1-y_{j} \\ $$
  • $i$ and $j$ are together if they are simultaneously in team $1$ or $2$: $$ w_{ij} = x_{ij}^1 + x_{ij}^2 $$
  • variables are binary $$ x_{ij}^1,x_{ij}^2,w_{ij},y_i \in \{0,1\} $$

Note: there is probably a way to simplify the above equations. You basically need to model $$ \boxed{ w_{ij}=1 \quad \Rightarrow y_i=y_j } $$

  1. As I am finding it hard to approach the problem using glpk, is there a more appropriate open-source solver I could use instead?

I would suggest using pulp instead. Pulp is a modeler, not a solver, but it can call any solver out there (including GLPK). With pulp, you can focus on the modeling part, and not worry about the solver, it will call the default one if you do not have any at hand (CBC). Check out the examples.

  1. Or, could I approach this entirely differently (e.g. using networkx where I specify that the problem should create 2 equal-sized connected groups)?

You could create a complete graph with one vertex per player, and one edge between each pair of vertices with the above defined preference cost. You want to partition your vertices into two equal sized sets, so you want to color the vertices of the graph with two colors exactly, such that 1) both colors have the same amount of vertices 2) the preference cost is maximized, and it is only active when both vertices have the same color. There is no algorithm in the networkx package for this, to my knowledge.


EDIT :

This is basically a wedding planning problem. There is an example given in pulp's documentation, where the problem is modeled differently than above: it is modeled as a set partitioning problem, where all possible combinations are generated a priori. You can use it and consider that you are planning a wedding with 2 tables. Note that they also define a "preference cost", which they call "happiness."

Kuifje
  • 13,324
  • 1
  • 23
  • 56
  • This is the kind of problem for which MILP solvers have some troubles in performance due to the quadratic binary expressions involved (here the definition of variables x[i][j]), whereas LocalSolver performs well thanks to its local-search heuristics. Nevertheless, in your case, the problem seems to be very small, so it may be ok even with GLPK, without the need to reformulate it following a set partitioning reformulation approach. – Hexaly Oct 12 '20 at 09:21
  • Thanks for your insight @LocalSolver ! Definitely for large instances the set partitioning approach is better, as it can be solved with column generation. And for even larger instances, heuristics is the way to go indeed. Out of curiosity, does LocalSolver use local search + heuristics exclusively, or do you also use linear programming formulations behind the curtains ? – Kuifje Oct 12 '20 at 09:58
  • 2
    You can enforce $w_{i,j}=1\implies y_i=y_j$ with $-(1-w_{i,j})\le y_i-y_j \le 1-w_{i,j}$. No need for $x$ and the associated constraints. – RobPratt Oct 12 '20 at 13:22
  • thanks @RobPratt ! this is clever. Did you come up with this with boolean logic or with intuition ? – Kuifje Oct 12 '20 at 13:25
  • Ok, I think I see. This is a linearization of $| y_i - y_j | \le 1-w_{ij}$. – Kuifje Oct 12 '20 at 13:27
  • Hi @Kuifje, thank you so much for your solution, it has been very helpful to me. I have managed to rebuild the problem using the Pulp approach you suggested. Given that in reality, my problem would look at pool sizes of between 80-400 with number of teams ranging from 2-5, and preferences from 1 to 5, I feel the Pulp approach is probably better suited to this problem. In saying this, I am interested to know if it is easy to speed this up? (e.g. I am guessing this particular problem could just run different combinations in parallel). Thanks again – Jwem93 Oct 12 '20 at 13:27
  • Thanks. It is the standard big-M approach for $z=1\implies f\le 0$, applied twice with $M=1$. – RobPratt Oct 12 '20 at 13:28
  • 1
    For wedding planning with dynamic column generation via Dantzig-Wolfe, see https://blogs.sas.com/content/operations/2014/11/10/do-you-have-an-uncle-louie-optimal-wedding-seat-assignments/ – RobPratt Oct 12 '20 at 13:34
  • @Jwem93 you could try adding the constraints proposed by RobPratt and see if it speeds up resolution. Otherwise, I would go with the set partitioning formulation, especially if you have more than 2 teams. – Kuifje Oct 12 '20 at 13:53
  • 1
    For such numbers (80-400 people, 2-5 teams, 5 preferences), you will observe that a MILP approach (direct or reformulated following a set partitioning approach, as suggested above) will lead to poor results. Said differently, you will need a consequent amount of work around your Pulp model to get quality solutions in reasonable running times. – Hexaly Oct 12 '20 at 14:03
  • Sorry if I wasn't clear @Kuifje, I actually meant that I am now modelling as a set partitioning formulation (as per the Pulp documentation link you shared). Currently running for a pool of 75, dividing into 3 teams – Jwem93 Oct 12 '20 at 14:03
  • Thanks @LocalSolver. In that case I am interested to know how the Pulp model run time could be improved (given a set partitioning approach) – Jwem93 Oct 12 '20 at 14:07
  • @Jwem93 ok that's great ! Which solver are you using ? if you show the log, can you see what is the relative gap once the solver stagnates ? – Kuifje Oct 12 '20 at 14:11
  • 2
    @LocalSolver It would be great to compare the 3 approaches : set partitioning formulation, set partitioning formulation with column generation, and LocalSolver ;) – Kuifje Oct 12 '20 at 14:11
  • Thanks, @Kuifje, for your kind words and interest. For such a problem, the feasible solutions will be found by local-search heuristics. We also have primal heuristics and lower bounds based on (linear and convex) relaxations inside LocalSolver. But for large instances, we can bet that these ones will not be effective. This is the point: local search is very helpful here because relaxations are poor. For some quadratic binary models, the convex relaxations can be quite good and then useful, but it really depends on the nature of the problem. Again, the only way to know is to test ;-) – Hexaly Oct 12 '20 at 14:15
  • 1
    @Jwem93 We suggest you implement a heuristic column generation because, as often, you don't need mathematically-proved optimal solutions. Thus, make a loop around your Pulp model as follows. Start with just some hundreds of columns. Then, at each loop, remove 10-20% of the useless columns (the ones not selected in your optimal solution) and replace them with some other "good" columns heuristically generated. Such an approach will be much simpler and faster than an exact, state-of-the-art column generation approach. – Hexaly Oct 12 '20 at 14:24
  • @Jwem93 Now we also suggest you implement a local-search heuristic for this problem. Start building a feasible assignment of people in teams. Then iterate as follows. Swap randomly 2 people in different groups and check the cost of the new assignment. If better, keep it, otherwise, leave it. The new cost can be evaluated incrementally, then quickly. The resulting algorithm will converge in seconds to quality solutions. This is the kind of approach that LocalSolver performs natively, under the hood, on the Boolean model described by Kuifje above. – Hexaly Oct 12 '20 at 14:40
  • 2
    @Kuifje you can also derive the formulation somewhat automatically by rewriting $w_{i,j} \implies (y_i \iff y_j)$ in conjunctive normal form as $(\lnot w_{i,j} \lor \lnot y_i \lor y_j) \land (\lnot w_{i,j} \lor y_i \lor \lnot y_j)$, yielding linear constraints $(1- w_{i,j}) + (1 - y_i) + y_j \ge 1$ and $(1-w_{i,j}) + y_i + (1-y_j)\ge 1$. – RobPratt Oct 12 '20 at 14:45
  • Kuifje, I was just using the default solver in Pulp (so cbc, same as the documentation). For now, I have hardly changed this, except for writing some additional code and altering the 'happiness' function. I can run and check the log to see. Thanks for the advice @LocalSolver, I will try and implement something along those lines. This is all still new to me, but so I appreciate the tips! – Jwem93 Oct 12 '20 at 14:46
  • Hi again @LocalSolver. I ended up trying what you said (swapping random 2 and rejecting if objective function is lower). Here I am actually not using any solver at all (not sure if that is what you intended), and all I do is randomly initialise the 3 teams of 25. This approach seems to be going quite well thanks. I'm sure I can improve this by including a solver – Jwem93 Oct 13 '20 at 13:05
  • 1
    Indeed you do not need a solver with this approach. To improve your solution, you can implement a strategy to escape local optima. In my opinion, one of the most effective and simple strategies for doing this is Tabu Search. In essence, once you have done all the "good" swappings, you need to explore other solutions, and to reach them you need to temporarily degrade your incumbent solution with non improving swaps. @LocalSolver correct me if I am wrong. – Kuifje Oct 13 '20 at 13:37
  • Thanks @Kuifje, that makes sense now. My solution did get stuck in local optima, so I was trying to find ways around this. I'll take a look into the Tabu search algorithm and see how it helps – Jwem93 Oct 13 '20 at 14:37
  • Thanks Jwem93 for your positive feedback, and thanks Kuifje for your additional comment about how to escape local optima. We would have been advise to implement a simple threshold or late acceptance hill climbing strategy based on first improvement (accept immediately a succeeding move). This is generally faster in practice than tabu search which is based on best improvement (search for the best succeeding move). But why not tabu search here :-) As Kuifje, we think that you don’t need a solver at this point. His advice – Hexaly Oct 13 '20 at 22:03
  • Thanks both for the advice, it has been really helpful. I will give both methods a try and see how the results turn out. – Jwem93 Oct 13 '20 at 23:01