5

I am trying to solve this task:

There are three datasets: first data on offices in cities: each city has a certain number of offices and each office has its own capacity of employees. Second, data on teams and open positions in these teams. Third, data on candidates for positions, showing their id, city and position.

We have to allocate applicants across teams and offices in such a way as to maximize the number of employees of one team located in one office together. At the same time, we have to minimize number of cases when there are less than two employees of a certain team in a certain office.

Example:

Input:

city_data:

    city          office_id           capacity
 New York           A                     3
 New York           B                     2
 New York           C                     6
 Boston             D                     2
 Boston             E                     5

team_data:

team_id         position
alpha            Manager
alpha            Manager
alpha            Engineer
alpha            Engineer
alpha            Engineer
alpha            Engineer
alpha            Designer
beta             Engineer
beta             Engineer
beta             Engineer
gamma            Designer
gamma            Engineer

employees_data:

employee_id               city                  position
1                        New York              Manager
2                        New York              Manager
3                        New York              Engineer
4                        New York              Engineer
5                        New York              Engineer
6                        New York              Engineer
7                        New York              Engineer
8                        New York              Designer
9                        New York              Designer
10                       Boston                Engineer
11                       Boston                Engineer
12                       Boston                Engineer

Possible output:

team_id      employee_id          position           city            office_id
alpha            1                Manager         New York              C
alpha            2                Manager         New York              C
alpha            3                Engineer         New York             C
alpha            4                Engineer         New York             C
alpha            5                Engineer         New York             C
alpha            6                Engineer         New York             B
alpha            8                Designer         New York             B
beta             10               Engineer         Boston               E
beta             11               Engineer         Boston               E
beta             12               Engineer         Boston               E
gamma            9                 Designer        New York             A
gamma            7                 Engineer        New York             A

I tried to solve this way:

  1. Sort the employee_data in decreasing order of the count of employees for each position and city.
  2. For each city and position, assign the employee_id to the team_id and office_id with the maximum capacity until it reaches the capacity limit.
  3. Repeat the step 2 until all employees are assigned to the team_id and office_id.

And wrote this code:

from collections import defaultdict

def allocate_employees(city_data, team_data, employee_data): city_office_capacity = defaultdict(dict) for city, office, capacity in city_data: city_office_capacity[city][office] = capacity

team_positions = defaultdict(list)
for team, position in team_data:
    team_positions[team].append(position)

employee_allocations = []
for employee, city, position in employee_data:
    max_capacity = 0
    max_office = None
    for office, capacity in city_office_capacity[city].items():
        if capacity > max_capacity:
            max_capacity = capacity
            max_office = office
    city_office_capacity[city][max_office] -= 1
    for team, positions in team_positions.items():
        if position in positions:
            employee_allocations.append((team, employee, position, city, max_office))
            break
return employee_allocations

city_data = [("New York", "A", 3), ("New York", "B", 2), ("New York", "C", 6), ("Boston", "D", 2), ("Boston", "E", 5)]

team_data = [("alpha", "Manager"), ("alpha", "Manager"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Designer"), ("beta", "Engineer"), ("beta", "Engineer"), ("beta", "Engineer"), ("gamma", "Designer"), ("gamma", "Engineer")]

employee_data = [(1, "New York", "Manager"), (2, "New York", "Manager"), (3, "New York", "Engineer"), (4, "New York", "Engineer"), (5, "New York", "Engineer"), (6, "New York", "Engineer"), (7, "New York", "Engineer"), (8, "New York", "Designer"), (9, "New York", "Designer"), (10, "Boston", "Engineer"), (11, "Boston", "Engineer"), (12, "Boston", "Engineer")]

allocate_employees(city_data, team_data, employee_data)

But I get the wrong output:

[('alpha', 1, 'Manager', 'New York', 'C'),
 ('alpha', 2, 'Manager', 'New York', 'C'),
 ('alpha', 3, 'Engineer', 'New York', 'C'),
 ('alpha', 4, 'Engineer', 'New York', 'A'),
 ('alpha', 5, 'Engineer', 'New York', 'C'),
 ('alpha', 6, 'Engineer', 'New York', 'A'),
 ('alpha', 7, 'Engineer', 'New York', 'B'),
 ('alpha', 8, 'Designer', 'New York', 'C'),
 ('alpha', 9, 'Designer', 'New York', 'A'),
 ('alpha', 10, 'Engineer', 'Boston', 'E'),
 ('alpha', 11, 'Engineer', 'Boston', 'E'),
 ('alpha', 12, 'Engineer', 'Boston', 'E')]

I tried a greedy algorithm here, but maybe integer programming will do better, for example?

Ir8_mind
  • 61
  • 6
  • You need to give some thought to your objective function. First, how will you combine the two objectives. Do you want to maximize the first and then, subject to that, minimize the second? Do you want to optimize a weighted combination of the two (in which case what weights would you use)? – prubin Feb 09 '23 at 18:10
  • Second, while the number of cases with single employees is unambiguous, the first objective (maximize the number of employees of one team kept together) is unclear. Is this the largest group from any single team (ignoring all other teams)? Is it the sum across teams of the largest cohort of the team kept together? Is it the minimum across all teams of the size of the largest cohort kept together? (There may be other possibilities.) – prubin Feb 09 '23 at 18:10
  • Why is it posted here? There are whole tags on Stack Overflow which happily accept homework questions like this. Are you question-banned there? – Peter Mortensen Feb 09 '23 at 19:35
  • Cross-posted: https://math.stackexchange.com/questions/4635650/how-to-solve-this-algorithmic-task-in-python – RobPratt Feb 10 '23 at 02:10
  • In your code, you don't remove a position from team_positions when you assign an employee to that position. Therefore a position will be reused for another employee. – md2perpe Feb 12 '23 at 21:30
  • Yes @md2perpe, I thought I had corrected this in my code below. Not sure if worked that way. Otherwise commercial solvers like gurobi/cplex is needed (they do provide acad license also) if solving by MIP. – Sutanu Majumdar Feb 14 '23 at 15:13
  • @Sutanu sorry, your code is very unclear. what is Else? also spaces are messed and its unclear where and how to insert it in my wrong solution – Ir8_mind Feb 14 '23 at 15:28
  • Else is if-else in Python. I have edited it a bit to align it corresponding IF. May not be needed also. Basic idea is max_capacity=0 must be outside of the entire block and you remove one of the positions as it is assigned by a pop or popitem(), inside the inner IF block where you are assigning if position is available. If using MIP, there also you can reduce number of variables (very helpful to reduce complexity) by indexing your assignment variable x on what combinations of employee-office-team are possible. – Sutanu Majumdar Feb 14 '23 at 15:40
  • @Corralien thx, i will ask there too. however, i think that this can be solved with greedy algorithm as well. – Ir8_mind Feb 09 '23 at 09:21
  • 2
    Yes like Dynamic Programming, it depends on the size of your problem. Unfortunately, at the moment, I'm not competent enough to answer your question :-( –  Feb 09 '23 at 09:32
  • @Corralien and do you any idea about solution (it maybe not be the most optimal)? – Ir8_mind Feb 09 '23 at 11:31
  • I noticed that in your 'employees_data' employees with ID 8 and 9 have a city/position of New York City/Designer. In the answer, these same IDs are assigned to Boston/Engineer. Is this a typo or are you allowed to ignore the employee city/position to some extent? –  Feb 09 '23 at 17:52
  • This problem is NP complete. You can use a greedy algorithm, and it may produce good results. Dynamic programming will have a large state, that's OK with "toy problem size", not so much if you have hundreds of people in a dozen roles. Integer programming with a good solver can give you either exact answers (with maybe a lot of work) or very good approximations. So I second @Corralien's advice. –  Feb 09 '23 at 17:59
  • @wLui155 yeah you are correct, it was a typo. i fixed it – Ir8_mind Feb 09 '23 at 18:21
  • @btilly there also was a typo in example, i fixed it now – Ir8_mind Feb 09 '23 at 18:21
  • @btilly i have about 3000 people in real dataset. could you please show me greedy algorithm implementation in Python? – Ir8_mind Feb 09 '23 at 18:46
  • Is using python a must? –  Feb 14 '23 at 14:48
  • @Lars yes it is – Ir8_mind Feb 14 '23 at 14:54
  • @Lars i see you specialise in php))). but unfortunately I don't know any php and can't convert it to python – Ir8_mind Feb 14 '23 at 15:31
  • From my point of view the question does not provide clear rules for optimization as there is no rule for choosing an optimal solution. Do the cases of less than 2 employees weight the same than a not to a full capacity allocated office? What about an empty office? Or an office with only 2 of 6? How to rate? As long as this is not clear enough one must make own assumptions running the danger that they are not accepted as answer. By the way: a bounty exceeding the own reputation ... what I am missing while thinking this is somehow weird? –  Feb 19 '23 at 15:26
  • but if not make sure to ask details : I thought I have asked for details already. Here again: WHAT exactly is what you want to have a maximum? How much reduce the case of 1 employee (it's the only case - what sense does it make to say: "less than 2 employees?) what you want to maximize and how much an office full with 6 employees of same team adds to the score? Sorry ... I don't really understand how it comes that you consider your requirements to be clear ... –  Feb 20 '23 at 09:34
  • Have you considered the case that MINIMIZING the one requirement goes against MAXIMAZING the other and also the other way? You seem to assume both requirements are independent, but it appears to me that they are not. Why is there no answer yet? What could be the reason? What do you think? –  Feb 20 '23 at 17:05
  • @Ir8_mind I have an idea for a solution, if you wait a few more hours before awarding the answer I might be able to provide one as well. –  Feb 21 '23 at 18:01
  • To my mind it seems a typical problem designed for genetic algorithm –  Feb 17 '23 at 16:54
  • @Ir8_mind it's a lot stuff there and I do not have a precise idea rigth now :-| have a look at pygad library https://pygad.readthedocs.io/en/latest/ , you must have online tutorial I think –  Feb 20 '23 at 20:24
  • Basically it will work like finding the optimized solution what's called a local optimum –  Feb 20 '23 at 20:25
  • 1
    While a canonical approach might work here, there are more optimal ways of solving it - especially when the size of variables increases and the problem becomes hard. Your description is a prime example of such a problem.

    I'd rather use a Satisfiability Modulo Theories (SMT) solver like Z3 to model your problem and let the Solver do the hard work for you.

    z3 has interfaces for a number of languages (including python).

    –  Feb 14 '23 at 16:51
  • @lr8_mind - just a heads up: you're correct about it deducting points up-front, however it will NOT refund those points, under any circumstance. See the help pages: https://stackoverflow.com/help/bounty –  Feb 21 '23 at 03:21
  • @Claudio i want to allocate applicants across teams and offices in such a way as to MAXIMIZE the number of employees of one team located in one office together. at the same time, number of cases when there are 1 employee of a certain team in certain office must be MINIMIZED.

    your questions "how much an office full with 6 employees of same team adds to the score?" really depends on solution algorithm you choose. which score? i think you should focus on teams, not offices and make sure that maximum number of its employees are in one office together.

    – Ir8_mind Feb 20 '23 at 13:31
  • @Claudio there are no empty offices. if office has only 2 out of 6, its okay, if there will be team where 2 employees are left, they can go there. i think that my formulation that "we need to to allocate applicants across teams and offices in such a way as to maximize the number of employees of one team located in one office together. at the same time, we have to minimize number of cases when there are less than 2 employees of a certain team in certain office" is very clear. but if not make sure to ask details – Ir8_mind Feb 20 '23 at 08:09
  • 1
    @Claudio regarding reputation, when you make a bounty, reputation is deducted instantly. if nobody answers the question, than it will be given back – Ir8_mind Feb 20 '23 at 08:11
  • @Andreas ok, i'll wait – Ir8_mind Feb 21 '23 at 18:37
  • @LaurentB. could you please describe steps of solving this problem with genetic algorithm? because its really not clear to me how to apply genetic algorithm here – Ir8_mind Feb 20 '23 at 13:33

6 Answers6

7

Here is solution with a MIP, based on a multicommodity flow formulation. A commodity is a "city-position" couple. Consider the following multipartite network with 5 layers:

  1. the candidate layer with nodes $V_1=\{\text{NewYork-Manager}, \text{NewYork-Engineer}, \text{NewYork-Designer}, \text{Boston-Engineer}\}$
  2. the position layer with nodes $V_2=\{\text{Manager}, \text{Engineer}, \text{Designer}\}$
  3. the team layer with nodes $V_3=\{\alpha, \beta, \gamma\}$
  4. the office layer with nodes $V_4=\{\text{A},\text{B},\text{C},\text{D},\text{E}\}$
  5. the city layer with nodes $V_5=\{\text{NewYork},\text{Boston}\}$

Each from one layer to another are add to the graph based on the data. The network is illustrated below (with obvious abreviations or labels): enter image description here

What does not appear in the graph is the fact that some edges are not allowed for some commodities. For example, between nodes $\alpha$ (or $a$ in the illustration) and $\textit{A}$, there is no existing edge for commodity $\text{BO-Eng}$, because such a commodity can only cross node $\text{BO}$ in the cities layer. Below is the network induced by the arcs of commodity $\text{BO-Eng}$ (green) and $\text{NY-Man}$ (red):

enter image description here

Once the network is setup, the model is straightforward: define integer variables $x_{ij}^c \in \mathbb{N}^+$ which denote the number of commodities of type $c$ on arc $(i,j)$. The following constraints must hold:

  • Flow conservation: $$ \sum_{i} x_{ij}^c = \sum_{i} x_{ij}^c \quad \forall c, \forall j\neq s,t \tag{1} $$
  • Office capacities: $$ \sum_{c} x_{ij}^c \le Q_{ij}\quad \forall (i,j)\in V_4 \times V_5 \tag{2} $$
  • Teams requirements: $$ \sum_{c} x_{ij}^c = R_{ij}\quad \forall (i,j)\in V_2 \times V_3 \tag{3} $$
  • Given amount of candidates/commodities: $$ x_{sc}^c = N_c \quad \forall c \tag{4} $$

The above is the core model. Now to maximize members of a given team in a given office, introduce binary variable $y_{ij}$ that takes value $1$ if and only if arc $(i,j)\in V_4\times V_5$ is used. The problem is to minimize $\sum_{(i,j)} y_{ij}$ subject to constraints $(1)-(4)$, with the additional constraint $x_{ij}^c\le M y_{ij}$ to link $x$ and $y$ variables. And to ensure no member of a team is by himself in an office, add $$ \sum_c x_{ij}^c \ge 2 y_{ij} \quad \forall (i,j) \in V_3 \times V_4 $$

When I run the model, I get the following solution:

NY,Eng
=======
Eng a 4.0
Eng g 1.0
a C 4.0
g B 1.0
B NY 1.0
C NY 4.0

NY,Man

Man a 2.0 a A 2.0 A NY 2.0

NY,Des

Des a 1.0 Des g 1.0 a C 1.0 g B 1.0 B NY 1.0 C NY 1.0

BO,Eng

Eng b 3.0 b E 3.0 E BO 3.0

Or in terms of teams:

Team  a
======
office A -> 2.0 employees
office C -> 5.0 employees

Team b

office E -> 3.0 employees

Team g

office B -> 2.0 employees

Kuifje
  • 13,324
  • 1
  • 23
  • 56
  • thanks. there was a typo in example, i fixed it now – Ir8_mind Feb 09 '23 at 18:24
  • could you please explain what does the mean:

    NY-Eng

    =======

    Eng a 1.0

    Eng b 3.0

    Eng g 1.0

    a C 1.0

    b E 3.0

    g B 1.0

    B NY 1.0 C NY 1.0 E BO 3.0 . how to interpret it?

    – Ir8_mind Feb 10 '23 at 07:52
  • got it, those are numbers of candidates. i thought those were candidates ids. is it possible to specify the id of the candidates and what teams and offices they go to? – Ir8_mind Feb 10 '23 at 08:51
  • just pick any ID corresponding to the NY-Engineer position, and assign it to one of the solutions. The ID numbers to not matter. – Kuifje Feb 10 '23 at 08:55
  • this result is wrong. it says that 5 employees should be in Boston, but there are only three candidates in Boston – Ir8_mind Feb 10 '23 at 13:47
  • it says that 3 employees are in office E (boston), and none in office D (boston), which makes a total of 3 in boston. – Kuifje Feb 10 '23 at 13:56
  • i ran your code on bigger data and my jupyter notebook died. is it optimal for large data? if i have thousands of employees – Ir8_mind Feb 11 '23 at 12:49
  • With thousands of employees you will certainly need a commercial solver (Pulp's default solver is CBC which is free and open source but not as powerfull as gurobi/cplex/xpress). – Kuifje Feb 11 '23 at 18:33
  • You can check the CBC log though to see what exactly happened. – Kuifje Feb 11 '23 at 18:35
  • @lr8_mind, would you mind sharing a larger data set? – Kuifje Feb 14 '23 at 15:19
  • sorry, i can't. it has some personal data and also other issues, which don't allow me to do that. but given example fully describes the structure of that dataset – Ir8_mind Feb 14 '23 at 15:23
  • Is this for academia or business? – Kuifje Feb 14 '23 at 15:42
  • kind of for both, but mostly business – Ir8_mind Feb 14 '23 at 15:46
  • yeah, sure. why not. but where? – Ir8_mind Feb 14 '23 at 15:56
  • if i run your code with this:

    `candidates = ["NY,Eng"] candidate_numbers = [1] offices = ["A", "B", "C", "D", "E"] cities = ["NY", "BO"] office_cities = [("A", "NY"), ("B", "NY"), ("C", "NY"), ("D", "BO"), ("E", "BO")] capacities = [3, 2, 6, 2, 5] positions = ["Eng"] teams = ["a"] position_teams = [("Eng", "a")] requirements = [1]

    candidate_numbers = dict(zip(candidates, candidate_numbers)) position_team_requirements = dict(zip(position_teams, requirements)) office_capacities = dict(zip(office_cities, capacities)) `

    prob.solve() will output -1. why?

    – Ir8_mind Feb 14 '23 at 16:33
  • You need to comment out the last constraint ("at least two members of a team in a same office"). Since you only have one candidate, this cannot be satisfied, and therefore the solver returns -1 (i.e. infeasible) – Kuifje Feb 14 '23 at 17:46
3

you can try with integer programming but we can see if this combinatorics works
In the code put\

from collections import defaultdict

def allocate_employees(city_data, team_data, employee_data): city_office_capacity = defaultdict(dict) for city, office, capacity in city_data: city_office_capacity[city][office] = capacity

team_positions = defaultdict(list)
for team, position in team_data:
    team_positions[team].append(position)

employee_allocations = []    
for employee, city, position in employee_data:
            max_capacity =0;
            max_office = None              
            for office, capacity in city_office_capacity[city].items():
                if capacity > max_capacity:
                    max_capacity = capacity
                    max_office = office

                    for team, positions in team_positions.items():    
                       if position in positions:
                           employee_allocations.append((team, employee, position, city, max_office))
                           #team_positions[team].pop(position)
                           team_positions[team].remove(position)
                           city_office_capacity[city][max_office] -= 1
                           break
                    break       
return employee_allocations

If you choose to go for MIP, here is suggested solution: Sets:
Roles: $ R_r$ Teams: $ T_t$: Role $ R_{rt}$
Office $ O_o$: Capacity $ Cap_o$, City $ Ct_o$
City: $Ct_c$
Employees: $ E_e$: City $Ct_e$: Role $ \tau_{r,e} =1$ if employee has that role r, else 0\

Derived sets:
ET: $ \{(e,t): R_e \in R_t \}$
EO: $\{(e,o): Ct_e \in Ct_o \}$
ETOC = ET$\times$EO Vars:
$ x_{e,t,o} = 1$ if assigned, else 0. No need for city as office-id itself will determine the city
$ z_{to}$ binary: to track of cases where less than 2 employees are assigned per team per ofice

Max $\sum_{eto} x_{eto} - \sum_o\sum_tz_{to}$
s.t.
$ \sum_t \sum_{e\in EO} x_{e,t,o} \le Cap_o \ \ \forall o \in O$
$R_{r,t} \le \sum_o\sum_{e\in ET} \tau_{r,e}x_{e,t,o} \ \ \forall r \in R \ \forall t \in T$: $\tau$ just represents set membership (if employee E if of the role ,R) Can be replaced with set membership logic like EO & ET.

$(2+e) - \sum_e x_{eto} \le 3z_{t,o} $
$\sum_e x_{eto}-(2+e) \le 3(z_{t,o}-1) \ \ \forall t \in T \ \ \forall o \in O$

Using Gurobipy, solves in 0.02 secs, 1 node

#Combos
ETOClist = []
for employee in employee_data:
  for team in team_data:
    if employee[2] == team[1]:
      #tp = employee[0],team[0], employee[2]
      #ETlist.append(tp)
      for office in city_data:
        if employee[1] == office[0]:
          ta = office[1], employee[0],team[0], employee[2]
          ETOClist.append(ta)

ETOCSet = set(ETOClist) ETOC = tuplelist(ETOCSet) #EO = tuplelist(ETlist)

TOlist = [] for team in team_data: for office in city_data: tp = team[0],office[1] TOlist.append(tp)

TOSet = set(TOlist) TO = tuplelist(TOSet)

#Dict from citydata office,city,cap = multidict({office:[city,capa] for city,office,capa in city_data}) req = {('Manager','alpha'):2,('Engineer','alpha'):4,('Designer','alpha'):1, ('Engineer', 'beta'): 3, ('Designer', 'gamma'): 1, ('Engineer', 'gamma'): 1 } model = Model('Assign') x = model.addVars(ETOC,vtype='b',name='x') z = model.addVars(TO,vtype='b',name='z')

M = 100 #Constraints C1 = model.addConstrs((x.sum(o,'','','') <= cap[o] for o in office),'Capacity') C2= model.addConstrs((req[r,t] <= x.sum('','*',t,r) for r,t in req),'Req')

#Ensure no employee is assigned to a team unless role is there. redundant #C3= model.addConstrs((x.sum('','',t,r) <= Mreq[r,t] for r,t in req),'Optional')

C4 = model.addConstrs(2.01 - x.sum(o,'',t,'') <= 3z[t,o] for t,o in TO) C5 = model.addConstrs(x.sum(o,'',t,'') - 2.01 <= 3(z[t,o]-1) for t,o in TO)

#Prioritizing objectives: P1, P2 P1, P2 = 3,1 obj = P1x.sum() - P2z.sum()

model.setObjective(obj, GRB.MAXIMIZE) model.optimize() model.printAttr('x')

Sutanu Majumdar
  • 3,461
  • 1
  • 2
  • 11
  • and how to get desired output after model.optimize()? – Ir8_mind Feb 14 '23 at 22:14
  • To see all results use mode.printAttr('x'). It prints vars with a 1- assignment. Also the combinations code above that, using yours only, can also give an answer but may not satisfy that minimization requirement. It has one glitch, you were not eliminating a team position once it's assigned. So either solution may be helpful. – Sutanu Majumdar Feb 14 '23 at 22:35
  • it returns ------------------------------------------------------------------------------- GurobiError Traceback (most recent call last) in 50 model.setObjective(obj, GRB.MAXIMIZE) 51 model.optimize() ---> 52 model.printAttr('x')

    src/gurobipy/model.pxi in gurobipy.Model.printAttr()

    src/gurobipy/model.pxi in gurobipy.Model.getAttr()

    src/gurobipy/attrutil.pxi in gurobipy.__gettypedattrlist()

    GurobiError: Unable to retrieve attribute 'x'

    – Ir8_mind Mar 10 '23 at 13:51
  • Did it give optimal value? Like run with last printAttr removed. It must print optimal value or would say infeasible/unbounded – Sutanu Majumdar Mar 10 '23 at 14:13
  • no, it returned GurobiError – Ir8_mind Mar 11 '23 at 13:51
1

I second the integer programming suggestion. But greedy is worth a shot.

I don't have time to give a full implementation, but here is the basic idea.

for all teams from biggest to smallest
    if you can put them entirely colocated in one city
        can you do it without leaving just one employee out?
            put them in the smallest such available city.

for all cities from least open spaces to most: how many is the most that can be assigned from one team? can you do it without leaving only one out (city or team)? do that from the smallest team you can. if that is more than 2: assign one less than that from the smallest team you can (leave one out from the most commonly needed role) else: assign as many as you can anyways repeat until the city is assigned

1

I put together a solution for the provided data, not sure if my stuff is very scalable, with large datasets. My line of thinking with the data you provided was:

1.) add all employees to a dataframe 2.) assign largest teams to largest cities containing employees 3.) assign city teams to city rooms based on the size of team/room with the condition a room must always have 2 members from a team.

my assign employees to rooms was definitely a hack and slash, conditionals could be reworked or improved on. Regardless the code below will give you the desired output for the datasets you provided.

import pandas as pd
import math

city_data = [("New York", "A", 3), ("New York", "B", 2), ("New York", "C", 6), ("Boston", "D", 2), ("Boston", "E", 5)]

team_data = [("alpha", "Manager"), ("alpha", "Manager"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Designer"), ("beta", "Engineer"), ("beta", "Engineer"), ("beta", "Engineer"), ("gamma", "Designer"), ("gamma", "Engineer")]

employee_data = [(1, "New York", "Manager"), (2, "New York", "Manager"), (3, "New York", "Engineer"), (4, "New York", "Engineer"), (5, "New York", "Engineer"), (6, "New York", "Engineer"), (7, "New York", "Engineer"), (8, "New York", "Designer"), (9, "New York", "Designer"), (10, "Boston", "Engineer"), (11, "Boston", "Engineer"), (12, "Boston", "Engineer")]

df = pd.DataFrame({ "team_id" : [], "employee_id" : [], "position" : [], "city" : [], "office_id": [] })

def add_employee(employee, df): new_employee = pd.DataFrame({ "team_id" : [''], "employee_id" : [employee[0]], "position" : [employee[2]], "city" : [employee[1]], "office_id": [''], }) return pd.concat([df, new_employee], ignore_index=True)

def add_team(team_id, employee_id, df): df.loc[df['employee_id'] == employee_id, 'team_id'] = team_id

def assign_employee_to_team(team_data, df): current_team= ""

for item in team_data:      
    team = item[0]
    position = item[1]      

    if team != current_team:
        current_team = team
        location = list(get_largest_available_employees_location(team_data, df).keys())[0]

    if location is None:
        employee = df.loc[(df['team_id'] == '') &amp; (df['position'] == position), 'employee_id'].iloc[0]
    else:   
        employee = df.loc[(df['team_id'] == '') &amp; (df['position'] == position) &amp; (df['city'] == location), 'employee_id'].iloc[0]

    add_team(team, employee, df)

def get_largest_available_employees_location(team_data, df): return df.loc[df['team_id'] == '', 'city'].value_counts().to_dict()

def add_all_employees(df): for employee in employee_data: df = add_employee(employee, df) return df

def assign_room(employee, room, df): df.loc[(df['office_id'] == '') & (df['employee_id'] == employee), 'office_id'] = room

def update_employee_rooms(people_to_add, team_name, city, room_id, df): for person in range(people_to_add): employee = df.loc[(df['office_id'] == '') & (df['team_id'] == team_name) & (df['city'] == city), 'employee_id'].iloc[0] assign_room(employee,room_id,df)

def get_total_team_count_per_city(df): return df.value_counts(['team_id', 'city'])

def assign_employee_to_room(city_data, df): rooms = sorted(city_data, key=lambda tup: (tup[0], tup[2]), reverse=True) teams = get_total_team_count_per_city(df) carry_over = 0

for team_info in teams.items():
    team_name = team_info[0][0]
    team_city = team_info[0][1]
    num_of_people = team_info[1]

    for index, item in enumerate(rooms):
        if num_of_people == 0:
            break

        city = item[0]
        room_id = item[1]
        max_size = item[2]

        if max_size == 'FULL':
            continue

        if team_city == city:
            if (math.floor(num_of_people/max_size)) &gt;= 2:
                people_to_add = num_of_people - max_size
                update_employee_rooms(people_to_add, team_name, city, room_id, df)
                num_of_people = num_of_people - people_to_add
            else:
                if (num_of_people - (max_size-1)) &lt; 0:
                    people_to_add = num_of_people
                elif (num_of_people - max_size) == 0:
                    people_to_add = num_of_people
                else:
                    people_to_add = (max_size-1)

                num_of_people = (num_of_people - people_to_add)
                update_employee_rooms(people_to_add, team_name, city, room_id, df)
                rooms[index] = (city, room_id, 'FULL')

df = add_all_employees(df) assign_employee_to_team(team_data, df) assign_employee_to_room(city_data, df)

print(df)

1

I used the library pulp (github) which is nice for linear programming and optimization. It allows to add additional constraints later on (see the # comments), the code is a bit longer and clunky (because I just noticed your question today) but it should do the job:


The data:

import pandas as pd
from pulp import *

Load the data

city_data = pd.DataFrame({ 'city': ['New York', 'New York', 'New York', 'Boston', 'Boston'], 'office_id': ['A', 'B', 'C', 'D', 'E'], 'capacity': [3, 2, 6, 2, 5] }) team_data = pd.DataFrame({ 'team_id': ['alpha', 'alpha', 'alpha', 'alpha', 'alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta', 'gamma', 'gamma'], 'position': ['Manager', 'Manager', 'Engineer', 'Engineer', 'Engineer', 'Engineer', 'Designer', 'Engineer', 'Engineer', 'Engineer', 'Designer', 'Engineer'] }) employees_data = pd.DataFrame({ 'employee_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'city': ['New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'Boston', 'Boston', 'Boston'], 'position': ['Manager', 'Manager', 'Engineer', 'Engineer', 'Engineer', 'Engineer', 'Engineer', 'Designer', 'Designer', 'Engineer', 'Engineer', 'Engineer'] })


The Code:

# Create a list of cities, offices and teams
cities = city_data['city'].unique()
offices = city_data['office_id'].unique()
teams = team_data['team_id'].unique()

Create a dictionary of office capacities

capacities = dict(zip(city_data['office_id'], city_data['capacity']))

Create a dictionary of employees by city and position

employees = {} for city in cities: for position in team_data['position'].unique(): employees[(city, position)] = employees_data[(employees_data['city'] == city) & (employees_data['position'] == position)]['employee_id'].tolist()

Create the LP problem

prob = LpProblem("Employee Allocation Problem", LpMaximize)

Create a binary variable for each employee, indicating whether they are assigned to a team

employee_vars = LpVariable.dicts("Employee", [(employee_id, team_id, office_id) for employee_id in employees_data['employee_id'] for team_id in teams for office_id in offices], cat='Binary')

Create Auxiliary table

df_employee_vars = pd.DataFrame.from_dict(employee_vars, orient='index', columns=['Employee']).reset_index().rename(columns={'index': 'assignment'}) # , index=range(len(employee_vars)) df_employee_vars['employee_id'] = df_employee_vars['assignment'].str[0] df_employee_vars['team'] = df_employee_vars['assignment'].str[1] df_employee_vars['office'] = df_employee_vars['assignment'].str[2]

Objective 1: minimize number of offices

employees_in_office = (df_employee_vars['Employee']*df_employee_vars['office']).values prob += lpSum(-len({y for x in employees_in_office for y in x.values()}))

Objective 2: maximize number of employees of same team in same office

vals = df_employee_vars.groupby(['team', 'office'])['Employee'].sum().values prob += lpSum(np.mean(vals))

Constraint: each employee can only be assigned to one team and one office

for employee_id in employees_data['employee_id']: prob += lpSum([employee_vars[(employee_id, team_id, office_id)] for team_id in teams for office_id in offices]) == 1

Constraint: Each office has a capacity constraint

for office_id in offices: for team_id in teams: prob += lpSum([employee_vars[(employee_id, team_id, office_id)] for employee_id in employees_data['employee_id']]) <= capacities[office_id]

Constraint: Each office has a capacity constraint

for office_id in offices: employees_in_office = df_employee_vars[df_employee_vars['office'].eq(office_id)]['Employee'].values prob += lpSum(employees_in_office) <= capacities[office_id]

Constraint: Each team has a capacity constraint

team_capacities = team_data.groupby(['team_id'])['team_id'].count().to_dict() for team_id, capacity in team_capacities.items(): employees_in_team = df_employee_vars[df_employee_vars['team'].eq(team_id)]['Employee'].values prob += lpSum(employees_in_team) <= capacity

Solve

prob.solve()

Print the status of the solution

print("Status:", LpStatus[prob.status])

Print the optimal objective value

print("Objective value:", value(prob.objective))

Print the optimal variable values

for v in prob.variables(): if v.varValue == 1: print(v.name, "=", v.varValue)


The Output:

INTEGER OPTIMAL SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.3 Mb (266494 bytes)
Writing MIP solution to 'C:\Users\...\AppData\Local\Temp\...-pulp.sol'...
Status: Optimal
Objective value: 5
Employee_(1,_'alpha',_'A') = 1
Employee_(10,_'alpha',_'A') = 1
Employee_(11,_'alpha',_'A') = 1
Employee_(12,_'gamma',_'E') = 1
Employee_(2,_'gamma',_'E') = 1
Employee_(3,_'alpha',_'B') = 1
Employee_(4,_'alpha',_'B') = 1
Employee_(5,_'alpha',_'C') = 1
Employee_(6,_'alpha',_'C') = 1
Employee_(7,_'beta',_'E') = 1
Employee_(8,_'beta',_'E') = 1
Employee_(9,_'beta',_'E') = 1
  • @ no its all wrong – Ir8_mind Feb 25 '23 at 10:18
  • i ran this code, but the output i get is different and wrong: Status: Optimal Objective value: 5.0 Employee_(1,'alpha','B') = 1.0 Employee_(10,'alpha','B') = 1.0 Employee_(11,'alpha','A') = 1.0 Employee_(12,'beta','E') = 1.0 Employee_(2,'gamma','E') = 1.0 Employee_(3,'alpha','A') = 1.0 Employee_(4,'beta','C') = 1.0 Employee_(5,'alpha','D') = 1.0 Employee_(6,'alpha','D') = 1.0 Employee_(7,'beta','C') = 1.0 Employee_(8,'alpha','A') = 1.0 Employee_(9,'gamma','E') = 1.0 – Ir8_mind Feb 22 '23 at 07:53
  • @Ir8_mind did you by any chance not included further constrains? Because the code works fine for me. Maybe you could read the #-comments within the code for Objectives and Constraints and let me know if one is missing or provide an example of a result which breaks any constraints in your oppinion. –  Feb 25 '23 at 11:40
  • @Ir8_mind, I modified the code, does it work now? –  Feb 22 '23 at 14:44
1

Here's my modified code:

from collections import defaultdict, Counter

def allocate_employees(city_data, team_data, employee_data): city_office_capacity = defaultdict(dict) for city, office, capacity in city_data: city_office_capacity[city][office] = capacity

team_position_count = defaultdict(Counter)
for team, position in team_data:
    team_position_count[team][position] += 1

employee_allocations = []
for employee, city, position in employee_data:
    max_capacity = 0
    max_office = None
    for office, capacity in city_office_capacity[city].items():
        if capacity &gt; max_capacity:
            max_capacity = capacity
            max_office = office
    city_office_capacity[city][max_office] -= 1
    for team, position_count in team_position_count.items():
        if position_count[position] &gt; 0:
            employee_allocations.append((team, employee, position, city, max_office))
            team_position_count[team][position] -= 1
            break
return employee_allocations

city_data = [ ("New York", "A", 3), ("New York", "B", 2), ("New York", "C", 6), ("Boston", "D", 2), ("Boston", "E", 5), ]

team_data = [ ("alpha", "Manager"), ("alpha", "Manager"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Engineer"), ("alpha", "Designer"), ("beta", "Engineer"), ("beta", "Engineer"), ("beta", "Engineer"), ("gamma", "Designer"), ("gamma", "Engineer"), ]

employee_data = [ (1, "New York", "Manager"), (2, "New York", "Manager"), (3, "New York", "Engineer"), (4, "New York", "Engineer"), (5, "New York", "Engineer"), (6, "New York", "Engineer"), (7, "New York", "Engineer"), (8, "New York", "Designer"), (9, "New York", "Designer"), (10, "Boston", "Engineer"), (11, "Boston", "Engineer"), (12, "Boston", "Engineer"), ]

allocations = allocate_employees(city_data, team_data, employee_data) for r in allocations: print(r) ```

md2perpe
  • 111
  • 2