5

I am working on a problem in which I am trying to maximize the average of a variable only for the data that meet a certain condition with a constraint on the number of data that meet this condition. I am actually interested in constructing this condition.

Mathematically, my problem can be formulated as follows \begin{align}\max &\quad \frac{\sum_{i}y_{i}w_{i}}{\sum_{i}w_{i}}\\\text{s.t.}&\quad\sum_{i}w_{i} = n\end{align} where the $w_{i} = I(v_{i} \ge \sum_{j} x_{j} z_{ij})$ represent the condition. I want to optimize with respect to the $x_{j}$ variables. $v_{i}$, $y_{i}$ and $z_{ij}$ are available data.

My question is then: is it possible to linearize this problem for solving as a linear program? If not, do you have any suggestion to tackle this problem?

I've tried actually adding the $w_{i}$ as variables in the program with the constraint $$v_{i} - M \ge \sum_{j} x_{j} z_{ij} - Mw_{i}$$ with $M$ a value larger than the maximum of $v_{i}$. But it seems to result in a unbounded problem, although I am unsure why.

It is also worth noting that the constraint $\sum_{i}w_{i} = n$ could be relaxed as an inequality one.

Thank you for your help.

Pierre
  • 53
  • 4

2 Answers2

7

The usual big-M approach would impose two sets of inequalities: \begin{align} \sum_j x_j z_{i,j}-v_i &\le \left(\sum_j \overline{x}_j z_{i,j}-v_i\right)(1-w_i)\\ v_i+\epsilon-\sum_j x_j z_{i,j}&\le \left(v_i+\epsilon-\sum_j \underline{x}_j z_{i,j}\right)w_i \end{align} To linearize the objective function, replace the denominator with $n$.

RobPratt
  • 32,006
  • 1
  • 44
  • 84
  • Thanks for the help! Do these two sets of inequality come in addition to the big-M constraint I wrote in the question? And I assume that ϵ is a small number? Do the bars on top and under $x_{j}$ on the right-hand side have special meaning? – Pierre Apr 04 '20 at 18:43
  • The first constraint replaces yours. The second one enforces the converse that $w_i=0 \implies v_i < \sum_j x_j z_{i,j}$. Yes, $\epsilon$ is a small positive constant. The bars indicate upper or lower bounds. I had assumed $z_{i,j} \ge 0$; if not, my big-M values will need adjusting. – RobPratt Apr 04 '20 at 19:40
  • Can we just replace the entire bracket with M? – ooo Apr 05 '20 at 07:31
  • @anoopyadav, yes, that expression plays the role of $M$ but is a small upper bound that is based on the problem data rather than just an arbitrary number. – RobPratt Apr 05 '20 at 13:12
3

Since Rob pointed out how to linearize the definition of the indicator variables, I'll focus on your objective, which is nonlinear. If you stick with the equality constraint $\sum_i w_i =n$, you can just replace the denominator in the objective with $n$ and the objective is linear. If, as is suggested in your question, you change the constraint to $\sum_i w_i \le n$ (or $\sum_i w_i \ge n$), then the objective function becomes a sticking point. Depending on the size of $n$ and the (unspecified) number of values of index $i$, your best bet might be to solve a sequence of problems where the "budget" constraint is $\sum_i w_i = k$ and $k$ is 1, then 2, ... then $n$ (or counts down to $n$ in the $\ge$ case).

RobPratt
  • 32,006
  • 1
  • 44
  • 84
prubin
  • 39,078
  • 3
  • 37
  • 104