When solving best subset regression as a mixed integer program, how do you decide how tightly to bound the range of the $X$ variables? When the box is tight, the solver finds a solution quickly, but potential solutions outside the box are cut off. How quickly should I expand the box?
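For concreteness, one expansion policy I could imagine (my own sketch, not an established rule, and the names are hypothetical): start from a modest symmetric box $[-M_j, M_j]$ and, after each solve, widen only the bounds the incumbent actually touches.

```python
# A sketch of one expansion policy (an assumption, not from the question):
# after each solve, double M_j only for coordinates whose incumbent value
# sits at a bound.  If no bound is active, the box did not constrain the
# incumbent and the loop can stop.
def expand_box(x_opt, M, factor=2.0, tol=1e-6):
    """Double M_j wherever the incumbent x_j is pressed against +/-M_j."""
    return [M_j * factor if abs(x_j) >= M_j - tol else M_j
            for x_j, M_j in zip(x_opt, M)]

M = [1.0, 1.0, 1.0]
x_opt = [1.0, -0.3, -1.0]        # incumbent touching bounds 0 and 2
print(expand_box(x_opt, M))      # -> [2.0, 1.0, 2.0]
```

Geometric growth keeps the number of re-solves logarithmic in the final box size, at the cost of possibly overshooting the bound actually needed.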
I built my constraints to estimate variance using the following approach:
Rather than apply cuts on the covariance matrix directly, the cuts are applied on the principal components.
I want to minimize the variance of a linear combination with coefficient vector $X$ ($1\times N$):
$$\text{Variance} = X \Sigma X^\top= X(Q\Lambda Q^\top)X^\top.$$
Let
$E_i$: eigenvalue $i$ ($1\times1$),
$V_i$: eigenvector $i$ ($N\times1$). Then
$$\text{Variance} = \sum_{i} E_i (XV_i)^2 = \sum_{i}\text{ComponentContribution}_i.$$
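A quick numeric check of this decomposition (a sketch with made-up data, not from the post): the quadratic form $X\Sigma X^\top$ should equal the sum of the per-component contributions $E_i (XV_i)^2$.

```python
# Verify Variance = sum_i E_i * (X @ V_i)**2 on a random PSD covariance.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Sigma = A @ A.T                      # a PSD covariance matrix (5x5)
X = rng.standard_normal((1, 5))      # row vector of coefficients

E, V = np.linalg.eigh(Sigma)         # eigenvalues E[i], eigenvectors V[:, i]
contributions = E * (X @ V).ravel()**2

print(float(X @ Sigma @ X.T))        # direct quadratic form
print(contributions.sum())           # sum of component contributions
```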
Near a reference vector $X_0$:
$$\text{ComponentContribution}_i \ge E_i (X_0V_i)^2+ 2E_i (X_0V_i)(XV_i-X_0V_i).$$
Note that many different $X_0$ share the same value of $X_0V_i$. The above lower bound on the component's contribution to variance depends on $X_0$ only through the loading $X_0V_i$, so a single cut is useful near many different $X_0$: it is a tangent approximation at a loading value. Substituting:
\begin{align}L &= X_0V_i\quad(1\times1)\\\text{ComponentContribution}_i &\ge E_iL^2 + 2E_iL(XV_i - L)\\\text{ComponentContribution}_i &\ge E_i(2LXV_i - L^2).\end{align}
This allows each component's contribution to be estimated independently. One benefit is that more tangent constraints (more $L$ values) can be spent on the large-eigenvalue components than on the smallest.
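Since each cut is the tangent line to the convex function $E_i y^2$ at $y = L$, it never overestimates the contribution when $E_i \ge 0$, and more tangents tighten the piecewise-linear lower bound. A numeric check (all values here are my own illustrative choices):

```python
# Sketch: tangent cuts E_i*(2*L*y - L**2) under-estimate E_i*y**2 for every
# y when E_i >= 0; the pointwise max over a family of tangents is the
# piecewise-linear lower bound the MIP sees for this component.
import numpy as np

E_i = 2.5                                  # a nonnegative eigenvalue
L_values = np.linspace(-2.0, 2.0, 9)       # tangent points for this component
y = np.linspace(-3.0, 3.0, 601)            # y stands for the loading X @ V_i

true_contrib = E_i * y**2
cuts = np.array([E_i * (2 * L * y - L**2) for L in L_values])
lower_bound = cuts.max(axis=0)             # best cut at each y

print(bool(np.all(lower_bound <= true_contrib + 1e-12)))  # never overestimates
max_gap = float((true_contrib - lower_bound).max())
print(max_gap)                             # worst-case gap over this grid
```

The gap of a single tangent at $L$ is $E_i (y - L)^2$, so the worst error grows quadratically with the distance from the nearest tangent point; that is why the grid of $L$ values matters most for the largest eigenvalues.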
However, this leads to a second question: when evaluating the sources of error in an intermediate solution, how should I decide which components contribute enough error to justify adding a constraint, and which should be lumped together into a single constraint?
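One heuristic I can imagine measuring this with (my own assumption, not from the post): at the incumbent $X$, compute each component's violation, the true contribution $E_i (XV_i)^2$ minus the value its current cuts give, and add the next cut where the violation is largest.

```python
# Per-component violation at an incumbent X (a sketch; names hypothetical).
import numpy as np

def cut_value(E_i, L_list, y):
    """Best existing tangent cut for one component at loading y.

    With no cuts the bound is 0, which is still valid since E_i*y**2 >= 0.
    """
    if not L_list:
        return 0.0
    return max(E_i * (2 * L * y - L**2) for L in L_list)

def violations(E, V, X, cuts_per_component):
    y = (X @ V).ravel()                        # loadings X @ V_i
    true = E * y**2
    est = np.array([cut_value(E[i], cuts_per_component.get(i, []), y[i])
                    for i in range(len(E))])
    return true - est                          # error each component leaves

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T
E, V = np.linalg.eigh(Sigma)
X = rng.standard_normal((1, 4))

gaps = violations(E, V, X, {len(E) - 1: [0.5]})  # only top component has a cut
worst = int(np.argmax(gaps))                     # candidate for the next cut
print(gaps.round(4), worst)
```

Components whose individual violations stay small could then be lumped together, e.g. replaced by one aggregate cut on their combined contribution, though whether that pays off presumably depends on the eigenvalue tail of $\Sigma$.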
I posted a working test to GitHub.