Algorithms for Quadratic Programming

See Also: Constrained Optimization, Quadratic Programming

Equality-Constrained Quadratic Programs

Equality-constrained quadratic programs are QPs in which only equality constraints are present. They arise both in applications (e.g., structural analysis) and as subproblems in active set methods for solving general QPs. Consider the equality-constrained quadratic program:
\[ \begin{array}{lll}
EQP: & \min_x & \frac{1}{2} x^T Q x + c^T x \\
& \mbox{s.t.} & A x = b
\end{array}
\]

The first-order necessary conditions for \(x^*\) to be a solution of EQP state that there is a vector \(\lambda^*\) such that the following system of equations, the KKT system, is satisfied:
\[\begin{bmatrix}
Q & -A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
x^*\\
\lambda^*
\end{bmatrix}
=
\begin{bmatrix}
-c \\
b
\end{bmatrix} \]

If we express \(x^*\) as \(x^* = x + p\) where \(x\) is an estimate of the solution and \(p\) is a step, we obtain an alternative form:
\[ \begin{bmatrix}
Q & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
-p \\
\lambda^*
\end{bmatrix}
=
\begin{bmatrix}
c + Qx \\
Ax - b
\end{bmatrix} \] The coefficient matrix is called the KKT matrix. Let \(Z\) denote the \(n \times (n-m)\) matrix whose columns form a basis for the null space of the \(m \times n\) constraint matrix \(A\). When \(A\) has full row rank and the reduced Hessian matrix \(Z^T Q Z\) is positive definite, there is a unique vector pair \((x^*, \lambda^*)\) that satisfies the KKT system. There are several methods for solving the KKT system.
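As an illustration, the following sketch forms and solves the first form of the KKT system directly with dense NumPy linear algebra, assuming \(A\) has full row rank and \(Z^T Q Z\) is positive definite; the function name is ours, not from any package, and a production code would use a structured (e.g., symmetric indefinite) factorization instead of a dense solve.

```python
import numpy as np

def solve_eqp_kkt(Q, c, A, b):
    """Solve EQP by forming the full KKT system
        [ Q  -A^T ] [ x*      ]   [ -c ]
        [ A    0  ] [ lambda* ] = [  b ].
    Assumes A has full row rank and Z^T Q Z is positive definite,
    so the KKT matrix is nonsingular."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, -A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # x*, lambda*
```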

Range-space methods can be used when \(Q\) is positive definite and easy to invert (for example, diagonal or block-diagonal). Multiplying the first equation in the alternative equation above by \(A Q^{-1}\) and subtracting the second equation, we obtain a linear system in the vector \(\lambda^*\):
\[ (A Q^{-1} A^T)\lambda^* = (A Q^{-1} c + b)\] Then, we recover \(p\) by solving
\[Qp = A^T \lambda^* - (c + Qx).\]
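A sketch of the range-space recipe just described, under the stated assumption that \(Q\) is positive definite; here a Cholesky factorization stands in for the "easy to invert" \(Q\), and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def range_space_step(Q, A, c, b, x):
    """Range-space (Schur complement) solution of the alternative KKT system.
    Assumes Q is symmetric positive definite and A has full row rank."""
    Qfac = cho_factor(Q)
    Qinv_AT = cho_solve(Qfac, A.T)     # Q^{-1} A^T
    Qinv_c = cho_solve(Qfac, c)        # Q^{-1} c
    # (A Q^{-1} A^T) lambda* = A Q^{-1} c + b
    lam = np.linalg.solve(A @ Qinv_AT, A @ Qinv_c + b)
    # Q p = A^T lambda* - (c + Q x)
    p = cho_solve(Qfac, A.T @ lam - (c + Q @ x))
    return x + p, lam                  # x* = x + p and lambda*
```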

Null-space methods require a null-space basis matrix \(Z\). This matrix can be computed with orthogonal factorizations or, for sparse problems, by LU factorization of a submatrix of \(A\). Given a feasible vector \(x_0\), we can express any other feasible vector \(x\) in the form \(x = x_0 + Z w\) for some \(w \in R^{n-m}\). Direct computation shows that the equality-constrained subproblem EQP is equivalent to the unconstrained subproblem

\[ \min_w \; \frac{1}{2} w^T (Z^T Q Z) w + (Q x_0 + c)^T Z w.\]

If the reduced Hessian matrix \(Z^T Q Z\) is positive definite, then the unique solution \(w^*\) of this subproblem can be obtained by solving the linear system
\[ (Z^T Q Z) w = -Z^T (Q x_0 + c).\]

The solution \(x^*\) of the equality-constrained subproblem EQP is then recovered as \(x^* = x_0 + Z w^*\). Lagrange multipliers can be computed from \(x^*\) by noting that the first-order condition for optimality in EQP is that there exists a multiplier vector \(\lambda^*\) such that \(Q x^* + c + A^T \lambda^* = 0\). If \(A\) has full row rank, then \(\lambda^* = -(A A^T)^{-1} A (Q x^* + c)\) is the unique set of multipliers. Most traditional codes use null-space methods.
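A sketch of the null-space recipe, assuming dense data; scipy.linalg.null_space supplies an orthonormal basis \(Z\), and a minimum-norm least-squares solve provides one possible feasible point \(x_0\). The function name is illustrative.

```python
import numpy as np
from scipy.linalg import null_space

def null_space_eqp(Q, A, c, b):
    """Null-space method for EQP.  Assumes A has full row rank and the
    reduced Hessian Z^T Q Z is positive definite."""
    Z = null_space(A)                              # columns span null(A)
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]      # a feasible point: A x0 = b
    # (Z^T Q Z) w = -Z^T (Q x0 + c)
    w = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (Q @ x0 + c))
    x = x0 + Z @ w                                 # x* = x0 + Z w*
    # Multipliers from Q x* + c + A^T lambda* = 0 with A of full row rank
    lam = -np.linalg.solve(A @ A.T, A @ (Q @ x + c))
    return x, lam
```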

Inequality-Constrained Quadratic Programs

Inequality-constrained quadratic programs are QPs that contain inequality constraints and possibly equality constraints. Active set methods can be applied to both convex and nonconvex problems. Gradient projection methods, which allow rapid changes in the active set, are most effective for QPs with only bound constraints. Interior point methods work well for large convex QPs.

Active Set Methods

Active set methods start by finding a feasible point during an initial phase and then search for a solution along the edges and faces of the feasible set by solving a sequence of equality-constrained QPs. Active set methods differ from the simplex method for linear programming in that neither the iterates nor the solution need to be vertices of the feasible set. When the quadratic programming problem is nonconvex, these methods usually find a local minimizer. Finding a global minimizer is a more difficult task.

The active set \(\mathcal{A}(x^*)\) at an optimal point \(x^*\) is defined as the indices of the constraints at which equality holds:
\[ \mathcal{A}(x^*) = \{i \in \mathcal{E} \cup \mathcal{I}: a_i^T x^* = b_i\}.\] If \(\mathcal{A}(x^*)\) were known, the solution could be found by solving an equality-constrained QP of the form:
\[ \begin{array}{ll}
\min_x & q(x) = \frac{1}{2} x^T Q x + c^T x \\
\mbox{s.t.} & a_i^T x = b_i \quad \forall i \in \mathcal{A}(x^*)
\end{array}\] Therefore, the main challenge in solving inequality-constrained QPs is determining this set.

Given a feasible \(x_k\), these methods find a direction \(d_k\) by solving the subproblem
\[ \begin{array}{lll}
\mbox{EQP}_k & \min & q(x_k + d) \\
& \mbox{s.t.} & a_i^T(x_k + d) = b_i\qquad i \in \mathcal{W}_k
\end{array}
\] where \(q\) is the objective function \(q(x) = \frac{1}{2} x^T Q x + c^T x\) and \(\mathcal{W}_k \) is a working set of constraints. In all cases, \(\mathcal{W}_k\) is a subset of \(\mathcal{A}(x_k) = \{ i \in \mathcal{I} : a_i^T x_k = b_i \} \cup \mathcal{E}\), the set of constraints that are active at \(x_k\). Typically, \(\mathcal{W}_k\) either is equal to \(\mathcal{A}(x_k)\) or else has one fewer index than \(\mathcal{A}(x_k)\).

The working set \(\mathcal{W}_k\) is updated at each iteration with the aim of determining the set \(\mathcal{A}^*\) of active constraints at a solution \(x^*\). When \(\mathcal{W}_k\) is equal to \(\mathcal{A}^*\), a local minimizer of the original problem can be obtained as a solution of the equality-constrained subproblem \(\mbox{EQP}_k\). The updating of \(\mathcal{W}_k\) depends on the solution of the direction-finding subproblem.

Subproblem \(\mbox{EQP}_k\) has a unique solution whenever the reduced Hessian matrix \(Z_k^T Q Z_k\) is positive definite, which is always the case if \(Q\) is positive definite. If subproblem \(\mbox{EQP}_k\) has a solution \(d_k \neq 0\), we compute the largest step
\[\mu_k = \min\left\{ \frac{b_i - a_i^T x_k}{a_i^T d_k}: \; a_i^T d_k < 0, \; i \not\in \mathcal{W}_k \right\}\] that does not violate any of the constraints \(a_i^T x \geq b_i\) with \(i \not\in \mathcal{W}_k\), and we set \(x_{k+1} = x_k + \alpha_k d_k\), where \(\alpha_k = \min\{ 1 , \mu_k \}\).

The step \(\alpha_k = 1\) would take us to the minimizer of the objective function on the subspace defined by the current working set, but it may be necessary to truncate this step if a new constraint is encountered. The working set is updated by including in \(\mathcal{W}_{k+1}\) all constraints active at \(x_{k+1}\).
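The step computation above amounts to a ratio test over the constraints outside the working set. A minimal sketch, assuming inequality constraints of the form \(a_i^T x \geq b_i\) stored as rows of a dense array; the function name is illustrative.

```python
import numpy as np

def step_length(A, b, x, d, working, tol=1e-12):
    """Return alpha_k = min(1, mu_k), where mu_k is the largest step along d
    that keeps a_i^T x >= b_i for every constraint i outside the working set."""
    mu = np.inf
    for i in range(A.shape[0]):
        if i in working:
            continue
        a_d = A[i] @ d
        if a_d < -tol:  # the constraint value decreases along d and may block the step
            mu = min(mu, (b[i] - A[i] @ x) / a_d)
    return min(1.0, mu)
```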

If the solution to subproblem \(\mbox{EQP}_k\) is \(d_k=0\), then \(x_k\) is the minimizer of the objective function on the subspace defined by \(\mathcal{W}_k\). First-order optimality conditions for subproblem \(\mbox{EQP}_k\) imply that there are multipliers \(\lambda_i^{(k)}\) such that \(Q x_k + c = \sum_{i \in \mathcal{W}_k} \lambda_i^{(k)} a_i\).

If \(\lambda_i^{(k)} \geq 0\) for all \(i \in \mathcal{W}_k \cap \mathcal{I}\), then \(x_k\) is a local minimizer of problem QP. Otherwise, we obtain \(\mathcal{W}_{k+1}\) by deleting one of the indices \(i \in \mathcal{W}_k \cap \mathcal{I}\) for which \(\lambda_i^{(k)} < 0\). As in the case of linear programming, various pricing schemes for making this choice can be implemented.

If the reduced Hessian matrix \(Z_k^T Q Z_k\) is indefinite, then subproblem \(\mbox{EQP}_k\) is unbounded below. In this case we determine a direction \(d_k\) along which \(q(x_k + \alpha d_k)\) decreases without bound, using techniques based on factorizations of the reduced Hessian matrix. Given \(d_k\), we compute \(\mu_k\) as before and set \(x_{k+1} = x_k + \mu_k d_k\). The new working set \(\mathcal{W}_{k+1}\) is obtained by adding to \(\mathcal{W}_k\) the constraints that become active at \(x_{k+1}\).

A key to the efficient implementation of active set methods is the reuse of information from solving the equality-constrained subproblem at the next iteration. The only difference between consecutive subproblems is that the working set grows or shrinks by a single component. Efficient codes perform updates of the matrix factorizations obtained at the previous iteration, rather than calculating them from scratch each time.
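Putting these pieces together, the sketch below is a minimal primal active-set iteration for the strictly convex case, assuming inequality constraints \(A x \geq b\) only, a feasible starting point, and linearly independent rows in the working set. For simplicity it re-forms and re-factorizes the KKT system at every iteration, whereas, as noted above, efficient codes update the factorizations; anti-cycling safeguards are also omitted, and all names are illustrative.

```python
import numpy as np

def active_set_qp(Q, c, A, b, x0, W0=(), max_iter=200, tol=1e-10):
    """Primal active-set method for  min 0.5 x^T Q x + c^T x  s.t.  A x >= b.
    Assumes Q is positive definite and x0 is feasible."""
    x = np.array(x0, dtype=float)
    W = list(W0)                                    # working set W_k
    for _ in range(max_iter):
        g = Q @ x + c
        if not W:
            d = np.linalg.solve(Q, -g)
            lam = np.empty(0)
        else:
            Aw = A[W]
            m = Aw.shape[0]
            # EQP_k:  min 0.5 d^T Q d + g^T d  s.t.  Aw d = 0, via its KKT system
            K = np.block([[Q, Aw.T], [Aw, np.zeros((m, m))]])
            sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
            # Multipliers in the convention  Q x_k + c = sum_i lambda_i a_i
            d, lam = sol[:len(x)], -sol[len(x):]
        if np.linalg.norm(d) <= tol:
            if lam.size == 0 or lam.min() >= -tol:
                return x, dict(zip(W, lam))         # first-order optimal point
            W.pop(int(np.argmin(lam)))              # drop a constraint with a negative multiplier
            continue
        # Ratio test: largest step that keeps the constraints outside W_k feasible
        alpha, blocking = 1.0, None
        for i in range(A.shape[0]):
            if i in W:
                continue
            a_d = A[i] @ d
            if a_d < -tol:
                ratio = (b[i] - A[i] @ x) / a_d
                if ratio < alpha:
                    alpha, blocking = ratio, i
        x = x + alpha * d
        if blocking is not None:
            W.append(blocking)                      # a new constraint became active
    return x, {}
```

In exact arithmetic and in the absence of degeneracy, this iteration terminates in finitely many steps, matching the description above.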

Interior Point Methods

Path-following methods (also known as trajectory-following, barrier, or interior-point methods) offer a good alternative to the active-set methods described above. Although path-following methods may be applied to the general problem QP, it is easier to describe them for problems of the form
\[ \begin{array}{lll}
QP2: & \min & \frac{1}{2} x^T Q x + c^T x\\
& \mbox{s.t.} & A x = b \\
& & x \geq 0
\end{array} \] for which first-order optimality conditions are that
\[\begin{array}{lll}
A x^* & = & b \\
Qx^* + c & = & A^T y^* + z^*\\
x_i^* z_i^* & = & 0 \quad \forall i \in \{1,\dots,n\}
\end{array}\] for optimal primal variables \(x^* \geq 0\), Lagrange multipliers \(y^*\), and dual variables \(z^* \geq 0\).
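The following small helper, illustrative rather than taken from any package, evaluates these first-order residuals for a candidate primal-dual triple; all three should vanish, with \(x \geq 0\) and \(z \geq 0\), at a solution of QP2.

```python
import numpy as np

def qp2_kkt_residuals(Q, A, c, b, x, y, z):
    """Residuals of the first-order optimality conditions for QP2."""
    primal = A @ x - b              # A x = b
    dual = Q @ x + c - A.T @ y - z  # Q x + c = A^T y + z
    comp = x * z                    # x_i z_i = 0 componentwise
    return primal, dual, comp
```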

In their simplest form, interior-point methods trace the central path, defined as the solution \(v(t) = (x(t),y(t),z(t))\) to the parametric nonlinear system \(A x(t) = b\), \(Qx(t) + c = A^T y(t) + z(t)\), and \(x_i(t) z_i(t) = t\) for \(i \in \{1,\dots,n\}\) with \((x(t),z(t)) > 0\), as the scalar \(t\) decreases to 0. Notice that all points on the central path are primal and dual feasible, and that complementary slackness is achieved in the limit as \(t\) approaches 0. A disadvantage of this simple idea is that a point \(v(t_0)\) must be available for some \(t_0 > 0\). Such a point may be found as a first-order critical point of the logarithmic-barrier function \(\frac{1}{2} x^T Q x + c^T x - t_0 \sum_{i=1}^n \log x_i\) within the region \(A x = b\); indeed, early path-following methods were based on a sequential minimization of the logarithmic barrier function.

To cope with this potential deficiency, infeasible interior-point methods start from any \(v^s = (x^{s},y^{s},z^{s})\) for which \((x^{s},z^{s}) > 0\) and follow instead the trajectory \(v(t)\) that satisfies the homotopy \(A x(t) - b = \theta(t) [ A x^s - b ]\), \(Q x(t) + c - A^T y(t) - z(t) = \theta(t) [ Qx^s + c - A^T y^s - z^s ]\), and
\(x_i(t) z_i(t) = \theta(t) x_i^s z_i^s\) for \(i \in \{1,\dots,n\}\), as \(t\) decreases from 1 to 0. The scalar function \(\theta(t)\) may be any increasing function for which \(\theta(0) = 0\) and \(\theta(1) = 1\). The simplest choice \(\theta(t) = t\) is popular, but there are theoretical advantages in using \(\theta(t) = t^2\), since then the unknown trajectory \(v(t)\) is analytic at \(t = 0\) for convex problems.

In practice, it is sometimes advantageous for numerical reasons to aim for a small value of the complementarity instead of zero; in this case the complementary slackness part of the homotopy may be replaced by
\[ x_i(t) z_i(t) = \theta(t) x_i^s z_i^s + [1-\theta(t)] \sigma \quad \forall i \in \{1,\dots,n\}\] for some small centering parameter \(\sigma > 0\).

Notice that all of these homotopies define their trajectories \(v(t)\) implicitly; all that is known is the starting point \(v^s\). Many path-following methods replace the true but unknown \(v(t)\) by a Taylor series approximation \(v^{s}(t)\) evaluated about \(v^s\) and trace this approximation instead. Clearly, as \(v^{s}(t)\) is simply an approximation, it will most likely diverge from \(v(t)\) as \(t\) decreases from 1 towards 0. To cope with this, sophisticated safeguarding rules are used to decide how far \(t\) may decrease while giving an adequate approximation; if \(t^l\) is this best value, \(v^s\) is replaced by \(v^{s}(t^l)\) and the process repeated. The resulting iteration defines a typical path-following method.

The centering parameter is sometimes computed after an initial predictor step (a first-order Taylor approximation with \(\sigma = 0\)) is used to compute an estimate of the solution. Once \(\sigma > 0\) is known, the Taylor approximation to the revised homotopy gives the corrector step.

The Taylor series coefficients are found by repeated differentiation of the homotopy equations with respect to \(t\), and the \(k\)th-order coefficients \((x^{(k)}, y^{(k)}, z^{(k)})\) may be obtained by solving the linear system
\[\begin{pmatrix}
A & 0 & 0 \\ Q & -A^T & -I \\ Z^s & 0 & X^s
\end{pmatrix}
\begin{pmatrix}
x^{(k)} \\ y^{(k)} \\ z^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
r_p^{(k)} \\ r_d^{(k)} \\ r_c^{(k)}
\end{pmatrix}
\doteq r^{(k)},\] where \(X^s\) and \(Z^s\) are the diagonal matrices whose diagonal entries are \(x^s\) and \(z^s\) respectively, and the right-hand side \(r^{(k)}\) depends on the values of previously calculated lower-order coefficients. Since the coefficient matrix is the same for each order of coefficients, a single factorization enables us to find increasingly accurate Taylor approximations at a gradually increasing but reasonable cost. Block elimination of the system results in the smaller, symmetric system
\[\begin{pmatrix}
Q + (X^s)^{-1} Z^s & A^T \\ A & 0
\end{pmatrix}
\begin{pmatrix}
x^{(k)} \\ -y^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
r_d^{(k)} + (X^s)^{-1}r_c^{(k)} \\ r_p^{(k)}
\end{pmatrix},\] and this is usually exploited in practice; the variables \(z^{(k)}\) may be recovered as \((X^s)^{-1}[r_c^{(k)} - Z^s x^{(k)}]\). Some algorithms seek to avoid possible numerical difficulties by regularizing these defining systems. For example, the coefficient matrix above is replaced by
\[
\begin{pmatrix}
Q + (X^s)^{-1} Z^s + D_p & A^T \\ A & -D_d
\end{pmatrix}
\] where \(D_p\) and \(D_d\) are small, positive-definite diagonal perturbations. Other algorithms try to avoid these difficulties by pre-processing the data to remove singularities. Both techniques appear to work well in practice.
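To make the block-eliminated (unregularized) system above concrete, the following sketch computes the first-order Taylor coefficients for the simplest homotopy \(\theta(t) = t\), for which the right-hand sides \(r_p^{(1)}, r_d^{(1)}, r_c^{(1)}\) are just the homotopy residuals at the starting point. Function and variable names are illustrative, and the starting point is assumed to satisfy \((x, z) > 0\).

```python
import numpy as np

def first_order_coefficients(Q, A, c, b, x, y, z):
    """First-order Taylor coefficients (x^(1), y^(1), z^(1)) for theta(t) = t,
    computed from the block-eliminated symmetric system."""
    n, m = len(x), len(b)
    # Right-hand sides r_p^(1), r_d^(1), r_c^(1) for theta(t) = t
    r_p = A @ x - b
    r_d = Q @ x + c - A.T @ y - z
    r_c = x * z
    XinvZ = np.diag(z / x)                    # (X^s)^{-1} Z^s
    K = np.block([[Q + XinvZ, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([r_d + r_c / x, r_p])
    sol = np.linalg.solve(K, rhs)
    x1, y1 = sol[:n], -sol[n:]                # second block of unknowns is -y^(1)
    z1 = (r_c - z * x1) / x                   # z^(1) = (X^s)^{-1}[r_c^(1) - Z^s x^(1)]
    return x1, y1, z1
```

Moving \(t\) from 1 toward 0, the first-order predictor step is the negative of these coefficients.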

If the problem is convex, iterations of the form described can be shown to converge quickly, in a polynomially bounded number of iterations. For nonconvex problems, most methods prefer instead to approximately minimize the logarithmic barrier function for a decreasing sequence of values of \(t_0\), using a globally convergent (linesearch or trust-region) method of the kind typically used for linearly constrained optimization; many of the details, such as the structure of the vital linear systems, are effectively the same as in the convex case.

Linear Least-Squares Problems

Linear least-squares problems are special instances of convex quadratic programs that arise frequently in data-fitting applications. The linear least-squares problem is
\[ \begin{array}{lllll}
LLS: & \min & \frac{1}{2} \| C x - d \|_2^2 & & \\
& \mbox{s.t.} & a_i^T x & = & b_i \; \forall i \in \mathcal{E}\\
& & a_i^T x & \geq & b_i \; \forall i \in \mathcal{I}
\end{array} \] where \(C \in R^{m\times n}\) and \(d \in R^m\). LLS is a special case of problem QP, as can be seen by replacing \(Q\) by \(C^T C\) and \(c\) by \(-C^T d\) in problem QP (the two objectives then differ only by the constant \(\frac{1}{2} d^T d\)). In general, it is preferable to solve a least-squares problem with a code that exploits this special structure. LSSOL is a Fortran package for constrained linear least-squares problems and convex QPs. See Lawson and Hanson (1974) for more information.
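A quick numerical check of this reduction with hypothetical random data; the assertion verifies the constant offset noted above.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((8, 3))
d = rng.standard_normal(8)
x = rng.standard_normal(3)

# QP data equivalent to the least-squares objective 0.5 * ||C x - d||^2
Q = C.T @ C
c = -C.T @ d

lls = 0.5 * np.linalg.norm(C @ x - d) ** 2
qp = 0.5 * x @ Q @ x + c @ x + 0.5 * d @ d   # constant term restored for comparison
assert np.isclose(lls, qp)
```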

References
  • Lawson, C. L. and Hanson, R. J. 1974. Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, NJ.