The quadratic programming (QP) problem involves minimization of a quadratic function subject to linear constraints. Most codes use the formulation
QP: minimize $\frac{1}{2} x^T Q x + c^T x$
subject to $a_i^T x = b_i$ for $i \in \mathcal{E}\qquad $
$a_i^T x \geq b_i$ for $i \in \mathcal{I}$,
where $Q \in R^{n\times n}$ is symmetric, and the index sets $\mathcal{I} \,$ and $\mathcal{E} \,$ specify the inequality and equality constraints, respectively. Quadratic programs are fundamental because many other types of optimization problem are solved by solving a sequence of QPs (this method is known as Sequential Quadratic Programming).
The difficulty of solving the quadratic programming problem depends largely on the nature of the matrix $Q \,$. In ''convex'' quadratic programs, the matrix $Q \,$ is positive semidefinite (on the feasible set); these problems are relatively easy to solve, and polynomial-time algorithms exist for them. If $Q \,$ has negative eigenvalues (''nonconvex'' quadratic programming), then the objective function may have more than one local minimizer, and the problem is NP-complete. An extreme example is the problem
$\qquad \qquad \min \; - x^T x \; : \; -1 \leq x_i \leq 1, \; i=1,\ldots,n$,
which has a minimizer at any $x \,$ with
$\qquad \qquad |x_i| = 1$ for $i = 1,\ldots,n$, a total of $2^n$ local minimizers.
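The count of $2^n$ minimizers can be checked by brute force for small $n$. The following is a minimal sketch, not part of any QP package; the helper names `box_vertices` and `objective` are made up for this illustration. It enumerates the corners of the box and confirms that they all share the same objective value $-n$, which is also the smallest value $-x^T x$ can take on the box.

```python
import itertools
import numpy as np

def box_vertices(n):
    """All 2**n points whose coordinates are each -1 or +1."""
    return [np.array(v) for v in itertools.product((-1.0, 1.0), repeat=n)]

def objective(x):
    """The nonconvex objective -x^T x from the example."""
    return -float(x @ x)

n = 3
vertices = box_vertices(n)
values = {objective(v) for v in vertices}

# Every vertex attains the same value -n, so there are 2**n minimizers.
print(len(vertices), values)

# Points strictly inside the box have a larger objective value.
rng = np.random.default_rng(0)
interior = rng.uniform(-0.99, 0.99, size=(1000, n))
print(all(objective(p) > -n for p in interior))
```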
Necessary optimality conditions for the vector $x^*_{} \,$ to be a local minimizer of problem QP are that it should be
primal feasible, i.e.,
$\qquad a_i^T x^* = b_i$ for $i \in \mathcal{E}\qquad $ and $a_i^T x^* \geq b_i$ for $i \in \mathcal{I}$,
dual feasible, i.e.,
$ \qquad Q x^* + c = \sum_{i \in \mathcal{E} \cup \mathcal{I}} a_i y_i^* \,$ and $y_i^* \geq 0$ for $i \in \mathcal{I}$,
for some vector of ''Lagrange multipliers'' $y^*_{} \,$, and that the ''complementary slackness'' condition
$ \qquad ( a_i^T x^* - b_i ) y_i^* = 0 \;\;\; $ for all $i \in \mathcal{I}$,
should hold. These requirements are commonly known as the Karush-Kuhn-Tucker (KKT) conditions.
Such a point is a local minimizer if and only if $s^T Q s \geq 0$ for all vectors $s \in \mathcal{S}$, where
$$\mathcal{S} = \{ s : \; a_i^T s = 0 \text{ for } i \in \mathcal{E} \text{ and for } i \in \mathcal{I} \text{ such that } a_i^T x^* = b_i \text{ and } y^*_i > 0, \text{ and }$$
$$\qquad a_i^T s \geq 0 \text{ for } i \in \mathcal{I} \text{ such that } a_i^T x^* = b_i \text{ and } y^*_i = 0 \}.$$
This second-order condition is trivially satisfied for convex problems, but may be hard (NP-complete) to check for non-convex ones if there are many $i \in \mathcal{I}$ for which $a_i^T x^* = b_i$ and $y^*_i = 0$.
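The first-order (KKT) conditions above are easy to test numerically for a candidate pair $(x, y)$. The sketch below is one possible way to do so, assuming dense NumPy arrays; the function name `kkt_residuals` and the splitting of the constraints into `A_eq`/`A_in` blocks are choices made for this illustration only. It returns the largest violation of primal feasibility, dual feasibility and complementary slackness, and does not attempt the second-order test.

```python
import numpy as np

def kkt_residuals(Q, c, A_eq, b_eq, A_in, b_in, x, y_eq, y_in):
    """Return (primal, dual, complementarity) violations at (x, y).

    Constraints are A_eq x = b_eq and A_in x >= b_in; the multipliers
    follow the sign convention Q x + c = A_eq^T y_eq + A_in^T y_in.
    """
    r_eq = A_eq @ x - b_eq            # should be zero
    r_in = A_in @ x - b_in            # should be nonnegative
    primal = max(np.abs(r_eq).max(initial=0.0),
                 np.maximum(-r_in, 0.0).max(initial=0.0))
    stationarity = Q @ x + c - A_eq.T @ y_eq - A_in.T @ y_in
    dual = max(np.abs(stationarity).max(initial=0.0),
               np.maximum(-y_in, 0.0).max(initial=0.0))
    complementarity = np.abs(r_in * y_in).max(initial=0.0)
    return primal, dual, complementarity

# Example: minimize 1/2 (x1^2 + x2^2) subject to x1 + x2 = 1 and x1 >= 0.
Q = np.eye(2)
c = np.zeros(2)
A_eq, b_eq = np.array([[1.0, 1.0]]), np.array([1.0])
A_in, b_in = np.array([[1.0, 0.0]]), np.array([0.0])

x = np.array([0.5, 0.5])      # candidate minimizer
y_eq = np.array([0.5])        # multiplier for the equality constraint
y_in = np.array([0.0])        # the inequality is inactive, so its multiplier is 0

print(kkt_residuals(Q, c, A_eq, b_eq, A_in, b_in, x, y_eq, y_in))
```

All three residuals vanish for this candidate; perturbing `x` or `y_in` makes at least one of them positive.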
Algorithms
''Equality-constrained'' quadratic programs arise, not only as subproblems in solving the general problem, but also in structural analysis and other areas of application. ''Null-space methods'' for solving
EQP: minimize $\frac{1}{2} x^T Q x + c^T x$
subject to $A x = b \qquad$
find a full-rank matrix $Z\in R^{n\times m} \,$ whose columns span the null space of $A \,$ (so $m$ is the dimension of that null space). This matrix can be computed with orthogonal factorizations or, in the case of sparse problems, by ''LU'' factorization of a submatrix of $A \,$, just as in the simplex method for linear programming. Given a feasible vector $x_0 \,$, we can express any other feasible vector $x \,$ in the form
$$x = x_0 + Z w \,$$
for some $w \in R^m$. Direct computation shows that the equality-constrained subproblem EQP is equivalent to the unconstrained subproblem
$$\min_w \; \frac{1}{2} w^T (Z^T Q Z) w + (Q x_0 + c)^T Z w.$$
If the reduced Hessian matrix $Z^T Q Z \,$ is positive definite, then the unique solution $w^* \,$ of this subproblem can be obtained by solving the linear system
$$(Z^T Q Z) w = - Z^T (Q x_0 + c) \,.$$
The solution $x^* \,$ of the equality-constrained subproblem EQP is then recovered from $x^* = x_0 + Z w^*$. Lagrange multipliers can be computed from $ x^* $ by noting that the first-order condition for optimality in EQP is that there exists a multiplier vector $y^*$ such that
$$Q x^* + c + A^T y^* = 0 \,.$$
If $A$ has full row rank, then
$$y^* = - (A A^T)^{-1} A (Q x^* + c) \,$$
is the unique set of multipliers. Most traditional codes use null-space methods. ''Range-space methods'' for problem EQP can be used when $Q \,$ is positive definite and easy to invert, for example, diagonal or block-diagonal. In this approach, the solution and the multiplier vectors are calculated from the formulae
$$y^* = - (A Q^{-1} A^T)^{-1}(b + A Q^{-1} c) \,$$
and
$$x^* = - Q^{-1}(c + A^T y^*) \,.$$
Although this approach works only for a subclass of problems, there are many applications in which it is useful. Finally, ''full-space methods'' compute both $x^* \,$ and $y^* \,$ together by solving the symmetric, indefinite block system of linear equations
$$
\begin{pmatrix}
Q & A^T \\
A & 0
\end{pmatrix}
\begin{pmatrix}
x^* \\ y^*
\end{pmatrix}
=
\begin{pmatrix}
- c \\ b
\end{pmatrix}.
$$
Full-space methods are particularly useful when the problem is large and sparse as the resulting block system retains this sparsity.
The reduced Hessian is positive definite if and only if the number of negative eigenvalues of the full-space system matrix is equal to the rank of $A \,$.
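The three approaches to EQP can be compared on a small dense example. The following sketch assumes NumPy, a positive definite $Q$ and a matrix $A$ with full row rank; the function names and the test data are invented for this illustration, and dense solves stand in for the factorizations a production code would use. The multipliers follow the convention $Q x^* + c + A^T y^* = 0$, so the $p \times p$ matrices $A A^T$ and $A Q^{-1} A^T$ are the ones that get inverted.

```python
import numpy as np

def null_space_method(Q, c, A, b):
    """Solve EQP through the reduced system in w (A has full row rank)."""
    p = A.shape[0]
    # The last n - p rows of V^T from the SVD give an orthonormal basis of null(A).
    _, _, Vt = np.linalg.svd(A)
    Z = Vt[p:].T
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]        # one feasible point
    w = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (Q @ x0 + c))
    x = x0 + Z @ w
    y = -np.linalg.solve(A @ A.T, A @ (Q @ x + c))   # multipliers
    return x, y

def range_space_method(Q, c, A, b):
    """Solve EQP using solves with Q (Q must be positive definite)."""
    Qinv_c = np.linalg.solve(Q, c)
    Qinv_At = np.linalg.solve(Q, A.T)
    y = -np.linalg.solve(A @ Qinv_At, b + A @ Qinv_c)
    x = -np.linalg.solve(Q, c + A.T @ y)
    return x, y

def kkt_matrix(Q, A):
    """Symmetric indefinite block matrix of the full-space method."""
    p = A.shape[0]
    return np.block([[Q, A.T], [A, np.zeros((p, p))]])

def full_space_method(Q, c, A, b):
    """Solve for x and y together from the block system."""
    n = Q.shape[0]
    sol = np.linalg.solve(kkt_matrix(Q, A), np.concatenate([-c, b]))
    return sol[:n], sol[n:]

Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
c = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])

for method in (null_space_method, range_space_method, full_space_method):
    x, y = method(Q, c, A, b)
    print(method.__name__, x, y)

# Inertia check: the number of negative eigenvalues equals rank(A) = 2 here.
print(int((np.linalg.eigvalsh(kkt_matrix(Q, A)) < 0).sum()))
```

The three methods return the same $(x^*, y^*)$ up to rounding; which one is preferable in practice depends on the structure of $Q$ and $A$, as described above.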
Active-set methods
The codes in the BQPD, LINDO, LSSOL, PORT 3, QPA and QPOPT packages are based on active set methods. After finding a feasible point during an initial phase, these methods search for a solution along the edges and faces of the feasible set by solving a sequence of equality-constrained quadratic programming problems. Active set methods differ from the simplex method for linear programming in that neither the iterates nor the solution need be vertices of the feasible set. When the quadratic programming problem is nonconvex, these methods usually find a local minimizer. Finding a ''global'' minimizer is a more difficult task that is not addressed by the software currently available.
Active set methods for the inequality-constrained problem QP solve a sequence of equality-constrained problems. Given a feasible $x_k \,$, these methods find a direction $d_k \,$ by solving the subproblem
EQP$_k$ : minimize $q(x_k + d) \,$
subject to $a_i^T(x_k + d) = b_i\qquad i \in \mathcal{W}_k$
where $q \,$ is the objective function
$q(x) = \frac{1}{2} x^T Q x + c^T x$
and $\mathcal{W}_k \,$ is a ''working set'' of constraints. In all cases $\mathcal{W}_k \,$ is a subset of
$\mathcal{A}(x_k) = \{ i \in \mathcal{I} : a_i^T x_k = b_i \} \cup \mathcal{E}$
the set of constraints that are active at $x_k \,$. Typically, $\mathcal{W}_k \,$ either is equal to $\mathcal{A}(x_k) \,$ or else has one fewer index than $\mathcal{A}(x_k) \,$.
The working set $\mathcal{W}_k \,$ is updated at each iteration with the aim of determining the set $\mathcal{A}^* \,$ of active constraints at a solution $x^* \,$. When $\mathcal{W}_k \,$ is equal to $\mathcal{A}^* \,$, a local minimizer of the original problem can be obtained as a solution of the equality-constrained subproblem EQP$_k$ . The updating of $\mathcal{W}_k \,$ depends on the solution of the direction-finding subproblem.
Subproblem EQP$_k$ has a solution if the reduced Hessian matrix $Z_k^T Q Z_k$ is positive definite. This is always the case if $Q \,$ is positive definite. If subproblem EQP$_k$ has a solution $d_k \,$, we compute the largest possible step
$\mu_k = \min\{ \frac{b_i - a_i^T x_k}{a_i^T d_k}: \; a_i^T d_k < 0, \; i \not \in \mathcal{W}_k \}$
that does not violate any constraints, and we set $x_{k+1} = x_k + \alpha_k d_k \,$, where $\alpha_k = \min\{ 1 , \mu_k \} \,$.
The step $\alpha_k = 1 \,$ would take us to the minimizer of the objective function on the subspace defined by the current working set, but it may be necessary to truncate this step if a new constraint is encountered. The working set is updated by including in $\mathcal{W}_{k+1} \,$ all constraints active at $x_{k+1} \,$.
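The step-length rule can be sketched as a ratio test. For a constraint $a_i^T x \geq b_i$, moving along $d_k$ can only cause a violation when $a_i^T d_k < 0$, and the constraint then becomes active at the step $(b_i - a_i^T x_k)/(a_i^T d_k)$; the largest feasible step is the smallest of these ratios. The function name `max_feasible_step` and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def max_feasible_step(A, b, x, d, working, tol=1e-12):
    """Largest mu >= 0 such that A (x + mu d) >= b for rows outside `working`.

    Returns (mu, index of the blocking constraint), or (inf, None) if no
    constraint limits the step.
    """
    mu, blocking = np.inf, None
    for i in range(A.shape[0]):
        if i in working:
            continue
        slope = A[i] @ d
        if slope < -tol:                       # constraint value decreases along d
            ratio = (b[i] - A[i] @ x) / slope
            if ratio < mu:
                mu, blocking = ratio, i
    return mu, blocking

# Constraints: x1 >= 0, x2 >= 0, and x1 + x2 <= 2 written as -x1 - x2 >= -2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, -2.0])
x = np.array([1.0, 0.5])
d = np.array([1.0, 1.0])

mu, blocking = max_feasible_step(A, b, x, d, working=[])
alpha = min(1.0, mu)
print(mu, blocking, alpha)
```

Here only the third constraint limits the step, so the full step $\alpha_k = 1$ is truncated.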
If the solution to subproblem EQP$_k$ is $d_k=0 \,$, then $x_k \,$ is the minimizer of the objective function on the subspace defined by $\mathcal{W}_k \,$. First-order optimality conditions for subproblem EQP$_k$ imply that there are multipliers $y_i^{(k)}$ such that $Q x_k + c = \sum_{i \in \mathcal{W}_k} y_i^{(k)} a_i$.
If $y_i^{(k)} \geq 0$ for all $i \in \mathcal{W}_k \cap \mathcal{I}$, then $x_k$ is a local minimizer of problem QP. Otherwise, we obtain $\mathcal{W}_{k+1} \,$ by deleting one of the indices $i \in \mathcal{W}_k \cap \mathcal{I}$ for which $y_i^{(k)} < 0$.
If the reduced Hessian matrix $Z_k^T Q Z_k$ is indefinite, then subproblem EQP$_k$ is unbounded below. In this case we need to determine a direction $d_k \,$ such that $q(x_k + \alpha d_k) \,$ is unbounded below, using techniques based on factorizations of the reduced Hessian matrix. Given $d_k \,$, we compute $\mu_k \,$ as before, and define $x_{k+1} = x_k + \mu_k d_k \,$.
The new working set $\mathcal{W}_{k+1} \,$ is obtained by adding to $\mathcal{W}_k \,$ all constraints active at $x_{k+1} \,$.
A key to the efficient implementation of active set methods is the reuse of information from solving the equality-constrained subproblem at the next iteration. The only difference between consecutive subproblems is that the working set grows or shrinks by a single component. Efficient codes perform updates of the matrix factorizations obtained at the previous iteration, rather than calculating them from scratch each time.
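The iteration described above can be put together in a few lines for the simplest setting. The sketch below is a teaching illustration rather than the algorithm of any of the packages named here: it assumes a positive definite $Q$, inequality constraints $A x \geq b$ only, a feasible starting point, and a working set with linearly independent rows. It solves each subproblem from scratch with a dense KKT solve instead of updating factorizations, and it adds only the single blocking constraint to the working set rather than every newly active one. Multipliers follow the convention $Q x_k + c = \sum_{i \in \mathcal{W}_k} y_i a_i$.

```python
import numpy as np

def solve_eqp(Q, g, Aw):
    """min 1/2 d^T Q d + g^T d  s.t.  Aw d = 0; returns d and y with Q d + g = Aw^T y."""
    n, m = Q.shape[0], Aw.shape[0]
    if m == 0:
        return np.linalg.solve(Q, -g), np.zeros(0)
    K = np.block([[Q, -Aw.T], [Aw, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
    return sol[:n], sol[n:]

def active_set_qp(Q, c, A, b, x, working, max_iter=100, tol=1e-10):
    """Primal active-set sketch for min 1/2 x^T Q x + c^T x  s.t.  A x >= b."""
    x, working = np.asarray(x, dtype=float), list(working)
    for _ in range(max_iter):
        d, y = solve_eqp(Q, Q @ x + c, A[working])
        if np.linalg.norm(d) <= tol:
            if len(working) == 0 or y.min() >= -tol:
                return x, working              # multipliers nonnegative: done
            working.pop(int(np.argmin(y)))     # drop the most negative multiplier
            continue
        # Ratio test over the constraints outside the working set.
        mu, blocking = np.inf, None
        for i in range(A.shape[0]):
            slope = A[i] @ d
            if i not in working and slope < -tol:
                ratio = max((b[i] - A[i] @ x) / slope, 0.0)
                if ratio < mu:
                    mu, blocking = ratio, i
        alpha = min(1.0, mu)
        x = x + alpha * d
        if alpha < 1.0:
            working.append(blocking)           # step was truncated
    raise RuntimeError("iteration limit reached")

# Example: minimize (x1 - 1)^2 + (x2 - 2.5)^2 over a polygon.
Q = 2.0 * np.eye(2)
c = np.array([-2.0, -5.0])
A = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-2.0, -6.0, -2.0, 0.0, 0.0])

x_star, final_working = active_set_qp(Q, c, A, b, x=[2.0, 0.0], working=[2, 4])
print(x_star, final_working)
```

Starting from the vertex $(2, 0)$, the working set shrinks twice and then picks up the first constraint, which is the only one active at the computed solution.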
The LSSOL package (duplicated in [[NAG C Library]]) is specifically designed for convex quadratic programs and linearly constrained linear least squares problems. It is not aimed at large-scale problems; the constraint matrices and the Hessian $Q \,$ are all specified in dense storage format. The quadratic programming routine in IMSL contains codes for dense quadratic programs. If the matrix $Q \,$ is not positive definite, it is replaced by
$$Q + \gamma I \,$$
where $\gamma \geq 0 \,$ is chosen large enough to force convexity.
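One simple way to pick such a shift is from the smallest eigenvalue of $Q$. The text does not say how the library chooses $\gamma$, so the rule below (shift by just enough to make the smallest eigenvalue equal to a small positive `margin`) is purely an assumption for illustration, as is the function name `convexify`.

```python
import numpy as np

def convexify(Q, margin=1e-8):
    """Return (Q + gamma I, gamma) with gamma >= 0 making the result positive definite."""
    lam_min = np.linalg.eigvalsh(Q).min()      # Q is assumed symmetric
    gamma = max(0.0, margin - lam_min)
    return Q + gamma * np.eye(Q.shape[0]), gamma

Q = np.array([[1.0, 0.0], [0.0, -2.0]])        # indefinite
Q_shifted, gamma = convexify(Q)
print(gamma, np.linalg.eigvalsh(Q_shifted))
```

Note that the shifted problem is a different problem: its minimizer need not be a minimizer of the original nonconvex program.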
BQPD uses a null-space method to solve quadratic programs that are not necessarily convex. The linear algebra operations are performed in a modular way; the user is allowed to choose between sparse and dense matrix algebra. The reduced Hessian matrix is, however, processed as a dense matrix, even when sparse techniques are used to handle $Q \,$ and the constraints. The code is efficient for large-scale problems when the size of the working set is close to $n \,$. [[LINDO]] also takes account of sparsity, while [[MATLAB]] and QPOPT (also available in the [[NAG C Library]]) are designed for dense quadratic programs that are not necessarily convex. [[GALAHAD|QPA]] uses a full-space approach together with a Schur-complement update to account for changes in the working set; the code is thus appropriate in the sparse case, and is designed to handle non-convex problems.
Path-following methods
Path-following methods (sometimes known as ''trajectory-following'', ''barrier'' or ''interior-point'' methods) offer a good alternative to the earlier active-set methods. The packages COPL, CPLEX, CVXOPT, GALAHAD, Gurobi, HOPDM, LOQO, MOSEK, RegQP and Xpress are all based on path-following ideas. Although path-following methods may be applied to the general problem QP, it is easier to describe them for problems of the form
QP2: minimize $\frac{1}{2} x^T Q x + c^T x$
subject to $A x = b \qquad$ and $x \geq 0$,
for which first-order optimality conditions are that
$ A x^* = b, \;\; Qx^* + c = A^T y^* + z^*$ and $ x_i^* z_i^* = 0 $ for $i = 1,\dots,n$,
for optimal primal variables $x^* \geq 0$, Lagrange multipliers $y_{}^*$,
and dual variables $z^* \geq 0$.
In their simplest form, interior-point methods trace the ''central path'' that is defined as the solution $v_{}^{}(t) = (x(t),y(t),z(t))$ to the parametric nonlinear system
$ A x(t) = b, \;\; Qx(t) + c = A^T y(t) + z(t)$ and $ x_i(t) z_i(t) = t $ for $i = 1,\dots,n$
with $(x_{}^{}(t),z(t))>0$
as the scalar $t_{}$ decreases to 0. Notice that all points on the central path are primal and dual feasible, and that complementary slackness is achieved in the limit as $t$ approaches 0. A disadvantage of this simple idea is that a point $v_{}^{}(t_0)$ must be available for some $t_0 > 0$, but such a point may be found as a first-order critical point of the ''logarithmic-barrier function''
$\frac{1}{2} x^T Q x + c^T x - t_0 \sum_{i=1}^n \log x_i$
within the region $ A_{}^{} x = b$; indeed, early path-following methods were based on a sequential minimization of the logarithmic barrier function.
To cope with this potential deficiency, ''infeasible'' interior-point methods start from any $v^s = (x^{s},y^{s},z^{s})$ for which $(x^{s},z^{s}) > 0$ and follow instead the trajectory $v(t)$
that satisfies the ''homotopy''
$ A x_{}^{}(t) - b = \theta(t) [ A x_{}^s - b ] $ ,
$ Q x^{}_{}(t) + c - A^T y(t) - z(t) = \theta(t) [ Qx_{}^s + c - A^T y^s - z^s ] $, and
$ x_i(t) z_i(t) = \theta(t) x_i^s z_i^s $ for $i = 1,\dots,n$
as $t_{}$ decreases from 1 to 0. The scalar function $\theta_{}^{}(t)$ may be any increasing function for which $\theta^{}_{}(0) = 0 $ and $\theta_{}^{}(1) = 1 $. The simplest choice $\theta_{}^{}(t) = t$ is popular, but there are theoretical advantages in using $\theta_{}^{}(t) = t^2$ since then the unknown trajectory $v_{}^{}(t)$ is analytic for convex problems at $t_{} = 0$.
In practice, it is sometimes advantageous for numerical reasons to aim for a small value of the complementarity instead of zero, and in this case the complementary slackness part of the homotopy may be replaced by
$ x_i(t) z_i(t) = \theta(t) x_i^s z_i^s + [1-\theta(t)] \sigma $ for $i = 1,\dots,n$
and some small ''centering'' parameter $\sigma_{}^{} > 0$.
Notice that all of these homotopies define their trajectories $v_{}^{}(t)$ implicitly; all that is known is the starting point $v_{}^s$. Many path-following methods replace the true but unknown $v_{}^{}(t)$ by a Taylor series
approximation $v_{}^{s}(t)$ evaluated about $v_{}^s$, and trace this approximation instead. Clearly, as $v_{}^{s}(t)$ is simply an approximation, it will most likely diverge from $v_{}^{}(t)$ as
$t_{}$ decreases from 1 towards 0. To cope with this, sophisticated safeguarding rules are used to decide how far $t_{}$ may decrease while giving an adequate approximation, and if $t^l_{}$ is this best value, $v_{}^s$ is replaced by
$v_{}^{s}(t^l)$ and the process repeated. The resulting iteration defines a typical path-following method.
The centering parameter is sometimes computed after an initial ''predictor'' step (a first-order Taylor approximation with $\sigma = 0$) is used to compute an estimate of the solution. Once $\sigma > 0$ is known, the Taylor approximation to the revised homotopy gives the ''corrector'' step.
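The ideas above can be illustrated with a very small infeasible primal-dual iteration for problem QP2. The sketch below is a simplification and should not be read as the method of any package listed here: it takes only first-order (Newton) steps, uses a fixed centering factor `sigma` applied to the average complementarity instead of a predictor-corrector rule, and replaces the safeguarding rules by a plain fraction-to-the-boundary step that keeps $(x, z) > 0$. All names and parameter values are assumptions made for the illustration.

```python
import numpy as np

def path_following_qp(Q, c, A, b, sigma=0.2, tol=1e-8, max_iter=200):
    """Sketch of an infeasible primal-dual method for
    min 1/2 x^T Q x + c^T x  s.t.  A x = b, x >= 0  (convex case)."""
    m, n = A.shape
    x, y, z = np.ones(n), np.zeros(m), np.ones(n)     # any start with x, z > 0
    for _ in range(max_iter):
        r_p = A @ x - b                               # primal infeasibility
        r_d = Q @ x + c - A.T @ y - z                 # dual infeasibility
        mu = x @ z / n                                # average complementarity
        if max(np.abs(r_p).max(), np.abs(r_d).max(), mu) < tol:
            break
        # First-order step: same coefficient matrix as the Taylor-coefficient system.
        K = np.block([
            [A, np.zeros((m, m)), np.zeros((m, n))],
            [Q, -A.T, -np.eye(n)],
            [np.diag(z), np.zeros((n, m)), np.diag(x)],
        ])
        rhs = np.concatenate([-r_p, -r_d, sigma * mu - x * z])
        step = np.linalg.solve(K, rhs)
        dx, dy, dz = step[:n], step[n:n + m], step[n + m:]
        # Stay strictly inside the region x > 0, z > 0.
        alpha = 1.0
        for v, dv in ((x, dx), (z, dz)):
            neg = dv < 0
            if neg.any():
                alpha = min(alpha, 0.9 * np.min(-v[neg] / dv[neg]))
        x, y, z = x + alpha * dx, y + alpha * dy, z + alpha * dz
    return x, y, z

# Example: minimize 1/2 (x1^2 + x2^2) - 2 x1 - 5 x2  s.t.  x1 + x2 = 1, x >= 0.
Q = np.eye(2)
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x, y, z = path_following_qp(Q, c, A, b)
print(x, y, z)
```

For this example the minimizer is $x^* = (0, 1)$ with $y^* = -4$ and $z^* = (2, 0)$, and the iterates approach it from the interior of the region $x > 0$.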
The Taylor series coefficients are found by repeated differentiation of the homotopy equations with respect to $t_{}$,
and the $k$th order coefficients $(x^{(k)}, y^{(k)}, z^{(k)})$ may be obtained by solving the linear system
:$
\begin{pmatrix}
A & 0 & 0 \\ Q & - A^T & - I\\ Z^s & 0 & X^s
\end{pmatrix}
\begin{pmatrix}
x^{(k)} \\ y^{(k)} \\ z^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
r_p^{(k)} \\ r_d^{(k)} \\ r_c^{(k)}
\end{pmatrix}
\doteq r^{(k)},
$
where $X^s_{}$ and $Z^s_{}$ are the diagonal matrices whose diagonal entries are $x^s_{}$ and $z^s_{}$ respectively, and the right-hand side $ r^{(k)} $ depends on the values of previously-calculated lower-order coefficients.
Since the coefficient matrix is the same for each order of coefficients, a single factorization enables us to find increasingly accurate Taylor approximations at a gradually increasing but reasonable cost. Block elimination of the system results in the
smaller, symmetric system
:$
\begin{pmatrix}
Q + (X^s)^{-1} Z^s & A^T \\ A & 0
\end{pmatrix}
\begin{pmatrix}
x^{(k)} \\ - y^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
r_d^{(k)} + (X^s_{})^{-1}r_c^{(k)} \\ r_p^{(k)}
\end{pmatrix},
$
and this is usually exploited in practice; the variables $z^{(k)}$ may be recovered as
$ (X^s)^{-1}[r_c^{(k)} - Z^s_{} x_{}^{(k)}]$. Some algorithms seek to avoid possible numerical difficulties by regularizing these
defining systems. For example, the coefficient matrix above is replaced by
:$
\begin{pmatrix}
Q + (X^s)^{-1} Z^s + D_p & A^T \\ A & - D_d
\end{pmatrix}
$
where $D_p$ and $D_d$ are small, positive-definite diagonal perturbations. Other algorithms try to avoid these
difficulties by pre-processing the data to remove singularities. Both techniques appear to work well in practice.
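The block elimination that leads from the three-block system to the smaller symmetric system can be checked numerically. The sketch below assumes NumPy and random test data (unregularized, with $x^s, z^s > 0$); the function names are invented for the illustration. It solves both systems and confirms that they give the same $(x^{(k)}, y^{(k)}, z^{(k)})$, with $z^{(k)}$ recovered from $(X^s)^{-1}[r_c^{(k)} - Z^s x^{(k)}]$.

```python
import numpy as np

def solve_full(Q, A, xs, zs, r_p, r_d, r_c):
    """Solve the unreduced 3x3 block system directly."""
    m, n = A.shape
    K = np.block([
        [A, np.zeros((m, m)), np.zeros((m, n))],
        [Q, -A.T, -np.eye(n)],
        [np.diag(zs), np.zeros((n, m)), np.diag(xs)],
    ])
    s = np.linalg.solve(K, np.concatenate([r_p, r_d, r_c]))
    return s[:n], s[n:n + m], s[n + m:]

def solve_reduced(Q, A, xs, zs, r_p, r_d, r_c):
    """Solve the symmetric 2x2 block system, then recover z."""
    m, n = A.shape
    K = np.block([[Q + np.diag(zs / xs), A.T], [A, np.zeros((m, m))]])
    s = np.linalg.solve(K, np.concatenate([r_d + r_c / xs, r_p]))
    x, y = s[:n], -s[n:]               # the second block of unknowns is -y
    z = (r_c - zs * x) / xs
    return x, y, z

rng = np.random.default_rng(1)
n, m = 4, 2
B = rng.standard_normal((n, n))
Q = B @ B.T                                        # positive semidefinite
A = rng.standard_normal((m, n))
xs, zs = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
r_p, r_d, r_c = (rng.standard_normal(m), rng.standard_normal(n),
                 rng.standard_normal(n))

full = solve_full(Q, A, xs, zs, r_p, r_d, r_c)
reduced = solve_reduced(Q, A, xs, zs, r_p, r_d, r_c)
print(all(np.allclose(u, v) for u, v in zip(full, reduced)))
```

The reduced matrix is symmetric and has $n + m$ rows instead of $2n + m$, which is why it is usually the one that is factorized.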
If the problem is convex, iterations of the form described can be shown to converge very fast and in a polynomially-bounded number of iterations. For non-convex problems, most methods prefer instead to approximately minimize the logarithmic barrier function for a decreasing sequence of values of $ t_0 $ using a globally-convergent (linesearch or trust-region) method typically used for linearly-constrained optimization; many of the details - such as the structure of the vital linear systems - are effectively the same as in the convex case.
Linear Least-Squares Problems
Linear least squares problems are special instances of convex quadratic programs that arise frequently in data-fitting applications. The linear least squares problem
LLS: minimize $ \frac{1}{2} \| C x - d \|_2^2 $
subject to $a_i^T x = b_i$ for $i \in \mathcal{E}\qquad $
$a_i^T x \geq b_i$ for $i \in \mathcal{I}$,
where $C \in R^{m\times n}$ and $d \in R^m$, is a special case of problem QP; we can see this by replacing $Q \,$ by $C^T C \,$ and $c \,$ by $-C^T d \,$ in problem QP, since the two objectives then differ only by the constant $\frac{1}{2} d^T d$. In general, it is preferable to solve a least squares problem with a code that takes advantage of the special structure of the least squares problem (for example, [[LSSOL]]).
Algorithms for solving linear least squares problems tend to rely on full- or null-space active set methods. For a least squares problem the null-space matrix $Z \,$ can be obtained from the $QR \,$ factorization of $C \,$; explicit formation of $C^T C \,$ is avoided, since $C^T C \,$ is usually less well conditioned than $C \,$.
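The correspondence between LLS and QP, and the conditioning argument, can be seen on random data. Expanding $\frac{1}{2}\|Cx - d\|_2^2$ gives $\frac{1}{2} x^T (C^T C) x - (C^T d)^T x + \frac{1}{2} d^T d$, so $Q = C^T C$ and $c = -C^T d$. The sketch below assumes NumPy, uses made-up data, and considers only the unconstrained case; it compares a factorization-based least squares solve with the normal equations and prints both condition numbers.

```python
import numpy as np

def lls_as_qp(C, d):
    """Return (Q, c, constant) with 1/2 ||C x - d||^2 = 1/2 x^T Q x + c^T x + constant."""
    return C.T @ C, -C.T @ d, 0.5 * float(d @ d)

rng = np.random.default_rng(2)
C = rng.standard_normal((8, 3))
d = rng.standard_normal(8)
Q, c, const = lls_as_qp(C, d)

# The two objectives agree at an arbitrary point.
x = rng.standard_normal(3)
lls_value = 0.5 * np.linalg.norm(C @ x - d) ** 2
qp_value = 0.5 * x @ Q @ x + c @ x + const
print(np.isclose(lls_value, qp_value))

# Unconstrained minimizers: orthogonal-factorization solve versus normal equations.
x_lstsq = np.linalg.lstsq(C, d, rcond=None)[0]
x_normal = np.linalg.solve(Q, -c)
print(np.allclose(x_lstsq, x_normal))

# In the 2-norm, the condition number of C^T C is the square of that of C.
print(np.linalg.cond(C), np.linalg.cond(Q))
```

For a well-conditioned $C$ the two solves agree, but the squared condition number is the reason codes specialised for least squares avoid forming $C^T C$.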
Software
*SeDuMi solves convex QP
*SDPT3 solves convex QP
References
- A. Altman & J. Gondzio, ''Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization''. Optimization Methods and Software, 11 (1999) pp. 275-302.
- R. Fletcher, ''A general quadratic programming algorithm''. Journal of the Institute of Mathematics and its Applications, 7 (1971), pp. 76-91.
- M.P. Friedlander & D. Orban, ''A primal-dual regularized interior-point method for convex quadratic programs''. Mathematical Programming Computation 4 (2012), pp. 71-107.
- P.E. Gill, W. Murray, M.A. Saunders & M.H. Wright, ''Inertia-controlling methods for general quadratic programming''. SIAM Review, 33 (1991), pp. 1-36.
- D. Goldfarb & A.U. Idnani, ''A numerically stable dual method for solving strictly convex quadratic programs''. Mathematical Programming, 27 (1983), pp. 1-33.
- N.I.M. Gould & Ph.L. Toint, ''An iterative working-set method for large-scale non-convex quadratic programming''. Applied Numerical Mathematics, 43 (2002), pp. 109-128.
- N.I.M. Gould, D. Orban, A. Sartenaer & Ph.L. Toint, ''Superlinear convergence of primal-dual interior point algorithms for nonlinear programming''. SIAM Journal on Optimization, 11 (2001), pp. 974-1002.
- M.K. Kozlov, S.P. Tarasov & L.G. Khachiyan, ''Polynomial solvability of convex quadratic programming''. Soviet Mathematics Doklady, 20 (1979), pp. 1108-1111.
- K.G. Murty & S.N. Kabadi, ''Some NP-complete problems in quadratic and nonlinear programming''. Mathematical Programming, 39 (1987), pp. 117-129.
- F.A. Potra & J. Stoer, ''On a class of superlinearly convergent polynomial time interior point methods for sufficient LCP''. SIAM Journal on Optimization, 20 (2009), pp. 1333-1363.
- J. Stoer & M. Wechs, ''Infeasible-interior-point paths for sufficient linear complementarity problems and their analyticity''. Mathematical Programming, 83 (1998), pp. 407-423.
- J. Stoer & M. Wechs, ''On the analyticity properties of infeasible-interior-point paths for monotone linear complementarity problems''. Numerische Mathematik, 81 (1999), pp. 631-645.
- R.J. Vanderbei, ''LOQO: An interior point code for quadratic programming''. Optimization Methods and Software, 12 (1999), pp. 451-484.
- S.A. Vavasis, ''Nonlinear Optimization: Complexity Issues'', Oxford University Press, Oxford, England (1991).
- Y. Zhang, ''On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem''. SIAM Journal on Optimization, 4 (1994), pp. 208-227.
- G. Zhao & J. Sun, ''On the rate of local convergence of high-order-infeasible-path-following algorithms for P∗-linear complementarity problems''. Computational Optimization and Applications, 14 (1999), pp. 293-307.
See also the somewhat outdated BibTeX [ftp://ftp.numerical.rl.ac.uk/pub/qpbook/qpbook.bib QP bibliography] by Nick Gould and Philippe Toint, and a summarising [ftp://ftp.numerical.rl.ac.uk/pub/qpbook/qp.pdf paper].
