Reduced-Gradient Methods

See Also: Constrained Optimization; Nonlinear Programming

Reduced-gradient algorithms avoid the use of penalty parameters by searching along curves that stay near the feasible set. Essentially, these methods take the second version of the nonlinear programming formulation and use the equality constraints to eliminate a subset of the variables, thereby reducing the original problem to a bound-constrained problem in the space of the remaining variables. If \(x_B\) is the vector of eliminated or basic variables, and \(x_N\) is the vector of nonbasic variables, then
\[x_B = h(x_N),\] where the mapping \(h\) is defined implicitly by the equation
\[c[h(x_N), x_N] = 0.\]

We have assumed that the components of \(x\) have been arranged so that the basic variables come first. In practice,
\[x_B = h(x_N)\] can be recalculated using Newton’s method whenever \(x_N\) changes. Each Newton iteration has the form
\[x_B \leftarrow x_B - \partial_Bc(x_B, x_N)^{-1}c(x_B, x_N),\] where \(\partial_Bc\) is the Jacobian matrix of \(c\) with respect to the basic variables. The original constrained problem is now transformed into the bound-constrained problem
\[\min \{ f(h(x_N), x_N) : l_N \leq x_N \leq u_N \}.\]
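To make the Newton recalculation of \(x_B = h(x_N)\) concrete, here is a minimal sketch assuming the constraint residual and its Jacobian with respect to the basic variables are available as Python callables; the names `c_fun` and `jac_basic` and the toy constraint are illustrative assumptions, not taken from any of the codes discussed below.

```python
import numpy as np

def restore_basics(c_fun, jac_basic, x_B, x_N, tol=1e-10, max_iter=20):
    """Recompute x_B = h(x_N) by Newton's method on c(x_B, x_N) = 0.

    c_fun(x_B, x_N)     -> residual vector c of length m
    jac_basic(x_B, x_N) -> m-by-m Jacobian of c with respect to x_B
    """
    for _ in range(max_iter):
        c = c_fun(x_B, x_N)
        if np.linalg.norm(c, np.inf) <= tol:
            break
        # One Newton step: x_B <- x_B - [d_B c]^{-1} c
        x_B = x_B - np.linalg.solve(jac_basic(x_B, x_N), c)
    return x_B

# Toy example: a single constraint c(x_B, x_N) = x_B^2 + x_N - 2 = 0.
# With x_N fixed at 1, the iterates converge to the basic value x_B = 1.
x_B = restore_basics(lambda b, n: b**2 + n - 2.0,
                     lambda b, n: np.atleast_2d(2.0 * b),
                     x_B=np.array([2.0]), x_N=np.array([1.0]))
print(x_B)  # approximately [1.0]
```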

Algorithms for this reduced subproblem subdivide the nonbasic variables into two categories: the fixed variables \(x_F\), which usually include most of the variables lying at their upper or lower bounds and which are held constant during the current iteration, and the superbasic variables \(x_S\), which are free to move on this iteration. The standard reduced-gradient algorithm, implemented in CONOPT, searches along the steepest-descent direction in the superbasic variables. The generalized reduced-gradient codes GRG2 and LSGRG2 use more sophisticated approaches: they either maintain a dense BFGS approximation of the Hessian of \(f\) with respect to \(x_S\) or use limited-memory conjugate gradient techniques. MINOS also uses a dense approximation to the superbasic Hessian matrix. The main difference between MINOS and the other three codes is that MINOS does not apply the reduced-gradient algorithm to the problem directly but rather uses it to solve a linearly constrained subproblem that yields the next step. The overall technique is known as a projected augmented Lagrangian algorithm.
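As a rough illustration of the steepest-descent step in the superbasic variables, the following sketch forms the reduced gradient \(\nabla_S f - (\partial_S c)^T (\partial_B c)^{-T} \nabla_B f\) from gradient and Jacobian blocks; the function name and the dense linear algebra are assumptions made for brevity, not a rendering of any of the codes above.

```python
import numpy as np

def reduced_gradient(g_B, g_S, J_B, J_S):
    """Gradient of the reduced objective f(h(x_N), x_N) with respect to the
    superbasic variables x_S, with the fixed variables held at their bounds.

    The implicit function theorem gives dh/dx_S = -J_B^{-1} J_S, so the
    chain rule yields  g_red = g_S - J_S^T J_B^{-T} g_B.
    """
    pi = np.linalg.solve(J_B.T, g_B)   # pi = J_B^{-T} g_B
    return g_S - J_S.T @ pi

# The steepest-descent direction in the superbasics is d_S = -g_red; the
# basic variables are then recomputed (e.g., by Newton's method as above)
# to restore feasibility with respect to the equality constraints.
```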

Operations involving the inverse of \(\partial_Bc(x_B, x_N)\) are frequently required in reduced-gradient algorithms. These operations are facilitated by an \(LU\) factorization of the matrix. GRG2 performs a dense factorization, while CONOPT, MINOS, and LSGRG2 use sparse factorization techniques, making them more suitable for large-scale problems.
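For instance, with SciPy's sparse LU factorization (used here only to illustrate the idea; the factorization routines inside CONOPT, MINOS, and LSGRG2 are their own), the factors of \(\partial_Bc\) can be reused for repeated solves with the matrix and its transpose:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative 3-by-3 basis Jacobian; a realistic problem would be much
# larger and far sparser.
J_B = sp.csc_matrix(np.array([[4.0, 1.0, 0.0],
                              [0.0, 3.0, 2.0],
                              [1.0, 0.0, 5.0]]))

lu = spla.splu(J_B)                                   # sparse LU of d_B c
y  = lu.solve(np.array([1.0, 0.0, 2.0]))              # solves (d_B c) y = r
pi = lu.solve(np.array([0.5, 0.5, 0.5]), trans='T')   # solves (d_B c)^T pi = g_B
```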

When some of the components of the constraint functions are linear, most algorithms aim to retain feasibility of all iterates with respect to these constraints. The optimization problem becomes easier in the sense that there is no curvature term corresponding to these constraints that must be accounted for and, because of feasibility, these constraints make no contribution to the merit function. Numerous codes, such as NPSOL, MINOS and some routines from the NAG library, are able to take advantage of linearity in the constraint set. Other codes, such as those in the IMSL, PORT 3, and PROC NLP libraries, are specifically designed for linearly constrained problems. The IMSL codes are based on a sequential quadratic programming algorithm that combines features of the EQP and IQP variants. At each iteration, this algorithm determines a set \(\mathcal{N}_k\) of near-active indices defined by
\[\mathcal{N}_k = \{ i \in \mathcal{I} : c_i(x_k) \geq -r_i \},\]

where the tolerances \(r_i\) tend to decrease on later iterations. The step \(d_k\) is obtained by solving the subproblem
\[\min \{ q_k(d) : c_i(x_k + d) = 0,\ i \in \mathcal{E},\ c_i(x_k + d) \leq c_i(x_k),\ i \in \mathcal{N}_k \},\] where
\[q_k(d) = \nabla f(x_k)^T d + \frac{1}{2} d^TB_kd,\] and \(B_k\) is a BFGS approximation to \(\nabla^2f(x_k)\). This algorithm is designed to avoid the short steps that EQP methods sometimes produce, without taking many unnecessary constraints into account, as IQP methods do.
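The sketch below shows one such step for linear constraints \(c_i(x) = a_i^T x - b_i\), for which \(c_i(x_k + d) = c_i(x_k) + a_i^T d\). It is a schematic rendering of the ideas above, not the IMSL implementation; the quadratic subproblem is handed to SciPy's SLSQP routine purely for illustration, whereas a production code would use a dedicated quadratic programming solver.

```python
import numpy as np
from scipy.optimize import minimize

def near_active_sqp_step(grad_f, B, c_vals, A, eq_idx, ineq_idx, r):
    """One step of the near-active-set method for linearly constrained
    problems: build N_k, then minimize the quadratic model q_k over d.

    grad_f   : gradient of f at x_k
    B        : BFGS approximation to the Hessian of f at x_k
    c_vals   : constraint values c_i(x_k)
    A        : constraint Jacobian, row i holding a_i^T
    eq_idx   : indices in the equality set E
    ineq_idx : indices in the inequality set I
    r        : near-activity tolerances r_i
    """
    # N_k = { i in I : c_i(x_k) >= -r_i }
    near = [i for i in ineq_idx if c_vals[i] >= -r[i]]

    q = lambda d: grad_f @ d + 0.5 * d @ B @ d            # quadratic model q_k
    cons = [{'type': 'eq', 'fun': lambda d, i=i: c_vals[i] + A[i] @ d}
            for i in eq_idx]
    # c_i(x_k + d) <= c_i(x_k) reduces to a_i^T d <= 0; SciPy expects fun(d) >= 0.
    cons += [{'type': 'ineq', 'fun': lambda d, i=i: -(A[i] @ d)}
             for i in near]

    sub = minimize(q, np.zeros_like(grad_f), jac=lambda d: grad_f + B @ d,
                   constraints=cons, method='SLSQP')
    return sub.x                                          # the step d_k
```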