Gradient Projection Methods

See Also: Constrained Optimization Bound Constrained Optimization

Gradient project methods are methods for solving bound constrained optimization problems. In solving bound constrained optimization problems, active set methods face criticism because the working set changes slowly; at each iteration, at most one constraint is added to or dropped from the working set. If there are \(k_0\) constraints active at the initial \(W_0\), but \(k_\theta\) constraints active at the solution, then at least \(| k_\theta – k_0 |\) iterations are required for convergence. This property can be a serious disadvantage in large problems if the working set at the starting point is vastly different from the active set at the solution. As a result, researchers have developed algorithms that allow the working set to undergo radical changes at each iteration and to interior-point algorithms that do not explicitly maintain a working set.

The gradient-projection algorithm is the prototypical method that allows large changes in the working set at each iteration. Given \(x_k\), this algorithm searches along the piecewise linear path
\[P[x_k – \alpha \nabla f(x_k)], \alpha \geq 0,\] where \(P\) is the projection onto the feasible set. A new point
\[x_{k+1} = P[x_k – \alpha_k \nabla f(x_k)]\] is obtained when a suitable \(\alpha_k > 0\) is found. For bound-constrained problems, the projection can be easily computed by setting
\[[P(x)]_i = \, mid \{ x_i, l_i, u_i \},\] where mid\( \{ \cdot \}\) is the middle (median) element of a set. The search for \(\alpha_k\) has to be done carefully since the function
\[\phi(\alpha) = f(P[x_k – \alpha_k \nabla f(x_k)])\] is only piecewise differentiable.

If properly implemented, the gradient-projection method is guaranteed to identify the active set at a solution in a finite number of iterations. After it has identified the correct active set, the gradient-projection algorithm reduces to the steepest-descent algorithm on the subspace of free variables. As a result, this method is invariably used in conjunction with other methods with faster rates of convergence.

Trust-region algorithms can be extended to bound-constrained problems. The main difference between the unconstrained and the bound-constrained version is that we now require the step \(s_k\) to be an approximate solution of the subproblem
\[\min \{ q_k(s) : \| D_ks \| \leq \Delta_k, l \leq x_k + s \leq u \},\] where
\[q_k(s) = \nabla f(x_k)^Ts + \frac{1}{2} s^T B_ks.\]

An accurate solution to this subproblem is not necessary, at least in early iterations. Instead, we use the gradient-projection algorithm to predict a step \(s_k^C\) (the Cauchy step) and then require merely that our step, \(s_k\), satisfies the constraints in the trust-region subproblem with \(q_k(s_k) \leq q_k(s_k^C)\). An approach along these lines is used by VE08 and PORT 3. In the bound-constrained code in LANCELOT, the trust region is defined by the \(l_{\infty} \)-norm and \(D_k = I\), yielding the equivalent subproblem
\[\min \{ q_k(s) : \max (l-x_k, \Delta_ke \leq s \leq \min(u-x_k, \Delta_ke) \} \] where \(e\) is the vector of all ones.

The advantage of strategies that combine the gradient-projection method with trust-region methods is that the working set is allowed to change rapidly, and yet eventually settle into the working set for the solution. LANCELOT uses this approach, together with special data structures that exploit the (group) partially separable structure of \(f\), to solve large bound-constrained problems.