Nonlinear Conjugate Gradient Method

See Also: Unconstrained Optimization

Nonlinear conjugate gradient methods make up another popular class of algorithms for large-scale optimization. These algorithms can be derived as extensions of the conjugate gradient algorithm or as specializations of limited-memory quasi-Newton methods. Given an iterate \(x_k\) and a direction \(d_k\), a line search is performed along \(d_k\) to produce \(x_{k+1} = x_k + \alpha_k d_k\). The Fletcher-Reeves variant of the nonlinear conjugate gradient algorithm generates \(d_{k+1}\) from the simple recursion
\[d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k, \qquad \beta_k = \left( \frac{\| \nabla f(x_{k+1}) \|_2}{\| \nabla f(x_k) \|_2} \right)^2\]
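For concreteness, the following Python sketch implements this recursion under simplifying assumptions: the names f, grad, x0, tol, and max_iter are illustrative, and a basic backtracking (Armijo) line search with a descent-direction safeguard stands in for the stronger Wolfe line search normally paired with the method.

    import numpy as np

    def fletcher_reeves_cg(f, grad, x0, tol=1e-6, max_iter=1000):
        # Minimal sketch of nonlinear CG with the Fletcher-Reeves beta.
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                   # first step: steepest descent
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            if g @ d >= 0:                       # safeguard: restart if d is not
                d = -g                           # a descent direction
            alpha = 1.0                          # backtrack until sufficient decrease
            while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
                alpha *= 0.5
            x_new = x + alpha * d
            g_new = grad(x_new)
            beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
            d = -g_new + beta * d                # new search direction
            x, g = x_new, g_new
        return x

    # Example: run on the Rosenbrock function (minimizer at (1, 1)).
    rosen = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
    rosen_grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                                     200.0 * (x[1] - x[0]**2)])
    print(fletcher_reeves_cg(rosen, rosen_grad, [-1.2, 1.0], max_iter=5000))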

The method’s performance is sometimes enhanced by restarting, that is, by periodically setting \(\beta_k\) to zero. The Polak-Ribiere variant of the conjugate gradient method defines \(\beta_k\) as
\[\beta_k = \frac{[\nabla f(x_{k+1}) - \nabla f(x_k)]^T \nabla f(x_{k+1})}{\nabla f(x_k)^T \nabla f(x_k)}\]
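This coefficient depends only on two successive gradients, as the small helper below illustrates; beta_polak_ribiere is an illustrative name rather than part of any library, and it could replace the Fletcher-Reeves coefficient in the earlier sketch.

    import numpy as np

    def beta_polak_ribiere(g_new, g_old):
        # Numerator:   [grad f(x_{k+1}) - grad f(x_k)]^T grad f(x_{k+1})
        # Denominator: grad f(x_k)^T grad f(x_k)
        return (g_new - g_old) @ g_new / (g_old @ g_old)

    # Worked check with small illustrative gradients:
    # ((1-3)*1 + (2-(-1))*2) / (3^2 + (-1)^2) = (-2 + 6) / 10 = 0.4
    g_old = np.array([3.0, -1.0])
    g_new = np.array([1.0, 2.0])
    print(beta_polak_ribiere(g_new, g_old))   # 0.4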

These two definitions of \(\beta_k\) are equivalent when \(f\) is quadratic, but not otherwise. Numerical testing suggests that the Polak-Ribiere method tends to be more efficient than the Fletcher-Reeves method. IMSL, NAG(FORTRAN), NAG(C), OPTIMA Library, OPTPACK, and PROC NLP contain nonlinear conjugate gradient codes.
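The equivalence for quadratics can be checked numerically: for \(f(x) = \frac{1}{2} x^T A x - b^T x\) minimized with an exact line search, successive gradients are orthogonal, so the Polak-Ribiere numerator collapses to the Fletcher-Reeves one. In the sketch below, the matrix \(A\), vector \(b\), and step count are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5.0 * np.eye(5)        # symmetric positive definite quadratic term
    b = rng.standard_normal(5)
    grad = lambda x: A @ x - b

    x = np.zeros(5)
    g = grad(x)
    d = -g
    for k in range(3):
        alpha = -(g @ d) / (d @ A @ d)   # exact minimizer of f along d
        x = x + alpha * d
        g_new = grad(x)
        beta_fr = (g_new @ g_new) / (g @ g)
        beta_pr = ((g_new - g) @ g_new) / (g @ g)
        print(k, beta_fr, beta_pr)       # the two values agree to rounding error
        d = -g_new + beta_fr * d
        g = g_new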