Nonlinear conjugate gradient methods make up another popular class of algorithms for large-scale optimization. These algorithms can be derived as extensions of the conjugate gradient algorithm or as specializations of limited-memory quasi-Newton methods. Given an iterate $$x_k$$ and a direction $$d_k$$, a line search is performed along $$d_k$$ to produce $$x_{k+1} = x_k + \alpha_k d_k$$. The Fletcher-Reeves variant of the nonlinear conjugate gradient algorithm generates $$d_{k+1}$$ from the simple recursion
$d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k, \qquad \beta_k = \left( \frac{\| \nabla f(x_{k+1}) \|_2}{\| \nabla f(x_k) \|_2} \right)^2$
The method’s performance is sometimes enhanced by restarting, that is, periodically setting $$\beta_k$$ to zero. The Polak-Ribiere variant of conjugate gradient defines $$\beta_k$$ as
$\beta_k = \frac{[ \nabla f(x_{k+1}) - \nabla f(x_k)]^T \nabla f(x_{k+1}) }{ \nabla f(x_k)^T \nabla f(x_k)}$
These two definitions of $$\beta_k$$ are equivalent when $$f$$ is quadratic and exact line searches are used, but not otherwise. Numerical testing suggests that the Polak-Ribiere method tends to be more efficient than the Fletcher-Reeves method. IMSL, NAG(FORTRAN), NAG(C), OPTIMA Library, OPTPACK, and PROC NLP contain nonlinear conjugate gradient codes.
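To make the recursion concrete, the sketch below implements a minimal nonlinear conjugate gradient loop supporting both the Fletcher-Reeves and Polak-Ribiere choices of $$\beta_k$$, with optional periodic restarting. The function name `nonlinear_cg`, the simple backtracking (Armijo) line search, and the nonnegativity safeguard on the Polak-Ribiere $$\beta_k$$ are illustrative assumptions, not a reproduction of any of the library codes listed above; production implementations typically use a line search satisfying the strong Wolfe conditions.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, variant="PR", restart_every=None,
                 tol=1e-6, max_iter=1000):
    """Minimize f by a nonlinear conjugate gradient method.

    variant       -- "FR" (Fletcher-Reeves) or "PR" (Polak-Ribiere) beta_k
    restart_every -- if set, beta_k is reset to zero every that many steps
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Backtracking line search along d_k (Armijo sufficient decrease).
        alpha, fx, slope = 1.0, f(x), g.dot(d)
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)

        if variant == "FR":                  # Fletcher-Reeves beta_k
            beta = g_new.dot(g_new) / g.dot(g)
        else:                                # Polak-Ribiere beta_k
            beta = (g_new - g).dot(g_new) / g.dot(g)
            beta = max(beta, 0.0)            # safeguard: restart if beta < 0

        # Optional periodic restart: fall back to steepest descent.
        if restart_every is not None and (k + 1) % restart_every == 0:
            beta = 0.0

        d = -g_new + beta * d                # the conjugate gradient recursion
        x, g = x_new, g_new
    return x
```

For example, applied to the Rosenbrock function the same routine can be run with either variant (a hypothetical usage, for illustration only):

```python
rosen = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
rosen_grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                                 200 * (x[1] - x[0]**2)])
x_pr = nonlinear_cg(rosen, rosen_grad, [-1.2, 1.0], variant="PR")
x_fr = nonlinear_cg(rosen, rosen_grad, [-1.2, 1.0], variant="FR", restart_every=10)
```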