Line Search Methods

See Also: Unconstrained Optimization

Line search methods generate the iterates by setting \(x_{k+1} = x_k + \alpha_k d_k \,\) where \(d_k\) is a search direction and \(\alpha_k > 0\) is chosen so that \(f({x+1}) < f(x_k).\)

Most line search versions of the basic Newton method generate the direction \(d_k\) by modifying the Hessian matrix \(\nabla^2f(x_k)\) to ensure that the quadratic model \(q_k\) of the function has a unique minimizer. The modified Cholesky decomposition approach adds positive quantities to the diagonal of \(\nabla^2f(x_k)\) during the Cholesky factorization. As a result, a diagonal matrix, \(E_k\), with nonnegative diagonal entries is generated such that \(\nabla^2f(x_k) + E_k\) is positive definite. Given this decomposition, the search direction \(d_k\) is obtained by solving \((\nabla^2f(x_k) + E_k)d_k = -\nabla f(x_k)\)

After \(d_k\) is found, a line search procedure is used to choose an \(\alpha_k > 0\) that approximately minimizes \(f\) along the ray \({x_k + \alpha d_k : \alpha > 0}\). The following Newton codes use line-search methods: GAUSS, NAG Fortran Library, NAG C Library, and OPTIMA.

The algorithms used in these codes for determining \(\alpha_k\) rely on quadratic or cubic interpolation of the univariate function \(\phi_\alpha = f(x_k + \alpha d_k)\) in their search for a suitable \(\alpha_k\). An elegant and practical criterion for a suitable \(\alpha_k\) is to require \(\alpha_k\) to satisfy the sufficient decrease condition:
\(f(x_k + \alpha d_k) \leq f(x_k) + \mu \alpha_k \nabla f(x_k)^T d_k\) and the curvature condition:
\(| \nabla f(x_k + \alpha d_k)^T d_k|  \leq \eta | \nabla f(x_k)^T d_k| \)

where \(\mu\) and \(\eta\) are two constants with \(0 < \mu < \eta < 1\). The sufficient decrease condition guarantees, in particular, that \(f(x_{k+1}) < f(x_k)\) while the curvature condition requires that \(\alpha_k\) be not too far from a minimizer of \(\phi\). Requiring an accurate minimizer is generally wasteful of function and gradient evaluations, so codes typically use \(\mu=0.001\) and \(\eta=0.9\) in these conditions.