Trust-Region Methods


The trust-region approach can be motivated by noting that the quadratic model \(q_k\) is a useful approximation to the function \(f\) only near \(x_k\). In particular, when the Hessian matrix is indefinite, the quadratic function \(q_k\) is unbounded below, so \(q_k\) is clearly a poor model of \(f(x_k+s)\) when \(\|s\|\) is large. It is therefore reasonable to select the step \(s_k\) by solving the subproblem
\[\min \{ q_k(s) : \| D_ks \|_2 \leq \Delta_k \}\]
for some \(\Delta_k > 0\) and scaling matrix \(D_k\). The trust-region parameter, \(\Delta_k\), is adjusted between iterations according to the agreement between predicted and actual reduction in the function \(f\) as measured by the ratio
\[\rho_k = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - q_k(s_k)}.\]
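As a concrete illustration, the sketch below evaluates the model and the ratio \(\rho_k\). It assumes the standard second-order Taylor model \(q_k(s) = f(x_k) + \nabla f(x_k)^T s + \tfrac{1}{2} s^T \nabla^2 f(x_k) s\); the function names are hypothetical and not taken from any of the codes discussed here.

```python
import numpy as np

def quadratic_model(f_k, g_k, H_k, s):
    # q_k(s) = f(x_k) + grad f(x_k)^T s + 0.5 * s^T H_k s
    return f_k + g_k @ s + 0.5 * s @ (H_k @ s)

def reduction_ratio(f, x_k, s_k, f_k, g_k, H_k):
    # rho_k = (f(x_k) - f(x_k + s_k)) / (f(x_k) - q_k(s_k))
    actual = f_k - f(x_k + s_k)
    predicted = f_k - quadratic_model(f_k, g_k, H_k, s_k)
    return actual / predicted
```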

If there is good agreement, that is, \(\rho_k \approx 1\), then \(\Delta_k\) is increased. If the agreement is poor, that is, \(\rho_k\) is small or negative, then \(\Delta_k\) is decreased. The decision to accept the step \(s_k\) is also based on \(\rho_k\). Usually,
\[x_{k+1} = x_k + s_k,\]
if \(\rho_k \geq \sigma_0\), where \(\sigma_0\) is a small positive constant (typically \(10^{-4}\)); otherwise \(x_{k+1} = x_k\).
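The acceptance test and the adjustment of \(\Delta_k\) fit in a few lines. The sketch below uses thresholds (0.25, 0.75) and scaling factors (0.25, 2) that are typical textbook choices, not the rules of any particular code.

```python
def trust_region_update(rho_k, delta_k, x_k, s_k, sigma_0=1e-4):
    # Accept the step only if the actual reduction is at least a small
    # fraction of the predicted reduction.
    x_next = x_k + s_k if rho_k >= sigma_0 else x_k

    # Adjust the trust-region parameter according to the agreement
    # between predicted and actual reduction.
    if rho_k < 0.25:            # poor agreement: shrink the region
        delta_next = 0.25 * delta_k
    elif rho_k > 0.75:          # good agreement: allow a longer step
        delta_next = 2.0 * delta_k
    else:                       # adequate agreement: keep the radius
        delta_next = delta_k
    return x_next, delta_next
```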

The following Newton codes use trust-region methods with different algorithms for choosing the step \(s_k\): IMSL, LANCELOT, PORT 3, and PROC NLP.

Most of these algorithms rely on the observation that there is a \(\lambda_k\) such that
\[(\nabla ^2 f(x_k) + \lambda_k D_k^T D_k ) s_k = - \nabla f(x_k), \quad\quad\quad(1.1)\]
where either \(\lambda_k = 0\) and \(\| D_ks_k \|_2 \leq \Delta_k\), or \(\lambda_k > 0\) and \(\| D_ks_k \|_2 = \Delta_k\). An appropriate \(\lambda_k\) is determined by an iterative process in which (1.1) is solved for each trial value of \(\lambda_k\).
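A much simplified version of such a process, using bisection on \(\lambda\) rather than the more refined updates found in library codes, might look as follows. All names are illustrative, and the sketch assumes that \(\nabla^2 f(x_k) + \lambda D_k^T D_k\) is nonsingular for the values of \(\lambda\) encountered.

```python
import numpy as np

def trust_region_step(H, g, D, delta, lam_max=1e8, tol=1e-6, max_iter=100):
    # Solve (H + lam * D^T D) s = -g for a trial value of lam, as in (1.1).
    DtD = D.T @ D
    def step(lam):
        return np.linalg.solve(H + lam * DtD, -g)

    # If lam = 0 already gives || D s ||_2 <= delta, accept that step.
    s = step(0.0)
    if np.linalg.norm(D @ s) <= delta:
        return 0.0, s

    # Otherwise search for lam > 0 with || D s(lam) ||_2 = delta.
    lo, hi = 0.0, lam_max
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        s = step(lam)
        residual = np.linalg.norm(D @ s) - delta
        if abs(residual) <= tol * delta:
            break
        if residual > 0.0:      # step still too long: increase lam
            lo = lam
        else:                   # step too short: decrease lam
            hi = lam
    return lam, s
```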

The algorithm implemented in the TENMIN package extends Newton's method by forming low-rank approximations to the third- and fourth-order terms in the Taylor series approximation. These approximations are formed by using matching conditions that require storage of the gradient (but not the Hessian) at a number of previous iterates.
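For orientation, a fourth-order model of this general form (the precise formulation used in TENMIN may differ in its details) can be written as
\[m_k(x_k + s) = f(x_k) + \nabla f(x_k)^T s + \tfrac{1}{2}\, s^T \nabla^2 f(x_k)\, s + \tfrac{1}{6}\, T_k s^3 + \tfrac{1}{24}\, V_k s^4,\]
where \(T_k\) and \(V_k\) are low-rank third- and fourth-order tensors chosen so that the model matches function and gradient information computed at earlier iterates.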