Trust-Region Methods

See Also: Unconstrained Optimization

The trust-region approach can be motivated by noting that the quadratic model \(q_k\) is a useful model of the function \(f\) only near \(x_k\). When the Hessian matrix is indefinite, the quadratic function \(q_k\) is unbounded below, so \(q_k\) is obviously a poor model of \(f(x_k+s)\) when \(s\) is large. Therefore, it is reasonable to select the step \(s_k\) by solving the subproblem
\[\min \{ q_k(s) : \| D_ks \|_2 \leq \Delta_k \}\] for some \(\Delta_k > 0\) and scaling matrix \(D_k\). The trust-region parameter, \(\Delta_k\), is adjusted between iterations according to the agreement between predicted and actual reduction in the function \(f\) as measured by the ratio
\[\rho_k = \frac{f(x_k) – f(x_k + s_k)}{f(x_k) – q_k(s_k)}.\]

If there is good agreement, that is, \(p \approx 1\) then \(\Delta_k\) is increased. If the agreement is poor, i.e., \(\rho_k\) is small or negative, then \(\Delta_k\) is decreased. The decision to accept the step \(s_k\) is also based on \(\rho_k\). Usually,
\[x_{k+1} = x_k + s_k,\] if \(\rho_k \geq \sigma_0\) where \(\sigma_0\) is small (typically, \(10^{-4}\)); otherwise \(x_{k+1} = x_k\).

The following Newton codes use trust-region methods with different algorithms for choosing the step \(s_k\): IMSL, LANCELOT, PORT 3, and PROC NLP.

Most of these algorithms rely on the observation that there is a \(\lambda_k\) such that
\[(\nabla ^2 f(x_k) + \lambda_k D_k^T D_k ) s_k = – \nabla f(x_k), \quad\quad\quad(1.1)\] where either \(\lambda_k = 0\) and \(\| D_ks_k \|_2 \leq \Delta_k\), or \(\lambda_k > 0\) and \(\| D_ks_k \|_2 = \Delta_k\). An appropriate \(\lambda_k\) is determined by an iterative process in which (1.1) is solved for each trial value of \(\lambda_k\).