KKT Reduction Methods

The basic idea of KKT reduction methods, which were already known from Chebyshev approximation, consists of replacing \(P\) (whose objective function \(f\) and constraint functions \(g\) are assumed to be smooth) with a nonlinear system of equations obtained from the KKT local optimality conditions for \(P\).

In fact, under suitable conditions, if \(\overline{x}\) is a local minimizer of \(P\), there exist indices \(\overline{t}_{j}\in T_{a}(\overline{x}), j = 1,\dots,q(\overline{x})\), with \(q(\overline{x})\in \mathbb{N}\) depending on \(\overline{x}\), and nonnegative multipliers \(\overline{\lambda}_{j}, j=1,\dots,q(\overline{x})\), such that
\[\nabla_{x}f(\overline{x})=\sum_{j=1}^{q(\overline{x})}\overline{\lambda}_{j}\nabla_{x}g(\overline{x},\overline{t}_j).\]

We assume also the availability of a description of \(T \subset \mathbb{R}^m\) as
\[T=\{t\in \mathbb{R}^m : u_{i}(t) \geq 0, \,\,i=1,\dots, m\},\] where \(u_{i}\) is smooth for all \(i=1,\dots,m\). Observe that \(q(\overline{x})\) is the number of global minima of the lower level problem \(Q(\overline{x})\), provided that \(\nu (Q(\overline{x}))=0\). In that case, \(\overline{t}_{j}\in T_{a}(\overline{x})\) if and only if \(\overline{t}_{j}\) is an optimal solution of the (finite) lower level problem at \(\overline{x}\)
\[Q(\overline{x})\,:\, \min_{t} g(\overline{x},t)\,\,\, \text{s.t.}\,\,\, u_{i}(t)\geq 0,\,\, i=1,\dots, m. \]

Then, under some constraint qualification, the classical KKT theorem yields the existence of nonnegative multipliers \(\overline{\theta}_{i}^{j}, i=1,,\dots m,\) such that
\[\nabla_{t}g(\overline{x},\overline{t}_{j})=\sum_{i=1}^{m}\overline{\theta}_{i}^{j}\nabla_{t}u_{i}(\overline{t}_j)\] and
\[\overline{\theta}_{i}^{j}u_{i}(\overline{t}_j)= 0,\,\, i=1,\dots, m.\]

Step \(k\): Start with a given \(x_{k}\) (not necessarily feasible).

Estimate \(q(x_{k})\).
Apply \(N_{k}\) steps of a quasi-Newton method (for finite systems of equations) to
\begin{array}{llll}
\nabla_{x}f(x) & = & \sum_{j=1}^{q(x_{k})}\lambda _{j}\nabla _{x}g(x,t_{j}) & \\
g(x,t_{j}) & = & 0 & j=1,…,q(x_{k}) \\
\nabla_{t}g(x,t_{j}) & = &\sum_{i=1}^{m}\theta_{i}^{j}\nabla_{t}u_{i}(t_{j}) & \\
\theta_{i}^{j}u_{i}(t_{j}) &= & 0 & i=1,\dots,m,\,\,j=1,\dots,q(x_{k})
\end{array}

with unknowns \(x\), \(t_{j}\), \(\lambda_{j}\), \(\theta_{i}^{j}\), \(i=1,\dots,m\), \(j=1,\dots,q(x_{k})\), leading to iterates \(x_{k,l}\), \(l=1,\dots,N_{k}\).
Set \(x_{k+1}=x_{k,N_{k}}\) and \(k=k+1.\)

The main advantage of the KKT reduction methods on the discretization and cutting plane methods is their fast local convergence (as they are adaptations of quasi-Newton methods) provided that they start sufficiently close to an optimal solution.

The so-called two-phase methods combine a discretization or central cutting plane method providing a rough approximation of an optimal solution (1st phase) and a reduction one improving this approximation (2nd phase). No theoretical result supports the decision to switch to phase 2.

For more details, see Glashoff and Gustafson (1983), Hettich and Kortanek (1993), López and Still (2007), and Reemtsen and Görner (1998) in the Semi-infinite Programming References.