The nonlinear least-squares problem has the general form
$$\min \{ r(x) : x \in \mathbb{R}^n \}$$
where \(r \,\) is the function defined by \(r(x) = \frac{1}{2}\| f(x) \|_2^2\) for some vector-valued function \(f\) that maps \(\mathbb{R}^n \) to \(\mathbb{R}^m \).
Least-squares problems often arise in data-fitting applications. Suppose that some physical or economic process is modeled by a nonlinear function \(\phi \,\) that depends on a parameter vector \(x \) and time \(t \). If \(b_i \) is the actual output of the system at time \(t_i \), then the residual
$$\phi(x,t_i) - b_i$$
measures the discrepancy between the predicted and observed outputs of the system at time \(t_i \). A reasonable estimate for the parameter \(x\) may be obtained by defining the \(i\)th component of \(f \) by
$$f_i(x) = \phi(x,t_i) - b_i$$
and solving the least-squares problem with this definition of \(f \).
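As a concrete illustration, here is a minimal sketch of this data-fitting setup in Python, assuming a hypothetical two-parameter exponential model \(\phi(x,t) = x_1 e^{x_2 t}\) and illustrative data; SciPy's `least_squares` routine minimizes \(\frac{1}{2}\| f(x) \|_2^2\) over \(x\).

```python
# A minimal sketch of the data-fitting setup, assuming a hypothetical
# two-parameter exponential model phi(x, t) = x[0] * exp(x[1] * t).
import numpy as np
from scipy.optimize import least_squares

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # observation times t_i
b = np.array([2.0, 1.2, 0.8, 0.5, 0.3])   # observed outputs b_i (illustrative data)

def f(x):
    """Residual vector: f_i(x) = phi(x, t_i) - b_i."""
    return x[0] * np.exp(x[1] * t) - b

# least_squares minimizes (1/2) * ||f(x)||_2^2 over x.
result = least_squares(f, x0=np.array([1.0, -1.0]))
print(result.x)     # estimated parameters
print(result.cost)  # r(x) = (1/2) * ||f(x)||^2 at the solution
```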
From an algorithmic point of view, the feature that distinguishes least-squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix of \(r \). The Jacobian matrix of \(f \),
$$f'(x) = \left( \partial_1 f(x), \ldots, \partial_n f(x) \right)$$
can be used to express the gradient of \(r \,\) since \(\nabla r(x) = f'(x)^T f(x)\). Similarly, \(f'(x) \) is part of the Hessian matrix
$$\nabla^2 r(x) = f'(x)^T f'(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x).$$
To calculate the gradient of \(r \,\), we need to calculate the Jacobian matrix \(f'(x)\). Having done so, we know the first term in the Hessian matrix, namely \(f'(x)^T f'(x)\), without doing any further evaluations. Nonlinear least-squares algorithms exploit this structure.
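The sketch below, continuing the hypothetical exponential model from the previous snippet, shows how a hand-coded Jacobian yields both the gradient \(f'(x)^T f(x)\) and the first Hessian term \(f'(x)^T f'(x)\) (the Gauss-Newton approximation) with no further evaluations of \(f\).

```python
# Sketch: the Jacobian gives the gradient of r and the first Hessian term,
# for the hypothetical model f_i(x) = x[0] * exp(x[1] * t_i) - b_i.
import numpy as np

def jacobian(x, t):
    """Jacobian f'(x): column j holds the partial derivatives of f w.r.t. x_j."""
    J = np.empty((t.size, 2))
    J[:, 0] = np.exp(x[1] * t)              # d f_i / d x_0
    J[:, 1] = x[0] * t * np.exp(x[1] * t)   # d f_i / d x_1
    return J

def gradient_and_gauss_newton(x, t, b):
    fx = x[0] * np.exp(x[1] * t) - b
    J = jacobian(x, t)
    grad = J.T @ fx   # gradient of r: f'(x)^T f(x)
    H_gn = J.T @ J    # first term of the Hessian, available with no extra evaluations
    return grad, H_gn
```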
In many practical circumstances, the first term, \(f'(x)^T f'(x)\), in the Hessian is more important than the second term, most notably when the residuals \(f_i(x) \,\) are small at the solution. Specifically, we say that a problem has small residuals if, for all \(x \,\) near a solution, the quantities
$$|f_i(x)| \| \nabla^2 f_i(x) \|, \quad i=1,2,\ldots,m$$
are small relative to the smallest eigenvalue of \(f'(x)^Tf'(x) \,\).
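As a rough numerical check of this definition, the sketch below compares the quantities \(|f_i(x)| \| \nabla^2 f_i(x) \|\) with the smallest eigenvalue of \(f'(x)^T f'(x)\); the second-derivative matrices are assumed to be supplied by the caller (the argument `hessians_fi` is hypothetical, not part of any standard interface).

```python
# Rough check of the small-residual condition, assuming the matrices
# nabla^2 f_i are available (hessians_fi is a hypothetical argument).
import numpy as np

def small_residual_ratio(fx, hessians_fi, J):
    """Compare |f_i| * ||nabla^2 f_i|| with the smallest eigenvalue of J^T J."""
    terms = [abs(fi) * np.linalg.norm(Hi, 2) for fi, Hi in zip(fx, hessians_fi)]
    lam_min = np.linalg.eigvalsh(J.T @ J).min()
    return max(terms) / lam_min   # a value well below 1 suggests small residuals
```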
Notes and References
Nonlinear least-squares algorithms are discussed in the books of Bates and Watts [1]; Dennis and Schnabel [3]; Fletcher [4]; Gill, Murray, and Wright [5]; Nocedal and Wright [6]; and Seber and Wild [7]. The books by Bates and Watts [1] and by Seber and Wild [7] are written from a statistical point of view. Bates and Watts [1] emphasize applications, while Seber and Wild [7] concentrate on computational methods. Björck [2] discusses algorithms for linear least-squares problems in a comprehensive survey that covers, in particular, sparse least-squares problems and nonlinear least-squares problems.
- Bates, D. M. and Watts, D. G. 1988. Nonlinear Regression Analysis and Its Applications, John Wiley & Sons, Inc., New York.
- Björck, A. 1990. Least squares methods. In Handbook of Numerical Analysis, P. G. Ciarlet and J. L. Lions, eds., North-Holland, Amsterdam, pp. 465-647.
- Dennis, J. E. and Schnabel, R. B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, Englewood Cliffs, NJ.
- Fletcher, R. 1987. Practical Methods of Optimization, 2nd ed., John Wiley & Sons, Inc., New York.
- Gill, P. E., Murray, W., and Wright, M. H. 1981. Practical Optimization, Academic Press, New York.
- Nocedal, J. and Wright, S. J. 1999. Numerical Optimization, Springer-Verlag, New York.
- Seber, G. A. F. and Wild, C. J. 1989. Nonlinear Regression, John Wiley & Sons, Inc., New York.