The *nonlinear least-squares problem* has the general form

$$\min \{ r(x) : x \in \mathbb{R}^n \}$$

where \(r\) is the function defined by \(r(x) = \frac{1}{2}\| f(x) \|_2^2\) for some vector-valued function \(f\) that maps \(\mathbb{R}^n\) to \(\mathbb{R}^m\).

Least-squares problems often arise in data-fitting applications. Suppose that some physical or economic process is modeled by a nonlinear function \(\phi\) that depends on a parameter vector \(x\) and time \(t\). If \(b_i\) is the actual output of the system at time \(t_i\), then the residual

$$\phi(x,t_i) - b_i$$

measures the discrepancy between the predicted and observed outputs of the system at time \(t_i \). A reasonable estimate for the parameter \(x\) may be obtained by defining the \(i\)th component of \(f \) by

$$f_i(x) = \phi(x,t_i) - b_i$$

and solving the least-squares problem with this definition of \(f \).
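As a minimal sketch of this construction, the Python snippet below builds the residual vector \(f(x)\) and the objective \(r(x)\) for a small data set. The exponential model `phi`, the sample times, and the synthetic noise-free observations are assumptions chosen for illustration only; they do not come from the text.

```python
import math

def phi(x, t):
    """Hypothetical model: phi(x, t) = x[0] * exp(x[1] * t)."""
    return x[0] * math.exp(x[1] * t)

def residuals(x, times, observations):
    """f_i(x) = phi(x, t_i) - b_i for each observation b_i."""
    return [phi(x, t) - b for t, b in zip(times, observations)]

def objective(x, times, observations):
    """r(x) = 0.5 * ||f(x)||_2^2."""
    return 0.5 * sum(fi * fi for fi in residuals(x, times, observations))

# Synthetic, noise-free data generated from assumed "true" parameters.
times = [0.0, 1.0, 2.0, 3.0]
x_true = [2.0, -0.5]
observations = [phi(x_true, t) for t in times]

print(objective(x_true, times, observations))       # zero at the generating parameters
print(objective([1.5, -0.3], times, observations))  # positive at other parameters
```

With real measurements the observations would contain noise, so the objective would generally be positive even at the best-fitting parameters.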

From an algorithmic point of view, the feature that distinguishes least-squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix of \(r \). The Jacobian matrix of \(f \),

$$f'(x) = \left( \partial_1 f(x), \ldots, \partial_n f(x) \right)$$

can be used to express the gradient of \(r \,\) since \(\nabla r(x) = f'(x)^T f(x)\). Similarly, \(f'(x) \) is part of the Hessian matrix

$$\nabla^2 r(x) = f'(x)^T f'(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x).$$
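Both formulas follow from differentiating \(r(x) = \frac{1}{2}\sum_{i=1}^m f_i(x)^2\) componentwise. The short derivation below is standard calculus, included as a reading aid; it assumes that each \(f_i\) is twice continuously differentiable.

$$\partial_j r(x) = \sum_{i=1}^m f_i(x) \, \partial_j f_i(x) = \left[ f'(x)^T f(x) \right]_j$$

$$\partial_k \partial_j r(x) = \sum_{i=1}^m \partial_k f_i(x) \, \partial_j f_i(x) + \sum_{i=1}^m f_i(x) \, \partial_k \partial_j f_i(x)$$

The first sum is the \((k,j)\) entry of \(f'(x)^T f'(x)\), and the second sum is the \((k,j)\) entry of \(\sum_{i=1}^m f_i(x) \nabla^2 f_i(x)\).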

To calculate the gradient of \(r\), we need to calculate the Jacobian matrix \(f'(x)\). Having done so, we know the first term in the Hessian matrix, namely \(f'(x)^T f'(x)\), without doing any further evaluations. Nonlinear least-squares algorithms exploit this structure.
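The sketch below illustrates this reuse: a single Jacobian evaluation yields both the gradient \(f'(x)^T f(x)\) and the first Hessian term \(f'(x)^T f'(x)\). The exponential model, the data values, and the function names are hypothetical, and the finite-difference comparison is only an independent sanity check on the gradient, not part of any particular algorithm.

```python
import math

# Hypothetical sample times and observations, for illustration only.
TIMES = [0.0, 1.0, 2.0, 3.0]
OBSERVED = [2.1, 1.1, 0.8, 0.4]

def residuals(x):
    """f_i(x) = phi(x, t_i) - b_i with phi(x, t) = x[0] * exp(x[1] * t)."""
    return [x[0] * math.exp(x[1] * t) - b for t, b in zip(TIMES, OBSERVED)]

def jacobian(x):
    """m-by-n Jacobian f'(x); row i holds the partial derivatives of f_i."""
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in TIMES]

def gradient_and_first_term(x):
    """Return (f'(x)^T f(x), f'(x)^T f'(x)) from one Jacobian evaluation."""
    f, J = residuals(x), jacobian(x)
    n = len(x)
    grad = [sum(J[i][j] * f[i] for i in range(len(f))) for j in range(n)]
    jtj = [[sum(row[j] * row[k] for row in J) for k in range(n)] for j in range(n)]
    return grad, jtj

def objective(x):
    """r(x) = 0.5 * ||f(x)||_2^2."""
    return 0.5 * sum(fi * fi for fi in residuals(x))

x = [1.8, -0.4]
grad, jtj = gradient_and_first_term(x)

# Central finite differences of r, used to check the gradient formula.
h = 1e-6
fd = []
for j in range(len(x)):
    xp = list(x)
    xp[j] += h
    xm = list(x)
    xm[j] -= h
    fd.append((objective(xp) - objective(xm)) / (2 * h))

print("gradient from Jacobian:", grad)
print("finite differences:    ", fd)
print("first Hessian term:    ", jtj)
```

The two gradient estimates should agree to several digits. The matrix `jtj` costs only arithmetic on the Jacobian that has already been computed, with no further evaluations of the model.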

In many practical circumstances, the first term, \(f'(x)^T f'(x)\), in the Hessian is more important than the second term, most notably when the residuals \(f_i(x)\) are small at the solution. Specifically, we say that a problem has small residuals if, for all \(x\) near a solution, the quantities

$$|f_i(x)| \| \nabla^2 f_i(x) \|, \quad i=1,2,\ldots,m$$

are small relative to the smallest eigenvalue of \(f'(x)^T f'(x)\).
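To make the criterion concrete, the following sketch evaluates the quantities \(|f_i(x)| \, \| \nabla^2 f_i(x) \|\) and the smallest eigenvalue of \(f'(x)^T f'(x)\) for a two-parameter exponential model. Everything here is an illustrative assumption: the model, the data offsets, the use of the Frobenius norm (the text does not name a norm), and the use of one fixed point `x` as a stand-in for a point near a solution.

```python
import math

TIMES = [0.0, 1.0, 2.0, 3.0]  # hypothetical sample times

def phi(x, t):
    """Hypothetical model: phi(x, t) = x[0] * exp(x[1] * t)."""
    return x[0] * math.exp(x[1] * t)

def smallest_eig_jtj(x):
    """Smallest eigenvalue of the 2-by-2 matrix f'(x)^T f'(x)."""
    a = b = c = 0.0
    for t in TIMES:
        e = math.exp(x[1] * t)
        d0, d1 = e, x[0] * t * e  # partial derivatives of f_i
        a += d0 * d0
        b += d0 * d1
        c += d1 * d1
    # Closed-form smaller eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)

def second_term_sizes(x, observed):
    """|f_i(x)| * ||Hessian of f_i(x)|| (Frobenius norm) for each i."""
    sizes = []
    for t, b in zip(TIMES, observed):
        e = math.exp(x[1] * t)
        # The Hessian of f_i is [[0, t*e], [t*e, x[0]*t*t*e]].
        hess_norm = math.sqrt(2 * (t * e) ** 2 + (x[0] * t * t * e) ** 2)
        sizes.append(abs(phi(x, t) - b) * hess_norm)
    return sizes

x = [2.0, -0.5]
exact = [phi(x, t) for t in TIMES]
good_fit = [b + 1e-3 for b in exact]  # data the model almost reproduces
poor_fit = [b + 1.0 for b in exact]   # data the model misses by a wide margin

lam = smallest_eig_jtj(x)
small_ratio = max(second_term_sizes(x, good_fit)) / lam
large_ratio = max(second_term_sizes(x, poor_fit)) / lam
print("small-residual case:", small_ratio)
print("large-residual case:", large_ratio)
```

In the first case the ratio is far below one, so the first term dominates the Hessian at this point; in the second case the ratio exceeds one and the second term cannot be neglected.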

##### Notes and References

Nonlinear least-squares algorithms are discussed in the books of Bates and Watts [1]; Dennis and Schnabel [3]; Fletcher [4]; Gill, Murray, and Wright [5]; Nocedal and Wright [6]; and Seber and Wild [7]. The books by Bates and Watts [1] and by Seber and Wild [7] are written from a statistical point of view. Bates and Watts [1] emphasize applications, while Seber and Wild [7] concentrate on computational methods. Björck [2] discusses algorithms for linear least-squares problems in a comprehensive survey that covers, in particular, sparse least-squares problems and nonlinear least-squares.

1. Bates, D. M. and Watts, D. G. 1988. *Nonlinear Regression Analysis and Its Applications*, John Wiley & Sons, Inc., New York.
2. Björck, A. 1990. Least squares methods. In *Handbook of Numerical Analysis*, P. G. Ciarlet and J. L. Lions, eds., North-Holland, Amsterdam, pp. 465-647.
3. Dennis, J. E. and Schnabel, R. B. 1983. *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*, Prentice Hall, Englewood Cliffs, NJ.
4. Fletcher, R. 1987. *Practical Methods of Optimization*, 2nd ed., John Wiley & Sons, Inc., New York.
5. Gill, P. E., Murray, W., and Wright, M. H. 1981. *Practical Optimization*, Academic Press, New York.
6. Nocedal, J. and Wright, S. J. 1999. *Numerical Optimization*, Springer-Verlag, New York.
7. Seber, G. A. F. and Wild, C. J. 1989. *Nonlinear Regression*, John Wiley & Sons, Inc., New York.