# Nonlinear Least Squares

The nonlinear least-squares problem has the general form
$$\min \{ r(x) : x \in \mathbb{R}^n \}$$
where $$r \,$$ is the function defined by $$r(x) = \frac{1}{2}\| f(x) \|_2^2$$ for some vector-valued function $$f$$ that maps $$\mathbb{R}^n$$ to $$\mathbb{R}^m$$.

Least-squares problems often arise in data-fitting applications. Suppose that some physical or economic process is modeled by a nonlinear function $$\phi \,$$ that depends on a parameter vector $$x$$ and time $$t$$. If $$b_i$$ is the actual output of the system at time $$t_i$$, then the residual
$$\phi(x,t_i) – b_i \, \,$$
measures the discrepancy between the predicted and observed outputs of the system at time $$t_i$$. A reasonable estimate for the parameter $$x$$ may be obtained by defining the $$i$$th component of $$f$$ by
$$f_i(x) = \phi(x,t_i) – b_i \,$$
and solving the least-squares problem with this definition of $$f$$.

From an algorithmic point of view, the feature that distinguishes least-squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix of $$r$$. The Jacobian matrix of $$f$$,
$$f'(x) = \left( \partial_1 f(x), \ldots, \partial_n f(x) \right)$$
can be used to express the gradient of $$r \,$$ since $$\nabla r(x) = f'(x)^T f(x)$$. Similarly, $$f'(x)$$ is part of the Hessian matrix
$$\nabla^2 r(x) = f'(x)^T f'(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x).$$

To calculate the gradient of $$r \,$$, we need to calculate the Jacobian matrix $$f'(x)$$. Having done so, we know the first term in the Hessian matrix, namely $$f'(x)^Tf'(x) \,$$ without doing any further evaluations. Nonlinear least-squares algorithms exploit this structure.

In many practical circumstances, the first term, $$f'(x)^T f'(x) \,$$ in the Hessian is more important than the second term, most notably when the residuals $$f_i(x) \,$$ are small at the solution. Specifically, we say that a problem has small residuals if, for all $$x \,$$ near a solution, the quantities
$$|f_i(x)| \| \nabla^2 f_i(x) \|, \quad i=1,2,\ldots,n$$
are small relative to the smallest eigenvalue of $$f'(x)^Tf'(x) \,$$.

##### Notes and References

Nonlinear least-squares algorithms are discussed in the books of Bates and Watts ; Dennis and Schnabel ; Fletcher ; Gill, Murray, and Wright ; Nocedal and Wright; and Seber and Wild . The books by Bates and Watts  and by Seber and Wild  are written from a statistical point of view. Bates and Watts  emphasize applications, while Seber and Wild  concentrate on computational methods. Björck  discusses algorithms for linear least-squares problems in a comprehensive survey that covers, in particular, sparse least-squares problems and nonlinear least-squares.

1. Bates, D. M. and Watts, D. G. 1988. Nonlinear Regression Analysis and Its Applications, John Wiley &, Inc., New York.
2. Björck, A. 1990. Least squares methods. In Handbook of Numerical Analysis, P. G. Ciarlet and J. L. Lions, eds., North-Holland, Amsterdam, pp. 465-647.
3. Dennis, J. E. and Schnabel, R. B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, Englewood Cliffs, NJ.
4. Fletcher, R. 1987. Practical Methods of Optimization, 2nd ed., John Wiley & Sons, Inc., New York.
5. Gill, P. E., Murray, W., and Wright, M. H. 1981. Practical Optimization, Academic Press, New York.
6. Nocedal, J. and Wright, S. J. 1999. Numerical Optimization, Springer-Verlag, New York.
7. Seber, G. A. F. and Wild, C. J. 1989. Nonlinear Regression, John Wiley & Sons, Inc., New York.