Nonlinear Least Squares

The nonlinear least-squares problem has the general form
$$\min \{ r(x) : x \in \mathbb{R}^n \}$$
where $r \,$ is the function defined by $r(x) = \frac{1}{2}\| f(x) \|_2^2$ for some vector-valued function $f$ that maps $\mathbb{R}^n $ to $\mathbb{R}^m $.

Least-squares problems often arise in data-fitting applications. Suppose that some physical or economic process is modeled by a nonlinear function $\phi \,$ that depends on a parameter vector $x $ and time $t $. If $b_i $ is the actual output of the system at time $t_i $, then the residual
$$\phi(x,t_i) – b_i \, \,$$
measures the discrepancy between the predicted and observed outputs of the system at time $t_i $. A reasonable estimate for the parameter $x$ may be obtained by defining the $i$th component of $f $ by
$$f_i(x) = \phi(x,t_i) – b_i \,$$
and solving the least-squares problem with this definition of $f $.

From an algorithmic point of view, the feature that distinguishes least-squares problems from the general unconstrained optimization problem is the structure of the Hessian matrix of $r $. The Jacobian matrix of $f $,
$$f'(x) = \left( \partial_1 f(x), \ldots, \partial_n f(x) \right)$$
can be used to express the gradient of $r \,$ since $\nabla r(x) = f'(x)^T f(x)$. Similarly, $f'(x) $ is part of the Hessian matrix
$$\nabla^2 r(x) = f'(x)^T f'(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x).$$

To calculate the gradient of $r \,$, we need to calculate the Jacobian matrix $f'(x)$. Having done so, we know the first term in the Hessian matrix, namely $f'(x)^Tf'(x) \,$ without doing any further evaluations. Nonlinear least-squares algorithms exploit this structure.

In many practical circumstances, the first term, $f'(x)^T f'(x) \,$ in the Hessian is more important than the second term, most notably when the residuals $f_i(x) \,$ are small at the solution. Specifically, we say that a problem has small residuals if, for all $x \,$ near a solution, the quantities
$$|f_i(x)| \| \nabla^2 f_i(x) \|, \quad i=1,2,\ldots,n$$
are small relative to the smallest eigenvalue of $f'(x)^Tf'(x) \,$.

Notes and References

Nonlinear least-squares algorithms are discussed in the books of Bates and Watts [1]; Dennis and Schnabel [3]; Fletcher [4]; Gill, Murray, and Wright [5]; Nocedal and Wright[6]; and Seber and Wild [7]. The books by Bates and Watts [1] and by Seber and Wild [7] are written from a statistical point of view. Bates and Watts [1] emphasize applications, while Seber and Wild [7] concentrate on computational methods. Björck [2] discusses algorithms for linear least-squares problems in a comprehensive survey that covers, in particular, sparse least-squares problems and nonlinear least-squares.

Bates, D. M. and Watts, D. G. 1988. Nonlinear Regression Analysis and Its Applications, John Wiley &, Inc., New York.
Björck, A. 1990. Least squares methods. In Handbook of Numerical Analysis, P. G. Ciarlet and J. L. Lions, eds., North-Holland, Amsterdam, pp. 465-647.
Dennis, J. E. and Schnabel, R. B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, Englewood Cliffs, NJ.
Fletcher, R. 1987. Practical Methods of Optimization, 2nd ed., John Wiley & Sons, Inc., New York.
Gill, P. E., Murray, W., and Wright, M. H. 1981. Practical Optimization, Academic Press, New York.
Nocedal, J. and Wright, S. J. 1999. Numerical Optimization, Springer-Verlag, New York.
Seber, G. A. F. and Wild, C. J. 1989. Nonlinear Regression, John Wiley & Sons, Inc., New York.