@cindex nonlinear least squares fitting
@cindex least squares fitting, nonlinear

This chapter describes functions for multidimensional nonlinear
least-squares fitting. The library provides low-level components for a
variety of iterative solvers and convergence tests. These can be
combined by the user to achieve the desired solution, with full access
to the intermediate steps of the iteration. Each class of methods uses
the same framework, so that you can switch between solvers at runtime
without needing to recompile your program. Each instance of a solver
keeps track of its own state, allowing the solvers to be used in
multi-threaded programs.

The header file @file{gsl_multifit_nlin.h} contains prototypes for the
multidimensional nonlinear fitting functions and related declarations.

@menu
* Overview of Nonlinear Least-Squares Fitting::
* Initializing the Nonlinear Least-Squares Solver::
* Providing the Function to be Minimized::
* Iteration of the Minimization Algorithm::
* Search Stopping Parameters for Minimization Algorithms::
* Minimization Algorithms using Derivatives::
* Minimization Algorithms without Derivatives::
* Computing the covariance matrix of best fit parameters::
* Example programs for Nonlinear Least-Squares Fitting::
* References and Further Reading for Nonlinear Least-Squares Fitting::
@end menu

@node Overview of Nonlinear Least-Squares Fitting
@cindex nonlinear least squares fitting, overview

The problem of multidimensional nonlinear least-squares fitting requires
the minimization of the squared residuals of @math{n} functions,
@math{f_i}, in @math{p} parameters, @math{x_i},
@tex
\beforedisplay
$$
\Phi(x) = {1 \over 2} || F(x) ||^2
        = {1 \over 2} \sum_{i=1}^{n} f_i (x_1, \dots, x_p)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Phi(x) = (1/2) || F(x) ||^2
        = (1/2) \sum_@{i=1@}^@{n@} f_i(x_1, ..., x_p)^2
@end example

@end ifinfo
@noindent
All algorithms proceed from an initial guess using the linearization,
@tex
\beforedisplay
$$
\psi(p) = || F(x+p) || \approx || F(x) + J p\, ||
$$
\afterdisplay
@end tex
@ifinfo

@example
\psi(p) = || F(x+p) || ~=~ || F(x) + J p ||
@end example

@end ifinfo
@noindent
where @math{x} is the initial point, @math{p} is the proposed step
and @math{J} is the
Jacobian matrix @c{$J_{ij} = \partial f_i / \partial x_j$}
@math{J_@{ij@} = d f_i / d x_j}.
Additional strategies are used to enlarge the region of convergence.
These include requiring a decrease in the norm @math{||F||} on each
step or using a trust region to avoid steps which fall outside the linear
regime.

To perform a weighted least-squares fit of a nonlinear model
@math{Y(x,t)} to data (@math{t_i}, @math{y_i}) with independent Gaussian
errors @math{\sigma_i}, use function components of the following form,
@tex
\beforedisplay
$$
f_i = {(Y(x, t_i) - y_i) \over \sigma_i}
$$
\afterdisplay
@end tex
@ifinfo

@example
f_i = (Y(x, t_i) - y_i) / \sigma_i
@end example

@end ifinfo
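
As an illustration of the two definitions above, the following standalone
C sketch evaluates the weighted residuals @math{f_i} and the objective
@math{\Phi} for a straight-line model @math{Y(x,t) = x_0 + x_1 t}. It
uses plain arrays rather than library types; the function names and the
choice of model are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical model chosen for illustration: Y(x,t) = x[0] + x[1]*t. */
static double
model_line (const double *x, double t)
{
  return x[0] + x[1] * t;
}

/* Store f_i = (Y(x,t_i) - y_i) / sigma_i in f[0..n-1]. */
void
weighted_residuals (const double *x, const double *t, const double *y,
                    const double *sigma, size_t n, double *f)
{
  size_t i;

  for (i = 0; i < n; i++)
    f[i] = (model_line (x, t[i]) - y[i]) / sigma[i];
}

/* Return Phi = (1/2) * sum_i f_i^2. */
double
half_sum_squares (const double *f, size_t n)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += f[i] * f[i];

  return 0.5 * sum;
}
```

A fitting routine would call @code{weighted_residuals} at each trial
point @math{x} and seek the point where @code{half_sum_squares} is
smallest.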

Note that the model parameters are denoted by @math{x} in this chapter
since the nonlinear least-squares algorithms are described
geometrically (i.e. finding the minimum of a surface). The
independent variable of any data to be fitted is denoted by @math{t}.

With the definition above the Jacobian is
@c{$J_{ij} = (1 / \sigma_i) \partial Y_i / \partial x_j$}
@math{J_@{ij@} = (1 / \sigma_i) d Y_i / d x_j}, where @math{Y_i = Y(x,t_i)}.

@node Initializing the Nonlinear Least-Squares Solver
@section Initializing the Solver

@deftypefun {gsl_multifit_fsolver *} gsl_multifit_fsolver_alloc (const gsl_multifit_fsolver_type * @var{T}, size_t @var{n}, size_t @var{p})
This function returns a pointer to a newly allocated instance of a
solver of type @var{T} for @var{n} observations and @var{p} parameters.
The number of observations @var{n} must be greater than or equal to
the number of parameters @var{p}.

If there is insufficient memory to create the solver then the function
returns a null pointer and the error handler is invoked with an error
code of @code{GSL_ENOMEM}.
@end deftypefun

@deftypefun {gsl_multifit_fdfsolver *} gsl_multifit_fdfsolver_alloc (const gsl_multifit_fdfsolver_type * @var{T}, size_t @var{n}, size_t @var{p})
This function returns a pointer to a newly allocated instance of a
derivative solver of type @var{T} for @var{n} observations and @var{p}
parameters. For example, the following code creates an instance of a
Levenberg-Marquardt solver for 100 data points and 3 parameters,

@example
const gsl_multifit_fdfsolver_type * T
    = gsl_multifit_fdfsolver_lmder;
gsl_multifit_fdfsolver * s
    = gsl_multifit_fdfsolver_alloc (T, 100, 3);
@end example

@noindent
The number of observations @var{n} must be greater than or equal to
the number of parameters @var{p}.

If there is insufficient memory to create the solver then the function
returns a null pointer and the error handler is invoked with an error
code of @code{GSL_ENOMEM}.
@end deftypefun

@deftypefun int gsl_multifit_fsolver_set (gsl_multifit_fsolver * @var{s}, gsl_multifit_function * @var{f}, gsl_vector * @var{x})
This function initializes, or reinitializes, an existing solver @var{s}
to use the function @var{f} and the initial guess @var{x}.
@end deftypefun

@deftypefun int gsl_multifit_fdfsolver_set (gsl_multifit_fdfsolver * @var{s}, gsl_multifit_function_fdf * @var{fdf}, gsl_vector * @var{x})
This function initializes, or reinitializes, an existing solver @var{s}
to use the function and derivative @var{fdf} and the initial guess
@var{x}.
@end deftypefun

@deftypefun void gsl_multifit_fsolver_free (gsl_multifit_fsolver * @var{s})
@deftypefunx void gsl_multifit_fdfsolver_free (gsl_multifit_fdfsolver * @var{s})
These functions free all the memory associated with the solver @var{s}.
@end deftypefun

@deftypefun {const char *} gsl_multifit_fsolver_name (const gsl_multifit_fsolver * @var{s})
@deftypefunx {const char *} gsl_multifit_fdfsolver_name (const gsl_multifit_fdfsolver * @var{s})
These functions return a pointer to the name of the solver. For example,

@example
printf ("s is a '%s' solver\n",
        gsl_multifit_fdfsolver_name (s));
@end example

@noindent
would print something like @code{s is a 'lmder' solver}.
@end deftypefun

@node Providing the Function to be Minimized
@section Providing the Function to be Minimized

You must provide @math{n} functions of @math{p} variables for the
minimization algorithms to operate on. In order to allow for
arbitrary parameters the functions are defined by the following data
types:

@deftp {Data Type} gsl_multifit_function
This data type defines a general system of functions with arbitrary parameters.

@table @code
@item int (* f) (const gsl_vector * @var{x}, void * @var{params}, gsl_vector * @var{f})
this function should store the vector result
@c{$f(x,\hbox{\it params})$}
@math{f(x,params)} in @var{f} for argument @var{x} and arbitrary parameters @var{params},
returning an appropriate error code if the function cannot be computed.

@item size_t n
the number of functions, i.e. the number of components of the
vector @var{f}.

@item size_t p
the number of independent variables, i.e. the number of components of
the vector @var{x}.

@item void * params
a pointer to the arbitrary parameters of the function.
@end table
@end deftp

@deftp {Data Type} gsl_multifit_function_fdf
This data type defines a general system of functions with arbitrary parameters and
the corresponding Jacobian matrix of derivatives,

@table @code
@item int (* f) (const gsl_vector * @var{x}, void * @var{params}, gsl_vector * @var{f})
this function should store the vector result
@c{$f(x,\hbox{\it params})$}
@math{f(x,params)} in @var{f} for argument @var{x} and arbitrary parameters @var{params},
returning an appropriate error code if the function cannot be computed.

@item int (* df) (const gsl_vector * @var{x}, void * @var{params}, gsl_matrix * @var{J})
this function should store the @var{n}-by-@var{p} matrix result
@c{$J_{ij} = \partial f_i(x,\hbox{\it params}) / \partial x_j$}
@math{J_@{ij@} = d f_i(x,params) / d x_j} in @var{J} for argument @var{x}
and arbitrary parameters @var{params}, returning an appropriate error code if the
function cannot be computed.

@item int (* fdf) (const gsl_vector * @var{x}, void * @var{params}, gsl_vector * @var{f}, gsl_matrix * @var{J})
This function should set the values of @var{f} and @var{J} as above,
for argument @var{x} and arbitrary parameters @var{params}. This function
provides an optimization of the separate functions for @math{f(x)} and
@math{J(x)}---it is always faster to compute the function and its
derivative at the same time.

@item size_t n
the number of functions, i.e. the number of components of the
vector @var{f}.

@item size_t p
the number of independent variables, i.e. the number of components of
the vector @var{x}.

@item void * params
a pointer to the arbitrary parameters of the function.
@end table
@end deftp

Note that when fitting a nonlinear model against experimental data,
the data is passed to the functions above using the
@var{params} argument and the trial best-fit parameters through the
@var{x} argument.
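
The division of roles between @var{params} and @var{x} can be sketched
as follows. This is a minimal standalone illustration, not library code:
plain @code{double} arrays stand in for @code{gsl_vector} and
@code{gsl_matrix}, the struct @code{fit_data} and both function names are
hypothetical, and the model @math{Y(x,t) = x_0 / (1 + x_1 t)} was chosen
only because it is simple and nonlinear in its parameters.

```c
#include <stddef.h>

/* Hypothetical container for the observations, passed via `params'. */
struct fit_data
{
  size_t n;             /* number of observations */
  const double *t;      /* independent variable */
  const double *y;      /* measured values */
  const double *sigma;  /* measurement errors */
};

/* Residuals f_i = (x0 / (1 + x1 t_i) - y_i) / sigma_i.
   x holds the trial parameters, params holds the data.
   Returns 0 to signal success. */
int
decay_residuals (const double *x, void *params, double *f)
{
  const struct fit_data *d = (const struct fit_data *) params;
  size_t i;

  for (i = 0; i < d->n; i++)
    {
      double denom = 1.0 + x[1] * d->t[i];
      f[i] = (x[0] / denom - d->y[i]) / d->sigma[i];
    }

  return 0;
}

/* Jacobian J[2*i + j] = d f_i / d x_j, stored row by row (n-by-2). */
int
decay_jacobian (const double *x, void *params, double *J)
{
  const struct fit_data *d = (const struct fit_data *) params;
  size_t i;

  for (i = 0; i < d->n; i++)
    {
      double denom = 1.0 + x[1] * d->t[i];
      double s = d->sigma[i];

      J[2 * i + 0] = 1.0 / (denom * s);
      J[2 * i + 1] = -x[0] * d->t[i] / (denom * denom * s);
    }

  return 0;
}
```

Keeping the data behind a @code{void *} pointer lets the same callback
signature serve any model, at the cost of a cast inside each callback.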

@node Iteration of the Minimization Algorithm

The following functions drive the iteration of each algorithm. Each
function performs one iteration to update the state of any solver of the
corresponding type. The same functions work for all solvers so that
different methods can be substituted at runtime without modifications to
the code.

@deftypefun int gsl_multifit_fsolver_iterate (gsl_multifit_fsolver * @var{s})
@deftypefunx int gsl_multifit_fdfsolver_iterate (gsl_multifit_fdfsolver * @var{s})
These functions perform a single iteration of the solver @var{s}. If
the iteration encounters an unexpected problem then an error code will
be returned. The solver maintains a current estimate of the best-fit
parameters at all times.
@end deftypefun

The solver struct @var{s} contains the following entries, which can
be used to track the progress of the solution:

@table @code
@item gsl_vector * x
The current position.

@item gsl_vector * f
The function value at the current position.

@item gsl_vector * dx
The difference between the current position and the previous position,
i.e. the last step, taken as a vector.

@item gsl_matrix * J
The Jacobian matrix at the current position (for the
@code{gsl_multifit_fdfsolver} struct only)
@end table

The best-fit information can also be accessed with the following
functions,

@deftypefun {gsl_vector *} gsl_multifit_fsolver_position (const gsl_multifit_fsolver * @var{s})
@deftypefunx {gsl_vector *} gsl_multifit_fdfsolver_position (const gsl_multifit_fdfsolver * @var{s})
These functions return the current position (i.e. best-fit parameters)
@code{s->x} of the solver @var{s}.
@end deftypefun

@node Search Stopping Parameters for Minimization Algorithms
@section Search Stopping Parameters
@cindex nonlinear fitting, stopping parameters

A minimization procedure should stop when one of the following conditions is
true:

@itemize @bullet
@item
A minimum has been found to within the user-specified precision.

@item
A user-specified maximum number of iterations has been reached.

@item
An error has occurred.
@end itemize

@noindent
The handling of these conditions is under user control. The functions
below allow the user to test the current estimate of the best-fit
parameters in several standard ways.
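
The three stopping conditions can be seen together in a driver loop. The
sketch below is a standalone illustration and does not call the library:
it uses a plain Gauss-Newton step on a one-parameter toy problem with
residuals @math{f_i = x^2 t_i - y_i}, and the status codes and function
names are hypothetical stand-ins.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch. */
enum { FIT_SUCCESS = 0, FIT_CONTINUE = 1, FIT_ERROR = 2 };

/* One Gauss-Newton step for f_i = x*x*t_i - y_i in the single
   parameter x: dx = -(J^T f) / (J^T J) with J_i = 2*x*t_i. */
static int
iterate_once (double *x, double *dx, const double *t, const double *y,
              size_t n)
{
  double jtf = 0.0, jtj = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      double fi = (*x) * (*x) * t[i] - y[i];
      double ji = 2.0 * (*x) * t[i];
      jtf += ji * fi;
      jtj += ji * ji;
    }

  if (jtj == 0.0)
    return FIT_ERROR;           /* no step can be computed */

  *dx = -jtf / jtj;
  *x += *dx;
  return FIT_SUCCESS;
}

/* Iterate until the step is small, max_iter is reached, or an
   error occurs.  The number of iterations is stored in *iters. */
int
fit_loop (double *x, const double *t, const double *y, size_t n,
          double epsabs, double epsrel, unsigned max_iter,
          unsigned *iters)
{
  unsigned iter = 0;
  int status;

  do
    {
      double dx = 0.0;

      iter++;
      status = iterate_once (x, &dx, t, y, n);
      if (status != FIT_SUCCESS)
        break;                  /* condition 3: an error occurred */

      /* condition 1: step small relative to the requested precision */
      status = (fabs (dx) < epsabs + epsrel * fabs (*x))
        ? FIT_SUCCESS : FIT_CONTINUE;
    }
  while (status == FIT_CONTINUE && iter < max_iter);  /* condition 2 */

  *iters = iter;
  return status;
}
```

A return value of @code{FIT_CONTINUE} here means the iteration limit was
reached before the precision test passed, so the caller can decide
whether to accept the current estimate or restart.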

@deftypefun int gsl_multifit_test_delta (const gsl_vector * @var{dx}, const gsl_vector * @var{x}, double @var{epsabs}, double @var{epsrel})

This function tests for the convergence of the sequence by comparing the
last step @var{dx} with the absolute error @var{epsabs} and relative
error @var{epsrel} to the current position @var{x}. The test returns
@code{GSL_SUCCESS} if the following condition is achieved,
@tex
\beforedisplay
$$
|dx_i| < \hbox{\it epsabs} + \hbox{\it epsrel\/}\, |x_i|
$$
\afterdisplay
@end tex
@ifinfo

@example
|dx_i| < epsabs + epsrel |x_i|
@end example

@end ifinfo
@noindent
for each component of @var{x} and returns @code{GSL_CONTINUE} otherwise.
@end deftypefun
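
The componentwise condition above can be written out directly. The
following is a plain C sketch of the same inequality on ordinary arrays,
not the library's implementation; the function name is hypothetical and
it returns 1 or 0 rather than the library's status codes.

```c
#include <math.h>
#include <stddef.h>

/* Return 1 if |dx_i| < epsabs + epsrel * |x_i| holds for every
   component i = 0..p-1, and 0 otherwise. */
int
step_is_small (const double *dx, const double *x, size_t p,
               double epsabs, double epsrel)
{
  size_t i;

  for (i = 0; i < p; i++)
    {
      if (!(fabs (dx[i]) < epsabs + epsrel * fabs (x[i])))
        return 0;
    }

  return 1;
}
```

Because the tolerance mixes an absolute and a relative term, a parameter
near zero is governed by @var{epsabs} while a large parameter is governed
by @var{epsrel}.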

@cindex residual, in nonlinear systems of equations
@deftypefun int gsl_multifit_test_gradient (const gsl_vector * @var{g}, double @var{epsabs})
This function tests the residual gradient @var{g} against the absolute
error bound @var{epsabs}. Mathematically, the gradient should be
exactly zero at the minimum. The test returns @code{GSL_SUCCESS} if the
following condition is achieved,
@tex
\beforedisplay
$$
\sum_i |g_i| < \hbox{\it epsabs}
$$
\afterdisplay
@end tex
@ifinfo

@example
\sum_i |g_i| < epsabs
@end example

@end ifinfo
@noindent
and returns @code{GSL_CONTINUE} otherwise. This criterion is suitable
for situations where the precise location of the minimum, @math{x},
is unimportant provided a value can be found where the gradient is small
enough.
@end deftypefun

@deftypefun int gsl_multifit_gradient (const gsl_matrix * @var{J}, const gsl_vector * @var{f}, gsl_vector * @var{g})
This function computes the gradient @var{g} of @math{\Phi(x) = (1/2)
||F(x)||^2} from the Jacobian matrix @math{J} and the function values
@var{f}, using the formula @math{g = J^T f}.
@end deftypefun
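
The formula @math{g = J^T f} and the gradient test @math{\sum_i |g_i| <
epsabs} can be combined in a few lines. The sketch below works on plain
arrays with @math{J} stored row by row; it is an illustration of the two
formulas only, and the function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Compute g = J^T f for an n-by-p matrix J stored in row-major
   order, i.e. g_j = sum_i J[i][j] * f[i]. */
void
gradient_from_jacobian (const double *J, const double *f,
                        size_t n, size_t p, double *g)
{
  size_t i, j;

  for (j = 0; j < p; j++)
    {
      g[j] = 0.0;
      for (i = 0; i < n; i++)
        g[j] += J[i * p + j] * f[i];
    }
}

/* Return 1 if sum_j |g_j| < epsabs, and 0 otherwise. */
int
gradient_is_small (const double *g, size_t p, double epsabs)
{
  double sum = 0.0;
  size_t j;

  for (j = 0; j < p; j++)
    sum += fabs (g[j]);

  return sum < epsabs;
}
```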

@node Minimization Algorithms using Derivatives
@section Minimization Algorithms using Derivatives

The minimization algorithms described in this section make use of both
the function and its derivative. They require an initial guess for the
location of the minimum. There is no absolute guarantee of
convergence---the function must be suitable for this technique and the
initial guess must be sufficiently close to the minimum for it to work.

@comment ============================================================
@cindex Levenberg-Marquardt algorithms
@deffn {Derivative Solver} gsl_multifit_fdfsolver_lmsder
@cindex LMDER algorithm
@cindex MINPACK, minimization algorithms
This is a robust and efficient version of the Levenberg-Marquardt
algorithm as implemented in the scaled @sc{lmder} routine in
@sc{minpack}. Minpack was written by Jorge J. Mor@'e, Burton S. Garbow
and Kenneth E. Hillstrom.

The algorithm uses a generalized trust region to keep each step under
control. In order to be accepted a proposed new position @math{x'} must
satisfy the condition @math{|D (x' - x)| < \delta}, where @math{D} is a
diagonal scaling matrix and @math{\delta} is the size of the trust
region. The components of @math{D} are computed internally, using the
column norms of the Jacobian to estimate the sensitivity of the residual
to each component of @math{x}. This improves the behavior of the
algorithm for badly scaled functions.

On each iteration the algorithm attempts to minimize the linear system
@math{|F + J p|} subject to the constraint @math{|D p| < \delta}. The
solution to this constrained linear system is found using the
Levenberg-Marquardt method.
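
As background, the standard textbook form of the Levenberg-Marquardt
step is given below. It is stated here as general theory rather than as
a description of the library's internals, and the damping parameter
@math{\mu} is a symbol introduced for this note. The step @math{p}
solves a damped version of the normal equations,

```latex
(J^T J + \mu D^T D)\, p = -J^T F, \qquad \mu \ge 0
```

With @math{\mu = 0} this reduces to the Gauss-Newton step, which is used
when it already lies inside the trust region. Otherwise @math{\mu} is
increased until the step length @math{|D p|} matches the trust-region
size.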

The proposed step is now tested by evaluating the function at the
resulting point, @math{x'}. If the step reduces the norm of the
function sufficiently, and follows the predicted behavior of the
function within the trust region, then it is accepted and the size of the
trust region is increased. If the proposed step fails to improve the
solution, or differs significantly from the expected behavior within
the trust region, then the size of the trust region is decreased and
another trial step is computed.

The algorithm also monitors the progress of the solution and returns an
error if the changes in the solution are smaller than the machine
precision. The possible error codes are,

@table @code
@item GSL_ETOLF
the decrease in the function falls below machine precision

@item GSL_ETOLX
the change in the position vector falls below machine precision

@item GSL_ETOLG
the norm of the gradient, relative to the norm of the function, falls
below machine precision
@end table

@noindent
These error codes indicate that further iterations will be unlikely to
change the solution from its current value.
@end deffn

@deffn {Derivative Solver} gsl_multifit_fdfsolver_lmder
This is an unscaled version of the @sc{lmder} algorithm. The elements of the
diagonal scaling matrix @math{D} are set to 1. This algorithm may be
useful in circumstances where the scaled version of @sc{lmder} converges too
slowly, or the function is already scaled appropriately.
@end deffn

@node Minimization Algorithms without Derivatives
@section Minimization Algorithms without Derivatives

There are no algorithms implemented in this section at the moment.

@node Computing the covariance matrix of best fit parameters
@section Computing the covariance matrix of best fit parameters
@cindex best-fit parameters, covariance
@cindex least squares, covariance of best-fit parameters
@cindex covariance matrix, nonlinear fits

@deftypefun int gsl_multifit_covar (const gsl_matrix * @var{J}, double @var{epsrel}, gsl_matrix * @var{covar})
This function uses the Jacobian matrix @var{J} to compute the covariance
matrix of the best-fit parameters, @var{covar}. The parameter
@var{epsrel} is used to remove linearly dependent columns when @var{J} is
rank deficient.

The covariance matrix is given by,
@tex
\beforedisplay
$$
C = (J^T J)^{-1}
$$
\afterdisplay
@end tex
@ifinfo

@example
covar = (J^T J)^@{-1@}
@end example

@end ifinfo
@noindent
and is computed by QR decomposition of @math{J} with column-pivoting. Any
columns of @math{R} which satisfy
@tex
\beforedisplay
$$
|R_{kk}| \leq epsrel |R_{11}|
$$
\afterdisplay
@end tex
@ifinfo

@example
|R_@{kk@}| <= epsrel |R_@{11@}|
@end example

@end ifinfo
@noindent
are considered linearly dependent and are excluded from the covariance
matrix (the corresponding rows and columns of the covariance matrix are
set to zero).
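
For intuition, the formula @math{(J^T J)^{-1}} can be evaluated by hand
when there are only two parameters. The sketch below does this with a
direct 2-by-2 inverse on plain arrays. It is deliberately not the method
described above: it uses no QR decomposition, it ignores @var{epsrel},
and it simply reports failure when @math{J^T J} is exactly singular. The
function name is hypothetical.

```c
#include <stddef.h>

/* Compute covar = (J^T J)^{-1} for an n-by-2 matrix J stored in
   row-major order.  Returns 0 on success, or -1 if J^T J is
   singular (its determinant is exactly zero). */
int
covar_2param (const double *J, size_t n, double covar[2][2])
{
  double a = 0.0, b = 0.0, c = 0.0, det;
  size_t i;

  /* Accumulate the symmetric matrix J^T J = [a b; b c]. */
  for (i = 0; i < n; i++)
    {
      a += J[2 * i] * J[2 * i];
      b += J[2 * i] * J[2 * i + 1];
      c += J[2 * i + 1] * J[2 * i + 1];
    }

  det = a * c - b * b;
  if (det == 0.0)
    return -1;

  /* Closed-form inverse of a symmetric 2-by-2 matrix. */
  covar[0][0] = c / det;
  covar[0][1] = -b / det;
  covar[1][0] = -b / det;
  covar[1][1] = a / det;

  return 0;
}
```

The singular case corresponds to linearly dependent columns of @math{J},
which is the situation the pivoted QR approach with @var{epsrel} is
designed to handle more gracefully.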

If the minimization uses the weighted least-squares function
@math{f_i = (Y(x, t_i) - y_i) / \sigma_i} then the covariance
matrix above gives the statistical error on the best-fit parameters
resulting from the Gaussian errors @math{\sigma_i} on
the underlying data @math{y_i}. This can be verified from the relation
@math{\delta f = J \delta x} and the fact that the fluctuations in @math{f}
from the data @math{y_i} are normalized by @math{\sigma_i} and
so satisfy @c{$\langle \delta f \delta f^T \rangle = I$}
@math{<\delta f \delta f^T> = I}.

For an unweighted least-squares function @math{f_i = (Y(x, t_i) -
y_i)} the covariance matrix above should be multiplied by the variance
of the residuals about the best-fit @math{\sigma^2 = \sum (y_i - Y(x,t_i))^2 / (n-p)}
to give the variance-covariance
matrix @math{\sigma^2 C}. This estimates the statistical error on the
best-fit parameters from the scatter of the underlying data.
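
The rescaling for the unweighted case amounts to two short loops. The
following plain C sketch assumes the residuals @math{y_i - Y(x,t_i)} and
a @math{p}-by-@math{p} covariance matrix are already available as
ordinary arrays, and that @math{n > p}; the function names are
hypothetical.

```c
#include <stddef.h>

/* Return sigma^2 = sum_i r_i^2 / (n - p), where r_i are the
   residuals about the best fit.  Requires n > p. */
double
residual_variance (const double *r, size_t n, size_t p)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += r[i] * r[i];

  return sum / (double) (n - p);
}

/* Multiply every element of the p-by-p matrix covar by sigma2,
   giving the variance-covariance matrix sigma^2 C in place. */
void
scale_covariance (double *covar, size_t p, double sigma2)
{
  size_t k;

  for (k = 0; k < p * p; k++)
    covar[k] *= sigma2;
}
```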

For more information about covariance matrices see @ref{Fitting Overview}.
@end deftypefun

@comment ============================================================

@node Example programs for Nonlinear Least-Squares Fitting

The following example program fits a weighted exponential model with
background to experimental data, @math{Y = A \exp(-\lambda t) + b}. The
first part of the program sets up the functions @code{expb_f} and
@code{expb_df} to calculate the model and its Jacobian. The appropriate
fitting function is given by,
@tex
\beforedisplay
$$
f_i = ((A \exp(-\lambda t_i) + b) - y_i)/\sigma_i
$$
\afterdisplay
@end tex
@ifinfo

@example
f_i = ((A \exp(-\lambda t_i) + b) - y_i)/\sigma_i
@end example

@end ifinfo
@noindent
where we have chosen @math{t_i = i}. The Jacobian matrix @math{J} is
the derivative of these functions with respect to the three parameters
(@math{A}, @math{\lambda}, @math{b}). It is given by,
@tex
\beforedisplay
$$
J_{ij} = {\partial f_i \over \partial x_j}
$$
\afterdisplay
@end tex
@ifinfo

@example
J_@{ij@} = d f_i / d x_j
@end example

@end ifinfo
@noindent
where @math{x_0 = A}, @math{x_1 = \lambda} and @math{x_2 = b}.
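
Differentiating the fitting function @math{f_i = ((A \exp(-\lambda t_i)
+ b) - y_i)/\sigma_i} with respect to each parameter gives the three
columns of the Jacobian explicitly. This is a worked derivation added
for reference:

```latex
J_{i0} = \frac{\partial f_i}{\partial A}
       = \frac{\exp(-\lambda t_i)}{\sigma_i}, \qquad
J_{i1} = \frac{\partial f_i}{\partial \lambda}
       = -\frac{t_i\, A \exp(-\lambda t_i)}{\sigma_i}, \qquad
J_{i2} = \frac{\partial f_i}{\partial b}
       = \frac{1}{\sigma_i}
```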

@example
@verbatiminclude examples/expfit.c
@end example

@noindent
The main part of the program sets up a Levenberg-Marquardt solver and
some simulated random data. The data uses the known parameters
@math{A = 5.0}, @math{\lambda = 0.1} and @math{b = 1.0}, combined with
Gaussian noise (standard deviation = 0.1)
over a range of 40 timesteps. The initial guess for the parameters is
chosen as (1.0, 0.0, 0.0).

@example
@verbatiminclude examples/nlfit.c
@end example

@noindent
The iteration terminates when the change in x is smaller than 0.0001, as
both an absolute and relative change. Here are the results of running
the program,

@smallexample
iter: 0 x=1.00000000 0.00000000 0.00000000 |f(x)|=117.349
iter: 1 x=1.64659312 0.01814772 0.64659312 |f(x)|=76.4578
iter: 2 x=2.85876037 0.08092095 1.44796363 |f(x)|=37.6838
iter: 3 x=4.94899512 0.11942928 1.09457665 |f(x)|=9.58079
iter: 4 x=5.02175572 0.10287787 1.03388354 |f(x)|=5.63049
iter: 5 x=5.04520433 0.10405523 1.01941607 |f(x)|=5.44398
iter: 6 x=5.04535782 0.10404906 1.01924871 |f(x)|=5.44397
A = 5.04536 +/- 0.06028
lambda = 0.10405 +/- 0.00316
b = 1.01925 +/- 0.03782
@end smallexample

@noindent
The approximate values of the parameters are found correctly, and the
chi-squared value indicates a good fit (the chi-squared per degree of
freedom is approximately 1). In this case the errors on the parameters
can be estimated from the square roots of the diagonal elements of the
covariance matrix.
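
The chi-squared per degree of freedom can be checked from the output
above. The arithmetic below assumes one residual per timestep, so that
@math{n = 40}, with @math{p = 3} fitted parameters, and takes
@math{\chi^2} to be the squared norm of the final residual vector:

```latex
\chi^2 = |f(x)|^2 \approx 5.44397^2 \approx 29.64, \qquad
n - p = 40 - 3 = 37, \qquad
\chi^2/(n-p) \approx 29.64 / 37 \approx 0.80
```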

If the chi-squared value shows a poor fit (i.e. @c{$\chi^2/(n-p) \gg 1$}
@math{chi^2/dof >> 1}) then the error estimates obtained from the
covariance matrix will be too small. In the example program the error estimates
are multiplied by @c{$\sqrt{\chi^2/(n-p)}$}
@math{\sqrt@{\chi^2/dof@}} in this case, a common way of increasing the
errors for a poor fit. Note that a poor fit will result from the use
of an inappropriate model, and the scaled error estimates may then
be outside the range of validity for Gaussian errors.

@center @image{fit-exp,3.4in}

@node References and Further Reading for Nonlinear Least-Squares Fitting
@section References and Further Reading

The @sc{minpack} algorithm is described in the following article,

@itemize @asis
@item
J.J. Mor@'e, @cite{The Levenberg-Marquardt Algorithm: Implementation and
Theory}, Lecture Notes in Mathematics, v630 (1978), ed G. Watson.
@end itemize

@noindent
The following paper is also relevant to the algorithms described in this
section,

@itemize @asis
@item
J.J. Mor@'e, B.S. Garbow, K.E. Hillstrom, ``Testing Unconstrained
Optimization Software'', ACM Transactions on Mathematical Software, Vol
7, No 1 (1981), p 17--41.
@end itemize