htswanalysis/MACS/lib/gsl/gsl-1.11/doc/multimin.texi

   1 @cindex minimization, multidimensional
   2
   3 This chapter describes routines for finding minima of arbitrary
   4 multidimensional functions.  The library provides low level components
   5 for a variety of iterative minimizers and convergence tests.  These can
   6 be combined by the user to achieve the desired solution, while providing
   7 full access to the intermediate steps of the algorithms.  Each class of
   8 methods uses the same framework, so that you can switch between
   9 minimizers at runtime without needing to recompile your program.  Each
  10 instance of a minimizer keeps track of its own state, allowing the
  11 minimizers to be used in multi-threaded programs. The minimization
  12 algorithms can be used to maximize a function by inverting its sign.
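
As a minimal sketch of the sign inversion, assuming plain C arrays in
place of @code{gsl_vector}, a function to be maximized can be wrapped so
that a minimizer sees its negative.  The names @code{objective} and
@code{negated_objective} are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical objective with a maximum of 4 at (1, -2). */
double objective(const double *x, size_t n)
{
    assert(x != NULL && n == 2);
    return 4.0 - (x[0] - 1.0) * (x[0] - 1.0) - (x[1] + 2.0) * (x[1] + 2.0);
}

/* Wrapper handed to a minimizer: minimizing -f is the same as maximizing f. */
double negated_objective(const double *x, size_t n)
{
    return -objective(x, n);
}
```

The maximum value of the original function is then the negative of the
minimum found for the wrapper.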
  13
  14 The header file @file{gsl_multimin.h} contains prototypes for the
  15 minimization functions and related declarations.
  16
  17 @menu
  18 * Multimin Overview::
  19 * Multimin Caveats::
  20 * Initializing the Multidimensional Minimizer::
  21 * Providing a function to minimize::
  22 * Multimin Iteration::
  23 * Multimin Stopping Criteria::
  24 * Multimin Algorithms::
  25 * Multimin Examples::
  26 * Multimin References and Further Reading::
  27 @end menu
  28
  29 @node Multimin Overview
  30 @section Overview
  31
  32 The problem of multidimensional minimization requires finding a point
  33 @math{x} such that the scalar function,
  34 @tex
  35 \beforedisplay
  36 $$
  37 f(x_1, \dots, x_n)
  38 $$
  39 \afterdisplay
  40 @end tex
  41 @ifinfo
  42
  43 @example
  44 f(x_1, @dots{}, x_n)
  45 @end example
  46
  47 @end ifinfo
  48 @noindent
  49 takes a value which is lower than at any neighboring point. For smooth
  50 functions the gradient @math{g = \nabla f} vanishes at the minimum. In
  51 general there are no bracketing methods available for the
  52 minimization of @math{n}-dimensional functions.  The algorithms
  53 proceed from an initial guess using a search algorithm which attempts
  54 to move in a downhill direction.
  55
Algorithms making use of the gradient of the function perform a
one-dimensional line minimization along the current search direction
until the lowest point is found to a suitable tolerance.  The search
direction is then updated with local information from the function and
its derivatives, and the whole process is repeated until the true
@math{n}-dimensional minimum is found.
  62
  63 The Nelder-Mead Simplex algorithm applies a different strategy.  It
maintains @math{n+1} trial parameter vectors as the vertices of an
  65 @math{n}-dimensional simplex.  In each iteration step it tries to
  66 improve the worst vertex by a simple geometrical transformation until
  67 the size of the simplex falls below a given tolerance.
  68
  69 Both types of algorithms use a standard framework. The user provides a
  70 high-level driver for the algorithms, and the library provides the
  71 individual functions necessary for each of the steps.  There are three
  72 main phases of the iteration.  The steps are,
  73
  74 @itemize @bullet
  75 @item
  76 initialize minimizer state, @var{s}, for algorithm @var{T}
  77
  78 @item
  79 update @var{s} using the iteration @var{T}
  80
  81 @item
  82 test @var{s} for convergence, and repeat iteration if necessary
  83 @end itemize
  84
  85 @noindent
  86 Each iteration step consists either of an improvement to the
  87 line-minimisation in the current direction or an update to the search
  88 direction itself.  The state for the minimizers is held in a
  89 @code{gsl_multimin_fdfminimizer} struct or a
  90 @code{gsl_multimin_fminimizer} struct.
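
The three phases can be sketched with a toy one-dimensional minimizer
written in plain C.  Everything here (@code{toy_state}, @code{toy_set},
@code{toy_iterate}, @code{toy_converged}, @code{toy_drive}) is a
hypothetical stand-in used only to show the shape of a user-supplied
driver; it is not the library's interface, and the update is a
fixed-step gradient descent rather than any of the algorithms described
in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimizer state for f(x) = (x - 3)^2, not a GSL type. */
typedef struct {
    double x;     /* current estimate of the minimum */
    double step;  /* fixed step length along the negative gradient */
} toy_state;

double toy_gradient(double x) { return 2.0 * (x - 3.0); }

/* Phase 1: initialize the state from a starting point. */
void toy_set(toy_state *s, double x0, double step)
{
    assert(s != NULL && step > 0.0);
    s->x = x0;
    s->step = step;
}

/* Phase 2: perform one iteration, updating the state in place. */
void toy_iterate(toy_state *s)
{
    s->x -= s->step * toy_gradient(s->x);
}

/* Phase 3: convergence test; returns 1 when |g| < epsabs. */
int toy_converged(const toy_state *s, double epsabs)
{
    double g = toy_gradient(s->x);
    return g * g < epsabs * epsabs;
}

/* User-supplied driver: iterate until converged or max_iter is reached.
   Returns the number of iterations performed. */
size_t toy_drive(toy_state *s, double epsabs, size_t max_iter)
{
    size_t iter = 0;
    while (iter < max_iter && !toy_converged(s, epsabs)) {
        toy_iterate(s);
        iter++;
    }
    return iter;
}
```

Keeping the loop in user code is what gives access to the intermediate
state: the driver can print, log or inspect @code{s} between any two
iterations.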
  91
  92 @node Multimin Caveats
  93 @section Caveats
  94 @cindex Multimin, caveats
  95
  96 Note that the minimization algorithms can only search for one local
  97 minimum at a time.  When there are several local minima in the search
  98 area, the first minimum to be found will be returned; however it is
  99 difficult to predict which of the minima this will be.  In most cases,
 100 no error will be reported if you try to find a local minimum in an area
 101 where there is more than one.
 102
 103 It is also important to note that the minimization algorithms find local
 104 minima; there is no way to determine whether a minimum is a global
 105 minimum of the function in question.
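
The dependence on the starting point can be illustrated with a
one-dimensional function that has two local minima.  The sketch below
uses a fixed-step gradient descent as a hypothetical stand-in for a real
minimizer; the function names are assumptions made for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* f(x) = (x^2 - 1)^2 has two local minima, at x = -1 and x = +1. */
double double_well_gradient(double x) { return 4.0 * x * (x * x - 1.0); }

/* Fixed-step gradient descent from x0; returns the final point. */
double descend(double x0, double step, size_t n_iter)
{
    double x = x0;
    size_t i;
    assert(step > 0.0);
    for (i = 0; i < n_iter; i++)
        x -= step * double_well_gradient(x);
    return x;
}
```

Starting from @math{x = 0.5} with a step of 0.05 the iteration settles
at @math{x = 1}, while starting from @math{x = -0.5} it settles at
@math{x = -1}.  Neither run gives any indication that a second minimum
exists.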
 106
 107 @node Initializing the Multidimensional Minimizer
 108 @section Initializing the Multidimensional Minimizer
 109
 110 The following function initializes a multidimensional minimizer.  The
 111 minimizer itself depends only on the dimension of the problem and the
 112 algorithm and can be reused for different problems.
 113
 114 @deftypefun {gsl_multimin_fdfminimizer *} gsl_multimin_fdfminimizer_alloc (const gsl_multimin_fdfminimizer_type * @var{T}, size_t @var{n})
 115 @deftypefunx {gsl_multimin_fminimizer *} gsl_multimin_fminimizer_alloc (const gsl_multimin_fminimizer_type * @var{T}, size_t @var{n})
 116 This function returns a pointer to a newly allocated instance of a
minimizer of type @var{T} for an @var{n}-dimensional function.  If there
 118 is insufficient memory to create the minimizer then the function returns
 119 a null pointer and the error handler is invoked with an error code of
 120 @code{GSL_ENOMEM}.
 121 @end deftypefun
 122
 123 @deftypefun int gsl_multimin_fdfminimizer_set (gsl_multimin_fdfminimizer * @var{s}, gsl_multimin_function_fdf * @var{fdf}, const gsl_vector * @var{x}, double @var{step_size}, double @var{tol})
 124 This function initializes the minimizer @var{s} to minimize the function
 125 @var{fdf} starting from the initial point @var{x}.  The size of the
 126 first trial step is given by @var{step_size}.  The accuracy of the line
 127 minimization is specified by @var{tol}.  The precise meaning of this
 128 parameter depends on the method used.  Typically the line minimization
 129 is considered successful if the gradient of the function @math{g} is
 130 orthogonal to the current search direction @math{p} to a relative
 131 accuracy of @var{tol}, where @c{$p\cdot g < tol |p| |g|$}
 132 @math{dot(p,g) < tol |p| |g|}.  A @var{tol} value of 0.1 is
 133 suitable for most purposes, since line minimization only needs to
 134 be carried out approximately.    Note that setting @var{tol} to zero will
 135 force the use of ``exact'' line-searches, which are extremely expensive.
@end deftypefun

@deftypefun int gsl_multimin_fminimizer_set (gsl_multimin_fminimizer * @var{s}, gsl_multimin_function * @var{f}, const gsl_vector * @var{x}, const gsl_vector * @var{step_size})
 138 This function initializes the minimizer @var{s} to minimize the function
 139 @var{f}, starting from the initial point
 140 @var{x}. The size of the initial trial steps is given in vector
 141 @var{step_size}. The precise meaning of this parameter depends on the
 142 method used.
 143 @end deftypefun
 144
 145 @deftypefun void gsl_multimin_fdfminimizer_free (gsl_multimin_fdfminimizer * @var{s})
 146 @deftypefunx void gsl_multimin_fminimizer_free (gsl_multimin_fminimizer * @var{s})
 147 This function frees all the memory associated with the minimizer
 148 @var{s}.
 149 @end deftypefun
 150
 151 @deftypefun {const char *} gsl_multimin_fdfminimizer_name (const gsl_multimin_fdfminimizer * @var{s})
 152 @deftypefunx {const char *} gsl_multimin_fminimizer_name (const gsl_multimin_fminimizer * @var{s})
 153 This function returns a pointer to the name of the minimizer.  For example,
 154
 155 @example
 156 printf ("s is a '%s' minimizer\n",
 157         gsl_multimin_fdfminimizer_name (s));
 158 @end example
 159
 160 @noindent
 161 would print something like @code{s is a 'conjugate_pr' minimizer}.
 162 @end deftypefun
 163
 164 @node Providing a function to minimize
 165 @section Providing a function to minimize
 166
 167 You must provide a parametric function of @math{n} variables for the
 168 minimizers to operate on.  You may also need to provide a routine which
 169 calculates the gradient of the function and a third routine which
 170 calculates both the function value and the gradient together.  In order
 171 to allow for general parameters the functions are defined by the
 172 following data types:
 173
 174 @deftp {Data Type} gsl_multimin_function_fdf
 175 This data type defines a general function of @math{n} variables with
 176 parameters and the corresponding gradient vector of derivatives,
 177
 178 @table @code
 179 @item double (* f) (const gsl_vector * @var{x}, void * @var{params})
 180 this function should return the result
 181 @c{$f(x,\hbox{\it params})$}
 182 @math{f(x,params)} for argument @var{x} and parameters @var{params}.
 183 If the function cannot be computed, an error value of @code{GSL_NAN}
 184 should be returned.
 185
 186 @item void (* df) (const gsl_vector * @var{x}, void * @var{params}, gsl_vector * @var{g})
this function should store the @var{n}-dimensional gradient
@c{$g_i = \partial f(x,\hbox{\it params}) / \partial x_i$}
@math{g_i = d f(x,params) / d x_i} in the vector @var{g} for argument @var{x}
and parameters @var{params}.  Since the function is declared @code{void},
it has no return value through which to report an error.
 192
@item void (* fdf) (const gsl_vector * @var{x}, void * @var{params}, double * @var{f}, gsl_vector * @var{g})
This function should set the values of @var{f} and @var{g} as above,
for argument @var{x} and parameters @var{params}.  This function
provides an optimization of the separate functions for @math{f(x)} and
@math{g(x)}, since it is usually faster to compute the function and its
derivative at the same time.
 199
 200 @item size_t n
 201 the dimension of the system, i.e. the number of components of the
 202 vectors @var{x}.
 203
 204 @item void * params
 205 a pointer to the parameters of the function.
 206 @end table
 207 @end deftp
 208 @deftp {Data Type} gsl_multimin_function
 209 This data type defines a general function of @math{n} variables with
 210 parameters,
 211
 212 @table @code
 213 @item double (* f) (const gsl_vector * @var{x}, void * @var{params})
 214 this function should return the result
 215 @c{$f(x,\hbox{\it params})$}
 216 @math{f(x,params)} for argument @var{x} and parameters @var{params}.
 217 If the function cannot be computed, an error value of @code{GSL_NAN}
 218 should be returned.
 219
 220 @item size_t n
 221 the dimension of the system, i.e. the number of components of the
 222 vectors @var{x}.
 223
 224 @item void * params
 225 a pointer to the parameters of the function.
 226 @end table
 227 @end deftp
 228
 229 @noindent
 230 The following example function defines a simple two-dimensional
 231 paraboloid with five parameters,
 232
 233 @example
 234 @verbatiminclude examples/multiminfn.c
 235 @end example
 236
 237 @noindent
 238 The function can be initialized using the following code,
 239
 240 @example
 241 gsl_multimin_function_fdf my_func;
 242
 243 /* Paraboloid center at (1,2), scale factors (10, 20),
 244    minimum value 30 */
 245 double p[5] = @{ 1.0, 2.0, 10.0, 20.0, 30.0 @};
 246
my_func.n = 2;  /* dimension of the problem, i.e. components of x */
 248 my_func.f = &my_f;
 249 my_func.df = &my_df;
 250 my_func.fdf = &my_fdf;
 251 my_func.params = (void *)p;
 252 @end example
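
The parameter layout described in the comment above (center, scale
factors, minimum value) can be sketched with plain C arrays in place of
@code{gsl_vector}.  The names @code{paraboloid_f}, @code{paraboloid_df}
and @code{paraboloid_df_numeric} are hypothetical, and the explicit
formula is an assumption consistent with that comment.  The
central-difference routine is included as one way to check an analytic
gradient before handing it to a minimizer.

```c
#include <assert.h>
#include <stddef.h>

/* f(x, y) = p[2] (x - p[0])^2 + p[3] (y - p[1])^2 + p[4] */
double paraboloid_f(const double v[2], const double p[5])
{
    double dx = v[0] - p[0];
    double dy = v[1] - p[1];
    return p[2] * dx * dx + p[3] * dy * dy + p[4];
}

/* Analytic gradient of paraboloid_f, stored in g. */
void paraboloid_df(const double v[2], const double p[5], double g[2])
{
    assert(g != NULL);
    g[0] = 2.0 * p[2] * (v[0] - p[0]);
    g[1] = 2.0 * p[3] * (v[1] - p[1]);
}

/* Central-difference estimate of df/dv[i], for checking the gradient. */
double paraboloid_df_numeric(const double v[2], const double p[5],
                             size_t i, double h)
{
    double lo[2] = { v[0], v[1] };
    double hi[2] = { v[0], v[1] };
    assert(i < 2 && h > 0.0);
    lo[i] -= h;
    hi[i] += h;
    return (paraboloid_f(hi, p) - paraboloid_f(lo, p)) / (2.0 * h);
}
```

With the parameters @code{@{1.0, 2.0, 10.0, 20.0, 30.0@}} the function
takes the value 30 at @math{(1,2)}, and the gradient at @math{(5,7)} is
@math{(80, 200)}.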
 253
 254 @node Multimin Iteration
 255 @section Iteration
 256
 257 The following function drives the iteration of each algorithm.  The
 258 function performs one iteration to update the state of the minimizer.
 259 The same function works for all minimizers so that different methods can
 260 be substituted at runtime without modifications to the code.
 261
 262 @deftypefun int gsl_multimin_fdfminimizer_iterate (gsl_multimin_fdfminimizer * @var{s})
 263 @deftypefunx int gsl_multimin_fminimizer_iterate (gsl_multimin_fminimizer * @var{s})
 264 These functions perform a single iteration of the minimizer @var{s}.  If
 265 the iteration encounters an unexpected problem then an error code will
 266 be returned.
 267 @end deftypefun
 268
 269 @noindent
 270 The minimizer maintains a current best estimate of the minimum at all
 271 times.  This information can be accessed with the following auxiliary
 272 functions,
 273
 274 @deftypefun {gsl_vector *} gsl_multimin_fdfminimizer_x (const gsl_multimin_fdfminimizer * @var{s})
 275 @deftypefunx {gsl_vector *} gsl_multimin_fminimizer_x (const gsl_multimin_fminimizer * @var{s})
 276 @deftypefunx double gsl_multimin_fdfminimizer_minimum (const gsl_multimin_fdfminimizer * @var{s})
 277 @deftypefunx double gsl_multimin_fminimizer_minimum (const gsl_multimin_fminimizer * @var{s})
 278 @deftypefunx {gsl_vector *} gsl_multimin_fdfminimizer_gradient (const gsl_multimin_fdfminimizer * @var{s})
 279 @deftypefunx double gsl_multimin_fminimizer_size (const gsl_multimin_fminimizer * @var{s})
These functions return the current best estimate of the location of the
minimum, the value of the function at that point, its gradient,
and the minimizer-specific characteristic size for the minimizer @var{s}.
 283 @end deftypefun
 284
 285 @deftypefun int gsl_multimin_fdfminimizer_restart (gsl_multimin_fdfminimizer * @var{s})
 286 This function resets the minimizer @var{s} to use the current point as a
 287 new starting point.
 288 @end deftypefun
 289
 290 @node Multimin Stopping Criteria
 291 @section Stopping Criteria
 292
 293 A minimization procedure should stop when one of the following
 294 conditions is true:
 295
 296 @itemize @bullet
 297 @item
 298 A minimum has been found to within the user-specified precision.
 299
 300 @item
 301 A user-specified maximum number of iterations has been reached.
 302
 303 @item
 304 An error has occurred.
 305 @end itemize
 306
 307 @noindent
 308 The handling of these conditions is under user control.  The functions
 309 below allow the user to test the precision of the current result.
 310
 311 @deftypefun int gsl_multimin_test_gradient (const gsl_vector * @var{g}, double @var{epsabs})
 312 This function tests the norm of the gradient @var{g} against the
 313 absolute tolerance @var{epsabs}. The gradient of a multidimensional
 314 function goes to zero at a minimum. The test returns @code{GSL_SUCCESS}
 315 if the following condition is achieved,
 316 @tex
 317 \beforedisplay
 318 $$
 319 |g| < \hbox{\it epsabs}
 320 $$
 321 \afterdisplay
 322 @end tex
 323 @ifinfo
 324
 325 @example
 326 |g| < epsabs
 327 @end example
 328
 329 @end ifinfo
 330 @noindent
 331 and returns @code{GSL_CONTINUE} otherwise.  A suitable choice of
 332 @var{epsabs} can be made from the desired accuracy in the function for
 333 small variations in @math{x}.  The relationship between these quantities
 334 is given by @c{$\delta{f} = g\,\delta{x}$}
 335 @math{\delta f = g \delta x}.
 336 @end deftypefun
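
The condition above can be sketched in plain C as follows.  The function
name and the two status codes are hypothetical stand-ins for the
library's return values, and the sketch compares squared quantities so
that no square root is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the library's return values. */
enum { TEST_SUCCESS = 0, TEST_CONTINUE = 1 };

/* Returns TEST_SUCCESS when |g| < epsabs, TEST_CONTINUE otherwise.
   Squared quantities are compared to avoid a square root. */
int test_gradient_norm(const double *g, size_t n, double epsabs)
{
    double norm2 = 0.0;
    size_t i;
    assert(g != NULL && epsabs > 0.0);
    for (i = 0; i < n; i++)
        norm2 += g[i] * g[i];
    return (norm2 < epsabs * epsabs) ? TEST_SUCCESS : TEST_CONTINUE;
}
```

As an illustration of choosing the tolerance, if the function value
should change by no more than about @math{10^{-6}} for variations in
@math{x} of about @math{10^{-3}}, the relationship above suggests an
@var{epsabs} of roughly @math{10^{-3}}.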
 337
 338 @deftypefun int gsl_multimin_test_size (const double @var{size}, double @var{epsabs})
This function tests the minimizer-specific characteristic size (where
the minimizer in use defines one) against the absolute tolerance @var{epsabs}.
The test returns @code{GSL_SUCCESS} if the size is smaller than the tolerance,
otherwise @code{GSL_CONTINUE} is returned.
 343 @end deftypefun
 344
 345 @node Multimin Algorithms
 346 @section Algorithms
 347
 348 There are several minimization methods available. The best choice of
 349 algorithm depends on the problem.  All of the algorithms use the value
 350 of the function and its gradient at each evaluation point, except for
 351 the Simplex algorithm which uses function values only.
 352
 353 @deffn {Minimizer} gsl_multimin_fdfminimizer_conjugate_fr
 354 @cindex Fletcher-Reeves conjugate gradient algorithm, minimization
 355 @cindex Conjugate gradient algorithm, minimization
 356 @cindex minimization, conjugate gradient algorithm
 357 This is the Fletcher-Reeves conjugate gradient algorithm. The conjugate
 358 gradient algorithm proceeds as a succession of line minimizations. The
 359 sequence of search directions is used to build up an approximation to the
 360 curvature of the function in the neighborhood of the minimum.
 361
 362 An initial search direction @var{p} is chosen using the gradient, and line
 363 minimization is carried out in that direction.  The accuracy of the line
 364 minimization is specified by the parameter @var{tol}.  The minimum
 365 along this line occurs when the function gradient @var{g} and the search direction
 366 @var{p} are orthogonal.  The line minimization terminates when
 367 @c{$p\cdot g < tol |p| |g|$}
 368 @math{dot(p,g) < tol |p| |g|}.  The
search direction is updated using the Fletcher-Reeves formula
@math{p' = g' - \beta p} where @math{\beta=-|g'|^2/|g|^2}, and
the line minimization is then repeated for the new search
direction.  Here @math{g'} is the gradient at the new point and
@math{g} the gradient at the previous one.
 373 @end deffn
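
The succession of line minimizations can be sketched in plain C for a
purely quadratic function.  This is an illustration under stated
assumptions, not the library's implementation: it uses the textbook
descent-direction form @math{d' = -g' + \beta d} with
@math{\beta = |g'|^2/|g|^2}, it replaces the @var{tol}-controlled line
search with the exact step available for a quadratic, and all function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gradient of f(x, y) = 10 (x - 1)^2 + 20 (y - 2)^2 + 30. */
void quad_gradient(const double x[2], double g[2])
{
    g[0] = 20.0 * (x[0] - 1.0);
    g[1] = 40.0 * (x[1] - 2.0);
}

/* Exact line minimization along d for this quadratic:
   alpha = -(g.d) / (d.H.d) with Hessian H = diag(20, 40). */
double exact_step(const double g[2], const double d[2])
{
    double dHd = 20.0 * d[0] * d[0] + 40.0 * d[1] * d[1];
    assert(dHd > 0.0);
    return -(g[0] * d[0] + g[1] * d[1]) / dHd;
}

/* Runs up to n_iter Fletcher-Reeves iterations from x (updated in place). */
void fletcher_reeves(double x[2], size_t n_iter)
{
    double g[2], g_new[2], d[2];
    size_t k;
    quad_gradient(x, g);
    d[0] = -g[0];                       /* first direction: steepest descent */
    d[1] = -g[1];
    for (k = 0; k < n_iter; k++) {
        double gg = g[0] * g[0] + g[1] * g[1];
        double alpha, beta;
        if (gg == 0.0)
            break;                      /* already at a stationary point */
        alpha = exact_step(g, d);
        x[0] += alpha * d[0];
        x[1] += alpha * d[1];
        quad_gradient(x, g_new);
        beta = (g_new[0] * g_new[0] + g_new[1] * g_new[1]) / gg;
        d[0] = -g_new[0] + beta * d[0]; /* d' = -g' + beta d */
        d[1] = -g_new[1] + beta * d[1];
        g[0] = g_new[0];
        g[1] = g_new[1];
    }
}
```

For a quadratic in two dimensions with exact line minimizations, two
iterations are enough to reach the minimum up to rounding error.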
 374
 375 @deffn {Minimizer} gsl_multimin_fdfminimizer_conjugate_pr
 376 @cindex Polak-Ribiere algorithm, minimization
 377 @cindex minimization, Polak-Ribiere algorithm
 378 This is the Polak-Ribiere conjugate gradient algorithm.  It is similar
 379 to the Fletcher-Reeves method, differing only in the choice of the
 380 coefficient @math{\beta}. Both methods work well when the evaluation
 381 point is close enough to the minimum of the objective function that it
 382 is well approximated by a quadratic hypersurface.
 383 @end deffn
 384
 385 @deffn {Minimizer} gsl_multimin_fdfminimizer_vector_bfgs2
 386 @deffnx {Minimizer} gsl_multimin_fdfminimizer_vector_bfgs
 387 @cindex BFGS algorithm, minimization
 388 @cindex minimization, BFGS algorithm
 389 These methods use the vector Broyden-Fletcher-Goldfarb-Shanno (BFGS)
 390 algorithm.  This is a quasi-Newton method which builds up an approximation
 391 to the second derivatives of the function @math{f} using the difference
 392 between successive gradient vectors.  By combining the first and second
 393 derivatives the algorithm is able to take Newton-type steps towards the
 394 function minimum, assuming quadratic behavior in that region.
 395
 396 The @code{bfgs2} version of this minimizer is the most efficient
 397 version available, and is a faithful implementation of the line
 398 minimization scheme described in Fletcher's @cite{Practical Methods of
Optimization}, Algorithms 2.6.2 and 2.6.4.  It supersedes the original
 400 @code{bfgs} routine and requires substantially fewer function and
 401 gradient evaluations.  The user-supplied tolerance @var{tol}
 402 corresponds to the parameter @math{\sigma} used by Fletcher.  A value
 403 of 0.1 is recommended for typical use (larger values correspond to
 404 less accurate line searches).
 405
 406 @end deffn
 407
 408 @deffn {Minimizer} gsl_multimin_fdfminimizer_steepest_descent
 409 @cindex steepest descent algorithm, minimization
 410 @cindex minimization, steepest descent algorithm
 411 The steepest descent algorithm follows the downhill gradient of the
 412 function at each step. When a downhill step is successful the step-size
 413 is increased by a factor of two.  If the downhill step leads to a higher
 414 function value then the algorithm backtracks and the step size is
 415 decreased using the parameter @var{tol}.  A suitable value of @var{tol}
 416 for most applications is 0.1.  The steepest descent method is
 417 inefficient and is included only for demonstration purposes.
 418 @end deffn
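
The step-size rule described above can be sketched in plain C as
follows.  This is a simplified illustration: it steps along the raw
negative gradient, makes no claim to match the library's internals, and
the names @code{sd_f}, @code{sd_gradient} and @code{sd_iterate} are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

double sd_f(const double x[2])
{
    return 10.0 * (x[0] - 1.0) * (x[0] - 1.0)
         + 20.0 * (x[1] - 2.0) * (x[1] - 2.0) + 30.0;
}

void sd_gradient(const double x[2], double g[2])
{
    g[0] = 20.0 * (x[0] - 1.0);
    g[1] = 40.0 * (x[1] - 2.0);
}

/* One iteration: try a step along -g; on success accept it and double
   the step size, otherwise keep x and shrink the step size by tol.
   Returns 1 if the trial point was accepted, 0 if it was rejected. */
int sd_iterate(double x[2], double *step, double tol)
{
    double g[2], trial[2];
    assert(step != NULL && tol > 0.0 && tol < 1.0);
    sd_gradient(x, g);
    trial[0] = x[0] - *step * g[0];
    trial[1] = x[1] - *step * g[1];
    if (sd_f(trial) < sd_f(x)) {
        x[0] = trial[0];
        x[1] = trial[1];
        *step *= 2.0;
        return 1;
    }
    *step *= tol;
    return 0;
}
```

Because the step size keeps doubling until a trial point is rejected,
the iteration repeatedly overshoots and backtracks, which is one reason
the method needs many more iterations than the conjugate gradient
algorithms on the same function.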
 419
 420 @deffn {Minimizer} gsl_multimin_fminimizer_nmsimplex
 421 @cindex Nelder-Mead simplex algorithm for minimization
 422 @cindex simplex algorithm, minimization
 423 @cindex minimization, simplex algorithm
 424 This is the Simplex algorithm of Nelder and Mead. It constructs
@math{n+1} vectors @math{p_i} from the
starting vector @var{x} and the vector @var{step_size} as follows:
@tex
\beforedisplay
$$
\eqalign{
p_0 & = (x_0, x_1, \cdots , x_{n-1}) \cr
p_1 & = (x_0 + step\_size_0, x_1, \cdots , x_{n-1}) \cr
p_2 & = (x_0, x_1 + step\_size_1, \cdots , x_{n-1}) \cr
\dots &= \dots \cr
p_n & = (x_0, x_1, \cdots , x_{n-1}+step\_size_{n-1}) \cr
}
$$
\afterdisplay
@end tex
@ifinfo

@example
p_0 = (x_0, x_1, ... , x_@{n-1@})
p_1 = (x_0 + step_size_0, x_1, ... , x_@{n-1@})
p_2 = (x_0, x_1 + step_size_1, ... , x_@{n-1@})
... = ...
p_n = (x_0, x_1, ... , x_@{n-1@}+step_size_@{n-1@})
 448 @end example
 449
 450 @end ifinfo
 451 @noindent
 452 These vectors form the @math{n+1} vertices of a simplex in @math{n}
 453 dimensions.  On each iteration the algorithm tries to improve
 454 the parameter vector @math{p_i} corresponding to the highest
 455 function value by simple geometrical transformations.  These
 456 are reflection, reflection followed by expansion, contraction and multiple
 457 contraction. Using these transformations the simplex moves through
 458 the parameter space towards the minimum, where it contracts itself.
 459
After each iteration, the best vertex is returned.  Note that, due to
the nature of the algorithm, not every step improves the current
best parameter vector.  Usually several iterations are required.
 463
 464 The routine calculates the minimizer specific characteristic size as the
 465 average distance from the geometrical center of the simplex to all its
vertices.  This size can be used as a stopping criterion, as the simplex
 467 contracts itself near the minimum. The size is returned by the function
 468 @code{gsl_multimin_fminimizer_size}.
 469 @end deffn
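
The construction of the initial simplex can be sketched in plain C with
zero-based component indices.  The fixed dimension @code{SIMPLEX_DIM}
and the function @code{build_simplex} are hypothetical and serve only
to illustrate how the @math{n+1} vertices are derived from the starting
point and the step sizes.

```c
#include <assert.h>
#include <stddef.h>

#define SIMPLEX_DIM 3   /* hypothetical problem dimension n */

/* Fills p with the n+1 vertices: p[0] = x and, for i = 1..n,
   p[i] = x displaced by step_size[i-1] along coordinate i-1. */
void build_simplex(const double x[SIMPLEX_DIM],
                   const double step_size[SIMPLEX_DIM],
                   double p[SIMPLEX_DIM + 1][SIMPLEX_DIM])
{
    size_t i, j;
    assert(x != NULL && step_size != NULL && p != NULL);
    for (i = 0; i <= SIMPLEX_DIM; i++) {
        for (j = 0; j < SIMPLEX_DIM; j++)
            p[i][j] = x[j];
        if (i > 0)
            p[i][i - 1] += step_size[i - 1];
    }
}
```

Each step size sets the initial extent of the simplex along one
coordinate, so a step size of zero in any component would give a
degenerate simplex that cannot explore that direction.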
 470
 471 @node Multimin Examples
 472 @section Examples
 473
 474 This example program finds the minimum of the paraboloid function
 475 defined earlier.  The location of the minimum is offset from the origin
 476 in @math{x} and @math{y}, and the function value at the minimum is
non-zero. The main program is given below; it requires the example
function given earlier in this chapter.
 479
 480 @smallexample
 481 @verbatiminclude examples/multimin.c
 482 @end smallexample
 483
 484 @noindent
 485 The initial step-size is chosen as 0.01, a conservative estimate in this
 486 case, and the line minimization parameter is set at 0.0001.  The program
 487 terminates when the norm of the gradient has been reduced below
 488 0.001. The output of the program is shown below,
 489
 490 @example
 491 @verbatiminclude examples/multimin.out
 492 @end example
 493
 494 @noindent
 495 Note that the algorithm gradually increases the step size as it
 496 successfully moves downhill, as can be seen by plotting the successive
 497 points.
 498
 499 @iftex
 500 @sp 1
 501 @center @image{multimin,3.4in}
 502 @end iftex
 503
 504 @noindent
 505 The conjugate gradient algorithm finds the minimum on its second
 506 direction because the function is purely quadratic. Additional
 507 iterations would be needed for a more complicated function.
 508
Here is another example using the Nelder-Mead Simplex algorithm to
minimize the same example objective function as above.
 511
 512 @smallexample
 513 @verbatiminclude examples/nmsimplex.c
 514 @end smallexample
 515
 516 @noindent
 517 The minimum search stops when the Simplex size drops to 0.01. The output is
 518 shown below.
 519
 520 @example
 521 @verbatiminclude examples/nmsimplex.out
 522 @end example
 523
 524 @noindent
 525 The simplex size first increases, while the simplex moves towards the
 526 minimum. After a while the size begins to decrease as the simplex
 527 contracts around the minimum.
 528
 529 @node Multimin References and Further Reading
 530 @section References and Further Reading
 531
 532 The conjugate gradient and BFGS methods are described in detail in the
 533 following book,
 534
 535 @itemize @asis
 536 @item R. Fletcher,
 537 @cite{Practical Methods of Optimization (Second Edition)} Wiley
 538 (1987), ISBN 0471915475.
 539 @end itemize
 540
 541 A brief description of multidimensional minimization algorithms and
 542 more recent references can be found in,
 543
 544 @itemize @asis
 545 @item C.W. Ueberhuber,
 546 @cite{Numerical Computation (Volume 2)}, Chapter 14, Section 4.4
 547 ``Minimization Methods'', p.@: 325--335, Springer (1997), ISBN
 548 3-540-62057-5.
 549 @end itemize
 550
 551 @noindent
 552 The simplex algorithm is described in the following paper,
 553
 554 @itemize @asis
 555 @item J.A. Nelder and R. Mead,
 556 @cite{A simplex method for function minimization}, Computer Journal
 557 vol.@: 7 (1965), 308--315.
 558 @end itemize
 559
 560 @noindent