htswanalysis/MACS/lib/gsl/gsl-1.11/doc/fitting.texi

   1 @cindex fitting
   2 @cindex least squares fit
   3 @cindex regression, least squares
   4 @cindex weighted linear fits
   5 @cindex unweighted linear fits
   6 This chapter describes routines for performing least squares fits to
   7 experimental data using linear combinations of functions.  The data
   8 may be weighted or unweighted, i.e. with known or unknown errors.  For
   9 weighted data the functions compute the best fit parameters and their
  10 associated covariance matrix.  For unweighted data the covariance
  11 matrix is estimated from the scatter of the points, giving a
  12 variance-covariance matrix.
  13
  14 The functions are divided into separate versions for simple one- or
  15 two-parameter regression and multiple-parameter fits.  The functions
  16 are declared in the header file @file{gsl_fit.h}.
  17
  18 @menu
  19 * Fitting Overview::
  20 * Linear regression::
  21 * Linear fitting without a constant term::
  22 * Multi-parameter fitting::
  23 * Fitting Examples::
  24 * Fitting References and Further Reading::
  25 @end menu
  26
  27 @node Fitting Overview
  28 @section Overview
  29
  30 Least-squares fits are found by minimizing @math{\chi^2}
  31 (chi-squared), the weighted sum of squared residuals over @math{n}
  32 experimental datapoints @math{(x_i, y_i)} for the model @math{Y(c,x)},
  33 @tex
  34 \beforedisplay
  35 $$
  36 \chi^2 = \sum_i w_i (y_i - Y(c, x_i))^2
  37 $$
  38 \afterdisplay
  39 @end tex
  40 @ifinfo
  41
  42 @example
  43 \chi^2 = \sum_i w_i (y_i - Y(c, x_i))^2
  44 @end example
  45
  46 @end ifinfo
  47 @noindent
  48 The @math{p} parameters of the model are @c{$c = \{c_0, c_1, \dots\}$}
  49 @math{c = @{c_0, c_1, @dots{}@}}.  The
  50 weight factors @math{w_i} are given by @math{w_i = 1/\sigma_i^2},
  51 where @math{\sigma_i} is the experimental error on the data-point
  52 @math{y_i}.  The errors are assumed to be
  53 gaussian and uncorrelated.
  54 For unweighted data the chi-squared sum is computed without any weight factors.
  55
  56 The fitting routines return the best-fit parameters @math{c} and their
  57 @math{p \times p} covariance matrix.  The covariance matrix measures the
  58 statistical errors on the best-fit parameters resulting from the
  59 errors on the data, @math{\sigma_i}, and is defined
  60 @cindex covariance matrix, linear fits
  61 as @c{$C_{ab} = \langle \delta c_a \delta c_b \rangle$}
  62 @math{C_@{ab@} = <\delta c_a \delta c_b>} where @c{$\langle \, \rangle$}
  63 @math{< >} denotes an average over the gaussian error distributions of the underlying datapoints.
  64
  65 The covariance matrix is calculated by error propagation from the data
  66 errors @math{\sigma_i}.  The change in a fitted parameter @math{\delta
  67 c_a} caused by a small change in the data @math{\delta y_i} is given
  68 by
  69 @tex
  70 \beforedisplay
  71 $$
  72 \delta c_a = \sum_i {\partial c_a \over \partial y_i} \delta y_i
  73 $$
  74 \afterdisplay
  75 @end tex
  76 @ifinfo
  77
  78 @example
  79 \delta c_a = \sum_i (dc_a/dy_i) \delta y_i
  80 @end example
  81
  82 @end ifinfo
  83 @noindent
  84 allowing the covariance matrix to be written in terms of the errors on the data,
  85 @tex
  86 \beforedisplay
  87 $$
  88 C_{ab} =  \sum_{i,j} {\partial c_a \over \partial y_i}
  89                      {\partial c_b \over \partial y_j}
  90                      \langle \delta y_i \delta y_j \rangle
  91 $$
  92 \afterdisplay
  93 @end tex
  94 @ifinfo
  95
  96 @example
  97 C_@{ab@} = \sum_@{i,j@} (dc_a/dy_i) (dc_b/dy_j) <\delta y_i \delta y_j>
  98 @end example
  99
 100 @end ifinfo
 101 @noindent
 102 For uncorrelated data the fluctuations of the underlying datapoints satisfy
 103 @c{$\langle \delta y_i \delta y_j \rangle = \sigma_i^2 \delta_{ij}$}
 104 @math{<\delta y_i \delta y_j> = \sigma_i^2 \delta_@{ij@}}, giving a
 105 corresponding parameter covariance matrix of
 106 @tex
 107 \beforedisplay
 108 $$
 109 C_{ab} = \sum_{i} {1 \over w_i} {\partial c_a \over \partial y_i} {\partial c_b \over \partial y_i}
 110 $$
 111 \afterdisplay
 112 @end tex
 113 @ifinfo
 114
 115 @example
 116 C_@{ab@} = \sum_i (1/w_i) (dc_a/dy_i) (dc_b/dy_i)
 117 @end example
 118
 119 @end ifinfo
 120 @noindent
 121 When computing the covariance matrix for unweighted data, i.e. data with unknown errors,
 122 the weight factors @math{w_i} in this sum are replaced by the single estimate @math{w =
 123 1/\sigma^2}, where @math{\sigma^2} is the computed variance of the
 124 residuals about the best-fit model, @math{\sigma^2 = \sum (y_i - Y(c,x_i))^2 / (n-p)}.
 125 This is referred to as the @dfn{variance-covariance matrix}.
 126 @cindex variance-covariance matrix, linear fits
 127
 128 The standard deviations of the best-fit parameters are given by the
 129 square root of the corresponding diagonal elements of
 130 the covariance matrix, @c{$\sigma_{c_a} = \sqrt{C_{aa}}$}
 131 @math{\sigma_@{c_a@} = \sqrt@{C_@{aa@}@}}.
 132 The correlation coefficient of the fit parameters @math{c_a} and @math{c_b}
 133 is given by @c{$\rho_{ab} = C_{ab} / \sqrt{C_{aa} C_{bb}}$}
 134 @math{\rho_@{ab@} = C_@{ab@} / \sqrt@{C_@{aa@} C_@{bb@}@}}.
 135
 136 @node   Linear regression
 137 @section Linear regression
 138 @cindex linear regression
 139
 140 The functions described in this section can be used to perform
 141 least-squares fits to a straight line model, @math{Y(c,x) = c_0 + c_1 x}.
 142
 143 @cindex covariance matrix, from linear regression
 144 @deftypefun int gsl_fit_linear (const double * @var{x}, const size_t @var{xstride}, const double * @var{y}, const size_t @var{ystride}, size_t @var{n}, double * @var{c0}, double * @var{c1}, double * @var{cov00}, double * @var{cov01}, double * @var{cov11}, double * @var{sumsq})
 145 This function computes the best-fit linear regression coefficients
 146 (@var{c0},@var{c1}) of the model @math{Y = c_0 + c_1 X} for the dataset
 147 (@var{x}, @var{y}), two vectors of length @var{n} with strides
 148 @var{xstride} and @var{ystride}.  The errors on @var{y} are assumed unknown so
 149 the variance-covariance matrix for the
 150 parameters (@var{c0}, @var{c1}) is estimated from the scatter of the
 151 points around the best-fit line and returned via the parameters
 152 (@var{cov00}, @var{cov01}, @var{cov11}).
 153 The sum of squares of the residuals from the best-fit line is returned
 154 in @var{sumsq}.  Note: the correlation coefficient of the data can be computed using @code{gsl_stats_correlation} (@pxref{Correlation}), it does not depend on the fit.
 155 @end deftypefun
 156
 157 @deftypefun int gsl_fit_wlinear (const double * @var{x}, const size_t @var{xstride}, const double * @var{w}, const size_t @var{wstride}, const double * @var{y}, const size_t @var{ystride}, size_t @var{n}, double * @var{c0}, double * @var{c1}, double * @var{cov00}, double * @var{cov01}, double * @var{cov11}, double * @var{chisq})
 158 This function computes the best-fit linear regression coefficients
 159 (@var{c0},@var{c1}) of the model @math{Y = c_0 + c_1 X} for the weighted
 160 dataset (@var{x}, @var{y}), two vectors of length @var{n} with strides
 161 @var{xstride} and @var{ystride}.  The vector @var{w}, of length @var{n}
 162 and stride @var{wstride}, specifies the weight of each datapoint. The
 163 weight is the reciprocal of the variance for each datapoint in @var{y}.
 164
 165 The covariance matrix for the parameters (@var{c0}, @var{c1}) is
 166 computed using the weights and returned via the parameters
 167 (@var{cov00}, @var{cov01}, @var{cov11}).  The weighted sum of squares
 168 of the residuals from the best-fit line, @math{\chi^2}, is returned in
 169 @var{chisq}.
 170 @end deftypefun
 171
 172 @deftypefun int gsl_fit_linear_est (double @var{x}, double @var{c0}, double @var{c1}, double @var{cov00}, double @var{cov01}, double @var{cov11}, double * @var{y}, double * @var{y_err})
 173 This function uses the best-fit linear regression coefficients
 174 @var{c0}, @var{c1} and their covariance
 175 @var{cov00}, @var{cov01}, @var{cov11} to compute the fitted function
 176 @var{y} and its standard deviation @var{y_err} for the model @math{Y =
 177 c_0 + c_1 X} at the point @var{x}.
 178 @end deftypefun
 179
 180 @node Linear fitting without a constant term
 181 @section Linear fitting without a constant term
 182
 183 The functions described in this section can be used to perform
 184 least-squares fits to a straight line model without a constant term,
 185 @math{Y = c_1 X}.
 186
 187 @deftypefun int gsl_fit_mul (const double * @var{x}, const size_t @var{xstride}, const double * @var{y}, const size_t @var{ystride}, size_t @var{n}, double * @var{c1}, double * @var{cov11}, double * @var{sumsq})
 188 This function computes the best-fit linear regression coefficient
 189 @var{c1} of the model @math{Y = c_1 X} for the datasets (@var{x},
 190 @var{y}), two vectors of length @var{n} with strides @var{xstride} and
 191 @var{ystride}.  The errors on @var{y} are assumed unknown so the
 192 variance of the parameter @var{c1} is estimated from
 193 the scatter of the points around the best-fit line and returned via the
 194 parameter @var{cov11}.  The sum of squares of the residuals from the
 195 best-fit line is returned in @var{sumsq}.
 196 @end deftypefun
 197
 198 @deftypefun int gsl_fit_wmul (const double * @var{x}, const size_t @var{xstride}, const double * @var{w}, const size_t @var{wstride}, const double * @var{y}, const size_t @var{ystride}, size_t @var{n}, double * @var{c1}, double * @var{cov11}, double * @var{sumsq})
 199 This function computes the best-fit linear regression coefficient
 200 @var{c1} of the model @math{Y = c_1 X} for the weighted datasets
 201 (@var{x}, @var{y}), two vectors of length @var{n} with strides
 202 @var{xstride} and @var{ystride}.  The vector @var{w}, of length @var{n}
 203 and stride @var{wstride}, specifies the weight of each datapoint. The
 204 weight is the reciprocal of the variance for each datapoint in @var{y}.
 205
 206 The variance of the parameter @var{c1} is computed using the weights
 207 and returned via the parameter @var{cov11}.  The weighted sum of
 208 squares of the residuals from the best-fit line, @math{\chi^2}, is
 209 returned in @var{chisq}.
 210 @end deftypefun
 211
 212 @deftypefun int gsl_fit_mul_est (double @var{x}, double @var{c1}, double @var{cov11}, double * @var{y}, double * @var{y_err})
 213 This function uses the best-fit linear regression coefficient @var{c1}
 214 and its covariance @var{cov11} to compute the fitted function
 215 @var{y} and its standard deviation @var{y_err} for the model @math{Y =
 216 c_1 X} at the point @var{x}.
 217 @end deftypefun
 218
 219 @node Multi-parameter fitting
 220 @section Multi-parameter fitting
 221 @cindex multi-parameter regression
 222 @cindex fits, multi-parameter linear
 223 The functions described in this section perform least-squares fits to a
 224 general linear model, @math{y = X c} where @math{y} is a vector of
 225 @math{n} observations, @math{X} is an @math{n} by @math{p} matrix of
 226 predictor variables, and the elements of the vector @math{c} are the @math{p} unknown best-fit parameters which are to be estimated.  The chi-squared value is given by @c{$\chi^2 = \sum_i w_i (y_i - \sum_j X_{ij} c_j)^2$}
 227 @math{\chi^2 = \sum_i w_i (y_i - \sum_j X_@{ij@} c_j)^2}.
 228
 229 This formulation can be used for fits to any number of functions and/or
 230 variables by preparing the @math{n}-by-@math{p} matrix @math{X}
 231 appropriately.  For example, to fit to a @math{p}-th order polynomial in
 232 @var{x}, use the following matrix,
 233 @tex
 234 \beforedisplay
 235 $$
 236 X_{ij} = x_i^j
 237 $$
 238 \afterdisplay
 239 @end tex
 240 @ifinfo
 241
 242 @example
 243 X_@{ij@} = x_i^j
 244 @end example
 245
 246 @end ifinfo
 247 @noindent
 248 where the index @math{i} runs over the observations and the index
 249 @math{j} runs from 0 to @math{p-1}.
 250
 251 To fit to a set of @math{p} sinusoidal functions with fixed frequencies
 252 @math{\omega_1}, @math{\omega_2}, @dots{}, @math{\omega_p}, use,
 253 @tex
 254 \beforedisplay
 255 $$
 256 X_{ij} = \sin(\omega_j x_i)
 257 $$
 258 \afterdisplay
 259 @end tex
 260 @ifinfo
 261
 262 @example
 263 X_@{ij@} = sin(\omega_j x_i)
 264 @end example
 265
 266 @end ifinfo
 267 @noindent
 268 To fit to @math{p} independent variables @math{x_1}, @math{x_2}, @dots{},
 269 @math{x_p}, use,
 270 @tex
 271 \beforedisplay
 272 $$
 273 X_{ij} = x_j(i)
 274 $$
 275 \afterdisplay
 276 @end tex
 277 @ifinfo
 278
 279 @example
 280 X_@{ij@} = x_j(i)
 281 @end example
 282
 283 @end ifinfo
 284 @noindent
 285 where @math{x_j(i)} is the @math{i}-th value of the predictor variable
 286 @math{x_j}.
 287
 288 The functions described in this section are declared in the header file
 289 @file{gsl_multifit.h}.
 290
 291 The solution of the general linear least-squares system requires an
 292 additional working space for intermediate results, such as the singular
 293 value decomposition of the matrix @math{X}.
 294
 295 @deftypefun {gsl_multifit_linear_workspace *} gsl_multifit_linear_alloc (size_t @var{n}, size_t @var{p})
 296 This function allocates a workspace for fitting a model to @var{n}
 297 observations using @var{p} parameters.
 298 @end deftypefun
 299
 300 @deftypefun void gsl_multifit_linear_free (gsl_multifit_linear_workspace * @var{work})
 301 This function frees the memory associated with the workspace @var{w}.
 302 @end deftypefun
 303
 304 @deftypefun int gsl_multifit_linear (const gsl_matrix * @var{X}, const gsl_vector * @var{y}, gsl_vector * @var{c}, gsl_matrix * @var{cov}, double * @var{chisq}, gsl_multifit_linear_workspace * @var{work})
 305 @deftypefunx int gsl_multifit_linear_svd (const gsl_matrix * @var{X}, const gsl_vector * @var{y}, double @var{tol}, size_t * @var{rank}, gsl_vector * @var{c}, gsl_matrix * @var{cov}, double * @var{chisq}, gsl_multifit_linear_workspace * @var{work})
 306 These functions compute the best-fit parameters @var{c} of the model
 307 @math{y = X c} for the observations @var{y} and the matrix of predictor
 308 variables @var{X}.  The variance-covariance matrix of the model
 309 parameters @var{cov} is estimated from the scatter of the observations
 310 about the best-fit.  The sum of squares of the residuals from the
 311 best-fit, @math{\chi^2}, is returned in @var{chisq}. If the coefficient
 312 of determination is desired, it can be computed from the expression
 313 @math{R^2 = 1 - \chi^2 / TSS}, where the total sum of squares (TSS) of
 314 the observations @var{y} may be computed from @code{gsl_stats_tss}.
 315
 316 The best-fit is found by singular value decomposition of the matrix
 317 @var{X} using the preallocated workspace provided in @var{work}. The
 318 modified Golub-Reinsch SVD algorithm is used, with column scaling to
 319 improve the accuracy of the singular values. Any components which have
 320 zero singular value (to machine precision) are discarded from the fit.
 321 In the second form of the function the components are discarded if the
 322 ratio of singular values @math{s_i/s_0} falls below the user-specified
 323 tolerance @var{tol}, and the effective rank is returned in @var{rank}.
 324 @end deftypefun
 325
 326 @deftypefun int gsl_multifit_wlinear (const gsl_matrix * @var{X}, const gsl_vector * @var{w}, const gsl_vector * @var{y}, gsl_vector * @var{c}, gsl_matrix * @var{cov}, double * @var{chisq}, gsl_multifit_linear_workspace * @var{work})
 327 @deftypefunx int gsl_multifit_wlinear_svd (const gsl_matrix * @var{X}, const gsl_vector * @var{w}, const gsl_vector * @var{y}, double @var{tol}, size_t * @var{rank}, gsl_vector * @var{c}, gsl_matrix * @var{cov}, double * @var{chisq}, gsl_multifit_linear_workspace * @var{work})
 328
 329 This function computes the best-fit parameters @var{c} of the weighted
 330 model @math{y = X c} for the observations @var{y} with weights @var{w}
 331 and the matrix of predictor variables @var{X}.  The covariance matrix of
 332 the model parameters @var{cov} is computed with the given weights.  The
 333 weighted sum of squares of the residuals from the best-fit,
 334 @math{\chi^2}, is returned in @var{chisq}. If the coefficient
 335 of determination is desired, it can be computed from the expression
 336 @math{R^2 = 1 - \chi^2 / WTSS}, where the weighted total sum of squares
 337 (WTSS) of the observations @var{y} may be computed from
 338 @code{gsl_stats_wtss}.
 339
 340 The best-fit is found by singular value decomposition of the matrix
 341 @var{X} using the preallocated workspace provided in @var{work}. Any
 342 components which have zero singular value (to machine precision) are
 343 discarded from the fit.  In the second form of the function the
 344 components are discarded if the ratio of singular values @math{s_i/s_0}
 345 falls below the user-specified tolerance @var{tol}, and the effective
 346 rank is returned in @var{rank}.
 347 @end deftypefun
 348
 349 @deftypefun int gsl_multifit_linear_est (const gsl_vector * @var{x}, const gsl_vector * @var{c}, const gsl_matrix * @var{cov}, double * @var{y}, double * @var{y_err})
 350 This function uses the best-fit multilinear regression coefficients
 351 @var{c} and their covariance matrix
 352 @var{cov} to compute the fitted function value
 353 @var{y} and its standard deviation @var{y_err} for the model @math{y = x.c}
 354 at the point @var{x}.
 355 @end deftypefun
 356
 357 @deftypefun int gsl_multifit_linear_residuals (const gsl_matrix * @var{X}, const gsl_vector * @var{y}, const gsl_vector * @var{c}, gsl_vector * @var{r})
 358 This function computes the vector of residuals @math{r = y - X c} for
 359 the observations @var{y}, coefficients @var{c} and matrix of predictor
 360 variables @var{X}.
 361 @end deftypefun
 362
 363 @node Fitting Examples
 364 @section Examples
 365
 366 The following program computes a least squares straight-line fit to a
 367 simple dataset, and outputs the best-fit line and its
 368 associated one standard-deviation error bars.
 369
 370 @example
 371 @verbatiminclude examples/fitting.c
 372 @end example
 373
 374 @noindent
 375 The following commands extract the data from the output of the program
 376 and display it using the @sc{gnu} plotutils @code{graph} utility,
 377
 378 @example
 379 $ ./demo > tmp
 380 $ more tmp
 381 # best fit: Y = -106.6 + 0.06 X
 382 # covariance matrix:
 383 # [ 39602, -19.9
 384 #   -19.9, 0.01]
 385 # chisq = 0.8
 386
 387 $ for n in data fit hi lo ;
 388    do
 389      grep "^$n" tmp | cut -d: -f2 > $n ;
 390    done
 391 $ graph -T X -X x -Y y -y 0 20 -m 0 -S 2 -Ie data
 392      -S 0 -I a -m 1 fit -m 2 hi -m 2 lo
 393 @end example
 394
 395 @iftex
 396 @sp 1
 397 @center @image{fit-wlinear,3.0in}
 398 @end iftex
 399
 400 The next program performs a quadratic fit @math{y = c_0 + c_1 x + c_2
 401 x^2} to a weighted dataset using the generalised linear fitting function
 402 @code{gsl_multifit_wlinear}.  The model matrix @math{X} for a quadratic
 403 fit is given by,
 404 @tex
 405 \beforedisplay
 406 $$
 407 X=\pmatrix{1&x_0&x_0^2\cr
 408 1&x_1&x_1^2\cr
 409 1&x_2&x_2^2\cr
 410 \dots&\dots&\dots\cr}
 411 $$
 412 \afterdisplay
 413 @end tex
 414 @ifinfo
 415
 416 @example
 417 X = [ 1   , x_0  , x_0^2 ;
 418       1   , x_1  , x_1^2 ;
 419       1   , x_2  , x_2^2 ;
 420       ... , ...  , ...   ]
 421 @end example
 422
 423 @end ifinfo
 424 @noindent
 425 where the column of ones corresponds to the constant term @math{c_0}.
 426 The two remaining columns corresponds to the terms @math{c_1 x} and
 427 @math{c_2 x^2}.
 428
 429 The program reads @var{n} lines of data in the format (@var{x}, @var{y},
 430 @var{err}) where @var{err} is the error (standard deviation) in the
 431 value @var{y}.
 432
 433 @example
 434 @verbatiminclude examples/fitting2.c
 435 @end example
 436
 437 @noindent
 438 A suitable set of data for fitting can be generated using the following
 439 program.  It outputs a set of points with gaussian errors from the curve
 440 @math{y = e^x} in the region @math{0 < x < 2}.
 441
 442 @example
 443 @verbatiminclude examples/fitting3.c
 444 @end example
 445
 446 @noindent
 447 The data can be prepared by running the resulting executable program,
 448
 449 @example
 450 $ ./generate > exp.dat
 451 $ more exp.dat
 452 0.1 0.97935 0.110517
 453 0.2 1.3359 0.12214
 454 0.3 1.52573 0.134986
 455 0.4 1.60318 0.149182
 456 0.5 1.81731 0.164872
 457 0.6 1.92475 0.182212
 458 ....
 459 @end example
 460
 461 @noindent
 462 To fit the data use the previous program, with the number of data points
 463 given as the first argument.  In this case there are 19 data points.
 464
 465 @example
 466 $ ./fit 19 < exp.dat
 467 0.1 0.97935 +/- 0.110517
 468 0.2 1.3359 +/- 0.12214
 469 ...
 470 # best fit: Y = 1.02318 + 0.956201 X + 0.876796 X^2
 471 # covariance matrix:
 472 [ +1.25612e-02, -3.64387e-02, +1.94389e-02
 473   -3.64387e-02, +1.42339e-01, -8.48761e-02
 474   +1.94389e-02, -8.48761e-02, +5.60243e-02 ]
 475 # chisq = 23.0987
 476 @end example
 477
 478 @noindent
 479 The parameters of the quadratic fit match the coefficients of the
 480 expansion of @math{e^x}, taking into account the errors on the
 481 parameters and the @math{O(x^3)} difference between the exponential and
 482 quadratic functions for the larger values of @math{x}.  The errors on
 483 the parameters are given by the square-root of the corresponding
 484 diagonal elements of the covariance matrix.  The chi-squared per degree
 485 of freedom is 1.4, indicating a reasonable fit to the data.
 486
 487 @iftex
 488 @sp 1
 489 @center @image{fit-wlinear2,3.0in}
 490 @end iftex
 491
 492 @node Fitting References and Further Reading
 493 @section References and Further Reading
 494
 495 A summary of formulas and techniques for least squares fitting can be
 496 found in the ``Statistics'' chapter of the Annual Review of Particle
 497 Physics prepared by the Particle Data Group,
 498
 499 @itemize @asis
 500 @item
 501 @cite{Review of Particle Properties},
 502 R.M. Barnett et al., Physical Review D54, 1 (1996)
 503 @uref{http://pdg.lbl.gov/}
 504 @end itemize
 505
 506 @noindent
 507 The Review of Particle Physics is available online at the website given
 508 above.
 509
 510 @cindex NIST Statistical Reference Datasets
 511 @cindex Statistical Reference Datasets (StRD)
 512 The tests used to prepare these routines are based on the NIST
 513 Statistical Reference Datasets. The datasets and their documentation are
 514 available from NIST at the following website,
 515
 516 @center @uref{http://www.nist.gov/itl/div898/strd/index.html}.
 517
 518