htswanalysis/MACS/lib/gsl/gsl-1.11/doc/sort.texi

   1 @cindex sorting
   2 @cindex heapsort
   3 This chapter describes functions for sorting data, both directly and
   4 indirectly (using an index).  All the functions use the @dfn{heapsort}
   5 algorithm.  Heapsort is an @math{O(N \log N)} algorithm which operates
   6 in-place and does not require any additional storage.  It also provides
   7 consistent performance, the running time for its worst-case (ordered
   8 data) being not significantly longer than the average and best cases.
   9 Note that the heapsort algorithm does not preserve the relative ordering
  10 of equal elements---it is an @dfn{unstable} sort.  However the resulting
  11 order of equal elements will be consistent across different platforms
  12 when using these functions.
  13
  14 @menu
  15 * Sorting objects::
  16 * Sorting vectors::
  17 * Selecting the k smallest or largest elements::
  18 * Computing the rank::
  19 * Sorting Examples::
  20 * Sorting References and Further Reading::
  21 @end menu
  22
  23 @node Sorting objects
  24 @section Sorting objects
  25
  26 The following function provides a simple alternative to the standard
  27 library function @code{qsort}.  It is intended for systems lacking
  28 @code{qsort}, not as a replacement for it.  The function @code{qsort}
  29 should be used whenever possible, as it will be faster and can provide
  30 stable ordering of equal elements.  Documentation for @code{qsort} is
  31 available in the @cite{GNU C Library Reference Manual}.
  32
  33 The functions described in this section are defined in the header file
  34 @file{gsl_heapsort.h}.
  35
  36 @cindex comparison functions, definition
  37 @deftypefun void gsl_heapsort (void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare})
  38
  39 This function sorts the @var{count} elements of the array @var{array},
  40 each of size @var{size}, into ascending order using the comparison
  41 function @var{compare}.  The type of the comparison function is defined by,
  42
  43 @example
  44 int (*gsl_comparison_fn_t) (const void * a,
  45                             const void * b)
  46 @end example
  47
  48 @noindent
  49 A comparison function should return a negative integer if the first
  50 argument is less than the second argument, @code{0} if the two arguments
  51 are equal and a positive integer if the first argument is greater than
  52 the second argument.
  53
  54 For example, the following function can be used to sort doubles into
  55 ascending numerical order.
  56
  57 @example
  58 int
  59 compare_doubles (const double * a,
  60                  const double * b)
  61 @{
  62     if (*a > *b)
  63        return 1;
  64     else if (*a < *b)
  65        return -1;
  66     else
  67        return 0;
  68 @}
  69 @end example
  70
  71 @noindent
  72 The appropriate function call to perform the sort is,
  73
  74 @example
  75 gsl_heapsort (array, count, sizeof(double),
  76               compare_doubles);
  77 @end example
  78
  79 Note that unlike @code{qsort} the heapsort algorithm cannot be made into
  80 a stable sort by pointer arithmetic.  The trick of comparing pointers for
  81 equal elements in the comparison function does not work for the heapsort
  82 algorithm.  The heapsort algorithm performs an internal rearrangement of
  83 the data which destroys its initial ordering.
  84 @end deftypefun
  85
  86 @cindex indirect sorting
  87 @deftypefun int gsl_heapsort_index (size_t * @var{p}, const void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare})
  88
  89 This function indirectly sorts the @var{count} elements of the array
  90 @var{array}, each of size @var{size}, into ascending order using the
  91 comparison function @var{compare}.  The resulting permutation is stored
  92 in @var{p}, an array of length @var{n}.  The elements of @var{p} give the
  93 index of the array element which would have been stored in that position
  94 if the array had been sorted in place.  The first element of @var{p}
  95 gives the index of the least element in @var{array}, and the last
  96 element of @var{p} gives the index of the greatest element in
  97 @var{array}.  The array itself is not changed.
  98 @end deftypefun
  99
 100 @node Sorting vectors
 101 @section Sorting vectors
 102
 103 The following functions will sort the elements of an array or vector,
 104 either directly or indirectly.  They are defined for all real and integer
 105 types using the normal suffix rules.  For example, the @code{float}
 106 versions of the array functions are @code{gsl_sort_float} and
 107 @code{gsl_sort_float_index}.  The corresponding vector functions are
 108 @code{gsl_sort_vector_float} and @code{gsl_sort_vector_float_index}.  The
 109 prototypes are available in the header files @file{gsl_sort_float.h}
 110 @file{gsl_sort_vector_float.h}.  The complete set of prototypes can be
 111 included using the header files @file{gsl_sort.h} and
 112 @file{gsl_sort_vector.h}.
 113
 114 There are no functions for sorting complex arrays or vectors, since the
 115 ordering of complex numbers is not uniquely defined.  To sort a complex
 116 vector by magnitude compute a real vector containing the magnitudes
 117 of the complex elements, and sort this vector indirectly.  The resulting
 118 index gives the appropriate ordering of the original complex vector.
 119
 120 @cindex sorting vector elements
 121 @cindex vector, sorting elements of
 122 @deftypefun void gsl_sort (double * @var{data}, size_t @var{stride}, size_t @var{n})
 123 This function sorts the @var{n} elements of the array @var{data} with
 124 stride @var{stride} into ascending numerical order.
 125 @end deftypefun
 126
 127 @deftypefun void gsl_sort_vector (gsl_vector * @var{v})
 128 This function sorts the elements of the vector @var{v} into ascending
 129 numerical order.
 130 @end deftypefun
 131
 132 @cindex indirect sorting, of vector elements
 133 @deftypefun void gsl_sort_index (size_t * @var{p}, const double * @var{data}, size_t @var{stride}, size_t @var{n})
 134 This function indirectly sorts the @var{n} elements of the array
 135 @var{data} with stride @var{stride} into ascending order, storing the
 136 resulting permutation in @var{p}.  The array @var{p} must be allocated with
 137 a sufficient length to store the @var{n} elements of the permutation.
 138 The elements of @var{p} give the index of the array element which would
 139 have been stored in that position if the array had been sorted in place.
 140 The array @var{data} is not changed.
 141 @end deftypefun
 142
 143 @deftypefun int gsl_sort_vector_index (gsl_permutation * @var{p}, const gsl_vector * @var{v})
 144 This function indirectly sorts the elements of the vector @var{v} into
 145 ascending order, storing the resulting permutation in @var{p}.  The
 146 elements of @var{p} give the index of the vector element which would
 147 have been stored in that position if the vector had been sorted in
 148 place.  The first element of @var{p} gives the index of the least element
 149 in @var{v}, and the last element of @var{p} gives the index of the
 150 greatest element in @var{v}.  The vector @var{v} is not changed.
 151 @end deftypefun
 152
 153 @node Selecting the k smallest or largest elements
 154 @section Selecting the k smallest or largest elements
 155
 156 The functions described in this section select the @math{k} smallest
 157 or largest elements of a data set of size @math{N}.  The routines use an
 158 @math{O(kN)} direct insertion algorithm which is suited to subsets that
 159 are small compared with the total size of the dataset. For example, the
 160 routines are useful for selecting the 10 largest values from one million
 161 data points, but not for selecting the largest 100,000 values.  If the
 162 subset is a significant part of the total dataset it may be faster
 163 to sort all the elements of the dataset directly with an @math{O(N \log
 164 N)} algorithm and obtain the smallest or largest values that way.
 165
 166 @deftypefun int gsl_sort_smallest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
 167 This function copies the @var{k} smallest elements of the array
 168 @var{src}, of size @var{n} and stride @var{stride}, in ascending
 169 numerical order into the array @var{dest}.  The size @var{k} of the subset must be
 170 less than or equal to @var{n}.  The data @var{src} is not modified by
 171 this operation.
 172 @end deftypefun
 173
 174 @deftypefun int gsl_sort_largest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
 175 This function copies the @var{k} largest elements of the array
 176 @var{src}, of size @var{n} and stride @var{stride}, in descending
 177 numerical order into the array @var{dest}. @var{k} must be
 178 less than or equal to @var{n}. The data @var{src} is not modified by
 179 this operation.
 180 @end deftypefun
 181
 182 @deftypefun int gsl_sort_vector_smallest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v})
 183 @deftypefunx int gsl_sort_vector_largest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v})
 184 These functions copy the @var{k} smallest or largest elements of the
 185 vector @var{v} into the array @var{dest}. @var{k}
 186 must be less than or equal to the length of the vector @var{v}.
 187 @end deftypefun
 188
 189 The following functions find the indices of the @math{k} smallest or
 190 largest elements of a dataset,
 191
 192 @deftypefun int gsl_sort_smallest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
 193 This function stores the indices of the @var{k} smallest elements of
 194 the array @var{src}, of size @var{n} and stride @var{stride}, in the
 195 array @var{p}.  The indices are chosen so that the corresponding data is
 196 in ascending numerical order.  @var{k} must be
 197 less than or equal to @var{n}. The data @var{src} is not modified by
 198 this operation.
 199 @end deftypefun
 200
 201 @deftypefun int gsl_sort_largest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
 202 This function stores the indices of the @var{k} largest elements of
 203 the array @var{src}, of size @var{n} and stride @var{stride}, in the
 204 array @var{p}.  The indices are chosen so that the corresponding data is
 205 in descending numerical order.  @var{k} must be
 206 less than or equal to @var{n}. The data @var{src} is not modified by
 207 this operation.
 208 @end deftypefun
 209
 210 @deftypefun int gsl_sort_vector_smallest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v})
 211 @deftypefunx int gsl_sort_vector_largest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v})
 212 These functions store the indices of the @var{k} smallest or largest
 213 elements of the vector @var{v} in the array @var{p}. @var{k} must be less than or equal to the length of the vector
 214 @var{v}.
 215 @end deftypefun
 216
 217
 218 @node Computing the rank
 219 @section Computing the rank
 220
 221 The @dfn{rank} of an element is its order in the sorted data.  The rank
 222 is the inverse of the index permutation, @var{p}.  It can be computed
 223 using the following algorithm,
 224
 225 @example
 226 for (i = 0; i < p->size; i++)
 227 @{
 228     size_t pi = p->data[i];
 229     rank->data[pi] = i;
 230 @}
 231 @end example
 232
 233 @noindent
 234 This can be computed directly from the function
 235 @code{gsl_permutation_inverse(rank,p)}.
 236
 237 The following function will print the rank of each element of the vector
 238 @var{v},
 239
 240 @example
 241 void
 242 print_rank (gsl_vector * v)
 243 @{
 244   size_t i;
 245   size_t n = v->size;
 246   gsl_permutation * perm = gsl_permutation_alloc(n);
 247   gsl_permutation * rank = gsl_permutation_alloc(n);
 248
 249   gsl_sort_vector_index (perm, v);
 250   gsl_permutation_inverse (rank, perm);
 251
 252   for (i = 0; i < n; i++)
 253    @{
 254     double vi = gsl_vector_get(v, i);
 255     printf ("element = %d, value = %g, rank = %d\n",
 256              i, vi, rank->data[i]);
 257    @}
 258
 259   gsl_permutation_free (perm);
 260   gsl_permutation_free (rank);
 261 @}
 262 @end example
 263
 264 @node Sorting Examples
 265 @section Examples
 266
 267 The following example shows how to use the permutation @var{p} to print
 268 the elements of the vector @var{v} in ascending order,
 269
 270 @example
 271 gsl_sort_vector_index (p, v);
 272
 273 for (i = 0; i < v->size; i++)
 274 @{
 275     double vpi = gsl_vector_get (v, p->data[i]);
 276     printf ("order = %d, value = %g\n", i, vpi);
 277 @}
 278 @end example
 279
 280 @noindent
 281 The next example uses the function @code{gsl_sort_smallest} to select
 282 the 5 smallest numbers from 100000 uniform random variates stored in an
 283 array,
 284
 285 @example
 286 @verbatiminclude examples/sortsmall.c
 287 @end example
 288 The output lists the 5 smallest values, in ascending order,
 289
 290 @example
 291 $ ./a.out
 292 @verbatiminclude examples/sortsmall.out
 293 @end example
 294
 295 @node Sorting References and Further Reading
 296 @section References and Further Reading
 297
 298 The subject of sorting is covered extensively in Knuth's
 299 @cite{Sorting and Searching},
 300
 301 @itemize @asis
 302 @item
 303 Donald E. Knuth, @cite{The Art of Computer Programming: Sorting and
 304 Searching} (Vol 3, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896850.
 305 @end itemize
 306
 307 @noindent
 308 The Heapsort algorithm is described in the following book,
 309
 310 @itemize @asis
 311 @item Robert Sedgewick, @cite{Algorithms in C}, Addison-Wesley,
 312 ISBN 0201514257.
 313 @end itemize
 314
 315