This chapter describes functions for sorting data, both directly and
indirectly (using an index). All the functions use the @dfn{heapsort}
algorithm. Heapsort is an @math{O(N \log N)} algorithm which operates
in-place and does not require any additional storage. Its performance
is also consistent: the running time for the worst case (ordered data)
is not significantly longer than for the average and best cases.
Note that the heapsort algorithm does not preserve the relative ordering
of equal elements; it is an @dfn{unstable} sort. However, the resulting
order of equal elements will be consistent across different platforms
when using these functions.

@menu
* Sorting objects::
* Sorting vectors::
* Selecting the k smallest or largest elements::
* Computing the rank::
* Sorting Examples::
* Sorting References and Further Reading::
@end menu

@node Sorting objects
@section Sorting objects

The following function provides a simple alternative to the standard
library function @code{qsort}. It is intended for systems lacking
@code{qsort}, not as a replacement for it. The function @code{qsort}
should be used whenever possible, as it will be faster and can provide
stable ordering of equal elements. Documentation for @code{qsort} is
available in the @cite{GNU C Library Reference Manual}.

The functions described in this section are defined in the header file
@file{gsl_heapsort.h}.

@cindex comparison functions, definition
@deftypefun void gsl_heapsort (void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare})

This function sorts the @var{count} elements of the array @var{array},
each of size @var{size}, into ascending order using the comparison
function @var{compare}. The type of the comparison function is defined by,

@example
int (*gsl_comparison_fn_t) (const void * a,
                            const void * b)
@end example

A comparison function should return a negative integer if the first
argument is less than the second argument, @code{0} if the two arguments
are equal and a positive integer if the first argument is greater than
the second argument.

For example, the following function can be used to sort doubles into
ascending numerical order.

@example
int
compare_doubles (const void * a, const void * b)
@{
  /* the arguments arrive as const void *, as the type above requires */
  const double x = *(const double *) a;
  const double y = *(const double *) b;
  return (x > y) - (x < y);
@}
@end example

The appropriate function call to perform the sort is,

@example
gsl_heapsort (array, count, sizeof(double),
              compare_doubles);
@end example

Note that unlike @code{qsort} the heapsort algorithm cannot be made into
a stable sort by pointer arithmetic. The trick of comparing pointers for
equal elements in the comparison function does not work here, because
the heapsort algorithm performs an internal rearrangement of the data
which destroys its initial ordering.
@end deftypefun

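As a minimal sketch of how a comparison callback of this shape is used,
the block below sorts an array of doubles with the standard library
@code{qsort}, which accepts the same kind of comparison function. It
uses @code{qsort} rather than @code{gsl_heapsort} only so that it
compiles without GSL. The names @code{compare_doubles_void} and
@code{sort_doubles} are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback: negative, zero or positive result according to
   whether the first double is less than, equal to or greater than the
   second.  (Hypothetical name, chosen for this sketch.) */
int
compare_doubles_void (const void *a, const void *b)
{
  const double x = *(const double *) a;
  const double y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Hypothetical helper: sort n doubles into ascending order in place. */
void
sort_doubles (double *array, size_t n)
{
  if (n < 2)
    return;                     /* nothing to reorder */
  assert (array != NULL);
  qsort (array, n, sizeof (double), compare_doubles_void);
}
```

Writing the comparison as @code{(x > y) - (x < y)} avoids the overflow
and rounding problems of returning a difference such as @code{x - y}.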
@cindex indirect sorting
@deftypefun int gsl_heapsort_index (size_t * @var{p}, const void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare})

This function indirectly sorts the @var{count} elements of the array
@var{array}, each of size @var{size}, into ascending order using the
comparison function @var{compare}. The resulting permutation is stored
in @var{p}, an array of length @var{count}. The elements of @var{p} give the
index of the array element which would have been stored in that position
if the array had been sorted in place. The first element of @var{p}
gives the index of the least element in @var{array}, and the last
element of @var{p} gives the index of the greatest element in
@var{array}. The array itself is not changed.
@end deftypefun

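The meaning of the index array can be illustrated with a small sketch
in plain C. It is not the library's implementation: it uses a simple
insertion sort on the indices instead of heapsort, works only on
doubles, and the name @code{index_sort_doubles} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GSL function: fill p[0..n-1] with the
   indices of data[] in ascending order of value.  data[] itself is
   never written to. */
void
index_sort_doubles (size_t *p, const double *data, size_t n)
{
  size_t i;

  assert (n == 0 || (p != NULL && data != NULL));

  for (i = 0; i < n; i++)
    {
      size_t j = i;

      /* p[0..i-1] is already ordered; shift the indices of larger
         elements one slot to the right to make room for index i. */
      while (j > 0 && data[p[j - 1]] > data[i])
        {
          p[j] = p[j - 1];
          j--;
        }
      p[j] = i;
    }
}
```

For the data @code{30, 10, 20} the sketch stores @code{1, 2, 0} in
@code{p}: the least element is at index 1 and the greatest at index 0.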
@node Sorting vectors
@section Sorting vectors

The following functions will sort the elements of an array or vector,
either directly or indirectly. They are defined for all real and integer
types using the normal suffix rules. For example, the @code{float}
versions of the array functions are @code{gsl_sort_float} and
@code{gsl_sort_float_index}. The corresponding vector functions are
@code{gsl_sort_vector_float} and @code{gsl_sort_vector_float_index}. The
prototypes are available in the header files @file{gsl_sort_float.h} and
@file{gsl_sort_vector_float.h}. The complete set of prototypes can be
included using the header files @file{gsl_sort.h} and
@file{gsl_sort_vector.h}.

There are no functions for sorting complex arrays or vectors, since the
ordering of complex numbers is not uniquely defined. To sort a complex
vector by magnitude, compute a real vector containing the magnitudes
of the complex elements, and sort this vector indirectly. The resulting
index gives the appropriate ordering of the original complex vector.

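The procedure for complex data can be sketched in plain C as follows.
This is only an illustration under stated assumptions: the complex
numbers are taken to be stored as packed (real, imaginary) pairs of
doubles, squared magnitudes are used as the sort keys because they order
the elements in the same way as the magnitudes, and the indirect sort is
a simple insertion sort. The name @code{index_sort_by_magnitude} is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of GSL.  z holds n complex numbers as
   packed (real, imaginary) pairs.  On success p[0..n-1] lists the
   element indices in order of ascending magnitude and 0 is returned;
   -1 is returned if the temporary key array cannot be allocated. */
int
index_sort_by_magnitude (size_t *p, const double *z, size_t n)
{
  size_t i;
  double *mag2;

  if (n == 0)
    return 0;
  assert (p != NULL && z != NULL);

  mag2 = malloc (n * sizeof *mag2);
  if (mag2 == NULL)
    return -1;

  /* Real vector of keys, one squared magnitude per complex element. */
  for (i = 0; i < n; i++)
    mag2[i] = z[2 * i] * z[2 * i] + z[2 * i + 1] * z[2 * i + 1];

  /* Indirect insertion sort of the keys; z[] is left untouched. */
  for (i = 0; i < n; i++)
    {
      size_t j = i;
      while (j > 0 && mag2[p[j - 1]] > mag2[i])
        {
          p[j] = p[j - 1];
          j--;
        }
      p[j] = i;
    }

  free (mag2);
  return 0;
}
```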
@cindex sorting vector elements
@cindex vector, sorting elements of
@deftypefun void gsl_sort (double * @var{data}, size_t @var{stride}, size_t @var{n})
This function sorts the @var{n} elements of the array @var{data} with
stride @var{stride} into ascending numerical order.
@end deftypefun

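The role of the stride can be illustrated with a short sketch: only the
elements @code{data[0]}, @code{data[stride]}, @dots{},
@code{data[(n-1)*stride]} take part in the sort, and everything in
between is left alone. The code below is a plain insertion sort written
for illustration, not the heapsort used by the library, and the name
@code{sort_strided} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GSL function: sort the n elements
   data[0], data[stride], ..., data[(n-1)*stride] into ascending
   order, leaving all other entries of data[] unchanged. */
void
sort_strided (double *data, size_t stride, size_t n)
{
  size_t i;

  assert (stride > 0);

  for (i = 1; i < n; i++)
    {
      const double x = data[i * stride];
      size_t j = i;

      /* Shift larger strided elements up by one strided slot. */
      while (j > 0 && data[(j - 1) * stride] > x)
        {
          data[j * stride] = data[(j - 1) * stride];
          j--;
        }
      data[j * stride] = x;
    }
}
```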
@deftypefun void gsl_sort_vector (gsl_vector * @var{v})
This function sorts the elements of the vector @var{v} into ascending
numerical order.
@end deftypefun

@cindex indirect sorting, of vector elements
@deftypefun void gsl_sort_index (size_t * @var{p}, const double * @var{data}, size_t @var{stride}, size_t @var{n})
This function indirectly sorts the @var{n} elements of the array
@var{data} with stride @var{stride} into ascending order, storing the
resulting permutation in @var{p}. The array @var{p} must be allocated with
a sufficient length to store the @var{n} elements of the permutation.
The elements of @var{p} give the index of the array element which would
have been stored in that position if the array had been sorted in place.
The array @var{data} is not changed.
@end deftypefun

@deftypefun int gsl_sort_vector_index (gsl_permutation * @var{p}, const gsl_vector * @var{v})
This function indirectly sorts the elements of the vector @var{v} into
ascending order, storing the resulting permutation in @var{p}. The
elements of @var{p} give the index of the vector element which would
have been stored in that position if the vector had been sorted in
place. The first element of @var{p} gives the index of the least element
in @var{v}, and the last element of @var{p} gives the index of the
greatest element in @var{v}. The vector @var{v} is not changed.
@end deftypefun

@node Selecting the k smallest or largest elements
@section Selecting the k smallest or largest elements

The functions described in this section select the @math{k} smallest
or largest elements of a dataset of size @math{N}. The routines use an
@math{O(kN)} direct insertion algorithm which is suited to subsets that
are small compared with the total size of the dataset. For example, the
routines are useful for selecting the 10 largest values from one million
data points, but not for selecting the largest 100,000 values. If the
subset is a significant part of the total dataset it may be faster
to sort all the elements of the dataset directly with an @math{O(N \log
N)} algorithm and obtain the smallest or largest values that way.

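A direct insertion selection of this kind can be sketched as follows.
The sketch keeps a sorted buffer of at most @math{k} values and inserts
each new value only if it belongs among the @math{k} smallest seen so
far, which gives the @math{O(kN)} cost mentioned above. It is an
illustration and not the library's code; the name
@code{select_smallest} is hypothetical and the stride is fixed at 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the GSL implementation: copy the k smallest
   of the n values in src[] into dest[] in ascending order.  src[] is
   not modified. */
void
select_smallest (double *dest, size_t k, const double *src, size_t n)
{
  size_t i, filled = 0;

  assert (k <= n);
  if (k == 0)
    return;

  for (i = 0; i < n; i++)
    {
      const double x = src[i];
      size_t j;

      if (filled < k)
        j = filled++;           /* buffer not yet full: use a new slot */
      else if (x < dest[k - 1])
        j = k - 1;              /* x displaces the current largest */
      else
        continue;               /* x is not among the k smallest */

      /* Shift larger entries right to open the slot for x. */
      while (j > 0 && dest[j - 1] > x)
        {
          dest[j] = dest[j - 1];
          j--;
        }
      dest[j] = x;
    }
}
```

Each of the @math{N} input values costs at most @math{k} shifts, so the
approach is attractive only while @math{k} stays small compared with
@math{N}.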
@deftypefun int gsl_sort_smallest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
This function copies the @var{k} smallest elements of the array
@var{src}, of size @var{n} and stride @var{stride}, in ascending
numerical order into the array @var{dest}. The size @var{k} of the subset must be
less than or equal to @var{n}. The data @var{src} is not modified by
this operation.
@end deftypefun

@deftypefun int gsl_sort_largest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
This function copies the @var{k} largest elements of the array
@var{src}, of size @var{n} and stride @var{stride}, in descending
numerical order into the array @var{dest}. The size @var{k} of the subset must be
less than or equal to @var{n}. The data @var{src} is not modified by
this operation.
@end deftypefun

@deftypefun int gsl_sort_vector_smallest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v})
@deftypefunx int gsl_sort_vector_largest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v})
These functions copy the @var{k} smallest or largest elements of the
vector @var{v} into the array @var{dest}. The size @var{k} of the subset
must be less than or equal to the length of the vector @var{v}.
@end deftypefun

The following functions find the indices of the @math{k} smallest or
largest elements of a dataset.

@deftypefun int gsl_sort_smallest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
This function stores the indices of the @var{k} smallest elements of
the array @var{src}, of size @var{n} and stride @var{stride}, in the
array @var{p}. The indices are chosen so that the corresponding data is
in ascending numerical order. The size @var{k} of the subset must be
less than or equal to @var{n}. The data @var{src} is not modified by
this operation.
@end deftypefun

@deftypefun int gsl_sort_largest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n})
This function stores the indices of the @var{k} largest elements of
the array @var{src}, of size @var{n} and stride @var{stride}, in the
array @var{p}. The indices are chosen so that the corresponding data is
in descending numerical order. The size @var{k} of the subset must be
less than or equal to @var{n}. The data @var{src} is not modified by
this operation.
@end deftypefun

@deftypefun int gsl_sort_vector_smallest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v})
@deftypefunx int gsl_sort_vector_largest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v})
These functions store the indices of the @var{k} smallest or largest
elements of the vector @var{v} in the array @var{p}. The size @var{k} of
the subset must be less than or equal to the length of the vector
@var{v}.
@end deftypefun

@node Computing the rank
@section Computing the rank

The @dfn{rank} of an element is its position in the sorted data. The rank
is the inverse of the index permutation, @var{p}. It can be computed
using the following algorithm,

@example
for (i = 0; i < p->size; i++)
@{
    size_t pi = p->data[i];
    rank->data[pi] = i;
@}
@end example

The same result can be obtained directly by calling
@code{gsl_permutation_inverse(rank,p)}.

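The relationship between the index permutation and the rank can be
checked with a small sketch that works on plain arrays instead of
@code{gsl_permutation} objects. The name @code{rank_from_index} is
hypothetical; the only assumption is that @code{p} holds each index
from 0 to @math{n-1} exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper working on plain arrays: p[] is a permutation of
   0..n-1 and rank[] receives its inverse, so that rank[p[i]] == i. */
void
rank_from_index (size_t *rank, const size_t *p, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      assert (p[i] < n);        /* p must hold valid indices */
      rank[p[i]] = i;
    }
}
```

For the data @code{30, 10, 20} the index permutation is @code{1, 2, 0}
and the resulting ranks are @code{2, 0, 1}: the first element is the
largest, the second the smallest.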
The following function will print the rank of each element of the vector
@var{v},

@example
void
print_rank (gsl_vector * v)
@{
  size_t i;
  size_t n = v->size;
  gsl_permutation * perm = gsl_permutation_alloc(n);
  gsl_permutation * rank = gsl_permutation_alloc(n);

  gsl_sort_vector_index (perm, v);
  gsl_permutation_inverse (rank, perm);

  for (i = 0; i < n; i++)
    @{
      double vi = gsl_vector_get(v, i);
      /* i and rank->data[i] are size_t, so print them with %zu */
      printf ("element = %zu, value = %g, rank = %zu\n",
              i, vi, rank->data[i]);
    @}

  gsl_permutation_free (perm);
  gsl_permutation_free (rank);
@}
@end example

@node Sorting Examples
@section Examples

The following example shows how to use the permutation @var{p} to print
the elements of the vector @var{v} in ascending order,

@example
gsl_sort_vector_index (p, v);

for (i = 0; i < v->size; i++)
@{
    double vpi = gsl_vector_get (v, p->data[i]);
    printf ("order = %d, value = %g\n", i, vpi);
@}
@end example

The next example uses the function @code{gsl_sort_smallest} to select
the 5 smallest numbers from 100000 uniform random variates stored in an
array,

@verbatiminclude examples/sortsmall.c

The output lists the 5 smallest values, in ascending order,

@verbatiminclude examples/sortsmall.out

@node Sorting References and Further Reading
@section References and Further Reading

The subject of sorting is covered extensively in Knuth's
@cite{Sorting and Searching},

@itemize @asis
@item
Donald E. Knuth, @cite{The Art of Computer Programming: Sorting and
Searching} (Vol 3, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896850.
@end itemize

The Heapsort algorithm is described in the following book,

@itemize @asis
@item Robert Sedgewick, @cite{Algorithms in C}, Addison-Wesley,
@end itemize