3 * Copyright (C) 1996, 1997, 1998, 1999, 2000, 2007 James Theiler, Brian Gough
5 * This program is free software; you can redistribute it and/or modify
6 * it under the terms of the GNU General Public License as published by
7 * the Free Software Foundation; either version 3 of the License, or (at
8 * your option) any later version.
10 * This program is distributed in the hope that it will be useful, but
11 * WITHOUT ANY WARRANTY; without even the implied warranty of
12 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13 * General Public License for more details.
15 * You should have received a copy of the GNU General Public License
16 * along with this program; if not, write to the Free Software
17 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
21 Random Discrete Events
23 Given K discrete events with different probabilities P[k]
24 produce a value k consistent with its probability.
26 This program is free software; you can redistribute it and/or
27 modify it under the terms of the GNU General Public License as
28 published by the Free Software Foundation; either version 3 of the
29 License, or (at your option) any later version.
31 This program is distributed in the hope that it will be useful, but
32 WITHOUT ANY WARRANTY; without even the implied warranty of
33 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
34 General Public License for more details. You should have received
35 a copy of the GNU General Public License along with this program;
36 if not, write to the Free Foundation, Inc., 59 Temple Place, Suite
37 330, Boston, MA 02111-1307 USA
41 * Based on: Alastair J Walker, An efficient method for generating
42 * discrete random variables with general distributions, ACM Trans
43 * Math Soft 3, 253-256 (1977). See also: D. E. Knuth, The Art of
44 * Computer Programming, Volume 2 (Seminumerical algorithms), 3rd
45 * edition, Addison-Wesley (1997), p120.
47 * Walker's algorithm does some preprocessing, and provides two
48 * arrays: floating point F[k] and integer A[k]. A value k is chosen
49 * from 0..K-1 with equal likelihood, and then a uniform random number
50 * u is compared to F[k]. If it is less than F[k], then k is
51 * returned. Otherwise, A[k] is returned.
53 * Walker's original paper describes an O(K^2) algorithm for setting
54 * up the F and A arrays. I found this disturbing since I wanted to
55 * use very large values of K. I'm sure I'm not the first to realize
56 * this, but in fact the preprocessing can be done in O(K) steps.
58 * A figure of merit for the preprocessing is the average value for
59 * the F[k]'s (that is, SUM_k F[k]/K); this corresponds to the
60 * probability that k is returned, instead of A[k], thereby saving a
61 * redirection. Walker's O(K^2) preprocessing will generally improve
62 * that figure of merit, compared to my cheaper O(K) method; from some
63 * experiments with a perl script, I get values of around 0.6 for my
64 * method and just under 0.75 for Walker's. Knuth has pointed out
65 * that finding _the_ optimum lookup tables, which maximize the
66 * average F[k], is a combinatorially difficult problem. But any
67 * valid preprocessing will still provide O(1) time for the call to
68 * gsl_ran_discrete(). I find that if I artificially set F[k]=1 --
69 * ie, better than optimum! -- I get a speedup of maybe 20%, so that's
70 * the maximum I could expect from the most expensive preprocessing.
71 * Folding in the difference of 0.6 vs 0.75, I'd estimate that the
72 * speedup would be less than 10%.
74 * I've not implemented it here, but one compromise is to sort the
75 * probabilities once, and then work from the two ends inward. This
76 * requires O(K log K), still lots cheaper than O(K^2), and from my
77 * experiments with the perl script, the figure of merit is within
78 * about 0.01 for K up to 1000, and no sign of diverging (in fact,
79 * they seemed to be converging, but it's hard to say with just a
82 * The O(K) algorithm goes through all the p_k's and decides if they
83 * are "smalls" or "bigs" according to whether they are less than or
84 * greater than the mean value 1/K. The indices to the smalls and the
85 * bigs are put in separate stacks, and then we work through the
86 * stacks together. For each small, we pair it up with the next big
87 * in the stack (Walker always wanted to pair up the smallest small
88 * with the biggest big). The small "borrows" from the big just
89 * enough to bring the small up to mean. This reduces the size of the
90 * big, so the (smaller) big is compared again to the mean, and if it
91 * is smaller, it gets "popped" from the big stack and "pushed" to the
92 * small stack. Otherwise, it stays put. Since every time we pop a
93 * small, we are able to deal with it right then and there, and we
94 * never have to pop more than K smalls, then the algorithm is O(K).
96 * This implementation sets up two separate stacks, and allocates K
97 * elements between them. Since neither stack ever grows, we do an
98 * extra O(K) pass through the data to determine how many smalls and
99 * bigs there are to begin with and allocate appropriately. In all
100 * there are 2*K*sizeof(double) transient bytes of memory that are
101 * used than returned, and K*(sizeof(int)+sizeof(double)) bytes used
102 * in the lookup table.
104 * Walker spoke of using two random numbers (an integer 0..K-1, and a
105 * floating point u in [0,1]), but Knuth points out that one can just
106 * use the integer and fractional parts of K*u where u is in [0,1].
107 * In fact, Knuth further notes that taking F'[k]=(k+F[k])/K, one can
108 * directly compare u to F'[k] without having to explicitly set
113 * Starting with an array of probabilities P, initialize and do
114 * preprocessing with a call to:
117 * gsl_ran_discrete_t *f;
118 * f = gsl_ran_discrete_preproc(K,P);
120 * Then, whenever a random index 0..K-1 is desired, use
122 * k = gsl_ran_discrete(r,f);
124 * Note that several different randevent struct's can be
125 * simultaneously active.
127 * Aside: A very clever alternative approach is described in
128 * Abramowitz and Stegun, p 950, citing: Marsaglia, Random variables
129 * and computers, Proc Third Prague Conference in Probability Theory,
130 * 1962. A more accesible reference is: G. Marsaglia, Generating
131 * discrete random numbers in a computer, Comm ACM 6, 37-38 (1963).
132 * If anybody is interested, I (jt) have also coded up this version as
133 * part of another software package. However, I've done some
134 * comparisons, and the Walker method is both faster and more stingy
135 * with memory. So, in the end I decided not to include it with the
138 * Written 26 Jan 1999, James Theiler, jt@lanl.gov
139 * Adapted to GSL, 30 Jan 1999, jt
144 #include <stdio.h> /* used for NULL, also fprintf(stderr,...) */
145 #include <stdlib.h> /* used for malloc's */
147 #include <gsl/gsl_errno.h>
148 #include <gsl/gsl_rng.h>
149 #include <gsl/gsl_randist.h>
151 #define KNUTH_CONVENTION 1 /* Saves a few steps of arithmetic
152 * in the call to gsl_ran_discrete()
155 /*** Begin Stack (this code is used just in this file) ***/
157 /* Stack code converted to use unsigned indices (i.e. s->i == 0 now
158 means an empty stack, instead of -1), for consistency and to give a
159 bigger allowable range. BJG */
162 size_t N; /* max number of elts on stack */
163 size_t *v; /* array of values on the stack */
164 size_t i; /* index of top of stack */
168 new_stack(size_t N) {
170 s = (gsl_stack_t *)malloc(sizeof(gsl_stack_t));
172 s->i = 0; /* indicates stack is empty */
173 s->v = (size_t *)malloc(sizeof(size_t)*N);
178 push_stack(gsl_stack_t *s, size_t v)
180 if ((s->i) >= (s->N)) {
181 fprintf(stderr,"Cannot push stack!\n");
182 abort(); /* FIXME: fatal!! */
188 static size_t pop_stack(gsl_stack_t *s)
191 fprintf(stderr,"Cannot pop stack!\n");
192 abort(); /* FIXME: fatal!! */
195 return ((s->v)[s->i]);
198 static inline size_t size_stack(const gsl_stack_t *s)
203 static void free_stack(gsl_stack_t *s)
205 free((char *)(s->v));
212 /*** Begin Walker's Algorithm ***/
215 gsl_ran_discrete_preproc(size_t Kevents, const double *ProbArray)
218 gsl_ran_discrete_t *g;
219 size_t nBigs, nSmalls;
223 double pTotal = 0.0, mean, d;
226 /* Could probably treat Kevents=1 as a special case */
228 GSL_ERROR_VAL ("number of events must be a positive integer",
232 /* Make sure elements of ProbArray[] are positive.
233 * Won't enforce that sum is unity; instead will just normalize
236 for (k=0; k<Kevents; ++k) {
237 if (ProbArray[k] < 0) {
238 GSL_ERROR_VAL ("probabilities must be non-negative",
241 pTotal += ProbArray[k];
244 /* Begin setting up the main "object" (just a struct, no steroids) */
245 g = (gsl_ran_discrete_t *)malloc(sizeof(gsl_ran_discrete_t));
247 g->F = (double *)malloc(sizeof(double)*Kevents);
248 g->A = (size_t *)malloc(sizeof(size_t)*Kevents);
250 E = (double *)malloc(sizeof(double)*Kevents);
253 GSL_ERROR_VAL ("Cannot allocate memory for randevent", GSL_ENOMEM, 0);
256 for (k=0; k<Kevents; ++k) {
257 E[k] = ProbArray[k]/pTotal;
260 /* Now create the Bigs and the Smalls */
263 for (k=0; k<Kevents; ++k) {
264 if (E[k] < mean) ++nSmalls;
267 Bigs = new_stack(nBigs);
268 Smalls = new_stack(nSmalls);
269 for (k=0; k<Kevents; ++k) {
271 push_stack(Smalls,k);
277 /* Now work through the smalls */
278 while (size_stack(Smalls) > 0) {
279 s = pop_stack(Smalls);
280 if (size_stack(Bigs) == 0) {
287 (g->F)[s]=Kevents*E[s];
289 fprintf(stderr,"s=%2d, A=%2d, F=%.4f\n",s,(g->A)[s],(g->F)[s]);
292 E[s] += d; /* now E[s] == mean */
295 push_stack(Smalls,b); /* no longer big, join ranks of the small */
297 else if (E[b] > mean) {
298 push_stack(Bigs,b); /* still big, put it back where you found it */
301 /* E[b]==mean implies it is finished too */
306 while (size_stack(Bigs) > 0) {
311 /* Stacks have been emptied, and A and F have been filled */
313 if ( size_stack(Smalls) != 0) {
314 GSL_ERROR_VAL ("Smalls stack has not been emptied",
319 /* if 1, then artificially set all F[k]'s to unity. This will
320 * give wrong answers, but you'll get them faster. But, not
321 * that much faster (I get maybe 20%); that's an upper bound
322 * on what the optimal preprocessing would give.
324 for (k=0; k<Kevents; ++k) {
330 /* For convenience, set F'[k]=(k+F[k])/K */
331 /* This saves some arithmetic in gsl_ran_discrete(); I find that
332 * it doesn't actually make much difference.
334 for (k=0; k<Kevents; ++k) {
336 (g->F)[k] /= Kevents;
348 gsl_ran_discrete(const gsl_rng *r, const gsl_ran_discrete_t *g)
352 u = gsl_rng_uniform(r);
361 /* fprintf(stderr,"c,f,u: %d %.4f %f\n",c,f,u); */
362 if (f == 1.0) return c;
372 void gsl_ran_discrete_free(gsl_ran_discrete_t *g)
374 free((char *)(g->A));
375 free((char *)(g->F));
380 gsl_ran_discrete_pdf(size_t k, const gsl_ran_discrete_t *g)
386 for (i=0; i<K; ++i) {
393 } else if (k == (g->A)[i]) {