3 This chapter describes functions for creating histograms. Histograms
4 provide a convenient way of summarizing the distribution of a set of
5 data. A histogram consists of a set of @dfn{bins} which count the number
6 of events falling into a given range of a continuous variable @math{x}.
7 In GSL the bins of a histogram contain floating-point numbers, so they
8 can be used to record both integer and non-integer distributions. The
9 bins can use arbitrary sets of ranges (uniformly spaced bins are the
10 default). Both one and two-dimensional histograms are supported.
12 Once a histogram has been created it can also be converted into a
13 probability distribution function. The library provides efficient
14 routines for selecting random samples from probability distributions.
15 This can be useful for generating simulations based on real data.
17 The functions are declared in the header files @file{gsl_histogram.h}
18 and @file{gsl_histogram2d.h}.
21 * The histogram struct::
22 * Histogram allocation::
23 * Copying Histograms::
24 * Updating and accessing histogram elements::
25 * Searching histogram ranges::
26 * Histogram Statistics::
27 * Histogram Operations::
28 * Reading and writing histograms::
29 * Resampling from histograms::
30 * The histogram probability distribution struct::
31 * Example programs for histograms::
32 * Two dimensional histograms::
33 * The 2D histogram struct::
34 * 2D Histogram allocation::
35 * Copying 2D Histograms::
36 * Updating and accessing 2D histogram elements::
37 * Searching 2D histogram ranges::
38 * 2D Histogram Statistics::
39 * 2D Histogram Operations::
40 * Reading and writing 2D histograms::
41 * Resampling from 2D histograms::
42 * Example programs for 2D histograms::
45 @node The histogram struct
46 @section The histogram struct
48 A histogram is defined by the following struct,
50 @deftp {Data Type} {gsl_histogram}
53 This is the number of histogram bins
55 The ranges of the bins are stored in an array of @math{@var{n}+1} elements
56 pointed to by @var{range}.
58 The counts for each bin are stored in an array of @var{n} elements
59 pointed to by @var{bin}. The bins are floating-point numbers, so you can
60 increment them by non-integer values if necessary.
66 The range for @var{bin}[i] is given by @var{range}[i] to
67 @var{range}[i+1]. For @math{n} bins there are @math{n+1} entries in the
68 array @var{range}. Each bin is inclusive at the lower end and exclusive
69 at the upper end. Mathematically this means that the bins are defined by
70 the following inequality,
74 \hbox{bin[i] corresponds to range[i]} \le x < \hbox{range[i+1]}
80 bin[i] corresponds to range[i] <= x < range[i+1]
85 Here is a diagram of the correspondence between ranges and bins on the
86 number-line for @math{x},
90 [ bin[0] )[ bin[1] )[ bin[2] )[ bin[3] )[ bin[4] )
91 ---|---------|---------|---------|---------|---------|--- x
92 r[0] r[1] r[2] r[3] r[4] r[5]
97 In this picture the values of the @var{range} array are denoted by
98 @math{r}. On the left-hand side of each bin the square bracket
99 @samp{[} denotes an inclusive lower bound
101 @math{r <= x}), and the round parentheses @samp{)} on the right-hand
102 side denote an exclusive upper bound (@math{x < r}). Thus any samples
103 which fall on the upper end of the histogram are excluded. If you want
104 to include this value for the last bin you will need to add an extra bin
107 The @code{gsl_histogram} struct and its associated functions are defined
108 in the header file @file{gsl_histogram.h}.
110 @node Histogram allocation
111 @section Histogram allocation
112 The functions for allocating memory to a histogram follow the style of
113 @code{malloc} and @code{free}. In addition they also perform their own
114 error checking. If there is insufficient memory available to allocate a
115 histogram then the functions call the error handler (with an error
116 number of @code{GSL_ENOMEM}) in addition to returning a null pointer.
117 Thus if you use the library error handler to abort your program then it
118 isn't necessary to check every histogram @code{alloc}.
120 @deftypefun {gsl_histogram *} gsl_histogram_alloc (size_t @var{n})
121 This function allocates memory for a histogram with @var{n} bins, and
122 returns a pointer to a newly created @code{gsl_histogram} struct. If
123 insufficient memory is available a null pointer is returned and the
124 error handler is invoked with an error code of @code{GSL_ENOMEM}. The
125 bins and ranges are not initialized, and should be prepared using one of
126 the range-setting functions below in order to make the histogram ready
130 @comment @deftypefun {gsl_histogram *} gsl_histogram_calloc (size_t @var{n})
131 @comment This function allocates memory for a histogram with @var{n} bins, and
132 @comment returns a pointer to its newly initialized @code{gsl_histogram} struct.
133 @comment The bins are uniformly spaced with a total range of
134 @comment @c{$0 \le x < n$}
135 @comment @math{0 <= x < n},
136 @comment as shown in the table below.
139 @comment \beforedisplay
142 @comment \hbox{bin[0]}&\hbox{corresponds to}& 0 \le x < 1\cr
143 @comment \hbox{bin[1]}&\hbox{corresponds to}& 1 \le x < 2\cr
144 @comment \dots&\dots&\dots\cr
145 @comment \hbox{bin[n-1]}&\hbox{corresponds to}&n-1 \le x < n}
147 @comment \afterdisplay
151 @comment bin[0] corresponds to 0 <= x < 1
152 @comment bin[1] corresponds to 1 <= x < 2
154 @comment bin[n-1] corresponds to n-1 <= x < n
155 @comment @end display
158 @comment The bins are initialized to zero so the histogram is ready for use.
160 @comment If insufficient memory is available a null pointer is returned and the
161 @comment error handler is invoked with an error code of @code{GSL_ENOMEM}.
162 @comment @end deftypefun
164 @comment @deftypefun {gsl_histogram *} gsl_histogram_calloc_uniform (size_t @var{n}, double @var{xmin}, double @var{xmax})
165 @comment This function allocates memory for a histogram with @var{n} uniformly
166 @comment spaced bins from @var{xmin} to @var{xmax}, and returns a pointer to the
167 @comment newly initialized @code{gsl_histogram} struct.
168 @comment If insufficient memory is available a null pointer is returned and the
169 @comment error handler is invoked with an error code of @code{GSL_ENOMEM}.
170 @comment @end deftypefun
172 @comment @deftypefun {gsl_histogram *} gsl_histogram_calloc_range (size_t @var{n}, double * @var{range})
173 @comment This function allocates a histogram of size @var{n} using the @math{n+1}
174 @comment bin ranges specified by the array @var{range}.
175 @comment @end deftypefun
177 @deftypefun int gsl_histogram_set_ranges (gsl_histogram * @var{h}, const double @var{range}[], size_t @var{size})
178 This function sets the ranges of the existing histogram @var{h} using
179 the array @var{range} of size @var{size}. The values of the histogram
180 bins are reset to zero. The @code{range} array should contain the
181 desired bin limits. The ranges can be arbitrary, subject to the
182 restriction that they are monotonically increasing.
184 The following example shows how to create a histogram with logarithmic
185 bins with ranges [1,10), [10,100) and [100,1000).
188 gsl_histogram * h = gsl_histogram_alloc (3);
190 /* bin[0] covers the range 1 <= x < 10 */
191 /* bin[1] covers the range 10 <= x < 100 */
192 /* bin[2] covers the range 100 <= x < 1000 */
194 double range[4] = @{ 1.0, 10.0, 100.0, 1000.0 @};
196 gsl_histogram_set_ranges (h, range, 4);
200 Note that the size of the @var{range} array should be defined to be one
201 element bigger than the number of bins. The additional element is
202 required for the upper value of the final bin.
205 @deftypefun int gsl_histogram_set_ranges_uniform (gsl_histogram * @var{h}, double @var{xmin}, double @var{xmax})
206 This function sets the ranges of the existing histogram @var{h} to cover
207 the range @var{xmin} to @var{xmax} uniformly. The values of the
208 histogram bins are reset to zero. The bin ranges are shown in the table
213 \matrix{\hbox{bin[0]}&\hbox{corresponds to}& xmin \le x < xmin + d\cr
214 \hbox{bin[1]} &\hbox{corresponds to}& xmin + d \le x < xmin + 2 d\cr
216 \hbox{bin[n-1]} & \hbox{corresponds to}& xmin + (n-1)d \le x < xmax}
222 bin[0] corresponds to xmin <= x < xmin + d
223 bin[1] corresponds to xmin + d <= x < xmin + 2 d
225 bin[n-1] corresponds to xmin + (n-1)d <= x < xmax
230 where @math{d} is the bin spacing, @math{d = (xmax-xmin)/n}.
233 @deftypefun void gsl_histogram_free (gsl_histogram * @var{h})
234 This function frees the histogram @var{h} and all of the memory
238 @node Copying Histograms
239 @section Copying Histograms
241 @deftypefun int gsl_histogram_memcpy (gsl_histogram * @var{dest}, const gsl_histogram * @var{src})
242 This function copies the histogram @var{src} into the pre-existing
243 histogram @var{dest}, making @var{dest} into an exact copy of @var{src}.
244 The two histograms must be of the same size.
247 @deftypefun {gsl_histogram *} gsl_histogram_clone (const gsl_histogram * @var{src})
248 This function returns a pointer to a newly created histogram which is an
249 exact copy of the histogram @var{src}.
252 @node Updating and accessing histogram elements
253 @section Updating and accessing histogram elements
255 There are two ways to access histogram bins, either by specifying an
256 @math{x} coordinate or by using the bin-index directly. The functions
257 for accessing the histogram through @math{x} coordinates use a binary
258 search to identify the bin which covers the appropriate range.
260 @deftypefun int gsl_histogram_increment (gsl_histogram * @var{h}, double @var{x})
261 This function updates the histogram @var{h} by adding one (1.0) to the
262 bin whose range contains the coordinate @var{x}.
264 If @var{x} lies in the valid range of the histogram then the function
265 returns zero to indicate success. If @var{x} is less than the lower
266 limit of the histogram then the function returns @code{GSL_EDOM}, and
267 none of bins are modified. Similarly, if the value of @var{x} is greater
268 than or equal to the upper limit of the histogram then the function
269 returns @code{GSL_EDOM}, and none of the bins are modified. The error
270 handler is not called, however, since it is often necessary to compute
271 histograms for a small range of a larger dataset, ignoring the values
272 outside the range of interest.
275 @deftypefun int gsl_histogram_accumulate (gsl_histogram * @var{h}, double @var{x}, double @var{weight})
276 This function is similar to @code{gsl_histogram_increment} but increases
277 the value of the appropriate bin in the histogram @var{h} by the
278 floating-point number @var{weight}.
281 @deftypefun double gsl_histogram_get (const gsl_histogram * @var{h}, size_t @var{i})
282 This function returns the contents of the @var{i}-th bin of the histogram
283 @var{h}. If @var{i} lies outside the valid range of indices for the
284 histogram then the error handler is called with an error code of
285 @code{GSL_EDOM} and the function returns 0.
288 @deftypefun int gsl_histogram_get_range (const gsl_histogram * @var{h}, size_t @var{i}, double * @var{lower}, double * @var{upper})
289 This function finds the upper and lower range limits of the @var{i}-th
290 bin of the histogram @var{h}. If the index @var{i} is valid then the
291 corresponding range limits are stored in @var{lower} and @var{upper}.
292 The lower limit is inclusive (i.e. events with this coordinate are
293 included in the bin) and the upper limit is exclusive (i.e. events with
294 the coordinate of the upper limit are excluded and fall in the
295 neighboring higher bin, if it exists). The function returns 0 to
296 indicate success. If @var{i} lies outside the valid range of indices for
297 the histogram then the error handler is called and the function returns
298 an error code of @code{GSL_EDOM}.
301 @deftypefun double gsl_histogram_max (const gsl_histogram * @var{h})
302 @deftypefunx double gsl_histogram_min (const gsl_histogram * @var{h})
303 @deftypefunx size_t gsl_histogram_bins (const gsl_histogram * @var{h})
304 These functions return the maximum upper and minimum lower range limits
305 and the number of bins of the histogram @var{h}. They provide a way of
306 determining these values without accessing the @code{gsl_histogram}
310 @deftypefun void gsl_histogram_reset (gsl_histogram * @var{h})
311 This function resets all the bins in the histogram @var{h} to zero.
314 @node Searching histogram ranges
315 @section Searching histogram ranges
317 The following functions are used by the access and update routines to
318 locate the bin which corresponds to a given @math{x} coordinate.
320 @deftypefun int gsl_histogram_find (const gsl_histogram * @var{h}, double @var{x}, size_t * @var{i})
321 This function finds and sets the index @var{i} to the bin number which
322 covers the coordinate @var{x} in the histogram @var{h}. The bin is
323 located using a binary search. The search includes an optimization for
324 histograms with uniform range, and will return the correct bin
325 immediately in this case. If @var{x} is found in the range of the
326 histogram then the function sets the index @var{i} and returns
327 @code{GSL_SUCCESS}. If @var{x} lies outside the valid range of the
328 histogram then the function returns @code{GSL_EDOM} and the error
332 @node Histogram Statistics
333 @section Histogram Statistics
334 @cindex histogram statistics
335 @cindex statistics, from histogram
336 @cindex maximum value, from histogram
337 @cindex minimum value, from histogram
338 @deftypefun double gsl_histogram_max_val (const gsl_histogram * @var{h})
339 This function returns the maximum value contained in the histogram bins.
342 @deftypefun size_t gsl_histogram_max_bin (const gsl_histogram * @var{h})
343 This function returns the index of the bin containing the maximum
344 value. In the case where several bins contain the same maximum value the
345 smallest index is returned.
348 @deftypefun double gsl_histogram_min_val (const gsl_histogram * @var{h})
349 This function returns the minimum value contained in the histogram bins.
352 @deftypefun size_t gsl_histogram_min_bin (const gsl_histogram * @var{h})
353 This function returns the index of the bin containing the minimum
354 value. In the case where several bins contain the same maximum value the
355 smallest index is returned.
358 @cindex mean value, from histogram
359 @deftypefun double gsl_histogram_mean (const gsl_histogram * @var{h})
360 This function returns the mean of the histogrammed variable, where the
361 histogram is regarded as a probability distribution. Negative bin values
362 are ignored for the purposes of this calculation. The accuracy of the
363 result is limited by the bin width.
366 @cindex standard deviation, from histogram
367 @cindex variance, from histogram
368 @deftypefun double gsl_histogram_sigma (const gsl_histogram * @var{h})
369 This function returns the standard deviation of the histogrammed
370 variable, where the histogram is regarded as a probability
371 distribution. Negative bin values are ignored for the purposes of this
372 calculation. The accuracy of the result is limited by the bin width.
375 @deftypefun double gsl_histogram_sum (const gsl_histogram * @var{h})
376 This function returns the sum of all bin values. Negative bin values
377 are included in the sum.
380 @node Histogram Operations
381 @section Histogram Operations
383 @deftypefun int gsl_histogram_equal_bins_p (const gsl_histogram * @var{h1}, const gsl_histogram * @var{h2})
384 This function returns 1 if the all of the individual bin
385 ranges of the two histograms are identical, and 0
389 @deftypefun int gsl_histogram_add (gsl_histogram * @var{h1}, const gsl_histogram * @var{h2})
390 This function adds the contents of the bins in histogram @var{h2} to the
391 corresponding bins of histogram @var{h1}, i.e. @math{h'_1(i) = h_1(i) +
392 h_2(i)}. The two histograms must have identical bin ranges.
395 @deftypefun int gsl_histogram_sub (gsl_histogram * @var{h1}, const gsl_histogram * @var{h2})
396 This function subtracts the contents of the bins in histogram @var{h2}
397 from the corresponding bins of histogram @var{h1}, i.e. @math{h'_1(i) =
398 h_1(i) - h_2(i)}. The two histograms must have identical bin ranges.
401 @deftypefun int gsl_histogram_mul (gsl_histogram * @var{h1}, const gsl_histogram * @var{h2})
402 This function multiplies the contents of the bins of histogram @var{h1}
403 by the contents of the corresponding bins in histogram @var{h2},
404 i.e. @math{h'_1(i) = h_1(i) * h_2(i)}. The two histograms must have
405 identical bin ranges.
408 @deftypefun int gsl_histogram_div (gsl_histogram * @var{h1}, const gsl_histogram * @var{h2})
409 This function divides the contents of the bins of histogram @var{h1} by
410 the contents of the corresponding bins in histogram @var{h2},
411 i.e. @math{h'_1(i) = h_1(i) / h_2(i)}. The two histograms must have
412 identical bin ranges.
415 @deftypefun int gsl_histogram_scale (gsl_histogram * @var{h}, double @var{scale})
416 This function multiplies the contents of the bins of histogram @var{h}
417 by the constant @var{scale}, i.e. @c{$h'_1(i) = h_1(i) * \hbox{\it scale}$}
418 @math{h'_1(i) = h_1(i) * scale}.
421 @deftypefun int gsl_histogram_shift (gsl_histogram * @var{h}, double @var{offset})
422 This function shifts the contents of the bins of histogram @var{h} by
423 the constant @var{offset}, i.e. @c{$h'_1(i) = h_1(i) + \hbox{\it offset}$}
424 @math{h'_1(i) = h_1(i) + offset}.
427 @node Reading and writing histograms
428 @section Reading and writing histograms
430 The library provides functions for reading and writing histograms to a file
431 as binary data or formatted text.
433 @deftypefun int gsl_histogram_fwrite (FILE * @var{stream}, const gsl_histogram * @var{h})
434 This function writes the ranges and bins of the histogram @var{h} to the
435 stream @var{stream} in binary format. The return value is 0 for success
436 and @code{GSL_EFAILED} if there was a problem writing to the file. Since
437 the data is written in the native binary format it may not be portable
438 between different architectures.
441 @deftypefun int gsl_histogram_fread (FILE * @var{stream}, gsl_histogram * @var{h})
442 This function reads into the histogram @var{h} from the open stream
443 @var{stream} in binary format. The histogram @var{h} must be
444 preallocated with the correct size since the function uses the number of
445 bins in @var{h} to determine how many bytes to read. The return value is
446 0 for success and @code{GSL_EFAILED} if there was a problem reading from
447 the file. The data is assumed to have been written in the native binary
448 format on the same architecture.
451 @deftypefun int gsl_histogram_fprintf (FILE * @var{stream}, const gsl_histogram * @var{h}, const char * @var{range_format}, const char * @var{bin_format})
452 This function writes the ranges and bins of the histogram @var{h}
453 line-by-line to the stream @var{stream} using the format specifiers
454 @var{range_format} and @var{bin_format}. These should be one of the
455 @code{%g}, @code{%e} or @code{%f} formats for floating point
456 numbers. The function returns 0 for success and @code{GSL_EFAILED} if
457 there was a problem writing to the file. The histogram output is
458 formatted in three columns, and the columns are separated by spaces,
462 range[0] range[1] bin[0]
463 range[1] range[2] bin[1]
464 range[2] range[3] bin[2]
466 range[n-1] range[n] bin[n-1]
470 The values of the ranges are formatted using @var{range_format} and the
471 value of the bins are formatted using @var{bin_format}. Each line
472 contains the lower and upper limit of the range of the bins and the
473 value of the bin itself. Since the upper limit of one bin is the lower
474 limit of the next there is duplication of these values between lines but
475 this allows the histogram to be manipulated with line-oriented tools.
478 @deftypefun int gsl_histogram_fscanf (FILE * @var{stream}, gsl_histogram * @var{h})
479 This function reads formatted data from the stream @var{stream} into the
480 histogram @var{h}. The data is assumed to be in the three-column format
481 used by @code{gsl_histogram_fprintf}. The histogram @var{h} must be
482 preallocated with the correct length since the function uses the size of
483 @var{h} to determine how many numbers to read. The function returns 0
484 for success and @code{GSL_EFAILED} if there was a problem reading from
488 @node Resampling from histograms
489 @section Resampling from histograms
490 @cindex resampling from histograms
491 @cindex sampling from histograms
492 @cindex probability distributions, from histograms
494 A histogram made by counting events can be regarded as a measurement of
495 a probability distribution. Allowing for statistical error, the height
496 of each bin represents the probability of an event where the value of
497 @math{x} falls in the range of that bin. The probability distribution
498 function has the one-dimensional form @math{p(x)dx} where,
514 In this equation @math{n_i} is the number of events in the bin which
515 contains @math{x}, @math{w_i} is the width of the bin and @math{N} is
516 the total number of events. The distribution of events within each bin
517 is assumed to be uniform.
519 @node The histogram probability distribution struct
520 @section The histogram probability distribution struct
521 @cindex probability distribution, from histogram
522 @cindex sampling from histograms
523 @cindex random sampling from histograms
524 @cindex histograms, random sampling from
525 The probability distribution function for a histogram consists of a set
526 of @dfn{bins} which measure the probability of an event falling into a
527 given range of a continuous variable @math{x}. A probability
528 distribution function is defined by the following struct, which actually
529 stores the cumulative probability distribution function. This is the
530 natural quantity for generating samples via the inverse transform
531 method, because there is a one-to-one mapping between the cumulative
532 probability distribution and the range [0,1]. It can be shown that by
533 taking a uniform random number in this range and finding its
534 corresponding coordinate in the cumulative probability distribution we
535 obtain samples with the desired probability distribution.
537 @deftp {Data Type} {gsl_histogram_pdf}
540 This is the number of bins used to approximate the probability
541 distribution function.
543 The ranges of the bins are stored in an array of @math{@var{n}+1} elements
544 pointed to by @var{range}.
546 The cumulative probability for the bins is stored in an array of
547 @var{n} elements pointed to by @var{sum}.
553 The following functions allow you to create a @code{gsl_histogram_pdf}
554 struct which represents this probability distribution and generate
555 random samples from it.
557 @deftypefun {gsl_histogram_pdf *} gsl_histogram_pdf_alloc (size_t @var{n})
558 This function allocates memory for a probability distribution with
559 @var{n} bins and returns a pointer to a newly initialized
560 @code{gsl_histogram_pdf} struct. If insufficient memory is available a
561 null pointer is returned and the error handler is invoked with an error
562 code of @code{GSL_ENOMEM}.
565 @deftypefun int gsl_histogram_pdf_init (gsl_histogram_pdf * @var{p}, const gsl_histogram * @var{h})
566 This function initializes the probability distribution @var{p} with
567 the contents of the histogram @var{h}. If any of the bins of @var{h} are
568 negative then the error handler is invoked with an error code of
569 @code{GSL_EDOM} because a probability distribution cannot contain
573 @deftypefun void gsl_histogram_pdf_free (gsl_histogram_pdf * @var{p})
574 This function frees the probability distribution function @var{p} and
575 all of the memory associated with it.
578 @deftypefun double gsl_histogram_pdf_sample (const gsl_histogram_pdf * @var{p}, double @var{r})
579 This function uses @var{r}, a uniform random number between zero and
580 one, to compute a single random sample from the probability distribution
581 @var{p}. The algorithm used to compute the sample @math{s} is given by
582 the following formula,
586 s = \hbox{range}[i] + \delta * (\hbox{range}[i+1] - \hbox{range}[i])
593 s = range[i] + delta * (range[i+1] - range[i])
598 where @math{i} is the index which satisfies
599 @c{$sum[i] \le r < sum[i+1]$}
600 @math{sum[i] <= r < sum[i+1]} and
602 @c{$(r - sum[i])/(sum[i+1] - sum[i])$}
603 @math{(r - sum[i])/(sum[i+1] - sum[i])}.
606 @node Example programs for histograms
607 @section Example programs for histograms
609 The following program shows how to make a simple histogram of a column
610 of numerical data supplied on @code{stdin}. The program takes three
611 arguments, specifying the upper and lower bounds of the histogram and
612 the number of bins. It then reads numbers from @code{stdin}, one line at
613 a time, and adds them to the histogram. When there is no more data to
614 read it prints out the accumulated histogram using
615 @code{gsl_histogram_fprintf}.
618 @verbatiminclude examples/histogram.c
622 Here is an example of the program in use. We generate 10000 random
623 samples from a Cauchy distribution with a width of 30 and histogram
624 them over the range -100 to 100, using 200 bins.
627 $ gsl-randist 0 10000 cauchy 30
628 | gsl-histogram -100 100 200 > histogram.dat
632 A plot of the resulting histogram shows the familiar shape of the
633 Cauchy distribution and the fluctuations caused by the finite sample
637 $ awk '@{print $1, $3 ; print $2, $3@}' histogram.dat
643 @center @image{histogram,3.0in,2.8in}
646 @node Two dimensional histograms
647 @section Two dimensional histograms
648 @cindex two dimensional histograms
649 @cindex 2D histograms
651 A two dimensional histogram consists of a set of @dfn{bins} which count
652 the number of events falling in a given area of the @math{(x,y)}
653 plane. The simplest way to use a two dimensional histogram is to record
654 two-dimensional position information, @math{n(x,y)}. Another possibility
655 is to form a @dfn{joint distribution} by recording related
656 variables. For example a detector might record both the position of an
657 event (@math{x}) and the amount of energy it deposited @math{E}. These
658 could be histogrammed as the joint distribution @math{n(x,E)}.
660 @node The 2D histogram struct
661 @section The 2D histogram struct
663 Two dimensional histograms are defined by the following struct,
665 @deftp {Data Type} {gsl_histogram2d}
668 This is the number of histogram bins in the x and y directions.
669 @item double * xrange
670 The ranges of the bins in the x-direction are stored in an array of
671 @math{@var{nx} + 1} elements pointed to by @var{xrange}.
672 @item double * yrange
673 The ranges of the bins in the y-direction are stored in an array of
674 @math{@var{ny} + 1} elements pointed to by @var{yrange}.
676 The counts for each bin are stored in an array pointed to by @var{bin}.
677 The bins are floating-point numbers, so you can increment them by
678 non-integer values if necessary. The array @var{bin} stores the two
679 dimensional array of bins in a single block of memory according to the
680 mapping @code{bin(i,j)} = @code{bin[i * ny + j]}.
686 The range for @code{bin(i,j)} is given by @code{xrange[i]} to
687 @code{xrange[i+1]} in the x-direction and @code{yrange[j]} to
688 @code{yrange[j+1]} in the y-direction. Each bin is inclusive at the lower
689 end and exclusive at the upper end. Mathematically this means that the
690 bins are defined by the following inequality,
695 \hbox{bin(i,j) corresponds to} &
696 \hbox{\it xrange}[i] \le x < \hbox{\it xrange}[i+1] \cr
697 \hbox{and} & \hbox{\it yrange}[j] \le y < \hbox{\it yrange}[j+1]}
703 bin(i,j) corresponds to xrange[i] <= x < xrange[i+1]
704 and yrange[j] <= y < yrange[j+1]
709 Note that any samples which fall on the upper sides of the histogram are
710 excluded. If you want to include these values for the side bins you will
711 need to add an extra row or column to your histogram.
713 The @code{gsl_histogram2d} struct and its associated functions are
714 defined in the header file @file{gsl_histogram2d.h}.
716 @node 2D Histogram allocation
717 @section 2D Histogram allocation
719 The functions for allocating memory to a 2D histogram follow the style
720 of @code{malloc} and @code{free}. In addition they also perform their
721 own error checking. If there is insufficient memory available to
722 allocate a histogram then the functions call the error handler (with
723 an error number of @code{GSL_ENOMEM}) in addition to returning a null
724 pointer. Thus if you use the library error handler to abort your program
725 then it isn't necessary to check every 2D histogram @code{alloc}.
727 @deftypefun {gsl_histogram2d *} gsl_histogram2d_alloc (size_t @var{nx}, size_t @var{ny})
728 This function allocates memory for a two-dimensional histogram with
729 @var{nx} bins in the x direction and @var{ny} bins in the y direction.
730 The function returns a pointer to a newly created @code{gsl_histogram2d}
731 struct. If insufficient memory is available a null pointer is returned
732 and the error handler is invoked with an error code of
733 @code{GSL_ENOMEM}. The bins and ranges must be initialized with one of
734 the functions below before the histogram is ready for use.
737 @comment @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc (size_t @var{nx}, size_t @var{ny})
738 @comment This function allocates memory for a two-dimensional histogram with
739 @comment @var{nx} bins in the x direction and @var{ny} bins in the y
740 @comment direction. The function returns a pointer to a newly initialized
741 @comment @code{gsl_histogram2d} struct. The bins are uniformly spaced with a
742 @comment total range of
743 @comment @c{$0 \le x < nx$}
744 @comment @math{0 <= x < nx} in the x-direction and
745 @comment @c{$0 \le y < ny$}
746 @comment @math{0 <= y < ny} in the y-direction, as shown in the table below.
748 @comment The bins are initialized to zero so the histogram is ready for use.
750 @comment If insufficient memory is available a null pointer is returned and the
751 @comment error handler is invoked with an error code of @code{GSL_ENOMEM}.
752 @comment @end deftypefun
754 @comment @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_uniform (size_t @var{nx}, size_t @var{ny}, double @var{xmin}, double @var{xmax}, double @var{ymin}, double @var{ymax})
755 @comment This function allocates a histogram of size @var{nx}-by-@var{ny} which
756 @comment uniformly covers the ranges @var{xmin} to @var{xmax} and @var{ymin} to
757 @comment @var{ymax} in the @math{x} and @math{y} directions respectively.
758 @comment @end deftypefun
760 @comment @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_range (size_t @var{nx}, size_t @var{ny}, double * @var{xrange}, double * @var{yrange})
761 @comment This function allocates a histogram of size @var{nx}-by-@var{ny} using
762 @comment the @math{nx+1} and @math{ny+1} bin ranges specified by the arrays
763 @comment @var{xrange} and @var{xyrange}.
764 @comment @end deftypefun
766 @deftypefun int gsl_histogram2d_set_ranges (gsl_histogram2d * @var{h}, const double @var{xrange}[], size_t @var{xsize}, const double @var{yrange}[], size_t @var{ysize})
767 This function sets the ranges of the existing histogram @var{h} using
768 the arrays @var{xrange} and @var{yrange} of size @var{xsize} and
769 @var{ysize} respectively. The values of the histogram bins are reset to
773 @deftypefun int gsl_histogram2d_set_ranges_uniform (gsl_histogram2d * @var{h}, double @var{xmin}, double @var{xmax}, double @var{ymin}, double @var{ymax})
774 This function sets the ranges of the existing histogram @var{h} to cover
775 the ranges @var{xmin} to @var{xmax} and @var{ymin} to @var{ymax}
776 uniformly. The values of the histogram bins are reset to zero.
779 @deftypefun void gsl_histogram2d_free (gsl_histogram2d * @var{h})
780 This function frees the 2D histogram @var{h} and all of the memory
784 @node Copying 2D Histograms
785 @section Copying 2D Histograms
787 @deftypefun int gsl_histogram2d_memcpy (gsl_histogram2d * @var{dest}, const gsl_histogram2d * @var{src})
788 This function copies the histogram @var{src} into the pre-existing
789 histogram @var{dest}, making @var{dest} into an exact copy of @var{src}.
790 The two histograms must be of the same size.
793 @deftypefun {gsl_histogram2d *} gsl_histogram2d_clone (const gsl_histogram2d * @var{src})
794 This function returns a pointer to a newly created histogram which is an
795 exact copy of the histogram @var{src}.
798 @node Updating and accessing 2D histogram elements
799 @section Updating and accessing 2D histogram elements
801 You can access the bins of a two-dimensional histogram either by
802 specifying a pair of @math{(x,y)} coordinates or by using the bin
803 indices @math{(i,j)} directly. The functions for accessing the histogram
804 through @math{(x,y)} coordinates use binary searches in the x and y
805 directions to identify the bin which covers the appropriate range.
807 @deftypefun int gsl_histogram2d_increment (gsl_histogram2d * @var{h}, double @var{x}, double @var{y})
808 This function updates the histogram @var{h} by adding one (1.0) to the
809 bin whose x and y ranges contain the coordinates (@var{x},@var{y}).
811 If the point @math{(x,y)} lies inside the valid ranges of the
812 histogram then the function returns zero to indicate success. If
813 @math{(x,y)} lies outside the limits of the histogram then the
814 function returns @code{GSL_EDOM}, and none of the bins are modified. The
815 error handler is not called, since it is often necessary to compute
816 histograms for a small range of a larger dataset, ignoring any
817 coordinates outside the range of interest.
820 @deftypefun int gsl_histogram2d_accumulate (gsl_histogram2d * @var{h}, double @var{x}, double @var{y}, double @var{weight})
821 This function is similar to @code{gsl_histogram2d_increment} but increases
822 the value of the appropriate bin in the histogram @var{h} by the
823 floating-point number @var{weight}.
826 @deftypefun double gsl_histogram2d_get (const gsl_histogram2d * @var{h}, size_t @var{i}, size_t @var{j})
827 This function returns the contents of the (@var{i},@var{j})-th bin of the
828 histogram @var{h}. If (@var{i},@var{j}) lies outside the valid range of
829 indices for the histogram then the error handler is called with an error
830 code of @code{GSL_EDOM} and the function returns 0.
833 @deftypefun int gsl_histogram2d_get_xrange (const gsl_histogram2d * @var{h}, size_t @var{i}, double * @var{xlower}, double * @var{xupper})
834 @deftypefunx int gsl_histogram2d_get_yrange (const gsl_histogram2d * @var{h}, size_t @var{j}, double * @var{ylower}, double * @var{yupper})
835 These functions find the upper and lower range limits of the @var{i}-th
836 and @var{j}-th bins in the x and y directions of the histogram @var{h}.
837 The range limits are stored in @var{xlower} and @var{xupper} or
838 @var{ylower} and @var{yupper}. The lower limits are inclusive
839 (i.e. events with these coordinates are included in the bin) and the
840 upper limits are exclusive (i.e. events with the value of the upper
841 limit are not included and fall in the neighboring higher bin, if it
842 exists). The functions return 0 to indicate success. If @var{i} or
843 @var{j} lies outside the valid range of indices for the histogram then
844 the error handler is called with an error code of @code{GSL_EDOM}.
847 @deftypefun double gsl_histogram2d_xmax (const gsl_histogram2d * @var{h})
848 @deftypefunx double gsl_histogram2d_xmin (const gsl_histogram2d * @var{h})
849 @deftypefunx size_t gsl_histogram2d_nx (const gsl_histogram2d * @var{h})
850 @deftypefunx double gsl_histogram2d_ymax (const gsl_histogram2d * @var{h})
851 @deftypefunx double gsl_histogram2d_ymin (const gsl_histogram2d * @var{h})
852 @deftypefunx size_t gsl_histogram2d_ny (const gsl_histogram2d * @var{h})
853 These functions return the maximum upper and minimum lower range limits
854 and the number of bins for the x and y directions of the histogram
855 @var{h}. They provide a way of determining these values without
856 accessing the @code{gsl_histogram2d} struct directly.
859 @deftypefun void gsl_histogram2d_reset (gsl_histogram2d * @var{h})
860 This function resets all the bins of the histogram @var{h} to zero.
863 @node Searching 2D histogram ranges
864 @section Searching 2D histogram ranges
866 The following functions are used by the access and update routines to
867 locate the bin which corresponds to a given @math{(x,y)} coordinate.
869 @deftypefun int gsl_histogram2d_find (const gsl_histogram2d * @var{h}, double @var{x}, double @var{y}, size_t * @var{i}, size_t * @var{j})
870 This function finds and sets the indices @var{i} and @var{j} to the to
871 the bin which covers the coordinates (@var{x},@var{y}). The bin is
872 located using a binary search. The search includes an optimization for
873 histograms with uniform ranges, and will return the correct bin immediately
874 in this case. If @math{(x,y)} is found then the function sets the
875 indices (@var{i},@var{j}) and returns @code{GSL_SUCCESS}. If
876 @math{(x,y)} lies outside the valid range of the histogram then the
877 function returns @code{GSL_EDOM} and the error handler is invoked.
880 @node 2D Histogram Statistics
881 @section 2D Histogram Statistics
883 @deftypefun double gsl_histogram2d_max_val (const gsl_histogram2d * @var{h})
884 This function returns the maximum value contained in the histogram bins.
887 @deftypefun void gsl_histogram2d_max_bin (const gsl_histogram2d * @var{h}, size_t * @var{i}, size_t * @var{j})
888 This function finds the indices of the bin containing the maximum value
889 in the histogram @var{h} and stores the result in (@var{i},@var{j}). In
890 the case where several bins contain the same maximum value the first bin
894 @deftypefun double gsl_histogram2d_min_val (const gsl_histogram2d * @var{h})
895 This function returns the minimum value contained in the histogram bins.
898 @deftypefun void gsl_histogram2d_min_bin (const gsl_histogram2d * @var{h}, size_t * @var{i}, size_t * @var{j})
899 This function finds the indices of the bin containing the minimum value
900 in the histogram @var{h} and stores the result in (@var{i},@var{j}). In
901 the case where several bins contain the same maximum value the first bin
905 @deftypefun double gsl_histogram2d_xmean (const gsl_histogram2d * @var{h})
906 This function returns the mean of the histogrammed x variable, where the
907 histogram is regarded as a probability distribution. Negative bin values
908 are ignored for the purposes of this calculation.
911 @deftypefun double gsl_histogram2d_ymean (const gsl_histogram2d * @var{h})
912 This function returns the mean of the histogrammed y variable, where the
913 histogram is regarded as a probability distribution. Negative bin values
914 are ignored for the purposes of this calculation.
917 @deftypefun double gsl_histogram2d_xsigma (const gsl_histogram2d * @var{h})
918 This function returns the standard deviation of the histogrammed
919 x variable, where the histogram is regarded as a probability
920 distribution. Negative bin values are ignored for the purposes of this
924 @deftypefun double gsl_histogram2d_ysigma (const gsl_histogram2d * @var{h})
925 This function returns the standard deviation of the histogrammed
926 y variable, where the histogram is regarded as a probability
927 distribution. Negative bin values are ignored for the purposes of this
931 @deftypefun double gsl_histogram2d_cov (const gsl_histogram2d * @var{h})
932 This function returns the covariance of the histogrammed x and y
933 variables, where the histogram is regarded as a probability
934 distribution. Negative bin values are ignored for the purposes of this
938 @deftypefun double gsl_histogram2d_sum (const gsl_histogram2d * @var{h})
939 This function returns the sum of all bin values. Negative bin values
940 are included in the sum.
943 @node 2D Histogram Operations
944 @section 2D Histogram Operations
946 @deftypefun int gsl_histogram2d_equal_bins_p (const gsl_histogram2d * @var{h1}, const gsl_histogram2d * @var{h2})
947 This function returns 1 if all the individual bin ranges of the two
948 histograms are identical, and 0 otherwise.
951 @deftypefun int gsl_histogram2d_add (gsl_histogram2d * @var{h1}, const gsl_histogram2d * @var{h2})
952 This function adds the contents of the bins in histogram @var{h2} to the
953 corresponding bins of histogram @var{h1},
954 i.e. @math{h'_1(i,j) = h_1(i,j) + h_2(i,j)}.
955 The two histograms must have identical bin ranges.
958 @deftypefun int gsl_histogram2d_sub (gsl_histogram2d * @var{h1}, const gsl_histogram2d * @var{h2})
959 This function subtracts the contents of the bins in histogram @var{h2} from the
960 corresponding bins of histogram @var{h1},
961 i.e. @math{h'_1(i,j) = h_1(i,j) - h_2(i,j)}.
962 The two histograms must have identical bin ranges.
965 @deftypefun int gsl_histogram2d_mul (gsl_histogram2d * @var{h1}, const gsl_histogram2d * @var{h2})
966 This function multiplies the contents of the bins of histogram @var{h1}
967 by the contents of the corresponding bins in histogram @var{h2},
968 i.e. @math{h'_1(i,j) = h_1(i,j) * h_2(i,j)}.
969 The two histograms must have identical bin ranges.
972 @deftypefun int gsl_histogram2d_div (gsl_histogram2d * @var{h1}, const gsl_histogram2d * @var{h2})
973 This function divides the contents of the bins of histogram @var{h1}
974 by the contents of the corresponding bins in histogram @var{h2},
975 i.e. @math{h'_1(i,j) = h_1(i,j) / h_2(i,j)}.
976 The two histograms must have identical bin ranges.
979 @deftypefun int gsl_histogram2d_scale (gsl_histogram2d * @var{h}, double @var{scale})
980 This function multiplies the contents of the bins of histogram @var{h}
981 by the constant @var{scale}, i.e. @c{$h'_1(i,j) = h_1(i,j) * \hbox{\it scale}$}
982 @math{h'_1(i,j) = h_1(i,j) scale}.
985 @deftypefun int gsl_histogram2d_shift (gsl_histogram2d * @var{h}, double @var{offset})
986 This function shifts the contents of the bins of histogram @var{h}
987 by the constant @var{offset}, i.e. @c{$h'_1(i,j) = h_1(i,j) + \hbox{\it offset}$}
988 @math{h'_1(i,j) = h_1(i,j) + offset}.
991 @node Reading and writing 2D histograms
992 @section Reading and writing 2D histograms
994 The library provides functions for reading and writing two dimensional
995 histograms to a file as binary data or formatted text.
997 @deftypefun int gsl_histogram2d_fwrite (FILE * @var{stream}, const gsl_histogram2d * @var{h})
998 This function writes the ranges and bins of the histogram @var{h} to the
999 stream @var{stream} in binary format. The return value is 0 for success
1000 and @code{GSL_EFAILED} if there was a problem writing to the file. Since
1001 the data is written in the native binary format it may not be portable
1002 between different architectures.
1005 @deftypefun int gsl_histogram2d_fread (FILE * @var{stream}, gsl_histogram2d * @var{h})
1006 This function reads into the histogram @var{h} from the stream
1007 @var{stream} in binary format. The histogram @var{h} must be
1008 preallocated with the correct size since the function uses the number of
1009 x and y bins in @var{h} to determine how many bytes to read. The return
1010 value is 0 for success and @code{GSL_EFAILED} if there was a problem
1011 reading from the file. The data is assumed to have been written in the
1012 native binary format on the same architecture.
1015 @deftypefun int gsl_histogram2d_fprintf (FILE * @var{stream}, const gsl_histogram2d * @var{h}, const char * @var{range_format}, const char * @var{bin_format})
1016 This function writes the ranges and bins of the histogram @var{h}
1017 line-by-line to the stream @var{stream} using the format specifiers
1018 @var{range_format} and @var{bin_format}. These should be one of the
1019 @code{%g}, @code{%e} or @code{%f} formats for floating point
1020 numbers. The function returns 0 for success and @code{GSL_EFAILED} if
1021 there was a problem writing to the file. The histogram output is
1022 formatted in five columns, and the columns are separated by spaces,
1026 xrange[0] xrange[1] yrange[0] yrange[1] bin(0,0)
1027 xrange[0] xrange[1] yrange[1] yrange[2] bin(0,1)
1028 xrange[0] xrange[1] yrange[2] yrange[3] bin(0,2)
1030 xrange[0] xrange[1] yrange[ny-1] yrange[ny] bin(0,ny-1)
1032 xrange[1] xrange[2] yrange[0] yrange[1] bin(1,0)
1033 xrange[1] xrange[2] yrange[1] yrange[2] bin(1,1)
1034 xrange[1] xrange[2] yrange[1] yrange[2] bin(1,2)
1036 xrange[1] xrange[2] yrange[ny-1] yrange[ny] bin(1,ny-1)
1040 xrange[nx-1] xrange[nx] yrange[0] yrange[1] bin(nx-1,0)
1041 xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,1)
1042 xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,2)
1044 xrange[nx-1] xrange[nx] yrange[ny-1] yrange[ny] bin(nx-1,ny-1)
1048 Each line contains the lower and upper limits of the bin and the
1049 contents of the bin. Since the upper limits of the each bin are the
1050 lower limits of the neighboring bins there is duplication of these
1051 values but this allows the histogram to be manipulated with
1052 line-oriented tools.
1055 @deftypefun int gsl_histogram2d_fscanf (FILE * @var{stream}, gsl_histogram2d * @var{h})
1056 This function reads formatted data from the stream @var{stream} into the
1057 histogram @var{h}. The data is assumed to be in the five-column format
1058 used by @code{gsl_histogram2d_fprintf}. The histogram @var{h} must be
1059 preallocated with the correct lengths since the function uses the sizes
1060 of @var{h} to determine how many numbers to read. The function returns 0
1061 for success and @code{GSL_EFAILED} if there was a problem reading from
1065 @node Resampling from 2D histograms
1066 @section Resampling from 2D histograms
1068 As in the one-dimensional case, a two-dimensional histogram made by
1069 counting events can be regarded as a measurement of a probability
1070 distribution. Allowing for statistical error, the height of each bin
1071 represents the probability of an event where (@math{x},@math{y}) falls in
1072 the range of that bin. For a two-dimensional histogram the probability
1073 distribution takes the form @math{p(x,y) dx dy} where,
1077 p(x,y) = n_{ij}/ (N A_{ij})
1084 p(x,y) = n_@{ij@}/ (N A_@{ij@})
1091 @math{n_@{ij@}} is the number of events in the bin which
1092 contains @math{(x,y)},
1094 @math{A_@{ij@}} is the area of the bin and @math{N} is
1095 the total number of events. The distribution of events within each bin
1096 is assumed to be uniform.
1098 @deftp {Data Type} {gsl_histogram2d_pdf}
1101 This is the number of histogram bins used to approximate the probability
1102 distribution function in the x and y directions.
1103 @item double * xrange
1104 The ranges of the bins in the x-direction are stored in an array of
1105 @math{@var{nx} + 1} elements pointed to by @var{xrange}.
1106 @item double * yrange
1107 The ranges of the bins in the y-direction are stored in an array of
1108 @math{@var{ny} + 1} pointed to by @var{yrange}.
1110 The cumulative probability for the bins is stored in an array of
1111 @var{nx}*@var{ny} elements pointed to by @var{sum}.
1117 The following functions allow you to create a @code{gsl_histogram2d_pdf}
1118 struct which represents a two dimensional probability distribution and
1119 generate random samples from it.
1121 @deftypefun {gsl_histogram2d_pdf *} gsl_histogram2d_pdf_alloc (size_t @var{nx}, size_t @var{ny})
1122 This function allocates memory for a two-dimensional probability
1123 distribution of size @var{nx}-by-@var{ny} and returns a pointer to a
1124 newly initialized @code{gsl_histogram2d_pdf} struct. If insufficient
1125 memory is available a null pointer is returned and the error handler is
1126 invoked with an error code of @code{GSL_ENOMEM}.
1129 @deftypefun int gsl_histogram2d_pdf_init (gsl_histogram2d_pdf * @var{p}, const gsl_histogram2d * @var{h})
1130 This function initializes the two-dimensional probability distribution
1131 calculated @var{p} from the histogram @var{h}. If any of the bins of
1132 @var{h} are negative then the error handler is invoked with an error
1133 code of @code{GSL_EDOM} because a probability distribution cannot
1134 contain negative values.
1137 @deftypefun void gsl_histogram2d_pdf_free (gsl_histogram2d_pdf * @var{p})
1138 This function frees the two-dimensional probability distribution
1139 function @var{p} and all of the memory associated with it.
1142 @deftypefun int gsl_histogram2d_pdf_sample (const gsl_histogram2d_pdf * @var{p}, double @var{r1}, double @var{r2}, double * @var{x}, double * @var{y})
1143 This function uses two uniform random numbers between zero and one,
1144 @var{r1} and @var{r2}, to compute a single random sample from the
1145 two-dimensional probability distribution @var{p}.
1149 @node Example programs for 2D histograms
1150 @section Example programs for 2D histograms
1151 This program demonstrates two features of two-dimensional histograms.
1152 First a 10-by-10 two-dimensional histogram is created with x and y running
1153 from 0 to 1. Then a few sample points are added to the histogram, at
1154 (0.3,0.3) with a height of 1, at (0.8,0.1) with a height of 5 and at
1155 (0.7,0.9) with a height of 0.5. This histogram with three events is
1156 used to generate a random sample of 1000 simulated events, which are
1160 @verbatiminclude examples/histogram2d.c
1165 The following plot shows the distribution of the simulated events. Using
1166 a higher resolution grid we can see the original underlying histogram
1167 and also the statistical fluctuations caused by the events being
1168 uniformly distributed over the area of the original bins.
1171 @center @image{histogram2d,3.4in}