Université Paris 6
Pierre et Marie Curie
 
Université Paris 7
Denis Diderot

 
CNRS U.M.R. 7599
 
``Probabilités et Modèles Aléatoires'' 

Gibbs Histograms


This software implements the algorithm described in

O. Catoni Data compression and adaptive histograms, International Conference on Foundations of Computational Mathematics in honor of Professor Steve Smale's 70th Birthday July 13-17 2000

It is distributed under the GNU Lesser General Public License.

Package Download

Download the file gibbsHist.tar.gz and unpack it where you mean to install the package with the command tar xvzf gibbsHist.tar.gz. Then go to gibbsHist/src and execute make. After this you should be ready to run the tclsh8.0 scripts located in gibbsHist/data. You need tclsh8.0, however adapting the scripts to another language, such as bash, should be straightforward. You need also gnuplot, to output the graphics. This software has been developped under Suse Linux : I did not test it on other Unix platforms, although I do not see why it should not work as well (provided the proper utilities, such as gnumake and gnuplot, are installed).

Package Description

Under the directory gibbsHist, you will find the following subdirectories :

A mathematical description of the implemented algorithms and explanations about the meaning of the output are given in the above mentionned preprint.

Application Programming Interface

The program manipulates five data structures, defined in precode.h :
typedef struct {
    int size;
    double *first;
} Sample;
/* contains the size and values of a sample of points in the unit interval */
 
/* Warning : d->first has size d->size+1 , and d->first[0] = 0*/
typedef struct {
    int size;
    double *first;
    double *dens;
} Dist;
/* Defines the distribution of a histogram with dist->size bins of constant width.
 * dist->first is an array containing the cumulated distribution
 * function,
 * whereas dist->dens is an array of density values with respect to the
 * Lebesgue measure.
*/
 
typedef struct {
    int depth;
    int *first;
} Count;
/* a binary tree of counters.
 * tree nodes are indexed by integers, 1 for the root, 2 for its left son,
 * 3 for its right son, 4 for the left son of its left son etc.
*/
 
typedef struct {
    int depth;
    double *first;
} Weight;
/* a binary tree of weights, could represent different things : conditional
 * probability densities, logarithms of conditional densities ...
*/
 
/* possibly irregular histogram distribution */
/* Warning : dens->first has size dens->size+1 , and dens->first[0] = 0*/
typedef struct {
    int size;
    double *first;
    double *dens;
} Dens;
/* It sounds like Dist but is not used in the same way :
 * dens->first is an array of bin boundaries, whereas dist->dens contains
 * the corresponding densities. Unlike Dist, Dens represents a histogram
 * with variable bin widths.
*/
The following functions are implemented : The io functions are straightforward. Note that the output functions produce ascii files which do not keep all the precision of the data. Therefore, they should not be used in their present state to interface different applications, but only for debugging or to produce indicative graphics.
Contact author at : catoni@ccr.jussieu.fr