Université Paris 6
Pierre et Marie Curie
Université Paris 7
Denis Diderot

CNRS U.M.R. 7599
``Probabilités et Modèles Aléatoires''

How many bins should be put in a regular histogram


MSC Classification Code(s):

Abstract: Given an $n$-sample from some unknown density $f$ on $[0,1]$, it is easy to construct a histogram of the data based on a given partition of $[0,1]$. Much less is known about how to choose the partition optimally, especially when the data set is not large, even if one restricts attention to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations, and they often involve smoothness properties of $f$. Our purpose in this paper is to give a simple, fully automatic method for choosing the number of bins of the partition from the data. It is based on a nonasymptotic evaluation, due to Castellan, of the performance of penalized maximum likelihood estimators in some exponential families, and on extensive simulations that allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
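The selection principle described in the abstract can be illustrated with a minimal Python sketch. It assumes data in $[0,1]$ and chooses the number $D$ of equal-width bins by maximizing the histogram log-likelihood minus a penalty. The abstract does not state the penalty, so everything specific here is an assumption made for illustration: the function names, the default penalty $D - 1 + (\log D)^{2.5}$, and the search range $1 \le D \le n/\log n$ should all be checked against the full paper before use.

```python
import math


def log_likelihood(data, d):
    """Maximized log-likelihood of the regular histogram with d bins on [0, 1]."""
    n = len(data)
    counts = [0] * d
    for x in data:
        # Clamp so that x == 1.0 falls in the last bin.
        counts[min(int(x * d), d - 1)] += 1
    # The histogram density on bin j is d * N_j / n; empty bins contribute 0.
    return sum(c * math.log(d * c / n) for c in counts if c > 0)


def default_penalty(d):
    """ASSUMED penalty shape; not given in the abstract, verify against the paper."""
    return d - 1 + math.log(d) ** 2.5


def choose_bins(data, penalty=default_penalty, d_max=None):
    """Return the d in 1..d_max maximizing log-likelihood minus penalty."""
    n = len(data)
    if d_max is None:
        # ASSUMED search range: d <= n / log(n), with a guard for tiny samples.
        d_max = max(1, int(n / math.log(n))) if n >= 2 else 1
    return max(range(1, d_max + 1),
               key=lambda d: log_likelihood(data, d) - penalty(d))


if __name__ == "__main__":
    import random

    random.seed(0)
    # Hypothetical sample: 25 draws from a Beta(2, 5) density on [0, 1].
    sample = [random.betavariate(2, 5) for _ in range(25)]
    print("selected number of bins:", choose_bins(sample))
```

Passing the penalty as a parameter keeps the sketch usable with whatever penalty the paper actually recommends: only `default_penalty` needs to change.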

Keywords: Regular histogram; density estimation; penalized maximum likelihood; model selection

Date: 2002-04-12

Preprint number: PMA-721

PDF file: PMA-721.pdf