Université Paris 6 Pierre et Marie Curie · Université Paris 7 Denis Diderot · CNRS U.M.R. 7599 Probabilités et Modèles Aléatoires

### Aggregated estimators and empirical complexity for least square regression

Author(s):

MSC Classification Code(s):

• 62G08 Nonparametric regression
• 94A17 Measures of information, entropy
Abstract: Numerous empirical results have shown that combining regression procedures can be a very efficient method. This work provides PAC bounds for the $L^2$ generalization error of such methods. The interest of these bounds is twofold. First, for any aggregating procedure, they give a bound on the expected risk depending on the empirical risk and on the empirical complexity, measured by the Kullback-Leibler divergence between the aggregating distribution $\hat{\rho}$ and a prior distribution $\pi$, and by the empirical mean of the variance of the regression functions under the probability $\hat{\rho}$. Secondly, by structural risk minimization, we derive an aggregating procedure which takes advantage of the unknown properties of the best mixture $\tilde{f}$: when the best convex combination $\tilde{f}$ of $d$ regression functions belongs to the $d$ initial functions (i.e. when combining does not make the bias decrease), the convergence rate is of order $(\log d)/N$. In the worst case, our combining procedure achieves a convergence rate of order $\sqrt{(\log d)/N}$, which is known to be optimal in a uniform sense when $d > \sqrt{N}$ (see [9,13]). In support vector machines, one has to solve an $N$-dimensional linearly constrained quadratic problem; in our regression procedure, we have an $N$-dimensional unconstrained minimization problem. As in AdaBoost, the mixture posterior distribution tends to favor functions which disagree with the mixture on mispredicted points. Our algorithm is tested on artificial classification data (which have also been used for testing other boosting methods, such as AdaBoost).
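To make the quantities in the abstract concrete, the following is a minimal sketch of one standard instance of such combining procedures: exponentially weighted aggregation of $d$ regression functions, together with the Kullback-Leibler divergence $K(\hat{\rho}, \pi)$ that measures the empirical complexity. The weighting scheme, the temperature parameter `beta`, and all names here are illustrative assumptions, not the paper's exact procedure.

```python
import math

def exponential_weights(emp_risks, prior, beta):
    """Aggregating distribution rho(j) proportional to prior(j) * exp(-beta * r_j),
    where r_j is the empirical risk of the j-th regression function.
    (Illustrative scheme; beta is a free temperature parameter.)"""
    logits = [math.log(p) - beta * r for p, r in zip(prior, emp_risks)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(rho, prior):
    """Kullback-Leibler divergence K(rho, pi): the empirical complexity term."""
    return sum(r * math.log(r / p) for r, p in zip(rho, prior) if r > 0)

# Toy example: d = 3 regression functions with a uniform prior.
emp_risks = [0.30, 0.10, 0.25]   # hypothetical empirical L^2 risks
prior = [1 / 3, 1 / 3, 1 / 3]
rho = exponential_weights(emp_risks, prior, beta=10.0)
complexity = kl_divergence(rho, prior)
```

The lowest-risk function receives the largest weight, and the complexity term $K(\hat{\rho}, \pi)$ grows as $\hat{\rho}$ concentrates away from the prior, which is exactly the trade-off that the PAC bound balances against the empirical risk.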