Outliers detection for functional data by depth measures

Manuel Febrero, Pedro Galeano and Wenceslao González-Manteiga
Departamento de Estadística e Investigación Operativa, Universidad de Santiago de Compostela, Spain

Abstract: This paper analyzes outlier detection for functional data by means of depth measures. We provide some insights into the usefulness of looking for outliers in functional datasets. We propose a method for outlier detection based on functional depths, which relies on a bootstrap procedure and is valid for any univariate data depth. We illustrate the procedure with several Monte Carlo experiments and a real data example.

Keywords: Depths; Functional median; Functional trimmed mean; Outliers.

1 Introduction

Functional data analysis deals with situations in which the individual observed data are curves. In recent years, a lot of effort has been made to develop statistical methods for functional data analysis. For instance, principal component analysis has been studied by Dauxois, Pousse and Romain (1982), Silverman (1996) and Boente and Fraiman (2000); regression with functional regressors and real or functional responses has been studied by Cardot, Ferraty and Sarda (1999), Cuevas, Febrero and Fraiman (2003) and Cai and Hall (2006); classification and discrimination of random curves has been considered by Ferraty and Vieu (2003); and analysis of variance has been developed by Cuevas, Febrero and Fraiman (2004). Monographs on functional data analysis that include these and other related methods are Ramsay and Silverman (2004, 2005), which provide a large catalog of methods and case studies for handling functional data in fields such as economics, medicine, meteorology or growth analysis, and Ferraty and Vieu (2006), which presents a nonparametric approach for analyzing functional samples.
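The depth-based idea summarized in the abstract can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the authors' procedure. Curves are assumed to be observed on a common grid of points. The depth used is the one of Fraiman and Muniz (2001), which averages over the grid the univariate depth 1 - |1/2 - F_t(x(t))|, where F_t is the empirical distribution function of the sample at t. The cutoff is obtained by a plain bootstrap from the sample after removing the least deep fraction `trim` of curves, taking the median over `n_boot` resamples of the `alpha` percentile of the resampled depths. The function names, the parameters `trim`, `alpha`, `n_boot` and `seed`, and the synthetic data generator `make_example_sample` are all hypothetical.

```python
import math
import random
from typing import List

Curve = List[float]  # values of one curve on a common grid t_1, ..., t_m


def univariate_depth(cdf_value: float) -> float:
    """Univariate depth used by Fraiman and Muniz (2001): 1 - |1/2 - F(x)|."""
    return 1.0 - abs(0.5 - cdf_value)


def fraiman_muniz_depth(curves: List[Curve], reference: List[Curve]) -> List[float]:
    """Depth of each curve in `curves` with respect to the sample `reference`.

    At every grid point the univariate depth of x(t_k) is computed from the
    empirical CDF of the reference values at t_k; the functional depth is the
    average of these values over the grid (a discretized integral).
    """
    n, m = len(reference), len(reference[0])
    depths = []
    for x in curves:
        total = 0.0
        for k in range(m):
            cdf = sum(1 for y in reference if y[k] <= x[k]) / n
            total += univariate_depth(cdf)
        depths.append(total / m)
    return depths


def detect_outliers(curves: List[Curve], trim: float = 0.1, alpha: float = 0.01,
                    n_boot: int = 100, seed: int = 0) -> List[int]:
    """Hypothetical bootstrap rule: indices of curves whose depth falls below a cutoff."""
    n = len(curves)
    depths = fraiman_muniz_depth(curves, curves)
    # Drop the least deep fraction `trim` so suspected outliers do not enter the resamples.
    order = sorted(range(n), key=lambda i: depths[i])
    kept = [curves[i] for i in order[int(trim * n):]]
    rng = random.Random(seed)
    low_percentiles = []
    for _ in range(n_boot):
        # Resample n curves with replacement from the trimmed sample.
        boot = [rng.choice(kept) for _ in range(n)]
        boot_depths = sorted(fraiman_muniz_depth(boot, boot))
        low_percentiles.append(boot_depths[int(alpha * (n - 1))])
    # Cutoff: median (upper middle element) of the bootstrap low percentiles.
    cutoff = sorted(low_percentiles)[n_boot // 2]
    return [i for i in range(n) if depths[i] < cutoff]


def make_example_sample(n_regular: int = 30, m: int = 20, seed: int = 1) -> List[Curve]:
    """Hypothetical synthetic data: noisy sine curves plus one curve shifted upward by 5."""
    rng = random.Random(seed)
    grid = [k / (m - 1) for k in range(m)]
    curves = [[math.sin(2 * math.pi * t) + rng.gauss(0.0, 0.3) for t in grid]
              for _ in range(n_regular)]
    curves.append([math.sin(2 * math.pi * t) + 5.0 for t in grid])  # index n_regular
    return curves


if __name__ == "__main__":
    sample = make_example_sample()
    print("flagged curves:", detect_outliers(sample))
```

In the synthetic sample the shifted curve (index 30) lies above every other curve at every grid point, so its empirical CDF value is 1 everywhere and its depth is exactly 0.5, the smallest value this depth can take. Trimming before resampling is a design choice of this sketch. Without it, a gross outlier would appear in most bootstrap samples and pull the low percentile of the depths down to its own depth. Because the univariate depth is isolated in `univariate_depth`, another univariate depth could be substituted there, in line with the abstract's remark that the method is valid for any univariate data depth.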
Although the presence of outliers may have a significant impact on statistical methodology in many different ways, the analysis of outliers in functional data has seldom been addressed. Some robust estimates of the center of the functional distribution have been proposed, such as the trimmed means for functional data based on functional depths analyzed by Fraiman and Muniz (2001), which, depending on the trimming, range from the mean to the median. However, as far as we know, an outlier detection procedure for functional data has not been considered yet. In this paper we try to fill this gap by proposing such a procedure based on functional depth measures. In particular, a curve in the dataset is considered an outlier if its depth with respect to the other curves is small enough compared with the depths of the rest of the curves. A bootstrap procedure is proposed to obtain percentiles of the distribution of the functional depths of the curves, against which "small enough" is judged.

The rest of this work is organized as follows. We first review depth measures for functional data. Then, we provide a formal definition of an outlier in functional samples and propose an outlier detection procedure for functional observations based on data depths. The performance of the proposed procedure is analyzed by means of several Monte Carlo experiments and illustrated with a real data example.

Acknowledgments: The first and third authors acknowledge financial support from grant MTM2005-00820. The second author acknowledges financial support from Xunta de Galicia under the Isidro Parga Pondal Program and from Ministerio de Educación y Ciencia grant SEJ2004-03303.

References

Boente, G. and Fraiman, R. (2000). Kernel-based functional principal components. Statistics and Probability Letters, 48, 335-345.
Cai, T. T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics, in press.
Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters, 45, 11-22.
Cuevas, A., Febrero, M. and Fraiman, R. (2003). Linear functional regression: the case of fixed design and functional response. Canadian Journal of Statistics, 30, 285-300.
Cuevas, A., Febrero, M. and Fraiman, R. (2004). An anova test for functional data. Computational Statistics and Data Analysis, 47, 111-122.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal components analysis of a vector random function: some applications to statistical inference. Journal of Multivariate Analysis, 12, 136-154.
Ferraty, F. and Vieu, P. (2003). Curves discrimination: a nonparametric functional approach. Computational Statistics and Data Analysis, 44, 161-173.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: methods, theory, applications and implementations. Springer-Verlag, London.
Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data. Test, 10, 419-440.
Ramsay, J. O. and Silverman, B. W. (2005). Applied Functional Data Analysis. Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd Edition. Springer, New York.
Silverman, B. W. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics, 24, 1-24.