Outliers detection for functional data by depth measures

Manuel Febrero¹, Pedro Galeano¹ and Wenceslao González-Manteiga¹
¹ Departamento de Estadística e Investigación Operativa, Universidad de Santiago de Compostela, Spain
Abstract: This paper analyzes outlier detection for functional data by means of
depth measures. We provide some insights into the usefulness of looking for outliers
in functional datasets. We propose an outlier detection method based on functional
depths, which relies on a bootstrap procedure and is valid for any univariate
data depth. We illustrate the procedure with several Monte Carlo experiments
and a real data example.
Keywords: Depths; Functional median; Functional Trimmed mean; Outliers.
1 Introduction
Functional data analysis deals with situations in which the individual observed data are curves. In recent years, a lot of effort has been made
to develop statistical methods for functional data analysis. For instance,
principal component analysis has been analyzed by Dauxois, Pousse and
Romain (1982), Silverman (1996) and Boente and Fraiman (2000); regression with functional regressors and real and functional responses has been
studied by Cardot, Ferraty and Sarda (1999), Cuevas, Febrero and Fraiman
(2003) and Cai and Hall (2006); classification and discrimination of random curves have been considered by Ferraty and Vieu (2003); and analysis
of variance has been developed by Cuevas, Febrero and Fraiman (2004).
Monographs on functional data analysis covering these and other related
methods include Ramsay and Silverman (2004, 2005), which provide a large
catalog of methods and case studies for handling functional data in fields
such as economics, medicine, meteorology and growth analysis, and Ferraty
and Vieu (2006), which presents a nonparametric approach to analyzing
functional samples.
Although the presence of outliers may have a significant impact on statistical methodology in many different ways, the analysis of outliers in
functional data has seldom been addressed. For instance, some robust estimates of the center of the functional distribution have been proposed, such
as trimmed means for functional data based on functional depths, analyzed
by Fraiman and Muniz (2001), which, depending on the trimming, range
from the mean to the median.
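
To make the idea of a depth-based trimmed mean concrete, the following is a minimal sketch, not the implementation of Fraiman and Muniz (2001). It assumes that all curves are observed on a common, equally spaced grid, and it uses an integrated univariate depth of the form 1 - |1/2 - F(x(t))|, where F is the pointwise empirical distribution function. The function names and the trimming parameter `alpha` are illustrative choices.

```python
import numpy as np


def fraiman_muniz_depth(curves):
    """Integrated univariate depth of each curve.

    curves: array of shape (n_curves, n_points); all curves are assumed
    to be observed on the same equally spaced grid.
    """
    # ecdf[i, t] = proportion of curves whose value at grid point t is
    # less than or equal to the value of curve i at t.
    ecdf = (curves[None, :, :] <= curves[:, None, :]).mean(axis=1)
    # Pointwise depth is largest when the empirical cdf equals 1/2.
    pointwise = 1.0 - np.abs(0.5 - ecdf)
    # Averaging over the grid approximates the integral over the domain.
    return pointwise.mean(axis=1)


def trimmed_mean(curves, alpha):
    """Pointwise mean of the deepest curves, dropping a fraction alpha."""
    depths = fraiman_muniz_depth(curves)
    n = curves.shape[0]
    # Always keep at least one curve (the deepest one).
    n_keep = max(1, n - int(np.floor(alpha * n)))
    # Indices of the n_keep curves with the largest depth.
    keep = np.argsort(depths)[::-1][:n_keep]
    return curves[keep].mean(axis=0)


# Hypothetical toy data: five constant curves, one far from the rest.
grid = np.linspace(0.0, 1.0, 5)
levels = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
curves = levels[:, None] + 0.0 * grid[None, :]
print(fraiman_muniz_depth(curves))
print(trimmed_mean(curves, alpha=0.25))
```

With `alpha = 0` the function returns the ordinary pointwise mean. As `alpha` approaches 1, only the deepest curve is retained, which plays the role of a functional median.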
As far as we know, an outlier detection procedure for functional data has
not yet been considered. In this paper we try to fill this gap by proposing
such a procedure based on functional depth measures. In particular, a curve
in the dataset is considered an outlier if its depth with respect to the
other curves is small compared with the depths of the remaining curves. A bootstrap
procedure is proposed to obtain percentiles of the distribution of the functional
depths of the curves.
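
The following sketch illustrates the general idea of flagging curves whose depth falls below a bootstrap-estimated percentile. It is a simplified illustration under stated assumptions, not the exact procedure proposed in this paper: it uses a plain (unsmoothed) bootstrap of the curves, an integrated univariate depth as the depth measure, and the median of the bootstrap percentiles as the cutoff. The names `bootstrap_cutoff` and `detect_outliers` and the defaults `n_boot=200` and `level=0.01` are hypothetical choices.

```python
import numpy as np


def integrated_depth(curves):
    """Integrated univariate depth; curves has shape (n_curves, n_points)."""
    # ecdf[i, t] = proportion of curves at or below curve i at grid point t.
    ecdf = (curves[None, :, :] <= curves[:, None, :]).mean(axis=1)
    return (1.0 - np.abs(0.5 - ecdf)).mean(axis=1)


def bootstrap_cutoff(curves, depth=integrated_depth, n_boot=200,
                     level=0.01, seed=0):
    """Median over bootstrap samples of the `level` quantile of the depths."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    quantiles = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole curves with replacement.
        sample = curves[rng.integers(0, n, size=n)]
        quantiles[b] = np.quantile(depth(sample), level)
    return float(np.median(quantiles))


def detect_outliers(curves, depth=integrated_depth, **kwargs):
    """Return indices of curves whose depth is below the bootstrap cutoff."""
    depths = depth(curves)
    cutoff = bootstrap_cutoff(curves, depth=depth, **kwargs)
    return np.flatnonzero(depths < cutoff), depths, cutoff


# Hypothetical simulated data: noisy sine curves, one shifted upwards.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 20)
curves = np.sin(2 * np.pi * t)[None, :] + rng.normal(0.0, 0.1, size=(30, 20))
curves[0] += 5.0
flagged, depths, cutoff = detect_outliers(curves, n_boot=100)
print("cutoff:", cutoff)
print("flagged curves:", flagged)
```

The depth function is passed as an argument so that any other functional depth can be substituted. Whether a given curve is flagged depends on the chosen level, the number of bootstrap samples and the depth used.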
The rest of this work is organized as follows. We first review depth measures for functional data. Then, we provide a formal definition of outlier
in functional samples and propose an outlier detection procedure for functional observations based on data depths. The performance of the proposed
procedure is analyzed by means of several Monte Carlo experiments, and
it is illustrated with a real data example.
Acknowledgments: The first and the third authors acknowledge financial support from grant MTM2005-00820. The second author acknowledges
financial support by Xunta de Galicia under the Isidro Parga Pondal Program and Ministerio de Educación y Ciencia grant SEJ2004-03303.
References
Boente, G. and Fraiman, R. (2000). Kernel-based functional principal components. Statistics and Probability Letters, 48, 335-345.
Cai, T. T. and Hall, P. (2006). Prediction in functional linear regression.
Annals of Statistics, In press.
Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters, 45, 11-22.
Cuevas, A., Febrero, M. and Fraiman, R. (2003). Linear functional regression: the case of fixed design and functional response. Canadian Journal of Statistics, 30, 285-300.
Cuevas, A., Febrero, M. and Fraiman, R. (2004). An anova test for functional data. Computational Statistics and Data Analysis, 47, 111-122.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the
principal components analysis of a vector random function: some applications to statistical inference. Journal of Multivariate Analysis,
12, 136-154.
Ferraty, F. and Vieu, P. (2003). Curves discrimination: a nonparametric
functional approach. Computational Statistics and Data Analysis, 44,
161-173.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis:
methods, theory, applications and implementations. Springer-Verlag,
London.
Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data.
Test, 10, 419-440.
Ramsay, J. O. and Silverman, B. W. (2005). Applied Functional Data Analysis. Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd
Edition. Springer, New York.
Silverman, B. W. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics, 24, 1-24.