Filtering Outliers in One Step with Genetic Programming
LEGRAND, Pierrick
Université de Bordeaux [UB]
Quality control and dynamic reliability [CQFD]
Institut de Mathématiques de Bordeaux [IMB]
Language
en
Conference paper
This document was published in
PPSN 2018 - Fifteenth International Conference on Parallel Problem Solving from Nature (PPSN XV), 2018-09-08, Coimbra. vol. LNCS - Lecture Notes in Computer Science, no. 11102
Springer
English abstract
Outliers are one of the most difficult issues when dealing with real-world modeling tasks. Even a small percentage of outliers can impede a learning algorithm's ability to fit a dataset. While robust regression algorithms exist, they fail when outliers make up more than 50% of a dataset (the breakdown point). In the case of Genetic Programming (GP), robust regression has not been properly studied. In this paper we present a method that works as a filter, removing outliers from the target variable (vertical outliers). The algorithm is simple: it uses a randomly generated population of GP trees to determine which target values should be labeled as outliers. The method is highly efficient. Results show that it can return a clean dataset when contamination reaches as high as 90%, and it may be able to handle higher levels of contamination. In this study only synthetic univariate benchmarks are used to evaluate the approach, but it must be stressed that no other approach can deal with such high levels of outlier contamination while requiring such small computational effort.
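The 50% breakdown point mentioned in the abstract can be illustrated with a small sketch. It does not reproduce the paper's GP-based filter; it only compares the mean and the median as location estimates on hypothetical data. The helper `contaminate`, the clean value 1.0, and the outlier value 1000.0 are assumptions chosen for illustration.

```python
import statistics


def contaminate(clean, fraction, outlier_value):
    """Replace the last `fraction` of the values with one gross outlier value."""
    n_bad = int(round(fraction * len(clean)))
    return clean[: len(clean) - n_bad] + [outlier_value] * n_bad


# Hypothetical clean target values (not taken from the paper).
clean = [1.0] * 100

for fraction in (0.1, 0.4, 0.6, 0.9):
    data = contaminate(clean, fraction, outlier_value=1000.0)
    # The mean is pulled away by any contamination; the median stays at the
    # clean value only while outliers are a minority of the sample.
    print(fraction, statistics.mean(data), statistics.median(data))
```

In this sketch the median stays at the clean value for 10% and 40% contamination and jumps to the outlier value at 60% and 90%, which is the regime the abstract says the proposed filter is designed to handle.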
Source
Imported from HAL