TRUJILLO, Leonardo; LOPEZ, Uriel; LEGRAND, Pierrick

doi:10.1016/j.ins.2020.03.071

hal.structure.identifier	Instituto Tecnológico de Tijuana = Tijuana Institute of Technology [Tijuana]
dc.contributor.author	TRUJILLO, Leonardo
hal.structure.identifier	Instituto Tecnológico de Tijuana = Tijuana Institute of Technology [Tijuana]
dc.contributor.author	LOPEZ, Uriel
hal.structure.identifier	Quality control and dynamic reliability [CQFD]
dc.contributor.author	LEGRAND, Pierrick
dc.date.accessioned	2024-04-04T02:55:00Z
dc.date.available	2024-04-04T02:55:00Z
dc.date.issued	2020
dc.identifier.issn	0020-0255
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/192323
dc.description.abstractEn	Genetic Programming (GP) is an evolutionary algorithm for the automatic generation of symbolic models expressed as syntax trees. GP has been successfully applied in many domains, but most research in this area has not considered the presence of outliers in the training set. Outliers make supervised learning problems difficult, and sometimes impossible, to solve. For instance, robust regression methods cannot handle more than 50% outlier contamination, a limit referred to as their breakdown point. This paper studies problems where outlier contamination is high, reaching levels of up to 90%, extreme cases that can appear in some domains. This work shows, for the first time, that a random population of GP individuals can detect outliers in the output variable. Based on this property, a new filtering algorithm called Semantic Outliers Automatic Preprocessing (SOAP) is proposed, which can be used with any learning algorithm to differentiate between inliers and outliers. Since the method uses a GP population, the algorithm can be carried out at no extra cost in a GP symbolic regression system. The approach is the only method that can perform such an automatic cleaning of a dataset without incurring an exponential cost as the percentage of outliers in the dataset increases.
dc.language.iso	en
dc.publisher	Elsevier
dc.title.en	SOAP: Semantic Outliers Automatic Preprocessing
dc.type	Journal article
dc.identifier.doi	10.1016/j.ins.2020.03.071
dc.subject.hal	Computer Science [cs]/Artificial Intelligence [cs.AI]
dc.description.sponsorshipEurope	Analysis and classification of mental states of vigilance with evolutionary computation
bordeaux.journal	Information Sciences
bordeaux.page	20
bordeaux.volume	526
bordeaux.hal.laboratories	Institut de Mathématiques de Bordeaux (IMB) - UMR 5251
bordeaux.issue	81-101
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.peerReviewed	yes
hal.identifier	hal-02551161
hal.version	1
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//hal-02551161v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Information%20Sciences&rft.date=2020&rft.volume=526&rft.issue=81-101&rft.spage=20&rft.epage=20&rft.eissn=0020-0255&rft.issn=0020-0255&rft.au=TRUJILLO,%20Leonardo&LOPEZ,%20Uriel&LEGRAND,%20Pierrick&rft.genre=article

Files in this document

Files	Size	Format	View
There are no files associated with this document.

This document appears in the following collection(s)

Institut de Mathématiques de Bordeaux (IMB) - UMR 5251
