SOAP: Semantic Outliers Automatic Preprocessing
hal.structure.identifier | Instituto Tecnológico de Tijuana = Tijuana Institute of Technology [Tijuana] | |
dc.contributor.author | TRUJILLO, Leonardo | |
hal.structure.identifier | Instituto Tecnológico de Tijuana = Tijuana Institute of Technology [Tijuana] | |
dc.contributor.author | LOPEZ, Uriel | |
hal.structure.identifier | Quality control and dynamic reliability [CQFD] | |
dc.contributor.author | LEGRAND, Pierrick | |
dc.date.accessioned | 2024-04-04T02:55:00Z | |
dc.date.available | 2024-04-04T02:55:00Z | |
dc.date.issued | 2020 | |
dc.identifier.issn | 0020-0255 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/192323 | |
dc.description.abstractEn | Genetic Programming (GP) is an evolutionary algorithm for the automatic generation of symbolic models expressed as syntax trees. GP has been successfully applied in many domains, but most research in this area has not considered the presence of outliers in the training set. Outliers make supervised learning problems difficult, and sometimes impossible, to solve. For instance, robust regression methods cannot handle more than 50% outlier contamination, a limit referred to as their breakdown point. This paper studies problems where outlier contamination is high, reaching up to 90%, an extreme case that can appear in some domains. This work shows, for the first time, that a random population of GP individuals can detect outliers in the output variable. Building on this property, a new filtering algorithm is proposed, called Semantic Outlier Automatic Preprocessing (SOAP), which can be used with any learning algorithm to differentiate between inliers and outliers. Since the method uses a GP population, the algorithm can be carried out at no extra cost in a GP symbolic regression system. The approach is the only method that can perform such an automatic cleaning of a dataset without incurring an exponential cost as the percentage of outliers in the dataset increases. | |
dc.language.iso | en | |
dc.publisher | Elsevier | |
dc.title.en | SOAP: Semantic Outliers Automatic Preprocessing | |
dc.type | Journal article | |
dc.identifier.doi | 10.1016/j.ins.2020.03.071 | |
dc.subject.hal | Computer Science [cs]/Artificial Intelligence [cs.AI] | |
dc.description.sponsorshipEurope | Analysis and classification of mental states of vigilance with evolutionary computation | |
bordeaux.journal | Information Sciences | |
bordeaux.page | 20 | |
bordeaux.volume | 526 | |
bordeaux.hal.laboratories | Institut de Mathématiques de Bordeaux (IMB) - UMR 5251 | * |
bordeaux.issue | 81-101 | |
bordeaux.institution | Université de Bordeaux | |
bordeaux.institution | Bordeaux INP | |
bordeaux.institution | CNRS | |
bordeaux.peerReviewed | yes | |
hal.identifier | hal-02551161 | |
hal.version | 1 | |
hal.popular | no | |
hal.audience | International | |
hal.origin.link | https://hal.archives-ouvertes.fr//hal-02551161v1 | |