BARRIOT, Roland; SHERMAN, David James; DUTOUR, Isabelle

doi:10.1186/1471-2105-8-332

hal.structure.identifier	Department of Electrical Engineering-SCD [Leuven] [ESAT-SCD]
hal.structure.identifier	Centre de Bioinformatique de Bordeaux [CBIB]
dc.contributor.author	BARRIOT, Roland
hal.structure.identifier	Centre de Bioinformatique de Bordeaux [CBIB]
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Models and Algorithms for the Genome [MAGNOME]
dc.contributor.author	SHERMAN, David James
hal.structure.identifier	Centre de Bioinformatique de Bordeaux [CBIB]
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.author	DUTOUR, Isabelle
dc.date.accessioned	2024-04-15T09:56:14Z
dc.date.available	2024-04-15T09:56:14Z
dc.date.issued	2007
dc.identifier.issn	1471-2105
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198816
dc.description.abstractEn	ABSTRACT: BACKGROUND: The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth of available data, very little has been proposed in terms of formalization and optimization. Additionally, current methods mainly ignore the structure of the data, which causes redundant results. For example, when searching for enrichment in GO terms, genes can be annotated with multiple GO terms, and each annotation should be propagated to the more general terms in the Gene Ontology. Consequently, the gene sets often overlap partially or totally, and this causes the reported enriched GO terms to be both numerous and redundant, overwhelming the researcher with non-pertinent information. This situation is not unique: it arises whenever some hierarchical clustering is performed (e.g. based on the gene expression profiles), the extreme case being when genes that are neighbors on the chromosomes are considered. RESULTS: We present a generic framework to efficiently identify the most pertinent over-represented features in a set of genes. We propose a formal representation of gene sets based on the theory of partially ordered sets (posets), and give a formal definition of target set pertinence. Algorithms and compact representations of target sets are provided for the generation and the evaluation of the pertinent target sets. The relevance of our method is illustrated through the search for enriched GO annotations in the proteins involved in a multiprotein complex. The results obtained demonstrate the gain in terms of pertinence (up to 64% redundancy removed), space requirements (up to 73% less storage) and efficiency (up to 98% fewer comparisons).
CONCLUSIONS: The generic framework presented in this article provides a formal approach to adequately represent available data and efficiently search for pertinent over-represented features in a set of genes or proteins. The formalism and the pertinence definition can be directly used by most of the methods and tools currently available for feature enrichment analysis.
dc.language.iso	en
dc.publisher	BioMed Central
dc.title.en	How to decide which are the most pertinent overly-represented features during gene set enrichment analysis
dc.type	Journal article
dc.identifier.doi	10.1186/1471-2105-8-332
dc.subject.hal	Computer Science [cs]/Bioinformatics [q-bio.QM]
dc.subject.hal	Life Sciences [q-bio]/Bioinformatics, Systems Biology [q-bio.QM]
bordeaux.journal	BMC Bioinformatics
bordeaux.volume	8
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
bordeaux.issue	1
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.peerReviewed	yes
hal.identifier	inria-00202721
hal.version	1
hal.popular	no
hal.audience	International
dc.subject.it	classification
dc.subject.it	gene enrichment
dc.subject.it	gene ontology
dc.subject.it	data-mining
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00202721v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=BMC%20Bioinformatics&rft.date=2007&rft.volume=8&rft.issue=1&rft.eissn=1471-2105&rft.issn=1471-2105&rft.au=BARRIOT,%20Roland&rft.au=SHERMAN,%20David%20James&rft.au=DUTOUR,%20Isabelle&rft.genre=article
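
Illustrative note on the abstract: the redundancy it describes can be seen on a toy example. The sketch below is not the authors' framework and does not reproduce their pertinence definition. It assumes a hypothetical four-term ontology and six hypothetical genes (all identifiers are invented), propagates each annotation to the more general terms, and scores a query set with a plain hypergeometric over-representation test, which is swapped in here as a common choice rather than taken from the record.

```python
from math import comb

# Hypothetical toy ontology: child term -> parent terms (a DAG, like GO).
PARENTS = {
    "GO:leaf_a": {"GO:mid"},
    "GO:leaf_b": {"GO:mid"},
    "GO:mid": {"GO:root"},
    "GO:root": set(),
}

# Hypothetical direct annotations: gene -> most specific terms.
DIRECT = {
    "g1": {"GO:leaf_a"},
    "g2": {"GO:leaf_a"},
    "g3": {"GO:leaf_b"},
    "g4": {"GO:mid"},
    "g5": {"GO:root"},
    "g6": {"GO:root"},
}


def ancestors(term, parents):
    """Return the term together with all of its more general terms."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(direct, parents):
    """Build term -> gene set, pushing each annotation up to every ancestor."""
    term_genes = {}
    for gene, terms in direct.items():
        for term in terms:
            for anc in ancestors(term, parents):
                term_genes.setdefault(anc, set()).add(gene)
    return term_genes


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def enrichment(query, term_genes, population_size):
    """Score every term against the query set; returns term -> p-value."""
    return {
        term: hypergeom_tail(len(query & genes), len(query), len(genes), population_size)
        for term, genes in term_genes.items()
    }


TERM_GENES = propagate(DIRECT, PARENTS)
QUERY = {"g1", "g2"}
P_VALUES = enrichment(QUERY, TERM_GENES, len(DIRECT))

for term in sorted(P_VALUES, key=P_VALUES.get):
    print(term, sorted(TERM_GENES[term]), round(P_VALUES[term], 4))
```

After propagation the gene sets are nested (`GO:leaf_a` inside `GO:mid` inside `GO:root`), so all three terms contain both query genes and a naive report would list each of them. On this toy data the most specific term scores 1/15, while its ancestors score 0.4 and 1: they cover the same query genes and add nothing new, which is the kind of overlap the abstract calls redundant.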

Files in this document

Files	Size	Format	View
There are no files associated with this document.

This document appears in the following collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
