DYRKA, Witold; THIRION, Florence; NEBEL, Jean-Christophe; KOTULSKA, Malgorzata


hal.structure.identifier	Models and Algorithms for the Genome [MAGNOME]
hal.structure.identifier	Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores
dc.contributor.author	DYRKA, Witold
hal.structure.identifier	Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores
dc.contributor.author	THIRION, Florence
hal.structure.identifier	Bioinformatics & Genomic Signal Processing Research Group
dc.contributor.author	NEBEL, Jean-Christophe
hal.structure.identifier	Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores
dc.contributor.author	KOTULSKA, Malgorzata
dc.date.accessioned	2024-04-15T09:41:51Z
dc.date.available	2024-04-15T09:41:51Z
dc.date.conference	2013-09-27
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/197636
dc.description.abstractEn	Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires the expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. We have developed a probabilistic grammatical framework for problem-specific protein languages, which has already been successfully applied to recognition of ligand binding sites. The core of the model consists of a probabilistic context-free grammar (PCFG), automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training sequences. Here, we show that the PCFG approach matches state-of-the-art performance in two other tasks: classification of transmembrane helix-helix pairs and recognition of amyloidogenic peptides. First, the framework was applied to produce grammar descriptors of four classes of transmembrane helix-helix contact sites. The best classifiers reached an AUC ROC of 0.70. Second, an analogous approach was used to distinguish between amyloidogenic and non-amyloidogenic protein fragments. It yielded good results whether these fragments were isolated or within an entire protein (AUC ROC up to 0.80). Finally, an attempt to model pairing amyloidogenic fragments resulted in classifiers reaching an AUC ROC of 0.70. A significant feature of the PCFG method is that grammar rules and parse trees are human-readable, and thus could provide biologically meaningful information.
dc.language.iso	en
dc.title.en	Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides
dc.type	Other scientific communication (conference without proceedings - poster - seminar...)
dc.subject.hal	Computer Science [cs]/Bioinformatics [q-bio.QM]
dc.subject.hal	Life Sciences [q-bio]/Bioinformatics, Systems Biology [q-bio.QM]
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.conference.title	11th Workshop on Bioinformatics and 6th Symposium of the Polish Bioinformatics Society
bordeaux.country	PL
bordeaux.conference.city	Wroclaw
bordeaux.peerReviewed	yes
hal.identifier	hal-00937763
hal.version	1
hal.invited	no
hal.proceedings	no
hal.conference.end	2013-09-29
hal.popular	no
hal.audience	National
hal.origin.link	https://hal.archives-ouvertes.fr//hal-00937763v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.au=DYRKA,%20Witold&THIRION,%20Florence&NEBEL,%20Jean-Christophe&KOTULSKA,%20Malgorzata&rft.genre=conference
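
The abstract describes scoring protein fragments with a probabilistic context-free grammar. As a rough illustration of how such a score can be computed, the sketch below runs the standard inside (probabilistic CYK) algorithm on a grammar in Chomsky normal form. Everything specific in it is an assumption made for the example: the toy rules, their probabilities, the nonterminal names (S, X, A, B) and the function name inside_probability are hypothetical and are not taken from the authors' framework, which infers its grammars with a genetic algorithm. The toy grammar pairs each residue from {L, V} with a later residue from {K, E} in a nested fashion, a dependency that needs at least a context-free grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustration only,
# not the grammar inferred in the poster).
# Binary rules: (parent, left child, right child) -> probability
BINARY_RULES = {
    ("S", "A", "X"): 0.4,  # open a pair, then continue nesting
    ("S", "A", "B"): 0.6,  # innermost pair
    ("X", "S", "B"): 1.0,  # close the pair opened by the enclosing S
}
# Lexical rules: (parent, residue) -> probability
LEXICAL_RULES = {
    ("A", "L"): 0.5,
    ("A", "V"): 0.5,
    ("B", "K"): 0.7,
    ("B", "E"): 0.3,
}


def inside_probability(sequence, start="S"):
    """Return P(start derives sequence) under the toy PCFG above."""
    n = len(sequence)
    if n == 0:
        return 0.0
    # chart[(i, j)][N] = probability that nonterminal N derives sequence[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 are filled from the lexical rules.
    for i, residue in enumerate(sequence):
        for (parent, symbol), prob in LEXICAL_RULES.items():
            if symbol == residue:
                chart[(i, i + 1)][parent] += prob
    # Longer spans sum over every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), prob in BINARY_RULES.items():
                    p_left = chart[(i, k)].get(left, 0.0)
                    p_right = chart[(k, j)].get(right, 0.0)
                    if p_left and p_right:
                        chart[(i, j)][parent] += prob * p_left * p_right
    return chart[(0, n)].get(start, 0.0)
```

A sequence that respects the nesting, such as "LVEK", receives a non-zero probability, while one that breaks it, such as "LVK", receives zero. In a classification setting, a score like this would typically be compared against a threshold or a null model, and sweeping that threshold is what produces an ROC curve.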

File(s) in this document

Files	Size	Format	View
There are no files associated with this document.

This document appears in the following collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
