
hal.structure.identifierInstitute of Biomedical Engineering and Instrumentation
hal.structure.identifierModels and Algorithms for the Genome [MAGNOME]
dc.contributor.authorDYRKA, Witold
hal.structure.identifierFaculty of Science
dc.contributor.authorNEBEL, Jean-Christophe
hal.structure.identifierInstitute of Biomedical Engineering and Instrumentation
dc.contributor.authorKOTULSKA, Malgorzata
dc.date.accessioned2024-04-15T09:42:02Z
dc.date.available2024-04-15T09:42:02Z
dc.date.issued2013
dc.identifier.issn1748-7188
dc.identifier.urihttps://oskar-bordeaux.fr/handle/20.500.12278/197649
dc.description.abstractEnBackground: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires at least the expressive power of context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best classifier reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed that they can represent structural features of helix-helix contact sites. Conclusions: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily modeling long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus, they could provide biologically meaningful information for molecular biologists.
dc.language.isoen
dc.publisherBioMed Central
dc.subject.enProbabilistic context-free grammar
dc.subject.enGrammar inference
dc.subject.enGenetic algorithm
dc.subject.enHelix-helix contact
dc.subject.enProtein structure prediction
dc.title.enProbabilistic grammatical model for helix‐helix contact site classification
dc.typeJournal article
dc.identifier.doi10.1186/1748-7188-8-31
dc.subject.halLife Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular biology
bordeaux.journalAlgorithms for Molecular Biology
bordeaux.page31
bordeaux.volume8
bordeaux.hal.laboratoriesLaboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.issue1
bordeaux.institutionUniversité de Bordeaux
bordeaux.institutionBordeaux INP
bordeaux.institutionCNRS
bordeaux.peerReviewedyes
hal.identifierhal-00925929
hal.version1
hal.popularno
hal.audienceInternational
hal.origin.linkhttps://hal.archives-ouvertes.fr//hal-00925929v1

