
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: HAJJAR, Ghina
hal.structure.identifier: Biologie du fruit et pathologie [BFP]
dc.contributor.author: BENABEN, David
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: PAULHE, Nils
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: DUPERIER, Christophe
hal.structure.identifier: Institut de Génétique, Environnement et Protection des Plantes [IGEPP]
dc.contributor.author: FILANGI, Olivier
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: GIACOMONI, Franck
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: COMTE, Blandine
hal.structure.identifier: Plateforme Exploration du Métabolisme [PFEM]
dc.contributor.author: PUJOS-GUILLOT, Estelle
dc.date.conference: 2022-09-05
dc.description.abstractEn: Since the emergence of high-throughput metabolomics, a growing number of scientific communities have been performing metabolomic studies. It has therefore become crucial to standardize the reporting and sharing of metabolites. Although minimum reporting standards for analytical practices and data processing are available, there are no established standards for metabolite reporting. In this context, our objective was to review existing metabolite reporting practices in different scientific communities, both in published results and across databases.
To this end, we considered plasma metabolites reported in large-scale human studies from different communities, namely analytical chemistry, medicine and epidemiology. We focused only on metabolites reported as level 1 identifications according to the Metabolomics Standards Initiative. We applied a data curation workflow to the list of annotated metabolites given by the authors. First, we performed a manual curation that included the addition of missing identifiers and the editing of some incoherent metadata. Second, we applied an automatic query algorithm to obtain additional information from available databases, such as the InChIKey, the compact hash code of the IUPAC International Chemical Identifier. Identified metabolites were then compared between the selected studies using either the names given by the authors or the InChIKeys added after data curation.
Recurring inconsistencies were observed in metabolite reporting, both in published results and across different databases. In published results, the metabolite information was incoherent (identifiers not referring to the same isomer, metabolite name not corresponding to the molecular formula). In addition, isomers were listed with their corresponding retention times, yet without any indication of the isomers' identity.
Across databases, the cross-links provided presented incoherent information regarding nomenclature, optical isomerism, stereochemistry of asymmetric carbons, and molecular structure (acid/base, zwitterionic or canonical forms, molecules with a permanent charge), in addition to mismatches between two structurally different compounds. The evaluation of metabolite reporting across different databases (for instance HMDB, PubChem and ChEBI) was performed with the help of the Metabolomics Semantic DataLake (MSD) team. Information was calculated from the latest public versions of these databases, on a Big Data infrastructure (Apache Spark) using the Scala programming language. Based on the InChIKey, we were able to identify all incorrect metabolite matches in HMDB, PubChem and ChEBI and to categorize them as "structurally different compounds", "optical isomerism" or "structural isomerism".
Although not yet required, the InChIKey was found to be the most suitable identifier for comparing reported metabolites between studies and across databases. It is therefore recommended either to use this identifier or to perform a deep data curation when reporting identified metabolites. This work will help provide guidelines for more effective and reproducible metabolomics data sharing.
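As an illustration of the InChIKey-based comparison described in the abstract, the following is a minimal Python sketch, not the authors' Spark/Scala workflow. It assumes the standard 27-character InChIKey layout (a 14-letter connectivity block, a 10-letter block covering stereochemistry and other layers, and a 1-letter protonation flag). The function names, the optional molecular-formula arguments and the mapping onto the three categories named in the abstract are assumptions made for this example, and the sample keys should be verified against a database before reuse.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_inchikey(key):
    """Return (connectivity block, stereo/isotope hash, protonation flag)."""
    key = key.strip().upper()
    if not INCHIKEY_RE.match(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, second_block, protonation = key.split("-")
    # The first 8 letters of the second block hash the stereo and isotopic layers;
    # the last 2 letters are the standard flag and the version character.
    return connectivity, second_block[:8], protonation


def categorize(key_a, key_b, formula_a=None, formula_b=None):
    """Classify a pair of cross-linked entries (hypothetical helper).

    Formulas are optional: without them, two entries with different
    connectivity cannot be recognized as structural isomers.
    """
    conn_a, stereo_a, prot_a = split_inchikey(key_a)
    conn_b, stereo_b, prot_b = split_inchikey(key_b)
    if conn_a != conn_b:
        if formula_a is not None and formula_a == formula_b:
            return "structural isomerism"
        return "structurally different compounds"
    if stereo_a != stereo_b:
        # Assumption: the difference is stereochemical rather than isotopic.
        return "optical isomerism"
    if prot_a != prot_b:
        return "different protonation state"
    return "identical"


# Sample keys for illustration only.
L_ALANINE = "QNAYBMKLOCPYGJ-REOHSTJHSA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"
BETA_ALANINE = "UCMIRNVEIXFBKS-UHFFFAOYSA-N"

if __name__ == "__main__":
    print(categorize(L_ALANINE, D_ALANINE))
    print(categorize(L_ALANINE, BETA_ALANINE, "C3H7NO2", "C3H7NO2"))
```

Comparing the first block alone groups stereoisomers together, while comparing the full key separates them, which is why a name-based match that hides an isomer mismatch becomes visible once both entries carry an InChIKey.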
dc.language.iso: en
dc.title.en: Metabolite reporting in large-scale studies within different metabolomics communities: DO WE SPEAK THE SAME LANGUAGE?
dc.type: Other scientific communication (conference without proceedings - poster - seminar...)
dc.subject.hal: Computer Science [cs]/Bioinformatics [q-bio.QM]
dc.subject.hal: Chemical Sciences/Analytical chemistry
dc.subject.hal: Computer Science [cs]/Databases [cs.DB]
dc.subject.hal: Life Sciences [q-bio]/Public health and epidemiology
dc.subject.hal: Life Sciences [q-bio]/Food and Nutrition
bordeaux.conference.title: Analytics 2022
bordeaux.country: FR
bordeaux.conference.city: Nantes
bordeaux.peerReviewed: yes
hal.identifier: hal-03775474
hal.version: 1
hal.invited: no
hal.proceedings: no
hal.conference.end: 2022-09-08
hal.popular: no
hal.audience: International
hal.origin.link: https://hal.archives-ouvertes.fr//hal-03775474v1

