Statistical learning for OTUs identification
ABOUABDALLAH, Mohamed Anwar
Unité de Mathématiques et Informatique Appliquées de Toulouse [MIAT INRAE]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
FRANC, Alain
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Language
en
Conference paper
This document was published in
ISEC 2020 - International Statistical Ecology Conference, 2020-06-22, Sydney / Virtual.
English abstract
Molecular-based inventories are now made routinely with metabarcoding. However, comparisons with optical-based inventories are scarce for micro-organisms. Here, we study whether a morphology-based taxonomy and unsupervised clustering of amplicons on the same dataset provide the same picture of diversity. For OTU building, we implement both hierarchical agglomerative clustering (HAC) and a novel approach based on Stochastic Block Models (SBM). Plants are among the best-known organisms (both botanically and through molecular phylogenies). Therefore, we use a dataset of amplicons (trnH-psbA) from 1502 trees in an experimental plot in French Guiana, covering a large spectrum of botanical diversity and identified by field botanists. We study whether the convergence or divergence of the three classifications depends on the taxonomic level addressed (order, family, genus). We apply HAC and test several aggregation methods. We apply SBM with a Poisson distribution to model the pattern of distances between sequences. Finally, we compare the three classifications by building contingency tables. Preliminary results show that the convergence of the three methods depends on the distribution of intra- and inter-class distances. For instance, in Magnoliales they are well differentiated and convergence is very good, whereas for Gentianales convergence is poor and distances are not well differentiated. Moreover, the SBM provides a matrix of parameters that quantify the connections between classes. It is an excellent candidate for a multivariate index of diversity, richer than a scalar one. Finally, we will discuss how this approach scales to metabarcoding.
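The comparison step mentioned in the abstract, cross-tabulating two classifications of the same specimens, can be illustrated with a minimal sketch. This is not the authors' code: the function name `contingency_table` and the toy labels below are hypothetical and made up for illustration, and the cluster labels stand in for the output of any clustering method (HAC or SBM).

```python
from collections import Counter


def contingency_table(labels_a, labels_b):
    """Cross-tabulate two classifications of the same items.

    Returns the sorted row labels, the sorted column labels, and a
    matrix of counts where table[i][j] is the number of items placed in
    rows[i] by the first classification and cols[j] by the second.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must label the same items")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    # Counter returns 0 for pairs that never occur together.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Toy labels, invented for illustration only (not data from the study).
botanical = ["Magnoliales"] * 3 + ["Gentianales"] * 3
clusters = [0, 0, 0, 1, 2, 1]  # hypothetical output of a clustering method

rows, cols, table = contingency_table(botanical, clusters)
for name, counts_row in zip(rows, table):
    print(name, counts_row)
```

In this toy case one taxon falls entirely into a single cluster while the other is split across two, which is the kind of convergence or divergence a contingency table makes visible.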
Origin
Imported from HAL