Computing WSBM conditional marginals with Tensor-Train decomposition
ABOUABDALLAH, Mohamed
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Voir plus >
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
ABOUABDALLAH, Mohamed
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
FRANC, Alain
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
< Réduire
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Langue
en
Document de travail - Pré-publication
Résumé en anglais
The Weighted Stochastic Block Model (WSBM) is a statistical model for unsupervised clustering of individuals based on a pairwise distance matrix. The probabilities of group membership are computed as unary marginals of ...Lire la suite >
The Weighted Stochastic Block Model (WSBM) is a statistical model for unsupervised clustering of individuals based on a pairwise distance matrix. The probabilities of group membership are computed as unary marginals of the joint distribution conditionally to the distances whose exact evaluation with exact enumeration is out of reach beyond a few individuals. This distribution is factored into binary factors and we propose to build its exact Tensor-Train (TT) decomposition from the SVD of each factor. It has the advantage to lead to an expression with variables separation. We present how to exploit this decomposition to compute unary and binary conditional marginals of group membership. They are expressed without approximation as products of matrices involved in the TT decomposition. The implementation of the procedure faces several numerical challenges. First, the dimensions of the matrices involved grow faster than exponentially with the number of variables. We bypass this difficulty by using the format of TT-matrices. Second, the TT-rank of the products grows exponentially and we use a numerical approximation of matrices product that guarantees a low TT-rank, the rounding. We compare the TT approach with two classical inference methods, Mean-Field and the Gibbs Sampler, on the problem of binary conditional marginal inference for WSBM with various distances structures and up to fifty variables. The results lead to recommend the TT approach for its accuracy and reasonable computing time. Further researches should be devoted to the numerical difficulties for controlling the rank in rounding, to be able to deal with larger problems.< Réduire
Mots clés en anglais
Binary marginals
Weighted Stochastic Block Model
Variables separation
Tensor-Train format
low rank approximation
TT matrices
Origine
Importé de halUnités de recherche