Computing WSBM marginals with Tensor-Train decomposition
ABOUABDALLAH, Mohamed
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
See more >
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
ABOUABDALLAH, Mohamed
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
FRANC, Alain
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
< Reduce
Biodiversité, Gènes & Communautés [BioGeCo]
Pleiade, from patterns to models in computational biodiversity and biotechnology [PLEIADE]
Language
en
Document de travail - Pré-publication
English Abstract
The Weighted Stochastic Block Model (WSBM) is a statistical model for unsupervised clustering of individuals based on a pairwise distance matrix. The probabilities of group membership are computed as unary marginals of the ...Read more >
The Weighted Stochastic Block Model (WSBM) is a statistical model for unsupervised clustering of individuals based on a pairwise distance matrix. The probabilities of group membership are computed as unary marginals of the joint conditional distribution of the WSBM, whose exact evaluation with brute force is out of reach beyond a few individuals. We propose to build an exact Tensor-Train (TT) decomposition of the multivariate joint distribution, from the SVD of each binary factor of a WSBM, which leads to variables separation. We present how to exploit this decomposition to compute unary and binary marginals. They are expressed without approximation as products of matrices involved in the TT decomposition. However, the implementation of the procedure faces several numerical challenges. First, the dimensions of the matrices involved grow faster than exponentially with the number of variables. We bypass this difficulty by using the format of TT-matrices. Second, the TT-rank of the products grows exponentially. Then, we use a numerical approximation of matrices product that guarantees a low TT-rank, the rounding. We compare the TT approach with two classical inference methods, the Mean-Field approximation and the Gibbs Sampler, on the problem of binary marginal inference for WSBM with 1 various distances structures and up to fifty variables. The results lead to recommend the TT approach for its accuracy and reasonable computing time. Further researches should be devoted to the numerical difficulties for controlling the rank in rounding, to be able to deal with larger problems.Read less <
English Keywords
Binary marginals
Weighted Stochastic Block Model
Variables separation
Tensor-Train format
low rank approximation
TT matrices
Origin
Hal imported