MSpangepop: Simulating complex structural variants under advanced demographic scenarios using the coalescent
Language
en
Other scientific communication (conference without proceedings, poster, seminar...)
This item was published in
JOBIM 2025 - Journées Ouvertes en Biologie, Informatique et Mathématiques, 2025-07-08, Bordeaux. 2025, p. 1-1
English Abstract
Background
Structural variants (SVs), large genomic changes such as insertions, deletions and inversions, are known to contribute significantly to genetic variation. Despite their impact on phenotypic diversity and adaptation, they remain relatively understudied [1]. The recent emergence of variation graphs provides an efficient way to integrate these large variants in population and pangenome studies, capturing the full genomic landscape of genes within clades.
Simulating genomes is essential for evaluating analytical methods and improving genetic studies by providing controlled datasets for testing hypotheses. Although several programs exist to simulate SVs genome-wide (e.g. [2]), none are able to account for complex evolutionary scenarios such as speciation events, resulting in unrealistic star-like locus genealogies. On the other hand, coalescent-based simulators such as MSprime [3], which can simulate a wide variety of ancestry and demographic models, do not currently support the simulation of large and complex SVs.
Results
To fill this gap and allow biologists to directly produce simulated variation graphs, we developed MSpangepop, a Python-based simulation workflow inspired by VISOR [2] and managed by Snakemake [4]. Starting with a reference sequence and a demographic scenario, MSprime [3] simulates the coalescent of each locus (i.e. recombination block) of the genome. Each coalescent tree is then traversed lineage by lineage in order to chronologically map SVs simulated from a large panel of classes, allowing for complex nested variants. A coalescent-aware variation graph is thus progressively constructed and written to a Graphical Fragment Assembly (GFA) file. For greater flexibility, genome sequences and alignments can also be generated directly as FASTA files.
Conclusion
MSpangepop allows biologists to simulate variation graphs under realistic demographic models for a variety of purposes, including benchmarking and simulation-based inferences. Future developments include the simulation of transposable elements and the modulation of the mutational landscape to account for genomic features like ORFs.
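The tree traversal described in the Results can be illustrated with a small, self-contained Python sketch. It is not MSpangepop's code: the genealogy is hard-coded rather than simulated by a coalescent simulator, and the names `PARENT`, `BRANCH_SVS`, `apply_sv` and `simulate_locus` are hypothetical. It only shows the general idea that SVs placed on a branch are applied from the root towards the samples, so that every descendant inherits them and later variants can nest inside earlier ones.

```python
# Toy genealogy for one locus: node -> parent (None marks the root).
# Assumption: in a real workflow this tree would come from a coalescent simulator.
PARENT = {"root": None, "anc": "root", "s1": "anc", "s2": "anc", "s3": "root"}

# Hypothetical SVs on the branch leading to each node: (kind, position, argument).
# Positions are in the coordinates of the sequence inherited from the parent.
BRANCH_SVS = {
    "anc": [("DEL", 2, 3)],     # delete 3 bases starting at index 2
    "s2": [("INS", 1, "TT")],   # insert "TT" before index 1
    "s3": [("INV", 0, 4)],      # reverse-complement 4 bases starting at index 0
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_sv(seq, sv):
    """Return a copy of seq with one structural variant applied."""
    kind, pos, arg = sv
    if kind == "DEL":
        return seq[:pos] + seq[pos + arg:]
    if kind == "INS":
        return seq[:pos] + arg + seq[pos:]
    if kind == "INV":
        segment = seq[pos:pos + arg]
        return seq[:pos] + segment.translate(COMPLEMENT)[::-1] + seq[pos + arg:]
    raise ValueError(f"unknown SV kind: {kind}")


def simulate_locus(reference, parent, branch_svs):
    """Walk the tree from the root down and return the sequence of each leaf."""
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    sequences = {}
    stack = [(root, reference) for root in children[None]]
    while stack:
        node, seq = stack.pop()
        # Apply the SVs of this branch on top of everything inherited so far.
        for sv in branch_svs.get(node, []):
            seq = apply_sv(seq, sv)
        sequences[node] = seq
        for child in children.get(node, []):
            stack.append((child, seq))

    # Leaves are the nodes that never appear as a parent.
    return {n: s for n, s in sequences.items() if n not in children}


samples = simulate_locus("GATTACAGGC", PARENT, BRANCH_SVS)
for name in sorted(samples):
    print(f">{name}\n{samples[name]}")  # FASTA-style output
```

In this toy example the deletion on the `anc` branch is shared by `s1` and `s2`, while the insertion in `s2` is placed in the coordinates of the already-deleted sequence, which is the sense in which variants are mapped chronologically.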
English Keywords
Population Genetics
Variation Graph
Coalescent
Pangenome Simulation
Structural Variants
Origin
Hal imported
