GrAuFlow: A snakemake workflow for pangenome graph augmentation using short read data
MALET, Antoine
BIOlogie et GEstion des Risques en agriculture [BIOGER]
Université de Rouen Normandie [UNIROUEN]
LORRAIN, Cécile
Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich] [ETH Zürich]
Language
en
Other scientific communication (conference without proceedings, poster, seminar...)
This item was published in
Journées Ouvertes Biologie, Informatique, Mathématiques (JOBIM2025), 2025-07-08, Bordeaux.
English Abstract
Pangenome graphs are gaining popularity in genomic analysis because they address the bias introduced by using a single reference genome in population variant analyses. However, with the constant acquisition of new sequencing data, it is essential to update these graphs to incorporate new genomic resources. When new fully sequenced genomes become available, reconstructing the graph is often the most convenient method. For small sequences, such as those from amplicon sequencing, augmenting the graph may be more straightforward, as only a small portion of the graph is modified. In this study, we are interested in augmenting a graph with fragmented genomes assembled from short reads. These data represent a valuable resource of genetic diversity that is not currently used in graphs, for which the use of T2T genomes is recommended.
In this context, we are developing a workflow called GrAuFlow (Graph Augmentation Workflow) using the Snakemake workflow manager (Mölder et al., 2021). First, GrAuFlow assembles Illumina short-read data using the SPAdes assembly toolkit (Prjibelski et al., 2020), retaining only contigs that pass stringent quality filters. Then, contigs are split into long-read-like sequences and mapped onto the graph using different tools: Palss (Denti et al., 2025), GraphAligner (Rautiainen et al., 2020) and SVArp (Soylev et al., 2024), before graph augmentation with vg augment (Garrison et al., 2018). GrAuFlow then extracts structural variants (SVs) from the different strategies and retains only well-supported variants above a minimal length. Finally, SVs are compared across strategies, and the graph is modified only with those that are consistent across all tools.
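Two of the steps described above, splitting contigs into long-read-like sequences and keeping only SVs that are consistent across tools, can be illustrated with a minimal Python sketch. This is not GrAuFlow's code: the `SV` record, the function names, the fragment length, the overlap, the 50 bp minimal length and the position tolerance are all hypothetical choices made for illustration, and the abstract does not state the actual values or matching criteria used.

```python
from typing import Dict, List, NamedTuple


class SV(NamedTuple):
    """Simplified structural variant call (hypothetical record layout)."""
    node: str    # graph node or path the variant is anchored on
    pos: int     # offset of the variant on that node or path
    svtype: str  # e.g. "INS" or "DEL"
    length: int  # variant length in bp


def fragment_contig(seq: str, frag_len: int = 10_000, overlap: int = 1_000) -> List[str]:
    """Cut one contig into overlapping, long-read-like fragments.

    frag_len and overlap are illustrative defaults, not GrAuFlow parameters.
    """
    if frag_len <= overlap:
        raise ValueError("frag_len must be larger than overlap")
    if not seq:
        return []
    step = frag_len - overlap
    fragments = []
    for start in range(0, len(seq), step):
        fragments.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break  # the last fragment already reaches the contig end
    return fragments


def _same_event(a: SV, b: SV, pos_tol: int) -> bool:
    """Assumed matching rule: same node, same type, nearby position."""
    return a.node == b.node and a.svtype == b.svtype and abs(a.pos - b.pos) <= pos_tol


def consensus_svs(calls_by_tool: Dict[str, List[SV]],
                  min_len: int = 50, pos_tol: int = 10) -> List[SV]:
    """Keep SVs of at least min_len bp that every tool reports at a matching site.

    The first tool's calls serve as candidates; each candidate must have a
    matching call in every other tool's list.
    """
    tools = list(calls_by_tool)
    if not tools:
        return []
    first, others = tools[0], tools[1:]
    kept = []
    for sv in calls_by_tool[first]:
        if sv.length < min_len:
            continue
        if all(any(_same_event(sv, other, pos_tol) for other in calls_by_tool[t])
               for t in others):
            kept.append(sv)
    return kept


if __name__ == "__main__":
    print([len(f) for f in fragment_contig("A" * 25, frag_len=10, overlap=2)])
    calls = {
        "Palss": [SV("n1", 100, "INS", 200), SV("n2", 5, "DEL", 20)],
        "GraphAligner": [SV("n1", 104, "INS", 198)],
        "SVArp": [SV("n1", 98, "INS", 201), SV("n2", 5, "DEL", 20)],
    }
    print(consensus_svs(calls))
```

In this sketch the intersection is strict: a variant missed by a single tool is dropped, which mirrors the "consistent across all tools" criterion at the cost of sensitivity.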
To test our approach, we use Zymoseptoria tritici, a fungal pathogen responsible for septoria tritici blotch of wheat. Based on graphs generated by Minigraph and Minigraph-Cactus using 8 genomes of Zymoseptoria tritici, we show that short-read data can be useful for adding new information to a pangenome graph. Nevertheless, this approach is limited to medium-size variants. Structural variants that are not easily assembled, because of repeat content or complex events, may not be detected, which makes this approach most relevant for enriching specific loci of interest.
English Keywords
Snakemake pipeline
Zymoseptoria tritici
Pangenome graph
Origin
Imported from HAL
