Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Standard

Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. / Betschart, Raphael O; Thiéry, Alexandre; Aguilera-Garcia, Domingo; Zoche, Martin; Moch, Holger; Twerenbold, Raphael; Zeller, Tanja; Blankenberg, Stefan; Ziegler, Andreas.

In: SCI REP-UK, Vol. 12, No. 1, 21502, 13.12.2022.

Publications: SCORING: Contribution to journal/newspaper › SCORING: Journal article › Research › Peer-reviewed

Bibtex

@article{1fa05d04fd5d4cefa5336da5ac09576b,
title = "Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment",
abstract = "Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was the comparison of 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). We sequenced one Genome in a Bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The truth set of the GIABs was used for comparison, and performance was assessed by computation time, F1 score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F1 score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both were superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with DRAGEN outperforming GATK.",
keywords = "Polymorphism, Single Nucleotide, Whole Genome Sequencing, High-Throughput Nucleotide Sequencing, Computational Biology, INDEL Mutation, Software",
author = "Betschart, {Raphael O} and Alexandre Thi{\'e}ry and Domingo Aguilera-Garcia and Martin Zoche and Holger Moch and Raphael Twerenbold and Tanja Zeller and Stefan Blankenberg and Andreas Ziegler",
note = "{\textcopyright} 2022. The Author(s).",
year = "2022",
month = dec,
day = "13",
doi = "10.1038/s41598-022-26181-3",
language = "English",
volume = "12",
journal = "SCI REP-UK",
issn = "2045-2322",
publisher = "NATURE PUBLISHING GROUP",
number = "1",
pages = "21502",
}

RIS

TY - JOUR

T1 - Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

AU - Betschart, Raphael O

AU - Thiéry, Alexandre

AU - Aguilera-Garcia, Domingo

AU - Zoche, Martin

AU - Moch, Holger

AU - Twerenbold, Raphael

AU - Zeller, Tanja

AU - Blankenberg, Stefan

AU - Ziegler, Andreas

N1 - © 2022. The Author(s).

PY - 2022/12/13

Y1 - 2022/12/13

AB - Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was the comparison of 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). We sequenced one Genome in a Bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The truth set of the GIABs was used for comparison, and performance was assessed by computation time, F1 score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F1 score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both were superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with DRAGEN outperforming GATK.

KW - Polymorphism, Single Nucleotide

KW - Whole Genome Sequencing

KW - High-Throughput Nucleotide Sequencing

KW - Computational Biology

KW - INDEL Mutation

KW - Software

U2 - 10.1038/s41598-022-26181-3

DO - 10.1038/s41598-022-26181-3

M3 - SCORING: Journal article

C2 - 36513709

VL - 12

JO - SCI REP-UK

JF - SCI REP-UK

SN - 2045-2322

IS - 1

M1 - 21502

ER -