Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data


Authors: Sandmann, Sarah
de Graaf, Aniek O.
Karimi, Mohsen
van der Reijden, Bert A.
Hellström-Lindberg, Eva
Jansen, Joop H.
Dugas, Martin
Department/Institution: FB 05: Faculty of Medicine (Medizinische Fakultät)
Document type: Article
Media type: Text
Publication date: 2017
Published in MIAMI: 2018-05-03
Last modified: 2019-04-16
Edition: [Electronic ed.]
Source: Scientific Reports 7 (2017) 43169, pp. 1-12
Subject (DDC): 610: Medicine and health
License: CC BY 4.0
Language: English
Funding: Funded by the 2017 Open Access Publication Fund of the Westfälische Wilhelms-Universität Münster (WWU Münster).
Format: PDF document
URN: urn:nbn:de:hbz:6-78179622564
Other identifiers: DOI: 10.1038/srep43169
Permalink: https://nbn-resolving.de/urn:nbn:de:hbz:6-78179622564
Online access: srep43169.pdf

Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools, which usually differ in their algorithms, filtering strategies and recommendations, and thus also in their output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, by re-sequencing on a different platform, and by expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate a need to improve the reproducibility of results when multithreading is used.
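The abstract compares callers by sensitivity and precision against validated mutations, and targets variants with allelic frequencies down to 1%. A minimal sketch of these quantities follows, assuming that a call matches a validated mutation only when chromosome, position, reference and alternative allele are all identical. The study's actual matching rules (for example, indel normalisation) may differ. The names `Variant`, `allele_frequency` and `sensitivity_precision` and the example variants are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: exact match on all four fields (an assumption)."""
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def allele_frequency(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: fraction of reads supporting the alternative allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def sensitivity_precision(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).

    Returns 0.0 for a metric whose denominator would be zero.
    """
    tp = len(calls & truth)   # called and validated
    fn = len(truth - calls)   # validated but missed
    fp = len(calls - truth)   # called but not validated
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if calls else 0.0
    return sensitivity, precision


# Hypothetical example data, not from the study.
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr4", 300, "AT", "A"),   # short deletion
}
calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr5", 500, "G", "C"),
    Variant("chr7", 700, "T", "A"),
}

sens, prec = sensitivity_precision(calls, truth)
print(f"sensitivity={sens:.3f}, precision={prec:.3f}")  # sensitivity=0.667, precision=0.500

# 5 alternative reads at a depth of 500 corresponds to an allele frequency of 1%.
print(f"VAF={allele_frequency(5, 500):.2%}")  # VAF=1.00%
```

In this toy call set, the caller reports two extra variants beyond the two it gets right. This raises the false-positive count and lowers precision, which illustrates the sensitivity/precision trade-off the abstract describes.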