Details on VarDict, a new variant caller, were recently published in Nucleic Acids Research by Zhongwu Lai and Jonathan Dry, among others from AstraZeneca. The current version of the Bina Read Alignment Variant Calling and Expression software module for secondary analysis includes VarDict, along with five other tools, for calling SNVs or indels from tumor-normal pairs. As the article demonstrates, VarDict has multiple strengths that extend our variant calling capabilities beyond what was available before its release. Notably, the algorithm is particularly good at detecting indels. It also handles ultra-deep sequenced samples, which have become more common of late, and supports variant calling in tumor-only samples (in addition to tumor-vs-normal calling).
Topics: DREAM Challenge Results, Somatic Mutation Detection, Tumor-Normal, Webinar, Bina RAVE, SomaticSeq, AstraZeneca, Sequencing, VarScan, JointSNVMix, SomaticIndelDetector, VarDict, MuTect, SomaticSniper
Did you miss last week’s webinar on SomaticSeq? It’s now available on demand. Watch it to learn how we used this newly published ensemble and machine learning approach to score first in indel calling and second in SNV calling in the recent ICGC-TCGA DREAM Somatic Mutation Calling Challenge.
Leading the latest DREAM challenge using an ensemble and machine learning approach to somatic mutation detection
Accurate detection of somatic mutations has proven challenging in cancer NGS analysis, due to tumor heterogeneity and cross-contamination between tumor and matched normal samples. Often, a somatic caller that performs well for one tumor may not perform well for another.
This is the third article in a series. The first post discussed challenges in somatic mutation detection with respect to false positives and false negatives. The second post reviewed how a consensus approach might increase the confidence of the call sets from multiple tumor-normal callers.
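To make the consensus idea concrete, here is a minimal sketch of majority voting across callers. It is an illustration only, not the method described in the earlier posts: the function name `consensus_calls`, the `min_callers` threshold, and the toy variants are all assumptions made for this example.

```python
from collections import Counter


def consensus_calls(call_sets, min_callers=2):
    """Return the variants reported by at least `min_callers` callers.

    `call_sets` maps a caller name to the set of variants it reported.
    Each variant is a hashable tuple such as (chrom, pos, ref, alt).
    """
    counts = Counter()
    for calls in call_sets.values():
        # set() ensures each caller contributes at most one vote per variant
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_callers}


# Hypothetical call sets; the positions and alleles are made up.
calls = {
    "MuTect": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "VarScan": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "VarDict": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
consensus = consensus_calls(calls, min_callers=2)
```

Raising `min_callers` trades sensitivity for confidence: a stricter threshold keeps fewer calls, but each surviving call is supported by more tools.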
We developed SomaticSeq, an integrative machine learning pipeline, to address the limitations of current approaches. SomaticSeq currently incorporates five somatic mutation callers and uses machine learning (an Adaptive Boosting model) to distinguish true mutations from false positives based on over 70 genomic and sequencing features. Using SomaticSeq, we recently placed #1 in INDEL (F1 score of 71.6%) and #2 in SNV (F1 score of 99.7%) in Stage 5 of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. SomaticSeq is released under a BSD open-source license at http://bioinform.github.io/somaticseq/.
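For readers unfamiliar with the F1 score quoted above, the short sketch below shows the standard definition: the harmonic mean of precision and recall. The counts are made up for illustration and are not the challenge results; the function name `f1_score` is likewise just a label for this example.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct calls means precision and recall are both zero
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 true calls, 10 false calls, 30 missed mutations.
score = f1_score(true_positives=90, false_positives=10, false_negatives=30)
```

Because F1 penalizes both false positives and false negatives, a caller cannot score well by being only very conservative or only very permissive.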
In cancer research, it is common to search for somatic mutations that appear in the tumor but not in the healthy tissues. Thus, tumor-normal comparisons have become the norm in somatic mutation detection in DNA sequencing. In an ideal world the workflow sounds simple: compare the tumor sequencing data against the normal. If a variant was found in the tumor but not found in the normal, it’s a somatic mutation.
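The idealized workflow described above amounts to a set difference. The sketch below shows only that naive version, under the assumption that variants are represented as (chrom, pos, ref, alt) tuples; the function name and the example variants are hypothetical, and real data rarely behaves this cleanly.

```python
def naive_somatic_calls(tumor_variants, normal_variants):
    """Return variants seen in the tumor but not in the matched normal."""
    return set(tumor_variants) - set(normal_variants)


# Hypothetical variants; the positions and alleles are made up.
tumor = {("chr1", 100, "A", "T"), ("chr5", 500, "G", "A")}
normal = {("chr5", 500, "G", "A")}  # shared with the tumor, so likely germline

somatic = naive_somatic_calls(tumor, normal)
```

Here the chr5 variant is dropped because it also appears in the normal sample, leaving the chr1 variant as the only somatic candidate.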
Pleasanton and Redwood City, CA, September 15, 2015
Bina Technologies, Inc., a member of the Roche Group (SIX: RO, ROG; OTCQX: RHHBY), received top honors in the recent DREAM Mutation Calling Challenge, an open crowd-sourced competition and international effort to aggregate prediction algorithms for identifying cancer-associated mutations and rearrangements in next-generation sequencing (NGS) data. The challenge’s winning algorithms are poised to become industry standards for analyzing cancer genomes.
The DREAM challenges bring together multidisciplinary scientists from various communities to collaborate and solve fundamental problems in systems biology and translational science. In particular, the International Cancer Genome Consortium (ICGC), The Cancer Genome Atlas (TCGA), Sage Bionetworks and IBM-DREAM initiated the Somatic Mutation Challenge (SMC), an open crowd-sourced sequencing challenge to further understand the linkage between genomic alterations and cancer mutations. Since past research efforts have been independent and siloed, the motivation behind this challenge is to aggregate the best prediction models, make them available to the genome analysis community, and improve somatic mutation prediction in tumor-normal sequencing.