Available Modules
Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants
0
1
summary
results_dir
versions
Annotation and Ranking of Structural Variation
0
1
2
3
0
1
0
1
0
1
0
1
tsv
unannotated_tsv
vcf
versions
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
NO input
annotations
versions
Annotation and Ranking of Structural Variation
Run the alignment/variant-call/consensus logic of the artic pipeline
0
1
0
1
2
0
1
2
results
bam
bai
bam_trimmed
bai_trimmed
bam_primertrimmed
bai_primertrimmed
fasta
vcf
tbi
json
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus data sequenced with nanopore
Generate a VCF file from a BAM file using various calling methods
0
1
2
3
4
0
0
0
0
vcf
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Gives an estimation of the sequencing bias based on known invariant sites
0
1
2
3
4
0
0
recal_patterns
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
0
1
2
0
0
0
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Concatenate VCF files
0
1
2
vcf
tbi
csi
versions
Concatenate VCF files.
Create a consensus sequence by applying VCF variants to a reference FASTA file
0
1
2
3
4
fasta
versions
Create consensus sequence by applying VCF variants to a reference fasta file.
Converts certain output formats to VCF
0
1
2
0
1
0
vcf_gz
vcf
bcf_gz
bcf
hap
legend
samples
tbi
csi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Filters VCF files
0
1
2
vcf
tbi
csi
versions
Apply fixed-threshold filters to VCF files.
Index VCF files
0
1
csi
tbi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Apply set operations to VCF files
0
1
2
results
versions
Computes intersections, unions and complements of VCF files.
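As a rough illustration of the set operations named in this entry, the sketch below treats each VCF as a set of `(chrom, pos, ref, alt)` keys. It is a conceptual toy in plain Python, not how bcftools implements the operation; the function name and the record keys are hypothetical.

```python
def vcf_set_ops(a, b):
    """Return the intersection, union and complement (a minus b) of two variant sets."""
    a, b = set(a), set(b)
    return {
        "intersection": a & b,  # variants present in both files
        "union": a | b,         # variants present in either file
        "complement": a - b,    # variants present only in the first file
    }


# Hypothetical records keyed by (chrom, pos, ref, alt)
first = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
second = {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
ops = vcf_set_ops(first, second)
```

Real VCF comparison also has to decide how multiallelic sites and differing representations of the same indel are matched, which this sketch ignores.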
Merge VCF files
0
1
2
0
1
0
1
0
1
vcf
index
versions
Merge VCF files.
Generates genotype likelihoods at each genomic position with coverage
0
1
2
0
1
0
vcf
tbi
stats
mpileup
versions
Generates genotype likelihoods at each genomic position with coverage.
Normalize VCF file
0
1
2
0
1
vcf
tbi
csi
versions
Normalize VCF files.
Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
0
1
2
0
0
vcf
tbi
csi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, among many other possibilities.
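The IMPUTE2 INFO metric mentioned in this entry can be sketched from genotype probabilities as follows. This is a minimal illustration of the commonly cited IMPUTE2 formula, assuming biallelic sites and diploid samples with three GP values each. The function name is hypothetical, and the plugin's own handling of missing data and edge cases may differ.

```python
def impute2_info(gps):
    """Compute an IMPUTE2-style INFO score for one site.

    gps: list of (P(ref/ref), P(ref/alt), P(alt/alt)) tuples, one per sample.
    """
    n = len(gps)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Expected alt-allele dosage per sample and its second moment
    e = [p1 + 2 * p2 for _, p1, p2 in gps]
    f = [p1 + 4 * p2 for _, p1, p2 in gps]
    theta = sum(e) / (2 * n)  # estimated alt-allele frequency
    if theta <= 0 or theta >= 1:
        return 1.0  # monomorphic site: the score is defined as 1
    variance = sum(fi - ei * ei for fi, ei in zip(f, e))
    return 1.0 - variance / (2 * n * theta * (1 - theta))


certain = impute2_info([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
uncertain = impute2_info([(0.25, 0.5, 0.25)] * 4)
```

Fully certain genotypes give a score of 1, while probabilities that match Hardy-Weinberg expectations for every sample give a score of 0.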
0
1
2
0
0
0
0
vcf
tbi
csi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, among many other possibilities.
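To make the "missing genotypes can be set to ref" example concrete, here is a small conceptual sketch in Python. It operates on plain GT strings rather than on a VCF file, and the function name is hypothetical. It only rewrites fully missing genotypes, whereas the plugin itself offers many more selection criteria.

```python
def set_missing_to_ref(genotypes):
    """Replace fully missing GT strings such as './.' or '.' with reference calls."""
    updated = []
    for gt in genotypes:
        sep = "|" if "|" in gt else "/"   # keep phasing separator as-is
        alleles = gt.split(sep)
        if all(allele == "." for allele in alleles):
            alleles = ["0"] * len(alleles)  # same ploidy, all reference
        updated.append(sep.join(alleles))
    return updated


fixed = set_missing_to_ref(["0/1", "./.", ".", "1|1", "./1"])
```

Partially missing genotypes such as `./1` are left untouched here; whether they should be changed is a policy choice the caller would need to make.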
Extracts fields from VCF or BCF files and outputs them in user-defined format.
0
1
2
0
0
0
output
versions
Extracts fields from VCF or BCF files and outputs them in user-defined format.
Sorts VCF files
0
1
vcf
tbi
csi
versions
Sort VCF files by coordinates.
Generates stats from VCF files
0
1
2
0
1
0
1
0
1
0
1
0
1
stats
versions
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
0
1
2
0
0
0
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Convert a BED file to a VCF file according to a YAML config
0
1
2
0
1
vcf
versions
Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants
0
1
2
3
4
0
1
0
1
vcf
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Copy number variant detection from high-throughput sequencing data
0
1
2
0
1
0
1
0
1
0
1
0
bed
cnn
cnr
cns
pdf
png
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
0
1
2
tsv
cnn
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Structural-variant calling with cuteSV
0
1
2
0
1
vcf
versions
DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.
0
1
2
3
4
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Call variants from the examples produced by make_examples
0
1
call_variants_tfrecords
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Transforms the input alignments to a format suitable for the deep neural network variant caller
0
1
2
3
0
1
0
1
0
1
0
1
examples
gvcf
small_model_calls
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
4
0
1
0
1
0
1
vcf
vcf_index
gvcf
gvcf_index
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
report
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Call structural variants
0
1
2
3
4
5
0
1
0
1
bcf
csi
versions
Structural variant discovery by integrated paired-end and split-read analysis
Export assembly segment sequences in GFA 1.0 format to FASTA format
0
1
fasta
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
SV callers like lumpy look at split reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example, we will be skeptical of deletion calls that do not have lower-than-average coverage compared to regions with similar GC content.
0
1
2
3
4
5
0
0
vcf
versions
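The filtering idea described for this module (distrust deletion calls whose coverage is not lower than comparable regions) can be sketched as a depth fold-change check. This is a simplified illustration, not the tool's implementation: the function names, the inputs as plain depth lists, and the 0.7 cutoff are all assumptions chosen for the example.

```python
from statistics import mean


def depth_fold_change(sv_depths, background_depths):
    """Ratio of mean depth inside the call to mean depth in comparable regions."""
    background = mean(background_depths)
    if background == 0:
        raise ValueError("background depth must be greater than zero")
    return mean(sv_depths) / background


def keep_deletion(sv_depths, background_depths, max_fold_change=0.7):
    """Keep a deletion call only if coverage inside it is clearly reduced.

    max_fold_change is an illustrative threshold, not a recommended value.
    """
    return depth_fold_change(sv_depths, background_depths) < max_fold_change


supported = keep_deletion([10, 12, 11], [30, 31, 29])
unsupported = keep_deletion([30, 29, 31], [30, 31, 29])
```

In this toy, `background_depths` stands in for regions with similar GC content; how those regions are selected is outside the scope of the sketch.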
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
0
1
2
0
1
2
vcf
tbi
versions
Perform phasing of genotyped data with or without a reference panel
0
1
2
3
4
5
phased_variants
versions
Convert a file in FASTA format to the ELFASTA format
0
1
elfasta
log
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.
0
1
2
3
4
5
6
0
1
0
1
0
1
0
0
0
0
0
bam
logs
metrics
recall
gvcf
table
activity_profile
assembly_regions
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Merge split bam/sam chunks in one file
0
1
bam
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Split bam file into manageable chunks
0
1
bam
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
0
1
2
3
cache
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
0
1
0
output
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
0
1
2
0
0
0
0
0
1
0
vcf
tbi
tab
json
report
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
A haplotype-based variant detector
0
1
2
3
4
5
0
1
0
1
0
1
0
1
0
1
vcf
versions
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
0
1
2
0
0
0
lineages
summarized
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
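The two-stage resampling described for the bootstrap module can be sketched with the standard library alone. This is one reading of that description, not Freyja's code: the data layout (`{site: {base: read_count}}`), the function name, and the choice to use the stage-one depth as the number of trials in stage two are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(site_counts, rng):
    """Draw one bootstrap replicate from per-site base counts."""
    sites = list(site_counts)
    depths = [sum(site_counts[s].values()) for s in sites]
    total = sum(depths)
    if total == 0:
        return {s: dict.fromkeys(site_counts[s], 0) for s in sites}
    # Stage 1: multinomial over sites, weighted by each site's share of all reads
    new_depths = Counter(rng.choices(sites, weights=depths, k=total))
    replicate = {}
    for s in sites:
        bases = list(site_counts[s])
        k = new_depths[s]
        drawn = Counter()
        if k:
            # Stage 2: multinomial over bases, weighted by observed base frequencies
            weights = [site_counts[s][b] for b in bases]
            drawn = Counter(rng.choices(bases, weights=weights, k=k))
        replicate[s] = {b: drawn[b] for b in bases}
    return replicate


# Hypothetical counts for two sites
counts = {"pos241": {"C": 40, "T": 60}, "pos3037": {"C": 50, "T": 0}}
replicate = bootstrap_sample(counts, random.Random(0))
```

Each replicate keeps the total read count fixed while letting per-site depth and base frequencies vary, so repeating the draw many times and demixing each replicate would give a spread of abundance estimates.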
Specify the relative abundance of each known haplotype
0
1
2
0
0
demix
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Downloads new versions of the curated SARS-CoV-2 lineage file and barcodes
0
barcodes
lineages_topology
lineages_meta
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Call variants and collect sequencing depth information for each variant
0
1
0
variants
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Performs local realignment around indels to correct for mapping errors
0
1
2
3
0
1
0
1
0
1
0
1
bam
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Generates a list of locations that should be considered for local realignment prior to genotyping.
0
1
2
0
1
0
1
0
1
0
1
intervals
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
SNP and Indel variant caller on a per-locus basis
0
1
2
0
1
0
1
0
1
0
1
0
1
0
1
0
1
vcf
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
bam
bai
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotates intervals with GC content, mappability, and segmental-duplication content
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
annotated_intervals
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
0
0
0
bam
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
0
1
2
3
4
5
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
0
1
2
3
4
0
1
0
1
0
1
0
csv
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
0
1
0
1
0
1
0
1
0
1
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an interval list from a bed file and a reference dict
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
0
1
2
contamination
segmentation
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the parameters for the DRAGstr model
0
1
2
0
0
0
0
dragstr_model
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a Convolutional Neural Net to filter annotated variants
0
1
2
3
4
0
0
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
0
1
2
3
0
1
0
1
0
1
hdf5
tsv
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
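The counting rule stated for this module (an interval's count is the number of read starts that fall inside it) can be illustrated with a short sketch. It assumes a single contig and closed, 1-based intervals; the function name and inputs are hypothetical and this is not GATK code.

```python
from bisect import bisect_left, bisect_right


def count_read_starts(read_starts, intervals):
    """Count how many read start positions fall inside each (begin, end) interval."""
    starts = sorted(read_starts)
    return [
        # Positions <= end minus positions < begin gives the closed-interval count
        bisect_right(starts, end) - bisect_left(starts, begin)
        for begin, end in intervals
    ]


read_counts = count_read_starts([5, 10, 10, 25, 100], [(1, 10), (11, 50), (51, 99)])
```

Because only the start position is considered, a read that begins in one interval and extends into the next is counted once, in the first interval.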
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
3
4
0
0
0
split_read_evidence
split_read_evidence_index
paired_end_evidence
paired_end_evidence_index
site_depths
site_depths_index
versions
Genome Analysis Toolkit (GATK4)
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
0
1
2
0
0
0
combined_gvcf
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.
0
0
0
str_table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.
0
1
pon
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
0
1
dict
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a panel of normals constraining germline and artifactual sites for use with mutect2.
0
1
0
1
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Denoises read counts to produce denoised copy ratios
0
1
0
1
standardized
denoised
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Determines the baseline contig ploidy for germline samples given counts data
0
1
2
3
0
1
0
calls
model
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts FastQ file to SAM/BAM format
0
1
bam
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters intervals based on annotations and/or count statistics.
0
1
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
0
1
2
3
4
5
6
7
0
1
0
1
0
1
vcf
tbi
stats
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply tranche filtering
0
1
2
3
0
0
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normal creation.
0
1
2
3
4
5
0
0
0
genomicsdb
updatedb
intervallist
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.
0
1
2
3
4
cohortcalls
cohortmodel
casecalls
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
0
1
2
3
0
1
0
1
0
1
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call germline SNPs and indels via local re-assembly of haplotypes
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
vcf
tbi
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Splits the interval list file into unique, equally sized interval files and places them under a directory
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts
0
1
artifactprior
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Left align and trim variants using GATK4 LeftAlignAndTrimVariants.
0
1
2
3
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
cram
bam
crai
bai
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge unmapped with mapped BAM files
0
1
2
0
1
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges several vcf files
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call somatic SNVs and indels via local assembly of haplotypes.
0
1
2
3
0
1
0
1
0
1
0
0
0
0
vcf
tbi
stats
f1r2
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
0
1
2
3
intervals
segments
denoised
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Prepares bins for coverage collection.
0
1
0
1
0
1
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
0
1
2
0
1
0
1
0
1
bam
cram
sam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
0
0
0
0
printed_evidence
printed_evidence_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Condenses homRef blocks in a single-sample GVCF
0
1
2
3
0
0
0
0
0
vcf
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Reverts SAM or BAM files to a previous state.
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts BAM/SAM file to FastQ format
0
1
fastq
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Select a subset of variants from a VCF file
0
1
2
3
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a fasta with the bases shifted by offset
0
1
0
1
0
1
shift_fa
shift_fai
shift_back_chain
dict
intervals
shift_intervals
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Splits CRAM files efficiently by taking advantage of their container based structure
0
1
split_crams
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Splits reads that contain Ns in their cigar string
0
1
2
3
0
1
0
1
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
0
1
2
3
0
0
0
annotated_vcf
index
versions
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
0
1
2
0
0
0
0
clustered_vcf
clustered_vcf_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
bam
bai
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filter variants
0
1
2
0
1
0
1
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
0
1
2
0
0
0
0
0
0
recal
idx
tranches
plots
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Extract fields from a VCF file to a tab-delimited table
0
1
2
3
4
5
0
1
0
1
0
1
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
0
0
0
bam
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
0
0
0
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
0
output
bam_index
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotate regions, frequencies and CADD scores
0
1
vcf
versions
Annotate genetic inheritance models in variant files
Score compounds
0
1
vcf
versions
Annotate genetic inheritance models in variant files
annotate models of inheritance
0
1
2
0
vcf
versions
Annotate genetic inheritance models in variant files
Score the variants of a vcf based on their annotation
0
1
2
0
vcf
versions
Annotate genetic inheritance models in variant files
Concatenates imputation chunks in a single VCF/BCF file ligating phased information.
0
1
2
merged_variants
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods
0
1
2
3
4
5
6
7
8
phased_variants
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.
0
1
2
merged_variants
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool for imputation and phasing from vcf file or directly from bam files.
0
1
2
3
4
5
6
7
8
9
0
1
2
phased_variants
stats_coverage
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
merge gVCF files and perform joint variant calling
0
1
0
1
bcf
versions
Tools for population-scale genotyping using pangenome graphs.
0
1
vcf
tbi
versions
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements.
0
1
2
3
0
1
0
1
0
1
bedpe
bed
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
0
1
0
1
vcf
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements.
0
1
2
3
0
1
0
1
0
1
bedpe
bed
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a module software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
Collapse redundant transcript models in Iso-Seq data.
0
1
0
bed
bed_trans_reads
local_density_error
polya
read
strand_check
trans_report
versions
varcov
variants
Collapse similar gene model
Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.
0
1
vcf
versions
gvcftools is a package of small utilities for creating and analyzing gVCF files
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
summary_csv
roc_all_csv
roc_indel_locations_csv
roc_indel_locations_pass_csv
roc_snp_locations_csv
roc_snp_locations_pass_csv
extended_csv
runinfo
metrics_json
vcf
tbi
versions
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. som.py, which is part of hap.py, compares somatic variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
features
metrics
stats
versions
Haplotype VCF comparison tools somatic variant comparison
Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline
0
1
vcf
versions
Human mitochondrial variants annotation using HmtVar.
This tool takes a background VCF, such as gnomAD, that has full genome (though in some cases, users will instead want whole exome) coverage and uses that as an expectation of variants.
0
1
2
0
1
2
tsv
versions
useful command-line tools written to show-case hts-nim
A Python application to generate self-contained HTML reports for variant review and other genomic applications
0
1
2
3
0
1
2
report
versions
Call variants from a BAM file using iVar
0
1
0
0
0
0
tsv
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Filtering VCF with dynamically-compiled java expressions
0
1
2
3
0
1
0
1
0
1
0
1
0
1
vcf
tbi
csi
versions
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Lofreq subcommand to insert base and indel alignment qualities
0
1
0
bam
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments
0
1
2
0
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
It predicts variants using multiple processors
0
1
2
3
0
1
0
1
vcf
tbi
versions
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors.
Lofreq subcommand to remove variants with low coverage or strand bias potential
0
1
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Inserts indel qualities in a BAM file
0
1
0
1
bam
versions
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file.
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
2
3
4
5
0
1
0
1
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
0
1
bam
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0
1
0
1
vcf
tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
5
6
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
somatic_sv_vcf
somatic_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
tumor_sv_vcf
tumor_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
0
1
0
1
0
0
0
hist
log_stderr
versions
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess of their expected multiplicity predicted from the reads.
Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data
0
1
2
3
4
5
6
output
output_dis
output_germline
output_somatic
versions
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Scan a reference genome to get microsatellite & homopolymer information
0
1
txt
versions
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
0
1
2
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
0
dataset
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
1
0
csv
csv_errors
csv_insertions
tsv
json
json_auspice
ndjson
fasta_aligned
fasta_translation
nwk
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database
0
1
2
3
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.
0
1
2
3
0
1
vcf
gvcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.
0
1
2
3
0
1
vcf
gvcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Determines the depth in a BAM/CRAM file
0
1
2
0
1
0
1
depth
binned_depth
versions
Graph realignment tools for structural variants
Genotype structural variants using paragraph and grmpy
0
1
2
3
4
5
0
1
0
1
vcf
json
versions
Graph realignment tools for structural variants
Convert a VCF file to a JSON graph
0
1
0
1
graph
versions
Graph realignment tools for structural variants
pbsv - PacBio structural variant (SV) signature discovery tool
0
1
0
1
svsig
versions
pbsv - PacBio structural variant (SV) calling and analysis tools
Creates an interval list from a bed file and a reference dict
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Automatically improve draft assemblies and find variation among strains, including large event detection
0
1
0
1
2
0
improved_assembly
vcf
change_record
tracks_bed
tracks_wig
versions
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
0
1
2
0
0
0
bp
cem
del
dd
int_final
inv
li
rp
si
td
versions
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
epi
episummary
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Exclude variant identifiers from plink bfiles
0
1
2
3
4
bed
bim
fam
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink bfiles with a text file of variant identifiers
0
1
2
3
4
bed
bim
fam
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
fepi
fepisummary
flog
fnosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.
0
1
2
3
0
0
0
prunein
pruneout
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.
0
1
2
3
0
0
0
prunein
pruneout
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
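The greedy pruning described for this module can be illustrated with a minimal Python sketch for a single window. It is not PLINK's implementation: the function names `r_squared` and `prune_window`, the input layout (variant ID mapped to per-sample allele counts), and the tie-breaking rule (drop the variant involved in the most over-threshold pairs) are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a monomorphic variant counts as uncorrelated
    return cov * cov / (var_x * var_y)


def prune_window(genotypes, threshold):
    """Greedily drop variants until no pair in the window exceeds threshold.

    genotypes: dict mapping a variant ID to a list of allele counts per sample.
    Returns (kept_ids, removed_ids).
    """
    kept = list(genotypes)
    removed = []
    while True:
        # Note every pair whose squared correlation is above the threshold.
        counts = {v: 0 for v in kept}
        for a, b in combinations(kept, 2):
            if r_squared(genotypes[a], genotypes[b]) > threshold:
                counts[a] += 1
                counts[b] += 1
        # Hypothetical rule: remove the variant involved in the most such pairs.
        worst = max(kept, key=lambda v: counts[v], default=None)
        if worst is None or counts[worst] == 0:
            return kept, removed
        kept.remove(worst)
        removed.append(worst)
```

Calling `prune_window` on one window returns two lists that play the role of the `prunein` and `pruneout` outputs; a real run would also slide the window along the chromosome, which this sketch leaves out.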
LD analysis in PLINK examines genetic variant associations within populations
0
1
2
3
0
1
0
1
0
1
ld
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink pfiles with a text file of variant identifiers
0
1
2
3
4
extract_pgen
extract_psam
extract_pvar
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Produce a pruned set of variants in approximate linkage equilibrium
0
1
2
3
0
0
0
prune_in
prune_out
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Import variant genetic data using plink2
0
1
pgen
psam
pvar
pvar_zst
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.
0
1
0
1
0
1
2
vcf
max_cov
bad_sites
versions
Run PureCN workflow to normalize, segment and determine purity and ploidy
0
1
2
0
0
pdf
local_optima_pdf
seg
genes_csv
amplification_pvalues_csv
vcf_gz
variants_csv
loh_csv
chr_pdf
segmentation_pdf
multisample_seg
versions
Copy number calling and SNV classification using targeted short read sequencing
Call SNVs/indels from BAM files for all target genes.
0
1
2
0
1
0
0
vcf
tbi
versions
A Python package for pharmacogenomics research
PyPGx pharmacogenomics genotyping pipeline for NGS data.
0
1
2
3
4
5
0
1
0
results
cnv_calls
consolidated_variants
versions
A Python package for pharmacogenomics research
The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set
0
1
2
3
4
5
6
0
1
tp_vcf
tp_tbi
fn_vcf
fn_tbi
fp_vcf
fp_tbi
baseline_vcf
baseline_tbi
snp_roc
non_snp_roc
weighted_roc
summary
phasing
versions
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step VarCal and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
0
1
2
3
4
5
0
1
0
1
vcf
tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the Picard CollectVariantCallingMetrics tool.
0
1
2
0
1
2
0
1
0
1
0
1
metrics
summary
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
0
1
2
0
1
0
1
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
orientation_data
contamination_data
contamination_segments
stats
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
0
1
2
0
1
0
1
0
1
2
0
1
2
0
1
2
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm performs the first pass of Variant Quality Score Recalibration (VQSR): it builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
0
1
2
0
0
0
0
0
recal
idx
tranches
plots
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.
0
1
2
merged_variants
versions
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.
0
1
2
3
4
0
1
2
0
1
2
0
1
phased_variant
versions
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires a CPU with AVX2 support.
0
1
2
3
4
0
1
2
3
0
1
phased_variant
versions
Fast and accurate method for estimation of haplotypes (phasing)
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
0
1
2
3
0
1
0
1
vcf
versions
structural variant calling and genotyping with existing tools, but, smoothly
structural-variant calling with sniffles
0
1
2
0
1
0
1
0
0
vcf
tbi
snf
versions
Rapid haploid variant calling
0
1
0
tab
csv
html
vcf
bed
gff
bam
bai
log
aligned_fa
consensus_fa
consensus_subs_fa
raw_vcf
filt_vcf
vcf_gz
vcf_csi
txt
versions
Rapid bacterial SNP calling and core genome alignments
Genetic variant annotation and functional effect prediction toolbox
0
1
2
cache
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Genetic variant annotation and functional effect prediction toolbox
0
1
0
0
1
vcf
report
summary_html
genes_txt
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Annotate a VCF file with another VCF file
0
1
2
0
1
2
vcf
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
The dbNSFP is an integrated database of functional predictions from multiple algorithms
0
1
2
0
1
2
vcf
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
0
1
2
3
4
0
0
vcf
vcf_tbi
genome_vcf
genome_vcf_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
0
1
2
3
4
5
6
7
8
0
0
vcf_indels
vcf_indels_tbi
vcf_snvs
vcf_snvs_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Converts a bedpe file to a VCF file (beta version)
0
1
vcf
versions
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
0
1
2
0
0
0
0
vcf
versions
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi-sample VCF file.
0
1
0
0
0
0
0
0
vcf
versions
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
0
1
0
1
0
1
0
0
parameters
vcf
bed
fasta
insertions
versions
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
0
1
0
0
0
stats
versions
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
1
sv
indel
germ_indel
germ_sv
som_indel
som_sv
unfiltered_sv
unfiltered_indel
unfiltered_germ_indel
unfiltered_germ_sv
unfiltered_som_indel
unfiltered_som_sv
raw_calls
discordants
log
versions
SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
0
1
2
3
4
5
0
1
0
1
fns
fps
distances
log
report
versions
SVanalyzer: tools for the analysis of structural variation in genomes
Build a structural variant database
0
1
0
db
versions
structural variant database software
The merge module merges structural variants within one or more vcf files.
0
1
0
0
vcf
tbi
csi
versions
structural variant database software
Query a structural variant database, using a vcf file as query
0
1
0
0
0
0
0
0
vcf
versions
structural variant database software
Performs tests on BAF files
0
1
2
3
4
metrics
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
0
1
counts
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
0
1
2
0
vcf
tbi
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
0
1
0
1
vcf
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
0
1
2
bed
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert a VCF file to a BEDPE file.
0
1
bedpe
versions
Tools for processing and analyzing structural variants
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
0
1
2
3
0
1
0
1
json
gt_vcf
bam
versions
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
0
1
2
3
0
1
gt_vcf
json
versions
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
0
1
2
3
vcf
tbi
versions
Computes the coverage of different regions from the bam file.
0
1
0
1
cov
wig
versions
TIDDIT - structural variant calling.
Identify chromosomal rearrangements.
0
1
2
0
1
0
1
vcf
ploidy
versions
Search for structural variants.
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
0
1
2
3
4
5
0
1
0
1
fn_vcf
fn_tbi
fp_vcf
fp_tbi
tp_base_vcf
tp_base_tbi
tp_comp_vcf
tp_comp_tbi
summary
versions
Structural variant comparison tool for VCFs
Calculate the intersection/consistency across multiple VCFs.
0
1
consistency
versions
Structural variant comparison tool for VCFs
Normalization of SVs into disjointed genomic regions
0
1
vcf
versions
Structural variant comparison tool for VCFs
The Java port of the VarDict variant caller
0
1
2
3
0
1
0
1
vcf
versions
Filtering, downsampling and profiling alignments in BAM/CRAM formats
0
1
bam
versions
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
0
1
2
0
0
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
0
1
0
1
0
1
alignment_properties_json
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
0
1
2
3
4
0
1
0
1
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Convert VCF with structural variations to CytoSure format
0
1
0
1
0
1
0
1
0
cgh
versions
Command line tools for parsing and manipulating VCF files.
0
1
2
vcf
versions
Command line tools for parsing and manipulating VCF files.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
0
1
2
3
0
1
0
1
graph
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
0
1
0
0
vcf
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Creates index files for a variation graph
0
1
xg
vg_index
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Decomposes multiallelic variants into biallelic variants in a VCF file.
0
1
2
vcf
versions
A tool set for short variant discovery in genetic sequence data
Decomposes biallelic block substitutions into their constituent SNPs.
0
1
2
3
vcf
versions
A tool set for short variant discovery in genetic sequence data
Normalizes variants in a VCF file
0
1
2
3
0
1
0
1
vcf
fai
versions
A tool set for short variant discovery in genetic sequence data
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
0
1
2
0
0
vcf
tbi
graph
versions
A large variant benchmarking tool analogous to hap.py for small variants.
0
1
2
3
4
report
bench_vcf
bench_vcf_tbi
versions