Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants

summary results_dir versions

Annotation and Ranking of Structural Variation

tsv unannotated_tsv vcf versions

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

No input

annotations versions

annotsv:

Annotation and Ranking of Structural Variation

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Run the alignment/variant-call/consensus logic of the artic pipeline

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline: a bioinformatics pipeline for working with virus sequencing data generated on nanopore instruments

Generate a VCF file from a BAM file using various calling methods

vcf versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimates the sequencing bias based on known invariant sites

recal_patterns versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Concatenate VCF files

vcf tbi csi versions

concat:

Concatenate VCF files.

Creates a consensus sequence by applying VCF variants to a reference FASTA file

fasta versions

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.
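As a rough illustration of what "applying VCF variants to a reference" means, the sketch below substitutes single-base variants into a reference string. It is a simplification under stated assumptions: only SNVs are handled (no indels, no IUPAC codes, no masking), and the function name `apply_snvs` and its tuple input are hypothetical, not part of bcftools.

```python
from typing import List, Tuple


def apply_snvs(reference: str, snvs: List[Tuple[int, str, str]]) -> str:
    """Return a consensus built by applying single-base substitutions.

    snvs holds (pos, ref, alt) tuples with 1-based positions, as in VCF.
    Hypothetical helper for illustration; indels are not supported.
    """
    seq = list(reference)
    for pos, ref, alt in snvs:
        # Guard against a variant that does not match the reference base.
        if seq[pos - 1] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


consensus = apply_snvs("ACGTAC", [(2, "C", "T"), (5, "A", "G")])
```

The REF check mirrors the kind of sanity check a real tool needs: a variant called against a different reference should not be applied silently.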

Converts certain output formats to VCF

012010

vcf_gz vcf bcf_gz bcf hap legend samples tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Filters VCF files

012

vcf tbi csi versions

filter:

Apply fixed-threshold filters to VCF files.

Index VCF tools

01

csi tbi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Apply set operations to VCF files

012

results versions

isec:

Computes intersections, unions and complements of VCF files.
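The set operations named above can be pictured with plain Python sets. This is only a sketch: it assumes a variant is fully identified by (chrom, pos, ref, alt), which ignores genotypes, filters, and multiallelic handling, and the function name `set_operations` is hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed variant key for illustration: (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def set_operations(a: Set[Variant], b: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Intersection, union and complement of two call sets."""
    return {
        "intersection": a & b,   # records present in both files
        "union": a | b,          # records present in either file
        "complement": a - b,     # records private to the first file
    }


calls_a = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
calls_b = {("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
result = set_operations(calls_a, calls_b)
```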

Merge VCF files

012010101

vcf index versions

merge:

Merge VCF files.

Compresses VCF files

012010

vcf tbi stats mpileup versions

mpileup:

Generates genotype likelihoods at each genomic position with coverage.

Normalize VCF file

01201

vcf tbi csi versions

norm:

Normalize VCF files.

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

01200

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

0120000

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

Extracts fields from VCF or BCF files and outputs them in user-defined format.

012000

output versions

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Sorts VCF files

01

vcf tbi csi versions

sort:

Sort VCF files by coordinates.

Generates stats from VCF files

0120101010101

stats versions

stats:

Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Convert a BED file to a VCF file according to a YAML config

01201

vcf versions

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

012340101

vcf versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome.

010

tsv versions

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

0100

report assembly contigs corrected_reads corrected_trimmed_reads metadata contig_position contig_info versions

Clair3 is a germline small variant caller for long-reads

0123450101

vcf tbi phased_vcf phased_tbi versions

Copy number variant detection from high-throughput sequencing data

012010101010

bed cnn cnr cns pdf png versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

012

tsv cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Structural-variant calling with cuteSV

01201

vcf versions

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

vcf vcf_tbi gvcf gvcf_tbi versions

(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

Call variants from the examples produced by make_examples

01

call_variants_tfrecords versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Transforms the input alignments to a format suitable for the deep neural network variant caller

012301010101

examples gvcf small_model_calls versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01234010101

vcf vcf_index gvcf gvcf_index versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01

report versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Call structural variants

0123450101

bcf csi versions

delly:

Structural variant discovery by integrated paired-end and split-read analysis

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

01234500

vcf versions

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

vcf tbi versions

Perform phasing of genotyped data with or without a reference panel

012345

phased_variants versions

Convert a file in FASTA format to the ELFASTA format

01

elfasta log versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

012345601010100000

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split bam/sam chunks in one file

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

0123

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

010

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

0120000010

vcf tbi tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

A haplotype-based variant detector

0123450101010101

vcf versions

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

012000

lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
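The two-stage bootstrap described for this module can be sketched with the standard library. This is not Freyja's implementation: the function name `bootstrap_sample` and the nested-dict input are hypothetical, and using the stage-one resampled depth as the number of trials in stage two is an assumption about how the description should be read.

```python
import random
from collections import Counter
from typing import Dict


def bootstrap_sample(
    site_base_counts: Dict[str, Dict[str, int]], rng: random.Random
) -> Dict[str, Dict[str, int]]:
    """Resample one bootstrap replicate from per-site base counts."""
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total = sum(depths)

    # Stage 1: multinomial draw of reads across sites; each site's
    # probability is its share of the total read depth.
    new_depth = Counter(rng.choices(sites, weights=depths, k=total))

    resampled = {}
    for s in sites:
        bases = list(site_base_counts[s])
        weights = [site_base_counts[s][b] for b in bases]
        n = new_depth[s]  # assumption: trials = resampled depth at this site
        # Stage 2: multinomial draw of bases by observed base frequency
        # (this reduces to a binomial draw when only two bases are present).
        draws = rng.choices(bases, weights=weights, k=n) if n else []
        resampled[s] = dict(Counter(draws))
    return resampled


observed = {"site1": {"A": 90, "G": 10}, "site2": {"C": 50}}
replicate = bootstrap_sample(observed, random.Random(0))
```

Repeating the call with different seeds would give a distribution of replicates, from which abundance estimates could be recomputed to judge their variability.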

Specify the relative abundance of each known haplotype

01200

demix versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Downloads new versions of the curated SARS-CoV-2 lineage file and barcodes

0

barcodes lineages_topology lineages_meta versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Calls variants and reports sequencing depth information for each variant

010

variants versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Performs local realignment around indels to correct for mapping errors

012301010101

bam versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

01201010101

intervals versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

01201010101010101

vcf versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotates intervals with GC content, mappability, and segmental-duplication content

0101010101010101

annotated_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

012340101010

csv versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

01230101010101

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

0101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the parameters for the DRAGstr model

0120000

dragstr_model versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

0123400000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

0123010101

hdf5 tsv versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
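The counting rule stated for this module (a read contributes to the interval that contains its start position) can be sketched as follows. This is a simplification, not the GATK code: it assumes a single contig, sorted non-overlapping intervals with 1-based inclusive coordinates, and no read filtering. The name `count_read_starts` is hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def count_read_starts(
    intervals: List[Tuple[int, int]], read_starts: Iterable[int]
) -> List[int]:
    """Count read start positions falling inside each interval."""
    interval_starts = [start for start, _ in intervals]
    counts = [0] * len(intervals)
    for pos in read_starts:
        # Find the last interval whose start is <= pos.
        i = bisect_right(interval_starts, pos) - 1
        # Count the read only if pos is also within that interval's end.
        if i >= 0 and pos <= intervals[i][1]:
            counts[i] += 1
    return counts


intervals = [(1, 100), (101, 200), (301, 400)]
counts = count_read_starts(intervals, [5, 100, 101, 250, 301, 400, 401])
```

Reads starting at 250 and 401 fall outside every interval and are not counted.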

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

01234000

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

012000

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.

000

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

01

pon versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

01

dict versions

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals constraining germline and artifactual sites for use with mutect2.

01010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Denoises read counts to produce denoised copy ratios

0101

standardized denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

0123010

calls model versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts FastQ file to SAM/BAM format

01

bam versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

010101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

012300000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normal creation.

012345000

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

01234

cohortcalls cohortmodel casecalls versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

012301010100

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

012340101010101

vcf tbi bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits the interval list file into unique, equally-sized interval files and places them under a directory

01

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts

01

artifactprior versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

0123000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

0100

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

0120101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges several vcf files

0101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

01230101010000

vcf tbi stats f1r2 versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

0123

intervals segments denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Prepares bins for coverage collection.

0101010101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

012010101

bam cram sam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

0120000

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

012300000

vcf versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

01

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

01

fastq versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

0123

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a fasta with the bases shifted by offset

010101

shift_fa shift_fai shift_back_chain dict intervals shift_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits CRAM files efficiently by taking advantage of their container based structure

01

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

0120000

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01

bam bai versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

01201010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results, for example 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

012000000

recal idx tranches plots versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

012345010101

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

012300000

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01000

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotates regions, frequencies, and CADD scores

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score compounds

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

annotate models of inheritance

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score the variants of a vcf based on their annotation

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Concatenates imputation chunks in a single VCF/BCF file ligating phased information.

012

merged_variants versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods

012345678

phased_variants versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.

012

merged_variants versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool for imputation and phasing from vcf file or directly from bam files.

0123456789012

phased_variants stats_coverage versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

merge gVCF files and perform joint variant calling

0101

bcf versions

Tools for population-scale genotyping using pangenome graphs.

01201010

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

Tools for population-scale genotyping using pangenome graphs.

01

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Collapse redundant transcript models in Iso-Seq data.

010

bed bed_trans_reads local_density_error polya read strand_check trans_report versions varcov variants

tama_collapse.py:

Collapse similar gene models

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

01

vcf versions

gvcftools:

gvcftools is a package of small utilities for creating and analyzing gVCF files

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is a part of hap.py that compares somatic variations.

012340101010101

features metrics stats versions

sompy:

Haplotype VCF comparison tools: somatic variant comparison

pacbio structural variant calling tool

01201201

vcf csv versions

Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline

01

vcf versions

hmtnote:

Human mitochondrial variants annotation using HmtVar.

This tool takes a background VCF, such as gnomAD, that has full-genome coverage (though in some cases users will instead want whole-exome coverage) and uses that as an expectation of variants.

012012

tsv versions

htsnimtools:

useful command-line tools written to show-case hts-nim

A Python application to generate self-contained HTML reports for variant review and other genomic applications

0123012

report versions

Call variants from a BAM file using iVar

010000

tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Filtering VCF with dynamically-compiled java expressions

01230101010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Lofreq subcommand to insert base and indel alignment qualities

010

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments

0120

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

It predicts variants using multiple processors

01230101

vcf tbi versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors.

Lofreq subcommand to remove variants with low coverage or strand bias potential

01

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Inserts indel qualities in a BAM file

0101

bam versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file.

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0123450101

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0101

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

A tool to create consensus sequences and variant calls from nanopore sequencing data

012

assembly versions

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101000

hist log_stderr versions

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data

0123456

output output_dis output_germline output_somatic versions

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

Scan a reference genome to get microsatellite & homopolymer information

01

txt versions

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

pre-filtering and calculating position-specific summary statistics using the Markov substitution model

0123401

txt versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants

01012

vcf tbi versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

00

dataset versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

010

csv csv_errors csv_insertions tsv json json_auspice ndjson fasta_aligned fasta_translation nwk versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database

0123

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.

012301

vcf gvcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.

012301

vcf gvcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

0123450100

vcf stats versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Determines the depth in a BAM/CRAM file

0120101

depth binned_depth versions

paragraph:

Graph realignment tools for structural variants

Genotype structural variants using paragraph and grmpy

0123450101

vcf json versions

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

0101

graph versions

paragraph:

Graph realignment tools for structural variants

pbsv/call - PacBio structural variant (SV) calling and analysis tools

0101

vcf versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

pbsv - PacBio structural variant (SV) signature discovery tool

0101

svsig versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Creates an interval list from a bed file and a reference dict

0101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Automatically improve draft assemblies and find variation among strains, including large event detection

010120

improved_assembly vcf change_record tracks_bed tracks_wig versions

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

012000

bp cem del dd int_final inv li rp si td versions

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

Platypus is a tool that efficiently and accurately calls genetic variants from next-generation DNA sequencing data

01234000

vcf tbi log version

Analyses binary variant call format (BCF) files using plink

01

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

epi episummary log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Exclude variant identifiers from plink bfiles

01234

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Subset plink bfiles with a text file of variant identifiers

01234

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

fepi fepisummary flog fnosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.

0123000

prunein pruneout versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.

0123000

prunein pruneout versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
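
The greedy window pruning described for the entry above can be sketched in a few lines of Python. This is only an illustration of the idea, not PLINK's implementation: the function names `r_squared` and `prune`, the window and step measured in variant counts, and the rule of dropping the later variant of a correlated pair are all assumptions made for the example. PLINK's own choice of which variant to remove is not reproduced here.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treated as 0
    return cov * cov / (vx * vy)


def prune(genotypes, window=3, step=1, threshold=0.5):
    """Greedy LD pruning sketch.

    genotypes: one list of 0/1/2 allele counts per variant, in genomic order.
    Returns (kept_indices, removed_indices).
    """
    removed = set()
    n = len(genotypes)
    # Slide a window of `window` variants along the variant list.
    for start in range(0, max(n - window, 0) + 1, step):
        idx = [i for i in range(start, min(start + window, n)) if i not in removed]
        for i, j in combinations(idx, 2):
            if i in removed or j in removed:
                continue
            if r_squared(genotypes[i], genotypes[j]) > threshold:
                # Hypothetical tie-break: drop the later variant of the pair.
                removed.add(j)
    kept = [i for i in range(n) if i not in removed]
    return kept, sorted(removed)
```

Calling `prune(genotypes, window, step, threshold)` returns two index lists, which loosely correspond to the `prunein` and `pruneout` outputs listed for the module.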

LD analysis in PLINK examines genetic variant associations within populations

0123010101

ld log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Analyses variant calling files using plink

01

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Subset plink pfiles with a text file of variant identifiers

01234

extract_pgen extract_psam extract_pvar versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Filters plink bfiles or pfiles with maf filters

01230

bed bim fam pgen pvar psam versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Produce a pruned set of variants in approximate linkage equilibrium

0123000

prune_in prune_out versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Import variant genetic data using plink2

01

pgen psam pvar pvar_zst versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.

0101012

vcf max_cov bad_sites versions

Run PureCN workflow to normalize, segment and determine purity and ploidy

01200

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample_seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Call SNVs/indels from BAM files for all target genes.

0120100

vcf tbi versions

pypgx:

A Python package for pharmacogenomics research

PyPGx pharmacogenomics genotyping pipeline for NGS data.

012345010

results cnv_calls consolidated_variants versions

pypgx:

A Python package for pharmacogenomics research

The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set

012345601

tp_vcf tp_tbi fn_vcf fn_tbi fp_vcf fp_tbi baseline_vcf baseline_tbi snp_roc non_snp_roc weighted_roc summary phasing versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

012012010101

metrics summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

modifies the input VCF file by adding the MLrejected FILTER to the variants

012010101

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

01230101010101000

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

012340101010100

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

01230101010101010100

orientation_data contamination_data contamination_segments stats vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

012010101201201201

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

01200000

recal idx tranches plots versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.

012

merged_variants versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.

0123401201201

phased_variant versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires AVX2 support.

01234012301

phased_variant versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

structural-variant calling with sniffles

012010100

vcf tbi snf versions

Rapid haploid variant calling

010

tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Genetic variant annotation and functional effect prediction toolbox

012

cache versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Genetic variant annotation and functional effect prediction toolbox

01001

vcf report summary_html genes_txt versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Annotate a VCF file with another VCF file

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Rapidly extracts SNPs from a multi-FASTA alignment.

fasta constant_sites versions constant_sites_string
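
As a rough illustration of what extracting SNPs from a multi-FASTA alignment means, the sketch below keeps only the alignment columns where the sequences disagree. It is not the module's implementation: the function name `variable_sites` is hypothetical, and ignoring gaps (`-`) and `N` when deciding whether a column varies is an assumption made for this example.

```python
def variable_sites(alignment: dict[str, str]) -> tuple[list[int], dict[str, str]]:
    """Return the 0-based variable columns and the alignment reduced to them.

    Assumption for this sketch: gaps ('-') and 'N' are ignored when
    deciding whether a column is variable.
    """
    seqs = list(alignment.values())
    if not seqs:
        return [], {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have equal length")

    keep = []
    for i in range(length):
        # Distinct informative bases observed in this column.
        bases = {s[i].upper() for s in seqs} - {"-", "N"}
        if len(bases) > 1:
            keep.append(i)

    reduced = {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
    return keep, reduced


example = {"s1": "ACGTA", "s2": "ACCTA", "s3": "ATGTA"}
columns, snp_alignment = variable_sites(example)
```

Here only the second and third columns differ between sequences, so the reduced alignment is two bases wide; the remaining columns are the constant sites.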

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Converts a bedpe file to a VCF file (beta version)

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

sv indel germ_indel germ_sv som_indel som_sv unfiltered_sv unfiltered_indel unfiltered_germ_indel unfiltered_germ_sv unfiltered_som_indel unfiltered_som_sv raw_calls discordants log versions

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

fns fps distances log report versions

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

Build a structural variant database

db versions

svdb:

structural variant database software

The merge module merges structural variants within one or more vcf files.

vcf tbi csi versions

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

vcf versions

svdb:

structural variant database software

Performs tests on BAF files

metrics versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.
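
To make the per-sample SVTYPE count concrete, here is a minimal sketch that tallies SVTYPE values for each sample from VCF text lines. It is not svtk's code: the function name `count_svtypes` is hypothetical, and counting a record for a sample only when its genotype carries a non-reference allele is an assumption of this example.

```python
from collections import Counter, defaultdict


def count_svtypes(vcf_lines):
    """Count SVTYPE occurrences per sample from an iterable of VCF lines.

    Assumption for this sketch: a record counts for a sample only if the
    sample's GT field contains at least one non-reference allele.
    """
    counts = defaultdict(Counter)
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        # INFO is a semicolon-separated list of KEY=VALUE pairs and flags.
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        svtype = info.get("SVTYPE")
        fmt_keys = fields[8].split(":")
        if svtype is None or "GT" not in fmt_keys:
            continue
        gt_index = fmt_keys.index("GT")
        for sample, call in zip(samples, fields[9:]):
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[sample][svtype] += 1
    return {sample: dict(c) for sample, c in counts.items()}


example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=500\tGT\t0/1\t0/0",
    "chr1\t900\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=1500\tGT\t1/1\t0/1",
    "chr2\t50\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=80\tGT\t./.\t1|0",
]
svtype_counts = count_svtypes(example_vcf)
```

In this toy input each sample carries one deletion and one duplication; the missing genotype (`./.`) and the homozygous-reference call (`0/0`) are not counted.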

Convert an RdTest-formatted bed to the standard VCF format.

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert a VCF file to a BEDPE file.

bedpe versions

svtools:

Tools for processing and analyzing structural variants

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

vcf tbi versions

Computes the coverage of different regions from the bam file.

cov wig versions

tiddit:

TIDDIT - structural variant calling.

Identify chromosomal rearrangements.

vcf ploidy versions

sv:

Search for structural variants.

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary versions

truvari:

Structural variant comparison tool for VCFs
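
The recall/precision/f-measure calculation can be sketched from the four match counts that correspond to the listed outputs (tp_base, tp_comp, fn, fp). This is an illustrative sketch rather than Truvari's code: the function name `benchmark_summary` is hypothetical, and the formulas are the standard definitions, with recall taken over the baseline set and precision over the comparison set.

```python
def benchmark_summary(tp_base: int, tp_comp: int, fn: int, fp: int) -> dict:
    """Compute recall, precision and F1 from benchmark match counts.

    recall    = tp_base / (tp_base + fn)   # fraction of baseline calls recovered
    precision = tp_comp / (tp_comp + fp)   # fraction of comparison calls that match
    f1        = harmonic mean of precision and recall
    """
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts: 90 matched calls, 10 missed baseline calls, 30 extra comparison calls.
summary = benchmark_summary(tp_base=90, tp_comp=90, fn=10, fp=30)
```

With these made-up counts, recall is 0.9 and precision is 0.75. Empty sets yield 0.0 rather than a division error, which is a choice made for this sketch.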

Over multiple vcfs, calculate their intersection/consistency.

consistency versions

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjointed genomic regions

vcf versions

truvari:

Structural variant comparison tool for VCFs

The Java port of the VarDict variant caller

vcf versions

Filtering, downsampling and profiling alignments in BAM/CRAM formats

bam versions

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To evaluate candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

cgh versions

Quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files.

vcf tbi versions

Command line tools for parsing and manipulating VCF files.

vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

xg vg_index versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Decomposes multiallelic variants into biallelic variants in a VCF file.

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data
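
A minimal sketch of what splitting a multiallelic record into biallelic records looks like, using a simplified record with only the site fields. The names `Variant` and `decompose_multiallelic` are hypothetical, and this is not vt's implementation: a real decomposition must also rewrite the per-allele INFO and genotype fields, which this example leaves out.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def decompose_multiallelic(chrom: str, pos: int, ref: str, alts: str) -> list[Variant]:
    """Split a comma-separated ALT field into one biallelic record per allele.

    Simplification for this sketch: only CHROM, POS, REF and ALT are
    carried over; INFO and genotype columns are not handled.
    """
    return [Variant(chrom, pos, ref, alt) for alt in alts.split(",")]


# One record with two alternate alleles becomes two biallelic records.
records = decompose_multiallelic("chr1", 100, "A", "C,G")
```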

Decomposes biallelic block substitutions into their constituent SNPs.

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data
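
Decomposing a block substitution can be illustrated by walking REF and ALT in parallel and emitting one SNP for every position where they differ. The function name `decompose_block_substitution` is hypothetical and the sketch assumes equal-length alleles; it is not vt's implementation and does not carry over INFO or genotype data.

```python
def decompose_block_substitution(pos: int, ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (position, ref_base, alt_base) for each mismatching base.

    Assumption for this sketch: REF and ALT have the same length, i.e.
    the record is a pure block substitution with no indel component.
    """
    if len(ref) != len(alt):
        raise ValueError("block substitutions must have equal-length REF and ALT")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# ACG>TCA at position 100 differs at the first and third base only.
snps = decompose_block_substitution(100, "ACG", "TCA")
```

The middle base is identical in both alleles, so only two SNPs are produced, at positions 100 and 102.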

Normalizes variants in a VCF file.

vcf fai versions

vt:

A tool set for short variant discovery in genetic sequence data

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

vcf tbi graph versions

A large variant benchmarking tool analogous to hap.py for small variants.

report bench_vcf bench_vcf_tbi versions
