Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • vcf 125
  • variant calling 36
  • structural variants 33
  • bam 32
  • gatk4 22
  • genomics 16
  • variant 16
  • bed 15
  • VCF 14
  • gvcf 12
  • filter 11
  • variants 11
  • bcftools 11
  • sv 11
  • cram 10
  • sam 10
  • annotation 10
  • somatic 10
  • gff 9
  • alignment 8
  • sort 8
  • sentieon 8
  • wgs 8
  • picard 8
  • germline 8
  • merge 7
  • statistics 7
  • imputation 7
  • haplotype 7
  • metrics 6
  • neural network 6
  • machine learning 6
  • genotype 6
  • bcf 6
  • annotate 6
  • index 5
  • quality 5
  • bedtools 5
  • benchmark 5
  • indels 5
  • wxs 5
  • gridss 5
  • fasta 4
  • conversion 4
  • long-read 4
  • low-coverage 4
  • validation 4
  • genotyping 4
  • json 4
  • low frequency variant calling 4
  • svtk 4
  • bedpe 4
  • haplotypecaller 4
  • happy 4
  • ranking 4
  • genmod 4
  • fastq 3
  • split 3
  • pacbio 3
  • QC 3
  • example 3
  • glimpse 3
  • mutect2 3
  • call 3
  • structural 3
  • family 3
  • STR 3
  • ancestry 3
  • snps 3
  • small indels 3
  • panel 3
  • benchmarking 3
  • somatic variants 3
  • variant_calling 3
  • informative sites 3
  • kinship 3
  • identity 3
  • relatedness 3
  • survivor 3
  • ligate 3
  • bacteria 2
  • cnv 2
  • ancient DNA 2
  • graph 2
  • consensus 2
  • DNA methylation 2
  • scWGBS 2
  • WGBS 2
  • filtering 2
  • plink2 2
  • phasing 2
  • mpileup 2
  • query 2
  • view 2
  • normalization 2
  • add 2
  • union 2
  • hybrid capture sequencing 2
  • copy number alteration calling 2
  • DNA sequencing 2
  • targeted sequencing 2
  • SV 2
  • structural_variants 2
  • insert 2
  • SNP 2
  • indel 2
  • fingerprint 2
  • complement 2
  • comparison 2
  • repeat expansion 2
  • observations 2
  • tabix 2
  • lofreq 2
  • variation 2
  • vcflib 2
  • standardization 2
  • tnhaplotyper2 2
  • small variants 2
  • multiallelic 2
  • cancer genomics 2
  • snpsift 2
  • snpeff 2
  • effect prediction 2
  • rtgtools 2
  • realignment 2
  • repeat_expansions 2
  • tab 2
  • allele-specific 2
  • joint genotyping 2
  • tbi 2
  • calling 2
  • structural-variant calling 2
  • intersection 2
  • normalize 2
  • norm 2
  • scatter 2
  • assembly 1
  • map 1
  • coverage 1
  • gfa 1
  • convert 1
  • copy number 1
  • phylogeny 1
  • bisulfite 1
  • variation graph 1
  • protein 1
  • cna 1
  • table 1
  • stats 1
  • tsv 1
  • serotype 1
  • markduplicates 1
  • histogram 1
  • pangenome graph 1
  • aDNA 1
  • bisulfite sequencing 1
  • aligner 1
  • biscuit 1
  • gff3 1
  • feature 1
  • population genetics 1
  • pangenome 1
  • demultiplexing 1
  • mitochondria 1
  • snp 1
  • extract 1
  • duplicates 1
  • mirna 1
  • visualization 1
  • concatenate 1
  • single cell 1
  • text 1
  • clipping 1
  • preprocessing 1
  • ngscheckmate 1
  • matching 1
  • fai 1
  • dna 1
  • rna 1
  • isomir 1
  • compress 1
  • bgzip 1
  • interval_list 1
  • resistance 1
  • pypgx 1
  • HiFi 1
  • clean 1
  • sample 1
  • html 1
  • replace 1
  • dictionary 1
  • polishing 1
  • regions 1
  • roh 1
  • remove 1
  • intervals 1
  • combine 1
  • score 1
  • popscle 1
  • pileup 1
  • genotype-based deconvoltion 1
  • shapeit 1
  • phase 1
  • trgt 1
  • vg 1
  • repeats 1
  • svdb 1
  • join 1
  • cnvnator 1
  • ped 1
  • import 1
  • bfiles 1
  • read-group 1
  • GPU-accelerated 1
  • soft-clipped clusters 1
  • varcal 1
  • Pharmacogenetics 1
  • dict 1
  • runs_of_homozygosity 1
  • bayesian 1
  • unaligned 1
  • panelofnormals 1
  • filtermutectcalls 1
  • artic 1
  • aggregate 1
  • demultiplexed reads 1
  • concat 1
  • recombination 1
  • intersect 1
  • reheader 1
  • tnfilter 1
  • jaccard 1
  • array_cgh 1
  • cytosure 1
  • decomposeblocksub 1
  • closest 1
  • sorting 1
  • structural variant 1
  • simulation 1
  • decompose 1
  • subtract 1
  • tnseq 1
  • multinterval 1
  • sompy 1
  • dbsnp 1
  • standardize 1
  • tandem repeats 1
  • csi 1
  • deep variant 1
  • mutect 1
  • idx 1
  • update header 1
  • joint-genotyping 1
  • genotypegvcf 1
  • BCF 1
  • vcf2db 1
  • gemini 1
  • maf 1
  • lua 1
  • toml 1
  • vcfbreakmulti 1
  • uniq 1
  • deduplicate 1
  • VCFtools 1
  • construct 1
  • graph projection to vcf 1
  • snv 1
  • tnscope 1
  • dnamodelapply 1
  • dnascope 1
  • genotype dosages 1
  • vcf file 1
  • bgen 1
  • bgen file 1
  • whamg 1
  • wham 1
  • long read 1
  • poolseq 1
  • variant-calling 1
  • setgt 1
  • tag2tag 1
  • functional 1
  • impute-info 1
  • tags 1
  • java 1
  • script 1
  • atlas 1
  • block substitutions 1
  • Bayesian 1
  • structural-variants 1
  • haploype 1
  • impute 1
  • Indel 1
  • SNV 1
  • paraphase 1
  • featuretable 1
  • extraction 1
  • AC/NS/AF 1
  • check 1
  • vcflib/vcffixup 1
  • asereadcounter 1
  • vqsr 1
  • variant quality score recalibration 1
  • targets 1
  • cnnscorevariants 1
  • createsomaticpanelofnormals 1
  • lofreq/call 1
  • lofreq/filter 1
  • short variant discovery 1
  • combinegvcfs 1
  • SNP table 1
  • GATK UnifiedGenotyper 1
  • gangstr 1
  • bacterial variant calling 1
  • germline variant calling 1
  • somatic variant calling 1
  • variant caller 1
  • extractvariants 1
  • variantfiltration 1
  • svcluster 1
  • svannotate 1
  • compound 1
  • extract_variants 1
  • joint-variant-calling 1
  • gvcftools 1
  • Mykrobe 1
  • Salmonella Typhi 1
  • models 1
  • hmtnote 1
  • indexfeaturefile 1
  • panelofnormalscreation 1
  • jointgenotyping 1
  • genomicsdbimport 1
  • genomicsdb 1
  • tranche filtering 1
  • filtervarianttranches 1
  • selectvariants 1
  • reblockgvcf 1
  • postprocessgermlinecnvcalls 1
  • snvs 1
  • leftalignandtrimvariants 1
  • jasminesv 1
  • jasmine 1
  • scramble 1
  • cluster analysis 1
  • clusteridentifier 1
  • applyvarcal 1
  • cutesv 1
  • VQSR 1
  • rdtest2vcf 1
  • variantcalling 1
  • countsvtypes 1
  • rdtest 1
  • vcf2bed 1
  • Mycobacterium tuberculosis 1
  • sniffles 1
  • core 1
  • snippy 1
  • dbnsfp 1
  • predictions 1
  • genetic 1
  • picard/renamesampleinvcf 1
  • recode 1
  • whole genome association 1
  • variant genetic 1
  • sortvcf 1
  • pcr 1
  • graphs 1
  • liftovervcf 1
  • hybrid-selection 1
  • mate-pair 1
  • rhocall 1
  • depth information 1
  • structural variation 1
  • duphold 1
  • rtg 1
  • pedfilter 1
  • rocplot 1
  • rtg-tools 1
  • normal database 1
  • panel of normals 1

Annotation and Ranking of Structural Variation

tsv unannotated_tsv vcf versions

annotsv:

Annotation and Ranking of Structural Variation

Run the alignment/variant-call/consensus logic of the artic pipeline

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Generate a VCF file from a BAM file using various calling methods.

vcf versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Add or remove annotations.

vcf tbi csi versions

annotate:

Add or remove annotations.

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Concatenate VCF files

vcf tbi csi versions

concat:

Concatenate VCF files.
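
As a rough illustration of what concatenation means here, the sketch below keeps the header of the first input and appends the records of every input in the order given. It is a conceptual sketch only, not the bcftools implementation: the function name `concat_vcf` is hypothetical, and it assumes uncompressed VCF text whose inputs share the same sample columns and are already in the desired order.

```python
def concat_vcf(texts):
    """Concatenate VCF texts: header from the first input, records from all.

    Assumption: every input has the same sample columns in the same order,
    and the inputs are passed in the order their records should appear.
    """
    header, records = [], []
    for i, text in enumerate(texts):
        for line in text.splitlines():
            if line.startswith("#"):
                # Header lines are taken from the first file only.
                if i == 0:
                    header.append(line)
            elif line:
                records.append(line)
    return "\n".join(header + records) + "\n"


if __name__ == "__main__":
    part1 = "##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t10\n"
    part2 = "##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t20\n"
    print(concat_vcf([part1, part2]), end="")
```

The real tool also handles compressed and indexed inputs, which this sketch ignores.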

Compresses VCF files

fasta versions

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.
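
The idea of applying variants to a reference can be sketched in a few lines. The example below is a simplified illustration, not how bcftools does it: `apply_variants` is a hypothetical helper, variants are plain `(pos, ref, alt)` tuples with 1-based positions, and the variants are assumed not to overlap.

```python
def apply_variants(reference, variants):
    """Return the reference sequence with each (pos, ref, alt) applied.

    Assumptions: positions are 1-based as in VCF, and variants do not overlap.
    Variants are applied from right to left so that earlier coordinates stay
    valid after an insertion or deletion changes the sequence length.
    """
    seq = reference
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


if __name__ == "__main__":
    # One SNV at position 2 and one insertion at position 5.
    print(apply_variants("ACGTACGT", [(2, "C", "T"), (5, "A", "AGG")]))
```

Checking the REF allele against the sequence before substituting catches a VCF that was called against a different reference.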

Converts certain output formats to VCF

vcf_gz vcf bcf_gz bcf hap legend samples tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools Haplotype-aware consequence caller

01010101

vcf tbi csi versions

csq:

Haplotype-aware consequence caller

Filters VCF files

012

vcf tbi csi versions

filter:

Apply fixed-threshold filters to VCF files.

Index VCF files

01

csi tbi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Apply set operations to VCF files

012

results versions

isec:

Computes intersections, unions and complements of VCF files.

Merge VCF files

012010101

vcf index versions

merge:

Merge VCF files.

Generates genotype likelihoods at each genomic position with coverage.

012010

vcf tbi stats mpileup versions

mpileup:

Generates genotype likelihoods at each genomic position with coverage.

Normalize VCF file

01201

vcf tbi csi versions

norm:

Normalize VCF files.

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

01200

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

Split VCF by chunks or regions, creating multiple VCFs.

01200000

scatter tbi csi versions

pluginscatter:

Split VCF by chunks or regions, creating multiple VCFs.

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

0120000

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

Split VCF by sample, creating single- or multi-sample VCFs.

0120000

vcf tbi csi versions

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.

01200

vcf tbi csi versions

view:

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.

Extracts fields from VCF or BCF files and outputs them in user-defined format.

012000

output versions

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Reheader a VCF file

012301

vcf index versions

reheader:

Modify header of VCF/BCF files, change sample names.

Sorts VCF files

01

vcf tbi csi versions

sort:

Sort VCF files by coordinates.

Split a vcf file into files per chromosome

012

split_vcf versions

bcftools:

Sort VCF files by coordinates.

Generates stats from VCF files

0120101010101

stats versions

stats:

Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.

010000

vcf log versions

beagle5:

Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.

Convert a BED file to a VCF file according to a YAML config

01201

vcf versions

For each feature in A, finds the closest feature (upstream or downstream) in B.

0120

output versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
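
The complement operation described in this entry can be illustrated with a minimal Python sketch. It assumes a single chromosome of known length and 0-based half-open intervals (the BED convention); the function name `complement` is hypothetical and this is not how bedtools itself is implemented.

```python
def complement(intervals, chrom_length):
    """Return the regions of [0, chrom_length) not covered by any interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open.
    """
    gaps = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


print(complement([(10, 20), (15, 30), (50, 60)], 100))
```

Sorting first lets overlapping input intervals be handled in a single pass.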

Computes both the depth and breadth of coverage of features in file B on the features in file A.

0120

bed versions

bedtools:

A powerful toolset for genome arithmetic

Calculates the Jaccard statistic between two feature files.

01201

tsv versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
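
As a rough illustration of the statistic in this entry, the sketch below computes the Jaccard index of two interval sets as covered base pairs in the intersection divided by covered base pairs in the union. It assumes a single chromosome and half-open (start, end) intervals; the helper names are hypothetical, and strand handling and multi-chromosome input are left out.

```python
def merge(intervals):
    """Merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a, b):
    """Jaccard index of two interval sets, measured in base pairs."""
    union = covered(list(a) + list(b))
    intersection = covered(a) + covered(b) - union
    return intersection / union if union else 0.0


print(jaccard([(0, 10)], [(5, 15)]))
```

The intersection is derived by inclusion-exclusion, which avoids a separate overlap routine.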

Allows one to screen for overlaps between two sets of genomic features.

01201

mapped versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.

012

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
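
The subtraction described in this entry can be sketched as follows. This is a simplified illustration, assuming a single chromosome and half-open (start, end) intervals; the function name `subtract` is hypothetical and the real tool offers options that are not modelled here.

```python
def subtract(a, b):
    """Remove every part of the intervals in `a` that overlaps an interval in `b`."""
    remaining = []
    b_sorted = sorted(b)
    for start, end in a:
        cursor = start  # left edge of the piece of A still to be decided
        for b_start, b_end in b_sorted:
            if b_end <= cursor:
                continue  # this B interval lies entirely to the left
            if b_start >= end:
                break  # this and all later B intervals lie to the right
            if b_start > cursor:
                remaining.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((cursor, end))
    return remaining


print(subtract([(0, 100)], [(10, 20), (50, 60)]))
```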

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

012340101

vcf versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.

01

bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Clair3 is a germline small variant caller for long reads

012340101

vcf tbi phased_vcf phased_tbi versions

convert2vcf.pl is a command-line tool to convert CNVnator calls to VCF format.

01

vcf versions

cnvnator:

Tool for calling copy number variations.

View function to generate VCFs

0100

vcf tsv xls versions

cnvpytor:

Calling CNVs using read depth

Annotate a VEP annotated VCF with the most severe consequence field

0101

vcf versions

custom:

Custom module to annotate a VEP annotated VCF with the most severe consequence field

Annotate a VEP annotated VCF with the most severe pLi field

01

vcf versions

custom:

Custom module to annotate a VEP annotated VCF with the most severe pLi field

Structural variant calling with cuteSV

01201

vcf versions

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

vcf vcf_tbi gvcf gvcf_tbi versions

(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

Transforms the input alignments to a format suitable for the deep neural network variant caller

012301010101

examples gvcf small_model_calls versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01234010101

vcf vcf_index gvcf gvcf_index versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01

report versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

01234500

vcf versions

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

vcf tbi versions

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

012345601010100000

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

010

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

0120000010

vcf tbi tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Estimate repeat sizes using NGS data

012010101

vcf json bam versions

A haplotype-based variant detector

0123450101010101

vcf versions

GangSTR is a tool for genome-wide profiling of tandem repeats from short reads.

012300

vcf samplestats versions

Performs local realignment around indels to correct for mapping errors

012301010101

bam versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

01201010101

intervals versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

01201010101010101

vcf versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

012340101010

csv versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

0123400000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

012000

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals constraining germline and artifactual sites for use with mutect2.

01010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

012300000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge GVCFs from multiple samples, for use in joint genotyping or somatic panel of normals creation.

012345000

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

012340101010101

vcf tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Call germline SNPs and indels via local re-assembly of haplotypes

012340101010101

vcf tbi bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an index for a feature file, e.g. VCF or BED file.

01

index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

0123000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges several vcf files

0101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

01230101010000

vcf tbi stats f1r2 versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

0123

intervals segments denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

012300000

vcf versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

0123

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

0120000

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

01201010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

012345010101

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotate regions, frequencies and CADD scores

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score compounds

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Annotate models of inheritance

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score the variants of a vcf based on their annotation

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Genotype Salmonella Typhi from Mykrobe results

01

tsv versions

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

Concatenates imputation chunks in a single VCF/BCF file ligating phased information.

012

merged_variants versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.

012

merged_variants versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool for imputation and phasing from vcf file or directly from bam files.

0123456789012

phased_variants stats_coverage versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Merge gVCF files and perform joint variant calling

0101

bcf versions

Tools for population-scale genotyping using pangenome graphs.

01201010

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

Tools for population-scale genotyping using pangenome graphs.

01

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

0

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

01

vcf versions

gvcftools:

gvcftools is a package of small utilities for creating and analyzing gVCF files

Somatic VCF Feature Extraction tool from hap.py.

012340101

features versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py

0120101

preprocessed_vcf versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variants.

012340101010101

features metrics stats versions

sompy:

Haplotype VCF comparison tools: somatic variant comparison

PacBio structural variant calling tool

01201201

vcf csv versions

Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline

01

vcf versions

hmtnote:

Human mitochondrial variants annotation using HmtVar.

This tool takes a background VCF, such as gnomAD, that has full genome (though in some cases, users will instead want whole exome) coverage and uses that as an expectation of variants.

012012

tsv versions

htsnimtools:

Useful command-line tools written to showcase hts-nim

A Python application to generate self-contained HTML reports for variant review and other genomic applications

0123012

report versions

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Extract BED file from hts files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

01

bed versions

jvarkit:

Java utilities for Bioinformatics.

Convert VCF to a user friendly table

012301

output versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Filtering VCF with dynamically-compiled java expressions

01230101010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Annotate VCF files for poly repeats

01010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Lofreq subcommand to call low frequency variants from alignments

0120

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

It predicts variants using multiple processors

01230101

vcf tbi versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors.

Lofreq subcommand to remove variants with low coverage or strand bias potential

01

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0123450101

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

vcf versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Imputation of genotypes using a reference panel

0123456

vcf versions

minimac4:

Computationally efficient genotype imputation

mirtop export generates files such as FASTA, VCF, or files compatible with the isomiRs Bioconductor package

0101012

tsv fasta vcf versions

mirtop:

Small RNA-seq annotation

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

010101010000001

full_alignment info_txt snp_alignment snp_genome_alignment snpstatistics snptable snptable_snpeff snptable_uncertainty structure_genotypes structure_genotypes_nomissing json versions

Pre-filtering and calculating position-specific summary statistics using the Markov substitution model

0123401

txt versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants

01012

vcf tbi versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.

010101

corr_matrix matched all pdf vcf versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database

0123

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.

012301

vcf gvcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs

0101

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.

012301

vcf gvcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.

01

gvcf_index versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

0123450100

vcf stats versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Genotype structural variants using paragraph and grmpy

0123450101

vcf json versions

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

0101

graph versions

paragraph:

Graph realignment tools for structural variants

HiFi-based caller for highly homologous genes

0120101

json bam bai vcf vcf_index versions

pbsv/call - PacBio structural variant (SV) calling and analysis tools

0101

vcf versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

01234010101

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect multiple metrics from a BAM file

0120101

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics from a RNAseq BAM file

01000

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

01201010

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Checks that all data in the set of input files appear to come from the same individual

01234501

crosscheck_metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.

0120000

vcf tbi versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Converts a FASTQ file to an unaligned BAM or SAM file.

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

0120

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Verify mate-pair information between mates and fix if needed

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Lifts over a VCF file from one reference build to another.

01010101

vcf_lifted vcf_unlifted versions

picard:

Move annotations from one assembly to another

Locate and tag duplicate reads in a BAM file

010101

bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Merges multiple BAM files into a single file

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the name of the sample in the VCF file

01

vcf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases

010101

intervals versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts BAM/SAM files based on a variety of picard specific criteria

010

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts VCF files

010101

vcf versions

picard:

Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format

Automatically improve draft assemblies and find variation among strains, including large event detection

010120

improved_assembly vcf change_record tracks_bed tracks_wig versions

Platypus is a tool that efficiently and accurately calls genetic variants from next-generation DNA sequencing data

01234000

vcf tbi log version

Recodes plink bfiles into a new text fileset applying different modifiers

0123

ped map txt raw traw beagledat chrdat chrmap geno pheno pos phase info lgen list gen gengz sample rlist strctin tped tfam vcf vcfgz versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Analyses variant calling files using plink

01

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Import variant genetic data using plink2

01

pgen psam pvar pvar_zst versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Convert from VCF file to BGEN file version 1.2 format preserving dosages.

01234

bgen_file sample_file log_file versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.

0101012

vcf max_cov bad_sites versions

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.

012

result vcf lmix singlet_result singlet_vcf versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.

012300

rds png bias_rds bias_bed low_cov_bed versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Run PureCN workflow to normalize, segment and determine purity and ploidy

01200

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample_seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Call SNVs/indels from BAM files for all target genes.

0120100

vcf tbi versions

pypgx:

A Python package for pharmacogenomics research

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

012345678910111213141501

vcf tbi rdata plots versions

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

Markup VCF file using rho-calls.

012010

vcf versions

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

sdf versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Converts a PED file to VCF headers

01

output versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Plot ROC curves from vcfeval ROC data files, either to an image, or an interactive GUI. The interactive GUI isn't possible for nextflow.

01

png svg versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set

012345601

tp_vcf tp_tbi fn_vcf fn_tbi fp_vcf fp_tbi baseline_vcf baseline_tbi snp_roc non_snp_roc weighted_roc summary phasing versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier

0100

meis_tab dels_tab vcf versions

scramble:

Soft Clipped Read Alignment Mapper

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced in the previous step (VarCal) and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

012012010101

metrics summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Modifies the input VCF file by adding the MLrejected FILTER to the variants

012010101

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

01230101010101000

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

012301010101

vcf_gz vcf_gz_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

012340101010100

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

01234560101

vcf vcf_tbi stats versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

01230101010101010100

orientation_data contamination_data contamination_segments stats vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

012010101201201201

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.

012

merged_variants versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

structural-variant calling with sniffles

012010100

vcf tbi snf versions

Core-SNP alignment from Snippy outputs

0120

aln full_aln tab vcf txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Rapid haploid variant calling

010

tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Genetic variant annotation and functional effect prediction toolbox

012

cache versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Genetic variant annotation and functional effect prediction toolbox

01001

vcf report summary_html genes_txt versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Annotate a VCF file with another VCF file

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Splits/Joins VCF(s) file into chromosomes

01

out_vcfs versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

012010101

extract versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

0123456789100120

input rdata plots vcf bgen versions

Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.

0101

vcf versions

Tandem repeat genotyper for long reads

012010101

vcf tbi versions

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

0123400

vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

01234567800

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Converts a bedpe file to a VCF file (beta version)

01

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

0120000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi sample VCF files.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

01010100

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01000

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

0123450101

fns fps distances log report versions

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

The merge module merges structural variants within one or more vcf files.

0100

vcf tbi csi versions

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

01000000

vcf versions

svdb:

structural variant database software

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert an RdTest-formatted bed to the standard VCF format.

0120

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

0101

vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert a VCF file to a BEDPE file.

01

bedpe versions

svtools:

Tools for processing and analyzing structural variants

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

0123

vcf tbi versions

bgzip a sorted tab-delimited genome file and then create tabix index

01

gz_tbi gz_csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

create tabix index from a sorted bgzip tab-delimited genome file

01

tbi csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

A tool to detect resistance and lineages of M. tuberculosis genomes

01

bam csv json txt vcf versions

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Identify chromosomal rearrangements.

0120101

vcf ploidy versions

sv:

Search for structural variants.

Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA

010101010

fasta vcf ccf log versions

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

Tandem repeat genotyping from PacBio HiFi data

0123010101

vcf bam versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Merge TRGT VCFs from multiple samples

0120101

vcf versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

0123450101

fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary versions

truvari:

Structural variant comparison tool for VCFs

Over multiple vcfs, calculate their intersection/consistency.

01

consistency versions

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjointed genomic regions

01

vcf versions

truvari:

Structural variant comparison tool for VCFs

The Java port of the VarDict variant caller

01230101

vcf versions

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

01200

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

010101010

cgh versions

A tool to create a Gemini-compatible DB file from an annotated VCF

012

db versions

vcf2maf

0100

maf versions

quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files

0123000

vcf tbi versions

If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files

Command line tools for parsing and manipulating VCF files.

012

vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files

A set of tools written in Perl and C++ for working with VCF files

0100

vcf bcf frq frq_count idepth ldepth ldepth_mean gdepth hap_ld geno_ld geno_chisq list_hap_ld list_geno_ld interchrom_hap_ld interchrom_geno_ld tstv tstv_summary tstv_count tstv_qual filter_summary sites_pi windowed_pi weir_fst heterozygosity hwe tajima_d freq_burden lroh relatedness relatedness2 lqual missing_individual missing_site snp_density kept_sites removed_sites singeltons indel_hist hapcount mendel format info genotypes_matrix genotypes_matrix_individual genotypes_matrix_position impute_hap impute_hap_legend impute_hap_indv ldhat_sites ldhat_locs beagle_gl beagle_pl ped map_ tped tfam diff_sites_in_files diff_indv_in_files diff_sites diff_indv diff_discd_matrix diff_switch_error versions

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

0100

vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Decomposes multiallelic variants into biallelic variants in a VCF file.

012

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data

Decomposes biallelic block substitutions into its constituent SNPs.

0123

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data

Normalizes variants in a VCF file

01230101

vcf fai versions

vt:

A tool set for short variant discovery in genetic sequence data

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

01200

vcf tbi graph versions

A large variant benchmarking tool analogous to hap.py for small variants.

01234

report bench_vcf bench_vcf_tbi versions
