Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • gatk4 61
  • bam 14
  • vcf 12
  • sort 9
  • bed 8
  • cram 7
  • bqsr 7
  • base quality score recalibration 7
  • sam 5
  • merge 5
  • structural variants 4
  • filter 4
  • table 4
  • mutect2 4
  • gvcf 3
  • markduplicates 3
  • interval 3
  • interval_list 3
  • spark 3
  • gatk4spark 3
  • fasta 2
  • convert 2
  • copy number 2
  • picard 2
  • haplotype 2
  • panelofnormals 2
  • interval list 2
  • filtermutectcalls 2
  • gatk 2
  • fastq 1
  • index 1
  • alignment 1
  • annotation 1
  • variant calling 1
  • variants 1
  • split 1
  • somatic 1
  • conversion 1
  • reporting 1
  • metrics 1
  • annotate 1
  • genotype 1
  • feature 1
  • mitochondria 1
  • counts 1
  • mpileup 1
  • indels 1
  • haplotypecaller 1
  • chunk 1
  • add 1
  • replace 1
  • dictionary 1
  • intervals 1
  • read-group 1
  • baf 1
  • joint genotyping 1
  • normalize 1
  • norm 1
  • evidence 1
  • allele-specific 1
  • createreadcountpanelofnormals 1
  • denoisereadcounts 1
  • copyratios 1
  • elprep 1
  • elfasta 1
  • combinegvcfs 1
  • annotateintervals 1
  • variant quality score recalibration 1
  • vqsr 1
  • asereadcounter 1
  • bedtointervallist 1
  • calculatecontamination 1
  • cross-samplecontamination 1
  • getpileupsummaries 1
  • calibratedragstrmodel 1
  • cnnscorevariants 1
  • collectreadcounts 1
  • collectsvevidence 1
  • short variant discovery 1
  • filterintervals 1
  • jointgenotyping 1
  • genomicsdbimport 1
  • genomicsdb 1
  • gatherbqsrreports 1
  • tranche filtering 1
  • filtervarianttranches 1
  • estimatelibrarycomplexity 1
  • composestrtablefile 1
  • duplication metrics 1
  • determinegermlinecontigploidy 1
  • createsomaticpanelofnormals 1
  • createsequencedictionary 1
  • condensedepthevidence 1
  • dragstr 1
  • germline contig ploidy 1
  • panelofnormalscreation 1
  • germlinecnvcaller 1
  • germlinevariantsites 1
  • mutectstats 1
  • reblockgvcf 1
  • printsvevidence 1
  • printreads 1
  • preprocessintervals 1
  • postprocessgermlinecnvcalls 1
  • snvs 1
  • mergebamalignment 1
  • selectvariants 1
  • leftalignandtrimvariants 1
  • readorientationartifacts 1
  • learnreadorientationmodel 1
  • indexfeaturefile 1
  • readcountssummary 1
  • getpileupsumaries 1
  • revert 1
  • shiftchain 1
  • recalibration model 1
  • variantrecalibrator 1
  • variantfiltration 1
  • shiftfasta 1
  • svcluster 1
  • svannotate 1
  • splitintervals 1
  • splitcram 1
  • site depth 1
  • shiftintervals 1
  • split by chromosome 1

Convert a file in FASTA format to the ELFASTA format

elfasta log versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter, sort, and mark duplicates in SAM/BAM files, with optional BQSR and variant calling.

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split BAM/SAM chunks into one file

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Assigns all the reads in a file to a single new read-group

bam bai cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
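To make the read-group reassignment described above concrete, here is a minimal Python sketch that works on plain SAM text lines. It is an illustration only and not how GATK implements the step: the function name `assign_read_group`, the read-group ID, and the sample name are hypothetical, and only the `ID` and `SM` header fields are written.

```python
def assign_read_group(sam_lines, rg_id, sample):
    """Return SAM lines in which every read carries one new read group.

    Illustrative sketch only: the function name and arguments are
    hypothetical, and only the ID and SM header fields are emitted.
    """
    new_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    out = []
    header_written = False
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@RG"):
                continue  # drop any existing read-group header lines
            out.append(line)
            continue
        if not header_written:
            # Emit the new @RG line once, after the remaining header lines.
            out.append(new_header)
            header_written = True
        fields = line.split("\t")
        mandatory, optional = fields[:11], fields[11:]
        # Remove any old RG tag, then attach the new one.
        optional = [tag for tag in optional if not tag.startswith("RG:Z:")]
        optional.append(f"RG:Z:{rg_id}")
        out.append("\t".join(mandatory + optional))
    if not header_written:
        out.append(new_header)  # file contained no reads
    return out


sam = [
    "@HD\tVN:1.6",
    "@RG\tID:old\tSM:x",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:old\tNM:i:0",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
result = assign_read_group(sam, "rg1", "sampleA")
```

Every read ends up tagged with the same `RG:Z:` value, and the header keeps a single `@RG` line whose `ID` matches it.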

Annotates intervals with GC content, mappability, and segmental-duplication content

annotated_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
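The GC-content part of the interval annotation above can be sketched in a few lines of Python. This is a simplified illustration, not the GATK implementation. It assumes 1-based inclusive interval coordinates and assumes that ambiguous bases such as `N` are left out of the denominator. The names `gc_fraction` and `annotate_intervals` are hypothetical, and the mappability and segmental-duplication annotations are not covered.

```python
def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")  # nothing but ambiguous bases
    return (seq.count("G") + seq.count("C")) / called


def annotate_intervals(reference, intervals):
    """Attach a GC fraction to each (contig, start, end) interval.

    Assumption: coordinates are 1-based and inclusive.
    """
    rows = []
    for contig, start, end in intervals:
        seq = reference[contig][start - 1:end]
        rows.append((contig, start, end, gc_fraction(seq)))
    return rows


reference = {"chr1": "ACGTGGCCNNAT"}
intervals = [("chr1", 1, 4), ("chr1", 5, 8), ("chr1", 9, 12)]
annotated = annotate_intervals(reference, intervals)
```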

Apply base quality score recalibration (BQSR) to a bam file

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

csv versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the parameters for the DRAGstr model

dragstr_model versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

hdf5 tsv versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
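
The read-start counting rule described for the read-count collection module above can be illustrated with a small sketch. This is not GATK's implementation: the function name `count_read_starts`, the single-contig input, and the assumption of sorted, non-overlapping, 1-based inclusive intervals are all hypothetical simplifications, and read filtering is ignored.

```python
from bisect import bisect_right


def count_read_starts(intervals, read_starts):
    """Count how many read start positions fall inside each interval.

    intervals: list of (start, end) tuples on one contig, 1-based inclusive,
               assumed sorted and non-overlapping (an assumption of this sketch).
    read_starts: iterable of 1-based read start positions on the same contig.
    """
    starts = [start for start, _ in intervals]
    counts = [0] * len(intervals)
    for pos in read_starts:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        # Count the read only if its start lies within that interval.
        if i >= 0 and pos <= intervals[i][1]:
            counts[i] += 1
    return counts


intervals = [(1, 100), (101, 200), (301, 400)]
read_starts = [5, 100, 101, 250, 301, 399, 450]
counts = count_read_starts(intervals, read_starts)
print(counts)
```

A read whose start falls in a gap between intervals (250 here) or beyond the last interval (450) contributes to no count, so only the start position decides which interval a read is assigned to.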

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference; CalibrateDragstrModel later uses them to estimate the DRAGstr model during single-sample auto-calibration.

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges adjacent DepthEvidence records

condensed_evidence condensed_evidence_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

pon versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

dict versions

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals constraining germline and artifactual sites for use with mutect2.

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Denoises read counts to produce denoised copy ratios

standardized denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

calls model versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the numbers of unique molecules in a sequencing library.

metrics versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts FastQ file to SAM/BAM format

bam versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2. Can optionally use the outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers scattered BQSR recalibration reports into a single file

table versions

gatk4:

Genome Analysis Toolkit (GATK4)

write your description here

table versions

gatk4:

Genome Analysis Toolkit (GATK4)

Merge GVCFs from multiple samples, for use in joint genotyping or somatic panel-of-normals creation.

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

vcf tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

cohortcalls cohortmodel casecalls versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

vcf tbi bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an index for a feature file, e.g. VCF or BED file.

index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts a Picard IntervalList file to a BED file.

bed versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits the interval list file into unique, equally sized interval files and places them in a directory

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts

artifactprior versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges mutect2 stats generated on different intervals/regions

stats versions

gatk4:

Genome Analysis Toolkit (GATK4)

Merges several vcf files

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

vcf tbi stats f1r2 versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

intervals segments denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Prepares bins for coverage collection.

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

bam cram sam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

vcf versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

fastq versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a fasta with the bases shifted by offset

shift_fa shift_fai shift_back_chain dict intervals shift_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence

baf baf_tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits CRAM files efficiently by taking advantage of their container based structure

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Split intervals into sub-interval files.

split_intervals versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits reads that contain Ns in their cigar string

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

recal idx tranches plots versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
