Available Modules

Modules are the building blocks of DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

A tool to parse and summarise results from antimicrobial peptide tools and present a functional classification.

sample_dir txt csv faa summary_csv summary_html log results_db results_db_dmnd results_db_fasta results_db_tsv versions

A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.

cluster_tsv rep_cluster_tsv log versions

ampcombi2/cluster:

A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.

A submodule that merges all output summary tables from ampcombi/parsetables into one summary file.

tsv log versions

ampcombi2/complete:

Merges the per-sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

sample_dir contig_gbks db_tsv tsv faa sample_log full_log db db_txt db_fasta db_mmseqs versions

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

Calculates base frequency statistics across reference positions from BAM.

depth_sample depth_global qs pos counts icounts versions

angsd:

ANGSD: Analysis of Next Generation Sequencing Data

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

extracted_reads_fastq log intermediate_sam intermediate_bam intermediate_sorted_bam versions

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Demultiplex Element Biosciences bases files

sample_fastq sample_json qc_report run_stats generated_run_manifest metrics unassigned versions

Converts certain output formats to VCF

012010

vcf_gz vcf bcf_gz bcf hap legend samples tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Split VCF by sample, creating single- or multi-sample VCFs.

0120000

vcf tbi csi versions

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

Reheader a VCF file

012301

vcf index versions

reheader:

Modify header of VCF/BCF files, change sample names.

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

00000

summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

010

reports txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

01

db bracken_files versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

01

txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single cell data.

01234

base cell sample allele_depth depth_coverage depth_other versions

cellsnp:

Efficient genotyping bi-allelic SNPs on single cells

Compile a coverage reference from the given files (normal samples).

000

cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Copy number and genotype annotation from whole genome and whole exome sequencing data

0123456000000000

bedgraph control_cpn sample_cpn gcprofile_cpn BAF CNV info ratio config versions

controlfreec/freec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Filters a matrix based on a minimum value and the number of samples that must pass.

0101

filtered tests session_info versions

matrixfilter:

filter a matrix based on a minimum value and numbers of samples
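The filtering rule described for this module can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the function name `filter_matrix`, its parameters `minimum_value` and `minimum_samples`, and the choice of an inclusive threshold are all assumptions made for the example.

```python
def filter_matrix(matrix, minimum_value, minimum_samples):
    """Keep rows with at least `minimum_samples` values >= `minimum_value`.

    `matrix` maps a feature name to its per-sample values. All names here
    are hypothetical and chosen for illustration only.
    """
    kept = {}
    for feature, values in matrix.items():
        # Count how many samples reach the threshold (assumed inclusive).
        passing = sum(1 for v in values if v >= minimum_value)
        if passing >= minimum_samples:
            kept[feature] = values
    return kept


counts = {
    "geneA": [10, 12, 0],
    "geneB": [0, 1, 0],
    "geneC": [5, 0, 7],
}
filtered = filter_matrix(counts, minimum_value=5, minimum_samples=2)
print(sorted(filtered))  # ['geneA', 'geneC']
```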

Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

0100

pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Performs rapid genome comparisons for a group of genomes and visualizes their relatedness

01

directory versions

drep:

De-replication of microbial genomes assembled from multiple samples

Convert any PEP project or Nextflow samplesheet to any format

000

versions samplesheet_converted

eido:

Convert any PEP project or Nextflow samplesheet to any format

Validate samplesheet or PEP config against a schema

000

versions log

validate:

Validate samplesheet or PEP config against a schema.

Merge STR profiles into a multi-sample STR profile

010101

merged_profiles versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.

01

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Demultiplex fastq files

0123

sample_fastq metrics most_frequent_unmatched versions

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities are determined by the fraction of the total sample reads found at each site. This is followed by a secondary resampling at each site according to a multinomial distribution (binomial when there is only one SNV at a site), where the event probabilities are determined by the frequencies of each base at the site and the number of trials is given by the sequencing depth.

012000

lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
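The two-stage resampling described for the bootstrap module above can be sketched with the Python standard library. This is an illustrative sketch only, not Freyja's code: the input layout (a list of per-site base counts), the helper names `multinomial` and `bootstrap_once`, and the use of the resampled depth as the number of trials in the second stage are assumptions.

```python
import random
from collections import Counter


def multinomial(rng, n_trials, weights):
    """Draw `n_trials` outcomes with the given weights; return counts per index."""
    if n_trials == 0:
        return [0] * len(weights)
    draws = rng.choices(range(len(weights)), weights=weights, k=n_trials)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(weights))]


def bootstrap_once(site_base_counts, rng):
    """Resample one bootstrap replicate.

    `site_base_counts` is a list of dicts mapping base -> observed read count
    at a site (a hypothetical input layout, for illustration only).
    """
    depths = [sum(site.values()) for site in site_base_counts]
    total = sum(depths)
    # Stage 1: redistribute the total read depth across sites, with event
    # probabilities proportional to each site's share of the reads.
    new_depths = multinomial(rng, total, depths)
    replicate = []
    for site, new_depth in zip(site_base_counts, new_depths):
        bases = list(site)
        # Stage 2: resample bases at the site, with probabilities given by
        # the observed base frequencies (binomial when two bases are present).
        counts = multinomial(rng, new_depth, [site[b] for b in bases])
        replicate.append(dict(zip(bases, counts)))
    return replicate


sites = [{"A": 90, "G": 10}, {"C": 50}, {"T": 30, "C": 20}]
replicate = bootstrap_once(sites, random.Random(0))
print(replicate)
```

Each replicate keeps the total number of reads fixed while letting both the per-site depth and the per-site base frequencies vary, which is what gives the demixing estimates their spread across bootstraps.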

specify the relative abundance of each known haplotype

01200

demix versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Downloads new versions of the curated SARS-CoV-2 lineage file and barcodes

0

barcodes lineages_topology lineages_meta versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Calls variants and reports sequencing depth information for each variant

010

variants versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

GangSTR is a tool for genome-wide profiling tandem repeats from short reads.

012300

vcf samplestats versions

Generate a multi-sample report file from the output of ganon report runs

01

txt versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

012000

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.

000

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

01

pon versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

0123010

calls model versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge GVCFs from multiple samples, for use in joint genotyping or somatic panel-of-normals creation.

012345000

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

012340101010101

vcf tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

01234

cohortcalls cohortmodel casecalls versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

012300000

vcf versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results, for example 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

012000000

recal idx tranches plots versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA

0

samples versions

Generates haplotype calls by sampling haplotype estimates

01

haplo_sampled versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Program to compute the genotyping error rate at the sample or marker level.

0123456780123456

errors_cal errors_grp errors_spl rsquare_grp rsquare_spl rsquare_per_site versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).

0120000

barcodes matrix features classification_report config_report summary_report versions

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

0100

hbd ibd log versions

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0123450101

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0101

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Build MetaPhlAn database for taxonomic profiling.

NO input

db versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn4

01

txt versions

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn3

01

txt versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Demultiplex MGI fastq files

012

fastq undetermined ambiguous undetermined_reports ambiguous_reports general_info_reports index_reports sample_stat_reports qc_reports versions

mgikit demultiplex:

Demultiplex MGI fastq files

mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.

0101012

tsv versions

mirtop:

Small RNA-seq annotation

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

msisensor2 detection of MSI regions.

01234500

msi distribution somatic versions

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

msisensor2 detection of MSI regions.

00

scan versions

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

Aggregate results from bioinformatics analyses across many samples into a single report

000000

report data plots versions

multiqc:

MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.

Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants

01012

vcf tbi versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

normalized_counts normalized_counts_wo_HK versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

nacho_qc_reports nacho_qc_png nacho_qc_txt versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

Determines the gender of a sample from the BAM/CRAM file.

01201010

tsv versions

ngsbits:

Short-read sequencing tools

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the name of the sample in the VCF file

01

vcf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Recodes plink bfiles into a new text fileset applying different modifiers

0123

ped map txt raw traw beagledat chrdat chrmap geno pheno pos phase info lgen list gen gengz sample rlist strctin tped tfam vcf vcfgz versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Filters plink bfiles or pfiles with filters such as maf or var

0123

bed bim fam pgen pvar psam versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Filters plink bfiles or pfiles with maf filters

01230

bed bim fam pgen pvar psam versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Remove samples from a plink2 dataset

01230

remove_bim remove_bed remove_fam remove_pgen remove_psam remove_pvar versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Apply a scoring system to each sample in a plink 2 fileset

01230

score versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Convert from VCF file to BGEN file version 1.2 format preserving dosages.

01234

bgen_file sample_file log_file versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.

0123

demuxlet_result versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.

012

result vcf lmix singlet_result singlet_vcf versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Calculate interval coverage for each sample. N.B. the tool cannot handle staging files with symlinks; stageInMode should be set to 'link'.

0120

txt png loess_qc_txt loess_txt versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Run PureCN workflow to normalize, segment and determine purity and ploidy

01200

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample_seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Demultiplexer for Nanopore samples

010

reads versions

Randomly subsample sequencing reads to a specified coverage

0120

reads versions

Module to validate Illumina® Sample Sheet v2 files.

010

samplesheet versions

Accelerated implementation of the GATK DepthOfCoverage tool.

01201010101

per_locus sample_summary statistics coverage_counts coverage_proportions interval_summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

012301010101

vcf_gz vcf_gz_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.

01

fastq versions

seqcluster:

Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the samples of the project.
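The idea behind the collapse step, replacing identical sequences with a single record plus a count, can be sketched in a few lines of Python. This is a conceptual sketch only: the function name `collapse_reads` is hypothetical, and the sketch ignores read names and qualities, which the real tool has to handle when writing FASTQ.

```python
from collections import Counter


def collapse_reads(sequences):
    """Collapse identical sequences into (sequence, count) pairs,
    ordered from most to least abundant."""
    return Counter(sequences).most_common()


reads = ["ACGT", "TTTT", "ACGT", "ACGT", "GGCA", "TTTT"]
collapsed = collapse_reads(reads)
print(collapsed)  # [('ACGT', 3), ('TTTT', 2), ('GGCA', 1)]
```

Downstream steps then process three unique sequences instead of six reads, which is where the reduction in computational complexity comes from.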

Subsample reads from FASTQ files

012

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.

Demultiplex bgzip'd fastq files

012

sample_fastq metrics most_frequent_unmatched per_project_metrics per_sample_metrics sample_barcode_hop_metrics versions

Validate consistency of feature and sample annotations with matrices and contrasts

0120101

sample_meta feature_meta assays contrasts versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Classifies and predicts the origin of metagenomic samples

010000

report versions

Serotype STEC samples from paired-end reads or assemblies

01

tsv versions

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

0123456789100120

input rdata plots vcf bgen versions

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

01234567800

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

SummarizedExperiment container

010101

rds log versions

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.
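A per-sample SVTYPE tally of the kind described for the count module above can be sketched in Python over plain VCF text. This is not svtk's implementation: the function name `count_svtypes` is hypothetical, the sketch assumes every record has a GT field in FORMAT, and it assumes a sample is counted when it carries at least one non-reference allele.

```python
from collections import Counter


def count_svtypes(vcf_lines):
    """Count SVTYPE occurrences per sample from tab-separated VCF lines."""
    samples = []
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        svtype = info.get("SVTYPE")
        if svtype is None:
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            # Assumption: count the sample if any allele is non-reference.
            if any(a not in ("0", ".") for a in alleles):
                counts[(sample, svtype)] += 1
    return counts


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=200\tGT\t0/1\t0/0",
    "1\t500\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=900\tGT\t1/1\t./.",
    "2\t50\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=80\tGT\t0/1\t0|1",
]
sv_counts = count_svtypes(vcf)
for (sample, svtype), n in sorted(sv_counts.items()):
    print(sample, svtype, n)
```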

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

Merge TRGT VCFs from multiple samples

0120101

vcf versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Subsample a long-read sequencing fastq file for multiple assemblies

01

subreads versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place

01

reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
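The extraction step described above, moving a UMI from the read sequence into the read name, can be illustrated with a small Python sketch. It is not the UMI-tools implementation: the function name `extract_umi`, the assumption that the UMI is a fixed-length prefix of the read, and the underscore separator are all choices made for this example.

```python
def extract_umi(name, sequence, quality, umi_length):
    """Move a fixed-length 5' UMI from the read into the read name.

    Returns the new (name, sequence, quality) triple. The UMI position,
    length and separator are assumptions made for illustration.
    """
    umi = sequence[:umi_length]
    # Trim the UMI bases and their qualities so both stay the same length.
    return f"{name}_{umi}", sequence[umi_length:], quality[umi_length:]


record = extract_umi("read1", "ACGTTTGACC", "IIIIIIIIII", umi_length=4)
print(record)  # ('read1_ACGT', 'TTGACC', 'IIIIII')
```

Keeping the UMI in the read name means it survives alignment, so later deduplication can group reads by mapping position and UMI.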

Filtering, downsampling and profiling alignments in BAM/CRAM formats

01

bam versions

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
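How AC (allele count per ALT allele) and NS (number of samples with data) follow from sample genotypes can be shown with a short Python sketch. The function name `ac_ns` is hypothetical, and treating a partially missing genotype as "with data" is an assumption; the sketch works on genotype strings only and does not read or write VCF.

```python
def ac_ns(genotypes, n_alt):
    """Compute AC (one count per ALT allele) and NS from GT strings.

    `genotypes` holds values such as "0/1", "1|1" or "./.";
    `n_alt` is the number of ALT alleles in the record.
    """
    ac = [0] * n_alt
    ns = 0
    for gt in genotypes:
        called = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if called:
            ns += 1  # sample has at least one called allele
        for allele in called:
            index = int(allele)
            if index > 0:  # 0 is the reference allele
                ac[index - 1] += 1
    return ac, ns


ac, ns = ac_ns(["0/1", "1/1", "./.", "0|2"], n_alt=2)
print(ac, ns)  # [3, 1] 3
```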

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.

Convert and filter aligned reads to .npz

0120101

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase

0101

gender versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Create a new reference using healthy reference samples

01

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Find copy number aberrations

010101

aberrations_bed bins_bed segments_bed chr_statistics chr_plots genome_plot versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

0100

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
