Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • metagenomics 81
  • fastq 20
  • genomics 20
  • classify 20
  • taxonomic profiling 18
  • bam 17
  • database 17
  • classification 12
  • download 10
  • sort 9
  • taxonomy 9
  • coverage 8
  • binning 8
  • metagenome 8
  • db 8
  • fasta 7
  • alignment 7
  • contamination 7
  • quality 7
  • build 7
  • kmer 7
  • checkm 7
  • completeness 7
  • virus 7
  • mag 7
  • kraken2 7
  • contigs 6
  • phage 6
  • mapping 6
  • markduplicates 6
  • kmers 6
  • vsearch 6
  • index 5
  • cram 5
  • bacteria 5
  • statistics 5
  • visualisation 5
  • sequences 5
  • bins 5
  • complexity 5
  • microbiome 5
  • profiling 5
  • isolates 5
  • ptr 5
  • coptr 5
  • sourmash 5
  • taxonomic classification 5
  • gatk4 4
  • assembly 4
  • sam 4
  • merge 4
  • k-mer 4
  • ancient DNA 4
  • table 4
  • aDNA 4
  • palaeogenomics 4
  • archaeogenomics 4
  • diversity 4
  • ganon 4
  • umitools 4
  • malt 4
  • redundancy 4
  • dedup 3
  • population genetics 3
  • long reads 3
  • report 3
  • antimicrobial resistance genes 3
  • merging 3
  • de novo assembly 3
  • bin 3
  • containment 3
  • amplicon sequences 3
  • vrhyme 3
  • UMI 3
  • bracken 3
  • kraken 3
  • checkv 3
  • vcf 2
  • annotation 2
  • pacbio 2
  • clustering 2
  • isoseq 2
  • bcftools 2
  • antimicrobial resistance 2
  • imaging 2
  • depth 2
  • damage 2
  • peaks 2
  • ncbi 2
  • mags 2
  • plasmid 2
  • prediction 2
  • deduplication 2
  • detection 2
  • arg 2
  • deep learning 2
  • sketch 2
  • telomere 2
  • krona chart 2
  • microbes 2
  • abundance 2
  • deeparg 2
  • host 2
  • MaltExtract 2
  • HOPS 2
  • authentication 2
  • edit distance 2
  • barcode 2
  • krakenuniq 2
  • krakentools 2
  • metamaps 2
  • iphop 2
  • taxonomic profile 2
  • standardise 2
  • standardisation 2
  • otu tables 2
  • taxon tables 2
  • signature 2
  • FracMinHash sketch 2
  • profiles 2
  • ome-tif 2
  • MCMICRO 2
  • metagenomes 2
  • metagenomic 2
  • single cells 2
  • genome bins 2
  • genomad 2
  • genome 1
  • bed 1
  • filter 1
  • map 1
  • gtf 1
  • split 1
  • sentieon 1
  • count 1
  • VCF 1
  • copy number 1
  • imputation 1
  • trimming 1
  • reporting 1
  • rnaseq 1
  • indexing 1
  • QC 1
  • long-read 1
  • metrics 1
  • amr 1
  • cluster 1
  • plot 1
  • repeat 1
  • machine learning 1
  • iCLIP 1
  • example 1
  • umi 1
  • antimicrobial peptides 1
  • duplicates 1
  • fragment 1
  • amps 1
  • visualization 1
  • antibiotic resistance 1
  • extract 1
  • riboseq 1
  • fgbio 1
  • bedgraph 1
  • compare 1
  • profile 1
  • cat 1
  • DNA sequencing 1
  • targeted sequencing 1
  • hybrid capture sequencing 1
  • copy number alteration calling 1
  • normalization 1
  • add 1
  • retrotransposon 1
  • ccs 1
  • gatk4spark 1
  • krona 1
  • rsem 1
  • spark 1
  • html 1
  • arriba 1
  • hi-c 1
  • fusion 1
  • genome mining 1
  • atac-seq 1
  • chip-seq 1
  • long terminal repeat 1
  • interactive 1
  • primer 1
  • salmon 1
  • orf 1
  • instrain 1
  • scaffold 1
  • contig 1
  • trim 1
  • sequenzautils 1
  • varcal 1
  • registration 1
  • RNA-Seq 1
  • UMIs 1
  • identifier 1
  • dereplicate 1
  • extension 1
  • rna velocity 1
  • cobra 1
  • metagenome assembler 1
  • usearch 1
  • extractunbinned 1
  • linkbins 1
  • sintax 1
  • vsearch/sort 1
  • catpack 1
  • sylph 1
  • split_kmers 1
  • signatures 1
  • hash sketch 1
  • fracminhash sketch 1
  • standard 1
  • tag2tag 1
  • haplotag 1
  • Staging 1
  • staging 1
  • tags 1
  • multiqc 1
  • impute-info 1
  • drug categorization 1
  • confidence 1
  • cell_barcodes 1
  • tag 1
  • source tracking 1
  • fastqfilter 1
  • vsearch/fastqfilter 1
  • vsearch/dereplicate 1
  • post Post-processing 1
  • GTDB taxonomy 1
  • genome taxonomy database 1
  • archaea 1
  • qa 1
  • splitcram 1
  • quality assurnce 1
  • bgc 1
  • duplicate removal 1
  • chromap 1
  • population genomics 1
  • postprocessing 1
  • sorted 1
  • combining 1
  • antibiotic resistance genes 1
  • ARGs 1
  • consensus sequence 1
  • groupreads 1
  • nucleotide composition 1
  • concoct 1
  • variant quality score recalibration 1
  • vqsr 1
  • intervals coverage 1
  • pcr 1
  • peak-caller 1
  • cut&tag 1
  • cut&run 1
  • chromatin 1
  • seacr 1
  • assembly-binning 1
  • applyvarcal 1
  • VQSR 1
  • gc_wiggle 1
  • LCA 1
  • Ancestor 1
  • multimapper 1
  • duplicate marking 1
  • calmd 1
  • phylogenetic composition 1
  • megahit 1
  • denovo 1
  • debruijn 1
  • metaphlan 1
  • metagenome-assembled genomes 1
  • maxbin2 1
  • AMP 1
  • peptide prediction 1
  • illumina datasets 1
  • ChIP-Seq 1
  • phantom peaks 1
  • otu table 1
  • prepare 1
  • reference 0
  • structural variants 0
  • variant calling 0
  • align 0
  • gff 0
  • qc 0
  • variants 0
  • quality control 0
  • nanopore 0
  • cnv 0
  • gfa 0
  • variant 0
  • MSA 0
  • somatic 0
  • convert 0
  • conversion 0
  • single-cell 0
  • proteomics 0
  • bedtools 0
  • phylogeny 0
  • graph 0
  • gvcf 0
  • sv 0
  • variation graph 0
  • bisulfite 0
  • consensus 0
  • illumina 0
  • picard 0
  • databases 0
  • wgs 0
  • methylseq 0
  • bisulphite 0
  • methylation 0
  • bqsr 0
  • protein 0
  • cna 0
  • compression 0
  • 5mC 0
  • stats 0
  • serotype 0
  • tsv 0
  • demultiplex 0
  • scWGBS 0
  • WGBS 0
  • DNA methylation 0
  • haplotype 0
  • pairs 0
  • pangenome graph 0
  • base quality score recalibration 0
  • searching 0
  • protein sequence 0
  • histogram 0
  • openms 0
  • structure 0
  • neural network 0
  • matrix 0
  • expression 0
  • LAST 0
  • genotype 0
  • mmseqs2 0
  • bcf 0
  • mappability 0
  • filtering 0
  • annotate 0
  • validation 0
  • samtools 0
  • bwa 0
  • biscuit 0
  • aligner 0
  • bisulfite sequencing 0
  • low-coverage 0
  • cooler 0
  • transcript 0
  • transcriptome 0
  • decompression 0
  • gff3 0
  • segmentation 0
  • mkref 0
  • phasing 0
  • msa 0
  • glimpse 0
  • blast 0
  • bismark 0
  • hmmsearch 0
  • evaluation 0
  • gene 0
  • genotyping 0
  • spatial 0
  • newick 0
  • seqkit 0
  • ucsc 0
  • sequence 0
  • germline 0
  • pangenome 0
  • demultiplexing 0
  • scRNA-seq 0
  • splicing 0
  • differential 0
  • low frequency variant calling 0
  • mirna 0
  • bedGraph 0
  • hmmer 0
  • cnvkit 0
  • prokaryote 0
  • short-read 0
  • multiple sequence alignment 0
  • single 0
  • NCBI 0
  • gzip 0
  • snp 0
  • mitochondria 0
  • json 0
  • tumor-only 0
  • feature 0
  • gridss 0
  • MAF 0
  • text 0
  • 3-letter genome 0
  • single cell 0
  • summary 0
  • kallisto 0
  • de novo 0
  • call 0
  • clipping 0
  • wxs 0
  • mem 0
  • query 0
  • idXML 0
  • mutect2 0
  • view 0
  • counts 0
  • interval 0
  • indels 0
  • mpileup 0
  • deamination 0
  • adapters 0
  • benchmark 0
  • csv 0
  • svtk 0
  • tabular 0
  • cut 0
  • enrichment 0
  • genome assembler 0
  • bcl2fastq 0
  • snps 0
  • read depth 0
  • public datasets 0
  • CLIP 0
  • gsea 0
  • circrna 0
  • haplotypecaller 0
  • genmod 0
  • SV 0
  • ranking 0
  • compress 0
  • peak-calling 0
  • diamond 0
  • miscoding lesions 0
  • structural 0
  • palaeogenetics 0
  • phylogenetic placement 0
  • interval_list 0
  • archaeogenetics 0
  • hic 0
  • bigwig 0
  • STR 0
  • paf 0
  • chunk 0
  • ATAC-seq 0
  • FASTQ 0
  • concatenate 0
  • fastx 0
  • sample 0
  • sequencing 0
  • ont 0
  • resistance 0
  • union 0
  • ampir 0
  • xeniumranger 0
  • ancestry 0
  • pypgx 0
  • isomir 0
  • microarray 0
  • parsing 0
  • fungi 0
  • BGC 0
  • biosynthetic gene cluster 0
  • propr 0
  • logratio 0
  • family 0
  • bgzip 0
  • hmmcopy 0
  • DNA sequence 0
  • reference-free 0
  • microsatellite 0
  • reads 0
  • quantification 0
  • ngscheckmate 0
  • matching 0
  • HiFi 0
  • preprocessing 0
  • happy 0
  • reports 0
  • notebook 0
  • bedpe 0
  • mzml 0
  • somatic variants 0
  • ligate 0
  • mtDNA 0
  • windowmasker 0
  • pseudoalignment 0
  • npz 0
  • variant_calling 0
  • mapper 0
  • typing 0
  • entrez 0
  • guide tree 0
  • covid 0
  • organelle 0
  • transcriptomics 0
  • repeat expansion 0
  • fcs-gx 0
  • chimeras 0
  • PacBio 0
  • fingerprint 0
  • PCA 0
  • miRNA 0
  • ambient RNA removal 0
  • HMM 0
  • amplicon sequencing 0
  • rna_structure 0
  • RNA 0
  • genotype-based deconvoltion 0
  • cfDNA 0
  • popscle 0
  • dna 0
  • transposons 0
  • bacterial 0
  • untar 0
  • archiving 0
  • plink2 0
  • wastewater 0
  • transcripts 0
  • genome assembly 0
  • polishing 0
  • indel 0
  • mlst 0
  • prokka 0
  • dictionary 0
  • duplication 0
  • fam 0
  • bim 0
  • insert 0
  • score 0
  • replace 0
  • pairsam 0
  • structural_variants 0
  • pan-genome 0
  • lineage 0
  • SNP 0
  • benchmarking 0
  • unzip 0
  • survivor 0
  • uncompress 0
  • fastk 0
  • pangolin 0
  • long_read 0
  • panel 0
  • minimap2 0
  • uLTRA 0
  • tabix 0
  • spaceranger 0
  • subsample 0
  • informative sites 0
  • kinship 0
  • identity 0
  • relatedness 0
  • lossless 0
  • small indels 0
  • observations 0
  • shapeit 0
  • scores 0
  • zip 0
  • wig 0
  • rna 0
  • png 0
  • adapter trimming 0
  • angsd 0
  • ataqv 0
  • aln 0
  • bwameth 0
  • CRISPR 0
  • pileup 0
  • DRAMP 0
  • bamtools 0
  • nucleotide 0
  • quality trimming 0
  • amplify 0
  • comparisons 0
  • fai 0
  • intervals 0
  • converter 0
  • virulence 0
  • chromosome 0
  • roh 0
  • prokaryotes 0
  • eukaryotes 0
  • combine 0
  • complement 0
  • cut up 0
  • cool 0
  • RNA-seq 0
  • remove 0
  • macrel 0
  • dump 0
  • image 0
  • mcmicro 0
  • prefetch 0
  • highly_multiplexed_imaging 0
  • image_analysis 0
  • cellranger 0
  • bakta 0
  • genomes 0
  • C to T 0
  • neubi 0
  • gene expression 0
  • das tool 0
  • mkfastq 0
  • das_tool 0
  • clean 0
  • phase 0
  • retrotransposons 0
  • pair 0
  • variation 0
  • pharokka 0
  • differential expression 0
  • function 0
  • orthology 0
  • parallelized 0
  • checksum 0
  • tree 0
  • transcriptomic 0
  • mudskipper 0
  • minhash 0
  • mash 0
  • concordance 0
  • vdj 0
  • xz 0
  • archive 0
  • COBS 0
  • k-mer index 0
  • bloom filter 0
  • lofreq 0
  • gene set analysis 0
  • serogroup 0
  • awk 0
  • hlala_typing 0
  • hidden Markov model 0
  • Read depth 0
  • mask 0
  • leviosam2 0
  • lift 0
  • mapcounter 0
  • hla_typing 0
  • ichorcna 0
  • taxon name 0
  • hlala 0
  • hla 0
  • genetics 0
  • functional analysis 0
  • taxids 0
  • regression 0
  • interactions 0
  • zlib 0
  • proteome 0
  • long terminal retrotransposon 0
  • polyA_tail 0
  • kma 0
  • screen 0
  • khmer 0
  • bustools 0
  • BAM 0
  • blastn 0
  • gene set 0
  • immunoprofiling 0
  • refine 0
  • maximum likelihood 0
  • gstama 0
  • tama 0
  • trancriptome 0
  • windows 0
  • spatial_omics 0
  • megan 0
  • Duplication purging 0
  • small genome 0
  • de novo assembler 0
  • junctions 0
  • svdb 0
  • runs_of_homozygosity 0
  • polish 0
  • standardization 0
  • duplicate 0
  • graft 0
  • purge duplications 0
  • library 0
  • preseq 0
  • adapter 0
  • demultiplexed reads 0
  • rtgtools 0
  • import 0
  • effect prediction 0
  • ancient dna 0
  • switch 0
  • transformation 0
  • rename 0
  • shigella 0
  • seqtk 0
  • salmonella 0
  • fusions 0
  • soft-clipped clusters 0
  • scaffolding 0
  • snpeff 0
  • xenograft 0
  • snpsift 0
  • cancer genomics 0
  • fixmate 0
  • join 0
  • dict 0
  • collate 0
  • bam2fq 0
  • aggregate 0
  • artic 0
  • intersection 0
  • cnvnator 0
  • NRPS 0
  • msisensor-pro 0
  • micro-satellite-scan 0
  • tumor 0
  • proportionality 0
  • msi 0
  • instability 0
  • MSI 0
  • homoploymer 0
  • ampgram 0
  • nucleotides 0
  • removal 0
  • multiallelic 0
  • small variants 0
  • rgfa 0
  • spatial_transcriptomics 0
  • resolve_bioscience 0
  • tnhaplotyper2 0
  • secondary metabolites 0
  • reformatting 0
  • GC content 0
  • mitochondrion 0
  • simulate 0
  • ped 0
  • variant pruning 0
  • bfiles 0
  • distance 0
  • vcflib 0
  • vg 0
  • concat 0
  • read-group 0
  • tbi 0
  • intersect 0
  • nextclade 0
  • GPU-accelerated 0
  • normalize 0
  • norm 0
  • scatter 0
  • reheader 0
  • antismash 0
  • SimpleAF 0
  • antibiotics 0
  • graph layout 0
  • RiPP 0
  • image_processing 0
  • comparison 0
  • Streptococcus pneumoniae 0
  • amptransformer 0
  • sequence analysis 0
  • Pharmacogenetics 0
  • microbial 0
  • frame-shift correction 0
  • deconvolution 0
  • bayesian 0
  • long-read sequencing 0
  • CNV 0
  • cvnkit 0
  • pharmacogenetics 0
  • calling 0
  • merge mate pairs 0
  • reads merging 0
  • short reads 0
  • doublets 0
  • gwas 0
  • realignment 0
  • unaligned 0
  • gatk 0
  • joint genotyping 0
  • eCLIP 0
  • repeats 0
  • recombination 0
  • panelofnormals 0
  • evidence 0
  • estimation 0
  • mirdeep2 0
  • RNA sequencing 0
  • smrnaseq 0
  • filtermutectcalls 0
  • interval list 0
  • allele-specific 0
  • anndata 0
  • parse 0
  • fasterq-dump 0
  • sra-tools 0
  • eigenstrat 0
  • validate 0
  • samplesheet 0
  • format 0
  • eido 0
  • blastp 0
  • mRNA 0
  • deseq2 0
  • rna-seq 0
  • structural-variant calling 0
  • heatmap 0
  • random forest 0
  • regions 0
  • settings 0
  • nanostring 0
  • duplex 0
  • repeat_expansions 0
  • fetch 0
  • GEO 0
  • gene labels 0
  • expansionhunterdenovo 0
  • metadata 0
  • nacho 0
  • screening 0
  • cleaning 0
  • tab 0
  • trgt 0
  • correction 0
  • emboss 0
  • corrupted 0
  • cnv calling 0
  • baf 0
  • ChIP-seq 0
  • gem 0
  • allele 0
  • sage 0
  • vcflib/vcffixup 0
  • umicollapse 0
  • trimfq 0
  • nanopore sequencing 0
  • scRNA-Seq 0
  • morphology 0
  • resegment 0
  • AC/NS/AF 0
  • files 0
  • relabel 0
  • hostile 0
  • Pacbio 0
  • adapterremoval 0
  • cell segmentation 0
  • upd 0
  • uniparental 0
  • disomy 0
  • snv 0
  • downsample 0
  • downsample bam 0
  • antimicrobial reistance 0
  • mkarv 0
  • decontamination 0
  • atlas 0
  • contiguate 0
  • scanpy 0
  • Mycobacterium tuberculosis 0
  • chromosomal rearrangements 0
  • eucaryotes 0
  • coding 0
  • cds 0
  • transcroder 0
  • sequencing adapters 0
  • bedgraphtobigwig 0
  • bigbed 0
  • bedtobigbed 0
  • genepred 0
  • refflat 0
  • gtftogenepred 0
  • ucsc/liftover 0
  • vcf2db 0
  • human removal 0
  • subsample bam 0
  • lua 0
  • gemini 0
  • logFC 0
  • pangenome-scale 0
  • all versus all 0
  • mashmap 0
  • wavefront 0
  • whamg 0
  • wham 0
  • bwameme 0
  • HLA 0
  • copy-number 0
  • grabix 0
  • bwamem2 0
  • copy number analysis 0
  • subsetting 0
  • pile up 0
  • barcodes 0
  • doublet_detection 0
  • gender determination 0
  • ribosomal 0
  • copy number alterations 0
  • copy number variation 0
  • yahs 0
  • long read alignment 0
  • significance statistic 0
  • maf 0
  • construct 0
  • cellsnp 0
  • toml 0
  • nuclear segmentation 0
  • vcfbreakmulti 0
  • uniq 0
  • deduplicate 0
  • VCFtools 0
  • verifybamid 0
  • DNA contamination estimation 0
  • import segmentation 0
  • graph projection to vcf 0
  • solo 0
  • scvi 0
  • guidetree 0
  • http(s) 0
  • utility 0
  • p-value 0
  • grea 0
  • regtools 0
  • plotting 0
  • paired reads re-pairing 0
  • patterns 0
  • SMN1 0
  • SMN2 0
  • POA 0
  • sniffles 0
  • core 0
  • snippy 0
  • dist 0
  • hashing-based deconvoltion 0
  • regex 0
  • fix 0
  • autofluorescence 0
  • malformed 0
  • InterProScan 0
  • partitioning 0
  • chip 0
  • dbnsfp 0
  • predictions 0
  • updatedata 0
  • run 0
  • pdb 0
  • SNPs 0
  • CRAM 0
  • gnu 0
  • busco 0
  • sha256 0
  • relative coverage 0
  • lifestyle 0
  • MMseqs2 0
  • rare variants 0
  • error 0
  • transposable element 0
  • de-novo 0
  • Computational Immunology 0
  • longread 0
  • generic 0
  • sliding window 0
  • 256 bit 0
  • Bioinformatics Tools 0
  • Immune Deconvolution 0
  • shinyngs 0
  • doublet 0
  • exploratory 0
  • boxplot 0
  • density 0
  • features 0
  • coreutils 0
  • hamming-distance 0
  • invariant 0
  • fast5 0
  • recovery 0
  • ATLAS 0
  • detecting svs 0
  • short-read sequencing 0
  • lexogen 0
  • mgi 0
  • sequencing_bias 0
  • svtk/baftest 0
  • baftest 0
  • countsvtypes 0
  • genotype-based demultiplexing 0
  • variantcalling 0
  • donor deconvolution 0
  • rdtest2vcf 0
  • rdtest 0
  • vcf2bed 0
  • leafcutter 0
  • decompress 0
  • post mortem damage 0
  • polya tail 0
  • mapad 0
  • bias 0
  • sccmec 0
  • constant 0
  • overlap-based merging 0
  • cycif 0
  • background 0
  • single-stranded 0
  • ancientDNA 0
  • rRNA 0
  • ribosomal RNA 0
  • check 0
  • authentict 0
  • block substitutions 0
  • decomposeblocksub 0
  • streptococcus 0
  • identity-by-descent 0
  • read group 0
  • paired reads merging 0
  • translation 0
  • spatype 0
  • functional enrichment 0
  • spa 0
  • droplet based single cells 0
  • geo 0
  • c to t 0
  • adna 0
  • script 0
  • clahe 0
  • install 0
  • joint-genotyping 0
  • genotypegvcf 0
  • realign 0
  • model 0
  • svg 0
  • xml 0
  • circular 0
  • spot 0
  • introns 0
  • java 0
  • rank 0
  • AMPs 0
  • parallel 0
  • hashing-based deconvolution 0
  • Staphylococcus aureus 0
  • plastid 0
  • resfinder 0
  • resistance genes 0
  • quality check 0
  • raw 0
  • association 0
  • bam2fastx 0
  • bam2fastq 0
  • immcantation 0
  • airrseq 0
  • immunoinformatics 0
  • co-orthology 0
  • homology 0
  • sequence similarity 0
  • spectral clustering 0
  • comparative genomics 0
  • microRNA 0
  • size 0
  • deep variant 0
  • mutect 0
  • idx 0
  • affy 0
  • refresh 0
  • cram-size 0
  • transform 0
  • gaps 0
  • genetic sex 0
  • reference-independent 0
  • homologs 0
  • antimicrobial peptide prediction 0
  • nanoq 0
  • multi-tool 0
  • predict 0
  • amp 0
  • redundant 0
  • hardy-weinberg 0
  • hwe statistics 0
  • hwe equilibrium 0
  • genotype likelihood 0
  • Read filters 0
  • collapse 0
  • WGS 0
  • cgMLST 0
  • liftover 0
  • probabilistic realignment 0
  • extraction 0
  • seqfu 0
  • n50 0
  • cell_type_identification 0
  • featuretable 0
  • nucleotide sequence 0
  • Read trimming 0
  • python 0
  • parquet 0
  • functional 0
  • orthogroup 0
  • parser 0
  • dbsnp 0
  • standardize 0
  • orthologs 0
  • quarto 0
  • Illumina 0
  • uniques 0
  • r 0
  • distance-based 0
  • coexpression 0
  • correlation 0
  • corpcor 0
  • assay 0
  • phylogenetics 0
  • machine_learning 0
  • Read report 0
  • cell_phenotyping 0
  • minimum_evolution 0
  • structural variant 0
  • GWAS 0
  • mass spectrometry 0
  • Escherichia coli 0
  • mygene 0
  • elprep 0
  • chloroplast 0
  • blat 0
  • alr 0
  • clr 0
  • elfasta 0
  • boxcox 0
  • retrieval 0
  • nucleotide content 0
  • tnscope 0
  • AT content 0
  • nucBed 0
  • bclconvert 0
  • prior knowledge 0
  • propd 0
  • transcription factors 0
  • Read coverage histogram 0
  • biological activity 0
  • bgen 0
  • groupby 0
  • targz 0
  • workflow_mode 0
  • admixture 0
  • proteus 0
  • readproteingroups 0
  • reference panels 0
  • eigenvectors 0
  • hicPCA 0
  • sliding 0
  • quality_control 0
  • snakemake 0
  • workflow 0
  • 10x 0
  • controlstatistics 0
  • regulatory network 0
  • createreadcountpanelofnormals 0
  • copyratios 0
  • denoisereadcounts 0
  • readwriter 0
  • dnamodelapply 0
  • dnascope 0
  • go 0
  • emoji 0
  • omics 0
  • tarball 0
  • mass_error 0
  • array_cgh 0
  • ancestral alleles 0
  • derived alleles 0
  • tnfilter 0
  • nuclear contamination estimate 0
  • paraphase 0
  • telseq 0
  • selector 0
  • stardist 0
  • ATACseq 0
  • cytosure 0
  • vector 0
  • gprofiler2 0
  • gost 0
  • variant-calling 0
  • case/control 0
  • poolseq 0
  • search engine 0
  • rad 0
  • site frequency spectrum 0
  • allele counts 0
  • tar 0
  • Bayesian 0
  • reverse complement 0
  • simulation 0
  • hmmfetch 0
  • decompose 0
  • translate 0
  • structural-variants 0
  • transmembrane 0
  • jvarkit 0
  • genome graph 0
  • setgt 0
  • tnseq 0
  • shift 0
  • scimap 0
  • spatial_neighborhoods 0
  • decoy 0
  • associations 0
  • htseq 0
  • rrna 0
  • installation 0
  • sompy 0
  • doCounts 0
  • ATACshift 0
  • peak picking 0
  • mgf 0
  • amino acid 0
  • sex determination 0
  • genomes on a tree 0
  • genome manipulation 0
  • genome statistics 0
  • crispr 0
  • gget 0
  • low coverage 0
  • antibody capture 0
  • antigen capture 0
  • Sample 0
  • Haplotypes 0
  • Imputation 0
  • joint-variant-calling 0
  • GNU 0
  • merge compare 0
  • multiomics 0
  • gfastats 0
  • tama_collapse.py 0
  • mkvdjref 0
  • gene model 0
  • TAMA 0
  • gstama/merge 0
  • gstama/polyacleanup 0
  • cellpose 0
  • gunc 0
  • gunzip 0
  • gvcftools 0
  • extract_variants 0
  • genome summary 0
  • hifi 0
  • variantrecalibrator 0
  • reblockgvcf 0
  • revert 0
  • selectvariants 0
  • shiftchain 0
  • shiftfasta 0
  • shiftintervals 0
  • site depth 0
  • splitintervals 0
  • svannotate 0
  • svcluster 0
  • variantfiltration 0
  • recalibration model 0
  • gawk 0
  • txt 0
  • file parsing 0
  • chromosome_visualization 0
  • genome profile 0
  • compound 0
  • models 0
  • genome size 0
  • genome heterozygosity 0
  • repeat content 0
  • Salmonella Typhi 0
  • Mykrobe 0
  • extractvariants 0
  • abricate 0
  • printreads 0
  • Salmonella enterica 0
  • interproscan 0
  • genomic islands 0
  • insertion 0
  • jasminesv 0
  • jasmine 0
  • Python 0
  • Jupyter 0
  • jupytext 0
  • papermill 0
  • tblastn 0
  • subtyping 0
  • kallisto/index 0
  • pixel_classification 0
  • quant 0
  • digital normalization 0
  • k-mer counting 0
  • effective genome size 0
  • Klebsiella 0
  • pneumoniae 0
  • file manipulation 0
  • kegg 0
  • kofamscan 0
  • bioawk 0
  • unionBedGraphs 0
  • subtract 0
  • probability_maps 0
  • pixel classification 0
  • amrfinderplus 0
  • gccounter 0
  • fARGene 0
  • rgi 0
  • ibd 0
  • hbd 0
  • beagle 0
  • mitochondrial 0
  • haplogroups 0
  • Assembly 0
  • Haemophilus influenzae 0
  • haplotype resolution 0
  • domains 0
  • compartments 0
  • topology 0
  • calder2 0
  • readcounter 0
  • multicut 0
  • cadd 0
  • reformat 0
  • HMMER 0
  • Hidden Markov Model 0
  • hmtnote 0
  • annotations 0
  • pos 0
  • haemophilus 0
  • panel_of_normals 0
  • IDR 0
  • igv 0
  • igv.js 0
  • js 0
  • genome browser 0
  • printsvevidence 0
  • preprocessintervals 0
  • spliced 0
  • SRA 0
  • TMA dearray 0
  • UNet 0
  • mcool 0
  • genomic bins 0
  • makebins 0
  • str 0
  • faqcs 0
  • ANI 0
  • enzyme 0
  • digest 0
  • cload 0
  • ENA 0
  • Cores 0
  • public 0
  • cooler/balance 0
  • duplexumi 0
  • subcontigs 0
  • unmapped 0
  • ubam 0
  • zipperbams 0
  • single molecule 0
  • generate 0
  • random 0
  • Segmentation 0
  • cache 0
  • lint 0
  • PEP 0
  • corrrelation 0
  • scatterplot 0
  • cumulative coverage 0
  • paired-end 0
  • pcr duplicates 0
  • cutesv 0
  • blastx 0
  • gct 0
  • segment 0
  • cls 0
  • duphold 0
  • structural variation 0
  • depth information 0
  • escherichia coli 0
  • schema 0
  • percent on target 0
  • pep 0
  • eigenstratdatabasetools 0
  • eklipse 0
  • na 0
  • version 0
  • circos 0
  • deletion 0
  • custom 0
  • split by chromosome 0
  • embl 0
  • genbank 0
  • swissprot 0
  • Streptococcus pyogenes 0
  • endogenous DNA 0
  • partition histograms 0
  • fq 0
  • postprocessgermlinecnvcalls 0
  • genomicsdb 0
  • dragstr 0
  • condensedepthevidence 0
  • createsequencedictionary 0
  • polymut 0
  • createsomaticpanelofnormals 0
  • determinegermlinecontigploidy 0
  • duplication metrics 0
  • estimatelibrarycomplexity 0
  • filterintervals 0
  • splice 0
  • filtervarianttranches 0
  • tranche filtering 0
  • gatherbqsrreports 0
  • genomicsdbimport 0
  • short variant discovery 0
  • jointgenotyping 0
  • panelofnormalscreation 0
  • germline contig ploidy 0
  • germlinecnvcaller 0
  • germlinevariantsites 0
  • getpileupsumaries 0
  • readcountssummary 0
  • indexfeaturefile 0
  • learnreadorientationmodel 0
  • readorientationartifacts 0
  • leftalignandtrimvariants 0
  • mergebamalignment 0
  • mutectstats 0
  • snvs 0
  • composestrtablefile 0
  • combinegvcfs 0
  • rust 0
  • targets 0
  • variant caller 0
  • target 0
  • somatic variant calling 0
  • germline variant calling 0
  • bacterial variant calling 0
  • export 0
  • bootstrapping 0
  • UShER 0
  • gamma 0
  • gene-calling 0
  • gangstr 0
  • antitarget 0
  • access 0
  • heattree 0
  • annotateintervals 0
  • polymorphic 0
  • cmseq 0
  • protein coding genes 0
  • polymorphic sites 0
  • asereadcounter 0
  • bedtointervallist 0
  • calculatecontamination 0
  • cross-samplecontamination 0
  • getpileupsummaries 0
  • calibratedragstrmodel 0
  • cnnscorevariants 0
  • collectreadcounts 0
  • collectsvevidence 0
  • reorder 0
  • train 0
  • induce 0
  • genomic intervals 0
  • microscopy 0
  • background_correction 0
  • contact 0
  • pretext 0
  • jpg 0
  • bmp 0
  • contact maps 0
  • gene finding 0
  • illumiation_correction 0
  • element 0
  • trimBam 0
  • bamUtil 0
  • normal database 0
  • pmdtools 0
  • panel of normals 0
  • cutoff 0
  • haplotype purging 0
  • duplicate purging 0
  • false duplications 0
  • assembly curation 0
  • Haplotype purging 0
  • False duplications 0
  • Assembly curation 0
  • track 0
  • purging 0
  • bamtools/split 0
  • yaml 0
  • quast 0
  • porechop_abi 0
  • variant genetic 0
  • subsampling 0
  • csRNA-seq 0
  • mate-pair 0
  • liftovervcf 0
  • picard/renamesampleinvcf 0
  • sortvcf 0
  • deletions 0
  • insertions 0
  • tandem duplications 0
  • CoPRO 0
  • GRO-cap 0
  • PRO-cap 0
  • CAGE 0
  • NETCAGE 0
  • RAMPAGE 0
  • STRIPE-seq 0
  • scoring 0
  • PRO-seq 0
  • GRO-seq 0
  • genetic 0
  • deduping 0
  • smaller fastqs 0
  • clumping fastqs 0
  • exclude 0
  • variant identifiers 0
  • subset 0
  • indep 0
  • indep pairwise 0
  • recode 0
  • whole genome association 0
  • identifiers 0
  • neighbour-joining 0
  • long uncorrected reads 0
  • csi 0
  • virulent 0
  • scramble 0
  • cluster analysis 0
  • clusteridentifier 0
  • bacphlip 0
  • variant recalibration 0
  • subseq 0
  • read pairs 0
  • grep 0
  • sequence headers 0
  • sertotype 0
  • interleave 0
  • temperate 0
  • header 0
  • seq 0
  • selection 0
  • random draw 0
  • pseudohaploid 0
  • pseudodiploid 0
  • freqsum 0
  • bam2seqz 0
  • readgroup 0
  • paired 0
  • rhocall 0
  • pedfilter 0
  • R 0
  • bamstat 0
  • bamtools/convert 0
  • strandedness 0
  • experiment 0
  • read_pairs 0
  • fragment_size 0
  • inner_distance 0
  • read distribution 0
  • sequence-based 0
  • mapping-based 0
  • mouse 0
  • integrity 0
  • rtg 0
  • rocplot 0
  • repair 0
  • rtg-tools 0
  • salsa 0
  • salsa2 0
  • flagstat 0
  • sambamba 0
  • amplicon 0
  • ampliconclip 0
  • faidx 0
  • insert size 0
  • hybrid-selection 0
  • adapter removal 0
  • contour map 0
  • chunking 0
  • mass-spectroscopy 0
  • mcr-1 0
  • MD5 0
  • 128 bit 0
  • daa 0
  • rma6 0
  • Neisseria meningitidis 0
  • k-mer frequency 0
  • 3D heat map 0
  • Merqury 0
  • maskfasta 0
  • jaccard 0
  • assembly evaluation 0
  • smudgeplot 0
  • ploidy 0
  • unionsum 0
  • methylation bias 0
  • mbias 0
  • assembler 0
  • de Bruijn 0
  • overlap 0
  • microrna 0
  • getfasta 0
  • target prediction 0
  • reference genome 0
  • sgRNA 0
  • collapsing 0
  • legionella 0
  • clinical 0
  • pneumophila 0
  • limma 0
  • Listeria monocytogenes 0
  • slopBed 0
  • lofreq/call 0
  • lofreq/filter 0
  • qualities 0
  • bases 0
  • functional genomics 0
  • CRISPR-Cas9 0
  • representations 0
  • maximum-likelihood 0
  • rra 0
  • sizes 0
  • region 0
  • shiftBed 0
  • DNA damage 0
  • NGS 0
  • damage patterns 0
  • multinterval 0
  • estimate 0
  • overlapped bed 0
  • taxonomic assignment 0
  • mash/sketch 0
  • reduced 0
  • mitochondrial genome 0
  • genomecov 0
  • select 0
  • tumor/normal 0
  • hla-typing 0
  • ILP 0
  • HLA-I 0
  • block-compressed 0
  • update header 0
  • PCR/optical duplicates 0
  • flip 0
  • upper-triangular matrix 0
  • ligation junctions 0
  • pairtools 0
  • pairstools 0
  • restriction fragments 0
  • BCF 0
  • graph formats 0
  • paragraph 0
  • graphs 0
  • pbbam 0
  • pbmerge 0
  • subreads 0
  • pbp 0
  • pair-end 0
  • read 0
  • pedigrees 0
  • motif 0
  • prophage 0
  • identification 0
  • graph viz 0
  • graph unchopping 0
  • closest 0
  • contaminant 0
  • bamtobed 0
  • mosdepth 0
  • sorting 0
  • microsatellite instability 0
  • scan 0
  • mtnucratio 0
  • ratio 0
  • autozygosity 0
  • mitochondrial to nuclear ratio 0
  • bioinformatics tools 0
  • Beautiful stand-alone HTML report 0
  • GATK UnifiedGenotyper 0
  • SNP table 0
  • cancer genome 0
  • graph stats 0
  • somatic structural variations 0
  • mobile element insertions 0
  • sequencing summary 0
  • NextGenMap 0
  • ngm 0
  • Neisseria gonorrhoeae 0
  • gender 0
  • homozygosity 0
  • biallelic 0
  • graph construction 0
  • graph drawing 0
  • squeeze 0
  • odgi 0
  • combine graphs 0

Post-processing script of the MaltExtract component of the HOPS package

json summary_pdf tsv candidate_pdfs versions

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

tsv versions

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
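To make the IMPUTE2 INFO metric mentioned above more concrete, here is a minimal Python sketch of the commonly cited definition, computed for one biallelic site from per-sample FORMAT/GP triples. The function name `impute2_info` is hypothetical, and how the plugin itself treats missing genotypes or monomorphic sites is an assumption here, not something this page states.

```python
from typing import Sequence, Tuple


def impute2_info(gps: Sequence[Tuple[float, float, float]]) -> float:
    """IMPUTE2-style INFO score for one biallelic site.

    Each element of `gps` is one sample's genotype probabilities
    (P(0/0), P(0/1), P(1/1)), as stored in FORMAT/GP.
    """
    n = len(gps)
    if n == 0:
        raise ValueError("need at least one sample")

    # Expected alternate-allele dosage e_i and expected squared dosage f_i.
    dosages = [p1 + 2.0 * p2 for _, p1, p2 in gps]
    sq_dosages = [p1 + 4.0 * p2 for _, p1, p2 in gps]

    # Estimated alternate-allele frequency across all samples.
    theta = sum(dosages) / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        # Assumption: a monomorphic site is reported as fully informative.
        return 1.0

    # Summed per-sample genotype uncertainty, relative to the variance
    # expected from the allele frequency alone.
    uncertainty = sum(f - e * e for e, f in zip(dosages, sq_dosages))
    return 1.0 - uncertainty / (2.0 * n * theta * (1.0 - theta))
```

Under this definition, hard-called genotypes give a score of 1, while probabilities that merely restate the allele frequency give a score of 0.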

Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.

vcf tbi csi versions

view:

Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
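As an illustration of what converting between such tags involves, the following Python sketch turns one sample's GL values (log10 genotype likelihoods) into PL (Phred-scaled, shifted so the best genotype is 0) and into GP-like probabilities. The function names are hypothetical, and the flat prior used for GP is an assumption; the module's own rounding and prior handling may differ.

```python
from typing import List, Sequence


def gl_to_pl(gl: Sequence[float]) -> List[int]:
    """Convert log10 likelihoods to Phred-scaled likelihoods (best genotype = 0)."""
    raw = [-10.0 * g for g in gl]
    best = min(raw)
    return [int(round(r - best)) for r in raw]


def gl_to_gp(gl: Sequence[float]) -> List[float]:
    """Convert log10 likelihoods to probabilities, assuming a flat prior."""
    best = max(gl)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [10.0 ** (g - best) for g in gl]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `gl_to_pl([-0.1, -1.0, -5.0])` yields `[0, 9, 49]`, and the values returned by `gl_to_gp` always sum to 1.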

Locate and tag duplicate reads in a BAM file

bam metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Merge a list of sorted bam files

bam bam_index checksum versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Parallel sorting and duplicate marking

bam bam_index cram metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Re-estimate taxonomic abundance of metagenomic samples analyzed by Kraken.

reports txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

db bracken_files versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine the output of metagenomic samples analyzed by Bracken.

txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Downloads the files required to build a CAT database from either nr or GTDB.

rawdb versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Creates a CAT_pack database based on input FASTAs

01000

db taxonomy versions versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Build centrifuge database for taxonomic profiling

010000

cf versions

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

01000

report results sam fastq_mapped fastq_unmapped versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

010

kreport versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

0100

checkm_output marker_file checkm_tsv versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

01230

output fasta versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 database download

0

database versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

CheckM2 bin quality prediction

0101

checkm2_output checkm2_tsv versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

Construct the database necessary for checkv's quality assessment

NO input

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Assess the quality of metagenome-assembled viral genomes.

010

quality_summary completeness contamination complete_genomes proviruses viruses versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

010

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

0101010000

bed bam tagAlign pairs versions

chromap:

Fast alignment and preprocessing of chromatin profiles

binning of metagenomic sequences

01

fasta bins fm index links result versions

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.

01010101000

self_circular extended_circular extended_partial extended_failed orphan_end all_cobra_assemblies joining_summary log versions

cobra-meta:

COBRA is a tool to get higher quality viral genomes assembled from metagenomes.

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Calculate confidence scores from Kraken2 output

010

score versions

Calculates peak-to-trough ratio (PTR) from metagenomic sequence data

01

ptr versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Computes the coverage map along the reference genome

01

coverage versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Indexes a directory of fasta files for use with CoPTR

01

index_dir versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Maps the reads to the reference database

0101

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Merge reads that were mapped to multiple indices

01

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Map reads to contigs and estimate coverage

010100

coverage versions

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

NO input

db versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

0120

daa daa_tsv arg potential_arg versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5' mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

010

bam histogram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Cluster genome FASTA files by average nucleotide identity

0123

tsv dereplicated_bins versions

Build ganon database using custom reference sequences.

01000

db info versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

010

tre report one all unc log versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

010

tre versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

01

txt versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

0100

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits CRAM files efficiently by taking advantage of their container based structure

01

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01000

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

0120

genes features clusters gbk json versions

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

Download geNomad databases and related files

NO input

genomad_db versions

genomad:

Identification of mobile genetic elements

Identify mobile genetic elements present in genomic assemblies

010

aggregated_classification taxonomy provirus compositions calibrated_classification plasmid_fasta plasmid_genes plasmid_proteins plasmid_summary virus_fasta virus_genes virus_proteins virus_summary versions

genomad:

Identification of mobile genetic elements

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

010100

summary tree markers msa user_msa filtered failed log warnings versions

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

Create a tag directory with the HOMER suite

010

tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Plot a metagene of cross-link events/sites around various transcriptomic landmarks.

010

tsv versions

icount:

Computational pipeline for analysis of iCLIP data

inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification

01000

profile snvs gene_info genome_info linkage mapping_info scaffold_info versions

instrain:

Calculation of strain-level metrics

Download, extract, and check md5 of iPHoP databases

NO input

iphop_db versions

iphop:

Predict host genus from genomes of uncultivated phages.

Predict phage host using iPHoP

010

iphop_genus iphop_genome iphop_detailed_output versions

iphop:

Predict host genus from genomes of uncultivated phages.

Extract UMI and cell barcodes

010

bam pbi versions

isoseq3:

Iso-Seq - Scalable De Novo Isoform Discovery

Taxonomic classification of metagenomic sequence data using a protein reference database

010

results versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.

010

txt versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

write your description here

0100

summary versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Merge two tab-separated output files of Kaiju and Kraken in the column format

0120

merged versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Make Kaiju FMI-index file from a protein FASTA file

010

fmi bwt sa versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Generate k-mers (sketches) from FASTA/Q sequences

01

outdir info versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

01

kmcp log versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

01

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

010

profile versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

010

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Adds fasta files to a Kraken2 taxonomic database

01000

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

010

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Downloads and builds Kraken2 standard database

0

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Classifies metagenomic sequence data

01000

classified_reads_fastq unclassified_reads_fastq classified_reads_assignment report versions

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Takes multiple kraken-style reports and combines them into a single report file

01

txt versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Extract reads classified at any user-specified taxonomy IDs.

0010101

extracted_kraken2_reads versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Takes a Kraken report file and prints out a krona-compatible TEXT file

01

txt versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Download and build (custom) KrakenUniq databases

01230

db versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Download KrakenUniq databases and related files

0

output versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Classifies metagenomic sequence data using unique k-mer counts

012000000

classified_reads unclassified_reads classified_assignment report versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Creates a Krona chart from text files listing quantities and lineages.

01

html versions

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

lima - The PacBio Barcode Demultiplexer and Primer Remover

010

counts report summary versions bam pbi fasta fastagz fastq fastqgz xml json clips guess

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

bam log versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Identifies LTR retrotransposons using LTR_retriever

meta genome harvest finder mgescan non_tgca

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.

01

smorfs all_orfs amp_prediction readme_file log_file versions

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

000

index versions log

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

010

rma6 alignments log versions

malt:

A tool for mapping metagenomic data

Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening

0100

results versions

MaxBin is software for clustering metagenomic contigs

0123

binned_fastas summary abundance log marker_counts unbinned_fasta tooshort_fasta marker_bins marker_genes versions

Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.

01

tif versions

mcstaging:

Staging modules for MCMICRO

Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.

01

tif versions

mcstaging:

Staging modules for MCMICRO

An ultra-fast metagenomic assembler for large and complex metagenomics

012

contigs k_contigs addi_contigs local_contigs kfinal_contigs log versions

pigz:

Parallel implementation of the gzip algorithm.

Depth computation per contig step of metabat2

012

depth versions

metabat2:

Metagenome binning

Metagenome binning of contigs

012

tooshort lowdepth unbinned membership fasta versions

metabat2:

Metagenome binning

Annotation of eukaryotic metagenomes using MetaEuk

010

faa codon tsv gff versions

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Strain-level metagenomic assignment

012340

wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Maps long reads to a metamaps database

010

classification_res meta_file meta_unmappedreadsLengths para_file versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

Build MetaPhlAn database for taxonomic profiling.

NO input

db versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn4

01

txt versions

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn3

01

txt versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

A tool to estimate bacterial species abundance

0100

results versions

midas:

An integrated pipeline for estimating strain-level genomic variation from metagenomic data

Download the mOTUs database

0

db versions

motus:

The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.

Taxonomic meta-omics profiling using universal marker genes

0100

txt biom versions

motus:

Marker gene-based OTU (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

010

out versions

motus:

Marker gene-based operational taxonomic unit (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

010

out bam mgc log versions

motus:

Marker gene-based OTU (mOTU) profiling

write your description here

meta reads format mode

meta versions npa npc npl npo

Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file

01

png versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Calculate metagenome redundancy curve from FASTQ files

0100

npa npc npl npo versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files

01

json tsv csv pdf versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image

01

png versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

"This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays."

01

spp pdf rdata versions

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.

0100

results versions

Locate and tag duplicate reads in a BAM file

010101

bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

good_reads single_reads bad_reads log versions

Calculate interval coverage for each sample. N.B. the tool cannot handle staging files with symlinks; stageInMode should be set to 'link'.

0120

txt png loess_qc_txt loess_txt versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Predict antibiotic resistance from protein or nucleotide data

0100

json tsv tmp tool_version db_version versions

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Accurate detection of short and long active ORFs using Ribo-seq data

01201

protocol bam_summary read_length_dist metagene_profile_5p metagene_profile_3p metagene_plots psite_offsets pos_wig neg_wig orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

0120

csv json bam versions

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

This module combines samtools and samblaster in order to use samblaster's ability to filter or tag SAM files, with the advantage of keeping both input and output in BAM format. Samblaster input must contain a sequence header, which is why it is piped from the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions

calculates MD and NM tags

0101

bam versions

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Call peaks using SEACR on sequenced reads in bedgraph format

0120

bed versions

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

metagenomic binning with self-supervised learning

012

csv model output_fasta recluster_fasta tsv versions

semibin:

Metagenomic binning with semi-supervised siamese neural network

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

01

wig versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file format - such as FASTA, BAM - to input files for the Sequenza R package. The program -gc_wiggle- takes fasta file as an input, computes GC percentage across the sequences and returns a file in the UCSC wiggle format.

Create genome sketch using split k-mers

012

skf versions

ska:

SKA (Split Kmer Analysis)

Classifies and predicts the origin of metagenomic samples

010000

report versions

Compare many FracMinHash signatures generated by sourmash sketch.

01000

matrix labels csv versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

0100000

result unassigned matches prefetch prefetchcsv versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.

010

signature_index versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a signature (a group of FracMinHash sketches) of a sequence using sourmash

01

signatures versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.

010

result versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Sketching/indexing sequencing reads

010

sketch_fastq_genome versions

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

Standardise and merge two or more taxonomic profiles into a single table

010000

merged_profiles versions

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

Standardise the output of a wide range of taxonomic profilers

01000

standardised_profile versions

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

01

classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

01

explore_tsv top_sequence versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
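
The canonical form mentioned for tidk explore (TTAGG -> AACCT) can be illustrated with a small shell sketch. It assumes that "canonical" means the lexicographically smallest rotation of the repeat unit or of its reverse complement. That definition reproduces the example in the description, but it is an assumption here, not a statement of how tidk is implemented. The function name canonical_repeat is hypothetical.

```shell
# Hypothetical helper: print the assumed canonical form of a repeat unit,
# i.e. the smallest rotation of the unit or of its reverse complement.
canonical_repeat() {
    unit="$1"
    # Reverse complement: reverse the string, then swap A<->T and C<->G.
    rc=$(printf '%s\n' "$unit" | rev | tr 'ACGT' 'TGCA')
    for s in "$unit" "$rc"; do
        # Emit every rotation of the string, one per line.
        printf '%s\n' "$s" | awk '{
            n = length($0)
            for (i = 1; i <= n; i++) print substr($0, i) substr($0, 1, i - 1)
        }'
    done | LC_ALL=C sort | head -n 1
}

canonical_repeat TTAGG   # prints AACCT
```

The sketch only handles uppercase A, C, G and T; other characters pass through the complement step unchanged.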

Searches a genome for a telomere string such as TTAGGG

010

tsv bedgraph versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place

01

reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Group reads based on their UMI and mapping coordinates

01200

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

012

bam log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links. If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it will skip the samtools sort step. The cellsorted BAM file should be cell sorted with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cellsort input bam using SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

An optional mask must be passed via ext.args with the --mask option. This is why two BAM files (cellsorted and original) need to be staged in the work dir. See also the velocyto tutorial.

01230

loom versions
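
The file naming rule in the velocyto description can be checked before running the module. The sketch below derives the name velocyto is described as looking for ("cellsorted_" followed by the original BAM file name) and tests whether such a file is present in a directory. The helper names cellsorted_name and has_cellsorted are hypothetical and not part of velocyto or nf-core.

```shell
# Hypothetical helper: print the pre-sorted file name for a given BAM path.
cellsorted_name() {
    printf 'cellsorted_%s\n' "$(basename "$1")"
}

# Hypothetical check: succeed if the pre-sorted BAM already exists in
# directory $2 for the original BAM path $1.
has_cellsorted() {
    [ -f "$2/$(cellsorted_name "$1")" ]
}

cellsorted_name /data/run1/input.bam   # prints cellsorted_input.bam
```

A call such as has_cellsorted /data/run1/input.bam . succeeds only when ./cellsorted_input.bam exists, which is the case in which the description says the samtools sort step is skipped.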

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
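
One way to turn a multi-sequence bin into a single sequence is to join its members with a spacer. The sketch below assumes a run of `N` characters as the spacer; the spacer character, its length, and the function name `link_bins` are assumptions for illustration and may differ from what the module does.

```python
def link_bins(bins, linker_len=10):
    """Join the sequences of each bin into one, separated by a run of N."""
    linker = "N" * linker_len
    return {bin_id: linker.join(seqs) for bin_id, seqs in bins.items()}


# Hypothetical bins: bin_1 has two members, bin_2 has one.
linked = link_bins({"bin_1": ["ACGT", "GG"], "bin_2": ["TTTT"]}, linker_len=3)
```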

Binning virus genomes from metagenomes

bins membership summary versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
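
A single-pass, greedy centroid-based clustering can be sketched as follows. Each sequence is compared against the existing centroids in order; it joins the first centroid that meets the identity threshold, otherwise it becomes a new centroid. The ungapped `identity` function is a stand-in chosen to keep the example short, and assigning to the first matching centroid is a simplification; neither is claimed to match VSEARCH's behaviour.

```python
def identity(a, b):
    """Fraction of matching positions without gaps (a crude stand-in for alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Single pass: join the first centroid within threshold, else start a new one."""
    centroids = []  # centroid sequences in order of creation
    clusters = []   # clusters[i] holds the members of centroids[i]
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid was close enough, so this sequence seeds a new cluster.
            centroids.append(seq)
            clusters.append([seq])
    return centroids, clusters


seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAC"]
centroids, clusters = greedy_cluster(seqs, threshold=0.9)
```

Because the pass is greedy, the result depends on input order: the first sequence seen always becomes a centroid.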

Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
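
The identity rule stated above (same length, same nucleotides, case insensitive, T and U equivalent) can be expressed as a normalisation step followed by counting. The function name `dereplicate` and the choice to report the normalised sequence as the representative are assumptions of this sketch.

```python
from collections import Counter


def dereplicate(seqs):
    """Count strictly identical sequences after case and T/U normalisation."""
    counts = Counter()
    for seq in seqs:
        key = seq.upper().replace("U", "T")
        counts[key] += 1
    # Most abundant first; ties keep first-seen order.
    return counts.most_common()


unique = dereplicate(["ACGT", "acgu", "ACGU", "GGCC"])
```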

Performs quality filtering and/or conversion of a FASTQ file to FASTA format.

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
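
To make the filter-and-convert step concrete, the sketch below keeps reads whose expected number of errors is at or below a threshold and writes them as FASTA. Using expected errors as the quality criterion, the Phred+33 encoding, and the names `expected_errors`, `fastq_to_fasta`, and `max_ee` are all assumptions for this example; the module's actual criteria depend on the options passed to it.

```python
def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities, assuming Phred+33 quality encoding."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def fastq_to_fasta(records, max_ee=1.0):
    """Keep (name, seq, qual) records passing the filter and format them as FASTA."""
    kept = []
    for name, seq, qual in records:
        if expected_errors(qual) <= max_ee:
            kept.append(f">{name}\n{seq}")
    return "\n".join(kept)


# Hypothetical reads: r1 has high qualities (Q40), r2 has the lowest possible (Q0).
fasta = fastq_to_fasta([("r1", "ACGT", "IIII"), ("r2", "ACGT", "!!!!")])
```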

Taxonomic classification using the sintax algorithm.

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
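
The two sort orders can be sketched on in-memory records as below. The sketch assumes abundance is carried in the header as a `;size=N` annotation and treats a missing annotation as abundance 1; the function names and that default are assumptions for illustration.

```python
import re


def abundance(header):
    """Read the ';size=N' annotation from a header, defaulting to 1 if absent."""
    match = re.search(r"(?:^|;)size=(\d+)", header)
    return int(match.group(1)) if match else 1


def sort_by_size(records):
    """Sort (header, sequence) pairs by decreasing abundance."""
    return sorted(records, key=lambda rec: abundance(rec[0]), reverse=True)


def sort_by_length(records):
    """Sort (header, sequence) pairs by decreasing sequence length."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


records = [("a;size=2", "ACGTAC"), ("b;size=10", "AC"), ("c", "ACGT")]
by_size = [header for header, _ in sort_by_size(records)]
by_length = [header for header, _ in sort_by_length(records)]
```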

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
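
For readers unfamiliar with global pairwise alignment, the sketch below scores two sequences end to end with the classic Needleman-Wunsch recurrence and picks the best-scoring target for a query. The scoring values (match 1, mismatch -1, linear gap -1) and the names `global_score` and `best_hit` are assumptions for illustration; a real search tool uses its own scoring scheme and heuristics.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hit(query, targets):
    """Return the name of the target with the highest global alignment score."""
    return max(targets, key=lambda name: global_score(query, targets[name]))


# Hypothetical database of two target sequences.
hit = best_hit("ACGT", {"t1": "TTTT", "t2": "ACGA"})
```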
