Available Modules

Modules are the building blocks of all nf-core DSL2 pipelines. If you would like to write your own module, you can find more information on the nf-core website.

Converts a GFF/GTF file into a TSV file

tsv versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files
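To make the GFF/GTF-to-TSV conversion concrete, here is a minimal Python sketch of one way to flatten GFF3 records into a table: the eight fixed GFF columns are kept and every key found in the ninth (attribute) column becomes its own TSV column. This is not AGAT's implementation. The column names, the "." placeholder for missing attributes, and the restriction to GFF3-style `key=value` attributes (GTF's `key "value";` syntax is not handled) are all assumptions of the sketch.

```python
import csv
import io
from urllib.parse import unquote

# The eight fixed GFF3 columns that precede the attribute column.
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase"]


def parse_attributes(field: str) -> dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=gene1;Name=abc'."""
    attrs = {}
    for item in field.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def gff3_to_tsv(gff_text: str) -> str:
    """Flatten GFF3 feature lines into TSV, one column per attribute key."""
    rows = []
    attr_keys = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # this sketch silently skips malformed lines
        row = dict(zip(GFF_COLUMNS, fields[:8]))
        attrs = parse_attributes(fields[8])
        for key in attrs:
            if key not in attr_keys:
                attr_keys.append(key)  # keep first-seen column order
        row.update(attrs)
        rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=GFF_COLUMNS + attr_keys,
                            delimiter="\t", restval=".",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, `print(gff3_to_tsv(open("annotation.gff3").read()))` writes the table to standard output; the file name is hypothetical.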

A tool to parse and summarise results from antimicrobial peptide tools and present a functional classification.

sample_dir txt csv faa summary_csv summary_html log results_db results_db_dmnd results_db_fasta results_db_tsv versions

A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.

cluster_tsv rep_cluster_tsv log versions

ampcombi2/cluster:

A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.

A submodule that merges all output summary tables from ampcombi2/parsetables into one summary file.

tsv log versions

ampcombi2/complete:

This merges the per-sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
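Merging per-sample summaries into one file amounts to concatenating tables that share a header. Below is a minimal Python sketch under that assumption; it is not AMPcombi's own code. It keeps the first header, checks that every later table uses the same header, and appends only the data rows.

```python
import csv
import io


def merge_summaries(tables: list[str]) -> str:
    """Concatenate TSV tables that share one header into a single TSV."""
    header = None
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for index, text in enumerate(tables):
        rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
        if not rows:
            continue  # an empty per-sample table contributes nothing
        if header is None:
            header = rows[0]
            writer.writerow(header)  # write the shared header once
        elif rows[0] != header:
            raise ValueError(f"table {index} has a different header")
        writer.writerows(rows[1:])  # append data rows only
    return out.getvalue()
```

Because a header mismatch raises `ValueError`, tables with different column layouts are not silently mixed.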

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

sample_dir contig_gbks db_tsv tsv faa sample_log full_log db db_txt db_fasta db_mmseqs versions

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

A fast and user-friendly method to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs.

amps_faa amps_tsv versions

AMPlify is an attentive deep learning model for antimicrobial peptide prediction.

tsv versions

amplify:

Attentive deep learning model for antimicrobial peptide prediction

Post-processing script of the MaltExtract component of the HOPS package

json summary_pdf tsv candidate_pdfs versions

Module to subset an AnnData object to the cells whose barcodes appear in a CSV file

h5ad versions

anndata:

An annotated data matrix.
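Subsetting cells by barcode reduces to a set-membership filter on the cell identifiers. The pure-Python sketch below illustrates the idea; the `barcode` column name in the CSV and the list-of-rows matrix are assumptions, not the module's interface. With the anndata library itself, when cell barcodes are stored as `obs_names`, a comparable selection is `adata[adata.obs_names.isin(barcodes)]`.

```python
import csv
import io


def read_barcodes(csv_text: str, column: str = "barcode") -> set[str]:
    """Collect barcodes from one CSV column (the column name is an assumption)."""
    return {row[column] for row in csv.DictReader(io.StringIO(csv_text))}


def subset_cells(cell_ids: list[str], matrix: list[list[float]],
                 keep: set[str]) -> tuple[list[str], list[list[float]]]:
    """Keep the cells (rows) whose barcode is in `keep`, preserving their order."""
    kept = [(cid, row) for cid, row in zip(cell_ids, matrix) if cid in keep]
    return [cid for cid, _ in kept], [row for _, row in kept]
```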

Annotation and Ranking of Structural Variation

012301010101

tsv unannotated_tsv vcf versions

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

No input

annotations versions

annotsv:

Annotation and Ranking of Structural Variation

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

0100

clusterblast_file html_accessory_files knownclusterblast_html knownclusterblast_dir knownclusterblast_txt svg_files_clusterblast svg_files_knownclusterblast gbk_input json_results log zip gbk_results clusterblastoutput html knownclusterblastoutput json_sideloading versions

antismashlite:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

0100

tsv versions

Annotation of bacterial genomes (isolates, MAGs) and plasmids

01000

embl faa ffn fna gbff gff hypotheticals_tsv hypotheticals_faa tsv txt versions

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.

Render an assembly graph in GFA 1.0 format to PNG and SVG image formats

01

png svg versions

bandage:

Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, e.g. LPL, LAD.

01200

vcf tbi csi versions

view:

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, e.g. LPL, LAD.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Calculate the Jaccard statistic between two feature files.

01201

tsv versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
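The Jaccard statistic for two interval sets is the number of base pairs they share divided by the number of base pairs covered by either. The sketch below illustrates that ratio for a single chromosome with half-open `[start, end)` intervals; it merges overlaps within each set first and is not bedtools' implementation, whose reporting details may differ.

```python
def merge(intervals):
    """Merge overlapping half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Base pairs covered by both merged interval lists (two-pointer sweep)."""
    i = j = covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        covered += max(0, hi - lo)
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def jaccard(a, b):
    a, b = merge(a), merge(b)
    inter = intersection_length(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0


print(jaccard([(0, 100)], [(50, 150)]))  # 50 bp shared out of 150 bp covered
```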

BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database

01010

xml tsv csv versions

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.

010

tsv versions

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

0100

checkm_output marker_file checkm_tsv versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 bin quality prediction

0101

checkm2_output checkm2_tsv versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

Copy number variant detection from high-throughput sequencing data

012

tsv cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

View function to generate VCFs

0100

vcf tsv xls versions

cnvpytor:

Calling CNVs using read depth

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Generate the input coverage table for CONCOCT using a BED file

0123

tsv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Merge consecutive parts of the original contigs that were cut up by cut_up_fasta.py

01

csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Concatenate two or more CSV (or TSV) tables into a single table

0100

csv versions

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Join two or more CSV (or TSV) tables by selected fields into a single table

01

csv versions

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit
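To illustrate what joining tables "by selected fields" means, here is a minimal sketch of an inner join on one key column using Python's `csv` module. It assumes key values are unique within each table; `inner_join` is a hypothetical helper, not csvtk's code, and csvtk itself offers further join modes.

```python
import csv
import io


def inner_join(tables, key):
    """Inner-join CSV strings on a shared key column (hypothetical helper)."""
    parsed = [list(csv.DictReader(io.StringIO(t))) for t in tables]
    # Start from the first table, keyed by the join column
    joined = {row[key]: dict(row) for row in parsed[0]}
    for rows in parsed[1:]:
        lookup = {row[key]: row for row in rows}
        # Keep only keys present in every table seen so far
        joined = {k: {**v, **lookup[k]} for k, v in joined.items() if k in lookup}
    return list(joined.values())


a = "id,x\ns1,1\ns2,2\n"
b = "id,y\ns2,b\ns3,c\n"
print(inner_join([a, b], "id"))  # [{'id': 's2', 'x': '2', 'y': 'b'}]
```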

Splits CSV/TSV into multiple files according to column values

0100

split_csv versions

csvtk:

CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

01

gct versions

tabulartogseagct:

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
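A GCT 1.2 file starts with a `#1.2` line, then a line giving the number of data rows and sample columns, then a header of `NAME`, `Description` and the sample names. The sketch below assumes the first input column holds feature IDs and that there is no description column, so it fills in the placeholder `na`; `tsv_to_gct` is a hypothetical name, not the module's script.

```python
import csv
import io


def tsv_to_gct(tsv_text):
    """Convert a features-by-row / samples-by-column TSV to GCT 1.2 text."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    samples = header[1:]  # first column assumed to hold feature IDs
    lines = ["#1.2", f"{len(data)}\t{len(samples)}",
             "\t".join(["NAME", "Description"] + samples)]
    for row in data:
        # No description column in the input, so use the placeholder "na"
        lines.append("\t".join([row[0], "na"] + row[1:]))
    return "\n".join(lines) + "\n"


print(tsv_to_gct("gene\tA\tB\ng1\t1\t2\ng2\t3\t4\n"))
```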

Structural variant calling with cuteSV

01201

vcf versions

Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.

0

report versions

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

0120

daa daa_tsv arg potential_arg versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.

010

readme log json bgc_gbk bgc_tsv full_gbk pfam_tsv bgc_png pr_png roc_png score_png versions

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

A Deep Learning Model for Transmembrane Topology Prediction and Classification

01

gff3 line3 md csv png versions

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01234010101

vcf vcf_index gvcf gvcf_index versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Queries a DIAMOND database using blastp mode

010100

blast xml txt daa sam tsv paf versions

diamond:

Accelerated BLAST compatible local sequence aligner

Queries a DIAMOND database using blastx mode

010100

blast xml txt daa sam tsv paf log versions

diamond:

Accelerated BLAST compatible local sequence aligner

Calculate clusters of highly similar sequences

01

tsv versions

diamond:

Accelerated BLAST compatible local sequence aligner

SV callers like lumpy look at split reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls, which can be used as additional information for filtering variants; for example, we would be skeptical of deletion calls that do not have lower-than-average coverage compared to regions with similar GC content.

01234500

vcf versions
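The filtering idea above can be made concrete with a depth fold change: mean depth inside the call divided by mean depth in the flanking sequence. This is a minimal sketch in the spirit of a flank-based fold change, assuming a per-base depth list for one chromosome; it is not duphold's implementation and ignores the GC-matched comparison mentioned in the description.

```python
def mean(values):
    return sum(values) / len(values) if values else 0.0


def flank_fold_change(depth, start, end, flank=1000):
    """Mean depth inside [start, end) divided by mean depth of both flanks.

    depth: per-base depth for one chromosome, indexed by 0-based position.
    """
    inside = depth[start:end]
    left = depth[max(0, start - flank):start]
    right = depth[end:end + flank]
    flanks = mean(left + right)
    return mean(inside) / flanks if flanks else float("nan")


# Toy heterozygous deletion: depth halves inside the event
depth = [30] * 100 + [15] * 50 + [30] * 100
print(flank_fold_change(depth, 100, 150, flank=50))
```

A heterozygous deletion would be expected to give a value near 0.5 and a homozygous one a value near 0, so a deletion call whose fold change stays near 1 is the kind the description suggests treating with suspicion.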

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

vcf tbi versions

In silico prediction of E. coli serotype

01

log tsv txt versions

Validate samplesheet or PEP config against a schema

000

versions log

validate:

Validate samplesheet or PEP config against a schema.

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

0123

tsv json versions

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
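In the EIGENSTRAT `.geno` format each line is one SNP and each character is one individual's genotype (`0`, `1`, `2`), with `9` marking a missing call. A minimal sketch of per-individual SNP coverage under that assumption simply counts the non-missing calls in each column; `snp_coverage` is a hypothetical helper, and the module's actual TSV/JSON output layout is not reproduced here.

```python
def snp_coverage(geno_lines):
    """Count covered (non-missing) SNPs per individual from .geno lines.

    Returns (per-individual covered counts, total number of SNPs).
    """
    covered = None
    total = 0
    for line in geno_lines:
        line = line.strip()
        if not line:
            continue
        if covered is None:
            covered = [0] * len(line)  # one counter per individual
        total += 1
        for i, call in enumerate(line):
            if call != "9":  # 9 marks a missing genotype
                covered[i] += 1
    return covered or [], total


counts, n_snps = snp_coverage(["029", "919", "222"])
print([c / n_snps for c in counts])  # fraction of SNPs covered per individual
```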

Tool for detection and quantification of large mtDNA rearrangements.

0120

deletions genes circos versions

EMM typing of Streptococcus pyogenes assemblies

01

tsv versions

Compute genome-wide STR profile

0120101

locus_tsv motif_tsv str_profile versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Quickly compute statistics over a fasta file in windows.

01

freq mononuc dinuc trinuc tetranuc versions
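The windowed statistics described above can be sketched for the simplest case: base counts and GC proportion over consecutive non-overlapping windows. This is an illustration only, assuming a single sequence string and ignoring the di-, tri- and tetranucleotide tables that the module also reports; `window_stats` is a hypothetical name.

```python
from collections import Counter


def window_stats(seq, window=1000):
    """GC proportion and base counts for consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        counts = Counter(chunk)
        acgt = sum(counts[b] for b in "ACGT")
        # GC is computed over unambiguous bases only, so Ns do not dilute it
        gc = (counts["G"] + counts["C"]) / acgt if acgt else float("nan")
        stats.append({"start": start, "end": start + len(chunk),
                      "gc": gc, "counts": {b: counts[b] for b in "ACGTN"}})
    return stats


for row in window_stats("GGCCAATTNNNNACGT", window=8):
    print(row)
```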

fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.

01

tsv versions

Cluster genome FASTA files by average nucleotide identity

0123

tsv dereplicated_bins versions

Colours a phylogeny with placement densities

01

newick nexus phyloxml svg colours log versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

012340101010

csv versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

0123010101

hdf5 tsv versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
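The counting rule described above (count the read starts that fall inside each interval) can be sketched with a sorted list and binary search. The sketch assumes integer positions on a single contig and closed `[start, end]` intervals; `count_read_starts` is a hypothetical helper and does not reproduce GATK's read filters or HDF5 output.

```python
from bisect import bisect_left


def count_read_starts(read_starts, intervals):
    """Count read start positions falling in each closed [start, end] interval."""
    starts = sorted(read_starts)
    counts = []
    for lo, hi in intervals:
        # Number of starts with lo <= pos <= hi
        counts.append(bisect_left(starts, hi + 1) - bisect_left(starts, lo))
    return counts


print(count_read_starts([5, 10, 10, 15, 99], [(1, 10), (11, 20), (50, 60)]))
```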

Gathers paired-end and split-read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.

01234000

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

WARNING - this tool is still experimental and should not be used in a production setting. Gathers paired-end and split-read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.

0120000

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

0120000

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create mappability files for a genome

0101

wig bedgraph txt csv versions

genmap:

Ultra-fast computation of genome mappability.

Genotype Salmonella Typhi from Mykrobe results

01

tsv versions

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Run the Broad Gene Set Enrichment tool in GSEA mode

01230101

rpt index_html heat_map_corr_plot report_tsvs_ref report_htmls_ref report_tsvs_target report_htmls_target ranked_gene_list gene_set_sizes histogram heatmap pvalues_vs_nes_plot ranked_list_corr butterfly_plot gene_set_tsv gene_set_html gene_set_heatmap snapshot gene_set_enplot gene_set_dist archive versions

gsea:

Gene Set Enrichment Analysis (GSEA)

Merging of CheckM and GUNC results in one summary table

012

tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Detection of Chimerism and Contamination in Prokaryotic Genomes

010

maxcss_level_tsv all_levels_tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Tool to convert and summarize ABRicate outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize DeepARG outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize fARGene outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize RGI outputs using the hAMRonization specification.

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to summarize and combine all hAMRonization reports into a single file

00

json tsv html versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools
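The point of comparing at haplotype level is that different VCF records can describe the same sequence. In the sketch below, two representations of a 2 bp deletion in the repeat `CATATAG` produce the same haplotype even though their positions and alleles differ. It handles a single haplotype with non-overlapping, 0-based variants and is not hap.py's diploid matching engine; `apply_variants` and `same_haplotype` are hypothetical helpers.

```python
def apply_variants(ref, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) edits; pos is 0-based."""
    out, cursor = [], 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at {pos}")
        out.append(ref[cursor:pos])  # unchanged sequence before the variant
        out.append(alt_allele)
        cursor = pos + len(ref_allele)
    out.append(ref[cursor:])
    return "".join(out)


def same_haplotype(ref, truth, query):
    """True when both variant sets yield the same sequence for this locus."""
    return apply_variants(ref, truth) == apply_variants(ref, query)


ref = "CATATAG"
truth = [(0, "CAT", "C")]   # left-aligned deletion of "AT"
query = [(2, "TAT", "T")]   # same deletion, represented further right
print(same_haplotype(ref, truth, query))
```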

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

0100

gbk svg tsv versions

PacBio structural variant calling tool

01201201

vcf csv versions

Serotype prediction of Haemophilus parasuis assemblies

01

tsv versions

This tool takes a background VCF, such as gnomAD, that has whole-genome coverage (though in some cases users will instead want whole-exome coverage) and uses it as the expectation of variants.

012012

tsv versions

htsnimtools:

Useful command-line tools written to showcase hts-nim

Plot a metagene of cross-link events/sites around various transcriptomic landmarks.

010

tsv versions

icount:

Computational pipeline for analysis of iCLIP data

Produces protein annotations and predictions from an amino acid FASTA file

010

tsv xml gff3 json versions

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of handling alignments of bacterial genome size.

012000000000000

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

Call variants from a BAM file using iVar

010000

tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Convert SAM files to TSV files

01230123

tsv versions

jvarkit:

Java utilities for Bioinformatics.

Plot whole genome coverage from BAM/CRAM file as SVG

012010101

output versions

jvarkit:

Java utilities for Bioinformatics.

Produces annotations using KofamScan against a profile database and a KO list

0100

txt tsv versions

Typing of clinical and environmental isolates of Legionella pneumophila

01

tsv versions

Serogrouping Listeria monocytogenes assemblies

01

tsv versions

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

bam log versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

vcf versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.

010101

csv versions

Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation

01

tsv fa versions

Performs taxonomic profiling of long metagenomic reads against the melon database

0100

tsv_output json_output log versions

Serotyping of Neisseria meningitidis assemblies

01

tsv versions

Annotation of eukaryotic metagenomes using MetaEuk

010

faa codon tsv gff versions

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.

0101012

tsv versions

mirtop:

Small RNA-seq annotation

mirtop export generates files in formats such as FASTA or VCF, or files compatible with the isomiRs Bioconductor package

0101012

tsv fasta vcf versions

mirtop:

Small RNA-seq annotation

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

Run Torsten Seemann's classic MLST on a genome assembly

01

tsv versions

Create a tsv file from a query and a target database as well as the result database

010101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

0101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Conversion of expandable profile databases to the MMseqs2 database format

0

db_exprofile versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Fetch the GO concepts for a list of genes

01

gmt tsv versions

AMR predictions for supported species

010

csv json versions

mykrobe:

Antibiotic resistance prediction in minutes

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

010

csv csv_errors csv_insertions tsv json json_auspice ndjson fasta_aligned fasta_translation nwk versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

Serotyping Neisseria gonorrhoeae assemblies

01

tsv versions

Determines the gender of a sample from the BAM/CRAM file.

01201010

tsv versions

ngsbits:

Short-read sequencing tools

Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files

01

json tsv csv pdf versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.

01

lay tsv versions

odgi:

An optimized dynamic genome/graph implementation

Metrics describing a variation graph and its path relationship.

01

tsv yaml versions

odgi:

An optimized dynamic genome/graph implementation

Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file

01000

tsv versions

panacus:

panacus is a tool for computing counting statistics for GFA files

Create visualizations from a tsv coverage histogram created with panacus.

01

image versions

panacus:

panacus is a tool for computing counting statistics for GFA files

Serogroup Pseudomonas aeruginosa assemblies

01

tsv blast details versions

Assign PBP type of Streptococcus pneumoniae assemblies

010

tsv blast versions

pbsv/call - PacBio structural variant (SV) calling and analysis tools

0101

vcf versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

pbsv - PacBio structural variant (SV) signature discovery tool

0101

svsig versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Manipulation, validation and exploration of pedigrees

0120101

vs_html html ped het_check_png ped_check_png sex_check_png het_check_csv ped_check_csv sex_check_csv ped_check_rel_difference_csv versions

Per-base metrics on BAM/CRAM files.

012012

tsv versions

Predict prophages in bacterial genomes

01

coordinates gbk log information bacteria_fasta bacteria_gbk phage_fasta phage_gbk prophage_gff prophage_tbl prophage_tsv versions

phispy:

Prophage finder using multiple metrics

Identify plasmids in bacterial sequences and assemblies

01

json txt tsv genome_seq plasmid_seq versions

Whole genome annotation of small genomes (bacterial, archaeal, viral)

0100

gff gbk fna faa ffn sqn fsa tbl err log txt tsv versions

frame-shift correction for long read (meta)genomics - maps proteins to reads

012

tsv versions

proovframe:

frame-shift correction for long read (meta)genomics

Run PureCN workflow to normalize, segment and determine purity and ploidy

01200

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample_seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Damage parameter estimation for ancient DNA

012

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Damage parameter estimation for ancient DNA

01

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Prepare a depth of coverage file for all target genes with SV from BAM files.

01200

coverage versions

pypgx:

A Python package for pharmacogenomics research

Quality Assessment Tool for Genome Assemblies

010101

results tsv transcriptome misassemblies unaligned versions

Predict antibiotic resistance from protein or nucleotide data

0100

json tsv tmp tool_version db_version versions

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI is not available when running through Nextflow.

01

png svg versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Sage is search software for proteomics data

010101

results_tsv results_json results_pin versions tmt_tsv lfq_tsv

sageproteomics:

Proteomics searching so fast it feels like magic.

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

0120

csv json bam versions

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

Computes the depth at each position or region.

0101

tsv versions

samtools:

Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region

SCIMAP is a suite of tools that enables spatial single-cell analyses

01

csv h5ad versions

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data.

Use pangenome outputs for GWAS

0120

csv versions

Metagenomic binning with self-supervised learning

012

csv model output_fasta recluster_fasta tsv versions

semibin:

Metagenomic binning with semi-supervised siamese neural network

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

0120101

cram crai bam bai score metrics metrics_multiqc_tsv versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Generate recalibration table and optionally perform base quality recalibration

01201010101010

table table_post recal_alignment csv pdf versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Salmonella serotype prediction from reads and assemblies

01

log tsv txt versions

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

tsv txt versions

seroba:

SeroBA is a k-mer based pipeline to identify the serotype from Illumina NGS reads for given references.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.

010

json tsv versions
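The core ratio behind this module can be sketched from `samtools depth` rows (`chrom`, `pos`, `depth`, tab-separated, one sample): mean depth on each gonosome divided by mean autosomal depth. The sketch assumes `chr1` to `chr22`, `chrX` and `chrY` naming and omits the error bars; `relative_coverage` is a hypothetical name, not the module's script.

```python
from statistics import mean

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumed chromosome naming


def relative_coverage(depth_lines, gonosomes=("chrX", "chrY")):
    """Mean depth of each gonosome divided by mean autosomal depth.

    depth_lines: 'chrom<TAB>pos<TAB>depth' rows as written by samtools depth.
    """
    per_chrom = {}
    for line in depth_lines:
        chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
        per_chrom.setdefault(chrom, []).append(int(depth))
    # Only named autosomes form the baseline; chrM and contigs are ignored
    autosomal = [d for c, ds in per_chrom.items() if c in AUTOSOMES for d in ds]
    base = mean(autosomal)
    return {g: mean(per_chrom[g]) / base for g in gonosomes if g in per_chrom}


rows = ["chr1\t1\t30", "chr1\t2\t30", "chr2\t1\t30",
        "chrX\t1\t15", "chrY\t1\t15", "chrM\t1\t500"]
print(relative_coverage(rows))
```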

Determine Shigella serotype from Illumina or Oxford Nanopore reads

01

tsv hits versions

Determine Shigella serotype from assemblies or Illumina paired-end reads

01

tsv versions

Serovar prediction of Salmonella assemblies

01

tsv allele_fasta allele_json cgmlst_csv versions

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

Rapid haploid variant calling

010

tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Pairwise SNP distance matrix from a FASTA sequence alignment

01

tsv versions
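A pairwise SNP distance is the number of alignment columns at which two sequences carry different bases. This minimal sketch assumes an uppercase alignment of equal-length sequences and counts only A/C/G/T mismatches, skipping gaps and ambiguous bases; that choice is an assumption of the sketch, and `snp_distance_matrix` is a hypothetical helper.

```python
def snp_distance_matrix(seqs):
    """Symmetric matrix of A/C/G/T mismatches between aligned sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Count differing columns where both bases are unambiguous
            d = sum(1 for x, y in zip(seqs[i], seqs[j])
                    if x != y and x in "ACGT" and y in "ACGT")
            matrix[i][j] = matrix[j][i] = d
    return matrix


for row in snp_distance_matrix(["ACGTA", "ACGTT", "ACNTT"]):
    print("\t".join(map(str, row)))
```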

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Compare many FracMinHash signatures generated by sourmash sketch.

01000

matrix labels csv versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.
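FracMinHash keeps only the k-mer hashes that fall below a fixed fraction (1/`scaled`) of the hash space, so two sketches can be compared by set overlap. The sketch below illustrates the idea for DNA with canonical k-mers. It uses `hashlib.blake2b` as a stand-in hash (sourmash uses MurmurHash3), so its hash values are not compatible with real sourmash signatures; all function names are hypothetical.

```python
import hashlib

MAX_HASH = 2 ** 64


def kmer_hash(kmer):
    """64-bit hash of a k-mer (blake2b stand-in, not sourmash's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    rc = "".join(comp[b] for b in reversed(kmer))
    return min(kmer, rc)


def frac_minhash(seq, k=21, scaled=1000):
    """Keep hashes below MAX_HASH // scaled (about 1/scaled of all k-mers)."""
    threshold = MAX_HASH // scaled
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]  # sequence assumed uppercase
        if set(kmer) <= set("ACGT"):
            h = kmer_hash(canonical(kmer))
            if h < threshold:
                hashes.add(h)
    return hashes


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


s1 = frac_minhash("ACGTACGTGG", k=4, scaled=1)
s2 = frac_minhash("ACGTACGTCC", k=4, scaled=1)
print(jaccard(s1, s2))
```

With `scaled=1` every k-mer is kept, which is handy for toy inputs; real sketches use much larger `scaled` values so that signatures stay small.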

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

0100000

result unassigned matches prefetch prefetchcsv versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.
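Finding "the minimum set of genomes that contain the k-mers in the metagenome" is a set-cover problem, and a greedy strategy repeatedly picks the reference that explains the most still-unexplained hashes. The sketch below shows that greedy loop on plain hash sets. It is an illustration under simplifying assumptions, not sourmash's gather code, and `greedy_gather` is a hypothetical name.

```python
def greedy_gather(query, references):
    """Greedily pick references covering the most remaining query hashes.

    query: set of hashes; references: dict of name -> set of hashes.
    Returns [(name, newly_covered_count), ...] and the unassigned hashes.
    """
    remaining = set(query)
    picks = []
    while remaining:
        best = max(references,
                   key=lambda name: len(references[name] & remaining),
                   default=None)
        if best is None:
            break  # no references at all
        gained = references[best] & remaining
        if not gained:
            break  # nothing left that any reference can explain
        picks.append((best, len(gained)))
        remaining -= gained
    return picks, remaining


refs = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {6, 7}}
print(greedy_gather({1, 2, 3, 4, 5, 6, 99}, refs))
```

Hashes left in `remaining` correspond to the "unassigned" portion of the query.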

Computational method for finding spa types.

0100

tsv versions

Serotype prediction of Streptococcus suis assemblies

01

tsv versions

Predicts Staphylococcus aureus SCCmec type based on primers.

01

tsv versions

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.

01

results_xlsx summary_tsv detailed_summary_tsv resfinder_tsv plasmidfinder_tsv mlst_tsv settings_txt pointfinder_tsv versions

staramr:

Scan genome contigs against the ResFinder and PointFinder databases. To use the PointFinder databases, add --pointfinder-organism ORGANISM to ext.args.

Serotype STEC samples from paired-end reads or assemblies

01

tsv versions

Converts a bedpe file to a VCF file (beta version)

01

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a VCF file based on size and/or regions to ignore

0120000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering
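Filtering SVs by size and ignored regions comes down to two checks per record: whether |SVLEN| lies within bounds, and whether the call overlaps an excluded BED interval. The `SV` record type, the parameter names and the convention that a negative `max_size` disables the upper bound are assumptions for this sketch only. They are not SURVIVOR's command-line interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int  # 1-based, as in VCF POS
    end: int    # 1-based, inclusive
    svlen: int


def overlaps(sv: SV, region: tuple[str, int, int]) -> bool:
    """Region is a BED interval (0-based, half-open)."""
    chrom, bstart, bend = region
    # SV converted to 0-based half-open [start - 1, end)
    return sv.chrom == chrom and sv.start - 1 < bend and bstart < sv.end


def filter_svs(svs, min_size, max_size, ignore_regions):
    kept = []
    for sv in svs:
        size = abs(sv.svlen)
        if size < min_size or (max_size >= 0 and size > max_size):
            continue  # outside the size window
        if any(overlaps(sv, r) for r in ignore_regions):
            continue  # falls in a region to ignore
        kept.append(sv)
    return kept
```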

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

01010100

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01000

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

01234010101010101

sv indel germ_indel germ_sv som_indel som_sv unfiltered_sv unfiltered_indel unfiltered_germ_indel unfiltered_germ_sv unfiltered_som_indel unfiltered_som_sv raw_calls discordants log versions

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

0123450101

fns fps distances log report versions

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

Build a structural variant database

010

db versions

svdb:

structural variant database software

The merge module merges structural variants within one or more VCF files.

0100

vcf tbi csi versions

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

01000000

vcf versions

svdb:

structural variant database software

Performs tests on BAF files

01234

metrics versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.
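Counting SVTYPEs per sample amounts to reading the SVTYPE INFO key of each record and tallying it for every sample that carries the variant. This minimal sketch assumes a sample "carries" a record when its GT contains at least one non-reference allele, and that GT is present in FORMAT. The tool's exact rule may differ.

```python
from collections import Counter, defaultdict


def parse_info(info: str) -> dict:
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True  # flags have no value
    return fields


def count_svtypes(vcf_lines):
    counts = defaultdict(Counter)  # sample -> Counter(SVTYPE -> n)
    samples = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]
            continue
        svtype = parse_info(cols[7]).get("SVTYPE", "UNKNOWN")
        gt_idx = cols[8].split(":").index("GT")
        for sample, data in zip(samples, cols[9:]):
            alleles = data.split(":")[gt_idx].replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[sample][svtype] += 1
    return counts
```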

Convert an RdTest-formatted bed to the standard VCF format.

0120

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

010

standardized_vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert a VCF file to a BEDPE file.

01

bedpe versions

svtools:

Tools for processing and analyzing structural variants

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

0123

vcf tbi versions

Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA

01

csv_gz versions

Convert taxonids to taxon lineages

0120

tsv versions

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

Convert taxon names to TaxIds

0120

tsv versions

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

A tool to detect resistance and lineages of M. tuberculosis genomes

01

bam csv json txt vcf versions

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Compute the TCS score for an MSA, or for an MSA plus a library file. Outputs the TCS file as-is and a CSV containing only the total TCS score.

0101

tcs scores versions

tcoffee:

A collection of tools for multiple alignments of DNA, RNA and protein sequences

pigz:

Parallel implementation of the gzip algorithm.

Identify chromosomal rearrangements.

0120101

vcf ploidy versions

sv:

Search for structural variants.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

01

explore_tsv top_sequence versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
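One definition of a "canonical form" that reproduces the TTAGG -> AACCT example above takes the lexicographically smallest rotation of the repeat unit or of its reverse complement. The sketch below assumes that definition, so a repeat and all of its shifted or opposite-strand spellings map to one string. tidk's own implementation may differ in detail.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq: str) -> list[str]:
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit: str) -> str:
    """Smallest rotation across both strands (assumed definition)."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(revcomp(unit)))
```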

Plots telomeric repeat frequency against sliding window location using data produced by tidk/search

01

svg versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Searches a genome for a telomere string such as TTAGGG

010

tsv bedgraph versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
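Searching for a telomere string and reporting its frequency along the genome can be sketched as counting the repeat (forward strand) and its reverse complement (reverse strand) in fixed, non-overlapping windows. This simplified illustration has two limitations: it uses `str.count`, which counts non-overlapping matches, and it misses occurrences that span a window boundary. The output columns are not tidk's exact TSV layout.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def window_counts(genome: str, repeat: str, window: int):
    """Return (start, end, forward_count, reverse_count) per window."""
    genome, repeat = genome.upper(), repeat.upper()
    rc = revcomp(repeat)
    rows = []
    for start in range(0, len(genome), window):
        chunk = genome[start : start + window]
        rows.append(
            (start, start + len(chunk), chunk.count(repeat), chunk.count(rc))
        )
    return rows
```

Telomeres typically show as windows at contig ends with high forward or reverse counts. Values like these are what a plot of repeat frequency against window position displays.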

Detection of tRNA sequences using covariance models

01

tsv log stats fasta gff bed versions

Run TRUST4 on RNA-seq data

01201010101

tsv airr_files airr_tsv report_tsv fasta out fq outs versions

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

0123450101

fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary versions

truvari:

Structural variant comparison tool for VCFs
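The recall/precision/F-measure summary can be computed from the true-positive, false-positive and false-negative counts. This sketch assumes that precision uses the comparison-side true positives (tp_comp) and recall the baseline-side ones (tp_base), which is consistent with the separate tp_base and tp_comp outputs listed above.

```python
def bench_metrics(tp_base: int, tp_comp: int, fp: int, fn: int):
    """Return (precision, recall, f1); 0.0 where a ratio is undefined."""
    precision = tp_comp / (tp_comp + fp) if tp_comp + fp else 0.0
    recall = tp_base / (tp_base + fn) if tp_base + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```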

Calculate the intersection/consistency of calls across multiple VCFs.

01

consistency versions

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjoint genomic regions

01

vcf versions

truvari:

Structural variant comparison tool for VCFs

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
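Deduplication by mapping coordinate and UMI treats reads that share chromosome, 5' position, strand and UMI as PCR copies of one molecule and keeps a single representative. This sketch implements only exact UMI matching, similar in spirit to umi_tools' "unique" method. As we understand it, umi_tools defaults to a network-based "directional" method that also merges UMIs differing by sequencing errors. The `Read` type and the "keep highest MAPQ" rule are assumptions, and `pos` is assumed to already be the read's 5' coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    pos: int     # assumed 5' mapping position
    strand: str  # "+" or "-"
    umi: str
    mapq: int


def dedup_unique(reads: list[Read]) -> list[Read]:
    best: dict[tuple, Read] = {}
    for r in reads:
        key = (r.chrom, r.pos, r.strand, r.umi)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r  # keep the highest-MAPQ read per group
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.name))
```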

Group reads based on their UMI and mapping coordinates

01200

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

The Java port of the VarDict variant caller

01230101

vcf versions

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

01200

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

010101

alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
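Breaking a multiallelic record means emitting one record per ALT allele and taking the i-th value of each allele-specific (Number=A) INFO field for the i-th allele, while copying other fields unchanged. This sites-only sketch uses a dict-based record layout and takes the set of per-allele keys as a parameter; both are assumptions. In a real VCF those keys come from the header. Number=R fields and sample genotypes are not handled here.

```python
def break_multi(record: dict, per_allele_keys: set[str]) -> list[dict]:
    """Split a record with comma-separated ALTs into biallelic records."""
    alts = record["ALT"].split(",")
    out = []
    for i, alt in enumerate(alts):
        info = {
            key: value.split(",")[i] if key in per_allele_keys else value
            for key, value in record["INFO"].items()
        }
        out.append({**record, "ALT": alt, "INFO": info})
    return out
```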

Command line tools for parsing and manipulating VCF files.

012

vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
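Regenerating AC and NS from sample genotypes reduces to counting. AC is the number of called copies of each ALT allele across samples. NS is taken here as the number of samples with at least one called allele. The NS definition is an assumption for this sketch, so check vcflib's documentation for its exact rule.

```python
def fixup(genotypes: list[str], n_alts: int) -> tuple[list[int], int]:
    """Return (AC per ALT allele, NS) from GT strings like '0/1' or '1|2'."""
    ac = [0] * n_alts
    ns = 0
    for gt in genotypes:
        called = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if called:
            ns += 1
        for allele in called:
            idx = int(allele)
            if idx > 0:  # 0 is the reference allele
                ac[idx - 1] += 1
    return ac, ns
```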

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
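"Like GNU uniq" means only adjacent duplicates are removed. A record is dropped when its chromosome, position, REF and ALT equal those of the immediately preceding record. The tuple layout (VCF columns as split strings) is an assumption of this sketch.

```python
def vcf_uniq(records: list[tuple]) -> list[tuple]:
    """Drop records whose (CHROM, POS, REF, ALT) match the previous record."""
    out = []
    prev_key = None
    for rec in records:
        key = (rec[0], rec[1], rec[3], rec[4])  # CHROM, POS, REF, ALT
        if key != prev_key:
            out.append(rec)
        prev_key = key
    return out
```

As with `uniq`, input should be sorted if all duplicates are to be collapsed.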

Detecting and estimating inter-sample DNA contamination is a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination is a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

0100

vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Creates index files (e.g. XG) for a variation graph

01

xg vg_index versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end-of-file condition is encountered.

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. Unless specified otherwise using command-line arguments, input is accepted from stdin or read from an input file, and output is printed to stdout. If the -p option is given, it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

calculate locally stable secondary structures of RNAs

0

rnalfold_txt versions

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structures with a maximal base pair span. For a sequence of length n and a base pair span of L, the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol and the starting position of the local structure.

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

01

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

0101

bins membership summary versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
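In single-pass greedy centroid clustering, sequences are visited once in a fixed order. Each one joins the first existing centroid it matches at or above an identity threshold, or otherwise becomes a new centroid. This sketch makes two assumptions. First, it sorts by decreasing length (the order VSEARCH's `--cluster_fast` is documented to use; other clustering commands use abundance or input order). Second, it uses `difflib.SequenceMatcher.ratio()` as a stand-in similarity score, which is not VSEARCH's alignment-based identity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in score (2*matches/total length), not pairwise-alignment identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    order = sorted(seqs, key=lambda n: (-len(seqs[n]), n))  # longest first
    centroids: list[str] = []
    clusters: dict[str, list[str]] = {}
    for name in order:
        for c in centroids:
            if similarity(seqs[name], seqs[c]) >= threshold:
                clusters[c].append(name)  # join first matching centroid
                break
        else:
            centroids.append(name)  # no match: start a new cluster
            clusters[name] = [name]
    return clusters
```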

Merge strictly identical sequences contained in the input file. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
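Full-length dereplication under the rule above can be done by normalizing each sequence (upper case, U to T) and counting identical strings. The `;size=N` abundance annotation follows the USEARCH/VSEARCH header convention. Two choices in this sketch are assumptions: naming each unique sequence after its first occurrence and sorting by decreasing abundance.

```python
from collections import Counter


def normalize(seq: str) -> str:
    return seq.upper().replace("U", "T")


def derep_fulllength(records: list[tuple[str, str]]) -> list[tuple[str, str, int]]:
    """Return (first_name, sequence, abundance), most abundant first."""
    counts: Counter = Counter()
    first_name: dict[str, str] = {}
    for name, seq in records:
        key = normalize(seq)
        counts[key] += 1
        first_name.setdefault(key, name)
    # most_common() keeps first-encountered order for equal counts
    return [(first_name[s], s, n) for s, n in counts.most_common()]


def to_fasta(uniques) -> str:
    return "".join(f">{name};size={n}\n{seq}\n" for name, seq, n in uniques)
```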

Performs quality filtering and/or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
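A common FASTQ quality filter is the expected number of errors per read, the sum of 10^(-Q/10) over all bases. VSEARCH exposes a threshold on this value as `--fastq_maxee`. This sketch assumes Phred+33 encoding and converts passing reads to FASTA records. Other filters (length, Ns, truncation) are omitted.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities from Phred scores."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality)


def fastq_filter(records, max_ee: float):
    """Yield FASTA (header, sequence) pairs for reads with EE <= max_ee."""
    for name, seq, qual in records:
        if expected_errors(qual) <= max_ee:
            yield f">{name}", seq
```

For example, four Q40 bases ('I') give 4 x 0.0001 = 0.0004 expected errors, and four Q10 bases ('+') give 0.4.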

Taxonomic classification using the sintax algorithm.

010

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

010

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

010000

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Decomposes multiallelic variants into biallelic variants in a VCF file.

012

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data

Decomposes biallelic block substitutions into their constituent SNPs.

0123

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data
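A block substitution has REF and ALT of equal length, differing at several positions. Decomposing it means emitting one SNP for each position where the bases differ, at POS plus the offset. This is a minimal sketch. It ignores genotypes, phasing and INFO handling, which the tool may treat differently.

```python
def decompose_blocksub(pos: int, ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Split an equal-length substitution into (pos, ref, alt) SNPs."""
    if len(ref) != len(alt):
        raise ValueError("block substitution requires equal-length REF and ALT")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
```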

Normalizes variants in a VCF file

01230101

vcf fai versions

vt:

A tool set for short variant discovery in genetic sequence data
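Variant normalization is commonly defined as left-aligning a variant and trimming its alleles to the shortest representation. The loop repeatedly drops a shared last base and re-extends to the left from the reference whenever an allele becomes empty; it then drops shared leading bases while every allele keeps at least one base. The sketch below implements that definition for a single biallelic record with a 1-based POS. It is an illustration, not vt's source code.

```python
def normalize(contig: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim one biallelic variant; contig is the reference."""
    if contig[pos - 1 : pos - 1 + len(ref)].upper() != ref.upper():
        raise ValueError("REF does not match the reference sequence")
    alleles = [ref.upper(), alt.upper()]
    changed = True
    while changed:
        changed = False
        # Drop a shared last base while no allele is empty.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # An empty allele: extend every allele one reference base to the left.
        if any(not a for a in alleles):
            if pos <= 1:
                raise ValueError("cannot extend past the start of the contig")
            pos -= 1
            base = contig[pos - 1].upper()
            alleles = [base + a for a in alleles]
            changed = True
    # Drop shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1]
```

For example, deleting one "CA" unit from a CACACA tract, written at its rightmost possible position, is shifted to the leftmost equivalent position.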

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

01200

vcf tbi graph versions
