Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • fasta 79
  • fastq 42
  • genomics 38
  • bam 37
  • alignment 30
  • metagenomics 27
  • reference 21
  • sam 21
  • cram 20
  • index 19
  • genome 16
  • align 15
  • MSA 15
  • assembly 12
  • database 12
  • classify 12
  • sort 11
  • classification 11
  • clustering 11
  • sequences 10
  • k-mer 9
  • phylogeny 9
  • databases 9
  • searching 9
  • protein sequence 9
  • annotation 8
  • map 8
  • qc 8
  • indexing 8
  • db 8
  • sequence 8
  • LAST 8
  • vcf 7
  • filter 7
  • gff 7
  • taxonomic profiling 7
  • taxonomy 7
  • build 7
  • consensus 7
  • mmseqs2 7
  • blast 7
  • hmmer 7
  • msa 7
  • bed 6
  • rnaseq 6
  • long reads 6
  • cluster 6
  • kmers 6
  • vsearch 6
  • multiple sequence alignment 6
  • quality control 5
  • nanopore 5
  • graph 5
  • mags 5
  • protein 5
  • taxonomic classification 5
  • bwa 5
  • newick 5
  • kraken2 5
  • fastx 5
  • MAF 5
  • structural variants 4
  • merge 4
  • contamination 4
  • gfa 4
  • mapping 4
  • seqkit 4
  • indels 4
  • profiling 4
  • ganon 4
  • HMM 4
  • DNA sequence 4
  • gatk4 3
  • bacteria 3
  • download 3
  • convert 3
  • count 3
  • binning 3
  • ancient DNA 3
  • stats 3
  • histogram 3
  • samtools 3
  • repeat 3
  • ncbi 3
  • population genetics 3
  • evaluation 3
  • hmmsearch 3
  • short-read 3
  • report 3
  • reads 3
  • reference-free 3
  • antibiotic resistance 3
  • adapters 3
  • counts 3
  • microbiome 3
  • view 3
  • paf 3
  • phylogenetic placement 3
  • bin 3
  • public datasets 3
  • snps 3
  • diamond 3
  • amplicon sequences 3
  • bracken 3
  • seqtk 3
  • hidden Markov model 3
  • mask 3
  • windowmasker 3
  • entrez 3
  • guide tree 3
  • rna_structure 3
  • RNA 3
  • vrhyme 3
  • aln 3
  • nucleotide 3
  • statistics 2
  • coverage 2
  • variant 2
  • copy number 2
  • single-cell 2
  • trimming 2
  • contigs 2
  • kmer 2
  • isoseq 2
  • example 2
  • aDNA 2
  • filtering 2
  • transcript 2
  • palaeogenomics 2
  • archaeogenomics 2
  • peaks 2
  • mirna 2
  • duplicates 2
  • splicing 2
  • FASTQ 2
  • mem 2
  • distance 2
  • mpileup 2
  • interval 2
  • query 2
  • clipping 2
  • kallisto 2
  • sample 2
  • sequencing 2
  • retrotransposon 2
  • fgbio 2
  • malt 2
  • chromosome 2
  • abundance 2
  • amplicon sequencing 2
  • cellranger 2
  • dictionary 2
  • small variants 2
  • nextclade 2
  • multiallelic 2
  • artic 2
  • aggregate 2
  • demultiplexed reads 2
  • nucleotides 2
  • reformatting 2
  • rrna 2
  • rgfa 2
  • reformat 2
  • long terminal retrotransposon 2
  • frame-shift correction 2
  • long-read sequencing 2
  • sequence analysis 2
  • blastp 2
  • emboss 2
  • blastn 2
  • fixmate 2
  • identifier 2
  • tab 2
  • recombination 2
  • immunoprofiling 2
  • smrnaseq 2
  • junctions 2
  • dereplicate 2
  • variant calling 1
  • split 1
  • conversion 1
  • proteomics 1
  • VCF 1
  • bedtools 1
  • imputation 1
  • bcftools 1
  • variation graph 1
  • reporting 1
  • bqsr 1
  • picard 1
  • compression 1
  • table 1
  • long-read 1
  • illumina 1
  • QC 1
  • depth 1
  • tsv 1
  • phage 1
  • markduplicates 1
  • DNA methylation 1
  • WGBS 1
  • scWGBS 1
  • haplotype 1
  • structure 1
  • matrix 1
  • plot 1
  • annotate 1
  • validation 1
  • bcf 1
  • virus 1
  • metagenome 1
  • aligner 1
  • bisulfite sequencing 1
  • biscuit 1
  • low-coverage 1
  • genotype 1
  • umi 1
  • sketch 1
  • gff3 1
  • feature 1
  • genotyping 1
  • json 1
  • snp 1
  • profile 1
  • extract 1
  • NCBI 1
  • plasmid 1
  • tabular 1
  • cat 1
  • sourmash 1
  • coptr 1
  • ptr 1
  • text 1
  • summary 1
  • benchmark 1
  • cut 1
  • bedgraph 1
  • dna 1
  • containment 1
  • telomere 1
  • preprocessing 1
  • happy 1
  • HiFi 1
  • isomir 1
  • rna 1
  • resistance 1
  • add 1
  • normalization 1
  • miRNA 1
  • duplication 1
  • khmer 1
  • UMI 1
  • pileup 1
  • pseudoalignment 1
  • typing 1
  • replace 1
  • kraken 1
  • dump 1
  • microbes 1
  • eukaryotes 1
  • remove 1
  • mlst 1
  • dist 1
  • quality trimming 1
  • bwameth 1
  • polishing 1
  • complement 1
  • adapter trimming 1
  • prefetch 1
  • trim 1
  • import 1
  • gstama 1
  • subset 1
  • adapter 1
  • tama 1
  • vg 1
  • trancriptome 1
  • simulate 1
  • minhash 1
  • mash 1
  • orthology 1
  • removal 1
  • maximum likelihood 1
  • graft 1
  • purge duplications 1
  • primer 1
  • pair 1
  • kma 1
  • screen 1
  • concat 1
  • eigenstrat 1
  • rename 1
  • transformation 1
  • unaligned 1
  • UMIs 1
  • duplex 1
  • metagenomic 1
  • metadata 1
  • dict 1
  • cvnkit 1
  • estimation 1
  • sequenzautils 1
  • version 1
  • vdj 1
  • collate 1
  • mirdeep2 1
  • reheader 1
  • RNA sequencing 1
  • hmmfetch 1
  • linkbins 1
  • covariance models 1
  • hash sketch 1
  • reverse complement 1
  • extractunbinned 1
  • decompose 1
  • trna 1
  • ribosomal RNA 1
  • rRNA 1
  • genome graph 1
  • construct 1
  • DNA contamination estimation 1
  • verifybamid 1
  • eucaryotes 1
  • vsearch/sort 1
  • blat 1
  • sintax 1
  • usearch 1
  • coding 1
  • cds 1
  • transcroder 1
  • comp 1
  • functional 1
  • rank 1
  • standard 1
  • microRNA 1
  • uniques 1
  • vsearch/fastqfilter 1
  • trimfq 1
  • Pacbio 1
  • guidetree 1
  • cram-size 1
  • size 1
  • vsearch/dereplicate 1
  • hmmscan 1
  • pdb 1
  • block substitutions 1
  • decomposeblocksub 1
  • metagenome assembler 1
  • hmmpress 1
  • fastqfilter 1
  • hhsuite 1
  • CRISPRi 1
  • translate 1
  • taxonomic composition 1
  • quality_control 1
  • parallel 1
  • covariance model 1
  • resfinder 1
  • resistance genes 1
  • homologs 1
  • vector 1
  • gaps 1
  • transform 1
  • comparative genomics 1
  • spectral clustering 1
  • sequence similarity 1
  • homology 1
  • co-orthology 1
  • nucleotide sequence 1
  • masking 1
  • low-complexity 1
  • nanopore sequencing 1
  • retrieval 1
  • multi-tool 1
  • collapse 1
  • dragstr 1
  • composestrtablefile 1
  • createsequencedictionary 1
  • antibiotic resistance genes 1
  • consensus sequence 1
  • ARGs 1
  • groupreads 1
  • cache 1
  • swissprot 1
  • random 1
  • generate 1
  • embl 1
  • gstama/polyacleanup 1
  • svannotate 1
  • genbank 1
  • maskfasta 1
  • getfasta 1
  • genomecov 1
  • segment 1
  • blastx 1
  • mkvdjref 1
  • postprocessing 1
  • tblastn 1
  • protein coding genes 1
  • access 1
  • cmseq 1
  • polymorphic sites 1
  • polymorphic 1
  • polymut 1
  • haplotype purging 1
  • cutoff 1
  • false duplications 1
  • porechop_abi 1
  • duplicate purging 1
  • assembly curation 1
  • fragment_size 1
  • read_pairs 1
  • experiment 1
  • strandedness 1
  • bamstat 1
  • neighbour-joining 1
  • read distribution 1
  • deletions 1
  • tandem duplications 1
  • insertions 1
  • inner_distance 1
  • sequence-based 1
  • pseudohaploid 1
  • induce 1
  • gc_wiggle 1
  • freqsum 1
  • pseudodiploid 1
  • random draw 1
  • selection 1
  • seq 1
  • header 1
  • interleave 1
  • sequence headers 1
  • subseq 1
  • longread 1
  • de-novo 1
  • grep 1
  • mapping-based 1
  • amplicon 1
  • duplicate marking 1
  • calmd 1
  • rtg 1
  • integrity 1
  • ampliconclip 1
  • faidx 1
  • peak-caller 1
  • seacr 1
  • chromatin 1
  • cut&run 1
  • cut&tag 1
  • insert size 1
  • readgroup 1
  • read pairs 1
  • paired 1
  • repair 1
  • train 1
  • spliced 1
  • reorder 1
  • estimate 1
  • representations 1
  • reduced 1
  • mash/sketch 1
  • taxonomic assignment 1
  • mcr-1 1
  • Hidden Markov Model 1
  • amino acid 1
  • HMMER 1
  • effective genome size 1
  • quant 1
  • kallisto/index 1
  • sequencing summary 1
  • somatic structural variations 1
  • mobile element insertions 1
  • cancer genome 1
  • unionsum 1
  • genome annotation 1

This script extracts sequences in fasta format according to features described in a gff file.

fasta versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Identify antimicrobial resistance in gene or protein sequences

report mutation_report versions tool_version db_version

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Identify antimicrobial resistance in gene or protein sequences

NO input

db versions

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Aggregates fastq files with demultiplexed reads

fastq versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Run the alignment/variant-call/consensus logic of the artic pipeline

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

This module is used to clip primer sequences from your alignments.

bam bai versions

Barrnap uses a HMMER profile to find rRNAs in reads or contig FASTA files

gff versions

Filter out sequences by sequence header name(s)

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.
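
Filtering sequences by header name can be pictured with the minimal Python sketch below. It is not BBMap's implementation: the helper names parse_fasta and filter_by_name and the include switch are hypothetical, and the sketch assumes that a sequence ID is the header text up to the first whitespace.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_name(records, names, include=False):
    """Drop (or, with include=True, keep) records whose ID is in `names`.

    Assumption: the ID is the header text up to the first whitespace.
    """
    names = set(names)
    for header, seq in records:
        seq_id = header.split(None, 1)[0] if header.strip() else ""
        if (seq_id in names) == include:
            yield header, seq


fasta_text = """>seq1 sample A
ACGT
>seq2 sample B
GGCC
>seq3 sample C
TTAA
"""

# Filter out seq2; the two other records pass through unchanged.
kept = list(filter_by_name(parse_fasta(fasta_text), {"seq2"}))
# Keep only seq2 instead.
only = list(filter_by_name(parse_fasta(fasta_text), {"seq2"}, include=True))
```

Records are streamed one at a time, so the same approach would work on files too large to hold in memory.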

Creates a consensus sequence by applying VCF variants to a reference FASTA file

fasta versions

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.
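
As a rough illustration of what applying VCF variants to a reference means, the Python sketch below substitutes single bases at 1-based positions, mirroring the POS, REF and ALT columns of a VCF record. It is a simplification made for this example, not the tool's algorithm: indels, genotypes and multi-allelic sites are ignored, and the function name apply_snvs is hypothetical.

```python
def apply_snvs(reference, variants):
    """Return `reference` with single-base substitutions applied.

    `variants` is an iterable of (pos, ref, alt) tuples with 1-based
    positions. Only single-base substitutions are handled in this sketch.
    """
    bases = list(reference)
    for pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("this sketch only handles single-base substitutions")
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the reference")
        if bases[pos - 1] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


reference = "ACGTACGT"
variants = [(2, "C", "T"), (8, "T", "A")]
consensus = apply_snvs(reference, variants)
```

Checking the REF allele before substituting catches the common mistake of pairing a VCF with the wrong reference.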

bcftools Haplotype-aware consequence caller

vcf tbi csi versions

reheader:

Haplotype-aware consequence caller

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

genomecov versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
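
The coverage histogram and per-base report described for this module can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how bedtools computes them: a single chromosome, features given as 0-based half-open (start, end) pairs, and the hypothetical helper names per_base_depth and depth_histogram.

```python
from collections import Counter


def per_base_depth(chrom_length, intervals):
    """Return a list holding the number of features covering each base."""
    depth = [0] * chrom_length
    for start, end in intervals:
        # Clamp each feature to the chromosome bounds before counting.
        for i in range(max(start, 0), min(end, chrom_length)):
            depth[i] += 1
    return depth


def depth_histogram(depth):
    """Map each depth value to the number of bases observed at that depth."""
    return dict(sorted(Counter(depth).items()))


intervals = [(0, 4), (2, 6), (8, 10)]
depth = per_base_depth(10, intervals)
histogram = depth_histogram(depth)
```

The per-base list corresponds to the per-base report, and the histogram summarises it as counts of bases at each depth.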

extract sequences in a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

masks sequences in a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

010101

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Retrieve entries from a BLAST database

01201

fasta text versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

txt versions

blast:

BLAST finds regions of similarity between biological sequences.

BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database

01010

xml tsv csv versions

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

Builds a BLAST database

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

txt versions

blast:

Protein to Translated Nucleotide BLAST.

Downloads a BLAST database from NCBI

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a sequence subject

0101

psl versions

Align reads to a reference genome using bowtie

01010

bam log fastq versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create bowtie index for reference genome

01

index versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

01010100

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie index for reference genome

01

index versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

010

reports txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

01

db bracken_files versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

01

txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Find SA coordinates of the input reads for bwa short-read mapping

0101

sai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA index for reference genome

01

index versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

bam cram csi crai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-mem2 index for reference genome

01

index versions

bwamem2:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

sam bam cram crai csi versions

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101010101

orf2lca bin2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101

orf2lca contig2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101001010101010101

rat_log complete_abundance contig_abundance read2classification alignment_diamond contig2classification cat_log orf2lca faa gff unmapped_diamond unmapped_fasta unmapped2classification versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Cluster protein sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Cluster nucleotide sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.

0000

reference versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.

010

outs versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Build centrifuge database for taxonomic profiling

010000

cf versions

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

01000

report results sam fastq_mapped fastq_unmapped versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

010

kreport versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

binning of metagenomic sequences

01

fasta bins fm index links result versions

Predict recombination events in bacterial genomes

012

emsim em status newick fasta pos_ref versions

Align sequences using Clustal Omega

010100000

alignment versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

Renders a guidetree in clustalo

01

tree versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

Calculates polymorphic site rates over protein coding genes

01234

polymut versions

cmseq:

Set of utilities on sequences and BAM files

Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.

0101

bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Calculates peak-to-trough ratio (PTR) from metagenomic sequence data

01

ptr versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Annotate a VEP annotated VCF with the most severe consequence field

0101

vcf versions

custom:

Custom module to annotate a VEP annotated VCF with the most severe consequence field

Perform adapter/quality trimming on sequencing reads

01

reads log versions

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

Queries a DIAMOND database using blastp mode

010100

blast xml txt daa sam tsv paf versions

diamond:

Accelerated BLAST compatible local sequence aligner

Queries a DIAMOND database using blastx mode

010100

blast xml txt daa sam tsv paf log versions

diamond:

Accelerated BLAST compatible local sequence aligner

calculate clusters of highly similar sequences

01

tsv versions

diamond:

Accelerated BLAST compatible local sequence aligner

Builds a DIAMOND database

01000

db versions

diamond:

Accelerated BLAST compatible local sequence aligner

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.

01

consensus versions

emboss:

The European Molecular Biology Open Software Suite

the revseq program from emboss reverse complements a nucleotide sequence

01

revseq versions

emboss:

The European Molecular Biology Open Software Suite
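
As a rough illustration of what reverse complementing a nucleotide sequence means, the sketch below reverses a DNA string and swaps each base for its complement. It is not the EMBOSS revseq implementation; the function name `reverse_complement` is hypothetical, and the sketch assumes only the unambiguous bases A, C, G, T plus N.

```python
# Illustrative sketch only; this is not how EMBOSS revseq is implemented.
# Assumption: input contains only A, C, G, T, N (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))
```

Applying the function twice returns the original sequence, which is a quick sanity check for any reverse-complement routine.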

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

010

outseq versions

emboss:

The European Molecular Biology Open Software Suite

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

0123

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

010

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

0120000010

vcf tbi tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Searches a term in a public NCBI database

010

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using Unique Identifier(s)

0120

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using an UID

01000

txt versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

phylogenetic placement of query sequences in a reference tree

012300

epang jplace log versions

epang:

Massively parallel phylogenetic placement of genetic sequences

splits an alignment into reference and query parts

012

query reference versions

epang:

Massively parallel phylogenetic placement of genetic sequences

Run falco on sequenced reads

01

html txt versions

fastqc:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

Aligns sequences using FAMSA

01010

alignment versions

famsa:

Algorithm for large-scale multiple sequence alignments

Renders a guidetree in famsa

01

tree versions

famsa:

Algorithm for large-scale multiple sequence alignments

tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

A program that counts sequence occurrences in FASTQ files.

0101

count_matrix stats distribution_plot reads_plot reads_plot_percentage versions

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.

Run FastQC on sequenced reads

01

html zip versions

Build fastq screen config file from bowtie index files

00

database versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Align reads to multiple reference genomes using fastq-screen

010

txt png html fastq versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Produces a Newick format phylogeny from a multiple sequence alignment. Capable of bacterial genome size alignments.

0

phylogeny versions

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

01

fasta versions

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
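
The idea of collapsing identical sequences while keeping read counts can be sketched in a few lines. This is a minimal illustration, not the FASTX-Toolkit code: the function name `collapse_sequences` and the tie-breaking order are assumptions made for the example, and it works on plain strings rather than FASTA/FASTQ records.

```python
# Illustrative sketch only; not the FASTX-Toolkit implementation.
from collections import Counter


def collapse_sequences(seqs):
    """Collapse identical sequences into (sequence, count) pairs.

    Hypothetical helper: pairs are ordered by descending count, with ties
    broken alphabetically so the output is deterministic.
    """
    counts = Counter(seqs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
    for sequence, count in collapse_sequences(reads):
        print(sequence, count)
```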

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

0100

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Using the fgbio tools, converts sequenced FASTQ files into unaligned BAM or CRAM files, possibly moving the UMI barcode into the RX field of the reads

01

bam cram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5' mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

010

bam histogram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

0

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Build ganon database using custom reference sequences.

01000

db info versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

010

tre report one all unc log versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

010

tre versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

01

txt versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

assigns taxonomy to query sequences in phylogenetic placement output

012

examineassign profile labelled_tree per_query krona sativa versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Grafts query sequences from phylogenetic placement on the reference tree

01

newick versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.

000

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

01

dict versions

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts GFA or rGFA files to FASTA

01

fasta versions

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Summary statistics for GFA files

01

stats versions

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0100

sam versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0

index versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)

01

fasta report tails versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

0

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

Reformat a Multiple Sequence Alignment (MSA) file

0100

msa versions

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

Mask multiple sequence alignments

012345670

maskedaln fmask_rf fmask_all gmask_rf gmask_all pmask_rf pmask_all versions

hmmer:

Biosequence analysis using profile hidden Markov models

reformats sequence files, see HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tool's help for possible values.

01

seqreformated versions

hmmer:

Biosequence analysis using profile hidden Markov models

hmmalign from the HMMER suite aligns a number of sequences to an HMM profile

010

sto versions

hmmer:

Biosequence analysis using profile hidden Markov models

create an hmm profile from a multiple sequence alignment

010

hmm hmmbuildout versions

hmmer:

Biosequence analysis using profile hidden Markov models

extract hmm from hmm database file or create index for hmm database

01000

hmm index versions

hmmer:

Biosequence analysis using profile hidden Markov models

compress and index profile database for hmmscan

01

compressed_db versions

hmmer:

Biosequence analysis using profile hidden Markov models

R script that scores output from multiple runs of hmmer/hmmsearch

01

hmmrank versions

hmmer:

Biosequence analysis using profile hidden Markov models

R:

A Language and Environment for Statistical Computing

Tidyverse:

Tidyverse: R packages for data science

search profile(s) against a sequence database

012345

output alignments target_summary domain_summary versions

hmmer:

Biosequence analysis using profile hidden Markov models

Create a tag directory with the HOMER suite

010

tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Search covariance models against a sequence database

01200

output alignments target_summary versions

infernal:

Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.

012000000000000

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

IsoSeq - Cluster - Cluster trimmed consensus sequences

01

bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi versions

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

metabam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Generate a consensus sequence from a BAM file using iVar

0100

fasta qual mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

0120

bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Taxonomic classification of metagenomic sequence data using a protein reference database

010

results versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Aligns sequences using kalign

010

alignment versions

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

Create kallisto index

01

index versions

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

01010000

results json_info log versions

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Creates a histogram of the number of distinct k-mers having a given frequency.

01

hist json png ps pdf jellyfish_hash versions

kat:

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts
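
A histogram of "the number of distinct k-mers having a given frequency" can be illustrated with a short sketch: count every k-mer first, then count how many distinct k-mers share each count. This is not KAT's implementation. The function name `kmer_spectrum` is hypothetical, and the sketch deliberately ignores details a real tool may handle, such as canonical (strand-independent) k-mers and hash-based counting.

```python
# Illustrative sketch only; not how KAT computes its histograms.
from collections import Counter


def kmer_spectrum(seqs, k):
    """Map each frequency to the number of distinct k-mers seen that often.

    Hypothetical helper; k-mers are counted as literal substrings, without
    merging a k-mer with its reverse complement.
    """
    kmer_counts = Counter()
    for seq in seqs:
        # Slide a window of width k across the sequence.
        for start in range(len(seq) - k + 1):
            kmer_counts[seq[start:start + k]] += 1
    # Histogram over the counts: frequency -> number of distinct k-mers.
    return dict(sorted(Counter(kmer_counts.values()).items()))


if __name__ == "__main__":
    print(kmer_spectrum(["AAAAC"], 2))
```

In the example, the 2-mer AA occurs three times and AC once, so one distinct k-mer has frequency 1 and one has frequency 3.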

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

00

report kmers versions

khmer:

khmer k-mer counting library

Generate k-mers (sketches) from FASTA/Q sequences

01

outdir info versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

01

kmcp log versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

01

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

010

profile versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

010

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Adds fasta files to a Kraken2 taxonomic database

010000

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

010

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Downloads and builds Kraken2 standard database

0

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Classifies metagenomic sequence data

01000

classified_reads_fastq unclassified_reads_fastq classified_reads_assignment report versions

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Classifies metagenomic sequence data using unique k-mer counts

012000000

classified_reads unclassified_reads classified_assignment report versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Makes a dotplot (Oxford Grid) of pair-wise sequence alignments

0120100

gif png versions

last:

LAST finds & aligns related regions of sequences.

Aligns query sequences to target sequences indexed with lastdb

0120

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Prepare sequences for subsequent alignment with lastal.

01

index versions

last:

LAST finds & aligns related regions of sequences.

Converts MAF alignments to another format.

012010101

axt_gz bam blast_gz blasttab_gz chain_gz cram gff_gz html_gz psl_gz sam_gz tab_gz versions

last:

LAST finds & aligns related regions of sequences.

Reorder alignments in a MAF file

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Post-alignment masking

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Find split or spliced alignments in a MAF file

01

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Find suitable score parameters for sequence alignment

010

param_file multiqc versions

last:

LAST finds & aligns related regions of sequences.

Align sequences using learnMSA

01

alignment versions

learnmsa:

learnMSA: Learning and Aligning large Protein Families

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

01

scn gff versions

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments

01000

log lai_out versions

lai:

Assessing genome assembly quality using the LTR Assembly Index (LAI)

Multiple sequence alignment using MAFFT

0101010101010

fas versions

pigz:

Parallel implementation of the gzip algorithm.

Multiple sequence alignment using MAFFT

0101010101010

fas versions

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

pigz:

Parallel implementation of the gzip algorithm.

Guide tree rendering using MAFFT

01

tree versions

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

Multiple Sequence Alignment using Graph Clustering

01010

alignment versions

magus:

Multiple Sequence Alignment using Graph Clustering

Multiple Sequence Alignment using Graph Clustering

01

tree versions

magus:

Multiple Sequence Alignment using Graph Clustering

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

0000

index versions log

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

010

rma6 alignments log versions

malt:

A tool for mapping metagenomic data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Calculate Mash distances between reference and query sequences

010

dist versions

mash:

Fast sequence distance estimator that uses MinHash

Screens query sequences against large sequence databases

0101

screen versions

mash:

Fast sequence distance estimator that uses MinHash

Creates vastly reduced representations of sequences using MinHash

01

mash stats versions

mash:

Fast sequence distance estimator that uses MinHash
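
The MinHash idea behind the three Mash entries above can be illustrated with a small sketch. This is not Mash's implementation: the SHA-1 hash, the tiny default k-mer size and sketch size, and the function names are assumptions chosen for readability. It builds a bottom-s sketch (the smallest k-mer hashes), estimates the Jaccard index from two sketches, and converts it to a Mash-style distance.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=50):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes (SHA-1 is an assumption)."""
    hashes = {int(hashlib.sha1(km.encode()).hexdigest(), 16) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sk_a, sk_b, size=50):
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:size]
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=5):
    """Mash-style distance D = -ln(2j / (1 + j)) / k; defined as 1.0 when j is 0."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Two identical sequences give a Jaccard estimate of 1.0 and a distance of 0; sequences that share no k-mers give a distance of 1.0. Real sketches use far larger k and sketch sizes than these toy defaults.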

Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation

01

tsv fa versions

mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences

01

fasta versions

A tool to create consensus sequences and variant calls from nanopore sequencing data

012

assembly versions

A genomic k-mer counter (and sequence utility) with nice features.

010

meryl_db versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

010

hist versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

010

meryl_db versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.
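
As a rough illustration of what a k-mer counter and its histogram output compute, here is a minimal sketch. It does not reflect meryl's data structures or file formats; the function names are hypothetical, and treating a k-mer and its reverse complement as the same (canonical) k-mer is an assumption about how counting is configured.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seq, k):
    """Count canonical k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def histogram(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())
```

For the toy input `ACGTACGT` with k=4, the canonical k-mers `ACGT` and `CGTA` each occur twice and `GTAC` once, so the histogram has two k-mers at multiplicity 2 and one at multiplicity 1.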

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

01010000

paf bam index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Provides fasta index required by minimap2 alignment.

01

index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0101

paf gff versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

Provides fasta index required by miniprot alignment.

01

index versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.

012010123

outputs versions

mirdeep2:

miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs. It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.

mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.

0101012

tsv versions

mirtop:

Small RNA-seq annotation

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

Cluster sequences using MMSeqs2 cluster.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create an MMseqs database from an existing FASTA/Q file

01

db versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Creates sequence index for mmseqs database

01

db_indexed versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create a tsv file from a query and a target database as well as the result database

010101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Download an mmseqs-formatted database

0

database versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

0101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences in linear time using MMSeqs2 linclust.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Search and calculate a score for similar sequences in a query and a target database.

0101

db_search versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Computes the lowest common ancestor by identifying the query sequence homologs against the target database.

010

db_taxonomy versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Converts expandable profile databases to the MMseqs2 database format

0

db_exprofile versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Aligns protein structures using mTM-align

010

alignment structure versions

mTM-align:

Algorithm for structural multiple sequence alignments

pigz:

Parallel implementation of the gzip algorithm.

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

01

aligned_fasta phyi phys clustalw html msf tree log versions

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.

010

alignment versions

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

01

html png txt log versions

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

0101

vecscreen_output versions

ncbitools:

NCBI libraries for biology applications (text-based utilities)

Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

00

dataset versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

010

csv csv_errors csv_insertions tsv json json_auspice ndjson fasta_aligned fasta_translation nwk versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

012

delta coords versions

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence

0101010

bam bai qc_metrics bqsr_table duplicate_metrics versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Paraclu finds clusters in data attached to sequences.

010

bed versions

Creates a sequence dictionary for a reference sequence.

01

reference_dict versions

picard:

Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

012000

bp cem del dd int_final inv li rp si td versions

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

Identify plasmids in bacterial sequences and assemblies

01

json txt tsv genome_seq plasmid_seq versions

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

01

reads log versions

Calculate pairwise nucleotide identity with respect to a reference sequence

01010

valid_fasta invalid_fasta report log versions

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

good_reads single_reads bad_reads log versions

frame-shift correction for long read (meta)genomics - fix frameshifts in reads

0101

out_fa versions

proovframe:

frame-shift correction for long read (meta)genomics

frame-shift correction for long read (meta)genomics - maps proteins to reads

012

tsv versions

proovframe:

frame-shift correction for long read (meta)genomics

Proteinortho is a tool to detect orthologous genes within different species.

01

orthologgroups orthologgraph blastgraph versions

Calculate coverage cutoffs to determine when to purge duplicated sequence.

01

cutoff log versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Separates out sequences purged of falsely duplicated sequences.

012

haplotigs purged versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

012345678910111213141501

vcf tbi rdata plots versions

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.

0

stockholm_alignment phylogeny versions

Screening DNA sequences for interspersed repeats and low complexity DNA sequences

010

masked out tbl gff versions

repeatmasker:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences

ResFinder identifies acquired antimicrobial resistance genes in fully or partially sequenced isolates of bacteria

01200

json disinfinder_kma pheno_table_species pheno_table pointfinder_kma pointfinder_prediction pointfinder_results pointfinder_table resfinder_hit_in_genome_seq resfinder_blast resfinder_kma resfinder_resistance_gene_seq resfinder_results_table resfinder_results_tab resfinder_results versions

resfinder:

ResFinder identifies acquired antimicrobial resistance genes in fully or partially sequenced isolates of bacteria

Predict antibiotic resistance from protein or nucleotide data

0100

json tsv tmp tool_version db_version versions

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Generate statistics from a bam file

01

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Infer strandedness from sequencing reads

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate inner distance between read pairs.

010

distance freq mean pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

010

xls rscript log bed interact_bed pdf events_pdf versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

010

pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate read duplication rate

01

seq_xls pos_xls pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate TIN (transcript integrity number) from RNA-seq reads

0120

txt xls versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

sdf versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

This module combines samtools and samblaster in order to use samblaster's ability to filter or tag SAM files, with the advantage of keeping both input and output in BAM format. Samblaster input must contain a sequence header, so the input is piped through the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions

Clips read alignments where they match BED file defined regions

01000

bam stats rejects_bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

calculates MD and NM tags

0101

bam versions

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Concatenate BAM or CRAM file

01

bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces a consensus FASTA/FASTQ/PILEUP

01

fasta fastq pileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

convert and then index CRAM -> BAM or BAM -> CRAM file

0120101

bam cram bai crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

produces a histogram or table of coverage per chromosome

0120101

coverage versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

List CRAM Content-ID and Data-Series sizes

01

size versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Create a sequence dictionary file from a FASTA file

01

dict versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index FASTA file, and optionally generate a file of chromosome sizes

01010

fa fai sizes gzi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts a SAM/BAM/CRAM file to FASTQ

010

fastq interleaved singleton other versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

bam cram sam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type

012

flagstat versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
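
Counting alignments per FLAG type amounts to testing the bits of the SAM FLAG field. The sketch below uses the bit values defined in the SAM specification; the function names and the short labels are illustrative, and the output does not mimic the actual flagstat report layout.

```python
from collections import Counter

# Bit values as defined in the SAM specification; labels are shortened here.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def flag_counts(flags):
    """Count how many alignments have each FLAG bit set."""
    counts = Counter()
    for flag in flags:
        counts.update(decode_flag(flag))
    return counts
```

For example, FLAG 99 decodes to a paired, properly paired first read whose mate is on the reverse strand, and FLAG 4 marks an unmapped read.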

filter/convert SAM/BAM/CRAM file

01

readgroup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Reports alignment summary statistics for a BAM/CRAM/SAM file

012

idxstats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

converts FASTQ files to unmapped SAM/BAM/CRAM

01

sam bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index SAM/BAM/CRAM file

01

bai csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Merge BAM or CRAM file

010101

bam cram csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

BAM

0120

mpileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.

01

bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

0101

bam cram csi crai metrics versions

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Sort SAM/BAM/CRAM file

0101

bam cram crai csi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces comprehensive statistics from SAM/BAM/CRAM file

01201

stats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

0120100

bam cram sam bai csi crai unselected unselected_index versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Call peaks using SEACR on sequenced reads in bedgraph format

0120

bed versions

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.

01

fastq versions

seqcluster:

Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the samples of the project.

Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.

01

fasta versions

seqfu:

DNA sequence utilities for FASTX files
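
Dereplication as described for this module can be sketched in a few lines. This is an illustration, not seqfu's code: the function name is hypothetical, the `;size=N` header suffix is an assumed annotation style, and the summing of counts from already dereplicated input is not implemented here.

```python
def dereplicate(records):
    """Collapse identical sequences.

    records: iterable of (name, sequence) pairs.
    Returns a list of ("name;size=N", sequence), most abundant first,
    keeping the name of the first record seen for each sequence.
    """
    counts = {}
    first_name = {}
    for name, seq in records:
        key = seq.upper()
        counts[key] = counts.get(key, 0) + 1
        first_name.setdefault(key, name)
    # sorted() is stable, so ties keep their order of first appearance.
    ordered = sorted(counts, key=lambda s: -counts[s])
    return [(f"{first_name[s]};size={counts[s]}", s) for s in ordered]
```

Given five records where `GGGG` appears three times and `ACGT` twice (once in lower case), the result lists `GGGG` with `size=3` followed by `ACGT` with `size=2`.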

Concatenating multiple uncompressed sequence files together

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTA/Q to tabular format, and provide various information, like sequence length, GC content/GC skew.

01

text versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Select sequences from a large file based on name/ID

010

filter versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Subset FASTA/FASTQ files to some number of sequences

012

subset versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Use seqkit to find/replace strings within sequences and sequence headers

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx log versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Sorts sequences by id/name/sequence/length

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Translate DNA/RNA to protein sequence

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Computes sequence statistics from FASTQ or FASTA files

01

seqtk_stats versions

Generates a BED file containing genomic locations of lengths of N.

01

bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.

Interleave paired-end reads from FastQ files
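
For the module above that reports runs of N as BED intervals, a minimal sketch of the underlying idea follows. It is not seqtk's implementation; the function name and the `min_len` parameter are assumptions. Coordinates are 0-based and half-open, as in the BED format.

```python
import re


def n_runs_to_bed(name, seq, min_len=1):
    """Yield (name, start, end) for each run of N at least `min_len` long."""
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            yield (name, match.start(), match.end())
```

For a sequence `ACNNNGTNAC` named `chr1`, this yields the intervals `chr1 2 5` and `chr1 7 8`; with `min_len=2` only the first is reported.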

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.

Rename sequence names in FASTQ or FASTA files.

01

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.

Subsample reads from FASTQ files

012

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.

Common transformation operations on FASTA or FASTQ files.

01

fastx versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

010

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Trim low quality bases from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Sequence quality metrics for FASTQ and uBAM files.

01

json html versions

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

0100

eigenstrat plink freqsum versions

sequencetools:

Tools for population genetics on sequencing data

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

01

wig versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, to input files for the Sequenza R package. The program gc_wiggle takes a FASTA file as input, computes GC percentage across the sequences and returns a file in the UCSC wiggle format.
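
The windowed GC computation described above can be sketched as follows. This is a simplified illustration, not the Sequenza-utils code: the function name is hypothetical, windows are non-overlapping, and ambiguous bases such as N are simply counted as non-GC, which may differ from the real tool. Writing the result in wiggle format is left out.

```python
def gc_windows(seq, window):
    """Return (start, end, gc_percent) for consecutive non-overlapping windows."""
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        rows.append((start, start + len(chunk), 100.0 * gc / len(chunk)))
    return rows
```

With the sequence `GGCCAATTGC` and a window of 4, the three windows have GC percentages of 100, 0 and 100; the last window is shorter because the sequence length is not a multiple of the window size.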

Induce a variation graph in GFA format from alignments in PAF format

012

gfa versions

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so the assembly may be considered somewhat non-deterministic: the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

01

assembly gfa results versions

Pairwise SNP distance matrix from a FASTA sequence alignment

01

tsv versions
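
A pairwise SNP distance matrix can be computed from an alignment by counting differing columns for every pair of sequences. The sketch below is illustrative only: the function names are hypothetical, and the choice to compare only columns where both sequences carry A, C, G or T (ignoring gaps and ambiguous bases) is an assumption about the tool's default behaviour.

```python
def snp_distance(a, b):
    """Count aligned columns where both bases are A/C/G/T and they differ."""
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def distance_matrix(alignment):
    """alignment: dict of name -> aligned sequence. Returns a nested dict of distances."""
    names = list(alignment)
    return {
        p: {q: snp_distance(alignment[p], alignment[q]) for q in names}
        for p in names
    }
```

For three aligned sequences `ACGTAC`, `ACGTTC` and `A-GTTN`, the first differs from each of the others at one position, while the second and third have distance 0 because their only mismatching columns contain a gap or an N.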

Local sequence alignment tool for filtering, mapping and clustering.

010101

reads log index versions

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts aligned and rejected reads into two separate files. Additional applications include clustering and taxonomy assignment available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Create a signature (a group of FracMinHash sketches) of a sequence using sourmash

01

signatures versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).

0100

reads versions

sratools:

SRA Toolkit and SDK from NCBI

Download sequencing data from the NCBI Sequence Read Archive (SRA).

0100

sra versions

sratools:

SRA Toolkit and SDK from NCBI

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, an MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc.) and report the presence of STs and/or reference genes.

012

gene_results fullgene_results mlst_results pileup sorted_bam versions

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Advanced sequence file format conversions

01000

cram gzi versions

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Align reads to a reference genome using STAR

010101000

log_final log_out log_progress versions bam bam_sorted bam_sorted_aligned bam_transcript bam_unsorted fastq tab spl_junc_tab read_per_gene_tab junction sam wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create index for STAR

0101

index versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Get the minimal allowed index version from STAR

NO input

index_version versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Aligns sequences using T_COFFEE

01010120

alignment lib versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Compares 2 alternative MSAs to evaluate them.

012

scores versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, and Protein Sequences

pigz:

Parallel implementation of the gzip algorithm.

Computes a consensus alignment using T_COFFEE

01010

alignment eval versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats the header of PDB files with t-coffee

01

formatted_pdb versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Computes the iRMSD score for a given alignment and the corresponding structures.

01012

irmsd versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, and Protein Sequences

pigz:

Parallel implementation of the gzip algorithm.

Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package

01010120

alignment versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

01

formatted_file versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Compute the TCS score for an MSA, or for an MSA plus a library file. Outputs the TCS file as is, plus a CSV containing just the total TCS score.

0101

tcs scores versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, and Protein Sequences

pigz:

Parallel implementation of the gzip algorithm.

Domain-level classification of contigs as bacterial, archaeal, eukaryotic, or organellar

01

classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

01

explore_tsv top_sequence versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
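
The canonical form mentioned in the tidk explore description (TTAGG -> AACCT) can be reproduced with a small Python sketch. It assumes that "canonical" means the lexicographically smallest string among all rotations of the repeat unit and of its reverse complement. That definition matches the example above, but it is an assumption, not a statement of tidk's exact rule, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq):
    """All cyclic rotations of a repeat unit."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit):
    """Smallest rotation over both strands (assumed definition of canonical)."""
    unit = unit.upper()
    candidates = rotations(unit) + rotations(reverse_complement(unit))
    return min(candidates)


print(canonical_repeat("TTAGG"))  # AACCT
```

Reducing every repeat to one representative means TTAGG, GGTTA and CCTAA are all counted as the same telomeric unit.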

Create a FASTA consensus with the TOPAS toolkit, with options to penalize substitutions typical of the DNA damage present in ancient DNA

010101010

fasta vcf ccf log versions

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

A post sequencing QC tool for Oxford Nanopore sequencers

01

report_data report_html plots_html plotly_js versions

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

01

pep gff3 cds dat folder versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

010

pep gff3 cds bed versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Detection of tRNA sequences using covariance models

01

tsv log stats fasta gff bed versions

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

Calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

Calculate locally stable secondary structures of RNAs

0

rnalfold_txt versions

viennarna:

Calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

01

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

0101

bins membership summary versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Merge strictly identical sequences contained in the input file. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
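
The identity rule in the dereplication description (same length, same nucleotides, case insensitive, T and U equivalent) can be illustrated with a short Python sketch. This is not VSEARCH code. The function names are hypothetical, and returning the groups by decreasing count with the first-seen name as representative is a choice made for the example.

```python
def normalise(seq):
    """Upper-case the sequence and treat U as T, per the identity rule."""
    return seq.upper().replace("U", "T")


def dereplicate(records):
    """Merge identical sequences.

    records: iterable of (name, sequence) pairs.
    Returns (name, sequence, count) tuples, most abundant first; the name
    and sequence of the first occurrence represent each group.
    """
    groups = {}
    for name, seq in records:
        key = normalise(seq)
        if key in groups:
            groups[key][2] += 1
        else:
            groups[key] = [name, seq, 1]
    # sorted() is stable, so ties keep their first-seen order.
    return sorted((tuple(g) for g in groups.values()), key=lambda g: -g[2])


reads = [("a", "ACGU"), ("b", "acgt"), ("c", "ACG"), ("d", "ACGT")]
print(dereplicate(reads))  # [('a', 'ACGU', 3), ('c', 'ACG', 1)]
```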

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

010

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

010

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
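
The two sort orders named in the sorting module description can be sketched in Python as follows. This is an illustration only, not VSEARCH code. It assumes abundance is stored in the header as a `size=N` annotation delimited by semicolons, and that a header without the annotation counts as abundance 1. The function names are hypothetical.

```python
import re

# Matches a "size=N" annotation delimited by ';' or the header boundaries.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def abundance(header):
    """Abundance from a 'size=N' annotation; 1 if absent (an assumption)."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def sort_by_size(records):
    """Sort (header, sequence) pairs by decreasing abundance."""
    return sorted(records, key=lambda r: abundance(r[0]), reverse=True)


def sort_by_length(records):
    """Sort (header, sequence) pairs by decreasing sequence length."""
    return sorted(records, key=lambda r: len(r[1]), reverse=True)


records = [("s1;size=2", "ACGT"), ("s2;size=10", "AC"), ("s3", "ACGTAC")]
print([h for h, _ in sort_by_size(records)])    # ['s2;size=10', 's1;size=2', 's3']
print([h for h, _ in sort_by_length(records)])  # ['s3', 's1;size=2', 's2;size=10']
```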

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

010000

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Decomposes multiallelic variants into biallelic variants in a VCF file.

012

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data
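
To make the multiallelic decomposition concrete, the Python sketch below splits one tab-separated VCF data line with several ALT alleles into one line per allele. It is deliberately simplified and is not how vt is implemented: all remaining columns are copied unchanged, whereas a real tool also has to rewrite per-allele INFO fields and genotypes. The function name is hypothetical.

```python
def decompose_vcf_line(line):
    """Split a multiallelic VCF data line into one line per ALT allele.

    Simplification: columns after ALT (QUAL, FILTER, INFO, ...) are copied
    as-is, which is not correct for per-allele annotations or genotypes.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    rest = fields[5:]
    return [
        "\t".join([chrom, pos, variant_id, ref, allele] + rest)
        for allele in alt.split(",")
    ]


for record in decompose_vcf_line("chr1\t100\t.\tA\tC,T\t50\tPASS\t."):
    print(record)
```

A line that is already biallelic passes through as a single unchanged record.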

Decomposes biallelic block substitutions into its constituent SNPs.

0123

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data
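
Decomposing a block substitution can be pictured as walking the REF and ALT alleles position by position. The Python sketch below illustrates the idea under the assumption that REF and ALT have equal length. It is not vt's implementation, the function name is hypothetical, and skipping positions where the two alleles agree is a choice made for the example.

```python
def decompose_block_substitution(pos, ref, alt):
    """Split an equal-length REF/ALT block into per-position SNPs.

    Returns (position, ref_base, alt_base) tuples; positions where the
    bases already match are skipped (a choice made in this sketch).
    """
    if len(ref) != len(alt):
        raise ValueError("block substitutions need REF and ALT of equal length")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


print(decompose_block_substitution(100, "ACG", "TCA"))
# [(100, 'A', 'T'), (102, 'G', 'A')]
```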

Normalizes variants in a VCF file

01230101

vcf fai versions

vt:

A tool set for short variant discovery in genetic sequence data

Simulates sequence reads from a reference genome

01

fastq versions

Masks out highly repetitive DNA sequences with low complexity in a genome

01

converted versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program to generate frequency counts of repetitive units.

01

counts versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program that takes a counts file and creates a file of genomic coordinates to be masked.

0101

intervals versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A tool to build a k-mer hash table for FASTA and FASTQ files

01

yak versions

yak:

Yet another k-mer analyzer
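
As a conceptual illustration of a k-mer hash table, the Python sketch below counts every k-mer across a set of sequences using a dictionary. It is not yak's data structure or file format. The function name is hypothetical, and skipping k-mers that contain N is an assumption made for the example.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count k-mers across sequences, skipping any k-mer containing N."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


table = count_kmers(["ACGTAC", "acgn"], k=3)
print(table["ACG"])  # 2
```

The same table works for FASTA and FASTQ input because only the sequence strings are used; quality values are ignored.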
