Available Modules
Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Identify antimicrobial resistance in gene or protein sequences
0
1
0
report
mutation_report
versions
tool_version
db_version
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Identify antimicrobial resistance in gene or protein sequences
NO input
db
versions
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Aggregates fastq files with demultiplexed reads
0
1
fastq
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
0
1
0
1
2
0
1
2
results
bam
bai
bam_trimmed
bai_trimmed
bam_primertrimmed
bai_primertrimmed
fasta
vcf
tbi
json
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
This module is used to clip primer sequences from your alignments.
0
1
2
3
bam
bai
versions
Filter out sequences by sequence header name(s)
0
1
0
0
0
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Compresses VCF files
0
1
2
3
4
fasta
versions
Create consensus sequence by applying VCF variants to a reference fasta file.
bcftools Haplotype-aware consequence caller
0
1
0
1
0
1
0
1
vcf
tbi
csi
versions
Haplotype-aware consequence caller
Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.
0
1
2
0
0
0
genomecov
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Extracts sequences from a FASTA file based on intervals defined in a feature file.
0
1
0
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Masks sequences in a FASTA file based on intervals defined in a feature file.
0
1
0
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
0
1
0
1
0
1
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Retrieve entries from a BLAST database
0
1
2
0
1
fasta
text
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
txt
versions
BLAST finds regions of similarity between biological sequences.
BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
0
1
0
1
0
xml
tsv
csv
versions
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
Builds a BLAST database
0
1
db
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
txt
versions
Protein to Translated Nucleotide BLAST.
Downloads a BLAST database from NCBI
0
1
db
versions
BLAST finds regions of similarity between biological sequences.
Create bowtie index for reference genome
0
1
index
versions
bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.
0
1
0
reports
txt
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Extends a Kraken2 database to be compatible with Bracken
0
1
db
bracken_files
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Combine output of metagenomic samples analyzed by bracken.
0
1
txt
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Create BWA-mem2 index for reference genome
0
1
index
versions
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
txt
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
0
1
0
1
0
1
orf2lca
bin2classification
log
diamond
faa
gff
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
orf2lca
contig2classification
log
diamond
faa
gff
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
rat_log
complete_abundance
contig_abundance
read2classification
alignment_diamond
contig2classification
cat_log
orf2lca
faa
gff
unmapped_diamond
unmapped_fasta
unmapped2classification
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Cluster protein sequences using sequence similarity
0
1
fasta
clusters
versions
Clusters and compares protein or nucleotide sequences
Cluster nucleotide sequences using sequence similarity
0
1
fasta
clusters
versions
Clusters and compares protein or nucleotide sequences
Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.
0
0
0
0
reference
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.
0
1
0
outs
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Build centrifuge database for taxonomic profiling
0
1
0
0
0
0
cf
versions
Classifier for metagenomic sequences
Classifies metagenomic sequence data
0
1
0
0
0
report
results
sam
fastq_mapped
fastq_unmapped
versions
Centrifuge is a classifier for metagenomic sequences.
Creates Kraken-style reports from centrifuge out files
0
1
0
kreport
versions
Centrifuge is a classifier for metagenomic sequences.
Predict recombination events in bacterial genomes
0
1
2
emsim
em
status
newick
fasta
pos_ref
versions
Align sequences using Clustal Omega
0
1
0
1
0
0
0
0
0
alignment
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Parallel implementation of the gzip algorithm.
Renders a guidetree in clustalo
0
1
tree
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Calculates polymorphic site rates over protein coding genes
0
1
2
3
4
polymut
versions
Set of utilities on sequences and BAM files
Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.
0
1
0
1
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Calculates peak-to-trough ratio (PTR) from metagenomic sequence data
0
1
ptr
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Annotate a VEP annotated VCF with the most severe consequence field
0
1
0
1
vcf
versions
Custom module to annotate a VEP annotated VCF with the most severe consequence field
Perform adapter/quality trimming on sequencing reads
0
1
reads
log
versions
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Queries a DIAMOND database using blastp mode
0
1
0
1
0
0
blast
xml
txt
daa
sam
tsv
paf
versions
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
0
1
0
1
0
0
blast
xml
txt
daa
sam
tsv
paf
log
versions
Accelerated BLAST compatible local sequence aligner
calculate clusters of highly similar sequences
0
1
tsv
versions
Accelerated BLAST compatible local sequence aligner
Builds a DIAMOND database
0
1
0
0
0
db
versions
Accelerated BLAST compatible local sequence aligner
Export assembly segment sequences in GFA 1.0 format to FASTA format
0
1
fasta
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.
0
1
consensus
versions
The European Molecular Biology Open Software Suite
The revseq program from EMBOSS reverse-complements a nucleotide sequence
0
1
revseq
versions
The European Molecular Biology Open Software Suite
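As a rough illustration of what reverse-complementing means, here is a minimal Python sketch. It is not the EMBOSS revseq implementation; the function name `reverse_complement` and the handling of `U` and `N` are assumptions made for this example.

```python
# Illustrative only: a plain-Python reverse complement, not EMBOSS revseq itself.
# Assumption: U is treated like T (complement A) and N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))
```

Characters outside the translation table pass through unchanged, so a real tool would likely validate its input first.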
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
0
1
2
3
cache
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
0
1
0
output
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
0
1
2
0
0
0
0
0
1
0
vcf
tbi
tab
json
report
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Searches a term in a public NCBI database
0
1
0
xml
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using Unique Identifier(s)
0
1
2
0
xml
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using a UID
0
1
0
0
0
txt
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
phylogenetic placement of query sequences in a reference tree
0
1
2
3
0
0
epang
jplace
log
versions
Massively parallel phylogenetic placement of genetic sequences
splits an alignment into reference and query parts
0
1
2
query
reference
versions
Massively parallel phylogenetic placement of genetic sequences
Run falco on sequenced reads
0
1
html
txt
versions
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Aligns sequences using FAMSA
0
1
0
1
0
alignment
versions
Algorithm for large-scale multiple sequence alignments
Renders a guidetree in famsa
0
1
tree
versions
Algorithm for large-scale multiple sequence alignments
A tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.
0
1
0
log
txt
hmm
hmm_genes
orfs
orfs_amino
contigs
contigs_pept
filtered
filtered_pept
fragments
trimmed
spades
metagenome
tmp
versions
A program that counts sequence occurrences in FASTQ files.
0
1
0
1
count_matrix
stats
distribution_plot
reads_plot
reads_plot_percentage
versions
2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.
Build fastq screen config file from bowtie index files
0
0
database
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Align reads to multiple reference genomes using fastq-screen
0
1
0
txt
png
html
fastq
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
0
1
fasta
versions
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
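The idea of collapsing identical sequences while keeping their read counts can be sketched in a few lines of Python. This is only an illustration of the concept, not the FASTX-Toolkit code; the function names and the `rank-count` header layout are assumptions for this example.

```python
from collections import Counter


def collapse_sequences(seqs):
    """Collapse identical sequences into (sequence, count) pairs, most abundant first."""
    counts = Counter(seqs)
    # Sort by descending count, then alphabetically so ties have a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_fasta(collapsed):
    """Render collapsed sequences as FASTA text.

    The '>rank-count' header layout is an assumption made for illustration.
    """
    lines = []
    for rank, (seq, n) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{n}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    reads = ["ACGT", "TTTT", "ACGT"]
    print(to_fasta(collapse_sequences(reads)))
```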
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
0
1
0
0
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
0
1
0
0
bam
versions
Tools for working with genomic and high throughput sequencing data.
Using the fgbio tools, converts FASTQ files sequenced into unaligned BAM or CRAM files possibly moving the UMI barcode into the RX field of the reads
0
1
bam
cram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5' mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.
0
1
0
bam
histogram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
0
fastq
versions
fq is a library to generate and validate FASTQ file pairs.
Build ganon database using custom reference sequences.
0
1
0
0
0
db
info
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Classify FASTQ files against ganon database
0
1
0
tre
report
one
all
unc
log
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a ganon report file from the output of ganon classify
0
1
0
tre
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a multi-sample report file from the output of ganon report runs
0
1
txt
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
assigns taxonomy to query sequences in phylogenetic placement output
0
1
2
examineassign
profile
labelled_tree
per_query
krona
sativa
versions
Genesis Applications for Phylogenetic Placement Analysis
Grafts query sequences from phylogenetic placement on the reference tree
0
1
newick
versions
Genesis Applications for Phylogenetic Placement Analysis
This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.
0
0
0
str_table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
0
1
dict
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
0
1
2
3
0
0
0
annotated_vcf
index
versions
Genome Analysis Toolkit (GATK4)
Converts GFA or rGFA files to FASTA
0
1
fasta
versions
Tools for manipulating sequence graphs in the GFA and rGFA formats
Summary statistics for GFA files
0
1
stats
versions
Tools for manipulating sequence graphs in the GFA and rGFA formats
A versatile pairwise aligner for genomic and spliced nucleotide sequences
0
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)
0
1
fasta
report
tails
versions
Gene-Switch Transcriptome Annotation by Modular Algorithms
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
0
fasta
gff
vcf
stats
phylip
embl_predicted
embl_branch
tree
tree_labelled
versions
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
summary_csv
roc_all_csv
roc_indel_locations_csv
roc_indel_locations_pass_csv
roc_snp_locations_csv
roc_snp_locations_pass_csv
extended_csv
runinfo
metrics_json
vcf
tbi
versions
Haplotype VCF comparison tools
Reformat a Multiple Sequence Alignment (MSA) file
0
1
0
0
msa
versions
HH-suite3 for fast remote homology detection and deep protein annotation
Mask multiple sequence alignments
0
1
2
3
4
5
6
7
0
maskedaln
fmask_rf
fmask_all
gmask_rf
gmask_all
pmask_rf
pmask_all
versions
Biosequence analysis using profile hidden Markov models
reformats sequence files, see HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tools help for possible values.
0
1
seqreformated
versions
Biosequence analysis using profile hidden Markov models
hmmalign from the HMMER suite aligns a number of sequences to an HMM profile
0
1
0
sto
versions
Biosequence analysis using profile hidden Markov models
create an hmm profile from a multiple sequence alignment
0
1
0
hmm
hmmbuildout
versions
Biosequence analysis using profile hidden Markov models
extract hmm from hmm database file or create index for hmm database
0
1
0
0
0
hmm
index
versions
Biosequence analysis using profile hidden Markov models
compress and index profile database for hmmscan
0
1
compressed_db
versions
Biosequence analysis using profile hidden Markov models
R script that scores output from multiple runs of hmmer/hmmsearch
0
1
hmmrank
versions
Biosequence analysis using profile hidden Markov models
A Language and Environment for Statistical Computing
Tidyverse: R packages for data science
search profile(s) against a sequence database
0
1
2
3
4
5
output
alignments
target_summary
domain_summary
versions
Biosequence analysis using profile hidden Markov models
Create a tag directory with the HOMER suite
0
1
0
tagdir
taginfo
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Differential gene expression analysis based on the negative binomial distribution
Empirical Analysis of Digital Gene Expression Data in R
Search covariance models against a sequence database
0
1
2
0
0
output
alignments
target_summary
versions
Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
phylogeny
report
mldist
lmap_svg
lmap_eps
lmap_quartetlh
sitefreq_out
bootstrap
state
contree
nex
splits
suptree
alninfo
partlh
siteprob
sitelh
treels
rate
mlrate
exch_matrix
log
versions
IsoSeq - Cluster - Cluster trimmed consensus sequences
0
1
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
versions
IsoSeq - Cluster - Cluster trimmed consensus sequences
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
meta
bam
meta
version
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
Generate a consensus sequence from a BAM file using iVar
0
1
0
0
fasta
qual
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Trim primer sequences from a BAM file with iVar
0
1
2
0
bam
log
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Taxonomic classification of metagenomic sequence data using a protein reference database
0
1
0
results
versions
Fast and sensitive taxonomic classification for metagenomics
Aligns sequences using kalign
0
1
0
alignment
versions
Kalign is a fast and accurate multiple sequence alignment algorithm.
Create kallisto index
0
1
index
versions
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
0
1
0
1
0
0
0
0
results
json_info
log
versions
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
0
0
report
kmers
versions
khmer k-mer counting library
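For readers unfamiliar with k-mer counting and abundance filtering, the following Python sketch shows the basic idea in memory. It does not use the khmer library or its data structures; `count_kmers` and `filter_kmers` are hypothetical helper names chosen for this example.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def filter_kmers(counts: Counter, min_count: int) -> dict:
    """Keep only k-mers seen at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    counts = count_kmers("ACGTACG", 3)
    print(counts)
    print(filter_kmers(counts, 2))
```

An exact dictionary like this grows with the number of distinct k-mers, which is why dedicated tools typically rely on more compact structures for large datasets.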
Generate k-mers (sketches) from FASTA/Q sequences
0
1
outdir
info
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Construct KMCP database from k-mer files
0
1
kmcp
log
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Merge search results from multiple databases.
0
1
result
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Generate taxonomic profile from search results
0
1
0
profile
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Search sequences against database
0
1
0
result
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Adds fasta files to a Kraken2 taxonomic database
0
1
0
0
0
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Builds Kraken2 database
0
1
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Downloads and builds Kraken2 standard database
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Classifies metagenomic sequence data
0
1
0
0
0
classified_reads_fastq
unclassified_reads_fastq
classified_reads_assignment
report
versions
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Classifies metagenomic sequence data using unique k-mer counts
0
1
2
0
0
0
0
0
0
classified_reads
unclassified_reads
classified_assignment
report
versions
Metagenomics classifier with unique k-mer counting for more specific results
Aligns query sequences to target sequences indexed with lastdb
0
1
2
0
maf
multiqc
versions
LAST finds & aligns related regions of sequences.
Prepare sequences for subsequent alignment with lastal.
0
1
index
versions
LAST finds & aligns related regions of sequences.
Converts MAF alignments into another format.
0
1
2
0
1
0
1
0
1
axt_gz
bam
blast_gz
blasttab_gz
chain_gz
cram
gff_gz
html_gz
psl_gz
sam_gz
tab_gz
versions
LAST finds & aligns related regions of sequences.
Reorder alignments in a MAF file
0
1
maf
versions
LAST finds & aligns related regions of sequences.
Post-alignment masking
0
1
maf
versions
LAST finds & aligns related regions of sequences.
Find suitable score parameters for sequence alignment
0
1
0
param_file
multiqc
versions
LAST finds & aligns related regions of sequences.
Align sequences using learnMSA
0
1
alignment
versions
learnMSA: Learning and Aligning large Protein Families
Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder
0
1
scn
gff
versions
A Perl wrapper for LTR_FINDER
An efficient program for finding full-length LTR retrotransposons in genome sequences
Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments
0
1
0
0
0
log
lai_out
versions
Assessing genome assembly quality using the LTR Assembly Index (LAI)
Multiple sequence alignment using MAFFT
0
1
0
1
0
1
0
1
0
1
0
1
0
fas
versions
Parallel implementation of the gzip algorithm.
Multiple sequence alignment using MAFFT
0
1
0
1
0
1
0
1
0
1
0
1
0
fas
versions
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
Parallel implementation of the gzip algorithm.
Guide tree rendering using MAFFT
0
1
tree
versions
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
Multiple Sequence Alignment using Graph Clustering
0
1
0
1
0
alignment
versions
Multiple Sequence Alignment using Graph Clustering
Multiple Sequence Alignment using Graph Clustering
0
1
tree
versions
Multiple Sequence Alignment using Graph Clustering
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
0
0
0
0
index
versions
log
A tool for mapping metagenomic data
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
0
1
0
rma6
alignments
log
versions
A tool for mapping metagenomic data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0
1
0
1
vcf
tbi
versions
Structural variant and indel caller for mapped sequencing data
Screens query sequences against large sequence databases
0
1
0
1
screen
versions
Fast sequence distance estimator that uses MinHash
Creates vastly reduced representations of sequences using MinHash
0
1
mash
stats
versions
Fast sequence distance estimator that uses MinHash
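To give a feel for how a MinHash sketch reduces a sequence to a small set of hashes, here is a minimal bottom-k sketch in Python. It is a simplified illustration, not Mash's implementation: SHA-1 from the standard library stands in for the hash function, reverse complements are ignored, and the function names, `k`, and `size` defaults are assumptions for this example.

```python
import hashlib


def _hash_kmer(kmer: str) -> int:
    # Stand-in hash for illustration: first 8 bytes of SHA-1 as an integer.
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq: str, k: int = 4, size: int = 8) -> list:
    """Bottom-k MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {_hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:size]


def jaccard_estimate(a: list, b: list, size: int = 8) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # Take the smallest hashes of the union, then count those present in both sketches.
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    s1 = sketch("ACGTACGTTGCA")
    s2 = sketch("ACGTACGTTGCA")
    print(jaccard_estimate(s1, s2))
```

The sketch size trades accuracy for space: larger sketches give tighter similarity estimates at the cost of storing more hashes per sequence.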
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
0
1
tsv
fa
versions
mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences
0
1
fasta
versions
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
meryl_db
versions
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
hist
versions
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
meryl_db
versions
A genomic k-mer counter (and sequence utility) with nice features.
Metagenome assembler for long-read sequences (HiFi and ONT).
0
1
0
contigs
log
versions
MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
Provides fasta index required by minimap2 alignment.
0
1
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by miniprot alignment.
0
1
index
versions
A versatile pairwise aligner for genomic and protein sequences.
miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
0
1
2
0
1
0
1
2
3
outputs
versions
miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs. It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.
mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.
0
1
0
1
0
1
2
tsv
versions
Small RNA-seq annotation
A tool for quality control and tracing taxonomic origins of microRNA sequencing data
0
1
2
0
html
json
tsv
all_fa
rnatype_unknown_fa
versions
miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.
Cluster sequences using MMSeqs2 cluster.
0
1
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create an MMseqs database from an existing FASTA/Q file
0
1
db
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Creates sequence index for mmseqs database
0
1
db_indexed
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create a tsv file from a query and a target database as well as the result database
0
1
0
1
0
1
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Download an mmseqs-formatted database
0
database
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a fasta file in a database using MMseqs2
0
1
0
1
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences in linear time using MMSeqs2 linclust.
0
1
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Search and calculate a score for similar sequences in a query and a target database.
0
1
0
1
db_search
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Computes the lowest common ancestor by identifying the query sequence homologs against the target database.
0
1
0
db_taxonomy
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Conversion of expandable profile databases to the MMseqs2 database format
0
db_exprofile
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Aligns protein structures using mTM-align
0
1
0
alignment
structure
versions
Algorithm for structural multiple sequence alignments
Parallel implementation of the gzip algorithm.
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two
0
1
aligned_fasta
phyi
phys
clustalw
html
msf
tree
log
versions
Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.
0
1
0
alignment
versions
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
Parallel implementation of the gzip algorithm.
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
0
1
2
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
0
1
html
png
txt
log
versions
NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose
0
1
0
1
vecscreen_output
versions
NCBI libraries for biology applications (text-based utilities)
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
0
dataset
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
1
0
csv
csv_errors
csv_insertions
tsv
json
json_auspice
ndjson
fasta_aligned
fasta_translation
nwk
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.
0
1
2
delta
coords
versions
NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence
0
1
0
1
0
1
0
bam
bai
qc_metrics
bqsr_table
duplicate_metrics
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Creates a sequence dictionary for a reference sequence.
0
1
reference_dict
versions
Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.
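The shape of such a dictionary (a header with one sequence record per reference sequence and no alignment records) can be sketched in Python as below. This is an illustrative approximation, not Picard's code: the function name is hypothetical, the header version value is an assumption, and a real tool typically adds further fields.

```python
import hashlib

def sequence_dict_lines(records):
    """Build SAM-style header lines from (name, sequence) pairs.

    Emits one @HD line followed by one @SQ line per sequence, carrying
    the sequence name (SN), length (LN) and MD5 of the uppercased bases (M5).
    """
    lines = ["@HD\tVN:1.6\tSO:unsorted"]  # version value is an assumption
    for name, seq in records:
        md5 = hashlib.md5(seq.upper().encode()).hexdigest()
        lines.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return lines
```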
Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads
0
1
2
bam
bai
num_reads
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
0
1
2
0
0
0
bp
cem
del
dd
int_final
inv
li
rp
si
td
versions
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Identify plasmids in bacterial sequences and assemblies
0
1
json
txt
tsv
genome_seq
plasmid_seq
versions
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
0
1
reads
log
versions
Calculate pairwise nucleotide identity with respect to a reference sequence
0
1
0
1
0
valid_fasta
invalid_fasta
report
log
versions
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
0
1
good_reads
single_reads
bad_reads
log
versions
frame-shift correction for long read (meta)genomics - fix frameshifts in reads
0
1
0
1
out_fa
versions
frame-shift correction for long read (meta)genomics
frame-shift correction for long read (meta)genomics - maps proteins to reads
0
1
2
tsv
versions
frame-shift correction for long read (meta)genomics
Proteinortho is a tool to detect orthologous genes within different species.
0
1
orthologgroups
orthologgraph
blastgraph
versions
Calculate coverage cutoffs to determine when to purge duplicated sequence.
0
1
cutoff
log
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Separates out sequences purged of falsely duplicated sequences.
0
1
2
haplotigs
purged
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0
1
vcf
tbi
rdata
plots
versions
Read aware low coverage whole genome sequence imputation from a reference panel
Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.
0
stockholm_alignment
phylogeny
versions
Screening DNA sequences for interspersed repeats and low complexity DNA sequences
0
1
0
masked
out
tbl
gff
versions
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
0
1
2
0
0
json
disinfinder_kma
pheno_table_species
pheno_table
pointfinder_kma
pointfinder_prediction
pointfinder_results
pointfinder_table
resfinder_hit_in_genome_seq
resfinder_blast
resfinder_kma
resfinder_resistance_gene_seq
resfinder_results_table
resfinder_results_tab
resfinder_results
versions
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
Predict antibiotic resistance from protein or nucleotide data
0
1
0
0
json
tsv
tmp
tool_version
db_version
versions
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Generate statistics from a bam file
0
1
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Infer strandedness from sequencing reads
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate inner distance between read pairs.
0
1
0
distance
freq
mean
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
compare detected splice junctions to reference gene model
0
1
0
xls
rscript
log
bed
interact_bed
pdf
events_pdf
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
compare detected splice junctions to reference gene model
0
1
0
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate read duplication rate
0
1
seq_xls
pos_xls
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
0
1
2
0
txt
xls
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
This module combines samtools and samblaster so that samblaster can filter or tag SAM records while both input and output stay in BAM format. Samblaster input must contain a sequence header, which is why it is piped from the "samtools view -h" command. Additional samtools arguments can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.
0
1
bam
versions
Clips read alignments where they match BED file defined regions
0
1
0
0
0
bam
stats
rejects_bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
calculates MD and NM tags
0
1
0
1
bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Concatenate BAM or CRAM file
0
1
bam
cram
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces a consensus FASTA/FASTQ/PILEUP
0
1
fasta
fastq
pileup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
convert and then index CRAM -> BAM or BAM -> CRAM file
0
1
2
0
1
0
1
bam
cram
bai
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
produces a histogram or table of coverage per chromosome
0
1
2
0
1
0
1
coverage
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
List CRAM Content-ID and Data-Series sizes
0
1
size
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Create a sequence dictionary file from a FASTA file
0
1
dict
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index FASTA file, and optionally generate a file of chromosome sizes
0
1
0
1
0
fa
fai
sizes
gzi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a SAM/BAM/CRAM file to FASTQ
0
1
0
fastq
interleaved
singleton
other
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
0
1
bam
cram
sam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
0
1
2
flagstat
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
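The FLAG field counted by the flagstat module above is a bit field. The Python sketch below decodes it using the bit values from the SAM specification and tallies how many records carry each bit. The category names and function names are chosen here for illustration, and the real samtools flagstat report uses its own derived categories, so treat this as a conceptual sketch only.

```python
from collections import Counter

# Bit meanings as defined in the SAM specification; labels are illustrative.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def flag_counts(flags):
    """Count how many FLAG values have each bit set."""
    counts = Counter()
    for flag in flags:
        counts.update(decode_flag(flag))
    return counts
```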
filter/convert SAM/BAM/CRAM file
0
1
readgroup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Reports alignment summary statistics for a BAM/CRAM/SAM file
0
1
2
idxstats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
converts FASTQ files to unmapped SAM/BAM/CRAM
0
1
sam
bam
cram
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index SAM/BAM/CRAM file
0
1
bai
csi
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Merge BAM or CRAM file
0
1
0
1
0
1
bam
cram
csi
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
BAM
0
1
2
0
mpileup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
0
1
bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file
0
1
0
1
bam
cram
csi
crai
metrics
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Sort SAM/BAM/CRAM file
0
1
0
1
bam
cram
crai
csi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces comprehensive statistics from SAM/BAM/CRAM file
0
1
2
0
1
stats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
0
1
2
0
1
0
0
bam
cram
sam
bai
csi
crai
unselected
unselected_index
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Call peaks using SEACR on sequenced reads in bedgraph format
0
1
2
0
bed
versions
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.
0
1
fastq
versions
Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the samples of the project.
Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.
0
1
fasta
versions
DNA sequence utilities for FASTX files
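The dereplication behaviour described above, including summing counts already present in headers, can be sketched in Python as follows. The ";size=N" header annotation and the function name are assumptions made for this example; check the tool's documentation for the annotation format it actually writes.

```python
import re

# Assumed annotation format: a trailing ";size=N" on the header.
SIZE_RE = re.compile(r";size=(\d+);?$")

def dereplicate(records):
    """Collapse identical sequences from (header, sequence) pairs.

    Keeps the first header seen for each sequence and appends the total
    count. Existing ";size=N" annotations are added up, not reset.
    """
    totals = {}  # uppercased sequence -> [name without size tag, count]
    for header, seq in records:
        match = SIZE_RE.search(header)
        count = int(match.group(1)) if match else 1
        name = SIZE_RE.sub("", header)
        key = seq.upper()
        if key in totals:
            totals[key][1] += count
        else:
            totals[key] = [name, count]
    return [(f"{name};size={count}", seq) for seq, (name, count) in totals.items()]
```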
Concatenating multiple uncompressed sequence files together
0
1
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Select sequences from a large file based on name/ID
0
1
0
filter
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Use seqkit to find/replace strings within sequences and sequence headers
0
1
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
fastx
log
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Sorts sequences by id/name/sequence/length
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Translate DNA/RNA to protein sequence
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Computes sequence statistics from FASTQ or FASTA files
0
1
seqtk_stats
versions
Generates a BED file containing genomic locations of lengths of N.
0
1
bed
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
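For the BED-of-N-runs module above, the idea can be sketched in a few lines of Python: find every run of N in a sequence and report it as a 0-based, half-open BED interval. The function name and the min_len parameter are hypothetical and do not correspond to seqtk options.

```python
import re

def n_runs_to_bed(name, seq, min_len=1):
    """Return BED lines (0-based, half-open) for each run of N of at least min_len."""
    lines = []
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            lines.append(f"{name}\t{match.start()}\t{match.end()}")
    return lines
```

For example, the sequence ACNNNGTNA on chr1 gives two intervals, 2 to 5 and 7 to 8.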
Interleave paired-end reads from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
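Interleaving as described above simply alternates records from the two mate files. A minimal Python sketch, assuming well-formed four-line FASTQ records and hypothetical function names:

```python
def parse_fastq(text):
    """Split FASTQ text into a list of 4-line records (tuples of lines)."""
    lines = text.strip().splitlines()
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]

def interleave(reads1, reads2):
    """Alternate records from two mate lists: R1, R2, R1, R2, ..."""
    if len(reads1) != len(reads2):
        raise ValueError("mate files must contain the same number of reads")
    merged = []
    for first, second in zip(reads1, reads2):
        merged.extend([first, second])
    return merged
```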
Rename sequence names in FASTQ or FASTA files.
0
1
sequences
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.
Subsample reads from FASTQ files
0
1
2
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Common transformation operations on FASTA or FASTQ files.
0
1
fastx
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.
Select only sequences that match the filtering condition
0
1
0
sequences
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Trim low quality bases from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Sequence quality metrics for FASTQ and uBAM files.
0
1
json
html
versions
PileupCaller is a tool to create genotype calls from bam files using read-sampling methods
0
1
0
0
eigenstrat
plink
freqsum
versions
Tools for population genetics on sequencing data
Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.
0
1
wig
versions
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The program gc_wiggle takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.
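The windowed GC computation described above can be sketched in Python as follows. This is not the Sequenza-utils implementation: the function name is hypothetical, and the choice of the fixedStep flavour of the wiggle format is an assumption, so the header the real tool writes may differ.

```python
def gc_wiggle(name, seq, window):
    """Return wiggle-style lines with GC percentage per non-overlapping window.

    Uses a fixedStep declaration (1-based start). The last window may be
    shorter than the others when the length is not a multiple of window.
    """
    lines = [f"fixedStep chrom={name} start=1 step={window} span={window}"]
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        lines.append(f"{100.0 * gc / len(chunk):.1f}")
    return lines
```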
Induce a variation graph in GFA format from alignments in PAF format
0
1
2
gfa
versions
seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so the assembly is somewhat non-deterministic and the final result may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
0
1
assembly
gfa
results
versions
Local sequence alignment tool for filtering, mapping and clustering.
0
1
0
1
0
1
reads
log
index
versions
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomic assignment available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Create a signature (a group of FracMinHash sketches) of a sequence using sourmash
0
1
signatures
versions
Compute and compare FracMinHash signatures for DNA and protein data sets.
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
0
1
0
0
reads
versions
SRA Toolkit and SDK from NCBI
Download sequencing data from the NCBI Sequence Read Archive (SRA).
0
1
0
0
sra
versions
SRA Toolkit and SDK from NCBI
Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.
0
1
2
gene_results
fullgene_results
mlst_results
pileup
sorted_bam
versions
Short Read Sequence Typing for Bacterial Pathogens
Advanced sequence file format conversions
0
1
0
0
0
cram
gzi
versions
Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.
Align reads to a reference genome using STAR
0
1
0
1
0
1
0
0
0
log_final
log_out
log_progress
versions
bam
bam_sorted
bam_sorted_aligned
bam_transcript
bam_unsorted
fastq
tab
spl_junc_tab
read_per_gene_tab
junction
sam
wig
bedgraph
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create index for STAR
0
1
0
1
index
versions
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Get the minimal allowed index version from STAR
NO input
index_version
versions
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Aligns sequences using T_COFFEE
0
1
0
1
0
1
2
0
alignment
lib
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Compares 2 alternative MSAs to evaluate them.
0
1
2
scores
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Computes a consensus alignment using T_COFFEE
0
1
0
1
0
alignment
eval
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats the header of PDB files with t-coffee
0
1
formatted_pdb
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Computes the irmsd score for a given alignment and the structures.
0
1
0
1
2
irmsd
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package
0
1
0
1
0
1
2
0
alignment
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats files with t-coffee
0
1
formatted_file
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.
0
1
0
1
tcs
scores
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle
0
1
classifications
log
fasta
versions
Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
0
1
explore_tsv
top_sequence
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
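The canonical form mentioned for tidk explore (TTAGG -> AACCT) can be reproduced with a short Python sketch, assuming "canonical" means the lexicographically smallest string among all rotations of the repeat unit and of its reverse complement. That definition is an assumption made for this example, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def rotations(unit):
    """Return every cyclic rotation of a repeat unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]

def canonical_repeat(unit):
    """Smallest rotation of the unit or of its reverse complement."""
    unit = unit.upper()
    reverse_complement = unit.translate(COMPLEMENT)[::-1]
    return min(rotations(unit) + rotations(reverse_complement))
```

Under this definition, canonical_repeat("TTAGG") gives AACCT, matching the example in the module description.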
Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA
0
1
0
1
0
1
0
1
0
fasta
vcf
ccf
log
versions
This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.
A post sequencing QC tool for Oxford Nanopore sequencers
0
1
report_data
report_html
plots_html
plotly_js
versions
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.
0
1
pep
gff3
cds
dat
folder
versions
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.
0
1
0
pep
gff3
cds
bed
versions
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
Detection of tRNA sequences using covariance models
0
1
tsv
log
stats
fasta
gff
bed
versions
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
log
selfsm
depthsm
selfrg
depthrg
bestsm
bestrg
versions
verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
1
2
0
0
log
ud
bed
mu
self_sm
ancestry
versions
A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
0
1
2
3
0
1
0
1
graph
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Calculate secondary structures of two RNAs with dimerization
0
1
rnacofold_csv
rnacofold_ps
versions
Calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and "dot plot" files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
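The input convention described above (a name line starting with >, then the two strands joined by &) can be sketched in a few lines of Python. This is only an illustration of the text format; the helper names `make_cofold_record` and `split_dimer` are hypothetical and are not part of ViennaRNA or of the nf-core module.

```python
def make_cofold_record(name, seq_a, seq_b):
    """Return a FASTA-style record with the two strands joined by '&'."""
    for seq in (seq_a, seq_b):
        if not seq or "&" in seq:
            raise ValueError("each strand must be non-empty and must not contain '&'")
    return f">{name}\n{seq_a}&{seq_b}\n"


def split_dimer(line):
    """Split a concatenated 'A&B' sequence line back into its two strands."""
    left, sep, right = line.strip().partition("&")
    if not sep:
        raise ValueError("no '&' separator found")
    return left, right


# Hypothetical example strands, chosen only to show the layout.
record = make_cofold_record("duplex1", "GCGCUUCGCCGCGCGCC", "GCGCUUCGCCGCGCGCA")
print(record, end="")
```

A record built this way could be written to a file or piped to the program on stdin; how the nf-core module itself stages its input is not shown here.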
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
0
1
rnafold_txt
rnafold_ps
versions
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
Calculate locally stable secondary structures of RNAs
0
rnalfold_txt
versions
Calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to "scan" very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
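To make the complexity claim concrete, the following Python sketch plugs numbers into the two bounds quoted above. The comparison against folding the whole sequence at once assumes the textbook O(n^2) memory and O(n^3) time of global dynamic-programming folding; that baseline, the function names, and the chosen n and L are assumptions of this sketch, and the results are abstract operation counts, not measured runtimes.

```python
def lfold_cost(n, span):
    """Abstract (memory, time) units for a local fold: n + L*L and n*L*L."""
    return n + span * span, n * span * span


def global_fold_cost(n):
    """Assumed textbook cost of folding the whole sequence: n^2 memory, n^3 time."""
    return n * n, n ** 3


n, span = 1_000_000, 150  # hypothetical genome length and base pair span
mem_local, time_local = lfold_cost(n, span)
mem_global, time_global = global_fold_cost(n)

print(f"local : memory ~{mem_local:.2e}, time ~{time_local:.2e}")
print(f"global: memory ~{mem_global:.2e}, time ~{time_global:.2e}")
print(f"time ratio global/local ~{time_global / time_local:.1e}")
```

Because L is fixed, both local bounds grow only linearly in n, which is why scanning long sequences stays tractable.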
Extracting sequences that were unbinned by vRhyme into a FASTA file
0
1
0
1
unbinned_sequences
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Linking bins output by vRhyme to create one sequence per bin
0
1
linked_bins
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Binning virus genomes from metagenomes
0
1
0
1
bins
membership
summary
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
0
1
aln
biom
mothur
otu
bam
out
blast
uc
centroids
clusters
profile
msa
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
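The idea of single-pass, greedy centroid-based clustering mentioned for this module can be sketched as follows. This is a toy illustration, not VSEARCH's implementation: the `identity` function is a stand-in that compares positions without any alignment, the 0.9 threshold is arbitrary, and sequences are processed in the order given rather than pre-sorted.

```python
def identity(a, b):
    """Stand-in similarity: matching positions divided by the longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Assign each sequence to the first centroid it matches, else start a new cluster."""
    centroids = []
    clusters = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No existing centroid is close enough: this sequence becomes one.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AAAAAAAAAA"]
for members in greedy_cluster(reads):
    print(members[0], len(members))
```

Each sequence is visited exactly once, and the first member of every cluster acts as its centroid, which is what makes the procedure single-pass and greedy.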
Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).
0
1
fasta
clustering
log
versions
A versatile open source tool for metagenomics (USEARCH alternative)
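The definition of "strictly identical" given for this module (same length, same nucleotide string, case-insensitive, T and U treated as the same) can be sketched in Python as below. The function names are hypothetical, and ordering the output by decreasing abundance is an assumption of this sketch rather than a statement about the tool's output.

```python
def normalize(seq):
    """Fold case and map U to T so equivalent sequences share one key."""
    return seq.upper().replace("U", "T")


def dereplicate(records):
    """Merge identical sequences.

    records: iterable of (header, sequence) pairs.
    Returns (header, sequence, count) tuples, keeping the first-seen
    header and sequence of each group, most abundant first.
    """
    groups = {}
    for header, seq in records:
        key = normalize(seq)
        if key in groups:
            groups[key][2] += 1
        else:
            groups[key] = [header, seq, 1]
    return sorted((tuple(g) for g in groups.values()), key=lambda g: -g[2])


records = [("a", "ACGT"), ("b", "acgu"), ("c", "ACG"), ("d", "ACGT")]
print(dereplicate(records))
```

Using the normalized string as a dictionary key enforces the same-length requirement automatically, since strings of different lengths can never be equal.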
Performs quality filtering and / or conversion of a FASTQ file to FASTA format.
0
1
fasta
log
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
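A minimal sketch of quality filtering combined with FASTQ-to-FASTA conversion is shown below. It assumes four-line FASTQ records, Phred+33 quality encoding, and a maximum-expected-errors criterion; the threshold value and the function names are illustrative assumptions, and the module's actual filtering options are whatever is passed to the tool.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, 10^(-Q/10), for a quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def fastq_to_fasta(fastq_text, max_ee=1.0):
    """Keep reads whose expected errors are <= max_ee and emit them as FASTA."""
    lines = fastq_text.strip().splitlines()
    out = []
    # Assumes strict 4-line records: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if expected_errors(qual) <= max_ee:
            out.append(f">{header[1:]}\n{seq}\n")
    return "".join(out)


fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
print(fastq_to_fasta(fastq), end="")
```

In this example the first read has high qualities throughout and is kept, while the second read has the lowest possible quality at every base and is dropped.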
Taxonomic classification using the sintax algorithm.
0
1
0
tsv
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
0
1
0
fasta
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
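The two sort orders named for this module can be illustrated with a short Python sketch. It assumes abundance is stored in the header as a `;size=N` annotation and that a missing annotation counts as 1; the function names are hypothetical, and tie-breaking here simply preserves input order.

```python
import re

SIZE_RE = re.compile(r";size=(\d+)")


def abundance(header):
    """Read the ';size=N' annotation from a header; assume 1 if absent."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def sort_by_size(records):
    """Order (header, sequence) pairs by decreasing abundance."""
    return sorted(records, key=lambda rec: -abundance(rec[0]))


def sort_by_length(records):
    """Order (header, sequence) pairs by decreasing sequence length."""
    return sorted(records, key=lambda rec: -len(rec[1]))


records = [("a;size=2", "ACGT"), ("b;size=10", "AC"), ("c", "ACGTAC")]
print([h for h, _ in sort_by_size(records)])
print([h for h, _ in sort_by_length(records)])
```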
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
0
1
0
0
0
0
aln
biom
lca
mothur
otu
sam
tsv
txt
uc
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Decomposes multiallelic variants into biallelic variants in a VCF file.
0
1
2
vcf
versions
A tool set for short variant discovery in genetic sequence data
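What decomposing a multiallelic record means can be sketched in Python as follows. This is a simplified illustration, not the tool's implementation: it splits only the ALT column and copies every other column unchanged, whereas a real decomposition also has to rewrite per-allele INFO and genotype fields. The function name is hypothetical.

```python
def decompose_multiallelic(line):
    """Split one tab-separated VCF data line into one line per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    rest = fields[5:]
    # One output record per comma-separated ALT allele; other columns are
    # copied as-is (a simplification that ignores per-allele annotations).
    return ["\t".join([chrom, pos, vid, ref, allele] + rest)
            for allele in alt.split(",")]


record = "20\t100\t.\tA\tC,G\t50\tPASS\t."
for out in decompose_multiallelic(record):
    print(out)
```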
Decomposes biallelic block substitutions into their constituent SNPs.
0
1
2
3
vcf
versions
A tool set for short variant discovery in genetic sequence data
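As an illustration of block-substitution decomposition, the sketch below walks an equal-length REF/ALT pair and emits one SNP per differing position. It assumes REF and ALT have the same length and uses 1-based positions as in VCF; the function name and the tuple layout are hypothetical and do not reflect the tool's internals.

```python
def decompose_blocksub(chrom, pos, ref, alt):
    """Return (chrom, pos, ref_base, alt_base) for each mismatching position."""
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length substitutions")
    return [
        (chrom, pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# The middle base is unchanged, so only two SNPs are produced.
print(decompose_blocksub("1", 100, "ACG", "GCT"))
```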
Normalizes variants in a VCF file.
0
1
2
3
0
1
0
1
vcf
fai
versions
A tool set for short variant discovery in genetic sequence data
Masks out highly repetitive DNA sequences with low complexity in a genome
0
1
converted
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program to generate frequency counts of repetitive units.
0
1
counts
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program that takes a counts file and creates a file of genomic coordinates to be masked.
0
1
0
1
intervals
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.