Available Modules
Modules are the building blocks of all nf-core DSL2 pipelines and subworkflows. If you would like to write your own module, you can find more information on the nf-core website.
Converts a GFF/GTF file into a TSV file
tsv
versions
AGAT is a toolkit for manipulating and extracting information from GFF/GTF files
A tool to parse and summarise results from antimicrobial peptide tools and present a functional classification.
sample_dir
txt
csv
faa
summary_csv
summary_html
log
results_db
results_db_dmnd
results_db_fasta
results_db_tsv
versions
A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.
cluster_tsv
rep_cluster_tsv
log
versions
A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.
A submodule that merges all output summary tables from ampcombi2/parsetables into one summary file.
tsv
log
versions
This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.
sample_dir
contig_gbks
db_tsv
tsv
faa
sample_log
full_log
db
db_txt
db_fasta
db_mmseqs
versions
A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.
A fast and user-friendly method to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs.
amps_faa
amps_tsv
versions
AMPlify is an attentive deep learning model for antimicrobial peptide prediction.
tsv
versions
Attentive deep learning model for antimicrobial peptide prediction
Post-processing script of the MaltExtract component of the HOPS package
json
summary_pdf
tsv
candidate_pdfs
versions
Module to subset an AnnData object to cells whose barcodes match those in a CSV file
h5ad
versions
Annotation and Ranking of Structural Variation
tsv
unannotated_tsv
vcf
versions
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
No input
annotations
versions
Annotation and Ranking of Structural Variation
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
clusterblast_file
html_accessory_files
knownclusterblast_html
knownclusterblast_dir
knownclusterblast_txt
svg_files_clusterblast
svg_files_knownclusterblast
gbk_input
json_results
log
zip
gbk_results
clusterblastoutput
html
knownclusterblastoutput
json_sideloading
versions
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).
tsv
versions
Annotation of bacterial genomes (isolates, MAGs) and plasmids
embl
faa
ffn
fna
gbff
gff
hypotheticals_tsv
hypotheticals_faa
tsv
txt
versions
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Render an assembly graph in GFA 1.0 format to PNG and SVG image formats
png
svg
versions
Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
vcf
tbi
csi
versions
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Calculate the Jaccard statistic between two feature files.
tsv
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
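The Jaccard statistic for two interval sets is the number of bases they share divided by the number of bases covered by either set. The sketch below only illustrates that arithmetic for half-open, BED-style intervals on a single chromosome; it is not the bedtools implementation, and the helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both (already merged, sorted) lists, via a two-pointer sweep."""
    i = j = 0
    covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """|A intersect B| / |A union B| measured in bases."""
    a, b = merge(a), merge(b)
    inter = intersection_length(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0
```

For example, `jaccard([(0, 100)], [(50, 150)])` shares 50 bases out of 150 covered, giving one third.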
BLASTP (Basic Local Alignment Search Tool for proteins) compares an amino acid (protein) query sequence against a protein database
xml
tsv
csv
versions
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
checkm_output
marker_file
checkm_tsv
versions
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM2 bin quality prediction
checkm2_output
checkm2_tsv
versions
CheckM2 - Rapid assessment of genome bin quality using machine learning
Copy number variant detection from high-throughput sequencing data
tsv
cnn
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
View function to generate VCFs
vcf
tsv
xls
versions
Calling CNVs using read depth
Unsupervised binning of metagenomic contigs using nucleotide composition (k-mer frequencies) and coverage data for multiple samples
args_txt
clustering_csv
log_txt
original_data_csv
pca_components_csv
pca_transformed_csv
versions
Clustering cONtigs with COverage and ComposiTion
Generate the input coverage table for CONCOCT using a BED file
tsv
versions
Clustering cONtigs with COverage and ComposiTion
Concatenate two or more CSV (or TSV) tables into a single table
csv
versions
A cross-platform, efficient, practical CSV/TSV toolkit
Join two or more CSV (or TSV) tables by selected fields into a single table
csv
versions
A cross-platform, efficient, practical CSV/TSV toolkit
Splits CSV/TSV into multiple files according to column values
split_csv
versions
CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
gct
versions
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
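As a rough illustration of the target layout, a GCT 1.2 file starts with a `#1.2` version line, then a line with the row and column counts, then a `NAME`/`Description` header followed by the sample names. The sketch below assumes the first input column holds feature IDs and fills the Description column with a placeholder `NA`; the function name is hypothetical and this is not the module's own script.

```python
import csv
import io


def table_to_gct(text: str, sep: str = "\t") -> str:
    """Convert a features-by-row, observations-by-column table into GCT 1.2 text.

    Assumes the first column holds feature IDs and every other column is an
    observation (sample). The Description column is filled with "NA".
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], [r for r in rows[1:] if r]
    samples = header[1:]
    out = io.StringIO()
    out.write("#1.2\n")                                   # format version
    out.write(f"{len(body)}\t{len(samples)}\n")           # rows, data columns
    out.write("\t".join(["NAME", "Description"] + samples) + "\n")
    for row in body:
        out.write("\t".join([row[0], "NA"] + row[1:]) + "\n")
    return out.getvalue()
```

For a CSV input, pass `sep=","`; the output is always tab-separated.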
Structural variant calling with cuteSV
vcf
versions
Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.
report
versions
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
daa
daa_tsv
arg
potential_arg
versions
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.
readme
log
json
bgc_gbk
bgc_tsv
full_gbk
pfam_tsv
bgc_png
pr_png
roc_png
score_png
versions
DeepBGC - Biosynthetic Gene Cluster detection and classification
A Deep Learning Model for Transmembrane Topology Prediction and Classification
gff3
line3
md
csv
png
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
vcf
vcf_index
gvcf
gvcf_index
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Queries a DIAMOND database using blastp mode
blast
xml
txt
daa
sam
tsv
paf
versions
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
blast
xml
txt
daa
sam
tsv
paf
log
versions
Accelerated BLAST compatible local sequence aligner
Calculate clusters of highly similar sequences
tsv
versions
Accelerated BLAST compatible local sequence aligner
SV callers like lumpy look at split reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example, we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar GC content.
vcf
versions
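The depth check described above can be sketched as a fold change: mean depth inside the event divided by mean depth in the flanking windows, with true deletions expected to fall well below 1. This is a hypothetical, flank-based illustration rather than the tool's code; the GC-matched comparison mentioned above is not shown, a per-base depth list for one chromosome is assumed, and the 1 kb flank and 0.7 cut-off are arbitrary example values.

```python
from statistics import mean
from typing import Sequence


def depth_fold_change(depth: Sequence[int], start: int, end: int, flank: int = 1000) -> float:
    """Mean depth inside [start, end) divided by mean depth in the left and right flanks."""
    inside = list(depth[start:end])
    flanks = list(depth[max(0, start - flank):start]) + list(depth[end:end + flank])
    if not inside or not flanks:
        raise ValueError("event or flanks fall outside the depth array")
    flank_mean = mean(flanks)
    return mean(inside) / flank_mean if flank_mean else float("inf")


def deletion_supported(depth: Sequence[int], start: int, end: int,
                       max_ratio: float = 0.7, flank: int = 1000) -> bool:
    """A deletion call should show reduced depth; max_ratio is a hypothetical threshold."""
    return depth_fold_change(depth, start, end, flank) < max_ratio
```

A heterozygous deletion in a 30x region would be expected to give a fold change near 0.5, while a call sitting in normal coverage stays near 1 and would be filtered.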
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
vcf
tbi
versions
In silico prediction of E. coli serotype
log
tsv
txt
versions
Provide the SNP coverage of each individual in an eigenstrat formatted dataset.
tsv
json
versions
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
EMM typing of Streptococcus pyogenes assemblies
tsv
versions
Compute genome-wide STR profile
locus_tsv
motif_tsv
str_profile
versions
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Quickly compute statistics over a fasta file in windows.
freq
mononuc
dinuc
trinuc
tetranuc
versions
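Windowed FASTA statistics amount to splitting each sequence into fixed-size windows and counting bases or k-mers in each one. The sketch below computes GC and N fractions per window plus a k-mer counter that could back di-, tri- and tetranucleotide tables; the function names, and computing GC over unambiguous bases only, are assumptions rather than the module's documented behaviour.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def windows(seq: str, size: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs; the last window may be shorter than size."""
    for start in range(0, len(seq), size):
        yield start, seq[start:start + size]


def window_stats(seq: str, size: int) -> List[Dict[str, float]]:
    rows = []
    for start, win in windows(seq.upper(), size):
        counts = Counter(win)
        acgt = sum(counts[b] for b in "ACGT")
        rows.append({
            "start": start,
            "end": start + len(win),
            # GC over unambiguous bases only (assumption).
            "gc": (counts["G"] + counts["C"]) / acgt if acgt else 0.0,
            "n": counts["N"] / len(win),
        })
    return rows


def kmer_counts(win: str, k: int) -> Counter:
    """Count overlapping k-mers in one window (k=2 for dinucleotides, and so on)."""
    return Counter(win[i:i + k] for i in range(len(win) - k + 1))
```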
fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.
tsv
versions
Cluster genome FASTA files by average nucleotide identity
tsv
dereplicated_bins
versions
colours a phylogeny with placement densities
newick
nexus
phyloxml
svg
colours
log
versions
Genesis Applications for Phylogenetic Placement Analysis
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
csv
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
hdf5
tsv
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
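The counting rule described above (one count per read start that lies in an interval) can be sketched with binary search over sorted start positions. This assumes a single contig and half-open `[begin, end)` intervals for simplicity; GATK itself reads intervals from an interval list and applies its own read filters, none of which are modelled here, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def count_read_starts(starts: List[int], intervals: List[Tuple[int, int]]) -> List[int]:
    """For each half-open [begin, end) interval, count read start positions inside it."""
    ordered = sorted(starts)
    # Number of starts < end minus number of starts < begin.
    return [bisect_left(ordered, end) - bisect_left(ordered, begin) for begin, end in intervals]
```

With read starts `[5, 12, 15, 15, 30]` and intervals `[(0, 10), (10, 20), (20, 30)]`, the counts are 1, 3 and 0; the read starting at 30 falls outside the last half-open interval.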
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location and clipping orientation of all soft-clipped reads.
split_read_evidence
split_read_evidence_index
paired_end_evidence
paired_end_evidence_index
site_depths
site_depths_index
versions
Genome Analysis Toolkit (GATK4)
WARNING: this tool is still experimental and should not be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location and clipping orientation of all soft-clipped reads.
printed_evidence
printed_evidence_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
annotated_vcf
index
versions
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
clustered_vcf
clustered_vcf_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create mappability files for a genome
wig
bedgraph
txt
csv
versions
Ultra-fast computation of genome mappability.
Genotype Salmonella Typhi from Mykrobe results
tsv
versions
Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
Run the Broad Gene Set Enrichment tool in GSEA mode
rpt
index_html
heat_map_corr_plot
report_tsvs_ref
report_htmls_ref
report_tsvs_target
report_htmls_target
ranked_gene_list
gene_set_sizes
histogram
heatmap
pvalues_vs_nes_plot
ranked_list_corr
butterfly_plot
gene_set_tsv
gene_set_html
gene_set_heatmap
snapshot
gene_set_enplot
gene_set_dist
archive
versions
Gene Set Enrichment Analysis (GSEA)
Merges CheckM and GUNC results into one summary table
tsv
versions
Python package for detection of chimerism and contamination in prokaryotic genomes.
Detection of Chimerism and Contamination in Prokaryotic Genomes
maxcss_level_tsv
all_levels_tsv
versions
Python package for detection of chimerism and contamination in prokaryotic genomes.
Tool to convert and summarize ABRicate outputs using the hAMRonization specification
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize DeepARG outputs using the hAMRonization specification
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize fARGene outputs using the hAMRonization specification
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize RGI outputs using the hAMRonization specification.
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to summarize and combine all hAMRonization reports into a single file
json
tsv
html
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
summary_csv
roc_all_csv
roc_indel_locations_csv
roc_indel_locations_pass_csv
roc_snp_locations_csv
roc_snp_locations_pass_csv
extended_csv
runinfo
metrics_json
vcf
tbi
versions
Haplotype VCF comparison tools
Identify cap locus serotype and structure in your Haemophilus influenzae assemblies
gbk
svg
tsv
versions
Serotype prediction of Haemophilus parasuis assemblies
tsv
versions
This tool takes a background VCF, such as gnomAD, that has full genome coverage (though in some cases users will instead want whole exome coverage) and uses it as an expectation of variants.
tsv
versions
Useful command-line tools written to showcase hts-nim
Plot a metagene of cross-link events/sites around various transcriptomic landmarks.
tsv
versions
Produces protein annotations and predictions from an amino acid FASTA file
tsv
xml
gff3
json
versions
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.
phylogeny
report
mldist
lmap_svg
lmap_eps
lmap_quartetlh
sitefreq_out
bootstrap
state
contree
nex
splits
suptree
alninfo
partlh
siteprob
sitelh
treels
rate
mlrate
exch_matrix
log
versions
Call variants from a BAM file using iVar
tsv
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Convert SAM files to TSV files
tsv
versions
Java utilities for Bioinformatics.
Plot whole genome coverage from BAM/CRAM file as SVG
output
versions
Java utilities for Bioinformatics.
Typing of clinical and environmental isolates of Legionella pneumophila
tsv
versions
Serogrouping Listeria monocytogenes assemblies
tsv
versions
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
bam
log
versions
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
vcf
versions
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
vcf
tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
somatic_sv_vcf
somatic_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
tumor_sv_vcf
tumor_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.
csv
versions
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
tsv
fa
versions
Performs taxonomic profiling of long metagenomic reads against the melon database
tsv_output
json_output
log
versions
Serotyping of Neisseria meningitidis assemblies
tsv
versions
Annotation of eukaryotic metagenomes using MetaEuk
faa
codon
tsv
gff
versions
mirtop counts generates a file with the minimal information about each sequence and the count data in columns, one per sample.
tsv
versions
Small RNA-seq annotation
mirtop export generates files in formats such as FASTA, VCF, or a format compatible with the isomiRs Bioconductor package
tsv
fasta
vcf
versions
Small RNA-seq annotation
A tool for quality control and tracing taxonomic origins of microRNA sequencing data
html
json
tsv
all_fa
rnatype_unknown_fa
versions
miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.
Create a tsv file from a query and a target database as well as the result database
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a fasta file in a database using MMseqs2
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Converts expandable profile databases to the MMseqs2 database format
db_exprofile
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
AMR predictions for supported species
csv
json
versions
Antibiotic resistance prediction in minutes
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
csv
csv_errors
csv_insertions
tsv
json
json_auspice
ndjson
fasta_aligned
fasta_translation
nwk
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
Serotyping Neisseria gonorrhoeae assemblies
tsv
versions
Determines the gender of a sample from the BAM/CRAM file.
tsv
versions
Short-read sequencing tools
Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files
json
tsv
csv
pdf
versions
Estimate average coverage and create curves for metagenomic datasets
Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.
lay
tsv
versions
An optimized dynamic genome/graph implementation
Metrics describing a variation graph and its path relationship.
tsv
yaml
versions
An optimized dynamic genome/graph implementation
Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file
tsv
versions
panacus is a tool for computing counting statistics for GFA files
Create visualizations from a tsv coverage histogram created with panacus.
image
versions
panacus is a tool for computing counting statistics for GFA files
pbsv - PacBio structural variant (SV) signature discovery tool
svsig
versions
pbsv - PacBio structural variant (SV) calling and analysis tools
Predict prophages in bacterial genomes
coordinates
gbk
log
information
bacteria_fasta
bacteria_gbk
phage_fasta
phage_gbk
prophage_gff
prophage_tbl
prophage_tsv
versions
Prophage finder using multiple metrics
Identify plasmids in bacterial sequences and assemblies
json
txt
tsv
genome_seq
plasmid_seq
versions
Whole genome annotation of small genomes (bacterial, archaeal, viral)
gff
gbk
fna
faa
ffn
sqn
fsa
tbl
err
log
txt
tsv
versions
frame-shift correction for long read (meta)genomics - maps proteins to reads
tsv
versions
frame-shift correction for long read (meta)genomics
Run PureCN workflow to normalize, segment and determine purity and ploidy
pdf
local_optima_pdf
seg
genes_csv
amplification_pvalues_csv
vcf_gz
variants_csv
loh_csv
chr_pdf
segmentation_pdf
multisample_seg
versions
Copy number calling and SNV classification using targeted short read sequencing
Damage parameter estimation for ancient DNA
csv
versions
Damage parameter estimation for ancient DNA
Damage parameter estimation for ancient DNA
csv
versions
Damage parameter estimation for ancient DNA
Prepare a depth of coverage file for all target genes with SV from BAM files.
coverage
versions
A Python package for pharmacogenomics research
Predict antibiotic resistance from protein or nucleotide data
json
tsv
tmp
tool_version
db_version
versions
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI is not available when running under Nextflow.
png
svg
versions
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Sage is search software for proteomics data
results_tsv
results_json
results_pin
versions
tmt_tsv
lfq_tsv
Proteomics searching so fast it feels like magic.
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
csv
json
bam
versions
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
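Assigning a multi-mapped read to its lowest common ancestor means walking the taxon of each hit up to the root and taking the deepest node shared by all of those lineages. The sketch assumes a child-to-parent taxid mapping in which the root is its own parent (as in NCBI `nodes.dmp`); it illustrates the idea only, is not the module's implementation, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Path from taxid up to the root; the root is recognised as its own parent."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(taxids: Iterable[int], parent: Dict[int, int]) -> int:
    lineages = [lineage(t, parent) for t in taxids]
    if not lineages:
        raise ValueError("no taxa given")
    shared = set(lineages[0]).intersection(*map(set, lineages[1:]))
    # Walking upwards, the first shared node is the deepest common ancestor.
    return next(node for node in lineages[0] if node in shared)
```

A read whose hits map to two species of the same genus is assigned to that genus; a read with a single hit keeps its own taxon.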
Computes the depth at each position or region.
tsv
versions
Tools for dealing with SAM, BAM and CRAM files; samtools depth computes the read depth at each position or region
SCIMAP is a suite of tools that enables spatial single-cell analyses
csv
h5ad
versions
Scimap is a scalable toolkit for analyzing spatial molecular data.
metagenomic binning with self-supervised learning
csv
model
output_fasta
recluster_fasta
tsv
versions
Metagenomic binning with semi-supervised siamese neural network
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
cram
crai
bam
bai
score
metrics
metrics_multiqc_tsv
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Generate recalibration table and optionally perform base quality recalibration
table
table_post
recal_alignment
csv
pdf
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Salmonella serotype prediction from reads and assemblies
log
tsv
txt
versions
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
tsv
txt
versions
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)
log
read_qual
breakpoints_double
read_alignments
read_ids
collapsed_dup
loh
all_vcf
all_breakpoints_clusters_list
all_breakpoints_clusters
all_plots
somatic_vcf
somatic_breakpoints_clusters_list
somatic_breakpoints_clusters
somatic_plots
versions
Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.
json
tsv
versions
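A minimal sketch of the relative coverage calculation: mean depth on each gonosome divided by mean autosomal depth. It assumes `samtools depth` output with a single depth column (chromosome, position, depth) and contigs named `chrX` and `chrY`; every other contig is treated as autosomal, which is a simplification, and the error bars mentioned above are not computed. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def relative_coverage(depth_lines: Iterable[str], x: str = "chrX", y: str = "chrY") -> Dict[str, float]:
    """Mean X and Y depth relative to mean autosomal depth, from `samtools depth` lines."""
    total: Dict[str, int] = defaultdict(int)
    sites: Dict[str, int] = defaultdict(int)
    for line in depth_lines:
        chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
        # Simplification: any contig that is not X or Y counts as autosomal.
        group = "X" if chrom == x else "Y" if chrom == y else "autosome"
        total[group] += int(depth)
        sites[group] += 1
    if not sites["autosome"]:
        raise ValueError("no autosomal positions found")
    auto = total["autosome"] / sites["autosome"]
    return {g: (total[g] / sites[g]) / auto if sites[g] else 0.0 for g in ("X", "Y")}
```

Under these assumptions, a sample with one X chromosome would be expected to show an X ratio near 0.5 and a clearly non-zero Y ratio.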
Determine Shigella serotype from Illumina or Oxford Nanopore reads
tsv
hits
versions
Determine Shigella serotype from assemblies or Illumina paired-end reads
tsv
versions
Serovar prediction of Salmonella assemblies
tsv
allele_fasta
allele_json
cgmlst_csv
versions
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
vcf
versions
structural variant calling and genotyping with existing tools, but, smoothly
Rapid haploid variant calling
tab
csv
html
vcf
bed
gff
bam
bai
log
aligned_fa
consensus_fa
consensus_subs_fa
raw_vcf
filt_vcf
vcf_gz
vcf_csi
txt
versions
Rapid bacterial SNP calling and core genome alignments
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
tsv
html
versions
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
html
pairs_tsv
samples_tsv
versions
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Compare many FracMinHash signatures generated by sourmash sketch.
matrix
labels
csv
versions
Compute and compare FracMinHash signatures for DNA and protein data sets.
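FracMinHash keeps every k-mer whose hash falls below a fixed fraction of the hash space (maximum hash divided by a `scaled` factor), so sketches of different datasets can be compared directly with set operations. The sketch below uses BLAKE2b truncated to 64 bits as a stand-in hash and does not canonicalize k-mers, so its values will not match sourmash signatures; it only illustrates the sampling rule and the Jaccard and containment comparisons. Function names are hypothetical.

```python
import hashlib
from typing import Set

MAX_HASH = 2**64


def kmer_hash(kmer: str) -> int:
    # Stand-in 64-bit hash; sourmash uses MurmurHash3 on canonical k-mers, so values differ.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def frac_minhash(seq: str, k: int = 21, scaled: int = 1000) -> Set[int]:
    """Keep every k-mer hash below MAX_HASH / scaled (roughly 1 in `scaled` distinct k-mers)."""
    threshold = MAX_HASH // scaled
    seq = seq.upper()
    hashes = (kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return {h for h in hashes if h < threshold}


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: Set[int], b: Set[int]) -> float:
    """Fraction of the hashes retained for a that are also present in b."""
    return len(a & b) / len(a) if a else 0.0
```

Because the same threshold applies to every dataset, a hash kept in one sketch is kept in any other sketch that contains the same k-mer, which is what makes the set comparisons meaningful.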
Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.
result
unassigned
matches
prefetch
prefetchcsv
versions
Compute and compare FracMinHash signatures for DNA data sets.
Serotype prediction of Streptococcus suis assemblies
tsv
versions
Predicts Staphylococcus aureus SCCmec type based on primers.
tsv
versions
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
results_xlsx
summary_tsv
detailed_summary_tsv
resfinder_tsv
plasmidfinder_tsv
mlst_tsv
settings_txt
pointfinder_tsv
versions
Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.
Serotype STEC samples from paired-end reads or assemblies
tsv
versions
Converts a bedpe file to a VCF file (beta version)
vcf
versions
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
vcf
versions
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi-sample VCF file.
vcf
versions
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
0
1
0
1
0
1
0
0
parameters
vcf
bed
fasta
insertions
versions
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
0
1
0
0
0
stats
versions
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
1
sv
indel
germ_indel
germ_sv
som_indel
som_sv
unfiltered_sv
unfiltered_indel
unfiltered_germ_indel
unfiltered_germ_sv
unfiltered_som_indel
unfiltered_som_sv
raw_calls
discordants
log
versions
SVbenchmark compares a set of "test" structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
0
1
2
3
4
5
0
1
0
1
fns
fps
distances
log
report
versions
SVanalyzer: tools for the analysis of structural variation in genomes
Build a structural variant database
0
1
0
db
versions
structural variant database software
The merge module merges structural variants within one or more vcf files.
0
1
0
0
vcf
tbi
csi
versions
structural variant database software
Query a structural variant database, using a vcf file as query
0
1
0
0
0
0
0
0
vcf
versions
structural variant database software
Performs tests on BAF files
0
1
2
3
4
metrics
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
0
1
counts
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
0
1
2
0
vcf
tbi
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
0
1
0
standardized_vcf
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
0
1
2
bed
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert a VCF file to a BEDPE file.
0
1
bedpe
versions
Tools for processing and analyzing structural variants
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
0
1
2
3
0
1
0
1
json
gt_vcf
bam
versions
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
0
1
2
3
0
1
gt_vcf
json
versions
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
0
1
2
3
vcf
tbi
versions
Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA
0
1
csv_gz
versions
Convert TaxIDs to taxon lineages
0
1
2
0
tsv
versions
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Convert taxon names to TaxIds
0
1
2
0
tsv
versions
A Cross-platform and Efficient NCBI Taxonomy Toolkit
A tool to detect resistance and lineages of M. tuberculosis genomes
0
1
bam
csv
json
txt
vcf
versions
Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data
Compute the TCS score for an MSA, or for an MSA plus a library file. Outputs the TCS file as is, plus a CSV containing only the total TCS score.
0
1
0
1
tcs
scores
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequences
Parallel implementation of the gzip algorithm.
Identify chromosomal rearrangements.
0
1
2
0
1
0
1
vcf
ploidy
versions
Search for structural variants.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
0
1
explore_tsv
top_sequence
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
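The "canonical form" example TTAGG -> AACCT can be reproduced if the canonical form is taken to be the lexicographically smallest rotation of the repeat unit or of its reverse complement. The following Python sketch makes that assumption. It is inferred from the example and is not taken from tidk's source, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq: str) -> list[str]:
    """All cyclic rotations of a repeat unit, e.g. TTAGG -> TAGGT, AGGTT, GGTTA, GTTAG."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit: str) -> str:
    """Assumed definition: smallest rotation of the unit or of its reverse complement."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(reverse_complement(unit)))


print(canonical_repeat("TTAGG"))  # AACCT
```

Using the minimum over both strands and all rotations means that a repeat reported from either strand, starting at any phase, collapses to the same string.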
Searches a genome for a telomere string such as TTAGGG
0
1
0
tsv
bedgraph
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
Detection of tRNA sequences using covariance models
0
1
tsv
log
stats
fasta
gff
bed
versions
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
0
1
2
3
4
5
0
1
0
1
fn_vcf
fn_tbi
fp_vcf
fp_tbi
tp_base_vcf
tp_base_tbi
tp_comp_vcf
tp_comp_tbi
summary
versions
Structural variant comparison tool for VCFs
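The outputs above separate true positives counted against the baseline (tp_base) from those counted against the comparison set (tp_comp). The following sketch computes the three summary metrics from such counts. It assumes the usual convention that precision = TP-comp / (TP-comp + FP) and recall = TP-base / (TP-base + FN). The function name is hypothetical, and Truvari's summary may report additional fields.

```python
def bench_metrics(tp_base: int, tp_comp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall and F-measure from benchmark counts (assumed conventions)."""
    # Precision: share of comparison calls that matched the baseline.
    precision = tp_comp / (tp_comp + fp) if tp_comp + fp else 0.0
    # Recall: share of baseline calls that were recovered.
    recall = tp_base / (tp_base + fn) if tp_base + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Keeping the two true-positive counts separate matters for structural variants, because one comparison call can match a baseline call that is represented differently, so the two counts need not be equal.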
Calculate the intersection/consistency of variants across multiple VCFs.
0
1
consistency
versions
Structural variant comparison tool for VCFs
Normalization of SVs into disjoint genomic regions
0
1
vcf
versions
Structural variant comparison tool for VCFs
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
0
bam
log
tsv_edit_distance
tsv_per_umi
tsv_umi_per_position
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
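UMI-tools supports several methods for deciding which UMIs at the same mapping coordinate belong to the same original molecule. The following Python sketch assumes the "directional" network rule: a UMI with count n_a absorbs a UMI one mismatch away with count n_b when n_a >= 2*n_b - 1. It is a simplified illustration for the UMIs at a single position, not UMI-tools' own code, and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts: dict[str, int]) -> list[list[str]]:
    """Group UMIs observed at one mapping position using the directional rule."""
    # Visit UMIs from most to least abundant so each group is seeded by its top UMI.
    umis = sorted(counts, key=lambda u: (-counts[u], u))
    # Directed edge u -> v when v is one mismatch away and sufficiently rarer than u.
    edges = {
        u: [v for v in umis
            if v != u and hamming(u, v) == 1 and counts[u] >= 2 * counts[v] - 1]
        for u in umis
    }
    seen: set[str] = set()
    groups: list[list[str]] = []
    for seed in umis:
        if seed in seen:
            continue
        seen.add(seed)
        stack, group = [seed], []
        while stack:  # depth-first walk along directed edges
            u = stack.pop()
            group.append(u)
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(group)
    return groups


# Deduplication keeps one read per group; the seed (first UMI) is the most abundant.
print(len(directional_groups({"AAAA": 100, "AAAT": 10, "TTTT": 50})))  # 2
```

The count condition means that a frequent UMI absorbs a rare neighbour, which is likely a sequencing error. Two similarly abundant neighbours are kept apart as distinct molecules.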
Group reads based on their UMI and mapping coordinates
0
1
2
0
0
log
bam
tsv
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
The Java port of the VarDict variant caller
0
1
2
3
0
1
0
1
vcf
versions
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
0
1
2
0
0
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
0
1
0
1
0
1
alignment_properties_json
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
0
1
2
3
4
0
1
0
1
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
Command line tools for parsing and manipulating VCF files.
0
1
2
vcf
versions
Command line tools for parsing and manipulating VCF files.
Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
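The vcfuniq behaviour described above works like GNU uniq: only a record identical in position, REF and ALT to the immediately preceding record is dropped. The following Python sketch shows this over a hypothetical minimal record tuple (CHROM, POS, REF, ALT, ...). It is not vcflib's implementation. As with uniq, a non-adjacent duplicate survives unless the input is sorted.

```python
from typing import Iterable, Iterator

# Hypothetical minimal record: (CHROM, POS, REF, ALT, ...any further fields)
Record = tuple


def vcf_uniq(records: Iterable[Record]) -> Iterator[Record]:
    """Yield records, dropping any whose CHROM/POS/REF/ALT equal the previous record's."""
    previous_key = None
    for record in records:
        key = record[:4]
        if key != previous_key:
            yield record
        previous_key = key
```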
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
0
1
2
0
log
selfsm
depthsm
selfrg
depthrg
bestsm
bestrg
versions
verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
0
1
2
0
1
2
0
0
log
ud
bed
mu
self_sm
ancestry
versions
A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
0
1
2
3
0
1
0
1
graph
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
0
1
0
0
vcf
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Index a variation graph
0
1
xg
vg_index
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Calculate secondary structures of two RNAs with dimerization
0
1
rnacofold_csv
rnacofold_ps
versions
Calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration-dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and "dot plot" files containing the pair probabilities; see the RNAfold man page for details. In the dot plots, a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end-of-file condition is encountered.
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
0
1
rnafold_txt
rnafold_ps
versions
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure, and prints the mfe structure in bracket notation and its free energy. Unless specified otherwise via command-line arguments, input is accepted from stdin or read from an input file, and output is printed to stdout. If the -p option is given, it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
Calculate locally stable secondary structures of RNAs
0
rnalfold_txt
versions
Calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structures with a maximal base pair span. For a sequence of length n and a base pair span of L, the algorithm uses only O(n + L*L) memory and O(n*L*L) CPU time. Thus it is practical to "scan" very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
Extract sequences that were left unbinned by vRhyme into a FASTA file
0
1
0
1
unbinned_sequences
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Link bins output by vRhyme to create one sequence per bin
0
1
linked_bins
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Binning virus genomes from metagenomes
0
1
0
1
bins
membership
summary
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
0
1
aln
biom
mothur
otu
bam
out
blast
uc
centroids
clusters
profile
msa
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Merge strictly identical sequences contained in the input file. Identical sequences are defined as having the same length and the same string of nucleotides (case-insensitive; T and U are considered the same).
0
1
fasta
clustering
log
versions
A versatile open source tool for metagenomics (USEARCH alternative)
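The following Python sketch illustrates dereplication under the identity rule stated above: sequences are compared case-insensitively with U treated as T, and each unique sequence is reported with its abundance. The function names are hypothetical. As a simplification, representatives are returned in normalized form (uppercase, U replaced by T), and the ";size=N" labels only mimic the abundance annotation style vsearch can write. The code does not reproduce vsearch's exact output.

```python
from collections import Counter


def normalize(seq: str) -> str:
    """Case-insensitive comparison with T and U treated as the same base."""
    return seq.upper().replace("U", "T")


def dereplicate(seqs: list[str]) -> list[tuple[str, int]]:
    """Collapse strictly identical sequences; return (sequence, abundance), most abundant first."""
    counts = Counter(normalize(s) for s in seqs)
    # sorted() is stable, so equally abundant sequences keep their first-seen order.
    return sorted(counts.items(), key=lambda item: -item[1])


for i, (seq, size) in enumerate(dereplicate(["ACGU", "acgt", "GGGG", "ACGT"]), start=1):
    print(f">uniq{i};size={size}\n{seq}")
```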
Performs quality filtering and/or conversion of a FASTQ file to FASTA format.
0
1
fasta
log
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Taxonomic classification using the sintax algorithm.
0
1
0
tsv
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
0
1
0
fasta
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
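The following Python sketch shows the two sort orders on (header, sequence) pairs. Abundance is read from a ";size=N" header annotation, such as the one written by vsearch's --sizeout option. Two assumptions apply here: a missing annotation is treated as abundance 1, and ties keep their input order. vsearch's own tie-breaking may differ. The function names are hypothetical.

```python
import re

SIZE_RE = re.compile(r";size=(\d+)")


def abundance(header: str) -> int:
    """Read a ';size=N' annotation from a FASTA header; assume 1 when absent."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def sort_by_size(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Decreasing abundance; Python's sort is stable, so ties keep input order.
    return sorted(records, key=lambda r: abundance(r[0]), reverse=True)


def sort_by_length(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Decreasing sequence length; ties keep input order.
    return sorted(records, key=lambda r: len(r[1]), reverse=True)
```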
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
0
1
0
0
0
0
aln
biom
lca
mothur
otu
sam
tsv
txt
uc
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Decomposes multiallelic variants in a VCF file into biallelic variants.
0
1
2
vcf
versions
A tool set for short variant discovery in genetic sequence data
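The following Python sketch illustrates multiallelic decomposition: each ALT allele gets its own record, and genotypes are recoded so that allele i becomes 1 in the i-th record. Tools differ in how they recode the other ALT alleles. This sketch marks them missing ("."), which is a choice made here and not necessarily what vt does. It also ignores INFO/FORMAT fields other than GT and does not trim alleles. The function names are hypothetical.

```python
def split_gt(gt: str, alt_index: int) -> str:
    """Recode a GT string for the record carrying ALT number alt_index (1-based)."""
    sep = "|" if "|" in gt else "/"
    recoded = []
    for allele in gt.split(sep):
        if allele in (".", "0"):
            recoded.append(allele)  # missing and REF alleles are unchanged
        elif int(allele) == alt_index:
            recoded.append("1")  # this record's ALT
        else:
            recoded.append(".")  # another ALT: marked missing (a choice in this sketch)
    return sep.join(recoded)


def decompose(chrom: str, pos: int, ref: str, alts: str, gts: list[str]):
    """Split a record with comma-separated ALTs into one biallelic record per ALT."""
    return [
        (chrom, pos, ref, alt, [split_gt(gt, i) for gt in gts])
        for i, alt in enumerate(alts.split(","), start=1)
    ]
```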
Decomposes biallelic block substitutions into their constituent SNPs.
0
1
2
3
vcf
versions
A tool set for short variant discovery in genetic sequence data
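In the simplest case, a block substitution has REF and ALT of equal length, and decomposing it means emitting one SNP at every offset where the two alleles differ. The following Python sketch covers only that case. vt's implementation may handle further cases, and the function name is hypothetical.

```python
def decompose_blocksub(pos: int, ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Split an equal-length REF/ALT block into one SNP per differing base (1-based pos)."""
    if len(ref) != len(alt):
        raise ValueError("block substitutions need REF and ALT of equal length")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref.upper(), alt.upper()))
        if r != a
    ]
```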
Normalizes variants in a VCF file
0
1
2
3
0
1
0
1
vcf
fai
versions
A tool set for short variant discovery in genetic sequence data
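Variant normalization usually means left-aligning a variant and trimming it to its most parsimonious representation against the reference sequence. The following Python sketch implements the common trim-and-extend procedure for one biallelic variant:
1. Drop shared trailing bases.
2. Whenever an allele becomes empty, extend both alleles one reference base to the left.
3. Finally, drop shared leading bases while both alleles keep at least one base.

This is an illustration of the technique, not vt's code. The function name is hypothetical, and an allele that empties at position 1 is simply left empty here.

```python
def normalize_variant(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and parsimoniously trim one biallelic variant.

    ref_seq is the chromosome sequence; pos is 1-based, as in VCF.
    """
    changed = True
    while changed:
        changed = False
        # Trim a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting "CA" inside the CACACA repeat is shifted to its leftmost equivalent position.
print(normalize_variant("GCACACAT", 5, "ACA", "A"))  # (1, 'GCA', 'G')
```

Normalizing to a single representation lets the same deletion or insertion, reported at different positions inside a repeat, compare as identical across callsets.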
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
0
1
2
0
0
vcf
tbi
graph
versions