Available Modules
Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Screen assemblies for antimicrobial resistance against multiple databases
0
1
0
report
versions
Mass screening of contigs for antibiotic resistance genes
Screen assemblies for antimicrobial resistance against multiple databases
0
1
report
versions
Mass screening of contigs for antibiotic resistance genes
A NATA accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes
0
1
matches
partials
virulence
out
txt
versions
A pipeline for running AMRfinderPlus and collating results into functional classes
Trim sequencing adapters and collapse overlapping reads
0
1
0
singles_truncated
discarded
paired_truncated
collapsed
collapsed_truncated
paired_interleaved
settings
versions
Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.
0
1
fixed_fastq
versions
ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).
0
1
2
3
0
ancestry_fractions
allele_frequencies
versions
Read CEL files into an ExpressionSet and generate a matrix
0
1
2
0
1
rds
expression
annotation
versions
Methods for Affymetrix Oligonucleotide Arrays
Converts a GFF/GTF file into a proper GTF file
0
1
output_gtf
log
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Converts a GFF/GTF file into a TSV file
0
1
tsv
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file
0
1
output_gff
log
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Add intron features to gtf/gff file without intron features.
0
1
0
gff
versions
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
The script removes features based on a kill list. The default behaviour is to look at the feature's ID. If the feature has an ID (case insensitive) listed in the kill list, it will be removed. Warning: removing a level1 or level2 feature will automatically remove all linked subfeatures, and removing all children of a feature will automatically remove this feature too.
0
1
0
0
gff
versions
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.
0
1
0
gff
versions
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
Provides different types of statistics in text format from a GFF/GTF annotation file
0
1
stats_txt
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Provides basic statistics in text format from a GFF/GTF annotation file
0
1
stats_txt
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants
0
1
summary
results_dir
versions
Generates a count of coverage of alleles
0
1
2
0
0
allelecount
versions
A tool to parse and summarise results from antimicrobial peptides tools and present functional classification.
0
1
0
0
sample_dir
txt
csv
faa
summary_csv
summary_html
log
results_db
results_db_dmnd
results_db_fasta
results_db_tsv
versions
A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.
0
cluster_tsv
rep_cluster_tsv
log
versions
A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.
A submodule that merges all output summary tables from ampcombi/parsetables in one summary file.
0
tsv
log
versions
This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.
0
1
0
0
0
sample_dir
contig_gbks
txt
tsv
faa
sample_log
full_log
results_db
results_db_dmnd
results_db_fasta
results_db_tsv
versions
A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.
A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.
0
1
0
0
0
amps_faa
amps_tsv
versions
AMPlify is an attentive deep learning model for antimicrobial peptide prediction.
0
1
0
tsv
versions
Attentive deep learning model for antimicrobial peptide prediction
Post-processing script of the MaltExtract component of the HOPS package
0
0
0
json
summary_pdf
tsv
candidate_pdfs
versions
Identify antimicrobial resistance in gene or protein sequences
0
1
0
report
mutation_report
versions
tool_version
db_version
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Identify antimicrobial resistance in gene or protein sequences
NO input
db
versions
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
A tool to estimate nuclear contamination in males based on heterozygosity in the X chromosome.
0
1
0
1
txt
versions
ANGSD: Analysis of next generation Sequencing Data
Calculates base frequency statistics across reference positions from BAM.
0
1
2
3
depth_sample
depth_global
qs
pos
counts
icounts
versions
ANGSD: Analysis of next generation Sequencing Data
Calculates genotype likelihoods from BAM files.
0
1
0
1
0
1
genotype_likelihood
versions
ANGSD: Analysis of next generation Sequencing Data
Annotation and Ranking of Structural Variation
0
1
2
3
0
1
0
1
0
1
0
1
tsv
unannotated_tsv
vcf
versions
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
NO input
annotations
versions
Annotation and Ranking of Structural Variation
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
0
1
2
3
0
1
2
translated_mrna
total_mrna
translation
buffering
mrna_abundance
rdata
fold_change_plot
interaction_p_distribution_plot
residual_distribution_summary_plot
residual_vs_fitted_plot
rvm_fit_for_all_contrasts_group_plot
rvm_fit_for_interactions_plot
rvm_fit_for_omnibus_group_plot
simulated_vs_obt_dfbetas_without_interaction_plot
session_info
versions
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
0
1
0
0
clusterblast_file
html_accessory_files
knownclusterblast_html
knownclusterblast_dir
knownclusterblast_txt
svg_files_clusterblast
svg_files_knownclusterblast
gbk_input
json_results
log
zip
gbk_results
clusterblastoutput
html
knownclusterblastoutput
json_sideloading
versions
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
0
0
0
database
antismash_dir
versions
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.
0
1
extracted_reads_fastq
log
intermediate_sam
intermediate_bam
intermediate_sorted_bam
versions
arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).
0
1
0
0
tsv
versions
Download and prepare database for Ariba analysis
0
1
db
versions
ARIBA: Antibiotic Resistance Identification By Assembly
Query input FASTQs against Ariba formatted databases
0
1
0
1
results
versions
ARIBA: Antibiotic Resistance Identification By Assembly
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
0
1
0
1
0
1
0
0
0
0
fusions
fusions_fail
versions
Fast and accurate gene fusion detection from RNA-Seq data
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
0
blacklist
cytobands
protein_domains
known_fusions
versions
Fast and accurate gene fusion detection from RNA-Seq data
Simulation tool to generate synthetic Illumina next-generation sequencing reads
0
1
0
0
0
fastq
aln
sam
versions
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
Aggregates fastq files with demultiplexed reads
0
1
fastq
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
0
1
0
0
0
0
0
0
0
0
results
bam
bai
bam_trimmed
bai_trimmed
bam_primertrimmed
bai_primertrimmed
fasta
vcf
tbi
json
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
copy number profiles of tumour cells.
0
1
2
3
4
0
0
0
0
0
0
allelefreqs
bafs
cnvs
logrs
metrics
png
purityploidy
segments
versions
Alignment by Simultaneous Harmonization of Layer/Adjacency Registration
0
1
0
0
tif
versions
Assembly summary statistics in JSON format
0
1
json
versions
ataqv function of a corresponding ataqv tool
0
1
2
3
0
0
0
0
0
json
problems
versions
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
mkarv function of a corresponding ataqv tool
0
html
versions
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
Generate a VCF file from a BAM file using various calling methods
0
1
2
3
4
0
0
0
0
vcf
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Estimate the post-mortem damage patterns of DNA
0
1
2
3
0
0
empiric
exponential
counts
table
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Gives an estimation of the sequencing bias based on known invariant sites
0
1
2
3
4
0
0
recal_patterns
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Split single-end read groups by length and merge paired-end reads
0
1
2
3
4
bam
txt
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Generate tables of feature metadata from GTF files
0
1
0
1
feature_annotation
filtered_cdna
versions
Scripts for manipulating gene annotation
Use deamination patterns to estimate contamination in single-stranded libraries
0
1
0
1
0
1
txt
versions
Estimates present-day DNA contamination in ancient DNA single-stranded libraries.
Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.
0
1
0
1
backsub_tif
markerout
versions
Annotation of bacterial genomes (isolates, MAGs) and plasmids
0
1
0
0
0
embl
faa
ffn
fna
gbff
gff
hypotheticals_tsv
hypotheticals_faa
tsv
txt
versions
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Downloads BAKTA database from Zenodo
NO input
db
versions
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data
0
1
2
fastq
versions
Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files
Removes unused references from the header of sorted BAM/CRAM files.
0
1
bam
versions
This module is used to clip primer sequences from your alignments.
0
1
2
3
bam
bai
versions
Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).
0
1
2
primary_filtered_bam
contamination_bam
versions
Compute mapping statistics from a BAM file
0
1
json
versions
A command line tool to compute mapping statistics from a BAM file
Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis
0
1
fastq
versions
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
0
1
bam
versions
C++ API & command-line toolkit for working with BAM data
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
0
1
stats
versions
C++ API & command-line toolkit for working with BAM data
Render an assembly graph in GFA 1.0 format to PNG and SVG image formats
0
1
png
svg
versions
Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
Demultiplex Element Biosciences bases files
0
1
2
sample_fastq
sample_json
qc_report
run_stats
generated_run_manifest
metrics
unassigned
versions
BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.
0
1
fields
versions
Adapter and quality trimming of sequencing reads
0
1
0
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Merging overlapping paired reads into a single read.
0
1
0
merged
unmerged
ihist
versions
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.
0
1
fastq
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Create 30% smaller, faster gzipped FASTQ files and remove duplicates
0
1
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Filter out sequences by sequence header name(s)
0
1
0
0
0
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.
0
index
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.
0
1
covstats
hist
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Compares query sketches to reference sketches hosted on a remote server via the Internet.
0
1
hits
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
0
1
2
0
0
0
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Concatenate VCF files
0
1
2
vcf
tbi
csi
versions
Concatenate VCF files.
Creates a consensus sequence from a VCF file and a reference FASTA file
0
1
2
3
4
fasta
versions
Create consensus sequence by applying VCF variants to a reference fasta file.
Converts certain output formats to VCF
0
1
2
0
1
0
vcf_gz
vcf
bcf_gz
bcf
hap
legend
samples
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
bcftools Haplotype-aware consequence caller
0
1
0
1
0
1
0
1
vcf
tbi
csi
versions
Haplotype-aware consequence caller
Filters VCF files
0
1
2
vcf
tbi
csi
versions
Apply fixed-threshold filters to VCF files.
Index VCF files
0
1
csi
tbi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Apply set operations to VCF files
0
1
2
results
versions
Computes intersections, unions and complements of VCF files.
Merge VCF files
0
1
2
0
1
0
1
0
1
vcf
index
versions
Merge VCF files.
Generates genotype likelihoods (mpileup) and calls variants into VCF files
0
1
2
0
1
0
vcf
tbi
stats
mpileup
versions
Generates genotype likelihoods at each genomic position with coverage.
Normalize VCF file
0
1
2
0
1
vcf
tbi
csi
versions
Normalize VCF files.
Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
0
1
2
0
0
vcf
tbi
csi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available
Split VCF by chunks or regions, creating multiple VCFs.
0
1
2
0
0
0
0
0
scatter
tbi
csi
versions
Split VCF by chunks or regions, creating multiple VCFs.
Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
0
1
2
0
0
0
0
vcf
tbi
csi
versions
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
Split VCF by sample, creating single- or multi-sample VCFs.
0
1
2
0
0
0
0
vcf
tbi
csi
versions
Split VCF by sample, creating single- or multi-sample VCFs.
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
0
1
2
0
0
vcf
tbi
csi
versions
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
Extracts fields from VCF or BCF files and outputs them in user-defined format.
0
1
2
0
0
0
output
versions
Extracts fields from VCF or BCF files and outputs them in user-defined format.
Reheader a VCF file
0
1
2
3
0
1
vcf
index
versions
Modify header of VCF/BCF files, change sample names.
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
0
1
2
0
1
0
0
0
0
roh
versions
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
Sorts VCF files
0
1
vcf
tbi
csi
versions
Sort VCF files by coordinates.
Split a vcf file into files per chromosome
0
1
2
split_vcf
versions
Sort VCF files by coordinates.
Generates stats from VCF files
0
1
2
0
1
0
1
0
1
0
1
0
1
stats
versions
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
0
1
2
0
0
0
vcf
tbi
csi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Demultiplex Illumina BCL files
0
1
2
fastq
fastq_idx
undetermined
undetermined_idx
reports
stats
interop
versions
Demultiplex Illumina BCL files
0
1
2
fastq
fastq_idx
undetermined
undetermined_idx
reports
logs
interop
versions
Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.
0
1
0
0
0
0
vcf
log
versions
Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
Convert a BED file to a VCF file according to a YAML config
0
1
2
0
1
vcf
versions
Convert BAM/GFF/GTF/GVF/PSL files to bed
0
1
bed
versions
High-performance genomic feature operations.
Convert gtf format to bed format
0
1
bed
versions
The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
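The coordinate change described above can be illustrated with a minimal Python sketch. This is not the gtf2bed implementation; the function name `gtf_to_bed_interval` is hypothetical and only shows the arithmetic of moving from 1-based closed to 0-based half-open coordinates.

```python
from typing import Tuple


def gtf_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based closed [start, end] interval to 0-based half-open [start-1, end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the end is unchanged because it becomes exclusive.
    return start - 1, end


gtf_start, gtf_end = 11869, 14409  # example values chosen for illustration
bed_start, bed_end = gtf_to_bed_interval(gtf_start, gtf_end)

# The feature length is the same in both conventions.
assert (gtf_end - gtf_start + 1) == (bed_end - bed_start)
print(bed_start, bed_end)
```

Because the end coordinate is exclusive in BED, the feature length is simply `end - start`, with no `+ 1` correction.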
Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.
0
1
0
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
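As a rough illustration of what "intervals not covered by the input" means, the following Python sketch computes the complement for a single chromosome. It is an assumption-laden toy, not the bedtools code: intervals are 0-based half-open tuples, and `complement` and `chrom_length` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def complement(intervals: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the stretches of [0, chrom_length) not covered by any input interval."""
    gaps: List[Interval] = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


print(complement([(10, 20), (15, 30), (50, 60)], 100))
```

Sorting first lets a single pass handle overlapping inputs, since the cursor only ever moves forward.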
Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.
0
1
2
0
0
0
genomecov
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Extracts sequences in a FASTA file based on intervals defined in a feature file.
0
1
0
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.
0
1
0
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Allows one to screen for overlaps between two sets of genomic features.
0
1
2
0
1
intersect
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Calculates the Jaccard statistic between two feature files.
0
1
2
0
1
tsv
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
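For readers unfamiliar with the Jaccard statistic, here is a minimal Python sketch of the idea on a single chromosome: the number of base pairs shared by two interval sets divided by the number of base pairs covered by either. This is an illustrative assumption about the calculation, not the bedtools implementation, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _covered(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Intersection length divided by union length, both in base pairs."""
    a, b = _merge(a), _merge(b)
    i = j = 0
    inter = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            inter += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = _covered(a) + _covered(b) - inter
    return inter / union if union else 0.0


print(jaccard([(0, 10)], [(5, 15)]))
```

A value of 0 means the two sets share no bases, and a value of 1 means they cover exactly the same bases.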
Makes adjacent or sliding windows across a genome or BED file.
0
1
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
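The difference between adjacent and sliding windows can be sketched in a few lines of Python. This is a simplified illustration for one sequence of known length, not the bedtools implementation; `make_windows`, `size` and `step` are hypothetical names.

```python
from typing import List, Optional, Tuple


def make_windows(length: int, size: int, step: Optional[int] = None) -> List[Tuple[int, int]]:
    """Tile [0, length) with windows of `size` bases, starting every `step` bases.

    With step == size the windows are adjacent; with step < size they overlap (sliding).
    """
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # The last windows are truncated at the end of the sequence.
    return [(start, min(start + size, length)) for start in range(0, length, step)]


print(make_windows(10, 4))     # adjacent windows
print(make_windows(10, 4, 2))  # sliding windows with 50% overlap
```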
Masks sequences in a FASTA file based on intervals defined in a feature file.
0
1
0
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.
0
1
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
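The merging of overlapping or book-ended features can be sketched as a sort followed by a single pass. The Python below is an illustration under stated assumptions (one chromosome, 0-based half-open tuples), not the bedtools code; `merge_intervals` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or book-ended intervals into single spanning intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # start == previous end means the two features are book-ended.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(0, 10), (10, 20), (25, 30), (28, 40)]))
```

In half-open coordinates two features are book-ended when one starts exactly where the previous one ends, which is why the comparison uses `<=` rather than `<`.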
Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.
0
1
0
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Profiles the nucleotide content of intervals in a fasta file.
0
1
2
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)
0
1
0
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Sorts a feature file by chromosome and other criteria.
0
1
0
sorted
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Split BED files into several smaller BED files
0
1
2
beds
versions
A powerful toolset for genome arithmetic
Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.
0
1
2
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines multiple BedGraph files into a single file
0
1
0
1
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Locate and tag duplicate reads in a BAM file
0
1
bam
metrics
versions
biobambam is a set of tools for early stage alignment file processing.
Merge a list of sorted bam files
0
1
bam
bam_index
checksum
versions
biobambam is a set of tools for early stage alignment file processing.
Parallel sorting and duplicate marking
0
1
0
1
bam
bam_index
cram
metrics
versions
biobambam is a set of tools for early stage alignment file processing.
Use k-mers to rapidly subtype S. enterica genomes
0
1
0
summary
kmer_results
simple_summary
versions
Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.
0
1
0
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
0
1
0
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Summarize and/or filter reads based on bisulfite conversion rate
0
1
2
0
bsconv_bam
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.
0
1
2
3
0
epiread_bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Indexes a reference genome for use with Biscuit
0
index
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Merges methylation information for opposite-strand C's in a CpG context
0
1
0
mergecg_bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants
0
1
2
3
4
0
vcf
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Perform basic quality control on a BAM file generated with Biscuit
0
1
0
biscuit_qc_reports
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.
0
1
bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Performs alignment of BS-Seq reads using bismark
0
1
0
1
0
1
bam
report
unmapped
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Relates methylation calls back to genomic cytosine contexts.
0
1
0
1
0
1
coverage
report
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Removes alignments to the same position in the genome from the Bismark mapping output.
0
1
bam
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.
0
1
index
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Extracts methylation information for individual cytosines from alignments.
0
1
0
1
bedgraph
methylation_calls
coverage
report
mbias
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Collects bismark alignment reports
0
1
2
3
4
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.
0
0
0
0
0
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Retrieve entries from a BLAST database
0
1
2
0
1
fasta
text
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
txt
versions
BLAST finds regions of similarity between biological sequences.
BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
0
1
0
1
0
xml
tsv
csv
versions
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
Builds a BLAST database
0
1
db
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
txt
versions
Protein to Translated Nucleotide BLAST.
Downloads a BLAST database from NCBI
0
1
db
versions
BLAST finds regions of similarity between biological sequences.
Create bowtie index for reference genome
0
1
index
versions
bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.
0
1
0
reports
txt
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Extends a Kraken2 database to be compatible with Bracken
0
1
db
bracken_files
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Combine output of metagenomic samples analyzed by bracken.
0
1
txt
versions
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Benchmarking Universal Single Copy Orthologs
meta
fasta
mode
lineage
busco_lineages_path
config_file
meta
batch_summary
short_summaries_txt
short_summaries_json
busco_dir
full_table
missing_busco_list
single_copy_proteins
seq_dir
translated_proteins
versions
Benchmarking Universal Single Copy Orthologs
0
1
0
0
0
0
batch_summary
short_summaries_txt
short_summaries_json
full_table
missing_busco_list
single_copy_proteins
seq_dir
translated_dir
busco_dir
versions
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
BUSCO plot generation tool
0
png
versions
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Create BWA-mem2 index for reference genome
0
1
index
versions
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create BWA-MEME index for reference genome
0
1
index
versions
Faster BWA-MEM2 using learned-index
Performs alignment of BS-Seq reads using bwameth
0
1
0
1
0
1
bam
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Performs indexing of c2t converted reference genome
0
1
index
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
A module for concatenation of gzipped or uncompressed files
0
1
file_out
versions
Just concatenation
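The reason gzipped inputs can be concatenated directly is that the gzip format permits several compressed members back to back in one stream. The sketch below illustrates that property with Python's standard library; it is not the module's implementation, and the helper name concat_gzip_members and the sample records are made up for illustration.

```python
import gzip


def concat_gzip_members(chunks):
    """Join already-gzipped byte strings without recompressing them.

    Hypothetical helper: a byte-level join is enough because a gzip
    stream may contain several members one after another.
    """
    return b"".join(chunks)


# Two independently compressed pieces (illustrative data only).
part_a = gzip.compress(b"@read1\nACGT\n")
part_b = gzip.compress(b"@read2\nTTGA\n")

combined = concat_gzip_members([part_a, part_b])

# gzip.decompress reads every member in the stream in order.
restored = gzip.decompress(combined)
```

Uncompressed inputs can be joined the same way, but mixing compressed and uncompressed inputs in one output would not produce a valid file.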
Concatenates fastq files
0
1
reads
versions
The cat utility reads files sequentially, writing them to the standard output.
Cluster protein sequences using sequence similarity
0
1
fasta
clusters
versions
Clusters and compares protein or nucleotide sequences
Cluster nucleotide sequences using sequence similarity
0
1
fasta
clusters
versions
Clusters and compares protein or nucleotide sequences
Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth
0
1
0
0
0
celltypes
quality
versions
Module to use CellBender to remove ambient RNA from single-cell RNA-seq data
0
1
2
3
h5ad
versions
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data
0
1
h5
filtered_h5
posterior_h5
barcodes
metrics
report
pdf
log
checkpoint
versions
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.
0
1
0
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.
0
1
2
fastq
undetermined_fastq
reports
stats
interop
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.
0
gtf
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.
0
0
0
reference
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.
0
0
0
0
reference
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.
0
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
config
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.
0
1
0
outs
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.
0
1
2
3
0
outs
lib
versions
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.
0
0
versions
fastq
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.
0
gtf
versions
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.
0
0
0
0
0
reference
config
versions
Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.
Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.
0
1
0
outs
versions
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.
0
0
versions
fastq
Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.
0
0
0
0
0
reference
versions
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.
0
1
2
3
4
base
cell
sample
allele_depth
depth_coverage
depth_other
versions
Efficient genotyping of bi-allelic SNPs on single cells
Build centrifuge database for taxonomic profiling
0
1
0
0
0
0
cf
versions
Classifier for metagenomic sequences
Classifies metagenomic sequence data
0
1
0
0
0
report
results
sam
fastq_mapped
fastq_unmapped
versions
Centrifuge is a classifier for metagenomic sequences.
Creates Kraken-style reports from centrifuge out files
0
1
0
kreport
versions
Centrifuge is a classifier for metagenomic sequences.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
0
1
0
0
checkm_output
marker_file
checkm_tsv
versions
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
0
1
2
3
0
output
fasta
versions
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM2 database download
0
database
versions
CheckM2 - Rapid assessment of genome bin quality using machine learning
CheckM2 bin quality prediction
0
1
0
1
checkm2_output
checkm2_tsv
versions
CheckM2 - Rapid assessment of genome bin quality using machine learning
Construct the database necessary for checkv's quality assessment
NO input
checkv_db
versions
Assess the quality of metagenome-assembled viral genomes.
Assess the quality of metagenome-assembled viral genomes.
0
1
0
quality_summary
completeness
contamination
complete_genomes
proviruses
viruses
versions
Assess the quality of metagenome-assembled viral genomes.
Construct the database necessary for checkv's quality assessment
0
1
0
checkv_db
versions
Assess the quality of metagenome-assembled viral genomes.
Create a schema to determine the allelic profiles of a genome
0
1
0
0
schema
cds_coordinates
invalid_cds
versions
A complete suite for gene-by-gene schema creation and strain identification.
Filter and trim long read data.
0
1
0
fastq
versions
zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.
Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).
Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.
0
1
0
1
0
1
0
0
0
0
bed
bam
tagAlign
pairs
versions
Fast alignment and preprocessing of chromatin profiles
Indexes a fasta reference genome ready for chromatin profiling.
0
1
index
versions
Fast alignment and preprocessing of chromatin profiles
Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.
0
1
0
1
0
1
0
1
0
1
0
1
0
1
plots
versions
Annotate circRNAs detected in the output from CIRCexplorer2 parse
0
1
0
0
txt
versions
Circular RNA analysis toolkits
CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.
0
1
junction
versions
Circular RNA analysis toolkit
A method to improve mappings on circular genomes, using the BWA mapper.
0
1
0
1
0
1
fasta
elongated
versions
Creating a modified reference genome, with an elongation of a specified number of bases
Realign reads mapped with BWA to elongated reference genome
0
1
0
1
0
1
0
1
bam
versions
A method to improve mappings on circular genomes such as Mitochondria.
Predict recombination events in bacterial genomes
0
1
2
emsim
em
status
newick
fasta
pos_ref
versions
Align sequences using Clustal Omega
0
1
0
1
0
alignment
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Parallel implementation of the gzip algorithm.
Renders a guidetree in clustalo
0
1
tree
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Calculates polymorphic site rates over protein coding genes
0
1
2
3
4
polymut
versions
Set of utilities on sequences and BAM files
Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.
0
1
0
1
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Derive off-target (“antitarget”) bins from target regions.
0
1
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
0
1
2
0
1
0
1
0
1
0
1
0
bed
cnn
cnr
cns
pdf
png
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number
0
1
2
cns
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
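To make the step from a segment's log2 ratio to an integer copy number concrete, here is a minimal sketch of one common simplification: assume a pure sample with known ploidy, scale the ratio back to copies, and round. This is an assumption for illustration and not necessarily what this module or CNVkit does by default (threshold-based calling and purity correction are not shown); the function name absolute_copy_number is hypothetical.

```python
def absolute_copy_number(log2_ratio, ploidy=2):
    """Convert a segment log2 ratio to an integer copy number.

    Assumes a pure sample: a log2 ratio of 0 corresponds to `ploidy`
    copies, so the estimate is ploidy * 2**log2_ratio, rounded to the
    nearest integer and never below zero.
    """
    return max(0, round(ploidy * 2 ** log2_ratio))


neutral = absolute_copy_number(0.0)       # no change from ploidy
single_loss = absolute_copy_number(-1.0)  # half the reference signal
gain = absolute_copy_number(1.0)          # double the reference signal
deep_loss = absolute_copy_number(-5.0)    # almost no signal left
```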
Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.
0
1
output
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
0
1
2
tsv
cnn
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Compile a coverage reference from the given files (normal samples).
0
0
0
cnn
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Transform bait intervals into targets more suitable for CNVkit.
0
1
0
1
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.
0
1
2
0
1
0
1
0
1
root
tab
versions
Tool for calling copy number variations.
convert2vcf.pl is command line tool to convert CNVnator calls to vcf format.
0
1
vcf
versions
Tool for calling copy number variations.
Command line tool for calling CNVs in whole genome sequencing data
0
1
0
pytor
versions
calling CNVs using read depth
calculates read depth histograms
0
1
0
pytor
versions
calling CNVs using read depth
command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.
0
1
2
0
0
pytor
versions
calling CNVs using read depth
partitioning read depth histograms
0
1
0
pytor
versions
calling CNVs using read depth
view function to generate vcfs
0
1
0
0
vcf
tsv
xls
versions
calling CNVs using read depth
A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.
0
1
0
1
0
1
0
1
0
0
0
self_circular
extended_circular
extended_partial
extended_failed
orphan_end
all_cobra_assemblies
joining_summary
log
versions
Builds a classic bloom filter COBS index
0
1
index
versions
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Builds a compact bloom filter COBS index
0
1
index
versions
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples
0
1
2
args_txt
clustering_csv
log_txt
original_data_csv
pca_components_csv
pca_transformed_csv
versions
Clustering cONtigs with COverage and ComposiTion
Generate the input coverage table for CONCOCT using a BEDFile
0
1
2
3
tsv
versions
Clustering cONtigs with COverage and ComposiTion
Calculate confidence scores from Kraken2 output
0
1
0
score
versions
Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC
0
1
2
p_value_txt
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Copy number and genotype annotation from whole genome and whole exome sequencing data
0
1
2
3
4
5
6
0
0
0
0
0
0
0
0
0
bedgraph
control_cpn
sample_cpn
gcprofile_cpn
BAF
CNV
info
ratio
config
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
bed
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Format Freec output to circos input format
0
1
circos
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
2
3
png_baf
png_ratio_log2
png_ratio
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
2
png_baf
png_ratio_log2
png_ratio
versions
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Run matrix balancing on a cool file
0
1
2
cool
versions
Sparse binary format for genomic interaction matrices
Create a cooler from genomic pairs and bins
0
1
2
3
0
cool
versions
Sparse binary format for genomic interaction matrices
Generate fragment-delimited genomic bins
0
0
0
bed
versions
Sparse binary format for genomic interaction matrices
Dump a cooler’s data to a text stream.
0
1
2
bedpe
versions
Sparse binary format for genomic interaction matrices
Generate fixed-width genomic bins
0
1
2
bed
versions
Sparse binary format for genomic interaction matrices
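As an illustration of what fixed-width genomic bins look like, the sketch below tiles each chromosome with half-open, 0-based intervals as used in BED files, letting the last bin on each chromosome be shorter. It is a plain Python approximation written for this listing, not the cooler implementation; the function name fixed_width_bins and the example sizes are made up.

```python
def fixed_width_bins(chrom_sizes, bin_size):
    """Return (chrom, start, end) tuples covering each chromosome.

    chrom_sizes: iterable of (name, length) pairs.
    Intervals are 0-based and half-open; the final bin of a
    chromosome is truncated at the chromosome end.
    """
    bins = []
    for chrom, length in chrom_sizes:
        for start in range(0, length, bin_size):
            bins.append((chrom, start, min(start + bin_size, length)))
    return bins


example_bins = fixed_width_bins([("chr1", 25), ("chr2", 10)], bin_size=10)
```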
Merge multiple coolers with identical axes
0
1
cool
versions
Sparse binary format for genomic interaction matrices
Generate a multi-resolution cooler file by coarsening
0
1
mcool
versions
Sparse binary format for genomic interaction matrices
Indexes a directory of fasta files for use with CoPTR
0
1
index_dir
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.
0
1
cores
masks
tma_map
centroids
versions
Compress files with crabz
0
1
archive
versions
Like pigz, but rust
Decompress files with crabz
0
1
file
versions
Like pigz, but rust
Remove false positives of functional CRISPR genomics due to CNVs
0
1
2
0
0
norm_count_file
versions
Analysis of CRISPR functional genomics, remove false positives due to CNVs.
Concatenate two or more CSV (or TSV) tables into a single table
0
1
0
0
csv
versions
A cross-platform, efficient, practical CSV/TSV toolkit
Join two or more CSV (or TSV) tables by selected fields into a single table
0
1
csv
versions
A cross-platform, efficient, practical CSV/TSV toolkit
Splits CSV/TSV into multiple files according to column values
0
1
0
0
split_csv
versions
CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.
Custom module to add a new fasta file to an old one and update an associated GTF
0
1
2
0
1
0
fasta
gtf
versions
Custom module to add a new fasta file to an old one and update an associated GTF
Custom module used to dump software versions within the nf-core pipeline template
0
yml
mqc_yml
versions
Custom module used to dump software versions within the nf-core pipeline template
Generates a file of chromosome sizes and a FASTA index file
0
1
sizes
fai
gzi
versions
Tools for dealing with SAM, BAM and CRAM files
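For readers unfamiliar with a chromosome sizes file, it is a two-column table of sequence name and length. The sketch below derives those values from FASTA text in plain Python; it only illustrates the idea and does not reproduce how the module or its underlying tool computes them. The function name chrom_sizes and the sample records are hypothetical.

```python
def chrom_sizes(fasta_lines):
    """Map each sequence name to its length in bases.

    The name is the first whitespace-delimited token after '>'.
    Sequence lines are summed after stripping whitespace.
    """
    sizes = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


example = chrom_sizes([">chr1 test sequence", "ACGT", "AC", ">chr2", "GG"])
# One tab-separated line per sequence, in input order.
sizes_text = "".join(f"{name}\t{length}\n" for name, length in example.items())
```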
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
0
1
0
1
gtf
versions
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
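The filtering described here can be pictured as a two-step pass: collect the sequence names from the FASTA headers, then keep only GTF records whose first column is one of those names. The following is a minimal sketch under that reading; keeping comment lines that start with '#' is an assumption, and the function name filter_gtf_by_fasta is hypothetical rather than taken from the module.

```python
def filter_gtf_by_fasta(gtf_lines, fasta_lines):
    """Keep GTF records located on sequences named in the FASTA.

    Comment lines (starting with '#') are passed through unchanged,
    which is an assumption of this sketch.
    """
    names = {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in names:
            kept.append(line)
    return kept


fasta = [">chr1 primary", "ACGT", ">chr2", "GGCC"]
gtf = [
    "#!genome-build example",
    'chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "G1";',
    'chrUn\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "G2";',
]
filtered = filter_gtf_by_fasta(gtf, fasta)
```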
filter a matrix based on a minimum value and numbers of samples that must pass.
0
1
0
1
filtered
tests
session_info
versions
filter a matrix based on a minimum value and numbers of samples
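The filtering rule can be read as: keep a feature (row) when at least a given number of samples reach a minimum value. The sketch below shows that rule on a small dictionary of rows. Whether the module compares with "at least" or "strictly greater than" is not stated here, so the use of >= is an assumption, and the function name filter_matrix and the example counts are hypothetical.

```python
def filter_matrix(rows, min_value, min_samples):
    """Keep rows where enough samples reach the minimum value.

    rows: dict mapping a feature name to a list of per-sample values.
    A row passes when at least `min_samples` of its values are
    >= `min_value` (the comparison operator is an assumption).
    """
    return {
        feature: values
        for feature, values in rows.items()
        if sum(value >= min_value for value in values) >= min_samples
    }


counts = {
    "geneA": [0, 1, 0, 2],
    "geneB": [10, 12, 0, 9],
    "geneC": [5, 0, 0, 0],
}
passing = filter_matrix(counts, min_value=5, min_samples=2)
```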
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
ncbi_settings
versions
SRA Toolkit and SDK from NCBI
Make a GSEA class file (.cls) from tabular inputs
0
1
cls
versions
Make a GSEA class file (.cls) from tabular inputs
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
0
1
gct
versions
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.
0
1
0
1
0
0
0
tx2gene
versions
Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications
Perform adapter/quality trimming on sequencing reads
0
1
reads
log
versions
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
structural-variant calling with cutesv
0
1
2
0
1
vcf
versions
A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage
0
1
0
0
0
results
versions
DAS Tool binning step.
0
1
2
0
0
log
summary
contig2bin
eval
bins
pdfs
fasta_proteins
candidates_faa
fasta_archaea_scg
fasta_bacteria_scg
b6
seqlength
versions
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
0
1
0
fastatocontig2bin
versions
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
0
1
0
scaffolds2bin
versions
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.
0
report
versions
decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take into account the sign and weight. It can be used with any omics data type, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics gene sets regulated by a transcription factor, or in phospho-proteomics phosphosites that are targeted by a kinase.
0
1
0
dc_estimate
dc_pvals
versions
DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).
0
1
bam
json
hist
log
versions
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
NO input
db
versions
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
0
1
2
0
daa
daa_tsv
arg
potential_arg
versions
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.
NO input
db
versions
DeepBGC - Biosynthetic Gene Cluster detection and classification
DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.
0
1
0
readme
log
json
bgc_gbk
bgc_tsv
full_gbk
pfam_tsv
bgc_png
pr_png
roc_png
score_png
versions
DeepBGC - Biosynthetic Gene Cluster detection and classification
Deepcell/mesmer segmentation for whole-cell
0
1
0
1
mask
versions
Deep cell is a collection of tools to segment imaging data
A Deep Learning Model for Transmembrane Topology Prediction and Classification
0
1
gff3
line3
md
csv
png
versions
This tool filters alignments in a BAM/CRAM file according to the specified parameters.
0
1
2
bam
logs
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
0
1
2
0
0
bigwig
bedgraph
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
calculates scores per genome regions for other deeptools plotting utilities
0
1
0
matrix
table
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Computes read coverage for genomic regions (bins) across the entire genome.
0
1
2
3
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
0
1
0
0
pdf
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots cumulative reads coverages by BAM file
0
1
2
pdf
matrix
metrics
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a heatmap
0
1
pdf
table
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
0
1
pdf
tab
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a profile plot
0
1
pdf
table
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Call variants from the examples produced by make_examples
0
1
call_variants_tfrecords
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Transforms the input alignments to a format suitable for the deep neural network variant caller
0
1
2
3
0
1
0
1
0
1
0
1
examples
gvcf
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
report
versions
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Call structural variants
0
1
2
3
4
5
0
1
0
1
bcf
csi
versions
Structural variant discovery by integrated paired-end and split-read analysis
Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.
0
1
2
0
0
0
0
zarr
out_zarr
versions
runs a differential expression analysis with DESeq2
0
1
2
3
0
1
2
0
1
0
1
results
dispersion_plot
rdata
size_factors
normalised_counts
rlog_counts
vst_counts
model
session_info
versions
Differential gene expression analysis based on the negative binomial distribution
Queries a DIAMOND database using blastp mode
0
1
0
1
0
0
blast
xml
txt
daa
sam
tsv
paf
versions
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
0
1
0
1
0
0
blast
xml
txt
daa
sam
tsv
paf
log
versions
Accelerated BLAST compatible local sequence aligner
calculate clusters of highly similar sequences
0
1
tsv
versions
Accelerated BLAST compatible local sequence aligner
Builds a DIAMOND database
0
1
0
0
0
db
versions
Accelerated BLAST compatible local sequence aligner
Doublet detection in single-cell RNA-seq data
0
1
h5ad
predictions
versions
Create DRAGEN hashtable for reference genome
0
1
hashmap
versions
Dragmap is the Dragen mapper/aligner Open Source Software.
Assemble bacterial isolate genomes from Nanopore reads
0
1
2
contigs
log
raw_contigs
gfa
txt
versions
Export assembly segment sequences in GFA 1.0 format to FASTA format
0
1
fasta
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.
0
1
2
3
4
5
0
0
vcf
versions
Assessment of duplication rates in RNA-Seq datasets
0
1
0
1
scatter2d
boxplot
hist
dupmatrix
intercept_slope
multiqc
session_info
versions
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
0
1
2
0
1
2
vcf
tbi
versions
In silico prediction of E. coli serotype
0
1
log
tsv
txt
versions
Fast genome-wide functional annotation through orthology assignment.
0
1
0
0
0
1
annotations
orthologs
hits
versions
Convert any PEP project or Nextflow samplesheet to any format
0
0
0
versions
samplesheet_converted
Convert any PEP project or Nextflow samplesheet to any format
Provide the SNP coverage of each individual in an eigenstrat formatted dataset.
0
1
2
3
tsv
json
versions
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
Convert a file in FASTA format to the ELFASTA format
0
1
elfasta
log
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.
0
1
2
3
4
5
6
0
1
0
1
0
1
0
0
0
0
0
bam
logs
metrics
recall
gvcf
table
activity_profile
assembly_regions
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Merge split bam/sam chunks in one file
0
1
bam
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Split bam file into manageable chunks
0
1
bam
versions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.
0
1
consensus
versions
The European Molecular Biology Open Software Suite
The revseq program from EMBOSS reverse-complements a nucleotide sequence
0
1
revseq
versions
The European Molecular Biology Open Software Suite
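As a minimal sketch of what reverse-complementing a nucleotide sequence means (this is not EMBOSS code, and the function name is hypothetical), each base is replaced by its complement and the result is read backwards:

```python
# Complement table for unambiguous DNA bases, in both cases.
# Any other character (for example N) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```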
EMM typing of Streptococcus pyogenes assemblies
0
1
tsv
versions
endorS.py calculates endogenous DNA from samtools flagstat files and prints to screen
0
1
2
3
json
versions
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
0
1
2
3
cache
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
0
1
0
output
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
0
1
2
0
0
0
0
0
1
0
vcf
tab
json
report
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Searches a term in a public NCBI database
0
1
0
xml
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using Unique Identifier(s)
0
1
2
0
xml
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using a UID
0
1
0
0
0
txt
versions
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
phylogenetic placement of query sequences in a reference tree
0
1
2
3
0
0
epang
jplace
log
versions
Massively parallel phylogenetic placement of genetic sequences
splits an alignment into reference and query parts
0
1
2
query
reference
versions
Massively parallel phylogenetic placement of genetic sequences
estimation of the unfolded site frequency spectrum
0
1
2
3
sfs_out
pvalues_out
versions
Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly
0
1
dropset
okayset
versions
EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.
Estimate repeat sizes using NGS data
0
1
2
0
1
0
1
0
1
vcf
json
bam
versions
Merge STR profiles into a multi-sample STR profile
0
1
0
1
0
1
merged_profiles
versions
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Compute genome-wide STR profile
0
1
2
0
1
0
1
locus_tsv
motif_tsv
str_profile
versions
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Run falco on sequenced reads
0
1
html
txt
versions
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Aligns sequences using FAMSA
0
1
0
1
0
alignment
versions
Algorithm for large-scale multiple sequence alignments
Renders a guide tree with FAMSA
0
1
tree
versions
Algorithm for large-scale multiple sequence alignments
Perform adapter and quality trimming on sequencing reads with reporting
0
1
reads
stats
debug
statspdf
reads_fail
reads_unpaired
log
versions
A tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.
0
1
0
log
txt
hmm
hmm_genes
orfs
orfs_amino
contigs
contigs_pept
filtered
filtered_pept
fragments
trimmed
spades
metagenome
tmp
versions
Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.
0
1
success_log
error_log
versions
Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!
Quickly compute statistics over a fasta file in windows.
0
1
freq
mononuc
dinuc
trinuc
tetranuc
versions
A fast K-mer counter for high-fidelity shotgun datasets
0
1
hist
ktab
prof
versions
A fast K-mer counter for high-fidelity shotgun datasets
A fast K-mer counter for high-fidelity shotgun datasets
0
1
hist
versions
A fast K-mer counter for high-fidelity shotgun datasets
A tool to merge FastK histograms
0
1
2
3
hist
ktab
prof
versions
A fast K-mer counter for high-fidelity shotgun datasets
Distance-based phylogeny with FastME
0
1
2
nwk
stats
matrix
bootstrap
versions
Perform adapter/quality trimming on sequencing reads
0
1
0
0
0
0
reads
json
html
log
reads_fail
reads_merged
versions
fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.
0
1
tsv
versions
Build fastq screen config file from bowtie index files
0
0
database
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Align reads to multiple reference genomes using fastq-screen
0
1
0
txt
png
html
fastq
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
0
1
fasta
versions
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
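Collapsing identical sequences while keeping their counts can be sketched as follows. This is an illustration of the idea rather than the tool's code: the function names are hypothetical, and the `>rank-count` header layout is an assumption made for the example.

```python
from collections import Counter


def collapse(sequences):
    """Collapse identical sequences, keeping how often each one was seen."""
    counts = Counter(sequences)
    # most_common() sorts by descending count; ties keep first-seen order.
    return counts.most_common()


def to_fasta(collapsed):
    """Render (sequence, count) pairs as FASTA text."""
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        # Header layout "<rank>-<count>" is an illustrative choice.
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines)


reads = ["ACGT", "TTTT", "ACGT"]
print(to_fasta(collapse(reads)))
```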
Run NCBI's FCS adaptor on assembled genomes
0
1
cleaned_assembly
adaptor_report
log
pipeline_args
skipped_trims
versions
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.
0
1
0
fcs_gx_report
taxonomy_report
versions
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
0
1
0
0
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
0
1
0
0
bam
versions
Tools for working with genomic and high throughput sequencing data.
Collects a suite of metrics to QC duplex sequencing data.
0
1
0
family_sizes
duplex_family_sizes
duplex_yield_metrics
umi_counts
duplex_qc
duplex_umi_counts
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
Uses fgbio to convert sequenced FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads
0
1
bam
cram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
0
1
0
1
0
0
0
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. Note: the MQ tag is required on reads with mapped mates. It can be added using samblaster with the optional argument --addMateTags.
0
1
0
bam
histogram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
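The grouping described above (first by mapping positions, then by UMI) can be sketched as below. This is a simplified illustration, not fgbio's implementation: it only groups exactly matching UMIs, and the function name and input tuple layout are hypothetical.

```python
from collections import defaultdict


def group_by_umi(templates):
    """Assign a molecule ID to each template.

    templates: iterable of (name, five_prime_positions, umi), where
    five_prime_positions is a tuple such as (r1_pos, r2_pos).
    Only exact UMI matches are grouped (an assumption of this sketch).
    """
    by_position = defaultdict(lambda: defaultdict(list))
    for name, positions, umi in templates:
        # Order the two ends so the earliest mapping position comes first.
        by_position[tuple(sorted(positions))][umi].append(name)

    molecule_ids = {}
    next_id = 0
    # Walk position groups from earliest to latest, then UMIs within each.
    for positions in sorted(by_position):
        for umi in sorted(by_position[positions]):
            for name in by_position[positions][umi]:
                molecule_ids[name] = next_id
            next_id += 1
    return molecule_ids


templates = [
    ("t1", (100, 250), "AAC"),
    ("t2", (100, 250), "AAC"),
    ("t3", (100, 250), "GGT"),
    ("t4", (90, 300), "AAC"),
]
print(group_by_umi(templates))
```

Templates t1 and t2 share both end positions and the UMI, so they receive the same ID; t3 shares the positions but not the UMI, and t4 maps elsewhere.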
Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.
0
1
bam
versions
Tools for working with genomic and high throughput sequencing data.
FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM
0
1
0
1
0
1
0
1
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Filtlong filters long reads based on quality measures or short read data.
0
1
2
reads
log
versions
Perform merging of mate paired-end sequencing reads
0
1
merged
notcombined
histogram
versions
De novo assembler for single molecule sequencing reads
0
1
0
fasta
gfa
gv
txt
log
json
versions
Efficient compression tool for protein structures
0
1
fcz
versions
Foldcomp: a library and format for compressing and indexing large protein structure sets
Decompression tool for foldcomp compressed structures
0
1
pdb
versions
Foldcomp: a library and format for compressing and indexing large protein structure sets
Aligns protein structures using foldmason
0
1
0
msa_3di
msa_aa
versions
Multiple Protein Structure Alignment at Scale with FoldMason
Create a database from protein structures
0
1
db
versions
Foldseek: fast and accurate protein structure search
Search for protein structural hits against a foldseek database of protein structures
0
1
0
1
aln
versions
Foldseek: fast and accurate protein structure search
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
0
fastq
versions
fq is a library to generate and validate FASTQ file pairs.
fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.
0
1
fastq
versions
fq is a library to generate and validate FASTQ file pairs.
Demultiplex fastq files
0
1
2
sample_fastq
metrics
most_frequent_unmatched
versions
A haplotype-based variant detector
0
1
2
3
4
5
0
1
0
1
0
1
0
1
0
1
vcf
versions
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
0
1
2
0
0
0
lineages
summarized
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
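The two-stage resampling described for the bootstrap module can be sketched with the standard library alone. This is an illustration of the described scheme, not Freyja's code: the function name and the `{site: {base: count}}` input layout are hypothetical, and the sketch assumes at least one site has non-zero depth.

```python
import random
from collections import Counter


def bootstrap_sample(site_base_counts, rng):
    """Draw one bootstrap replicate of per-site base counts.

    site_base_counts: {site: {base: read_count}} (hypothetical layout).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total_reads = sum(depths)

    # Stage 1: redistribute the total read count across sites, with each
    # site's probability proportional to its observed depth.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total_reads))

    # Stage 2: at each site, redraw bases with probabilities given by the
    # observed base frequencies; the number of trials is the new depth.
    replicate = {}
    for site in sites:
        if new_depths[site] == 0:
            replicate[site] = {}
            continue
        bases = list(site_base_counts[site])
        weights = [site_base_counts[site][b] for b in bases]
        drawn = rng.choices(bases, weights=weights, k=new_depths[site])
        replicate[site] = dict(Counter(drawn))
    return replicate


observed = {100: {"A": 90, "G": 10}, 200: {"C": 50}}
print(bootstrap_sample(observed, random.Random(0)))
```

Each replicate keeps the total number of reads fixed while letting per-site depth and base composition vary, which is what gives the spread used for confidence estimates.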
specify the relative abundance of each known haplotype
0
1
2
0
0
demix
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
downloads new versions of the curated SARS-CoV-2 lineage file and barcodes
0
barcodes
lineages_topology
lineages_meta
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Calls variants and reports sequencing depth information for each variant
0
1
0
variants
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Cluster genome FASTA files by average nucleotide identity
0
1
2
3
tsv
dereplicated_bins
versions
Gene Allele Mutation Microbial Assessment
0
1
0
gamma
psl
gff
fasta
versions
Tool for Gene Allele Mutation Microbial Assessment
Build ganon database using custom reference sequences.
0
1
0
0
0
db
info
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Classify FASTQ files against ganon database
0
1
0
tre
report
one
all
unc
log
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a ganon report file from the output of ganon classify
0
1
0
tre
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a multi-sample report file from the output of ganon report runs
0
1
txt
versions
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
assigns taxonomy to query sequences in phylogenetic placement output
0
1
2
examineassign
profile
labelled_tree
per_query
krona
sativa
versions
Genesis Applications for Phylogenetic Placement Analysis
Grafts query sequences from phylogenetic placement on the reference tree
0
1
newick
versions
Genesis Applications for Phylogenetic Placement Analysis
colours a phylogeny with placement densities
0
1
newick
nexus
phyloxml
svg
colours
log
versions
Genesis Applications for Phylogenetic Placement Analysis
Performs local realignment around indels to correct for mapping errors
0
1
2
3
0
1
0
1
0
1
0
1
bam
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Generates a list of locations that should be considered for local realignment prior to genotyping.
0
1
2
0
1
0
1
0
1
0
1
intervals
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
SNP and Indel variant caller on a per-locus basis
0
1
2
0
1
0
1
0
1
0
1
0
1
0
1
0
1
vcf
versions
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
bam
bai
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotates intervals with GC content, mappability, and segmental-duplication content
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
annotated_intervals
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
0
0
0
bam
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
0
1
2
3
4
5
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
0
1
2
3
4
0
1
0
1
0
1
0
csv
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
0
0
0
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an interval list from a bed file and a reference dict
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
0
1
2
contamination
segmentation
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
estimates the parameters for the DRAGstr model
0
1
2
0
0
0
0
dragstr_model
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a Convolutional Neural Net to filter annotated variants
0
1
2
3
4
0
0
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
0
1
2
3
0
1
0
1
0
1
hdf5
tsv
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
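The counting rule described above (a read contributes to an interval when its start lies inside it) can be sketched as follows. This is an illustration rather than GATK code: the function name is hypothetical, and the sketch assumes sorted, non-overlapping intervals on a single contig with inclusive coordinates.

```python
from bisect import bisect_right


def count_read_starts(intervals, read_starts):
    """Count read starts per interval.

    intervals: sorted, non-overlapping (start, end) pairs on one contig,
               with inclusive coordinates (an assumption of this sketch).
    read_starts: start positions of reads on the same contig.
    """
    starts = [s for s, _ in intervals]
    counts = [0] * len(intervals)
    for pos in read_starts:
        # Find the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            counts[i] += 1
    return counts


print(count_read_starts([(1, 100), (201, 300)], [1, 50, 100, 150, 201, 301]))
# [3, 1]
```

Reads starting at 150 and 301 fall outside both intervals and are not counted, even if the read body would overlap an interval.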
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.
0
1
2
3
4
0
0
0
split_read_evidence
split_read_evidence_index
paired_end_evidence
paired_end_evidence_index
site_depths
site_depths_index
versions
Genome Analysis Toolkit (GATK4)
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
0
1
2
0
0
0
combined_gvcf
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.
0
0
0
str_table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges adjacent DepthEvidence records
0
1
2
0
0
0
condensed_evidence
condensed_evidence_index
versions
Genome Analysis Toolkit (GATK4)
Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.
0
1
pon
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
0
1
dict
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a panel of normals containing germline and artifactual sites for use with mutect2.
0
1
0
1
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Denoises read counts to produce denoised copy ratios
0
1
0
1
standardized
denoised
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Determines the baseline contig ploidy for germline samples given counts data
0
1
2
3
0
1
0
calls
model
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the numbers of unique molecules in a sequencing library.
0
1
0
0
0
metrics
versions
Genome Analysis Toolkit (GATK4)
Converts FastQ file to SAM/BAM format
0
1
bam
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters intervals based on annotations and/or count statistics.
0
1
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
0
1
2
3
4
5
6
7
0
1
0
1
0
1
vcf
tbi
stats
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply tranche filtering
0
1
2
3
0
0
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gathers scattered BQSR recalibration reports into a single file
0
1
table
versions
Genome Analysis Toolkit (GATK4)
write your description here
0
1
0
table
versions
Genome Analysis Toolkit (GATK4)
Merges GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.
0
1
2
3
4
5
0
0
0
genomicsdb
updatedb
intervallist
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
vcf
tbi
versions
Genome Analysis Toolkit (GATK4)
Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.
0
1
2
3
4
cohortcalls
cohortmodel
casecalls
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
0
1
2
3
0
1
0
1
0
1
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call germline SNPs and indels via local re-assembly of haplotypes
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
vcf
tbi
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an index for a feature file, e.g. VCF or BED file.
0
1
index
versions
Genome Analysis Toolkit (GATK4)
Converts a Picard IntervalList file to a BED file.
0
1
bed
versions
Genome Analysis Toolkit (GATK4)
Splits the interval list file into unique, equally-sized interval files and places them in a directory
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts
0
1
artifactprior
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Left align and trim variants using GATK4 LeftAlignAndTrimVariants.
0
1
2
3
0
0
0
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
cram
bam
crai
bai
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge unmapped with mapped BAM files
0
1
2
0
1
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges mutect2 stats generated on different intervals/regions
0
1
stats
versions
Genome Analysis Toolkit (GATK4)
Merges several vcf files
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call somatic SNVs and indels via local assembly of haplotypes.
0
1
2
3
0
1
0
1
0
1
0
0
0
0
vcf
tbi
stats
f1r2
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
0
1
2
3
intervals
segments
denoised
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Prepares bins for coverage collection.
0
1
0
1
0
1
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
0
1
2
0
1
0
1
0
1
bam
cram
sam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.
0
1
2
0
0
0
0
printed_evidence
printed_evidence_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Condenses homRef blocks in a single-sample GVCF
0
1
2
3
0
0
0
0
0
vcf
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Reverts SAM or BAM files to a previous state.
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts BAM/SAM file to FastQ format
0
1
fastq
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Select a subset of variants from a VCF file
0
1
2
3
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a fasta with the bases shifted by offset
0
1
0
1
0
1
shift_fa
shift_fai
shift_back_chain
dict
intervals
shift_intervals
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence
0
1
2
0
1
0
0
0
baf
baf_tbi
versions
Genome Analysis Toolkit (GATK4)
Splits CRAM files efficiently by taking advantage of their container based structure
0
1
split_crams
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Split intervals into sub-interval files.
0
1
0
1
0
1
0
1
split_intervals
versions
Genome Analysis Toolkit (GATK4)
Splits reads that contain Ns in their cigar string
0
1
2
3
0
1
0
1
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
0
1
2
3
0
0
0
annotated_vcf
index
versions
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
0
1
2
0
0
0
0
clustered_vcf
clustered_vcf_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filter variants
0
1
2
0
1
0
1
0
1
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
0
1
2
0
0
0
0
0
0
recal
idx
tranches
plots
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Extract fields from a VCF file to a tab-delimited table
0
1
2
3
4
5
0
1
0
1
0
1
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
0
0
0
bam
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
0
0
0
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
0
output
bam_index
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.
0
1
0
output
versions
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
0
1
2
0
genes
features
clusters
gbk
json
versions
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
Convert a mappability file to bedgraph format
0
1
0
1
bedgraph
sizes
versions
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
0
1
index
log
versions
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Define the mappability of a reference
0
1
0
map
versions
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
0
1
index
info
versions
The GEM indexer (v3).
Performs fastq alignment to a fasta reference using gem3-mapper
0
1
0
1
0
bam
versions
The GEM indexer (v3).
A derivative of GenomeScope2.0 modified to work with FastK
0
1
linear_plot
log_plot
model
summary
transformed_linear_plot
transformed_log_plot
kmer_cov
versions
create index file for genmap
0
1
index
versions
Ultra-fast computation of genome mappability.
create mappability files for a genome
0
1
0
1
wig
bedgraph
txt
csv
versions
Ultra-fast computation of genome mappability.
Annotate regions, frequencies, and CADD scores
0
1
vcf
versions
Annotate genetic inheritance models in variant files
Score compounds
0
1
vcf
versions
Annotate genetic inheritance models in variant files
annotate models of inheritance
0
1
0
0
vcf
versions
Annotate genetic inheritance models in variant files
Score the variants of a vcf based on their annotation
0
1
0
0
vcf
versions
Annotate genetic inheritance models in variant files
Download geNomad databases and related files
NO input
genomad_db
versions
Identification of mobile genetic elements
Identify mobile genetic elements present in genomic assemblies
0
1
0
aggregated_classification
taxonomy
provirus
compositions
calibrated_classification
plasmid_fasta
plasmid_genes
plasmid_proteins
plasmid_summary
virus_fasta
virus_genes
virus_proteins
virus_summary
versions
Identification of mobile genetic elements
Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach
0
1
linear_plot_png
transformed_linear_plot_png
log_plot_png
transformed_log_plot_png
model
summary
lookup_table
fitted_histogram_png
versions
Genotype Salmonella Typhi from Mykrobe results
0
1
tsv
versions
Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)
Peak-calling for ChIP-seq and ATAC-seq enrichment experiments
0
1
2
0
peak
versions
bedgraph_pvalues
bedgraph_pileup
bed_intervals
duplicates
geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA
0
samples
versions
Retrieves GEO data from the Gene Expression Omnibus (GEO)
0
1
rds
expression
annotation
versions
Get data from NCBI Gene Expression Omnibus (GEO)
Downloads databases needed for running getorganelle
0
db
versions
Get organelle genomes from genome skimming data
Assembles organelle genomes from genomic data
0
1
0
1
fasta
etc
versions
Get organelle genomes from genome skimming data
Collapse walk-preserving shared affixes in variation graphs in GFA format
0
1
gfa
affixes
versions
A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
0
1
0
0
0
0
0
0
0
assembly_summary
assembly
versions
Converts GFA or rGFA files to FASTA
0
1
fasta
versions
Tools for manipulating sequence graphs in the GFA and rGFA formats
Summary statistics for GFA files
0
1
stats
versions
Tools for manipulating sequence graphs in the GFA and rGFA formats
Compare, merge, annotate and estimate accuracy of generated gtf files
0
1
0
1
2
0
1
annotated_gtf
combined_gtf
tmap
refmap
loci
stats
tracking
versions
Validate, filter, convert and perform various other operations on GFF files
0
1
0
gtf
gffread_gff
gffread_fasta
versions
gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
0
1
files
output
versions
gget enables efficient querying of genomic databases
Defines chunks where to run imputation
0
1
2
3
chunk_chr
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.
0
1
2
3
4
5
6
7
0
0
0
errors_cal
errors_grp
errors_spl
rsquare_grp
rsquare_spl
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Concatenates imputation chunks in a single VCF/BCF file ligating phased information.
0
1
2
merged_variants
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods
0
1
2
3
4
5
6
7
8
phased_variants
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Generates haplotype calls by sampling haplotype estimates
0
1
haplo_sampled
versions
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Defines chunks where to run imputation
0
1
2
3
4
0
chunk_chr
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Program to compute the genotyping error rate at the sample or marker level.
0
1
2
3
4
5
6
7
8
0
1
2
3
4
0
0
errors_cal
errors_grp
errors_spl
rsquare_grp
rsquare_spl
rsquare_per_site
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.
0
1
2
merged_variants
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool for imputation and phasing from vcf file or directly from bam files.
0
1
2
3
4
5
6
7
8
9
0
1
2
phased_variants
stats_coverage
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool to create a binary reference panel for quick reading time.
0
1
2
3
4
0
1
bin_ref
versions
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
merge gVCF files and perform joint variant calling
0
1
0
1
bcf
versions
GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).
0
1
2
0
0
0
0
barcodes
matrix
features
classification_report
config_report
summary_report
versions
Writes a sorted concatenation of file/s
0
1
sorted
versions
Writes a sorted concatenation of file/s
Split a file into consecutive or interleaved sections
0
1
split
versions
The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
Query metadata for any taxon across the tree of life.
0
1
2
taxonsearch
versions
goat-cli is a command line interface to query the Genomes on a Tree Open API.
Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
0
1
2
0
1
output
ped
bed
bed_index
roc
html
png
versions
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
runs a functional enrichment analysis with gprofiler2
0
1
0
0
all_enrich
rds
plot_png
plot_html
sub_enrich
sub_plot
filtered_gmt
session_info
versions
An R interface corresponding to the 2019 update of g:Profiler web tool.
Checks if the input file is bgzip compressed or not
0
1
compress_bgzip
versions
a wee tool for random access into BGZF files.
A versatile pairwise aligner for genomic and spliced nucleotide sequences
0
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Tools for population-scale genotyping using pangenome graphs.
0
1
vcf
tbi
versions
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
0
1
0
1
vcf
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
2
3
0
1
0
1
0
1
bedpe
bed
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
run the Broad Gene Set Enrichment tool in GSEA mode
0
1
2
3
0
1
0
rpt
index_html
heat_map_corr_plot
report_tsvs_ref
report_htmls_ref
report_tsvs_target
report_htmls_target
ranked_gene_list
gene_set_sizes
histogram
heatmap
pvalues_vs_nes_plot
ranked_list_corr
butterfly_plot
gene_set_tsv
gene_set_html
gene_set_heatmap
snapshot
gene_set_enplot
gene_set_dist
archive
versions
Gene Set Enrichment Analysis (GSEA)
Collapse redundant transcript models in Iso-Seq data.
0
1
0
bed
bed_trans_reads
local_density_error
polya
read
strand_check
trans_report
versions
varcov
variants
Collapse similar gene models
Merge multiple transcriptomes while maintaining source information.
0
1
0
bed
gene_report
merge
trans_report
versions
Gene-Switch Transcriptome Annotation by Modular Algorithms
Helper script that removes remaining polyA sequences from Full Length Non Chimeric reads (PacBio isoseq3)
0
1
fasta
report
tails
versions
Gene-Switch Transcriptome Annotation by Modular Algorithms
GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files
0
1
gt_gff3
error_log
versions
The GenomeTools genome analysis system
GenomeTools gt-gff3validator utility to strictly validate a GFF3 file
0
1
success_log
error_log
versions
The GenomeTools genome analysis system
Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility
0
1
tabout
gff3
fasta
inner_fasta
versions
The GenomeTools genome analysis system
GenomeTools gt-stat utility to show statistics about features contained in GFF3 files
0
1
stats
versions
The GenomeTools genome analysis system
Computes enhanced suffix array using GenomeTools gt-suffixerator utility
0
1
0
index
versions
The GenomeTools genome analysis system
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.
0
1
0
1
0
0
summary
tree
markers
msa
user_msa
filtered
failed
log
warnings
versions
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
0
fasta
gff
vcf
stats
phylip
embl_predicted
embl_branch
tree
tree_labelled
versions
Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes
0
db
versions
Python package for detection of chimerism and contamination in prokaryotic genomes.
Merging of CheckM and GUNC results in one summary table
0
1
2
tsv
versions
Python package for detection of chimerism and contamination in prokaryotic genomes.
Detection of Chimerism and Contamination in Prokaryotic Genomes
0
1
0
maxcss_level_tsv
all_levels_tsv
versions
Python package for detection of chimerism and contamination in prokaryotic genomes.
Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.
0
1
vcf
versions
gvcftools is a package of small utilities for creating and analyzing gVCF files
Tool to convert and summarize ABRicate outputs using the hAMRonization specification
0
1
0
0
0
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.
0
1
0
0
0
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize DeepARG outputs using the hAMRonization specification
0
1
0
0
0
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize fARGene outputs using the hAMRonization specification
0
1
0
0
0
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize RGI outputs using the hAMRonization specification.
0
1
0
0
0
json
tsv
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to summarize and combine all hAMRonization reports into a single file
0
0
json
tsv
html
versions
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial versions.
0
1
txt
html
versions
classification into haplogroups
0
1
0
txt
versions
A tool for mtDNA haplogroup classification.
Somatic VCF Feature Extraction tool from hap.py.
0
1
2
3
4
0
1
0
1
features
versions
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
summary_csv
roc_all_csv
roc_indel_locations_csv
roc_indel_locations_pass_csv
roc_snp_locations_csv
roc_snp_locations_pass_csv
extended_csv
runinfo
metrics_json
vcf
tbi
versions
Haplotype VCF comparison tools
Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py
0
1
2
0
1
0
1
preprocessed_vcf
versions
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is a part of hap.py that compares somatic variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
features
metrics
stats
versions
Haplotype VCF comparison tools somatic variant comparison
Identify cap locus serotype and structure in your Haemophilus influenzae assemblies
0
1
0
0
gbk
svg
tsv
versions
Computes PCA eigenvectors for a Hi-C matrix.
0
1
results
pca1
pca2
versions
Set of programs to process, analyze and visualize Hi-C and capture Hi-C data
Whole-genome assembly using PacBio HiFi reads
0
1
0
1
2
0
1
2
raw_unitigs
corrected_reads
source_overlaps
reverse_overlaps
processed_contigs
processed_unitigs
primary_contigs
alternate_contigs
paternal_contigs
maternal_contigs
log
versions
Align RNA-Seq reads to a reference with HISAT2
0
1
0
1
0
1
bam
summary
fastq
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Builds HISAT2 index for reference genome
0
1
0
1
0
1
index
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Extracts splice sites from a GTF file
0
1
txt
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Pre-compute the graph index structure.
0
1
graph
versions
HLA typing from short and long reads
Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.
0
1
2
3
results
extraction
extraction_mapped
extraction_unmpapped
hla
fastq
reads_per_level
remapped
versions
HLA typing from short and long reads
gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference
0
1
wig
versions
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Perl script (generateMap.pl) that generates the mappability of a genome given a certain size of reads, for input to hmmcopy mapcounter. Takes a very long time on large genomes and is not parallelised at all.
0
1
bigwig
versions
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file
0
1
wig
versions
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
readCounter function from HMMcopy utilities, used to generate read counts in windows
0
1
2
wig
versions
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Mask multiple sequence alignments
0
1
2
3
4
5
6
7
0
maskedaln
fmask_rf
fmask_all
gmask_rf
gmask_all
pmask_rf
pmask_all
versions
Biosequence analysis using profile hidden Markov models
Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tool's help for possible values.
0
1
seqreformated
versions
Biosequence analysis using profile hidden Markov models
hmmalign from the HMMER suite aligns a number of sequences to an HMM profile
0
1
0
sthlm
versions
Biosequence analysis using profile hidden Markov models
create an hmm profile from a multiple sequence alignment
0
1
0
hmm
hmmbuildout
versions
Biosequence analysis using profile hidden Markov models
extract hmm from hmm database file or create index for hmm database
0
1
0
0
0
hmm
index
versions
Biosequence analysis using profile hidden Markov models
R script that scores output from multiple runs of hmmer/hmmsearch
0
1
hmmrank
versions
Biosequence analysis using profile hidden Markov models
A Language and Environment for Statistical Computing
Tidyverse: R packages for data science
search profile(s) against a sequence database
0
1
2
3
4
5
output
alignments
target_summary
domain_summary
versions
Biosequence analysis using profile hidden Markov models
Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline
0
1
vcf
versions
Human mitochondrial variants annotation using HmtVar.
Annotate peaks with HOMER suite
0
1
0
0
txt
stats
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Find peaks with HOMER suite
0
1
txt
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Create a tag directory with the HOMER suite
0
1
0
tagdir
taginfo
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Differential gene expression analysis based on the negative binomial distribution
Empirical Analysis of Digital Gene Expression Data in R
Create a UCSC bed graph with the HOMER suite
0
1
bedGraph
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Converting from HOMER peak to BED file formats
0
1
bed
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Serotype prediction of Haemophilus parasuis assemblies
0
1
tsv
versions
count how many reads map to each feature
0
1
2
0
1
txt
versions
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
This tool takes a background VCF, such as gnomAD, that has full genome (though in some cases, users will instead want whole exome) coverage and uses that as an expectation of variants.
0
1
2
0
1
2
tsv
versions
useful command-line tools written to showcase hts-nim
HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.
0
1
0
1
log
dedup
annotated
stats
versions
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals
0
0
0
0
0
0
rds
txt
versions
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA
0
1
0
0
0
0
0
0
0
rdata
seg
cna_seg
seg_txt
corrected_depth
ichorcna_params
plots
genome_plot
versions
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
Plot a metagene of cross-link events/sites around various transcriptomic landmarks.
0
1
0
tsv
versions
Runs iCount peaks on a BED file of crosslinks
0
1
2
peaks
versions
Computational pipeline for analysis of iCLIP data
Formats a GTF file for use with iCount sigxls
0
1
0
gtf
regions
versions
Computational pipeline for analysis of iCLIP data
Runs iCount sigxls on a BED file of crosslinks
0
1
0
sigxls
scores
versions
Computational pipeline for analysis of iCLIP data
Report proportion of cross-link events/sites on each region type.
0
1
0
summary_type
summary_subtype
summary_gene
versions
Computational pipeline for analysis of iCLIP data
Demultiplex paired-end FASTQ files from QuantSeq-Pool
0
1
2
fastq
undetermined
stats
versions
igv.js is an embeddable interactive genome visualization component
0
1
2
browser
align_files
index_files
versions
Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser HTML file. To visualise it, files have to be served. Check the documentation at: https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements
A Python application to generate self-contained HTML reports for variant review and other genomic applications
0
1
2
3
0
1
2
report
versions
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
0
1
0
1
0
1
out_tiff
versions
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
0
1
0
1
output
versions
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification
0
1
0
0
0
profile
snvs
gene_info
genome_info
linkage
mapping_info
scaffold_info
versions
Calculation of strain-level metrics
Produces protein annotations and predictions from an amino acids FASTA file
0
1
0
tsv
xml
gff3
json
versions
Download, extract, and check md5 of iPHoP databases
NO input
iphop_db
versions
Predict host genus from genomes of uncultivated phages.
Predict phage host using iPHoP
0
1
0
iphop_genus
iphop_genome
iphop_detailed_output
versions
Predict host genus from genomes of uncultivated phages.
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
phylogeny
report
mldist
lmap_svg
lmap_eps
lmap_quartetlh
sitefreq_out
bootstrap
state
contree
nex
splits
suptree
alninfo
partlh
siteprob
sitelh
treels
rate
mlrate
exch_matrix
log
versions
Quantification of transposable elements expression in scRNA-seq
0
1
0
0
versions
results
counts
log
tmp
Genomic island prediction in bacterial and archaeal genomes
0
1
gff
log
versions
IsoSeq - Cluster - Cluster trimmed consensus sequences
0
1
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
versions
IsoSeq - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
0
1
0
bam
pbi
consensusreadset
summary
report
versions
IsoSeq - Scalable De Novo Isoform Discovery
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
meta
bam
meta
version
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
meta
bam
primers
meta
bam
pbi
consensusreadset
summary
report
versions
IsoSeq3 - Scalable De Novo Isoform Discovery
Extract UMI and cell barcodes
0
1
0
bam
pbi
versions
Iso-Seq - Scalable De Novo Isoform Discovery
Generate a consensus sequence from a BAM file using iVar
0
1
0
0
fasta
qual
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Trim primer sequences from a BAM file with iVar
0
1
2
0
bam
log
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Call variants from a BAM file using iVar
0
1
0
0
0
0
tsv
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.
0
1
0
0
report
artifacts
versions
Jupyter notebooks as plain text scripts or markdown documents
Parameterize, execute, and analyze notebooks
Parameterize, execute, and analyze notebooks
Extract BED file from HTS files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)
0
1
bed
versions
Java utilities for Bioinformatics.
Convert SAM files to TSV files
0
1
2
3
0
1
2
3
tsv
versions
Java utilities for Bioinformatics.
Convert VCF to a user friendly table
0
1
2
3
0
1
output
versions
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Filtering VCF with dynamically-compiled java expressions
0
1
2
3
0
1
0
1
0
1
0
1
0
1
vcf
tbi
csi
versions
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Annotate VCF files for poly repeats
0
1
0
1
0
1
0
1
vcf
tbi
csi
versions
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Plot whole genome coverage from BAM/CRAM file as SVG
0
1
2
0
1
0
1
0
1
output
versions
Java utilities for Bioinformatics.
Taxonomic classification of metagenomic sequence data using a protein reference database
0
1
0
results
versions
Fast and sensitive taxonomic classification for metagenomics
Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.
0
1
0
txt
versions
Fast and sensitive taxonomic classification for metagenomics
write your description here
0
1
0
0
summary
versions
Fast and sensitive taxonomic classification for metagenomics
Merge two tab-separated output files of Kaiju and Kraken in the column format
0
1
2
0
merged
versions
Fast and sensitive taxonomic classification for metagenomics
Make Kaiju FMI-index file from a protein FASTA file
0
1
fmi
versions
Fast and sensitive taxonomic classification for metagenomics
Aligns sequences using kalign
0
1
0
alignment
versions
Kalign is a fast and accurate multiple sequence alignment algorithm.
Create kallisto index
0
1
index
versions
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
0
1
0
1
0
0
0
0
results
json_info
log
versions
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Quantifies scRNA-seq data from FASTQ files using kb-python.
0
1
0
0
0
0
0
0
count
versions
matrix
kallisto and bustools are wrapped in an easy-to-use program called kb
Index creation for kb count quantification of single-cell data.
0
0
0
versions
index
t2g
cdna
intron
cdna_t2c
intron_t2c
kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.
Module that calls normalize-by-median.py from khmer. The module can take a mix of paired-end (interleaved) and single-end reads. If both types are provided, only a single file with single-end reads is possible.
0
0
0
reads
versions
khmer k-mer counting library
In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
0
0
report
kmers
versions
khmer k-mer counting library
Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).
0
1
txt
versions
Generate k-mers (sketches) from FASTA/Q sequences
0
1
outdir
info
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Construct KMCP database from k-mer files
0
1
kmcp
log
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Merge search results from multiple databases.
0
1
result
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Generate taxonomic profile from search results
0
1
0
profile
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Search sequences against database
0
1
0
result
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Adds fasta files to a Kraken2 taxonomic database
0
1
0
0
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Builds Kraken2 database
0
1
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Downloads and builds Kraken2 standard database
0
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Classifies metagenomic sequence data
0
1
0
0
0
classified_reads_fastq
unclassified_reads_fastq
classified_reads_assignment
report
versions
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Takes multiple kraken-style reports and combines them into a single report file
0
1
txt
versions
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Extract reads classified at any user-specified taxonomy IDs.
0
0
1
0
1
0
1
extracted_kraken2_reads
versions
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Takes a Kraken report file and prints out a krona-compatible TEXT file
0
1
txt
versions
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Download and build (custom) KrakenUniq databases
0
1
2
3
db
versions
Metagenomics classifier with unique k-mer counting for more specific results
Download KrakenUniq databases and related files
0
output
versions
Metagenomics classifier with unique k-mer counting for more specific results
Classifies metagenomic sequence data using unique k-mer counts
0
1
0
0
0
0
0
0
classified_reads
unclassified_reads
classified_assignment
report
versions
Metagenomics classifier with unique k-mer counting for more specific results
KronaTools Update Taxonomy downloads a taxonomy database
NO input
db
versions
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.
0
1
0
html
versions
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Creates a Krona chart from text files listing quantities and lineages.
0
1
html
versions
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Update Taxonomy downloads a taxonomy database
NO input
db
versions
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Aligns query sequences to target sequences indexed with lastdb
0
1
2
0
maf
multiqc
versions
LAST finds & aligns related regions of sequences.
Prepare sequences for subsequent alignment with lastal.
0
1
index
versions
LAST finds & aligns related regions of sequences.
Converts MAF alignments to another format.
0
1
0
axt_gz
blast_gz
blasttab_gz
chain_gz
gff_gz
html_gz
psl_gz
sam_gz
tab_gz
versions
LAST finds & aligns related regions of sequences.
Reorder alignments in a MAF file
0
1
maf
versions
LAST finds & aligns related regions of sequences.
Post-alignment masking
0
1
maf
versions
LAST finds & aligns related regions of sequences.
Find suitable score parameters for sequence alignment
0
1
0
param_file
multiqc
versions
LAST finds & aligns related regions of sequences.
Align sequences using learnMSA
0
1
0
alignment
versions
learnMSA: Learning and Aligning large Protein Families
Bayesian reconstruction of ancient DNA fragments
0
1
bam
fq_pass
fq_fail
unmerged_r1_fq_pass
unmerged_r1_fq_fail
unmerged_r2_fq_pass
unmerged_r2_fq_fail
log
versions
Typing of clinical and environmental isolates of Legionella pneumophila
0
1
tsv
versions
Index chain files for lift over
0
1
0
clft
versions
Fast and accurate coordinate conversion between assemblies
Converting aligned short and long reads records from one reference to another
0
1
0
1
bam
versions
Fast and accurate coordinate conversion between assemblies
Runs a differential expression analysis with limma
0
1
2
3
0
1
2
results
md_plot
rdata
model
session_info
normalised_counts
versions
Linear Models for Microarray Data
Serogrouping Listeria monocytogenes assemblies
0
1
tsv
versions
Lofreq subcommand to insert base and indel alignment qualities
0
1
0
bam
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments
0
1
2
0
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
It predicts variants using multiple processors
0
1
2
3
0
1
0
1
vcf
tbi
versions
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors
Lofreq subcommand to remove variants with low coverage or strand bias potential
0
1
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Inserts indel qualities in a BAM file
0
1
0
1
bam
versions
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
2
3
4
5
0
1
0
1
vcf
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
0
1
bam
versions
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0
1
2
3
4
5
0
1
0
1
bam
log
versions
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0
1
2
3
4
5
0
1
0
1
vcf
versions
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder
0
1
scn
gff
versions
A Perl wrapper for LTR_FINDER
An efficient program for finding full-length LTR retrotransposons in genome sequences
Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain
0
1
gff3
scn
versions
A Perl wrapper for LTR_harvest
The GenomeTools genome analysis system
Identifies LTR retrotransposons using LTR_retriever
meta
genome
harvest
finder
mgescan
non_tgca
meta
log
pass_list
pass_list_gff
ltrlib
annotation_out
annotation_gff
versions
Sensitive and accurate identification of LTR retrotransposons
Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments
0
1
0
0
0
log
lai_out
versions
Assessing genome assembly quality using the LTR Assembly Index (LAI)
Identifies LTR retrotransposons using LTR_retriever
0
1
0
0
0
0
log
pass_list
pass_list_gff
ltrlib
annotation_out
annotation_gff
versions
Sensitive and accurate identification of LTR retrotransposons
A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.
0
1
smorfs
all_orfs
amp_prediction
readme_file
log_file
versions
A pipeline for AMP (antimicrobial peptide) prediction
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
0
1
2
0
peak
xls
versions
gapped
bed
bdg
Model Based Analysis for ChIP-Seq data
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
0
1
2
0
peak
xls
versions
gapped
bed
bdg
Model Based Analysis for ChIP-Seq data
Multiple sequence alignment using MAFFT
0
1
0
1
0
1
0
1
0
1
0
1
0
fas
versions
Parallel implementation of the gzip algorithm.
mageck count for functional genomics, reads are usually mapped to a specific sgRNA
0
1
0
count
norm
versions
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
maximum-likelihood analysis of gene essentialities computation
0
1
0
gene_summary
sgrna_summary
versions
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Mageck test performs a robust ranking aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.
0
1
gene_summary
sgrna_summary
r_script
versions
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Multiple Sequence Alignment using Graph Clustering
0
1
0
1
0
alignment
versions
Multiple Sequence Alignment using Graph Clustering
Multiple Sequence Alignment using Graph Clustering
0
1
tree
versions
Multiple Sequence Alignment using Graph Clustering
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
0
0
0
index
versions
log
A tool for mapping metagenomic data
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
0
1
0
rma6
alignments
log
versions
A tool for mapping metagenomic data
Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening
0
1
0
0
results
versions
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0
1
0
1
vcf
tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
5
6
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
somatic_sv_vcf
somatic_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
tumor_sv_vcf
tumor_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Create mapAD index for reference genome
0
1
index
versions
An aDNA aware short-read mapper
Map short-reads to an indexed reference genome
0
1
0
1
0
0
0
0
0
0
0
bam
versions
An aDNA aware short-read mapper
Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
0
1
0
runtime_log
fragmisincorporation_plot
length_plot
misincorporation
lgdistribution
dnacomp
stats_out_mcmc_hist
stats_out_mcmc_iter
stats_out_mcmc_trace
stats_out_mcmc_iter_summ_stat
stats_out_mcmc_post_pred
stats_out_mcmc_correct_prob
dnacomp_genome
rescaled
pctot_freq
pgtoa_freq
fasta
folder
versions
Screens query sequences against large sequence databases
0
1
0
1
screen
versions
Fast sequence distance estimator that uses MinHash
Creates vastly reduced representations of sequences using MinHash
0
1
mash
stats
versions
Fast sequence distance estimator that uses MinHash
MaxBin is a software that is capable of clustering metagenomic contigs
0
1
2
3
binned_fastas
summary
abundance
log
marker_counts
unbinned_fasta
tooshort_fasta
marker_bins
marker_genes
versions
Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER"
0
1
2
0
maxquant_txt
versions
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.
Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.
0
1
0
1
0
1
csv
versions
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
0
1
tsv
fa
versions
Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.
0
1
tif
versions
Staging modules for MCMICRO
Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.
0
1
tif
versions
Staging modules for MCMICRO
Analyses a DAA file and exports information in text format
0
1
0
txt_gz
megan
versions
A tool for studying the taxonomic content of a set of DNA reads
Analyses an RMA file and exports information in text format
0
1
0
txt
megan_summary
versions
A tool for studying the taxonomic content of a set of DNA reads
Serotyping of Neisseria meningitidis assemblies
0
1
tsv
versions
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
0
1
0
1
0
0
0
hist
log_stderr
versions
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
k-mer based assembly evaluation.
meta
meryl_db
assembly
meta
versions
assembly_only_kmers_bed
assembly_only_kmers_wig
stats
dist_hist
spectra_cn_fl_png
spectra_cn_ln_png
spectra_cn_st_png
spectra_cn_hist
spectra_asm_fl_png
spectra_asm_ln_png
spectra_asm_st_png
spectra_asm_hist
assembly_qv
scaffold_qv
read_ploidy
A script to generate hap-mer dbs for trios
0
1
0
0
mat_hapmer_meryl
pat_hapmer_meryl
inherited_hapmers_fl_png
inherited_hapmers_ln_png
inherited_hapmers_st_png
versions
Evaluate genome assemblies with k-mers and more.
k-mer based assembly evaluation.
0
1
2
assembly_only_kmers_bed
assembly_only_kmers_wig
stats
dist_hist
spectra_cn_fl_png
spectra_cn_hist
spectra_cn_ln_png
spectra_cn_st_png
spectra_asm_fl_png
spectra_asm_hist
spectra_asm_ln_png
spectra_asm_st_png
assembly_qv
scaffold_qv
read_ploidy
hapmers_blob_png
versions
Evaluate genome assemblies with k-mers and more.
A reimplementation of Kat Comp to work with FastK databases
0
1
2
3
4
filled_png
line_png
stacked_png
filled_pdf
line_pdf
stacked_pdf
versions
FastK based version of Merqury
A reimplementation of KatGC to work with FastK databases
0
1
2
filled_gc_plot_png
filled_gc_plot_pdf
line_gc_plot_png
line_gc_plot_pdf
stacked_gc_plot_png
stacked_gc_plot_pdf
versions
FastK based version of Merqury
FastK based version of Merqury
0
1
2
3
4
0
0
stats
bed
assembly_qv
spectra_cn_fl
spectra_cn_ln
spectra_cn_st
qv
spectra_asm_fl
spectra_asm_ln
spectra_asm_st
phased_block_bed
phased_block_stats
continuity_N
block_N
block_blob
hapmers_blob
versions
FastK based version of Merqury
An improved version of Smudgeplot using FastK
0
1
2
filled_ploidy_plot_png
filled_ploidy_plot_pdf
line_ploidy_plot_png
line_ploidy_plot_pdf
stacked_ploidy_plot_png
stacked_ploidy_plot_pdf
versions
FastK based version of Merqury
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
meryl_db
versions
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
hist
versions
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
0
meryl_db
versions
A genomic k-mer counter (and sequence utility) with nice features.
Depth computation per contig step of metabat2
0
1
2
depth
versions
Metagenome binning of contigs
0
1
2
tooshort
lowdepth
unbinned
membership
fasta
versions
Annotation of eukaryotic metagenomes using MetaEuk
0
1
0
faa
codon
tsv
gff
versions
Strain-level metagenomic assignment
0
1
2
3
4
0
wimp
evidence_unknown_species
reads2taxon
em
contig_coverage
length_and_id
krona
versions
Maps long reads to a metamaps database
0
1
0
classification_res
meta_file
meta_unmappedreadsLengths
para_file
versions
Build MetaPhlAn database for taxonomic profiling.
NO input
db
versions
Merges output abundance tables from MetaPhlAn4
0
1
txt
versions
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
0
1
0
profile
biom
bt2out
versions
Merges output abundance tables from MetaPhlAn3
0
1
txt
versions
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
0
1
0
profile
biom
bt2out
versions
Extracts per-base methylation metrics from alignments
0
1
2
0
0
bedgraph
methylkit
versions
Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.
Generates methylation bias plots from alignments
0
1
2
0
0
txt
versions
Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.
A tool to estimate bacterial species abundance
0
1
0
0
results
versions
An integrated pipeline for estimating strain-level genomic variation from metagenomic data
Marks duplicate spots along gridline edges.
0
1
marked_dups_spots
versions
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
0
1
tiff
versions
Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.
Minia is a short-read assembler based on a de Bruijn graph
0
1
contigs
unitigs
h5
versions
Provides fasta index required by minimap2 alignment.
0
1
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by miniprot alignment.
0
1
index
versions
A versatile pairwise aligner for genomic and protein sequences.
miRanda is an algorithm for finding genomic targets for microRNAs
0
1
0
txt
versions
miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.
0
1
0
1
outputs
versions
miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
0
1
2
0
1
0
1
2
3
outputs
versions
miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs.
It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.
mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.
0
1
0
1
0
1
2
tsv
versions
Small RNA-seq annotation
mirtop export generates files such as FASTA, VCF, or files compatible with the isomiRs Bioconductor package
0
1
0
1
0
1
2
tsv
fasta
vcf
versions
Small RNA-seq annotation
mirtop gff generates the GFF3 adapter format to capture miRNA variations
0
1
0
1
0
1
2
gff
versions
Small RNA-seq annotation
mirtop gff gets the number of isomiRs and miRNAs annotated in the GFF file by isomiR category.
0
1
txt
log
versions
Small RNA-seq annotation
A tool for quality control and tracing taxonomic origins of microRNA sequencing data
0
1
2
0
html
json
tsv
all_fa
rnatype_unknown_fa
versions
miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.
Download a mitochondrial genome to be used as reference for MitoHiFi
0
1
fasta
gb
versions
Fetch mitochondrial genome in Fasta and Genbank format from NCBI
A Python workflow that assembles mitogenomes from PacBio HiFi reads
0
1
0
0
0
0
fasta
stats
gb
gff
all_potential_contigs
contigs_annotations
contigs_circularization
contigs_filtering
coverage_mapping
coverage_plot
final_mitogenome_annotation
final_mitogenome_choice
final_mitogenome_coverage
potential_contigs
reads_mapping_and_assembly
shared_genes
versions
A Python workflow that assembles mitogenomes from PacBio HiFi reads
Cluster sequences using MMseqs2 cluster.
0
1
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create an MMseqs database from an existing FASTA/Q file
0
1
db
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Creates sequence index for mmseqs database
0
1
db_indexed
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create a tsv file from a query and a target database as well as the result database
0
1
0
1
0
1
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Download an mmseqs-formatted database
0
database
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a FASTA file in a database using MMseqs2
0
1
0
1
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences in linear time using MMseqs2 linclust.
0
1
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Search and calculate a score for similar sequences in a query and a target database.
0
1
0
1
db_search
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Computes the lowest common ancestor by identifying the query sequence homologs against the target database.
0
1
0
db_taxonomy
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Conversion of expandable profile databases to the MMseqs2 database format
0
db_exprofile
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
A tool to reconstruct plasmids in bacterial assemblies
0
1
chromosome
contig_report
plasmids
mobtyper_results
versions
Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.
A bioinformatics tool for working with modified bases
0
1
2
0
1
0
1
bed
bedgraph
log
versions
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data
Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.
0
1
img_clahe
versions
One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.
Download the mOTUs database
0
db
versions
The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
Taxonomic meta-omics profiling using universal marker genes
0
1
0
0
txt
biom
versions
Marker gene-based OTU (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
0
1
0
out
versions
Marker gene-based operational taxonomic unit (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
0
1
0
out
bam
mgc
log
versions
Marker gene-based OTU (mOTU) profiling
Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data
0
1
2
3
4
5
6
output
output_dis
output_germline
output_somatic
versions
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Scan a reference genome to get microsatellite & homopolymer information
0
1
txt
versions
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
msisensor2 detection of MSI regions.
0
1
2
3
4
5
0
0
msi
distribution
somatic
versions
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE), and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
msisensor2 detection of MSI regions.
0
0
scan
versions
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE), and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts whole genome sequencing, whole exome sequencing, and target region (panel) sequencing data as input.
0
1
2
3
4
5
0
0
output_report
output_dis
output_germline
output_somatic
versions
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts whole genome sequencing, whole exome sequencing, and target region (panel) sequencing data as input.
0
1
list
versions
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Aligns protein structures using mTM-align
0
1
0
alignment
structure
versions
Algorithm for structural multiple sequence alignments
Parallel implementation of the gzip algorithm.
A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.
0
1
0
mtnucratio
json
versions
Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.
0
1
0
0
0
bam
rad
versions
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.
0
index
versions
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Aggregate results from bioinformatics analyses across many samples into a single report
0
0
0
0
0
0
report
data
plots
versions
MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA
0
1
0
1
0
1
0
1
0
0
0
0
0
0
1
full_alignment
info_txt
snp_alignment
snp_genome_alignment
snpstatistics
snptable
snptable_snpeff
snptable_uncertainty
structure_genotypes
structure_genotypes_nomissing
json
versions
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two
0
1
aligned_fasta
phyi
phys
clustalw
html
msf
tree
log
versions
Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.
0
1
0
alignment
versions
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
Parallel implementation of the gzip algorithm.
AMR predictions for supported species
0
1
0
csv
json
versions
Antibiotic resistance prediction in minutes
Compare multiple runs of long read sequencing data and alignments
0
1
report_html
lengths_violin_html
log_length_violin_html
n50_html
number_of_reads_html
overlay_histogram_html
overlay_histogram_normalized_html
overlay_log_histogram_html
overlay_log_histogram_normalized_html
total_throughput_html
quals_violin_html
overlay_histogram_identity_html
overlay_histogram_phredscore_html
percent_identity_violin_html
active_pores_over_time_html
cumulative_yield_plot_gigabases_html
sequencing_speed_over_time_html
stats_txt
versions
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
0
1
2
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
0
1
html
png
txt
log
versions
Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
0
1
0
stats
reads
versions
Create DRAGEN hashtable for reference genome
0
1
hashmap
versions
narfmap is a fork of the Dragen mapper/aligner Open Source Software.
A tool to quickly download assemblies from NCBI's Assembly database
0
0
0
0
gbk
fna
rm
features
gff
faa
gpff
wgs_gbk
cds
rna
rna_fna
report
stats
versions
NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose
0
1
0
1
vecscreen_output
versions
NCBI libraries for biology applications (text-based utilities)
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
0
dataset
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
1
0
csv
csv_errors
csv_insertions
tsv
json
json_auspice
ndjson
fasta_aligned
fasta_translation
nwk
versions
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
Performs fastq alignment to a fasta reference using NextGenMap
0
1
0
bam
versions
NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime
Serotyping Neisseria gonorrhoeae assemblies
0
1
tsv
versions
Merging paired-end reads and removing sequencing adapters.
0
1
merged_reads
unstitched_read1
unstitched_read2
versions
Determines the gender of a sample from the BAM/CRAM file.
0
1
2
0
1
0
1
0
tsv
versions
Short-read sequencing tools
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.
0
1
0
1
vaf
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.
0
1
0
1
0
1
corr_matrix
matched
all
pdf
vcf
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
0
1
0
1
0
1
pt
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
0
1
pdf
corr_matrix
all
matched
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
write your description here
meta
reads
format
mode
meta
versions
npa
npc
npl
npo
Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file
0
1
png
versions
Estimate average coverage and create curves for metagenomic datasets
Calculate metagenome redundancy curve from FASTQ files
0
1
0
0
npa
npc
npl
npo
versions
Estimate average coverage and create curves for metagenomic datasets
Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files
0
1
json
tsv
csv
pdf
versions
Estimate average coverage and create curves for metagenomic datasets
Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image
0
1
png
versions
Estimate average coverage and create curves for metagenomic datasets
NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.
0
1
2
delta
coords
versions
Construct a dynamic succinct variation graph in ODGI format from a GFAv1.
0
1
og
versions
An optimized dynamic genome/graph implementation
Draw previously-determined 2D layouts of the graph with diverse annotations.
0
1
2
png
versions
An optimized dynamic genome/graph implementation
Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.
0
1
lay
tsv
versions
An optimized dynamic genome/graph implementation
Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.
0
1
sorted_graph
versions
An optimized dynamic genome/graph implementation
Squeezes multiple graphs in ODGI format into the same file in ODGI format.
0
1
graph
versions
An optimized dynamic genome/graph implementation
Metrics describing a variation graph and its path relationship.
0
1
tsv
yaml
versions
An optimized dynamic genome/graph implementation
Merge unitigs into a single node preserving the node order.
0
1
unchopped_graph
versions
An optimized dynamic genome/graph implementation
Project a graph into other formats.
0
1
gfa
versions
An optimized dynamic genome/graph implementation
Visualize a variation graph in 1D.
0
1
png
versions
An optimized dynamic genome/graph implementation
Calls CNVs in bam files from tumor patients
0
1
2
3
4
0
0
png
profile
summary
versions
Create a decoy peptide database from a standard FASTA database.
0
1
decoy_fasta
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Filters peptide/protein identification results by different criteria.
0
1
2
filtered
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Calculates a distribution of the mass error from given mass spectra and IDs.
0
1
2
frag_err
prec_err
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Merges several idXML files into one idXML file.
0
1
idxml
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Split a merged identification file into their originating identification files
0
1
idxmls
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Switches between different scores of peptide or protein hits in identification data
0
1
idxml
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)
0
1
mzml
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Refreshes the protein references for all peptide hits.
0
1
0
1
id_file_pi
versions
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Annotates MS/MS spectra using Comet.
0
1
2
idxml
pin
versions
Perform HLA-I typing of sequencing data
0
1
2
hla_type
coverage_plot
versions
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.
0
1
0
1
orthofinder
working
versions
A program to convert bam into paf.
0
1
paf
versions
A program to manipulate paf files / convert to and from paf.
A tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates
0
1
index
versions
Find and remove PCR/optical duplicates
0
1
pairs
stat
versions
CLI tools to process mapped Hi-C data
Flip pairs to get an upper-triangular matrix
0
1
0
flip
versions
CLI tools to process mapped Hi-C data
Merge multiple pairs/pairsam files
0
1
pairs
versions
CLI tools to process mapped Hi-C data
Find ligation junctions in .sam, make .pairs
0
1
0
pairsam
stat
versions
CLI tools to process mapped Hi-C data
Assign restriction fragments to pairs
0
1
0
restrict
versions
CLI tools to process mapped Hi-C data
Select pairs according to given condition by options.args
0
1
selected
unselected
versions
CLI tools to process mapped Hi-C data
Sort a .pairs/.pairsam file
0
1
sorted
versions
CLI tools to process mapped Hi-C data
Split a .pairsam file into .pairs and .sam.
0
1
pairs
bam
versions
CLI tools to process mapped Hi-C data
Calculate pairs statistics
0
1
stats
versions
CLI tools to process mapped Hi-C data
Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file
0
1
0
0
0
tsv
versions
panacus is a tool for computing counting statistics for GFA files
Create visualizations from a tsv coverage histogram created with panacus.
0
1
image
versions
panacus is a tool for computing counting statistics for GFA files
A fast and scalable tool for bacterial pangenome analysis
0
1
results
aln
versions
panaroo - an updated pipeline for pangenome investigation
NVIDIA Clara Parabricks GPU-accelerated application of Base Quality Score Recalibration (BQSR).
0
1
2
3
4
0
1
bam
bai
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated annotation of variant calls based on the dbSNP database
0
1
2
3
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.
0
1
2
3
0
1
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.
0
1
0
1
0
1
0
1
0
bam
bai
bqsr_table
versions
qc_metrics
duplicate_metrics
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence
0
1
0
1
0
1
0
bam
bai
qc_metrics
bqsr_table
duplicate_metrics
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs
0
1
0
1
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.
0
1
2
3
0
1
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.
0
1
gvcf_index
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Determines the depth in a BAM/CRAM file
0
1
2
0
1
0
1
depth
binned_depth
versions
Graph realignment tools for structural variants
Genotype structural variants using paragraph and grmpy
0
1
2
3
4
5
0
1
0
1
vcf
json
versions
Graph realignment tools for structural variants
Convert a VCF file to a JSON graph
0
1
0
1
graph
versions
Graph realignment tools for structural variants
The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.
0
1
bam
pbi
versions
PacBio BAM C++ library
Alignment with PacBio's minimap2 frontend
0
1
0
1
bam
versions
A minimap2 frontend for PacBio native data formats
pbsv - PacBio structural variant (SV) signature discovery tool
0
1
0
1
svsig
versions
pbsv - PacBio structural variant (SV) calling and analysis tools
Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq
0
1
2
fastq
versions
pbtk - PacBio BAM toolkit
Minimalistic tool which creates an index file that enables random access into PacBio BAM files
0
1
pbi
versions
pbtk - PacBio BAM toolkit
This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
0
1
spp
pdf
rdata
versions
Predict prophages in bacterial genomes
0
1
coordinates
gbk
log
information
bacteria_fasta
bacteria_gbk
phage_fasta
phage_gbk
prophage_gff
prophage_tbl
prophage_tsv
versions
Prophage finder using multiple metrics
phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic dataset.
0
1
0
0
results
versions
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
bam
bai
cram
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates an interval list from a bed file and a reference dict
0
1
0
1
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
0
1
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collects hybrid-selection (HS) metrics for a SAM or BAM file.
0
1
2
3
4
0
1
0
1
0
1
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the insert size distribution of a paired-end library.
0
1
metrics
histogram
versions
Java tools for working with NGS data in the BAM format
Collect multiple metrics from a BAM file
0
1
2
0
1
0
1
metrics
pdf
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics from a RNAseq BAM file
0
1
0
0
0
metrics
pdf
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
0
1
2
0
1
0
1
0
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates a sequence dictionary for a reference sequence.
0
1
reference_dict
versions
Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.
Checks that all data in the set of input files appear to come from the same individual
0
1
2
3
4
5
0
1
crosscheck_metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.
0
1
2
0
0
0
0
vcf
tbi
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Converts a FASTQ file to an unaligned BAM or SAM file.
0
1
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list
0
1
2
0
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Verify mate-pair information between mates and fix if needed
0
1
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Lifts over a VCF file from one reference build to another.
0
1
0
1
0
1
0
1
vcf_lifted
vcf_unlifted
versions
Move annotations from one assembly to another
Locate and tag duplicate reads in a BAM file
0
1
0
1
0
1
bam
bai
cram
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Merges multiple BAM files into a single file
0
1
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads
0
1
2
bam
bai
num_reads
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Changes the name of the sample in the VCF file
0
1
vcf
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases
0
1
0
1
0
1
intervals
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Sorts BAM/SAM files based on a variety of picard specific criteria
0
1
0
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Sorts vcf files
0
1
0
1
0
1
vcf
versions
Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format
Compresses files with pigz.
0
1
archive
versions
Parallel implementation of the gzip algorithm.
write your description here
0
1
file
versions
Parallel implementation of the gzip algorithm.
Automatically improve draft assemblies and find variation among strains, including large event detection
0
1
0
1
2
0
improved_assembly
vcf
change_record
tracks_bed
tracks_wig
versions
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
0
1
2
0
0
0
bp
cem
del
dd
int_final
inv
li
rp
si
td
versions
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Main caller script for peak calling
0
1
2
0
divergent_TREs
bidirectional_TREs
unidirectional_TREs
peakcalling_log
versions
Peak Identifier for Nascent Transcripts Starts (PINTS)
Identify plasmids in bacterial sequences and assemblies
0
1
json
txt
tsv
genome_seq
plasmid_seq
versions
Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
epi
episummary
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Exclude variant identifiers from plink bfiles
0
1
2
3
4
bed
bim
fam
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink bfiles with a text file of variant identifiers
0
1
2
3
4
bed
bim
fam
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
fepi
fepisummary
flog
fnosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate GWAS association studies
0
1
2
3
0
1
0
1
0
1
assoc
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate Hardy-Weinberg statistics for provided input
0
1
2
3
0
1
0
1
hwe
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.
0
1
2
3
0
0
0
prunein
pruneout
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.
0
1
2
3
0
0
0
prunein
pruneout
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
LD analysis in PLINK examines genetic variant associations within populations
0
1
2
3
0
1
0
1
0
1
ld
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Recodes plink bfiles into a new text fileset applying different modifiers
0
1
2
3
ped
map
txt
raw
traw
beagledat
chrdat
chrmap
geno
pheno
pos
phase
info
lgen
list
gen
gengz
sample
rlist
strctin
tped
tfam
vcf
vcfgz
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink pfiles with a text file of variant identifiers
0
1
2
3
4
extract_pgen
extract_psam
extract_pvar
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Apply a scoring system to each sample in a plink 2 fileset
0
1
2
3
0
score
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Import variant genetic data using plink2
0
1
pgen
psam
pvar
pvar_zst
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
pmdtools command to filter ancient DNA molecules from others
0
1
2
0
0
bam
versions
Compute postmortem damage patterns and decontaminate ancient genomes
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
xml
txt
versions
PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.
0
1
0
1
0
1
2
vcf
max_cov
bad_sites
versions
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.
0
1
2
3
demuxlet_result
versions
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to pileup reads and corresponding base quality for each overlapping SNP and each barcode.
0
1
2
cel
plp
var
umi
versions
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.
0
1
2
result
vcf
lmix
singlet_result
singlet_vcf
versions
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
0
1
reads
log
versions
Adapter removal and demultiplexing of Oxford Nanopore reads
0
1
reads
log
versions
Adapter removal and demultiplexing of Oxford Nanopore reads
Software for predicting library complexity and genome coverage in high-throughput sequencing
0
1
c_curve
log
versions
Software for predicting library complexity and genome coverage in high-throughput sequencing
Software for predicting library complexity and genome coverage in high-throughput sequencing
0
1
lc_extrap
log
versions
Software for predicting library complexity and genome coverage in high-throughput sequencing
Calculate pairwise nucleotide identity with respect to a reference sequence
0
1
0
1
0
valid_fasta
invalid_fasta
report
log
versions
Filter reads by quality score.
0
1
reads
logs
versions
log_tab
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Converts SAM/BAM/CRAM/pairs files into a genome contact map
0
1
0
1
2
pretext
versions
a module to generate images from Pretext contact maps.
0
1
image
versions
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
0
1
good_reads
single_reads
bad_reads
log
versions
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
0
1
0
gene_annotations
nucleotide_fasta
amino_acid_fasta
all_gene_annotations
versions
Whole genome annotation of small genomes (bacterial, archaeal, viral)
0
1
0
0
gff
gbk
fna
faa
ffn
sqn
fsa
tbl
err
log
txt
tsv
versions
frame-shift correction for long read (meta)genomics - fix frameshifts in reads
0
1
0
1
out_fa
versions
frame-shift correction for long read (meta)genomics
frame-shift correction for long read (meta)genomics - maps proteins to reads
0
1
2
tsv
versions
frame-shift correction for long read (meta)genomics
Perform Gene Ratio Enrichment Analysis
0
1
0
1
enrichedGO
versions
session_info
Gene Ratio Enrichment Analysis
Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)
0
1
logratio
session_info
versions
Logratio methods for omics data
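The two transformations named in the module description above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code the module runs; the function names `clr` and `alr` and the `ref_index` parameter are hypothetical, and the sketch assumes every part of the composition is strictly positive (zeros would need to be handled before taking logs).

```python
import math


def clr(composition):
    """Centered logratio: log of each part minus the mean of all logs."""
    logs = [math.log(x) for x in composition]  # assumes strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(composition, ref_index=-1):
    """Additive logratio: log of each part relative to one reference part.

    The reference part itself is dropped, so the result has one fewer entry.
    """
    logs = [math.log(x) for x in composition]
    ref_pos = ref_index % len(logs)  # normalise negative indices
    ref = logs[ref_pos]
    return [value - ref for i, value in enumerate(logs) if i != ref_pos]


sample = [1.0, 2.0, 4.0]
clr_values = clr(sample)
alr_values = alr(sample)
```

CLR values always sum to zero within a sample, while ALR values depend on which part is chosen as the reference.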
Perform differential proportionality analysis
0
1
0
1
propd
results
fdr
adj
warnings
session_info
versions
Logratio methods for omics data
Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.
0
1
propr
matrix
fdr
adj
warnings
session_info
versions
Logratio methods for omics data
Efficient Estimation of Covariance and (Partial) Correlation
Proteinortho is a tool to detect orthologous genes within different species.
0
1
orthologgroups
orthologgraph
blastgraph
versions
Reads a MaxQuant proteinGroups file with Proteus
0
1
2
dendro_plot
mean_var_plot
raw_dist_plot
norm_dist_plot
raw_rdata
norm_rdata
raw_tab
norm_tab
session_info
versions
R package for analysing proteomics data
Calculate interval coverage for each sample. N.B. the tool cannot handle files staged as symlinks, so stageInMode should be set to 'link'.
0
1
2
0
txt
png
loess_qc_txt
loess_txt
versions
Copy number calling and SNV classification using targeted short read sequencing
Generate on and off-target intervals for PureCN from a list of targets
0
1
0
1
0
txt
bed
versions
Copy number calling and SNV classification using targeted short read sequencing
Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.
0
1
2
3
0
0
rds
png
bias_rds
bias_bed
low_cov_bed
versions
Copy number calling and SNV classification using targeted short read sequencing
Run PureCN workflow to normalize, segment and determine purity and ploidy
0
1
2
0
0
pdf
local_optima_pdf
seg
genes_csv
amplification_pvalues_csv
vcf_gz
variants_csv
loh_csv
chr_pdf
segmentation_pdf
multisample_seg
versions
Copy number calling and SNV classification using targeted short read sequencing
Calculate coverage cutoffs to determine when to purge duplicated sequence.
0
1
cutoff
log
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Separates out sequences purged of falsely duplicated sequences.
0
1
2
haplotigs
purged
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Plots the read coverage from a purge dups statistics file and cutoffs.
0
1
2
png
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Create read depth histogram and base-level read depth for an assembly based on PacBio data
0
1
stat
basecov
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Purge haplotigs and overlaps for an assembly
0
1
2
3
bed
log
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Split fasta file by 'N's to aid in self alignment for duplicate purging
0
1
split_fasta
versions
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
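Splitting a sequence at runs of 'N', as the module description above puts it, can be illustrated as follows. This is only a sketch of the idea in plain Python and not the tool's own implementation; the function name `split_by_n` and the `name:start-end` labelling of the pieces are assumptions made for the example.

```python
import re


def split_by_n(name, sequence):
    """Split one sequence into its maximal N-free pieces.

    Returns (label, subsequence) tuples; labels use 1-based inclusive
    coordinates, which is a convention chosen for this sketch.
    """
    pieces = []
    for match in re.finditer(r"[^Nn]+", sequence):
        start, end = match.start(), match.end()  # 0-based, end exclusive
        pieces.append((f"{name}:{start + 1}-{end}", match.group()))
    return pieces


pieces = split_by_n("chr1", "ACGTNNNGGA")
```

Each piece can then be aligned against the others without the gap characters getting in the way.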
Damage parameter estimation for ancient DNA
0
1
2
csv
versions
Damage parameter estimation for ancient DNA
Damage parameter estimation for ancient DNA
0
1
csv
versions
Damage parameter estimation for ancient DNA
Compute summary statistics for control gene from BAM files.
0
1
2
0
0
control_stats
versions
A Python package for pharmacogenomics research
Call SNVs/indels from BAM files for all target genes.
0
1
2
0
1
0
0
vcf
tbi
versions
A Python package for pharmacogenomics research
Prepare a depth of coverage file for all target genes with SV from BAM files.
0
1
2
0
0
coverage
versions
A Python package for pharmacogenomics research
Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.
0
1
0
annotations
fna
faa
score
versions
Evaluate alignment data
0
1
0
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
0
1
2
0
0
0
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
0
1
0
1
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.
0
1
2
3
4
5
6
7
8
9
10
11
0
1
2
0
1
vcf
tbi
rdata
plots
versions
Read aware low coverage whole genome sequence imputation from a reference panel
Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.
0
stockholm_alignment
phylogeny
versions
Randomly subsample sequencing reads to a specified coverage
0
1
2
0
reads
versions
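Random subsampling to a target coverage, as described for the module above, amounts to drawing reads in random order until their combined length reaches genome size times coverage. The sketch below shows that idea in plain Python under stated assumptions: reads are already in memory as strings, and the names `subsample_to_coverage`, `genome_size`, `coverage` and `seed` are hypothetical rather than taken from the tool.

```python
import random


def subsample_to_coverage(reads, genome_size, coverage, seed=0):
    """Keep a random subset of reads totalling about genome_size * coverage bases."""
    target_bases = genome_size * coverage
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)  # seeded so the draw is reproducible

    kept, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        kept.append(index)
        total += len(reads[index])

    kept.sort()  # restore the original read order in the output
    return [reads[i] for i in kept]


reads = ["A" * 100] * 50
subset = subsample_to_coverage(reads, genome_size=1000, coverage=2)
```

If the input holds fewer bases than the target, every read is returned unchanged.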
De novo genome assembler for long uncorrected reads.
0
1
fasta
gfa
versions
RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion.
0
phylogeny
phylogeny_bootstrapped
versions
Create a database for RepeatModeler
0
1
db
versions
RepeatModeler is a de-novo repeat family identification and modeling package.
Performs de novo transposable element (TE) family identification with RepeatModeler
0
1
fasta
stk
log
versions
RepeatModeler is a de-novo repeat family identification and modeling package.
ResFinder identifies acquired antimicrobial resistance genes in total or partial sequenced isolates of bacteria
0
1
2
0
0
json
disinfinder_kma
pheno_table_species
pheno_table
pointfinder_kma
pointfinder_prediction
pointfinder_results
pointfinder_table
resfinder_hit_in_genome_seq
resfinder_blast
resfinder_kma
resfinder_resistance_gene_seq
resfinder_results_table
resfinder_results_tab
resfinder_results
versions
ResFinder identifies acquired antimicrobial resistance genes in total or partial sequenced isolates of bacteria
Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data
0
db
tool_version
db_version
versions
This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.
Predict antibiotic resistance from protein or nucleotide data
0
1
0
0
json
tsv
tmp
tool_version
db_version
versions
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Markup VCF file using rho-calls.
0
1
2
0
1
0
vcf
versions
Call regions of homozygosity and make tentative UPD calls.
Call regions of homozygosity and make tentative UPD calls
0
1
0
1
bed
wig
versions
Call regions of homozygosity and make tentative UPD calls.
Quality control of riboseq bam data
0
1
2
0
1
2
0
1
2
0
1
0
1
0
1
predictions
all
transprofile
versions
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Quality control of riboseq bam data
0
1
2
0
1
distribution
pdf
offset
versions
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Accurate detection of short and long active ORFs using Ribo-seq data
0
1
2
0
1
protocol
bam_summary
read_length_dist
metagene_profile_5p
metagene_profile_3p
metagene_plots
psite_offsets
pos_wig
neg_wig
orfs
versions
Python package to detect translating ORF from Ribo-seq data
Accurate detection of short and long active ORFs using Ribo-seq data
0
1
2
candidate_orfs
versions
Python package to detect translating ORF from Ribo-seq data
Render an rmarkdown notebook. Supports parametrization.
0
1
0
0
report
parameterised_notebook
artifacts
session_info
versions
Dynamic Documents for R
Calculate pan-genome from annotated bacterial assemblies in GFF3 format
0
1
results
aln
versions
Ribosomal RNA extraction from a GTF file.
0
rrna_gtf
versions
Calculate expression with RSEM
0
1
0
counts_gene
counts_transcript
stat
logs
versions
bam_star
bam_genome
bam_transcript
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Prepare a reference genome for RSEM
0
0
index
transcript_fasta
versions
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Generate statistics from a bam file
0
1
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Infer strandedness from sequencing reads
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate inner distance between read pairs.
0
1
0
distance
freq
mean
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
compare detected splice junctions to reference gene model
0
1
0
xls
rscript
log
bed
interact_bed
pdf
events_pdf
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
compare detected splice junctions to reference gene model
0
1
0
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate read duplication rate
0
1
seq_xls
pos_xls
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
0
1
2
0
txt
xls
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Converts a PED file to VCF headers
0
1
output
versions
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Plot ROC curves from vcfeval ROC data files, either to an image, or an interactive GUI. The interactive GUI isn't possible for nextflow.
0
1
png
svg
versions
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set
0
1
2
3
4
5
6
0
1
tp_vcf
tp_tbi
fn_vcf
fn_tbi
fp_vcf
fp_tbi
baseline_vcf
baseline_tbi
snp_roc
non_snp_roc
weighted_roc
summary
phasing
versions
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Uses the RTN R package for transcriptional regulatory network inference (TNI).
0
1
tni
tni_perm
tni_bootstrap
tni_filtered
versions
RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons
sage is a search software for proteomics data
0
1
0
1
0
1
results_tsv
results_json
results_pin
versions
tmt_tsv
lfq_tsv
Proteomics searching so fast it feels like magic.
Create index for salmon
0
0
index
versions
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
gene/transcript quantification with Salmon
0
1
0
0
0
0
0
results
json_info
lib_format_counts
versions
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
SALSA, A tool to scaffold long read assemblies with HiC
0
1
2
0
0
0
0
fasta
agp
agp_original_coordinates
versions
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
0
1
2
0
csv
json
bam
versions
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
Outputs some statistics drawn from read flags.
0
1
stats
versions
Tools for working with SAM/BAM data
find and mark duplicate reads in BAM file
0
1
bam
bai
versions
process your BAM data faster!
This module combines samtools and samblaster in order to use samblaster's capability to filter or tag SAM files, with the advantage of maintaining both input and output in BAM format. Samblaster input must contain a sequence header: for this reason it is piped from the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.
0
1
bam
versions
Module to validate Illumina® Sample Sheet v2 files.
0
1
0
samplesheet
versions
Clips read alignments where they match BED file defined regions
0
1
0
0
0
bam
stats
rejects_bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0
1
0
reads
versions
Tools for dealing with SAM, BAM and CRAM files
calculates MD and NM tags
0
1
0
1
bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Concatenate BAM or CRAM files
0
1
bam
cram
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
shuffles and groups reads together by their names
0
1
0
1
bam
cram
sam
versions
Tools for dealing with SAM, BAM and CRAM files
The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0
1
0
1
0
fastq
fastq_interleaved
fastq_other
fastq_singleton
versions
Tools for dealing with SAM, BAM and CRAM files
Produces a consensus FASTA/FASTQ/PILEUP
0
1
fasta
fastq
pileup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
convert and then index CRAM -> BAM or BAM -> CRAM file
0
1
2
0
1
0
1
bam
cram
bai
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
produces a histogram or table of coverage per chromosome
0
1
2
0
1
0
1
coverage
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
List CRAM Content-ID and Data-Series sizes
0
1
size
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Computes the depth at each position or region.
0
1
0
1
tsv
versions
Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region
Create a sequence dictionary file from a FASTA file
0
1
dict
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index FASTA file
0
1
0
1
fa
fai
gzi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a SAM/BAM/CRAM file to FASTA
0
1
0
fasta
interleaved
singleton
other
versions
Tools for dealing with SAM, BAM and CRAM files
Converts a SAM/BAM/CRAM file to FASTQ
0
1
0
fastq
interleaved
singleton
other
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
0
1
bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
0
1
2
flagstat
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
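Counting alignments per FLAG type, as in the module above, comes down to testing individual bits of the SAM FLAG field. The sketch below uses the bit values defined in the SAM specification; the function name `count_flags` and the category labels are chosen for this example and do not reproduce the exact report that samtools prints.

```python
from collections import Counter

# Bit values of the SAM FLAG field, as defined in the SAM specification.
FLAG_BITS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "read1": 0x40,
    "read2": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def count_flags(flags):
    """Count how many alignments have each FLAG bit set."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
        if not flag & FLAG_BITS["unmapped"]:
            counts["mapped"] += 1
    return counts


# Four example FLAG values: a proper pair (99, 147), an unmapped read (4),
# and a reverse-strand duplicate (1040).
counts = count_flags([99, 147, 4, 1040])
```

In practice the FLAG values would come from the second column of each SAM record.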
filter/convert SAM/BAM/CRAM file
0
1
readgroup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Reports alignment summary statistics for a BAM/CRAM/SAM file
0
1
2
idxstats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
converts FASTQ files to unmapped SAM/BAM/CRAM
0
1
sam
bam
cram
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index SAM/BAM/CRAM file
0
1
bai
csi
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
mark duplicate alignments in a coordinate sorted file
0
1
0
1
bam
cram
sam
versions
Tools for dealing with SAM, BAM and CRAM files
Merge BAM or CRAM file
0
1
0
1
0
1
bam
cram
csi
crai
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
BAM
0
1
2
0
mpileup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
0
1
bam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file
0
1
0
1
bam
cram
csi
crai
metrics
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Sort SAM/BAM/CRAM file
0
1
0
1
bam
cram
crai
csi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces comprehensive statistics from SAM/BAM/CRAM file
0
1
2
0
1
stats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
0
1
2
0
1
0
bam
cram
sam
bai
csi
crai
unselected
unselected_index
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
SCIMAP is a suite of tools that enables spatial single-cell analyses
0
1
csv
h5ad
versions
Scimap is a scalable toolkit for analyzing spatial molecular data.
SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.
0
1
spatial_lda_output
composition_plot
motif_location_plot
versions
Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.
The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier
0
1
0
0
meis_tab
dels_tab
vcf
versions
Soft Clipped Read Alignment Mapper
The cluster_identifier tool of Scramble identifies soft clipped clusters
0
1
2
0
clusters
versions
Soft Clipped Read Alignment Mapper
Module to use scAR to remove ambient RNA from single-cell RNA-seq data
0
1
2
h5ad
versions
scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data
scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.
Call peaks using SEACR on sequenced reads in bedgraph format
0
1
2
0
bed
versions
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
0
1
0
0
alignment
trans_alignments
multi_bed
single_bed
versions
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
Generate genome indices for segemehl align
0
index
versions
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
metagenomic binning with self-supervised learning
0
1
2
csv
model
output_fasta
recluster_fasta
tsv
versions
Metagenomic binning with semi-supervised siamese neural network
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
0
1
2
3
4
5
0
1
0
1
vcf
tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Create BWA index for reference genome
0
1
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
0
1
0
1
0
1
0
1
bam_and_bai
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the Picard CollectVariantCallingMetrics tool.
0
1
2
0
1
2
0
1
0
1
0
1
metrics
summary
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the GATK DepthOfCoverage tool.
0
1
2
0
1
0
1
0
1
0
1
per_locus
sample_summary
statistics
coverage_counts
coverage_proportions
interval_summary
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects multiple quality metrics from a bam file
0
1
2
0
1
0
1
0
mq_metrics
qd_metrics
gc_summary
gc_metrics
aln_metrics
is_metrics
mq_plot
qd_plot
is_plot
gc_plot
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
0
1
2
0
1
0
1
cram
crai
bam
bai
score
metrics
metrics_multiqc_tsv
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
0
1
2
0
1
0
1
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
0
1
2
3
0
1
0
1
0
1
0
1
vcf_gz
vcf_gz_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Generate recalibration table and optionally perform base quality recalibration
0
1
2
0
1
0
1
0
1
0
1
0
1
0
table
table_post
recal_alignment
csv
pdf
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.
0
1
2
0
1
0
1
output
index
output_index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Filters the raw output of sentieon/tnhaplotyper2.
0
1
2
3
4
5
6
0
1
0
1
vcf
vcf_tbi
stats
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
orientation_data
contamination_data
contamination_segments
stats
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
0
1
2
0
1
0
1
0
1
2
0
1
2
0
1
2
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR) by building a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
0
1
2
0
0
0
0
0
recal
idx
tranches
plots
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects whole genome quality metrics from a bam file
0
1
2
0
1
0
1
0
1
wgs_metrics
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.
0
1
fastq
versions
Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and their abundance in all the samples of the project.
Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.
0
1
fasta
versions
DNA sequence utilities for FASTX files
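The dereplication described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the `;size=N` header annotation and the function name `dereplicate` are assumptions chosen for illustration, and the sketch does not sum counts already present in the headers of previously dereplicated files.

```python
from collections import OrderedDict


def dereplicate(records):
    """Collapse identical sequences (case-insensitive).

    `records` is an iterable of (name, sequence) pairs. The first name seen
    for each distinct sequence is kept, and the number of identical
    sequences is appended to it as ';size=N' (a hypothetical header format).
    """
    counts = OrderedDict()  # upper-cased sequence -> [first_name, count]
    for name, seq in records:
        key = seq.upper()
        if key in counts:
            counts[key][1] += 1
        else:
            counts[key] = [name, 1]
    return [(f"{name};size={n}", seq) for seq, (name, n) in counts.items()]


if __name__ == "__main__":
    demo = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "GGCC")]
    for header, seq in dereplicate(demo):
        print(f">{header}\n{seq}")
```

Keeping insertion order means the output follows the order in which each distinct sequence first appears in the input.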
Statistics for FASTA or FASTQ files
0
1
stats
multiqc
versions
Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files
Concatenating multiple uncompressed sequence files together
0
1
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Convert FASTQ to FASTA format
0
1
fasta
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Select sequences from a large file based on name/ID
0
1
0
filter
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
match up paired-end reads from two fastq files
0
1
reads
unpaired_reads
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Use seqkit to find/replace strings within sequences and sequence headers
0
1
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
fastx
log
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Use seqkit to generate sliding windows of input fasta
0
1
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Sorts sequences by id/name/sequence/length
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Split single or paired-end fastq.gz files
0
1
reads
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
simple statistics of FASTA/Q files
0
1
stats
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Translate DNA/RNA to protein sequence
0
1
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Salmonella serotype prediction from reads and assemblies
0
1
log
tsv
txt
versions
Generates a BED file containing genomic locations of lengths of N.
0
1
bed
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.
Interleave paired-end reads from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.
Rename sequence names in FASTQ or FASTA files.
0
1
sequences
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.
Subsample reads from FASTQ files
0
1
2
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Common transformation operations on FASTA or FASTQ files.
0
1
fastx
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.
Select only sequences that match the filtering condition
0
1
0
sequences
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Trim low quality bases from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Sequence quality metrics for FASTQ and uBAM files.
0
1
json
html
versions
PileupCaller is a tool to create genotype calls from bam files using read-sampling methods
0
1
0
0
eigenstrat
plink
freqsum
versions
Tools for population genetics on sequencing data
Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file
0
1
2
0
0
seqz
versions
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.
Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.
0
1
wig
versions
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.
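As a rough illustration of what a windowed GC computation looks like, the sketch below walks a sequence in fixed-size windows and writes the percentages as a UCSC wiggle `fixedStep` track. It is an assumption-laden sketch, not the sequenza-utils code: the function names are hypothetical, N bases are simply counted as non-GC, and the real tool may use a different wiggle declaration.

```python
def gc_windows(seq, window):
    """Yield (1-based start, GC percent) for consecutive windows of `seq`."""
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        yield start + 1, 100.0 * gc / len(chunk)


def to_wiggle(chrom, seq, window):
    """Format the windowed GC percentages as a fixedStep wiggle track."""
    lines = [f"fixedStep chrom={chrom} start=1 step={window} span={window}"]
    lines += [f"{pct:.1f}" for _, pct in gc_windows(seq, window)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_wiggle("chr1", "GGCCAATT", 4))
```

The final window may be shorter than `window` when the sequence length is not a multiple of it; the percentage is then computed over the bases actually present.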
Induce a variation graph in GFA format from alignments in PAF format
0
1
2
gfa
versions
seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
tsv
txt
versions
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)
0
1
2
3
4
5
0
1
log
read_qual
breakpoints_double
read_alignments
read_ids
collapsed_dup
loh
all_vcf
all_breakpoints_clusters_list
all_breakpoints_clusters
all_plots
somatic_vcf
somatic_breakpoints_clusters_list
somatic_breakpoints_clusters
somatic_plots
versions
Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.
0
1
0
json
tsv
versions
Demultiplex bgzip'd fastq files
0
1
2
sample_fastq
metrics
most_frequent_unmatched
per_project_metrics
per_sample_metrics
sample_barcode_hop_metrics
versions
Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.
0
1
2
merged_variants
versions
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.
0
1
2
3
4
0
1
2
0
1
2
0
1
phased_variant
versions
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires a CPU with AVX2 support.
0
1
2
3
4
0
1
2
3
0
1
phased_variant
versions
Fast and accurate method for estimation of haplotypes (phasing)
Program to compute switch error rate and genotyping error rate given simulated or trio data.
0
1
2
3
4
0
1
2
0
1
2
errors
versions
Fast and accurate method for estimation of haplotypes (phasing)
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
0
1
assembly
gfa
results
versions
Determine Shigella serotype from Illumina or Oxford Nanopore reads
0
1
tsv
hits
versions
Determine Shigella serotype from assemblies or Illumina paired-end reads
0
1
tsv
versions
build and deploy Shiny apps for interactively mining differential abundance data
0
1
2
3
0
1
2
0
app
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make plots for interpretation of differential abundance statistics
0
1
0
1
2
3
volcanos_png
volcanos_html
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots
0
1
2
3
boxplots_png
boxplots_html
densities_png
densities_html
pca2d_png
pca2d_html
pca3d_png
pca3d_html
mad_png
mad_html
dendro
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
validate consistency of feature and sample annotations with matrices and contrasts
0
1
2
0
1
0
1
sample_meta
feature_meta
assays
contrasts
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
A windowed adaptive trimming tool for FASTQ files using quality
0
1
2
single_trimmed
paired_trimmed
singleton_trimmed
log
versions
Indexing of transcriptome for gene expression quantification using SimpleAF
0
1
0
1
0
1
index
transcript_tsv
salmon
versions
SimpleAF is a tool for quantification of gene expression from RNA-seq data
simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
0
1
2
0
1
0
1
0
0
1
results
versions
SimpleAF is a tool for quantification of gene expression from RNA-seq data
Serovar prediction of salmonella assemblies
0
1
tsv
allele_fasta
allele_json
cgmlst_csv
versions
Fast, efficient, lossless compression of FASTQ files.
0
1
sfq
versions
tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.
0
1
2
smncopynumber
run_metrics
versions
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
0
1
2
3
0
1
0
1
vcf
versions
structural variant calling and genotyping with existing tools, but, smoothly
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on an input snakefile. Expect many limitations.
0
1
0
1
outputs
snakemake_dir
versions
Create a SNAP index for reference genome
0
1
2
3
4
index
versions
Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
structural-variant calling with sniffles
0
1
2
0
1
0
1
0
0
vcf
tbi
snf
versions
Core-SNP alignment from Snippy outputs
0
1
2
0
aln
full_aln
tab
vcf
txt
versions
Rapid bacterial SNP calling and core genome alignments
Rapid haploid variant calling
0
1
0
tab
csv
html
vcf
bed
gff
bam
bai
log
aligned_fa
consensus_fa
consensus_subs_fa
raw_vcf
filt_vcf
vcf_gz
vcf_csi
txt
versions
Rapid bacterial SNP calling and core genome alignments
Genetic variant annotation and functional effect prediction toolbox
0
1
2
cache
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Genetic variant annotation and functional effect prediction toolbox
0
1
0
0
1
vcf
report
summary_html
genes_txt
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Annotate a VCF file with another VCF file
0
1
2
0
1
2
vcf
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
The dbNSFP is an integrated database of functional predictions from multiple algorithms
0
1
2
0
1
2
vcf
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Splits/Joins VCF(s) file into chromosomes
0
1
out_vcfs
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
0
1
2
tsv
html
versions
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
2
0
1
0
1
0
1
extract
versions
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
2
0
html
pairs_tsv
samples_tsv
versions
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Local sequence alignment tool for filtering, mapping and clustering.
0
1
0
1
0
1
reads
log
index
versions
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Classifies and predicts the origin of metagenomic samples
0
1
0
0
0
0
report
versions
Compare many FracMinHash signatures generated by sourmash sketch.
0
1
0
0
0
matrix
labels
csv
versions
Compute and compare FracMinHash signatures for DNA and protein data sets.
Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.
0
1
0
0
0
0
0
result
unassigned
matches
prefetch
prefetchcsv
versions
Compute and compare FracMinHash signatures for DNA data sets.
Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.
0
1
0
signature_index
versions
Compute and compare FracMinHash signatures for DNA data sets.
Create a signature (a group of FracMinHash sketches) of a sequence using sourmash
0
1
signatures
versions
Compute and compare FracMinHash signatures for DNA and protein data sets.
Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.
0
1
0
result
versions
Compute and compare FracMinHash signatures for DNA data sets.
Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data
0
1
2
3
4
5
6
7
0
0
outs
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.
0
gtf
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.
0
0
0
reference
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Assembles a small genome (bacterial, fungal, viral)
0
1
2
3
0
0
scaffolds
contigs
transcripts
gene_clusters
gfa
warnings
log
versions
Fast, efficient, lossless compression of FASTQ files.
0
1
2
spring
versions
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Fast, efficient, lossless decompression of FASTQ files.
0
1
0
fastq
versions
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
0
1
0
0
reads
versions
SRA Toolkit and SDK from NCBI
Download sequencing data from the NCBI Sequence Read Archive (SRA).
0
1
0
0
sra
versions
SRA Toolkit and SDK from NCBI
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
versions
ncbi_settings
SRA Toolkit and SDK from NCBI
Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.
0
1
2
gene_results
fullgene_results
mlst_results
pileup
sorted_bam
versions
Short Read Sequence Typing for Bacterial Pathogens
Serotype prediction of Streptococcus suis assemblies
0
1
tsv
versions
Advanced sequence file format conversions
0
1
0
0
0
cram
gzi
versions
Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.
Predicts Staphylococcus aureus SCCmec type based on primers.
0
1
tsv
versions
Align reads to a reference genome using STAR
0
1
0
1
0
1
0
0
0
log_final
log_out
log_progress
versions
bam
bam_sorted
bam_sorted_aligned
bam_transcript
bam_unsorted
fastq
tab
spl_junc_tab
read_per_gene_tab
junction
sam
wig
bedgraph
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create index for STAR
0
1
0
1
index
versions
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
0
1
results_xlsx
summary_tsv
detailed_summary_tsv
resfinder_tsv
plasmidfinder_tsv
mlst_tsv
settings_txt
pointfinder_tsv
versions
Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.
Cell and nuclear segmentation with star-convex shapes
0
1
mask
versions
Serotype STEC samples from paired-end reads or assemblies
0
1
tsv
versions
STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
0
1
2
3
0
1
2
3
4
5
6
0
1
2
0
input
rdata
plots
vcf
bgen
versions
Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.
0
1
0
1
vcf
versions
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
0
1
2
3
4
0
0
vcf
vcf_tbi
genome_vcf
genome_vcf_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
0
1
2
3
4
5
6
7
8
0
0
vcf_indels
vcf_indels_tbi
vcf_snvs
vcf_snvs_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Merges the annotation gtf file and the stringtie output gtf files
0
0
gtf
versions
Transcript assembly and quantification for RNA-Seq
Transcript assembly and quantification for RNA-Seq
0
1
0
transcript_gtf
abundance
coverage_gtf
ballgown
versions
Transcript assembly and quantification for RNA-Seq
Count reads that map to genomic features
0
1
2
counts
summary
versions
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.
SummarizedExperiment container
0
1
0
1
0
1
rds
log
versions
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Converts a bedpe file to a VCF file (beta version)
0
1
vcf
versions
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
0
1
2
0
0
0
0
vcf
versions
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi-sample VCF file.
0
1
0
0
0
0
0
0
vcf
versions
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
0
1
0
1
0
1
0
0
parameters
vcf
bed
fasta
insertions
versions
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
0
1
0
0
0
stats
versions
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
1
sv
indel
germ_indel
germ_sv
som_indel
som_sv
unfiltered_sv
unfiltered_indel
unfiltered_germ_indel
unfiltered_germ_sv
unfiltered_som_indel
unfiltered_som_sv
raw_calls
discordants
log
versions
SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
0
1
2
3
4
5
0
1
0
1
fns
fps
distances
log
report
versions
SVanalyzer: tools for the analysis of structural variation in genomes
Build a structural variant database
0
1
0
db
versions
structural variant database software
The merge module merges structural variants within one or more vcf files.
0
1
0
0
vcf
tbi
csi
versions
structural variant database software
Query a structural variant database, using a vcf file as query
0
1
0
0
0
0
0
0
vcf
versions
structural variant database software
Performs tests on BAF files
0
1
2
3
4
metrics
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
0
1
counts
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
0
1
2
0
vcf
tbi
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
0
1
0
standardized_vcf
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
0
1
2
bed
versions
Utilities for consolidating, filtering, resolving, and annotating structural variants.
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
0
1
2
3
0
1
0
1
json
gt_vcf
bam
versions
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
0
1
2
3
0
1
gt_vcf
json
versions
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
0
1
2
3
vcf
versions
Compresses/decompresses files
0
1
output
gzi
versions
Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.
create tabix index from a sorted bgzip tab-delimited genome file
0
1
tbi
csi
versions
Generic indexer for TAB-delimited genome position files.
Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA
0
1
csv_gz
versions
Convert taxon names to TaxIds
0
1
2
0
tsv
versions
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Standardise and merge two or more taxonomic profiles into a single table
0
1
0
0
0
0
merged_profiles
versions
TAXonomic Profile Aggregation and STAndardisation
Standardise the output of a wide range of taxonomic profilers
0
1
0
0
0
standardised_profile
versions
TAXonomic Profile Aggregation and STAndardisation
A tool to detect resistance and lineages of M. tuberculosis genomes
0
1
bam
csv
json
txt
vcf
versions
Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data
Aligns sequences using T_COFFEE
0
1
0
1
0
1
2
0
alignment
lib
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Compares 2 alternative MSAs to evaluate them.
0
1
2
scores
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Computes a consensus alignment using T_COFFEE
0
1
0
1
0
alignment
eval
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Computes the irmsd score for a given alignment and the structures.
0
0
irmsd
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package
0
1
0
1
0
1
2
0
alignment
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats files with t-coffee
0
1
formatted_file
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.
0
1
0
1
tcs
scores
versions
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Parses a Thermo RAW file containing mass spectra to an open file format
0
1
spectra
versions
Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle
0
1
classifications
log
fasta
versions
Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.
Computes the coverage of different regions from the bam file.
0
1
0
1
cov
wig
versions
TIDDIT - structural variant calling.
Identify chromosomal rearrangements.
0
1
2
0
1
0
1
vcf
ploidy
versions
Search for structural variants.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
0
1
explore_tsv
top_sequence
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
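The canonical form mentioned for tidk explore can be illustrated with a short Python sketch. It assumes that "canonical" means the lexicographically smallest string among all rotations of the repeat unit and of its reverse complement; this definition is an assumption that happens to reproduce the TTAGG -> AACCT example, and the function names are hypothetical rather than part of tidk.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq):
    """Return every cyclic rotation of `seq`."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit):
    """Smallest rotation of the unit or its reverse complement (assumed definition)."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(reverse_complement(unit)))


if __name__ == "__main__":
    print(canonical_repeat("TTAGG"))
```

Under this definition, a repeat found on either strand and starting at any offset within the unit maps to the same string, which is what makes results comparable across genomes.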
Searches a genome for a telomere string such as TTAGGG
0
1
0
tsv
bedgraph
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA
0
1
0
1
0
1
0
1
0
fasta
vcf
ccf
log
versions
This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.
A post sequencing QC tool for Oxford Nanopore sequencers
0
1
report_data
report_html
plots_html
plotly_js
versions
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.
0
1
pep
gff3
cds
dat
folder
versions
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.
0
1
0
pep
gff3
cds
bed
versions
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
Trim FastQ files using Trim Galore!
0
1
reads
log
unpaired
html
zip
versions
Performs quality and adapter trimming on paired end and single end reads
0
1
trimmed_reads
unpaired_reads
trim_log
out_log
summary
versions
Assembles a de novo transcriptome from RNAseq reads
0
1
transcript_fasta
log
versions
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
0
1
2
3
4
5
0
1
0
1
fn_vcf
fn_tbi
fp_vcf
fp_tbi
tp_base_vcf
tp_base_tbi
tp_comp_vcf
tp_comp_tbi
summary
versions
Structural variant comparison tool for VCFs
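The recall/precision/f-measure calculation described for the benchmarking module can be sketched in Python as below. The sketch assumes the usual definitions: recall from baseline true positives and false negatives, precision from comparison true positives and false positives, and f-measure as their harmonic mean. The function and argument names are hypothetical and are not taken from the tool.

```python
# Sketch: benchmark metrics from variant counts.
# Assumption: standard definitions of recall, precision and F-measure.

def bench_metrics(tp_base: int, tp_comp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision and F-measure from match counts."""
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(bench_metrics(tp_base=90, tp_comp=90, fp=10, fn=30))
```

The counts would correspond to the number of records in the tp_base, tp_comp, fp and fn VCF outputs listed above.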
Over multiple vcfs, calculate their intersection/consistency.
0
1
consistency
versions
Structural variant comparison tool for VCFs
Normalization of SVs into disjointed genomic regions
0
1
vcf
versions
Structural variant comparison tool for VCFs
Subsample a long-read sequencing fastq file for multiple assemblies
0
1
subreads
versions
Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support
0
1
0
0
0
tsebra_gtf
tsebra_scores
versions
Import transcript-level abundances and estimated counts for gene-level analysis packages
0
1
0
1
0
tpm_gene
counts_gene
counts_gene_length_scaled
counts_gene_scaled
lengths_gene
tpm_transcript
counts_transcript
lengths_transcript
versions
Remove lines from bed file that refer to off-chromosome locations.
0
1
0
bedgraph
versions
Remove lines from bed file that refer to off-chromosome locations.
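What "off-chromosome" means can be made concrete with a short Python sketch. It assumes a BED line is kept only when its chromosome appears in a chromosome-sizes table and 0 <= start < end <= chromosome size. Dropping lines on unknown chromosomes is a choice made for this sketch; the real tool may handle them differently. The function name and inputs are hypothetical.

```python
# Sketch: drop BED lines that fall outside chromosome bounds.
# Assumption: a line is valid when 0 <= start < end <= chromosome size.

def clip_bed(lines, chrom_sizes):
    """Return the BED lines that lie entirely within a known chromosome."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a BED record (blank line, track line, ...)
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # unknown chromosome: dropped in this sketch
        if 0 <= start < end <= size:
            kept.append(line)
    return kept


if __name__ == "__main__":
    sizes = {"chr1": 1000}
    bed = ["chr1\t0\t100\n", "chr1\t900\t1100\n"]
    print(clip_bed(bed, sizes))
```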
Convert a bedGraph file to bigWig format.
0
1
0
bigwig
versions
Convert a bedGraph file to bigWig format.
Convert file from bed to bigBed format
0
1
0
0
bigbed
versions
Convert file from bed to bigBed format
compute average score of bigwig over bed file
0
1
0
tab
versions
Compute average score of big wig over each bed, which may have introns.
compute average score of bigwig over bed file
0
1
genepred
refflat
versions
Convert GTF files to GenePred format
convert between genome builds
0
1
0
lifted
unlifted
versions
Move annotations from one assembly to another
Convert ascii format wig file to binary big wig format
0
1
0
bw
versions
Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format
Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.
0
1
0
0
fastq
no_match_fastq
report
versions
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
0
bam
fastq
log
versions
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
0
bam
log
tsv_edit_distance
tsv_per_umi
tsv_umi_per_position
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
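The idea of deduplicating on mapping coordinate plus UMI can be sketched in Python as follows. This is a deliberately simplified version that treats two reads as duplicates only when chromosome, position, strand and UMI match exactly, and keeps the read with the highest mapping quality. It does not model sequencing errors in the UMI, which the real tools account for. The record layout (dicts with these keys) is an assumption made for illustration.

```python
# Sketch: exact-match UMI deduplication.
# Assumption: reads are dicts with name, chrom, pos, strand, umi and mapq keys.

def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, umi), preferring higher mapq."""
    best = {}
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        current = best.get(key)
        if current is None or read["mapq"] > current["mapq"]:
            best[key] = read
    return list(best.values())


if __name__ == "__main__":
    reads = [
        {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "AACG", "mapq": 20},
        {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "AACG", "mapq": 40},
        {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGC", "mapq": 30},
    ]
    print([r["name"] for r in dedup_exact(reads)])
```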
Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place
0
1
reads
log
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Group reads based on their UMI and mapping coordinates
0
1
2
0
0
log
bam
tsv
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Make the output from umi_tools dedup or group compatible with RSEM
0
1
2
bam
log
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Assembles bacterial genomes
0
1
2
scaffolds
gfa
log
versions
Module to run UniverSC an open-source pipeline to demultiplex and process single-cell RNA-Seq data
0
1
0
outs
versions
Unzip ZIP archive files
0
1
files
versions
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
Simple software to call UPD regions from germline exome/wgs trios.
0
1
bed
versions
The Java port of the VarDict variant caller
0
1
2
3
0
1
0
1
vcf
versions
Filtering, downsampling and profiling alignments in BAM/CRAM formats
0
1
bam
versions
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
0
1
2
0
0
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
0
1
0
1
0
1
alignment_properties_json
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
0
1
2
3
4
0
1
0
1
bcf_gz
vcf_gz
bcf
vcf
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Convert VCF with structural variations to CytoSure format
0
1
0
1
0
1
0
1
0
cgh
versions
If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
Command line tools for parsing and manipulating VCF files.
0
1
2
vcf
versions
Command line tools for parsing and manipulating VCF files.
Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.
0
1
2
vcf
versions
Command-line tools for manipulating VCF files
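The uniq-like behaviour described above (drop a record when it repeats the previous record's position, ref and alt) can be sketched in Python. The sketch assumes tab-separated VCF lines and also includes the chromosome column in the comparison key, which is an assumption beyond the description. Like GNU uniq, it only compares adjacent records, so the input is expected to be sorted.

```python
# Sketch: uniq for VCF records.
# Assumption: key is (CHROM, POS, REF, ALT); only adjacent records are compared.

def vcf_uniq(lines):
    """Yield header lines and records that differ from the previous record."""
    previous_key = None
    for line in lines:
        if line.startswith("#"):
            yield line  # header and meta lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != previous_key:
            yield line
        previous_key = key


if __name__ == "__main__":
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "chr1\t10\t.\tA\tG\n",
        "chr1\t10\t.\tA\tG\n",
        "chr1\t10\t.\tA\tT\n",
    ]
    print("".join(vcf_uniq(vcf)), end="")
```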
A set of tools written in Perl and C++ for working with VCF files
0
1
0
0
vcf
bcf
frq
frq_count
idepth
ldepth
ldepth_mean
gdepth
hap_ld
geno_ld
geno_chisq
list_hap_ld
list_geno_ld
interchrom_hap_ld
interchrom_geno_ld
tstv
tstv_summary
tstv_count
tstv_qual
filter_summary
sites_pi
windowed_pi
weir_fst
heterozygosity
hwe
tajima_d
freq_burden
lroh
relatedness
relatedness2
lqual
missing_individual
missing_site
snp_density
kept_sites
removed_sites
singeltons
indel_hist
hapcount
mendel
format
info
genotypes_matrix
genotypes_matrix_individual
genotypes_matrix_position
impute_hap
impute_hap_legend
impute_hap_indv
ldhat_sites
ldhat_locs
beagle_gl
beagle_pl
ped
map_
tped
tfam
diff_sites_in_files
diff_indv_in_files
diff_sites
diff_indv
diff_discd_matrix
diff_switch_error
versions
Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links.
If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it will skip the samtools sort step.
The cellsorted BAM file should be cell-sorted with:
samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam
See the module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cell-sort the input BAM using SAMTOOLS_SORT:
withName: SAMTOOLS_SORT {
ext.prefix = { "cellsorted_${bam.baseName}" }
ext.args = '-t CB -O BAM'
}
An optional mask must be passed via ext.args with the --mask option.
This is why two BAM files (cellsorted and original) need to be staged in the work dir.
See also the velocyto tutorial.
0
1
2
3
0
loom
versions
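The naming rule in the velocyto description (the sorted file must be named exactly cellsorted_[ORIGINAL_BAM_NAME]) can be checked with a small Python sketch. The helper names are hypothetical, and the sketch only derives and compares file names; it does not run velocyto or samtools.

```python
# Sketch: derive the cellsorted file name expected next to a BAM file.
# Assumption: the expected name is "cellsorted_" + the original file name.
from pathlib import PurePosixPath


def cellsorted_name(bam_path: str) -> str:
    """Return the path of the cellsorted file expected for bam_path."""
    bam = PurePosixPath(bam_path)
    return str(bam.with_name(f"cellsorted_{bam.name}"))


def sort_step_skippable(bam_path: str, staged_files: set) -> bool:
    """True when the expected cellsorted file is among the staged files."""
    return cellsorted_name(bam_path) in staged_files


if __name__ == "__main__":
    print(cellsorted_name("work/input.bam"))
```

With the config example above, ext.prefix is built from the BAM base name, so the sorted output should end up with the same name this helper computes, assuming the original file ends in .bam.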
Detecting and estimating inter-sample DNA contamination became a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
log
selfsm
depthsm
selfrg
depthrg
bestsm
bestrg
versions
verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination became a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
1
2
0
0
log
ud
bed
mu
self_sm
ancestry
versions
A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
0
1
2
3
0
1
0
1
graph
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
0
1
0
0
vcf
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
write your description here
0
1
xg
vg_index
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
calculate secondary structures of two RNAs with dimerization
0
1
rnacofold_csv
rnacofold_ps
versions
calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
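The input format described above (a name line starting with >, then the two sequences joined by &) can be assembled with a short Python sketch. The helper name and the validation rules are hypothetical; the sketch only builds the text that would be sent to stdin and does not call RNAcofold.

```python
# Sketch: build RNAcofold-style stdin text for one sequence pair.
# Assumption: one ">name" line followed by "seqA&seqB" on a single line.

def cofold_input(name: str, seq_a: str, seq_b: str) -> str:
    """Return a two-line record: name line, then both sequences joined by '&'."""
    for seq in (seq_a, seq_b):
        if not seq or "&" in seq or "\n" in seq:
            raise ValueError("sequences must be non-empty, single-line, without '&'")
    return f">{name}\n{seq_a}&{seq_b}\n"


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    print(cofold_input("dimer1", "GGGAAA", "UUUCCC"), end="")
```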
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
0
1
rnafold_txt
rnafold_ps
versions
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
calculate locally stable secondary structures of RNAs
0
rnalfold_txt
versions
calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
Use vireo to perform donor deconvolution for multiplexed scRNA-seq data
0
1
2
3
4
summary
donor_ids
prob_singlets
prob_doublets
versions
Extracting sequences that were unbinned by vRhyme into a FASTA file
0
1
0
1
unbinned_sequences
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Linking bins output by vRhyme to create one sequence per bin
0
1
linked_bins
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Binning virus genomes from metagenomes
0
1
0
1
bins
membership
summary
versions
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
0
1
aln
biom
mothur
otu
bam
out
blast
uc
centroids
clusters
profile
msa
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).
0
1
fasta
clustering
log
versions
A versatile open source tool for metagenomics (USEARCH alternative)
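The identity rule used for dereplication (same length and same nucleotides, case insensitive, T and U equivalent) can be sketched in Python. The sketch normalises each sequence to uppercase with U replaced by T, counts identical keys, and returns representatives by decreasing abundance. The record layout and function name are assumptions, and the output reports the normalised sequence rather than the original spelling.

```python
# Sketch: dereplicate sequences that are strictly identical.
# Assumption: records are (header, sequence) pairs; key = uppercase, U -> T.

def dereplicate(records):
    """Return (first_header, normalised_sequence, count), most abundant first."""
    counts = {}
    for header, seq in records:
        key = seq.upper().replace("U", "T")
        if key in counts:
            counts[key][1] += 1
        else:
            counts[key] = [header, 1]  # remember the first header seen
    merged = [(header, key, n) for key, (header, n) in counts.items()]
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(merged, key=lambda item: -item[2])


if __name__ == "__main__":
    recs = [("a", "ACGU"), ("b", "acgt"), ("c", "GGCC"), ("d", "ACGT")]
    print(dereplicate(recs))
```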
Performs quality filtering and / or conversion of a FASTQ file to FASTA format.
0
1
fasta
log
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Taxonomic classification using the sintax algorithm.
0
1
0
tsv
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
0
1
0
fasta
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
0
1
0
0
0
0
aln
biom
lca
mothur
otu
sam
tsv
txt
uc
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Decomposes multiallelic variants into biallelic variants in a VCF file.
0
1
2
vcf
versions
A tool set for short variant discovery in genetic sequence data
Normalizes variants in a VCF file
0
1
2
3
0
1
0
1
vcf
fai
versions
A tool set for short variant discovery in genetic sequence data
A pangenome-scale aligner
0
1
2
3
4
0
0
paf
versions
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
0
1
2
0
0
vcf
tbi
graph
versions
Masks out highly repetitive DNA sequences with low complexity in a genome
0
1
converted
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program to generate frequency counts of repetitive units.
0
1
counts
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program that takes a counts file and creates a file of genomic coordinates to be masked.
0
1
0
1
intervals
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
Convert and filter aligned reads to .npz
0
1
2
0
1
0
1
npz
versions
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase
0
1
0
1
gender
versions
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Create a new reference using healthy reference samples
0
1
npz
versions
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Find copy number aberrations
0
1
0
1
0
1
aberrations_bed
bins_bed
segments_bed
chr_statistics
chr_plots
genome_plot
versions
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
A large variant benchmarking tool analogous to hap.py for small variants.
0
1
2
3
4
report
bench_vcf
bench_vcf_tbi
versions
Compresses files with xz.
0
1
archive
versions
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Decompresses files with xz.
0
1
file
versions
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Performs assembly scaffolding using YaHS
0
1
0
0
scaffolds_fasta
scaffolds_agp
binary
versions
Align reads to a reference genome using YARA
0
1
0
1
bam
bai
versions
Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
Compress file lists to produce ZIP archive files
0
1
zipped_archive
versions
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.