Available Modules
Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Screen assemblies for antimicrobial resistance against multiple databases
0
1
databasedir
0
0
Mass screening of contigs for antibiotic resistance genes
Screen assemblies for antimicrobial resistance against multiple databases
0
1
0
0
Mass screening of contigs for antibiotic resistance genes
A NATA accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes
0
1
0
0
0
0
0
0
A pipeline for running AMRfinderPlus and collating results into functional classes
Trim sequencing adapters and collapse overlapping reads
0
1
adapterlist
0
0
0
0
0
0
0
0
Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.
0
1
0
0
ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).
0
1
2
3
K
0
0
0
Read CEL files into an ExpressionSet and generate a matrix
0
1
2
0
1
0
0
0
0
Methods for Affymetrix Oligonucleotide Arrays
Takes a bed12 file and converts to a GFF3 file
0
1
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Converts a GFF/GTF file into a proper GTF file
0
1
0
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Converts a GFF/GTF file into a TSV file
0
1
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file
0
1
0
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Adds intron features to a GTF/GFF file that lacks them.
0
1
config
0
0
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass the test.
0
1
config
0
0
0
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
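As a rough illustration of the ORF length test described in the entry above, the following sketch splits gene models into a "passed" set and a "rest" set. It is not AGAT's implementation: the function name `split_by_orf_length`, the `test` and `size` parameters, and the dictionary of ORF lengths (in amino acids) are all assumptions made for this example.

```python
import operator
from typing import Callable, Dict, Tuple

# Comparison operators that a length test such as "> 100" could use (assumed set).
_TESTS: Dict[str, Callable[[int, int], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def split_by_orf_length(
    orf_lengths: Dict[str, int], test: str = ">", size: int = 100
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Partition gene models by ORF length (amino acids).

    Returns (passed, rest): models whose ORF length satisfies the test,
    and all remaining models.
    """
    compare = _TESTS[test]
    passed: Dict[str, int] = {}
    rest: Dict[str, int] = {}
    for gene_id, length_aa in orf_lengths.items():
        target = passed if compare(length_aa, size) else rest
        target[gene_id] = length_aa
    return passed, rest


if __name__ == "__main__":
    # Hypothetical gene models with their ORF lengths.
    passed, rest = split_by_orf_length({"g1": 250, "g2": 100, "g3": 40})
    print("passed:", passed)
    print("rest:", rest)
```

With the default test, a model of exactly 100 amino acids lands in the "rest" set, because the comparison is strictly greater than.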
The script removes features based on a kill list. The default behaviour is to look at the feature's ID. If the feature has an ID (case insensitive) listed in the kill list, it is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.
0
1
kill_list
config
0
0
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
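The removal rules described in the entry above (case-insensitive ID match, removal of subfeatures, removal of a parent whose children are all gone) can be sketched on a toy feature hierarchy. This is only an illustration under assumptions: `apply_kill_list` and the `parent_of` mapping are hypothetical names, and real GFF parsing is left out.

```python
from typing import Dict, Iterable, Optional, Set


def apply_kill_list(
    parent_of: Dict[str, Optional[str]], kill_list: Iterable[str]
) -> Set[str]:
    """Return the IDs of the features that survive the kill list.

    parent_of maps each feature ID to its parent ID (None for top-level features).
    """
    wanted = {entry.lower() for entry in kill_list}
    # Rule 1: case-insensitive match on the feature ID.
    removed = {fid for fid in parent_of if fid.lower() in wanted}

    # Index the children of every feature.
    children: Dict[str, Set[str]] = {}
    for fid, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, set()).add(fid)

    # Repeat until nothing changes, so removals propagate down and up.
    changed = True
    while changed:
        changed = False
        for fid, parent in parent_of.items():
            if fid in removed:
                continue
            kids = children.get(fid, set())
            parent_gone = parent in removed            # Rule 2: subfeatures follow their parent.
            all_kids_gone = bool(kids) and kids <= removed  # Rule 3: childless parent is dropped.
            if parent_gone or all_kids_gone:
                removed.add(fid)
                changed = True
    return set(parent_of) - removed


if __name__ == "__main__":
    features = {
        "gene1": None, "mrna1": "gene1", "exon1": "mrna1", "exon2": "mrna1",
        "gene2": None, "mrna2": "gene2", "exon3": "mrna2",
    }
    print(sorted(apply_kill_list(features, ["GENE1"])))
    print(sorted(apply_kill_list(features, ["exon3"])))
```

In the second call, killing the only exon of `mrna2` also removes `mrna2` and then `gene2`, which mirrors the warning in the description.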
The script flags the short introns with the attribute
0
1
config
0
0
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.
0
1
config
0
0
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
Provides different type of statistics in text format from a GFF/GTF annotation file
0
1
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Provides basic statistics in text format from a GFF/GTF annotation file
0
1
0
0
AGAT is a toolkit for manipulation and getting information from GFF/GTF annotation files
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants
0
1
0
0
0
A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.
summary_file
0
0
0
0
A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.
A submodule that merges all output summary tables from ampcombi/parsetables into one summary file.
summaries
0
0
0
This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.
0
1
faa_input
gbk_input
opt_amp_db
opt_amp_db_dir
opt_interproscan
0
0
0
0
0
0
0
0
0
0
0
0
A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.
A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.
0
1
model
min_length
min_probability
0
0
0
AMPlify is an attentive deep learning model for antimicrobial peptide prediction.
0
1
model_dir
0
0
Attentive deep learning model for antimicrobial peptide prediction
Post-processing script of the MaltExtract component of the HOPS package
maltextract_results
taxon_list
filter
0
0
0
0
0
Identify antimicrobial resistance in gene or protein sequences
0
1
db
0
0
0
0
0
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Identify antimicrobial resistance in gene or protein sequences
NO input
0
0
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
A module to create antiberta2 embeddings of antibody (BCR) amino acid sequences using amulety.
0
1
chain
0
0
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create antiberty embeddings of antibody (BCR) amino acid sequences using amulety.
0
1
chain
0
0
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create BALM paired embeddings of antibody (BCR) amino acid sequences using amulety.
0
1
chain
0
0
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create esm2 embeddings of antibody (BCR) amino acid sequences using amulety.
0
1
chain
0
0
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to translate BCR and TCR nucleotide sequences into amino acid sequences using amulety and igblast.
0
1
reference_igblast
0
0
Python package to create embeddings of BCR and TCR amino acid sequences.
A tool for immunoglobulin (IG, BCR) and T cell receptor (TCR) V domain sequences blasting.
A tool to estimate nuclear contamination in males based on heterozygosity in the X chromosome.
0
1
0
1
0
0
ANGSD: Analysis of next generation Sequencing Data
Calculates base frequency statistics across reference positions from BAM.
0
1
2
3
0
0
0
0
0
0
0
ANGSD: Analysis of next generation Sequencing Data
Calculates genotype likelihoods from BAM files.
0
1
0
1
0
1
0
0
ANGSD: Analysis of next generation Sequencing Data
Module to subset AnnData object to cells with matching barcodes from the csv file
0
1
2
0
0
Get the size (n_cells or n_genes) of an anndata object stored as a h5ad file
0
1
size_type
0
0
An annotated data matrix.
Accelerating de novo SINE annotation in plant and animal genomes
0
1
mode
0
0
0
Annotation and Ranking of Structural Variation
0
1
2
3
0
1
0
1
0
1
0
1
0
0
0
0
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
NO input
0
0
Annotation and Ranking of Structural Variation
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
0
1
2
3
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
0
1
databases
gff
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
0
0
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
0
1
databases
antismash_dir
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
database_css
database_detection
database_modules
0
0
0
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.
0
1
0
0
0
0
0
0
arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).
0
1
tool
db
0
0
Download and prepare database for Ariba analysis
0
1
0
0
ARIBA: Antibiotic Resistance Identification By Assembly
Query input FASTQs against Ariba formatted databases
0
1
0
1
0
0
ARIBA: Antibiotic Resistance Identification By Assembly
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
0
1
0
1
0
1
blacklist
known_fusions
cytobands
protein_domains
0
0
0
Fast and accurate gene fusion detection from RNA-Seq data
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
genome
0
0
0
0
0
Fast and accurate gene fusion detection from RNA-Seq data
Simulation tool to generate synthetic Illumina next-generation sequencing reads
0
1
sequencing_system
fold_coverage
read_length
0
0
0
0
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
Aggregates fastq files with demultiplexed reads
0
1
0
0
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
0
1
0
1
2
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
copy number profiles of tumour cells.
0
1
2
3
4
allele_files
loci_files
bed_file
fasta
gc_file
rt_file
0
0
0
0
0
0
0
0
0
Alignment by Simultaneous Harmonization of Layer/Adjacency Registration
0
1
opt_dfp
opt_ffp
0
0
ataqv function of a corresponding ataqv tool
0
1
2
3
organism
mito_name
tss_file
excl_regs_file
autosom_ref_file
0
0
0
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
mkarv function of a corresponding ataqv tool
jsons/*
0
0
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
Generates a VCF file from a BAM file using various calling methods
0
1
2
3
4
fasta
fai
known_alleles
method
0
0
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Estimate the post-mortem damage patterns of DNA
0
1
2
3
fasta
fai
0
0
0
0
0
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Gives an estimation of the sequencing bias based on known invariant sites
0
1
2
3
4
alleles
invariant_sites
0
0
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Splits single-end read groups by length and merges paired-end reads
0
1
2
3
4
0
0
0
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Generate tables of feature metadata from GTF files
0
1
0
1
0
0
0
Scripts for manipulating gene annotation
Use deamination patterns to estimate contamination in single-stranded libraries
0
1
0
1
0
1
0
0
Estimates present-day DNA contamination in ancient DNA single-stranded libraries.
Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.
0
1
0
1
0
0
0
Annotation of bacterial genomes (isolates, MAGs) and plasmids
0
1
db
proteins
prodigal_tf
0
0
0
0
0
0
0
0
0
0
0
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Downloads BAKTA database from Zenodo
NO input
0
0
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data
0
1
2
0
0
Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files
Removes unused references from the header of sorted BAM/CRAM files.
0
1
0
0
This module is used to clip primer sequences from your alignments.
0
1
2
3
0
0
0
write your description here
0
1
0
0
A command line tool to compute mapping statistics from a BAM file
Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis
0
1
0
0
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
0
1
0
0
C++ API & command-line toolkit for working with BAM data
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
0
1
0
0
C++ API & command-line toolkit for working with BAM data
Clips overlapping read pairs. When two mates overlap, this tool clips the record whose clipped region would have the lowest average quality.
0
1
0
0
0
Programs that perform operations on SAM/BAM files, all built into a single executable, bam.
Render an assembly graph in GFA 1.0 format to PNG and SVG image formats
0
1
0
0
0
Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
Demultiplex Element Biosciences bases files
0
1
2
0
0
0
0
0
0
0
0
BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.
0
1
0
0
Adapter and quality trimming of sequencing reads
0
1
contaminants
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Merging overlapping paired reads into a single read.
0
1
interleave
0
0
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.
0
1
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates 30% smaller, faster gzipped FASTQ files and removes duplicates.
0
1
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Filter out sequences by sequence header name(s)
0
1
names_to_filter
output_format
interleaved_output
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.
fasta
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.
0
1
0
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
Re-pairs reads that became disordered or had some mates eliminated.
0
1
interleave
0
0
0
0
Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated.
Compares query sketches to reference sketches hosted on a remote server via the Internet.
0
1
0
0
BBMap is a short read aligner, as well as various other bioinformatic tools.
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
0
1
2
regions
targets
samples
0
0
0
0
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Concatenate VCF files
0
1
2
0
0
0
0
Concatenate VCF files.
Compresses VCF files
0
1
2
3
4
0
0
Create consensus sequence by applying VCF variants to a reference fasta file.
Converts certain output formats to VCF
0
1
2
0
1
bed
0
0
0
0
0
0
0
0
0
0
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
bcftools Haplotype-aware consequence caller
0
1
0
1
0
1
0
1
0
0
0
0
Haplotype-aware consequence caller
Filters VCF files
0
1
2
0
0
0
0
Apply fixed-threshold filters to VCF files.
Index VCF files
0
1
0
0
0
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Apply set operations to VCF files
0
1
2
0
0
Computes intersections, unions and complements of VCF files.
Compresses VCF files
0
1
2
0
1
save_mpileup
0
0
0
0
0
Generates genotype likelihoods at each genomic position with coverage.
Normalize VCF file
0
1
2
0
1
0
0
0
0
Normalize VCF files.
Compute and fill various INFO tags
0
1
2
regions
targets
samples
0
0
0
0
Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
0
1
2
regions
targets
0
0
0
0
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available
Split VCF by chunks or regions, creating multiple VCFs.
0
1
2
sites_per_chunk
scatter
scatter_file
regions
targets
0
0
0
0
Split VCF by chunks or regions, creating multiple VCFs.
Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
0
1
2
target_gt
new_gt
regions
targets
0
0
0
0
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
Split VCF by sample, creating single- or multi-sample VCFs.
0
1
2
samples
groups
regions
targets
0
0
0
0
Split VCF by sample, creating single- or multi-sample VCFs.
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
0
1
2
regions
targets
0
0
0
0
Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.
Extracts fields from VCF or BCF files and outputs them in user-defined format.
0
1
2
regions
targets
samples
0
0
Extracts fields from VCF or BCF files and outputs them in user-defined format.
Reheader a VCF file
0
1
2
3
0
1
0
0
0
Modify header of VCF/BCF files, change sample names.
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
0
1
2
0
1
genetic_map
regions_file
samples_file
targets_file
0
0
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
Sorts VCF files
0
1
0
0
0
0
Sort VCF files by coordinates.
Split a vcf file into files per chromosome
0
1
2
0
0
Sort VCF files by coordinates.
Generates stats from VCF files
0
1
2
0
1
0
1
0
1
0
1
0
1
0
0
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
0
1
2
regions
targets
samples
0
0
0
0
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.
0
1
refpanel
genmap
exclsamples
exclmarkers
0
0
0
Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
Convert a BED file to a VCF file according to a YAML config
0
1
2
0
1
0
0
Convert BAM/GFF/GTF/GVF/PSL files to bed
0
1
0
0
High-performance genomic feature operations.
Convert gtf format to bed format
0
1
0
0
The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
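The coordinate change described above (1-based closed to 0-based half-open) comes down to subtracting one from the start and leaving the end untouched. The sketch below shows only that arithmetic; `gtf_to_bed_interval` is a hypothetical helper, not part of the gtf2bed script.

```python
from typing import Tuple


def gtf_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed [start, end] interval to 0-based, half-open [start-1, end)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based closed interval: [{start}, {end}]")
    return start - 1, end


if __name__ == "__main__":
    gtf_start, gtf_end = 11, 20            # bases 11..20 inclusive: 10 bases
    bed_start, bed_end = gtf_to_bed_interval(gtf_start, gtf_end)
    print(bed_start, bed_end)              # BED start and end
    # The feature length is the same in both conventions.
    print(gtf_end - gtf_start + 1, bed_end - bed_start)
```

In the half-open convention the feature length is simply `end - start`, which is one reason BED-style coordinates are convenient for interval arithmetic.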
Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.
0
1
sizes
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
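To make the idea of a complement concrete, the sketch below returns the stretches of a single chromosome that no input interval covers. It assumes 0-based, half-open intervals and a known chromosome length; the function name `complement` and its parameters are chosen for this example and do not describe the internals of bedtools.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def complement(intervals: Iterable[Interval], chrom_size: int) -> List[Interval]:
    """Return the regions of [0, chrom_size) not covered by any input interval."""
    gaps: List[Interval] = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))   # uncovered stretch before this interval
        cursor = max(cursor, end)          # overlapping intervals extend the covered region
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))  # uncovered tail of the chromosome
    return gaps


if __name__ == "__main__":
    print(complement([(2, 4), (3, 6), (8, 10)], chrom_size=12))
```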
Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.
0
1
2
sizes
extension
sort
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Extracts sequences from a FASTA file based on intervals defined in a feature file.
0
1
fasta
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.
0
1
summary_col
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Allows one to screen for overlaps between two sets of genomic features.
0
1
2
0
1
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Calculates the Jaccard statistic between two feature files.
0
1
2
0
1
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
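For readers unfamiliar with the Jaccard statistic, the sketch below computes it for two interval sets on one chromosome as covered bases in the intersection divided by covered bases in the union. It uses the textbook definition with 0-based, half-open intervals; the helper names are assumptions, and bedtools' own output columns and edge-case handling may differ.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended half-open intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Number of distinct bases covered by the intervals."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Jaccard statistic: |A intersect B| / |A union B|, measured in bases."""
    a, b = list(a), list(b)
    union = covered_bases(a + b)
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|.
    intersection = covered_bases(a) + covered_bases(b) - union
    return intersection / union


if __name__ == "__main__":
    print(jaccard([(0, 10)], [(5, 15)]))
```

The value ranges from 0 (no shared bases) to 1 (identical coverage), so it gives a single number for how similar two feature sets are.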
Makes adjacent or sliding windows across a genome or BED file.
0
1
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
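The difference between adjacent and sliding windows is just the step size: adjacent windows step by the full window width, sliding windows by less. The following is a minimal sketch for one chromosome with 0-based, half-open coordinates; `make_windows` and its parameters are illustrative names, and the handling of the final, shorter windows is a choice made here rather than a statement about bedtools.

```python
from typing import List, Optional, Tuple


def make_windows(
    chrom_size: int, window: int, step: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Tile [0, chrom_size) with windows of the given width.

    step=None gives adjacent windows; a step smaller than the window
    gives overlapping (sliding) windows. Windows are clipped at the end.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        (start, min(start + window, chrom_size))
        for start in range(0, chrom_size, step)
    ]


if __name__ == "__main__":
    print(make_windows(10, 4))          # adjacent windows
    print(make_windows(10, 4, step=2))  # sliding windows
```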
Masks sequences in a FASTA file based on intervals defined in a feature file.
0
1
fasta
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.
0
1
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.
0
1
chrom_sizes
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Profiles the nucleotide content of intervals in a fasta file.
0
1
2
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
bedtools shuffle will randomly permute the genomic locations of a feature file among a genome defined in a genome file
0
1
0
1
exclude_file
include_file
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)
0
1
sizes
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Sorts a feature file by chromosome and other criteria.
0
1
genome_file
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Split BED files into several smaller BED files
0
1
2
0
0
A powerful toolset for genome arithmetic
Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.
0
1
2
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines multiple BedGraph files into a single file
0
1
0
1
0
0
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Locate and tag duplicate reads in a BAM file
0
1
0
0
0
biobambam is a set of tools for early stage alignment file processing.
Merge a list of sorted bam files
0
1
0
0
0
0
biobambam is a set of tools for early stage alignment file processing.
Parallel sorting and duplicate marking
0
1
0
1
0
0
0
0
0
biobambam is a set of tools for early stage alignment file processing.
Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification.
0
1
0
0
Use k-mers to rapidly subtype S. enterica genomes
0
1
scheme_metadata
0
0
0
0
Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.
0
1
0
1
0
1
0
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
0
1
0
1
0
1
0
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Summarize and/or filter reads based on bisulfite conversion rate
0
1
0
1
0
1
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.
0
1
0
1
0
1
0
1
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Indexes a reference genome for use with Biscuit
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Merges methylation information for opposite-strand C's in a CpG context
0
1
0
1
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants
0
1
2
3
4
0
1
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Perform basic quality control on a BAM file generated with Biscuit
0
1
0
1
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.
0
1
0
0
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Performs alignment of BS-Seq reads using bismark
0
1
0
1
0
1
0
0
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Relates methylation calls back to genomic cytosine contexts.
0
1
0
1
0
1
0
0
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Removes alignments to the same position in the genome from the Bismark mapping output.
0
1
0
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.
0
1
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Extracts methylation information for individual cytosines from alignments.
0
1
0
1
0
0
0
0
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Collects bismark alignment reports
0
1
2
3
4
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.
bam
align_report
dedup_report
splitting_report
mbias
0
0
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Retrieve entries from a BLAST database
0
1
2
0
1
0
0
0
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
0
0
BLAST finds regions of similarity between biological sequences.
BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database.
0
1
0
1
out_ext
0
0
0
0
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
Builds a BLAST database
0
1
0
0
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0
1
0
1
0
0
Protein to Translated Nucleotide BLAST.
Downloads a BLAST database from NCBI
0
1
0
0
BLAST finds regions of similarity between biological sequences.
Create bowtie index for reference genome
0
1
0
0
bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.
0
1
database
0
0
0
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Extends a Kraken2 database to be compatible with Bracken
0
1
0
0
0
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Combine output of metagenomic samples analyzed by bracken.
0
1
0
0
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Benchmarking Universal Single Copy Orthologs
meta
fasta
mode
lineage
busco_lineages_path
config_file
meta
batch_summary
short_summaries_txt
short_summaries_json
busco_dir
full_table
missing_busco_list
single_copy_proteins
seq_dir
translated_proteins
versions
Benchmarking Universal Single Copy Orthologs
0
1
mode
lineage
busco_lineages_path
config_file
clean_intermediates
0
0
0
0
0
0
0
0
0
0
0
0
0
0
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Download database for BUSCO
lineage
0
0
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
BUSCO plot generation tool
short_summary_txt
0
0
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Construct species phylogenies using BUSCO proteins
0
1
0
0
0
Construct species phylogenies using BUSCO proteins
Create BWA-mem2 index for reference genome
0
1
0
0
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create BWA-MEME index for reference genome
0
1
0
0
Faster BWA-MEM2 using learned-index
Performs alignment of BS-Seq reads using bwameth
0
1
0
1
0
1
0
0
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Performs indexing of c2t converted reference genome
0
1
0
0
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
A module for concatenation of gzipped or uncompressed files
0
1
0
0
Just concatenation
Concatenates fastq files
0
1
0
0
The cat utility reads files sequentially, writing them to the standard output.
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
0
1
0
1
0
1
bin_suffix
0
0
0
0
0
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Downloads the required files for either Nr or GTDB for building into a CAT database
0
1
0
0
0
0
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
mode
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Summarises results from CAT/BAT/RAT classification steps
0
1
0
1
0
0
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Cluster protein sequences using sequence similarity
0
1
0
0
0
Clusters and compares protein or nucleotide sequences
Cluster nucleotide sequences using sequence similarity
0
1
0
0
0
Clusters and compares protein or nucleotide sequences
Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth
0
1
signature
high_thresholds
low_thresholds
0
0
0
Module to use CellBender to remove ambient RNA from single-cell RNA-seq data
0
1
2
3
0
0
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data
0
1
0
0
0
0
0
0
0
0
0
0
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.
0
1
reference
0
0
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.
0
1
2
0
0
0
0
0
0
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.
gtf
0
0
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.
fasta
gtf
reference_name
0
0
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.
fasta
gtf
seqs
reference_name
0
0
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.
meta
0
1
0
1
0
1
0
1
0
1
0
1
gex_reference
gex_frna_probeset
gex_targetpanel
vdj_reference
vdj_primer_index
fb_reference
beam_antigen_panel
beam_control_panel
cmo_reference
cmo_barcodes
cmo_barcode_assignment
frna_sampleinfo
skip_renaming
0
0
0
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.
0
1
reference
0
0
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.
0
1
2
3
reference
0
0
0
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.
0
1
csv
0
0
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.
gtf
0
0
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.
fasta
gtf
motifs
reference_config
reference_name
0
0
0
Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.
Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.
0
1
reference
0
0
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.
bcl
csv
0
0
Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.
fasta
gtf
motifs
reference_config
reference_name
0
0
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.
0
1
2
3
4
0
0
0
0
0
0
0
Efficient genotyping of bi-allelic SNPs on single cells
Build centrifuge database for taxonomic profiling
0
1
conversion_table
taxonomy_tree
name_table
size_table
0
0
Classifier for metagenomic sequences
Classifies metagenomic sequence data
0
1
db
save_unaligned
save_aligned
0
0
0
0
0
0
Centrifuge is a classifier for metagenomic sequences.
Creates Kraken-style reports from centrifuge out files
0
1
db
0
0
Centrifuge is a classifier for metagenomic sequences.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
0
1
fasta_ext
db
0
0
0
0
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
0
1
2
3
exclude_marker_file
0
0
0
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM2 database download
db_zenodo_id
0
0
CheckM2 - Rapid assessment of genome bin quality using machine learning
CheckM2 bin quality prediction
0
1
0
1
0
0
0
CheckM2 - Rapid assessment of genome bin quality using machine learning
Construct the database necessary for checkv's quality assessment
NO input
0
0
Assess the quality of metagenome-assembled viral genomes.
Assess the quality of metagenome-assembled viral genomes.
0
1
db
0
0
0
0
0
0
0
Assess the quality of metagenome-assembled viral genomes.
Construct the database necessary for checkv's quality assessment
0
1
db
0
0
Assess the quality of metagenome-assembled viral genomes.
Determine the allelic profiles of a genome using a pre-defined schema
0
1
0
1
0
0
0
0
0
0
0
0
0
0
A complete suite for gene-by-gene schema creation and strain identification.
Create a schema to determine the allelic profiles of a genome
0
1
prodigal_tf
cds
0
0
0
0
A complete suite for gene-by-gene schema creation and strain identification.
Filter and trim long read data.
0
1
fasta
0
0
zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.
Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).
Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.
0
1
0
1
0
1
barcodes
whitelist
chr_order
pairs_chr_order
0
0
0
0
0
Fast alignment and preprocessing of chromatin profiles
Indexes a fasta reference genome ready for chromatin profiling.
0
1
0
0
Fast alignment and preprocessing of chromatin profiles
Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
Annotate circRNAs detected in the output from CIRCexplorer2 parse
0
1
fasta
gene_annotation
0
0
Circular RNA analysis toolkit
CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.
0
1
0
0
Circular RNA analysis toolkit
A method to improve mappings on circular genomes, using the BWA mapper.
0
1
0
1
0
1
0
0
0
Creating a modified reference genome, with an elongation of a specified number of bases
Realign reads mapped with BWA to elongated reference genome
0
1
0
1
0
1
0
1
0
0
A method to improve mappings on circular genomes such as Mitochondria.
Predict recombination events in bacterial genomes
0
1
2
0
0
0
0
0
0
0
Align sequences using Clustal Omega
0
1
0
1
hmm_in
hmm_batch
profile1
profile2
compress
0
0
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Parallel implementation of the gzip algorithm.
Renders a guidetree in clustalo
0
1
0
0
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Calculates polymorphic site rates over protein coding genes
0
1
2
3
4
0
0
Set of utilities on sequences and BAM files
Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.
0
1
0
1
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Derive off-target (“antitarget”) bins from target regions.
0
1
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
0
1
2
0
1
0
1
0
1
0
1
panel_of_normals
0
0
0
0
0
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number
0
1
2
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
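As a rough illustration of what deriving an integer copy number from a log2 ratio involves, the sketch below rounds ploidy × 2^log2 to the nearest integer. It assumes a pure sample and a diploid baseline; the function name and the example segments are hypothetical, and it does not reproduce CNVkit's own calling methods.

```python
import math


def absolute_copy_number(log2_ratio, ploidy=2):
    """Convert a segment's log2 ratio to an integer copy number.

    Assumption for illustration: the sample is 100% pure, so the expected
    number of copies is simply ploidy * 2**log2_ratio.
    """
    expected = ploidy * 2.0 ** log2_ratio
    # Round half up to the nearest integer; copy number cannot be negative.
    return max(0, int(math.floor(expected + 0.5)))


# Hypothetical segments: (chromosome, segment-level log2 ratio)
segments = [("chr1", -1.0), ("chr2", 0.0), ("chr3", 0.585), ("chr4", 1.0)]
calls = {chrom: absolute_copy_number(ratio) for chrom, ratio in segments}
print(calls)
```

With impure tumor samples the observed log2 ratios are compressed towards zero, so a purity correction would be needed before rounding.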
Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.
0
1
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
0
1
2
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Compile a coverage reference from the given files (normal samples).
fasta
targets
antitargets
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Transform bait intervals into targets more suitable for CNVkit.
0
1
0
1
0
0
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.
0
1
2
0
1
0
1
0
1
0
0
0
Tool for calling copy number variations.
convert2vcf.pl is command line tool to convert CNVnator calls to vcf format.
0
1
0
0
Tool for calling copy number variations.
Command line tool for calling CNVs in whole genome sequencing data
0
1
bin_sizes
0
0
calling CNVs using read depth
calculates read depth histograms
0
1
bin_sizes
0
0
calling CNVs using read depth
command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.
0
1
2
fasta
fai
0
0
calling CNVs using read depth
Calculate segmentation for the specified bin size (multiple bin sizes separated by spaces)
0
1
bin_sizes
0
0
Calling CNVs using read depth
view function to generate vcfs
0
1
bin_sizes
output_format
0
0
0
0
calling CNVs using read depth
Builds a classic bloom filter COBS index
0
1
0
0
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Builds a compact bloom filter COBS index
0
1
0
0
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples
0
1
2
0
0
0
0
0
0
0
Clustering cONtigs with COverage and ComposiTion
Generate the input coverage table for CONCOCT using a BEDFile
0
1
2
3
0
0
Clustering cONtigs with COverage and ComposiTion
Calculate confidence scores from Kraken2 output
0
1
kraken_taxon_db
0
0
Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC
0
1
2
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Copy number and genotype annotation from whole genome and whole exome sequencing data
0
1
2
3
4
5
6
fasta
fai
snp_position
known_snps
known_snps_tbi
chr_directory
mappability
target_bed
gccontent_profile
0
0
0
0
0
0
0
0
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Format Freec output to circos input format
0
1
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
2
3
0
0
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0
1
2
0
0
0
0
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Run matrix balancing on a cool file
0
1
2
0
0
Sparse binary format for genomic interaction matrices
Create a cooler from genomic pairs and bins
0
1
2
3
chromsizes
0
0
Sparse binary format for genomic interaction matrices
Generate fragment-delimited genomic bins
fasta
chromsizes
enzyme
0
0
Sparse binary format for genomic interaction matrices
Dump a cooler’s data to a text stream.
0
1
2
0
0
Sparse binary format for genomic interaction matrices
Generate fixed-width genomic bins
0
1
2
0
0
Sparse binary format for genomic interaction matrices
Merge multiple coolers with identical axes
0
1
0
0
Sparse binary format for genomic interaction matrices
Generate a multi-resolution cooler file by coarsening
0
1
0
0
Sparse binary format for genomic interaction matrices
Calculate the diamond insulation scores and call insulating boundaries
0
1
0
0
0
Analysis tools for genomic interaction data stored in .cool format
Calculates peak-to-trough ratio (PTR) from metagenomic sequence data
0
1
0
0
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Computes the coverage map along the reference genome
0
1
0
0
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Indexes a directory of fasta files for use with CoPTR
0
1
0
0
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Merge reads that were mapped to multiple indices
0
1
0
0
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.
0
1
0
0
0
0
0
Map reads to contigs and estimate coverage
0
1
0
1
bam_input
interleaved
0
0
CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications
In-house generated or curated data can be imported into CRABS.
0
1
0
1
0
1
0
1
import_format
0
0
Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses
CRABS extracts the amplicon region of the primer set by conducting an in silico PCR.
0
1
0
0
Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses
Decompress files with crabz
0
1
0
0
Like pigz, but rust
Remove false positives in functional CRISPR genomics caused by CNVs
0
1
2
min_reads
min_targeted_genes
0
0
Analysis of CRISPR functional genomics, remove false positive due to CNVs.
Concatenate two or more CSV (or TSV) tables into a single table
0
1
in_format
out_format
0
0
A cross-platform, efficient, practical CSV/TSV toolkit
Join two or more CSV (or TSV) tables by selected fields into a single table
0
1
0
0
A cross-platform, efficient, practical CSV/TSV toolkit
Splits CSV/TSV into multiple files according to column values
0
1
in_format
out_format
0
0
CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.
Annotate a VEP annotated VCF with the most severe consequence field
0
1
0
1
0
0
Custom module to annotate a VEP annotated VCF with the most severe consequence field
Annotate a VEP annotated VCF with the most severe pLi field
0
1
0
0
Custom module to annotate a VEP annotated VCF with the most severe pLi field
Custom module to add a new fasta file to an old one and update an associated GTF
0
1
2
0
1
biotype
0
0
0
Custom module to add a new fasta file to an old one and update an associated GTF
Custom module used to dump software versions within the nf-core pipeline template
versions
0
0
0
Custom module used to dump software versions within the nf-core pipeline template
Filters a differential expression table based on logFC and adjusted p-value thresholds
0
1
0
1
2
0
1
2
0
0
Python library for data manipulation and analysis
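A minimal sketch of this kind of threshold filter, written with the standard library only. The column names (log2FoldChange, padj), the default thresholds, and the use of the absolute fold change are assumptions for illustration, not the module's actual parameters.

```python
import math


def filter_de_table(rows, logfc_col="log2FoldChange", padj_col="padj",
                    logfc_threshold=1.0, padj_threshold=0.05):
    """Keep rows whose |logFC| and adjusted p-value pass both thresholds.

    `rows` is an iterable of dicts (e.g. from csv.DictReader). Rows with
    missing or non-numeric values (such as "NA") are dropped.
    """
    kept = []
    for row in rows:
        try:
            logfc = float(row[logfc_col])
            padj = float(row[padj_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or unparsable value
        if math.isnan(logfc) or math.isnan(padj):
            continue
        if abs(logfc) >= logfc_threshold and padj <= padj_threshold:
            kept.append(row)
    return kept


# Hypothetical differential expression results
table = [
    {"gene": "A", "log2FoldChange": "2.5", "padj": "0.001"},
    {"gene": "B", "log2FoldChange": "0.3", "padj": "0.001"},
    {"gene": "C", "log2FoldChange": "-1.8", "padj": "0.2"},
    {"gene": "D", "log2FoldChange": "-3.0", "padj": "0.01"},
    {"gene": "E", "log2FoldChange": "NA", "padj": "NA"},
]
significant = [row["gene"] for row in filter_de_table(table)]
print(significant)
```

Filtering on the absolute fold change keeps both up- and down-regulated features; a one-sided filter would compare the signed value instead.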
Generates a FASTA file of chromosome sizes and a fasta index file
0
1
0
0
0
0
Tools for dealing with SAM, BAM and CRAM files
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
0
1
0
1
0
0
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
Filter a matrix based on a minimum value and the number of samples that must pass.
0
1
0
1
0
0
0
0
Filter a matrix based on a minimum value and the number of samples that must pass
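The filtering rule described above can be sketched as follows: a feature (row) is kept when at least a given number of samples reach a minimum value. The function and parameter names, and the example matrix, are hypothetical and chosen for illustration only.

```python
def filter_matrix(matrix, minimum_value=1.0, minimum_samples=1):
    """Keep features with >= minimum_samples values that are >= minimum_value.

    `matrix` maps a feature name to its list of per-sample values.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(1 for v in values if v >= minimum_value) >= minimum_samples
    }


# Hypothetical abundance matrix: 3 features x 4 samples
abundance = {
    "gene1": [0.0, 0.0, 12.0, 0.0],
    "gene2": [5.0, 7.0, 9.0, 3.0],
    "gene3": [0.0, 0.5, 0.2, 0.0],
}
filtered = filter_matrix(abundance, minimum_value=1.0, minimum_samples=2)
print(sorted(filtered))
```

Here only gene2 survives, because gene1 passes in a single sample and gene3 in none.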
Test for the presence of suitable NCBI settings or create them on the fly.
ids
0
0
SRA Toolkit and SDK from NCBI
Make a GSEA annotation file (.chip) from tabular inputs
0
1
0
1
0
0
Make a GSEA annotation file (.chip) from tabular inputs
Make a GSEA class file (.cls) from tabular inputs
0
1
0
0
Make a GSEA class file (.cls) from tabular inputs
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
0
1
0
0
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.
0
1
0
1
quant_type
id
extra
0
0
Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications
Perform adapter/quality trimming on sequencing reads
0
1
0
0
0
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage
0
1
fasta
fai
specieslist
0
0
DAS Tool binning step.
0
1
2
3
db_directory
0
0
0
0
0
0
0
0
0
0
0
0
0
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
0
1
extension
0
0
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
0
1
extension
0
0
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.
meta
0
0
Create deacon index for reference genome
0
1
0
0
Fast alignment-free sequence filter
decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take into account the sign and weight. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. Examples are gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.
0
1
net
gtf
0
0
0
DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).
0
1
0
0
0
0
0
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
NO input
0
0
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
0
1
2
db
0
0
0
0
0
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.
NO input
0
0
DeepBGC - Biosynthetic Gene Cluster detection and classification
DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.
0
1
db
0
0
0
0
0
0
0
0
0
0
0
0
DeepBGC - Biosynthetic Gene Cluster detection and classification
Deepcell/mesmer segmentation for whole-cell
0
1
0
1
0
0
Deep cell is a collection of tools to segment imaging data
DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
0
0
0
0
A Deep Learning Model for Transmembrane Topology Prediction and Classification
0
1
0
0
0
0
0
0
This tool filters alignments in a BAM/CRAM file according to the specified parameters.
0
1
2
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
0
1
2
fasta
fasta_fai
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Calculates scores per genome region for other deeptools plotting utilities
0
1
bed
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Computes read coverage for genomic regions (bins) across the entire genome.
0
1
2
3
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
0
1
method
plot_type
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Plots cumulative read coverage for each BAM file
0
1
2
0
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a heatmap
0
1
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
0
1
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a profile plot
0
1
0
0
0
A set of user-friendly tools for normalization and visualization of deep-sequencing data
(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
0
0
0
0
0
Call variants from the examples produced by make_examples
0
1
0
0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Transforms the input alignments to a format suitable for the deep neural network variant caller
0
1
2
3
0
1
0
1
0
1
0
1
0
0
0
0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
4
0
1
0
1
0
1
0
0
0
0
0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
2
3
0
1
0
1
0
1
0
1
0
0
0
0
0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
0
1
0
0
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Call structural variants
0
1
2
3
4
5
0
1
0
1
0
0
0
Structural variant discovery by integrated paired-end and split-read analysis
Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.
0
1
2
output_name
generate_gender_plot
genome
generate_diagnostic_plots
0
0
0
runs a differential expression analysis with DESeq2
0
1
2
3
0
1
2
0
1
0
1
0
0
0
0
0
0
0
0
0
0
Differential gene expression analysis based on the negative binomial distribution
Queries a DIAMOND database using blastp mode
0
1
0
1
outfmt
blast_columns
0
0
0
0
0
0
0
0
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
0
1
0
1
out_ext
blast_columns
0
0
0
0
0
0
0
0
0
Accelerated BLAST compatible local sequence aligner
calculate clusters of highly similar sequences
0
1
0
0
Accelerated BLAST compatible local sequence aligner
Builds a DIAMOND database
0
1
taxonmap
taxonnodes
taxonnames
0
0
Accelerated BLAST compatible local sequence aligner
Doublet detection in single-cell RNA-seq data
0
1
0
0
0
Create DRAGEN hashtable for reference genome
0
1
0
0
Dragmap is the Dragen mapper/aligner Open Source Software.
Assemble bacterial isolate genomes from Nanopore reads
0
1
2
0
0
0
0
0
0
Performs rapid genome comparisons for a group of genomes and visualize their relatedness
0
1
0
0
De-replication of microbial genomes assembled from multiple samples
Export assembly segment sequences in GFA 1.0 format to FASTA format
0
1
0
0
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
0
1
0
0
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
0
1
0
0
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
0
1
0
0
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
0
1
0
0
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Calculates secondary structure assignments from PDB files using mkdssp (DSSP). DSSP is a standard tool for assigning secondary structure to amino acids in protein structures.
0
1
format
0
0
Calculates secondary structure information from PDB files.
SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.
0
1
2
3
4
5
fasta
fasta_fai
0
0
Assessment of duplication rates in RNA-Seq datasets
0
1
0
1
0
0
0
0
0
0
0
0
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
0
1
2
0
1
2
0
0
0
Perform phasing of genotyped data with or without a reference panel
0
1
2
3
4
5
0
0
Fast genome-wide functional annotation through orthology assignment.
0
1
eggnog_db
eggnog_data_dir
0
1
0
0
0
0
Convert any PEP project or Nextflow samplesheet to any format
samplesheet
format
pep_input_base_dir
0
0
Convert any PEP project or Nextflow samplesheet to any format
Provide the SNP coverage of each individual in an eigenstrat formatted dataset.
0
1
2
3
0
0
0
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track
0
1
0
result
bigwig
versions
Analysis tools for genomic interaction data stored in .cool format
Convert a file in FASTA format to the ELFASTA format
0
1
0
0
0
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.
0
1
2
3
4
5
6
0
1
0
1
0
1
run_haplotypecaller
run_bqsr
bqsr_tables_only
get_activity_profile
get_assembly_regions
0
0
0
0
0
0
0
0
0
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Merge split bam/sam chunks in one file
0
1
0
0
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Split bam file into manageable chunks
0
1
0
0
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.
0
1
0
0
The European Molecular Biology Open Software Suite
the revseq program from emboss reverse complements a nucleotide sequence
0
1
0
0
The European Molecular Biology Open Software Suite
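As an illustration of what reverse complementing means, here is a minimal Python sketch. It is not the EMBOSS implementation: the function name `reverse_complement` is hypothetical, and the sketch assumes plain DNA input (A, C, G, T, N) without the wider IUPAC ambiguity codes.

```python
# Complement table for unambiguous DNA bases plus N; case is preserved.
# Assumption: other IUPAC ambiguity codes are out of scope for this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.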
A taxonomic profiler for metagenomic 16S data optimized for error prone long reads.
0
1
db
0
0
0
0
0
0
Emu is a relative abundance estimator for 16S genomic data.
endorS.py calculates endogenous DNA from samtools flagstat files and prints the result to the screen
0
1
2
3
0
0
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
0
1
2
3
0
0
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
0
1
feature_file
0
0
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output file format is controlled through task.ext.args.
0
1
2
genome
species
cache_version
cache
0
1
extra_files
0
0
0
0
0
0
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Searches a term in a public NCBI database
0
1
database
0
0
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using Unique Identifier(s)
0
1
2
database
0
0
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using an UID
0
1
pattern
element
sep
0
0
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
phylogenetic placement of query sequences in a reference tree
0
1
2
3
bfastfile
binaryfile
0
0
0
0
Massively parallel phylogenetic placement of genetic sequences
splits an alignment into reference and query parts
0
1
2
0
0
0
Massively parallel phylogenetic placement of genetic sequences
estimation of the unfolded site frequency spectrum
0
1
2
3
0
0
0
Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly
0
1
0
0
0
EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.
Estimate repeat sizes using NGS data
0
1
2
0
1
0
1
0
1
0
0
0
0
Merge STR profiles into a multi-sample STR profile
0
1
0
1
0
1
0
0
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Compute genome-wide STR profile
0
1
2
0
1
0
1
0
0
0
0
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Run falco on sequenced reads
0
1
0
0
0
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Aligns sequences using FAMSA
0
1
0
1
compress
0
0
Algorithm for large-scale multiple sequence alignments
Renders a guidetree in famsa
0
1
0
0
Algorithm for large-scale multiple sequence alignments
Perform adapter and quality trimming on sequencing reads with reporting
0
1
0
0
0
0
0
0
0
0
Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.
0
1
hmm_model
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
A program that counts sequence occurrences in FASTQ files.
0
1
0
1
0
0
0
0
0
0
2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.
Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.
0
1
0
0
0
Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!
Quickly compute statistics over a fasta file in windows.
0
1
0
0
0
0
0
0
A fast K-mer counter for high-fidelity shotgun datasets
0
1
0
0
0
0
A fast K-mer counter for high-fidelity shotgun datasets
A fast K-mer counter for high-fidelity shotgun datasets
0
1
0
0
A fast K-mer counter for high-fidelity shotgun datasets
A tool to merge FastK histograms
0
1
2
3
0
0
0
0
A fast K-mer counter for high-fidelity shotgun datasets
Distance-based phylogeny with FastME
0
1
2
0
0
0
0
0
Perform adapter/quality trimming on sequencing reads
0
1
adapter_fasta
discard_trimmed_pass
save_trimmed_fail
save_merged
0
0
0
0
0
0
0
fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.
0
1
0
0
Build fastq screen config file from bowtie index files
genome_names
indexes
0
0
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Align reads to multiple reference genomes using fastq-screen
0
1
database
0
0
0
0
0
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Performs quality control of FASTQ files
0
1
0
0
Validation and manipulation of FASTQ files, scRNA-seq barcode pre-processing and UMI quantification.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
0
1
0
0
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
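Collapsing identical sequences while keeping read counts can be sketched in a few lines of Python. This is an illustration of the idea only, not the FASTX-Toolkit code: the function names are hypothetical, and the `>rank-count` header layout used here is an assumption made for the example.

```python
from collections import Counter


def collapse(sequences):
    """Return (sequence, count) pairs, most abundant first.

    Ties keep first-seen order, because Counter preserves insertion
    order and most_common() uses a stable sort.
    """
    return Counter(sequences).most_common()


def to_fasta(collapsed):
    """Render collapsed reads as FASTA text.

    Assumption: each header is '>rank-count', so the read count
    survives the collapse.
    """
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    reads = ["ACGT", "TTTT", "ACGT"]
    print(to_fasta(collapse(reads)))
```

Storing the count in the header is what lets downstream steps recover abundance after duplicates have been removed.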
Run NCBI's FCS adaptor on assembled genomes
0
1
0
0
0
0
0
0
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.
0
1
gxdb
0
0
0
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to remove foreign contamination from genome assemblies
0
1
2
0
0
0
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Fetches the NCBI FCS-GX database using a provided manifest URL
manifest
0
0
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to screen and remove foreign contamination from genome assemblies
0
1
2
gxdb
ramdisk_path
0
0
0
0
0
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
0
1
min_reads
min_baseq
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
0
1
min_reads
min_baseq
0
0
Tools for working with genomic and high throughput sequencing data.
Collects a suite of metrics to QC duplex sequencing data.
0
1
interval_list
0
0
0
0
0
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
Copies the UMI at the end of a BAM file's read name to the RX tag.
0
1
2
0
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses the fgbio tools to convert sequenced FASTQ files into unaligned BAM or CRAM files, possibly moving the UMI barcode into the RX field of the reads
0
1
0
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
0
1
0
1
min_reads
min_baseq
max_base_error_rate
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.
0
1
strategy
0
0
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.
0
1
0
0
Tools for working with genomic and high throughput sequencing data.
FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM
0
1
0
1
0
1
0
1
0
0
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Filtlong filters long reads based on quality measures or short read data.
0
1
2
0
0
0
A module for concatenating gzipped or uncompressed files, getting around the UNIX terminal argument size limit
0
1
0
0
GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression
pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
A module for decompressing a large number of gzipped files, getting around the UNIX terminal argument limit
0
1
0
0
GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression
pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
Perform merging of mate paired-end sequencing reads
0
1
0
0
0
0
De novo assembler for single molecule sequencing reads
0
1
mode
0
0
0
0
0
0
0
Efficient compression tool for protein structures
0
1
0
0
Foldcomp: a library and format for compressing and indexing large protein structure sets
Decompression tool for foldcomp compressed structures
0
1
0
0
Foldcomp: a library and format for compressing and indexing large protein structure sets
Creates a database for Foldmason.
0
1
0
0
Multiple Protein Structure Alignment at Scale with FoldMason
Aligns protein structures using foldmason
0
1
0
1
compress
0
0
0
Multiple Protein Structure Alignment at Scale with FoldMason
Renders a visualization report using foldmason
0
1
0
1
0
1
0
1
0
0
Multiple Protein Structure Alignment at Scale with FoldMason
Create a database from protein structures
0
1
0
0
Foldseek: fast and accurate protein structure search
Search for protein structural hits against a foldseek database of protein structures
0
1
0
1
0
0
Foldseek: fast and accurate protein structure search
Generate processing masks for a given datacube definition and area of interest. These files can be used to spatially restrict downstream analysis tasks.
aoi
mask/datacube-definition.prj
shapefile_dbf
shapefile_prj
shapefile_shx
0
0
An all-in-one tool for processing satellite data, specialized in medium-resolution data such as Landsat or Sentinel imagery.
Compute valid tiles for a given datacube definition and area of interest. This list can be used by downstream analysis tasks to limit processing to the area of interest when satellite data covers a larger region.
aoi
datacube_definition
shapefile_dbf
shapefile_prj
shapefile_shx
0
0
An all-in-one tool for processing satellite data, specialized in medium-resolution data such as Landsat or Sentinel imagery.
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
meta
0
0
fq is a library to generate and validate FASTQ file pairs.
fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.
0
1
0
0
fq is a library to generate and validate FASTQ file pairs.
A haplotype-based variant detector
0
1
2
3
4
5
0
1
0
1
0
1
0
1
0
1
0
0
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
0
1
2
repeats
barcodes
lineages_meta
0
0
0
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
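The two-stage resampling described for the bootstrap module can be sketched in Python as follows. This is a simplified reading of the description, not Freyja's own code: the function names and the input layout (a dict mapping site to per-base read counts) are hypothetical, and the sketch assumes that the number of trials in the second stage is the depth drawn for that site in the first stage.

```python
import random
from collections import Counter


def multinomial(n, weights, rng):
    """Draw n categorical trials; return how many landed in each category."""
    if n == 0:
        # random.choices rejects all-zero weights, so short-circuit here.
        return [0] * len(weights)
    draws = Counter(rng.choices(range(len(weights)), weights=weights, k=n))
    return [draws.get(i, 0) for i in range(len(weights))]


def bootstrap_once(site_base_counts, rng):
    """Return one bootstrap replicate of per-site base counts."""
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    # Stage 1: redistribute the total depth across sites, with event
    # probabilities proportional to each site's share of all reads.
    new_depths = multinomial(sum(depths), depths, rng)
    resampled = {}
    for site, new_depth in zip(sites, new_depths):
        bases = list(site_base_counts[site])
        base_counts = [site_base_counts[site][b] for b in bases]
        # Stage 2: resample bases at the site, with event probabilities
        # proportional to the observed base frequencies.
        resampled[site] = dict(zip(bases, multinomial(new_depth, base_counts, rng)))
    return resampled


if __name__ == "__main__":
    observed = {"pos100": {"A": 90, "G": 10}, "pos200": {"C": 50}}
    print(bootstrap_once(observed, random.Random(0)))
```

Each replicate keeps the total number of reads fixed while letting per-site depth and per-site base frequencies vary, which is what allows the spread of the demixing estimates to be assessed across replicates.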
specify the relative abundance of each known haplotype
0
1
2
barcodes
lineages_meta
0
0
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
downloads new versions of the curated SARS-CoV-2 lineage file and barcodes
db_name
0
0
0
0
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Call variants and report sequencing depth information for each variant
0
1
fasta
0
0
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Build references for fusioncatcher
meta
0
0
Build genome for fusioncatcher
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
0
1
0
1
0
0
0
0
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
fusionreport_detect
0
1
2
3
0
1
0
fusion_list
fusion_list_filtered
report
html
csv
json
versions
Tool for parsing outputs from fusion detection tools
Build DB for fusionreport
NO input
0
0
Generate an interactive summary report from fusion detection tools.
Cluster genome FASTA files by average nucleotide identity
0
1
2
3
0
0
0
Gene Allele Mutation Microbial Assessment
0
1
db
0
0
0
0
0
Tool for Gene Allele Mutation Microbial Assessment
Build ganon database using custom reference sequences.
0
1
input_tsv
taxonomy_files
genome_size_files
0
0
0
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Classify FASTQ files against ganon database
0
1
db
0
0
0
0
0
0
0
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a ganon report file from the output of ganon classify
0
1
db
0
0
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a multi-sample report file from the output of ganon report runs
0
1
0
0
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
assigns taxonomy to query sequences in phylogenetic placement output
0
1
2
0
0
0
0
0
0
0
Genesis Applications for Phylogenetic Placement Analysis
Grafts query sequences from phylogenetic placement on the reference tree
0
1
0
0
Genesis Applications for Phylogenetic Placement Analysis
colours a phylogeny with placement densities
0
1
0
0
0
0
0
0
0
Genesis Applications for Phylogenetic Placement Analysis
Performs local realignment around indels to correct for mapping errors
0
1
2
3
0
1
0
1
0
1
0
1
0
0
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Generates a list of locations that should be considered for local realignment prior to genotyping.
0
1
2
0
1
0
1
0
1
0
1
0
0
The full Genome Analysis Toolkit (GATK) framework, license restricted.
SNP and Indel variant caller on a per-locus basis
0
1
2
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotates intervals with GC content, mappability, and segmental-duplication content
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
fasta
fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
0
1
2
3
4
5
fasta
fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
0
1
2
3
4
0
1
0
1
0
1
intervals
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an interval list from a bed file and a reference dict
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
0
1
2
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
estimates the parameters for the DRAGstr model
0
1
2
fasta
fasta_fai
dict
strtablefile
0
0
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a Convolutional Neural Net to filter annotated variants
0
1
2
3
4
fasta
fai
dict
architecture
weights
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
0
1
2
3
0
1
0
1
0
1
0
0
0
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
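The counting rule stated for this module (a read contributes to an interval when its start position lies inside it) can be illustrated with a short Python sketch. This is not GATK code: the function name is hypothetical, and the sketch assumes a single contig with closed, 1-based intervals.

```python
from bisect import bisect_left, bisect_right


def count_read_starts(intervals, read_starts):
    """Count, for each (begin, end) interval, the reads starting inside it.

    Assumptions: one contig, and intervals are closed on both ends.
    """
    starts = sorted(read_starts)
    return [
        # Number of start positions s with begin <= s <= end.
        bisect_right(starts, end) - bisect_left(starts, begin)
        for begin, end in intervals
    ]


if __name__ == "__main__":
    intervals = [(1, 100), (101, 200)]
    read_starts = [5, 100, 101, 150, 250]
    print(count_read_starts(intervals, read_starts))
```

Because only the start position is used, a read that spans an interval boundary is counted exactly once, in the interval where it begins.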
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
3
4
fasta
fasta_fai
dict
0
0
0
0
0
0
0
Genome Analysis Toolkit (GATK4)
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
0
1
2
fasta
fai
dict
0
0
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.
fasta
fasta_fai
dict
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges adjacent DepthEvidence records
0
1
2
fasta
fasta_fai
dict
0
0
0
Genome Analysis Toolkit (GATK4)
Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a panel of normals constraining germline and artifactual sites for use with mutect2.
0
1
0
1
0
1
0
1
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Denoises read counts to produce denoised copy ratios
0
1
0
1
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Determines the baseline contig ploidy for germline samples given counts data
0
1
2
3
0
1
contig_ploidy_table
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the numbers of unique molecules in a sequencing library.
0
1
fasta
fai
dict
0
0
Genome Analysis Toolkit (GATK4)
Converts FastQ file to SAM/BAM format
0
1
0
0
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters intervals based on annotations and/or count statistics.
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
0
1
2
3
4
5
6
7
0
1
0
1
0
1
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply tranche filtering
0
1
2
3
resources
resources_index
fasta
fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gathers scattered BQSR recalibration reports into a single file
0
1
0
0
Genome Analysis Toolkit (GATK4)
write your description here
0
1
dict
0
0
Genome Analysis Toolkit (GATK4)
Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.
0
1
2
3
4
5
run_intlist
run_updatewspace
input_map
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
0
0
Genome Analysis Toolkit (GATK4)
Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.
0
1
2
3
4
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
0
1
2
3
0
1
0
1
0
1
variants
variants_tbi
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call germline SNPs and indels via local re-assembly of haplotypes
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an index for a feature file, e.g. VCF or BED file.
0
1
0
0
Genome Analysis Toolkit (GATK4)
Converts a Picard IntervalList file to a BED file.
0
1
0
0
Genome Analysis Toolkit (GATK4)
Splits the interval list file into unique, equally-sized interval files and places them in a directory
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Left align and trim variants using GATK4 LeftAlignAndTrimVariants.
0
1
2
3
fasta
fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
fasta
fasta_fai
0
0
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge unmapped with mapped BAM files
0
1
2
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges mutect2 stats generated on different intervals/regions
0
1
0
0
Genome Analysis Toolkit (GATK4)
Merges several vcf files
0
1
0
1
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts copy number ratios (and optionally allelic counts) to copy number segments
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call somatic SNVs and indels via local assembly of haplotypes.
0
1
2
3
0
1
0
1
0
1
germline_resource
germline_resource_tbi
panel_of_normals
panel_of_normals_tbi
0
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
0
1
2
3
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Prepares bins for coverage collection.
0
1
0
1
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
0
1
2
0
1
0
1
0
1
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
bed
fasta
fasta_fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Condenses homRef blocks in a single-sample GVCF
0
1
2
3
fasta
fai
dict
dbsnp
dbsnp_tbi
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Reverts SAM or BAM files to a previous state.
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts BAM/SAM file to FastQ format
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Select a subset of variants from a VCF file
0
1
2
3
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a fasta with the bases shifted by offset
0
1
0
1
0
1
0
0
0
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence
0
1
2
0
1
fasta
fasta_fai
dict
0
0
0
Genome Analysis Toolkit (GATK4)
Splits CRAM files efficiently by taking advantage of their container based structure
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Split intervals into sub-interval files.
0
1
0
1
0
1
0
1
0
0
Genome Analysis Toolkit (GATK4)
Splits reads that contain Ns in their cigar string
0
1
2
3
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
0
1
2
3
fasta
fasta_fai
dict
0
0
0
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
0
1
2
ploidy_table
fasta
fasta_fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filter variants
0
1
2
0
1
0
1
0
1
0
1
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module, because the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
0
1
2
resource_vcf
resource_tbi
labels
fasta
fai
dict
0
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Extract fields from a VCF file to a tab-delimited table
0
1
2
3
4
5
0
1
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
0
1
2
3
4
fasta
fai
dict
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0
1
2
3
fasta
fai
dict
known_sites
known_sites_tbi
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
fasta
fasta_fai
dict
0
0
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.
0
1
program_file
disable_redirect_output
0
0
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
0
1
2
model_dir
0
0
0
0
0
0
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
Convert a mappability file to bedgraph format
0
1
0
1
0
0
0
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
0
1
0
0
0
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Define the mappability of a reference
0
1
read_length
0
0
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
0
1
0
0
0
The GEM indexer (v3).
Performs fastq alignment to a fasta reference using gem3-mapper
0
1
0
1
sort_bam
0
0
The GEM indexer (v3).
A derivative of GenomeScope2.0 modified to work with FastK
0
1
0
0
0
0
0
0
0
0
create index file for genmap
0
1
0
0
Ultra-fast computation of genome mappability.
create mappability files for a genome
0
1
0
1
0
0
0
0
0
Ultra-fast computation of genome mappability.
for annotating regions, frequencies, cadd scores
0
1
0
0
Annotate genetic inheritance models in variant files
Score compounds
0
1
0
0
Annotate genetic inheritance models in variant files
annotate models of inheritance
0
1
2
reduced_penetrance
0
0
Annotate genetic inheritance models in variant files
Score the variants of a vcf based on their annotation
0
1
2
score_config
0
0
Annotate genetic inheritance models in variant files
Download geNomad databases and related files
NO input
0
0
Identification of mobile genetic elements
Identify mobile genetic elements present in genomic assemblies
0
1
genomad_db
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Identification of mobile genetic elements
Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach
0
1
0
0
0
0
0
0
0
0
0
Genotype Salmonella Typhi from Mykrobe results
0
1
0
0
Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)
Peak-calling for ChIP-seq and ATAC-seq enrichment experiments
0
1
2
blacklist_bed
0
0
0
0
0
0
geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA
geo_accession
0
0
Retrieves GEO data from the Gene Expression Omnibus (GEO)
0
1
0
0
0
0
Get data from NCBI Gene Expression Omnibus (GEO)
Downloads databases needed for running getorganelle
organelle_type
0
0
Get organelle genomes from genome skimming data
Assembles organelle genomes from genomic data
0
1
0
1
0
0
0
Get organelle genomes from genome skimming data
Collapse walk-preserving shared affixes in variation graphs in GFA format
0
1
0
0
0
A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
0
1
out_fmt
genome_size
target
0
1
0
1
0
1
0
1
0
0
0
Converts GFA or rGFA files to FASTA
0
1
0
0
Tools for manipulating sequence graphs in the GFA and rGFA formats
Summary statistics for GFA files
0
1
0
0
Tools for manipulating sequence graphs in the GFA and rGFA formats
Compare, merge, annotate and estimate accuracy of generated gtf files
0
1
0
1
2
0
1
0
0
0
0
0
0
0
0
Validate, filter, convert and perform various other operations on GFF files
0
1
fasta
0
0
0
0
gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
0
1
0
0
0
gget enables efficient querying of genomic databases
Defines chunks where to run imputation
0
1
2
3
0
0
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.
0
1
2
3
4
5
6
7
min_prob
min_dp
bins
0
0
0
0
0
0
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
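The r2 measure described for the concordance step can be illustrated with a minimal sketch. This is not GLIMPSE's implementation: the function names `squared_correlation` and `maf_bin`, the bin edges, and the input values are all hypothetical, and the sketch only shows the squared Pearson correlation between imputed dosages and validation genotypes, plus the assignment of a variant to a MAF bin.

```python
def squared_correlation(dosages, truth):
    """Squared Pearson correlation between imputed dosages and truth genotype counts."""
    n = len(dosages)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_d = sum(dosages) / n
    mean_t = sum(truth) / n
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(dosages, truth))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_d == 0 or var_t == 0:
        return float("nan")  # correlation is undefined when one side has no variance
    return cov * cov / (var_d * var_t)


def maf_bin(maf, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing maf, else None."""
    for i in range(len(edges) - 1):
        if edges[i] <= maf < edges[i + 1]:
            return i
    return None


# Hypothetical example: dosages in [0, 2] against hard genotype calls (0, 1, 2).
r2 = squared_correlation([0.1, 0.9, 1.8, 0.0], [0, 1, 2, 0])
bin_index = maf_bin(0.03, [0.0, 0.01, 0.05, 0.5])
```

In practice the correlation would be computed separately over the variants that fall into each MAF bin, so that accuracy at rare and common variants can be compared.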
Concatenates imputation chunks in a single VCF/BCF file ligating phased information.
0
1
2
0
0
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods
0
1
2
3
4
5
6
7
8
0
0
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Generates haplotype calls by sampling haplotype estimates
0
1
0
0
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Defines chunks where to run imputation
0
1
2
3
4
model
0
0
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Program to compute the genotyping error rate at the sample or marker level.
0
1
2
3
4
5
6
7
8
0
1
2
3
4
5
6
0
0
0
0
0
0
0
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.
0
1
2
0
0
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool for imputation and phasing from vcf file or directly from bam files.
0
1
2
3
4
5
6
7
8
9
0
1
2
0
0
0
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool to create a binary reference panel for quick reading time.
0
1
2
3
4
0
1
0
0
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
merge gVCF files and perform joint variant calling
0
1
2
0
1
vcf_output
0
0
0
GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).
0
1
2
type_report
summary_report
skip
examine
0
0
0
0
0
0
0
Writes a sorted concatenation of file/s
0
1
0
0
Writes a sorted concatenation of file/s
Query metadata for any taxon across the tree of life.
0
1
2
0
0
goat-cli is a command line interface to query the Genomes on a Tree Open API.
Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
0
1
2
0
1
0
0
0
0
0
0
0
0
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
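To give a feel for the 16KB resolution mentioned for the index-based coverage estimate, here is a small arithmetic sketch. It assumes that "16KB" means windows of 16,384 bases; the function name `index_bins` and the example contig length are hypothetical and not part of goleft.

```python
from math import ceil

# Assumption: one index window spans 16 * 1024 = 16,384 reference bases.
BIN_SIZE = 16 * 1024


def index_bins(contig_length):
    """Number of 16 KB windows needed to cover a contig of the given length."""
    if contig_length < 0:
        raise ValueError("contig length must be non-negative")
    return ceil(contig_length / BIN_SIZE)


# A hypothetical 1 Mb contig is summarised by a few dozen coverage values.
bins_for_1mb = index_bins(1_000_000)
```

Because each contig is reduced to one value per window, the estimate is coarse but can be produced from the index alone, without reading the alignments.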
runs a functional enrichment analysis with gprofiler2
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
An R interface corresponding to the 2019 update of g:Profiler web tool.
Checks if the input file is bgzip compressed or not
0
1
0
0
a wee tool for random access into BGZF files.
A versatile pairwise aligner for genomic and spliced nucleotide sequences
fasta
0
0
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Tools for population-scale genotyping using pangenome graphs.
0
1
0
0
0
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
2
3
0
1
0
1
0
1
0
0
0
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
0
1
0
1
0
0
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
2
3
0
1
0
1
0
1
bedpe
bed
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
high_conf_sv
all_sv
versions
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0
1
0
1
0
0
0
GRIDSS: the Genomic Rearrangement IDentification Software Suite
run the Broad Gene Set Enrichment tool in GSEA mode
0
1
2
3
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Gene Set Enrichment Analysis (GSEA)
Collapse redundant transcript models in Iso-Seq data.
0
1
fasta
0
0
0
0
0
0
0
0
0
0
Collapse similar gene models
Merge multiple transcriptomes while maintaining source information.
0
1
filelist
0
0
0
0
0
Gene-Switch Transcriptome Annotation by Modular Algorithms
Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)
0
1
0
0
0
0
Gene-Switch Transcriptome Annotation by Modular Algorithms
GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files
0
1
0
0
0
The GenomeTools genome analysis system
GenomeTools gt-gff3validator utility to strictly validate a GFF3 file
0
1
0
0
0
The GenomeTools genome analysis system
Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility
0
1
0
0
0
0
0
The GenomeTools genome analysis system
GenomeTools gt-stat utility to show statistics about features contained in GFF3 files
0
1
0
0
The GenomeTools genome analysis system
Computes enhanced suffix array using GenomeTools gt-suffixerator utility
0
1
mode
0
0
The GenomeTools genome analysis system
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
0
1
0
1
use_pplacer_scratch_dir
mash_db
0
0
0
0
0
0
0
0
0
0
0
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Converts the output classifications of GTDB-Tk from GTDB taxonomy to NCBI taxonomy
0
1
2
0
1
0
1
0
0
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
alignment
0
0
0
0
0
0
0
0
0
0
Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes
db_name
0
0
Python package for detection of chimerism and contamination in prokaryotic genomes.
Merging of CheckM and GUNC results in one summary table
0
1
2
0
0
Python package for detection of chimerism and contamination in prokaryotic genomes.
Detection of Chimerism and Contamination in Prokaryotic Genomes
0
1
db
0
0
0
Python package for detection of chimerism and contamination in prokaryotic genomes.
Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.
0
1
0
0
gvcftools is a package of small utilities for creating and analyzing gVCF files
Tool to convert and summarize ABRicate outputs using the hAMRonization specification
0
1
format
software_version
reference_db_version
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.
0
1
format
software_version
reference_db_version
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize DeepARG outputs using the hAMRonization specification
0
1
format
software_version
reference_db_version
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize fARGene outputs using the hAMRonization specification
0
1
format
software_version
reference_db_version
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize RGI outputs using the hAMRonization specification.
0
1
format
software_version
reference_db_version
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to summarize and combine all hAMRonization reports into a single file
reports
format
0
0
0
0
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial versions.
0
1
0
0
0
classification into haplogroups
0
1
format
0
0
A tool for mtDNA haplogroup classification.
classification into haplogroups
0
1
0
0
A tool for mtDNA haplogroup classification.
Somatic VCF Feature Extraction tool from hap.py.
0
1
2
3
4
0
1
0
1
0
0
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
Haplotype VCF comparison tools
Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py
0
1
2
0
1
0
1
0
0
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variations.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
0
0
0
Haplotype VCF comparison tools somatic variant comparison
Generating cell hashing calls from a matrix of count data.
0
1
2
3
0
0
0
0
0
0
0
0
HelitronScanner draw tool for Helitron transposons in genomes
0
1
0
1
0
1
0
0
HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes
HelitronScanner scanHead and scanTail tools for Helitron transposons in genomes
0
1
command
lcv_filepath
buffer_size
0
0
HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes
Fast and sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)
0
1
0
1
0
0
HH-suite3 for fast remote homology detection and deep protein annotation
Sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)
0
1
0
1
0
0
HH-suite3 for fast remote homology detection and deep protein annotation
Reformat a Multiple Sequence Alignment (MSA) file
0
1
informat
outformat
0
0
HH-suite3 for fast remote homology detection and deep protein annotation
Identify cap locus serotype and structure in your Haemophilus influenzae assemblies
0
1
database_dir
model_fp
0
0
0
0
Computes PCA eigenvectors for a Hi-C matrix.
0
1
0
0
0
0
Set of programs to process, analyze and visualize Hi-C and capture Hi-C data
Whole-genome assembly using PacBio HiFi reads
0
1
2
0
1
2
0
1
2
0
1
0
0
0
0
0
0
0
0
0
0
0
Align RNA-Seq reads to a reference with HISAT2
0
1
0
1
0
1
0
0
0
0
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Extracts splicing sites from a GTF file
0
1
0
0
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Pre-compute the graph index structure.
0
1
0
0
HLA typing from short and long reads
Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.
0
1
2
3
0
0
0
0
0
0
0
0
0
0
0
HLA typing from short and long reads
gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference
0
1
0
0
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapCounter. Takes a very long time on large genomes and is not parallelised at all.
0
1
0
0
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file
0
1
0
0
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
readCounter function from HMMcopy utilities, used to generate read counts in windows
0
1
2
0
0
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Mask multiple sequence alignments
0
1
2
3
4
5
6
7
maskfile
0
0
0
0
0
0
0
0
Biosequence analysis using profile hidden Markov models
Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that it comes last. See the tool's help for possible values.
0
1
0
0
Biosequence analysis using profile hidden Markov models
hmmalign from the HMMER suite aligns a number of sequences to an HMM profile
0
1
hmm
0
0
Biosequence analysis using profile hidden Markov models
create an hmm profile from a multiple sequence alignment
0
1
mxfile
0
0
0
Biosequence analysis using profile hidden Markov models
extract hmm from hmm database file or create index for hmm database
0
1
key
keyfile
index
0
0
0
Biosequence analysis using profile hidden Markov models
compress and index profile database for hmmscan
0
1
0
0
Biosequence analysis using profile hidden Markov models
R script that scores output from multiple runs of hmmer/hmmsearch
0
1
0
0
Biosequence analysis using profile hidden Markov models
A Language and Environment for Statistical Computing
Tidyverse: R packages for data science
search profile(s) against a sequence database
0
1
2
3
4
5
0
0
0
0
0
Biosequence analysis using profile hidden Markov models
iterative searches to detect distant homologs by refining an HMM profile from hits
0
1
2
3
4
5
0
0
0
0
0
Biosequence analysis using profile hidden Markov models
Human mitochondrial variant annotation using HmtVar. Contains a .plk file with annotations, so it can be run offline
0
1
0
0
Human mitochondrial variants annotation using HmtVar.
Annotate peaks with HOMER suite
0
1
fasta
gtf
0
0
0
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Find peaks with HOMER suite
0
1
uniqmap
0
0
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Create a tag directory with the HOMER suite
0
1
fasta
0
0
0
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Differential gene expression analysis based on the negative binomial distribution
Empirical Analysis of Digital Gene Expression Data in R
Create a UCSC bed graph with the HOMER suite
0
1
0
0
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Converting from HOMER peak to BED file formats
0
1
0
0
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
write your description here
0
1
0
1
0
0
0
Hostile: accurate host decontamination
Downloads required reference genomes for Hostile
index_name
0
0
Hostile: accurate host decontamination
Demultiplex samples based on data from cell hashing.
0
1
2
0
0
0
0
0
count how many reads map to each feature
0
1
2
0
1
0
0
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
This tool takes a background VCF, such as gnomAD, that has full-genome coverage (though in some cases users will instead want whole-exome coverage) and uses it as an expectation of variants.
0
1
2
0
1
2
0
0
useful command-line tools written to showcase hts-nim
HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.
0
1
0
1
0
0
0
0
0
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals
wigs
gc_wig
map_wig
centromere
rep_time_wig
exons
0
0
0
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA
0
1
gc_wig
map_wig
normal_wig
normal_background
centromere
rep_time_wig
exons
0
0
0
0
0
0
0
0
0
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
Plot a metagene of cross-link events/sites around various transcriptomic landmarks.
0
1
segmentation
0
0
Runs iCount peaks on a BED file of crosslinks
0
1
2
0
0
Computational pipeline for analysis of iCLIP data
Formats a GTF file for use with iCount sigxls
0
1
fai
0
0
0
Computational pipeline for analysis of iCLIP data
Runs iCount sigxls on a BED file of crosslinks
0
1
segmentation
0
0
0
Computational pipeline for analysis of iCLIP data
Report proportion of cross-link events/sites on each region type.
0
1
segmentation
0
0
0
0
Computational pipeline for analysis of iCLIP data
igv.js is an embeddable interactive genome visualization component
0
1
2
0
0
0
0
Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser html file. To visualise it, files have to be served. Check the documentation at: https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements
A Python application to generate self-contained HTML reports for variant review and other genomic applications
0
1
2
3
0
1
2
0
0
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
0
1
2
ilp
0
0
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
0
1
2
3
ilp
0
0
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Perform immune cell deconvolution using RNA-seq data and various computational methods.
0
1
2
3
gene_symbol_col
0
0
0
Search covariance models against a sequence database
0
1
2
write_align
write_target
0
0
0
0
Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.
inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification
0
1
genome_fasta
genes_fasta
stb_file
0
0
0
0
0
0
0
0
Calculation of strain-level metrics
Detect integrons in DNA sequences
0
1
0
0
0
0
0
Produces protein annotations and predictions from an amino acid FASTA file
0
1
interproscan_database
0
0
0
0
0
Download, extract, and check md5 of iPHoP databases
NO input
0
0
Predict host genus from genomes of uncultivated phages.
Predict phage host using iPHoP
0
1
iphop_db
0
0
0
0
Predict host genus from genomes of uncultivated phages.
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of handling bacterial genome-sized alignments.
0
1
2
tree_te
lmclust
mdef
partitions_equal
partitions_proportional
partitions_unlinked
guide_tree
sitefreq_in
constraint_tree
trees_z
suptree
trees_rf
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Quantification of transposable elements expression in scRNA-seq
0
1
genome
bed
0
0
0
0
0
Genomic island prediction in bacterial and archaeal genomes
0
1
0
0
0
IsoSeq - Cluster - Cluster trimmed consensus sequences
0
1
0
0
0
0
0
0
0
0
0
0
0
0
IsoSeq - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
0
1
primers
0
0
0
0
0
0
IsoSeq - Scalable De Novo Isoform Discovery
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
meta
bam
meta
version
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
meta
bam
primers
meta
bam
pbi
consensusreadset
summary
report
versions
IsoSeq3 - Scalable De Novo Isoform Discovery
Extract UMI and cell barcodes
0
1
design
0
0
0
Iso-Seq - Scalable De Novo Isoform Discovery
Generate a consensus sequence from a BAM file using iVar
0
1
fasta
save_mpileup
0
0
0
0
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Trim primer sequences from a BAM file with iVar
0
1
2
bed
0
0
0
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Call variants from a BAM file using iVar
0
1
fasta
fai
gff
save_mpileup
0
0
0
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Efficiently counts k-mers from DNA sequencing reads using a fast, memory-efficient, parallelized algorithm
0
1
kmer_length
size
0
0
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence
Dumps the results from a jellyfish binary file into a human readable format
0
1
0
0
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence
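The k-mer definition above (a substring of length k, counted over all positions) can be illustrated with a minimal sketch. This is a naive in-memory illustration only, not Jellyfish's algorithm or its API; `count_kmers` is a hypothetical helper name introduced here.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`.

    Hypothetical helper for illustration; it does not reproduce
    Jellyfish's memory-efficient, parallelized counting.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

A sequence shorter than k yields an empty counter, since no window of length k fits.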
Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.
0
1
parameters
input_files
0
0
0
Jupyter notebooks as plain text scripts or markdown documents
Parameterize, execute, and analyze notebooks
Parameterize, execute, and analyze notebooks
Extract a BED file from HTS files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)
0
1
0
0
Java utilities for Bioinformatics.
Convert sam files to tsv files
0
1
2
3
0
1
2
3
0
0
Java utilities for Bioinformatics.
Convert VCF to a user friendly table
0
1
2
3
0
1
0
0
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
annotate VCF files for poly repeats
0
1
2
3
0
1
0
1
0
1
0
0
0
0
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Plot whole genome coverage from BAM/CRAM file as SVG
0
1
2
0
1
0
1
0
1
0
0
Java utilities for Bioinformatics.
Taxonomic classification of metagenomic sequence data using a protein reference database
0
1
db
0
0
Fast and sensitive taxonomic classification for metagenomics
Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.
0
1
db
0
0
Fast and sensitive taxonomic classification for metagenomics
write your description here
0
1
db
taxon_rank
0
0
Fast and sensitive taxonomic classification for metagenomics
Merge two tab-separated output files of Kaiju and Kraken in the column format
0
1
2
db
0
0
Fast and sensitive taxonomic classification for metagenomics
Make Kaiju FMI-index file from a protein FASTA file
0
1
keep_intermediate
0
0
0
0
Fast and sensitive taxonomic classification for metagenomics
Aligns sequences using kalign
0
1
compress
0
0
Kalign is a fast and accurate multiple sequence alignment algorithm.
Create kallisto index
0
1
0
0
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
0
1
0
1
gtf
chromosomes
fragment_length
fragment_length_sd
0
0
0
0
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
quantifies scRNA-seq data from fastq files using kb-python.
0
1
index
t2g
t1c
t2c
technology
workflow_mode
0
0
0
kallisto and bustools are wrapped in an easy-to-use program called kb
index creation for kb count quantification of single-cell data.
fasta
gtf
workflow_mode
0
0
0
0
0
0
0
kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.
Module that calls normalize-by-median.py from khmer. The module can take a mix of paired-end (interleaved) and single-end reads. If both types are provided, only a single file of single-end reads is possible.
0
1
2
0
0
khmer k-mer counting library
Removes low abundance k-mers from FASTA/FASTQ files
0
1
0
0
khmer k-mer counting library
In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
0
1
kmer_size
0
0
0
khmer k-mer counting library
Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).
0
1
0
0
Generate k-mers (sketches) from FASTA/Q sequences
0
1
0
0
0
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Construct KMCP database from k-mer files
0
1
0
0
0
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Merge search results from multiple databases.
0
1
0
0
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Generate taxonomic profile from search results
0
1
db
0
0
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Search sequences against database
0
1
db
0
0
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Adds fasta files to a Kraken2 taxonomic database
0
1
taxonomy_names
taxonomy_nodes
accession2taxid
seqid2taxid
0
0
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Builds Kraken2 database
0
1
cleaning
0
0
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Downloads and builds Kraken2 standard database
cleaning
0
0
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Classifies metagenomic sequence data
0
1
db
save_output_fastqs
save_reads_assignment
0
0
0
0
0
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Takes multiple kraken-style reports and combines them into a single report file
0
1
0
0
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Extract reads classified at any user-specified taxonomy IDs.
taxid
0
1
0
1
0
1
0
0
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Takes a Kraken report file and prints out a krona-compatible TEXT file
0
1
0
0
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Download and build (custom) KrakenUniq databases
0
1
2
3
keep_intermediate
0
0
Metagenomics classifier with unique k-mer counting for more specific results
Download KrakenUniq databases and related files
pattern
0
0
Metagenomics classifier with unique k-mer counting for more specific results
Classifies metagenomic sequence data using unique k-mer counts
0
1
2
sequence_type
db
save_output_reads
report_file
save_output
0
0
0
0
0
Metagenomics classifier with unique k-mer counting for more specific results
KronaTools Update Taxonomy downloads a taxonomy database
NO input
0
0
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.
0
1
taxonomy
0
0
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Creates a Krona chart from text files listing quantities and lineages.
0
1
0
0
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Update Taxonomy downloads a taxonomy database
NO input
0
0
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Aligns query sequences to target sequences indexed with lastdb
0
1
2
index
0
0
0
LAST finds & aligns related regions of sequences.
Prepare sequences for subsequent alignment with lastal.
0
1
0
0
LAST finds & aligns related regions of sequences.
Converts MAF alignments into another format.
0
1
2
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
LAST finds & aligns related regions of sequences.
Reorder alignments in a MAF file
0
1
0
0
LAST finds & aligns related regions of sequences.
Post-alignment masking
0
1
0
0
LAST finds & aligns related regions of sequences.
Find suitable score parameters for sequence alignment
0
1
index
0
0
0
LAST finds & aligns related regions of sequences.
Align sequences using learnMSA
0
1
0
0
learnMSA: Learning and Aligning large Protein Families
Bayesian reconstruction of ancient DNA fragments
0
1
0
0
0
0
0
0
0
0
0
Typing of clinical and environmental isolates of Legionella pneumophila
0
1
0
0
Index chain files for lift over
0
1
chain
0
0
Fast and accurate coordinate conversion between assemblies
Converting aligned short and long reads records from one reference to another
0
1
0
1
0
0
Fast and accurate coordinate conversion between assemblies
runs a differential expression analysis with Limma
0
1
2
3
4
5
0
1
2
0
0
0
0
0
0
0
Linear Models for Microarray Data
LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes, etc.). It is also used to scaffold contig pairs linked by ARCS/ARKS. This module is for LINKS >=2.0.0 and does not support MPET input.
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
Serogrouping Listeria monocytogenes assemblies
0
1
0
0
Lofreq subcommand to insert base and indel alignment qualities
0
1
fasta
0
0
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments
0
1
2
fasta
0
0
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
It predicts variants using multiple processors
0
1
2
3
0
1
0
1
0
0
0
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors
Lofreq subcommand to remove variants with low coverage or strand bias potential
0
1
0
0
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Inserts indel qualities in a BAM file
0
1
0
1
0
0
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
2
3
4
5
0
1
0
1
0
0
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0
1
0
1
0
0
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0
1
2
3
4
5
0
1
0
1
0
0
0
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0
1
2
3
4
5
0
1
0
1
0
0
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
A genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps:
- Tigmint cuts the draft assembly at potentially misassembled regions
- ntLink is then used to scaffold the corrected assembly
- followed by ARKS for further scaffolding (optional)
0
1
0
1
command
span
genomesize
longmap
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder
0
1
0
0
0
A Perl wrapper for LTR_FINDER
An efficient program for finding full-length LTR retrotransposons in genome sequences
Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain
0
1
0
0
0
A Perl wrapper for LTR_harvest
The GenomeTools genome analysis system
Identifies LTR retrotransposons using LTR_retriever
meta
genome
harvest
finder
mgescan
non_tgca
meta
log
pass_list
pass_list_gff
ltrlib
annotation_out
annotation_gff
versions
Sensitive and accurate identification of LTR retrotransposons
Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments
0
1
pass_list
annotation_out
monoploid_seqs
0
0
0
Assessing genome assembly quality using the LTR Assembly Index (LAI)
Identifies LTR retrotransposons using LTR_retriever
0
1
harvest
finder
mgescan
non_tgca
0
0
0
0
0
0
0
Sensitive and accurate identification of LTR retrotransposons
A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputting all the predicted antimicrobial peptides found.
0
1
0
0
0
0
0
0
A pipeline for AMP (antimicrobial peptide) prediction
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
0
1
2
macs2_gsize
0
0
0
0
0
0
Model Based Analysis for ChIP-Seq data
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
0
1
2
macs3_gsize
0
0
0
0
0
0
Model Based Analysis for ChIP-Seq data
Multiple sequence alignment using MAFFT
0
1
0
1
0
1
0
1
0
1
0
1
0
fas
versions
Parallel implementation of the gzip algorithm.
Multiple sequence alignment using MAFFT
0
1
0
1
0
1
0
1
0
1
0
1
compress
0
0
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
Parallel implementation of the gzip algorithm.
Guide tree rendering using MAFFT
0
1
0
0
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
mageck count for functional genomics, reads are usually mapped to a specific sgRNA
0
1
library
0
0
0
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
maximum-likelihood analysis of gene essentialities computation
0
1
design_matrix
0
0
0
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Mageck test performs a robust ranking aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.
0
1
0
0
0
0
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Multiple Sequence Alignment using Graph Clustering
0
1
0
1
compress
0
0
Multiple Sequence Alignment using Graph Clustering
Multiple Sequence Alignment using Graph Clustering
0
1
0
0
Multiple Sequence Alignment using Graph Clustering
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
fastas
gff
mapping_db
map_type
0
0
0
A tool for mapping metagenomic data
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
0
1
index
0
0
0
0
A tool for mapping metagenomic data
Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening
0
1
taxon_list
ncbi_dir
0
0
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0
1
0
1
0
0
0
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
config
0
0
0
0
0
0
0
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
5
6
0
1
0
1
config
0
0
0
0
0
0
0
0
0
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
config
0
0
0
0
0
0
0
Structural variant and indel caller for mapped sequencing data
Create mapAD index for reference genome
0
1
0
0
An aDNA aware short-read mapper
Map short reads to an indexed reference genome
0
1
0
1
mismatch_parameter
double_stranded_library
five_prime_overhang
three_prime_overhang
deam_rate_double_stranded
deam_rate_single_stranded
indel_rate
0
0
An aDNA aware short-read mapper
Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
0
1
fasta
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Screens query sequences against large sequence databases
0
1
0
1
0
0
Fast sequence distance estimator that uses MinHash
Creates vastly reduced representations of sequences using MinHash
0
1
0
0
0
Fast sequence distance estimator that uses MinHash
MaxBin is software capable of clustering metagenomic contigs
0
1
2
3
0
0
0
0
0
0
0
0
0
0
Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free data. Paths to FASTA and raw files need to be marked by "PLACEHOLDER"
0
1
2
raw
0
0
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.
Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.
0
1
0
1
0
1
0
0
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
0
1
0
0
0
Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.
0
1
0
0
Staging modules for MCMICRO
Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.
0
1
0
0
Staging modules for MCMICRO
mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences
0
1
0
0
Analyses a DAA file and exports information in text format
0
1
megan_summary
0
0
0
A tool for studying the taxonomic content of a set of DNA reads
Analyses an RMA file and exports information in text format
0
1
megan_summary
0
0
0
A tool for studying the taxonomic content of a set of DNA reads
Performs taxonomic profiling of long metagenomic reads against the melon database
0
1
database
k2_db
0
0
0
0
Serotyping of Neisseria meningitidis assemblies
0
1
0
0
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
0
1
0
1
lookup_table
seqmers
peak
0
0
0
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
k-mer based assembly evaluation.
meta
meryl_db
assembly
meta
versions
assembly_only_kmers_bed
assembly_only_kmers_wig
stats
dist_hist
spectra_cn_fl_png
spectra_cn_ln_png
spectra_cn_st_png
spectra_cn_hist
spectra_asm_fl_png
spectra_asm_ln_png
spectra_asm_st_png
spectra_asm_hist
assembly_qv
scaffold_qv
read_ploidy
A script to generate hap-mer dbs for trios
0
1
maternal_meryl
paternal_meryl
0
0
0
0
0
0
Evaluate genome assemblies with k-mers and more.
k-mer based assembly evaluation.
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Evaluate genome assemblies with k-mers and more.
Produces maternal and paternal FastK kmer tables from maternal, paternal and child FastK tables
0
1
0
1
0
1
0
0
0
FastK based version of Merqury
A reimplementation of Kat Comp to work with FastK databases
0
1
2
3
4
0
0
0
0
FastK based version of Merqury
A reimplementation of KatGC to work with FastK databases
0
1
2
0
0
0
0
FastK based version of Merqury
FastK based version of Merqury
0
1
2
3
4
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
FastK based version of Merqury
An improved version of Smudgeplot using FastK
0
1
2
0
0
0
0
FastK based version of Merqury
A genomic k-mer counter (and sequence utility) with nice features.
0
1
kvalue
0
0
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
kvalue
0
0
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
0
1
kvalue
0
0
A genomic k-mer counter (and sequence utility) with nice features.
Depth computation per contig step of metabat2
0
1
2
0
0
Metagenome binning of contigs
0
1
2
0
0
0
0
0
0
Taxonomic profiling database building with MetaCache
0
1
taxonomy
seq2taxid
0
0
Metacache query command for taxonomic classification
0
1
db
do_abundances
0
0
0
Annotation of eukaryotic metagenomes using MetaEuk
0
1
database
0
0
0
0
0
Strain-level metagenomic assignment
0
1
2
3
4
database_folder
0
0
0
0
0
0
0
0
Maps long reads to a metamaps database
0
1
database
0
0
0
0
0
Metagenome assembler for long-read sequences (HiFi and ONT).
0
1
input_type
0
0
0
MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
Build MetaPhlAn database for taxonomic profiling.
NO input
0
0
Merges output abundance tables from MetaPhlAn4
0
1
0
0
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
0
1
metaphlan_db_latest
0
0
0
0
Merges output abundance tables from MetaPhlAn3
0
1
0
0
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
0
1
metaphlan_db
0
0
0
0
Export METASPACE datasets to AnnData and SpatialData objects
ds_id
0
0
0
A module to download dataset results from the METASPACE platform and save them as CSV files, using a containerized Python script. Inputs are provided via a CSV file or a list of datasets, with results saved to a specified output directory.
0
1
2
0
0
0
Extracts per-base methylation metrics from alignments
0
1
2
fasta
fai
0
0
0
Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.
Generates methylation bias plots from alignments
0
1
2
fasta
fai
0
0
Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.
Demultiplex MGI fastq files
0
1
2
0
0
0
0
0
0
0
0
0
0
Demultiplex MGI fastq files
A tool to estimate bacterial species abundance
0
1
0
1
mode
0
0
An integrated pipeline for estimating strain-level genomic variation from metagenomic data
Marks duplicate spots along gridline edges.
0
1
0
0
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
0
1
0
0
Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.
Minia is a short-read assembler based on a de Bruijn graph
0
1
0
0
0
0
Compression of a reference panel for genotype imputation to .msav
format
0
1
2
0
0
Computationally efficient genotype imputation
Imputation of genotypes using a reference panel
0
1
2
3
4
5
6
0
0
Computationally efficient genotype imputation
Provides fasta index required by minimap2 alignment.
0
1
0
0
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by miniprot alignment.
0
1
0
0
A versatile pairwise aligner for genomic and protein sequences.
miRanda is an algorithm for finding genomic targets for microRNAs
0
1
mirbase
0
0
miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.
0
1
0
1
0
0
miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
0
1
2
0
1
0
1
2
3
0
0
miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs.
It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.
mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.
0
1
0
1
0
1
2
0
0
Small RNA-seq annotation
mirtop export generates files in formats such as FASTA and VCF, or files compatible with the isomiRs Bioconductor package
0
1
0
1
0
1
2
0
0
0
0
Small RNA-seq annotation
mirtop gff generates the GFF3 adapter format to capture miRNA variations
0
1
0
1
0
1
2
0
0
Small RNA-seq annotation
mirtop gff gets the number of isomiRs and miRNAs annotated in the GFF file by isomiR category.
0
1
0
0
0
Small RNA-seq annotation
A tool for quality control and tracing taxonomic origins of microRNA sequencing data
0
1
2
mirtrace_species
0
0
0
0
0
0
miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.
Download a mitochondrial genome to be used as reference for MitoHiFi
0
1
0
0
0
Fetch mitochondrial genome in Fasta and Genbank format from NCBI
A Python workflow that assembles mitogenomes from PacBio HiFi reads
0
1
ref_fa
ref_gb
input_mode
mito_code
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
A Python workflow that assembles mitogenomes from PacBio HiFi reads
Cluster sequences using MMSeqs2 cluster.
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create an MMseqs database from an existing FASTA/Q file
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Creates sequence index for mmseqs database
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create a tsv file from a query and a target database as well as the result database
0
1
0
1
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Download an mmseqs-formatted database
database
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences using MMSeqs2 easy cluster.
0
1
0
0
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a fasta file in a database using MMseqs2
0
1
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences in linear time using MMSeqs2 linclust.
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Search and calculate a score for similar sequences in a query and a target database.
0
1
0
1
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Computes the lowest common ancestor by identifying the query sequence homologs against the target database.
0
1
db_target
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Conversion of expandable profile databases to the MMseqs2 database format
database
0
0
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Subclonal deconvolution of cancer genome sequencing data.
0
1
0
0
0
0
0
0
0
A tool to reconstruct plasmids in bacterial assemblies
0
1
0
0
0
0
0
Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.
A bioinformatics tool for working with modified bases
0
1
2
0
1
2
0
1
0
0
0
0
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data
Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.
0
1
0
0
One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.
Download the mOTUs database
motus_downloaddb_script
0
0
The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
Taxonomic meta-omics profiling using universal marker genes
0
1
db
profile_version_yml
0
0
0
Marker gene-based OTU (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
0
1
db
0
0
Marker gene-based operational taxonomic unit (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
0
1
db
0
0
0
0
0
Marker gene-based OTU (mOTU) profiling
Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data
0
1
2
3
4
5
6
0
0
0
0
0
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Scan a reference genome to get microsatellite & homopolymer information
0
1
0
0
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
msisensor2 detection of MSI regions.
0
1
2
3
4
5
scan
models
0
0
0
0
0
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE), and other sample types. The original MSIsensor is specifically designed for tumor/normal paired sequencing data.
msisensor2 detection of MSI regions.
fasta
output
0
0
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE), and other sample types. The original MSIsensor is specifically designed for tumor/normal paired sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
0
1
2
3
4
5
0
1
msisensor_scan
0
0
0
0
0
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro/pro is a tool used to evaluate MSI using single (tumor) sample sequencing data
0
1
2
0
1
0
1
0
1
0
0
0
0
0
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
0
1
0
0
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Aligns protein structures using mTM-align
0
1
compress
0
0
0
Algorithm for structural multiple sequence alignments
Parallel implementation of the gzip algorithm.
A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.
0
1
mt_id
0
0
0
Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.
0
1
index
gtf
rad
0
0
0
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.
gtf
0
0
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Aggregate results from bioinformatics analyses across many samples into a single report
multiqc_files
multiqc_config
extra_multiqc_config
multiqc_logo
replace_names
sample_names
0
0
0
0
MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
Identify singlets, doublets and negative cells from multiplexing experiments. Annotate singlets by tags.
0
1
2
0
0
0
0
SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA
0
1
0
1
0
1
0
1
allele_freqs
genotype_quality
coverage
homozygous_freq
heterozygous_freq
0
1
0
0
0
0
0
0
0
0
0
0
0
0
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two
0
1
0
0
0
0
0
0
0
0
0
Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permutate the guide tree according to a set of flags.
0
1
compress
0
0
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
Parallel implementation of the gzip algorithm.
AMR predictions for supported species
0
1
species
0
0
0
Antibiotic resistance prediction in minutes
NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.
0
1
0
1
0
0
0
R package that uses two main functions to summarize and visualize NanoString RCC files, namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").
NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.
0
1
0
1
0
0
0
0
R package that uses two main functions to summarize and visualize NanoString RCC files, namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").
nail search is a fast and scalable tool for searching protein sequences against protein databases
0
1
0
1
write_align
0
0
0
0
Profile Hidden Markov Model (pHMM) biological sequence alignment tool
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
0
1
2
0
0
0
0
0
0
0
0
0
nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
0
1
0
0
0
0
0
Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
0
1
output_format
0
0
0
Create DRAGEN hashtable for reference genome
0
1
0
0
narfmap is a fork of the Dragen mapper/aligner Open Source Software.
A tool to quickly download assemblies from NCBI's Assembly database
meta
accessions
taxids
groups
0
0
0
0
0
0
0
0
0
0
0
0
0
0
NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose
0
1
0
1
0
0
NCBI libraries for biology applications (text-based utilities)
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
dataset
tag
0
0
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
0
1
dataset
0
0
0
0
0
0
0
0
0
0
0
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
Performs fastq alignment to a fasta reference using NextGenMap
0
1
fasta
0
0
NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime
Merging paired-end reads and removing sequencing adapters.
0
1
0
0
0
0
Annotates GC content fraction to regions in a BED file.
0
1
0
1
0
1
0
0
Short-read sequencing tools
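To make the GC annotation step concrete, here is a minimal sketch of how a GC fraction could be computed per BED region and appended as an extra column. It is not the ngs-bits implementation: the function names are hypothetical, the reference is assumed to be an in-memory dict of chromosome name to sequence, and N bases are simply counted in the denominator, which may differ from the real tool.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 for empty input)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def annotate_bed_with_gc(bed_lines, reference):
    """Append a GC-fraction column to each BED line.

    bed_lines -- iterable of tab-separated BED lines (0-based, half-open)
    reference -- dict mapping chromosome name to its sequence (assumption)
    """
    annotated = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        fraction = gc_fraction(reference[chrom][start:end])
        annotated.append("\t".join(fields + [f"{fraction:.4f}"]))
    return annotated


if __name__ == "__main__":
    ref = {"chr1": "ACGTGGCCAT"}
    for row in annotate_bed_with_gc(["chr1\t0\t4", "chr1\t4\t8"], ref):
        print(row)
```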
Annotates a BED file with the average coverage of the regions from one or several BAM/CRAM file(s).
0
1
2
3
0
1
0
1
0
0
Short-read sequencing tools
Determines the gender of a sample from the BAM/CRAM file.
0
1
2
0
1
0
1
method
0
0
Short-read sequencing tools
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.
0
1
0
1
0
0
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.
0
1
0
1
0
1
0
0
0
0
0
0
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
0
1
0
1
0
1
0
0
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
0
1
0
0
0
0
0
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
write your description here
meta
reads
format
mode
meta
versions
npa
npc
npl
npo
Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file
0
1
0
0
Estimate average coverage and create curves for metagenomic datasets
Calculate metagenome redundancy curve from FASTQ files
0
1
format
mode
0
0
0
0
0
Estimate average coverage and create curves for metagenomic datasets
Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files
0
1
0
0
0
0
0
Estimate average coverage and create curves for metagenomic datasets
Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image
0
1
0
0
Estimate average coverage and create curves for metagenomic datasets
NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.
0
1
2
0
0
0
Construct a dynamic succinct variation graph in ODGI format from a GFAv1.
0
1
0
0
An optimized dynamic genome/graph implementation
Draw previously-determined 2D layouts of the graph with diverse annotations.
0
1
2
0
0
An optimized dynamic genome/graph implementation
Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.
0
1
0
0
0
An optimized dynamic genome/graph implementation
Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.
0
1
0
0
An optimized dynamic genome/graph implementation
Squeezes multiple graphs in ODGI format into the same file in ODGI format.
0
1
0
0
An optimized dynamic genome/graph implementation
Metrics describing a variation graph and its path relationship.
0
1
0
0
0
An optimized dynamic genome/graph implementation
Merge unitigs into a single node preserving the node order.
0
1
0
0
An optimized dynamic genome/graph implementation
Project a graph into other formats.
0
1
0
0
An optimized dynamic genome/graph implementation
Visualize a variation graph in 1D.
0
1
0
0
An optimized dynamic genome/graph implementation
Create a decoy peptide database from a standard FASTA database.
0
1
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
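A common way to build a decoy database is to append a reversed copy of every target sequence under a recognisable prefix. The sketch below shows that idea only; it does not call OpenMS, the function name is hypothetical, and the `DECOY_` prefix and the reversal strategy are assumptions chosen for illustration.

```python
def add_reversed_decoys(records, prefix="DECOY_"):
    """Return target records followed by their reversed decoys.

    records -- list of (header, sequence) tuples from a FASTA file
    prefix  -- string prepended to decoy headers (assumed convention)
    """
    combined = []
    for header, sequence in records:
        combined.append((header, sequence))
        # The decoy keeps the amino-acid composition but reverses the order.
        combined.append((prefix + header, sequence[::-1]))
    return combined


if __name__ == "__main__":
    targets = [("sp|P1", "MKTAYIAK"), ("sp|P2", "GLSDGEWQ")]
    for header, sequence in add_reversed_decoys(targets):
        print(f">{header}\n{sequence}")
```

Searching against targets and decoys together lets downstream tools estimate a false discovery rate from the number of decoy hits.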
Filters peptide/protein identification results by different criteria.
0
1
0
0
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Filters peptide/protein identification results by different criteria.
0
1
2
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Calculates a distribution of the mass error from given mass spectra and IDs.
0
1
2
0
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Merges several idXML files into one idXML file.
0
1
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Split a merged identification file into their originating identification files
0
1
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Switches between different scores of peptide or protein hits in identification data
0
1
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)
0
1
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Refreshes the protein references for all peptide hits.
0
1
2
0
0
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Annotates MS/MS spectra using Comet.
0
1
2
0
0
0
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.
0
1
0
1
0
0
0
A python library and a command-line client for up- and downloading files to and from your Open Science Framework projects
0
1
2
0
0
The osfclient is a python library and a command-line client for up- and downloading files to and from your Open Science Framework projects.
A program to convert bam into paf.
0
1
0
0
A program to manipulate paf files / convert to and from paf.
A tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates
0
1
0
0
Find and remove PCR/optical duplicates
0
1
0
0
0
CLI tools to process mapped Hi-C data
Flip pairs to get an upper-triangular matrix
0
1
chromsizes
0
0
CLI tools to process mapped Hi-C data
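Flipping means reordering the two sides of each pair so that the first side never sorts after the second, which places every contact in the upper triangle of the contact matrix. The following sketch illustrates the idea under stated assumptions: chromosome order is taken from the order of names in a chromsizes file, and the function names and the tuple layout are hypothetical rather than pairtools' own API.

```python
def chrom_order_from_chromsizes(lines):
    """Map each chromosome name to its rank in a chromsizes file (name<TAB>size)."""
    return {line.split("\t")[0]: rank for rank, line in enumerate(lines)}


def flip_pair(chrom1, pos1, strand1, chrom2, pos2, strand2, chrom_order):
    """Return the pair with its sides ordered by (chromosome rank, position)."""
    side1 = (chrom_order[chrom1], pos1)
    side2 = (chrom_order[chrom2], pos2)
    if side1 > side2:
        # Swap the two sides, keeping each strand with its own coordinates.
        return (chrom2, pos2, strand2, chrom1, pos1, strand1)
    return (chrom1, pos1, strand1, chrom2, pos2, strand2)


if __name__ == "__main__":
    order = chrom_order_from_chromsizes(["chr1\t1000", "chr2\t800"])
    print(flip_pair("chr2", 100, "+", "chr1", 500, "-", order))
```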
Merge multiple pairs/pairsam files
0
1
0
0
CLI tools to process mapped Hi-C data
Find ligation junctions in .sam, make .pairs
0
1
chromsizes
0
0
0
CLI tools to process mapped Hi-C data
Assign restriction fragments to pairs
0
1
frag
0
0
CLI tools to process mapped Hi-C data
Select pairs according to given condition by options.args
0
1
0
0
0
CLI tools to process mapped Hi-C data
Sort a .pairs/.pairsam file
0
1
0
0
CLI tools to process mapped Hi-C data
Split a .pairsam file into .pairs and .sam.
0
1
0
0
0
CLI tools to process mapped Hi-C data
Calculate pairs statistics
0
1
0
0
CLI tools to process mapped Hi-C data
Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file
0
1
bed_subset
bed_exclude
tsv_groupby
0
0
panacus is a tool for computing counting statistics for GFA files
Create visualizations from a tsv coverage histogram created with panacus.
0
1
0
0
panacus is a tool for computing counting statistics for GFA files
A fast and scalable tool for bacterial pangenome analysis
0
1
0
0
0
panaroo - an updated pipeline for pangenome investigation
Phylogenetic Assignment of Named Global Outbreak LINeages
0
1
db
0
0
Phylogenetic Assignment of Named Global Outbreak LINeages
Phylogenetic Assignment of Named Global Outbreak LINeages
dbname
0
0
Phylogenetic Assignment of Named Global Outbreak LINeages
NVIDIA Clara Parabricks GPU-accelerated apply Base Quality Score Recalibration (BQSR).
0
1
0
1
0
1
0
1
0
1
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database
0
1
2
3
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.
0
1
2
3
0
1
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.
0
1
0
1
0
1
0
1
0
1
output_fmt
0
0
0
0
0
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence
0
1
0
1
0
1
known_sites
0
0
0
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs
0
1
0
1
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.
0
1
2
3
0
1
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.
0
1
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated minimap2 for aligning long read sequences against a large reference database using an accelerated KSW2 to convert FASTQ to BAM/CRAM.
0
1
0
1
0
1
0
1
output_fmt
0
0
0
0
0
0
0
0
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Determines the depth in a BAM/CRAM file
0
1
2
0
1
0
1
0
0
0
Graph realignment tools for structural variants
Genotype structural variants using paragraph and grmpy
0
1
2
3
4
5
0
1
0
1
0
0
0
Graph realignment tools for structural variants
Convert a VCF file to a JSON graph
0
1
0
1
0
0
Graph realignment tools for structural variants
The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.
0
1
0
0
0
PacBio BAM C++ library
Alignment with PacBio's minimap2 frontend
0
1
0
1
0
0
A minimap2 frontend for PacBio native data formats
pbsv - PacBio structural variant (SV) signature discovery tool
0
1
0
1
0
0
pbsv - PacBio structural variant (SV) calling and analysis tools
Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq
0
1
2
0
0
pbtk - PacBio BAM toolkit
Minimalistic tool which creates an index file that enables random access into PacBio BAM files
0
1
0
0
pbtk - PacBio BAM toolkit
This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
0
1
0
0
0
0
Predict prophages in bacterial genomes
0
1
0
0
0
0
0
0
0
0
0
0
0
0
Prophage finder using multiple metrics
phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.
0
1
silva_db
univec_db
0
0
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
0
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates an interval list from a bed file and a reference dict
0
1
0
1
0
0
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the alignment summary of a paired-end library.
0
1
0
1
0
0
Java tools for working with NGS data in the BAM format
Collects hybrid-selection (HS) metrics for a SAM or BAM file.
0
1
2
3
4
0
1
0
1
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the insert size distribution of a paired-end library.
0
1
0
0
0
Java tools for working with NGS data in the BAM format
Collect multiple metrics from a BAM file
0
1
2
0
1
0
1
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics from a RNAseq BAM file
0
1
ref_flat
fasta
rrna_intervals
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
0
1
2
0
1
0
1
intervallist
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates a sequence dictionary for a reference sequence.
0
1
0
0
Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.
Checks that all data in the set of input files appear to come from the same individual
0
1
2
3
4
5
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.
0
1
2
haplotype_map
fasta
fasta_fai
sequence_dictionary
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Converts a FASTQ file to an unaligned BAM or SAM file.
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list
0
1
2
filter
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Verify mate-pair information between mates and fix if needed
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Lifts over a VCF file from one reference build to another.
0
1
0
1
0
1
0
1
0
0
0
Move annotations from one assembly to another
Locate and tag duplicate reads in a BAM file
0
1
0
1
0
1
0
0
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the mean quality by cycle of a paired-end library.
0
1
0
0
0
Java tools for working with NGS data in the BAM format
Merges multiple BAM files into a single file
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads
0
1
2
0
0
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Changes the name of the sample in the VCF file
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases.
0
1
0
1
0
1
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
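The splitting of a reference into alternating N and ACGT regions can be sketched as a run-length pass over the sequence. This is an illustration rather than Picard's implementation: the function name is hypothetical, the `Nmer` and `ACGTmer` labels are used here only as illustrative names, coordinates are assumed to be 1-based and inclusive, and any merging of short N runs is left out.

```python
from itertools import groupby


def intervals_by_n(sequence: str):
    """Split a sequence into alternating runs of N and non-N bases.

    Returns a list of (start, end, label) tuples with 1-based inclusive
    coordinates, where label is "Nmer" or "ACGTmer".
    """
    intervals = []
    start = 1
    for is_n, run in groupby(sequence.upper(), key=lambda base: base == "N"):
        length = sum(1 for _ in run)
        end = start + length - 1
        intervals.append((start, end, "Nmer" if is_n else "ACGTmer"))
        start = end + 1
    return intervals


if __name__ == "__main__":
    for interval in intervals_by_n("ACGTNNNNAC"):
        print(interval)
```

Keeping only the ACGT intervals gives a set of regions that variant callers can process in parallel without spending time on assembly gaps.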
This tool takes in a coordinate-sorted SAM or BAM and calculates the NM, MD, and UQ tags by comparing with the reference.
0
1
0
1
0
0
0
Sorts BAM/SAM files based on a variety of Picard-specific criteria
0
1
sort_order
0
0
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Sorts VCF files
0
1
0
1
0
1
0
0
Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format
Compresses files with pigz.
0
1
0
0
Parallel implementation of the gzip algorithm.
write your description here
0
1
0
0
Parallel implementation of the gzip algorithm.
Automatically improve draft assemblies and find variation among strains, including large event detection
0
1
0
1
2
pilon_mode
0
0
0
0
0
0
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
0
1
2
fasta
fai
bed
0
0
0
0
0
0
0
0
0
0
0
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Main caller script for peak calling
0
1
2
assay_type
0
0
0
0
0
Peak Identifier for Nascent Transcript Starts (PINTS)
Identify plasmids in bacterial sequences and assemblies
0
1
0
0
0
0
0
0
Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Exclude variant identifiers from plink bfiles
0
1
2
3
4
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink bfiles with a text file of variant identifiers
0
1
2
3
4
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0
1
2
3
0
1
0
1
0
1
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Calculates identity-by-descent over autosomal SNPs
0
1
2
3
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate GWAS association studies
0
1
2
3
0
1
0
1
0
1
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate Hardy-Weinberg statistics for provided input
0
1
2
3
0
1
0
1
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
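As a rough illustration of what Hardy-Weinberg statistics measure, the sketch below compares observed genotype counts for one biallelic variant with the counts expected under Hardy-Weinberg equilibrium. It uses a plain chi-square statistic for brevity; this is an assumption made for the example and not necessarily the test PLINK applies. The function name and the example counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (expected genotype counts, chi-square statistic) for one biallelic variant."""
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    # Expected counts under Hardy-Weinberg equilibrium: p^2, 2pq, q^2.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


# Hypothetical counts: a variant in equilibrium and one with no heterozygotes.
balanced_expected, balanced_chi2 = hwe_chi_square(25, 50, 25)
skewed_expected, skewed_chi2 = hwe_chi_square(50, 0, 50)
```

A statistic near zero means the observed counts match the equilibrium expectation; a large value, as in the second call, signals a departure such as a heterozygote deficit.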
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.
0
1
2
3
window_size
variant_count
variance_inflation_factor
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.
0
1
2
3
window_size
variant_count
r2_threshold
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
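The greedy pruning described above can be sketched as follows. This is a minimal illustration, not PLINK's implementation: it assumes genotypes are given as per-variant dosage lists, that the window advances by a fixed number of variants, and that the later variant of each offending pair is dropped (PLINK's own choice of which variant to drop may differ). The names `r_squared`, `prune`, `window_size`, `step` and `r2_threshold` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic variant carries no correlation signal
    return cov * cov / (var_x * var_y)


def prune(genotypes, window_size, step, r2_threshold):
    """Return indices of variants kept after greedy pairwise pruning."""
    removed = set()
    n = len(genotypes)
    start = 0
    while start < n:
        window = [i for i in range(start, min(start + window_size, n))
                  if i not in removed]
        for pos, i in enumerate(window):
            if i in removed:
                continue
            for j in window[pos + 1:]:
                if j in removed:
                    continue
                # Drop the later variant of any pair above the threshold.
                if r_squared(genotypes[i], genotypes[j]) > r2_threshold:
                    removed.add(j)
        start += step
    return [i for i in range(n) if i not in removed]


# Hypothetical dosages: variants 0 and 1 are identical, variant 2 is weakly correlated.
genotypes = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 1]]
kept = prune(genotypes, window_size=3, step=1, r2_threshold=0.5)
```

In this toy input the second variant is removed because it is perfectly correlated with the first, while the third stays because its squared correlation with the first is below the threshold.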
LD analysis in PLINK examines genetic variant associations within populations
0
1
2
3
0
1
0
1
0
1
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Recodes plink bfiles into a new text fileset applying different modifiers
0
1
2
3
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink pfiles with a text file of variant identifiers
0
1
2
3
4
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Filters plink bfiles or pfiles with filters such as maf or var
0
1
2
3
0
0
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Calculate inbreeding data with plink2
0
1
2
3
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Produce a pruned set of variants in approximate linkage equilibrium
0
1
2
3
win
step
r2
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Perform PCA analysis using PLINK
0
1
2
3
4
5
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Remove samples from a plink2 dataset
0
1
2
3
sample_exclude_list
0
0
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Apply a scoring system to each sample in a plink 2 fileset
0
1
2
3
scorefile
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Import variant genetic data using plink2
0
1
0
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Convert from VCF file to BGEN file version 1.2 format preserving dosages.
0
1
2
3
4
0
0
0
0
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
pmdtools command to filter ancient DNA molecules from others
0
1
2
threshold
reference
0
0
Compute postmortem damage patterns and decontaminate ancient genomes
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
0
0
0
Polishing genome assemblies with short reads.
0
1
0
1
save_debug
0
0
0
Polishing genome assemblies with short reads.
PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.
0
1
0
1
0
1
2
0
0
0
0
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.
0
1
2
3
0
0
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to pile up reads and the corresponding base quality for each overlapping SNP and each barcode.
0
1
2
0
0
0
0
0
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.
0
1
2
0
0
0
0
0
0
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
0
1
custom_adapters
0
0
0
Adapter removal and demultiplexing of Oxford Nanopore reads
0
1
0
0
0
Adapter removal and demultiplexing of Oxford Nanopore reads
Run all Portcullis steps in one go
0
1
0
1
0
1
0
0
0
0
0
0
0
0
Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.
Software for predicting library complexity and genome coverage in high-throughput sequencing
0
1
0
0
0
Software for predicting library complexity and genome coverage in high-throughput sequencing
Software for predicting library complexity and genome coverage in high-throughput sequencing
0
1
0
0
0
Software for predicting library complexity and genome coverage in high-throughput sequencing
Calculate pairwise nucleotide identity with respect to a reference sequence
0
1
0
1
compress
0
0
0
0
0
Filter reads by quality score.
0
1
0
0
0
0
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
A module to generate images from Pretext contact maps.
0
1
0
0
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
0
1
0
0
0
0
0
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
0
1
output_format
0
0
0
0
0
Whole genome annotation of small genomes (bacterial, archaeal, viral)
0
1
proteins
prodigal_tf
0
0
0
0
0
0
0
0
0
0
0
0
0
Frame-shift correction for long-read (meta)genomics: fix frameshifts in reads
0
1
0
1
0
0
Frame-shift correction for long-read (meta)genomics
Frame-shift correction for long-read (meta)genomics: map proteins to reads
0
1
2
0
0
Frame-shift correction for long-read (meta)genomics
Perform Gene Ratio Enrichment Analysis
0
1
0
1
0
0
0
Gene Ratio Enrichment Analysis
Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)
0
1
0
0
0
Logratio methods for omics data
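For readers unfamiliar with the two transformations, here is a minimal sketch of the standard definitions: CLR subtracts the mean log from each log value, and ALR takes each log value relative to a chosen reference component. It assumes strictly positive input values (zeros would need replacement first) and is not the module's own code; the function names and the example vector are hypothetical.

```python
import math


def clr(values):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def alr(values, ref_index=-1):
    """Additive logratio: log of each part relative to a reference part."""
    ref_pos = ref_index % len(values)
    ref_log = math.log(values[ref_pos])
    # The reference component itself is dropped from the output.
    return [math.log(v) - ref_log for i, v in enumerate(values) if i != ref_pos]


counts = [10.0, 20.0, 70.0]  # hypothetical strictly positive abundances
clr_values = clr(counts)
alr_values = alr(counts)
```

CLR keeps one value per component and the values sum to zero, whereas ALR returns one value fewer because the reference component is used as the denominator.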
Perform differential proportionality analysis
0
1
2
3
0
1
2
0
0
0
0
0
0
0
0
0
Logratio methods for omics data
Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.
0
1
0
0
0
0
0
0
0
Logratio methods for omics data
Efficient Estimation of Covariance and (Partial) Correlation
Proteinortho is a tool to detect orthologous genes within different species.
0
1
0
0
0
0
Reads a MaxQuant proteinGroups file with Proteus
0
1
2
0
0
0
0
0
0
0
0
0
0
R package for analysing proteomics data
Calculate interval coverage for each sample. N.B. the tool cannot handle files staged as symlinks; stageInMode should be set to 'link'.
0
1
2
intervals
0
0
0
0
0
Copy number calling and SNV classification using targeted short read sequencing
Generate on and off-target intervals for PureCN from a list of targets
0
1
0
1
genome
0
0
0
Copy number calling and SNV classification using targeted short read sequencing
Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.
0
1
2
3
genome
assay
0
0
0
0
0
0
Copy number calling and SNV classification using targeted short read sequencing
Run PureCN workflow to normalize, segment and determine purity and ploidy
0
1
2
normal_db
genome
0
0
0
0
0
0
0
0
0
0
0
0
Copy number calling and SNV classification using targeted short read sequencing
Calculate coverage cutoffs to determine when to purge duplicated sequence.
0
1
0
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Separates out sequences purged of falsely duplicated sequences.
0
1
2
0
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Plots the read coverage from a purge dups statistics file and cutoffs.
0
1
2
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Create read depth histogram and base-level read depth for an assembly based on PacBio data
0
1
0
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Purge haplotigs and overlaps for an assembly
0
1
2
3
0
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Split fasta file by 'N's to aid in self alignment for duplicate purging
0
1
0
0
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Damage parameter estimation for ancient DNA
0
1
2
0
0
Damage parameter estimation for ancient DNA
Damage parameter estimation for ancient DNA
0
1
0
0
Damage parameter estimation for ancient DNA
Compute summary statistics for control gene from BAM files.
0
1
2
control_gene
0
0
A Python package for pharmacogenomics research
Call SNVs/indels from BAM files for all target genes.
0
1
2
0
1
0
0
0
A Python package for pharmacogenomics research
Prepare a depth of coverage file for all target genes with SV from BAM files.
0
1
2
0
0
A Python package for pharmacogenomics research
PyPGx pharmacogenomics genotyping pipeline for NGS data.
0
1
2
3
4
5
0
1
0
0
0
0
A Python package for pharmacogenomics research
Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.
0
1
output_format
0
0
0
0
0
Evaluate alignment data
0
1
gff
0
0
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
0
1
2
gff
fasta
fasta_fai
0
0
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
0
1
0
1
0
0
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0
1
0
0
0
0
0
Read aware low coverage whole genome sequence imputation from a reference panel
Homology-based assembly patching: Make continuous joins and fill gaps in 'target.fa' using sequences from 'query.fa'
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
Fast reference-guided genome assembly scaffolding
Scaffolding is the process of ordering and orienting draft assembly (query) sequences into longer sequences. Gaps (stretches of "N" characters) are placed between adjacent query sequences to indicate the presence of unknown sequence. RagTag uses whole-genome alignments to a reference assembly to scaffold query sequences. RagTag does not alter input query sequence in any way and only orders and orients sequences, joining them with gaps.
0
1
0
1
0
1
0
1
2
0
0
0
0
Fast reference-guided genome assembly scaffolding
Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.
alignment
0
0
0
Randomly subsample sequencing reads to a specified coverage
0
1
2
depth_cutoff
0
0
De novo genome assembler for long uncorrected reads.
0
1
0
0
0
RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion.
0
1
2
0
0
0
Extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format.
0
1
2
0
0
RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.
Screening DNA sequences for interspersed repeats and low complexity DNA sequences
0
1
lib
0
0
0
0
0
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
A utility script that helps convert old RepeatMasker *.out files to version 3 GFF files.
0
1
0
0
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
Create a database for RepeatModeler
0
1
0
0
RepeatModeler is a de-novo repeat family identification and modeling package.
Performs de novo transposable element (TE) family identification with RepeatModeler
0
1
0
0
0
0
RepeatModeler is a de-novo repeat family identification and modeling package.
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
0
1
2
db_point
db_res
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data
card
0
0
0
0
This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.
Predict antibiotic resistance from protein or nucleotide data
0
1
card
wildcard
0
0
0
0
0
0
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Markup VCF file using rho-calls.
0
1
2
0
1
bed
0
0
Call regions of homozygosity and make tentative UPD calls.
Call regions of homozygosity and make tentative UPD calls
0
1
0
1
0
0
0
Call regions of homozygosity and make tentative UPD calls.
Quality control of riboseq bam data
0
1
2
0
1
2
0
1
2
0
1
0
1
0
1
0
0
0
0
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Quality control of riboseq bam data
0
1
2
0
1
0
0
0
0
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Accurate detection of short and long active ORFs using Ribo-seq data
0
1
2
0
1
0
0
0
0
0
0
0
0
0
0
0
Python package to detect translating ORF from Ribo-seq data
Accurate detection of short and long active ORFs using Ribo-seq data
0
1
2
0
0
Python package to detect translating ORF from Ribo-seq data
Render an rmarkdown notebook. Supports parametrization.
0
1
parameters
input_files
0
0
0
0
0
Dynamic Documents for R
Assess the quality of an RNAseq assembly with or without a reference genome
0
1
0
1
0
1
0
0
Calculate pan-genome from annotated bacterial assemblies in GFF3 format
0
1
0
0
0
Calculate expression with RSEM
0
1
index
0
0
0
0
0
0
0
0
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Prepare a reference genome for RSEM
fasta
gtf
0
0
0
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Generate statistics from a bam file
0
1
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Infer strandedness from sequencing reads
0
1
bed
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate inner distance between read pairs.
0
1
bed
0
0
0
0
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to reference gene model
0
1
bed
0
0
0
0
0
0
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to reference gene model
0
1
bed
0
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
0
1
bed
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate read duplication rate
0
1
0
0
0
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
0
1
2
bed
0
0
0
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
The bndeval tool of RTG tools. It is used to evaluate called BND type of variants for agreement with a BND baseline variant set
0
1
2
3
4
5
0
0
0
0
0
0
0
0
0
0
0
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Converts a PED file to VCF headers
0
1
0
0
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI cannot be used from Nextflow.
0
1
0
0
0
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The svdecompose tool of RTG tools. It is used to decompose structural variants to BNDs
0
1
2
0
0
0
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set
0
1
2
3
4
5
6
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Uses the RTN R package for transcriptional regulatory network inference (TNI).
0
1
0
0
0
0
0
RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons
CAZyme annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
0
1
dbcan_db
0
0
0
0
0
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Command from run_dbcan to prepare the database for dbCAN annotation.
No input
0
0
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
CGC annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
0
1
0
1
2
dbcan_db
0
0
0
0
0
0
0
0
0
0
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Substrate annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
0
1
0
1
2
dbcan_db
0
0
0
0
0
0
0
0
0
0
0
0
0
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Prediction of a protein's secondary structure from its amino acid sequence
0
1
0
0
Accurate prediction of a protein's secondary structure from its amino acid sequence
Sage is search software for proteomics data
0
1
0
1
0
1
0
0
0
0
0
0
Proteomics searching so fast it feels like magic.
Create index for salmon
genome_fasta
transcript_fasta
0
0
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
Gene/transcript quantification with Salmon
0
1
index
gtf
transcript_fasta
alignment_mode
lib_type
0
0
0
0
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
SALSA, a tool to scaffold long-read assemblies with Hi-C
0
1
2
bed
gfa
dup
filter_bed
0
0
0
0
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
0
1
2
database
0
0
0
0
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
Outputs some statistics drawn from read flags.
0
1
0
0
Tools for working with SAM/BAM data
Find and mark duplicate reads in a BAM file
0
1
0
0
0
process your BAM data faster!
This module combines samtools and samblaster so that samblaster can filter or tag SAM records while both input and output stay in BAM format. Samblaster input must contain a sequence header, so it is piped from the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.
0
1
0
0
Module to validate Illumina® Sample Sheet v2 files.
0
1
file_schema_validator
0
0
Clips read alignments where they match BED file defined regions
0
1
bed
save_cliprejects
save_clipstats
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
The module uses the bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0
1
split
0
0
Tools for dealing with SAM, BAM and CRAM files
Outputs a FASTA file compressed with the BGZF algorithm
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Calculates MD and NM tags
0
1
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Concatenate BAM or CRAM file
0
1
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Shuffles and groups reads together by their names
0
1
0
1
0
0
0
0
Tools for dealing with SAM, BAM and CRAM files
The module uses the collate and then the fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0
1
0
1
interleave
0
0
0
0
0
Tools for dealing with SAM, BAM and CRAM files
Produces a consensus FASTA/FASTQ/PILEUP
0
1
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a CRAM file to BAM (or a BAM file to CRAM) and then indexes the result
0
1
2
0
1
0
1
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces a histogram or table of coverage per chromosome
0
1
2
0
1
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
List CRAM Content-ID and Data-Series sizes
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Computes the depth at each position or region.
0
1
0
1
0
0
Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region
Create a sequence dictionary file from a FASTA file
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index FASTA file, and optionally generate a file of chromosome sizes
0
1
0
1
get_sizes
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a SAM/BAM/CRAM file to FASTA
0
1
interleave
0
0
0
0
0
Tools for dealing with SAM, BAM and CRAM files
Converts a SAM/BAM/CRAM file to FASTQ
0
1
interleave
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Samtools fixmate is a tool that can fill in information (insert size, CIGAR, MAPQ) about paired-end reads onto the corresponding other read. It also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
0
1
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
0
1
2
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Filter/convert SAM/BAM/CRAM file
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Reports alignment summary statistics for a BAM/CRAM/SAM file
0
1
2
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts FASTQ files to unmapped SAM/BAM/CRAM
0
1
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index SAM/BAM/CRAM file
0
1
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Mark duplicate alignments in a coordinate-sorted file
0
1
0
1
0
0
0
0
Tools for dealing with SAM, BAM and CRAM files
Merge BAM or CRAM file
0
1
0
1
0
1
0
1
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
BAM
0
1
2
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file
0
1
0
1
0
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Sort SAM/BAM/CRAM file
0
1
0
1
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces comprehensive statistics from SAM/BAM/CRAM file
0
1
2
0
1
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
0
1
2
0
1
qname
index_format
0
0
0
0
0
0
0
0
0
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Filter cells and genes in single-cell RNA-seq data using Scanpy
0
1
0
0
0
0
0
0
h5ad
versions
Single-Cell Analysis in Python
Detect doublets in single-cell RNA-seq data using Scrublet via Scanpy
0
1
batch_col
0
0
0
Single-Cell Analysis in Python
SCIMAP is a suite of tools that enables spatial single-cell analyses
0
1
0
0
0
Scimap is a scalable toolkit for analyzing spatial molecular data.
SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.
0
1
0
0
0
0
Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.
The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier
0
1
0
1
mei_ref
0
0
0
0
Soft Clipped Read Alignment Mapper
The cluster_identifier tool of Scramble identifies soft clipped clusters
0
1
2
0
1
0
0
Soft Clipped Read Alignment Mapper
Module to use scAR to remove ambient RNA from single-cell RNA-seq data
0
1
2
0
0
scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data
scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.
Detect doublets in single-cell RNA-Seq data
0
1
0
0
0
A scalable toolkit for probabilistic modeling applied to single-cell omics data
Call peaks using SEACR on sequenced reads in bedgraph format
0
1
2
threshold
0
0
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
0
1
fasta
index
0
0
0
0
0
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
Generate genome indices for segemehl align
fasta
0
0
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
metagenomic binning with self-supervised learning
0
1
2
0
0
0
0
0
Metagenomic binning with semi-supervised siamese neural network
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced by the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
0
1
2
3
4
5
0
1
0
1
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Create BWA index for reference genome
0
1
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
0
1
0
1
0
1
0
1
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the Picard CollectVariantCallingMetrics tool.
0
1
2
0
1
2
0
1
0
1
0
1
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the GATK DepthOfCoverage tool.
0
1
2
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects multiple quality metrics from a bam file
0
1
2
0
1
0
1
plot_results
0
0
0
0
0
0
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
0
1
2
0
1
0
1
0
0
0
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
0
1
2
0
1
0
1
0
1
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
pcr_indel_model
emit_vcf
emit_gvcf
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
0
1
2
3
0
1
0
1
0
1
0
1
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
0
1
2
3
4
0
1
0
1
0
1
0
1
emit_vcf
emit_gvcf
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Generate recalibration table and optionally perform base quality recalibration
0
1
2
0
1
0
1
0
1
0
1
0
1
generate_recalibrated_bams
0
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.
0
1
2
0
1
0
1
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Filters the raw output of sentieon/tnhaplotyper2.
0
1
2
3
4
5
6
0
1
0
1
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
1
0
1
emit_orientation_data
emit_contamination_data
0
0
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm performs the first step of Variant Quality Score Recalibration (VQSR): it builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
0
1
2
resource_vcf
resource_tbi
labels
fasta
fai
0
0
0
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects whole genome quality metrics from a bam file
0
1
2
0
1
0
1
0
1
0
0
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.
0
1
0
0
Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the sample of the project.
Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.
0
1
0
0
DNA sequence utilities for FASTX files
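The dereplication described above (collapse identical sequences, record how many copies were seen, and sum counts when the input was already dereplicated) can be sketched in a few lines of Python. This is an illustrative sketch only, not the module's implementation: the `;size=N` header annotation, the function names, and the case-insensitive comparison are assumptions made for the example.

```python
import re

# Hypothetical count annotation at the end of a header, e.g. ">read1;size=3".
# The real tool may use a different header convention.
SIZE_TAG = re.compile(r";size=(\d+)$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(text):
    """Collapse identical sequences and sum the counts found in headers.

    Sequences are compared case-insensitively (an assumption) and are
    returned in upper case, in order of first appearance.
    """
    counts = {}  # sequence -> [name of first record seen, total count]
    for header, seq in parse_fasta(text):
        match = SIZE_TAG.search(header)
        n = int(match.group(1)) if match else 1
        name = SIZE_TAG.sub("", header)
        key = seq.upper()
        if key in counts:
            counts[key][1] += n
        else:
            counts[key] = [name, n]
    return [(f"{name};size={n}", seq) for seq, (name, n) in counts.items()]
```

Because counts already present in headers are added rather than reset, running `dereplicate` on its own output leaves the totals unchanged.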
Statistics for FASTA or FASTQ files
0
1
0
0
0
Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files
Concatenating multiple uncompressed sequence files together
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Convert FASTQ to FASTA format
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Select sequences from a large file based on name/ID
0
1
pattern
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
match up paired-end reads from two fastq files
0
1
0
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Use seqkit to find/replace strings within sequences and sequence headers
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
0
0
0
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
0
1
0
0
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Use seqkit to generate sliding windows of input fasta
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Sorts sequences by id/name/sequence/length
0
1
0
0
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Split single or paired-end fastq.gz files
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
simple statistics of FASTA/Q files
0
1
0
0
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Translate DNA/RNA to protein sequence
0
1
0
0
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Salmonella serotype prediction from reads and assemblies
0
1
0
0
0
0
Generates a BED file containing genomic locations of lengths of N.
0
1
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
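As a rough illustration of what a BED file of N stretches contains, the sketch below scans a sequence for runs of `N` and reports them as 0-based, half-open intervals, which is the BED coordinate convention. It is not seqtk's code; the function name and the `min_len` parameter are hypothetical.

```python
import re


def n_runs_to_bed(name, sequence, min_len=1):
    """Return BED-style (chrom, start, end) tuples for runs of N.

    Coordinates are 0-based and half-open, as in BED.
    `min_len` (hypothetical parameter) drops runs shorter than that length.
    """
    rows = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_len:
            rows.append((name, match.start(), match.end()))
    return rows


def format_bed(rows):
    """Render the tuples as tab-separated BED lines."""
    return "\n".join(f"{chrom}\t{start}\t{end}" for chrom, start, end in rows)
```

For example, `n_runs_to_bed("chr1", "ACNNNGTNAC")` reports one interval covering positions 2 to 5 and one covering 7 to 8.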
Interleave paired-end reads from FastQ files
0
1
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
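Interleaving means writing each R1 record immediately followed by its R2 mate. A minimal Python sketch of the idea, assuming uncompressed four-line FASTQ records already held in memory as lists of lines (the function name is hypothetical and this is not seqtk's implementation):

```python
def interleave_fastq(r1_lines, r2_lines):
    """Interleave two lists of FASTQ lines, four lines per record.

    Assumes both inputs hold the same number of complete records in the
    same order; mate names are not checked.
    """
    if len(r1_lines) != len(r2_lines):
        raise ValueError("R1 and R2 contain a different number of lines")
    if len(r1_lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")

    interleaved = []
    for i in range(0, len(r1_lines), 4):
        interleaved.extend(r1_lines[i:i + 4])  # forward read
        interleaved.extend(r2_lines[i:i + 4])  # its mate
    return interleaved
```

A real implementation would stream the files rather than load them, and would typically verify that read names match between mates.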
Rename sequence names in FASTQ or FASTA files.
0
1
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.
Subsample reads from FASTQ files
0
1
2
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Common transformation operations on FASTA or FASTQ files.
0
1
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.
Select only sequences that match the filtering condition
0
1
filter_list
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Trim low quality bases from FastQ files
0
1
0
0
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Sequence quality metrics for FASTQ and uBAM files.
0
1
0
0
0
PileupCaller is a tool to create genotype calls from bam files using read-sampling methods
0
1
snpfile
sample_names_fn
0
0
0
0
Tools for population genetics on sequencing data
Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file
0
1
2
fasta
wigfile
0
0
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.
Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.
0
1
0
0
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format.
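To make the windowed GC computation concrete, here is a small Python sketch that computes the GC percentage in consecutive fixed-size windows and renders the values as a UCSC fixedStep wiggle block. It is an assumption-laden illustration, not the gc_wiggle code: the function names are hypothetical, ambiguous bases such as N simply count as non-GC, and the exact output of the real tool may differ.

```python
def gc_percent_windows(sequence, window):
    """Return the GC percentage of each consecutive window of `window` bases.

    The last window may be shorter than `window`. Any base other than
    G or C (including N) counts as non-GC in this simplified sketch.
    """
    values = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / len(chunk))
    return values


def to_fixed_step_wiggle(chrom, values, window):
    """Render window values as a fixedStep wiggle block (1-based start)."""
    lines = [f"fixedStep chrom={chrom} start=1 step={window} span={window}"]
    lines.extend(f"{value:.1f}" for value in values)
    return "\n".join(lines)
```

With a window of 4, the sequence `GGCCAATT` yields one window at 100% GC followed by one at 0%.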
Induce a variation graph in GFA format from alignments in PAF format
0
1
2
0
0
seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
0
0
0
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.
0
1
sample_list_file
0
0
0
Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.
0
1
2
0
0
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.
0
1
2
3
4
0
1
2
0
1
2
0
1
0
0
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires a CPU with AVX2 support.
0
1
2
3
4
0
1
2
3
0
1
0
0
Fast and accurate method for estimation of haplotypes (phasing)
Program to compute switch error rate and genotyping error rate given simulated or trio data.
0
1
2
3
4
0
1
2
0
1
2
0
0
Fast and accurate method for estimation of haplotypes (phasing)
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
0
1
0
0
0
0
Determine Shigella serotype from Illumina or Oxford Nanopore reads
0
1
0
0
0
Determine Shigella serotype from assemblies or Illumina paired-end reads
0
1
0
0
build and deploy Shiny apps for interactively mining differential abundance data
0
1
2
3
0
1
2
contrast_stats_assay
0
0
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make plots for interpretation of differential abundance statistics
0
1
0
1
2
3
0
0
0
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots
0
1
2
3
4
0
0
0
0
0
0
0
0
0
0
0
0
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
validate consistency of feature and sample annotations with matrices and contrasts
0
1
2
0
1
0
1
0
0
0
0
0
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
A windowed adaptive trimming tool for FASTQ files using quality
0
1
2
0
0
0
0
0
Indexing of transcriptome for gene expression quantification using SimpleAF
0
1
2
0
1
0
1
0
1
0
0
0
0
SimpleAF is a tool for quantification of gene expression from RNA-seq data
simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
0
1
2
0
1
2
0
1
2
3
resolution
0
1
0
0
0
SimpleAF is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
Calculate pairwise distances and basic clustering from SKA sketches
0
1
2
0
0
0
0
0
SKA (Split Kmer Analysis)
Create genome sketch using split k-mers
0
1
2
0
0
SKA (Split Kmer Analysis)
Simple ANI calculation between reference and query genomes.
0
1
0
1
0
0
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
Memory-efficient ANI database queries with skani.
0
1
0
1
0
0
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
Storing skani sketches/indices on disk.
0
1
0
0
0
0
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
All-to-all ANI computation.
0
1
0
0
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.
0
1
2
0
0
0
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
0
1
2
3
0
1
0
1
0
0
structural variant calling and genotyping with existing tools, but, smoothly
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on the input snakefile. Expect many limitations.
0
1
0
1
0
0
0
Create a SNAP index for reference genome
0
1
2
3
4
0
0
Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
structural-variant calling with sniffles
0
1
2
0
1
0
1
vcf_output
snf_output
0
0
0
0
Core-SNP alignment from Snippy outputs
0
1
2
reference
0
0
0
0
0
0
Rapid bacterial SNP calling and core genome alignments
Rapid haploid variant calling
0
1
reference
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Rapid bacterial SNP calling and core genome alignments
Genetic variant annotation and functional effect prediction toolbox
0
1
0
0
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Genetic variant annotation and functional effect prediction toolbox
0
1
db
0
1
0
0
0
0
0
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Annotate a VCF file with another VCF file
0
1
2
0
1
2
0
0
SnpSift is a toolbox that allows you to filter and manipulate annotated files
The dbNSFP is an integrated database of functional predictions from multiple algorithms
0
1
2
0
1
2
0
0
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Splits/Joins VCF(s) file into chromosomes
0
1
0
0
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
0
1
2
0
0
0
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
2
0
1
0
1
0
1
0
0
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
0
1
2
sample_groups
0
0
0
0
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Local sequence alignment tool for filtering, mapping and clustering.
0
1
0
1
0
1
0
0
0
0
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Classifies and predicts the origin of metagenomic samples
0
1
sources
labels
taxa_sqlite
taxa_sqlite_traverse_pkl
0
0
Compare many FracMinHash signatures generated by sourmash sketch.
0
1
file_list
save_numpy_matrix
save_csv
0
0
0
0
Compute and compare FracMinHash signatures for DNA and protein data sets.
Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.
0
1
database
save_unassigned
save_matches_sig
save_prefetch
save_prefetch_csv
0
0
0
0
0
0
Compute and compare FracMinHash signatures for DNA data sets.
Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.
0
1
ksize
0
0
Compute and compare FracMinHash signatures for DNA data sets.
Create a signature (a group of FracMinHash sketches) of a sequence using sourmash
0
1
0
0
Compute and compare FracMinHash signatures for DNA and protein data sets.
Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.
0
1
taxonomy
0
0
Compute and compare FracMinHash signatures for DNA data sets.
Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data
0
1
2
3
4
5
6
7
8
9
reference
probeset
0
0
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.
gtf
0
0
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.
fasta
gtf
reference_name
0
0
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Assembles a small genome (bacterial, fungal, viral)
0
1
2
3
yml
hmm
0
0
0
0
0
0
0
0
mutational signature deconvolution of cancer cells
0
1
0
0
0
0
0
0
0
SparseSignatures is an R-based computational framework which performs de novo extraction, inference, interpretation, or deconvolution of mutational counts of a large number of patients.
Reference Genome Sequence (hs37d5), based on NCBI GRCh37
Full genomic sequences for Homo sapiens (UCSC genome hg38)
Spotiflow, accurate and efficient spot detection with stereographic flow.
0
1
0
0
Fast, efficient, lossless compression of FASTQ files.
0
1
2
0
0
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Fast, efficient, lossless decompression of FASTQ files.
0
1
write_one_fastq_gz
0
0
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
0
1
ncbi_settings
certificate
0
0
SRA Toolkit and SDK from NCBI
Download sequencing data from the NCBI Sequence Read Archive (SRA).
0
1
ncbi_settings
certificate
0
0
SRA Toolkit and SDK from NCBI
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
versions
ncbi_settings
SRA Toolkit and SDK from NCBI
Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.
0
1
2
db_type
0
0
0
0
0
0
Short Read Sequence Typing for Bacterial Pathogens
Advanced sequence file format conversions
0
1
fasta
fai
gzi
0
0
0
Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.
Predicts Staphylococcus aureus SCCmec type based on primers.
0
1
0
0
Align reads to a reference genome using STAR
0
1
0
1
0
1
star_ignore_sjdbgtf
seq_platform
seq_center
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create index for STAR
0
1
0
1
0
0
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Get the minimal allowed index version from STAR
NO input
0
0
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
0
1
0
0
0
0
0
0
0
0
0
Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.
Framework that scores enhancer–gene interactions using the Activity-By-Contact model and derives transcription factor affinities on gene level
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
0
Download STAR-fusion genome resource required to run STAR-Fusion caller
0
1
0
1
fusion_annot_lib
dfam_species
0
0
Fusion calling algorithm for RNAseq data
Serotype STEC samples from paired-end reads or assemblies
0
1
0
0
STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
0
1
2
3
4
5
6
7
8
9
10
0
1
2
seed
0
0
0
0
0
0
Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.
0
1
0
1
0
0
0
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
0
1
2
3
4
fasta
fai
0
0
0
0
0
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
0
1
2
3
4
5
6
7
8
fasta
fai
0
0
0
0
0
Strelka calls somatic and germline small variants from mapped sequencing reads
Merges the annotation gtf file and the stringtie output gtf files
stringtie_gtf
annotation_gtf
0
0
Transcript assembly and quantification for RNA-Seq
Transcript assembly and quantification for RNA-Seq
0
1
annotation_gtf
0
0
0
0
0
Transcript assembly and quantification for RNA-Seq
Count reads that map to genomic features
0
1
2
0
0
0
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.
SummarizedExperiment container
0
1
0
1
0
1
0
0
0
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Converts a bedpe file to a VCF file (beta version)
0
1
0
0
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
0
1
2
minsv
maxsv
minallelefreq
minnumreads
0
0
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi-sample VCF file.
0
1
max_distance_breakpoints
min_supporting_callers
account_for_type
account_for_sv_strands
estimate_distanced_by_sv_size
min_sv_size
0
0
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
0
1
0
1
0
1
snp_mutation_frequency
sim_reads
0
0
0
0
0
0
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
0
1
minsv
maxsv
minnumreads
0
0
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
0
1
2
3
4
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
0
1
2
3
4
5
0
1
0
1
0
0
0
0
0
0
SVanalyzer: tools for the analysis of structural variation in genomes
Build a structural variant database
0
1
input_type
0
0
structural variant database software
The merge module merges structural variants within one or more vcf files.
0
1
input_priority
sort_inputs
0
0
0
0
structural variant database software
Query a structural variant database, using a vcf file as query
0
1
in_occs
in_frqs
out_occs
out_frqs
vcf_dbs
bedpe_dbs
0
0
structural variant database software
Performs tests on BAF files
0
1
2
3
4
0
0
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
0
1
0
0
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
0
1
2
fasta_fai
0
0
0
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
0
1
0
1
0
0
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
0
1
2
0
0
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert a VCF file to a BEDPE file.
0
1
0
0
Tools for processing and analyzing structural variants
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
0
1
2
3
0
1
0
1
0
0
0
0
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
0
1
2
3
0
1
0
0
0
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
0
1
2
3
0
0
0
Sylph profile command for taxonomic profiling
0
1
database
0
0
Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.
Sketching/indexing sequencing reads
0
1
reference
0
0
Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.
Merge multiple taxonomic profiles from sylphtax/taxprof into a TSV table
0
1
data_type
0
0
Integrating taxonomic information into the sylph metagenome profiler.
Incorporates taxonomy into sylph metagenomic classifier
0
1
taxonomy
0
0
Integrating taxonomic information into the sylph metagenome profiler.
Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.
0
1
0
1
0
1
file_type
0
0
0
Compresses/decompresses files
0
1
0
0
0
Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.
Create a tabix index from a sorted, bgzip-compressed, tab-delimited genome file
0
1
0
0
0
Generic indexer for TAB-delimited genome position files.
Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA
0
1
0
0
Convert taxonids to taxon lineages
0
1
2
taxdb
0
0
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Convert taxon names to TaxIds
0
1
2
taxdb
0
0
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Standardise and merge two or more taxonomic profiles into a single table
0
1
profiler
format
taxonomy
samplesheet
0
0
TAXonomic Profile Aggregation and STAndardisation
Standardise the output of a wide range of taxonomic profilers
0
1
profiler
format
taxonomy
0
0
TAXonomic Profile Aggregation and STAndardisation
A tool to detect resistance and lineages of M. tuberculosis genomes
0
1
0
0
0
0
0
0
Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data
Aligns sequences using T_COFFEE
0
1
0
1
0
1
2
compress
0
0
0
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Compares 2 alternative MSAs to evaluate them.
0
1
2
0
0
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Computes a consensus alignment using T_COFFEE
0
1
0
1
compress
0
0
0
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats the header of PDB files with t-coffee
0
1
0
0
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Computes the irmsd score for a given alignment and the structures.
0
1
0
1
2
0
0
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package
0
1
0
1
0
1
2
compress
0
0
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats files with t-coffee
0
1
0
0
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.
0
1
0
1
0
0
0
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Parses a Thermo RAW file containing mass spectra to an open file format
0
1
0
0
Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle
0
1
0
0
0
0
Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.
Computes the coverage of different regions from the bam file.
0
1
0
1
0
0
0
TIDDIT - structural variant calling.
Identify chromosomal rearrangements.
0
1
2
0
1
0
1
0
0
0
Search for structural variants.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
0
1
0
0
0
tidk is a toolkit to identify and visualise telomeric repeats in genomes
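The canonical form mentioned for tidk explore can be illustrated with a minimal Python sketch. It assumes that "canonical" means the lexicographically smallest string among all rotations of the repeat unit and of its reverse complement. That rule is an assumption chosen because it reproduces the TTAGG -> AACCT example; it is not taken from the tidk source code, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def rotations(seq: str) -> list[str]:
    """Return every cyclic rotation of seq."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit: str) -> str:
    """Assumed rule: smallest rotation over both strands, in lexicographic order."""
    unit = unit.upper()
    candidates = rotations(unit) + rotations(reverse_complement(unit))
    return min(candidates)


print(canonical_repeat("TTAGG"))  # AACCT
```

Under this rule every rotation of a repeat on either strand maps to the same string, which is what makes a canonical form useful for counting.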
Searches a genome for a telomere string such as TTAGGG
0
1
string
0
0
0
tidk is a toolkit to identify and visualise telomeric repeats in genomes
Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA
0
1
0
1
0
1
0
1
vcf_output
0
0
0
0
0
This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.
A post sequencing QC tool for Oxford Nanopore sequencers
0
1
0
0
0
0
0
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.
0
1
0
0
0
0
0
0
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.
0
1
fold
0
0
0
0
0
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
Tandem repeat genotyping from PacBio HiFi data
0
1
2
3
0
1
0
1
0
1
0
0
0
Tandem repeat genotyping and visualization from PacBio HiFi data
Merge TRGT VCFs from multiple samples
0
1
2
0
1
0
1
0
0
Tandem repeat genotyping and visualization from PacBio HiFi data
Trim FastQ files using Trim Galore!
0
1
0
0
0
0
0
0
Performs quality and adapter trimming on paired end and single end reads
0
1
0
0
0
0
0
0
Assembles a de novo transcriptome from RNAseq reads
0
1
0
0
0
Detection of tRNA sequences using covariance models
0
1
0
0
0
0
0
0
0
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
0
1
2
3
4
5
0
1
0
1
0
0
0
0
0
0
0
0
0
0
Structural variant comparison tool for VCFs
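As a reminder of what the recall, precision and f-measure in the benchmark description mean, here is a small Python sketch using the standard definitions. It is not Truvari's implementation: the function name and the three count arguments are hypothetical, and the real tool first has to decide which baseline and comparison calls match before any counts exist.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Standard precision, recall and F1 from match counts.

    tp: comparison calls that match a baseline call
    fp: comparison calls with no baseline match
    fn: baseline calls that no comparison call matched
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(benchmark_metrics(tp=8, fp=2, fn=2))
```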
Over multiple vcfs, calculate their intersection/consistency.
0
1
0
0
Structural variant comparison tool for VCFs
Normalization of SVs into disjointed genomic regions
0
1
0
0
Structural variant comparison tool for VCFs
Subsample a long-read sequencing fastq file for multiple assemblies
0
1
0
0
Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support
0
1
hints_files
keep_gtfs
config
0
0
0
Import transcript-level abundances and estimated counts for gene-level analysis packages
0
1
0
1
quant_type
0
0
0
0
0
0
0
0
0
Remove lines from bed file that refer to off-chromosome locations.
0
1
sizes
0
0
Remove lines from bed file that refer to off-chromosome locations.
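The clipping rule can be sketched in Python as follows. This is an illustration, not the UCSC tool itself. It assumes that "off-chromosome" means a start below 0 or an end beyond the chromosome length given in the sizes input, and it also drops lines on chromosomes missing from the sizes table, which is an assumption about behaviour the description does not specify. The function name is hypothetical.

```python
def clip_bed_lines(bed_lines: list[str], chrom_sizes: dict[str, int]) -> list[str]:
    """Keep only BED lines whose interval lies inside its chromosome."""
    kept = []
    for line in bed_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        if len(fields) < 3:
            continue  # not a valid BED line
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # assumption: unknown chromosomes are dropped
        if start < 0 or end > size:
            continue  # interval extends past the chromosome
        kept.append(text)
    return kept


sizes = {"chr1": 100}
lines = ["chr1\t0\t50", "chr1\t90\t120", "chr2\t0\t10"]
print(clip_bed_lines(lines, sizes))
```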
Convert a bedGraph file to bigWig format.
0
1
sizes
0
0
Convert a bedGraph file to bigWig format.
Convert file from bed to bigBed format
0
1
sizes
autosql
0
0
Convert file from bed to bigBed format
Compute average score of bigWig over BED file
0
1
bigwig
0
0
Compute average score of big wig over each bed, which may have introns.
Compute average score of bigWig over BED file
0
1
0
0
0
Convert GTF files to GenePred format
Convert between genome builds
0
1
chain
0
0
0
Move annotations from one assembly to another
Convert ascii format wig file to binary big wig format
0
1
sizes
0
0
Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format
Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.
0
1
barcode_file
0
0
0
0
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
mode
0
0
0
0
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
get_output_stats
0
0
0
0
0
0
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place
0
1
0
0
0
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Group reads based on their UMI and mapping coordinates
0
1
2
create_bam
get_group_info
0
0
0
0
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Make the output from umi_tools dedup or group compatible with RSEM
0
1
2
0
0
0
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Module to run UniverSC an open-source pipeline to demultiplex and process single-cell RNA-Seq data
0
1
0
1
technology
0
0
Unzip ZIP archive files
0
1
0
0
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
Simple software to call UPD regions from germline exome/wgs trios.
0
1
0
0
Runs a differential expression analysis with dream() from variancePartition R package
0
1
2
3
4
5
0
1
2
0
0
0
Differential expression for repeated measures
Filtering, downsampling and profiling alignments in BAM/CRAM formats
0
1
0
0
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
0
1
2
scenario
scenario_sample_name
0
0
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To judge candidate indel and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
0
1
0
1
0
1
0
0
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
0
1
2
3
4
0
1
0
1
0
0
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Convert VCF with structural variations to CytoSure format
0
1
0
1
0
1
0
1
blacklist_bed
0
0
If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields
0
1
2
0
0
Command-line tools for manipulating VCF files
Command line tools for parsing and manipulating VCF files.
0
1
2
0
0
Command line tools for parsing and manipulating VCF files.
Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.
0
1
2
0
0
Command-line tools for manipulating VCF files
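To make the AC and NS fields concrete, the Python sketch below derives them from a list of genotype strings. It follows the usual VCF meaning of the two fields (AC: count of each alternate allele in called genotypes; NS: number of samples with data) and is not vcflib's code. The function name is hypothetical, and a sample is assumed to count towards NS when at least one of its alleles is called.

```python
import re


def allele_counts(genotypes: list[str], n_alt: int) -> tuple[list[int], int]:
    """Return (AC per alternate allele, NS) for one record's GT strings."""
    ac = [0] * n_alt
    ns = 0
    for gt in genotypes:
        # Split on unphased (/) or phased (|) separators and drop missing alleles.
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if alleles:
            ns += 1
        for allele in alleles:
            index = int(allele)
            if index > 0:  # 0 is the reference allele
                ac[index - 1] += 1
    return ac, ns


print(allele_counts(["0/1", "1/1", "./.", "0|0"], n_alt=1))  # ([3], 3)
```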
List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.
0
1
2
0
0
Command-line tools for manipulating VCF files
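The uniq-like behaviour described above (drop a record when it has the same position, ref and alt as the record immediately before it) can be sketched in Python. This is an illustration rather than vcflib's implementation. The function name is hypothetical, and the chromosome is treated as part of the position, which is an assumption.

```python
from typing import Iterable, Iterator


def uniq_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping records identical to the previous one."""
    previous_key = None
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        key = (chrom, pos, ref, alt)
        if key != previous_key:
            yield line
        previous_key = key


records = [
    "chr1\t10\t.\tA\tT\t.\t.\t.",
    "chr1\t10\t.\tA\tT\t.\t.\t.",
    "chr1\t10\t.\tA\tG\t.\t.\t.",
]
print(list(uniq_records(records)))
```

Like GNU uniq, only adjacent duplicates are removed, so the input should be sorted first.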
The align command performs pairwise sequence alignments of viral genomes and provides similarity measures like ANI and coverage (alignment fraction)
0
1
0
1
save_alignment
0
0
0
0
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
Vclust cluster performs threshold-based clustering by assigning a genome sequence to a cluster if its similarity (e.g., ANI) to the cluster meets or exceeds a user-defined threshold.
0
1
0
1
metric
tani
gani
ani
0
0
0
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
The prefilter command creates a pre-alignment filter that reduces the number of genome pairs to be aligned by filtering out dissimilar sequences before the alignment step.
0
1
0
0
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links.
If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it skips the samtools sort step.
The cellsorted BAM file should be sorted by cell barcode with:
samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam
See the module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cell-sort the input BAM using SAMTOOLS_SORT:
withName: SAMTOOLS_SORT {
ext.prefix = { "cellsorted_${bam.baseName}" }
ext.args = '-t CB -O BAM'
}
An optional mask must be passed through ext.args with the --mask option.
Because velocyto resolves symbolic links, both BAM files (cellsorted and original) need to be staged in the work dir.
See also the velocyto tutorial.
0
1
2
3
gtf
0
0
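The naming rule for the cellsorted file in the velocyto notes can be sketched in Python. The helper names are hypothetical. The sketch only builds the expected file name and the samtools sort command quoted in the description; it does not run anything.

```python
from pathlib import Path


def cellsorted_name(bam_path: str) -> str:
    """Return the name velocyto looks for: cellsorted_ + original BAM name."""
    path = Path(bam_path)
    return str(path.with_name(f"cellsorted_{path.name}"))


def cellsort_command(bam_path: str) -> list[str]:
    """Build the samtools sort call that sorts by the CB (cell barcode) tag."""
    return [
        "samtools", "sort",
        "-t", "CB",
        "-O", "BAM",
        "-o", cellsorted_name(bam_path),
        bam_path,
    ]


print(" ".join(cellsort_command("input.bam")))
```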
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
refvcf
0
0
0
0
0
0
0
0
verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
1
2
refvcf
references
0
0
0
0
0
0
0
A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
0
1
2
3
0
1
0
1
0
0
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
0
1
pb
gbwt
0
0
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
write your description here
0
1
0
0
0
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Calculate secondary structures of two RNAs with dimerization
0
1
0
0
0
Calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
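The input convention described for RNAcofold (a name line starting with >, two sequences joined by &, and an optional @ line to end the input) can be sketched as a small Python formatter. The helper names and the example sequences are hypothetical. The sketch only builds the text that would be sent to stdin and does not call RNAcofold.

```python
def cofold_record(name: str, seq_a: str, seq_b: str) -> str:
    """Format one dimer record: a name line, then both sequences joined by &."""
    return f">{name}\n{seq_a}&{seq_b}\n"


def cofold_input(pairs: list[tuple[str, str, str]], terminate: bool = True) -> str:
    """Concatenate records; a final @ line tells the program to stop reading."""
    text = "".join(cofold_record(name, a, b) for name, a, b in pairs)
    return text + ("@\n" if terminate else "")


print(cofold_input([("dimer1", "GGGAAACCC", "GGGUUUCCC")]), end="")
```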
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
0
1
0
0
0
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
Calculate locally stable secondary structures of RNAs
fasta
0
0
Calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
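A quick worked example shows why the O(n + L*L) memory and O(n*L*L) time bounds make genome-scale scans practical. The Python sketch below evaluates the two expressions for a given sequence length and span. The function name is hypothetical, and the returned values are unitless growth terms with constant factors ignored, not bytes or seconds.

```python
def lfold_cost_terms(n: int, span: int) -> tuple[int, int]:
    """Return the (memory, time) growth terms n + L*L and n*L*L."""
    memory_term = n + span * span
    time_term = n * span * span
    return memory_term, time_term


# A 1 Mb sequence with a maximal base pair span of 150:
memory_term, time_term = lfold_cost_terms(1_000_000, 150)
print(memory_term)  # 1022500
print(time_term)    # 22500000000
```

For a fixed span the memory term grows only linearly with sequence length, so increasing n tenfold raises it roughly tenfold instead of a hundredfold.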
Use vireo to perform donor deconvolution for multiplexed scRNA-seq data
0
1
2
3
4
0
0
0
0
0
The module prepares the specification JSON file for Vizgen's post-processing tool cell segmentation workflow.
0
1
2
algorithm_json
images_regex
0
0
Vizgen's post-processing tool
The module runs the segmentation algorithm on a specific tile using Vizgen's post-processing tool.
0
1
2
3
algorithm_json
custom_weights
0
0
Vizgen's post-processing tool
Extracting sequences that were unbinned by vRhyme into a FASTA file
0
1
0
1
0
0
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Linking bins output by vRhyme to create one sequence per bin
0
1
0
0
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).
0
1
0
0
0
0
A versatile open source tool for metagenomics (USEARCH alternative)
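The identity rule in the dereplication description (same length, same nucleotides, case insensitive, T and U treated as the same) can be sketched in Python. This is an illustration of the rule only, not VSEARCH's implementation. The function name and the (name, sequence, count) output shape are hypothetical.

```python
def dereplicate(records: list[tuple[str, str]]) -> list[tuple[str, str, int]]:
    """Merge strictly identical sequences and count their abundance.

    records: (name, sequence) pairs. The first occurrence of each
    sequence is kept as the representative.
    """
    clusters: dict[str, list] = {}
    for name, seq in records:
        # Normalise case and treat U as T so RNA and DNA spellings match.
        key = seq.upper().replace("U", "T")
        if key not in clusters:
            clusters[key] = [name, seq, 0]
        clusters[key][2] += 1
    return [(name, seq, count) for name, seq, count in clusters.values()]


reads = [("a", "ACGT"), ("b", "acgu"), ("c", "ACG")]
print(dereplicate(reads))  # [('a', 'ACGT', 2), ('c', 'ACG', 1)]
```

Comparing whole normalised strings enforces the same-length condition automatically, so ACG is not merged with ACGT.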
Performs quality filtering and / or conversion of a FASTQ file to FASTA format.
0
1
0
0
0
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Taxonomic classification using the sintax algorithm.
0
1
db
0
0
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
0
1
sort_arg
0
0
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
0
1
db
idcutoff
outoption
user_columns
0
0
0
0
0
0
0
0
0
0
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Decomposes multiallelic variants into biallelic variants in a VCF file.
0
1
2
0
0
A tool set for short variant discovery in genetic sequence data
Decomposes biallelic block substitutions into their constituent SNPs.
0
1
2
3
0
0
A tool set for short variant discovery in genetic sequence data
Normalizes variants in a VCF file
0
1
2
3
0
1
0
1
0
0
0
A tool set for short variant discovery in genetic sequence data
The VueGen nf-core module is designed to automate report generation from outputs produced by other modules, subworkflows, or pipelines. The module integrates the VueGen Python library and customizes it for compatibility with the Nextflow environment. VueGen automates the creation of reports from bioinformatics outputs, supporting formats like PDF, HTML, DOCX, ODT, PPTX, Reveal.js, Jupyter notebooks, and Streamlit web applications.
input_type
input_path
report_type
0
0
A pangenome-scale aligner
0
1
2
3
4
query_self
fasta_query_list
0
0
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
0
1
2
fasta
fasta_fai
0
0
0
0
Masks out highly repetitive DNA sequences with low complexity in a genome
0
1
0
0
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program to generate frequency counts of repetitive units.
0
1
0
0
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program that takes a counts file and creates a file of genomic coordinates to be masked.
0
1
0
1
0
0
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter
0
1
0
0
A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter.
A tool of the wipertools suite that splits FASTQ files into chunks
0
1
num_splits
0
0
A tool of the wipertools suite that splits FASTQ files into chunks.
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files
0
1
0
0
0
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files.
A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper
0
1
0
0
A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper.
Convert and filter aligned reads to .npz
0
1
2
0
1
0
1
0
0
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase
0
1
0
1
0
0
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Create a new reference using healthy reference samples
0
1
0
0
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Find copy number aberrations
0
1
0
1
0
1
0
0
0
0
0
0
0
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
A large variant benchmarking tool analogous to hap.py for small variants.
0
1
2
3
4
0
0
0
0
The xeniumranger import-segmentation module allows you to specify 2D nuclei and/or cell segmentation results for assigning transcripts to cells and recalculate all Xenium Onboard Analysis (XOA) outputs that depend on segmentation. Segmentation results can be generated by community-developed tools or taken from a prior Xenium segmentation result.
0
1
expansion_distance
coordinate_transform
nuclei
cells
transcript_assignment
viz_polygons
0
0
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.
0
1
gene_panel
0
0
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.
0
1
region_name
cassette_name
0
0
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.
0
1
expansion_distance
dapi_filter
boundary_stain
interior_stain
0
0
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
Compresses files with xz.
0
1
0
0
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Decompresses files with xz.
0
1
0
0
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Align reads to a reference genome using YARA
0
1
0
1
0
0
0
Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
Compress file lists to produce ZIP archive files
0
1
0
0
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.