Available Modules
Modules are the basic building blocks of nf-core DSL2 pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Assembly Based ReAligner for next-generation sequencing data
0120101010101
0 0 0 
Screen assemblies for antimicrobial resistance against multiple databases
01databasedir
0 0 
Mass screening of contigs for antibiotic resistance genes
Screen assemblies for antimicrobial resistance against multiple databases
01
0 0 
Mass screening of contigs for antibiotic resistance genes
A NATA accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes
01
0 0 0 0 0 0 
A pipeline for running AMRfinderPlus and collating results into functional classes
Trim sequencing adapters and collapse overlapping reads
01adapterlist
0 0 0 0 0 0 0 0 
Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.
01
0 0 
ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).
0123K
0 0 0 
Read CEL files into an ExpressionSet and generate a matrix
01201
0 0 0 0 
Methods for Affymetrix Oligonucleotide Arrays
Takes a bed12 file and converts to a GFF3 file
01
0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Takes a GFF3 file and converts to a bed12 file
01
0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
Converts a GFF/GTF file into a proper GTF file
01
0 0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Converts a GFF/GTF file into a TSV file
01
0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file
01
0 0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Adds intron features to a GTF/GFF file that lacks them.
01config
0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass the test.
01config
0 0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
The script removes features based on a kill list. By default it looks at each feature's ID: if a feature has an ID (case insensitive) listed in the kill list, it is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.
01kill_listconfig
0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
The script flags the short introns with the attribute 
01config
0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.
01config
0 0 
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
Provides different types of statistics in text format from a GFF/GTF annotation file
01
0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Provides basic statistics in text format from a GFF/GTF annotation file
01
0 0 
AGAT is a toolkit for manipulation and getting information from GFF/GTF annotation files
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants
01
0 0 0 
A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.
summary_file
0 0 0 0 
A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.
A submodule that merges all output summary tables from ampcombi/parsetables into one summary file.
summaries
0 0 0 
This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.
01faa_inputgbk_inputopt_amp_dbopt_amp_db_diropt_interproscan
0 0 0 0 0 0 0 0 0 0 0 0 
A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.
A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.
01modelmin_lengthmin_probability
0 0 0 
AMPlify is an attentive deep learning model for antimicrobial peptide prediction.
01model_dir
0 0 
Attentive deep learning model for antimicrobial peptide prediction
Post-processing script of the MaltExtract component of the HOPS package
maltextract_resultstaxon_listfilter
0 0 0 0 0 
Identify antimicrobial resistance in gene or protein sequences
01db
0 0 0 0 0 
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Identify antimicrobial resistance in gene or protein sequences
NO input
0 0 
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
A module to create antiberta2 embeddings of antibody (BCR) amino acid sequences using amulety.
01chain
0 0 
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create antiberty embeddings of antibody (BCR) amino acid sequences using amulety.
01chain
0 0 
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create BALM paired embeddings of antibody (BCR) amino acid sequences using amulety.
01chain
0 0 
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to create esm2 embeddings of antibody (BCR) amino acid sequences using amulety.
01chain
0 0 
Python package to create embeddings of BCR and TCR amino acid sequences.
A module to translate BCR and TCR nucleotide sequences into amino acid sequences using amulety and igblast.
01reference_igblast
0 0 
Python package to create embeddings of BCR and TCR amino acid sequences.
A tool for immunoglobulin (IG, BCR) and T cell receptor (TCR) V domain sequences blasting.
A tool to estimate nuclear contamination in males based on heterozygosity in the X chromosome.
0101
0 0 
ANGSD: Analysis of next generation Sequencing Data
Calculates base frequency statistics across reference positions from BAM.
0123
0 0 0 0 0 0 0 
ANGSD: Analysis of next generation Sequencing Data
Calculates genotype likelihoods from BAM files.
010101
0 0 
ANGSD: Analysis of next generation Sequencing Data
Module to subset AnnData object to cells with matching barcodes from the csv file
012
0 0 
Get the size (n_cells or n_genes) of an anndata object stored as a h5ad file
01size_type
0 0 
An annotated data matrix.
Accelerating de novo SINE annotation in plant and animal genomes
01mode
0 0 0 
Annotation and Ranking of Structural Variation
012301010101
0 0 0 0 
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
NO input
0 0 
Annotation and Ranking of Structural Variation
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
0123012
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
01databasesgff
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
0 0 
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
01databasesantismash_dir
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
database_cssdatabase_detectiondatabase_modules
0 0 0 
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.
01
0 0 0 0 0 0 
arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).
01tooldb
0 0 
Download and prepare database for Ariba analysis
01
0 0 
ARIBA: Antibiotic Resistance Identification By Assembly
Query input FASTQs against Ariba formatted databases
0101
0 0 
ARIBA: Antibiotic Resistance Identification By Assembly
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
010101blacklistknown_fusionscytobandsprotein_domains
0 0 0 
Fast and accurate gene fusion detection from RNA-Seq data
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
genome
0 0 0 0 0 
Fast and accurate gene fusion detection from RNA-Seq data
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
0123010101
0 0 
Fast and accurate gene fusion detection from RNA-Seq data
Simulation tool to generate synthetic Illumina next-generation sequencing reads
01sequencing_systemfold_coverageread_length
0 0 0 0 
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
Standalone version of fieldbioinformatics aligntrim. Soft clips amplicon scheme primer sites in BAM/SAM files.
0123sort_bam
0 0 0 0 
ARTIC align_trim: A tool for trimming amplicon sequencing primers from aligned reads.
Aggregates fastq files with demultiplexed reads
01
0 0 
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
01012012
0 0 0 0 0 0 0 0 0 0 0 0 
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
copy number profiles of tumour cells.
01234allele_filesloci_filesbed_filefastagc_filert_file
0 0 0 0 0 0 0 0 0 
Alignment by Simultaneous Harmonization of Layer/Adjacency Registration
01opt_dfpopt_ffp
0 0 
ataqv function of a corresponding ataqv tool
0123organismmito_nametss_fileexcl_regs_fileautosom_ref_file
0 0 0 
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
mkarv function of a corresponding ataqv tool
jsons/*
0 0 
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
Generates a VCF file from a BAM file using various calling methods
01234fastafaiknown_allelesmethod
0 0 
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Estimate the post-mortem damage patterns of DNA
0123fastafai
0 0 0 0 0 
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Gives an estimation of the sequencing bias based on known invariant sites
01234allelesinvariant_sites
0 0 
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Splits single-end read groups by length and merges paired-end reads
01234
0 0 0 
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Generate tables of feature metadata from GTF files
0101
0 0 0 
Scripts for manipulating gene annotation
Use deamination patterns to estimate contamination in single-stranded libraries
010101
0 0 
Estimates present-day DNA contamination in ancient DNA single-stranded libraries.
Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.
0101
0 0 0 
Annotation of bacterial genomes (isolates, MAGs) and plasmids
01dbproteinsprodigal_tfregionshmms
0 0 0 0 0 0 0 0 0 0 0 
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Downloads BAKTA database from Zenodo
NO input
0 0 
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data
012
0 0 
Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files
Removes unused references from the header of sorted BAM/CRAM files.
01
0 0 
This module is used to clip primer sequences from your alignments.
0123
0 0 0 
bam-readcount is a utility that runs on a BAM or CRAM file and generates low-level information about sequencing data at specific nucleotide positions. Its outputs include observed bases, readcounts, summarized mapping and base qualities, strandedness information, mismatch counts, and position within the reads.
012referencebed
0 0 
01
0 0 
A command line tool to compute mapping statistics from a BAM file
Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis
01
0 0 
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
01
0 0 
C++ API & command-line toolkit for working with BAM data
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
01
0 0 
C++ API & command-line toolkit for working with BAM data
Clips overlapping read pairs. When two mates overlap, this tool clips the record whose clipped region would have the lowest average quality.
01
0 0 0 
Programs that perform operations on SAM/BAM files, all built into a single executable, bam.
Render an assembly graph in GFA 1.0 format to PNG and SVG image formats
01
0 0 0 
Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
Demultiplex Element Biosciences bases files
012
0 0 0 0 0 0 0 0 
BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.
01
0 0 
Adapter and quality trimming of sequencing reads
01contaminants
0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Merging overlapping paired reads into a single read.
01interleave
0 0 0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.
01
0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates 30% smaller, faster gzipped FASTQ files and removes duplicates.
01
0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Filter out sequences by sequence header name(s)
01names_to_filteroutput_formatinterleaved_output
0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.
fasta
0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.
01
0 0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
Re-pairs reads that became disordered or had some mates eliminated.
01interleave
0 0 0 0 
Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated.
Compares query sketches to reference sketches hosted on a remote server via the Internet.
01
0 0 
BBMap is a short read aligner, as well as various other bioinformatic tools.
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
012regionstargetssamples
0 0 0 0 
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Concatenate VCF files
012
0 0 0 0 
Concatenate VCF files.
Compresses VCF files
01234
0 0 
Create consensus sequence by applying VCF variants to a reference fasta file.
Converts certain output formats to VCF
01201bed
0 0 0 0 0 0 0 0 0 0 
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
bcftools Haplotype-aware consequence caller
01010101
0 0 0 0 
Haplotype-aware consequence caller
Filters VCF files
012
0 0 0 0 
Apply fixed-threshold filters to VCF files.
Indexes VCF files
01
0 0 0 
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Apply set operations to VCF files
012
0 0 
Computes intersections, unions and complements of VCF files.
Compresses VCF files
01201save_mpileup
0 0 0 0 0 
Generates genotype likelihoods at each genomic position with coverage.
Normalize VCF file
01201
0 0 0 0 
Normalize VCF files.
Compute and fill various INFO tags
012regionstargetssamples
0 0 0 0 
Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
012regionstargets
0 0 0 0 
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available
Split VCF by chunks or regions, creating multiple VCFs.
012sites_per_chunkscatterscatter_fileregionstargets
0 0 0 0 
Split VCF by chunks or regions, creating multiple VCFs.
Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
012target_gtnew_gtregionstargets
0 0 0 0 
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.
Split VCF by sample, creating single- or multi-sample VCFs.
012samplesgroupsregionstargets
0 0 0 0 
Split VCF by sample, creating single- or multi-sample VCFs.
Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.
012regionstargets
0 0 0 0 
Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.
Extracts fields from VCF or BCF files and outputs them in user-defined format.
012regionstargetssamples
0 0 
Extracts fields from VCF or BCF files and outputs them in user-defined format.
Reheader a VCF file
012301
0 0 0 
Modify header of VCF/BCF files, change sample names.
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
01201genetic_mapregions_filesamples_filetargets_file
0 0 
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
Sorts VCF files
01
0 0 0 0 
Sort VCF files by coordinates.
Split a vcf file into files per chromosome
012
0 0 
Sort VCF files by coordinates.
Generates stats from VCF files
0120101010101
0 0 
Parses VCF or BCF and produces text-file stats that are suitable for machine processing and can be plotted using plot-vcfstats.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
012regionstargetssamples
0 0 0 0 
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Beagle v5.5 is a software package for phasing genotypes and for imputing ungenotyped markers.
01refpanelgenmapexclsamplesexclmarkers
0 0 0 
Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
Convert a BED file to a VCF file according to a YAML config
01201
0 0 
Convert BAM/GFF/GTF/GVF/PSL files to bed
01
0 0 
High-performance genomic feature operations.
Convert gtf format to bed format
01
0 0 
The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
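The coordinate shift described above can be sketched in a few lines of Python. This is only an illustration of the 1-based closed to 0-based half-open conversion, not the gtf2bed implementation; the function names and the simplified six-column output layout are assumptions made for the example.

```python
def gtf_to_bed_interval(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, closed [start, end] interval to 0-based, half-open [start-1, end)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based closed interval: [{start}, {end}]")
    return start - 1, end


def gtf_line_to_bed(line: str) -> str:
    """Turn one tab-separated GTF record into a simplified six-column BED record.

    Hypothetical layout for illustration: chrom, start, end, feature type, score, strand.
    """
    fields = line.rstrip("\n").split("\t")
    seqname, _source, feature, start, end, score, strand = fields[:7]
    bed_start, bed_end = gtf_to_bed_interval(int(start), int(end))
    return "\t".join([seqname, str(bed_start), str(bed_end), feature, score, strand])


if __name__ == "__main__":
    record = 'chr1\tsrc\texon\t11869\t12227\t.\t+\t.\tgene_id "g1";'
    print(gtf_line_to_bed(record))
```

Only the start coordinate changes: the end stays the same because the closed 1-based end and the open 0-based end refer to the same boundary.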
Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.
01sizes
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
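As a rough illustration of what "intervals not covered by the input" means, the sketch below computes a complement over 0-based, half-open intervals. It is not the bedtools implementation; the function name `complement` and the in-memory `chrom_sizes` dictionary (standing in for the sizes input) are assumptions for the example.

```python
def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome that no input interval covers.

    intervals:   iterable of (chrom, start, end), 0-based half-open
    chrom_sizes: dict mapping chromosome name to its length
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            gaps.append((chrom, cursor, size))
    return gaps


if __name__ == "__main__":
    features = [("chr1", 10, 20), ("chr1", 15, 30)]
    sizes = {"chr1": 50, "chr2": 5}
    for gap in complement(features, sizes):
        print(*gap, sep="\t")
```

Chromosomes with no features at all are reported as one gap spanning their full length, which is why the chromosome sizes are needed in addition to the feature file.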
Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.
012sizesextensionsort
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Extracts sequences in a FASTA file based on intervals defined in a feature file.
01fasta
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.
01summary_col
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Allows one to screen for overlaps between two sets of genomic features.
01201
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Calculates the Jaccard statistic between two feature files.
01201
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
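For interval sets, the Jaccard statistic is the number of base pairs covered by both sets divided by the number of base pairs covered by either set. The sketch below computes it with a simple sweep over interval boundaries; it is an illustrative approximation rather than the bedtools code, and the function name `jaccard` and the tuple-based input are assumptions.

```python
def jaccard(a, b):
    """Jaccard statistic of two interval sets given as (chrom, start, end), 0-based half-open."""
    events = []
    for idx, intervals in enumerate((a, b)):
        for chrom, start, end in intervals:
            events.append((chrom, start, idx, 1))   # interval opens
            events.append((chrom, end, idx, -1))    # interval closes
    events.sort()

    depth = [0, 0]          # how many open intervals each set has at the sweep position
    intersection = union = 0
    prev_chrom = prev_pos = None
    for chrom, pos, idx, delta in events:
        if chrom == prev_chrom and pos > prev_pos:
            span = pos - prev_pos
            if depth[0] > 0 or depth[1] > 0:
                union += span
            if depth[0] > 0 and depth[1] > 0:
                intersection += span
        depth[idx] += delta
        prev_chrom, prev_pos = chrom, pos
    return intersection / union if union else 0.0


if __name__ == "__main__":
    set_a = [("chr1", 0, 100)]
    set_b = [("chr1", 50, 150)]
    print(jaccard(set_a, set_b))
```

In this example the two sets share 50 bp and together cover 150 bp, so the statistic is one third. A value of 0 means no overlap and 1 means identical coverage.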
Makes adjacent or sliding windows across a genome or BED file.
01
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Masks sequences in a FASTA file based on intervals defined in a feature file.
01fasta
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.
01
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
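The merging behaviour can be illustrated with a short Python sketch, assuming 0-based, half-open intervals held in memory. The function name `merge_intervals` is hypothetical, and the sketch ignores options such as strand or distance thresholds that the real tool offers.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Two features are combined when the next one starts at or before
    the end of the current merged feature on the same chromosome.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    features = [
        ("chr1", 100, 200),
        ("chr1", 150, 250),   # overlaps the first feature
        ("chr1", 250, 300),   # book-ended with the merged feature
        ("chr1", 400, 500),   # separate feature
        ("chr2", 10, 20),
    ]
    for feature in merge_intervals(features):
        print(*feature, sep="\t")
```

Sorting first is what makes the single pass sufficient: each interval only needs to be compared with the most recently merged feature.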
Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.
01chrom_sizes
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Profiles the nucleotide content of intervals in a fasta file.
012
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
bedtools shuffle will randomly permute the genomic locations of a feature file among a genome defined in a genome file
0101exclude_fileinclude_file
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)
01sizes
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Sorts a feature file by chromosome and other criteria.
01genome_file
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Split BED files into several smaller BED files
012
0 0 
A powerful toolset for genome arithmetic
Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.
012
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines multiple BedGraph files into a single file
0101
0 0 
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Generating cell hashing calls from a matrix of count data.
0123
0 0 0 0 
Extract OME xml data from OME-tif
01
0 0 
Suite of tools to handle several imaging protocols.
Locate and tag duplicate reads in a BAM file
01
0 0 0 
biobambam is a set of tools for early stage alignment file processing.
Merge a list of sorted bam files
01
0 0 0 0 
biobambam is a set of tools for early stage alignment file processing.
Parallel sorting and duplicate marking
0101
0 0 0 0 0 
biobambam is a set of tools for early stage alignment file processing.
Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification.
01
0 0 
Use k-mers to rapidly subtype S. enterica genomes
01scheme_metadata
0 0 0 0 
Converts a BIOM table to a different format. Conversion between text tab-delimited, BIOM-v1 (JSON), and BIOM-v2 (HDF5) formats is supported
01
0 0 0 
Biological Observation Matrix (BIOM) format. This package includes basic tools for converting, summarizing, and adding metadata to biom-format files.
Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.
010101
0 0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
010101
0 0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Summarize and/or filter reads based on bisulfite conversion rate
01010101
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.
0101010101
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Indexes a reference genome for use with Biscuit
01
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Merges methylation information for opposite-strand C's in a CpG context
010101
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants
012340101
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Perform basic quality control on a BAM file generated with Biscuit
010101
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.
01
0 0 
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Performs alignment of BS-Seq reads using bismark
010101
0 0 0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Relates methylation calls back to genomic cytosine contexts.
010101
0 0 0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Removes alignments to the same position in the genome from the Bismark mapping output.
01
0 0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.
01
0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
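The two bisulfite-converted versions of a reference are an in-silico C-to-T conversion and a G-to-A conversion of the same sequence. The sketch below shows that idea for a single sequence string; it is a simplified illustration, not Bismark's genome preparation code, and the function name `bisulfite_convert` is an assumption. Indexing of the converted sequences is not covered.

```python
def bisulfite_convert(sequence: str) -> tuple[str, str]:
    """Return the (C->T converted, G->A converted) versions of a DNA sequence.

    The input is upper-cased first so that soft-masked (lower-case) bases
    are converted as well.
    """
    upper = sequence.upper()
    c_to_t = upper.replace("C", "T")
    g_to_a = upper.replace("G", "A")
    return c_to_t, g_to_a


if __name__ == "__main__":
    forward, reverse = bisulfite_convert("ACGTcg")
    print(forward)
    print(reverse)
```

Aligning converted reads against both converted references is what lets a bisulfite aligner place reads regardless of their methylation state, with the original sequences consulted afterwards to make the methylation calls.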
Extracts methylation information for individual cytosines from alignments.
0101
0 0 0 0 0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Collects bismark alignment reports
01234
0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.
bamalign_reportdedup_reportsplitting_reportmbias
0 0 
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Retrieve entries from a BLAST database
01201
0 0 0 
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0101taxidlisttaxidsnegative_tax
0 0 
BLAST finds regions of similarity between biological sequences.
BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
0101out_ext
0 0 0 0 
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
Builds a BLAST database
01
0 0 
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
0101
0 0 
Protein to Translated Nucleotide BLAST.
Downloads a BLAST database from NCBI
01
0 0 
BLAST finds regions of similarity between biological sequences.
Creates a bed file containing the depth of data at intervals of an aligned bam.
012
0 0 
BlobTk contains a set of core functions used by BlobToolKit tools. Implemented in Rust, these functions are intended to be accessible from the command line, as python modules and will include web assembly code for use in javascript.
Creates differing styles of blobplots depending on provided arguments.
01local_pathonline_pathextra_args
0 0 
BlobTk contains a set of core functions used by BlobToolKit tools. Implemented in Rust, these functions are intended to be accessible from the command line, as python modules and will include web assembly code for use in javascript.
Create bowtie index for reference genome
01
0 0 
bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.
01database
0 0 0 
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Extends a Kraken2 database to be compatible with Bracken
01
0 0 0 
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Combine output of metagenomic samples analyzed by bracken.
01
0 0 
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Benchmarking Universal Single Copy Orthologs
meta fasta mode lineage busco_lineages_path config_file
meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_proteins versions 
Benchmarking Universal Single Copy Orthologs
01modelineagebusco_lineages_pathconfig_fileclean_intermediates
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Download database for BUSCO
lineage
0 0 
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
BUSCO plot generation tool
short_summary_txt
0 0 
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Construct species phylogenies using BUSCO proteins
01
0 0 0 
Construct species phylogenies using BUSCO proteins
Create BWA-mem2 index for reference genome
01
0 0 
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create BWA-MEME index for reference genome
01
0 0 
Faster BWA-MEM2 using learned-index
Performs alignment of BS-Seq reads using bwameth
010101
0 0 
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Performs indexing of c2t converted reference genome
01use_mem2
0 0 
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
A module for concatenation of gzipped or uncompressed files
01
0 0 
Just concatenation
Concatenates fastq files
01
0 0 
The cat utility reads files sequentially, writing them to the standard output.
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0101
0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0101010101bin_suffix
0 0 0 0 0 0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0101010101
0 0 0 0 0 0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Downloads the required files for either Nr or GTDB for building into a CAT database
01
0 0 0 0 0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0101010101mode01010101010101
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Summarises results from CAT/BAT/RAT classification steps
0101
0 0 
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Cluster protein sequences using sequence similarity
01
0 0 0 
Clusters and compares protein or nucleotide sequences
Cluster nucleotide sequences using sequence similarity
01
0 0 0 
Clusters and compares protein or nucleotide sequences
Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth
01signaturehigh_thresholdslow_thresholds
0 0 0 
Module to use CellBender to remove ambient RNA from single-cell RNA-seq data
0123
0 0 
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data
01
0 0 0 0 0 0 0 0 0 0 
CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.
01reference
0 0 
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.
012
0 0 0 0 0 0 0 
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.
gtf
0 0 
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.
fasta gtf reference_name
0 0 
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.
fasta gtf seqs reference_name
0 0 
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.
meta010101010101gex_referencegex_frna_probesetgex_targetpanelvdj_referencevdj_primer_indexfb_referencebeam_antigen_panelbeam_control_panelcmo_referencecmo_barcodescmo_barcode_assignmentfrna_sampleinfoocm_barcodesskip_renaming
0 0 0 
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.
01reference
0 0 
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.
0123reference
0 0 0 
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.
01csv
0 0 
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.
gtf
0 0 
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.
fasta gtf motifs reference_config reference_name
0 0 0 
Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.
Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.
01reference
0 0 
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.
bcl csv
0 0 
Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.
fasta gtf motifs reference_config reference_name
0 0 
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.
01234
0 0 0 0 0 0 0 
Efficient genotyping of bi-allelic SNPs on single cells
Build centrifuge database for taxonomic profiling
01conversion_tabletaxonomy_treename_tablesize_table
0 0 
Classifier for metagenomic sequences
Classifies metagenomic sequence data
01dbsave_unalignedsave_aligned
0 0 0 0 0 0 
Centrifuge is a classifier for metagenomic sequences.
Creates Kraken-style reports from centrifuge out files
01db
0 0 
Centrifuge is a classifier for metagenomic sequences.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
01fasta_extdb
0 0 0 0 
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
0123exclude_marker_file
0 0 0 
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM2 database download
db_zenodo_id
0 0 
CheckM2 - Rapid assessment of genome bin quality using machine learning
CheckM2 bin quality prediction
0101
0 0 0 
CheckM2 - Rapid assessment of genome bin quality using machine learning
Construct the database necessary for checkv's quality assessment
NO input
0 0 
Assess the quality of metagenome-assembled viral genomes.
Assess the quality of metagenome-assembled viral genomes.
01db
0 0 0 0 0 0 0 
Assess the quality of metagenome-assembled viral genomes.
Construct the database necessary for checkv's quality assessment
01db
0 0 
Assess the quality of metagenome-assembled viral genomes.
Determine the allelic profiles of a genome using a pre-defined schema
0101
0 0 0 0 0 0 0 0 0 0 
A complete suite for gene-by-gene schema creation and strain identification.
Create a schema to determine the allelic profiles of a genome
01prodigal_tfcds
0 0 0 0 
A complete suite for gene-by-gene schema creation and strain identification.
Filter and trim long read data.
01fasta
0 0 
zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.
Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).
Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.
010101barcodeswhitelistchr_orderpairs_chr_order
0 0 0 0 0 
Fast alignment and preprocessing of chromatin profiles
Indexes a fasta reference genome ready for chromatin profiling.
01
0 0 
Fast alignment and preprocessing of chromatin profiles
Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.
01010101010101
0 0 
Annotate circRNAs detected in the output from CIRCexplorer2 parse
01fastagene_annotation
0 0 
Circular RNA analysis toolkit
CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.
01
0 0 
Circular RNA analysis toolkit
A method to improve mappings on circular genomes, using the BWA mapper.
010101
0 0 0 
Creates a modified reference genome, elongated by a specified number of bases
Realign reads mapped with BWA to elongated reference genome
01010101
0 0 
A method to improve mappings on circular genomes such as Mitochondria.
Predict recombination events in bacterial genomes
012
0 0 0 0 0 0 0 
Align sequences using Clustal Omega
0101hmm_inhmm_batchprofile1profile2compress
0 0 
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Parallel implementation of the gzip algorithm.
Renders a guidetree in clustalo
01
0 0 
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Efficient phylogenetic tree reconstruction for sequences using the CMAPLE algorithm
012
0 0 0 
Calculates polymorphic site rates over protein coding genes
01234
0 0 
Set of utilities on sequences and BAM files
Quality control of copy number data from bulk WGS assays
0123
0 0 0 0 0 0 
Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.
0101
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Derive off-target (“antitarget”) bins from target regions.
01
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
01201010101panel_of_normals
0 0 0 0 0 0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number
012
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.
01
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
012
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Compile a coverage reference from the given files (normal samples).
fasta targets antitargets
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Transform bait intervals into targets more suitable for CNVkit.
0101
0 0 
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.
012010101
0 0 0 
Tool for calling copy number variations.
convert2vcf.pl is a command line tool to convert CNVnator calls to VCF format.
01
0 0 
Tool for calling copy number variations.
Command line tool for calling CNVs in whole genome sequencing data
01bin_sizes
0 0 
Calling CNVs using read depth
Calculates read depth histograms
01bin_sizes
0 0 
Calling CNVs using read depth
Command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.
012fastafai
0 0 
Calling CNVs using read depth
Calculate segmentation for the specified bin size (multiple bin sizes separated by spaces)
01bin_sizes
0 0 
Calling CNVs using read depth
View function to generate VCFs
01bin_sizesoutput_format
0 0 0 0 
Calling CNVs using read depth
Builds a classic bloom filter COBS index
01
0 0 
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Builds a compact bloom filter COBS index
01
0 0 
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Effective binning of metagenomic contigs using COntrastive Multi-viEw representation learning
012
0 0 0 0 0 0 
COMEBin allows effective binning of metagenomic contigs using COntrastive Multi-viEw representation learning
Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples
012
0 0 0 0 0 0 0 
Clustering cONtigs with COverage and ComposiTion
Generate the input coverage table for CONCOCT using a BEDFile
0123
0 0 
Clustering cONtigs with COverage and ComposiTion
Calculate confidence scores from Kraken2 output
01kraken_taxon_db
0 0 
Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC
012
0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Copy number and genotype annotation from whole genome and whole exome sequencing data
0123456fastafaisnp_positionknown_snpsknown_snps_tbichr_directorymappabilitytarget_bedgccontent_profile
0 0 0 0 0 0 0 0 0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
01
0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Format Freec output to circos input format
01
0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
0123
0 0 0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
012
0 0 0 0 
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Parses rastair call output and converts it into a MethylKit-compatible format.
01
0 0 
A tool for rapid genome-wide assessment of C->T conversion as a readout for methylation. The module uses a script bundled with Rastair to convert formats.
Run matrix balancing on a cool file
012
0 0 
Sparse binary format for genomic interaction matrices
Create a cooler from genomic pairs and bins
01201modecool_bin
0 0 
Sparse binary format for genomic interaction matrices
Generate fragment-delimited genomic bins
fasta chromsizes enzyme
0 0 
Sparse binary format for genomic interaction matrices
Dump a cooler’s data to a text stream.
012
0 0 
Sparse binary format for genomic interaction matrices
Generate fixed-width genomic bins
012
0 0 
Sparse binary format for genomic interaction matrices
Merge multiple coolers with identical axes
01
0 0 
Sparse binary format for genomic interaction matrices
Generate a multi-resolution cooler file by coarsening
01
0 0 
Sparse binary format for genomic interaction matrices
Calculate the diamond insulation scores and call insulating boundaries
01
0 0 0 
Analysis tools for genomic interaction data stored in .cool format
Calculates peak-to-trough ratio (PTR) from metagenomic sequence data
01
0 0 
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Computes the coverage map along the reference genome
01
0 0 
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Indexes a directory of fasta files for use with CoPTR
01
0 0 
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Merge reads that were mapped to multiple indices
01
0 0 
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Coreograph is a TMA dearray program that uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.
01
0 0 0 0 0 
Map reads to contigs and estimate coverage
0101bam_inputinterleaved
0 0 
CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications
Calculate read coverage per-genome
0101bam_inputinterleavedref_mode
0 0 
CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications
In-house generated or curated data can be imported into CRABS.
01010101import_format
0 0 
Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses
CRABS extracts the amplicon region of the primer set by conducting an in silico PCR.
01
0 0 
Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses
Decompress files with crabz
01
0 0 
Like pigz, but rust
Quality assessment of long-read bam files using cramino.
012
0 0 0 
Remove false positives in CRISPR functional genomics caused by CNVs
012min_readsmin_targeted_genes
0 0 
Analysis of CRISPR functional genomics, removing false positives due to CNVs.
A software pipeline for the analysis of genome editing outcomes from deep sequencing data
01
0 0 0 0 
Concatenate two or more CSV (or TSV) tables into a single table
01in_formatout_format
0 0 
A cross-platform, efficient, practical CSV/TSV toolkit
Join two or more CSV (or TSV) tables by selected fields into a single table
01
0 0 
A cross-platform, efficient, practical CSV/TSV toolkit
Sort CSV (or TSV) tables
01in_formatout_format
0 0 
A cross-platform, efficient, practical CSV/TSV toolkit
Splits CSV/TSV into multiple files according to column values
01in_formatout_format
0 0 
CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.
Reference preparation for CTAT-splicing
01cancer_intron_tsv
0 0 
Detection and annotation of cancer splicing aberrations
Detection and annotation of aberrant splicing isoforms in cancer transcriptomes
0123401
0 0 0 0 0 0 0 0 0 0 0 0 
Detection and annotation of cancer splicing aberrations
Clone trees for Cancer Evolution studies from bulk sequencing data.
01
0 0 0 0 0 0 
Annotate a VEP annotated VCF with the most severe consequence field
0101
0 0 
Custom module to annotate a VEP annotated VCF with the most severe consequence field
Annotate a VEP annotated VCF with the most severe pLi field
01
0 0 
Custom module to annotate a VEP annotated VCF with the most severe pLi field
Custom module to add a new fasta file to an old one and update an associated GTF
01201biotype
0 0 0 
Custom module to add a new fasta file to an old one and update an associated GTF
Custom module used to dump software versions within the nf-core pipeline template
versions
0 0 0 
Custom module used to dump software versions within the nf-core pipeline template
Filters a differential expression table based on logFC and adjusted p-value thresholds
01012012
0 0 0 0 
Python library for data manipulation and analysis
Generates a file of chromosome sizes and a FASTA index file
01
0 0 0 0 
Tools for dealing with SAM, BAM and CRAM files
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
0101
0 0 
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
Filter a matrix based on a minimum value and the number of samples that must pass.
0101
0 0 0 0 
Filter a matrix based on a minimum value and the number of samples
Test for the presence of suitable NCBI settings or create them on the fly.
ids
0 0 
SRA Toolkit and SDK from NCBI
Make a GSEA annotation file (.chip) from tabular inputs
0101
0 0 
Make a GSEA annotation file (.chip) from tabular inputs
Make a GSEA class file (.cls) from tabular inputs
01
0 0 
Make a GSEA class file (.cls) from tabular inputs
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
01
0 0 
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.
0101quant_typeidextra
0 0 
Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications
Perform adapter/quality trimming on sequencing reads
01
0 0 0 
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage
01fastafaispecieslist
0 0 
DAS Tool binning step.
0123db_directory
0 0 0 0 0 0 0 0 0 0 0 0 0 
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
01extension
0 0 
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
01extension
0 0 
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.
012
0 0 
Create deacon index for reference genome
01
0 0 
Fast alignment-free sequence filter
decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take into account the sign and weight. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics, gene sets regulated by a transcription factor, or in phospho-proteomics, phosphosites that are targeted by a kinase.
010101
0 0 0 0 
decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take into account the sign and weight. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics, gene sets regulated by a transcription factor, or in phospho-proteomics, phosphosites that are targeted by a kinase.
010101
0 0 0 0 
Ensemble of methods to infer biological activities from omics data
DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).
01
0 0 0 0 0 
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
NO input
0 0 
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
012db
0 0 0 0 0 
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.
NO input
0 0 
DeepBGC - Biosynthetic Gene Cluster detection and classification
DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.
01db
0 0 0 0 0 0 0 0 0 0 0 0 
DeepBGC - Biosynthetic Gene Cluster detection and classification
Deepcell/mesmer segmentation for whole-cell
0101
0 0 
Deep cell is a collection of tools to segment imaging data
DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.
0123401010101
0 0 0 0 0 
A Deep Learning Model for Transmembrane Topology Prediction and Classification
01
0 0 0 0 0 0 
This tool filters alignments in a BAM/CRAM file according to the specified parameters.
012
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
012fastafasta_fai01
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
calculates scores per genome regions for other deeptools plotting utilities
01bed
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Computes read coverage for genomic regions (bins) across the entire genome.
012301
0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
01methodplot_type
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots cumulative reads coverages by BAM file
012
0 0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a heatmap
01
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
01
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a profile plot
01
0 0 0 
A set of user-friendly tools for normalization and visualization of deep-sequencing data
(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
012301010101
0 0 0 0 0 
Call variants from the examples produced by make_examples
01
0 0 
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Transforms the input alignments to a format suitable for the deep neural network variant caller
012301010101
0 0 0 0 
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
01234010101
0 0 0 0 0 
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
012301010101
0 0 0 0 0 
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
01
0 0 
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Call structural variants
0123450101
0 0 0 
Structural variant discovery by integrated paired-end and split-read analysis
Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.
012output_namegenerate_gender_plotgenomegenerate_diagnostic_plots
0 0 0 
runs a differential expression analysis with DESeq2
0123450120101
0 0 0 0 0 0 0 0 0 0 
Differential gene expression analysis based on the negative binomial distribution
Queries a DIAMOND database using blastp mode
0101outfmtblast_columns
0 0 0 0 0 0 0 0 
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
0101out_extblast_columns
0 0 0 0 0 0 0 0 0 
Accelerated BLAST compatible local sequence aligner
calculate clusters of highly similar sequences
01
0 0 
Accelerated BLAST compatible local sequence aligner
Builds a DIAMOND database
01taxonmaptaxonnodestaxonnames
0 0 
Accelerated BLAST compatible local sequence aligner
Generic DIA-NN module for running any DIA-NN operation including in-silico library generation, preliminary analysis, empirical library assembly, individual analysis, and final quantification
012345
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Generate in silico predicted spectral library using DIA-NN deep learning predictor. This module uses DIA-NN software for data-independent acquisition (DIA) proteomics data processing. Output materials should include attribution: "Generated using DIA-NN".
01
0 0 0 
DIA-NN is a universal software for data-independent acquisition (DIA) proteomics data processing. It uses deep learning to predict spectral libraries and perform peptide identification and quantification.
Doublet detection in single-cell RNA-seq data
01
0 0 0 
Create DRAGEN hashtable for reference genome
01
0 0 
Dragmap is the Dragen mapper/aligner Open Source Software.
Assemble bacterial isolate genomes from Nanopore reads
012
0 0 0 0 0 0 
Performs rapid genome comparisons for a group of genomes and visualizes their relatedness
01
0 0 
De-replication of microbial genomes assembled from multiple samples
Dereplicates a genome set by identifying highly similar genomes and choosing the best representative genome
0101
0 0 0 0 0 
De-replication of microbial genomes assembled from multiple samples
Export assembly segment sequences in GFA 1.0 format to FASTA format
01
0 0 
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
01
0 0 
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
01
0 0 
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
01
0 0 
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
01
0 0 
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Calculates secondary structure assignments from PDB files using mkdssp (DSSP). DSSP is a standard tool for assigning secondary structure to amino acids in protein structures.
01format
0 0 
Calculates secondary structure information from PDB files.
SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.
012345fastafasta_fai
0 0 
Assessment of duplication rates in RNA-Seq datasets
0101
0 0 0 0 0 0 0 0 
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
012012
0 0 0 
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
012010101010101
0 0 0 
Structural variant caller for mapped sequencing data
Perform phasing of genotyped data with or without a reference panel
012345
0 0 
Fast genome-wide functional annotation through orthology assignment.
01eggnog_dbeggnog_data_dir01
0 0 0 0 
Convert any PEP project or Nextflow samplesheet to any format
samplesheetformat
0 0 
Convert any PEP project or Nextflow samplesheet to any format
Provide the SNP coverage of each individual in an eigenstrat formatted dataset.
0123
0 0 0 
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
Perform eigen value decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track
010
result bigwig versions 
Analysis tools for genomic interaction data stored in .cool format
Convert a file in FASTA format to the ELFASTA format
01
0 0 0 
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.
0123456010101run_haplotypecallerrun_bqsrbqsr_tables_onlyget_activity_profileget_assembly_regions
0 0 0 0 0 0 0 0 0 
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Merge split bam/sam chunks in one file
01
0 0 
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Split bam file into manageable chunks
01
0 0 
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.
01
0 0 
The European Molecular Biology Open Software Suite
the revseq program from emboss reverse complements a nucleotide sequence
01
0 0 
The European Molecular Biology Open Software Suite
A taxonomic profiler for metagenomic 16S data optimized for error prone long reads.
01db
0 0 0 0 0 0 
Emu is a relative abundance estimator for 16S genomic data.
endorS.py calculates endogenous DNA from samtools flagstat files and prints the result to screen
0123
0 0 
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
0123
0 0 
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
01feature_file
0 0 
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
012genomespeciescache_versioncache01extra_files
0 0 0 0 0 0 
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Searches a term in a public NCBI database
01database
0 0 
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using Unique Identifier(s)
012database
0 0 
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using an UID
01patternelementsep
0 0 
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
phylogenetic placement of query sequences in a reference tree
0123bfastfilebinaryfile
0 0 0 0 
Massively parallel phylogenetic placement of genetic sequences
splits an alignment into reference and query parts
012
0 0 0 
Massively parallel phylogenetic placement of genetic sequences
estimation of the unfolded site frequency spectrum
0123
0 0 0 
Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly
01
0 0 0 
EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.
Estimate repeat sizes using NGS data
012010101
0 0 0 0 
Merge STR profiles into a multi-sample STR profile
010101
0 0 
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Compute genome-wide STR profile
0120101
0 0 0 0 
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Run falco on sequenced reads
01
0 0 0 
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Aligns sequences using FAMSA
0101compress
0 0 
Algorithm for large-scale multiple sequence alignments
Renders a guidetree in famsa
01
0 0 
Algorithm for large-scale multiple sequence alignments
Perform adapter and quality trimming on sequencing reads with reporting
01
0 0 0 0 0 0 0 0 
A tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.
01hmm_model
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
A program that counts sequence occurrences in FASTQ files.
0101
0 0 0 0 0 0 
2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.
"Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure."
01
0 0 0 
"Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!"
Quickly compute statistics over a fasta file in windows.
01
0 0 0 0 0 0 
A fast K-mer counter for high-fidelity shotgun datasets
01
0 0 0 0 
A fast K-mer counter for high-fidelity shotgun datasets
A fast K-mer counter for high-fidelity shotgun datasets
01
0 0 
A fast K-mer counter for high-fidelity shotgun datasets
A tool to merge FastK histograms
0123
0 0 0 0 
A fast K-mer counter for high-fidelity shotgun datasets
Distance-based phylogeny with FastME
012
0 0 0 0 0 
Perform adapter/quality trimming on sequencing reads
012discard_trimmed_passsave_trimmed_failsave_merged
0 0 0 0 0 0 0 
Perform adapter/quality trimming and QC on long sequencing reads (ONT, PacBio, etc.)
01adapter_fastadiscard_trimmed_passsave_trimmed_fail
0 0 0 0 0 0 
fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.
01
0 0 
Build fastq screen config file from bowtie index files
genome_namesindexes
0 0 
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Align reads to multiple reference genomes using fastq-screen
01database
0 0 0 0 0 
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Performs quality control of FASTQ files
01
0 0 
Validation and manipulation of FASTQ files, scRNA-seq barcode pre-processing and UMI quantification.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
01
0 0 
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
Run NCBI's FCS adaptor on assembled genomes
01
0 0 0 0 0 0 
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.
01gxdb
0 0 0 
"The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly."
Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to remove foreign contamination from genome assemblies
012
0 0 0 
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Fetches the NCBI FCS-GX database using a provided manifest URL
manifest
0 0 
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to screen and remove foreign contamination from genome assemblies
012gxdbramdisk_path
0 0 0 0 0 
The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
01min_readsmin_baseq
0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
01min_readsmin_baseq
0 0 
Tools for working with genomic and high throughput sequencing data.
Collects a suite of metrics to QC duplex sequencing data.
01interval_list
0 0 0 0 0 0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
Copies the UMI at the end of each read name in a BAM file to the RX tag.
012
0 0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Using the fgbio tools, converts sequenced FASTQ files into unaligned BAM or CRAM files, possibly moving the UMI barcode into the RX field of the reads
01
0 0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
0101min_readsmin_baseqmax_base_error_rate
0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.
01strategy
0 0 0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.
01
0 0 
Tools for working with genomic and high throughput sequencing data.
FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM
01010101
0 0 
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Add nucleosomes positions and MSP position to ONT BAM files
01
0 0 
Mitchell Vollger's rust tools for fiberseq data.
Extract Fiber-seq information (such as m6A, CpG, nucleosomes, and MSPs) from BAM file into BED file
01extract_type
0 0 
Mitchell Vollger's rust tools for fiberseq data.
Predict m6A positions using HiFi kinetics data and encode the results in the MM and ML bam tags. Also adds nucleosome (nl, ns) and MTase sensitive patches (al, as)
01
0 0 
Mitchell Vollger's rust tools for fiberseq data.
Filtlong filters long reads based on quality measures or short read data.
012
0 0 0 
A module for concatenation of gzipped or uncompressed files getting around UNIX terminal argument size
01
0 0 
GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression
pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
A module for decompressing a large number of gzipped files, getting around the UNIX terminal argument limit
01
0 0 
GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression
pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
Perform merging of mate paired-end sequencing reads
01
0 0 0 0 
De novo assembler for single molecule sequencing reads
01mode
0 0 0 0 0 0 0 
Efficient compression tool for protein structures
01
0 0 
Foldcomp: a library and format for compressing and indexing large protein structure sets
Decompression tool for foldcomp compressed structures
01
0 0 
Foldcomp: a library and format for compressing and indexing large protein structure sets
Creates a database for Foldmason.
01
0 0 
Multiple Protein Structure Alignment at Scale with FoldMason
Aligns protein structures using foldmason
0101compress
0 0 0 
Multiple Protein Structure Alignment at Scale with FoldMason
Renders a visualization report using foldmason
01010101
0 0 
Multiple Protein Structure Alignment at Scale with FoldMason
Create a database from protein structures
01
0 0 
Foldseek: fast and accurate protein structure search
Search for protein structural hits against a foldseek database of protein structures
0101
0 0 
Foldseek: fast and accurate protein structure search
Generate processing masks for a given datacube definition and area of interest. These files can be used to spatially restrict downstream analysis tasks.
aoimask/datacube-definition.prjshapefile_dbfshapefile_prjshapefile_shx
0 0 
An all-in-one tool for processing satellite data. Specialized on medium resolution data such as Landsat or Sentinel imagery.
Compute valid tiles for a given datacube definition and area of interest. This list can be used by downstream analysis tasks to limit processing to the area of interest when satellite data covers a larger region.
aoidatacube_definitionshapefile_dbfshapefile_prjshapefile_shx
0 0 
An all-in-one tool for processing satellite data. Specialized on medium resolution data such as Landsat or Sentinel imagery.
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
meta
0 0 
fq is a library to generate and validate FASTQ file pairs.
fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.
01
0 0 
fq is a library to generate and validate FASTQ file pairs.
A haplotype-based variant detector
0123450101010101
0 0 
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
012repeatsbarcodeslineages_meta
0 0 0 
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
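The two-stage resampling described for the bootstrap module above can be sketched in a few lines of Python. This is an illustration only, not Freyja's implementation: the function name `bootstrap_sites`, the input layout (a dict of per-site base counts) and the choice to use the resampled depth as the number of trials in the second stage are all assumptions made for this example.

```python
import random
from collections import Counter


def bootstrap_sites(site_base_counts, rng):
    """Resample per-site base counts in two multinomial stages.

    site_base_counts: dict mapping site -> dict mapping base -> read count
    (hypothetical layout chosen for this sketch).
    """
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total_reads = sum(depths)
    if total_reads == 0:
        # Nothing to resample: return all-zero counts.
        return {s: {b: 0 for b in site_base_counts[s]} for s in sites}

    # Stage 1: redistribute the total read depth across sites; each site's
    # event probability is its share of all reads.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total_reads))

    # Stage 2: at each site, redraw bases with probabilities given by the
    # observed base frequencies (binomial when only two bases are present).
    resampled = {}
    for site in sites:
        observed = site_base_counts[site]
        depth = new_depths[site]
        if depth == 0:
            resampled[site] = {b: 0 for b in observed}
            continue
        bases = list(observed)
        draws = rng.choices(bases, weights=[observed[b] for b in bases], k=depth)
        counts = Counter(draws)
        resampled[site] = {b: counts[b] for b in bases}
    return resampled


rng = random.Random(0)
data = {100: {"A": 30, "G": 10}, 200: {"C": 60}}
replicate = bootstrap_sites(data, rng)
```

Each call produces one bootstrap replicate; the total number of reads is preserved, while the depth per site and the base frequencies at each site vary between replicates.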
specify the relative abundance of each known haplotype
012barcodeslineages_meta
0 0 
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
downloads new versions of the curated SARS-CoV-2 lineage file and barcodes
db_name
0 0 0 0 
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Calls variants and collects sequencing depth information for them
01fasta
0 0 
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Build references for fusioncatcher
meta
0 0 
Build genome for fusioncatcher
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
0101
0 0 0 0 
FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data
Validation of Fusion Transcript Predictions
01201
0 0 0 0 0 0 0 0 0 
fusionreport_detect
012301tools_cutoff
0 0 0 0 0 0 0 
Tool for parsing outputs from fusion detection tools
Build DB for fusionreport
NO input
0 0 
Generate an interactive summary report from fusion detection tools.
Cluster genome FASTA files by average nucleotide identity
0123
0 0 0 
Gene Allele Mutation Microbial Assessment
01db
0 0 0 0 0 
Tool for Gene Allele Mutation Microbial Assessment
Build ganon database using custom reference sequences.
01input_tsvtaxonomy_filesgenome_size_files
0 0 0 
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Classify FASTQ files against ganon database
01db
0 0 0 0 0 0 0 
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a ganon report file from the output of ganon classify
01db
0 0 
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a multi-sample report file from the output of ganon report runs
01
0 0 
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
assigns taxonomy to query sequences in phylogenetic placement output
012
0 0 0 0 0 0 0 
Genesis Applications for Phylogenetic Placement Analysis
Grafts query sequences from phylogenetic placement on the reference tree
01
0 0 
Genesis Applications for Phylogenetic Placement Analysis
colours a phylogeny with placement densities
01
0 0 0 0 0 0 0 
Genesis Applications for Phylogenetic Placement Analysis
Performs local realignment around indels to correct for mapping errors
012301010101
0 0 
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Generates a list of locations that should be considered for local realignment prior to genotyping.
01201010101
0 0 
The full Genome Analysis Toolkit (GATK) framework, license restricted.
SNP and Indel variant caller on a per-locus basis
01201010101010101
0 0 
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Assigns all the reads in a file to a single new read-group
010101
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotates intervals with GC content, mappability, and segmental-duplication content
0101010101010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
01234fastafaidict
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
metainputinput_indexbqsr_tableintervalsfastafaidict
meta versions bam cram 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
012345fastafaidict
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
01234010101intervals
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
01230101010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
metainputinput_indexintervalsfastafaidictknown_sitesknown_sites_tbi
meta versions table 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an interval list from a bed file and a reference dict
0101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
012
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the parameters for the DRAGstr model
012fastafasta_faidictstrtablefile
0 0 
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a Convolutional Neural Net to filter annotated variants
01234fastafaidictarchitectureweights
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
0123010101
0 0 0 
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
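The counting rule described for this module (a read contributes to the interval that contains its start position) can be sketched in a few lines. This is only an illustration of the rule, not GATK's implementation: the function name, the closed `[start, end]` interval convention, and the assumption that intervals on a contig are sorted and non-overlapping are all choices made for this example.

```python
from bisect import bisect_right


def count_read_starts(intervals, read_starts):
    """Count read starts that fall inside each closed interval [start, end].

    intervals:   list of (start, end) tuples on one contig,
                 assumed sorted by start and non-overlapping.
    read_starts: iterable of read start positions on the same contig.
    """
    interval_starts = [start for start, _ in intervals]
    counts = [0] * len(intervals)
    for pos in read_starts:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(interval_starts, pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            counts[i] += 1
    return counts


# Hypothetical data: two intervals and six read start positions.
example_counts = count_read_starts([(1, 100), (201, 300)],
                                   [5, 100, 150, 201, 299, 301])
```

Reads starting at 150 and 301 fall outside both intervals and are not counted; a read that starts before an interval and merely overlaps it is also not counted under this rule.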
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
01234fastafasta_faidict
0 0 0 0 0 0 0 
Genome Analysis Toolkit (GATK4)
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
012fastafaidict
0 0 
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool looks for low-complexity STR sequences along the reference; these are later used to estimate the DRAGstr model during single-sample auto-calibration with CalibrateDragstrModel.
fastafasta_faidict
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges adjacent DepthEvidence records
012fastafasta_faidict
0 0 0 
Genome Analysis Toolkit (GATK4)
Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a panel of normals constraining germline and artifactual sites for use with mutect2.
01010101
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Denoises read counts to produce denoised copy ratios
0101
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Determines the baseline contig ploidy for germline samples given counts data
012301contig_ploidy_table
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the numbers of unique molecules in a sequencing library.
01fastafaidict
0 0 
Genome Analysis Toolkit (GATK4)
Converts FastQ file to SAM/BAM format
01
0 0 
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters intervals based on annotations and/or count statistics.
010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters the raw output of mutect2. Can optionally use the outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
01234567010101
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply tranche filtering
0123resourcesresources_indexfastafaidict
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gathers scattered BQSR recalibration reports into a single file
01
0 0 
Genome Analysis Toolkit (GATK4)
write your description here
01dict
0 0 
Genome Analysis Toolkit (GATK4)
Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.
012345run_intlistrun_updatewspaceinput_map
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.
012340101010101
0 0 0 
Genome Analysis Toolkit (GATK4)
Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.
01234
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
0123010101variantsvariants_tbi
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call germline SNPs and indels via local re-assembly of haplotypes
012340101010101
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an index for a feature file, e.g. VCF or BED file.
01
0 0 
Genome Analysis Toolkit (GATK4)
Converts a Picard IntervalList file to a BED file.
01
0 0 
Genome Analysis Toolkit (GATK4)
Splits the interval list file into unique, equally-sized interval files and places them in a directory
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Left align and trim variants using GATK4 LeftAlignAndTrimVariants.
0123fastafaidict
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
01fastafasta_fai
0 0 0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
metabamfastafaidict
meta versions output bam_index 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge unmapped with mapped BAM files
0120101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges mutect2 stats generated on different intervals/regions
01
0 0 
Genome Analysis Toolkit (GATK4)
Merges several vcf files
0101
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts copy number ratios (and optionally allelic counts) to copy number segments
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call somatic SNVs and indels via local assembly of haplotypes.
01230101201allelesalleles_tbigermline_resourcegermline_resource_tbipanel_of_normalspanel_of_normals_tbi
0 0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
0123
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Prepares bins for coverage collection.
0101010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
012010101
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
012bedfastafasta_faidict
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Condenses homRef blocks in a single-sample GVCF
0123fastafaidictdbsnpdbsnp_tbi
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Reverts SAM or BAM files to a previous state.
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts BAM/SAM file to FastQ format
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Select a subset of variants from a VCF file
0123
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a fasta with the bases shifted by offset
010101
0 0 0 0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
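As a rough illustration of what "bases shifted by offset" means, the sketch below rotates a sequence so that it starts `offset` bases later and the skipped bases wrap around to the end. It assumes the contig is treated as circular (for example a mitochondrial genome), which the description above does not state; the function name is hypothetical and this is not the GATK implementation.

```python
def shift_sequence(seq, offset):
    """Rotate a circular sequence so it begins `offset` bases later.

    The first `offset` bases are moved to the end, so no base is lost.
    """
    if not seq:
        return seq
    offset %= len(seq)  # allow offsets larger than the sequence length
    return seq[offset:] + seq[:offset]


# Hypothetical six-base contig shifted by two bases.
shifted = shift_sequence("ACGTAC", 2)
```

Shifting by the sequence length (or by zero) returns the original sequence unchanged.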
EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence
01201fastafasta_faidict
0 0 0 
Genome Analysis Toolkit (GATK4)
Splits CRAM files efficiently by taking advantage of their container based structure
01
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Split intervals into sub-interval files.
01010101
0 0 
Genome Analysis Toolkit (GATK4)
Splits reads that contain Ns in their cigar string
0123010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
0123401010101
0 0 0 
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
012ploidy_tablefastafasta_faidict
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
01
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filter variants
01201010101
0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
012resource_vcfresource_tbilabelsfastafaidict
0 0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Extract fields from a VCF file to a tab-delimited table
012345010101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
01234fastafaidict
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
0123fastafaidictknown_sitesknown_sites_tbi
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
01fastafasta_faidict
0 0 0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.
01program_filedisable_redirect_output
0 0 
This command helps transform the output files created by GECCO into useful formats, should you want to use the results in combination with other tools.
012modeformat
0 0 0 0 0 
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
012model_dir
0 0 0 0 0 0 
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
Convert a mappability file to bedgraph format
0101
0 0 0 
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
01
0 0 0 
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Define the mappability of a reference
01read_length
0 0 
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
01
0 0 0 
The GEM indexer (v3).
Performs fastq alignment to a fasta reference using gem3-mapper
0101sort_bam
0 0 
The GEM indexer (v3).
A derivative of GenomeScope2.0 modified to work with FastK
01
0 0 0 0 0 0 0 0 
Create index file for genmap
01
0 0 
Ultra-fast computation of genome mappability.
Create mappability files for a genome
0101
0 0 0 0 0 
Ultra-fast computation of genome mappability.
Annotates regions, frequencies, and CADD scores
01
0 0 
Annotate genetic inheritance models in variant files
Score compounds
01
0 0 
Annotate genetic inheritance models in variant files
Annotate models of inheritance
012reduced_penetrance
0 0 
Annotate genetic inheritance models in variant files
Score the variants of a vcf based on their annotation
012score_config
0 0 
Annotate genetic inheritance models in variant files
Download geNomad databases and related files
NO input
0 0 
Identification of mobile genetic elements
Identify mobile genetic elements present in genomic assemblies
01genomad_db
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Identification of mobile genetic elements
Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach
01
0 0 0 0 0 0 0 0 0 
Genotype Salmonella Typhi from Mykrobe results
01
0 0 
Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)
Peak-calling for ChIP-seq and ATAC-seq enrichment experiments
012blacklist_bed
0 0 0 0 0 0 
geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA
geo_accession
0 0 
Retrieves GEO data from the Gene Expression Omnibus (GEO)
01
0 0 0 0 
Get data from NCBI Gene Expression Omnibus (GEO)
Downloads databases needed for running getorganelle
organelle_type
0 0 
Get organelle genomes from genome skimming data
Assembles organelle genomes from genomic data
0101
0 0 0 
Get organelle genomes from genome skimming data
Collapse walk-preserving shared affixes in variation graphs in GFA format
01
0 0 0 
A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
01out_fmtgenome_sizetarget01010101
0 0 0 
Converts GFA or rGFA files to FASTA
01
0 0 
Tools for manipulating sequence graphs in the GFA and rGFA formats
Summary statistics for GFA files
01
0 0 
Tools for manipulating sequence graphs in the GFA and rGFA formats
Compare, merge, annotate and estimate accuracy of generated gtf files
0101201
0 0 0 0 0 0 0 0 
Validate, filter, convert and perform various other operations on GFF files
01fasta
0 0 0 0 
gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
01
0 0 0 
gget enables efficient querying of genomic databases
Defines chunks where to run imputation
0123
0 0 
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.
01234567min_probmin_dpbins
0 0 0 0 0 0 
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
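The r2 reported by this module is a squared correlation between imputed dosages and trusted genotype calls. A minimal sketch of that quantity for a single set of sites is given below. It assumes the squared Pearson correlation and leaves out the grouping of sites into MAF bins; the function name and inputs are hypothetical and do not reflect GLIMPSE's own code or file formats.

```python
import math


def dosage_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    imputed: imputed dosages (floats between 0 and 2).
    truth:   high-confidence genotypes coded as 0, 1 or 2.
    Returns NaN when either vector has zero variance.
    """
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    if sxx == 0 or syy == 0:
        return math.nan
    return (sxy * sxy) / (sxx * syy)


# Hypothetical dosages for four genotypes at one site.
r2 = dosage_r2([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0])
```

To mimic the per-bin output, the same function would be applied separately to the sites falling in each MAF bin.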
Concatenates imputation chunks in a single VCF/BCF file ligating phased information.
012
0 0 
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Main GLIMPSE algorithm. Performs phasing and imputation, refining genotype likelihoods
012345678
0 0 
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Generates haplotype calls by sampling haplotype estimates
01
0 0 
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Defines chunks where to run imputation
01234model
0 0 
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Program to compute the genotyping error rate at the sample or marker level.
0123456780123456
0 0 0 0 0 0 0 
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.
012
0 0 
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool for imputation and phasing from a VCF file or directly from BAM files.
0123456789012
0 0 0 
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool to create a binary reference panel for quick reading time.
0123401
0 0 
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Merge gVCF files and perform joint variant calling
01201
0 0 
GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).
012type_reportsummary_reportskipexamine
0 0 0 0 0 0 0 
Writes a sorted concatenation of file/s
01
0 0 
The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
Query metadata for any taxon across the tree of life.
012
0 0 
goat-cli is a command line interface to query the Genomes on a Tree Open API.
Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
01201
0 0 0 0 0 0 0 0 
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
Runs a functional enrichment analysis with gprofiler2
010101
0 0 0 0 0 0 0 0 0 
An R interface corresponding to the 2019 update of g:Profiler web tool.
Checks if the input file is bgzip compressed or not
01
0 0 
a wee tool for random access into BGZF files.
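One common way to tell a BGZF (bgzip) file from plain gzip is to inspect the first block header: BGZF blocks are gzip members with the FEXTRA flag set and a `BC` extra subfield of length 2. The sketch below applies that test to the first 16 bytes. It assumes the `BC` subfield is the first extra subfield, as typical writers produce, and it is not a description of how the module's underlying tool performs its check; the function names are hypothetical.

```python
def looks_like_bgzf(header):
    """Return True if `header` (the first 16+ bytes of a file) matches
    the layout of a BGZF block header."""
    return (
        len(header) >= 16
        # gzip magic (1f 8b), deflate (08), FLG with only FEXTRA set (04)
        and header[:4] == b"\x1f\x8b\x08\x04"
        # extra subfield identifier 'B','C'
        and header[12:14] == b"BC"
        # subfield length SLEN = 2, little-endian
        and header[14:16] == b"\x02\x00"
    )


def is_bgzf_file(path):
    """Read the first 16 bytes of `path` and test them."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(16))


# Hand-built headers for illustration only.
bgzf_header = (b"\x1f\x8b\x08\x04" + b"\x00" * 4 + b"\x00\xff"
               + b"\x06\x00" + b"BC" + b"\x02\x00")
plain_gzip_header = b"\x1f\x8b\x08\x00" + b"\x00" * 12
```

A file compressed with ordinary gzip fails the test because its flag byte does not have FEXTRA set, even though it shares the same two-byte gzip magic.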
A versatile pairwise aligner for genomic and spliced nucleotide sequences
fasta
0 0 
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Tools for population-scale genotyping using pangenome graphs.
01
0 0 0 
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0123010101
0 0 0 
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
01010101
0 0 
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0123010101
bedpe bed versions 
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0101
high_conf_sv all_sv versions 
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
0101
0 0 0 
GRIDSS: the Genomic Rearrangement IDentification Software Suite
Run the Broad Gene Set Enrichment tool in GSEA mode
01230101
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Gene Set Enrichment Analysis (GSEA)
Collapse redundant transcript models in Iso-Seq data.
01fasta
0 0 0 0 0 0 0 0 0 0 
Collapse similar gene models
Merge multiple transcriptomes while maintaining source information.
01filelist
0 0 0 0 0 
Gene-Switch Transcriptome Annotation by Modular Algorithms
Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)
01
0 0 0 0 
Gene-Switch Transcriptome Annotation by Modular Algorithms
GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files
01
0 0 0 
The GenomeTools genome analysis system
GenomeTools gt-gff3validator utility to strictly validate a GFF3 file
01
0 0 0 
The GenomeTools genome analysis system
Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility
01
0 0 0 0 0 
The GenomeTools genome analysis system
GenomeTools gt-stat utility to show statistics about features contained in GFF3 files
01
0 0 
The GenomeTools genome analysis system
Computes enhanced suffix array using GenomeTools gt-suffixerator utility
01mode
0 0 
The GenomeTools genome analysis system
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
0101use_pplacer_scratch_dir
0 0 0 0 0 0 0 0 0 0 0 
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Converts the output classifications of GTDB-TK from GTDB taxonomy to NCBI taxonomy
0120101
0 0 
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
alignment
0 0 0 0 0 0 0 0 0 0 
Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes
db_name
0 0 
Python package for detection of chimerism and contamination in prokaryotic genomes.
Merging of CheckM and GUNC results in one summary table
012
0 0 
Python package for detection of chimerism and contamination in prokaryotic genomes.
Detection of Chimerism and Contamination in Prokaryotic Genomes
01db
0 0 0 
Python package for detection of chimerism and contamination in prokaryotic genomes.
Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.
01
0 0 
gvcftools is a package of small utilities for creating and analyzing gVCF files
Tool to convert and summarize ABRicate outputs using the hAMRonization specification
01formatsoftware_versionreference_db_version
0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.
01formatsoftware_versionreference_db_version
0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize DeepARG outputs using the hAMRonization specification
01formatsoftware_versionreference_db_version
0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize fARGene outputs using the hAMRonization specification
01formatsoftware_versionreference_db_version
0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize RGI outputs using the hAMRonization specification.
01formatsoftware_versionreference_db_version
0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to summarize and combine all hAMRonization reports into a single file
reportsformat
0 0 0 0 
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial reference versions.
01
0 0 0 
classification into haplogroups
01format
0 0 
A tool for mtDNA haplogroup classification.
classification into haplogroups
01
0 0 
A tool for mtDNA haplogroup classification.
Somatic VCF feature extraction tool from hap.py.
012340101
0 0 
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
012340101010101
0 0 0 0 0 0 0 0 0 0 0 0 
Haplotype VCF comparison tools
Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py
0120101
0 0 
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variants.
012340101010101
0 0 0 0 
Haplotype VCF comparison tools somatic variant comparison
Generating cell hashing calls from a matrix of count data.
0123
0 0 0 0 0 0 0 0 0 
HelitronScanner draw tool for Helitron transposons in genomes
010101
0 0 
HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes
HelitronScanner scanHead and scanTail tools for Helitron transposons in genomes
01commandlcv_filepathbuffer_size
0 0 
HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes
Fast and sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)
0101
0 0 
HH-suite3 for fast remote homology detection and deep protein annotation
Sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)
0101
0 0 
HH-suite3 for fast remote homology detection and deep protein annotation
Reformat a Multiple Sequence Alignment (MSA) file
01informatoutformat
0 0 
HH-suite3 for fast remote homology detection and deep protein annotation
Identify cap locus serotype and structure in your Haemophilus influenzae assemblies
01database_dirmodel_fp
0 0 0 0 
Computes PCA eigenvectors for a Hi-C matrix.
01
0 0 0 0 
Set of programs to process, analyze and visualize Hi-C and capture Hi-C data
Whole-genome assembly using PacBio HiFi reads
01201201201
0 0 0 0 0 0 0 0 0 0 0 
Align RNA-Seq reads to a reference with HISAT2
010101
0 0 0 0 
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Extracts splice sites from a GTF file
01
0 0 
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Pre-compute the graph index structure.
01
0 0 
HLA typing from short and long reads
Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.
0123
0 0 0 0 0 0 0 0 0 0 0 
HLA typing from short and long reads
gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference
01
0 0 
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Perl script (generateMap.pl) that generates the mappability of a genome given a certain read size, for input to hmmcopy mapcounter. Takes a very long time on large genomes and is not parallelised at all.
01
0 0 
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file
01
0 0 
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
readCounter function from HMMcopy utilities, used to generate read counts in windows
01201
0 0 
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Mask multiple sequence alignments
01234567maskfile
0 0 0 0 0 0 0 0 
Biosequence analysis using profile hidden Markov models
Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tool's help for possible values.
01
0 0 
Biosequence analysis using profile hidden Markov models
hmmalign from the HMMER suite aligns a number of sequences to an HMM profile
01hmm
0 0 
Biosequence analysis using profile hidden Markov models
create an hmm profile from a multiple sequence alignment
01mxfile
0 0 0 
Biosequence analysis using profile hidden Markov models
extract hmm from hmm database file or create index for hmm database
01keykeyfileindex
0 0 0 
Biosequence analysis using profile hidden Markov models
compress and index profile database for hmmscan
01
0 0 
Biosequence analysis using profile hidden Markov models
R script that scores output from multiple runs of hmmer/hmmsearch
01
0 0 
Biosequence analysis using profile hidden Markov models
A Language and Environment for Statistical Computing
Tidyverse: R packages for data science
search profile(s) against a sequence database
012345
0 0 0 0 0 
Biosequence analysis using profile hidden Markov models
iterative searches to detect distant homologs by refining an HMM profile from hits
012345
0 0 0 0 0 
Biosequence analysis using profile hidden Markov models
Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline
01
0 0 
Human mitochondrial variants annotation using HmtVar.
Annotate peaks with HOMER suite
01fastagtf
0 0 0 
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Find peaks with HOMER suite
01uniqmap
0 0 
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Create a tag directory with the HOMER suite
01fasta
0 0 0 
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Differential gene expression analysis based on the negative binomial distribution
Empirical Analysis of Digital Gene Expression Data in R
Create a UCSC bed graph with the HOMER suite
01
0 0 
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Converting from HOMER peak to BED file formats
01
0 0 
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Removes host reads from short- and long-read FASTQ sequencing files
0101
0 0 0 
Hostile: accurate host decontamination
Downloads required reference genomes for Hostile
index_name
0 0 
Hostile: accurate host decontamination
Demultiplex samples based on data from cell hashing.
012
0 0 0 0 0 
count how many reads map to each feature
01201
0 0 
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
This tool takes a background VCF, such as gnomAD, that has full genome (though in some cases, users will instead want whole exome) coverage and uses that as an expectation of variants.
012012
0 0 
useful command-line tools written to show-case hts-nim
HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.
0101
0 0 0 0 0 
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals
wigsgc_wigmap_wigcentromererep_time_wigexons
0 0 0 
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA
01gc_wigmap_wignormal_wignormal_backgroundcentromererep_time_wigexons
0 0 0 0 0 0 0 0 0 
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
Plot a metagene of cross-link events/sites around various transcriptomic landmarks.
01segmentation
0 0 
Runs iCount peaks on a BED file of crosslinks
012
0 0 
Computational pipeline for analysis of iCLIP data
Formats a GTF file for use with iCount sigxls
01fai
0 0 0 
Computational pipeline for analysis of iCLIP data
Runs iCount sigxls on a BED file of crosslinks
01segmentation
0 0 0 
Computational pipeline for analysis of iCLIP data
Report proportion of cross-link events/sites on each region type.
01segmentation
0 0 0 0 
Computational pipeline for analysis of iCLIP data
igv.js is an embeddable interactive genome visualization component
012
0 0 0 0 
Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser html file. To visualise it, files have to be served. Check the documentation at: https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements
A Python application to generate self-contained HTML reports for variant review and other genomic applications
0123012
0 0 
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
012ilp
0 0 
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
0123ilp
0 0 
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Perform immune cell deconvolution using RNA-seq data and various computational methods.
0123gene_symbol_col
0 0 0 
Search covariance models against a sequence database
012write_alignwrite_target
0 0 0 0 
Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.
inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification
01genome_fastagenes_fastastb_file
0 0 0 0 0 0 0 0 
Calculation of strain-level metrics
Detect integrons in DNA sequences
01
0 0 0 0 0 
Produces protein annotations and predictions from an amino acids FASTA file
01interproscan_database
0 0 0 0 0 
Download, extract, and check md5 of iPHoP databases
NO input
0 0 
Predict host genus from genomes of uncultivated phages.
Predict phage host using iPHoP
01iphop_db
0 0 0 0 
Predict host genus from genomes of uncultivated phages.
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.
012tree_telmclustmdefpartitions_equalpartitions_proportionalpartitions_unlinkedguide_treesitefreq_inconstraint_treetrees_zsuptreetrees_rf
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Quantification of transposable elements expression in scRNA-seq
01genomebed
0 0 0 0 0 
Genomic island prediction in bacterial and archaeal genomes
01
0 0 0 
IsoSeq - Cluster - Cluster trimmed consensus sequences
01
0 0 0 0 0 0 0 0 0 0 0 0 
IsoSeq - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
01primers
0 0 0 0 0 0 
IsoSeq - Scalable De Novo Isoform Discovery
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
metabam
meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi 
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
metabamprimers
meta bam pbi consensusreadset summary report versions 
IsoSeq3 - Scalable De Novo Isoform Discovery
Extract UMI and cell barcodes
01design
0 0 0 
Iso-Seq - Scalable De Novo Isoform Discovery
Generate a consensus sequence from a BAM file using iVar
01fastasave_mpileup
0 0 0 0 
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Trim primer sequences from a BAM file with iVar
012bed
0 0 0 
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Call variants from a BAM file using iVar
01fastafaigffsave_mpileup
0 0 0 
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Efficiently counts k-mers from DNA sequencing reads using a fast, memory-efficient, parallelized algorithm
01kmer_lengthsize
0 0 
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence
Dumps the results from a jellyfish binary file into a human readable format
01
0 0 
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence
Create a multi-resolution .hic contact matrix for analysis with Juicer
01012
0 0 
Visualization and analysis software for Hi-C data
Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.
01parametersinput_files
0 0 0 
Jupyter notebooks as plain text scripts or markdown documents
Parameterize, execute, and analyze notebooks
Parameterize, execute, and analyze notebooks
Extract BED file from HTS files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)
01
0 0 
Java utilities for Bioinformatics.
Convert sam files to tsv files
01230123
0 0 
Java utilities for Bioinformatics.
Convert VCF to a user friendly table
012301
0 0 
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
annotate VCF files for poly repeats
0123010101
0 0 0 0 
Java utilities for Bioinformatics.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Plot whole genome coverage from BAM/CRAM file as SVG
012010101
0 0 
Java utilities for Bioinformatics.
Taxonomic classification of metagenomic sequence data using a protein reference database
01db
0 0 
Fast and sensitive taxonomic classification for metagenomics
Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.
01db
0 0 
Fast and sensitive taxonomic classification for metagenomics
Summarize Kaiju classification output into a table at a given taxonomic rank
01dbtaxon_rank
0 0 
Fast and sensitive taxonomic classification for metagenomics
Merge two tab-separated output files of Kaiju and Kraken in the column format
012db
0 0 
Fast and sensitive taxonomic classification for metagenomics
Make Kaiju FMI-index file from a protein FASTA file
01keep_intermediate
0 0 0 0 
Fast and sensitive taxonomic classification for metagenomics
Aligns sequences using kalign
01compress
0 0 
Kalign is a fast and accurate multiple sequence alignment algorithm.
Create kallisto index
01
0 0 
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
0101gtfchromosomesfragment_lengthfragment_length_sd
0 0 0 0 
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
quantifies scRNA-seq data from fastq files using kb-python.
01indext2gt1ct2ctechnologyworkflow_mode
0 0 0 
kallisto and bustools are wrapped in an easy-to-use program called kb
index creation for kb count quantification of single-cell data.
fastagtfworkflow_mode
0 0 0 0 0 0 0 
kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.
Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.
012
0 0 
khmer k-mer counting library
Removes low abundance k-mers from FASTA/FASTQ files
01
0 0 
khmer k-mer counting library
In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
01kmer_size
0 0 0 
khmer k-mer counting library
Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).
01
0 0 
Generate k-mers (sketches) from FASTA/Q sequences
01
0 0 0 
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Construct KMCP database from k-mer files
01
0 0 0 
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Merge search results from multiple databases.
01
0 0 
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Generate taxonomic profile from search results
01db
0 0 
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Search sequences against database
01db
0 0 
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Adds fasta files to a Kraken2 taxonomic database
01taxonomy_namestaxonomy_nodesaccession2taxidseqid2taxid
0 0 0 0 
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Builds Kraken2 database
010101cleaning
0 0 
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Downloads and builds Kraken2 standard database
cleaning
0 0 
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Classifies metagenomic sequence data
01dbsave_output_fastqssave_reads_assignment
0 0 0 0 0 
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Takes multiple kraken-style reports and combines them into a single report file
01
0 0 
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Extract reads classified at any user-specified taxonomy IDs.
taxid010101
0 0 
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Takes a Kraken report file and prints out a krona-compatible TEXT file
01
0 0 
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Download and build (custom) KrakenUniq databases
0123keep_intermediate
0 0 
Metagenomics classifier with unique k-mer counting for more specific results
Download KrakenUniq databases and related files
pattern
0 0 
Metagenomics classifier with unique k-mer counting for more specific results
Classifies metagenomic sequence data using unique k-mer counts
012sequence_typedbsave_output_readsreport_filesave_output
0 0 0 0 0 
Metagenomics classifier with unique k-mer counting for more specific results
KronaTools Update Taxonomy downloads a taxonomy database
NO input
0 0 
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Collect multiple krona reports into a single html file
html
0 0 
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.
01taxonomy
0 0 
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Creates a Krona chart from text files listing quantities and lineages.
01
0 0 
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Update Taxonomy downloads a taxonomy database
NO input
0 0 
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Aligns query sequences to target sequences indexed with lastdb
012index
0 0 0 
LAST finds & aligns related regions of sequences.
Prepare sequences for subsequent alignment with lastal.
01
0 0 
LAST finds & aligns related regions of sequences.
Converts MAF alignments into another format.
01201010101
0 0 0 0 0 0 0 0 0 0 0 0 0 
LAST finds & aligns related regions of sequences.
Reorder alignments in a MAF file
01
0 0 
LAST finds & aligns related regions of sequences.
Post-alignment masking
01
0 0 
LAST finds & aligns related regions of sequences.
Find suitable score parameters for sequence alignment
01index
0 0 0 
LAST finds & aligns related regions of sequences.
Align sequences using learnMSA
01
0 0 
learnMSA: Learning and Aligning large Protein Families
Bayesian reconstruction of ancient DNA fragments
01
0 0 0 0 0 0 0 0 0 
Typing of clinical and environmental isolates of Legionella pneumophila
01
0 0 
Index chain files for lift over
01chain
0 0 
Fast and accurate coordinate conversion between assemblies
Converting aligned short and long reads records from one reference to another
0101
0 0 
Fast and accurate coordinate conversion between assemblies
runs a differential expression analysis with Limma
012345012
0 0 0 0 0 0 0 
Linear Models for Microarray Data
LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes). It is also used to scaffold contig pairs linked by ARCS/ARKS. This module is for LINKS >=2.0.0 and does not support MPET input.
0101
0 0 0 0 0 0 0 0 0 0 0 
Serogrouping Listeria monocytogenes assemblies
01
0 0 
Lofreq subcommand to insert base and indel alignment qualities
01fasta
0 0 
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments
012fasta
0 0 
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
It predicts variants using multiple processors
01230101
0 0 0 
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors
Lofreq subcommand to remove variants with low coverage or strand bias potential
01
0 0 
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Inserts indel qualities in a BAM file
0101
0 0 
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0123450101
0 0 
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
0101
0 0 
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0123450101
0 0 0 
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
0123450101
0 0 
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
"A genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps:
- Tigmint cuts the draft assembly at potentially misassembled regions
- ntLink is then used to scaffold the corrected assembly
- followed by ARKS for further scaffolding (optional)"
0101commandspangenomesizelongmap
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder
01
0 0 0 
A Perl wrapper for LTR_FINDER
An efficient program for finding full-length LTR retrotransposons in genome sequences
Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain
01
0 0 0 
A Perl wrapper for LTR_harvest
The GenomeTools genome analysis system
Identifies LTR retrotransposons using LTR_retriever
metagenomeharvestfindermgescannon_tgca
meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions 
Sensitive and accurate identification of LTR retrotransposons
Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments
01pass_listannotation_outmonoploid_seqs
0 0 0 
Assessing genome assembly quality using the LTR Assembly Index (LAI)
Identifies LTR retrotransposons using LTR_retriever
01harvestfindermgescannon_tgca
0 0 0 0 0 0 0 
Sensitive and accurate identification of LTR retrotransposons
A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.
01
0 0 0 0 0 0 
A pipeline for AMP (antimicrobial peptide) prediction
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
012macs2_gsize
0 0 0 0 0 0 
Model Based Analysis for ChIP-Seq data
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
012macs3_gsize
0 0 0 0 0 0 
Model Based Analysis for ChIP-Seq data
Multiple sequence alignment using MAFFT
0101010101010
fas versions 
Parallel implementation of the gzip algorithm.
Multiple sequence alignment using MAFFT
010101010101compress
0 0 
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
Parallel implementation of the gzip algorithm.
Guide tree rendering using MAFFT
01
0 0 
Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform
mageck count for functional genomics; reads are usually mapped to a specific sgRNA
01library
0 0 0 
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
maximum-likelihood analysis of gene essentialities computation
01design_matrix
0 0 0 
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Mageck test performs a robust ranking aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.
01
0 0 0 0 
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Multiple Sequence Alignment using Graph Clustering
0101compress
0 0 
Multiple Sequence Alignment using Graph Clustering
Multiple Sequence Alignment using Graph Clustering
01
0 0 
Multiple Sequence Alignment using Graph Clustering
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
fastasgffmapping_dbmap_type
0 0 0 
A tool for mapping metagenomic data
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
01index
0 0 0 0 
A tool for mapping metagenomic data
Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening
01taxon_listncbi_dir
0 0 
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0101
0 0 0 
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
012340101config
0 0 0 0 0 0 0 
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
01234560101config
0 0 0 0 0 0 0 0 0 
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
012340101config
0 0 0 0 0 0 0 
Structural variant and indel caller for mapped sequencing data
Create mapAD index for reference genome
01
0 0 
An aDNA aware short-read mapper
Map short-reads to an indexed reference genome
0101mismatch_parameterdouble_stranded_libraryfive_prime_overhangthree_prime_overhangdeam_rate_double_strandeddeam_rate_single_strandedindel_rate
0 0 
An aDNA aware short-read mapper
Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
01fasta
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Screens query sequences against large sequence databases
0101
0 0 
Fast sequence distance estimator that uses MinHash
Creates vastly reduced representations of sequences using MinHash
01
0 0 0 
Fast sequence distance estimator that uses MinHash
MaxBin is software for clustering metagenomic contigs
0123
0 0 0 0 0 0 0 0 0 0 
Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER".
012raw
0 0 
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.
Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.
010101
0 0 
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
01
0 0 0 
Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.
01
0 0 
Staging modules for MCMICRO
Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.
01
0 0 
Staging modules for MCMICRO
mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences
01
0 0 
Analyses a DAA file and exports information in text format
01megan_summary
0 0 0 
A tool for studying the taxonomic content of a set of DNA reads
Analyses an RMA file and exports information in text format
01megan_summary
0 0 0 
A tool for studying the taxonomic content of a set of DNA reads
Performs taxonomic profiling of long metagenomic reads against the melon database
01databasek2_db
0 0 0 0 
Serotyping of Neisseria meningitidis assemblies
01
0 0 
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
0101lookup_tableseqmerspeak
0 0 0 
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
k-mer based assembly evaluation.
metameryl_dbassembly
meta versions assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_cn_hist spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_asm_hist assembly_qv scaffold_qv read_ploidy 
A script to generate hap-mer dbs for trios
01maternal_merylpaternal_meryl
0 0 0 0 0 0 
Evaluate genome assemblies with k-mers and more.
k-mer based assembly evaluation.
012
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Evaluate genome assemblies with k-mers and more.
Produces maternal and paternal FastK kmer tables from maternal, paternal and child FastK tables
010101
0 0 0 
FastK based version of Merqury
A reimplementation of Kat Comp to work with FastK databases
01234
0 0 0 0 
FastK based version of Merqury
A reimplementation of KatGC to work with FastK databases
012
0 0 0 0 
FastK based version of Merqury
FastK based version of Merqury
012340101
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
FastK based version of Merqury
An improved version of Smudgeplot using FastK
012
0 0 0 0 
FastK based version of Merqury
A genomic k-mer counter (and sequence utility) with nice features.
01kvalue
0 0 
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
01kvalue
0 0 
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
01kvalue
0 0 
A genomic k-mer counter (and sequence utility) with nice features.
Depth computation per contig step of metabat2
012
0 0 
Metagenome binning of contigs
012
0 0 0 0 0 0 
Taxonomic profiling database building with MetaCache
01taxonomyseq2taxid
0 0 
Metacache query command for taxonomic classification
01dbdo_abundances
0 0 0 
Annotation of eukaryotic metagenomes using MetaEuk
01database
0 0 0 0 0 
Strain-level metagenomic assignment
01234database_folder
0 0 0 0 0 0 0 0 
Maps long reads to a metamaps database
01database
0 0 0 0 0 
Metagenome assembler for long-read sequences (HiFi and ONT).
01input_type
0 0 0 
MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
Build MetaPhlAn database for taxonomic profiling.
NO input
0 0 
Merges output abundance tables from MetaPhlAn4
01
0 0 
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
01metaphlan_db_latestsave_samfile
0 0 0 0 0 
Merges output abundance tables from MetaPhlAn3
01
0 0 
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
01metaphlan_db
0 0 0 0 
Export METASPACE datasets to AnnData and SpatialData objects
ds_id
0 0 0 
A module to download dataset results from the METASPACE platform and save them as CSV files, using a containerized Python script. Inputs are provided via a CSV file or a list of datasets, with results saved to a specified output directory.
012
0 0 0 
Extracts per-base methylation metrics from alignments
01010101
0 0 0 
Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.
Generates methylation bias plots from alignments
01010101
0 0 
Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.
Demultiplex MGI fastq files
012
0 0 0 0 0 0 0 0 0 0 
Demultiplex MGI fastq files
A tool to estimate bacterial species abundance
0101mode
0 0 
An integrated pipeline for estimating strain-level genomic variation from metagenomic data
Marks duplicate spots along gridline edges.
01
0 0 
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
01
0 0 
Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.
Minia is a short-read assembler based on a de Bruijn graph
01
0 0 0 0 
Compression of a reference panel for genotype imputation to .msav format
012
0 0 
Computationally efficient genotype imputation
Imputation of genotypes using a reference panel
0123456
0 0 
Computationally efficient genotype imputation
Provides fasta index required by minimap2 alignment.
01
0 0 
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by miniprot alignment.
01
0 0 
A versatile pairwise aligner for genomic and protein sequences.
miRanda is an algorithm for finding genomic targets for microRNAs
01mirbase
0 0 
miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.
0101
0 0 
miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
012010123
0 0 
miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs.
It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.
mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.
0101012
0 0 
Small RNA-seq annotation
mirtop export generates files such as fasta, vcf, or files compatible with the isomiRs Bioconductor package
0101012
0 0 0 0 
Small RNA-seq annotation
mirtop gff generates the GFF3 adapter format to capture miRNA variations
0101012
0 0 
Small RNA-seq annotation
mirtop gff gets the number of isomiRs and miRNAs annotated in the GFF file by isomiR category.
01
0 0 0 
Small RNA-seq annotation
A tool for quality control and tracing taxonomic origins of microRNA sequencing data
012mirtrace_species
0 0 0 0 0 0 
miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.
Download a mitochondrial genome to be used as reference for MitoHiFi.
NOTE: An optional NCBI API key can be supplied to MITOHIFI_FINDMITOREFERENCE. This should be set using Nextflow's secrets functionality:
nextflow secrets set NCBI_API_KEY <key>
See https://www.nextflow.io/docs/latest/secrets.html for more information.
01
0 0 0 
Fetch mitochondrial genome in Fasta and Genbank format from NCBI
A Python workflow that assembles mitogenomes from PacBio HiFi reads
010101input_modemito_code
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
A Python workflow that assembles mitogenomes from PacBio HiFi reads
Cluster sequences using MMSeqs2 cluster.
01
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create an MMseqs database from an existing FASTA/Q file
01
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Creates sequence index for mmseqs database
01
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create a tsv file from a query and a target database as well as the result database
010101
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Download an mmseqs-formatted database
database
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences using MMSeqs2 easy cluster.
01
0 0 0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a fasta file in a database using MMseqs2
0101
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences in linear time using MMSeqs2 linclust.
01
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Search and calculate a score for similar sequences in a query and a target database.
0101
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Computes the lowest common ancestor by identifying the query sequence homologs against the target database.
01db_target
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Conversion of expandable profile databases to the MMseqs2 database format
database
0 0 
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Subclonal deconvolution of cancer genome sequencing data.
01
0 0 0 0 0 0 0 
A tool to reconstruct plasmids in bacterial assemblies
01
0 0 0 0 0 
Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.
Convert a bedMethyl file to bigWig format using modkit
0101modcodes
0 0 
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.
Calls mods from a modbam, creating a new modbam with probabilities set to 100% if a base modification is called or 0% if called canonical
01
0 0 0 
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.
A bioinformatics tool for working with modified bases
01201201
0 0 0 0 
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data
Modkit is a bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data. The command modkit repair is used for repairing the MM/ML tags on trimmed or hard-clipped ONT reads.
012
0 0 0 
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.
Contrast-limited adaptive histogram equalization (CLAHE) on single-channel tif images.
01
0 0 
One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.
Download the mOTUs database
motus_downloaddb_script
0 0 
The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
Taxonomic meta-omics profiling using universal marker genes
01dbprofile_version_yml
0 0 0 
Marker gene-based OTU (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
01db
0 0 
Marker gene-based operational taxonomic unit (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
01db
0 0 0 0 0 
Marker gene-based OTU (mOTU) profiling
Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data
0123456
0 0 0 0 0 
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Scan a reference genome to get microsatellite & homopolymer information
01
0 0 
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
msisensor2 detection of MSI regions.
01201
0 0 0 0 
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
msisensor2 detection of MSI regions.
01
0 0 
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
01234501msisensor_scan
0 0 0 0 0 
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro/pro is a tool used to evaluate MSI using single (tumor) sample sequencing data
012010101
0 0 0 0 0 
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
01
0 0 
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Aligns protein structures using mTM-align
01compress
0 0 0 
Algorithm for structural multiple sequence alignments
Parallel implementation of the gzip algorithm.
A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.
01mt_id
0 0 0 
Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.
01indexgtfrad
0 0 0 
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.
gtf
0 0 
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Aggregate results from bioinformatics analyses across many samples into a single report
multiqc_filesmultiqc_configextra_multiqc_configmultiqc_logoreplace_namessample_names
0 0 0 0 
MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
Identify singlets, doublets and negative cells from multiplexing experiments. Annotate singlets by tags.
012
0 0 0 0 
SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA
01010101allele_freqsgenotype_qualitycoveragehomozygous_freqheterozygous_freq01
0 0 0 0 0 0 0 0 0 0 0 0 
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two
01
0 0 0 0 0 0 0 0 0 
Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.
01compress
0 0 
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
Parallel implementation of the gzip algorithm.
AMR predictions for supported species
01species
0 0 0 
Antibiotic resistance prediction in minutes
NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned an mRNA/miRNA, which can be counted after bonding with its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.
0101
0 0 0 
R package that uses two main functions to summarize and visualize NanoString RCC files,
namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates
sample specific size factors and normalises the data.
For more information, see vignette("NACHO") and vignette("NACHO-analysis").
NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned an mRNA/miRNA, which can be counted after bonding with its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.
0101
0 0 0 0 
R package that uses two main functions to summarize and visualize NanoString RCC files,
namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates
sample specific size factors and normalises the data.
For more information, see vignette("NACHO") and vignette("NACHO-analysis").
nail search is a fast and scalable tool for searching protein sequences against protein databases
0101write_align
0 0 0 0 
Profile Hidden Markov Model (pHMM) biological sequence alignment tool
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
012
0 0 0 0 0 0 0 0 0 
nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
01
0 0 0 0 
Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
01output_format
0 0 0 
Create DRAGEN hashtable for reference genome
01
0 0 
narfmap is a fork of the Dragen mapper/aligner Open Source Software.
A tool to quickly download assemblies from NCBI's Assembly database
metaaccessionstaxidsgroups
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose
0101
0 0 
NCBI libraries for biology applications (text-based utilities)
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
datasettag
0 0 
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
01dataset
0 0 0 0 0 0 0 0 0 0 0 
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
Performs fastq alignment to a fasta reference using NextGenMap
01fasta
0 0 
NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime
Merging paired-end reads and removing sequencing adapters.
01
0 0 0 0 
Annotates GC content fraction to regions in a BED file.
010101
0 0 
Short-read sequencing tools
Annotates a BED file with the average coverage of the regions from one or several BAM/CRAM file(s).
01230101
0 0 
Short-read sequencing tools
Determines the gender of a sample from the BAM/CRAM file.
0120101method
0 0 
Short-read sequencing tools
UPD detection from trio variant data.
012
0 0 0 
Short-read sequencing tools
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.
0101
0 0 
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.
010101
0 0 0 0 0 0 
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
010101
0 0 
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
01
0 0 0 0 0 
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
write your description here
metareadsformatmode
meta versions npa npc npl npo 
Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file
01
0 0 
Estimate average coverage and create curves for metagenomic datasets
Calculate metagenome redundancy curve from FASTQ files
01formatmode
0 0 0 0 0 
Estimate average coverage and create curves for metagenomic datasets
Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files
01
0 0 0 0 0 
Estimate average coverage and create curves for metagenomic datasets
Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image
01
0 0 
Estimate average coverage and create curves for metagenomic datasets
NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.
012
0 0 0 
Construct a dynamic succinct variation graph in ODGI format from a GFAv1.
01
0 0 
An optimized dynamic genome/graph implementation
Draw previously-determined 2D layouts of the graph with diverse annotations.
012
0 0 
An optimized dynamic genome/graph implementation
Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.
01
0 0 0 
An optimized dynamic genome/graph implementation
Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.
01
0 0 
An optimized dynamic genome/graph implementation
Squeezes multiple graphs in ODGI format into the same file in ODGI format.
01
0 0 
An optimized dynamic genome/graph implementation
Metrics describing a variation graph and its path relationship.
01
0 0 0 
An optimized dynamic genome/graph implementation
Merge unitigs into a single node preserving the node order.
01
0 0 
An optimized dynamic genome/graph implementation
Project a graph into other formats.
01
0 0 
An optimized dynamic genome/graph implementation
Visualize a variation graph in 1D.
01
0 0 
An optimized dynamic genome/graph implementation
Create a decoy peptide database from a standard FASTA database.
01
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Filters peptide/protein identification results by different criteria.
01
0 0 0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Filters peptide/protein identification results by different criteria.
012
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Calculates a distribution of the mass error from given mass spectra and IDs.
012
0 0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Merges several idXML files into one idXML file.
01
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Split a merged identification file into its originating identification files
01
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Switches between different scores of peptide or protein hits in identification data
01
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)
01
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Refreshes the protein references for all peptide hits.
012
0 0 
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Annotates MS/MS spectra using Comet.
012
0 0 0 
flip corrects probes that are aligning to the opposite strand of their intended target genes by reverse complementing them
01012
0 0 
opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity
stat summarizes opt binding predictions
0101gene_synonyms
0 0 
opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity
track aligns query probe sequences to any target transcriptome
01012
0 0 
opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.
0101
0 0 0 
A python library and a command-line client for up- and downloading files to and from your Open Science Framework projects
012
0 0 
The osfclient is a python library and a command-line client for up- and downloading files to and from your Open Science Framework projects.
A program to convert bam into paf.
01
0 0 
A program to manipulate paf files / convert to and from paf.
A tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates
01
0 0 
Find and remove PCR/optical duplicates
01
0 0 0 
CLI tools to process mapped Hi-C data
Flip pairs to get an upper-triangular matrix
01chromsizes
0 0 
CLI tools to process mapped Hi-C data
Merge multiple pairs/pairsam files
01
0 0 
CLI tools to process mapped Hi-C data
Find ligation junctions in .sam, make .pairs
01chromsizes
0 0 0 
CLI tools to process mapped Hi-C data
Assign restriction fragments to pairs
01frag
0 0 
CLI tools to process mapped Hi-C data
Select pairs according to given condition by options.args
01
0 0 0 
CLI tools to process mapped Hi-C data
Sort a .pairs/.pairsam file
01
0 0 
CLI tools to process mapped Hi-C data
Split a .pairsam file into .pairs and .sam.
01
0 0 0 
CLI tools to process mapped Hi-C data
Calculate pairs statistics
01
0 0 
CLI tools to process mapped Hi-C data
Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file
01bed_subsetbed_excludetsv_groupby
0 0 
panacus is a tool for computing counting statistics for GFA files
Create visualizations from a tsv coverage histogram created with panacus.
01
0 0 
panacus is a tool for computing counting statistics for GFA files
A fast and scalable tool for bacterial pangenome analysis
01
0 0 0 
panaroo - an updated pipeline for pangenome investigation
Phylogenetic Assignment of Named Global Outbreak LINeages
01db
0 0 
Phylogenetic Assignment of Named Global Outbreak LINeages
Phylogenetic Assignment of Named Global Outbreak LINeages
dbname
0 0 
Phylogenetic Assignment of Named Global Outbreak LINeages
NVIDIA Clara Parabricks GPU-accelerated apply Base Quality Score Recalibration (BQSR).
0101010101
0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database
0123
0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating DeepVariant.
012301
0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.
0101010101output_fmt
0 0 0 0 0 0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence
010101known_sites
0 0 0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs
0101
0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK HaplotypeCaller.
012301
0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.
01
0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated minimap2 for aligning long read sequences against a large reference database using an accelerated KSW2 to convert FASTQ to BAM/CRAM.
01010101output_fmt
0 0 0 0 0 0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
This tool is the equivalent of fq2bam for RNA-Seq samples, receiving inputs in FASTQ format, performing alignment with the splice-aware STAR algorithm, optionally marking duplicate reads, and outputting an aligned BAM file ready for variant and fusion calling.
01010101
0 0 0 0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
This tool uses the GPU to perform fusion calling for RNA-Seq samples, utilizing the STAR-Fusion algorithm. This requires input of a genome resource library, in accordance with the original STAR-Fusion tool, and outputs candidate fusion transcripts.
0101
0 0 0 
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Download the STAR-Fusion genome resource required to run the STAR-Fusion caller
0101fusion_annot_libdfam_speciespfam_urldfam_urlsannot_filter_url
0 0 
Fusion calling algorithm for RNAseq data
This is nearly identical to the existing star/genomegenerate; however, it runs on an older version (2.7.2a) that is required for Parabricks compatibility.
0101
0 0 
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Determines the depth in a BAM/CRAM file
0120101
0 0 0 
Graph realignment tools for structural variants
Genotype structural variants using paragraph and grmpy
0123450101
0 0 0 
Graph realignment tools for structural variants
Convert a VCF file to a JSON graph
0101
0 0 
Graph realignment tools for structural variants
The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.
01
0 0 0 
PacBio BAM C++ library
Identify specific base modifications in PacBio HiFi reads by analyzing polymerase kinetic signatures
01
0 0 
Alignment with PacBio's minimap2 frontend
0101
0 0 
A minimap2 frontend for PacBio native data formats
pbsv - PacBio structural variant (SV) signature discovery tool
0101
0 0 
pbsv - PacBio structural variant (SV) calling and analysis tools
Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq
012
0 0 
pbtk - PacBio BAM toolkit
Minimalistic tool which creates an index file that enables random access into PacBio BAM files
01
0 0 
pbtk - PacBio BAM toolkit
This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
01
0 0 0 0 
Predict prophages in bacterial genomes
01
0 0 0 0 0 0 0 0 0 0 0 0 
Prophage finder using multiple metrics
phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.
01silva_dbunivec_db
0 0 
Assigns all the reads in a file to a single new read-group
010101
0 0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates an interval list from a bed file and a reference dict
0101
0 0 
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
01
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the alignment summary of a paired-end library.
0101
0 0 
Java tools for working with NGS data in the BAM format
Collects hybrid-selection (HS) metrics for a SAM or BAM file.
0123401010101
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the insert size distribution of a paired-end library.
01
0 0 0 
Java tools for working with NGS data in the BAM format
Collect multiple metrics from a BAM file
0120101
0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics from a RNAseq BAM file
01ref_flatfastarrna_intervals
0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
0120101intervallist
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates a sequence dictionary for a reference sequence.
01
0 0 
Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.
Checks that all data in the set of input files appear to come from the same individual
01234501
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.
012haplotype_mapfastafasta_faisequence_dictionary
0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Converts a FASTQ file to an unaligned BAM or SAM file.
01
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list
012filter
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Verify mate-pair information between mates and fix if needed
01
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Lifts over a VCF file from one reference build to another.
01010101
0 0 0 
Move annotations from one assembly to another
Locate and tag duplicate reads in a BAM file
010101
0 0 0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the mean quality by cycle of a paired-end library.
01
0 0 0 
Java tools for working with NGS data in the BAM format
Merges multiple BAM files into a single file
01
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads
012
0 0 0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Changes the name of the sample in a VCF file
01
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases
010101
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
This tool takes in a coordinate-sorted SAM or BAM and calculates the NM, MD, and UQ tags by comparing with the reference.
0101
0 0 0 
Sorts BAM/SAM files based on a variety of Picard-specific criteria
01sort_order
0 0 
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Sorts VCF files
010101
0 0 
Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format
Splits a SAM/BAM/CRAM file to multiple files. This tool splits the input query-grouped SAM/BAM/CRAM file into multiple files while maintaining the sort order. This can be used to split a large unmapped input in order to parallelize alignment.
01012split_to_N_readssplit_to_N_filesarguments_file
0 0 
Splits a SAM or BAM file to multiple BAMs by number of reads.
Compresses files with pigz.
01
0 0 
Parallel implementation of the gzip algorithm.
write your description here
01
0 0 
Parallel implementation of the gzip algorithm.
Automatically improve draft assemblies and find variation among strains, including large event detection
01012pilon_mode
0 0 0 0 0 0 
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
012fastafaibed
0 0 0 0 0 0 0 0 0 0 0 
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Main caller script for peak calling
012assay_type
0 0 0 0 0 
Peak Identifier for Nascent Transcripts Starts (PINTS)
Identify plasmids in bacterial sequences and assemblies
01
0 0 0 0 0 0 
Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0123010101
0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Exclude variant identifiers from plink bfiles
01234
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink bfiles with a text file of variant identifiers
01234
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
0123010101
0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Calculates identity-by-descent over autosomal SNPs
0123
0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate GWAS association studies
0123010101
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate Hardy-Weinberg statistics for provided input
01230101
0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.
0123window_sizevariant_countvariance_inflation_factor
0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.
0123window_sizevariant_countr2_threshold
0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
LD analysis in PLINK examines genetic variant associations within populations
0123010101
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Recodes plink bfiles into a new text fileset applying different modifiers
0123
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink pfiles with a text file of variant identifiers
01234
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Filters plink bfiles or pfiles with filters such as maf or var
0123
0 0 0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Calculate Inbreeding data with plink2
0123
0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Produce a pruned set of variants in approximate linkage equilibrium
0123winstepr2
0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Perform PCA analysis using PLINK
012345
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Remove samples from a plink2 dataset
0123sample_exclude_list
0 0 0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Apply a scoring system to each sample in a plink 2 fileset
0123scorefile
0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Import variant genetic data using plink2
01
0 0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Convert from VCF file to BGEN file version 1.2 format preserving dosages.
01234
0 0 0 0 
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes.
0101010101010101
0 0 
pmdtools command to filter ancient DNA molecules from others
012thresholdreference
0 0 
Compute postmortem damage patterns and decontaminate ancient genomes
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
01
0 0 0 
Polishing genome assemblies with short reads.
0101save_debug
0 0 0 
Polishing genome assemblies with short reads.
PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.
0101012
0 0 0 0 
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.
0123
0 0 
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to pile up reads and corresponding base qualities for each overlapping SNP and each barcode.
012
0 0 0 0 0 
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.
012
0 0 0 0 0 0 
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
01custom_adapters
0 0 0 
Adapter removal and demultiplexing of Oxford Nanopore reads
01
0 0 0 
Adapter removal and demultiplexing of Oxford Nanopore reads
Run all Portcullis steps in one go
010101
0 0 0 0 0 0 0 0 
Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.
Software for predicting library complexity and genome coverage in high-throughput sequencing
01
0 0 0 
Software for predicting library complexity and genome coverage in high-throughput sequencing
Software for predicting library complexity and genome coverage in high-throughput sequencing
01
0 0 0 
Software for predicting library complexity and genome coverage in high-throughput sequencing
Calculate pairwise nucleotide identity with respect to a reference sequence
0101compress
0 0 0 0 0 
Filter reads by quality score.
01
0 0 0 0 
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
A module to generate images from Pretext contact maps.
01
0 0 
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
01
0 0 0 0 0 
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
01output_format
0 0 0 0 0 
Whole genome annotation of small genomes (bacterial, archaeal, viral)
01proteinsprodigal_tf
0 0 0 0 0 0 0 0 0 0 0 0 0 
frame-shift correction for long read (meta)genomics - fix frameshifts in reads
0101
0 0 
frame-shift correction for long read (meta)genomics
frame-shift correction for long read (meta)genomics - maps proteins to reads
012
0 0 
frame-shift correction for long read (meta)genomics
Perform Gene Ratio Enrichment Analysis
0101
0 0 0 
Gene Ratio Enrichment Analysis
Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)
01
0 0 0 
Logratio methods for omics data
Perform differential proportionality analysis
0123012
0 0 0 0 0 0 0 0 0 
Logratio methods for omics data
Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed, if required.
01
0 0 0 0 0 0 0 
Logratio methods for omics data
Efficient Estimation of Covariance and (Partial) Correlation
Proseg (probabilistic segmentation) is a cell segmentation method for in situ spatial transcriptomics.
01mode012
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Convert proseg outputs to baysor format for import to Xenium explorer
0101
0 0 0 
Proseg (probabilistic segmentation) is a cell segmentation method for in situ spatial transcriptomics.
Proteinortho is a tool to detect orthologous genes within different species.
01
0 0 0 0 
Reads a MaxQuant proteinGroups file with Proteus
012
0 0 0 0 0 0 0 0 0 0 
R package for analysing proteomics data
Calculate interval coverage for each sample. N.B. the tool cannot handle files staged as symlinks; stageInMode should be set to 'link'.
012intervals
0 0 0 0 0 
Copy number calling and SNV classification using targeted short read sequencing
Generate on and off-target intervals for PureCN from a list of targets
0101genome
0 0 0 
Copy number calling and SNV classification using targeted short read sequencing
Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.
0123genomeassay
0 0 0 0 0 0 
Copy number calling and SNV classification using targeted short read sequencing
Run PureCN workflow to normalize, segment and determine purity and ploidy
0123normal_dbmapping_biasgenome
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Copy number calling and SNV classification using targeted short read sequencing
Calculate coverage cutoffs to determine when to purge duplicated sequence.
01
0 0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Separates out sequences purged of falsely duplicated sequences.
012
0 0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Plots the read coverage from a purge dups statistics file and cutoffs.
012
0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Create a read depth histogram and base-level read depth for an assembly based on PacBio data
01
0 0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Purge haplotigs and overlaps for an assembly
0123
0 0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Split fasta file by 'N's to aid in self alignment for duplicate purging
01
0 0 
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
PyClone-VI is software for inferring the clonal population structure of cancers by using variant allele frequencies and copy number data of single or multiple samples.
012
0 0 0 0 0 
Damage parameter estimation for ancient DNA
012
0 0 
Damage parameter estimation for ancient DNA
Damage parameter estimation for ancient DNA
01
0 0 
Damage parameter estimation for ancient DNA
Compute summary statistics for control gene from BAM files.
012control_gene
0 0 
A Python package for pharmacogenomics research
Call SNVs/indels from BAM files for all target genes.
01201
0 0 0 
A Python package for pharmacogenomics research
Prepare a depth of coverage file for all target genes with SV from BAM files.
012
0 0 
A Python package for pharmacogenomics research
PyPGx pharmacogenomics genotyping pipeline for NGS data.
01234501
0 0 0 0 
A Python package for pharmacogenomics research
Short read polisher for long read assemblies
0101
0 0 0 0 
Standalone Python re-implementation of the POLCA polisher from MaSuRCA
Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.
01output_format
0 0 0 0 0 
Evaluate alignment data
01gff
0 0 
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
012gfffastafasta_fai
0 0 
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
0101
0 0 
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Convert DIA-NN results to mzTab, MSstats, and Triqler formats using quantms-utils
01234567
0 0 0 0 0 
quantms-utils is a Python package for proteomics data processing
Generate DIA-NN configuration arguments from enzyme and Unimod modification parameters. This module uses quantms-utils to convert enzyme and modification specifications into DIA-NN-compatible command-line arguments. Only supports Unimod modifications. For custom modifications, pass arguments directly to the DIANN module.
0123
0 0 0 
quantms-utils is a Python package with scripts and functions for quantitative proteomics data analysis. The dianncfg command converts enzyme and modification parameters to DIA-NN-compatible format.
Generate statistics from mzML files using quantms-utils
01
0 0 0 0 0 
A Python package for quantitative mass spectrometry data analysis
QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.
012345678910111213141501
0 0 0 0 0 
Read aware low coverage whole genome sequence imputation from a reference panel
Homology-based assembly patching: Make continuous joins and fill gaps in 'target.fa' using sequences from 'query.fa'
01010101
0 0 0 0 0 0 0 0 0 0 
Fast reference-guided genome assembly scaffolding
Scaffolding is the process of ordering and orienting draft assembly (query) sequences into longer sequences. Gaps (stretches of "N" characters) are placed between adjacent query sequences to indicate the presence of unknown sequence. RagTag uses whole-genome alignments to a reference assembly to scaffold query sequences. RagTag does not alter input query sequence in any way and only orders and orients sequences, joining them with gaps.
010101012
0 0 0 0 
Fast reference-guided genome assembly scaffolding
Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.
alignment
0 0 0 
Assess positive C->T conversion as a readout for methylation on a genome-wide basis
010101010101
0 0 
A tool for rapid genome-wide assessment of positive C->T conversion as a readout for methylation.
Assess C->T conversion as a readout for methylation on a per-read-position basis.
01010101
0 0 
A tool for rapid genome-wide assessment of positive C->T conversion as a readout for methylation.
Parses Rastair mbias output to assess the ideal cutoff for read trimming and reports the values.
01
0 0 0 0 
A tool for rapid genome-wide assessment of C->T conversion as a readout for methylation. The module uses R scripts bundled with Rastair.
Parses Rastair mbias output to assess the ideal cutoff for read trimming and reports the values.
01
0 0 0 0 
A tool for rapid genome-wide assessment of C->T conversion as a readout for methylation. The module uses R scripts bundled with Rastair.
Parses rastair call output and converts it into a MethylKit-compatible format.
01
0 0 
A tool for rapid genome-wide assessment of C->T conversion as a readout for methylation. The module uses a script bundled with Rastair to convert formats.
Randomly subsample sequencing reads to a specified coverage
012depth_cutoff
0 0 
De novo genome assembler for long uncorrected reads.
01
0 0 0 
RAxML-NG is a phylogenetic tree inference tool which uses maximum-likelihood (ML) optimality criterion.
012
0 0 0 
Extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format.
012
0 0 
RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.
Screening DNA sequences for interspersed repeats and low complexity DNA sequences
01lib
0 0 0 0 0 
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
A utility script that helps convert old RepeatMasker *.out files to version 3 GFF files.
01
0 0 
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
Create a database for RepeatModeler
01
0 0 
RepeatModeler is a de-novo repeat family identification and modeling package.
Performs de novo transposable element (TE) family identification with RepeatModeler
01
0 0 0 0 
RepeatModeler is a de-novo repeat family identification and modeling package.
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
012db_pointdb_res
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria
Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data
card
0 0 0 0 
This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.
Predict antibiotic resistance from protein or nucleotide data
01cardwildcard
0 0 0 0 0 0 
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Markup VCF file using rho-calls.
01201bed
0 0 
Call regions of homozygosity and make tentative UPD calls.
Call regions of homozygosity and make tentative UPD calls
0101
0 0 0 
Call regions of homozygosity and make tentative UPD calls.
Quality control of Ribo-seq BAM data
012012012010101
0 0 0 0 
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Quality control of Ribo-seq BAM data
01201
0 0 0 0 
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Accurate detection of short and long active ORFs using Ribo-seq data
01201
0 0 0 0 0 0 0 0 0 0 0 
Python package to detect translating ORF from Ribo-seq data
Accurate detection of short and long active ORFs using Ribo-seq data
012
0 0 
Python package to detect translating ORF from Ribo-seq data
Render an rmarkdown notebook. Supports parametrization.
01parametersinput_files
0 0 0 0 0 
Dynamic Documents for R
Assess the quality of an RNAseq assembly with or without a reference genome
010101
0 0 
Calculate pan-genome from annotated bacterial assemblies in GFF3 format
01
0 0 0 
Calculate expression with RSEM
01index
0 0 0 0 0 0 0 0 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Prepare a reference genome for RSEM
fastagtf
0 0 0 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Generate statistics from a BAM file
01
0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Infer strandedness from sequencing reads
01bed
0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate inner distance between read pairs.
01bed
0 0 0 0 0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to a reference gene model
01bed
0 0 0 0 0 0 0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to a reference gene model
01bed
0 0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
01bed
0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate read duplication rate
01
0 0 0 0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
012bed
0 0 0 
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
The bndeval tool of RTG tools. It is used to evaluate called BND type of variants for agreement with a BND baseline variant set
012345
0 0 0 0 0 0 0 0 0 0 0 
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Converts a PED file to VCF headers
01
0 0 
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI is not available when running under Nextflow.
01
0 0 0 
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The svdecompose tool of RTG tools. It is used to decompose structural variants to BNDs
012
0 0 0 
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set
012345601
0 0 0 0 0 0 0 0 0 0 0 0 0 0 
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Uses the RTN R package for transcriptional regulatory network inference (TNI).
01
0 0 0 0 0 
RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons
CAZyme annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
01dbcan_db
0 0 0 0 0 
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Command from run_dbcan that prepares the database for dbCAN annotation.
NO input
0 0 
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
CGC annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
01012dbcan_db
0 0 0 0 0 0 0 0 0 0 
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Substrate annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.
01012dbcan_db
0 0 0 0 0 0 0 0 0 0 0 0 0 
Standalone version of dbCAN annotation tool for automated CAZyme annotation.
Prediction of a protein's secondary structure from its amino acid sequence
01
0 0 
Accurate prediction of a protein's secondary structure from its amino acid sequence
sage is a search software for proteomics data
010101
0 0 0 0 0 0 
Proteomics searching so fast it feels like magic.
Create index for salmon
genome_fastatranscript_fasta
0 0 
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
gene/transcript quantification with Salmon
01indexgtftranscript_fastaalignment_modelib_type
0 0 0 0 
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
SALSA, a tool to scaffold long read assemblies with Hi-C
012bedgfadupfilter_bed
0 0 0 0 
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
012database
0 0 0 0 
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
Outputs some statistics drawn from read flags.
01
0 0 
Tools for working with SAM/BAM data
find and mark duplicate reads in BAM file
01
0 0 0 
process your BAM data faster!
This module combines samtools and samblaster to use samblaster's ability to filter or tag SAM files while keeping both input and output in BAM format. Samblaster input must contain a sequence header, so the input is piped through the "samtools view -h" command. Additional samtools arguments can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.
01
0 0 
Module to validate illumina® Sample Sheet v2 files.
01file_schema_validator
0 0 
Clips read alignments where they match BED file defined regions
01bedsave_cliprejectssave_clipstats
0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
01split
0 0 
Tools for dealing with SAM, BAM and CRAM files
Outputs a FASTA file compressed with the BGZF algorithm
01
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
calculates MD and NM tags
0101
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Concatenate BAM or CRAM file
01
0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
shuffles and groups reads together by their names
0101
0 0 0 0 
Tools for dealing with SAM, BAM and CRAM files
The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0101interleave
0 0 0 0 0 
Tools for dealing with SAM, BAM and CRAM files
Produces a consensus FASTA/FASTQ/PILEUP
01
0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
convert and then index CRAM -> BAM or BAM -> CRAM file
0120101
0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
produces a histogram or table of coverage per chromosome
0120101
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
List CRAM Content-ID and Data-Series sizes
01
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Computes the depth at each position or region.
0101
0 0 
Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region
Create a sequence dictionary file from a FASTA file
01
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index FASTA file, and optionally generate a file of chromosome sizes
0101get_sizes
0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a SAM/BAM/CRAM file to FASTA
01interleave
0 0 0 0 0 
Tools for dealing with SAM, BAM and CRAM files
Converts a SAM/BAM/CRAM file to FASTQ
01interleave
0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
01
0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
012
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
01
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Reports alignment summary statistics for a BAM/CRAM/SAM file
012
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
converts FASTQ files to unmapped SAM/BAM/CRAM
01
0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index SAM/BAM/CRAM file
01
0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
mark duplicate alignments in a coordinate sorted file
0101
0 0 0 0 
Tools for dealing with SAM, BAM and CRAM files
Merge BAM or CRAM file
01010101
0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Generate text pileup output for one or multiple BAM files. Each input file produces a separate group of pileup columns in the output.
01201
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
01
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file
0101
0 0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Sort SAM/BAM/CRAM file
0101index_format
0 0 0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces comprehensive statistics from SAM/BAM/CRAM file
01201
0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
01201qnameindex_format
0 0 0 0 0 0 0 0 0 
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
SV candidate discovery from PacBio HiFi data
01201010101
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Joint structural variant and copy number variant caller for HiFi sequencing data
Joint calling of structural variants from multiple samples using Sawfish
010101201
0 0 0 0 0 0 0 0 0 0 0 
Joint structural variant and copy number variant caller for HiFi sequencing data
Filter cells and genes in single-cell RNA-seq data using Scanpy
01000000
h5ad versions 
Single-Cell Analysis in Python
Probabilistic demultiplexing of cell hashing data
012
0 0 0 0 
Single-cell analysis in Python. Scales to >100M cells.
Perform principal component analysis (PCA) on single-cell RNA-seq data using Scanpy
01key_added
0 0 0 
Single-Cell Analysis in Python
Detect doublets in single-cell RNA-seq data using Scrublet via Scanpy
01batch_col
0 0 0 
Single-Cell Analysis in Python
SCIMAP is a suite of tools that enables spatial single-cell analyses
01
0 0 0 
Scimap is a scalable toolkit for analyzing spatial molecular data.
SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.
01
0 0 0 0 
Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.
The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier
0101mei_ref
0 0 0 0 
Soft Clipped Read Alignment Mapper
The cluster_identifier tool of Scramble identifies soft clipped clusters
01201
0 0 
Soft Clipped Read Alignment Mapper
Module to use scAR to remove ambient RNA from single-cell RNA-seq data
012input_layeroutput_layermax_epochsn_batch
0 0 
scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data
scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.
Detect doublets in single-cell RNA-Seq data
01batch_keymax_epochs
0 0 0 
A scalable toolkit for probabilistic modeling applied to single-cell omics data
Call peaks using SEACR on sequenced reads in bedgraph format
012threshold
0 0 
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
01fastaindex
0 0 0 0 0 
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
Generate genome indices for segemehl align
fasta
0 0 
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
metagenomic binning with self-supervised learning
012
0 0 0 0 0 
Metagenomic binning with semi-supervised siamese neural network
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced by the previous VarCal step and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
0123450101
0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Create BWA index for reference genome
01
0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
01010101
0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the Picard CollectVariantCallingMetrics tool.
012012010101
0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the GATK DepthOfCoverage tool.
01201010101
0 0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects multiple quality metrics from a bam file
0120101plot_results
0 0 0 0 0 0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
0120101
0 0 0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
012010101
0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
01230101010101pcr_indel_modelemit_vcfemit_gvcf
0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
012301010101
0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
0123401010101emit_vcfemit_gvcf
0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Generate recalibration table and optionally perform base quality recalibration
0120101010101generate_recalibrated_bams
0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.
0120101
0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Calculate expression with RSEM
01index
0 0 0 0 0 0 0 0 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Prepare a reference genome for RSEM
fastagtf
0 0 0 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Align reads to a reference genome using Sentieon STAR
010101star_ignore_sjdbgtfseq_platformseq_center
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Filters the raw output of sentieon/tnhaplotyper2.
01234560101
0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
012301010101010101emit_orientation_dataemit_contamination_data
0 0 0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
01230101010101010101
0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
012resource_vcfresource_tbilabelsfastafai
0 0 0 0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects whole genome quality metrics from a bam file
012010101
0 0 
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.
01
0 0 
Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and their abundance in all the samples of the project.
Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.
01
0 0 
DNA sequence utilities for FASTX files
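The dereplication described in this entry can be illustrated with a minimal Python sketch. It is not the tool's implementation: the `dereplicate` function name and the `;size=N` header annotation are assumptions chosen for illustration, and the sketch does not sum counts from already dereplicated input.

```python
from typing import Dict, Iterable, List, Tuple


def dereplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse identical sequences and record how often each one occurred.

    `records` is an iterable of (name, sequence) pairs. The first name seen
    for a sequence is kept, and the count is appended as ';size=N'
    (a hypothetical header convention used here for illustration only).
    """
    seen: Dict[str, List] = {}
    for name, sequence in records:
        key = sequence.upper()  # compare sequences case-insensitively
        if key not in seen:
            seen[key] = [name, 0]
        seen[key][1] += 1
    # Dicts preserve insertion order, so output follows first occurrence.
    return [(f"{name};size={count}", seq) for seq, (name, count) in seen.items()]


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTTT")]
unique = dereplicate(reads)
```

Here `unique` holds two records: the first two reads collapse into one entry carrying a count of 2, and the third read stays on its own with a count of 1.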
Statistics for FASTA or FASTQ files
01
0 0 0 
Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files
Concatenating multiple uncompressed sequence files together
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Convert FASTQ to FASTA format
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Select sequences from a large file based on name/ID
01pattern
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
match up paired-end reads from two fastq files
01
0 0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Use seqkit to find/replace strings within sequences and sequence headers
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
01
0 0 0 
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Sanitize broken single line FASTQ files
01
0 0 0 
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
01
0 0 
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Use seqkit to generate sliding windows of input fasta
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Sorts sequences by id/name/sequence/length
01
0 0 
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Split single or paired-end fastq.gz files
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
simple statistics of FASTA/Q files
01
0 0 
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Translate DNA/RNA to protein sequence
01
0 0 
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Salmonella serotype prediction from reads and assemblies
01
0 0 0 0 
Generates a BED file containing genomic locations of lengths of N.
01
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
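As a rough illustration of what "a BED file containing genomic locations of lengths of N" means, the sketch below scans a sequence for runs of N and reports them as BED-style intervals (0-based start, exclusive end). It is a minimal Python sketch and not the seqtk implementation; the `n_runs_to_bed` function and its `min_len` parameter are hypothetical names.

```python
import re
from typing import List, Tuple


def n_runs_to_bed(name: str, sequence: str, min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for every run of N or n in `sequence`.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    Runs shorter than `min_len` (a hypothetical threshold) are skipped.
    """
    intervals = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_len:
            intervals.append((name, match.start(), match.end()))
    return intervals


intervals = n_runs_to_bed("chr1", "ACGTNNNNACGTNNAC")
for chrom, start, end in intervals:
    print(f"{chrom}\t{start}\t{end}")
```

Raising `min_len` keeps only the longer gaps, which is usually what matters when locating scaffold gaps in an assembly.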
Interleave paired-end reads from FastQ files
01
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
Rename sequence names in FASTQ or FASTA files.
01
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.
Subsample reads from FASTQ files
012
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Common transformation operations on FASTA or FASTQ files.
01
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.
Select only sequences that match the filtering condition
01filter_list
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Trim low quality bases from FastQ files
01
0 0 
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Sequence quality metrics for FASTQ and uBAM files.
01
0 0 0 
PileupCaller is a tool to create genotype calls from bam files using read-sampling methods
01snpfilesample_names_fn
0 0 0 0 
Tools for population genetics on sequencing data
Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file
012fastawigfile
0 0 
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal) and genome-wide GC-content information to extract the common positions with A and B allele frequencies.
Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.
01
0 0 
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.
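The windowed GC computation described for gc_wiggle can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions: it is not the Sequenza-utils code, it does not write the UCSC wiggle format, it ignores special handling of N bases, and the `gc_percent_windows` name is hypothetical.

```python
from typing import List, Tuple


def gc_percent_windows(sequence: str, window: int) -> List[Tuple[int, float]]:
    """Return (window_start, GC percentage) for consecutive fixed-size windows.

    The last window may be shorter than `window` when the sequence length
    is not a multiple of the window size.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc_count = sum(1 for base in chunk if base in "GC")
        results.append((start, 100.0 * gc_count / len(chunk)))
    return results


profile = gc_percent_windows("GGCCAATT", 4)
```

In this toy input the first window is entirely G/C and the second entirely A/T, so `profile` contains one window at 100% and one at 0%.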
Induce a variation graph in GFA format from alignments in PAF format
012
0 0 
seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
01
0 0 0 
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.
01sample_list_file
0 0 0 
Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.
012
0 0 
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.
0123401201201
0 0 
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires AVX2 support.
01234012301
0 0 
Fast and accurate method for estimation of haplotypes (phasing)
Program to compute switch error rate and genotyping error rate given simulated or trio data.
01234012012
0 0 
Fast and accurate method for estimation of haplotypes (phasing)
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Note that the assembler is designed to focus on speed, so the assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
01
0 0 0 0 
Determine Shigella serotype from Illumina or Oxford Nanopore reads
01
0 0 0 
Determine Shigella serotype from assemblies or Illumina paired-end reads
01
0 0 
build and deploy Shiny apps for interactively mining differential abundance data
0123012contrast_stats_assay
0 0 
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make plots for interpretation of differential abundance statistics
010123
0 0 0 
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots
01234
0 0 0 0 0 0 0 0 0 0 0 0 
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
validate consistency of feature and sample annotations with matrices and contrasts
0120101
0 0 0 0 0 
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
A windowed adaptive trimming tool for FASTQ files using quality
012
0 0 0 0 0 
mutational signature deconvolution of cancer cells
01genomegenome_installed_path
0 0 
Sigprofilermatrixgenerator is a Python-based tool. It creates mutational matrices for all types of somatic mutations (SBS, DBS, and IDs)
SigProfilerExtractor is a Python-based tool. It allows de novo extraction of mutational signatures from data generated in a matrix format, identification of the number of operative mutational signatures, and their activities in each sample.
Indexing of transcriptome for gene expression quantification using SimpleAF
012010101
0 0 0 0 
SimpleAF is a tool for quantification of gene expression from RNA-seq data
simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
0120120123resolution01
0 0 0 
SimpleAF is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
Calculate pairwise distances and basic clustering from SKA sketches
012
0 0 0 0 0 
SKA (Split Kmer Analysis)
Create genome sketch using split k-mers
012
0 0 
SKA (Split Kmer Analysis)
Simple ANI calculation between reference and query genomes.
0101
0 0 
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
Memory-efficient ANI database queries with skani.
0101
0 0 
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
Storing skani sketches/indices on disk.
01
0 0 0 0 
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
All-to-all ANI computation.
01
0 0 
skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.
Complete SLAMseq analysis pipeline including read mapping, filtering, SNP calling, and quantification
01010101
0 0 0 0 0 0 0 0 
Slamdunk is a software tool for SLAMseq data analysis that performs mapping, filtering, SNP calling, and quantification of metabolic RNA labeling experiments.
Slamdunk read mapping using NextGenMap’s SLAMSeq alignment settings.
0101
0 0 
Slamdunk is a software tool for SLAMseq data analysis that performs mapping, filtering, SNP calling, and quantification of metabolic RNA labeling experiments.
tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.
012
0 0 0 
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
01230101
0 0 
structural variant calling and genotyping with existing tools, but, smoothly
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on an input snakefile. Expect many limitations.
0101
0 0 0 
Create a SNAP index for reference genome
01234
0 0 
Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
structural-variant calling with sniffles
0120101vcf_outputsnf_output
0 0 0 0 
Core-SNP alignment from Snippy outputs
012reference
0 0 0 0 0 0 
Rapid bacterial SNP calling and core genome alignments
Rapid haploid variant calling
01reference
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Rapid bacterial SNP calling and core genome alignments
Genetic variant annotation and functional effect prediction toolbox
01
0 0 
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Genetic variant annotation and functional effect prediction toolbox
01db01
0 0 0 0 0 
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Annotate a VCF file with another VCF file
012012
0 0 
SnpSift is a toolbox that allows you to filter and manipulate annotated files
The dbNSFP is an integrated database of functional predictions from multiple algorithms
012012
0 0 
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Splits/Joins VCF(s) file into chromosomes
01
0 0 
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
01012
0 0 0 
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
012010101
0 0 
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
012sample_groups
0 0 0 0 
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Local sequence alignment tool for filtering, mapping and clustering.
010101
0 0 0 0 
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
souporcell is a method for clustering mixed-genotype scRNAseq experiments by individual.
01201clusters
0 0 0 
Classifies and predicts the origin of metagenomic samples
01sourceslabelstaxa_sqlitetaxa_sqlite_traverse_pkl
0 0 
Compare many FracMinHash signatures generated by sourmash sketch.
01file_listsave_numpy_matrixsave_csv
0 0 0 0 
Compute and compare FracMinHash signatures for DNA and protein data sets.
Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.
01databasesave_unassignedsave_matches_sigsave_prefetchsave_prefetch_csv
0 0 0 0 0 0 
Compute and compare FracMinHash signatures for DNA data sets.
Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.
01ksize
0 0 
Compute and compare FracMinHash signatures for DNA data sets.
Create a signature (a group of FracMinHash sketches) of a sequence using sourmash
01
0 0 
Compute and compare FracMinHash signatures for DNA and protein data sets.
Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.
01taxonomy
0 0 
Compute and compare FracMinHash signatures for DNA data sets.
Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data
0123456789referenceprobeset
0 0 
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.
gtf
0 0 
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.
fastagtfreference_name
0 0 
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Assembles a small genome (bacterial, fungal, viral)
0123ymlhmm
0 0 0 0 0 0 0 0 
mutational signature deconvolution of cancer cells
01genome
0 0 0 0 0 0 0 
SparseSignatures is an R-based computational framework which performs de novo extraction, inference, interpretation, or deconvolution of mutational counts of a large number of patients.
Reference Genome Sequence (hs37d5), based on NCBI GRCh37
Full genomic sequences for Homo sapiens (UCSC genome hg38)
Spotiflow, accurate and efficient spot detection with stereographic flow.
01
0 0 
Fast, efficient, lossless compression of FASTQ files.
012
0 0 
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Fast, efficient, lossless decompression of FASTQ files.
01write_one_fastq_gz
0 0 
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
01ncbi_settingscertificate
0 0 
SRA Toolkit and SDK from NCBI
Download sequencing data from the NCBI Sequence Read Archive (SRA).
01ncbi_settingscertificate
0 0 
SRA Toolkit and SDK from NCBI
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
versions ncbi_settings 
SRA Toolkit and SDK from NCBI
Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.
012db_type
0 0 0 0 0 0 
Short Read Sequence Typing for Bacterial Pathogens
Advanced sequence file format conversions
01fastafaigzi
0 0 0 
Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.
Predicts Staphylococcus aureus SCCmec type based on primers.
01
0 0 
Align reads to a reference genome using STAR
010101star_ignore_sjdbgtfseq_platformseq_center
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create index for STAR
0101
0 0 
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Get the minimal allowed index version from STAR
NO input
0 0 
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
01
0 0 0 0 0 0 0 0 0 
Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.
Framework that scores enhancer–gene interactions using the Activity-By-Contact model and derives transcription factor affinities on gene level
01230101010101
0 0 
Download STAR-fusion genome resource required to run STAR-Fusion caller
0101fusion_annot_libdfam_species
0 0 
Fusion calling algorithm for RNAseq data
Fast and Accurate Fusion Transcript Detection from RNA-Seq
012reference
0 0 0 0 
Fast and Accurate Fusion Transcript Detection from RNA-Seq
Serotype STEC samples from paired-end reads or assemblies
01
0 0 
STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
012345678910012seed
0 0 0 0 0 0 
Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.
0101
0 0 0 
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
01234fastafai
0 0 0 0 0 
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
012345678fastafai
0 0 0 0 0 
Strelka calls somatic and germline small variants from mapped sequencing reads
Merges the annotation gtf file and the stringtie output gtf files
stringtie_gtfannotation_gtf
0 0 
Transcript assembly and quantification for RNA-Seq
Transcript assembly and quantification for RNA-Seq
01annotation_gtf
0 0 0 0 0 
Transcript assembly and quantification for RNA-Seq
Align short reads using dynamic seed size with strobemers
010101sort_bam
0 0 0 0 0 0 0 0 
a structural variant classifier for exonic deletions and duplications
01230101
0 0 0 
StrVCTVRE, a structural variant classifier for exonic deletions and duplications
Count reads that map to genomic features
012
0 0 0 
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.
SummarizedExperiment container
010101
0 0 0 
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Converts a bedpe file to a VCF file (beta version)
01
0 0 
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
012minsvmaxsvminallelefreqminnumreads
0 0 
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi-sample VCF file.
01max_distance_breakpointsmin_supporting_callersaccount_for_typeaccount_for_sv_strandsestimate_distanced_by_sv_sizemin_sv_size
0 0 
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
010101snp_mutation_frequencysim_reads
0 0 0 0 0 0 
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
01minsvmaxsvminnumreads
0 0 
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
01234010101010101
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
0123450101
0 0 0 0 0 0 
SVanalyzer: tools for the analysis of structural variation in genomes
Build a structural variant database
01input_type
0 0 
structural variant database software
The merge module merges structural variants within one or more vcf files.
01input_prioritysort_inputs
0 0 0 0 
structural variant database software
Query a structural variant database, using a vcf file as query
01in_occsin_frqsout_occsout_frqsvcf_dbsbedpe_dbs
0 0 
structural variant database software
Performs tests on BAF files
01234
0 0 
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
01
0 0 
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
012fasta_fai
0 0 0 
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
0101
0 0 
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
012
0 0 
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert a VCF file to a BEDPE file.
01
0 0 
Tools for processing and analyzing structural variants
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
01230101
0 0 0 0 
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
012301
0 0 0 
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
0123
0 0 0 
Sylph profile command for taxonomic profiling
01database
0 0 
Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.
Sketching/indexing sequencing reads
01reference
0 0 
Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.
Merge multiple taxonomic profiles from sylphtax/taxprof into a TSV table
01data_type
0 0 
Integrating taxonomic information into the sylph metagenome profiler.
Incorporates taxonomy into sylph metagenomic classifier
01taxonomy
0 0 
Integrating taxonomic information into the sylph metagenome profiler.
Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.
010101file_type
0 0 0 
Compresses/decompresses files
01
0 0 0 
Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.
create tabix index from a sorted bgzip tab-delimited genome file
01
0 0 0 
Generic indexer for TAB-delimited genome position files.
Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA
01
0 0 
Convert taxonids to taxon lineages
012taxdb
0 0 
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Generate taxonomic subtrees based on taxonids
012taxdb
0 0 
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Convert taxon names to TaxIds
012taxdb
0 0 
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Standardise and merge two or more taxonomic profiles into a single table
01profilerformattaxonomysamplesheet
0 0 
TAXonomic Profile Aggregation and STAndardisation
Standardise the output of a wide range of taxonomic profilers
01profilerformattaxonomy
0 0 
TAXonomic Profile Aggregation and STAndardisation
A tool to detect resistance and lineages of M. tuberculosis genomes
01
0 0 0 0 0 0 
Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data
Aligns sequences using T_COFFEE
0101012compress
0 0 0 
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Compares 2 alternative MSAs to evaluate them.
012
0 0 
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Computes a consensus alignment using T_COFFEE
0101compress
0 0 0 
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats the header of PDB files with t-coffee
01
0 0 
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Computes the irmsd score for a given alignment and the structures.
01012
0 0 
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package
0101012compress
0 0 
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Reformats files with t-coffee
01
0 0 
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Compute the TCS score for an MSA, or for an MSA plus a library file. Outputs the TCS file as is, plus a CSV containing only the total TCS score.
0101
0 0 0 
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Parses a Thermo RAW file containing mass spectra to an open file format
01
0 0 
Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle
01
0 0 0 0 
Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.
Computes the coverage of different regions from the bam file.
0101
0 0 0 
TIDDIT - structural variant calling.
Identify chromosomal rearrangements.
0120101
0 0 0 
Search for structural variants.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided.
It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
01
0 0 0 
tidk is a toolkit to identify and visualise telomeric repeats in genomes
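The canonical form mentioned for tidk explore can be illustrated with a minimal sketch. It assumes that "canonical" means the lexicographically smallest string among all rotations of the repeat unit and of its reverse complement; that assumption reproduces the TTAGG -> AACCT example above, but it is not taken from the tidk source code. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def rotations(seq: str) -> list[str]:
    """Return every cyclic rotation of the repeat unit."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit: str) -> str:
    """Assumed definition: smallest rotation over both strands."""
    unit = unit.upper()
    candidates = rotations(unit) + rotations(reverse_complement(unit))
    return min(candidates)


print(canonical_repeat("TTAGG"))  # AACCT
```

Under the same assumption, a repeat unit and any of its rotations or reverse complements map to one shared label, which is what makes reported repeats comparable across genomes.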
Searches a genome for a telomere string such as TTAGGG
01string
0 0 0 
tidk is a toolkit to identify and visualise telomeric repeats in genomes
TINC is a package to determine the contamination of tumour DNA in a matched normal sample. The approach uses evolutionary theory applied to read counts data from whole-genome sequencing assays.
012
0 0 0 0 0 
Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA
01010101vcf_output
0 0 0 0 0 
This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.
A post sequencing QC tool for Oxford Nanopore sequencers
01
0 0 0 0 0 
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.
01
0 0 0 0 0 0 
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.
01fold
0 0 0 0 0 
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
Tandem repeat genotyping from PacBio HiFi data
0123010101
0 0 0 
Tandem repeat genotyping and visualization from PacBio HiFi data
Merge TRGT VCFs from multiple samples
0120101
0 0 
Tandem repeat genotyping and visualization from PacBio HiFi data
Trim FastQ files using Trim Galore!
01
0 0 0 0 0 0 
Performs quality and adapter trimming on paired end and single end reads
01
0 0 0 0 0 0 
Assembles a de novo transcriptome from RNAseq reads
01
0 0 0 
Detection of tRNA sequences using covariance models
01
0 0 0 0 0 0 0 
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
0123450101
0 0 0 0 0 0 0 0 0 0 
Structural variant comparison tool for VCFs
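As a rough illustration of the metrics named in the benchmarking entry above, the sketch below computes recall, precision and F-measure from counts of true positives, false positives and false negatives. It uses the standard textbook definitions with a single true-positive count; how the tool itself counts matches between baseline and comparison calls is not covered here, and the function name is hypothetical.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Standard recall/precision/F1 from match counts (illustrative only)."""
    # Recall: fraction of baseline (truth) variants that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of comparison calls that match the baseline.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


metrics = benchmark_metrics(tp=8, fp=2, fn=2)
print(metrics)
```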
Over multiple vcfs, calculate their intersection/consistency.
01
0 0 
Structural variant comparison tool for VCFs
Normalization of SVs into disjointed genomic regions
01
0 0 
Structural variant comparison tool for VCFs
Subsample a long-read sequencing fastq file for multiple assemblies
01
0 0 
Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support
01hints_fileskeep_gtfsconfig
0 0 0 
Import transcript-level abundances and estimated counts for gene-level analysis packages
0101quant_type
0 0 0 0 0 0 0 0 0 
Remove lines from bed file that refer to off-chromosome locations.
01sizes
0 0 
Remove lines from bed file that refer to off-chromosome locations.
Convert a bedGraph file to bigWig format.
01sizes
0 0 
Convert a bedGraph file to bigWig format.
Convert file from bed to bigBed format
01sizesautosql
0 0 
Convert file from bed to bigBed format
compute average score of bigwig over bed file
01bigwig
0 0 
Compute average score of big wig over each bed, which may have introns.
compute average score of bigwig over bed file
01
0 0 0 
Convert GTF files to GenePred format
convert between genome builds
01chain
0 0 0 
Move annotations from one assembly to another
Convert ascii format wig file to binary big wig format
01sizes
0 0 
Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format
Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.
01barcode_file
0 0 0 0 
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
012mode
0 0 0 0 
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
012get_output_stats
0 0 0 0 0 0 
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
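The idea of deduplicating on mapping coordinate plus UMI can be sketched as follows. This is a deliberately simplified illustration that treats reads as duplicates only when chromosome, position, strand and UMI all match exactly; it does not model UMI sequencing errors or any of the grouping methods the real tools offer. The read dictionaries and their keys are hypothetical.

```python
def dedup_by_position_and_umi(reads: list[dict]) -> list[dict]:
    """Keep the first read seen for each (chrom, pos, strand, umi) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},  # duplicate of r1
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTTT"},  # same position, new UMI
]
print([r["name"] for r in dedup_by_position_and_umi(reads)])  # ['r1', 'r3']
```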
Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place
01
0 0 0 
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Group reads based on their UMI and mapping coordinates
012create_bamget_group_info
0 0 0 0 
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Make the output from umi_tools dedup or group compatible with RSEM
012
0 0 0 
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Module to run UniverSC, an open-source pipeline to demultiplex and process single-cell RNA-Seq data
0101technology
0 0 
Unzip ZIP archive files
01
0 0 
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
Simple software to call UPD regions from germline exome/wgs trios.
01
0 0 
Variational autoencoder for metagenomic binning
01234
0 0 0 0 0 0 0 0 0 0 
Variational autoencoder for metagenomic binning.
Runs a differential expression analysis with dream() from variancePartition R package
012345012
0 0 0 
Differential expression for repeated measures
Filtering, downsampling and profiling alignments in BAM/CRAM formats
01
0 0 
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
01scenarioscenario_aliases
0 0 
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
0120101
0 0 
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
012340101
0 0 
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
VarScan2 is a tool for variant detection in massively parallel sequencing data. It can detect SNPs, indels, and copy number variations in both somatic and germline samples. It is particularly useful for analyzing tumor/normal sample pairs. Subtool fpfilter is used to filter a set of SNPs/indels based on coverage, reads, p-value, etc.
012
0 0 0 
variant detection in massively parallel sequencing data
VarScan2 is a tool for variant detection in massively parallel sequencing data. It can detect SNPs, indels, and copy number variations in both somatic and germline samples. It is particularly useful for analyzing tumor/normal sample pairs. This subtool divides variants based on status (germline, somatic, loss of heterozygosity) and confidence level (high-confidence or not) and outputs them in separate VCF files.
01
0 0 0 0 0 0 0 
variant detection in massively parallel sequencing data
VarScan2 is a tool for variant detection in massively parallel sequencing data. It can detect SNPs, indels, and copy number variations in both somatic and germline samples. It is particularly useful for analyzing tumor/normal sample pairs.
012
0 0 0 
variant detection in massively parallel sequencing data
Convert VCF with structural variations to CytoSure format
01010101blacklist_bed
0 0 
Convert VCF data to the VCF Zarr specification reliably, in parallel or distributed over a cluster
01
0 0 
Convert bioinformatics data to Zarr
If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields
012
0 0 
Command-line tools for manipulating VCF files
Command line tools for parsing and manipulating VCF files.
012
0 0 
Command line tools for parsing and manipulating VCF files.
Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.
012
0 0 
Command-line tools for manipulating VCF files
List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.
012
0 0 
Command-line tools for manipulating VCF files
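The "uniq"-style filtering described above can be sketched in a few lines. This is an illustrative reimplementation, not the tool's own code: it compares each record to the immediately preceding one on position, REF and ALT, and additionally includes the chromosome column in the comparison, which is an assumption made here. Header lines are passed through unchanged.

```python
def uniq_vcf_records(lines: list[str]) -> list[str]:
    """Drop records identical to the previous record on CHROM, POS, REF, ALT."""
    previous_key = None
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines are always kept
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != previous_key:
            kept.append(line)
        previous_key = key
    return kept


vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t100\t.\tA\tG",   # same as previous record: dropped
    "chr1\t100\t.\tA\tT",   # different ALT: kept
    "chr1\t100\t.\tA\tG",   # not adjacent to its twin: kept
]
print(len(uniq_vcf_records(vcf)))  # 4
```

As with GNU uniq, only adjacent duplicates are removed, so the input needs to be sorted for all duplicates to be caught.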
The align command performs pairwise sequence alignments of viral genomes and provides similarity measures like ANI and coverage (alignment fraction)
0101save_alignment
0 0 0 0 
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
Vclust cluster performs threshold-based clustering by assigning a genome sequence to a cluster if its similarity (e.g., ANI) to the cluster meets or exceeds a user-defined threshold.
0101metrictaniganiani
0 0 0 
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
The prefilter command creates a pre-alignment filter that reduces the number of genome pairs to be aligned by filtering out dissimilar sequences before the alignment step.
01
0 0 
Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses
Path(resolve_path=True), which breaks the Nextflow logic of symbolic links.
If velocyto finds a file in the work dir named EXACTLY cellsorted_[ORIGINAL_BAM_NAME],
it will skip the samtools sort step.
This is why two BAM files (cellsorted and original) need to be staged in the work dir.
The cellsorted BAM file should be sorted by cell barcode with:
    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam
See the module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cellsort the input BAM using SAMTOOLS_SORT:
    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }
An optional mask must be passed via ext.args with the option --mask.
See also the velocyto tutorial.
0123gtf
0 0 
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
012refvcf
0 0 0 0 0 0 0 0 
verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
012012refvcfreferences
0 0 0 0 0 0 0 
A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
01230101
0 0 
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
01pbgbwt
0 0 
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
write your description here
01
0 0 0 
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Multisample subclonal deconvolution of cancer genome sequencing data.
012
0 0 0 0 0 0 0 0 
Calculate secondary structures of two RNAs with dimerization
01
0 0 0 
Calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end-of-file condition is encountered.
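The input format described above (a name line starting with >, then the two sequences joined by &) can be assembled with a few lines of Python. This is only a sketch of the text format; the helper name `cofold_input` and the example sequences are hypothetical, and the sketch does not call RNAcofold itself.

```python
def cofold_input(name, seq_a, seq_b):
    """Build one RNAcofold-style input record for a pair of RNA sequences.

    The two sequences are concatenated with '&' as the separator, and the
    record is preceded by a '>' line carrying the sequence name.
    """
    if "&" in seq_a or "&" in seq_b:
        raise ValueError("sequences must not already contain '&'")
    return f">{name}\n{seq_a}&{seq_b}\n"


# Hypothetical example pair; the trailing '@' line marks the end of input.
record = cofold_input("pair1", "GGGAAACCC", "GGGUUUCCC")
stdin_text = record + "@\n"
```

The resulting `stdin_text` is what would be piped to the program on standard input.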
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
01
0 0 0 
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
Calculate locally stable secondary structures of RNAs
fasta
0 0 
Calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L, the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
Use vireo to perform donor deconvolution for multiplexed scRNA-seq data
01234
0 0 0 0 0 
The module compiles segmentation tiles using Vizgen's post-processing tool.
012algorithm_jsonsegmentation_tiles
0 0 0 
Vizgen's post-processing tool
The module prepares the specification JSON file for Vizgen's post-processing tool cell segmentation workflow.
012algorithm_jsonimages_regex
0 0 
Vizgen's post-processing tool
The module runs the segmentation algorithm on a specific tile using Vizgen's post-processing tool.
0123algorithm_jsoncustom_weights
0 0 
Vizgen's post-processing tool
Extracts sequences that were left unbinned by vRhyme into a FASTA file
0101
0 0 
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Links bins output by vRhyme to create one sequence per bin
01
0 0 
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
01
0 0 0 0 0 0 0 0 0 0 0 0 0 
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
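The single-pass, greedy centroid-based clustering mentioned for the clustering module can be sketched in Python as follows. This is an illustration of the general technique, not VSEARCH's implementation: the function names are hypothetical, the similarity function is a toy position-wise identity rather than a real alignment, and assigning each sequence to the first centroid that meets the threshold is an assumption.

```python
def toy_identity(a, b):
    """Toy similarity: fraction of matching positions (0.0 if lengths differ)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_centroid_cluster(sequences, similarity, threshold):
    """Cluster sequences in one pass over the input.

    Each sequence joins the first cluster whose centroid is at least
    `threshold` similar; otherwise it becomes the centroid of a new cluster.
    """
    centroids = []
    clusters = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if similarity(seq, centroid) >= threshold:
                clusters[index].append(seq)
                break
        else:
            # No centroid was similar enough: start a new cluster.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


reads = ["AAAA", "AAAT", "CCCC", "CCCG", "AAAA"]
clusters = greedy_centroid_cluster(reads, toy_identity, threshold=0.75)
```

Because the pass is greedy, the result depends on input order, which is why inputs are often sorted (for example by abundance or length) before clustering.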
Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).
01
0 0 0 0 
A versatile open source tool for metagenomics (USEARCH alternative)
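The definition of "strictly identical" given for dereplication (same length, same nucleotide string, case-insensitive, with T and U treated as the same) can be illustrated with a short Python sketch. The function name `dereplicate` and the tuple layout are hypothetical, and this is not how VSEARCH is implemented internally.

```python
def dereplicate(records):
    """Merge strictly identical sequences and count their abundance.

    `records` is an iterable of (name, sequence) pairs. Sequences are
    compared after upper-casing and replacing U with T. The name of the
    first occurrence is kept for each unique sequence.
    """
    merged = {}
    for name, sequence in records:
        key = sequence.upper().replace("U", "T")
        if key not in merged:
            merged[key] = [name, 0]
        merged[key][1] += 1
    # Dictionaries preserve insertion order, so output follows first occurrence.
    return [(name, key, count) for key, (name, count) in merged.items()]


unique = dereplicate([("a", "ACGU"), ("b", "acgt"), ("c", "ACG")])
```

Here records `a` and `b` collapse into one entry with abundance 2, while `c` stays separate because its length differs.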
Performs quality filtering and / or conversion of a FASTQ file to FASTA format.
01
0 0 0 
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Taxonomic classification using the sintax algorithm.
01db
0 0 
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
01sort_arg
0 0 
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
01dbidcutoffoutoptionuser_columns
0 0 0 0 0 0 0 0 0 0 
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Decomposes multiallelic variants into biallelic variants in a VCF file.
012
0 0 
A tool set for short variant discovery in genetic sequence data
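To make the idea of decomposing a multiallelic record concrete, the following Python sketch splits one tab-separated VCF line into one line per ALT allele. It is deliberately simplified and is not the tool's algorithm: the function name `split_multiallelic` is hypothetical, and the sketch leaves INFO and genotype columns untouched, whereas a real decomposition also has to rewrite allele-specific fields.

```python
def split_multiallelic(line):
    """Split a VCF record with several ALT alleles into biallelic records.

    Simplifying assumption: only the ALT column (column 5) is rewritten;
    all other columns are copied as-is.
    """
    fields = line.rstrip("\n").split("\t")
    records = []
    for alt in fields[4].split(","):
        new_fields = fields.copy()
        new_fields[4] = alt
        records.append("\t".join(new_fields))
    return records


biallelic = split_multiallelic("chr1\t100\t.\tA\tC,G\t50\tPASS\t.")
```

A record that is already biallelic comes back unchanged as a single-element list.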
Decomposes biallelic block substitutions into their constituent SNPs.
0123
0 0 
A tool set for short variant discovery in genetic sequence data
Normalizes variants in a VCF file
01230101
0 0 0 
A tool set for short variant discovery in genetic sequence data
The VueGen nf-core module is designed to automate report generation from outputs produced by other modules, subworkflows, or pipelines. The module integrates the VueGen Python library and customizes it for compatibility with the Nextflow environment. VueGen automates the creation of reports from bioinformatics outputs, supporting formats like PDF, HTML, DOCX, ODT, PPTX, Reveal.js, Jupyter notebooks, and Streamlit web applications.
input_typeinput_pathreport_type
0 0 
a pangenome-scale aligner
01234query_selffasta_query_list
0 0 
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
012fastafasta_fai
0 0 0 0 
Masks out highly repetitive DNA sequences with low complexity in a genome
01
0 0 
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program to generate frequency counts of repetitive units.
01
0 0 
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program that takes a counts file and creates a file of genomic coordinates to be masked.
0101
0 0 
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter
01
0 0 
A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter.
A tool of the wipertools suite that splits FASTQ files into chunks
01num_splits
0 0 
A tool of the wipertools suite that splits FASTQ files into chunks.
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files
01
0 0 0 
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files.
A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper
01
0 0 
A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper.
Convert and filter aligned reads to .npz
0120101
0 0 
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase
0101
0 0 
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Create a new reference using healthy reference samples
01
0 0 
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Find copy number aberrations
010101
0 0 0 0 0 0 0 
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
A large variant benchmarking tool analogous to hap.py for small variants.
01234
0 0 0 0 
The xeniumranger import-segmentation module allows you to specify 2D nuclei and/or cell segmentation results for assigning transcripts to cells and to recalculate all Xenium Onboard Analysis (XOA) outputs that depend on segmentation. Segmentation results can be generated by community-developed tools or taken from a prior Xenium segmentation result.
01expansion_distancecoordinate_transformnucleicellstranscript_assignmentviz_polygons
0 0 
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.
01gene_panel
0 0 
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.
01region_namecassette_name
0 0 
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.
01expansion_distancedapi_filterboundary_staininterior_stain
0 0 
Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
Compresses files with xz.
01
0 0 
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Decompresses files with xz.
01
0 0 
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Align reads to a reference genome using YARA
0101
0 0 0 
Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
Compress file lists to produce ZIP archive files
01
0 0 
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.