Modules

aardvark_compare

variant calling benchmarking vcf comparison consensus

A tool to evaluate variant calling performance by comparing a query VCF against a truth VCF.

Input

012345010101

Output

0 0 0 0 0 0 0

Tools

aardvark:

A benchmarking tool that constructs haplotype sequences to enable basepair-level comparisons across all variant types. The program evaluates genotypes using traditional, strict exact-match scoring, while also allowing implicit partial credit when variant calls are an inexact but close match.

aardvark_merge

variant calling vcf comparison consensus merging

A tool to evaluate and merge multiple variant calls into a consensus VCF.

Input

0120101

Output

0 0 0 0 0 0 0 0 0

Tools

aardvark:

The aardvark merge command provides a method to merge variant calls. This functionality is helpful when multiple variant callers or technologies are used for the same sample. Merging variant sets can be complex and Aardvark simplifies this process by modeling the variants as haplotypes. This allows for different merge strategies for resolving conflicting variant sets.

abacas

genome assembly contiguate

Contiguate draft genome assembly

Input

0101

trimming adapters merging fastq

Trim sequencing adapters and collapse overlapping reads

Input

01 adapterlist

Output

0 0 0 0 0 0 0 0

adapterremovalfixprefix

adapterremoval fastq dedup

Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.

Input

01

fasta virulence Staphylococcus aureus

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants

Input

01

Output

0 0 0

ale

reference-independent assembly evaluation

ALE: assembly likelihood estimator.

Input

012

Output

0 0

alignoth

genomics alignment visualization pileup plotting

Create alignment plots from BAM files

Input

01234567

Output

0 0 0 0

allelecounter

allele count coverage

Generates a count of coverage of alleles

Input

012 loci fasta

ampir amp antimicrobial peptide prediction

A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.

Input

01 model min_length min_probability

Output

0 0 0

amplify_predict

antimicrobial peptides AMPs prediction model

AMPlify is an attentive deep learning model for antimicrobial peptide prediction.

Input

01 model_dir

Output

0 0

Tools

amplify:

Attentive deep learning model for antimicrobial peptide prediction

amps

malt MaltExtract HOPS amps alignment metagenomics ancient DNA aDNA palaeogenomics archaeogenomics microbiome authentication damage edit distance post Post-processing visualisation

Post-processing script of the MaltExtract component of the HOPS package

Input

maltextract_results taxon_list filter

Output

0 0 0 0 0

amrfinderplus_run

bacteria fasta antibiotic resistance

Identify antimicrobial resistance in gene or protein sequences

Input

01 db

Output

0 0 0 0 0 0

Tools

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

immunology BCR TCR translation amino acid nucleotide immunoinformatics

A module to translate BCR and TCR nucleotide sequences into amino acid sequences using amulety and igblast.

Input

01 reference_igblast

Output

0 0 0 0 0 0

Tools

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

genomics SINE annotation plant

Accelerating de novo SINE annotation in plant and animal genomes

Input

01 mode

Output

0 0 0

annotsv_annotsv

annotation structural variants vcf bed tsv

Annotation and Ranking of Structural Variation

Input

012301010101

Output

0 0 0 0

Tools

annotsv:

Annotation and Ranking of Structural Variation

secondary metabolites BGC biosynthetic gene cluster genome mining NRPS RiPP antibiotics prokaryotes bacteria eukaryotes fungi antismash database

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

Input

Output

0 0

Tools

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antismash_antismashlite

secondary metabolites BGC biosynthetic gene cluster genome mining NRPS RiPP antibiotics prokaryotes bacteria eukaryotes fungi antismash

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

Input

01 databases antismash_dir

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

antismashlite:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antismash_antismashlitedownloaddatabases

secondary metabolites BGC biosynthetic gene cluster genome mining NRPS RiPP antibiotics prokaryotes bacteria eukaryotes fungi antismash database

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

Input

database_css database_detection database_modules

Output

0 0 0

Tools

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

any2fasta

fasta conversion sequences format genomics

Convert various sequence formats (GenBank, GFF, FASTQ, FASTA, CLUSTAL, Stockholm, GFA) to FASTA format. Input files may be gzip, bzip2, zip, or zstd compressed.

Input

01

Output

0 0

apbs

electrostatics poisson-boltzmann solvation structural-biology biophysics computational-chemistry

Compute biomolecular electrostatics by solving the Poisson-Boltzmann equation using APBS (Adaptive Poisson-Boltzmann Solver). Produces electrostatic potential maps and solvation energy values for large biomolecular assemblages.

Input

012

Output

0 0 0 0

arcashla_extract

HLA genotype RNA-seq

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

Input

01

Output

0 0 0 0 0 0

Tools

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

argnorm

amr antimicrobial resistance arg antimicrobial resistance genes genomics metagenomics normalization drug categorization

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

Input

01 tool db

bam copy number cram

Copy number profiles of tumour cells.

Input

01234 allele_files loci_files bed_file fasta gc_file rt_file

Output

0 0 0 0 0 0 0 0 0 0

ashlar

image_processing alignment registration

Alignment by Simultaneous Harmonization of Layer/Adjacency Registration

Input

01 opt_dfp opt_ffp

background cycif autofluorescence image_analysis mcmicro highly_multiplexed_imaging

Pixel-by-pixel channel subtraction tool for multiplexed immunofluorescence data.

Input

0101

Output

0 0 0

bacphlip

phage lifestyle temperate virulent bacphlip hmmsearch

A bacteriophage lifestyle prediction tool

Input

01

Output

0 0 0

bakta_bakta

annotation fasta bacteria

Annotation of bacterial genomes (isolates, MAGs) and plasmids

Input

01 db proteins prodigal_tf regions hmms

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.

bakta_baktadbdownload

bakta annotation fasta bacteria database download

Downloads BAKTA database from Zenodo

Input

Output

0 0

Tools

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

bam2fastx_bam2fastq

bam2fastx bam2fastq pacbio

Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data

Input

rrna sequences removal

Barrnap uses an HMMER profile to find rRNAs in reads or contig FASTA files

Input

012

Output

0 0

bases2fastq

demultiplex element fastq

Demultiplex Element Biosciences bases files

Input

012

info bcftools tags vcf

Compute and fill various INFO tags

Input

012 regions targets samples

Output

0 0 0

Tools

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin fill-tags:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The fill-tags plugin computes and fills various INFO tags.

bcftools_pluginfixploidy

fixploidy bcftools ploidy vcf

The fixploidy plugin fixes ploidy in genotype fields according to specified ploidy rules, sample sex assignments, or a forced ploidy value. For example, haploid genotypes can be converted to diploid genotypes.

Input

012 ploidy sex regions targets

Output

0 0 0

Tools

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin fixploidy:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The fixploidy plugin fixes ploidy in genotype fields according to specified ploidy rules, sample sex assignments, or a forced ploidy value. For example, haploid genotypes can be converted to diploid genotypes.

bcftools_pluginimputeinfo

impute-info bcftools imputation metrics tags vcf

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

Input

012 regions targets

Output

0 0 0

Tools

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
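
As a rough illustration of what an IMPUTE2-style INFO metric measures, the sketch below computes the score for one biallelic site from per-sample FORMAT/GP triples (probabilities of carrying 0, 1 and 2 alternate alleles). It follows the commonly cited IMPUTE2 definition. The function name `impute2_info` is hypothetical, and whether the plugin treats edge cases such as monomorphic sites or missing GP values the same way is an assumption.

```python
from typing import Sequence, Tuple


def impute2_info(gps: Sequence[Tuple[float, float, float]]) -> float:
    """IMPUTE2-style INFO score for one biallelic site.

    gps holds one (P(0 alt), P(1 alt), P(2 alt)) triple per sample.
    """
    n = len(gps)
    if n == 0:
        raise ValueError("need at least one sample")
    # Per-sample expected dosage e_i and expected squared dosage f_i.
    e = [p1 + 2.0 * p2 for _, p1, p2 in gps]
    f = [p1 + 4.0 * p2 for _, p1, p2 in gps]
    theta = sum(e) / (2.0 * n)  # estimated alt-allele frequency
    if theta <= 0.0 or theta >= 1.0:
        # Assumption: monomorphic sites are reported as fully informative.
        return 1.0
    # Sum of per-sample dosage variances, scaled by the binomial variance
    # expected when only the allele frequency is known.
    var_sum = sum(fi - ei * ei for ei, fi in zip(e, f))
    return 1.0 - var_sum / (2.0 * n * theta * (1.0 - theta))
```

Under this definition, a site where every genotype is known with certainty scores 1, while a site whose probabilities carry no more information than the allele frequency scores 0.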

bcftools_pluginscatter

scatter vcf bcf genomics

Split VCF by chunks or regions, creating multiple VCFs.

Input

012 sites_per_chunk scatter scatter_file regions targets

Output

0 0 0

Tools

pluginscatter:

Split VCF by chunks or regions, creating multiple VCFs.

bcftools_pluginsetgt

setgt bcftools genotype vcf

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, among many other possibilities.

Input

012 target_gt new_gt regions targets

Output

0 0 0

Tools

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, among many other possibilities.

bcftools_pluginsplit

split vcf genomics

Split VCF by sample, creating single- or multi-sample VCFs.

Input

0123456

Output

0 0 0

Tools

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

bcftools_plugintag2tag

tag2tag bcftools VCF

Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.

Input

012 regions targets

Output

0 0 0

Tools

view:

Converts between similar tags, such as GL, PL, GP or QR, QA, QS, or localized alleles, e.g. LPL, LAD.

bcftools_pluginvcf2table

bcftools vcf table variant calling

Converts VCF/BCF files into a tab-delimited table using the bcftools +vcf2table plugin. Each variant is output as one row, with INFO and FORMAT fields as columns.

Input

0120123

Output

0 0

Tools

bcftools:

BCFtools is a set of utilities for variant calling and manipulating VCF/BCF files. The +vcf2table plugin converts VCF records into a tab-delimited table format.

bcftools_query

query variant calling bcftools VCF

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Input

012 regions targets samples

Output

0 0

Tools

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

bcftools_reheader

reheader vcf update header

Reheader a VCF file

Input

012301

Output

0 0 0

Tools

reheader:

Modify header of VCF/BCF files, change sample names.

bcftools_roh

roh biallelic homozygosity autozygosity

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

Input

01201 genetic_map regions_file samples_file targets_file

Output

0 0

Tools

roh:

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

bcftools_rohviz

visualisation RoH VCF

Visualise the output of bcftools roh

Input

0101 regions_list samples_file

Output

0 0 0

Tools

roh-viz:

Visualise the output of bcftools roh. It creates an HTML/JavaScript document which can be interactively viewed in a web browser.

bcftools_sort

sorting VCF variant calling

Sorts VCF files

Input

01

Output

0 0 0

Tools

sort:

Sort VCF files by coordinates.

bcftools_split

vcf split genomics

Split a vcf file into files per chromosome

Input

012

Output

0 0

Tools

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

bcftools_stats

variant calling stats VCF

Generates stats from VCF files

Input

0120101010101

Output

0 0

Tools

stats:

Parses VCF or BCF and produces text-file statistics that are suitable for machine processing and can be plotted using plot-vcfstats.

bcftools_view

variant calling view bcftools VCF

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Input

012 regions targets samples

Output

0 0 0

Tools

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

bcl2fastq

demultiplex illumina fastq

Demultiplex Illumina BCL files

Input

012

Output

0 0 0 0 0 0 0 0 0

bclconvert

demultiplex illumina fastq

Demultiplex Illumina BCL files

Input

012

Output

0 0 0 0 0 0 0 0

beagle5_beagle

phasing imputation genotype

Beagle v5.5 is a software package for phasing genotypes and for imputing ungenotyped markers.

Input

012345678

Output

0 0 0

Tools

demultiplexing hashing-based deconvolution single-cell

Generating cell hashing calls from a matrix of count data.

Input

0123

Output

0 0 0 0

bftools_showinf

metadata ome-tif ome-tiff imaging bioinformatics tools

Extract OME xml data from OME-tif

Input

01

Output

0 0

Tools

bioawk fastq fasta sam file manipulation awk

Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats.

Input

01 program_file disable_redirect_output output_file_extension

Output

0 0

biobambam_bammarkduplicates2

markduplicates bam cram

Locate and tag duplicate reads in a BAM file

Input

01

Output

0 0 0

Tools

biobambam:

biobambam is a set of tools for early stage alignment file processing.

biobambam_bammerge

merge bam sorted

Merge a list of sorted bam files

Input

01

Output

0 0 0 0

Tools

biobambam:

biobambam is a set of tools for early stage alignment file processing.

biobambam_bamsormadup

markduplicates sort bam cram

Parallel sorting and duplicate marking

Input

01012

Output

0 0 0 0 0

Tools

biobambam:

biobambam is a set of tools for early stage alignment file processing.

bioformats2raw

zarr ome-ngff imaging

Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification.

Input

01

biscuit DNA methylation WGBS scWGBS bisulfite sequencing aligner bam

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

Input

010101

Output

0 0 0 0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

biscuit_bsconv

biscuit DNA methylation WGBS scWGBS bisulfite sequencing aligner bam filter

Summarize and/or filter reads based on bisulfite conversion rate

Input

01010101

Output

0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_epiread

biscuit DNA methylation WGBS scWGBS bisulfite sequencing aligner bam

Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.

Input

0101010101

Output

0 0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_index

biscuit DNA methylation WGBS scWGBS bisulfite sequencing index reference fasta

Indexes a reference genome for use with Biscuit

Input

01

Output

0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_mergecg

biscuit DNA methylation WGBS scWGBS bisulfite sequencing aligner bed

Merges methylation information for opposite-strand C's in a CpG context

Input

010101

Output

0 0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_pileup

bisulfite DNA methylation pileup variant calling WGBS scWGBS bam vcf

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

Input

012340101

Output

0 0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_qc

biscuit DNA methylation WGBS scWGBS bisulfite sequencing index BAM quality control

Perform basic quality control on a BAM file generated with Biscuit

Input

010101

Output

0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

biscuit_vcf2bed

biscuit DNA methylation WGBS scWGBS bisulfite sequencing aligner vcf

Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.

Input

01

Output

0 0 0

Tools

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

bismark_align

bismark 3-letter genome map methylation 5mC methylseq bisulphite bisulfite bam

Performs alignment of BS-Seq reads using bismark

Input

010101

Output

0 0 0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

bismark_coverage2cytosine

bismark consensus map methylation 5mC methylseq bisulphite bisulfite bam bedGraph

Relates methylation calls back to genomic cytosine contexts.

Input

010101

Output

0 0 0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

bismark_deduplicate

bismark 3-letter genome map methylation 5mC methylseq bisulphite bisulfite bam

Removes alignments to the same position in the genome from the Bismark mapping output.

Input

01

Output

0 0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

bismark_genomepreparation

bismark 3-letter genome index methylation 5mC methylseq bisulphite bisulfite fasta

Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.

Input

01

Output

0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
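
The two converted versions are a C-to-T copy of the genome and a G-to-A copy (the latter representing the C-to-T conversion of the reverse strand). A minimal sketch of that in-silico conversion step is shown here. The function name `bisulfite_convert` is hypothetical, and FASTA parsing and the subsequent aligner indexing are left out.

```python
from typing import Tuple


def bisulfite_convert(seq: str) -> Tuple[str, str]:
    """Return (C->T converted, G->A converted) copies of a sequence."""
    s = seq.upper()
    # Forward-strand conversion: every C is read as T.
    c_to_t = s.replace("C", "T")
    # Reverse-strand conversion seen on the forward strand: every G becomes A.
    g_to_a = s.replace("G", "A")
    return c_to_t, g_to_a
```

For example, `bisulfite_convert("ACGTcg")` returns `("ATGTTG", "ACATCA")`.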

bismark_methylationextractor

bismark consensus map methylation 5mC methylseq bisulphite bisulfite bam bedGraph

Extracts methylation information for individual cytosines from alignments.

Input

0101

Output

0 0 0 0 0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

bismark_report

bismark qc methylation 5mC methylseq bisulphite bisulfite report

Collects bismark alignment reports

Input

01234

Output

0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

bismark_summary

bismark qc methylation 5mC methylseq bisulphite bisulfite report summary

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

Input

bam align_report dedup_report splitting_report mbias

Output

0 0

Tools

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

blast_blastdbcmd

fasta blast database retrieval identifier

Retrieve entries from a BLAST database

Input

01201

Output

0 0 0

Tools

blast:

BLAST finds regions of similarity between biological sequences.

blast_blastn

fasta blast blastn DNA sequence taxids

Queries a BLAST DNA database

Input

0101 taxidlist taxids negative_tax

Output

0 0

Tools

blast:

BLAST finds regions of similarity between biological sequences.

blast_blastp

fasta blast blastp protein

BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database.

Input

0101 out_ext

Output

0 0 0 0

Tools

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

blast_makeblastdb

fasta blast database

Builds a BLAST database

Input

01 taxid_map

Output

0 0

Tools

blast:

BLAST finds regions of similarity between biological sequences.

blast_tblastn

fasta blast tblastn DNA sequence

Queries a BLAST DNA database

Input

0101

Output

0 0

Tools

blast:

Protein to Translated Nucleotide BLAST.

blast_updateblastdb

fasta blast download database

Downloads a BLAST database from NCBI

Input

01

Output

0 0

Tools

genome annotation braker gff gtf

Gene prediction in novel genomes using RNA-seq and protein homology information

Input

01 bam rnaseq_sets_dirs rnaseq_sets_ids proteins hintsfile

mem bwa alignment map fastq bam sam

Align fastq reads to a fasta reference using bwa-mem3

Input

010101 sort_bam

Output

0 0 0 0

Tools

bwamem3:

Short-read aligner derived from bwa-mem2 with correctness fixes, performance improvements, and methylation-aware alignment.

samtools:

Tools for dealing with SAM, BAM and CRAM files

bwameme_index

index fasta genome reference

Create BWA-MEME index for reference genome

Input

01

Output

0 0

Tools

bwameme:

Faster BWA-MEM2 using learned-index

bwameme_mem

mem bwa bwamem2 bwameme alignment map fastq bam sam cram

Performs fastq alignment to a fasta reference using BWA-MEME

Input

010101 sort_bam

Output

0 0 0 0 0

Tools

bwameme:

Faster BWA-MEM2 using learned-index

bwameth_align

bwameth alignment 3-letter genome map methylation 5mC methylseq bisulphite bisulfite fastq bam

Performs alignment of BS-Seq reads using bwameth

Input

010101

Output

0 0 0

Tools

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

bwameth_index

bwameth 3-letter genome index methylseq bisulphite bisulfite fasta

Performs indexing of c2t converted reference genome

Input

01 use_mem2

Output

0 0

Tools

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

caalm_caalm

cazyme annotation protein language model deep learning classification

Annotates carbohydrate-active enzyme (CAZyme) families from protein sequences using protein language model (ESM) embeddings and FAISS-based nearest-neighbour search. Performs three-level hierarchical classification: binary CAZyme detection (Level 0), CAZy class assignment (Level 1), and CAZy family assignment (Level 2).

Input

01012

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

caalm:

CAALM (Carbohydrate Activity Annotation with protein Language Models) predicts CAZyme class and family membership from protein FASTA sequences using ESM-based embeddings and FAISS nearest-neighbour retrieval.

caalm_downloadmodels

cazyme model download huggingface deep learning

Downloads the CAALM model weights from HuggingFace Hub (lczong/CAALM) into a local models/ directory. The downloaded directory is used as input to the CAALM_CAALM annotation module for CAZyme prediction.

Input

Output

0 0 0

Tools

huggingface_hub:

huggingface_hub is a Python library for interacting with the Hugging Face Hub, providing utilities for downloading and uploading models, datasets, and spaces.

cadd

cadd annotate variants

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.

Input

010101

Output

0 0

caddsv_get

caddsv structural variants annotations segmentnt resource download

Download CADD-SV annotation resources or SegmentNT model files.

Input

flag

Output

0 0

Tools

caddsv:

CADD-SV is a command-line tool for scoring structural variants (SVs).

cafe

gene phylogeny genomics

Analysis of gene family evolution

Input

01 tree

Output

0 0 0 0 0 0

calder2

calder2 genome topology compartments domains hi-c

Hierarchical Hi-C compartment computation

Input

01 resolution

highly_multiplexed_imaging cell_type_identification cell_phenotyping image_analysis mcmicro machine_learning

Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth

Input

01 signature high_thresholds low_thresholds

Output

0 0 0

cellbender_merge

single-cell scRNA-seq ambient RNA removal

Module to use CellBender to remove ambient RNA from single-cell RNA-seq data

Input

0123 output_layer_name

Output

0 0

Tools

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

cellbender_removebackground

single-cell scRNA-seq ambient RNA removal

Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data

Input

01

Output

0 0 0 0 0 0 0 0 0 0

Tools

QC Illumina genomics

A simple program to parse Illumina NGS data and check it for quality criteria

Input

01 checkqc_config

Output

0 0

filter trimming fastq nanopore qc

Filter and trim long read data.

Input

01 fasta

Output

0 0

Tools

zcat:

zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

chromap_chromap

chromap alignment map fastq bam sam hi-c atac-seq chip-seq trimming duplicate removal

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

Input

010101 barcodes whitelist chr_order pairs_chr_order

Output

0 0 0 0 0 0

Tools

chromap:

Fast alignment and preprocessing of chromatin profiles

chromap_index

index fasta genome reference

Indexes a fasta reference genome ready for chromatin profiling.

Input

01

Output

0 0

Tools

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

clustalo_guidetree

guide tree msa newick

Renders a guidetree in clustalo

Input

01

Output

0 0

Tools

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

clusty

cluster network contig scaffold alignment protein

Clusty is a tool for large-scale clustering using sparse distance matrices, suitable for datasets with millions of objects.

Input

0101

Output

0 0

cmaple

phylogeny phylogenetic tree maximum likelihood dna amino acid alignment tree reconstruction cmaple iqtree

Efficient phylogenetic tree reconstruction for sequences using the CMAPLE algorithm

Input

012

Output

0 0 0

cmseq_polymut

polymut polymorphic mags assembly polymorphic sites estimation protein coding genes cmseq bam coverage

Calculates polymorphic site rates over protein coding genes

Input

01234

Output

0 0

Tools

spectrum identification search engine proteomics fasta mzml

Comet is an open source tandem mass spectrometry (MS/MS) sequence database search tool

Input

0123

Output

0 0 0 0 0 0 0

concoct_concoct

contigs fragment mags binning concoct kmer nucleotide composition metagenomics bins

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

Input

012

Output

0 0 0 0 0 0 0

Tools

concoct:

Clustering cONtigs with COverage and ComposiTion

concoct_concoctcoveragetable

contigs fragment mags binning bed bam subcontigs coverage

Generate the input coverage table for CONCOCT using a BED file

Input

0123

Output

0 0

Tools

concoct:

Clustering cONtigs with COverage and ComposiTion

concoct_cutupfasta

contigs fragment mags binning fasta cut cut up

Cut up a FASTA file into non-overlapping or overlapping parts of equal length.

Input

01 bed

Output

0 0 0

Tools

concoct:

Clustering cONtigs with COverage and ComposiTion
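
The cut-up step described for this module can be pictured as sliding a fixed-length window along each contig. The sketch below is an illustration only and is not CONCOCT's own script: the function name `cut_up` and its parameters are hypothetical, and the tail handling (emitting a shorter final part) is a simplifying assumption.

```python
from typing import List, Tuple


def cut_up(seq: str, chunk: int, overlap: int = 0) -> List[Tuple[int, str]]:
    """Split seq into windows of length chunk that share overlap bases.

    Returns (start, subsequence) pairs with 0-based start coordinates.
    """
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("require chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap
    parts: List[Tuple[int, str]] = []
    for start in range(0, len(seq), step):
        parts.append((start, seq[start:start + chunk]))
        # Stop once a window reaches the end of the sequence, so the tail
        # is not emitted again as ever-shorter windows.
        if start + chunk >= len(seq):
            break
    return parts
```

With `overlap=0` the parts tile the contig without sharing bases; a positive overlap makes consecutive parts share that many bases.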

concoct_extractfastabins

contigs fragment mags binning fasta cut cut up bins merge

Creates a FASTA file for each new cluster assigned by CONCOCT

Input

012

Output

0 0

Tools

concoct:

Clustering cONtigs with COverage and ComposiTion

concoct_mergecutupclustering

contigs fragment mags binning fasta cut cut up merge

Merge consecutive parts of the original contigs originally cut up by cut_up_fasta.py

Input

01

Output

0 0

Tools

UNet TMA dearray Segmentation Cores

Yet another TMA dearray program. Coreograph uses UNet, a deep learning model, to identify complete or incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.

Input

01

Output

0 0 0 0 0

coverm_contig

mapping genomics metagenomics coverage

Map reads to contigs and estimate coverage

Input

0101 bam_input interleaved enable_bam_output

Output

0 0 0

Tools

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

coverm_genome

mapping genomics metagenomics coverage

Calculate read coverage per-genome

Input

0101bam_inputinterleavedref_modeenable_bam_output

Output

0 0 0

Tools

quality assessment bam long-read genomics

Quality assessment of long-read bam files using cramino.

Input

012

Output

0 0 0

crisprcleanr_normalize

sort CNV correction CRISPR

Remove false positives in CRISPR functional genomics screens caused by CNVs

Input

012min_readsmin_targeted_genes

Output

0 0

Tools

crisprcleanr:

Analysis of CRISPR functional genomics data; removes false positives due to CNVs.

crispresso2

genome editing CRISPR sequencing amplicon

A software pipeline for the analysis of genome editing outcomes from deep sequencing data

Input

01amplicon_sequencesamplicon_file

phylogenetic-trees cancer-genomics cancer-evolution clonal-evolution

Clone trees for Cancer Evolution studies from bulk sequencing data.

Input

01

Output

0 0 0 0 0 0

custom_addmostsevereconsequence

annotation vep consequence vcf

Annotate a VEP annotated VCF with the most severe consequence field

Input

0101

Output

0 0 0

Tools

custom:

Custom module to annotate a VEP annotated VCF with the most severe consequence field

custom_addmostseverepli

annotation vep pli vcf

Annotate a VEP annotated VCF with the most severe pLi field

Input

01

Output

0 0 0

Tools

custom:

Custom module to annotate a VEP annotated VCF with the most severe pLi field

custom_bed12codonpositions

bed12 bed6 codon splicing coordinates

Expand a BED12 into a BED6 of in-frame mRNA positions, projected back to genomic coordinates. Default behaviour emits one row per codon (the 5' nucleotide); --step / --width / --frame control the stride, span and offset on the spliced mRNA. Useful for codon-level work on spliced features (e.g. ribo-seq P-site counts per codon, frame / periodicity QC, novel-ORF tiling).
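The projection from spliced mRNA positions back to genomic coordinates might look roughly like the sketch below. It is an illustration only: the function `codon_starts` and its parameter names are hypothetical, `step` and `frame` merely mirror the roles of --step and --frame described above, and the whole BED12 feature is assumed to be the coding span (thickStart/thickEnd are ignored).

```python
def codon_starts(chrom_start, block_sizes, block_starts, strand, step=3, frame=0):
    """Return 0-based genomic coordinates of every `step`-th mRNA base.

    Assumption: the entire BED12 feature is treated as the spliced
    coding sequence; thickStart/thickEnd are not considered.
    """
    # Genomic coordinate of every base in the spliced feature, left to right.
    genomic = [
        chrom_start + rel + i
        for size, rel in zip(block_sizes, block_starts)
        for i in range(size)
    ]
    # On the minus strand the mRNA runs right to left along the genome.
    if strand == "-":
        genomic.reverse()
    # Take the first base of each stride, starting at the frame offset.
    return genomic[frame::step]
```

Each returned coordinate `p` would correspond to a BED6 row spanning `p` to `p + 1` when the span width is one nucleotide.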

Input

01

Output

0 0

Tools

bed12codonpositions:

Python helper that expands a BED12 into per-codon BED6 positions along the spliced feature, with configurable frame, step and span width via ext.args.

custom_catadditionalfasta

fasta gtf genomics

Custom module to add a new FASTA file to an existing one and update the associated GTF

Input

01201biotype

Output

0 0 0

Tools

custom:

Custom module to add a new FASTA file to an existing one and update the associated GTF

CUSTOM_CLUSTERMETRICS

clustering metrics silhouette calinski-harabasz davies-bouldin evaluation

Computes clustering quality metrics (silhouette, Calinski-Harabasz, Davies-Bouldin) and performs k-sweep analysis

Input

012

Output

0 0 0 0 0

Tools

scikit-learn:

Machine learning library for clustering metrics

CUSTOM_CLUSTERVISUALIZATION

clustering visualization pca umap tsne dimension-reduction

Generates UMAP and t-SNE visualizations colored by cluster

Input

012

Output

0 0 0 0 0

Tools

scikit-learn:

Machine learning library for dimension reduction (PCA, t-SNE)

umap-learn:

Uniform Manifold Approximation and Projection for dimension reduction

custom_dumpsoftwareversions

custom dump version

Custom module used to dump software versions within the nf-core pipeline template

Input

versions

Output

0 0 0

Tools

custom:

Custom module used to dump software versions within the nf-core pipeline template

custom_filterdifferentialtable

filter differential expression logFC significance statistic p-value

Filters a differential expression table based on logFC and adjusted p-value thresholds
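A minimal sketch of this kind of threshold filter is shown below, using only the standard library. The function name, the column-name arguments, the default thresholds, the use of the absolute logFC, and the inclusive comparisons are all assumptions made for illustration; the module's actual behaviour is set by its inputs.

```python
def filter_differential(rows, logfc_col, padj_col, min_abs_logfc=1.0, max_padj=0.05):
    """Keep rows whose |logFC| and adjusted p-value pass both thresholds.

    `rows` is an iterable of dicts, e.g. from csv.DictReader. Rows with
    missing or non-numeric values (such as "NA") are dropped.
    """
    kept = []
    for row in rows:
        try:
            logfc = float(row[logfc_col])
            padj = float(row[padj_col])
        except (KeyError, ValueError):
            continue
        # NaN fails both comparisons, so NaN rows are dropped as well.
        if abs(logfc) >= min_abs_logfc and padj <= max_padj:
            kept.append(row)
    return kept
```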

Input

01012012

Output

0 0 0 0

Tools

pandas:

Python library for data manipulation and analysis

custom_geneticmapconvert

map conversion R

This R script automatically detects the format of a genetic map and converts the input file into all the other supported formats.

Input

01

Output

0 0 0 0 0 0

Tools

custom:

Custom script to convert any genetic map format

custom_getchromsizes

fasta chromosome indexing

Generates a FASTA file of chromosome sizes and a fasta index file

Input

01

Output

0 0 0 0

Tools

samtools:

Tools for dealing with SAM, BAM and CRAM files

custom_gtffilter

gtf fasta filter

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

Input

0101

Output

0 0

Tools

gtffilter:

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

custom_matrixfilter

matrix filter abundance na

Filter a matrix based on a minimum value and the number of samples that must pass.
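The filtering rule can be sketched as follows: a feature is kept when enough of its samples reach a minimum value. The function name, the dict-of-lists layout, and the treatment of missing values (`None` never passes) are assumptions for illustration, not the module's implementation.

```python
def filter_matrix(matrix, min_value, min_samples):
    """Keep features with at least `min_samples` values >= `min_value`.

    `matrix` maps feature id -> list of per-sample values; None marks
    a missing value and never counts as passing.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(v is not None and v >= min_value for v in values) >= min_samples
    }
```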

Input

0101

Output

0 0 0 0 0

Tools

matrixfilter:

Filter a matrix based on a minimum value and the number of samples that must pass

custom_multiqccustombiotype

biotype featurecounts multiqc rnaseq qc

Generate MultiQC-compatible biotype count summaries from featureCounts output

Input

0101

Output

0 0 0

Tools

custom:

Summarise featureCounts biotype assignments for MultiQC reporting

custom_orfcollapse

orf ribo-seq catalogue smorf deduplication

Collapse small ORFs that share an amino-acid sequence cluster into a single catalogue entry. Pair with custom/orfmerge (coordinate-based catalogue), bedtools/getfasta + seqkit/translate (AA FASTA keyed by orf_id), and mmseqs/easycluster (AA clusters) upstream.

The coordinate-based merge in custom/orfmerge only groups ORFs that overlap on the genome, so the same micropeptide encoded at several distinct, non-overlapping loci (typically repetitive regions) survives as separate rows. This adopts the peptide-level deduplication and 0.9 amino-acid-similarity threshold of the GENCODE Ribo-seq ORF consolidation (Mudge et al. 2022, Nat Biotechnol, doi:10.1038/s41587-022-01369-0; gencode-riboseqORFs collapse_cutoff 0.9), implemented here with MMseqs2 sequence-identity clustering rather than that tool's longest-shared-string / P-site-overlap metric. Small ORFs (orf_class "smORF", i.e. aa_length <= 100) are clustered by amino-acid identity upstream and this module folds each multi-member cluster down to one representative.

Only smORF rows are collapsed; larger ORFs and transcript-anchored classes are passed through untouched. Among the smORF members of a cluster the representative is chosen by longest aa_length (ties broken by orf_id), so the result does not depend on which sequence MMseqs2 labelled the cluster representative. Catalogue row order is preserved; dropped members fold their called_by_<caller> / score_<caller> evidence, n_samples / samples recurrence and gene mappings into the survivor.
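The representative choice and evidence folding described above could be sketched like this. The function name and the dict layout are hypothetical, only the samples / n_samples columns are folded here, and breaking ties by the lexicographically smallest orf_id is an assumption (the text only says ties are broken by orf_id).

```python
def collapse_cluster(members):
    """Fold the smORF members of one amino-acid cluster into one row.

    The survivor is the member with the longest aa_length; ties go to
    the smallest orf_id (assumed direction). Sample lists are unioned.
    """
    rep = min(members, key=lambda m: (-m["aa_length"], m["orf_id"]))
    samples = sorted({s for m in members for s in m["samples"]})
    # Copy the representative and overwrite the recurrence columns.
    return {**rep, "samples": samples, "n_samples": len(samples)}
```

Because the key depends only on aa_length and orf_id, the result is the same whichever member the clustering tool labelled as representative.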

Input

012345

Output

0 0 0 0 0 0 0 0 0

Tools

orfcollapse:

Python helper that folds small-ORF catalogue rows sharing an MMseqs2 amino-acid cluster into a single representative, merging cross-caller provenance, cross-sample recurrence and gene mappings.

custom_orfmerge

orf ribo-seq catalogue merge clustering

Cluster normalised per-sample, per-caller ORF predictions into a single cohort-level catalogue. Pair with custom/orfnormalise upstream and (typically) bedtools/getfasta + seqkit/translate downstream to obtain the AA FASTA.

Strategy is class-aware (operating on the harmonised orf_class written by custom/orfnormalise):

canonical_cds: collapse by (transcript_id, strand). One canonical CDS per transcript by definition.
uORF, dORF, other: collapse by (transcript_id, strand, start, end). A single transcript can host multiple distinct uORFs / dORFs / internal ORFs, so keying on the outer span keeps them in separate clusters while still merging cross-caller calls that agree on coordinates.
novel_u, smORF: greedy reciprocal-overlap clustering on the outer genomic span at --reciprocal-overlap (default 0.8). Catches fuzzy cross-caller matches and exact-coordinate collapses in one pass. Order-dependent at the boundary: a chain A-B-C where A-B and B-C overlap at ~0.85 but A-C only at ~0.75 may cluster as {A,B,C} or {A,B}+{C} depending on iteration order. Rare in practice at 0.8.
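The order dependence of greedy reciprocal-overlap clustering can be seen in a small sketch. This is not the module's code: the function names are hypothetical, spans are plain (start, end) tuples, and a span is assumed to join the first cluster in which any member passes the threshold.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between spans a and b."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return min(inter / (a[1] - a[0]), inter / (b[1] - b[0]))


def greedy_cluster(spans, threshold=0.8):
    """Assign each span to the first cluster it matches, in input order."""
    clusters = []
    for span in spans:
        for cluster in clusters:
            if any(reciprocal_overlap(span, m) >= threshold for m in cluster):
                cluster.append(span)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([span])
    return clusters


A, B, C = (0, 100), (15, 115), (30, 130)  # A-B and B-C: 0.85, A-C: 0.70
print(greedy_cluster([A, B, C]))  # one cluster holding A, B and C
print(greedy_cluster([A, C, B]))  # A and B together, C on its own
```

Visiting B before C lets B bridge A and C; visiting C first leaves it in its own cluster, which is the boundary effect the description warns about.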

Cross-caller consensus is recorded in two column families on the catalogue TSV:

called_by_<caller>: 0/1 indicator per supported caller (ribotish, ribocode, ribotricer, rpbp, price).
score_<caller>: best score from that caller within the cluster. Score direction is per-caller (p-values are minimised; Bayes factors / phase scores are maximised).

Cross-sample recurrence is recorded in two further columns:

n_samples: number of distinct samples contributing to the cluster (a cohort recurrence metric).
samples: sorted, comma-separated list of those sample ids.

Emits a small MultiQC custom-content TSV (per-class counts) for inclusion in downstream MultiQC reports.

Alongside the full catalogue, emits a consensus view (*.consensus.*) filtered to ORFs supported by at least --min-callers distinct callers and recurring in at least --min-samples samples (both default 1, i.e. no filtering, so the consensus view equals the full catalogue). Raising either threshold yields a higher-confidence catalogue without altering the full one.
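A hedged sketch of the consensus filter, assuming catalogue rows are dicts carrying the called_by_<caller> and n_samples columns described above; the function name and parameter names are illustrative and simply mirror --min-callers and --min-samples.

```python
CALLERS = ("ribotish", "ribocode", "ribotricer", "rpbp", "price")


def consensus_view(rows, min_callers=1, min_samples=1):
    """Keep rows supported by enough callers and enough samples."""
    kept = []
    for row in rows:
        # called_by_<caller> is a 0/1 indicator; absent columns count as 0.
        n_callers = sum(int(row.get(f"called_by_{c}", 0)) for c in CALLERS)
        if n_callers >= min_callers and int(row["n_samples"]) >= min_samples:
            kept.append(row)
    return kept
```

With both thresholds left at 1, every row that has at least one caller and one sample passes, so the filtered view matches the full catalogue.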

Input

012

Output

0 0 0 0 0 0 0 0

Tools

orfmerge:

Python helper that clusters normalised ORF BED12+TSV pairs across callers and samples into one unified catalogue, recording per-caller provenance and best score in the output table.

custom_orfnormalise

orf ribo-seq normalisation bed12 translation

Convert one ORF caller's per-sample output table into a unified BED12 plus a sidecar metadata TSV, ready for cross-caller merging.

An "ORF caller" is a tool that scans ribosome-profiling (Ribo-seq) data and predicts which open reading frames are being translated. Each caller writes its own table format and uses its own location encoding, classification vocabulary, and confidence score. This module reconciles five callers into one harmonised schema. The caller val input selects the parser; supported values:

ribocode (RiboCode predicted ORF table; transcript-coord input, lifted to genomic blocks against the GTF)
ribotish (Ribo-TISH predict output; GenomePos + optional Blocks)
ribotricer (Ribotricer detect-orfs translating ORFs TSV; ORF span parsed from ORF_ID, multi-exon blocks recovered by intersecting with host-transcript exon structure from the GTF)
rpbp (Rp-Bp predicted-orfs BED12 with extra columns)
price (PRICE orfs.tsv; Gedi-style Location field, already genomic)

Output BED12 column order: chrom start end name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts

The BED name column carries <caller>|<caller-native-id>. The BED score column is the caller's native score rescaled to 0-1000 (higher == more confident regardless of native direction).

Output sidecar TSV columns: orf_id caller sample_id chrom start end strand gene_id transcript_id orf_class aa_length score

Harmonised orf_class vocabulary written into the sidecar TSV:

canonical_cds: ORF maps to an annotated CDS (including truncated / extended variants of one).
uORF: upstream ORF (5'UTR-resident).
dORF: downstream ORF (3'UTR-resident).
novel_u: novel / intergenic ORF not assigned to an annotated CDS.
smORF: small ORF (aa_length <= 100); promoted regardless of location-based class so downstream tools can treat smORFs uniformly.
other: internal / overlap / frame variants and anything else.

Per-caller mapping notes (lossy collapses):

PRICE iORF (internal ORF), intronic, and orphan map to other. Cross-caller catalogue tracking still flags these via called_by_price, but the specific PRICE sub-type is not preserved.
Rp-Bp's predicted-orfs BED carries no ORF-type column; this module defaults every Rp-Bp call to canonical_cds (the post-selectfinalpredictionset curated set is dominated by canonical CDSs). uORF/dORF/novel calls present in Rp-Bp's separate .tab.gz / extracted-orfs.bed.gz files are not propagated here.

Each caller's native confidence score has a "direction" - some are lower-is-better (p-values), some are higher-is-better (Bayes factors, phase scores):

ribocode: min (combined p-value)
ribotish: min (combined p-value)
ribotricer: max (phase_score)
rpbp: max (Bayes factor mean)
price: min (p-value)

Downstream merging uses this to pick the best per-ORF call.
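Picking the best call under a per-caller direction can be sketched as below. The lookup table restates the directions listed above; the name `best_score` and the dict-based design are illustrative assumptions rather than the helper's actual code.

```python
# Score direction per caller: min for p-values, max for
# phase scores and Bayes factors.
SCORE_DIRECTION = {
    "ribocode": min,
    "ribotish": min,
    "ribotricer": max,
    "rpbp": max,
    "price": min,
}


def best_score(caller, scores):
    """Return the most confident native score for one caller."""
    return SCORE_DIRECTION[caller](scores)
```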

Input

01201

Output

0 0 0

Tools

orfnormalise:

Python helper that parses any of five Ribo-seq ORF caller output tables and emits a unified BED12 + sidecar TSV. Caller is selected by the caller val input.

ribocode:

Identifying genome-wide translated ORFs from Ribo-seq data

ribotish:

Ribo-seq based ORF prediction with Ribo-TISH

ribotricer:

Ribosome profiling P-site phasing-based ORF detection

rpbp:

Translated ORF identification with Rp-Bp

price:

Probabilistic inference of codon activities by an EM algorithm (PRICE)

CUSTOM_PCACLUSTERING

clustering kmeans dbscan pca embeddings

Performs KMeans or DBSCAN clustering on a sample-by-feature numeric matrix (e.g. principal components, embeddings)

Input

01algorithmn_clustersdbscan_epsdbscan_min_samples

Output

0 0 0

Tools

scikit-learn:

Machine learning library for clustering

custom_rsemmergecounts

rsem merge counts gene expression

Merge per-sample RSEM results into wide and long format TSV matrices

Input

01isoforms/*

Output

0 0 0 0 0 0 0

Tools

custom:

Custom module to merge RSEM gene and isoform count results across samples into count/TPM matrices and long-format tables.

custom_sratoolsncbisettings

NCBI settings sra-tools prefetch fasterq-dump

Test for the presence of suitable NCBI settings or create them on the fly.

Input

ids

Output

0 0

Tools

sratools:

SRA Toolkit and SDK from NCBI

custom_summarisetelomereestimation

telomere summary telseq telogator2 telomerehunter

Normalise and combine telomere length and content estimates from telseq, telogator2, and telomerehunter into a unified summary

Input

0123

Output

0 0

Tools

pandas:

Python Data Analysis Library

custom_tabulartogseachip

gsea chip convert tabular

Make a GSEA annotation file (.chip) from tabular inputs

Input

0101

Output

0 0

Tools

custom:

Make a GSEA annotation file (.chip) from tabular inputs

custom_tabulartogseacls

gsea cls convert tabular

Make a GSEA class file (.cls) from tabular inputs

Input

01

Output

0 0

Tools

custom:

Make a GSEA class file (.cls) from tabular inputs

custom_tabulartogseagct

gsea gct tabular

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
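For orientation, a GCT (version 1.2) file starts with a `#1.2` line, then a line giving the row and column counts, then a header beginning with NAME and Description, followed by one row per feature. The sketch below writes that layout from in-memory data; the function name and the "na" placeholder used for the Description column are assumptions, not necessarily what this module emits.

```python
def to_gct(features, samples, values):
    """Render a features-by-samples matrix as GCT 1.2 text.

    `values[i]` holds the per-sample values for `features[i]`.
    """
    lines = [
        "#1.2",
        f"{len(features)}\t{len(samples)}",
        "\t".join(["NAME", "Description", *samples]),
    ]
    for feature, row in zip(features, values):
        # "na" is a placeholder description (an assumption of this sketch).
        lines.append("\t".join([feature, "na", *map(str, row)]))
    return "\n".join(lines) + "\n"
```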

Input

01

Output

0 0

Tools

tabulartogseagct:

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

custom_tx2gene

gene gtf pseudoalignment rsem transcript

Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.

Input

0101quant_typeidextra

Output

0 0

Tools

custom:

Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications

cutadapt

adapter primers poly-A tails trimming fastq

Removes adapter sequences from sequencing reads

Input

01

Output

0 0 0

cutesv

cutesv structural-variant calling sv

Structural-variant calling with cuteSV

Input

01201

enrichment omics biological activity functional analysis prior knowledge

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows any enrichment method to be tested flexibly with any prior knowledge resource, and incorporates methods that take the sign and weight into account. It can be used with any omics data type, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.

Input

010101

Output

0 0 0 0

decoupler_decoupler

enrichment omics biological activity functional analysis prior knowledge

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows any enrichment method to be tested flexibly with any prior knowledge resource, and incorporates methods that take the sign and weight into account. It can be used with any omics data type, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.

Input

010101

Output

0 0 0 0

Tools

decoupler:

Ensemble of methods to infer biological activities from omics data

dedup

dedup deduplication pcr duplicates ancient DNA paired-end bam

DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).

Input

01

Output

0 0 0 0 0

variant calling machine learning neural network

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

Input

0123401010101

variant calling machine learning neural network

(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Input

012301010101

Output

0 0 0 0 0

deepvariant_callvariants

variant calling machine learning neural network

Call variants from the examples produced by make_examples

Input

01

Output

0 0

Tools

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

deepvariant_makeexamples

variant calling machine learning neural network

Transforms the input alignments to a format suitable for the deep neural network variant caller

Input

012301010101

Output

0 0 0 0

Tools

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

deepvariant_postprocessvariants

variant calling machine learning neural network

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Input

01234010101

Output

0 0 0 0 0

Tools

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

deepvariant_rundeepvariant

variant calling machine learning neural network

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Input

012301010101

Output

0 0 0 0 0 0

Tools

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

deepvariant_vcfstatsreport

variant calling machine learning neural network

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Input

01

Output

0 0

Tools

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

delly_call

genome structural variants bcf

Call structural variants

Input

0123450101suffix

Output

0 0 0

Tools

proteomics mass spectrometry DIA spectral library quantification

Generic DIA-NN module for running any DIA-NN operation including in-silico library generation, preliminary analysis, empirical library assembly, individual analysis, and final quantification

Input

012345

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

diann_insilicolibrarygeneration

diann spectral library proteomics deep learning dia

Generate in silico predicted spectral library using DIA-NN deep learning predictor. This module uses DIA-NN software for data-independent acquisition (DIA) proteomics data processing. Output materials should include attribution: "Generated using DIA-NN".

Input

01

Output

0 0 0

Tools

diann:

DIA-NN is a universal software for data-independent acquisition (DIA) proteomics data processing. It uses deep learning to predict spectral libraries and perform peptide identification and quantification.

disambiguate

disambiguation xenograft explant rna-seq alignment bam

Disambiguates reads aligned to two different organisms (e.g. human and mouse) from the same source of FASTQ files. Useful in explant RNA/DNA-Seq workflows where reads from two species are present. For reads aligned to both organisms, the algorithm compares alignment quality scores to determine the most likely species of origin. Produces four BAM files (uniquely assigned to species A or B, ambiguous for species A or B) and a summary file.

Input

012

Output

0 0 0 0 0 0

Tools

ngs-disambiguate:

Disambiguation algorithm for reads aligned to two different organisms using Tophat, Hisat2, BWA or STAR alignments.

dotseq_dotseq

riboseq rnaseq translation differential orf

Detect differential ORF usage (DOU) and ORF-level differential translation efficiency (DTE) from Ribo-seq with matched RNA-seq using DOTSeq. Wraps DOTSeqDataSetsFromSummarizeOverlaps() + DOTSeq() + getContrasts() and emits the package's native contrast tables plus plotDOT() visualisations.

Input

01230123

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

dotseq:

Differential ORF Translation analysis for Ribo-seq with matched RNA-seq

doubletdetection

single-cell doublets doublet_detection

Doublet detection in single-cell RNA-seq data

Input

01

duplicates markduplicates bam alignment

Streaming duplicate marking for NGS alignments with dupblaster; for maximum throughput it can also run inline in the alignment pipe (aligner | dupblaster | sort) rather than as a standalone step

Input

01

Output

0 0 0

duphold

sort duphold structural variation depth information

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

Input

012345fastafasta_fai

Output

0 0

dupradar

rnaseq duplication genomics

Assessment of duplication rates in RNA-Seq datasets

Input

0101

Output

0 0 0 0 0 0 0 0

dysgu

structural variants sv vcf

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

Input

012012

Output

0 0 0

dysgu_run

structural variants sv vcf

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

Input

012010101010101

Output

0 0 0

Tools

escherichia coli fasta serotype

In silico prediction of E. coli serotype

Input

01

eklipse mitochondria mtDNA circos deletion SV

Tool for detection and quantification of large mtDNA rearrangements.

Input

012ref_gb

Output

0 0 0 0

elprep_fastatoelfasta

fasta elfasta elprep

Convert a file in FASTA format to the ELFASTA format

Input

01

Output

0 0 0

Tools

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

elprep_filter

sort bam sam filter variant calling

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

Input

0123456010101run_haplotypecallerrun_bqsrbqsr_tables_onlyget_activity_profileget_assembly_regions

Output

0 0 0 0 0 0 0 0 0

Tools

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

elprep_merge

bam sam merge

Merge split bam/sam chunks in one file

Input

01

Output

0 0

Tools

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

elprep_split

bam split by chromosome chunk

Split bam file into manageable chunks

Input

01

Output

0 0

Tools

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

emboss_cons

emboss consensus fasta multiple sequence alignment MSA

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.

Input

01

Output

0 0

Tools

emboss:

The European Molecular Biology Open Software Suite

emboss_revseq

nucleotides reverse complement sequences

The revseq program from EMBOSS reverse-complements a nucleotide sequence
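For readers unfamiliar with the operation, reverse complementing means complementing each base and reading the result backwards. The sketch below shows the idea in plain Python; it is not EMBOSS code, the function name is hypothetical, and only A/C/G/T/U/N are handled (other IUPAC ambiguity codes are left unchanged).

```python
# Complement table for upper- and lower-case bases; U pairs with A.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```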

Input

01

Output

0 0

Tools

emboss:

The European Molecular Biology Open Software Suite

emboss_seqret

emboss gff embl genbank fasta convert swissprot

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

Input

01out_ext

Output

0 0

Tools

emboss:

The European Molecular Biology Open Software Suite

emmtyper

fasta Streptococcus pyogenes typing

EMM typing of Streptococcus pyogenes assemblies

Input

01

Output

0 0

emu_abundance

metagenomics 16S nanopore

A taxonomic profiler for metagenomic 16S data optimized for error-prone long reads.

Input

01db

Output

0 0 0 0 0 0

Tools

site frequency spectrum ancestral alleles derived alleles

Estimation of the unfolded site frequency spectrum

Input

0123

Output

0 0 0

evigene_tr2aacds

genomics transcript assembly clean polish filter redundant duplicate

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

Input

01

Output

0 0 0

Tools

fasta validation genome

A fasta linter/validator

Input

01

Output

0 0 0

famsa_align

alignment MSA genomics

Aligns sequences using FAMSA

Input

0101compress

Output

0 0

Tools

famsa:

Algorithm for large-scale multiple sequence alignments

famsa_guidetree

guide tree msa newick

Renders a guide tree with FAMSA

Input

01

Output

0 0

Tools

famsa:

Algorithm for large-scale multiple sequence alignments

faqcs

trimming quality control fastq faqcs

Perform adapter and quality trimming on sequencing reads with reporting

Input

01

Output

0 0 0 0 0 0 0 0

fargene

antibiotic resistance genes ARGs identifier metagenomic contigs

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

Input

01hmm_model

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

fast2q

CRISPRi FASTQ genomics

A program that counts sequence occurrences in FASTQ files.

Input

0101

Output

0 0 0 0 0 0

Tools

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the FASTQ format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches and Phred scores, and can be used to find and extract unknown sequences delimited by known sequences. It can extract multiple features per read using either fixed positions or delimiting search sequences.

fastani

genome fasta ANI

Alignment-free computation of Average Nucleotide Identity (ANI)

Input

0101qlrl

Output

0 0 0 0

fastavalidator

fasta validation genome

Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.

Input

01

Output

0 0 0

Tools

fasta_validate:

Python C-extension wrapping simple C code that validates a fasta file. It only checks a few things, and by default only reports its result via the return code, so you will need to check that!

fastawindows

genome fasta tsv bed

Quickly compute statistics over a fasta file in windows.

Input

01

Output

0 0 0 0 0 0

fastcov

coverage bam map

Generate a coverage plot from one or more bam files

Input

012file_ext

SRA ENA GEO metadata fetch public databases

A command line tool that makes it easier to find sequencing data from the SRA / GEO / ENA.

Input

ids

Output

0 0

fgbio_callduplexconsensusreads

umi duplex fgbio

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

Input

01min_readsmin_baseq

Output

0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgbio_callmolecularconsensusreads

UMIs consensus sequence bam

Calls consensus sequences from reads with the same unique molecular tag.

Input

01min_readsmin_baseq

Output

0 0

Tools

fgbio:

Tools for working with genomic and high throughput sequencing data.

fgbio_collectduplexseqmetrics

UMIs QC bam duplex

Collects a suite of metrics to QC duplex sequencing data.

Input

012

Output

0 0 0 0 0 0 0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

r-ggplot2:

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

fgbio_copyumifromreadname

fgbio copy umi readname

Copies the UMI at the end of each read name in a BAM file to the RX tag.

Input

012

Output

0 0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgbio_fastqtobam

unaligned bam cram

Uses fgbio to convert FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX tag of the reads

Input

01

Output

0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgbio_filterconsensusreads

fgbio filter consensus umi duplexumi

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

Input

010123min_readsmin_baseqmax_base_error_rate

Output

0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgbio_groupreadsbyumi

UMI groupreads fgbio

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5' mapping positions of the reads from the template, from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. Note: the MQ tag is required on reads with mapped mates; it can be added using samblaster with the optional argument --addMateTags.
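The two-level grouping (first by mapping positions, then by UMI) can be illustrated with a simplified sketch. It is not fgbio's algorithm: the function name and tuple layout are hypothetical, and UMIs are matched exactly, with no tolerance for sequencing errors in the UMI.

```python
from collections import defaultdict


def group_by_umi(templates):
    """Group templates sharing end positions and an identical UMI.

    `templates` holds (name, chrom, start, end, umi) tuples. Groups are
    returned ordered by position, earliest first, then by UMI.
    """
    groups = defaultdict(list)
    for name, chrom, start, end, umi in templates:
        groups[(chrom, start, end, umi)].append(name)
    return [groups[key] for key in sorted(groups)]
```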

Input

01strategy

Output

0 0 0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgbio_sortbam

sort bam sam

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.

Input

01

Output

0 0

Tools

fgbio:

Tools for working with genomic and high throughput sequencing data.

fgbio_zipperbams

fgbio umi unmapped ubam zipperbams

FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM

Input

0120123

Output

0 0

Tools

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

fgumi_clip

clip bam overlapping UMIs

Clip overlapping reads in a BAM file with fgumi

Input

0101

Output

0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_codec

UMIs consensus codec bam duplex

Call CODEC consensus reads from a UMI-grouped BAM with fgumi

Input

01keep_rejected

Output

0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_correct

UMIs correction bam

Correct UMIs in a BAM file to a fixed set of known UMIs with fgumi

Input

01umismin_distancekeep_rejected

Output

0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_dedup

UMIs deduplication duplicates bam

Mark or remove PCR duplicates using UMI information with fgumi

Input

01

Output

0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_downsample

UMIs downsample bam subsample

Downsample a BAM by UMI family using streaming with fgumi

Input

01fractionkeep_rejected

Output

0 0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_duplex

umi duplex consensus bam

Calls duplex consensus sequences from reads generated from the same double-stranded source molecule. This is a high-performance replacement for fgbio CallDuplexConsensusReads.

Input

01min_readskeep_rejected

Output

0 0 0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_duplexmetrics

UMIs QC bam duplex metrics

Collects a suite of metrics to QC duplex sequencing data

Input

012

Output

0 0 0 0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_extract

umi extract fastq bam

Extract unique molecular indices (UMIs) from FASTQ files and write an unaligned BAM file.

Input

012

Output

0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_fastq

bam fastq convert UMIs

Convert a BAM file to interleaved FASTQ format with fgumi

Input

01

Output

0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_filter

umi filter consensus bam

Filters consensus reads generated by simplex or duplex consensus calling. This is a high-performance replacement for fgbio FilterConsensusReads.

Input

0101min_readskeep_rejected

Output

0 0 0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_group

umi groupreads bam

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5' mapping positions of the reads from the template. Reads that have the same end positions are then sub-grouped by UMI sequence. This is a high-performance replacement for fgbio GroupReadsByUmi.

Input

01strategy

Output

0 0 0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_merge

merge bam alignment sort

Merge pre-sorted BAM files into a single sorted BAM

Input

01

Output

0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_simplex

umi consensus simplex bam

Calls simplex consensus sequences from reads with the same unique molecular tag. This is a high-performance replacement for fgbio CallMolecularConsensusReads.

Input

01min_readskeep_rejected

Output

0 0 0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_simplexmetrics

UMIs QC bam simplex metrics

Collects a suite of metrics to QC simplex (single-strand) UMI sequencing data

Input

012

Output

0 0 0 0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fgumi_sort

sort bam sam

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, and template-coordinate. This is a high-performance replacement for fgbio SortBam.

Input

01

Output

0 0 0

Tools

fgumi:

High-performance tools for working with UMI-tagged sequencing data.

fgumi_zipper

UMIs zipper bam alignment merge

Zip an unmapped UMI BAM together with its aligned BAM using fgumi

Input

0120123

Output

0 0

Tools

fgumi:

High-performance tools for UMI-tagged sequencing data.

fibertoolsrs_addnucleosomes

methylation genomics bam m6A nucleosome fiberseq

Add nucleosome and MSP positions to ONT BAM files

Input

01

Output

0 0

Tools

fibertoolsrs:

Mitchell Vollger's Rust tools for fiberseq data.

fibertoolsrs_extract

methylation genomics bam m6A fiberseq

Extract Fiber-seq information (such as m6A, CpG, nucleosomes, and MSPs) from BAM file into BED file

Input

01extract_type

Output

0 0

Tools

fibertoolsrs:

Mitchell Vollger's Rust tools for fiberseq data.

fibertoolsrs_predictm6a

methylation genomics bam m6A nucleosome fiberseq

Predict m6A positions using HiFi kinetics data and encode the results in the MM and ML BAM tags. Also adds nucleosome (nl, ns) and MTase-sensitive patch (al, as) tags.

Input

01

Output

0 0

Tools

concatenate gzip cat find pigz

A module for concatenating gzipped or uncompressed files, getting around the UNIX terminal argument size limit

Input

01

Output

0 0 0 0

Tools

find:

GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression

pigz:

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
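As a rough illustration of how the argument size limit can be avoided (this is not the module's actual find/pigz command), the Python sketch below discovers files by walking a directory instead of receiving them as command-line arguments, then streams them into one output file. The function name `concatenate` and its parameters are hypothetical. It relies on the fact that gzip files appended back to back still form a valid gzip stream.

```python
import shutil
from pathlib import Path


def concatenate(input_dir, pattern, output_path):
    """Append every file under input_dir matching pattern to output_path.

    Files are found by walking the directory tree, so no long argument
    list is ever built. Returns the number of files appended.
    """
    output_path = Path(output_path)
    # Collect inputs before opening the output, and never read the output itself.
    files = sorted(
        p for p in Path(input_dir).rglob(pattern)
        if p.is_file() and p.resolve() != output_path.resolve()
    )
    with open(output_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte append
    return len(files)
```

Sorting the paths makes the output order deterministic. Mixing compressed and uncompressed inputs in one call would produce an unreadable file, so the pattern should select only one kind.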

find_unpigz

gzip find pigz

A module for decompressing a large number of gzipped files, getting around the UNIX terminal argument limit

Input

01

Output

0 0 0

Tools

find:

GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression

pigz:

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

flash

sort reads merging merge mate pairs

Merge mate pairs from paired-end sequencing reads

Input

01

Output

0 0 0 0

flye

assembly genome de novo genome assembler single molecule

De novo assembler for single molecule sequencing reads

Input

01mode

demultiplex fastq rust

Demultiplex fastq files

Input

0123

fungal effector protein prediction language model deep learning model download

Downloads the pretrained ESM-1b weights (esm1b_t33_650M_UR50S.pt) that fungtion uses for fungal effector prediction into a local models/ directory via fungtion setup-models. The downloaded directory is used as input to the FUNGTION_FUNGTION prediction module.

Input

Output

0 0 0

Tools

fungtion:

Fungtion predicts and visualizes fungal effector proteins from protein FASTA sequences using ESM-1b protein language model embeddings and R-based SVM models. For predicted effectors it can generate similarity-network and relationship-tree visualizations and an HTML report.

fungtion_fungtion

fungal effector protein prediction language model deep learning svm classification

Predicts fungal effector proteins from protein FASTA sequences using ESM-1b protein language model embeddings and R-based SVM models. Optionally produces similarity-network and relationship-tree visualizations and an HTML report for predicted effectors. Requires the pretrained ESM-1b weights from FUNGTION_DOWNLOADMODELS as input.

Input

01pretrain

Output

0 0 0 0 0 0 0 0 0

Tools

fungtion:

Fungtion predicts and visualizes fungal effector proteins from protein FASTA sequences using ESM-1b protein language model embeddings and R-based SVM models. For predicted effectors it can generate similarity-network and relationship-tree visualizations and an HTML report.

fusioncatcher_build

references fusions rna

Build references for fusioncatcher

Input

meta

Output

0 0

Tools

fusioncatcher:

Build genome for fusioncatcher

fusioncatcher_fusioncatcher

fusion rna fastq

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data

Input

0101

Output

0 0 0 0

Tools

genomics cluster genome metagenomics

Cluster genome FASTA files by average nucleotide identity

Input

0123

Output

0 0 0

gamma_gamma

gamma gene-calling microbial allele

Gene Allele Mutation Microbial Assessment

Input

01db

Output

0 0 0 0 0

Tools

gawk awk txt text file parsing

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.

Input

01program_filedisable_redirect_output

Output

0 0

gcta_addgrms

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Combine multiple GRMs listed in an MGRM manifest into a single dense GRM

Input

012

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_adjustgrm

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Adjust a dense GRM for incomplete tagging using gcta --grm-adj

Input

012

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_bivariatereml

gcta genome-wide complex trait analysis reml restricted maximum likelihood bivariate grm genetic relationship matrix genetics

Run bivariate REML analysis with a single dense GRM

Input

0101230101

Output

0 0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_bivariateremlldms

gcta genome-wide complex trait analysis reml restricted maximum likelihood bivariate ldms linkage disequilibrium and minor allele frequency stratification grm genetic relationship matrix genetics

Run bivariate REML-LDMS analysis with an MGRM manifest

Input

01201230101

Output

0 0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_calculateldscores

gcta genome-wide complex trait analysis ld score linkage disequilibrium score ldms linkage disequilibrium and minor allele frequency stratification genetics

Calculate LD scores with GCTA and derive GREML-LDMS SNP groups

Input

0123ld_score_region

Output

0 0 0 0

Tools

gcta:

GCTA is a tool for genome-wide complex trait analysis.

gcta_fastgwa

gcta genome-wide complex trait analysis fastgwa fast genome-wide association gwas genome-wide association study genetics

Run GCTA fastGWA mixed linear model association analysis with PLINK genotype inputs

Input

012301010101

Output

0 0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_grmcutoff

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Apply a genetic relationship cutoff to a dense GRM using gcta --grm-cutoff

Input

01cutoff

Output

0 0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_keep

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Apply a keep file to a dense GRM using gcta --keep

Input

0101

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_makebksparse

gcta genome-wide complex trait analysis grm genetic relationship matrix sparse genetics

Create a sparse GRM from a dense GRM for downstream fastGWA analyses

Input

01cutoff

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_makegrm

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Compute a whole dense GRM with GCTA

Input

01234

Output

0 0

Tools

gcta:

GCTA is a tool for genome-wide complex trait analysis.

gcta_makegrmpart

gcta genome-wide complex trait analysis grm genetic relationship matrix genetics

Compute one partition of a GCTA genetic relationship matrix

Input

012345601

Output

0 0

Tools

gcta:

GCTA is a tool for genome-wide complex trait analysis.

gcta_reml

gcta genome-wide complex trait analysis reml restricted maximum likelihood grm genetic relationship matrix genetics

Run univariate REML heritability estimation with a dense GRM

Input

01010101

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gcta_remlldms

gcta genome-wide complex trait analysis reml restricted maximum likelihood ldms linkage disequilibrium and minor allele frequency stratification grm genetic relationship matrix genetics

Run REML-LDMS heritability estimation with an MGRM manifest

Input

012010101

Output

0 0

Tools

gcta:

Genome-wide Complex Trait Analysis (GCTA) estimates genetic relationships, variance components, and association statistics from genome-wide data.

gecco_convert

bgc reformatting clusters gbk gff bigslice faa fna

This command helps transform the output files created by GECCO into other useful formats, should you want to use the results in combination with other tools.

Input

012modeformat

Output

0 0 0 0 0

Tools

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

gecco_run

bgc detection metagenomics contigs

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

Input

012model_dir

Output

0 0 0 0 0 0

Tools

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

gedi_indexgenome

riboseq index genome gedi price orf

Build a GEDI genome index from a FASTA and GTF for downstream PRICE ORF prediction

Input

012

Output

0 0

Tools

gedi:

Gedi is a Java software platform for working with genomic data (sequencing reads, sequences, per-base numeric values, annotations). It provides the PRICE algorithm for ribosome profiling ORF discovery.

gedi_price

riboseq orf price gedi translation

Identify translated ORFs from Ribo-seq BAMs using the PRICE algorithm

Input

01201

Output

0 0 0 0 0 0 0 0

Tools

gedi:

Gedi is a Java software platform for working with genomic data (sequencing reads, sequences, per-base numeric values, annotations). It provides the PRICE algorithm (Probabilistic Inference of Codon Activities by an EM algorithm) for ribosome profiling ORF discovery with near-cognate start codon detection.

gem2_gem2bedmappability

mappability bedgraph index gem

Convert a mappability file to bedgraph format

Input

0101

Output

0 0 0

Tools

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

gem2_gemindexer

fasta index reference mappability

Create a GEM index from a FASTA file

Input

01

Output

0 0 0

Tools

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

gem2_gemmappability

mappability gem index reference

Define the mappability of a reference

Input

01read_length

Output

0 0

Tools

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

gem3_gem3indexer

fastq genomics mappability

Create a GEM index from a FASTA file

Input

01

Output

0 0 0

Tools

gem3:

The GEM indexer (v3).

gem3_gem3mapper

fastq genomics mappability

Performs FASTQ alignment to a FASTA reference using gem3-mapper

Input

0101sort_bam

Output

0 0

Tools

gem3:

The GEM indexer (v3).

gemmi_cif2json

structure mmcif cif json structural-biology

Convert macromolecular structure files from mmCIF format to JSON format using gemmi.

Input

01

Output

0 0

Tools

gemmi:

Gemmi is a library and command-line tool for parsing, manipulating, and converting macromolecular structural biology data formats such as mmCIF, PDB, and MTZ.

genescopefk

k-mer genome profile histogram

A derivative of GenomeScope2.0 modified to work with FastK

Input

01

genome size genome heterozygosity repeat content

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

Input

01

Output

0 0 0 0 0 0 0 0 0 0

genometester4_glistmaker

kmer sort alignment-free

Create and count k-mer lists from nucleotide sequences.

Input

01

Output

0 0

Tools

genometester4:

A toolkit for performing set operations - union, intersection and complement - on k-mer lists.

genotyphi_parse

genotype Salmonella Typhi Mykrobe

Genotype Salmonella Typhi from Mykrobe results

Input

01

Output

0 0

Tools

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

genrich

peak-calling ChIP-seq ATAC-seq

Peak-calling for ChIP-seq and ATAC-seq enrichment experiments

Input

012blacklist_bed

gfa graph pangenome variation graph

Collapse walk-preserving shared affixes in variation graphs in GFA format

Input

01

transcripts gtf merge compare

Compare, merge, annotate and estimate accuracy of generated gtf files

Input

0101201

merge gvcf joint-variant-calling

merge gVCF files and perform joint variant calling

Input

01201

metagenomics contamination visualisation taxonomy viromics microbiome dashboard

Generates an interactive HTML dashboard integrating taxonomy, annotation, and metadata to detect contamination in metagenomic and amplicon sequencing datasets. GRIMER is independent of quantification methods and directly analyses contingency tables.

Input

01sample_metadataconfig

Output

0 0

gsea_gsea

gene set analysis enrichment gsea gene set

run the Broad Gene Set Enrichment tool in GSEA mode

Input

01230101

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

gsea:

Gene Set Enrichment Analysis (GSEA)

GSTAMA_COLLAPSE

tama_collapse.py isoseq nanopore long-read transcriptome gene model TAMA

Collapse redundant transcript models in Iso-Seq data.

Input

01fasta

Output

0 0 0 0 0 0 0 0 0 0

Tools

tama_collapse.py:

Collapse similar gene models

gstama_merge

gstama gstama/merge long-read isoseq nanopore tama transcriptome annotation

Merge multiple transcriptomes while maintaining source information.

Input

01filelist

Output

0 0 0 0 0

Tools

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

gstama_polyacleanup

gstama gstama/polyacleanup long-read isoseq tama transcriptome annotation

Helper script that removes remaining polyA sequences from full-length non-chimeric (FLNC) reads (PacBio isoseq3)

Input

01

Output

0 0 0 0

Tools

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

gt_gff3

genome gff3 annotation

GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files

Input

01

Output

0 0 0

Tools

gt:

The GenomeTools genome analysis system

gt_gff3validator

genome gff3 annotation validation

GenomeTools gt-gff3validator utility to strictly validate a GFF3 file

Input

01

Output

0 0 0

Tools

gt:

The GenomeTools genome analysis system

gt_ltrharvest

genomics genome annotation repeat transposons retrotransposons

Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility

Input

01

Output

0 0 0 0 0

Tools

gt:

The GenomeTools genome analysis system

gt_stat

genome gff3 annotation statistics stats

GenomeTools gt-stat utility to show statistics about features contained in GFF3 files

Input

01

Output

0 0

Tools

gt:

The GenomeTools genome analysis system

gt_suffixerator

genomics genome fasta index

Computes enhanced suffix array using GenomeTools gt-suffixerator utility

Input

01mode

Output

0 0

Tools

gt:

The GenomeTools genome analysis system

gtdbtk_classifywf

GTDB taxonomy taxonomic classification metagenomics classification genome taxonomy database bacteria archaea

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

Input

0101use_pplacer_scratch_dir

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

gtdbtk_gtdbtoncbimajorityvote

gtdb taxonomy ncbi taxonomy taxonomic classification conversion taxonomy classification genome taxonomy database bacteria archaea

Converts the output classifications of GTDB-Tk from GTDB taxonomy to NCBI taxonomy

Input

0120101

Output

0 0 0

Tools

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

gtfsort

sort genomics gtf

Sort GTF files in chr/pos/feature order

Input

01

gunzip compression decompression

Compresses and decompresses files.

Input

01

Output

0 0

gvcftools_extractvariants

gvcftools extract_variants extractvariants gvcf vcf

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

Input

01

Output

0 0

Tools

ibd hbd beagle

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

Input

01mapexclude

demultiplexing hashing-based deconvolution single-cell

Generating cell hashing calls from a matrix of count data.

Input

0123

Output

0 0 0 0 0 0 0 0 0

helitronscanner_draw

genomics helitron scanner

HelitronScanner draw tool for Helitron transposons in genomes

Input

010101

Output

0 0

Tools

helitronscanner:

HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes

helitronscanner_scan

genomics helitron scanner

HelitronScanner scanHead and scanTail tools for Helitron transposons in genomes

Input

01commandlcv_filepathbuffer_size

Output

0 0

Tools

helitronscanner:

HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes

hhsuite_hhblits

sensitive search HMM alignment

Fast and sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)

Input

0101

Output

0 0

Tools

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

hhsuite_hhsearch

sensitive search HMM alignment

Sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)

Input

0101

Output

0 0

Tools

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

hhsuite_reformat

reformat MSA hhsuite alignment

Reformat a Multiple Sequence Alignment (MSA) file

Input

01informatoutformat

Output

0 0

Tools

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

hicap

fasta serotype Haemophilus influenzae

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

Input

01database_dirmodel_fp

Output

0 0 0 0

hicexplorer_hicpca

eigenvectors PCA hicPCA

Computes PCA eigenvectors for a Hi-C matrix.

Input

01

Output

0 0 0 0

Tools

hicexplorer:

Set of programs to process, analyze and visualize Hi-C and capture Hi-C data

hifiadapterfilt_downloaddb

pacbio hifi adapter blast database download

Downloads the pre-built PacBio adapter BLAST database from the HiFiAdapterFilt GitHub repository. The database contains two adapter sequences: NGB00972.1 (Pacific Biosciences Blunt Adapter, 45 bp) and NGB00973.1 (C2 Primer, 35 bp). This module is consumed by hifiadapterfilt/hifiadapterfilt as a prerequisite to provide the BLAST database for adapter detection.

Input

Output

0 0

Tools

hifiadapterfilt:

HiFiAdapterFilt uses BLAST to identify and remove adapter sequences from PacBio HiFi reads, producing adapter-filtered FASTQ output along with detailed statistics.

hifiadapterfilt_hifiadapterfilt

pacbio hifi adapter filtering long reads ccs

Remove adapter sequences from PacBio HiFi (CCS) reads using BLAST-based detection. Produces filtered FASTQ output, filtering statistics, BLAST hits, and a list of blocked read IDs.

Input

012

Output

0 0 0 0 0

Tools

hifiadapterfilt:

HiFiAdapterFilt uses BLAST to identify and remove adapter sequences from PacBio HiFi reads, producing adapter-filtered FASTQ output along with detailed statistics.

hifiasm

genome assembly haplotype resolution phasing PacBio HiFi long reads

Whole-genome assembly using PacBio HiFi reads

Input

01201201201

Output

0 0 0 0 0 0 0 0 0 0 0

hificnv

copy number variation cnv PacBio HiFi long reads structural variation

Copy number variant calling from PacBio HiFi reads

Input

0123010101

Output

0 0 0 0 0

hifitrimmer_filterbam

pacbio bam fasta hifi_trimmer filtering trimming quality control adapter removal

Run hifi_trimmer filter_bam to filter and trim adapter hits from PacBio HiFi reads (BAM/FASTA/FASTQ) using BLAST against adapter sequences. Primary output is filtered FASTA/FASTQ.

Input

012

Output

0 0 0

Tools

hifi_trimmer:

hifi_trimmer: tools for processing HiFi read BLAST results and filtering/trimming read files to output processed FASTA/FASTQ and accompanying summary and hit (optional) files.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

hifitrimmer_processblast

pacbio bam hifi_trimmer processblast quality control adapter removal

Run hifi_trimmer process_blast to process a BLAST search of adapter sequences against PacBio HiFi reads. Primary outputs are a BED file describing regions to exclude, a JSON file of summary information, and an optional hits file.

Input

0101

Output

0 0 0 0

Tools

hifi_trimmer:

hifi_trimmer: tools for processing HiFi read BLAST results and filtering/trimming BAM files to output processed FASTA/FASTQ and accompanying summary and hit (optional) files.

hiphase

pacbio structural variant phasing pacbio hifi snv haplotagging

Small and structural variant phasing tool for PacBio HiFi reads, supporting co-phasing of SNVs and SVs across multiple BAM files and samples

Input

01234567012output_bamsummary_fileblocks_filestats_filehaplotag_filefile_format

hmmer hmmsearch rank

R script that scores output from multiple runs of hmmer/hmmsearch

Input

01

Output

0 0

Tools

hmmer:

Biosequence analysis using profile hidden Markov models

R:

A Language and Environment for Statistical Computing

Tidyverse:

Tidyverse: R packages for data science

hmmer_hmmsearch

Hidden Markov Model HMM hmmer hmmsearch

search profile(s) against a sequence database

Input

012345

Output

0 0 0 0 0

Tools

hmmer:

Biosequence analysis using profile hidden Markov models

hmmer_hmmstat

hidden markov model HMM hmmer statistics profile summary

display summary statistics for each profile HMM in an HMM database file

Input

01

Output

0 0

Tools

hmmer:

Biosequence analysis using profile hidden Markov models

hmmer_jackhmmer

HMM homologs iterative model refinement

iterative searches to detect distant homologs by refining an HMM profile from hits

Input

012345

Output

0 0 0 0 0

Tools

hmmer:

Biosequence analysis using profile hidden Markov models

hmtnote_annotate

hmtnote mitochondria annotation

Human mitochondrial variant annotation using HmtVar. Contains a .plk file with the annotations, so it can be run offline.

Input

01

Output

0 0

Tools

hmtnote:

Human mitochondrial variants annotation using HmtVar.

holodeck_eval

simulation evaluation alignment accuracy benchmarking

Evaluate alignment accuracy of holodeck-simulated reads

Input

012

Output

0 0

Tools

holodeck:

Modern NGS read simulator written in Rust.

holodeck_methylate

simulation vcf methylation bisulfite benchmarking

Generate a methylation-annotated VCF from a reference genome with holodeck

Input

0123

Output

0 0

Tools

holodeck:

Modern NGS read simulator written in Rust.

holodeck_mutate

simulation vcf mutations variants benchmarking

Generate a VCF of random mutations from a reference genome with holodeck

Input

012

Output

0 0

Tools

holodeck:

Modern NGS read simulator written in Rust.

holodeck_simulate

simulation fastq reads sequencing benchmarking

Simulate sequencing reads from a reference genome with holodeck

Input

0123

Output

0 0 0 0 0

Tools

holodeck:

Modern NGS read simulator written in Rust.

homer_annotatepeaks

annotations peaks bed

Annotate peaks with HOMER suite

Input

01fastagtf

Output

0 0 0

Tools

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

homer_findpeaks

annotation peaks enrichment

Find peaks with HOMER suite

Input

01uniqmap

Output

0 0

Tools

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

homer_maketagdirectory

peaks bed bam sam

Create a tag directory with the HOMER suite

Input

01fasta

Output

0 0 0 0 0 0

Tools

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

homer_makeucscfile

peaks bed bedGraph

Create a UCSC bed graph with the HOMER suite

Input

01

Output

0 0

Tools

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

homer_pos2bed

peaks bed pos

Convert HOMER peak files to BED format

Input

01

Output

0 0

Tools

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

hostile_clean

hostile decontamination human removal host removal clean

Removes host reads from short- and long-read FASTQ sequencing files

Input

0101

Output

0 0 0

Tools

hostile:

Hostile: accurate host decontamination

hostile_fetch

hostile decontamination human removal download

Downloads required reference genomes for Hostile

Input

index_name

Output

0 0

Tools

hostile:

Hostile: accurate host decontamination

hpsuissero

bacteria fasta haemophilus

Serotype prediction of Haemophilus parasuis assemblies

Input

01

umi fastq deduplication hamming-distance clustering

HUMID is a tool to quickly and easily remove duplicate reads from FASTQ files, with or without UMIs.

Input

0101

demultiplex lexogen fastq

Demultiplex paired-end FASTQ files from QuantSeq-Pool

Input

012

Output

0 0 0 0

idr

IDR peaks ChIP-seq ATAC-seq

Measures reproducibility of ChIP-seq and ATAC-seq peaks using IDR (Irreproducible Discovery Rate)

Input

012

Output

0 0 0 0 0

igv_js

igv igv.js js genome browser

igv.js is an embeddable interactive genome visualization component

Input

012

Output

0 0 0 0

Tools

imaging ome-tif staging MCMICRO

Staging module transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.

Input

01

bacteria fasta mobile genetic elements integron

Detect integrons in DNA sequences

Input

01

phylogeny newick maximum likelihood

Produces a Newick-format phylogeny from a multiple sequence alignment using the maximum likelihood method. Capable of handling bacterial genome-sized alignments.

Input

012tree_telmclustmdefpartitions_equalpartitions_proportionalpartitions_unlinkedguide_treesitefreq_inconstraint_treetrees_zsuptreetrees_rf

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

irescue

scRNA-seq transposons repeats

Quantification of transposable elements expression in scRNA-seq

Input

01genomebed

Output

0 0 0 0 0

islandpath

genomes genomic islands prediction

Genomic island prediction in bacterial and archaeal genomes

Input

01

jasminesv jasmine structural variants vcf bam

Jointly Accurate Sv Merging with Intersample Network Edges

Input

012340101chr_norm

Output

0 0 0

jellyfish_count

k-mer DNA substrings

Efficiently counts k-mers from DNA sequencing reads using a fast, memory-efficient, parallelized algorithm

Input

01kmer_lengthsize

Output

0 0

Tools

jellyfish:

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence.
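To make the k-mer definition above concrete, here is a minimal Python sketch of naive k-mer counting. It is only an illustration of the idea and does not reproduce Jellyfish's hashing, parallelism, or on-disk format; the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers;
    # the range is empty when the sequence is shorter than k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, count)
```

A dictionary-based counter like this is fine for small inputs; dedicated tools exist because the number of distinct k-mers in real sequencing data quickly exceeds what this approach handles comfortably.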

jellyfish_dump

k-mer DNA substrings

Dumps the results from a jellyfish binary file into a human readable format

Input

01

Output

0 0

Tools

jellyfish:

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence.

juicertools_pre

hic contact map genomics

Create a multi-resolution .hic contact matrix for analysis with Juicer

Input

01012

Output

0 0

Tools

juicertools:

Visualization and analysis software for Hi-C data

jupyternotebook

Python Jupyter jupytext papermill notebook reports

Render Jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.

Input

01parametersinput_fileskernel_

Output

0 0 0 0 0 0

Tools

jupytext:

Jupyter notebooks as plain text scripts or markdown documents

papermill:

Parameterize, execute, and analyze notebooks

nbconvert:

Convert notebooks to other formats

jvarkit_dict2bed

vcf bcf bed dict dictionary fasta fai

Extract BED file from hts files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

Input

01

Output

0 0

Tools

jvarkit:

Java utilities for Bioinformatics.

jvarkit_dict2xml

vcf bcf bed dict dictionary fasta fai

Convert the dictionary of hts files (VCF, BAM, CRAM, DICT, etc.) to XML

Input

01

Output

0 0

Tools

jvarkit:

bam cram depth coverage xml svg visualization

Plot whole genome coverage from BAM/CRAM file as SVG

Input

012010101

Output

0 0

Tools

jvarkit:

Java utilities for Bioinformatics.

kaiju_kaiju

classify metagenomics fastq taxonomic profiling

Taxonomic classification of metagenomic sequence data using a protein reference database

Input

01db

Output

0 0

Tools

kaiju:

Fast and sensitive taxonomic classification for metagenomics

kaiju_kaiju2krona

taxonomy visualisation krona chart metagenomics

Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.

Input

01db

Output

0 0

Tools

kaiju:

Fast and sensitive taxonomic classification for metagenomics

kaiju_kaiju2table

classify metagenomics taxonomic profiling

Convert Kaiju's output file into a summary table at a given taxonomic rank

Input

01dbtaxon_rank

Output

0 0

Tools

kaiju:

Fast and sensitive taxonomic classification for metagenomics

kaiju_mergeoutputs

classify metagenomics fastq taxonomic profiling

Merge two tab-separated output files of Kaiju and Kraken in the column format

Input

012db

Output

0 0

Tools

kaiju:

Fast and sensitive taxonomic classification for metagenomics

kaiju_mkfmi

classify metagenomics fastq taxonomic profiling database index

Make Kaiju FMI-index file from a protein FASTA file

Input

01nodes_dmpnames_dmpkeep_intermediate

Output

0 0 0 0

Tools

kaiju:

Fast and sensitive taxonomic classification for metagenomics

kalign_align

alignment MSA genomics

Aligns sequences using kalign

Input

01compress

Output

0 0 0

Tools

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

kallisto_index

kallisto kallisto/index index

Create kallisto index

Input

01

Output

0 0

Tools

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

kallisto_quant

quant kallisto pseudoalignment

Computes equivalence classes for reads and quantifies abundances

Input

0101gtfchromosomesfragment_lengthfragment_length_sd

Output

0 0 0 0

Tools

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

kallistobustools_count

scRNA-seq count single-cell kallisto bustools

Quantifies scRNA-seq data from FASTQ files using kb-python.

Input

0101t2gt1ct2ctechnologyworkflow_mode

Output

0 0 0

Tools

kb:

kallisto and bustools are wrapped in an easy-to-use program called kb

kallistobustools_ref

scRNA-seq count single-cell kallisto bustools index

Index creation for kb count quantification of single-cell data.

Input

0101workflow_mode

Output

0 0 0 0 0 0 0

Tools

kb:

kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.

kat_hist

k-mer histogram count

Creates a histogram of the number of distinct k-mers having a given frequency.

Input

01

Output

0 0 0 0 0 0 0

Tools

kat:

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts
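As an illustration of what "the number of distinct k-mers having a given frequency" means, the following Python sketch builds such a histogram (often called a k-mer spectrum) from a handful of sequences. It is a naive in-memory version, not KAT's implementation, and the function name `kmer_spectrum` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def kmer_spectrum(sequences: Iterable[str], k: int) -> Counter:
    """Map each frequency to the number of distinct k-mers seen that often."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    kmer_counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer_counts[seq[i:i + k]] += 1
    # Second pass: count how many distinct k-mers share each frequency.
    return Counter(kmer_counts.values())


if __name__ == "__main__":
    spectrum = kmer_spectrum(["ACGTACGT"], 3)
    for frequency in sorted(spectrum):
        print(frequency, spectrum[frequency])
```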

khmer_normalizebymedian

digital normalization khmer k-mer counting

Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.

Input

012

Output

0 0

Tools

khmer:

khmer k-mer counting library

khmer_trimlowabund

quality control genomics filtering reads khmer k-mer

Removes low abundance k-mers from FASTA/FASTQ files

Input

01

Output

0 0

Tools

khmer:

khmer k-mer counting library

khmer_uniquekmers

khmer k-mer effective genome size

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

Input

01kmer_size

Output

0 0 0

Tools

khmer:

khmer k-mer counting library

kleborate

screen assembly Klebsiella pneumoniae

Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).

Input

01

k-mer count genome assembly

KmerGenie estimates the best k-mer length for genome de novo assembly

Input

01

Output

0 0 0 0 0 0

knotannotsv

annotation structural variants annotsv tsv html

Simple tool to create a customizable html file (to be displayed on a web browser) from an AnnotSV output

Input

012

ancient DNA adapter removal clipping trimming merging collapsing preprocessing bayesian

Bayesian reconstruction of ancient DNA fragments

Input

01

genome annotation gff3 gtf liftover

Uses Liftoff to accurately map annotations in GFF or GTF between assemblies of the same, or closely-related species

Input

01ref_faref_annotationref_db

Output

0 0 0 0

lima

isoseq ccs primer pacbio barcode

lima - The PacBio Barcode Demultiplexer and Primer Remover

Input

01primers

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0

limma_differential

differential expression microarray limma

Runs a differential expression analysis with limma

Input

012345012

Output

0 0 0 0 0 0 0

Tools

limma:

Linear Models for Microarray Data

links

scaffold long-reads genomics

LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes). It is also used to scaffold contig pairs linked by ARCS/ARKS. This module is for LINKS >=2.0.0 and does not support MPET input.

Input

0101

scaffolding long read assembly correction

A genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps:

1. Tigmint cuts the draft assembly at potentially misassembled regions
2. ntLink is then used to scaffold the corrected assembly
3. ARKS is used for further scaffolding (optional)

Input

0101commandspangenomesizelongmap

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

lsa_cosine

similarity cosine clustering rnaseq heatmap

Calculates the cosine similarity matrix between samples based on a gene expression matrix.

Input

01

Output

0 0 0

Tools

lsa:

Latent Semantic Analysis (LSA) package for R.

pheatmap:

Pretty Heatmaps package for R.
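For readers unfamiliar with the measure, a cosine similarity matrix between samples can be sketched as below. This is a plain-Python illustration under the assumption that each sample is a vector of expression values in the same gene order; the module itself relies on the R packages listed above, and the function name `cosine_similarity_matrix` is hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity_matrix(
    expression: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Return pairwise cosine similarities between sample expression vectors."""
    samples = list(expression)
    # Euclidean norm of each sample vector.
    norms = {s: math.sqrt(sum(v * v for v in expression[s])) for s in samples}
    matrix: Dict[str, Dict[str, float]] = {}
    for a in samples:
        matrix[a] = {}
        for b in samples:
            dot = sum(x * y for x, y in zip(expression[a], expression[b]))
            denom = norms[a] * norms[b]
            # An all-zero sample has no direction, so similarity is undefined.
            matrix[a][b] = dot / denom if denom else float("nan")
    return matrix


if __name__ == "__main__":
    toy = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [2.0, 0.0]}
    for sample, row in cosine_similarity_matrix(toy).items():
        print(sample, row)
```

Because cosine similarity ignores vector length, samples `s1` and `s3` in the toy data count as identical in profile even though their magnitudes differ.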

ltrfinder

genomics annotation parallel repeat long terminal retrotransposon retrotransposon

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

Input

01

Output

0 0 0 0

Tools

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

ltrharvest

genomics genome annotation repeat transposons retrotransposons

Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain

Input

01

Output

0 0 0 0

Tools

LTR_HARVEST_parallel:

A Perl wrapper for LTR_harvest

gt:

The GenomeTools genome analysis system

ltrretriever

genomics annotation repeat long terminal repeat retrotransposon

Identifies LTR retrotransposons using LTR_retriever

Input

metagenomeharvestfindermgescannon_tgca

Output

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

Tools

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

ltrretriever_lai

genomics annotation repeat long terminal retrotransposon retrotransposon stats qc

Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments

Input

01pass_listannotation_outmonoploid_seqs

Output

0 0 0

Tools

lai:

Assessing genome assembly quality using the LTR Assembly Index (LAI)

ltrretriever_ltrretriever

genomics annotation repeat long terminal repeat retrotransposon

Identifies LTR retrotransposons using LTR_retriever

Input

01harvestfindermgescannon_tgca

Output

0 0 0 0 0 0 0

Tools

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

macrel_contigs

AMP antimicrobial peptides genome mining metagenomes peptide prediction

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputting all the predicted antimicrobial peptides found.

Input

01

Output

0 0 0 0 0 0 0

Tools

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

macs2_callpeak

alignment atac-seq chip-seq peak-calling

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

Input

012macs2_gsize

Output

0 0 0 0 0 0

Tools

macs2:

Model Based Analysis for ChIP-Seq data

macs3_callpeak

alignment atac-seq chip-seq peak-calling

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

Input

012macs3_gsize

Output

0 0 0 0 0 0

Tools

macs3:

Model Based Analysis for ChIP-Seq data

macse_refinealignment

multiple sequence alignment codon-aware fasta

improves the input nucleotide alignment in a codon-aware manner

Input

01

Output

0 0 0

Tools

macse:

MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons.

macsyfinder_download

genomics protein macromolecular systems models database

Download MacSyFinder models using msf_data

Input

model_name

Output

0 0 0

Tools

macsyfinder:

Detection of macromolecular systems in protein datasets using systems modelling and similarity search

macsyfinder_search

genomics protein macromolecular systems hmmer search

Search for macromolecular systems in protein datasets using MacSyFinder

Input

01modelsmodel_names

Output

0 0 0 0 0 0 0 0

Tools

macsyfinder:

Detection of macromolecular systems in protein datasets using systems modelling and similarity search

mafft

fasta msa multiple sequence alignment

Multiple sequence alignment using MAFFT

Input

0101010101010

Output

fas versions

Tools

pigz:

Parallel implementation of the gzip algorithm.

mafft_align

fasta msa multiple sequence alignment

Multiple sequence alignment using MAFFT

Input

010101010101compress

Output

0 0 0

Tools

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

pigz:

Parallel implementation of the gzip algorithm.

mafft_guidetree

fasta msa guide tree

Guide tree rendering using MAFFT

Input

01

Output

0 0

Tools

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

mageck_count

sort functional genomics sgRNA CRISPR-Cas9

mageck count for functional genomics; reads are usually mapped to a specific sgRNA

Input

01library

Output

0 0 0

Tools

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

mageck_mle

sort maximum-likelihood CRISPR

Maximum-likelihood estimation of gene essentiality

Input

01design_matrix

Output

0 0 0

Tools

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

mageck_test

sort rra CRISPR

MAGeCK test performs robust rank aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.

Input

01

Output

0 0 0 0

Tools

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

magus_align

MSA alignment genomics graph

Multiple Sequence Alignment using Graph Clustering

Input

0101compress

Output

0 0 0

Tools

magus:

Multiple Sequence Alignment using Graph Clustering

magus_guidetree

MSA guidetree genomics graph

Multiple Sequence Alignment using Graph Clustering

Input

01

Output

0 0

Tools

ancient DNA DNA damage NGS damage patterns bam

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

Input

01fasta

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

mash_dist

distance estimate reference query

Calculate Mash distances between reference and query sequences

Input

01reference

Output

0 0

Tools

mash:

Fast sequence distance estimator that uses MinHash
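The MinHash idea behind this kind of distance estimate can be sketched in a few lines of Python. This is a simplified illustration, not Mash's implementation: it assumes SHA-1 as the hash function, ignores reverse-complement strands, and uses hypothetical function names. The distance formula `-ln(2j / (1 + j)) / k` is the Mash distance as I understand it from the Mash publication.

```python
import hashlib
import math
from typing import List


def _hash_kmer(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(sequence: str, k: int, size: int) -> List[int]:
    """Keep the `size` smallest hash values over all distinct k-mers."""
    sequence = sequence.upper()
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return sorted({_hash_kmer(kmer) for kmer in kmers})[:size]


def jaccard_estimate(sketch_a: List[int], sketch_b: List[int], size: int) -> float:
    """Estimate Jaccard similarity from two bottom sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard: float, k: int) -> float:
    """Convert a Jaccard estimate into a Mash-style distance."""
    if jaccard <= 0.0:
        return 1.0
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


if __name__ == "__main__":
    a = bottom_sketch("ACGTACGTTGCAACGT", k=4, size=8)
    b = bottom_sketch("ACGTACGTTGCATCGT", k=4, size=8)
    j = jaccard_estimate(a, b, size=8)
    print(j, mash_distance(j, k=4))
```

The sketch size controls the trade-off: larger sketches give a more stable Jaccard estimate at the cost of more memory per sequence.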

mash_screen

screen containment contamination taxonomic assignment

Screens query sequences against large sequence databases

Input

0101

Output

0 0

Tools

mash:

Fast sequence distance estimator that uses MinHash

mash_sketch

mash mash/sketch minhash reduced representations sequences sketch

Creates vastly reduced representations of sequences using MinHash

Input

maxquant:

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.

mcquant

quantification image_analysis mcmicro highly_multiplexed_imaging

Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.

Input

010101

checksum MD5 128 bit

Create MD5 (128-bit) checksums

Input

01as_separate_files

Output

0 0
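For reference, an MD5 checksum of a file can be computed with Python's standard library as sketched below. This only illustrates the checksum itself, not how the module is implemented; the function name `md5_checksum` and the chunk size are arbitrary choices.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in fixed-size chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for file_path in sys.argv[1:]:
        print(md5_checksum(file_path), file_path)
```

MD5 is suitable for detecting accidental corruption during transfer or storage; it should not be relied on where resistance to deliberate tampering matters.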

mdust

genomics dna low-complexity masking

mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences

Input

01

profile metagenomics melon classification long reads nanopore

Performs taxonomic profiling of long metagenomic reads against the melon database

Input

01databasek2_db

Output

0 0 0 0 0

memote_report

metabolic model sbml quality control

Generate HTML quality report snapshot for genome-scale metabolic models

Input

01

Output

0 0

Tools

memote:

Benchmark and compare genome-scale metabolic models and generate HTML reports.

memote_run

metabolic model sbml quality control testing

Run memote test suite and collect results as JSON

Input

01

Output

0 0

Tools

memote:

Benchmark and compare genome-scale metabolic models and generate HTML reports.

meningotype

fasta Neisseria meningitidis serotype

Serotyping of Neisseria meningitidis assemblies

Input

01

Output

0 0 0

merfin_hist

assembly evaluation quality completeness

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

Input

0101lookup_tableseqmerspeak

Output

0 0 0

Tools

genomics metagenomics taxonomy short reads long reads kmer k-mer metacache build reference

Taxonomic profiling database building with MetaCache

Input

01taxonomyseq2taxid

Output

0 0

Tools

metacache:

MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin. It aims to reduce the memory requirement usually associated with k-mer based methods while retaining their speed. MetaCache uses locality sensitive hashing to quickly identify candidate regions within one or multiple reference genomes. A read is then classified based on the similarity to those regions.

For an independent comparison to other tools in terms of classification accuracy see the LEMMI benchmarking site.

The latest version of MetaCache classifies around 60 Million reads (of length 100) per minute against all complete bacterial, viral and archaea genomes from NCBI RefSeq Release 97 running with 88 threads on a workstation with 2 Intel(R) Xeon(R) Gold 6238 CPUs.

metacache_query

metagenomics classification metacache

Metacache query command for taxonomic classification

Input

01dbdo_abundances

Output

0 0 0

Tools

metacache:

MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin.

metaeuk_easypredict

genomics annotation fasta

Annotation of eukaryotic metagenomes using MetaEuk

Input

01database

Output

0 0 0 0 0

Tools

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

metamaps_classify

metamaps long reads metagenomics taxonomy

Strain-level metagenomic assignment

Input

01234database_folder

Output

0 0 0 0 0 0 0 0

Tools

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

metamaps_mapdirectly

metamaps long reads metagenomics taxonomy

Maps long reads to a metamaps database

Input

01database

Output

0 0 0 0 0

Tools

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

METAMDBG_ASM

assembly long reads metagenome metagenome assembler

Metagenome assembler for long-read sequences (HiFi and ONT).

Input

01input_type

Output

0 0 0

Tools

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

metaphlan_makedb

metaphlan index database metagenomics

Build MetaPhlAn database for taxonomic profiling.

Input

NO input

Output

0 0

Tools

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

metaphlan_mergemetaphlantables

metagenomics classification merge table profiles

Merges output abundance tables from MetaPhlAn4

Input

01

Output

0 0

Tools

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

metaphlan_metaphlan

metagenomics classification fastq fasta sam

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

Input

01db_metaphlan_latestsave_samfile

Output

0 0 0 0 0

Tools

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

metaphlan3_mergemetaphlantables

metagenomics classification merge table profiles

Merges output abundance tables from MetaPhlAn3

Input

01

Output

0 0

Tools

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

metaphlan3_metaphlan3

metagenomics classification fastq bam fasta

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

Input

01db_metaphlan

Output

0 0 0 0

Tools

methylation bisulfite em-seq unconverted bam qc

Filter or tag unconverted reads in methylation sequencing data with methylsieve. For maximum throughput it can also run inline in the alignment pipe (aligner | methylsieve | sort) rather than as a standalone step.

Input

01012

Output

0 0 0

mgikit_demultiplex

demultiplex mgi fastq

Demultiplex MGI fastq files

Input

012

Output

0 0 0 0 0 0 0 0 0 0

Tools

mgikit demultiplex:

Demultiplex MGI fastq files

midas_run

bacteria metagenomic abundance

A tool to estimate bacterial species abundance

Input

0101mode

Output

0 0

Tools

assembler short-read de Bruijn

Minia is a short-read assembler based on a de Bruijn graph

Input

01

Output

0 0 0 0 0

miniasm

assembly pacbio nanopore

A very fast OLC-based de novo assembler for noisy long reads

Input

012

Output

0 0 0

minibwa_index

index fasta genome reference

Create a minibwa index for a reference genome

Input

01

Output

0 0

Tools

minibwa:

Aligning short and accurate long reads to a reference genome, a reimplementation of bwa-mem.

minibwa_map

map mem alignment fastq bam sam

Align fastq reads to a fasta reference using minibwa

Input

010101sort_bam

Output

0 0 0 0

Tools

minibwa:

Aligning short and accurate long reads to a reference genome, a reimplementation of bwa-mem.

samtools:

Tools for dealing with SAM, BAM and CRAM files

minimac4_compressref

haplotypes reference compression genomics

Compression of a reference panel for genotype imputation to .msav format

Input

012

Output

0 0

Tools

minimac4:

Computationally efficient genotype imputation

minimac4_impute

impute haplotype genomics

Imputation of genotypes using a reference panel

Input

01234567

Output

0 0

Tools

minimac4:

Computationally efficient genotype imputation

minimap2_align

align fasta fastq genome paf reference

A versatile pairwise aligner for genomic and spliced nucleotide sequences

Input

0101bam_formatbam_index_extensioncigar_paf_formatcigar_bam

Output

0 0 0 0

Tools

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

minimap2_index

index fasta reference

Provides fasta index required by minimap2 alignment.

Input

01

Output

0 0

Tools

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

miniprot_align

align fasta protein genome paf gff

A versatile pairwise aligner for genomic and protein sequences

Input

0101

Output

0 0 0

Tools

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

miniprot_index

index fasta genome reference

Provides fasta index required by miniprot alignment.

Input

01

Output

0 0

Tools

mitochondrial genome reference genome NCBI

Download a mitochondrial genome to be used as reference for MitoHiFi.

NOTE: An optional NCBI API key can be supplied to MITOHIFI_FINDMITOREFERENCE. This should be set using Nextflow's secrets functionality:

nextflow secrets set NCBI_API_KEY <key>

See https://www.nextflow.io/docs/latest/secrets.html for more information.

Input

01

Output

0 0

Tools

findMitoReference.py:

Fetch mitochondrial genome in Fasta and Genbank format from NCBI

MITOHIFI_MITOHIFI

mitochondrion chloroplast PacBio

A Python workflow that assembles mitogenomes from PacBio HiFi reads

Input

01012input_modemito_code

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

mitorsaw_haplotype

heteroplasmy homoplasmy mitochondrial mitorsaw haplotype

Mitorsaw analyses mitochondrial variants and identifies heteroplasmy and homoplasmy

Input

012012include_hap_statsinclude_debug_output

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

subclonal deconvolution genomics cancer evolution

Subclonal deconvolution of cancer genome sequencing data.

Input

01

Output

0 0 0 0 0 0 0

mobsuite_recon

bacteria plasmid cluster

A tool to reconstruct plasmids in bacterial assemblies

Input

01

Output

0 0 0 0 0

Tools

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

modkit_bedmethyltobigwig

long-read ont methylation

Convert a bedMethyl file to bigWig format using modkit

Input

0101modcodes

Output

0 0

Tools

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.

modkit_callmods

methylation ont long-read

Call mods from a modbam, creating a new modbam with probabilities set to 100% if a base modification is called or 0% if called canonical

Input

01

Output

0 0 0

Tools

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.

modkit_pileup

methylation ont long-read

A bioinformatics tool for working with modified bases

Input

01201201

Output

0 0 0

Tools

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data

modkit_repair

methylation ont long-read

Repair the MM/ML tags on trimmed or hard-clipped ONT reads using untrimmed ONT reads.

Input

012

Output

0 0 0

Tools

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data.

molkartgarage_clahe

clahe image_processing imaging correction

Contrast-limited adaptive histogram equalization (CLAHE) on single-channel tif images.

Input

01

Output

0 0 0

Tools

alignment MSA genomics structure

Aligns protein structures using mTM-align

Input

01compress

Output

0 0 0 0

Tools

mTM-align:

Algorithm for structural multiple sequence alignments

demultiplexing hashing-based deconvolution single-cell

Identify singlets, doublets and negative cells from multiplexing experiments. Annotate singlets by tags.

Input

012

Output

0 0 0 0

multivcfanalyzer

vcf ancient DNA aDNA SNP GATK UnifiedGenotyper SNP table

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

Input

01010101allele_freqsgenotype_qualitycoveragehomozygous_freqheterozygous_freq01

Output

0 0 0 0 0 0 0 0 0 0 0 0 0

mummer

align genome fasta

MUMmer is a system for rapidly aligning entire genomes

Input

012

Output

0 0

muscle

msa multiple sequence alignment phylogeny

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

Input

01

Output

0 0 0 0 0 0 0 0 0

muscle5_super5

align msa multiple sequence alignment

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.

Input

01compress

Output

0 0 0

Tools

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

muse_call

variant calling somatic wgs wxs vcf

Pre-filters and calculates position-specific summary statistics using the Markov substitution model

Input

012345

Output

0 0

Tools

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

muse_sump

variant calling somatic wgs wxs vcf

Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants

Input

0123

Output

0 0 0 0

Tools

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

mygene

mygene go annotation

Fetch the GO concepts for a list of genes

Input

01

Output

0 0 0

mykrobe_predict

fastq bam antimicrobial resistance

AMR predictions for supported species

Input

01species

Output

0 0 0

Tools

bam fasta fastq qc nanopore

Compare multiple runs of long read sequencing data and alignments

Input

01

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

nanofilt

nanopore filtering QC

Filtering and trimming of Oxford Nanopore Sequencing data

Input

01summary_file

quality control qc fastq sequencing summary nanopore

Run NanoPlot on nanopore-sequenced reads

Input

01

fasta Neisseria gonorrhoeae serotype

Serotyping Neisseria gonorrhoeae assemblies

Input

01

metagenomics statistics coverage complexity

write your description here

Input

metareadsformatmode

Output

meta versions npa npc npl npo

nonpareil_curve

metagenomics statistics coverage complexity redundancy diversity visualisation

Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file

Input

01

Output

0 0

Tools

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

nonpareil_nonpareil

metagenomics statistics coverage redundancy diversity complexity

Calculate metagenome redundancy curve from FASTQ files

Input

01formatmode

Output

0 0 0 0 0

Tools

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

nonpareil_nonpareilcurvesr

metagenomics statistics coverage redundancy diversity complexity multiqc

Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files

Input

01

Output

0 0 0 0 0

Tools

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

nonpareil_set

metagenomics statistics coverage complexity redundancy diversity visualisation

Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image

Input

01

Output

0 0

Tools

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

nucmer

align nucleotide sequence

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

Input

012

cnv bam tumor/normal

Calls CNVs in bam files from tumor patients

Input

01234bedfasta

Output

0 0 0 0

openms_decoydatabase

decoy database openms proteomics fasta

Create a decoy peptide database from a standard FASTA database.

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses
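To illustrate what a decoy database is, the sketch below appends a reversed copy of every protein sequence to a FASTA database. It assumes the simple sequence-reversal strategy and a hypothetical `DECOY_` header prefix; it is not OpenMS code, and the actual tool may offer other strategies and naming conventions.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_reversed_decoys(records: List[Record], prefix: str = "DECOY_") -> List[Record]:
    """Return targets interleaved with reversed-sequence decoys."""
    combined: List[Record] = []
    for header, sequence in records:
        combined.append((header, sequence))
        # The decoy keeps length and composition but not the real sequence.
        combined.append((prefix + header, sequence[::-1]))
    return combined


if __name__ == "__main__":
    database = parse_fasta(">protein1\nMKTAY\nIAK\n")
    for header, sequence in add_reversed_decoys(database):
        print(f">{header}\n{sequence}")
```

Searching spectra against targets and decoys together is what allows downstream tools to estimate a false discovery rate from the number of decoy hits.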

openms_fileconverter

file conversion mass spectrometry mzml mzxml openms proteomics

Converts between different mass spectrometry file formats (e.g. mzML, mzXML, mgf, mzData, dta, dta2d, featureXML, consensusXML, idXML).

Input

012

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_filefilter

filter mzML openms proteomics

Extracts or manipulates portions of data from peak, feature or consensus-feature files.

Input

01

Output

0 0 0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_idfilter

filter idXML openms proteomics

Filters peptide/protein identification results by different criteria.

Input

012

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_idmassaccuracy

mass_error openms proteomics

Calculates a distribution of the mass error from given mass spectra and IDs.

Input

012

Output

0 0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_idmerger

merge idXML openms proteomics

Merges several idXML files into one idXML file.

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_idripper

split idXML openms proteomics

Split a merged identification file into its originating identification files

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_idscoreswitcher

switch score idXML openms proteomics

Switches between different scores of peptide or protein hits in identification data

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_peakpickerhires

peak picking mzml openms proteomics

A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_peptideindexer

refresh idXML openms proteomics

Refreshes the protein references for all peptide hits.

Input

012

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_psmfeatureextractor

features idXML openms percolator proteomics psm

Computes extra features for each input PSM for use with Percolator rescoring.

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openms_textexporter

export openms proteomics text tsv

Exports various OpenMS XML formats (featureXML, consensusXML, idXML, mzML) to a human-readable text format.

Input

01

Output

0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

openmsthirdparty_cometadapter

search engine fasta mzml openms proteomics

Annotates MS/MS spectra using Comet.

Input

012

Output

0 0 0 0

Tools

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

CometAdapter:

Annotates MS/MS spectra using Comet.

Comet:

Comet is an open source tandem mass spectrometry (MS/MS) sequence database search tool.

opt_flip

opt opt flip transcripts off-target probes align probes

flip corrects probes that are aligning to the opposite strand of their intended target genes by reverse complementing them

Input

01012

Output

0 0

Tools

opt:

opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity
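The strand correction that opt_flip describes amounts to reverse complementing any probe that aligned to the wrong strand. A minimal sketch of that idea follows; the helper names and the convention that "-" marks an opposite-strand alignment are assumptions for illustration and are not taken from opt itself.

```python
# Hypothetical helpers illustrating the idea behind "flip"; not opt's own code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_if_antisense(probe: str, aligned_strand: str) -> str:
    """Reverse-complement a probe that aligned to the opposite ('-') strand.

    Assumption: aligned_strand is '+' when the probe matches its intended
    target in the expected orientation and '-' otherwise.
    """
    return reverse_complement(probe) if aligned_strand == "-" else probe


print(flip_if_antisense("ATGC", "-"))
```

Probes that already align in the expected orientation pass through unchanged, so the function can be applied to a whole probe set.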

opt_stat

opt opt stat transcripts binding predictions off-target probes align probes summary stats

stat summarizes opt binding predictions

Input

0101gene_synonyms

Output

0 0

Tools

opt:

opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity

opt_track

opt opt track transcripts off-target probes align probes target transcriptome

track aligns query probe sequences to any target transcriptome

Input

01012

Output

0 0

Tools

opt:

opt is a simple program that aligns probe sequences to transcript sequences to detect potential off-target probe activity

optitype

hla-typing ILP HLA-I

Perform HLA-I typing of sequencing data

Input

012

Output

0 0 0

orfipy

orfipy orfs open reading frames

orfipy is a tool written in Python/Cython to extract ORFs in an extremely fast and flexible manner.

Input

01

covid pangolin lineage

Phylogenetic Assignment of Named Global Outbreak LINeages

Input

01

Output

report versions

pangolin_run

covid pangolin lineage run

Phylogenetic Assignment of Named Global Outbreak LINeages

Input

01db

Output

0 0

Tools

pangolin:

Phylogenetic Assignment of Named Global Outbreak LINeages

pangolin_updatedata

covid pangolin database lineage updatedata

Phylogenetic Assignment of Named Global Outbreak LINeages

Input

dbname

Output

0 0

Tools

pangolin:

Phylogenetic Assignment of Named Global Outbreak LINeages

parabricks_applybqsr

bqsr bam GPU-accelerated base quality score recalibration

NVIDIA Clara Parabricks GPU-accelerated application of Base Quality Score Recalibration (BQSR).

Input

0101010101

Output

0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_dbsnp

annotation dbsnp vcf germline

NVIDIA Clara Parabricks GPU-accelerated annotation of variant calls based on the dbSNP database

Input

0123

Output

0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_deepvariant

variant deep variant vcf haplotypecaller germline

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating DeepVariant.

Input

012301

Output

0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_fq2bam

align sort bqsr duplicates

NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note that this nf-core module requires files to be copied into the working directory rather than symlinked.

Input

0101010101output_fmt

Output

0 0 0 0 0 0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_fq2bammeth

align sort bqsr duplicates bwameth

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence

Input

010101known_sites

Output

0 0 0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_genotypegvcf

joint-genotyping gvcf vcf genotypegvcf germline

NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs

Input

0101

Output

0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_haplotypecaller

variant vcf haplotypecaller germline

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK HaplotypeCaller.

Input

012301

Output

0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_indexgvcf

vcf gvcf tbi idx index GPU-accelerated

NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.

Input

01

Output

0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_minimap2

align sort bqsr duplicates long read

NVIDIA Clara Parabricks GPU-accelerated minimap2 for aligning long read sequences against a large reference database using an accelerated KSW2 to convert FASTQ to BAM/CRAM.

Input

01010101output_fmt

Output

0 0 0 0 0 0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_mutectcaller

variant vcf mutect2 mutect somatic

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

Input

01234501panel_of_normalspanel_of_normals_index

Output

0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_rnafq2bam

align star rna

This tool is the equivalent of fq2bam for RNA-Seq samples: it receives inputs in FASTQ format, performs alignment with the splice-aware STAR algorithm, optionally marks duplicate reads, and outputs an aligned BAM file ready for variant and fusion calling.

Input

010101qc_metricsmark_duplicates

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_starfusion

fusion starfusion rna

This tool uses the GPU to perform fusion calling for RNA-Seq samples, utilizing the STAR-Fusion algorithm. This requires input of a genome resource library, in accordance with the original STAR-Fusion tool, and outputs candidate fusion transcripts.

Input

0101

Output

0 0 0

Tools

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

parabricks_starfusion_build

download starfusion build

Download the STAR-Fusion genome resource library required to run the STAR-Fusion caller

Input

0101fusion_annot_libdfam_speciespfam_urldfam_urlsannot_filter_url

Output

0 0

Tools

star-fusion:

Fusion calling algorithm for RNAseq data

parabricks_stargenomegenerate

index fasta genome reference

This is nearly identical to the existing star/genomegenerate module; however, it runs an older STAR version (2.7.2a) that is required for Parabricks compatibility.

Input

0101

Output

0 0

Tools

paraphase long-read HiFi

HiFi-based caller for highly homologous genes

Input

0120101

Output

0 0 0 0 0 0 0 0

paraphrase

long-read paraphrase annotate

Parse and annotate paraphrase JSONs

Input

01201tsv_output

Output

0 0 0

parsesdrf_convert

sdrf sdrf-pipelines samplesheet proteomics immunopeptidomics mhcquant openms maxquant msstats normalyzerde diann metadata

Convert an SDRF (Sample and Data Relationship Format) file into a pipeline-specific samplesheet/configuration using the parse_sdrf convert-<format> subcommands of the sdrf-pipelines package. The chosen format selects the subcommand; the module owns the output filenames and emits one tuple per supported format (mhcquant, openms, maxquant, msstats, normalyzerde, diann).

Input

0123format

Output

0 0 0 0 0 0 0

Tools

sdrf-pipelines:

A set of tools to validate and convert SDRF files for proteomics pipelines.

parsnp

alignment bacteria phylogeny microbial genomics core genome SNP

Parsnp is a command-line-tool for efficient microbial core genome alignment and SNP detection.

Input

01reference

Output

0 0 0 0 0 0

pasty

bacteria serogroup fasta assembly

Serogroup Pseudomonas aeruginosa assemblies

Input

01

Output

0 0 0 0

pbbam_pbmerge

pbbam pbmerge bam

The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.

Input

01

Output

0 0 0

Tools

pbbam:

PacBio BAM C++ library

pbccs

ccs pacbio isoseq subreads

PacBio CCS - Generate Highly Accurate Single-Molecule Consensus Reads

Input

012chunk_numchunk_on

Output

0 0 0 0 0 0

pbcpgtools_alignedbamtocpgscores

methylation cpg pacbio

Converts aligned BAM files into CpG methylation scores

Input

012

Output

0 0 0 0 0 0 0 0 0 0

Tools

pbcpgtools:

Collection of tools for the analysis of CpG data

pbjasmine

genomics methylation bam pacbio

Identify specific base modifications in PacBio HiFi reads by analyzing polymerase kinetic signatures

Input

01

Output

0 0

pbmarkdup

markdup bam fastq fasta

Takes one or multiple sequencing chips of an amplified library as HiFi reads and marks or removes duplicates.

Input

01

Output

0 0 0 0

pbmm2_align

align pacbio genomics

Alignment with PacBio's minimap2 frontend

Input

0101

Output

0 0

Tools

imaging registration ome-tif Staging MCMICRO

Formatting PhenoImager TIFF output files into stacked and normalized OME-TIFF files per cycle, compatible as ASHLAR and MCMICRO input.

Input

01

Output

0 0 0 0

PhiSpy

genomics virus phage prophage annotation identification

Predict prophages in bacterial genomes

Input

01

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

polishing assembly variant calling

Automatically improve draft assemblies and find variation among strains, including large event detection

Input

01012pilon_mode

Output

0 0 0 0 0 0

pindel_pindel

deletions insertions tandem duplications

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-generation sequencing data

Input

012fastafaibed

Output

0 0 0 0 0 0 0 0 0 0 0

Tools

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-generation sequencing data

pints_caller

peak-calling CoPRO GRO-cap PRO-cap CAGE NETCAGE RAMPAGE csRNA-seq STRIPE-seq PRO-seq GRO-seq

Main caller script for peak calling

Input

012assay_type

Output

0 0 0 0 0

Tools

pints:

Peak Identifier for Nascent Transcripts Starts (PINTS)

pirate

gff pan-genome alignment

Pangenome toolbox for bacterial genomes

Input

01

Output

0 0 0

plasmidfinder

fasta fastq plasmid

Identify plasmids in bacterial sequences and assemblies

Input

01

genomics synteny rearrangements chromosome

Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes.

Input

0101010101010101

Output

0 0

pmdtools_filter

pmdtools aDNA filter damage

pmdtools command to filter ancient DNA molecules from others

Input

012thresholdreference

Output

0 0

Tools

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

pneumocat

fastq serotype Streptococcus pneumoniae

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

Input

01

Output

0 0 0

polypolish_polish

assembly polishing genome polishing ont

Polishing genome assemblies with short reads.

Input

0101save_debug

Output

0 0 0

Tools

porechop_abi adapter nanopore

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

Input

01custom_adapters

Output

0 0 0

porechop_porechop

adapter nanopore demultiplexing

Adapter removal and demultiplexing of Oxford Nanopore reads

Input

01

Output

0 0 0

Tools

porechop:

Adapter removal and demultiplexing of Oxford Nanopore reads

portcullis_full

rnaseq genome splice junction

Run all Portcullis steps in one go

Input

010101

Output

0 0 0 0 0 0 0 0

Tools

fasta genome qc nucleotides

Calculate pairwise nucleotide identity with respect to a reference sequence

Input

0101compress

Output

0 0 0 0 0
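As a rough illustration of what "pairwise nucleotide identity with respect to a reference" can mean, the sketch below compares two sequences that are already aligned to the same length. Treating gaps and ambiguous bases as uncomparable columns is an assumption made here; the module's own tool may define identity differently.

```python
def percent_identity(reference: str, query: str) -> float:
    """Percent identity over aligned columns where both bases are A, C, G or T.

    Assumption: both sequences come from the same alignment, so they have
    equal length; columns with gaps or ambiguity codes are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    matches = 0
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base in valid and query_base in valid:
            compared += 1
            matches += ref_base == query_base
    return 100.0 * matches / compared if compared else float("nan")


print(percent_identity("ACGTACGT", "ACGTACGA"))
```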

presto_filterseq

immcantation airrseq genomics immunoinformatics

Filter reads by quality score.

Input

01

Output

0 0 0 0

Tools

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

pretextmap

contact bam map

Converts SAM/BAM/CRAM/pairs files into a genome contact map

Input

01012

fastq fasta filter trim

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

Input

01

Output

0 0 0 0 0

prodigal

prokaryotes gene finding microbial

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

Input

01output_format

coexpression correlation proportionality logratio propr corpcor

Perform logratio-based correlation analysis to obtain proportionality and basis-shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.

Input

01

Output

0 0 0 0 0 0 0

Tools

propr:

Logratio methods for omics data

orthology co-orthology homology sequence similarity spectral clustering comparative genomics genomics

Proteinortho is a tool to detect orthologous genes within different species.

Input

01

Output

0 0 0 0

proteus_readproteingroups

proteomics proteus readproteingroups

Reads a MaxQuant proteinGroups file with Proteus

Input

012

Output

0 0 0 0 0 0 0 0 0 0

Tools

clonal population WGS subclonal deconvolution copy number alterations somatic mutations

PyClone-VI is software for inferring the clonal population structure of cancers by using variant allele frequencies and copy number data of single or multiple samples.

Input

012

sort annotation prediction prokaryote

Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.

Input

01output_format

Output

0 0 0 0 0

qcat

demultiplex nanopore sample

Demultiplexer for Nanopore samples

Input

01barcode_kit

quast assembly quality contig scaffold

Quality Assessment Tool for Genome Assemblies

Input

010101

Output

0 0 0 0 0 0

quilt_quilt

imputation low-coverage genotype genomics vcf

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

Input

0123456789101112131415012

Output

0 0 0 0 0 0

Tools

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

quilt_quilt2

imputation low-coverage genotype genomics vcf

QUILT2 is an R and C++ program for fast genotype imputation from low-coverage sequence using a large phased reference panel in VCF/BCF format.

Input

0123456789101112131415012

Output

0 0 0 0 0 0

Tools

coverage depth subsampling

Randomly subsample sequencing reads to a specified coverage

Input

012depth_cutoff

Output

0 0
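Subsampling reads to a specified coverage, as in the entry above, can be pictured as shuffling the reads and keeping them until the kept bases reach coverage times genome size. The sketch below shows that idea only; the function name, the seed handling and the stopping rule are assumptions, not the tool's documented behaviour.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Return sorted indices of reads kept to reach roughly the target coverage.

    Assumption: reads are drawn in random order and kept until the total
    number of kept bases first reaches coverage * genome_size.
    """
    target_bases = coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    kept, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        kept.append(index)
        total += read_lengths[index]
    return sorted(kept)


kept = subsample_to_coverage([100] * 50, genome_size=1000, coverage=2)
print(len(kept))
```

If the input holds fewer bases than the target, every read is kept.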

rattle_cluster

transcriptomes cluster nanopore

Reference-free reconstruction and quantification of transcriptomes from long-read sequencing

Input

01

Output

0 0

Tools

rattle:

Reference-free reconstruction and quantification of transcriptomes from long-read sequencing

raven

de novo assembly genome genome assembler long uncorrected reads

De novo genome assembler for long uncorrected reads.

Input

01

Output

0 0 0

raw2ometiff

ome tiff imaging

Convert a directory of tiles (as written by bioformats2raw) to an OME-TIFF pyramid

Input

01

RNA RNAseq rRNA ribosomal RNA rRNA depletion rRNA removal rRNA filtering deep learning Riboseq genomics

Accurate and rapid RiboRNA sequences Detector based on deep learning

Input

01length

Output

0 0 0 0

ribotish_predict

riboseq predict bam

Quality control of riboseq bam data

Input

01201201201010101

Output

0 0 0 0

Tools

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

ribotish_quality

riboseq quality bam

Quality control of riboseq bam data

Input

01201

Output

0 0 0 0

Tools

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

ribotricer_detectorfs

riboseq orf genomics

Accurate detection of short and long active ORFs using Ribo-seq data

Input

01201

Output

0 0 0 0 0 0 0 0 0 0 0

Tools

ribotricer:

Python package to detect translating ORF from Ribo-seq data

ribotricer_prepareorfs

riboseq orf genomics

Accurate detection of short and long active ORFs using Ribo-seq data

Input

012

Output

0 0

Tools

rna assembly rnaseq de novo quality control

Assess the quality of an RNAseq assembly with or without a reference genome

rpbp metagene orf riboseq

Build per-read-length pileups of Ribo-seq read 5'-ends around annotated start codons - the "metagene profile". For each read length, the profile counts how many reads of that length have their 5' end at each position in a window around every annotated start codon, summed across all transcripts. Looking at the profile across the window reveals whether reads of that length show the 3-nucleotide periodicity characteristic of translating ribosomes.

This per-length view matters because different ribosome footprint lengths place the ribosomal P-site (the codon being decoded) at different offsets from the read's 5' end, so each length needs its own offset calibration. Output is consumed by rpbp/estimatemetagenebayesfactors, which scores each (length, offset) combination for periodicity.

Input

01201

Output

0 0

Tools

rpbp:

Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames
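The metagene profile described for this module can be sketched as a nested counter keyed by read length and by position relative to the start codon. This is a simplified illustration, not Rp-Bp's implementation: it assumes a single contig, forward-strand transcripts, unspliced coordinates and a hypothetical window of -50 to +20 nucleotides.

```python
from collections import defaultdict


def metagene_profile(reads, start_codons, window=(-50, 20)):
    """Count read 5' ends per (read length, position relative to a start codon).

    reads: iterable of (five_prime_position, read_length) tuples
    start_codons: iterable of annotated start codon positions
    window: hypothetical (upstream, downstream) bounds, inclusive
    """
    lower, upper = window
    starts = list(start_codons)
    profile = defaultdict(lambda: defaultdict(int))
    for position, length in reads:
        for start in starts:
            relative = position - start
            if lower <= relative <= upper:
                profile[length][relative] += 1  # summed across all start codons
    return profile


def frame_counts(length_profile):
    """Collapse one read length's profile into counts per reading frame (0, 1, 2)."""
    frames = [0, 0, 0]
    for relative, count in length_profile.items():
        frames[relative % 3] += count
    return frames


reads = [(88, 29), (91, 29), (94, 29), (188, 29), (88, 30)]
profile = metagene_profile(reads, start_codons=[100, 200])
print(frame_counts(profile[29]))
```

A read length whose counts pile up in one frame shows the 3-nucleotide periodicity that the downstream Bayes factor step scores.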

rpbp_extractorfprofiles

rpbp orf psite profile riboseq

Build a per-ORF P-site count vector for every candidate open reading frame (ORF) in the catalogue. For each ORF, walks the spliced exons in 3-nucleotide codon steps and counts the P-site positions (read 5'-end coordinate plus the length-specific offset selected upstream) that fall in each codon. Counts are summed across all read lengths that passed the periodicity filter from rpbp/getperiodiclengthsoffsets.

The resulting per-ORF vectors are the input to Bayesian translation scoring in rpbp/estimateorfbayesfactors: a translated ORF should show P-site density concentrated at codon-start positions, while a non-translated region should look flat or noisy. Emitted as a sparse matrix (one row per ORF, columns indexed by codon position).

Input

01230101

Output

0 0

Tools

rpbp:

Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames
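The per-ORF counting that rpbp_extractorfprofiles performs can be illustrated with a small sketch: each read's 5' end is shifted by the offset chosen for its length, and the resulting P-site position is assigned to a codon of the spliced ORF. Forward-strand ORFs, half-open exon intervals and the function name are assumptions for this example; Rp-Bp's own code handles strands and sparse storage differently.

```python
def orf_psite_profile(exons, reads, offsets):
    """Return P-site counts per codon for one forward-strand ORF.

    exons: ordered list of (start, end) half-open genomic intervals
    reads: iterable of (five_prime_position, read_length) tuples
    offsets: dict mapping each periodic read length to its P-site offset
    """
    # Map every genomic position of the spliced ORF to its index in the ORF.
    spliced_index = {}
    index = 0
    for start, end in exons:
        for genomic in range(start, end):
            spliced_index[genomic] = index
            index += 1
    counts = [0] * (index // 3)
    for position, length in reads:
        if length not in offsets:
            continue  # read length did not pass the periodicity filter
        orf_position = spliced_index.get(position + offsets[length])
        if orf_position is not None and orf_position // 3 < len(counts):
            counts[orf_position // 3] += 1
    return counts


exons = [(100, 106), (200, 203)]  # a toy 3-codon ORF split over two exons
reads = [(88, 29), (92, 29), (188, 29), (90, 31), (150, 29)]
print(orf_psite_profile(exons, reads, offsets={29: 12}))
```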

rpbp_getperiodiclengthsoffsets

rpbp psite offset filter riboseq

Filter the per-read-length P-site offset table down to the (length, offset) pairs that will actually drive ORF-level scoring. Drops read lengths whose metagene profile is too sparsely populated, or whose periodicity Bayes factor is too low / too uncertain, so that downstream P-site counting only uses read lengths with a clean 3-nucleotide signal.

Wraps Rp-Bp's get_periodic_lengths_and_offsets Python helper directly. Thresholds are configured via named flags in ext.args: --min-count (default: 1000), --min-bf-mean (default: 5), --max-bf-var (default: no limit), --min-bf-likelihood (default: 0.5). Defaults mirror rpbp.defaults.metagene_options.

Input

01

Output

0 0

Tools

rpbp:

Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames
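To make the thresholds of rpbp_getperiodiclengthsoffsets concrete, the sketch below applies the four documented defaults to a single (length, offset) row. How the thresholds combine is an assumption here: a variance cap, when given, is applied together with the mean, and otherwise the likelihood is taken as the probability that the Bayes factor exceeds the mean threshold under a normal approximation. Rp-Bp's helper may combine them differently.

```python
from statistics import NormalDist


def passes_filters(count, bf_mean, bf_var,
                   min_count=1000, min_bf_mean=5.0,
                   max_bf_var=None, min_bf_likelihood=0.5):
    """Decide whether one (length, offset) row is kept (assumed logic)."""
    if count < min_count:
        return False  # metagene profile too sparsely populated
    if max_bf_var is not None:
        # Explicit variance cap: require a high mean and a low variance.
        return bf_mean > min_bf_mean and bf_var < max_bf_var
    if min_bf_likelihood is not None:
        if bf_var <= 0:
            likelihood = 1.0 if bf_mean > min_bf_mean else 0.0
        else:
            # P(Bayes factor > min_bf_mean) under Normal(bf_mean, sqrt(bf_var)).
            likelihood = 1.0 - NormalDist(bf_mean, bf_var ** 0.5).cdf(min_bf_mean)
        return likelihood > min_bf_likelihood
    return bf_mean > min_bf_mean


print(passes_filters(count=5000, bf_mean=50.0, bf_var=4.0))
```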

rpbp_preparegenome

rpbp orf prepare genome bed riboseq

Build the per-ORF reference files that Rp-Bp's downstream scoring needs, starting from a genome FASTA and an annotation GTF. Enumerates every candidate open reading frame (ORF) in the annotation (annotated CDSs plus alternative start codons within transcript exons), records their genomic and per-exon coordinates, and labels them with the transcript and gene they belong to.

Invokes Rp-Bp's get_orfs Python function directly, chaining the upstream helpers gtf-to-bed12, extract-bed-sequences, extract-orf-coordinates, split-bed12-blocks and label-orfs. Bypasses Rp-Bp's prepare-rpbp-genome umbrella script, which would also build bowtie2 (rRNA filtering) and STAR (alignment) indices - neither is consumed by the Rp-Bp tools wrapped here, since alignment is supplied externally as a BAM.

A minimal chrName.txt (one contig name per line) is seeded from the FASTA headers because gtf-to-bed12 reads it via --chr-name-file to control output sort order.

Note: emits the *.annotated.bed.gz filenames produced by get_orfs directly, rather than the *.bed.gz-renamed forms that the upstream umbrella prepare-rpbp-genome script produces. The downstream module outputs and consumers in this module set reference these names explicitly, so the two are functionally equivalent.

Input

012

Output

0 0 0 0

Tools

rpbp:

Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames
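Seeding chrName.txt from the FASTA headers, as rpbp_preparegenome describes, needs only the first token of each header line. The sketch below shows one way to do that; the function names are hypothetical, and taking the first whitespace-delimited token as the contig name is an assumption about header layout.

```python
def contig_names(fasta_lines):
    """Yield the contig name from each FASTA header line, in file order."""
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip malformed, empty headers
                yield tokens[0]


def chr_name_file_text(fasta_lines):
    """Return the text of a minimal chrName.txt: one contig name per line."""
    return "".join(name + "\n" for name in contig_names(fasta_lines))


fasta = [">chr1 primary assembly\n", "ACGT\n", ">chrM\n", "GG\n"]
print(chr_name_file_text(fasta), end="")
```

Because names are emitted in file order, the order of contigs in the FASTA determines the sort order that the file conveys.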

rpbp_selectfinalpredictionset

rpbp orf bayes prediction riboseq

Produce the final filtered set of predicted translated ORFs from the per-ORF Bayes factor table. Applies the standard Rp-Bp prediction rules: a minimum Bayes-factor cutoff (favouring translated over untranslated), a minimum ORF length, and overlap resolution so that among overlapping candidates only the highest-scoring representative is kept.

Emits three files describing the same prediction set: a BED of ORF genomic coordinates plus score, a FASTA of ORF DNA sequences (extracted from the genome FASTA), and a FASTA of the corresponding translated protein sequences. This is the terminal step of the Rp-Bp per-sample chain.

Input

0101

Output

0 0 0 0

Tools

rpbp:

Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames
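The three prediction rules of rpbp_selectfinalpredictionset (Bayes-factor cutoff, minimum length, overlap resolution) can be sketched as a filter followed by a greedy pass over candidates ranked by score. The threshold values, field names and single-contig, single-strand setting below are assumptions for illustration, not Rp-Bp's defaults.

```python
def select_predictions(orfs, min_bf, min_length):
    """Keep high-scoring, long-enough ORFs and resolve overlaps greedily.

    orfs: list of dicts with 'id', 'start', 'end' (half-open) and 'bf' keys
    """
    candidates = [
        orf for orf in orfs
        if orf["bf"] >= min_bf and orf["end"] - orf["start"] >= min_length
    ]
    kept = []
    # Visit the best-scoring ORFs first so each overlap keeps its top candidate.
    for orf in sorted(candidates, key=lambda o: o["bf"], reverse=True):
        if all(orf["end"] <= k["start"] or orf["start"] >= k["end"] for k in kept):
            kept.append(orf)
    return sorted(kept, key=lambda o: o["start"])


orfs = [
    {"id": "a", "start": 0, "end": 90, "bf": 12.0},
    {"id": "b", "start": 30, "end": 120, "bf": 20.0},  # overlaps a, scores higher
    {"id": "c", "start": 200, "end": 212, "bf": 50.0},  # too short
    {"id": "d", "start": 300, "end": 390, "bf": 1.0},   # below the cutoff
    {"id": "e", "start": 400, "end": 490, "bf": 8.0},
]
print([orf["id"] for orf in select_predictions(orfs, min_bf=5.0, min_length=24)])
```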

rpbp_selectperiodicoffsets

rpbp psite offset orf riboseq

Pick the single best P-site offset for each read length from the per-(length, offset) Bayes factor table produced upstream. For each read length, the offset with the highest periodicity Bayes-factor mean is selected - this is the offset that, when added to a read's 5' end, is estimated to land closest to the ribosomal P-site (the codon being decoded). Downstream, these offsets are used to convert raw read 5'-end coordinates into P-site positions when counting reads against candidate ORFs.

Emits one row per read length (length, best offset, supporting Bayes factor statistics). The next step (rpbp/getperiodiclengthsoffsets) filters this table to the high-quality pairs that pass user-specified count / signal thresholds before P-site counting in rpbp/extractorfprofiles.

Input

01

Output

0 0
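Selecting the best offset per read length, as rpbp_selectperiodicoffsets describes, is a group-wise maximum over the Bayes-factor mean. A minimal sketch follows; the row layout with 'length', 'offset' and 'bf_mean' keys is assumed for the example and does not reflect Rp-Bp's actual table columns.

```python
def best_offsets(rows):
    """Return, for each read length, the row with the highest Bayes-factor mean.

    rows: iterable of dicts with 'length', 'offset' and 'bf_mean' keys
    """
    best = {}
    for row in rows:
        current = best.get(row["length"])
        if current is None or row["bf_mean"] > current["bf_mean"]:
            best[row["length"]] = row
    return best


rows = [
    {"length": 29, "offset": 11, "bf_mean": 3.0},
    {"length": 29, "offset": 12, "bf_mean": 40.0},
    {"length": 29, "offset": 13, "bf_mean": 2.5},
    {"length": 30, "offset": 12, "bf_mean": 10.0},
    {"length": 30, "offset": 13, "bf_mean": 25.0},
]
chosen = best_offsets(rows)
print({length: row["offset"] for length, row in sorted(chosen.items())})
```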

Tools

dbCAN download CAZyme CAZyme gene Cluster genomes

Command from run_dbcan that prepares the database for dbCAN annotation.

Input

Output

0 0

Tools

run_dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

rundbcan_easycgc

dbCAN download CAZyme CAZyme gene Cluster genomes

CGC annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.

Input

01012dbcan_db

Output

0 0 0 0 0 0 0 0 0 0

Tools

dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

rundbcan_easysubstrate

dbCAN download CAZyme CAZyme gene Cluster genomes

Substrate annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.

Input

01012dbcan_db

Output

0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

assembly hi-c scaffolding long reads salsa salsa2

SALSA, a tool to scaffold long-read assemblies with Hi-C

Input

012bedgfadupfilter_bed

Output

0 0 0 0

saltshaker_call

saltshaker mitosalt mtDNA structural-variant calling

mtDNA deletion and duplication calling downstream of mitosalt

Input

01201flankheteroplasmy_limitmito_lengthheavy_strand_origin_startheavy_strand_origin_endlight_strand_origin_startlight_strand_origin_end

Output

0 0

Tools

saltshaker:

A Python package for classifying and visualizing mitochondrial structural variants from MitoSAlt pipeline output.

saltshaker_classify

saltshaker mitosalt mtDNA structural-variant calling

mtDNA deletion and duplication classification downstream of mitosalt

Input

01mito_name

Output

0 0 0 0

Tools

saltshaker:

A Python package for classifying and visualizing mitochondrial structural variants from MitoSAlt pipeline output.

saltshaker_plot

saltshaker mitosalt mtDNA structural-variant calling

mtDNA deletion and duplication plotting downstream of mitosalt

Input

01

Output

0 0

Tools

saltshaker:

A Python package for classifying and visualizing mitochondrial structural variants from MitoSAlt pipeline output.

sam2lca_analyze

LCA alignment bam metagenomics Ancestor multimapper

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

Input

012database

Output

0 0 0 0

Tools

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files
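For a multi-mapped read, the lowest common ancestor is the deepest taxon shared by the lineages of all references it hits. The sketch below computes it over a child-to-parent map; the toy tree and its IDs are made up for illustration, and sam2lca's own data structures and alignment filtering are not shown.

```python
def lineage(taxid, parent):
    """Return the path from taxid up to the root (a node that is its own parent)."""
    path = [taxid]
    while taxid in parent and parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids, parent):
    """Return the deepest node shared by the lineages of all given taxids."""
    taxids = list(taxids)
    if not taxids:
        return None
    shared = set(lineage(taxids[0], parent))
    for taxid in taxids[1:]:
        shared &= set(lineage(taxid, parent))
    # Walk the first lineage from leaf to root; the first shared node is the LCA.
    for node in lineage(taxids[0], parent):
        if node in shared:
            return node
    return None


# Toy taxonomy: 1 is the root, 10 a family, 20 and 30 genera, 21/22/31 species.
parent = {21: 20, 22: 20, 31: 30, 20: 10, 30: 10, 10: 1, 1: 1}
print(lowest_common_ancestor([21, 22], parent))
print(lowest_common_ancestor([21, 31], parent))
```

A read that maps to a single taxon is its own LCA, so uniquely mapped reads keep their most specific assignment.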

sambamba_depth

depth coverage sambamba

Outputs a coverage file from bam files

Input

01201mode

Output

0 0

Tools

sambamba:

Tools for working with SAM/BAM data

sambamba_flagstat

stats flagstat sambamba

Outputs some statistics drawn from read flags.

Input

01

Output

0 0

Tools

sambamba:

Tools for working with SAM/BAM data

sambamba_markdup

markduplicates duplicates bam

Find and mark duplicate reads in a BAM file

Input

01

Output

0 0 0

Tools

sambamba:

process your BAM data faster!

samblaster

sort duplicate marking bam

This module combines samtools and samblaster in order to use samblaster's capability to filter or tag SAM files, with the advantage of keeping both input and output in BAM format. Samblaster input must contain a sequence header; for this reason the input is piped through the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

Input

01

cat collate fixmate sort markduplicates bam sam cram multi-tool

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

Input

01012

Output

0 0 0 0 0 0

Tools

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort

sort bam sam cram

Sort SAM/BAM/CRAM file

Input

01012index_format

Output

0 0 0 0 0

Tools

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_splitheader

view bam sam cram readgroup program sequence header

Extract header lines from a SAM/BAM/CRAM file into separate files depending on type

Input

01

Output

0 0 0 0

Tools

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_stats

statistics counts bam sam cram

Produces comprehensive statistics from SAM/BAM/CRAM file

Input

012012

Output

0 0

Tools

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_view

view bam sam cram

Filter/convert SAM/BAM/CRAM file

Input

0120120101index_format

Output

0 0 0 0 0 0 0 0 0

Tools

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

savana_classify

classify structural variants somatic germline genomics

Classify structural variants using SAVANA

Input

01

Output

0 0 0 0 0 0

Tools

savana:

SAVANA: a somatic structural variant caller for long-read data.

savana_cna

cna copy-number long-read nanopore pacbio genomics

Copy-number aberration calling with SAVANA for long-read tumour/normal alignments.

Input

01234567012contigsblacklistg1000_vcf

Output

0 0 0 0 0 0 0 0 0

Tools

savana:

SAVANA: somatic SV and copy-number aberration caller for long-read data.

savana_run

structural variants long read genomics

Identify and cluster SV breakpoints from long-read alignments.

Input

01234012

Output

0 0 0 0 0

Tools

savana:

SAVANA: a somatic structural variant and copy-number caller for long-read data.

savana_to

structural variants somatic variants copy number analysis long-read sequencing genomics

Tumour-only somatic SV calling with optional copy-number analysis in SAVANA

Input

012345012010101

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

savana:

SAVANA: a somatic structural variant and copy-number caller for long-read data.

sawfish_discover

sawfish structural-variant calling CNV calling Pacbio

SV candidate discovery from PacBio HiFi data

Input

01201010101

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

sawfish:

Joint structural variant and copy number variant caller for HiFi sequencing data

sawfish_jointcall

sawfish structural-variant calling joint calling CNV calling Pacbio

Joint calling of structural variants from multiple samples using Sawfish

Input

010101201

Output

0 0 0 0 0 0 0 0 0 0 0 0

Tools

sawfish:

Joint structural variant and copy number variant caller for HiFi sequencing data

scanpy_filter

filter quality-control scanpy single-cell preprocessing

Filter cells and genes in single-cell RNA-seq data using Scanpy

Input

01min_genesmin_cellsmin_counts_genemin_counts_cellmax_mito_percentagesymbol_col

Output

0 0

Tools

scanpy:

Single-Cell Analysis in Python

scanpy_hashsolo

anndata single-cell hashing demultiplexing scanpy

Probabilistic demultiplexing of cell hashing data

Input

012

Output

0 0 0 0

Tools

scanpy:

Single-cell analysis in Python. Scales to >100M cells.

scanpy_pca

pca principal-component-analysis scanpy single-cell dimensionality-reduction

Perform principal component analysis (PCA) on single-cell RNA-seq data using Scanpy

Input

01key_added

Output

0 0 0

Tools

scanpy:

Single-Cell Analysis in Python

scanpy_scrublet

scrublet doublet-detection scanpy single-cell

Detect doublets in single-cell RNA-seq data using Scrublet via Scanpy

Input

01batch_col

Output

0 0 0

Tools

gwas pangenome prokaryote

Use pangenome outputs for GWAS

Input

0 1 2 tree

Output

0 0

scramble_clusteranalysis

soft-clipped clusters scramble cluster analysis clusteridentifier

The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier

Input

0 1 0 1 mei_ref

Output

0 0 0 0

Tools

scramble:

Soft Clipped Read Alignment Mapper

scramble_clusteridentifier

bam cram soft-clipped clusters

The cluster_identifier tool of Scramble identifies soft clipped clusters

Input

01201

Output

0 0

Tools

scramble:

Soft Clipped Read Alignment Mapper

scvitools_scar

single-cell scRNA-seq ambient RNA removal

Module to use scAR to remove ambient RNA from single-cell RNA-seq data

Input

0 1 2 input_layer output_layer max_epochs n_batch

Output

0 0

Tools

scvitools:

scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data

scar:

scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.

scvitools_solo

scvi solo doublets

Detect doublets in single-cell RNA-Seq data

Input

0 1 batch_key max_epochs

Output

0 0 0

Tools

scvitools:

A scalable toolkit for probabilistic modeling applied to single-cell omics data

seacr_callpeak

peak-caller peaks bedgraph cut&tag cut&run chromatin seacr

Call peaks using SEACR on sequenced reads in bedgraph format

Input

0 1 2 threshold

Output

0 0

Tools

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

segemehl_align

alignment circrna splicing fusions

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Input

0 1 fasta index

Output

0 0 0 0 0

Tools

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

segemehl_index

index circrna splicing fusions

Generate genome indices for segemehl align

Input

fasta

Output

0 0

Tools

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

segul_aligntrim

alignment trimming phylogenomics

Trim multiple sequence alignments by filtering columns based on missing data proportion or parsimony informative sites using SEGUL.

Input

01

Output

0 0 0

Tools

segul:

An ultrafast and memory efficient tool for phylogenomics
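The column filtering described for segul_aligntrim can be illustrated with a minimal Python sketch. This is not SEGUL's implementation: the function name `trim_columns`, the `max_missing` parameter, and the choice of `-` and `?` as missing-data characters are assumptions made for illustration, and the parsimony-informative mode is not covered.

```python
def trim_columns(sequences, max_missing=0.5, missing_chars="-?"):
    """Drop alignment columns whose proportion of missing data exceeds max_missing.

    `sequences` is a list of equal-length aligned strings (hypothetical input
    shape; real alignments would be parsed from FASTA, NEXUS, or PHYLIP first).
    """
    n = len(sequences)
    keep = [
        i
        for i, column in enumerate(zip(*sequences))
        if sum(c in missing_chars for c in column) / n <= max_missing
    ]
    # Rebuild every sequence from the retained column indices only.
    return ["".join(seq[i] for i in keep) for seq in sequences]


if __name__ == "__main__":
    alignment = ["AC-T", "A--T", "ACGT"]
    print(trim_columns(alignment, max_missing=0.5))
```

In the example, the third column is missing in two of three sequences, so it is removed while the second column (one of three missing) is kept.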

semibin_singleeasybin

binning assembly-binning metagenomics

Metagenomic binning with self-supervised learning

Input

012

Output

0 0 0 0 0

Tools

semibin:

Metagenomic binning with semi-supervised siamese neural network

sentieon_applyvarcal

sentieon applyvarcal varcal VQSR

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced by the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

Input

0123450101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_bwaindex

index fasta genome reference sentieon

Create BWA index for reference genome

Input

01

Output

0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_bwamem

mem bwa alignment map fastq bam sentieon

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

Input

01010101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_collectvcmetrics

vcf sentieon genomics

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

Input

012012010101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_coveragemetrics

coverage sentieon genomics

Accelerated implementation of the GATK DepthOfCoverage tool.

Input

01201010101

Output

0 0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_datametrics

metrics bam sentieon

Collects multiple quality metrics from a bam file

Input

0 1 2 0 1 0 1 plot_results

Output

0 0 0 0 0 0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_dedup

mem dedup map bam cram sentieon

Runs the Sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup, which in turn marks or removes duplicate reads.

Input

0120101

Output

0 0 0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_dnamodelapply

dnamodelapply vcf filter sentieon

Modifies the input VCF file by adding the MLrejected FILTER to the variants

Input

012010101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_dnascope

dnascope sentieon variant_calling

DNAscope algorithm performs an improved version of Haplotype variant calling.

Input

0 1 2 3 0 1 0 1 0 1 0 1 0 1 pcr_indel_model emit_vcf emit_gvcf

Output

0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

SENTIEON_GVCFTYPER

joint genotyping genotype gvcf

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

Input

012301010101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_haplotyper

sentieon haplotypecaller haplotype

Runs Sentieon's haplotyper for germline variant calling.

Input

0 1 2 3 4 0 1 0 1 0 1 0 1 emit_vcf emit_gvcf

Output

0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_hsmetrics

metrics bam sentieon

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

Input

012340101

Output

0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_qualcal

base quality score recalibration bqsr sentieon

Generate recalibration table and optionally perform base quality recalibration

Input

0 1 2 0 1 0 1 0 1 0 1 0 1 generate_recalibrated_bams

Output

0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_readwriter

merge convert readwriter sentieon

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying the Base Quality Score Recalibration to a file.

Input

0120101

Output

0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_rsemcalculateexpression

rsem expression quantification sentieon

Calculate expression with RSEM

Input

0 1 index

Output

0 0 0 0 0 0 0 0 0 0

Tools

rseqc:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

sentieon_rsempreparereference

rsem genome index sentieon

Prepare a reference genome for RSEM

Input

fasta gtf

Output

0 0 0 0 0

Tools

rseqc:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

sentieon_staralign

align fasta genome reference

Align reads to a reference genome using Sentieon STAR

Input

0 1 0 1 0 1 star_ignore_sjdbgtf

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_tnfilter

tnfilter filter sentieon tnhaplotyper2 vcf

Filters the raw output of sentieon/tnhaplotyper2.

Input

01234560101

Output

0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_tnhaplotyper2

tnseq tnhaplotyper2 sentieon variant_calling

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

Input

0 1 2 3 0 1 0 1 0 1 0 1 0 1 0 1 0 1 emit_orientation_data emit_contamination_data

Output

0 0 0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_tnscope

tnscope sentieon variant_calling

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

Input

01230101010101010101

Output

0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_varcal

sentieon varcal variant recalibration

Module for Sentieon's VarCal. The VarCal algorithm performs the first pass of Variant Quality Score Recalibration (VQSR): it builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

Input

0 1 2 resource_vcf resource_tbi labels fasta fai

Output

0 0 0 0 0

Tools

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

sentieon_wgsmetrics

metrics bam sentieon

Collects whole genome quality metrics from a bam file

Input

012010101

Output

0 0

Tools

fasta fastq salmonella serotype

Salmonella serotype prediction from reads and assemblies

Input

01

quality_control qc preprocessing

Sequence quality metrics for FASTQ and uBAM files.

Input

01

Output

0 0 0

sequencetools_pileupcaller

genotyping mpileup random draw pseudohaploid pseudodiploid freqsum plink bed eigenstrat

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

Input

0 1 snpfile sample_names_fn

Output

0 0 0 0

Tools

sequencetools:

Tools for population genetics on sequencing data

sequenzautils_bam2seqz

sequenzautils copy number bam2seqz

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

Input

0 1 2 fasta wigfile

Output

0 0

Tools

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal) and genome-wide GC-content information to extract the common positions with A and B allele frequencies.

sequenzautils_gcwiggle

sequenzautils copy number gc_wiggle

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

Input

01

Output

0 0

Tools

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format.
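The windowed GC computation described for gc_wiggle can be sketched in Python as follows. This is an illustrative approximation rather than the sequenza-utils code: the function names, the handling of non-ACGT bases (excluded from the denominator), and the treatment of the final partial window are all assumptions.

```python
def gc_percent_windows(sequence, window):
    """Yield (1-based start, GC percent) for consecutive, non-overlapping windows."""
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        # Assumption: windows with no called bases report 0.0.
        percent = 100.0 * gc / called if called else 0.0
        yield start + 1, percent


def to_wiggle(chrom, sequence, window):
    """Format the windows as a UCSC wiggle variableStep block."""
    lines = [f"variableStep chrom={chrom} span={window}"]
    lines += [f"{pos} {pct:.1f}" for pos, pct in gc_percent_windows(sequence, window)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_wiggle("chr1", "GGCCAATT", 4))
```

The `variableStep` header with `chrom` and `span` fields followed by `position value` lines is the standard wiggle layout; the window size plays the role of `span`.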

seqwish_induce

induce paf gfa graph variation graph

Induce a variation graph in GFA format from alignments in PAF format

Input

012

Output

0 0

Tools

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

seroba_run

fastq serotype Streptococcus pneumoniae

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

Input

01

Output

0 0 0

Tools

seroba:

SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.

severus

structural variation somatic germline long-read

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

Input

01234501

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

sexdeterrmine

sex determination genetic sex relative coverage ancient dna

Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.

Input

0 1 sample_list_file
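The relative-coverage calculation described for sexdeterrmine can be sketched as below. This is a simplified illustration, not the tool's code: it takes rows shaped like `samtools depth` output for a single sample (chromosome, position, depth), assumes the sex chromosomes are named `X` and `Y`, and omits the error bars the tool reports.

```python
from collections import defaultdict


def relative_sex_coverage(depth_rows, x_name="X", y_name="Y"):
    """Return (X rate, Y rate): mean depth on X and Y divided by mean autosomal depth.

    `depth_rows` is an iterable of (chrom, pos, depth) tuples; every chromosome
    other than x_name / y_name is treated as autosomal (an assumption).
    """
    totals = defaultdict(int)
    sites = defaultdict(int)
    for chrom, _pos, depth in depth_rows:
        group = chrom if chrom in (x_name, y_name) else "auto"
        totals[group] += depth
        sites[group] += 1
    if not sites["auto"] or not totals["auto"]:
        raise ValueError("no autosomal coverage to normalise against")
    auto_mean = totals["auto"] / sites["auto"]

    def rate(group):
        return (totals[group] / sites[group]) / auto_mean if sites[group] else 0.0

    return rate(x_name), rate(y_name)


if __name__ == "__main__":
    rows = [("1", 1, 10), ("1", 2, 10), ("X", 1, 5), ("Y", 1, 5)]
    print(relative_sex_coverage(rows))
```

Under this simplification, a sample with one X and one Y would show both rates near 0.5, while a sample with two X chromosomes would show an X rate near 1 and a Y rate near 0.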

nanopore de-novo assembly longread

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

Input

0 1 model

Output

0 0 0 0

shasum

checksum sha256 256 bit

Print SHA256 (256-bit) checksums.

Input

0 1 as_separate_files

Output

0 0

Tools

md5sum:

Create an SHA256 (256-bit) checksum.
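For readers who want to verify a checksum outside the module, the same SHA256 digest can be computed with Python's standard `hashlib`. The helper names `sha256_of_file` and `checksum_line` are illustrative and not part of the module.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path):
    """Format 'digest  filename' (two spaces), the conventional checksum-file layout."""
    return f"{sha256_of_file(path)}  {Path(path).name}"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(checksum_line(name))
```

Reading in chunks keeps memory use constant for large sequencing files.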

shigapass

bacteria shigella stec

ShigaPass is an in silico tool used to predict Shigella serotypes and to differentiate between Shigella, EIEC (Enteroinvasive E. coli), and non Shigella/EIEC using assembled whole genomes.

Input

01

Output

0 0 0

shigatyper

fastq shigella serotype

Determine Shigella serotype from Illumina or Oxford Nanopore reads

Input

01

bacterial assembly illumina

Assemble bacterial isolate genomes from Illumina paired-end reads

Input

01

Output

0 0 0 0 0 0

sickle

fastq sliding window trimming

A windowed adaptive trimming tool for FASTQ files using quality

Input

012

Output

0 0 0 0 0
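The idea behind windowed quality trimming can be shown with a short Python sketch. It is a simplified illustration and not sickle's exact algorithm: the function name `window_trim`, the default window size and threshold, and the rule for placing the 3' cut at the start of the first failing window are assumptions.

```python
def window_trim(quals, window=4, threshold=20):
    """Return a half-open (start, end) range of bases to keep.

    The 5' cut is the start of the first window whose mean quality reaches the
    threshold; the 3' cut is the start of the first later window whose mean
    falls below it. (0, 0) means the whole read is discarded.
    """
    n = len(quals)
    if n < window:
        return (0, 0)
    means = [sum(quals[i:i + window]) / window for i in range(n - window + 1)]
    start = next((i for i, m in enumerate(means) if m >= threshold), None)
    if start is None:
        return (0, 0)
    end = next((i for i in range(start, len(means)) if means[i] < threshold), n)
    return (start, end)


if __name__ == "__main__":
    phred = [2, 2, 30, 30, 30, 30, 30, 30, 2, 2, 2, 2]
    start, end = window_trim(phred)
    print(start, end, phred[start:end])
```

Because the decision is made on window averages, a single low-quality base can survive at the edge of the kept region, which is the usual trade-off of sliding-window trimmers.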

sigprofiler

mutational signatures SBSs, DBSs, IDs cancer genomics

Mutational signature deconvolution of cancer cells

Input

0 1 genome genome_installed_path

Output

0 0

Tools

sigprofilermatrixgenerator:

SigProfilerMatrixGenerator is a Python-based tool. It creates mutational matrices for all types of somatic mutations (SBS, DBS, and IDs)

sigprofilerextractor:

SigProfilerExtractor is a Python-based tool. It allows de novo extraction of mutational signatures from data generated in a matrix format, identification of the number of operative mutational signatures, and their activities in each sample.

simpleaf_index

indexing transcriptome gene expression SimpleAF

Indexing of transcriptome for gene expression quantification using SimpleAF

Input

012010101

Output

0 0 0 0 0 0

Tools

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

simpleaf_quant

quantification gene expression SimpleAF

simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

Input

0 1 2 0 1 2 0 1 2 3 resolution 0 1

Output

0 0 0 0 0

Tools

simpleaf:

SimpleAF is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

singlem_dbdownload

metagenomics database download profiling singlem metapackage

Download the SingleM metapackage database used for metagenome profiling

Input

No input

Output

0 0

Tools

assembly de-novo assembly microbial genomics fasta fastq

Assemble microbial genomes from short-read FASTQ files into contigs in FASTA format using SKESA.

Input

01

Output

0 0

slamdunk_all

slamseq rna-seq mapping snp quantification

Complete SLAMseq analysis pipeline including read mapping, filtering, SNP calling, and quantification

Input

01010101

Output

0 0 0 0 0 0 0 0

Tools

slamdunk:

Slamdunk is a software tool for SLAMseq data analysis that performs mapping, filtering, SNP calling, and quantification of metabolic RNA labeling experiments.

slamdunk_map

slamseq rna-seq mapping

Slamdunk read mapping using NextGenMap’s SLAMSeq alignment settings.

Input

0101

Output

0 0

Tools

slamdunk:

Slamdunk is a software tool for SLAMseq data analysis that performs mapping, filtering, SNP calling, and quantification of metabolic RNA labeling experiments.

slimfastq

FASTQ compression lossless

Fast, efficient, lossless compression of FASTQ files.

Input

01

Output

0 0

smncopynumbercaller

copy number BAM CRAM SMN1 SMN2

Tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of exons 7-8) from a whole-genome sequencing (WGS) BAM file.

Input

012

Output

0 0 0

smoothxg

gfa graph pangenome variation graph POA

Linearize and simplify variation graph in GFA format using blocked partial order alignment

Input

01

Output

0 0 0

smoove_call

structural variants SV vcf wgs

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

Input

01230101

Output

0 0

Tools

sniffles structural-variant calling long-read

structural-variant calling with sniffles

Input

0 1 2 0 1 0 1 vcf_output snf_output

Output

0 0 0 0

snippy_core

core alignment bacteria snippy

Core-SNP alignment from Snippy outputs

Input

0 1 2 reference

Output

0 0 0 0 0 0

Tools

snippy:

Rapid bacterial SNP calling and core genome alignments

snippy_run

variant fastq bacteria

Rapid haploid variant calling

Input

0 1 reference

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

SNPs invariant constant

Rapidly extracts SNPs from a multi-FASTA alignment.

Input

alignment

Output

0 0 0 0
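Extracting SNP columns from a multi-FASTA alignment, as described in the entry above, amounts to keeping only the columns where more than one base is observed. The sketch below is illustrative and is not the tool's implementation; the function names and the decision to ignore gaps and ambiguity codes when judging variability are assumptions.

```python
def variable_sites(sequences):
    """Return 0-based column indices where more than one of A/C/G/T is observed."""
    columns = []
    for index, column in enumerate(zip(*sequences)):
        # Gaps and ambiguous characters do not count as alleles here (assumption).
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            columns.append(index)
    return columns


def snp_alignment(records):
    """Reduce (name, sequence) records to their variable columns only."""
    if not records:
        return []
    names, seqs = zip(*records)
    keep = variable_sites(seqs)
    return [(name, "".join(seq[i] for i in keep)) for name, seq in zip(names, seqs)]


if __name__ == "__main__":
    aln = [("a", "ACGT"), ("b", "ACGA"), ("c", "AC-T")]
    for name, seq in snp_alignment(aln):
        print(f">{name}\n{seq}")
```

Only the last column differs between the samples in this example, so each output record contains a single base.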

somalier_ancestry

relatedness QC bam cram vcf gvcf ancestry identity kinship informative sites family

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Input

01012

Output

0 0 0

Tools

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

somalier_extract

relatedness QC bam cram vcf gvcf ancestry identity kinship informative sites family

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Input

012010101

Output

0 0

Tools

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

somalier_relate

relatedness QC bam cram vcf gvcf ancestry identity kinship informative sites family

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Input

0 1 2 sample_groups

Output

0 0 0 0

Tools

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

sortmerna

filtering mapping clustering rRNA ribosomal RNA

Local sequence alignment tool for filtering, mapping and clustering.

Input

010101

Output

0 0 0 0

Tools

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

souporcell

clustering mixed-genotype genomics

souporcell is a method for clustering mixed-genotype scRNAseq experiments by individual.

Input

012301

Output

0 0 0 0

soupx

single-cell transcriptomics ambient

Estimation and removal of cell free mRNA contamination in droplet based single cell RNA-seq data.

The filtered counts are preprocessed with Seurat (LogNormalize, PCA, kNN graph, clustering) to provide cluster assignments to SoupX, which then estimates per-cluster contamination and adjusts counts. The adjusted counts are written to the output H5AD as an ambient layer.

Input

0 1 2 npcs cluster_algorithm

genome assembly genome assembler small genome de novo assembler

Assembles a small genome (bacterial, fungal, viral)

Input

0 1 2 3 yml hmm

Output

0 0 0 0 0 0 0 0

sparse_signatures

mutational signatures SBS bs genome reference

Mutational signature deconvolution of cancer cells

Input

0 1 genome

Output

0 0 0 0 0 0 0

Tools

sparsesignatures:

SparseSignatures is an R-based computational framework which performs de novo extraction, inference, interpretation, or deconvolution of mutational counts of a large number of patients.

bsgenome.hsapiens.1000genomes.hs37d5:

Reference Genome Sequence (hs37d5), based on NCBI GRCh37

bsgenome.hsapiens.ucsc.hg38:

Full genomic sequences for Homo sapiens (UCSC genome hg38)

spatyper

fasta spatype spa

Computational method for finding spa types.

Input

0 1 repeats repeat_order

Output

0 0

splitubam

long-read bam genomics

Quickly split one uBAM into multiple files, line by line

Input

01

bacteria fasta streptococcus

Serotype prediction of Streptococcus suis assemblies

Input

01

Output

0 0

stacks_refmap

rad-seq gbs variant-calling population-genomics genomics

ref_map.pl script from Stacks for the analysis of RAD-seq data when a reference genome is available.

Input

0 1 popmap

Output

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tools

stacks:

Stacks is a software pipeline for building loci from short-read sequences.

stadeniolib_scramble

sam bam cram compression

Advanced sequence file format conversions

Input

0 1 fasta fai gzi

Output

0 0 0

Tools

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

stainwarpy_extractchannel

image registration histology hne multiplexed channel extraction

Extract a single channel image from multiplexed tissue images using stainwarpy

Input

01

Output

0 0

Tools

stainwarpy:

Register H&E stained and Multiplexed tissue images using feature-based image registration

stainwarpy_register

image registration histology hne multiplexed

Register H&E stained and Multiplexed tissue images using feature-based image registration

Input

0 1 0 1 fixed_img final_sz

Output

0 0 0 0

Tools

stainwarpy:

Register H&E stained and Multiplexed tissue images using feature-based image registration

stainwarpy_transformsegmask

image registration histology hne multiplexed segmentation mask

Transform segmentation mask of multiplexed or H&E stained tissue images using stainwarpy

Input

0 1 0 1 0 1 0 1 fixed_img final_sz

Output

0 0

Tools

stainwarpy:

Register H&E stained and Multiplexed tissue images using feature-based image registration

staphopiasccmec

amr fasta sccmec

Predicts Staphylococcus aureus SCCmec type based on primers.

Input

01

stardist segmentation image gpu spatial-transcriptomics

Cell and nuclear segmentation with star-convex shapes

Input

0101

align count genome reference

Create a counts matrix for single-cell data using STARSolo, handling cell barcodes and UMI information.

Input

0 1 2 opt_whitelist 0 1

Output

0 0 0 0 0 0

stecfinder

serotype Escherichia coli fastq fasta

Serotype STEC samples from paired-end reads or assemblies

Input

01

Output

0 0

stitch

imputation genomics vcf bgen cram bam sam

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

Input

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 seed

strobealign strobemers alignment map fastq bam sam

Align short reads using dynamic seed size with strobemers

Input

0 1 0 1 0 1 sort_bam

Output

0 0 0 0 0 0 0 0 0 0

strvctvre_strvctvre

structural variants sv deletions duplications annotations

A structural variant classifier for exonic deletions and duplications

Input

01230101

Output

0 0 0

Tools

strvctvre:

StrVCTVRE, a structural variant classifier for exonic deletions and duplications

subread_featurecounts

counts fasta genome reference

Count reads that map to genomic features

Input

012

Output

0 0 0

Tools

featurecounts:

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

summarizedexperiment_summarizedexperiment

gene transcript sample matrix assay

SummarizedExperiment container

Input

010101

Output

0 0 0

Tools

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

suppa_generateevents

suppa alternative splicing genomics

Generates alternative splicing events from a GTF file using SUPPA.

Input

0 1 format pool_genes event_type boundary threshold exon_length

Output

0 0

Tools

suppa:

Fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions.

survivor_bedpetovcf

bedpe conversion vcf structural variants

Converts a bedpe file to a VCF file (beta version)

Input

01

Output

0 0

Tools

survivor:

Toolset for SV simulation, comparison and filtering

survivor_filter

survivor filter vcf structural variants

Filter a vcf file based on size and/or regions to ignore

Input

0 1 2 minsv maxsv minallelefreq minnumreads

Output

0 0

Tools

survivor:

Toolset for SV simulation, comparison and filtering

survivor_merge

survivor merge vcf structural variants

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

Input

0 1 max_distance_breakpoints min_supporting_callers account_for_type account_for_sv_strands estimate_distanced_by_sv_size min_sv_size

Output

0 0

Tools

survivor:

Toolset for SV simulation, comparison and filtering
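How the survivor_merge inputs `max_distance_breakpoints`, `min_supporting_callers`, and `account_for_type` interact can be illustrated with a toy consensus merge. The sketch below is not SURVIVOR's algorithm: the greedy clustering, the tuple layout of a call, and the function name `merge_calls` are assumptions chosen to keep the example short, and strand and size-based distance options are ignored.

```python
def merge_calls(calls, max_distance=1000, min_callers=2, account_for_type=True):
    """Greedily cluster SV calls and keep clusters supported by enough callers.

    Each call is a (caller, chrom, start, end, svtype) tuple (hypothetical
    layout). Two calls are clustered when both breakpoints lie within
    max_distance of the cluster's first call and, optionally, the types match.
    """
    clusters = []
    for call in sorted(calls, key=lambda c: (c[1], c[2])):
        _caller, chrom, start, end, svtype = call
        for cluster in clusters:
            ref = cluster[0]
            same_place = (
                ref[1] == chrom
                and abs(ref[2] - start) <= max_distance
                and abs(ref[3] - end) <= max_distance
            )
            if same_place and (not account_for_type or ref[4] == svtype):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    # Support is counted in distinct callers, not in number of calls.
    return [c for c in clusters if len({call[0] for call in c}) >= min_callers]


if __name__ == "__main__":
    calls = [
        ("callerA", "1", 100, 500, "DEL"),
        ("callerB", "1", 150, 520, "DEL"),
        ("callerA", "1", 5000, 6000, "DUP"),
    ]
    for cluster in merge_calls(calls):
        print(cluster)
```

With these inputs only the deletion near position 100 is retained, because it is the only event reported by two different callers.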

survivor_simsv

structural variants simulation sv vcf

Simulate an SV VCF file based on a reference genome

Input

0 1 0 1 0 1 snp_mutation_frequency sim_reads

Output

0 0 0 0 0 0

Tools

survivor:

Toolset for SV simulation, comparison and filtering

survivor_stats

survivor statistics vcf structural variants

Report multiple stats over a VCF file

Input

0 1 minsv maxsv minnumreads

Output

0 0

Tools

survivor:

Toolset for SV simulation, comparison and filtering

sushie

gwas fine mapping ancestry

Software to perform multi-ancestry SNP fine-mapping on molecular data

Input

0 1 ld_files sample_sizes

structural variants vcf standardization standardize sv

A tool to standardize VCF files from structural variant callers

Input

0123

Output

0 0 0

sylph_profile

profile metagenomics sylph classification

Sylph profile command for taxonomic profiling

Input

0 1 database

Output

0 0

Tools

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

sylph_sketch

sketch metagenomics sylph indexing

Sketching/indexing sequencing reads

Input

0 1 reference

Output

0 0

Tools

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

sylph_sketchgenomes

profile metagenomics sylph classification genomes sketch

Sylph profile command for taxonomic profiling of genomes

Input

01

Output

0 0

Tools

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

sylph_sketchsamples

sketch metagenomics sylph samples indexing

Sketching/indexing sequencing reads

Input

01

Output

0 0

Tools

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

sylphtax_merge

sylph metagenomics merge

Merge multiple taxonomic profiles from sylphtax/taxprof into a TSV table

Input

0 1 data_type

Output

0 0

Tools

sylphtax:

Integrating taxonomic information into the sylph metagenome profiler.

sylphtax_taxprof

taxonomy sylph metagenomics

Incorporates taxonomy into sylph metagenomic classifier

Input

01taxonomy

Output

0 0

Tools

long-read bam genomics

A tool for tagging BAM files.

Input

01

Output

0 0

tailfindr

polya tail fast5 nanopore

Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA

Input

01

alignment MSA genomics

Aligns sequences using T_COFFEE

Input

0101012compress

Output

0 0 0 0

Tools

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

tcoffee_alncompare

alignment MSA evaluation

Compares 2 alternative MSAs to evaluate them.

Input

012

Output

0 0 0

Tools

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

tcoffee_consensus

alignment MSA genomics

Computes a consensus alignment using T_COFFEE

Input

0101compress

Output

0 0 0 0

Tools

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

reformatting alignment genomics

Reformats files with t-coffee

Input

01

Output

0 0

Tools

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

tcoffee_tcs

alignment MSA evaluation

Computes the TCS score for an MSA, or for an MSA plus a library file. Outputs the full TCS report as-is, plus a CSV containing only the total TCS score.

Input

0101

Output

0 0 0 0

Tools

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

td2_longorfs

td2 orfs longorfs transcripts

TD2 identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly

Input

01

Output

0 0

Tools

td2:

TD2 identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly

td2_predict

predict orfs coding regions td2.predict

TD2 identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly

Input

012

Output

0 0

Tools

td2:

TD2 identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly

telescope_assign

EM single-locus transcriptomics

The telescope assign program finds overlapping reads between an alignment (SAM/BAM) and an annotation (GTF) then reassigns reads using a statistical model.

Input

0101

Output

0 0 0 0 0

Tools

telescope:

Single locus resolution of Transposable ELEment expression

telogator2

bam cram genomics telomere long-read pacbio nanopore

Allele-specific telomere length estimation and TVR characterization from long reads

Input

012012

Output

0 0 0 0 0 0 0 0

telomerehunter

bam genomics telomere telomerehunter cancer

In silico estimation of telomere content and composition from cancer genomes

Input

012340123

Output

0 0 0 0 0

telseq

bam cram genomics samtools telomere telseq

TelSeq: software for calculating telomere length

Input

0123012

Output

0 0 0

Tools

samtools:

Tools for dealing with SAM, BAM and CRAM files

tesorter

genomics classify LTR retrotransposons plant

An accurate and fast method to classify LTR-retrotransposons in plant genomes

Input

0101

Output

0 0 0 0 0 0 0 0

tetranscripts

transposable TE transcriptomics

Runs TEtranscripts which summarises transposable element content of a bam file.

Input

01010101

genomics tumour contamination normal purity

TINC is a package to determine the contamination of tumour DNA in a matched normal sample. The approach uses evolutionary theory applied to read counts data from whole-genome sequencing assays.

Input

012

Output

0 0 0 0 0

tmb_pytmb

tumor mutation burden score

This module calculates Tumor Mutational Burden (TMB) scores from VCF files using the pyTMB tool.

Input

012345

Output

0 0 0 0

Tools

tmb:

This tool was designed to calculate a Tumor Mutational Burden (TMB) score from a VCF file.

topas_gencons

consensus fasta ancient DNA

Create a FASTA consensus with the TOPAS toolkit, with options to penalize substitutions caused by the DNA damage typical of ancient DNA

Input

01010101vcf_output

Output

0 0 0 0 0

Tools

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

toulligqc

nanopore sequencing quality control genomics

A post sequencing QC tool for Oxford Nanopore sequencers

Input

01

alignment trimming phylogeny

trimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment.

Input

01out_format

Output

0 0 0

trimgalore

trimming adapters sequencing fastq

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data

Input

01

Output

0 0 0 0 0 0

trimmomatic

trimming adapter trimming quality trimming

Performs quality and adapter trimming on paired end and single end reads

Input

01

Output

0 0 0 0 0 0

trinity

assembly de novo assembler fasta fastq

Assembles a de novo transcriptome from RNAseq reads

Input

01

genomics transcript selector gene prediction evidence

Transcript Selector for BRAKER: TSEBRA combines gene predictions by selecting transcripts based on their extrinsic evidence support

Input

0 1 hints_files keep_gtfs config

Output

0 0 0

tximeta_tximport

gene kallisto pseudoalignment rsem salmon transcript

Import transcript-level abundances and estimated counts for gene-level analysis packages

Input

0101quant_type

Output

0 0 0 0 0 0 0 0 0 0

Tools

tximeta:

Transcript Quantification Import with Automatic Metadata

ucsc_bedclip

bed genomics ucsc

Remove lines from bed file that refer to off-chromosome locations.

Input

01sizes

Output

0 0

Tools

ucsc:

Remove lines from bed file that refer to off-chromosome locations.
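The clipping rule described for ucsc_bedclip can be illustrated without the UCSC binary. The following is a minimal awk sketch, not the tool's implementation: the function name `bed_clip_sketch` is hypothetical, the sizes file is assumed to be two tab-separated columns (chromosome name, length), and dropping lines whose chromosome is missing from the sizes file is an assumption of this sketch.

```shell
# Hypothetical helper: keep only BED lines whose interval lies inside the
# chromosome bounds listed in a "name<TAB>size" file.
bed_clip_sketch() {
    local bed="$1" sizes="$2"
    awk -F'\t' '
        NR == FNR { size[$1] = $2 + 0; next }    # first file: record chromosome sizes
        ($1 in size) && $2 + 0 >= 0 && $3 + 0 <= size[$1]   # second file: print in-bounds lines
    ' "$sizes" "$bed"
}
```

For example, `bed_clip_sketch regions.bed genome.sizes > clipped.bed` would write the in-bounds lines to `clipped.bed` (both input file names are illustrative).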

ucsc_bedgraphtobigwig

bedgraph bigwig ucsc bedgraphtobigwig converter

Convert a bedGraph file to bigWig format.

Input

01sizes

Output

0 0

Tools

ucsc:

Convert a bedGraph file to bigWig format.

ucsc_bedtobigbed

bed bigbed ucsc bedtobigbed converter

Convert file from bed to bigBed format

Input

0 1 sizes autosql

Output

0 0

Tools

ucsc:

Convert file from bed to bigBed format

ucsc_bigwigaverageoverbed

bigwig bedGraph ucsc

Compute average score of bigWig over BED file

Input

01bigwig

Output

0 0

Tools

ucsc:

Compute average score of big wig over each bed, which may have introns.

ucsc_gtftogenepred

gtf genepred refflat ucsc gtftogenepred

Convert GTF files to GenePred format

Input

01

Output

0 0 0

Tools

ucsc:

Convert GTF files to GenePred format

ucsc_liftover

bed ucsc ucsc/liftover

convert between genome builds

Input

01chain

Output

0 0 0

Tools

ucsc:

Move annotations from one assembly to another

ucsc_wigtobigwig

wig bigwig ucsc

Convert ascii format wig file to binary big wig format

Input

01sizes

Output

0 0

Tools

ucsc:

Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format

ultra_align

uLTRA align minimap2 long_read isoseq ont

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

Input

0101012

Output

0 0 0

Tools

ultra:

Splice aligner of long transcriptomic reads to genome.

ultra_index

uLTRA index minimap2 long_read isoseq ont

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

Input

0101

Output

0 0 0 0 0

Tools

ultra:

Splice aligner of long transcriptomic reads to genome.

ultra_pipeline

uLTRA index minimap2 long_read isoseq ont

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

Input

010101

Output

0 0

Tools

ultra:

Splice aligner of long transcriptomic reads to genome.

ultraplex

demultiplex fastq umi

Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.

Input

01barcode_file

genome assembly genome assembler small genome

Assembles bacterial genomes

Input

012

Output

0 0 0 0

universc

demultiplex align single-cell scRNA-Seq count umi

Module to run UniverSC, an open-source pipeline to demultiplex and process single-cell RNA-Seq data

Input

0101technology

Output

0 0

untar

untar uncompress extract

Extract files from tar, tar.gz, tar.bz2, tar.xz archives

Input

01

Output

0 0

untarfiles

untar uncompress files

Extract files.

Input

01

Output

0 0

Tools

untar:

Extract tar.gz files.

unzip

unzip decompression zip archiving

Unzip ZIP archive files

Input

01

Output

0 0

unzipfiles

unzip decompression zip archiving

Unzip ZIP archive files

Input

01

Output

0 0

Tools

variant calling vcf bam snv sv

The Java port of the VarDict variant caller

Input

01230101

Output

0 0 0 0

variancepartition_dream

rnaseq dream variancepartition

Runs a differential expression analysis with dream() from variancePartition R package

Input

012345012

Output

0 0 0 0

Tools

structural_variants array_cgh vcf cytosure

Convert VCF with structural variations to CytoSure format

Input

01010101blacklist_bed

Output

0 0

vcf2db

vcf2db vcf gemini

A tool to create a Gemini-compatible DB file from an annotated VCF

Input

012

vcf bed annotate variant lua toml

quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files

Input

0 1 2 3 toml lua resources

vcf vcflib vcflib/vcffixup AC/NS/AF

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

Input

012

Output

0 0 0 0

Tools

vcflib:

Command-line tools for manipulating VCF files

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

vcflib_vcfuniq

vcf uniq deduplicate

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

Input

012

Output

0 0

Tools

vcflib:

Command-line tools for manipulating VCF files
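The deduplication rule in the vcflib_vcfuniq description (drop a record that repeats the previous record's position, ref, and alt) can be sketched in awk. This is an illustration only, not vcflib's code: the function name `vcf_uniq_sketch` is hypothetical, and "position" is taken here to mean the CHROM and POS columns together, which is an assumption.

```shell
# Hypothetical helper: drop a VCF record when CHROM, POS, REF and ALT all
# match the record immediately before it (like GNU uniq, adjacent lines only).
vcf_uniq_sketch() {
    awk -F'\t' '
        /^#/ { print; next }                 # header lines pass through unchanged
        {
            key = $1 FS $2 FS $4 FS $5       # CHROM, POS, REF, ALT
            if (key != prev) print           # print unless identical to previous key
            prev = key
        }
    ' "$@"
}
```

As with `uniq`, only adjacent duplicates are removed, so the input would need to be sorted first for the sketch to catch every repeated record.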

vcfpgloader_load

vcf postgresql database variants genomics clinical annotation

High-performance VCF to PostgreSQL loader using asyncpg for bulk variant ingestion

Input

01234567

Output

0 0 0 0

Tools

count rnaseq rna velocity bam

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks Nextflow's handling of symbolic links. If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it skips the samtools sort step. The cell-sorted BAM file should be sorted with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See the module test for an example that uses the SAMTOOLS_SORT nf-core module. Example config to cell-sort the input BAM with SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

The optional mask must be passed via ext.args with the --mask option. Because velocyto looks for the cellsorted file alongside the original, both BAM files (cell-sorted and original) must be staged in the work dir. See also the velocyto tutorial.

Input

0123gtf

Output

0 0
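The cellsorted_ naming rule described for velocyto can be checked before the tool is launched. The following bash sketch only derives and tests the expected file name; the helper `cellsorted_name` and the file name `sample.bam` are hypothetical and not part of velocyto or the module.

```shell
# Hypothetical helper: print the file name velocyto is described as looking
# for, i.e. "cellsorted_" followed by the original BAM file name.
cellsorted_name() {
    local bam="$1"
    printf 'cellsorted_%s\n' "$(basename "$bam")"
}

# Report whether a pre-sorted file is already staged in the current directory.
bam="sample.bam"                       # illustrative file name
expected="$(cellsorted_name "$bam")"
if [ -e "$expected" ]; then
    echo "found $expected: the samtools sort step should be skipped"
else
    echo "missing $expected: no pre-sorted file is staged"
fi
```

The derived name matches what the config example produces, since `cellsorted_${bam.baseName}` plus the `.bam` extension gives `cellsorted_` followed by the original file name.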

vembrane_filter

filter vcf bcf genomics variant annotation

Filter VCF files with vembrane

Input

01expression

Output

0 0

Tools

vembrane:

Filter VCF/BCF files with Python expressions.

vembrane_sort

vcf bcf sort genomics variant prioritization

Sort VCF/BCF files by custom Python expressions for variant prioritization

Input

01expression

Output

0 0

Tools

vembrane:

Filter VCF/BCF files with Python expressions

vembrane_table

vcf bcf table genomics variant annotation

Creates tabular (TSV) files from VCF/BCF data with flexible Python expressions

Input

01expression

Output

0 0

Tools

vembrane:

Filter VCF/BCF files with Python expressions

VERIFYBAMID_VERIFYBAMID

qc contamination bam

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

Input

012refvcf

Output

0 0 0 0 0 0 0 0

Tools

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

VERIFYBAMID_VERIFYBAMID2

contamination bam verifybamid DNA contamination estimation

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

Input

0 1 2 0 1 2 refvcf references

Output

0 0 0 0 0 0 0

Tools

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

vg_construct

vg graph construct fasta vcf structural variants

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

Input

01230101

Output

0 0

Tools

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

vg_deconstruct

vcf gfa graph pangenome graph variation graph graph projection to vcf

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

Input

0 1 pb gbwt

Output

0 0

Tools

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

vg_index

vg index graph structural_variants

Creates index files for a variation graph using vg index

Input

01

Output

0 0 0

virus genomics consensus bam fasta

Fast and memory-efficient viral consensus genome sequence generation from read alignments

Input

0 1 0 1 primer_bed save_pos_counts save_ins_counts

Output

0 0 0 0

vireo

genotype-based demultiplexing donor deconvolution cellsnp

Use vireo to perform donor deconvolution for multiplexed scRNA-seq data

Input

01234

reports data-visualization streamlit quarto

The VueGen nf-core module is designed to automate report generation from outputs produced by other modules, subworkflows, or pipelines. The module integrates the VueGen Python library and customizes it for compatibility with the Nextflow environment. VueGen automates the creation of reports from bioinformatics outputs, supporting formats like PDF, HTML, DOCX, ODT, PPTX, Reveal.js, Jupyter notebooks, and Streamlit web applications.

Input

input_type input_path report_type

Output

0 0

wfmash

long read alignment pangenome-scale all versus all mashmap wavefront

a pangenome-scale aligner

Input

0 1 2 3 4 query_self fasta_query_list

Output

0 0

wget

wget download network

The non-interactive network downloader

Input

012

Output

0 0

wgsim

simulate fasta reads

Simulates sequence reads from a reference genome

Input

01

structural-variants benchmarking vcf

A large variant benchmarking tool analogous to hap.py for small variants.

Input

01234

Output

0 0 0 0

xengsort_index

index QC reference fasta xenograft sort k-mer

Fast lightweight accurate xenograft sorting

Input

host_fasta graft_fasta index nobjects mask

Output

0 0 0

Tools

xengsort:

A fast xenograft read sorter based on space-efficient k-mer hashing

xeniumranger_import_segmentation

spatial segmentation import segmentation nuclear segmentation cell segmentation xeniumranger imaging

The xeniumranger import-segmentation module runs xeniumranger import-segmentation to recompute Xenium Onboard Analysis outputs using external segmentation results. It supports two execution modes mirroring the Xenium Ranger CLI: an image-based mode that accepts nuclei and/or cell masks (TIFF/NPY) or GeoJSON polygons together with optional coordinate transforms and unit definitions, and a transcript-based mode that ingests Baysor-style transcript assignment CSV files plus visualization polygons. Use the image-based inputs when providing label masks or polygons, or switch to the transcript-based inputs when supplying transcript-level assignments so the appropriate command-line arguments are passed to Xenium Ranger.

Input

01234567

Output

0 0

Tools

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

xeniumranger_relabel

spatial relabel gene labels transcripts xeniumranger

The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.

Input

012

Output

0 0

Tools

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

xeniumranger_rename

spatial rename gene labels transcripts xeniumranger

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

Input

0123

Output

0 0

Tools

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

xeniumranger_resegment

spatial resegment morphology segmentation xeniumranger

The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.

Input

01

Output

0 0

Tools

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

xz_compress

xz compression archive

Compresses files with xz.

Input

01

Output

0 0

Tools

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

xz_decompress

xz decompression compression

Decompresses files with xz.

Input

01

Output

0 0

Tools

yaml template python

A YAML template engine with Python expressions

Input

0123

Output

0 0

zip

unzip decompression zip archiving

Compress file lists to produce ZIP archive files

Input

01

Output

0 0

Tools

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.

Available Modules