Available Modules
Modules are the basic building blocks of nf-core DSL2 pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Trim sequencing adapters and collapse overlapping reads
0
1
0
singles_truncated
discarded
paired_truncated
collapsed
collapsed_truncated
paired_interleaved
settings
versions
The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, and the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass.
0
1
0
passed_gff
failed_gff
versions
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
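The ORF length test described above for the GFF filtering script can be illustrated with a minimal sketch. It assumes the ORF lengths have already been computed per gene model. The function name, the gene IDs, and the dictionary layout are hypothetical and are not part of AGAT.

```python
from typing import Dict, List, Tuple


def split_by_orf_length(
    orf_lengths: Dict[str, int], threshold: int = 100
) -> Tuple[List[str], List[str]]:
    """Split gene model IDs into (passed, failed) using the test 'length > threshold'."""
    passed: List[str] = []
    failed: List[str] = []
    for gene_id, length_aa in orf_lengths.items():
        # The comparison is strict: an ORF of exactly `threshold` amino acids fails.
        if length_aa > threshold:
            passed.append(gene_id)
        else:
            failed.append(gene_id)
    return passed, failed


# Hypothetical ORF lengths in amino acids, keyed by made-up gene IDs.
example = {"gene1": 250, "gene2": 100, "gene3": 42}
passed, failed = split_by_orf_length(example)
```

The two returned lists correspond conceptually to the passed_gff and failed_gff outputs listed for the module.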
Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.
0
1
extracted_reads_fastq
log
intermediate_sam
intermediate_bam
intermediate_sorted_bam
versions
arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
Simulation tool to generate synthetic Illumina next-generation sequencing reads
0
1
0
0
0
fastq
aln
sam
versions
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
Aggregates fastq files with demultiplexed reads
0
1
fastq
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
0
1
0
1
2
0
1
2
results
bam
bai
bam_trimmed
bai_trimmed
bam_primertrimmed
bai_primertrimmed
fasta
vcf
tbi
json
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Splits single-end read groups by length and merges paired-end reads
0
1
2
3
4
bam
txt
versions
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).
0
1
2
primary_filtered_bam
contamination_bam
versions
Adapter and quality trimming of sequencing reads
0
1
0
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Merging overlapping paired reads into a single read.
0
1
0
merged
unmerged
ihist
versions
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
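To illustrate what merging overlapping paired reads means, here is a deliberately simplified sketch. It assumes exact matches in the overlap and ignores base qualities, which is not how BBMerge itself decides on a merge. The function names and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 4) -> Optional[str]:
    """Merge read1 with the reverse complement of read2.

    Uses the longest exact suffix/prefix overlap of at least `min_overlap`
    bases and returns None when no such overlap exists.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Made-up 12 bp fragment sequenced from both ends with 8 bp reads.
merged = merge_pair("ACGTTGCA", "AGCCTGCA")
unmerged = merge_pair("AAAAAAAA", "CCCCCCCC")
```

Pairs that return None here correspond conceptually to the unmerged output, and successful merges to the merged output.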
BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.
0
1
fastq
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates 30% smaller, faster gzipped FASTQ files and removes duplicates
0
1
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Filter out sequences by sequence header name(s)
0
1
0
0
0
reads
log
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Re-pairs reads that became disordered or had some mates eliminated.
0
1
0
repaired
singleton
versions
log
Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated.
Locate and tag duplicate reads in a BAM file
0
1
bam
metrics
versions
biobambam is a set of tools for early stage alignment file processing.
Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.
0
1
0
1
0
1
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
0
1
0
1
0
1
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Summarize and/or filter reads based on bisulfite conversion rate
0
1
0
1
0
1
0
1
bam
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Performs alignment of BS-Seq reads using bismark
0
1
0
1
0
1
bam
report
unmapped
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Relates methylation calls back to genomic cytosine contexts.
0
1
0
1
0
1
coverage
report
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Removes alignments to the same position in the genome from the Bismark mapping output.
0
1
bam
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.
0
1
index
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Extracts methylation information for individual cytosines from alignments.
0
1
0
1
bedgraph
methylation_calls
coverage
report
mbias
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Collects bismark alignment reports
0
1
2
3
4
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.
0
0
0
0
0
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Performs alignment of BS-Seq reads using bwameth
0
1
0
1
0
1
bam
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Performs indexing of c2t converted reference genome
0
1
index
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Concatenates fastq files
0
1
reads
versions
The cat utility reads files sequentially, writing them to the standard output.
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
txt
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).
0
1
0
1
0
1
0
1
0
1
orf2lca
bin2classification
log
diamond
faa
gff
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
orf2lca
contig2classification
log
diamond
faa
gff
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).
0
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
rat_log
complete_abundance
contig_abundance
read2classification
alignment_diamond
contig2classification
cat_log
orf2lca
faa
gff
unmapped_diamond
unmapped_fasta
unmapped2classification
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Summarises results from CAT/BAT/RAT classification steps
0
1
0
1
txt
versions
CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.
0
1
0
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.
0
1
2
fastq
undetermined_fastq
reports
stats
interop
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.
0
gtf
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.
0
0
0
reference
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.
0
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
config
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.
0
0
versions
fastq
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.
0
gtf
versions
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.
0
0
versions
fastq
Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Realign reads mapped with BWA to elongated reference genome
0
1
0
1
0
1
0
1
bam
versions
A method to improve mappings on circular genomes such as Mitochondria.
CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.
0
1
2
0
1
0
1
0
1
root
tab
versions
Tool for calling copy number variations.
Calculates peak-to-trough ratio (PTR) from metagenomic sequence data
0
1
ptr
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Computes the coverage map along the reference genome
0
1
coverage
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Indexes a directory of fasta files for use with CoPTR
0
1
index_dir
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Merge reads that were mapped to multiple indices
0
1
bam
versions
Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.
Map reads to contigs and estimate coverage
0
1
0
1
0
0
coverage
versions
CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications
Perform adapter/quality trimming on sequencing reads
0
1
reads
log
versions
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.
0
1
2
3
4
0
1
0
1
0
1
0
1
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
0
1
2
0
0
bigwig
bedgraph
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Plots cumulative read coverage for each BAM file
0
1
2
pdf
matrix
metrics
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Assemble bacterial isolate genomes from Nanopore reads
0
1
2
contigs
log
raw_contigs
gfa
txt
versions
Export assembly segment sequences in GFA 1.0 format to FASTA format
0
1
fasta
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
0
1
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
0
1
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.
0
1
2
3
4
5
0
0
vcf
versions
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
0
1
2
0
1
2
vcf
tbi
versions
A taxonomic profiler for metagenomic 16S data optimized for error prone long reads.
0
1
0
report
assignment_report
samfile
unclassified_fa
versions
Emu is a relative abundance estimator for 16S genomic data.
Run falco on sequenced reads
0
1
html
txt
versions
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Perform adapter and quality trimming on sequencing reads with reporting
0
1
reads
stats
debug
statspdf
reads_fail
reads_unpaired
log
versions
A program that counts sequence occurrences in FASTQ files.
0
1
0
1
count_matrix
stats
distribution_plot
reads_plot
reads_plot_percentage
versions
2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.
Perform adapter/quality trimming on sequencing reads
0
1
0
0
0
0
reads
json
html
log
reads_fail
reads_merged
versions
Align reads to multiple reference genomes using fastq-screen
0
1
0
txt
png
html
fastq
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
0
1
fasta
versions
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
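Collapsing identical sequences while keeping read counts, as described above, amounts to counting duplicates. The sketch below is an illustration only. The "rank-count" header scheme and the function name are assumptions made for this example, not a statement about the tool's exact output format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse(sequences: Iterable[str]) -> List[Tuple[str, str]]:
    """Collapse identical sequences into (header, sequence) records.

    Records are ordered by decreasing count; the header encodes the rank
    and the number of reads that carried that sequence.
    """
    counts = Counter(sequences)
    # most_common() sorts by count; ties keep first-seen order.
    return [
        (f"{rank}-{count}", seq)
        for rank, (seq, count) in enumerate(counts.most_common(), start=1)
    ]


# Made-up reads for illustration.
reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
records = collapse(reads)
```

Writing each record as a header line followed by the sequence would give a FASTA file with one entry per distinct sequence.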
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
0
1
0
0
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
0
1
0
0
bam
versions
Tools for working with genomic and high throughput sequencing data.
Uses fgbio to convert sequenced FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads
0
1
bam
cram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
0
1
0
1
0
0
0
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5′ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.
0
1
0
bam
histogram
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
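The grouping described above (sort templates by mapping position, then sub-group templates sharing the same end positions by UMI) can be sketched as follows. This is a simplified illustration that assumes exact UMI identity and a flat tuple per template. The function name and data layout are hypothetical and do not reflect fgbio's internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Template = Tuple[str, int, int, str]  # (name, start, end, umi)


def group_by_position_and_umi(templates: Iterable[Template]) -> Dict[str, int]:
    """Assign a molecule ID to each template name.

    Templates are bucketed by their (start, end) positions, buckets are
    visited from earliest to latest position, and within a bucket every
    distinct UMI gets its own molecule ID.
    """
    by_position: Dict[Tuple[int, int], List[Tuple[str, str]]] = defaultdict(list)
    for name, start, end, umi in templates:
        by_position[(start, end)].append((name, umi))

    molecule_ids: Dict[str, int] = {}
    next_id = 0
    for position in sorted(by_position):
        ids_for_umi: Dict[str, int] = {}
        for name, umi in by_position[position]:
            if umi not in ids_for_umi:
                ids_for_umi[umi] = next_id
                next_id += 1
            molecule_ids[name] = ids_for_umi[umi]
    return molecule_ids


# Made-up templates for illustration.
example_templates = [
    ("t1", 100, 250, "AAC"),
    ("t2", 100, 250, "AAC"),
    ("t3", 100, 250, "GGT"),
    ("t4", 50, 200, "AAC"),
]
ids = group_by_position_and_umi(example_templates)
```

Here t1 and t2 share both positions and UMI and so receive the same ID, while t4 carries the same UMI at a different position and is treated as a separate molecule.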
Filtlong filters long reads based on quality measures or short read data.
0
1
2
reads
log
versions
Perform merging of mate paired-end sequencing reads
0
1
merged
notcombined
histogram
versions
De novo assembler for single molecule sequencing reads
0
1
0
fasta
gfa
gv
txt
log
json
versions
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
0
fastq
versions
fq is a library to generate and validate FASTQ file pairs.
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
0
1
2
0
0
0
lineages
summarized
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
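The two-stage resampling in the bootstrap description above can be sketched with the standard library alone. This is an illustrative reading of that description, not Freyja's implementation. The function names and the `{site: {base: count}}` input layout are assumptions.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Mapping


def multinomial(rng: random.Random, n_trials: int, weights: Mapping[Hashable, float]) -> Counter:
    """Draw `n_trials` categories with probability proportional to `weights`."""
    if n_trials == 0:
        return Counter()
    keys = list(weights)
    draws = rng.choices(keys, weights=[weights[k] for k in keys], k=n_trials)
    return Counter(draws)


def bootstrap_sample(
    base_counts: Mapping[int, Mapping[str, int]], rng: random.Random
) -> Dict[int, Dict[str, int]]:
    """Resample per-site base counts in two stages.

    Stage 1 redistributes the total read depth across sites in proportion
    to each site's share of the reads. Stage 2 redraws the bases at each
    site in proportion to the observed base frequencies, using the
    resampled depth as the number of trials.
    """
    depths = {site: sum(counts.values()) for site, counts in base_counts.items()}
    new_depths = multinomial(rng, sum(depths.values()), depths)
    return {
        site: dict(multinomial(rng, new_depths[site], counts))
        for site, counts in base_counts.items()
    }


# Made-up pileup: site 101 has two alleles, site 202 has one.
observed = {101: {"A": 30, "G": 10}, 202: {"C": 20}}
resampled = bootstrap_sample(observed, random.Random(0))
total_resampled = sum(sum(counts.values()) for counts in resampled.values())
```

Repeating the draw many times and rerunning the demixing on each resampled table would give a distribution of lineage abundance estimates.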
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
bam
bai
cram
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
0
1
2
contamination
segmentation
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
3
4
0
0
0
split_read_evidence
split_read_evidence_index
paired_end_evidence
paired_end_evidence_index
site_depths
site_depths_index
versions
Genome Analysis Toolkit (GATK4)
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
0
1
2
3
0
1
0
1
0
1
0
0
table
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
cram
bam
crai
bai
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
0
1
2
0
1
0
1
0
1
bam
cram
sam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
0
1
2
0
0
0
0
printed_evidence
printed_evidence_index
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Splits reads that contain Ns in their cigar string
0
1
2
3
0
1
0
1
0
1
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
bam
bai
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
0
1
0
0
0
output
bam_index
metrics
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach
0
1
linear_plot_png
transformed_linear_plot_png
log_plot_png
transformed_log_plot_png
model
summary
lookup_table
fitted_histogram_png
versions
Assembles organelle genomes from genomic data
0
1
0
1
fasta
etc
versions
Get organelle genomes from genome skimming data
Collapse redundant transcript models in Iso-Seq data.
0
1
0
bed
bed_trans_reads
local_density_error
polya
read
strand_check
trans_report
versions
varcov
variants
Collapses similar gene models
Helper script that removes remaining polyA sequences from full-length non-chimeric reads (PacBio IsoSeq3)
0
1
fasta
report
tails
versions
Gene-Switch Transcriptome Annotation by Modular Algorithms
Whole-genome assembly using PacBio HiFi reads
0
1
2
0
1
2
0
1
2
0
1
raw_unitigs
bin_files
processed_unitigs
primary_contigs
alternate_contigs
hap1_contigs
hap2_contigs
corrected_reads
read_overlaps
log
versions
Align RNA-Seq reads to a reference with HISAT2
0
1
0
1
0
1
bam
summary
fastq
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Builds HISAT2 index for reference genome
0
1
0
1
0
1
index
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Extracts splice sites from a GTF file
0
1
txt
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Pre-compute the graph index structure.
0
1
graph
versions
HLA typing from short and long reads
Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.
0
1
2
3
results
extraction
extraction_mapped
extraction_unmpapped
hla
fastq
reads_per_level
remapped
versions
HLA typing from short and long reads
Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised.
0
1
bigwig
versions
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Counts how many reads map to each feature
0
1
2
0
1
txt
versions
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.
0
1
0
1
log
dedup
annotated
stats
versions
Remove polyA tail and artificial concatemers
0
1
0
bam
pbi
consensusreadset
summary
report
versions
IsoSeq - Scalable De Novo Isoform Discovery
Remove polyA tail and artificial concatemers
meta
bam
primers
meta
bam
pbi
consensusreadset
summary
report
versions
IsoSeq3 - Scalable De Novo Isoform Discovery
Create kallisto index
0
1
index
versions
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
0
1
0
1
0
0
0
0
results
json_info
log
versions
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.
0
0
0
reads
versions
khmer k-mer counting library
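Normalization by median, as performed by the normalize-by-median.py module above, can be illustrated with exact k-mer counts. This sketch assumes the usual rule of keeping a read only while the median count of its k-mers is below a cutoff. It uses a plain Counter rather than khmer's own counting structures, and the function names and parameter values are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `seq`."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads: Iterable[str], k: int, cutoff: int) -> List[str]:
    """Keep a read only if the median count of its k-mers so far is below `cutoff`."""
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            # Only kept reads contribute to the running k-mer counts.
            counts.update(read_kmers)
    return kept


# Made-up input: one sequence seen five times and one seen once.
example_reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
kept_reads = normalize_by_median(example_reads, k=4, cutoff=2)
```

Highly redundant reads are dropped once their k-mers are well covered, while the rare read is retained.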
Removes low abundance k-mers from FASTA/FASTQ files
0
1
trimmed
versions
khmer k-mer counting library
Classifies metagenomic sequence data
0
1
0
0
0
classified_reads_fastq
unclassified_reads_fastq
classified_reads_assignment
report
versions
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Extract reads classified at any user-specified taxonomy IDs.
0
0
1
0
1
0
1
extracted_kraken2_reads
versions
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Classifies metagenomic sequence data using unique k-mer counts
0
1
2
0
0
0
0
0
0
classified_reads
unclassified_reads
classified_assignment
report
versions
Metagenomics classifier with unique k-mer counting for more specific results
Converts aligned short- and long-read records from one reference to another
0
1
0
1
bam
versions
Fast and accurate coordinate conversion between assemblies
Runs mageck count for functional genomics; reads are usually mapped to a specific sgRNA
0
1
0
count
norm
versions
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
0
1
0
1
vcf
tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
5
6
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
somatic_sv_vcf
somatic_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
0
1
2
3
4
0
1
0
1
0
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
tumor_sv_vcf
tumor_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Map short-reads to an indexed reference genome
0
1
0
1
0
0
0
0
0
0
0
bam
versions
An aDNA aware short-read mapper
Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
0
1
0
runtime_log
fragmisincorporation_plot
length_plot
misincorporation
lgdistribution
dnacomp
stats_out_mcmc_hist
stats_out_mcmc_iter
stats_out_mcmc_trace
stats_out_mcmc_iter_summ_stat
stats_out_mcmc_post_pred
stats_out_mcmc_correct_prob
dnacomp_genome
rescaled
pctot_freq
pgtoa_freq
fasta
folder
versions
Analyses a DAA file and exports information in text format
0
1
0
txt_gz
megan
versions
A tool for studying the taxonomic content of a set of DNA reads
Analyses an RMA file and exports information in text format
0
1
0
txt
megan_summary
versions
A tool for studying the taxonomic content of a set of DNA reads
Performs taxonomic profiling of long metagenomic reads against the melon database
0
1
0
0
tsv_output
json_output
log
versions
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
0
1
0
1
0
0
0
hist
log_stderr
versions
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
Strain-level metagenomic assignment
0
1
2
3
4
0
wimp
evidence_unknown_species
reads2taxon
em
contig_coverage
length_and_id
krona
versions
Maps long reads to a metamaps database
0
1
0
classification_res
meta_file
meta_unmappedreadsLengths
para_file
versions
Metagenome assembler for long-read sequences (HiFi and ONT).
0
1
0
contigs
log
versions
MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.
0
1
0
1
outputs
versions
miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
0
1
2
0
1
0
1
2
3
outputs
versions
miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs.
It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.
A python workflow that assembles mitogenomes from Pacbio HiFi reads
0
1
0
0
0
0
fasta
stats
gb
gff
all_potential_contigs
contigs_annotations
contigs_circularization
contigs_filtering
coverage_mapping
coverage_plot
final_mitogenome_annotation
final_mitogenome_choice
final_mitogenome_coverage
potential_contigs
reads_mapping_and_assembly
shared_genes
versions
A python workflow that assembles mitogenomes from Pacbio HiFi reads
A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.
0
1
0
mtnucratio
json
versions
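As a rough illustration of the ratio described above, here is a minimal Python sketch. It assumes the per-contig mapped read counts have already been extracted from the BAM file; the function name, the contig labels, and the simple count-based ratio are assumptions for illustration and are not taken from the tool itself.

```python
def mt_to_nuclear_ratio(reads_per_contig, mt_name="MT"):
    """Return mitochondrial read count divided by nuclear read count.

    reads_per_contig: dict mapping contig name -> number of mapped reads.
    mt_name: label of the mitochondrial contig (assumed; often "MT" or "chrM").
    """
    mt_reads = reads_per_contig.get(mt_name, 0)
    # Everything that is not the mitochondrial contig is treated as nuclear.
    nuclear_reads = sum(n for name, n in reads_per_contig.items() if name != mt_name)
    if nuclear_reads == 0:
        raise ValueError("no nuclear reads; ratio is undefined")
    return mt_reads / nuclear_reads


# Hypothetical counts, for illustration only.
counts = {"1": 900, "2": 600, "MT": 300}
print(f"MT/nuclear read ratio: {mt_to_nuclear_ratio(counts):.3f}")
```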
Compare multiple runs of long read sequencing data and alignments
0
1
report_html
lengths_violin_html
log_length_violin_html
n50_html
number_of_reads_html
overlay_histogram_html
overlay_histogram_normalized_html
overlay_log_histogram_html
overlay_log_histogram_normalized_html
total_throughput_html
quals_violin_html
overlay_histogram_identity_html
overlay_histogram_phredscore_html
percent_identity_violin_html
active_pores_over_time_html
cumulative_yield_plot_gigabases_html
sequencing_speed_over_time_html
stats_txt
versions
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
0
1
2
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
0
1
html
png
txt
log
versions
Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
0
1
0
stats
reads
versions
Merging paired-end reads and removing sequencing adapters.
0
1
merged_reads
unstitched_read1
unstitched_read2
versions
Determines the gender of a sample from the BAM/CRAM file.
0
1
2
0
1
0
1
0
tsv
versions
Short-read sequencing tools
write your description here
meta
reads
format
mode
meta
versions
npa
npc
npl
npo
NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence
0
1
0
1
0
1
0
bam
bai
qc_metrics
bqsr_table
duplicate_metrics
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Assigns all the reads in a file to a single new read-group
0
1
0
1
0
1
bam
bai
cram
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
0
1
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list
0
1
2
0
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Locate and tag duplicate reads in a BAM file
0
1
0
1
0
1
bam
bai
cram
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads
0
1
2
bam
bai
num_reads
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
xml
txt
versions
Polishing genome assemblies with short reads.
0
1
0
1
0
fasta
versions
debug
Polishing genome assemblies with short reads.
Software to pileup reads and corresponding base quality for each overlapping SNPs and each barcode.
0
1
2
cel
plp
var
umi
versions
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
0
1
reads
log
versions
Adapter removal and demultiplexing of Oxford Nanopore reads
0
1
reads
log
versions
Adapter removal and demultiplexing of Oxford Nanopore reads
Filter reads by quality score.
0
1
reads
logs
versions
log_tab
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
0
1
good_reads
single_reads
bad_reads
log
versions
frame-shift correction for long read (meta)genomics - fix frameshifts in reads
0
1
0
1
out_fa
versions
frame-shift correction for long read (meta)genomics
frame-shift correction for long read (meta)genomics - maps proteins to reads
0
1
2
tsv
versions
frame-shift correction for long read (meta)genomics
Reads a MaxQuant proteinGroups file with Proteus
0
1
2
dendro_plot
mean_var_plot
raw_dist_plot
norm_dist_plot
raw_rdata
norm_rdata
raw_tab
norm_tab
session_info
versions
R package for analysing proteomics data
Randomly subsample sequencing reads to a specified coverage
0
1
2
0
reads
versions
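The idea of subsampling to a target coverage can be sketched in Python as follows. This is a simplified illustration, not the module's implementation: it assumes coverage is defined as total sampled bases divided by genome size, and the function name and in-memory read list are hypothetical.

```python
import random


def subsample_to_coverage(reads, genome_size, coverage, seed=0):
    """Randomly pick reads until their total length reaches coverage * genome_size.

    reads: list of (name, sequence) tuples.
    Returns all reads if there are not enough bases to reach the target.
    """
    target_bases = coverage * genome_size
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)  # seeded so the sample is reproducible
    kept, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(reads[i])
        total += len(reads[i][1])
    return kept


# 100 hypothetical reads of 100 bases each; 2x coverage of a 1 kb genome.
reads = [(f"read{i}", "ACGT" * 25) for i in range(100)]
sample = subsample_to_coverage(reads, genome_size=1000, coverage=2)
print(len(sample), "reads kept")
```

A fixed seed keeps the sample reproducible across runs, which is usually what a pipeline needs.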
De novo genome assembler for long uncorrected reads.
0
1
fasta
gfa
versions
Infer strandedness from sequencing reads
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
0
1
0
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
0
1
2
0
txt
xls
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
SALSA, A tool to scaffold long read assemblies with HiC
0
1
2
0
0
0
0
fasta
agp
agp_original_coordinates
versions
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
0
1
2
0
csv
json
bam
versions
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
find and mark duplicate reads in BAM file
0
1
bam
bai
versions
process your BAM data faster!
The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
0
1
0
reads
versions
Tools for dealing with SAM, BAM and CRAM files
shuffles and groups reads together by their names
0
1
0
1
bam
cram
sam
versions
Tools for dealing with SAM, BAM and CRAM files
Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
0
1
bam
cram
sam
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Call peaks using SEACR on sequenced reads in bedgraph format
0
1
2
0
bed
versions
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step VarCal and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
0
1
2
3
4
5
0
1
0
1
vcf
tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Create BWA index for reference genome
0
1
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
0
1
0
1
0
1
0
1
bam_and_bai
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the Picard CollectVariantCallingMetrics tool.
0
1
2
0
1
2
0
1
0
1
0
1
metrics
summary
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Accelerated implementation of the GATK DepthOfCoverage tool.
0
1
2
0
1
0
1
0
1
0
1
per_locus
sample_summary
statistics
coverage_counts
coverage_proportions
interval_summary
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects multiple quality metrics from a bam file
0
1
2
0
1
0
1
0
mq_metrics
qd_metrics
gc_summary
gc_metrics
aln_metrics
is_metrics
mq_plot
qd_plot
is_plot
gc_plot
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
0
1
2
0
1
0
1
cram
crai
bam
bai
score
metrics
metrics_multiqc_tsv
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
0
1
2
0
1
0
1
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
0
1
2
3
0
1
0
1
0
1
0
1
vcf_gz
vcf_gz_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
0
1
2
3
4
0
1
0
1
0
1
0
1
0
0
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Generate recalibration table and optionally perform base quality recalibration
0
1
2
0
1
0
1
0
1
0
1
0
1
0
table
table_post
recal_alignment
csv
pdf
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Merges BAM files, and/or convert them into cram files. Also, outputs the result of applying the Base Quality Score Recalibration to a file.
0
1
2
0
1
0
1
output
index
output_index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Filters the raw output of sentieon/tnhaplotyper2.
0
1
2
3
4
5
6
0
1
0
1
vcf
vcf_tbi
stats
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
0
1
2
3
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
orientation_data
contamination_data
contamination_segments
stats
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
0
1
2
0
1
0
1
0
1
2
0
1
2
0
1
2
0
1
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
0
1
2
0
0
0
0
0
recal
idx
tranches
plots
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects whole genome quality metrics from a bam file
0
1
2
0
1
0
1
0
1
wgs_metrics
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
match up paired-end reads from two fastq files
0
1
reads
unpaired_reads
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Split single or paired-end fastq.gz files
0
1
reads
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Salmonella serotype prediction from reads and assemblies
0
1
log
tsv
txt
versions
Generates a BED file containing genomic locations of lengths of N.
0
1
bed
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges pair-end reads into one interleaved file.
Interleave pair-end reads from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges pair-end reads into one interleaved file.
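To make "interleaved" concrete, the following Python sketch alternates records from two FASTQ inputs. It is an illustration only, not seqtk's code. It assumes plain four-line FASTQ records (no wrapped sequence lines) and the same number of records in both inputs; the function names are hypothetical.

```python
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_lines, r2_lines):
    """Return the lines of one interleaved FASTQ: R1 record, R2 record, R1, R2, ..."""
    r1 = list(fastq_records(r1_lines))
    r2 = list(fastq_records(r2_lines))
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of records")
    out = []
    for first, second in zip(r1, r2):
        out.extend(first)
        out.extend(second)
    return out


# Two hypothetical read pairs.
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "GGCC", "+", "IIII"]
r2 = ["@a/2", "TTGA", "+", "IIII", "@b/2", "CCAA", "+", "IIII"]
print("\n".join(interleave(r1, r2)))
```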
Subsample reads from FASTQ files
0
1
2
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Trim low quality bases from FastQ files
0
1
reads
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
0
1
tsv
txt
versions
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)
0
1
2
3
4
5
0
1
log
read_qual
breakpoints_double
read_alignments
read_ids
collapsed_dup
loh
all_vcf
all_breakpoints_clusters_list
all_breakpoints_clusters
all_plots
somatic_vcf
somatic_breakpoints_clusters_list
somatic_breakpoints_clusters
somatic_plots
versions
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
0
1
assembly
gfa
results
versions
Determine Shigella serotype from Illumina or Oxford Nanopore reads
0
1
tsv
hits
versions
Determine Shigella serotype from assemblies or Illumina paired-end reads
0
1
tsv
versions
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
0
1
2
3
0
1
0
1
vcf
versions
structural variant calling and genotyping with existing tools, but, smoothly
Local sequence alignment tool for filtering, mapping and clustering.
0
1
0
1
0
1
reads
log
index
versions
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Fast, efficient, lossless compression of FASTQ files.
0
1
2
spring
versions
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Fast, efficient, lossless decompression of FASTQ files.
0
1
0
fastq
versions
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
0
1
0
0
reads
versions
SRA Toolkit and SDK from NCBI
Align reads to a reference genome using STAR
0
1
0
1
0
1
0
0
0
log_final
log_out
log_progress
versions
bam
bam_sorted
bam_sorted_aligned
bam_transcript
bam_unsorted
fastq
tab
spl_junc_tab
read_per_gene_tab
junction
sam
wig
bedgraph
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Serotype STEC samples from paired-end reads or assemblies
0
1
tsv
versions
STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
0
1
2
3
4
5
6
7
8
9
10
0
1
2
0
input
rdata
plots
vcf
bgen
versions
Tandem repeat genotyper for long reads
0
1
2
0
1
0
1
0
1
vcf
tbi
versions
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
0
1
2
3
4
0
0
vcf
vcf_tbi
genome_vcf
genome_vcf_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
0
1
2
3
4
5
6
7
8
0
0
vcf_indels
vcf_indels_tbi
vcf_snvs
vcf_snvs_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Count reads that map to genomic features
0
1
2
counts
summary
versions
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.
Sketching/indexing sequencing reads
0
1
0
sketch_fastq_genome
versions
Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.
Trim FastQ files using Trim Galore!
0
1
reads
log
unpaired
html
zip
versions
Performs quality and adapter trimming on paired end and single end reads
0
1
trimmed_reads
unpaired_reads
trim_log
out_log
summary
versions
Assembles a de novo transcriptome from RNAseq reads
0
1
transcript_fasta
log
versions
Subsample a long-read sequencing fastq file for multiple assemblies
0
1
subreads
versions
Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
0
bam
fastq
log
versions
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
0
1
2
0
bam
log
tsv_edit_distance
tsv_per_umi
tsv_umi_per_position
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
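The deduplication idea described above (same mapping coordinate plus same UMI means duplicate) can be sketched in Python as below. This is a deliberately simplified illustration: it matches UMIs exactly and keeps the read with the highest mapping quality, whereas UMI-tools itself also accounts for sequencing errors in UMIs. The `Read` record and function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal read record, for illustration only.
Read = namedtuple("Read", "name chrom pos strand umi mapq")


def dedup_by_position_and_umi(reads):
    """Keep one read per (chrom, pos, strand, UMI) group: the one with the highest MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    # Groups are returned in the order in which they were first seen.
    return list(best.values())


reads = [
    Read("r1", "chr1", 100, "+", "AACG", 30),
    Read("r2", "chr1", 100, "+", "AACG", 60),  # duplicate of r1 with better MAPQ
    Read("r3", "chr1", 100, "+", "TTGA", 20),  # same position, different UMI
    Read("r4", "chr1", 250, "+", "AACG", 40),  # same UMI, different position
]
for read in dedup_by_position_and_umi(reads):
    print(read.name)
```

Reads r1 and r2 share both position and UMI, so only one of them survives; r3 and r4 differ in UMI or position and are kept.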
Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place
0
1
reads
log
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Group reads based on their UMI and mapping coordinates
0
1
2
0
0
log
bam
tsv
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
log
selfsm
depthsm
selfrg
depthrg
bestsm
bestrg
versions
verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.
0
1
2
0
1
2
0
0
log
ud
bed
mu
self_sm
ancestry
versions
A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
0
1
rnafold_txt
rnafold_ps
versions
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files
0
1
wiped_fastq
report
versions
A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files.
Convert and filter aligned reads to .npz
0
1
2
0
1
0
1
npz
versions
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Align reads to a reference genome using YARA
0
1
0
1
bam
bai
versions
Yara is an exact tool for aligning DNA sequencing reads to reference genomes.