Available Modules
Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.
Contiguate draft genome assembly
meta
scaffold
fasta
meta
results
versions
Screen assemblies for antimicrobial resistance against multiple databases
meta
assembly
databasedir
meta
versions
report
Mass screening of contigs for antibiotic resistance genes
Screen assemblies for antimicrobial resistance against multiple databases
meta
assembly
meta
versions
summary
Mass screening of contigs for antibiotic resistance genes
A NATA accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes
meta
fasta
versions
matches
partials
virulence
txt
out
A pipeline for running AMRfinderPlus and collating results into functional classes
Trim sequencing adapters and collapse overlapping reads
meta
reads
adapterlist
singles_truncated
discarded
paired_truncated
collapsed
collapsed_truncated
paired_interleaved
settings
versions
Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.
meta
fastq
meta
versions
fixed_fastq
ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).
meta
bed_ped_geno
bim_map
fam
K
meta
versions
Q-ancestry-fractions
P-allele-frequencies
Read CEL files into an ExpressionSet and generate a matrix
meta
samplesheet
celfiles_dir
description
meta
expression
annotation
rds
versions
Methods for Affymetrix Oligonucleotide Arrays
Converts a GFF/GTF file into a proper GTF file
meta
gff
output_gtf
log
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Converts a GFF/GTF file into a TSV file
meta
gff
tsv
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file
meta
gxf
output_gff
log
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Add intron features to gtf/gff file without intron features.
meta
gff
config
versions
gff
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
This script removes features based on a kill list. By default it looks at the feature's ID: if a feature has an ID (case insensitive) listed in the kill list, it is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.
meta
gff
kill_list
config
meta
versions
gff
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.
meta
gffs
config
meta
versions
gff
Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
Provides different types of statistics in text format from a GFF/GTF annotation file
meta
gff
stats_txt
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Provides basic statistics in text format from a GFF/GTF annotation file
meta
gff
stats_txt
versions
AGAT is a toolkit for manipulation and getting information from GFF/GTF files
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants
meta
fasta
meta
summary
results_dir
versions
ALE: assembly likelihood estimator.
meta
asm
bam
meta
ale
versions
Generates a count of coverage of alleles
meta
input
input_index
loci
fasta
meta
versions
allelecount
A tool to parse and summarise results from antimicrobial peptides tools and present functional classification.
meta
amp_input
faa_input
opt_amp_db
meta
versions
sample_dir
txt
csv
faa
summary_csv
summary_html
log
results_db
results_db_dmnd
results_db_fasta
results_db_tsv
A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.
summary_file
cluster_tsv
rep_cluster_tsv
log
versions
A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.
A submodule that merges all output summary tables from ampcombi/parsetables into one summary file.
summaries
tsv
log
versions
This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.
A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.
meta
amp_input
faa_input
gbk_input
opt_amp_db
meta
sample_dir
contig_gbks
txt
tsv
faa
sample_log
full_log
results_db
results_db_dmnd
results_db_fasta
results_db_tsv
versions
A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.
A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.
meta
faa
model
min_length
min_probability
meta
versions
amps_faa
amps_tsv
AMPlify is an attentive deep learning model for antimicrobial peptide prediction.
meta
faa
model_dir
meta
versions
tsv
Attentive deep learning model for antimicrobial peptide prediction
Post-processing script of the MaltExtract component of the HOPS package
maltextract_results
taxon_list
filter
versions
json
summary_pdf
tsv
candidate_pdfs
Identify antimicrobial resistance in gene or protein sequences
meta
fasta
db
meta
versions
report
mutation_report
tool_version
db_version
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
Identify antimicrobial resistance in gene or protein sequences
NO input
meta
versions
db
AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.
A tool to estimate nuclear contamination in males based on heterozygosity on the X chromosome.
meta
icounts
hapmap_file
meta
versions
txt
ANGSD: Analysis of next generation Sequencing Data
Calculates base frequency statistics across reference positions from BAM.
meta
bam
bai
minqfile
meta
versions
depth_sample
depth_global
qs
pos
counts
icounts
ANGSD: Analysis of next generation Sequencing Data
Calculates genotype likelihoods from BAM files.
meta
bam
meta2
fasta
meta3
error_file
meta
versions
genotype_likelihood
ANGSD: Analysis of next generation Sequencing Data
Annotation and Ranking of Structural Variation
meta
sv_vcf
sv_vcf_index
candidate_small_variants
meta2
annotations
meta3
candidate_genes
meta4
false_positive_snv
meta5
gene_transcripts
meta
versions
tsv
unannotated_tsv
vcf
Annotation and Ranking of Structural Variation
Install the AnnotSV annotations
NO input
versions
annotations
Annotation and Ranking of Structural Variation
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
meta
sample_treatment_col
reference
target
meta2
samplesheet
counts
meta
translated_mrna
total_mrna
translation
buffering
mrna_abundance
rdata
fold_change_plot
interaction_p_distribution_plot
residual_distribution_summary_plot
residual_vs_fitted_plot
rvm_fit_for_all_contrasts_group_plot
rvm_fit_for_interactions_plot
rvm_fit_for_omnibus_group_plot
simulated_vs_obt_dfbetas_without_interaction_plot
session_info
versions
Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.
meta
sequence_input
databases
antismash_dir
gff
meta
versions
clusterblast_file
html_accessory_files
knownclusterblast_html
knownclusterblast_dir
knownclusterblast_txt
svg_files_clusterblast
svg_files_knownclusterblast
gbk_input
json_results
log
zip
gbk_results
clusterblastoutput
html
knownclusterblastoutput
json_sideloading
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.
database_css
database_detection
database_modules
versions
database
antismash_dir
antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell
Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.
meta
bam
meta
extracted_reads_fastq
log
intermediate_sam
intermediate_bam
intermediate_sorted_bam
versions
arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.
Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).
meta
input_tsv
tool
db
meta
tsv
versions
Download and prepare database for Ariba analysis
meta
db_name
versions
db
ARIBA: Antibiotic Resistance Identification By Assembly
Query input FASTQs against Ariba formatted databases
meta
reads
db
versions
results
ARIBA: Antibiotic Resistance Identification By Assembly
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
meta
bam
meta2
fasta
meta3
gtf
meta4
blacklist
meta5
known_fusions
meta6
structural_variants
meta7
tags
meta8
protein_domains
meta
versions
fusions
fusions_fail
Fast and accurate gene fusion detection from RNA-Seq data
Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
NO input
versions
reference
Fast and accurate gene fusion detection from RNA-Seq data
Simulation tool to generate synthetic Illumina next-generation sequencing reads
meta
fasta
sequencing_system
fold_coverage
read_length
versions
meta
fastq
aln
sam
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.
Aggregates fastq files with demultiplexed reads
meta
fastq_dir
meta
fastq
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Run the alignment/variant-call/consensus logic of the artic pipeline
meta
fastq
fast5_dir
sequencing_summary
primer_scheme_fasta
primer_scheme_bed
medaka_model_file
medaka_model_string
scheme
scheme_version
meta
results
bam
bai
bam_trimmed
bai_trimmed
bam_primertrimmed
bai_primertrimmed
fasta
vcf
tbi
json
versions
ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
Copy number profiles of tumour cells.
args
meta
input_normal
index_normal
input_tumor
index_tumor
allele_files
loci_files
bed_file
fasta
gc_file
rt_file
meta
allelefreqs
metrics
png
purityploidy
segments
versions
Alignment by Simultaneous Harmonization of Layer/Adjacency Registration
meta
images
meta
tif
versions
Assembly summary statistics in JSON format
meta
assembly
meta
versions
json
ataqv function of a corresponding ataqv tool
meta
bam
bai
peak_file
organism
mito_name
tss_file
excl_regs_file
autosom_ref_file
meta
json
problems
versions
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
mkarv function of a corresponding ataqv tool
json
versions
html
ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.
Generate a VCF file from a BAM file using various calling methods
meta
bam
bai
fasta
fai
recal
pmd
known_alleles
method
meta
versions
bam
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Estimate the post-mortem damage patterns of DNA
meta
bam
bai
fasta
fai
pool_rg_txt
meta
versions
empiric
exponential
counts
table
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Gives an estimate of the sequencing bias based on known invariant sites
meta
bam
bai
empiric
alleles
invariant_sites
meta
versions
recal_patterns
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Split single-end read groups by length and merge paired-end reads
meta
bam
bai
read_group_setting
blacklist
meta
versions
bam
filelist
ATLAS, a suite of methods to accurately genotype and estimate genetic diversity
Generate tables of feature metadata from GTF files
meta
meta2
gtf
fasta
versions
feature_annotation
filtered_cdna
Scripts for manipulating gene annotation
Use deamination patterns to estimate contamination in single-stranded libraries
meta
bam
config
positions
meta
versions
txt
Estimates present-day DNA contamination in ancient DNA single-stranded libraries.
Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.
meta
image
meta2
markerfile
meta
versions
backsub_tif
meta2
markerout
Annotation of bacterial genomes (isolates, MAGs) and plasmids
meta
fasta
db
proteins
prodigal_tf
meta
versions
txt
tsv
gff
gbff
embl
fna
faa
ffn
hypotheticals_tsv
hypotheticals_faa
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Downloads BAKTA database from Zenodo
NO input
versions
db
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data
meta
bam
index
meta
versions
fastq
Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files
Removes unused references from the header of sorted BAM/CRAM files.
meta
bam
meta
versions
bam
This module is used to clip primer sequences from your alignments.
meta
bam
bai
bedpe
meta
versions
bam
bai
Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).
meta
primary_aligned_bam
contaminant_aligned_bam
versions
primary_filtered_bam
contamination_bam
Compute mapping statistics from a BAM file
meta
bam
meta
versions
json
A command line tool to compute mapping statistics from a BAM file
Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis
meta
bam
meta
versions
fastq
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
meta
bam
meta
versions
bam
C++ API & command-line toolkit for working with BAM data
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
meta
bam
meta
versions
stats
C++ API & command-line toolkit for working with BAM data
Trims the ends of reads in a SAM/BAM file, either by changing read ends to ‘N’ and quality to ‘!’ or by soft clipping
meta
bam
trim_left
trim_right
meta
versions
bam
Programs that perform operations on SAM/BAM files, all built into a single executable, bam.
Render an assembly graph in GFA 1.0 format to PNG and SVG image formats
meta
gfa
meta
png
svg
versions
Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
Demultiplex Element Biosciences bases files
meta
run_manifest
run_dir
meta
versions
sample_fastq
sample_json
qc_report
run_stats
generated_run_manifest
metrics
unassigned
BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.
meta
image
meta
versions
fields
Adapter and quality trimming of sequencing reads
meta
reads
contaminants
meta
reads
versions
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
Merging overlapping paired reads into a single read.
meta
reads
interleave
meta
merged
unmerged
ihist
versions
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.
meta
fastq
meta
versions
fastq
BBMap is a short read aligner, as well as various other bioinformatic tools.
Split sequencing reads by mapping them to multiple references simultaneously
meta
reads
index
primary_ref
other_ref_names
other_ref_paths
only_build_index
meta
versions
index
primary_fastq
all_fastq
stats
BBMap is a short read aligner, as well as various other bioinformatic tools.
Create 30% smaller, faster gzipped FASTQ files and remove duplicates
meta
reads
meta
reads
versions
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
Filter out sequences by sequence header name(s)
meta
reads
names_to_filter
output_format
interleaved_output
meta
versions
reads
log
BBMap is a short read aligner, as well as various other bioinformatic tools.
Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.
fasta
versions
db
BBMap is a short read aligner, as well as various other bioinformatic tools.
Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.
meta
bam
meta
stats
hist
versions
BBMap is a short read aligner, as well as various other bioinformatic tools.
Compares query sketches to reference sketches hosted on a remote server via the Internet.
meta
file
meta
versions
hits
BBMap is a short read aligner, as well as various other bioinformatic tools.
This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.
meta
vcf
index
regions
targets
samples
meta
vcf
csi
tbi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Concatenate VCF files
meta
vcfs
tbi
meta
vcf
csi
tbi
versions
Concatenate VCF files.
Creates a consensus sequence by applying VCF variants to a reference FASTA file
meta
vcf
tbi
fasta
meta
fasta
versions
Create consensus sequence by applying VCF variants to a reference fasta file.
Converts certain output formats to VCF
meta
input
input_index
meta2
fasta
bed
meta
versions
vcf_gz
vcf
bcf_gz
bcf
hap
legend
sample
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Filters VCF files
meta
vcf
meta
vcf
csi
tbi
versions
Apply fixed-threshold filters to VCF files.
Index VCF files
meta
vcf
meta
versions
csi
tbi
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.
Apply set operations to VCF files
meta
vcfs
tbis
meta
results
versions
Computes intersections, unions and complements of VCF files.
Merge VCF files
meta
vcfs
tbis
meta2
fasta
meta3
fai
bed
meta
vcf_gz
vcf
bcf_gz
bcf
versions
Merge VCF files.
Generates genotype likelihoods from BAM files at each genomic position with coverage
meta
bam
intervals
meta
fasta
save_mpileup
meta
vcf
tbi
stats
mpileup
versions
Generates genotype likelihoods at each genomic position with coverage.
Normalize VCF file
meta
vcf
tbi
meta2
fasta
meta
vcf
csi
tbi
versions
Normalize VCF files.
Split VCF by chunks or regions, creating multiple VCFs.
meta
vcf
tbi
sites_per_chunk
scatter
scatter_file
regions
targets
meta
versions
scatter
csi
tbi
Split VCF by chunks or regions, creating multiple VCFs.
Split VCF by sample, creating single- or multi-sample VCFs.
meta
vcf
tbi
samples
groups
regions
targets
meta
versions
vcf
Split VCF by sample, creating single- or multi-sample VCFs.
Extracts fields from VCF or BCF files and outputs them in user-defined format.
meta
vcf
tbi
regions
targets
samples
meta
output
versions
Extracts fields from VCF or BCF files and outputs them in user-defined format.
Reheader a VCF file
meta
vcf
header
samples
meta2
fai
meta
versions
vcf
Modify header of VCF/BCF files, change sample names.
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
meta
vcf
af_file
af_file_tbi
genetic_map
regions_file
samples_file
targets_file
meta
versions
roh
A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.
Sorts VCF files
meta
vcf
meta
versions
vcf
csi
tbi
Sort VCF files by coordinates.
Split a vcf file into files per chromosome
meta
vcf
tbi
meta
split_vcf
versions
Sort VCF files by coordinates.
Generates stats from VCF files
meta
vcf
tbi
regions
targets
samples
exons
fasta
meta
stats
versions
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
meta
vcf
index
regions
targets
samples
meta
vcf
csi
tbi
versions
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Demultiplex Illumina BCL files
meta
samplesheet
run_dir
versions
fastq
fastq_idx
undetermined
undetermined_idx
reports
stats
interop
Demultiplex Illumina BCL files
meta
samplesheet
run_dir
versions
fastq
fastq_idx
undetermined
undetermined_idx
reports
logs
interop
Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.
meta
vcf
ref
genmap
exclsamples
exclmarkers
meta
versions
vcf
log
Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
Convert a BED file to a VCF file according to a YAML config
meta
bed
config
meta2
fai
meta
versions
vcf
Convert BAM/GFF/GTF/GVF/PSL files to bed
meta
input
meta
versions
bed
High-performance genomic feature operations.
Convert gtf format to bed format
meta
gtf
bed
versions
The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
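The coordinate conversion described above can be illustrated with a minimal Python sketch. It is not the gtf2bed implementation; the function name `gtf_to_bed_interval` is hypothetical, and only the start/end arithmetic is shown (sorting and the extended BED columns are left out).

```python
from typing import Tuple


def gtf_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed [start, end] interval (GTF convention)
    to a 0-based, half-open [start-1, end) interval (BED convention)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end for a GTF interval")
    # Only the start shifts; the end stays the same because the closed
    # 1-based end and the open 0-based end name the same boundary.
    return start - 1, end


bed_start, bed_end = gtf_to_bed_interval(100, 200)
# The feature length is preserved: 200 - 100 + 1 == bed_end - bed_start
print(bed_start, bed_end)  # prints "99 200"
```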
For each feature in A, finds the closest feature (upstream or downstream) in B.
meta
input_1
input_2
fasta_fai
meta
versions
output
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.
meta
bed
sizes
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
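As a rough illustration of what a complement operation computes, the sketch below returns the uncovered stretches of a single chromosome. It is an assumption-laden toy, not the bedtools implementation: intervals are 0-based, half-open `(start, end)` tuples, only one chromosome is handled, and the names `complement` and `chrom_size` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def complement(intervals: List[Interval], chrom_size: int) -> List[Interval]:
    """Return the parts of [0, chrom_size) not covered by any input interval."""
    gaps: List[Interval] = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))  # uncovered stretch before this interval
        cursor = max(cursor, end)
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))  # uncovered tail of the chromosome
    return gaps


print(complement([(10, 20), (15, 30), (50, 60)], 100))
# prints "[(0, 10), (30, 50), (60, 100)]"
```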
Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.
meta
intervals
scale
sizes
extension
meta
genomecov
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Extracts sequences from a FASTA file based on intervals defined in a feature file.
meta
bed
fasta
meta
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.
meta
bed
summary_column
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Allows one to screen for overlaps between two sets of genomic features.
meta
intervals1
intervals2
meta2
chrom_sizes
meta
intersect
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Calculate the Jaccard statistic between two feature files.
meta
input_a
input_b
meta2
genome_file
meta
versions
tsv
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
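For readers unfamiliar with the Jaccard statistic, here is a minimal sketch that computes it as the number of base pairs shared by two interval sets divided by the number of base pairs in their union. It assumes 0-based, half-open intervals on a single chromosome and expands them to sets of positions, which is only practical for small examples; it is not how bedtools computes the value, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]


def covered_positions(intervals: List[Interval]) -> Set[int]:
    """Expand half-open intervals into the set of covered base positions."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Shared base pairs divided by base pairs in the union of a and b."""
    pos_a = covered_positions(a)
    pos_b = covered_positions(b)
    union = pos_a | pos_b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(pos_a & pos_b) / len(union)


# 5 shared bases (5..9) out of 15 bases in the union (0..14)
print(round(jaccard([(0, 10)], [(5, 15)]), 3))  # prints "0.333"
```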
Makes adjacent or sliding windows across a genome or BED file.
meta
regions
meta
versions
bed
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Allows one to screen for overlaps between two sets of genomic features.
meta
intervals1
intervals2
meta2
chrom_sizes
meta
mapped
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Masks sequences in a FASTA file based on intervals defined in a feature file.
meta
bed
fasta
meta
fasta
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines overlapping or “book-ended” features in an interval file into a single feature that spans all of the combined features.
meta
bed
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
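The merging of overlapping or book-ended features can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions (0-based, half-open intervals on one chromosome, no strand or extra columns), not the bedtools code; `merge_intervals` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals into single spanning intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # "<=" also merges book-ended intervals, where one interval starts
        # exactly at the position where the previous one ends.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(1, 5), (5, 8), (10, 12), (11, 15)]))
# prints "[(1, 8), (10, 15)]"
```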
Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.
meta
beds
chrom_sizes
meta
versions
bed
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)
meta
bed
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Sorts a feature file by chromosome and other criteria.
meta
intervals
genome_file
meta
sorted
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Split BED files into several smaller BED files
meta
bed
meta
versions
beds
A powerful toolset for genome arithmetic
Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.
meta
intervals1
intervals2
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Combines multiple BedGraph files into a single file
meta
bedgraph
meta2
chrom_sizes
meta
bed
versions
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Locate and tag duplicate reads in a BAM file
meta
bam
meta
bam
metrics
versions
biobambam is a set of tools for early stage alignment file processing.
Merge a list of sorted bam files
meta
bam
meta
bam
bam_index
checksum
versions
biobambam is a set of tools for early stage alignment file processing.
Parallel sorting and duplicate marking
meta
bams
meta2
fasta
meta
bam
bam_index
cram
metrics
versions
biobambam is a set of tools for early stage alignment file processing.
Use k-mers to rapidly subtype S. enterica genomes
meta
seqs
scheme_metadata
meta
versions
summary
kmer_results
simple_summary
Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.
meta
reads
index
meta
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit
meta
reads
index
meta
bam
bai
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Summarize and/or filter reads based on bisulfite conversion rate
meta
bam
bai
index
meta
bsconv_bam
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.
meta
bam
bai
snp_bed
index
meta
epiread_bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Indexes a reference genome for use with Biscuit
fasta
index
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Merges methylation information for opposite-strand C's in a CpG context
meta
bed
index
meta
mergecg_bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants
meta
normal_bams
normal_bais
tumor_bam
tumor_bai
index
meta
versions
vcf
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Perform basic quality control on a BAM file generated with Biscuit
meta
bam
biscuit_qc_reports
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.
meta
vcf
meta
bed
versions
A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
Performs alignment of BS-Seq reads using bismark
meta
reads
index
meta
bam
unmapped
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Relates methylation calls back to genomic cytosine contexts.
meta
coverage_file
index
meta
coverage
report
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Removes alignments to the same position in the genome from the Bismark mapping output.
meta
bam
meta
bam
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.
fasta
index
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Extracts methylation information for individual cytosines from alignments.
meta
bam
index
meta
bedgraph
methylation_calls
coverage
report
mbias
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Collects bismark alignment reports
meta
align_report
splitting_report
dedup_report
mbias
fasta
meta
report
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.
bam
align_report
dedup_report
splitting_report
mbias
summary
versions
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
Retrieve entries from a BLAST database
meta
entry
entry_batch
meta2
db
meta
fasta
text
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
meta
fasta
meta2
db
meta
txt
versions
BLAST finds regions of similarity between biological sequences.
BLASTP (Basic Local Alignment Search Tool - Protein) compares an amino acid (protein) query sequence against a protein database
meta
fasta
meta2
db
out_ext
meta
xml
tsv
csv
versions
BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.
Builds a BLAST database
meta
fasta
meta
db
versions
BLAST finds regions of similarity between biological sequences.
Queries a BLAST DNA database
meta
fasta
meta2
db
meta
txt
versions
Protein to Translated Nucleotide BLAST.
Downloads a BLAST database from NCBI
meta
name
meta
db
versions
BLAST finds regions of similarity between biological sequences.
Create bowtie index for reference genome
meta
fasta
meta
index
versions
bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Align reads to a reference genome using bowtie2
meta
reads
meta2
index
meta3
fasta
save_unaligned
sort_bam
sam
bam
cram
csi
crai
log
fastq
versions
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.
meta
kraken_report
database
meta
versions
reports
txt
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Extends a Kraken2 database to be compatible with Bracken
meta
kraken2db
meta
versions
db
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Combine output of metagenomic samples analyzed by bracken.
meta
input
meta
versions
txt
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Benchmarking Universal Single Copy Orthologs
meta
fasta
mode
lineage
busco_lineages_path
config_file
meta
batch_summary
short_summaries_txt
short_summaries_json
busco_dir
full_table
missing_busco_list
single_copy_proteins
seq_dir
translated_proteins
versions
Benchmarking Universal Single Copy Orthologs
meta
fasta
mode
lineage
busco_lineages_path
config_file
meta
batch_summary
short_summaries_txt
short_summaries_json
busco_dir
full_table
missing_busco_list
single_copy_proteins
seq_dir
translated_dir
versions
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
BUSCO plot generation tool
short_summary_txt
png
versions
BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
Create BWA-mem2 index for reference genome
meta
fasta
meta
index
versions
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create BWA-MEME index for reference genome
meta
fasta
meta
versions
index
Faster BWA-MEM2 using learned-index
Performs alignment of BS-Seq reads using bwameth
meta
reads
index
meta
bam
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
Performs indexing of c2t converted reference genome
fasta
index
versions
Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.
A module for concatenation of gzipped or uncompressed files
meta
files_in
versions
file_out
Just concatenation
Concatenates fastq files
meta
reads
meta
reads
versions
The cat utility reads files sequentially, writing them to the standard output.
Cluster protein sequences using sequence similarity
meta
sequences
meta
fasta
clusters
versions
Clusters and compares protein or nucleotide sequences
Cluster nucleotide sequences using sequence similarity
meta
sequences
meta
versions
fasta
clusters
Clusters and compares protein or nucleotide sequences
Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth
meta
img_data
signature
high_thresholds
low_thresholds
meta
versions
celltypes
quality
cellpose segments cells in images
meta
image
model
meta
versions
mask
flows
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.
meta
reads
reference
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.
bcl
csv
fastq
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.
gtf
gtf
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.
fasta
gtf
reference_name
reference
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.
reference_name
genes
fasta
seqs
reference
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.
meta
gex_fastqs
vdj_fastqs
ab_fastqs
beam_fastqs
cmo_fastqs
gex_reference
gex_frna_probeset
gex_targetpanel
vdj_reference
vdj_primer_index
fb_reference
beam_antigen_panel
beam_control_panel
cmo_reference
cmo_barcodes
cmo_barcode_assignment
frna_sampleinfo
config
outs
versions
Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.
meta
reads
reference
outs
versions
Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.
meta
lib_csv
reference
outs
versions
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.
bcl
csv
fastq
versions
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.
gtf
gtf
versions
Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.
fasta
gtf
motifs
reference_config
reference_name
reference
versions
Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.
Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.
meta
reads
reference
outs
versions
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.
bcl
csv
fastq
versions
Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.
fasta
gtf
motifs
reference_config
reference_name
reference
versions
Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
Cellsnp-lite is a C/C++ tool for efficiently genotyping bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.
meta
bam
bai
region_vcf
barcode
meta
versions
base
cell
sample
allele_depth
depth_coverage
depth_other
Efficient genotyping of bi-allelic SNPs on single cells
Build centrifuge database for taxonomic profiling
meta
fasta
conversion_table
taxonomy_tree
name_table
size_table
meta
versions
cf
Classifier for metagenomic sequences
Classifies metagenomic sequence data
meta
reads
db
save_unaligned
save_aligned
meta
report
results
sam
fastq_unmapped
fastq_mapped
versions
Centrifuge is a classifier for metagenomic sequences.
Creates Kraken-style reports from centrifuge out files
meta
report
db
meta
versions
kreport
Centrifuge is a classifier for metagenomic sequences.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
meta
fasta
fasta_ext
db
meta
versions
checkm_output
checkm_output
checkm_tsv
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
meta
analysis_dir
marker_file
coverage_file
exclude_marker_file
meta
versions
output
fasta
Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
CheckM2 database download
NO input
meta
versions
CheckM2 - Rapid assessment of genome bin quality using machine learning
CheckM2 bin quality prediction
meta
dbmeta
fasta
db
meta
versions
checkm2_output
checkm2_tsv
CheckM2 - Rapid assessment of genome bin quality using machine learning
Construct the database necessary for checkv's quality assessment
NO input
versions
checkv_db
Assess the quality of metagenome-assembled viral genomes.
Assess the quality of metagenome-assembled viral genomes.
meta
fasta
db
meta
versions
quality_summary
completeness
contamination
complete_genomes
proviruses
viruses
Assess the quality of metagenome-assembled viral genomes.
Construct the database necessary for checkv's quality assessment
meta
fasta
db
meta
versions
checkv_db
Assess the quality of metagenome-assembled viral genomes.
Create a schema to determine the allelic profiles of a genome
meta
fasta
prodigal_tf
cds
versions
meta
schema
cds_coordinates
invalid_cds
A complete suite for gene-by-gene schema creation and strain identification.
Filter and trim long read data.
meta
fastq
meta
versions
fastq
zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.
Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).
Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.
meta
reads
meta2
fasta
meta3
index
barcodes
whitelist
chr_order
pairs_chr_order
meta
versions
bed
bam
tagAlign
pairs
Fast alignment and preprocessing of chromatin profiles
Indexes a fasta reference genome ready for chromatin profiling.
meta
fasta
versions
meta
index
Fast alignment and preprocessing of chromatin profiles
Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.
meta
meta2
meta3
meta4
meta5
meta6
meta7
autozyg
coverage
exome
fracsnp
ideogram
regions
sites
meta
versions
plots
Annotate circRNAs detected in the output from CIRCexplorer2 parse
meta
junctions
fasta
gene_annotation
meta
txt
versions
Circular RNA analysis toolkits
CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.
meta
fusions
meta
bed
versions
Circular RNA analysis toolkit
A method to improve mappings on circular genomes, using the BWA mapper.
meta
reference
meta2
elongation_factor
meta3
target
meta
versions
fasta
Creates a modified reference genome, elongated by a specified number of bases
Realign reads mapped with BWA to elongated reference genome
meta
bam
meta2
fasta
meta3
elongation_factor
meta
bam
versions
A method to improve mappings on circular genomes such as Mitochondria.
Predict recombination events in bacterial genomes
meta
msa
newick
meta
versions
emsim
em
fasta
newick
pos_ref
status
Align sequences using Clustal Omega
meta
fasta
meta2
tree
compress
meta
alignment
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Parallel implementation of the gzip algorithm.
Renders a guidetree in clustalo
meta
fasta
meta
tree
versions
Latest version of Clustal: a multiple sequence alignment program for DNA or proteins
Calculates polymorphic site rates over protein coding genes
meta
bam
bai
gff
fasta
meta
versions
polymut
Set of utilities on sequences and BAM files
Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.
meta
fasta
meta2
exclude_bed
meta
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Derive off-target (“antitarget”) bins from target regions.
meta
targets
meta
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
meta
tumor
normal
meta2
fasta
meta3
fasta_fai
meta4
targets
meta5
reference
panel_of_normals
meta
bed
cnn
cnr
cns
pdf
png
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number
meta
cns
vcf
meta
versions
output
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
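As a rough illustration of the idea behind this step (not CNVkit's own calling code), a segment's log2 ratio can be turned into an integer copy number by scaling the reference ploidy and rounding. The sketch below assumes a pure sample, a diploid reference, and a tab-separated segment table with `chromosome`, `start`, `end` and `log2` columns; the function names are hypothetical.

```python
import csv
import io


def absolute_copy_number(log2_ratio: float, ploidy: int = 2) -> int:
    """Round ploidy * 2**log2 to the nearest non-negative integer."""
    return max(0, round(ploidy * 2 ** log2_ratio))


def call_segments(cns_text: str, ploidy: int = 2) -> list:
    """Return (chromosome, start, end, copy_number) for each segment row."""
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    return [
        (
            row["chromosome"],
            int(row["start"]),
            int(row["end"]),
            absolute_copy_number(float(row["log2"]), ploidy),
        )
        for row in reader
    ]


# Tiny made-up segment table, for illustration only.
example = (
    "chromosome\tstart\tend\tlog2\n"
    "chr1\t0\t1000\t0.0\n"
    "chr1\t1000\t2000\t-1.0\n"
    "chr2\t0\t500\t1.0\n"
)
calls = call_segments(example)
```

Real data usually needs corrections for tumour purity and noise, which this simple rounding rule ignores.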
Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.
meta
cns
meta
versions
cns
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Copy number variant detection from high-throughput sequencing data
meta
cnr
cns
meta
txt
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Compile a coverage reference from the given files (normal samples).
fasta
targets
antitargets
meta
cnn
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Transform bait intervals into targets more suitable for CNVkit.
meta
baits
meta2
annotation
meta
bed
versions
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.
meta
meta2
meta3
meta4
bam
bai
root
fasta
fai
output_meta
versions
root
tab
Tool for calling copy number variations.
convert2vcf.pl is command line tool to convert CNVnator calls to vcf format.
meta
calls
meta
versions
vcf
Tool for calling copy number variations.
command line tool for calling CNVs in whole genome sequencing data
meta
pytor
bin_sizes
meta
pytor
versions
calling CNVs using read depth
calculates read depth histograms
meta
pytor
bin_sizes
meta
pytor
versions
calling CNVs using read depth
command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.
meta
input_file
index
fasta
fai
meta
pytor
versions
calling CNVs using read depth
partitioning read depth histograms
meta
pytor
bin_sizes
meta
partitions
versions
calling CNVs using read depth
view function to generate vcfs
meta
pytor_files
bin_sizes
output_format
meta
tsv
vcf
xls
versions
calling CNVs using read depth
A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.
meta
fasta
coverage
query
bam
assembler
mink
maxk
meta
extended_assemblies
extended_circular
extended_partial
extended_failed
orphan_end
all_assemblies
joining_summary
log
versions
Builds a classic bloom filter COBS index
meta
input
meta
index
versions
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Builds a compact bloom filter COBS index
meta
input
meta
index
versions
Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples
meta
coverage_file
fasta
meta
versions
args_txt
clustering_csv
log_txt
original_data_csv
pca_components_csv
pca_transformed_csv
Clustering cONtigs with COverage and ComposiTion
Generate the input coverage table for CONCOCT using a BED file
meta
bed
bamfiles
baifiles
meta
versions
tsv
Clustering cONtigs with COverage and ComposiTion
Calculate confidence scores from Kraken2 output
meta
kraken_result
kraken_taxon_db
meta
score
versions
Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC
meta
cnvs
ratio
meta
versions
p_value_txt
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Copy number and genotype annotation from whole genome and whole exome sequencing data
args
meta
mateFile_normal
mateFile_tumor
cpn_normal
cpn_tumor
minipileup_normal
minipileup_tumor
fasta
fai
snp_position
known_snps
known_snps_tbi
chr_directory
mappability
target_bed
meta
versions
bedgraph
control_cpn
sample_cpn
gcprofile_cpn
BAF
CNV
info
ratio
config
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
meta
ratio
meta
versions
bed
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Format Freec output to circos input format
meta
ratio
meta
versions
circos
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
meta
ratio
baf
ploidy
meta
versions
png_baf
png_ratio_log2
png_ratio
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Plot Freec output
meta
ratio
baf
meta
versions
png_baf
png_ratio_log2
png_ratio
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Run matrix balancing on a cool file
meta
cool
resolution
meta
versions
cool
Sparse binary format for genomic interaction matrices
Create a cooler from genomic pairs and bins
meta
pairs
index
cool_bin
chromsizes
meta
version
cool
cool_bin
Sparse binary format for genomic interaction matrices
Generate fragment-delimited genomic bins
fasta
chromsizes
enzyme
versions
bed
Sparse binary format for genomic interaction matrices
Dump a cooler’s data to a text stream.
meta
cool
resolution
meta
versions
bedpe
Sparse binary format for genomic interaction matrices
Generate fixed-width genomic bins
chromsize
cool_bin
versions
bed
Sparse binary format for genomic interaction matrices
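To make "fixed-width genomic bins" concrete, here is a minimal sketch that tiles each chromosome into half-open intervals of a given width, truncating the last bin at the chromosome end. It illustrates the concept only and is not cooler's implementation; the function name and the in-memory dictionary of chromosome sizes are assumptions.

```python
def fixed_width_bins(chromsizes: dict, binsize: int) -> list:
    """Return (chrom, start, end) tuples covering each chromosome.

    Intervals are half-open [start, end); the final bin of a chromosome
    is shortened so that it never runs past the chromosome length.
    """
    if binsize <= 0:
        raise ValueError("binsize must be a positive integer")
    bins = []
    for chrom, size in chromsizes.items():
        for start in range(0, size, binsize):
            bins.append((chrom, start, min(start + binsize, size)))
    return bins


# Hypothetical chromosome sizes, for illustration only.
bins = fixed_width_bins({"chr1": 25, "chr2": 10}, binsize=10)
```

Each tuple maps directly onto one line of a three-column BED file.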
Merge multiple coolers with identical axes
meta
cool
meta
versions
cool
Sparse binary format for genomic interaction matrices
Generate a multi-resolution cooler file by coarsening
meta
cool
meta
versions
mcool
Sparse binary format for genomic interaction matrices
Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.
image
meta
versions
cores
masks
tma_map
centroids
meta
Compress files with crabz
meta
file
meta
versions
archive
Like pigz, but rust
Decompress files with crabz
meta
archive
meta
versions
file
Like pigz, but rust
remove false positives of functional crispr genomics due to CNVs
meta
count_file
library_file
meta
versions
norm_count_file
Analysis of CRISPR functional genomics, removing false positives due to CNVs.
Concatenate two or more CSV (or TSV) tables into a single table
meta
csv
in_format
out_format
meta
versions
csv
A cross-platform, efficient, practical CSV/TSV toolkit
Join two or more CSV (or TSV) tables by selected fields into a single table
meta
csv
meta
versions
csv
A cross-platform, efficient, practical CSV/TSV toolkit
Splits CSV/TSV into multiple files according to column values
meta
csv
in_format
out_format
meta
versions
split_csv
CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.
Custom module to add a new fasta file to an old one and update an associated GTF
meta
meta2
fasta
gtf
add_fasta
biotype
meta
fasta
gtf
versions
Custom module to add a new fasta file to an old one and update an associated GTF
Custom module used to dump software versions within the nf-core pipeline template
versions
yml
mqc_yml
versions
Custom module used to dump software versions within the nf-core pipeline template
Generates a file of chromosome sizes and a FASTA index file
meta
fasta
meta
sizes
fai
gzi
versions
Tools for dealing with SAM, BAM and CRAM files
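For readers unfamiliar with a chromosome sizes file, it is a two-column table of sequence name and length. The sketch below derives that table from FASTA text in plain Python; it is only an illustration of the output, not how the module works, and it assumes well-formed FASTA where the sequence name is the first word of each header line. The function name is hypothetical.

```python
import io


def chrom_sizes(fasta_text: str) -> dict:
    """Map each sequence name to its total length in bases."""
    sizes = {}
    name = None
    for line in io.StringIO(fasta_text):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


# Toy FASTA, for illustration only.
sizes = chrom_sizes(">chr1 example\nACGT\nAC\n>chr2\nGG\n")
sizes_table = "".join(f"{name}\t{length}\n" for name, length in sizes.items())
```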
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
meta
gtf
fasta
meta
gtf
versions
Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
Filter a matrix based on a minimum value and the number of samples that must pass.
meta
abundance
samplesheet_meta
samplesheet
minimum_abundance
minimum_samples
minimum_proportion
grouping_variable
minimum_proportion_not_na
minimum_samples_not_na
most_variant_features
versions
meta
filtered
tests
filter a matrix based on a minimum value and numbers of samples
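The core rule can be sketched as follows: a feature is kept when enough samples reach the abundance threshold. The names `minimum_abundance` and `minimum_samples` come from the inputs listed above; the use of "greater than or equal" for both comparisons and the dictionary-of-lists matrix are assumptions made for this example, and the other listed options are not covered.

```python
def filter_matrix(matrix: dict, minimum_abundance: float, minimum_samples: int) -> dict:
    """Keep features whose value reaches the threshold in enough samples.

    `matrix` maps a feature name to its per-sample values.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(value >= minimum_abundance for value in values) >= minimum_samples
    }


# Made-up abundance values, for illustration only.
abundance = {
    "geneA": [0, 1, 0],
    "geneB": [5, 7, 0],
    "geneC": [9, 9, 9],
}
filtered = filter_matrix(abundance, minimum_abundance=5, minimum_samples=2)
```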
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
versions
ncbi_settings
SRA Toolkit and SDK from NCBI
Make a GSEA class file (.cls) from tabular inputs
meta
samples
meta
cls
versions
Make a GSEA class file (.cls) from tabular inputs
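A categorical .cls file has three lines: a header with the number of samples, the number of classes and the constant 1; a line starting with `#` that names the classes; and one class label per sample. The sketch below builds that text from a list of labels. It is an illustration of the file layout rather than the module's code, and the function name is hypothetical.

```python
def make_cls(labels: list) -> str:
    """Build categorical CLS text from one class label per sample."""
    # Class names are listed in order of first appearance.
    classes = list(dict.fromkeys(labels))
    lines = [
        f"{len(labels)} {len(classes)} 1",
        "# " + " ".join(classes),
        " ".join(labels),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample labels, in the same order as the expression columns.
cls_text = make_cls(["control", "control", "treated"])
```

Labels must not contain spaces, since spaces separate the entries on each line.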
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
meta
tabular
meta
gct
versions
Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA
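A GCT file (version 1.2) starts with the line `#1.2`, then a line giving the number of features and samples, then a header of `NAME`, `Description` and the sample names, followed by one row per feature. The sketch below converts a small delimited table into that layout. It is only an illustration; filling the Description column with `na` and the function name are assumptions.

```python
import csv
import io


def table_to_gct(text: str, delimiter: str = "\t") -> str:
    """Convert a features-by-row table (first column = feature id) to GCT text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    lines = [
        "#1.2",
        f"{len(body)}\t{len(samples)}",
        "\t".join(["NAME", "Description"] + samples),
    ]
    for row in body:
        # Placeholder description; GCT requires the column to be present.
        lines.append("\t".join([row[0], "na"] + row[1:]))
    return "\n".join(lines) + "\n"


# Toy expression table, for illustration only.
gct_text = table_to_gct("gene,s1,s2\ng1,1.0,2.0\ng2,3.5,0\n", delimiter=",")
```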
Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.
meta
gtf
meta2
quants
quant_type
id
extra
meta
tx2gene
versions
Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications
Perform adapter/quality trimming on sequencing reads
meta
reads
meta
reads
log
versions
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
structural-variant calling with cutesv
meta
bam
bai
meta2
fasta
meta
vcf
versions
A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage
meta
bam
fasta
fai
specieslist
versions
results
DAS Tool binning step.
meta
contigs
bins
proteins
db_directory
meta
version
log
summary
contig2bin
eval
bins
pdfs
fasta_proteins
fasta_archaea_scg
fasta_bacteria_scg
b6
seqlength
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
meta
fasta
extension
meta
versions
fastatocontig2bin
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format
meta
fasta
extension
meta
versions
scaffolds2bin
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.
meta
config_file
table
versions
report
decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take into account the sign and weight. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.
meta
mat
net
args
meta
dc_estimate
dc_pvals
versions
DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).
meta
bam
meta
versions
bam
json
hist
log
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
NO input
versions
db
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
meta
fasta
model
db
meta
versions
daa
daa_tsv
arg
potential_arg
A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes
Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.
NO input
versions
deepbgc_db
DeepBGC - Biosynthetic Gene Cluster detection and classification
DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.
meta
genome
meta
versions
readme
log
json
bgc_gbk
bgc_tsv
full_gbk
pfam_tsv
bgc_png
pr_png
roc_png
score_png
DeepBGC - Biosynthetic Gene Cluster detection and classification
Deepcell/mesmer segmentation for whole-cell
meta
img
meta2
membrane_img
meta
mask
versions
Deep cell is a collection of tools to segment imaging data
A Deep Learning Model for Transmembrane Topology Prediction and Classification
meta
fasta
meta
gff3
line3
md
csv
png
versions
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
meta
input
input_index
fasta
fasta_fai
meta
versions
bigWig
bedgraph
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Calculates scores per genomic region for other deeptools plotting utilities
meta
bigwig
bed
meta
matrix
table
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Computes read coverage for genomic regions (bins) across the entire genome.
meta
bam
bais
labels
meta
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
meta
matrix
method
plot_type
meta
pdf
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Plots cumulative read coverage for each BAM file
meta
bam
bais
meta
pdf
matrix
metrics
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a heatmap
meta
matrix
meta
pdf
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
Generates a principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.
meta
matrix
meta
pdf
tab
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
plots values produced by deeptools_computematrix as a profile plot
meta
matrix
meta
pdf
matrix
versions
A set of user-friendly tools for normalization and visualization of deep-sequencing data
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
meta
input
index
interval
meta2
fasta
meta3
fai
meta4
gzi
meta
vcf
gvcf
version
Call structural variants
meta
input
input_index
vcf
vcf_index
exclude_bed
meta2
fasta
meta3
fai
meta
versions
bcf
csi
Structural variant discovery by integrated paired-end and split-read analysis
Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.
meta
input_raw_gene_bc_matrices_h5
input_hto_csv_file
output_name
generate_gender_plot
genome
generate_diagnostic_plots
meta
zarr
out_zarr
versions
runs a differential expression analysis with DESeq2
meta
contrast_variable
reference
target
meta2
samplesheet
counts
meta3
control_genes_file
meta4
transcript_lengths_file
results
dispersion_plot
rdata
size_factors
normalised_counts
rlog_counts
vst_counts
model
session_info
versions
Differential gene expression analysis based on the negative binomial distribution
Queries a DIAMOND database using blastp mode
meta
fasta
meta2
db
out_ext
blast_columns
meta
blast
xml
txt
daa
sam
tsv
paf
versions
Accelerated BLAST compatible local sequence aligner
Queries a DIAMOND database using blastx mode
meta
fasta
meta2
db
out_ext
blast_columns
meta
blast
xml
txt
daa
sam
tsv
paf
log
versions
Accelerated BLAST compatible local sequence aligner
calculate clusters of highly similar sequences
meta
db
meta
versions
tsv
Accelerated BLAST compatible local sequence aligner
Builds a DIAMOND database
meta
fasta
taxonmap
taxonnodes
taxonnames
meta
db
versions
Accelerated BLAST compatible local sequence aligner
Create DRAGEN hashtable for reference genome
meta
fasta
meta
hashmap
versions
Dragmap is the Dragen mapper/aligner Open Source Software.
Assemble bacterial isolate genomes from Nanopore reads
meta
shortreads
longreads
meta
versions
contigs
log
raw_contigs
txt
gfa
Export assembly segment sequences in GFA 1.0 format to FASTA format
meta
gfa
meta
fasta
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped BED format
meta
bed
meta
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Filter features in gzipped GFF3 format
meta
gff3
meta
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped BED format
meta
bed
meta
bed
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
Split features in gzipped GFF3 format
meta
gff3
meta
gff3
versions
Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.
SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.
meta
aligment_file
aligment_file_index
sv_variants
snp_variants
snp_variants
meta
versions
vcf
Assessment of duplication rates in RNA-Seq datasets
meta
bam
meta2
gtf
meta
scatter2d
boxplot
hist
dupmatrix
intercept_slope
multiqc
session_info
versions
Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.
meta
input
index
fasta
meta2
fai
meta
vcf
tbi
versions
In silico prediction of E. coli serotype
meta
fasta
meta
versions
log
tsv
txt
Fast genome-wide functional annotation through orthology assignment.
meta
fasta
eggnog_db
eggnog_data_dir
eggnog_diamond_db
meta
annotations
orthologs
hits
versions
Convert any PEP project or Nextflow samplesheet to any format
samplesheet
format
pep_input_base_dir
versions
samplesheet_converted
Convert any PEP project or Nextflow samplesheet to any format
Provide the SNP coverage of each individual in an eigenstrat formatted dataset.
meta
geno
snp
ind
meta
versions
tsv
json
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.
meta
bam
run_haplotypecaller
run_bqsr
reference_sequences
filter_regions_bed
reference_elfasta
known_sites
target_regions_bed
intermediate_bqsr_tables
bqsr_tables_only
get_activity_profile
get_assembly_regions
meta
versions
bam
metrics
recall
gvcf
table
activity_profile
assembly_regions
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Merge split bam/sam chunks in one file
meta
bam
meta
versions
bam
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
Split bam file into manageable chunks
meta
bam
meta
versions
bam
elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.
cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.
meta
fasta
meta
consensus
versions
The European Molecular Biology Open Software Suite
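The weighted scoring idea in the cons description can be sketched as follows. This is a simplified illustration, not the EMBOSS implementation: the function name `consensus`, the default identity scoring function, and the tie-breaking rule (alphabetical) are all assumptions made for the example.

```python
from typing import Callable, List, Optional


def consensus(
    alignment: List[str],
    weights: Optional[List[float]] = None,
    score: Optional[Callable[[str, str], float]] = None,
    gap: str = "-",
) -> str:
    """Pick, per column, the residue with the highest weighted score.

    Hypothetical helper: the real cons program applies further rules
    (plurality thresholds, etc.) that are not modelled here.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(alignment)
    if score is None:
        # Identity matrix: 1 for a match, 0 otherwise (assumption).
        score = lambda a, b: 1.0 if a == b else 0.0

    result = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        candidates = sorted({c for c in column if c != gap})
        if not candidates:
            result.append(gap)  # column contains only gaps
            continue
        totals = {
            cand: sum(w * score(cand, c) for w, c in zip(weights, column) if c != gap)
            for cand in candidates
        }
        # max() returns the first maximal candidate, so ties go alphabetically.
        result.append(max(candidates, key=totals.__getitem__))
    return "".join(result)


print(consensus(["ACGT", "ACGA", "TCGA"]))
```

Passing `weights=[1, 1, 5]` for the same three sequences lets the third sequence dominate the first column, which shows how sequence weights shift the result.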
the revseq program from emboss reverse complements a nucleotide sequence
meta
sequences
meta
versions
revseq
The European Molecular Biology Open Software Suite
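For readers unfamiliar with the operation, reverse complementing can be sketched in a few lines. The function name `reverse_complement` is an assumption for illustration, and only the bases A, C, G, T, U and N are handled; revseq itself supports more (for example IUPAC ambiguity codes).

```python
# Translation table for the complement step; U is treated like T (assumption).
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```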
EMM typing of Streptococcus pyogenes assemblies
meta
fasta
meta
versions
tsv
endorS.py calculates endogenous DNA from samtools flagstat files and prints it to screen
meta
stats_raw
stats_qualityfiltered
stats_deduplicated
meta
versions
json
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
meta
assembly
species
cache_version
cache
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.
meta
input
feature_file
meta
versions
output
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
meta
vcf
custom_extra_files
genome
species
cache_version
cache
meta2
fasta
extra_files
vcf
tab
json
report
versions
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Searches a term in a public NCBI database
meta
database
term
meta
versions
result_xml
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using Unique Identifier(s)
meta
database
uid
uids_file
meta
versions
xml
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
Queries an NCBI database using a UID
meta
xml_input
pattern
element
sep
meta
versions
xtract_table
Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.
phylogenetic placement of query sequences in a reference tree
meta
queryaln
referencealn
referencetree
bfastfile
binaryfile
meta
epang
jplace
log
versions
Massively parallel phylogenetic placement of genetic sequences
splits an alignment into reference and query parts
meta
refaln
fullaln
meta
query
reference
versions
Massively parallel phylogenetic placement of genetic sequences
estimation of the unfolded site frequency spectrum
meta
e_config
data
seed
meta
versions
sfs_out
pvalues_out
Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly
meta
fasta
meta
dropset
okayset
versions
EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.
Estimate repeat sizes using NGS data
meta
bam
bai
meta2
fasta
meta3
fasta_fai
meta4
variant_catalog
meta
versions
bam
vcf
json
Merge STR profiles into a multi-sample STR profile
meta
manifest
meta2
fasta
meta3
fasta_fai
meta
versions
merged_profiles
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Compute genome-wide STR profile
meta
alignment_file
alignment_index
meta2
fasta
meta3
fasta_fai
meta
versions
locus_tsv
motif_tsv
str_profile
ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
Run falco on sequenced reads
meta
reads
meta
html
txt
txt
versions
falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.
Aligns sequences using FAMSA
meta
fasta
meta2
tree
compress
meta
alignment
versions
Algorithm for large-scale multiple sequence alignments
Renders a guidetree in famsa
meta
fasta
meta
tree
versions
Algorithm for large-scale multiple sequence alignments
Perform adapter and quality trimming on sequencing reads with reporting
meta
reads
meta
versions
reads
reads_fail
reads_unpaired
stats
debug
statspdf
log
tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.
meta
input
hmm_model
meta
versions
log
txt
hmm
orfs
orfs_amino
contigs
contigs_pept
filtered
filtered_pept
fragments
trimmed
spades
metagenome
tmp
Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.
meta
fasta
meta
success_log
error_log
versions
Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!
Quickly compute statistics over a fasta file in windows.
meta
fasta
meta
versions
freq
mononuc
dinuc
trinuc
tetranuc
A fast K-mer counter for high-fidelity shotgun datasets
meta
reads
meta
versions
hist
ktab
prof
A fast K-mer counter for high-fidelity shotgun datasets
A fast K-mer counter for high-fidelity shotgun datasets
meta
histogram
meta
versions
hist
A fast K-mer counter for high-fidelity shotgun datasets
A tool to merge FastK histograms
meta
fastk_hist
fastk_ktab
fastk_prof
meta
versions
fastk_hist
fastk_ktab
fastk_prof
A fast K-mer counter for high-fidelity shotgun datasets
Distance-based phylogeny with FastME
meta
infile
topo
versions
nwk
stats
matrix
bootstrap
Perform adapter/quality trimming on sequencing reads
meta
reads
adapter_fasta
discard_trimmed_pass
save_trimmed_fail
save_merged
meta
reads
json
html
log
versions
reads_fail
reads_merged
Run FastQC on sequenced reads
meta
reads
meta
html
zip
versions
FASTQ summary statistics in JSON format
meta
reads
meta
versions
json
Build fastq screen config file from bowtie index files
genome_names
indexes
versions
database
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Align reads to multiple reference genomes using fastq-screen
meta
reads
database
fastq_screen
versions
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
meta
fastx
meta
versions
fasta
A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
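Collapsing identical sequences while keeping read counts can be sketched as below. This is an illustration only: the function names are hypothetical, and the `>rank-count` header layout is an assumption about the output format rather than something stated in this listing.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Return unique sequences with their counts, most abundant first."""
    # most_common() keeps first-seen order among sequences with equal counts.
    return Counter(sequences).most_common()


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Render collapsed reads as FASTA, with '>rank-count' headers (assumed layout)."""
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines)


reads = ["ACGT", "TTTT", "ACGT"]
print(to_fasta(collapse(reads)))
```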
Run NCBI's FCS adaptor on assembled genomes
meta
assembly
meta
versions
cleaned_assembly
adaptor_report
log
pipeline_args
skipped_trims
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.
meta
assembly
database
meta
versions
fcs_gx_report
taxonomy_report
The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.
Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.
meta
bam
min_reads
min_baseq
meta
versions
bam
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Calls consensus sequences from reads with the same unique molecular tag.
meta
grouped_bam
min_reads
min_baseq
meta
bam
versions
Tools for working with genomic and high throughput sequencing data.
Collects a suite of metrics to QC duplex sequencing data.
meta
grouped_bam
interval_list
meta
versions
family_sizes
duplex_family_sizes
duplex_yield_metrics
umi_counts
duplex_qc
duplex_umi_counts
A set of tools for working with genomic and high throughput sequencing data, including UMIs
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
Uses the fgbio tools to convert sequenced FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads
reads
meta
version
bam
cram
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
meta
bam
meta2
fasta
min_reads
min_baseq
max_base_error_rate
meta
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.
meta
bam
strategy
meta
versions
bam
histogram
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.
meta
bam
meta
bam
versions
Tools for working with genomic and high throughput sequencing data.
FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM
meta
mapped_bam
meta2
unmapped_bam
meta3
fasta
meta4
dict
meta
bam
versions
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Filtlong filters long reads based on quality measures or short read data.
meta
shortreads
longreads
meta
versions
reads
log
Perform merging of mate paired-end sequencing reads
meta
reads
meta
merged
notcombined
histogram
versions
De novo assembler for single molecule sequencing reads
meta
reads
mode
meta
versions
fasta
gfa
gv
txt
log
json
Efficient compression tool for protein structures
meta
pdb
meta
fcz
versions
Foldcomp: a library and format for compressing and indexing large protein structure sets
Decompression tool for foldcomp compressed structures
meta
fcz
meta
pdb
versions
Foldcomp: a library and format for compressing and indexing large protein structure sets
Create a database from protein structures
meta
pdb
meta
db
versions
Foldseek: fast and accurate protein structure search
Search for protein structural hits against a foldseek database of protein structures
meta
pdb
meta_db
db
meta
aln
versions
Foldseek: fast and accurate protein structure search
fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.
meta
meta
fastq
versions
fq is a library to generate and validate FASTQ file pairs.
fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.
meta
fastq
meta
versions
fastq
fq is a library to generate and validate FASTQ file pairs.
Demultiplex fastq files
meta
sample_sheet
fastq_readstructure_pairs
meta
versions
sample_fastq
metrics
most_frequent_unmatched
A haplotype-based variant detector
meta
input_1
input_1_index
input_2
input_2_index
target_bed
ref_meta
fasta
ref_idx_meta
fasta_fai
samples_meta
samples
populations_meta
populations
cnv_meta
cnv
meta
versions
vcf
Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.
meta
variants
depths
repeats
barcodes
lineages_meta
meta
lineages
summarized
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
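The two-stage resampling described for the bootstrap module can be sketched as follows. This is a minimal illustration under stated assumptions, not Freyja's code: the input layout (a dict mapping each site to its per-base read counts), the function name `bootstrap_sites` and the use of Python's `random.Random.choices` are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from typing import Dict, Hashable

BaseCounts = Dict[str, int]


def bootstrap_sites(
    site_base_counts: Dict[Hashable, BaseCounts], seed: int = 0
) -> Dict[Hashable, BaseCounts]:
    """Resample reads across sites, then resample bases within each site."""
    rng = random.Random(seed)
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total = sum(depths)
    if total == 0:
        return {s: {b: 0 for b in site_base_counts[s]} for s in sites}

    # Stage 1: multinomial over sites, with probabilities given by each
    # site's share of the total read depth.
    new_depth = Counter(rng.choices(sites, weights=depths, k=total))

    resampled = {}
    for site in sites:
        bases = list(site_base_counts[site])
        freqs = [site_base_counts[site][b] for b in bases]
        n = new_depth.get(site, 0)
        # Stage 2: multinomial over bases at this site, with probabilities
        # given by the observed base frequencies and n trials.
        picks = Counter(rng.choices(bases, weights=freqs, k=n)) if n else Counter()
        resampled[site] = {b: picks.get(b, 0) for b in bases}
    return resampled


observed = {101: {"A": 90, "G": 10}, 202: {"C": 50}, 303: {"T": 30, "C": 20}}
replicate = bootstrap_sites(observed, seed=42)
print(sum(sum(counts.values()) for counts in replicate.values()))
```

Each replicate keeps the total number of reads fixed while letting per-site depths and per-base counts vary. Repeating the call with different seeds gives the bootstrap distribution.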
specify the relative abundance of each known haplotype
meta
variants
depths
barcodes
lineages_meta
meta
demix
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
downloads new versions of the curated SARS-CoV-2 lineage file and barcodes
db_name
barcodes
lineages_topology
lineages_meta
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
calls variants and reports sequencing depth information for each variant
meta
bam
fasta
meta
variants
depths
versions
Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
Cluster genome FASTA files by average nucleotide identity
meta
bins
qc_table
qc_format
meta
tsv
dereplicated_bins
versions
Gene Allele Mutation Microbial Assessment
meta
fasta
db
meta
versions
gamma
psl
gff
fasta
Tool for Gene Allele Mutation Microbial Assessment
Build ganon database using custom reference sequences.
meta
input
taxonomy_files
genome_size_files
meta
versions
db
info
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Classify FASTQ files against ganon database
meta
fastqs
db
meta
versions
tre
report
one
all
unc
log
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a ganon report file from the output of ganon classify
meta
rep
db
meta
versions
tre
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
Generate a multi-sample report file from the output of ganon report runs
meta
tre
meta
versions
txt
ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently
assigns taxonomy to query sequences in phylogenetic placement output
meta
jplace
meta
examineassign
profile
labelled_tree
per_query
versions
Genesis Applications for Phylogenetic Placement Analysis
grafts query sequences from phylogenetic placement on the reference tree
meta
jplace
meta
versions
newick
Genesis Applications for Phylogenetic Placement Analysis
colours a phylogeny with placement densities
meta
jplace
meta
versions
newick
nexus
phyloxml
svg
colours
log
Genesis Applications for Phylogenetic Placement Analysis
Performs local realignment around indels to correct for mapping errors
meta
bam
bai
intervals
meta2
fasta
meta3
fai
meta4
dict
meta5
known_vcf
meta
versions
bam
bai
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Generates a list of locations that should be considered for local realignment prior to genotyping.
meta
bam
bai
meta2
fasta
meta3
fai
meta4
dict
meta5
known_vcf
meta
versions
intervals
The full Genome Analysis Toolkit (GATK) framework, license restricted.
SNP and Indel variant caller on a per-locus basis
meta
bam
bai
meta2
fasta
meta3
fai
meta4
dict
meta5
intervals
meta6
contamination
meta7
dbsnp
meta8
comp
meta
versions
vcf
The full Genome Analysis Toolkit (GATK) framework, license restricted.
Assigns all the reads in a file to a single new read-group
meta
meta2
meta3
bam
fasta
fasta_index
meta
versions
bam
bai
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Annotates intervals with GC content, mappability, and segmental-duplication content
meta
intervals
meta2
fasta
meta3
fasta_fai
meta4
dict
meta5
mappable_regions
meta6
mappable_regions_tbi
meta7
segmental_duplication_regions
meta8
segmental_duplication_regions_tbi
meta
versions
annotated_intervals
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
meta
vcf
vcf_tbi
recal
recal_index
tranches
fasta
fai
dict
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data
meta
input
input_index
vcf
tbi
meta2
fasta
meta3
fai
meta4
dict
intervals
meta
versions
csv
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an interval list from a bed file and a reference dict
meta
bed
meta2
dict
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
meta
pileup
matched
contamination
segmentation
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
estimates the parameters for the DRAGstr model
meta
bam
bam_index
intervals
fasta
fasta_fai
dict
strtablefile
meta
versions
dragstr_model
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply a Convolutional Neural Net to filter annotated variants
meta
vcf
tbi
aligned_input
intervals
fasta
fai
dict
architecture
weights
meta
versions
vcf
tbi
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
meta
meta2
meta3
meta4
input
input_index
intervals
fasta
fai
dict
meta
versions
hdf5
tsv
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
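The counting rule stated above (a read contributes to an interval when its start position lies inside it) can be sketched as follows. The data layout (1-based inclusive `(contig, start, end)` intervals and `(contig, start)` read tuples) and the function name are assumptions for illustration. The GATK tool itself works on BAM/CRAM input and applies read filters that are not modelled here.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive (assumed)
ReadStart = Tuple[str, int]      # (contig, start position)


def count_read_starts(intervals: List[Interval], reads: List[ReadStart]) -> List[int]:
    """Count, for each interval, the reads whose start falls inside it."""
    counts = [0] * len(intervals)
    for contig, pos in reads:
        for i, (ivl_contig, start, end) in enumerate(intervals):
            if contig == ivl_contig and start <= pos <= end:
                counts[i] += 1
    return counts


intervals = [("chr1", 1, 100), ("chr1", 101, 200), ("chr2", 1, 50)]
reads = [("chr1", 5), ("chr1", 100), ("chr1", 101), ("chr2", 60), ("chr2", 1)]
print(count_read_starts(intervals, reads))
```

The nested loop keeps the sketch short. For real data volumes, sorted intervals with a binary search or an interval tree would be the more usual choice.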
Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
meta
input
input_index
site_depth_vcf
site_depth_vcf_index
fasta
fasta_fai
dict
meta
versions
split_read_evidence
split_read_evidence_index
paired_end_evidence
paired_end_evidence_index
site_depths
site_depths_index
Genome Analysis Toolkit (GATK4)
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
meta
vcf
vcf_idx
fasta
fai
dict
combined_gvcf
versions
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.
fasta
fasta_fai
dict
versions
str_table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges adjacent DepthEvidence records
meta
depth_evidence
depth_evidence_index
fasta
fasta_fai
dict
meta
versions
condensed_evidence
condensed_evidence_index
Genome Analysis Toolkit (GATK4)
Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.
meta
counts
meta
versions
pon
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates a sequence dictionary for a reference sequence
meta
fasta
dict
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a panel of normals containing germline and artifactual sites for use with mutect2.
meta
genoomicsdb
meta2
fasta
meta3
fai
meta4
dict
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Denoises read counts to produce denoised copy ratios
meta
meta2
counts
pon
meta
versions
standardized
denoised
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Determines the baseline contig ploidy for germline samples given counts data
meta
meta2
counts
bed
exclude_beds
contig_ploidy_table
ploidy_model
meta
versions
calls
model
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Estimates the numbers of unique molecules in a sequencing library.
meta
input
fasta
fai
dict
meta
versions
metrics
Genome Analysis Toolkit (GATK4)
Converts FastQ file to SAM/BAM format
meta
reads
meta
bam
versions
Genome Analysis Toolkit (GATK4) Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters intervals based on annotations and/or count statistics.
meta
intervals
meta2
read_counts
meta3
annotated_intervals
meta
versions
interval_list
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
meta
vcf
vcf_tbi
stats
orientationbias
segmentation
table
estimate
meta2
fasta
meta3
fai
meta4
dict
vcf
tbi
stats
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply tranche filtering
meta
vcf
tbi
resources
resources_index
fasta
fai
dict
meta
versions
vcf
tbi
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gathers scattered BQSR recalibration reports into a single file
meta
table
meta
table
versions
Genome Analysis Toolkit (GATK4)
Gathers scattered pileup summary tables into a single table
meta
pileup
meta
table
versions
Genome Analysis Toolkit (GATK4)
Merges GVCFs from multiple samples, for use in joint genotyping or somatic panel of normals creation.
meta
vcf
tbi
wspace
interval_file
interval_value
run_intlist
run_updatewspace
input_map
genomicsdb
updatedb
intervallist
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.
meta
gvcf
gvcf_index
intervals
intervals_index
fasta
fai
dict
dbsnp
dbsnp_tbi
meta
vcf
tbi
versions
Genome Analysis Toolkit (GATK4)
Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.
meta
tsv
intervals
model
ploidy
meta
versions
cohortcalls
cohortmodel
casecalls
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
meta
input
input_index
intervals
meta2
fasta
meta3
fai
meta4
dict
variants
variants_tbi
pileup
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call germline SNPs and indels via local re-assembly of haplotypes
meta
input
input_index
intervals
dragstr_model
meta2
fasta
meta3
fai
meta4
dict
meta5
dbsnp
meta6
dbsnp_tbi
meta
versions
vcf
tbi
bam
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Creates an index for a feature file, e.g. VCF or BED file.
meta
feature_file
meta
index
versions
Genome Analysis Toolkit (GATK4)
Converts a Picard IntervalList file to a BED file.
meta
interval
meta
bed
versions
Genome Analysis Toolkit (GATK4)
Splits the interval list file into unique, equally sized interval files and places them under a directory
meta
interval_list
meta
versions
interval_list
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Uses f1r2 counts collected during Mutect2 to learn the prior probability of read orientation artifacts
meta
f1r2
artifactprior
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Left align and trim variants using GATK4 LeftAlignAndTrimVariants.
meta
vcf
tbi
intervals
fasta
fai
dict
meta
versions
vcf
tbi
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fasta_fai
meta
versions
bam
cram
bai
crai
metrics
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merge unmapped with mapped BAM files
meta
aligned
unaligned
meta2
fasta
meta3
dict
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Merges mutect2 stats generated on different intervals/regions
meta
stats
meta
versions
stats
Genome Analysis Toolkit (GATK4)
Merges several vcf files
meta
vcf
meta2
dict
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Call somatic SNVs and indels via local assembly of haplotypes.
meta
input
input_index
intervals
meta2
fasta
meta3
fai
meta4
dict
germline_resource
germline_resource_tbi
panel_of_normals
panel_of_normals_tbi
vcf
tbi
stats
f1r2
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
meta
ploidy
calls
model
meta
versions
denoised
segments
intervals
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Prepares bins for coverage collection.
meta
fasta
meta2
fai
meta3
dict
meta4
intervals
meta5
exclude_intervals
meta
versions
interval_list
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Print reads in the SAM/BAM/CRAM file
meta
input
index
meta2
fasta
meta3
fai
meta4
dict
meta
versions
bam
cram
sam
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.
meta
evidence_files
evidence_indices
bed
fasta
fasta_fai
dict
meta
versions
printed_evidence
printed_evidence
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Condenses homRef blocks in a single-sample GVCF
meta
gvcf
tbi
intervals
fasta
fai
dict
dbsnp
dbsnp_tbi
meta
versions
gvcf
tbi
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Reverts SAM or BAM files to a previous state.
meta
bam
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Converts BAM/SAM file to FastQ format
meta
bam
fastq
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Select a subset of variants from a VCF file
meta
vcf
vcf_idx
intervals
meta
vcf
vcf_tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Create a fasta with the bases shifted by offset
meta
fasta
meta2
fasta_fai
meta3
dict
meta
versions
dict
intervals
shift_back_chain
shift_fa
shift_intervals
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence
meta
site_depths
site_depths_indices
vcf
tbi
fasta
fasta_fai
dict
meta
versions
baf
baf_tbi
Genome Analysis Toolkit (GATK4)
Splits CRAM files efficiently by taking advantage of their container based structure
meta
cram
meta
versions
split_crams
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Split intervals into sub-interval files.
meta
interval
meta2
fasta
meta3
fai
meta4
dict
meta
bed
versions
Genome Analysis Toolkit (GATK4)
Splits reads that contain Ns in their cigar string
meta
bam
bai
intervals
meta2
fasta
meta3
fai
meta4
dict
bam
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.
meta
vcf
tbi
bed
fasta
fasta_fai
dict
meta
versions
annotated_vcf
index
Genome Analysis Toolkit (GATK4)
Clusters structural variants based on coordinates, event type, and supporting algorithms
meta
vcfs
indices
ploidy_table
fasta
fasta_fai
dict
meta
versions
clustered_vcf
clustered_vcf_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Filter variants
meta
vcf
vcf_tbi
meta2
fasta
meta3
fai
meta4
dict
vcf
tbi
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Build a recalibration model to score variant quality for filtering purposes. Following GATK best practices is highly recommended when using this module: the Gaussian mixture model needs a large number of samples to produce optimal results, for example 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
meta
vcf
tbi
resource_vcf
resource_tbi
labels
fasta
fai
dict
recal
idx
tranches
plots
version
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Apply base quality score recalibration (BQSR) to a bam file
meta
input
input_index
bqsr_table
intervals
fasta
fai
dict
meta
versions
bam
cram
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Generate recalibration table for Base Quality Score Recalibration (BQSR)
meta
input
input_index
intervals
fasta
fai
dict
known_sites
known_sites_tbi
meta
versions
table
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
meta
bam
fasta
fai
dict
meta
versions
output
bam_index
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.
meta
input
program_file
meta
versions
output
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
meta
input
hmm
model_dir
meta
versions
genes
features
clusters
gbk
json
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
Convert a mappability file to bedgraph format
meta
map
meta2
index
meta
versions
bedgraph
sizes
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
meta
fasta
meta
versions
gem
log
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Define the mappability of a reference
meta
index
read_length
meta
versions
map
GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.
Create a GEM index from a FASTA file
meta
fasta
meta
versions
gem
info
The GEM indexer (v3).
Performs fastq alignment to a fasta reference using gem3-mapper
meta
meta2
fastq
gem
sort_bam
meta
versions
bam
The GEM indexer (v3).
A derivative of GenomeScope2.0 modified to work with FastK
meta
fastk_histex_histogram
meta
versions
linear_plot
transformed_linear_plot
log_plot
transformed_log_plot
model
summary
kmer_cov
create index file for genmap
meta
fasta
meta
versions
index
Ultra-fast computation of genome mappability.
create mappability files for a genome
meta
index
meta2
regions
meta
versions
wig
bedgraph
txt
csv
Ultra-fast computation of genome mappability.
Annotate variants with regions, frequencies and CADD scores
meta
input_vcf
meta
versions
vcf
Annotate genetic inheritance models in variant files
Score compounds
meta
input_vcf
meta
versions
vcf
Annotate genetic inheritance models in variant files
annotate models of inheritance
meta
input_vcf
reduced_penetrance
family_file
meta
versions
vcf
Annotate genetic inheritance models in variant files
Score the variants of a vcf based on their annotation
meta
input_vcf
family_file
score_config
meta
versions
vcf
Annotate genetic inheritance models in variant files
Download geNomad databases and related files
NO input
versions
genomad_db
Identification of mobile genetic elements
Identify mobile genetic elements present in genomic assemblies
meta
fasta
genomad_db
score_calibration
meta
aggregated_classification
taxonomy
provirus
compositions
calibrated_classification
plasmid_fasta
plasmid_genes
plasmid_proteins
plasmid_summary
virus_fasta
virus_genes
virus_proteins
virus_summary
versions
Identification of mobile genetic elements
Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach
meta
histogram
meta
versions
linear_plot_png
linear_plot_png
transformed_linear_plot_png
log_plot_png
transformed_log_plot_png
model
summary
lookup_table
fitted_histogram_png
Genotype Salmonella Typhi from Mykrobe results
meta
json
meta
versions
tsv
Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)
Peak-calling for ChIP-seq and ATAC-seq enrichment experiments
meta
treatment_bam
control_bam
blacklist_bed
save_pvalues
save_pileup
save_bed
save_duplicates
meta
peaks
bedgraph_pvalues
bedgraph_pileup
bed_intervals
duplicates
version
geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA
geo_accession
versions
geo_accession
samples
Retrieves GEO data from the Gene Expression Omnibus (GEO)
meta
querygse
rds
expression
annotation
versions
Get data from NCBI Gene Expression Omnibus (GEO)
Downloads databases needed for running getorganelle
organelle_type
versions
organelle_type
db
Get organelle genomes from genome skimming data
Assembles organelle genomes from genomic data
meta
fastq
organelle_type
db
meta
versions
fasta
etc
Get organelle genomes from genome skimming data
Collapse walk-preserving shared affixes in variation graphs in GFA format
meta
gfa
meta
gfa
affixes
versions
A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
meta
assembly
out_fmt
genome_size
target
agpfile
include_bed
exclude_bed
instructions
meta
versions
assembly_summary
assembly
Converts GFA or rGFA files to FASTA
meta
gfa
meta
versions
fasta
Tools for manipulating sequence graphs in the GFA and rGFA formats
Summary statistics for GFA files
meta
gfa
meta
versions
stats
Tools for manipulating sequence graphs in the GFA and rGFA formats
Compare, merge, annotate and estimate accuracy of generated gtf files
meta
gtfs
fasta
fai
reference_gtf
meta
annotated_gtf
combined_gtf
tmap
refmap
loci
stats
tracking
versions
Validate, filter, convert and perform various other operations on GFF files
meta
gff
fasta
meta
gtf
gffread_gff
gffread_fasta
versions
gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
meta
files
meta
versions
output
file
gget enables efficient querying of genomic databases
Defines chunks where to run imputation
meta
input
region
meta
versions
txt
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.
meta
region
freq
truth
estimate
min_prob
min_dp
bins
meta
versions
errors_cal
errors_grp
errors_spl
rsquared_grp
rsquared_spl
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
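The concordance step above reports the squared correlation between imputed dosages and high-coverage genotype calls, grouped by minor allele frequency (MAF) bin. The following is a minimal sketch of that idea only, not GLIMPSE's implementation: the function names, the `(maf, dosage, truth)` record layout, and the half-open bin convention are all assumptions made for illustration.

```python
from collections import defaultdict


def pearson_r2(xs, ys):
    """Squared Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # no variance in one of the series
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(records, bin_edges):
    """Group (maf, imputed_dosage, true_genotype) records into MAF bins.

    Bins are half-open intervals [lo, hi) built from consecutive edges
    (a hypothetical convention); records outside every bin are ignored.
    """
    grouped = defaultdict(lambda: ([], []))
    for maf, dosage, truth in records:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= maf < hi:
                grouped[(lo, hi)][0].append(dosage)
                grouped[(lo, hi)][1].append(truth)
                break
    return {b: pearson_r2(d, t) for b, (d, t) in grouped.items()}
```

A bin whose dosages or truth genotypes are all identical yields `None` here, because the correlation is undefined without variance.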
Concatenates imputation chunks in a single VCF/BCF file ligating phased information.
meta
input_list
input_index
meta
versions
merged_variants
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Main GLIMPSE algorithm; performs phasing and imputation, refining genotype likelihoods
meta
input
input_index
samples_file
input_region
output_region
reference
reference_index
map
meta
versions
phased_variants
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Generates haplotype calls by sampling haplotype estimates
meta
input
meta
versions
haplo_sampled
GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
Defines chunks where to run imputation
meta
input
input_index
region
meta2
map
model
meta
versions
chunk_chr
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Program to compute the genotyping error rate at the sample or marker level.
meta
region
freq
truth
estimate
samples
groups
bins
ac_bins
allele_counts
min_val_gl
min_val_dp
meta
versions
errors_cal
errors_grp
errors_spl
rsquare_grp
rsquare_spl
rsquare_per_site
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Ligation of multiple phased BCF/VCF files into a single whole-chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files while maintaining the phasing.
meta
input_list
input_index
meta
versions
merged_variants
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool for imputation and phasing from vcf file or directly from bam files.
meta
input
input_index
samples_file
input_region
output_region
meta2
reference
reference_index
map
fasta_reference
fasta_reference_index
meta
versions
phased_variants
stats_coverage
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Tool to create a binary reference panel for quick reading time.
meta
reference
reference_index
input_region
output_region
meta2
map
meta
versions
bin_ref
GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.
Writes a sorted concatenation of file/s
meta
input
meta
sorted
versions
Writes a sorted concatenation of file/s
Split a file into consecutive or interleaved sections
meta
input
meta
split
versions
The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
Query metadata for any taxon across the tree of life.
meta
taxon
taxa_file
meta
versions
taxonsearch
goat-cli is a command line interface to query the Genomes on a Tree Open API.
Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
meta
bams
indexes
fai
meta
output
bams
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
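The 16KB resolution mentioned above comes from the fixed window size of the BAM linear index. As a rough sketch, assuming windows of 16,384 bases (2**14) as in the SAM/BAM specification, the number of coverage points available per contig can be estimated as follows. The function names are hypothetical and this is not goleft's own code.

```python
import math

# Assumption: the BAM linear index uses 16,384-base (2**14) windows.
TILE_SIZE = 16_384


def n_tiles(contig_length: int, tile_size: int = TILE_SIZE) -> int:
    """Number of index windows needed to cover a contig of the given length."""
    return math.ceil(contig_length / tile_size)


def tile_of(position: int, tile_size: int = TILE_SIZE) -> int:
    """0-based window index that contains a 0-based position."""
    return position // tile_size
```

Any coverage estimate derived from the index alone cannot resolve features smaller than one window, which is why the estimate is fast but coarse.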
runs a functional enrichment analysis with gprofiler2
meta
de_file
contrast_variable
reference
target
background_file
gmt_file
meta
all_enrich
rds
plot_png
plot_html
sub_enrich
sub_plot
filtered_gmt
session_info
versions
An R interface corresponding to the 2019 update of g:Profiler web tool.
Checks if the input file is bgzip compressed or not
input
versions
compress_bgzip
a wee tool for random access into BGZF files.
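For context, a BGZF file is a series of gzip blocks whose header carries an extra subfield identified by the bytes `B` and `C`, as described in the SAM/BAM specification. The sketch below checks the first block header for that subfield. It illustrates the format only: how the tool itself performs the check is not stated here, and the function name is hypothetical.

```python
import struct


def is_bgzf(header: bytes) -> bool:
    """Return True if the bytes start with a BGZF block header.

    Checks the gzip magic, the deflate method, the FEXTRA flag, and then
    scans the extra field for the 'BC' subfield with a 2-byte payload.
    """
    if len(header) < 12:
        return False
    if header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == 66 and si2 == 67 and slen == 2:  # 'B', 'C'
            return True
        pos += 4 + slen
    return False
```

To test a file on disk, read its first few kilobytes in binary mode and pass them to `is_bgzf`. A file compressed with plain gzip normally lacks the extra field, so the function returns `False` for it.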
A versatile pairwise aligner for genomic and spliced nucleotide sequences
fasta
gmidx
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Tools for population-scale genotyping using pangenome graphs.
meta
bam
bai
ref
ref_fai
region_file
meta
versions
vcf
tbi
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
Tools for population-scale genotyping using pangenome graphs.
meta
vcf
meta
versions
vcf
tbi
A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
meta
inputs
fasta
fasta_fai
bwa_index
meta
versions
vcf
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
meta
meta2
meta3
meta4
vcf
bedpe
bed
fasta
fasta_fai
bwa_index
meta
versions
bedpe
bed
GRIDSS: the Genomic Rearrangement IDentification Software Suite
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.
meta
meta2
vcf
pondir
meta
versions
high_conf_sv
all_sv
GRIDSS: the Genomic Rearrangement IDentification Software Suite
Runs the Broad Gene Set Enrichment tool in GSEA mode
meta
gct
cls
gene_sets
reference
target
chip
meta
rpt
index_html
heat_map_corr_plot
report_tsvs_ref
report_htmls_ref
report_tsvs_target
report_htmls_target
ranked_gene_list
gene_set_sizes
butterfly_plot
histogram
heatmap
pvalues_vs_nes_plot
ranked_list_corr
gene_set_tsv
gene_set_html
gene_set_heatmap
snapshot
gene_set_enplot
gene_set_dist
archive
versions
Gene Set Enrichment Analysis (GSEA)
Collapse redundant transcript models in Iso-Seq data.
meta
bam
fasta
meta
versions
bed
bed_trans_reads
local_density_error
polya
read
strand_check
trans_report
varcov
variants
Collapse similar gene models
Merge multiple transcriptomes while maintaining source information.
meta
bed
meta
bed
gene_report
merge
trans_report
versions
Gene-Switch Transcriptome Annotation by Modular Algorithms
Helper script that removes remaining polyA sequences from Full Length Non Chimeric reads (PacBio isoseq3)
meta
fasta
meta
versions
fasta
report
tails
Gene-Switch Transcriptome Annotation by Modular Algorithms
GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files
meta
gff3
meta
gt_gff3
error_log
versions
The GenomeTools genome analysis system
GenomeTools gt-gff3validator utility to strictly validate a GFF3 file
meta
gff3
meta
success_log
error_log
versions
The GenomeTools genome analysis system
Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility
meta
index
meta
tabout
gff3
fasta
inner_fasta
versions
The GenomeTools genome analysis system
GenomeTools gt-stat utility to show statistics about features contained in GFF3 files
meta
gff3
meta
stats
versions
The GenomeTools genome analysis system
Computes enhanced suffix array using GenomeTools gt-suffixerator utility
meta
fasta
mode
meta
index
versions
The GenomeTools genome analysis system
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
meta
bins
database
mash_db
meta
versions
summary
tree
markers
msa
user_msa
filtered
log
warnings
failed
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
alignment
versions
fasta
embl_predicted
gff
embl_branch
vcf
stats
phylip
tree
tree_labelled
Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes
db_name
versions
db
Python package for detection of chimerism and contamination in prokaryotic genomes.
Merging of CheckM and GUNC results in one summary table
meta
gunc_file
checkm_file
meta
versions
tsv
Python package for detection of chimerism and contamination in prokaryotic genomes.
Detection of Chimerism and Contamination in Prokaryotic Genomes
meta
fasta
db
meta
versions
maxcss_levels_tsv
all_levels_tsv
Python package for detection of chimerism and contamination in prokaryotic genomes.
Compresses and decompresses files.
meta
archive
gunzip
versions
Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.
meta
gvcf
meta
versions
vcf
gvcftools is a package of small utilities for creating and analyzing gVCF files
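To make the idea of dropping non-variant blocks concrete, here is a minimal sketch that keeps header lines and any record with at least one real alternate allele. It assumes that a non-variant record is one whose ALT column holds only `.` or a symbolic placeholder such as `<NON_REF>` or `<*>`. That rule and the function names are illustrative assumptions, not a description of how gvcftools decides.

```python
# Assumption: these ALT values mark a record as carrying no real variant.
NON_VARIANT_ALTS = {".", "<NON_REF>", "<*>"}


def is_variant_record(line: str) -> bool:
    """True if a tab-separated VCF record has at least one real ALT allele."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # ALT is the fifth VCF column
    return any(alt not in NON_VARIANT_ALTS for alt in alts)


def extract_variants(lines):
    """Yield header lines and variant records, skipping non-variant blocks."""
    for line in lines:
        if line.startswith("#") or is_variant_record(line):
            yield line
```

Because `extract_variants` is a generator, it can be fed an open file handle and streamed to an output file without loading the whole gVCF into memory.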
Tool to convert and summarize ABRicate outputs using the hAMRonization specification
meta
report
format
software_version
reference_db_version
meta
versions
json
tsv
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.
meta
report
format
software_version
reference_db_version
meta
versions
json
tsv
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize DeepARG outputs using the hAMRonization specification
meta
report
format
software_version
reference_db_version
meta
versions
json
tsv
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize fARGene outputs using the hAMRonization specification
meta
report
format
software_version
reference_db_version
meta
versions
json
tsv
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to convert and summarize RGI outputs using the hAMRonization specification.
meta
report
format
software_version
reference_db_version
meta
versions
json
tsv
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Tool to summarize and combine all hAMRonization reports into a single file
reports
format
versions
json
tsv
html
Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification
Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports rCRS and RSRS mitochondrial versions.
meta
vcf
meta
versions
txt
html
Classification into haplogroups
meta
inputfile
format
meta
versions
txt
A tool for mtDNA haplogroup classification.
Somatic VCF feature extraction tool from hap.py.
meta
meta2
meta3
vcf
regions_bed
targets_bed
bam
fasta
fasta_fai
meta
features
versions
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.
meta
query_vcf
truth_vcf
regions_bed
targets_bed
fasta
fasta_fai
false_positives_bed
stratification_tsv
stratification_beds
meta
summary_csv
roc_all_csv
roc_indel_locations_csv
roc_indel_locations_pass_csv
roc_snp_locations_csv
roc_snp_locations_pass_csv
extended_csv
json
runinfo
vcf
tbi
versions
Haplotype VCF comparison tools
Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py
meta
meta2
meta3
vcf
bed
fasta
fasta_fai
meta
vcf
versions
Haplotype VCF comparison tools
Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variants.
meta
query_vcf
truth_vcf
regions_bed
targets_bed
fasta
fasta_fai
false_positives_bed
stratification_tsv
bams
meta
features
metrics
stats
versions
Haplotype VCF comparison tools: somatic variant comparison
Identify cap locus serotype and structure in your Haemophilus influenzae assemblies
meta
fasta
database_dir
model_fp
meta
versions
gbk
svg
tsv
Computes PCA eigenvectors for a Hi-C matrix.
meta
matrix
meta
versions
results
pca1
pca2
Set of programs to process, analyze and visualize Hi-C and capture Hi-C data
Whole-genome assembly using PacBio HiFi reads
meta
reads
paternal_kmer_dump
maternal_kmer_dump
use_parental_kmers
hic_read1
hic_read2
meta
versions
raw_unitigs
processed_unitigs
primary_contigs
alternate_contigs
paternal_contigs
maternal_contigs
corrected_reads
source_overlaps
reverse_overlaps
log
Align RNA-Seq reads to a reference with HISAT2
meta
reads
meta2
index
meta3
splicesites
meta
bam
summary
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Builds HISAT2 index for reference genome
meta
fasta
meta2
gtf
meta3
splicesites
meta
index
versions
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Extracts splice sites from a GTF file
meta
gtf
meta
versions
splicesites
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
Pre-compute the graph index structure.
graph
versions
folder
HLA typing from short and long reads
Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.
meta
bam
graph
meta
versions
folder
HLA typing from short and long reads
gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference
meta
fasta
meta
versions
wig
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
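The windowing idea behind gcCounter can be sketched as follows. This is an illustrative sketch only, not the HMMcopy implementation: the function name `gc_content_windows` is hypothetical, the sequence is passed as a plain string rather than read from a FASTA file, and the result is a Python list rather than a wig file.

```python
def gc_content_windows(sequence, window_size):
    """Return the GC fraction of each non-overlapping window of a sequence.

    Hypothetical helper for illustration; the last window may be shorter
    than window_size if the sequence length is not an exact multiple.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")

    seq = sequence.upper()
    fractions = []
    # Step through the sequence one full window at a time (no overlap).
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        gc_count = window.count("G") + window.count("C")
        fractions.append(gc_count / len(window))
    return fractions


if __name__ == "__main__":
    # Two windows of four bases each.
    print(gc_content_windows("GGCCAATT", 4))
```

Non-overlapping windows keep each base in exactly one bin, which is what allows the GC track to be lined up against read counts computed over the same windows.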
Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised at all.
meta
fasta
meta
versions
bigwig
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file
meta
bigwig
meta
versions
wig
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
readCounter function from HMMcopy utilities, used to generate read counts in windows
meta
bam
meta
versions
wig
C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
Mask multiple sequence alignments
meta
unmaskedaln
fmask_rf
fmask_all
gmask_rf
gmask_all
pmask_rf
pmask_all
maskfile
meta
versions
maskedaln
fmask_rf
fmask_all
gmask_rf
gmask_all
pmask_rf
pmask_all
Biosequence analysis using profile hidden Markov models
Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tool's help for possible values.
meta
seqfile
meta
versions
seqreformated
Biosequence analysis using profile hidden Markov models
hmmalign from the HMMER suite aligns a number of sequences to an HMM profile
meta
fasta
hmm
meta
versions
sthlm
Biosequence analysis using profile hidden Markov models
Create an HMM profile from a multiple sequence alignment
meta
alignment
mxfile
versions
hmm
Biosequence analysis using profile hidden Markov models
Extract an HMM from an HMM database file or create an index for an HMM database
meta
hmm
key
keyfile
meta
versions
hmm
index
Biosequence analysis using profile hidden Markov models
search profile(s) against a sequence database
meta
hmmfile
seqdb
write_align
write_target
write_domain
meta
versions
output
alignments
target_summary
domain_summary
Biosequence analysis using profile hidden Markov models
Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline
meta
vcf
meta
versions
vcf
Human mitochondrial variants annotation using HmtVar.
Annotate peaks with HOMER suite
meta
peaks
fasta
gtf
meta
annotated_peaks
annotation_stats
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Find peaks with HOMER suite
meta
tagDir
meta
peaks
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Create a tag directory with the HOMER suite
meta
bam
fasta
meta
tagdir
taginfo
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Differential gene expression analysis based on the negative binomial distribution
Empirical Analysis of Digital Gene Expression Data in R
Create a UCSC bed graph with the HOMER suite
meta
tagDir
meta
bedGraph
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Converting from HOMER peak to BED file format
meta
tagDir
meta
bed
versions
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
Serotype prediction of Haemophilus parasuis assemblies
meta
fasta
meta
versions
tsv
count how many reads map to each feature
meta
meta2
input
index
gtf
meta
txt
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
This tool takes a background VCF, such as gnomAD, that has full genome (though in some cases, users will instead want whole exome) coverage and uses that as an expectation of variants.
meta
vcf
tbi
meta2
background_vcf
background_tbi
meta
versions
tsv
useful command-line tools written to show-case hts-nim
HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.
meta
reads
meta2
umis
meta
dedup
annotated
stats
log
versions
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals
wigs
gc_wig
map_wig
centromere
versions
rds
txt
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA
meta
wig
gc_wig
map_wig
panel_of_normals
centromere
meta
versions
cna_seg
ichorcna_params
genome_plot
Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
Plot a metagene of cross-link events/sites around various transcriptomic landmarks.
meta
bed
segmentation
meta
tsv
versions
Runs iCount peaks on a BED file of crosslinks
meta
bed
sigxls
meta
peaks
versions
Computational pipeline for analysis of iCLIP data
Formats a GTF file for use with iCount sigxls
meta
gtf
fai
gtf
versions
Computational pipeline for analysis of iCLIP data
Runs iCount sigxls on a BED file of crosslinks
meta
bed
segmentation
meta
peaks
scores
versions
Computational pipeline for analysis of iCLIP data
Report proportion of cross-link events/sites on each region type.
meta
bed
segmentation
meta
summary_type
summary_subtype
summary_gene
versions
Computational pipeline for analysis of iCLIP data
Demultiplex paired-end FASTQ files from QuantSeq-Pool
meta
reads
samplesheet
versions
fastq
undetermined
stats
igv.js is an embeddable interactive genome visualization component
meta
alignment
index
meta
browser
align_files
index_files
versions
Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser HTML file. To visualise it, the files have to be served. Check the documentation at: https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements
A Python application to generate self-contained HTML reports for variant review and other genomic applications
meta
sites
tracks
tracks_indices
meta2
fasta
fai
versions
report
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
meta
h5
meta2
ilp
meta3
probs
meta
versions
out_tiff
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.
meta
input_img
meta2
ilp
meta
versions
output
Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.
inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification
meta
bam
genome_fasta
genes_fasta
stb_file
meta
versions
profile
Calculation of strain-level metrics
Produces protein annotations and predictions from an amino acids FASTA file
meta
fasta
interproscan_database
tsv
xml
gff3
json
versions
Download, extract, and check md5 of iPHoP databases
NO input
iphop_db
versions
Predict host genus from genomes of uncultivated phages.
Predict phage host using iPHoP
meta
fasta
iphop_db
meta
versions
iphop_genus
iphop_genome
iphop_detailed_output
Predict host genus from genomes of uncultivated phages.
Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.
meta
alignment
tree
tree_te
lmclust
mdef
partitions_equal
partitions_proportional
partitions_unlinked
guide_tree
sitefreq_in
constraint_tree
trees_z
suptree
trees_rf
phylogeny
report
mldist
lmap_svg
lmap_eps
lmap_quartetlh
sitefreq_out
bootstrap
state
contree
nex
splits
suptree
alninfo
partlh
siteprob
sitelh
treels
rate
mlrate
exch_matrix
log
versions
Genomic island prediction in bacterial and archaeal genomes
meta
genome
gff
log
versions
Identify insertion sites positions in bacterial genomes
meta
reads
reference
query
meta
versions
results
IsoSeq - Cluster - Cluster trimmed consensus sequences
meta
bam
meta
versions
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
IsoSeq - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
meta
bam
primers
meta
bam
pbi
consensusreadset
summary
report
versions
IsoSeq - Scalable De Novo Isoform Discovery
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
meta
bam
meta
version
bam
pbi
cluster
cluster_report
transcriptset
hq_bam
hq_pbi
lq_bam
lq_pbi
singletons_bam
singletons_pbi
IsoSeq3 - Cluster - Cluster trimmed consensus sequences
Remove polyA tail and artificial concatemers
meta
bam
primers
meta
bam
pbi
consensusreadset
summary
report
versions
IsoSeq3 - Scalable De Novo Isoform Discovery
Extract UMI and cell barcodes
meta
bam
design
meta
versions
bam
pbi
Iso-Seq - Scalable De Novo Isoform Discovery
Generate a consensus sequence from a BAM file using iVar
meta
bam
fasta
save_mpileup
meta
fasta
qual
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Trim primer sequences from a BAM file with iVar
meta
bam
bai
bed
meta
bam
log
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Call variants from a BAM file using iVar
meta
bam
fasta
fai
gff
save_mpileup
meta
tsv
mpileup
versions
iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.
Render Jupyter (or Jupytext) notebooks to HTML reports. Supports parametrization through papermill.
meta
notebook
parameters
input_files
meta
report
versions
Jupyter notebooks as plain text scripts or markdown documents
Parameterize, execute, and analyze notebooks
Parameterize, execute, and analyze notebooks
Taxonomic classification of metagenomic sequence data using a protein reference database
meta
reads
db
meta
versions
results
Fast and sensitive taxonomic classification for metagenomics
Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.
meta
tsv
meta
versions
txt
Fast and sensitive taxonomic classification for metagenomics
Summarize Kaiju classification results into a table at a given taxonomic rank
meta
results
taxon_rank
meta
versions
results
Fast and sensitive taxonomic classification for metagenomics
Merge two tab-separated output files of Kaiju and Kraken in the column format
meta
kaiju
kraken
db
meta
merged
versions
Fast and sensitive taxonomic classification for metagenomics
Make Kaiju FMI-index file from a protein FASTA file
meta
fasta
meta
versions
fmi
Fast and sensitive taxonomic classification for metagenomics
Aligns sequences using kalign
meta
fasta
compress
meta
alignment
versions
Kalign is a fast and accurate multiple sequence alignment algorithm.
Create kallisto index
meta
fasta
meta
index
versions
Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
Computes equivalence classes for reads and quantifies abundances
meta
reads
index
gtf
chromosomes
fragment_length
fragment_length_sd
meta
versions
log
abundance
abundance_hdf5
run_info
Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
quantifies scRNA-seq data from fastq files using kb-python.
meta
reads
index
t2g
t1c
t2c
workflow_mode
technology
meta
count
versions
kallisto and bustools are wrapped in an easy-to-use program called kb
index creation for kb count quantification of single-cell data.
fasta
gtf
workflow_mode
versions
kb_ref_idx
t2g
cdna
intron
cdna_t2c
intron_t2c
kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.
Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.
pe_reads
se_reads
name
versions
reads
khmer k-mer counting library
In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
fasta
kmer_size
report
kmers
versions
khmer k-mer counting library
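As a rough illustration of what k-mer counting with a given `kmer_size` means, here is a minimal sketch. It is not khmer's API or data structure (khmer uses probabilistic counting structures to stay memory-efficient); the function name `count_kmers` is hypothetical and an exact `collections.Counter` is assumed for simplicity.

```python
from collections import Counter


def count_kmers(sequence, kmer_size):
    """Count every overlapping k-mer of length kmer_size in a sequence.

    Hypothetical helper for illustration; uses exact in-memory counts.
    """
    if kmer_size <= 0:
        raise ValueError("kmer_size must be positive")

    seq = sequence.upper()
    counts = Counter()
    # Slide a window of length kmer_size one base at a time.
    for start in range(len(seq) - kmer_size + 1):
        counts[seq[start:start + kmer_size]] += 1
    return counts


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACG", 3).items()):
        print(kmer, count)
```

A sequence of length n yields n - k + 1 overlapping k-mers, so sequences shorter than `kmer_size` produce an empty count table.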
Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).
meta
fastas
meta
versions
txt
Generate k-mers (sketches) from FASTA/Q sequences
meta
sequences
meta
versions
outdir
info
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Construct KMCP database from k-mer files
meta
compute_dir
meta
versions
kmcp
log
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Merge search results from multiple databases.
meta
search_out
meta
versions
result
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Generate taxonomic profile from search results
meta
search_results
db
meta
versions
profile
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Search sequences against database
meta
reads
db
meta
result
versions
Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping
Adds fasta files to a Kraken2 taxonomic database
meta
fasta
taxonomy_names
taxonomy_nodes
accession2taxid
meta
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Builds Kraken2 database
meta
db
cleaning
meta
db
versions
Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Classifies metagenomic sequence data
meta
reads
db
save_output_fastqs
save_reads_assignment
meta
classified_reads_fastq
unclassified_reads_fastq
classified_reads_assignment
report
versions
Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads
Takes multiple kraken-style reports and combines them into a single report file
meta
kreports
meta
versions
txt
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Extract reads classified at any user-specified taxonomy IDs.
meta
taxid
classified_reads_assignment
classified_reads_fastq
report
meta
extracted_kraken2_reads
versions
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Takes a Kraken report file and prints out a krona-compatible TEXT file
meta
kreport
meta
versions
krona
KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.
Download and build (custom) KrakenUniq databases
meta
custom_library_dir
custom_taxonomy_dir
meta
versions
db
Metagenomics classifier with unique k-mer counting for more specific results
Download KrakenUniq databases and related files
meta
pattern
meta
versions
output
Metagenomics classifier with unique k-mer counting for more specific results
Classifies metagenomic sequence data using unique k-mer counts
meta
sequences
sequence_type
db
ram_chunk_size
save_output_reads
report_file
save_output
meta
classified_reads
unclassified_reads
classified_assignment
report
versions
Metagenomics classifier with unique k-mer counting for more specific results
KronaTools Update Taxonomy downloads a taxonomy database
NO input
versions
db
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.
meta
database
report
versions
html
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Creates a Krona chart from text files listing quantities and lineages.
meta
report
meta
versions
html
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
KronaTools Update Taxonomy downloads a taxonomy database
NO input
versions
db
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
Aligns query sequences to target sequences indexed with lastdb
meta
fastx
param_file
index
meta
versions
maf
multiqc
LAST finds & aligns related regions of sequences.
Prepare sequences for subsequent alignment with lastal.
meta
fastx
meta
versions
index
LAST finds & aligns related regions of sequences.
Converts MAF alignments into another format.
meta
maf
format
meta
versions
axt_gz
blast_gz
blasttab_gz
chain_gz
gff_gz
html_gz
psl_gz
sam_gz
tab_gz
LAST finds & aligns related regions of sequences.
Reorder alignments in a MAF file
meta
maf
meta
versions
maf
LAST finds & aligns related regions of sequences.
Post-alignment masking
meta
maf
meta
versions
maf
LAST finds & aligns related regions of sequences.
Find suitable score parameters for sequence alignment
meta
fastx
index
meta
versions
param_file
multiqc
LAST finds & aligns related regions of sequences.
Align sequences using learnMSA
meta
fasta
compress
meta
alignment
versions
learnMSA: Learning and Aligning large Protein Families
Bayesian reconstruction of ancient DNA fragments
meta
reads
meta
versions
bam
fq_pass
fq_fail
unmerged_r1_fq_pass
unmerged_r1_fq_fail
unmerged_r2_fq_pass
unmerged_r2_fq_fail
log
Typing of clinical and environmental isolates of Legionella pneumophila
meta
seqs
meta
versions
tsv
Index chain files for lift over
meta
fai
chain
meta
clft
versions
Fast and accurate coordinate conversion between assemblies
Converting aligned short- and long-read records from one reference to another
meta
input
meta_ref
clft
meta
bam
versions
Fast and accurate coordinate conversion between assemblies
runs a differential expression analysis with Limma
meta
contrast_variable
reference
target
meta2
samplesheeet
intensities
results
md_plot
rdata
model
session_info
versions
Linear Models for Microarray Data
Serogrouping Listeria monocytogenes assemblies
meta
fasta
meta
versions
tsv
Lofreq subcommand to insert base and indel alignment qualities
meta
bam
fasta
meta
versions
bam
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments
meta
bam
intervals
fasta
meta
versions
vcf
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
It predicts variants using multiple processors
meta
bam
bai
intervals
meta2
fasta
meta3
fai
meta
versions
vcf
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel program predicts variants using multiple processors
Lofreq subcommand to remove variants with low coverage or strand bias potential
meta
vcf
meta
versions
vcf
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Inserts indel qualities in a BAM file
meta
bam
meta2
fasta
meta
versions
bam
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual program inserts indel qualities in a BAM file
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
meta
meta2
meta3
tumor
tumor_index
normal
normal_index
fasta
fai
target_bed
meta
versions
vcf
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available
meta
bam
meta2
fasta
meta
versions
bam
A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
meta
bam
bai
snps
svs
mods
meta2
fasta
meta3
fai
meta
versions
vcf
LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.
Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder
meta
fasta
meta
scn
gff
versions
A Perl wrapper for LTR_FINDER
An efficient program for finding full-length LTR retrotransposons in genome sequences
Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain
meta
fasta
meta
versions
gff3
scn
A Perl wrapper for LTR_harvest
The GenomeTools genome analysis system
Identifies LTR retrotransposons using LTR_retriever
meta
genome
harvest
finder
mgescan
non_tgca
meta
log
pass_list
pass_list_gff
ltrlib
annotation_out
annotation_gff
versions
Sensitive and accurate identification of LTR retrotransposons
Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments
meta
fasta
pass_list
annotation_out
monoploid_seqs
meta
log
lai_out
versions
Assessing genome assembly quality using the LTR Assembly Index (LAI)
Identifies LTR retrotransposons using LTR_retriever
meta
genome
harvest
finder
mgescan
non_tgca
meta
log
pass_list
pass_list_gff
ltrlib
annotation_out
annotation_gff
versions
Sensitive and accurate identification of LTR retrotransposons
A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.
meta
fasta
meta
versions
amp_prediction
smorfs
all_orfs
readme_file
log_file
A pipeline for AMP (antimicrobial peptide) prediction
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
meta
ipbam
controlbam
macs2_gsize
versions
peak
xls
gapped
bed
bdg
Model Based Analysis for ChIP-Seq data
Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments
meta
ipbam
controlbam
macs3_gsize
meta
versions
peak
xls
gapped
bed
bdg
Model Based Analysis for ChIP-Seq data
Multiple sequence alignment using MAFFT
meta
fasta
meta2
add
meta3
addfragments
meta4
addfull
meta5
addprofile
meta6
addlong
compress
meta
versions
fas
Parallel implementation of the gzip algorithm.
mageck count for functional genomics, reads are usually mapped to a specific sgRNA
meta
library
inputfile
meta
versions
norm
count
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Maximum-likelihood estimation of gene essentialities
meta
count_table
design_matrix
meta
versions
gene_summary
sgrna_summary
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Mageck test performs a robust ranking aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.
meta
count_table
meta
versions
gene_summary
sgrna_summary
r_script
MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.
Multiple Sequence Alignment using Graph Clustering
meta
meta2
fasta
tree
compress
meta
versions
alignment
Multiple Sequence Alignment using Graph Clustering
Multiple Sequence Alignment using Graph Clustering
meta
fasta
meta
versions
tree
Multiple Sequence Alignment using Graph Clustering
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
fastas
gff
mapping_db
versions
index
log
A tool for mapping metagenomic data
MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.
meta
fastqs
index
versions
rma6
alignments
log
A tool for mapping metagenomic data
Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening
meta
rma6
taxon_list
ncbi_dir
versions
results
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.
meta
vcf
meta2
fasta
meta
versions
vcf
tbi
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
meta
input
index
target_bed
target_bed_tbi
meta2
fasta
meta3
fai
config
meta
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
meta
input_normal
input_index_normal
input_tumor
input_index_tumor
target_bed
target_bed_tbi
meta2
fasta
meta3
fai
config
meta
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
diploid_sv_vcf
diploid_sv_vcf_tbi
somatic_sv_vcf
somatic_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
meta
input
input_index
target_bed
target_bed_tbi
meta2
fasta
meta3
fai
config
meta
candidate_small_indels_vcf
candidate_small_indels_vcf_tbi
candidate_sv_vcf
candidate_sv_vcf_tbi
tumor_sv_vcf
tumor_sv_vcf_tbi
versions
Structural variant and indel caller for mapped sequencing data
Create mapAD index for reference genome
meta
fasta
meta
versions
index
An aDNA aware short-read mapper
Map short reads to an indexed reference genome
meta
reads
meta2
index
mismatch_parameter
double_stranded_library
five_prime_overhang
three_prime_overhang
deam_rate_double_stranded
deam_rate_single_stranded
indel_rate
meta
versions
bam
An aDNA aware short-read mapper
Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
meta
bam
fasta
meta
versions
runtime_log
fragmisincorporation_plot
length_plot
misincorporation
pctot_freq
pgtoa_freq
dnacomp
lgdistribution
stats_out_mcmc_hist
stats_out_mcmc_iter
stats_out_mcmc_trace
stats_out_mcmc_iter_summ_stat
stats_out_mcmc_post_pred
stats_out_mcmc_correct_prob
dnacomp_genome
rescaled
fasta
folder
Screens query sequences against large sequence databases
meta
query
sequence_sketch
meta
versions
screen
Fast sequence distance estimator that uses MinHash
Creates vastly reduced representations of sequences using MinHash
meta
reads
meta
mash
stats
versions
Fast sequence distance estimator that uses MinHash
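The sketching idea behind the two Mash entries above can be illustrated with a small bottom-k MinHash example. This is only a sketch under stated assumptions: the function names (`kmers`, `sketch`, `jaccard_estimate`), the use of SHA-1 as the hash, and the tiny default `k` and `size` values are all illustrative choices and are not taken from Mash itself, which uses its own hash function, canonical k-mers and much larger sketches.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=8):
    """Bottom-k sketch: keep the `size` smallest k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=8):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them occur in both inputs.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = sketch("ACGTACGTTAGCCGATAGCTA")
    b = sketch("ACGTACGTTAGCCGATAGCTT")
    print(jaccard_estimate(a, b))
```

The point of the reduced representation is that two sketches of fixed size can be compared without revisiting the full sequences.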
MaxBin is software for clustering metagenomic contigs
meta
contigs
reads
abund
meta
versions
binned_fastas
summary
log
marker
unbinned_fasta
tooshort_fasta
marker_genes
Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER"
meta
raw
fasta
parfile
meta
versions
maxquant_txt
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.
MCQuant extracts single-cell data given a multi-channel image and a segmentation mask.
meta
image
meta2
mask
meta3
markerfile
meta
versions
csv
Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation
meta
fasta
meta
versions
tsv
fa
Analyses a DAA file and exports information in text format
meta
daa
megan_summary
meta
versions
txt_gz
megan
A tool for studying the taxonomic content of a set of DNA reads
Analyses an RMA file and exports information in text format
meta
rma6
megan_summary
meta
versions
txt
megan_summary
A tool for studying the taxonomic content of a set of DNA reads
Serotyping of Neisseria meningitidis assemblies
meta
fasta
meta
versions
tsv
Compare k-mer frequency in reads and assembly to devise the metrics K and QV
meta
fasta_assembly
meta1
meryl_db_reads
lookup_table
seqmers
peak
meta
versions
hist
log_stderr
Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
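As a rough illustration of the Merqury-style QV mentioned in the Merfin description, the calculation can be sketched as follows. The function name and argument names are hypothetical, and the sketch covers only the plain QV derived from k-mers found in the assembly but not in the reads; it does not attempt the QV* adjustment for excess k-mers.

```python
import math


def merqury_style_qv(asm_only_kmers, total_asm_kmers, k):
    """Consensus quality value from k-mer counts (illustrative only).

    asm_only_kmers:  k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers counted in the assembly
    k:               k-mer size used for counting
    """
    if total_asm_kmers <= 0:
        raise ValueError("total_asm_kmers must be positive")
    shared_fraction = 1 - asm_only_kmers / total_asm_kmers
    # Probability that a single base is correct, assuming independent errors.
    p_base_correct = shared_fraction ** (1 / k)
    error_rate = 1 - p_base_correct
    if error_rate == 0:
        return math.inf  # no assembly-only k-mers observed
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(merqury_style_qv(asm_only_kmers=1200, total_asm_kmers=3_000_000, k=21))
```

Fewer assembly-only k-mers give a lower estimated error rate and therefore a higher QV.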
k-mer based assembly evaluation.
meta
meryl_db
assembly
meta
versions
assembly_only_kmers_bed
assembly_only_kmers_wig
stats
dist_hist
spectra_cn_fl_png
spectra_cn_ln_png
spectra_cn_st_png
spectra_cn_hist
spectra_asm_fl_png
spectra_asm_ln_png
spectra_asm_st_png
spectra_asm_hist
assembly_qv
scaffold_qv
read_ploidy
A script to generate hap-mer dbs for trios
meta
child_meryl
maternal_meryl
paternal_meryl
meta
versions
mat_hapmer_meryl
pat_hapmer_meryl
inherited_hapmers_fl_png
inherited_hapmers_ln_png
inherited_hapmers_st_png
Evaluate genome assemblies with k-mers and more.
k-mer based assembly evaluation.
meta
meryl_db
assembly
meta
versions
assembly_only_kmers_bed
assembly_only_kmers_wig
stats
dist_hist
spectra_cn_fl_png
spectra_cn_ln_png
spectra_cn_st_png
spectra_cn_hist
spectra_asm_fl_png
spectra_asm_ln_png
spectra_asm_st_png
spectra_asm_hist
assembly_qv
scaffold_qv
read_ploidy
hapmers_blob_png
Evaluate genome assemblies with k-mers and more.
A reimplementation of Kat Comp to work with FastK databases
meta
fastk1_hist
fastk1_ktab
fastk2_hist
fastk2_ktab
meta
versions
filled_png
line_png
stacked_png
filled_pdf
line_pdf
stacked_pdf
FastK based version of Merqury
A reimplementation of KatGC to work with FastK databases
meta
fastk_hist
fastk_ktab
meta
versions
filled_gc_plot_png
filled_gc_plot_pdf
line_gc_plot_png
line_gc_plot_pdf
stacked_gc_plot_png
stacked_gc_plot_pdf
FastK based version of Merqury
FastK based version of Merqury
meta
fastk_hist
fastk_ktab
assembly
haplotigs
meta
versions
stats
bed
spectra_cn_fl_png
spectra_cn_ln_png
spectra_cn_st_png
spectra_asm_fl_png
spectra_asm_ln_png
spectra_asm_st_png
spectra_cn_fl_pdf
spectra_cn_ln_pdf
spectra_cn_st_pdf
spectra_asm_fl_pdf
spectra_asm_ln_pdf
spectra_asm_st_pdf
assembly_qv
qv
FastK based version of Merqury
An improved version of Smudgeplot using FastK
meta
fastk_hist
fastk_ktab
meta
versions
filled_ploidy_plot_png
filled_ploidy_plot_pdf
line_ploidy_plot_png
line_ploidy_plot_pdf
stacked_ploidy_plot_png
stacked_ploidy_plot_pdf
FastK based version of Merqury
A genomic k-mer counter (and sequence utility) with nice features.
meta
reads
kvalue
meta
versions
meryl_db
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
meta
meryl_db
kvalue
meta
versions
hist
A genomic k-mer counter (and sequence utility) with nice features.
A genomic k-mer counter (and sequence utility) with nice features.
meta
meryl_dbs
kvalue
meta
versions
meryl_db
A genomic k-mer counter (and sequence utility) with nice features.
Depth computation per contig step of metabat2
meta
bam
bai
meta
versions
depth
Metagenome binning of contigs
meta
fasta
depth
meta
versions
fasta
tooshort
lowdepth
unbinned
membership
Annotation of eukaryotic metagenomes using MetaEuk
meta
fasta
database
meta
versions
faa
codon
tsv
gff
Strain-level metagenomic assignment
meta
classification_res
meta_file
meta_unmappedreadsLengths
para_file
database_folder
meta
versions
wimp
evidence_unknown_species
reads2taxon
em
contig_coverage
length_and_id
krona
Maps long reads to a metamaps database
meta
reads
database
meta
versions
classification_res
meta_file
meta_unmappedreadsLengths
para_file
Build MetaPhlAn database for taxonomic profiling.
NO input
db
versions
Merges output abundance tables from MetaPhlAn4
meta
profiles
meta
versions
txt
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
meta
input
metaphlan_db
meta
versions
profile
biom
bowtie2out
Merges output abundance tables from MetaPhlAn3
meta
profiles
meta
versions
txt
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
meta
input
metaphlan_db
meta
versions
profile
biom
bowtie2out
Extracts per-base methylation metrics from alignments
meta
fasta
fai
bam
bai
meta
bedgraph
methylkit
versions
Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.
Generates methylation bias plots from alignments
meta
fasta
fai
bam
bai
meta
txt
versions
Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.
A tool to estimate bacterial species abundance
meta
reads
db
mode
meta
versions
results
An integrated pipeline for estimating strain-level genomic variation from metagenomic data
Marks duplicate spots along gridline edges.
meta
spot_table
meta
versions
marked_dups_spots
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.
meta
panorama
meta
versions
tiff
Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.
A versatile pairwise aligner for genomic and spliced nucleotide sequences
meta
reads
meta2
reference
bam_format
bam_index_extension
cigar_paf_format
cigar_bam
meta
paf
bam
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by minimap2 alignment.
meta
fasta
meta
index
versions
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
Provides fasta index required by miniprot alignment.
meta
fasta
meta
index
versions
A versatile pairwise aligner for genomic and protein sequences.
miRanda is an algorithm for finding genomic targets for microRNAs
meta
query
mirbase
meta
txt
versions
Download a mitochondrial genome to be used as reference for MitoHiFi
species
versions
fasta
gb
Fetch mitochondrial genome in Fasta and Genbank format from NCBI
A Python workflow that assembles mitogenomes from PacBio HiFi reads
input
ref_fa
ref_gb
input_mode
code
versions
fasta
gb
gff
all_potential_contigs
contigs_annotations
contigs_circularization
contigs_filtering
coverage_mapping
coverage_plot
final_mitogenome_annotation
final_mitogenome_choice
final_mitogenome_coverage
potential_contigs
reads_mapping_and_assembly
shared_genes
versions
A Python workflow that assembles mitogenomes from PacBio HiFi reads
Cluster sequences using MMseqs2 cluster.
meta
input_db
meta
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create an MMseqs database from an existing FASTA/Q file
meta
sequence
meta
db
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Creates sequence index for mmseqs database
meta
db
versions
db_indexed
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Create a tsv file from a query and a target database as well as the result database
meta
db_result
meta2
db_query
meta3
db_target
meta
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Download an mmseqs-formatted database
database
versions
database
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Searches for the sequences of a fasta file in a database using MMseqs2
meta
fasta
meta2
db_target
meta
tsv
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Cluster sequences in linear time using MMseqs2 linclust.
meta
input_db
meta
db_cluster
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Search and calculate a score for similar sequences in a query and a target database.
meta
query_db
meta
target_db
meta
versions
db_search
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Computes the lowest common ancestor by identifying the query sequence homologs against the target database.
meta
db_query
db_target
meta
db_taxonomy
versions
MMseqs2: ultra fast and sensitive sequence search and clustering suite
Conversion of expandable profile databases to the MMseqs2 database format
db
versions
db_exprofile
MMseqs2: ultra fast and sensitive sequence search and clustering suite
A tool to reconstruct plasmids in bacterial assemblies
meta
fasta
meta
versions
chromosome
contig_report
plasmids
mobtyper_results
Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.
A bioinformatics tool for working with modified bases
meta
bam
bai
meta2
fasta
meta3
bed
meta
versions
bed
bedgraph
log
A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data
Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.
meta
image
meta
versions
img_clahe
One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.
Download the mOTUs database
motus_downloaddb
versions
db
The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
Taxonomic meta-omics profiling using universal marker genes
input
db
profile_version_yml
versions
txt
biom
Marker gene-based OTU (mOTU) profiling
Taxonomic meta-omics profiling using universal marker genes
meta
reads
db
meta
versions
out
bam
mgc
log
Marker gene-based OTU (mOTU) profiling
Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data
meta
normal_bam
normal_bai
tumor_bam
tumor_bai
homopolymers
meta
versions
txt
txt
txt
txt
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
Scan a reference genome to get microsatellite & homopolymer information
meta
fasta
meta
versions
txt
MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.
msisensor2 detection of MSI regions.
meta
tumor_bam
normal_bam
intervals
models
meta
msi
distribution
somatic
versions
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
msisensor2 detection of MSI regions.
fasta
output
versions
output
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
meta
normal
normal_index
tumor
tumor_index
intervals
fasta
msisensor_scan
meta
output_report
output_dis
output_germline
output_somatic
versions
list
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
meta
fasta
meta
versions
list
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Aligns protein structures using mTM-align
meta
pdbs
compress
meta
alignment
structure
versions
Algorithm for structural multiple sequence alignments
Parallel implementation of the gzip algorithm.
A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.
meta
bam
mt_id
meta
versions
mtnucratio
json
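The quantity reported by the MT/nuclear ratio entry above can be illustrated with a few lines of Python. This is a minimal sketch, assuming per-reference read counts have already been extracted from the BAM file (for example from an index statistics table). The function name and the input dictionary are hypothetical, and the real tool may report additional metrics; only the `mt_id` parameter name mirrors the module input listed above.

```python
def mt_to_nuclear_ratio(read_counts, mt_id="MT"):
    """Ratio of mitochondrial to nuclear mapped reads.

    read_counts: dict mapping reference sequence name -> mapped read count
    mt_id:       name of the mitochondrial reference sequence
    """
    mt_reads = read_counts.get(mt_id, 0)
    nuclear_reads = sum(n for name, n in read_counts.items() if name != mt_id)
    if nuclear_reads == 0:
        raise ValueError("no nuclear reads; ratio is undefined")
    return mt_reads / nuclear_reads


if __name__ == "__main__":
    counts = {"1": 900, "2": 100, "MT": 50}
    print(mt_to_nuclear_ratio(counts))
```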
Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.
meta
bam
gtf
index
meta
versions
bam
rad
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.
gtf
versions
index
mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.
Aggregate results from bioinformatics analyses across many samples into a single report
multiqc_files
multiqc_config
extra_multiqc_config
multiqc_logo
replace_names
sample_names
report
data
plots
versions
MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA
vcfs
fasta
snpeff_results
gff
allele_freqs
genotype_quality
coverage
homozygous_freq
heterozygous_freq
gff_exclude
versions
bam
full_alignment
info_txt
snp_alignment
snp_genome_alignment
snpstatistics
snptable
snptable_snpeff
snptable_uncertainty
structure_genotypes
structure_genotypes_nomissing
json
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two
fasta
aligned_fasta
msf
clustalw
phyi
phys
html
tree
log
versions
Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.
meta
fasta
compress
meta
versions
alignment
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
Parallel implementation of the gzip algorithm.
Fetch the GO concepts for a list of genes
meta
gene_list
meta
gmt
tsv
versions
AMR predictions for supported species
meta
seqs
species
meta
versions
csv
json
Antibiotic resistance prediction in minutes
Compare multiple runs of long read sequencing data and alignments
meta
filelist
versions
meta
report_html
lengths_violin_html
log_length_violin_html
n50_html
number_of_reads_html
overlay_histogram_html
overlay_histogram_normalized_html
overlay_log_histogram_html
overlay_log_histogram_normalized_html
total_throughput_html
quals_violin_html
overlay_histogram_identity_html
overlay_histogram_phredscore_html
percent_identity_violin_html
active_pores_over_time_html
cumulative_yield_plot_gigabases_html
sequencing_speed_over_time_html
stats_txt
DNA contaminant removal using NanoLyse
meta
fastq
fasta
meta
fastq
log
versions
Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"
meta
bam
bai
meta
insertions
insertions_index
deletions
deletions_index
rearrangements
rearrangements_index
bp_info
bp_info_index
versions
nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.
Run NanoPlot on nanopore-sequenced reads
meta
fastq
summary_txt
meta
html
png
txt
log
versions
Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.
meta
ontreads
output_format
meta
stats
reads
versions
Create DRAGEN hashtable for reference genome
meta
fasta
meta
hashmap
versions
narfmap is a fork of the Dragen mapper/aligner Open Source Software.
A tool to quickly download assemblies from NCBI's Assembly database
meta
accessions
taxids
groups
meta
versions
gbk
fna
rm
features
gff
faa
gpff
wgs_gbk
cds
rna
rna_fna
report
stats
NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose
meta
fasta_file
adapters_database_file
meta
versions
vecscreen_output
NCBI libraries for biology applications (text-based utilities)
Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
dataset
reference
tag
versions
prefix
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)
meta
dataset
fasta
meta
versions
csv
json
json_tree
tsv
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
Performs fastq alignment to a fasta reference using NextGenMap
meta
reads
fasta
meta
bam
versions
NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime
Serotyping Neisseria gonorrhoeae assemblies
meta
fasta
meta
versions
tsv
Merging paired-end reads and removing sequencing adapters.
meta
reads
meta
versions
merged_reads
unstitched_read1
unstitched_read2
Determines the gender of a sample from the BAM/CRAM file.
meta
bam
bai
meta2
fasta
meta3
fasta
method
meta
versions
tsv
Short-read sequencing tools
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.
meta
reads
meta2
snp_pt
meta
vaf
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.
meta
files
meta2
snp_bed
meta3
fasta
versions
pdf
corr_matrix
matched
all
vcf
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.
meta
bed
meta2
fasta
bowtie_index
meta
versions
pt
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Determining whether sequencing data comes from the same individual by using SNP matching. This module takes vaf files as input and reports matched samples and a correlation matrix.
meta
vafs
meta
pdf
corr_matrix
matched
all
versions
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Calculate metagenome redundancy curve from FASTQ files
meta
reads
format
mode
meta
versions
npa
npc
npl
npo
Visualise metagenome redundancy curve from a single Nonpareil npo file
meta
npo
meta
versions
png
Estimate average coverage and create curves for metagenomic datasets
Calculate metagenome redundancy curve from FASTQ files
meta
reads
format
mode
meta
versions
npa
npc
npl
npo
Estimate average coverage and create curves for metagenomic datasets
Visualise metagenome redundancy curves from multiple Nonpareil npo files in a single image
meta
npos
meta
versions
png
Estimate average coverage and create curves for metagenomic datasets
NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.
meta
ref
query
meta
versions
delta
coords
An nf-core module for the OATK
meta
reads
mito_hmm
mito_hmm_h3f
mito_hmm_h3i
mito_hmm_h3m
mito_hmm_h3p
pltd_hmm
pltd_hmm_h3f
pltd_hmm_h3i
pltd_hmm_h3m
pltd_hmm_h3p
meta
versions
mito_fasta
pltd_fasta
mito_bed
pltd_bed
mito_gfa
pltd_gfa
annot_mito_txt
annot_pltd_txt
clean_gfa
final_gfa
initial_gfa
multiplex_gfa
unzip_gfa
Construct a dynamic succinct variation graph in ODGI format from a GFAv1.
meta
graph
meta
versions
og
An optimized dynamic genome/graph implementation
Draw previously-determined 2D layouts of the graph with diverse annotations.
meta
graph
lay
meta
versions
png
An optimized dynamic genome/graph implementation
Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.
meta
graph
meta
versions
lay
tsv
An optimized dynamic genome/graph implementation
Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.
meta
graph
meta
versions
sorted_graph
An optimized dynamic genome/graph implementation
Squeezes multiple graphs in ODGI format into the same file in ODGI format.
meta
graphs
meta
graph
versions
An optimized dynamic genome/graph implementation
Metrics describing a variation graph and its path relationship.
meta
graph
meta
versions
tsv
yaml
An optimized dynamic genome/graph implementation
Merge unitigs into a single node preserving the node order.
meta
graph
meta
versions
unchopped_graph
An optimized dynamic genome/graph implementation
Project a graph into other formats.
meta
graph
meta
versions
gfa
An optimized dynamic genome/graph implementation
Visualize a variation graph in 1D.
meta
graph
meta
versions
png
An optimized dynamic genome/graph implementation
Calls CNVs in bam files from tumor patients
meta
normal
normal_index
tumor
tumor_index
bed
fasta
png
profile
summary
versions
Create a decoy peptide database from a standard FASTA database.
meta
fasta
meta
versions
fasta
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Filters peptide/protein identification results by different criteria.
meta
id_file
filter_file
meta
versions
filtered
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Merges several idXML files into one idXML file.
meta
idxmls
meta
versions
idxml
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Split a merged identification file into its originating identification files
meta
merged_idxml
meta
versions
idxmls
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Switches between different scores of peptide or protein hits in identification data
meta
idxml
meta
versions
idxml
OpenMS is an open-source software C++ library for LC-MS data management and analyses
A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)
meta
mzml
meta
versions
mzml
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Refreshes the protein references for all peptide hits.
meta
id_file
id_fasta
meta
versions
id_file_pi
OpenMS is an open-source software C++ library for LC-MS data management and analyses
Perform HLA-I typing of sequencing data
meta
bam
bai
meta
versions
hla_type
coverage_plot
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.
meta
fastas
meta
versions
orthofinder
A program to convert bam into paf.
meta
bam
meta
paf
versions
A program to manipulate paf files / convert to and from paf.
Find and remove PCR/optical duplicates
meta
input
meta
versions
pairs
stat
CLI tools to process mapped Hi-C data
Flip pairs to get an upper-triangular matrix
meta
sam
chromsizes
meta
versions
flip
CLI tools to process mapped Hi-C data
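To make "upper-triangular" in the flip entry above concrete: each pair is oriented so that its first side never sorts after its second side. The sketch below assumes that chromosome order follows the order of lines in the chromsizes file and that a pair is a simple six-field tuple. Both are illustrative assumptions, the function names are hypothetical, and the actual tool's ordering rules and file format may differ.

```python
def chrom_order_from_sizes(lines):
    """Map chromosome name -> rank, using the order of a chromsizes listing."""
    order = {}
    for line in lines:
        line = line.strip()
        if line:
            order[line.split()[0]] = len(order)
    return order


def flip_pair(pair, chrom_order):
    """Swap the two sides of a pair if side 1 sorts after side 2.

    pair: (chrom1, pos1, strand1, chrom2, pos2, strand2)
    """
    chrom1, pos1, strand1, chrom2, pos2, strand2 = pair
    if (chrom_order[chrom1], pos1) > (chrom_order[chrom2], pos2):
        return (chrom2, pos2, strand2, chrom1, pos1, strand1)
    return pair


if __name__ == "__main__":
    order = chrom_order_from_sizes(["chr1\t1000", "chr2\t800"])
    print(flip_pair(("chr2", 10, "+", "chr1", 500, "-"), order))
```

After flipping, every contact falls on or above the diagonal of the contact matrix, so each contact is stored once.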
Merge multiple pairs/pairsam files
meta
allpairs
meta
versions
pairs
CLI tools to process mapped Hi-C data
Find ligation junctions in .sam, make .pairs
meta
bam
chromsizes
meta
versions
pairsam
stat
CLI tools to process mapped Hi-C data
Assign restriction fragments to pairs
meta
pairs
frag
meta
versions
restrict
CLI tools to process mapped Hi-C data
Select pairs according to given condition by options.args
meta
input
meta
versions
selected
unselected
CLI tools to process mapped Hi-C data
Sort a .pairs/.pairsam file
meta
input
meta
versions
sorted
CLI tools to process mapped Hi-C data
Split a .pairsam file into .pairs and .sam.
meta
pairs
meta
versions
pairs
bam
CLI tools to process mapped Hi-C data
Calculate pairs statistics
meta
pairs
meta
versions
stat
CLI tools to process mapped Hi-C data
Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file
meta
gfa
bed_subset
bed_exclude
tsv_groupby
meta
versions
tsv
html
panacus is a tool for computing counting statistics for GFA files
Create visualizations from a tsv coverage histogram created with panacus.
meta
tsv
meta
versions
image
panacus is a tool for computing counting statistics for GFA files
A fast and scalable tool for bacterial pangenome analysis
meta
gff
meta
versions
results
aln
panaroo - an updated pipeline for pangenome investigation
NVIDIA Clara Parabricks GPU-accelerated apply Base Quality Score Recalibration (BQSR).
meta
input
input_index
bqsr_table
interval_file
fasta
meta
versions
bam
bai
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated annotation of variant calls based on the dbSNP database
meta
vcf_file
dbsnp_file
tabix_file
meta
versions
ann_vcf
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating DeepVariant.
meta
ref_meta
input
input_index
interval_file
fasta
meta
vcf
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.
meta
reads
interval_file
meta2
fasta
meta3
index
known_sites
meta
versions
bam
bai
qc_metrics
bqsr_table
duplicate_metrics
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs
meta
ref_meta
input
fasta
meta
versions
vcf
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK HaplotypeCaller.
meta
ref_meta
input
input_index
interval_file
fasta
meta
versions
vcf
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.
meta
gvcf
meta
gvcf_index
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.
meta
tumor_bam
tumor_bam_index
normal_bam
normal_bam_index
interval_file
ref_meta
fasta
panel_of_normals
panel_of_normals_index
meta
vcf
stats
versions
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Determines the depth in a BAM/CRAM file
meta
meta2
meta3
input
input_index
fasta
fasta_fai
meta
versions
depth
binned_depth
Graph realignment tools for structural variants
Genotype structural variants using paragraph and grmpy
meta
variants
variants_index
reads
reads_index
manifest
meta2
fasta
meta3
fasta_fai
meta
versions
vcf
json
Graph realignment tools for structural variants
Convert a VCF file to a JSON graph
meta
vcf
meta
versions
graph
Graph realignment tools for structural variants
The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.
meta
bam
meta
versions
bam
pbi
PacBio BAM C++ library
Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq
meta
bam
pbi
meta
versions
fastq
pbtk - PacBio BAM toolkit
Minimalistic tool which creates an index file that enables random access into PacBio BAM files
meta
bam
meta
versions
pbi
pbtk - PacBio BAM toolkit
This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
meta
bam
meta
versions
spp
pdf
rdata
Predict prophages in bacterial genomes
meta
gbk
meta
coordinates
gbk
log
information
bacteria_fasta
bacteria_gbk
phage_fasta
phage_gbk
prophage_gff
prophage_tbl
prophage_tsv
versions
Prophage finder using multiple metrics
phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.
meta
reads
sliva_db
univec_db
meta
results
versions
Assigns all the reads in a file to a single new read-group
meta
meta2
meta3
reads
fasta
fasta_index
meta
versions
bam
bai
cram
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates an interval list from a bed file and a reference dict
meta
bed
meta2
dict
arguments_file
interval_list
versions
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
meta
sam
meta
versions
sam
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collects hybrid-selection (HS) metrics for a SAM or BAM file.
meta
bam
bai
bait_intervals
target_intervals
meta2
fasta
meta3
fai
meta4
dict
meta
versions
metrics
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about the insert size distribution of a paired-end library.
meta
bam
meta
versions
pdf
metrics
Java tools for working with NGS data in the BAM format
Collect multiple metrics from a BAM file
meta
bam
bai
meta2
fasta
meta3
fai
meta
metrics
pdf
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics from a RNAseq BAM file
meta
bam
ref_flat
gene_pred
fasta
rrna_intervals
meta
metrics
pdf
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
meta
bam
bai
meta2
fasta
meta3
fai
intervallist
meta
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Creates a sequence dictionary for a reference sequence.
meta
fasta
meta
versions
reference_dict
Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.
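As a rough illustration of the "header but no SAMRecords" output described above, the sketch below builds header-only SAM text from FASTA lines: one `@HD` line followed by one `@SQ` line per sequence. The function names, the `VN:1.6` value, and the choice of tags (`SN`, `LN`, `M5`) are assumptions made for this example; the real tool may emit additional fields.

```python
import hashlib
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect sequences keyed by the first word of each '>' header."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}


def sequence_dictionary(fasta_lines: Iterable[str]) -> List[str]:
    """Build header-only SAM text: one @HD line, then one @SQ line per sequence."""
    header = ["@HD\tVN:1.6\tSO:unsorted"]  # version value is illustrative
    for name, seq in parse_fasta(fasta_lines).items():
        # M5 is the MD5 of the upper-cased sequence without whitespace
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        header.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return header


example_fasta = [">chr1 test contig", "ACGT", "acgt", ">chr2", "NNNN"]
dict_lines = sequence_dictionary(example_fasta)
```

Joining `dict_lines` with newlines gives text that could be written to a hypothetical `reference.dict` file.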
Checks that all data in the set of input files appear to come from the same individual
meta
input1
input1_index
input2
input2_index
haplotype_map
meta2
fasta
fasta_index
meta
crosscheck_metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.
meta
reference
haplotype_map
fingerprint
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Converts a FASTQ file to an unaligned BAM or SAM file.
meta
reads
meta
versions
bam
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list
meta
bam
filter
readlist
meta
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Verify mate-pair information between mates and fix if needed
meta
bam
meta
versions
bam
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Lifts over a VCF file from one reference build to another.
meta
input_vcf
meta2
fasta
meta3
dict
meta4
chain
meta
versions
vcf_lifted
vcf_unlifted
Move annotations from one assembly to another
Locate and tag duplicate reads in a BAM file
meta
reads
meta2
fasta
meta3
fai
meta
bam
bai
cram
metrics
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Merges multiple BAM files into a single file
meta
bam
meta
bam
versions
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Changes the name of the sample in the VCF file
meta
vcf
meta
versions
vcf
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases
meta
fasta
meta2
fai
meta3
dict
meta
versions
intervals
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
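The idea of splitting a reference into alternating runs of N and ACGT bases can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `split_at_ns` and the labels `Nmer` and `ACGTmer` are assumptions for this example, and coordinates are reported as 1-based inclusive.

```python
from itertools import groupby
from typing import List, Tuple


def split_at_ns(name: str, sequence: str) -> List[Tuple[str, int, int, str]]:
    """Return (contig, start, end, kind) tuples with 1-based inclusive coordinates.

    kind is "Nmer" for a run of N bases and "ACGTmer" for any other run.
    """
    intervals = []
    position = 1
    for is_n, run in groupby(sequence.upper(), key=lambda base: base == "N"):
        length = sum(1 for _ in run)
        kind = "Nmer" if is_n else "ACGTmer"
        intervals.append((name, position, position + length - 1, kind))
        position += length
    return intervals


intervals = split_at_ns("chr1", "ACGTNNNNacgtN")
```

Each tuple maps onto one row of an interval list; a real run would also carry the sequence dictionary header in front of the rows.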
Sorts BAM/SAM files based on a variety of Picard-specific criteria
meta
bam
sort_order
meta
versions
bam
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Sorts VCF files
meta
vcf
meta2
fasta
meta3
dict
meta
versions
vcf
Java tools for working with NGS data in the BAM/CRAM/SAM and VCF formats
Compresses files with pigz.
meta
raw_file
meta
archive
versions
Parallel implementation of the gzip algorithm.
Decompresses files with pigz.
meta
zip
meta
file
versions
Parallel implementation of the gzip algorithm.
Automatically improve draft assemblies and find variation among strains, including large event detection
meta
fasta
meta2
bam
bai
pilon_mode
meta
versions
improved_assembly
change_record
vcf
tracks_bed
tracks_wig
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
meta
bam
bai
bed
fasta
fai
meta
versions
bp
cem
del
dd
int_final
inv
li
rp
si
td
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data
Main caller script for peak calling
meta
bams
meta
versions
divergent_TREs
bidirectional_TREs
unidirectional_TREs
peakcalling_log
Peak Identifier for Nascent Transcripts Starts (PINTS)
Pangenome toolbox for bacterial genomes
meta
gff
meta
versions
results
aln
Identify plasmids in bacterial sequences and assemblies
meta
seqs
meta
versions
json
txt
tsv
genome_seq
plasmid_seq
Platypus is a tool for efficiently and accurately calling genetic variants from next-generation DNA sequencing data
meta
tumor_file
tummor_file_bai
control_file
control_file_bai
fasta
fai
skipregions_file
meta
vcf
tbi
log
version
Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
meta
meta2
meta3
meta4
bed
bim
fam
bcf
vcf
phe
meta
versions
epi
episummary
log
nosex
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Exclude variant identifiers from plink bfiles
meta
bed
bim
fam
variants
meta
versions
bed
bim
fam
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink bfiles with a text file of variant identifiers
meta
bed
bim
fam
variants
meta
versions
bed
bim
fam
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.
meta
meta2
meta3
meta4
bed
bim
fam
bcf
vcf
phe
meta
versions
fepi
fepisummary
flog
fnosex
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate GWAS association studies
meta
meta2
meta3
meta4
bed
bim
fam
bcf
vcf
phe
meta
assoc
log
nosex
versions
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Generate Hardy-Weinberg statistics for provided input
meta
meta2
meta3
bed
bim
fam
vcf
bcf
meta
versions
hwe
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.
meta
bed
bim
fam
window_size
variant_count
variance_inflation_factor
meta
versions
prunein
pruneout
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.
meta
bed
bim
fam
window_size
variant_count
r2_threshold
meta
versions
prunein
pruneout
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
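The greedy pairwise pruning described above can be sketched for a single window as follows. This is a simplified illustration under stated assumptions: genotypes are coded as 0/1/2 allele counts, the helper names are hypothetical, and for each offending pair the later variant is dropped, which may differ from the rule the actual tool applies. Window sliding (`window_size`, `variant_count`) is left out.

```python
from typing import Dict, List, Set


def squared_correlation(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic variant has no defined correlation
    return cov * cov / (var_x * var_y)


def prune_window(genotypes: Dict[str, List[float]], r2_threshold: float) -> Set[str]:
    """Greedily drop variants until no remaining pair exceeds r2_threshold.

    Simplification: for each offending pair, the later variant is dropped.
    """
    names = list(genotypes)
    pruned: Set[str] = set()
    for i, first in enumerate(names):
        if first in pruned:
            continue
        for second in names[i + 1:]:
            if second in pruned:
                continue
            if squared_correlation(genotypes[first], genotypes[second]) > r2_threshold:
                pruned.add(second)
    return pruned


window = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],  # identical to rs1, so r^2 = 1
    "rs3": [2, 0, 1, 1, 2, 0],  # r^2 = 0.25 with rs1
}
prune_out = prune_window(window, r2_threshold=0.5)
prune_in = [name for name in window if name not in prune_out]
```

Here `prune_in` and `prune_out` play the role of the `prunein` and `pruneout` outputs listed for the module.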
LD analysis in PLINK examines genetic variant associations within populations
meta
meta2
meta3
meta4
bed
bim
fam
vcf
bcf
snpfile
meta
versions
ld
log
nosex
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Recodes plink bfiles into a new text fileset applying different modifiers
meta
bed
bim
fam
meta
versions
ped
map
txt
raw
traw
beagle-dat
chr-dat
chr-map
geno
pheno
pos
phase
info
lgen
list
gen
genz
sample
rlist
strctin
tped
tfam
vcf
vcfgz
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Subset plink pfiles with a text file of variant identifiers
meta
pgen
psam
pvar
variants
meta
versions
extract_pgen
extract_psam
extract_pvar
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Apply a scoring system to each sample in a plink 2 fileset
meta
pgen
psam
pvar
scorefile
meta
versions
score
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
Import variant genetic data using plink2
meta
vcf
meta
versions
pgen
psam
pvar
Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
pmdtools command to filter ancient DNA molecules from others
meta
bam
bai
threshold
reference
meta
versions
bam
Compute postmortem damage patterns and decontaminate ancient genomes
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
meta
reads
meta
versions
xml
txt
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.
meta
plp_prefix
bam
donor_genotype
meta
versions
demuxlet_result
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to pile up reads and corresponding base quality for each overlapping SNP and each barcode.
meta
bam
vcf
meta
versions
cel
plp
var
umi
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.
meta
plp
n_sample
meta
versions
result
vcf
lmix
singlet_result
singlet_vcf
A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools
Extension of Porechop whose purpose is to process adapter sequences in ONT reads.
meta
reads
meta
versions
reads
log
Adapter removal and demultiplexing of Oxford Nanopore reads
meta
reads
meta
versions
reads
log
Adapter removal and demultiplexing of Oxford Nanopore reads
Software for predicting library complexity and genome coverage in high-throughput sequencing
meta
bam
meta
versions
ccurve
log
Software for predicting library complexity and genome coverage in high-throughput sequencing
Software for predicting library complexity and genome coverage in high-throughput sequencing
meta
bam
meta
versions
lc_extrap
log
Software for predicting library complexity and genome coverage in high-throughput sequencing
Calculate pairwise nucleotide identity with respect to a reference sequence
meta
fasta
meta2
reference
compress
versions
valid_fasta
valid_fasta
report
log
Filter reads by quality score.
meta
reads
meta
versions
reads
logs
log_tab
A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Converts SAM/BAM/CRAM/pairs into a genome contact map
meta
input
input
input
input
meta
versions
pretext
A module to generate images from Pretext contact maps.
meta
pretext_map
meta
versions
image
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data
meta
reads
meta
versions
good_reads
single_reads
bad_reads
log
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
meta
genome
output_format
meta
versions
nucleotide_fasta
amino_acid_fasta
all_gene_annotations
gene_annotations
Whole genome annotation of small genomes (bacterial, archaeal, viral)
meta
fasta
proteins
prodigal_tf
meta
versions
gff
gbk
fna
faa
ffn
sqn
fsa
tbl
err
log
txt
tsv
Perform Gene Ratio Enrichment Analysis
meta
meta2
adj
gmt
meta
enrichedGO
versions
Gene Ratio Enrichment Analysis
Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)
meta
count
meta
logratio
session_info
versions
Logratio methods for omics data
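To make the two transformations concrete, here is a minimal sketch for a single sample (one row of the count matrix). The function names are hypothetical, and zero handling is reduced to raising an error; in practice zeros would need to be replaced before taking logs.

```python
import math
from typing import List


def clr(counts: List[float]) -> List[float]:
    """Centered logratio: log of each part minus the mean log across parts."""
    if any(value <= 0 for value in counts):
        raise ValueError("CLR needs strictly positive values; replace zeros first")
    logs = [math.log(value) for value in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(counts: List[float], reference_index: int) -> List[float]:
    """Additive logratio: log of each part relative to one chosen reference part."""
    if any(value <= 0 for value in counts):
        raise ValueError("ALR needs strictly positive values; replace zeros first")
    reference_log = math.log(counts[reference_index])
    return [
        math.log(value) - reference_log
        for index, value in enumerate(counts)
        if index != reference_index  # the reference part itself is dropped
    ]


sample = [10.0, 100.0, 1000.0]
clr_values = clr(sample)                     # sums to zero by construction
alr_values = alr(sample, reference_index=0)  # one value fewer than the input
```

CLR keeps every feature and centers each sample on its geometric mean, while ALR drops the reference feature and expresses the rest relative to it.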
Perform differential proportionality analysis
meta
count
meta2
samplesheet
meta
propd
results
fdr
adj
warnings
session_info
versions
Logratio methods for omics data
Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.
meta
count
meta
propr
matrix
fdr
adj
session_info
versions
Logratio methods for omics data
Efficient Estimation of Covariance and (Partial) Correlation
Proteinortho is a tool to detect orthologous genes within different species.
meta
fasta_files
versions
orthologgroups
orthologgraph
blastgraph
Reads a MaxQuant proteinGroups file with Proteus
meta
samplesheet
intensities
meta2
contrast_variable
dendro_plot
mean_var_plot
raw_dist_plot
norm_dist_plot
raw_rdata
norm_rdata
raw_tab
norm_tab
session_info
versions
R package for analysing proteomics data
Calculate interval coverage for each sample. N.B. the tool cannot handle input files staged as symlinks, so stageInMode should be set to 'link'.
meta
bam
bai
intervals
meta
txt
png
loess_qc_txt
loess_txt
versions
Copy number calling and SNV classification using targeted short read sequencing
Generate on and off-target intervals for PureCN from a list of targets
meta
target_bed
meta2
fasta
genome
meta
txt
bed
versions
Copy number calling and SNV classification using targeted short read sequencing
Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.
meta
coverage_files
normal_vcf
genome
assay
rds
png
bias_rds
bias_bed
low_cov_bed
versions
Copy number calling and SNV classification using targeted short read sequencing
Run PureCN workflow to normalize, segment and determine purity and ploidy
meta
intervals
coverage
normaldb
genome
pdf
local_optima_pdf
seg
genes_csv
amplification_pvalues_csv
vcf_gz
variants_csv
loh_csv
chr_pdf
segmentation_pdf
multisample.seg
versions
Copy number calling and SNV classification using targeted short read sequencing
Calculate coverage cutoffs to determine when to purge duplicated sequence.
meta
stat
meta
versions
cutoff
log
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Separates out sequences purged of falsely duplicated sequences.
meta
assembly
bed
meta
versions
haplotigs
purged
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Plots the read coverage from a purge dups statistics file and cutoffs.
meta
statfile
cutoff
meta
versions
png
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Create read depth histogram and base-level read depth for an assembly based on PacBio data
meta
paf_alignment
meta
versions
stat
basecov
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Purge haplotigs and overlaps for an assembly
meta
basecov
cutoff
paf
meta
versions
bed
log
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
Split fasta file by 'N's to aid in self alignment for duplicate purging
meta
assembly
meta
versions
split_fasta
Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
write your description here
meta
summary
meta
html
json
versions
Damage parameter estimation for ancient DNA
meta
bam
bai
meta
versions
csv
Damage parameter estimation for ancient DNA
Damage parameter estimation for ancient DNA
meta
csv
meta
versions
csv
Damage parameter estimation for ancient DNA
Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.
meta
fasta
output_format
meta
versions
annotations
faa
fna
score
Evaluate alignment data
meta
bam
gff
meta
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
meta
bacramm
gff
fasta
meta
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
Evaluate alignment data
meta
bam
meta2
gtf
meta
results
versions
Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.
meta
bams
bais
reference_haplotype_file
reference_legend_file
chr
regions_start
regions_end
buffer
ngen
genetic_map_file
meta2
posfile
phasefile
meta3
fasta
meta
versions
vcf
tbi
rdata
plots
Read aware low coverage whole genome sequence imputation from a reference panel
Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.
alignment
versions
phylogeny
stockholm_alignment
Randomly subsample sequencing reads to a specified coverage
meta
reads
genome_size
depth_cutoff
meta
versions
reads
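Subsampling to a target coverage can be pictured as picking reads at random until their combined length reaches genome size times the desired depth. The sketch below illustrates that idea only; the function name, the `seed` parameter, and the stopping rule are assumptions for this example and not the tool's actual algorithm.

```python
import random
from typing import List


def subsample_to_coverage(
    read_lengths: List[int], genome_size: int, coverage: float, seed: int = 0
) -> List[int]:
    """Return sorted indices of randomly chosen reads.

    Reads are drawn in random order until their total length reaches
    genome_size * coverage; if the input is too small, every read is kept.
    """
    target_bases = genome_size * coverage
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded so the example is reproducible
    chosen = []
    total = 0
    for index in order:
        if total >= target_bases:
            break
        chosen.append(index)
        total += read_lengths[index]
    return sorted(chosen)


lengths = [100] * 50  # fifty reads of 100 bases each
selected = subsample_to_coverage(lengths, genome_size=1000, coverage=2)
```

With these numbers the target is 2000 bases, so twenty of the fifty reads are kept.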
De novo genome assembler for long uncorrected reads.
meta
reads
meta
versions
fasta
gfa
RAxML-NG is a phylogenetic tree inference tool which uses maximum-likelihood (ML) optimality criterion.
alignment
versions
phylogeny
phylogeny_bootstrapped
Create a database for RepeatModeler
meta
fasta
meta
db
versions
RepeatModeler is a de-novo repeat family identification and modeling package.
Performs de novo transposable element (TE) family identification with RepeatModeler
meta
db
meta
fasta
stk
log
versions
RepeatModeler is a de-novo repeat family identification and modeling package.
ResFinder identifies acquired antimicrobial resistance genes in total or partial sequenced isolates of bacteria
meta
fastq
fasta
db_point
db_res
meta
json
disinfinder_kma
pheno_table_species
pheno_table
pointfinder_kma
pointfinder_prediction
pointfinder_results
pointfinder_table
resfinder_blast
resfinder_hit_in_genome_seq
resfinder_kma
resfinder_resistance_gene_seq
resfinder_results_table
resfinder_results_tab
resfinder_results
versions
ResFinder identifies acquired antimicrobial resistance genes in total or partial sequenced isolates of bacteria
Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data
card
versions
db
tool_version
db_version
This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.
Predict antibiotic resistance from protein or nucleotide data
meta
fasta
card
wildcard
meta
versions
json
tsv
tmp
tool_version
db_version
This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website
Markup VCF file using rho-calls.
meta
meta2
vcf
tbi
roh
bed
meta
vcf
versions
Call regions of homozygosity and make tentative UPD calls.
Call regions of homozygosity and make tentative UPD calls
meta
vcf
roh
meta
versions
bed
wig
Call regions of homozygosity and make tentative UPD calls.
Quality control of riboseq bam data
meta
bam_ribo
bai_ribo
meta2
bam_ti
bai_ti
meta3
fasta
gtf
meta4
candidate_orfs
meta5
para_ribo
meta6
para_ribo
meta
predictions
all
transprofile
versions
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Quality control of riboseq bam data
meta
bam
bai
meta2
gtf
meta
txt
pdf
offset
versions
Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.
Accurate detection of short and long active ORFs using Ribo-seq data
meta
bam_ribo
bai_ribo
meta2
candidate_orfs
meta
protocol
bam_summary
read_length_dist
metagene_profile_5p
metagene_profile_3p
metagene_plots
psite_offsets
pos_wig
neg_wig
orfs
versions
Python package to detect translating ORF from Ribo-seq data
Accurate detection of short and long active ORFs using Ribo-seq data
meta
fasta
gtf
meta
candidate_orfs
versions
Python package to detect translating ORF from Ribo-seq data
Calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data
meta
meta2
meta3
bam
gtf
fasta
meta
best_offset
offset
offset_plot
psites
codon_coverage_rpf
codon_coverage_psite
cds_coverage
cds_window_coverage
ribowaltz_qc
versions
Render an rmarkdown notebook. Supports parametrization.
meta
notebook
parameters
input_files
meta
report
session_info
versions
Dynamic Documents for R
Calculate pan-genome from annotated bacterial assemblies in GFF3 format
meta
gff
meta
versions
results
aln
Ribosomal RNA extraction from a GTF file.
gtf
versions
rrna_gtf
Calculate expression with RSEM
meta
reads
index
counts_gene
counts_transctips
stat
logs
versions
bam_star
bam_genome
bam_transcript
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Prepare a reference genome for RSEM
fasta
gtf
rsem
transcript_fasta
versions
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Generate statistics from a bam file
meta
bam
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Infer strandedness from sequencing reads
meta
bam
bed
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate inner distance between read pairs.
meta
bam
bed
distance
freq
mean
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to a reference gene model
meta
bam
bed
bed
interact_bed
xls
pdf
events_pdf
rscript
log
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Compare detected splice junctions to a reference gene model
meta
bam
bed
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate how mapped reads are distributed over genomic features
meta
bam
bed
txt
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate read duplication rate
meta
bam
bed
seq_xls
pos_xls
pdf
rscript
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Calculate TIN (transcript integrity number) from RNA-seq reads
meta
bam
bai
bed
txt
xls
versions
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
Converts a PED file to VCF headers
meta
input
meta
versions
output
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Plot ROC curves from vcfeval ROC data files, either to an image or to an interactive GUI. The interactive GUI isn't available in Nextflow.
meta
input
meta
versions
png
svg
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set
meta
query_vcf
query_vcf_index
truth_vcf
truth_vcf_index
truth_bed
evaluation_bed
sdf
meta
versions
tp_vcf
tp_tbi
fn_vcf
fn_tbi
fp_vcf
fp_tbi
baseline_vcf
baseline_tbi
snp_roc
non_snp_roc
weighted_roc
summary
phasing
RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation
Uses the RTN R package for transcriptional regulatory network inference (TNI).
expression_matrix
tni
tni_perm
tni_bootstrap
tni_filtered
versions
RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons
Sage is search software for proteomics data
meta
"*.mzML"
meta2
fasta_proteome
meta3
base_config
meta
versions
results_tsv
results_json
results_pin
tmt_tsv
lfq_tsv
Proteomics searching so fast it feels like magic.
Create index for salmon
genome_fasta
transcriptome_fasta
index
versions
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
Gene/transcript quantification with Salmon
meta
reads
index
gtf
transcript_fasta
alignment_mode
lib_type
results
json_info
versions
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data
SALSA, a tool to scaffold long-read assemblies with Hi-C
meta
fasta
index
bed
meta
fasta
agp
agp_original_coordinates
versions
Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files
meta
bam
bai
database
meta
versions
csv
json
bam
Lowest Common Ancestor on SAM/BAM/CRAM alignment files
Outputs some statistics drawn from read flags.
meta
bam
meta
versions
stats
Tools for working with SAM/BAM data
Find and mark duplicate reads in a BAM file
meta
bam
meta
versions
bam
bai
process your BAM data faster!
This module combines samtools and samblaster in order to use samblaster's ability to filter or tag SAM files while keeping both input and output in BAM format. Samblaster input must contain a sequence header, so the input is piped from the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.
meta
bam
meta
versions
bam
Clips read alignments where they match BED file defined regions
meta
bam
bed
save_cliprejects
save_clipstats
meta
versions
bam
stats
rejects_bam
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
meta
inputbam
split
meta
versions
reads
Tools for dealing with SAM, BAM and CRAM files
Calculates MD and NM tags
meta
bam
fasta
meta
versions
bam
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Concatenate BAM or CRAM files
meta
input_files
meta
bam
cram
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Shuffles and groups reads together by their names
meta
input
meta
versions
output
Tools for dealing with SAM, BAM and CRAM files
The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format
meta
input
meta2
fasta
interleave
meta
fastq
fastq_interleaved
fastq_other
fastq_singleton
versions
Tools for dealing with SAM, BAM and CRAM files
Convert and then index a CRAM -> BAM or BAM -> CRAM file
meta
input
index
fasta
meta
bam
cram
bai
crai
version
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces a histogram or table of coverage per chromosome
meta
input
input_index
meta2
fasta
fai
meta
versions
coverage
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
List CRAM Content-ID and Data-Series sizes
meta
cram
meta
size
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Computes the depth at each position or region.
meta1
bam
meta2
intervals
meta1
versions
tsv
Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region
Create a sequence dictionary file from a FASTA file
meta
fasta
meta
dict
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index FASTA file
meta
fasta
meta2
fai
meta
fa
fai
gzi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Converts a SAM/BAM/CRAM file to FASTA
meta
input
interleave
meta
versions
fasta
interleaved
singleton
other
Tools for dealing with SAM, BAM and CRAM files
Converts a SAM/BAM/CRAM file to FASTQ
meta
input
interleave
meta
versions
fastq
interleaved
singleton
other
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.
meta
bam
meta
versions
bam
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
meta
bam
bai
meta
flagstat
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
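To make "counts per FLAG type" concrete, here is a minimal sketch that tallies how often each bit of the SAM FLAG field is set. The bit values follow the SAM specification; the category names and the `count_flags` helper are illustrative only, and the real flagstat report uses its own categories and layout.

```python
from collections import Counter

# Bit values as defined for the FLAG field in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def count_flags(flags):
    """Tally how many records have each FLAG bit set (hypothetical helper)."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        for bit, name in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_flags([99, 147, 4]).items():
        print(name, n, sep="\t")
```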
filter/convert SAM/BAM/CRAM file
meta
input
meta
readgroup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Reports alignment summary statistics for a BAM/CRAM/SAM file
meta
bam
bai
meta
idxstats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
converts FASTQ files to unmapped SAM/BAM/CRAM
meta
reads
meta
versions
sam
bam
cram
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Index SAM/BAM/CRAM file
meta
bam
meta
bai
crai
csi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
mark duplicate alignments in a coordinate sorted file
meta
input
fasta
meta2
meta
versions
output
Tools for dealing with SAM, BAM and CRAM files
Merge BAM or CRAM file
meta
input_files
meta2
fasta
meta3
fai
meta
bam
cram
versions
csi
crai
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Generates a pileup (mpileup) from BAM input
meta
input
fasta
intervals
meta
mpileup
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
meta
bam
meta
versions
bam
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file
meta
input
meta2
fasta
meta
bam
cram
csi
crai
metrics
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Sort SAM/BAM/CRAM file
meta
bam
meta2
fasta
meta
bam
cram
crai
csi
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Produces comprehensive statistics from SAM/BAM/CRAM file
meta
input
input_index
meta2
fasta
meta
stats
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
filter/convert SAM/BAM/CRAM file
meta
input
index
meta2
fasta
qname
meta
bam
cram
sam
bai
csi
crai
unselected
unselected_index
versions
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
SCIMAP is a suite of tools that enables spatial single-cell analyses
meta
cellByFeature
meta
versions
annotedDataCsv
annotedDataH5ad
Scimap is a scalable toolkit for analyzing spatial molecular data.
SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.
meta
phenotyped
meta
versions
spatial_lda_output
composition_plot
motif_location_plot
Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.
Use pangenome outputs for GWAS
meta
genes
traits
tree
meta
versions
csv
The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier
meta
clusters
fasta
mei_ref
meta
versions
meis_tab
dels_tab
vcf
Soft Clipped Read Alignment Mapper
The cluster_identifier tool of Scramble identifies soft clipped clusters
meta
input
input_index
fasta
meta
versions
clusters
Soft Clipped Read Alignment Mapper
Call peaks using SEACR on sequenced reads in bedgraph format
meta
bedgraph
ctrlbedgraph
threshold
meta
bed
versions
SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
meta
reads
fasta
index
meta
alignment
trans_alignments
single_bed
multi_bed
versions
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
Generate genome indices for segemehl align
fasta
index
versions
A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection
metagenomic binning with self-supervised learning
meta
bam
fasta
meta
versions
csv
h5
output_prerecluster_bins
output_recluster_bins
tsv
Metagenomic binning with semi-supervised siamese neural network
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced by the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
meta
vcf
vcf_tbi
recal
recal_index
tranches
meta2
fasta
meta3
fai
meta
vcf
tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Create BWA index for reference genome
meta
fasta
meta
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
meta
reads
meta2
index
meta3
fasta
meta4
fasta_fai
meta
bam
bai
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects multiple quality metrics from a bam file
meta
meta2
meta3
bam
bai
fasta
fai
meta
versions
mq_metrics
qd_metrics
gc_summary
gc_metrics
aln_metrics
is_metrics
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.
meta
bam
bai
meta2
fasta
meta3
fasta_fai
meta
cram
crai
bam
bai
score
metrics
metrics_multiqc_tsv
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
modifies the input VCF file by adding the MLrejected FILTER to the variants
meta
meta2
meta3
meta4
vcf
idx
fasta
fai
ml_model
meta
versions
vcf
index
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
DNAscope algorithm performs an improved version of Haplotype variant calling.
meta
bam
bai
intervals
meta2
fasta
meta3
fai
meta4
dbsnp
meta5
dbsnp_tbi
meta6
ml_model
ml_model
pcr_indel_model
emit_vcf
emit_gvcf
meta
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
meta
gvcfs
tbis
intervals
fasta
fai
dbsnp
dbsnp_tbi
meta
vcf
tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Runs Sentieon's haplotyper for germline variant calling.
meta
input
input_index
intervals
fasta
fai
dbsnp
dbsnp_tbi
emit_vcf
emit_gvcf
meta
vcf
vcf_tbi
gvcf
gvcf_tbi
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.
meta
meta2
meta3
input
index
fasta
fai
meta
output
index
output_index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Filters the raw output of sentieon/tnhaplotyper2.
meta
meta2
meta3
vcf
vcf_tbi
stats
contamination
segments
orientation_priors
fasta
fai
vcf
vcf_tbi
stats
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.
meta
meta2
meta3
meta4
meta5
meta6
meta7
meta8
input
input_index
intervals
dict
fasta
fai
germline_resource
germline_resource_tbi
panel_of_normals
panel_of_normals_tbi
emit_orientation_data
emit_contamination_data
meta
orientation_data
contamination_data
contamination_segments
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
meta
meta2
meta3
meta4
meta5
meta6
meta7
bam
bai
fasta
fai
cosmic
cosmic_tbi
pon
pon_tbi
dbsnp
dbsnp_tbi
interval
meta
vcf
index
versions
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
meta
vcf
tbi
resource_vcf
resource_tbi
labels
fasta
fai
recal
idx
tranches
plots
version
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Collects whole genome quality metrics from a bam file
meta
meta2
meta3
meta4
bam
bai
fasta
fai
interval_list
meta
versions
wgs_metrics
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.
meta
fastas
meta
versions
fasta
DNA sequence utilities for FASTX files
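The dereplication behaviour described above, including summing the counts already present in headers, can be sketched as follows. The `;size=N` header suffix and the `dereplicate` function are assumptions made for this illustration; the tool's actual header annotation may differ.

```python
import re

# Hypothetical count annotation at the end of a sequence name, e.g. "seq1;size=3".
SIZE_RE = re.compile(r";size=(\d+)$")


def dereplicate(records):
    """Collapse identical sequences and sum their counts.

    `records` is an iterable of (name, sequence) pairs. Sequences are
    compared case-insensitively; the first name seen is kept.
    """
    totals = {}  # uppercased sequence -> [first name, summed count]
    for name, seq in records:
        match = SIZE_RE.search(name)
        count = int(match.group(1)) if match else 1
        base_name = SIZE_RE.sub("", name)
        key = seq.upper()
        if key in totals:
            totals[key][1] += count
        else:
            totals[key] = [base_name, count]
    return [(f"{name};size={n}", seq) for seq, (name, n) in totals.items()]


if __name__ == "__main__":
    reads = [("a", "ACGT"), ("b", "acgt"), ("c", "TTTT"), ("d;size=3", "TTTT")]
    for name, seq in dereplicate(reads):
        print(f">{name}\n{seq}")
```

Running the output through `dereplicate` again leaves the counts unchanged, which mirrors the "already dereplicated" case.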
Statistics for FASTA or FASTQ files
meta
files
meta
versions
stats
multiqc
Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files
Concatenating multiple uncompressed sequence files together
meta
input
meta
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Convert FASTQ to FASTA format
meta
fastq
meta
versions
fasta
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Select sequences from a large file based on name/ID
meta
sequence
pattern
meta
versions
filter
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
match up paired-end reads from two fastq files
meta
reads
meta
versions
reads
unpaired_reads
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Use seqkit to find/replace strings within sequences and sequence headers
meta
fastx
meta
versions
fastx
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
meta
fastx
meta
fastx
log
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)
meta
fastx
meta
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Use seqkit to generate sliding windows of input fasta
meta
fastx
meta
fastx
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
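A minimal sketch of the sliding-window idea, assuming a fixed window size and step and dropping any trailing partial window. The `sliding_windows` function is hypothetical and does not reproduce seqkit's options or output naming.

```python
def sliding_windows(seq, window, step):
    """Return (0-based start, subsequence) pairs for each full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    for start, sub in sliding_windows("ACGTACGT", window=4, step=2):
        print(start, sub, sep="\t")
```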
Sorts sequences by id/name/sequence/length
meta
fastx
meta
fastx
versions
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Split single or paired-end fastq.gz files
meta
reads
meta
reads
versions
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
simple statistics of FASTA/Q files
meta
reads
meta
versions
stats
Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
Salmonella serotype prediction from reads and assemblies
meta
seqs
meta
versions
log
tsv
txt
Generates a BED file containing genomic locations of lengths of N.
meta
fasta
meta
bed
versions
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
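Locating runs of N in a sequence and reporting them as BED intervals can be sketched like this. The `n_runs_to_bed` function and its `min_len` parameter are hypothetical and only illustrate the idea; they do not mirror seqtk's own options.

```python
import re


def n_runs_to_bed(name, sequence, min_len=1):
    """Return BED-style (name, start, end) tuples for runs of N.

    Coordinates are 0-based and half-open, as in the BED format.
    Runs shorter than `min_len` are skipped.
    """
    return [
        (name, m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    for chrom, start, end in n_runs_to_bed("chr1", "ACGNNNTTnnA"):
        print(chrom, start, end, sep="\t")
```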
Interleave paired-end reads from FastQ files
meta
reads
meta
versions
reads
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
Rename sequence names in FASTQ or FASTA files.
meta
sequences
meta
versions
sequences
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.
Subsample reads from FASTQ files
meta
reads
sample_size
meta
versions
reads
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
Common transformation operations on FASTA or FASTQ files.
meta
sequences
meta
versions
sequences
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.
Select only sequences that match the filtering condition
sequences
filter_list
versions
sequences
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Trim low quality bases from FastQ files
meta
reads
meta
versions
reads
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format
PileupCaller is a tool to create genotype calls from bam files using read-sampling methods
meta
mpileup
snpfile
calling_method
output_format
meta
versions
eigenstrat
plink
freqsum
Tools for population genetics on sequencing data
Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file
meta
normalbam
tumourbam
fasta
wigfile
meta
versions
seqz
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.
Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.
meta
fasta
meta
versions
wig
Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format.
Induce a variation graph in GFA format from alignments in PAF format
meta
paf
fasta
meta
gfa
versions
seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.
Determine Streptococcus pneumoniae serotype from Illumina paired-end reads
meta
reads
meta
versions
tsv
txt
SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.
Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.
meta
depth
meta
versions
json
tsv
Demultiplex bgzip'd fastq files
meta
sample_sheet
fastqs_dir
meta
versions
sample_fastq
metrics
most_frequent_unmatched
per_project_metrics
per_sample_metrics
sample_barcode_hop_metrics
Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.
meta
input_list
input_list_index
meta
versions
merged_variants
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.
meta
input
input_index
pedigree
region
reference
reference_index
scaffold
scaffold_index
map
meta
phased_variants
versions
Fast and accurate method for estimation of haplotypes (phasing)
Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires the AVX2 CPU feature.
meta
input_plain
input_plain_index
input_region
pedigree
scaffold
scaffold_index
scaffold_region
map
meta
phased_variants
versions
Fast and accurate method for estimation of haplotypes (phasing)
Program to compute switch error rate and genotyping error rate given simulated or trio data.
meta
estimate
estimate_index
region
pedigree
truth
truth_index
freq
freq_index
meta
versions
errors
Fast and accurate method for estimation of haplotypes (phasing)
The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so the assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.
meta
reads
meta
versions
assembly
gfa
results
Determine Shigella serotype from Illumina or Oxford Nanopore reads
meta
reads
meta
versions
tsv
hits
Determine Shigella serotype from assemblies or Illumina paired-end reads
meta
seqs
meta
versions
tsv
build and deploy Shiny apps for interactively mining differential abundance data
meta
meta2
sample
feature_meta
assay_files
contrasts
differential_results
meta
data
app
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make plots for interpretation of differential abundance statistics
meta
meta2
differential_results
sample
feature_meta
assay_file
meta
volcanos_png
volcanos_html
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots
meta
sample
feature_meta
assay_files
boxplots_png
boxplots_html
densities_png
densities_html
pca2d_png
pca2d_html
pca3d_png
pca3d_html
mad_png
mad_dendro
dendro
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
validate consistency of feature and sample annotations with matrices and contrasts
meta
meta2
meta3
meta4
sample
feature_meta
assay_files
contrasts
meta
sample_meta
feature_meta
assays
contrasts
versions
Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.
A windowed adaptive trimming tool for FASTQ files using quality
meta
reads
qual_type
meta
single_trimmed
paired1_trimmed
paired2_trimmed
log
versions
Indexing of transcriptome for gene expression quantification using SimpleAF
meta
genome_fasta
meta2
genome_gtf
meta3
transcript_fasta
meta
index
transcript_tsv
salmon
versions
SimpleAF is a tool for quantification of gene expression from RNA-seq data
simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.
meta
reads
meta2
index
meta3
txp2gene
chemistry
meta4
whitelist
meta
alevin_results
versions
SimpleAF is a tool for quantification of gene expression from RNA-seq data
Serovar prediction of salmonella assemblies
meta
fasta
meta
versions
tsv
allele_json
allele_fasta
cgmlst_csv
Fast, efficient, lossless compression of FASTQ files.
meta
fastq
meta
versions
sfq
tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.
bam
bai
meta
meta
run_metrics
smncopynumber
versions
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.
meta
input
index
exclude_beds
fasta
fai
meta
versions
vcf
structural variant calling and genotyping with existing tools, but, smoothly
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on input snakefile. Expect many limitations.
meta
inputs
meta2
snakefile
meta
outputs
snakemake_dir
versions
Create a SNAP index for reference genome
meta2
fasta
altcontigfile
nonaltcontigfile
altliftoverfile
index
versions
Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
structural-variant calling with sniffles
meta
bam
bai
meta2
fasta
meta
vcf
snf
versions
Core-SNP alignment from Snippy outputs
meta
vcf
aligned_fa
reference
meta
versions
aln
full_aln
tab
vcf
txt
Rapid bacterial SNP calling and core genome alignments
Rapid haploid variant calling
meta
reads
index
meta
versions
tab
csv
html
vcf
bed
gff
bam
bai
log
aligned_fa
consensus_fa
consensus_subs_fa
raw_vcf
filt_vcf
vcf_gz
vcf_csi
txt
Rapid bacterial SNP calling and core genome alignments
Pairwise SNP distance matrix from a FASTA sequence alignment
meta
alignment
meta
tsv
versions
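A pairwise SNP distance matrix can be sketched as below. This is an illustration under stated assumptions, not the tool's implementation: only columns where both sequences carry A, C, G or T are compared (gaps and ambiguity codes are ignored), and the function names and dict-of-dicts layout are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(a, b):
    """Count aligned columns where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def distance_matrix(alignment):
    """Build a symmetric matrix from a {name: aligned sequence} mapping."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


if __name__ == "__main__":
    aln = {"s1": "ACGTN", "s2": "ACGAA", "s3": "TCGA-"}
    m = distance_matrix(aln)
    print("\t".join(["", *aln]))
    for name in aln:
        print("\t".join([name, *(str(m[name][other]) for other in aln)]))
```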
Genetic variant annotation and functional effect prediction toolbox
meta
vcf
db
cache
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Genetic variant annotation and functional effect prediction toolbox
meta
vcf
db
cache
vcf
report
summary_html
genes_txt
versions
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Annotate a VCF file with another VCF file
meta
vcf
vcf_tbi
meta2
database
dbs_tbi
meta
versions
vcf
SnpSift is a toolbox that allows you to filter and manipulate annotated files
The dbNSFP is an integrated database of functional predictions from multiple algorithms
meta
vcf
vcf_tbi
meta2
database
dbs_tbi
meta
vcf
versions
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Splits/Joins VCF(s) file into chromosomes
meta
vcf
meta
versions
out_vcfs
SnpSift is a toolbox that allows you to filter and manipulate annotated files
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
meta
query_somalier_files
meta2
labelled_somalier_files
labels_tsv
meta
versions
tsv
html
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
meta
input
input_index
meta2
fasta
meta3
fai
meta4
sites
meta
versions
extract
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
meta
extract
ped
sample_groups
versions
html
pairs_tsv
samples_tsv
Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs
Local sequence alignment tool for filtering, mapping and clustering.
meta
reads
meta2
fastas
meta3
index
meta
reads
log
meta2
index
versions
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts aligned and rejected reads into two separate files. Additional applications include clustering and taxonomic assignment, available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Compare many FracMinHash signatures generated by sourmash sketch.
meta
signatures
file_list
save_numpy_matrix
save_csv
meta
versions
matrix
csv
labels
Compute and compare FracMinHash signatures for DNA and protein data sets.
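The idea behind comparing FracMinHash signatures can be sketched as follows: hash every k-mer, keep only hashes below a fixed fraction of the hash space, and compare the retained sets with Jaccard similarity. This is a simplified illustration, not sourmash's implementation. It swaps in SHA-1 from the standard library as the hash function and does not canonicalise k-mers against their reverse complement; all function names and defaults are hypothetical.

```python
import hashlib

MAX_HASH = 2 ** 64  # size of the 64-bit hash space used in this sketch


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 prefix, a stand-in hash)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def frac_minhash(seq, k=21, scaled=1000):
    """Keep hashes below MAX_HASH / scaled, i.e. roughly 1/scaled of all k-mers."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def jaccard(a, b):
    """Jaccard similarity of two hash sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    s1 = frac_minhash("ACGTACGTTGCA", k=4, scaled=1)
    s2 = frac_minhash("ACGTACGTAAAA", k=4, scaled=1)
    print(round(jaccard(s1, s2), 3))
```

Because the threshold is fixed rather than the sketch size, sketches built with the same `k` and `scaled` stay comparable across inputs of different sizes.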
Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.
meta
signature
db
save_unassigned
save_matches_sig
save_prefetch
save_prefetch_csv
meta
versions
result
matches
unassigned
prefetch
prefetchcsv
Compute and compare FracMinHash signatures for DNA data sets.
Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.
meta
signatures
meta
versions
signature_index
Compute and compare FracMinHash signatures for DNA data sets.
Create a signature (a group of FracMinHash sketches) of a sequence using sourmash
meta
sequence
meta
versions
signatures
Compute and compare FracMinHash signatures for DNA and protein data sets.
Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.
meta
gather_results
taxonomy
meta
result
versions
Compute and compare FracMinHash signatures for DNA data sets.
Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data
meta
reads
image
cytaimage
darkimage
colorizedimage
alignment
slidefile
reference
probeset
meta
outs
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.
gtf
gtf
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.
fasta
gtf
reference_name
reference
versions
Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.
Assembles a small genome (bacterial, fungal, viral)
meta
illumina
pacbio
nanopore
yml
hmm
meta
scaffolds
contigs
transcripts
gene_clusters
gfa
log
log
versions
Fast, efficient, lossless compression of FASTQ files.
meta
fastq1
fastq2
meta
versions
spring
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Fast, efficient, lossless decompression of FASTQ files.
meta
spring
write_one_fastq_gz
meta
versions
fastq
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).
meta
sra
ncbi_settings
certificate
meta
versions
reads
SRA Toolkit and SDK from NCBI
Download sequencing data from the NCBI Sequence Read Archive (SRA).
meta
id
ncbi_settings
certificate
meta
sra
versions
SRA Toolkit and SDK from NCBI
Test for the presence of suitable NCBI settings or create them on the fly.
NO input
versions
ncbi_settings
SRA Toolkit and SDK from NCBI
Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.
meta
fasta
db
meta
versions
txt
txt
txt
bam
pileup
Short Read Sequence Typing for Bacterial Pathogens
Serotype prediction of Streptococcus suis assemblies
meta
fasta
meta
versions
tsv
Advanced sequence file format conversions
meta
reads
fasta
fai
gzi
meta
versions
reads
gzi
Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.
Predicts Staphylococcus aureus SCCmec type based on primers.
meta
fasta
meta
versions
tsv
Align reads to a reference genome using STAR
meta
reads
meta2
index
meta3
gtf
star_ignore_sjdbgtf
seq_platform
seq_center
bam
log_final
log_out
log_progress
versions
bam_sorted
bam_transcript
bam_unsorted
fastq
tab
junction
wig
bedgraph
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Create index for STAR
meta
fasta
meta2
gtf
meta
index
versions
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
meta
genome_fasta
results_xlsx
summary_tsv
detailed_summary_tsv
resfinder_tsv
plasmidfinder_tsv
mlst_tsv
settings_txt
pointfinder_tsv
versions
Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.
Serotype STEC samples from paired-end reads or assemblies
meta
seqs
meta
versions
tsv
STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
meta
posfile
input
rdata
chromosome_name
K
nGen
meta2
collected_crams
collected_crais
cramlist
meta3
fasta
fasta_fai
meta
input
rdata
plots
vcf
bgen
versions
Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.
meta
vcf
meta2
variant_catalog
meta
versions
vcf
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
meta
input
input_index
fasta
fai
target_bed
target_bed_index
meta
vcf
vcf_tbi
genome_vcf
genome_vcf_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
meta
input_normal
input_index_normal
input_tumor
input_index_tumor
manta_candidate_small_indels
manta_candidate_small_indels_tbi
fasta
fai
target_bed
target_bed_index
meta
vcf_indels
vcf_indels_tbi
vcf_snvs
vcf_snvs_tbi
versions
Strelka calls somatic and germline small variants from mapped sequencing reads
Merges the annotation gtf file and the stringtie output gtf files
stringtie_gtf
annotation_gtf
merged_gtf
versions
Transcript assembly and quantification for RNA-Seq
Transcript assembly and quantification for RNA-Seq
meta
bam
annotation_gtf
meta
transcript_gtf
coverage_gtf
abudance
ballgown
versions
Transcript assembly and quantification for RNA-Seq
Count reads that map to genomic features
meta
bam
annotation
meta
counts
summary
versions
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.
SummarizedExperiment container
meta
matrix_files
meta2
rowdata
meta3
coldata
meta
rds
log
versions
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Converts a bedpe file to a VCF file (beta version)
meta
bedpe
meta
versions
vcf
Toolset for SV simulation, comparison and filtering
Filter a vcf file based on size and/or regions to ignore
meta
vcf
bed
minsv
maxsv
minallelefreq
minnumreads
meta
versions
vcf
Toolset for SV simulation, comparison and filtering
Compare or merge VCF files to generate a consensus or multi sample VCF files.
meta
vcfs
max_distance_breakpoints
min_supporting_callers
account_for_type
account_for_sv_strands
estimate_distanced_by_sv_size
min_sv_size
meta
versions
vcf
Toolset for SV simulation, comparison and filtering
Simulate an SV VCF file based on a reference genome
meta
fasta
meta2
fai
meta3
parameters
snp_mutation_frequency
sim_reads
meta
versions
parameters
vcf
bed
fasta
insertions
Toolset for SV simulation, comparison and filtering
Report multiple stats over a VCF file
meta
vcf
minsv
maxsv
minnumreads
meta
versions
stats
Toolset for SV simulation, comparison and filtering
SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements
meta
meta2
meta3
meta4
meta5
tumorbam
tummorbai
normalbam
normalbai
bwa_index
fasta
fasta_fai
dbsnp
dbsnp_tbi
regions
meta
versions
sv
indel
som_sv
som_indel
germ_sv
germ_indel
unfiltered_sv
unfiltered_indel
unfiltered_som_sv
unfiltered_som_indel
unfiltered_germ_sv
unfiltered_germ_indel
raw_calls
discordants
log
SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.
meta
meta2
meta3
test
test_tbi
truth
truth_tbi
fasta
bed
meta
versions
fns
fps
distances
log
report
SVanalyzer: tools for the analysis of structural variation in genomes
The merge module merges structural variants within one or more vcf files.
meta
priority
vcfs
meta
versions
vcf
structural variant database software
Query a structural variant database, using a vcf file as query
meta
in_occs
in_frqs
vcf
vcf_dbs
bedpe_dbs
meta
versions
out_occs
out_frqs
vcf
structural variant database software
Performs tests on BAF files
meta
bed
baf
baf_index
batch
meta
versions
metrics
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Count the instances of each SVTYPE observed in each sample in a VCF.
meta
vcf
meta
versions
counts
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert an RdTest-formatted bed to the standard VCF format.
meta
bed
samples
fasta_fai
meta
versions
vcf
tbi
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Convert SV calls to a standardized format.
args
meta
vcf
fasta_fai
meta
versions
standardized_vcf
Utilities for consolidating, filtering, resolving, and annotating structural variants.
Converts VCFs containing structural variants to BED format
meta
vcf
tbi
meta
versions
bed
Utilities for consolidating, filtering, resolving, and annotating structural variants.
SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data
meta
meta2
bam
vcf
fasta
fai
meta
versions
json
gt_vcf
relevant_bam
Compute genotype of structural variants based on breakpoint depth
SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample
meta
meta2
bam
bam_index
vcf
fasta
meta
versions
json
gt_vcf
Bayesian genotyper for structural variants
A tool to standardize VCF files from structural variant callers
meta
vcf
tbi
config
meta
versions
vcf
Compresses/decompresses files
meta
input
meta
output
gzi
versions
Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.
create tabix index from a sorted bgzip tab-delimited genome file
meta
tab
meta
tbi
csi
versions
Generic indexer for TAB-delimited genome position files.
Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA
meta
fast5
meta
versions
csv_gz
Convert taxon names to TaxIds
meta
name
names_txt
taxdb
meta
versions
tsv
A Cross-platform and Efficient NCBI Taxonomy Toolkit
Standardise and merge two or more taxonomic profiles into a single table
meta
profiles
profiler
format
taxonomy
samplesheet
meta
versions
merged_profiles
TAXonomic Profile Aggregation and STAndardisation
Standardise the output of a wide range of taxonomic profilers
meta
profile
profiler
format
taxonomy
meta
standardised_profile
versions
TAXonomic Profile Aggregation and STAndardisation
A tool to detect resistance and lineages of M. tuberculosis genomes
meta
reads
meta
versions
bam
csv
json
txt
vcf
Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data
Aligns sequences using T_COFFEE
meta
fasta
meta2
tree
meta3
template
accessory_informations
compress
meta
alignment
lib
versions
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Parallel implementation of the gzip algorithm.
Compares 2 alternative MSAs to evaluate them.
meta
msa
ref_msa
meta
versions
scores
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Computes the irmsd score for a given alignment and the structures.
meta
msa
template
structures
meta
versions
irmsd
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Reformats files with t-coffee
meta
fasta
meta
versions
formatted_file
A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.
Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.
meta
msa
lib
meta
versions
tcs
scores
A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence
Parallel implementation of the gzip algorithm.
Parses a Thermo RAW file containing mass spectra to an open file format
meta
raw
meta
versions
spectra
Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle
meta
fasta
meta
classifications
log
fasta
versions
Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.
Computes the coverage of different regions from the bam file.
meta
input
meta2
fasta
meta
cov
wig
versions
TIDDIT - structural variant calling.
Identify chromosomal rearrangements.
meta
input
input_index
meta2
fasta
meta3
bwa_index
meta
vcf
ploidy
versions
Search for structural variants.
tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).
meta
fasta
meta
explore_tsv
top_sequence
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
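The canonical form mentioned for tidk explore can be sketched in Python. The definition used here is an assumption, not taken from the tidk source: the canonical form is the lexicographically smallest string among all rotations of the repeat unit and all rotations of its reverse complement. This definition reproduces the TTAGG -> AACCT example given in the description.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def rotations(seq):
    """Return every cyclic rotation of seq."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit):
    """Smallest string among rotations of the unit and of its reverse complement."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(reverse_complement(unit)))
```

With this definition a repeat unit found on either strand, and starting at any offset within the repeat, maps to the same canonical string.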
Searches a genome for a telomere string such as TTAGGG
meta
fasta
string
meta
tsv
bedgraph
versions
tidk is a toolkit to identify and visualise telomeric repeats in genomes
Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA
meta
vcf
vcf_indels
reference
fai
vcf_output
meta
versions
fasta
vcf
ccf
log
This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.
A post sequencing QC tool for Oxford Nanopore sequencers
meta
seq_summary
fastq
bam
meta
report_data
report_html
plots_html
plotly_js
versions
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.
meta
fasta
meta
versions
pep
gff3
cds
dat
folder
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.
meta
fasta
fold
meta
versions
pep
gff3
cds
bed
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
Trim FastQ files using Trim Galore!
meta
reads
meta
reads
unpaired
html
zip
log
versions
Performs quality and adapter trimming on paired end and single end reads
meta
reads
meta
trimmed_reads
unpaired_reads
trim_log
out_log
summary
versions
Assembles a de novo transcriptome from RNAseq reads
meta
reads
meta
versions
transcript_fasta
log
Given baseline and comparison sets of variants, calculate the recall/precision/f-measure
meta
vcf
tbi
truth_vcf
tbi
bed
meta2
fasta
meta3
fai
meta
versions
fn_vcf
fn_tbi
fp_vcf
fp_tbi
tp_base_vcf
tp_base_tbi
tp_comp_vcf
tp_comp_tbi
summary
Structural variant comparison tool for VCFs
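The recall/precision/f-measure calculation named in the benchmarking module above can be written out as a short Python sketch. The mapping of counts to formulas is an assumption based on the listed outputs (tp_base, tp_comp, fp, fn) and standard definitions, not a copy of the tool's code; the function name is hypothetical.

```python
def benchmark_metrics(tp_base, tp_comp, fp, fn):
    """Compute recall, precision and F-measure from variant match counts.

    tp_base: baseline (truth) variants matched by a comparison variant
    tp_comp: comparison variants matched by a baseline variant
    fp:      comparison variants with no baseline match
    fn:      baseline variants with no comparison match
    """
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

Recall is measured against the baseline set and precision against the comparison set, which is why the two true-positive counts are kept separate.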
Over multiple vcfs, calculate their intersection/consistency.
meta
vcfs
meta
versions
consistency
Structural variant comparison tool for VCFs
Normalization of SVs into disjointed genomic regions
meta
vcf
meta
versions
vcf
Structural variant comparison tool for VCFs
Subsample a long-read sequencing fastq file for multiple assemblies
meta
reads
out_dir
meta
subreads
versions
Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support
meta
gtfs
hints_files
keep_gtfs
config
meta
versions
tsebra_gtf
tsebra_scores
Import transcript-level abundances and estimated counts for gene-level analysis packages
meta
quants
meta2
tx2gene
meta3
coldata
quant_type
meta
tpm_gene
counts_gene
counts_gene_length_scaled
counts_gene_scaled
lengths_gene
tpm_transcript
counts_transcript
lengths_transcript
versions
Remove lines from bed file that refer to off-chromosome locations.
meta
bedgraph
meta
versions
bedgraph
Remove lines from bed file that refer to off-chromosome locations.
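The clipping step described above can be sketched in Python as a simple filter. The function name, the in-memory inputs, and the choice to drop lines on chromosomes missing from the sizes table are assumptions made for illustration; the sketch also assumes the input contains only tab-separated data lines.

```python
def clip_bed(lines, chrom_sizes):
    """Keep BED/bedGraph lines whose interval lies within the chromosome.

    lines:       iterable of tab-separated lines (chrom, start, end, ...)
    chrom_sizes: dict mapping chromosome name to its length
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        # Assumption: lines on unknown chromosomes are dropped as well.
        if size is not None and start >= 0 and end <= size:
            kept.append(line)
    return kept
```

The chrom_sizes dictionary plays the role of the chromosome sizes file that the UCSC conversion modules below take as their sizes input.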
Convert a bedGraph file to bigWig format.
meta
bedgraph
sizes
meta
versions
bigwig
Convert a bedGraph file to bigWig format.
Convert file from bed to bigBed format
meta
bed
sizes
autosql
meta
versions
bigbed
Convert file from bed to bigBed format
compute average score of bigwig over bed file
meta
bed
bigwig
meta
versions
tab
Compute average score of big wig over each bed, which may have introns.
compute average score of bigwig over bed file
meta
gtf
meta
genepred
refflat
versions
Convert GTF files to GenePred format
convert between genome builds
meta
bed
meta
version
lifted
unlifted
Move annotations from one assembly to another
Convert ascii format wig file to binary big wig format
meta
wig
chromsizes
versions
bw
Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format
Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.
meta
fastq
meta
fastq
no_match_fastq
report
versions
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
meta
bam
bai
mode
meta
bam
log
versions
Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
meta
bam
bai
get_output_stats
meta
bam
log
tsv_edit_distance
tsv_per_umi
tsv_umi_per_position
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
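Deduplication by mapping coordinate and UMI, as described above, can be illustrated with a minimal Python sketch. It is deliberately simpler than UMI-tools: it treats only exactly identical UMIs at the same position as duplicates, whereas UMI-tools also offers methods that merge UMIs within a small edit distance. The Read record and the rule of keeping the read with the highest mapping quality are assumptions for this example.

```python
from collections import namedtuple

# Hypothetical minimal read record used only for this sketch.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi", "mapq"])


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, umi), preferring higher MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return list(best.values())
```

Reads that share a position but carry different UMIs are kept as separate molecules, which is the point of UMI-aware deduplication compared with purely coordinate-based duplicate marking.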
Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place
meta
reads
meta
reads
log
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
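The extraction step above (move the UMI into the read name, leave any sample barcode in the read) can be sketched in Python. The pattern convention used here, N for a UMI base and X for a base left in place, and the underscore separator follow the UMI-tools string barcode pattern as I understand it; treat both as assumptions, along with the function name.

```python
def extract_umi(name, sequence, quality, pattern):
    """Move UMI bases (pattern 'N') from the read start into the read name.

    Bases under 'X' in the pattern (e.g. a sample barcode) stay in the read.
    Returns the new (name, sequence, quality) triple.
    """
    if len(sequence) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    umi = "".join(b for b, p in zip(sequence, pattern) if p == "N")
    rest = len(pattern)
    new_seq = "".join(b for b, p in zip(sequence, pattern) if p != "N") + sequence[rest:]
    new_qual = "".join(q for q, p in zip(quality, pattern) if p != "N") + quality[rest:]
    # Append the UMI to the read identifier, before any FASTQ comment.
    read_id, sep, comment = name.partition(" ")
    return f"{read_id}_{umi}{sep}{comment}", new_seq, new_qual
```

For example, with the pattern NNNXX the first three bases become the UMI and the next two remain at the start of the read.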
Group reads based on their UMI and mapping coordinates
meta
bam
bai
create_bam
get_group_info
meta
bam
log
tsv
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Make the output from umi_tools dedup or group compatible with RSEM
meta
bam
bai
meta
bam
log
versions
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Assembles bacterial genomes
meta
shortreads
longreads
meta
versions
scaffolds
gfa
log
versions
Module to run UniverSC, an open-source pipeline to demultiplex and process single-cell RNA-Seq data
meta
reads
outs
versions
Extract files.
meta
archive
meta
files
versions
Extract tar.gz files.
Unzip ZIP archive files
meta
archive
meta
unzipped_archive
versions
Unzip ZIP archive files
meta
archive
meta
files
versions
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
Simple software to call UPD regions from germline exome/wgs trios.
meta
vcf
meta
versions
bed
The Java port of the VarDict variant caller
meta
bams
bais
bed
meta2
fasta
meta3
fasta_fai
meta
versions
vcf
Filtering, downsampling and profiling alignments in BAM/CRAM formats
meta
bam
meta
versions
bam
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
meta
normal_vcf
tumor_vcf
scenario
scenario_sample
meta
versions
vcf_gz
bcf_gz
vcf
bcf
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
meta
bam
meta2
fasta
meta3
fai
meta
alignment_properties_json
versions
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Obtains per-sample observations for the actual calling process with varlociraptor calls
meta
bam
candidates
alignment_json
meta2
fasta
meta3
fai
meta
versions
vcf_gz
bcf_gz
vcf
bcf
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Convert VCF with structural variations to CytoSure format
meta
meta2
meta3
meta4
sv_vcf
coverage_bed
cns
snv_vcf
blacklist_bed
meta
versions
cgh
If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields
meta
vcf
tbi
meta
versions
vcf
Command-line tools for manipulating VCF files
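Breaking a multi-allelic record into one line per allele, as described above, can be sketched in Python. This is an illustration under stated assumptions, not the vcflib implementation: it handles sites-only records (8 columns, no genotypes), and it treats an INFO field as allele-specific when it has exactly one comma-separated value per ALT allele, whereas a real tool would consult the header's Number=A declarations.

```python
def break_multiallelic(line):
    """Split one sites-only VCF data line into one line per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError("this sketch only handles sites-only records (8 columns)")
    alts = fields[4].split(",")
    info_items = [] if fields[7] == "." else fields[7].split(";")
    records = []
    for i, alt in enumerate(alts):
        new_info = []
        for item in info_items:
            key, sep, value = item.partition("=")
            values = value.split(",")
            # Heuristic (assumption): one value per ALT means allele-specific.
            if sep and len(alts) > 1 and len(values) == len(alts):
                new_info.append(f"{key}={values[i]}")
            else:
                new_info.append(item)
        out = fields[:4] + [alt] + fields[5:7] + [";".join(new_info) or "."]
        records.append("\t".join(out))
    return records
```

Fields with a single value, such as a total depth, are copied unchanged onto every output line.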
Command line tools for parsing and manipulating VCF files.
meta
vcf
tbi
meta
vcf
versions
Command line tools for parsing and manipulating VCF files.
Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.
meta
vcf
tbi
meta
versions
vcf
Command-line tools for manipulating VCF files
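Recomputing AC and NS from sample genotypes, as described above, comes down to counting alleles. The Python sketch below follows the usual VCF meaning of these fields (AC: count of each ALT allele in called genotypes; NS: number of samples with data); the function name and the list-of-strings input are assumptions for illustration.

```python
import re


def allele_counts(genotypes, n_alt):
    """Return (AC list, NS) from GT strings such as '0/1', '1|1' or './.'."""
    ac = [0] * n_alt
    ns = 0
    for gt in genotypes:
        # Split on phased or unphased separators and drop missing alleles.
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if alleles:
            ns += 1
        for allele in alleles:
            index = int(allele)
            if index > 0:  # 0 is the reference allele
                ac[index - 1] += 1
    return ac, ns
```

A sample with a fully missing genotype contributes to neither count.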
List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.
meta
vcf
tbi
meta
versions
vcf
Command-line tools for manipulating VCF files
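The uniq-style filtering described above only compares each record with the one immediately before it, so the input is expected to be sorted. A minimal Python sketch follows; including the chromosome in the comparison key and passing header lines through unchanged are assumptions of this example.

```python
def vcf_uniq(lines):
    """Drop data lines that repeat the previous record's CHROM, POS, REF, ALT."""
    kept = []
    previous_key = None
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines are passed through
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != previous_key:
            kept.append(line)
        previous_key = key
    return kept
```

As with GNU uniq, identical records that are not adjacent are both kept.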
A set of tools written in Perl and C++ for working with VCF files
meta
variant_file
bed
diff_variant_file
meta
versions
vcf
bcf
frq
frq_count
idepth
ldepth
ldepth_mean
gdepth
hap_ld
geno_ld
geno_chisq
list_hap_ld
list_geno_ld
interchrom_hap_ld
interchrom_geno_ld
tstv
tstv_summary
tstv_count
tstv_qual
filter_summary
sites_pi
windowed_pi
weir_fst
heterozygosity
hwe
tajima_d
freq_burden
lroh
relatedness
relatedness2
lqual
missing_individual
missing_site
snp_density
kept_sites
removed_sites
singeltons
indel_hist
hapcount
mendel
format
info
genotypes_matrix
genotypes_matrix_individual
genotypes_matrix_position
impute_hap
impute_hap_legend
impute_hap_indv
ldhat_sites
ldhat_locs
beagle_gl
beagle_pl
ped
map_
tped
tfam
diff_sites_in_files
diff_indv_in_files
diff_sites
diff_indv
diff_discd_matrix
diff_switch_error
Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links.
If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it will skip the samtools sort step.
The cellsorted BAM file should be sorted by cell barcode with:
samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam
See the module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cellsort the input BAM using SAMTOOLS_SORT:
withName: SAMTOOLS_SORT {
ext.prefix = { "cellsorted_${bam.baseName}" }
ext.args = '-t CB -O BAM'
}
An optional mask must be passed through ext.args with the --mask option.
This is why two BAM files (cellsorted and original) need to be staged in the work dir.
See also the velocyto tutorial.
meta
barcodes
bam
sorted_bam
gtf
meta
versions
loom
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
meta
bam
bai
refvcf
meta
versions
log
selfsm
depthsm
selfrg
depthrg
bestsm
bestrg
verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.
meta
bam
bai
svd_ud
svd_mu
svd_bed
references
refvcf
meta
mu
ud
bed
versions
log
self_sm
ancenstry
A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Constructs a graph from a reference and variant calls or a multiple sequence alignment file
meta
input
tbis
insertions_fasta
fasta
fasta_fai
meta
versions
graph
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
Deconstruct snarls present in a variation graph in GFA format to variants in VCF format
meta
gfa
pb
gbwt
meta
vcf
versions
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
write your description here
meta
input
meta
versions
xg
vg_index
Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.
calculate secondary structures of two RNAs with dimerization
meta
rnacofold_fasta
versions
meta
rnacofold_csv
rnacofold_ps
calculate secondary structures of two RNAs with dimerization
The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.
fasta
versions
rnafold_txt
rnafold_ps
Calculate minimum free energy secondary structures and partition function of RNAs
The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.
calculate locally stable secondary structures of RNAs
fasta
versions
rnalfold_txt
calculate locally stable secondary structures of RNAs
Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol and the starting position of the local structure.
Use vireo to perform donor deconvolution for multiplexed scRNA-seq data
meta
cell_data
n_donor
donor_file
vartrix_data
meta
versions
summary
donor_ids
prob_singlets
prob_doublets
Extracting sequences that were unbinned by vRhyme into a FASTA file
meta
fasta
membership
meta
versions
unbinnned_sequences
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Linking bins output by vRhyme to create one sequence per bin
meta
bins
meta
versions
linked_bins
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Binning virus genomes from metagenomes
meta
reads
fasta
meta
versions
bins
membership
summary
vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).
Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
meta
fasta
meta
aln
biom
mothur
otu
bam
out
blast
uc
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Taxonomic classification using the sintax algorithm.
meta
queryfasta
db
tsv
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).
meta
fasta
sort_arg
meta
fasta
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
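The two sort orders offered by the sorting module above (--sortbysize and --sortbylength) can be illustrated in Python. The sketch assumes abundance is stored in the header as a ;size=N annotation and that a record without one counts as abundance 1; tie-breaking here simply preserves input order and may differ from VSEARCH. Function names and the (header, sequence) tuple representation are hypothetical.

```python
import re


def abundance(header):
    """Read the ';size=N' abundance annotation; default to 1 if absent."""
    match = re.search(r"(?:^|;)size=(\d+)", header)
    return int(match.group(1)) if match else 1


def sort_by_size(records):
    """Sort (header, sequence) records by decreasing abundance."""
    return sorted(records, key=lambda rec: abundance(rec[0]), reverse=True)


def sort_by_length(records):
    """Sort (header, sequence) records by decreasing sequence length."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)
```

Sorting by abundance before clustering puts the most common sequences first, so they are the first candidates to become centroids in a greedy clustering pass.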
Compare target sequences to fasta-formatted query sequences using global pairwise alignment.
meta
queryfasta
db
idcutoff
outoption
user_columns
aln
biom
lca
mothur
otu
sam
tsv
txt
uc
versions
VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
decomposes multiallelic variants into biallelic in a VCF file.
meta
vcf
intervals
meta
versions
vcf
A tool set for short variant discovery in genetic sequence data
normalizes variants in a VCF file
meta
vcf
tbi
intervals
meta2
fasta
meta3
fai
meta
versions
vcf
fai
A tool set for short variant discovery in genetic sequence data
a pangenome-scale aligner
meta
fasta_gz
paf
query_self
gzi
fai
fasta_query_list
meta
paf
versions
The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.
meta
bam
bai
fasta
fasta_fai
meta
versions
vcf
tbi
Masks out highly repetitive DNA sequences with low complexity in a genome
meta
counts
meta
counts
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program to generate frequency counts of repetitive units.
meta
ref
meta
intervals
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
A program that takes a counts file and creates a file of genomic coordinates to be masked.
meta
counts
meta
ref
meta
wm_intervals
versions
A program to mask highly repetitive and low complexity DNA sequences within a genome.
Convert and filter aligned reads to .npz
meta
bam
bai
meta2
fasta
meta3
fasta_fai
meta
versions
npz
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase
meta
npz
reference
meta
versions
gender
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Create a new reference using healthy reference samples
meta
inputs
meta
versions
npz
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
Find copy number aberrations
meta
npz
meta2
reference
meta3
blacklist
meta
versions
aberrations_bed
bins_bed
segments_bed
chr_statistics
chr_plots
genome_plot
WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
A large variant benchmarking tool analogous to hap.py for small variants.
meta
query_vcf
truth_vcf
bed
meta
versions
report
bench_vcf
bench_vcf_tbi
Compresses files with xz.
meta
raw_file
meta
archive
versions
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Decompresses files with xz.
meta
archive
meta
file
versions
xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
Performs assembly scaffolding using YaHS
meta
hic_map
fasta
fai
meta
versions
scaffolds_fasta
scaffolds_agp
binary
Align reads to a reference genome using YARA
meta
reads
index
meta
versions
bam
bai
Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
Compress file lists to produce ZIP archive files
meta
files
meta
zipped_archive
versions
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.