Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

Trim sequencing adapters and collapse overlapping reads

singles_truncated discarded paired_truncated collapsed collapsed_truncated paired_interleaved settings versions

The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, and the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass.

passed_gff failed_gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
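
The ORF-length test described for the filter above can be illustrated with a minimal Python sketch. It is not AGAT's implementation: the function names, the supported operators, and the example gene models are all assumptions made for illustration, and the ORF lengths are taken as already computed rather than parsed from a GFF file.

```python
import operator

# Comparison operators a test string such as "> 100" may start with (assumed set).
_OPS = {">=": operator.ge, "<=": operator.le,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}

def parse_test(test):
    """Split a test like "> 100" into a comparison function and a threshold."""
    test = test.strip()
    for symbol in (">=", "<=", ">", "<", "="):  # two-character symbols first
        if test.startswith(symbol):
            return _OPS[symbol], int(test[len(symbol):])
    raise ValueError(f"unrecognised test: {test!r}")

def split_by_orf_length(orf_lengths, test="> 100"):
    """Partition gene model IDs into (passed, failed) by ORF length in amino acids."""
    compare, threshold = parse_test(test)
    passed, failed = [], []
    for gene_id, length in orf_lengths.items():
        (passed if compare(length, threshold) else failed).append(gene_id)
    return passed, failed

# Hypothetical gene models with their ORF lengths in amino acids.
models = {"geneA": 250, "geneB": 100, "geneC": 42}
passed, failed = split_by_orf_length(models)
```

With the default test, a gene model of exactly 100 amino acids fails, because the comparison is strictly "longer than 100".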

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

extracted_reads_fastq log intermediate_sam intermediate_bam intermediate_sorted_bam versions

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Simulation tool to generate synthetic Illumina next-generation sequencing reads

fastq aln sam versions

art:

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using a user's own read error model or quality profiles.

Aggregates fastq files with demultiplexed reads

fastq versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Run the alignment/variant-call/consensus logic of the artic pipeline

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Split single-end read groups by length and merge paired-end reads

01234

bam txt versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

012

primary_filtered_bam contamination_bam versions

Trims the end of reads in a SAM/BAM file, changing read ends to ‘N’ and quality to ‘!’, or by soft clipping

0123

bam versions

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.
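
As a rough illustration of the hard-masking behaviour described above (read ends changed to 'N' with quality '!'), the following sketch operates on plain sequence and quality strings. It is not bamUtil code; `trim_ends` and its parameters are hypothetical names, and soft clipping is not covered.

```python
def trim_ends(seq, qual, left, right):
    """Mask `left` bases at the start and `right` bases at the end of a read.

    Masked bases become 'N' and their qualities become '!' (Phred 0).
    The read length is unchanged.
    """
    n = len(seq)
    left = min(left, n)            # never mask more than the read length
    right = min(right, n - left)   # the two masked regions must not overlap
    keep_end = n - right
    new_seq = "N" * left + seq[left:keep_end] + "N" * right
    new_qual = "!" * left + qual[left:keep_end] + "!" * right
    return new_seq, new_qual


masked_seq, masked_qual = trim_ends("ACGTACGT", "IIIIIIII", left=2, right=3)
```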

Barrnap uses an HMMER profile to find rRNAs in reads or contig FASTA files

012

gff versions

Align short or PacBio reads to a reference genome using BBMap

010

bam log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Adapter and quality trimming of sequencing reads

010

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Merging overlapping paired reads into a single read.

010

merged unmerged ihist versions log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.

01

fastq log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Split sequencing reads by mapping them to multiple references simultaneously

0100010

index primary_fastq all_fastq stats log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates 30% smaller, faster gzipped FASTQ files and removes duplicates

01

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Filter out sequences by sequence header name(s)

01000

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Re-pairs reads that became disordered or had some mates eliminated.

010

repaired singleton versions log

repair:

Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated.

Locate and tag duplicate reads in a BAM file

01

bam metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.

010101

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

010101

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Summarize and/or filter reads based on bisulfite conversion rate

01010101

bam versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Performs alignment of BS-Seq reads using bismark

010101

bam report unmapped versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Relates methylation calls back to genomic cytosine contexts.

010101

coverage report summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Removes alignments to the same position in the genome from the Bismark mapping output.

01

bam report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.

01

index versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Extracts methylation information for individual cytosines from alignments.

0101

bedgraph methylation_calls coverage report mbias versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Collects bismark alignment reports

01234

report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

00000

summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Align reads to a reference genome using bowtie

01010

bam log fastq versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

01010100

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie index for reference genome

01

index versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Find SA coordinates of the input reads for bwa short-read mapping

0101

sai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs alignment of BS-Seq reads using bwameth

010101

bam versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Performs indexing of c2t converted reference genome

01

index versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

0100

report assembly contigs corrected_reads corrected_trimmed_reads metadata contig_position contig_info versions

Concatenates fastq files

01

reads versions

cat:

The cat utility reads files sequentially, writing them to the standard output.

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101010101

orf2lca bin2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101

orf2lca contig2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101001010101010101

rat_log complete_abundance contig_abundance read2classification alignment_diamond contig2classification cat_log orf2lca faa gff unmapped_diamond unmapped_fasta unmapped2classification versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Summarises results from CAT/BAT/RAT classification steps

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.

010

outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.

012

fastq undetermined_fastq reports stats interop versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.

0

gtf versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.

000

reference versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.

00101010101010000000000000

config outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.

00

versions fastq

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.

0

gtf versions

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.

00

versions fastq

cellranger-atac:

Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Realign reads mapped with BWA to elongated reference genome

01010101

bam versions

circularmapper:

A method to improve mappings on circular genomes such as Mitochondria.

Clair3 is a germline small variant caller for long-reads

012340101

vcf tbi phased_vcf phased_tbi versions

CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.

012010101

root tab versions

cnvnator:

Tool for calling copy number variations.

Calculates peak-to-trough ratio (PTR) from metagenomic sequence data

01

ptr versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Computes the coverage map along the reference genome

01

coverage versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Indexes a directory of fasta files for use with CoPTR

01

index_dir versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Maps the reads to the reference database

0101

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Merge reads that were mapped to multiple indices

01

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Map reads to contigs and estimate coverage

010100

coverage versions

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

Perform adapter/quality trimming on sequencing reads

01

reads log versions

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

vcf vcf_tbi gvcf gvcf_tbi versions

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.

01200

bigwig bedgraph versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Plots cumulative read coverage for each BAM file

012

pdf matrix metrics versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Assemble bacterial isolate genomes from Nanopore reads

012

contigs log raw_contigs gfa txt versions

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

01234500

vcf versions

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

vcf tbi versions

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

010

outseq versions

emboss:

The European Molecular Biology Open Software Suite

A taxonomic profiler for metagenomic 16S data optimized for error prone long reads.

010

report assignment_report samfile unclassified_fa versions

emu:

Emu is a relative abundance estimator for 16S genomic data.

Run falco on sequenced reads

01

html txt versions

falco:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

Perform adapter and quality trimming on sequencing reads with reporting

01

reads stats debug statspdf reads_fail reads_unpaired log versions

A program that counts sequence occurrences in FASTQ files.

0101

count_matrix stats distribution_plot reads_plot reads_plot_percentage versions

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.

Perform adapter/quality trimming on sequencing reads

010000

reads json html log reads_fail reads_merged versions

Run FastQC on sequenced reads

01

html zip versions

Align reads to multiple reference genomes using fastq-screen

010

txt png html fastq versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

01

fasta versions

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
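
Collapsing identical sequences while keeping their read counts amounts to a counting pass over the reads. The sketch below shows the idea on a list of sequence strings; it is not the FASTX-Toolkit implementation, and the `rank-count` header format is an assumption about how the counts could be carried in the output.

```python
from collections import Counter


def collapse(sequences):
    """Collapse identical sequences into (header, sequence) records.

    Records are ordered from most to least abundant. Each header is
    "<rank>-<count>" (an assumed format that preserves the read count).
    """
    counts = Counter(sequences)
    return [
        (f"{rank}-{count}", seq)
        for rank, (seq, count) in enumerate(counts.most_common(), start=1)
    ]


records = collapse(["ACGT", "TTTT", "ACGT", "ACGT", "TTTT", "GGGG"])
for header, seq in records:
    print(f">{header}\n{seq}")
```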

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

0100

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Using the fgbio tools, converts FASTQ files sequenced into unaligned BAM or CRAM files possibly moving the UMI barcode into the RX field of the reads

01

bam cram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

0101000

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. Note: the MQ tag is required on reads with mapped mates. It can be added using samblaster with the optional argument --addMateTags.

010

bam histogram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs
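
The grouping logic described for UMI-based read grouping (sort templates by mapping position, then sub-group templates sharing the same end positions by UMI) can be sketched as follows. This is a simplified illustration, not fgbio's algorithm: it only matches UMIs exactly, and the tuple layout `(name, start, end, umi)` is a hypothetical input format.

```python
from collections import defaultdict


def group_templates(templates):
    """Group templates that share both end positions and the same UMI.

    templates: iterable of (name, start, end, umi) tuples (assumed layout).
    Returns a dict mapping (start, end, umi) to the list of template names,
    visited from the earliest mapping position to the latest.
    """
    groups = defaultdict(list)
    for name, start, end, umi in sorted(templates, key=lambda t: (t[1], t[2])):
        groups[(start, end, umi)].append(name)
    return dict(groups)


groups = group_templates([
    ("t3", 500, 650, "AACC"),
    ("t1", 100, 250, "ACGT"),
    ("t2", 100, 250, "ACGT"),
    ("t4", 100, 250, "TTTT"),
])
```

Here `t1` and `t2` fall into one group, while `t4` shares their positions but carries a different UMI and therefore forms its own group.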

Filtlong filters long reads based on quality measures or short read data.

012

reads log versions

Perform merging of mate paired-end sequencing reads

01

merged notcombined histogram versions

De novo assembler for single molecule sequencing reads

010

fasta gfa gv txt log json versions

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

0

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

012000

lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
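
The two-stage resampling described for the bootstrap can be sketched with the standard library alone. This is an illustrative reading of that description, not Freyja's code: the function name, the nested-dictionary input, and the assumption that every site has non-zero depth are all choices made for this sketch.

```python
import random
from collections import Counter


def bootstrap_sites(site_base_counts, seed=0):
    """Resample per-site base counts in two multinomial stages.

    site_base_counts maps a site to {base: read_count} (assumed layout,
    every site is assumed to have at least one read).
    Stage 1 redistributes the total read count across sites, with
    probabilities proportional to each site's share of all reads.
    Stage 2 redraws the bases at each site, with probabilities given by
    the observed base frequencies and the stage-1 depth as trial count.
    """
    rng = random.Random(seed)
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total = sum(depths)

    # Stage 1: multinomial draw of new depths across sites.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total))

    # Stage 2: multinomial draw of bases within each site.
    resampled = {}
    for s in sites:
        bases = list(site_base_counts[s])
        weights = [site_base_counts[s][b] for b in bases]
        draws = rng.choices(bases, weights=weights, k=new_depths[s])
        resampled[s] = dict(Counter(draws))
    return resampled


observed = {"pos1": {"A": 90, "G": 10}, "pos2": {"C": 50}}
replicate = bootstrap_sites(observed, seed=1)
```

The total number of reads is preserved in each replicate, while per-site depths and base frequencies vary from one seed to the next.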

GangSTR is a tool for genome-wide profiling tandem repeats from short reads.

012300

vcf samplestats versions

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

01234000

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

012301010100

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

0100

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

012010101

bam cram sam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

0120000

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and unmarks marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01

bam bai versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01000

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

01

linear_plot_png transformed_linear_plot_png log_plot_png transformed_log_plot_png model summary lookup_table fitted_histogram_png versions

Assembles organelle genomes from genomic data

0101

fasta etc versions

getorganelle:

Get organelle genomes from genome skimming data

Collapse redundant transcript models in Iso-Seq data.

010

bed bed_trans_reads local_density_error polya read strand_check trans_report versions varcov variants

tama_collapse.py:

Collapses similar gene models

Helper script that removes remaining polyA sequences from full-length non-chimeric reads (PacBio isoseq3)

01

fasta report tails versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Whole-genome assembly using PacBio HiFi reads

01201201201

raw_unitigs bin_files processed_unitigs primary_contigs alternate_contigs hap1_contigs hap2_contigs corrected_reads read_overlaps log versions

Align RNA-Seq reads to a reference with HISAT2

010101

bam summary fastq versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Builds HISAT2 index for reference genome

010101

index versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Extracts splice sites from a GTF file

01

txt versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Pre-compute the graph index structure.

01

graph versions

hlala:

HLA typing from short and long reads

Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.

0123

results extraction extraction_mapped extraction_unmpapped hla fastq reads_per_level remapped versions

hlala:

HLA typing from short and long reads

Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised.

01

bigwig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Counts how many reads map to each feature

01201

txt versions

htseq/count:

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.

0101

log dedup annotated stats versions

Assembly polisher using short (and long) reads

0101000

fasta versions

Remove polyA tail and artificial concatemers

010

bam pbi consensusreadset summary report versions

isoseq:

IsoSeq - Scalable De Novo Isoform Discovery

Remove polyA tail and artificial concatemers

meta bam primers

meta bam pbi consensusreadset summary report versions

isoseq3:

IsoSeq3 - Scalable De Novo Isoform Discovery

Create kallisto index

01

index versions

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

01010000

results json_info log versions

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.

000

reads versions

khmer:

khmer k-mer counting library

Removes low abundance k-mers from FASTA/FASTQ files

01

trimmed versions

khmer:

khmer k-mer counting library

This module wraps the index module of the KMA alignment tool.

01

index versions

kma:

Rapid and precise alignment of raw reads against redundant databases with KMA

Classifies metagenomic sequence data

01000

classified_reads_fastq unclassified_reads_fastq classified_reads_assignment report versions

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Extract reads classified at any user-specified taxonomy IDs.

0010101

extracted_kraken2_reads versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Classifies metagenomic sequence data using unique k-mer counts

012000000

classified_reads unclassified_reads classified_assignment report versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Converts aligned short- and long-read records from one reference to another

0101

bam versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

MAGeCK count for functional genomics; reads are usually mapped to a specific sgRNA

010

count norm versions

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Map short-reads to an indexed reference genome

01010000000

bam versions

mapad:

An aDNA aware short-read mapper

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

010

runtime_log fragmisincorporation_plot length_plot misincorporation lgdistribution dnacomp stats_out_mcmc_hist stats_out_mcmc_iter stats_out_mcmc_trace stats_out_mcmc_iter_summ_stat stats_out_mcmc_post_pred stats_out_mcmc_correct_prob dnacomp_genome rescaled pctot_freq pgtoa_freq fasta folder versions

Analyses a DAA file and exports information in text format

010

txt_gz megan versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Analyses an RMA file and exports information in text format

010

txt megan_summary versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Performs taxonomic profiling of long metagenomic reads against the melon database

0100

tsv_output json_output log versions

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101000

hist log_stderr versions

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

Strain-level metagenomic assignment

012340

wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Maps long reads to a metamaps database

010

classification_res meta_file meta_unmappedreadsLengths para_file versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

A very fast OLC-based de novo assembler for noisy long reads

012

gfa assembly versions

miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.

0101

outputs versions

mirdeep2:

miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.

miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.

012010123

outputs versions

mirdeep2:

miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs. It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.

A Python workflow that assembles mitogenomes from PacBio HiFi reads

010000

fasta stats gb gff all_potential_contigs contigs_annotations contigs_circularization contigs_filtering coverage_mapping coverage_plot final_mitogenome_annotation final_mitogenome_choice final_mitogenome_coverage potential_contigs reads_mapping_and_assembly shared_genes versions

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.

010

mtnucratio json versions
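
The ratio described above can be illustrated with a minimal Python sketch. It assumes per-contig read counts are already available (for example from an alignment index summary); the function name `mt_to_nuclear_ratio` and the contig label `MT` are hypothetical, and this is not the tool's own implementation.

```python
def mt_to_nuclear_ratio(read_counts, mt_name="MT"):
    """Return mitochondrial reads divided by nuclear reads.

    read_counts maps contig name -> number of mapped reads.
    mt_name is the (assumed) name of the mitochondrial contig.
    """
    mt_reads = read_counts.get(mt_name, 0)
    # Everything that is not the mitochondrial contig is treated as nuclear.
    nuclear_reads = sum(n for name, n in read_counts.items() if name != mt_name)
    if nuclear_reads == 0:
        raise ValueError("no nuclear reads; ratio is undefined")
    return mt_reads / nuclear_reads


# Illustrative counts, not real data.
counts = {"1": 900, "2": 600, "MT": 300}
ratio = mt_to_nuclear_ratio(counts)
```

With the illustrative counts above, 300 mitochondrial reads against 1500 nuclear reads give a ratio of 0.2.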

Compare multiple runs of long read sequencing data and alignments

01

report_html lengths_violin_html log_length_violin_html n50_html number_of_reads_html overlay_histogram_html overlay_histogram_normalized_html overlay_log_histogram_html overlay_log_histogram_normalized_html total_throughput_html quals_violin_html overlay_histogram_identity_html overlay_histogram_phredscore_html percent_identity_violin_html active_pores_over_time_html cumulative_yield_plot_gigabases_html sequencing_speed_over_time_html stats_txt versions

Filtering and trimming of Oxford Nanopore Sequencing data

010

filtreads log_file versions

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

01

html png txt log versions

Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.

010

stats reads versions

Merging paired-end reads and removing sequencing adapters.

01

merged_reads unstitched_read1 unstitched_read2 versions

Determines the gender of a sample from the BAM/CRAM file.

01201010

tsv versions

ngsbits:

Short-read sequencing tools

write your description here

meta reads format mode

meta versions npa npc npl npo

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence

0101010

bam bai qc_metrics bqsr_table duplicate_metrics versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

PacBio ccs - Generate Highly Accurate Single-Molecule Consensus Reads

01200

bam pbi report_txt report_json metrics versions

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

0120

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Locate and tag duplicate reads in a BAM file

010101

bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

xml txt versions

Polishing genome assemblies with short reads.

01010

fasta versions debug

polypolish:

Polishing genome assemblies with short reads.

Software to pile up reads and corresponding base qualities for each overlapping SNP and each barcode.

012

cel plp var umi versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

01

reads log versions

Adapter removal and demultiplexing of Oxford Nanopore reads

01

reads log versions

porechop:

Adapter removal and demultiplexing of Oxford Nanopore reads

Filter reads by quality score.

01

reads logs versions log_tab

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

good_reads single_reads bad_reads log versions

frame-shift correction for long read (meta)genomics - fix frameshifts in reads

0101

out_fa versions

proovframe:

frame-shift correction for long read (meta)genomics

frame-shift correction for long read (meta)genomics - maps proteins to reads

012

tsv versions

proovframe:

frame-shift correction for long read (meta)genomics

Reads a MaxQuant proteinGroups file with Proteus

012

dendro_plot mean_var_plot raw_dist_plot norm_dist_plot raw_rdata norm_rdata raw_tab norm_tab session_info versions

proteus:

R package for analysing proteomics data

Identify, orient and trim nanopore cDNA reads

01

fastq versions

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

Demultiplexer for Nanopore samples

010

reads versions

Consensus module for raw de novo DNA assembly of long uncorrected reads

0123

improved_assembly versions

Randomly subsample sequencing reads to a specified coverage

0120

reads versions
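
As a rough illustration of subsampling to a target coverage, the following Python sketch keeps randomly chosen reads until their combined length reaches coverage × genome size. The function name `subsample_to_coverage` and its parameters are hypothetical, and the sketch is not meant to reproduce the module's actual algorithm.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Pick random reads until their total length reaches the target.

    read_lengths: list of read lengths in bases.
    Returns (sorted indices of kept reads, total bases kept).
    """
    target_bases = genome_size * coverage
    order = list(range(len(read_lengths)))
    # A seeded generator keeps the selection reproducible.
    random.Random(seed).shuffle(order)

    kept, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        kept.append(idx)
        total += read_lengths[idx]
    return sorted(kept), total


# Illustrative input: 50 reads of 100 bases, 1 kb genome, 2x coverage.
kept, total = subsample_to_coverage([100] * 50, genome_size=1000, coverage=2)
```

If the input holds fewer bases than the target, the sketch simply returns every read.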

De novo genome assembler for long uncorrected reads.

01

fasta gfa versions

Infer strandedness from sequencing reads

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate TIN (transcript integrity number) from RNA-seq reads

0120

txt xls versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

SALSA, A tool to scaffold long read assemblies with HiC

0120000

fasta agp agp_original_coordinates versions

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

0120

csv json bam versions

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

Find and mark duplicate reads in a BAM file

01

bam bai versions

sambamba:

process your BAM data faster!

The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format

010

reads versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

shuffles and groups reads together by their names

0101

bam cram sam versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

bam cram sam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Call peaks using SEACR on sequenced reads in bedgraph format

0120

bed versions

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step VarCal and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Create BWA index for reference genome

01

index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

01010101

bam_and_bai versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

012012010101

metrics summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the GATK DepthOfCoverage tool.

01201010101

per_locus sample_summary statistics coverage_counts coverage_proportions interval_summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects multiple quality metrics from a bam file

01201010

mq_metrics qd_metrics gc_summary gc_metrics aln_metrics is_metrics mq_plot qd_plot is_plot gc_plot versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

0120101

cram crai bam bai score metrics metrics_multiqc_tsv versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Modifies the input VCF file by adding the MLrejected FILTER to the variants

012010101

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

01230101010101000

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

012301010101

vcf_gz vcf_gz_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

012340101010100

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Generate recalibration table and optionally perform base quality recalibration

01201010101010

table table_post recal_alignment csv pdf versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying the Base Quality Score Recalibration to a file.

0120101

output index output_index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

01234560101

vcf vcf_tbi stats versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

01230101010101010100

orientation_data contamination_data contamination_segments stats vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

012010101201201201

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Module for Sentieon's VarCal. The VarCal algorithm computes the Variant Quality Score Recalibration (VQSR) by building a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

01200000

recal idx tranches plots versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects whole genome quality metrics from a bam file

012010101

wgs_metrics versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Subset FASTA/FASTQ files to some number of sequences

012

subset versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Match up paired-end reads from two FASTQ files

01

reads unpaired_reads versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Split single or paired-end fastq.gz files

01

reads versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Salmonella serotype prediction from reads and assemblies

01

log tsv txt versions

Generates a BED file containing genomic locations of lengths of N.

01

bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges pair-end reads into one interleaved file.

Interleave pair-end reads from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges pair-end reads into one interleaved file.
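
To make the idea of interleaving concrete, here is a minimal Python sketch that alternates records from two mate files (R1, R2, R1, R2, ...). The helper names `read_fastq` and `interleave` are hypothetical, the input is assumed to be plain 4-line FASTQ records, and the sketch is not a description of how seqtk itself is implemented.

```python
def read_fastq(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def interleave(r1_records, r2_records):
    """Alternate mate 1 and mate 2 records into one list."""
    if len(r1_records) != len(r2_records):
        raise ValueError("mate files contain different numbers of reads")
    interleaved = []
    for mate1, mate2 in zip(r1_records, r2_records):
        interleaved.append(mate1)
        interleaved.append(mate2)
    return interleaved


# Illustrative records, not real data.
r1 = read_fastq(["@a/1", "ACGT", "+", "IIII", "@b/1", "GGCC", "+", "IIII"])
r2 = read_fastq(["@a/2", "TTAA", "+", "IIII", "@b/2", "CCGG", "+", "IIII"])
merged = interleave(r1, r2)
```

Each pair stays adjacent in the output, which is what downstream tools expecting interleaved input rely on.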

Subsample reads from FASTQ files

012

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.

Trim low quality bases from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

tsv txt versions

seroba:

SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic: the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

01

assembly gfa results versions

Determine Shigella serotype from Illumina or Oxford Nanopore reads

01

tsv hits versions

Determine Shigella serotype from assemblies or Illumina paired-end reads

01

tsv versions

Assemble bacterial isolate genomes from Illumina paired-end reads

01

contigs corrections log raw_contigs gfa versions

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

Local sequence alignment tool for filtering, mapping and clustering.

010101

reads log index versions

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Fast, efficient, lossless compression of FASTQ files.

012

spring versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Fast, efficient, lossless decompression of FASTQ files.

010

fastq versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).

0100

reads versions

sratools:

SRA Toolkit and SDK from NCBI

Align reads to a reference genome using STAR

010101000

log_final log_out log_progress versions bam bam_sorted bam_sorted_aligned bam_transcript bam_unsorted fastq tab spl_junc_tab read_per_gene_tab junction sam wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Serotype STEC samples from paired-end reads or assemblies

01

tsv versions

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

0123456789100120

input rdata plots vcf bgen versions

Tandem repeat genotyper for long reads

012010101

vcf tbi versions

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

0123400

vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

01234567800

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Count reads that map to genomic features

012

counts summary versions

featurecounts:

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

Sketching/indexing sequencing reads

010

sketch_fastq_genome versions

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

Trim FastQ files using Trim Galore!

01

reads log unpaired html zip versions

Performs quality and adapter trimming on paired end and single end reads

01

trimmed_reads unpaired_reads trim_log out_log summary versions

Assembles a de novo transcriptome from RNAseq reads

01

transcript_fasta log versions

Subsample a long-read sequencing fastq file for multiple assemblies

01

subreads versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

01001

bam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

00

index versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

0100

sam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam fastq log versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
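
The deduplication idea described above (same mapping coordinate plus same UMI means the same original molecule) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: reads are plain dictionaries with hypothetical keys, UMIs must match exactly, and the read with the highest mapping quality is kept. Real tools typically also handle sequencing errors in UMIs, which this sketch does not attempt.

```python
def dedup_by_umi(reads):
    """Keep one read per (contig, position, UMI) group.

    Each read is a dict with the assumed keys
    'name', 'contig', 'pos', 'umi' and 'mapq'.
    """
    best = {}
    for read in reads:
        key = (read["contig"], read["pos"], read["umi"])
        kept = best.get(key)
        # Prefer the read with the higher mapping quality in each group.
        if kept is None or read["mapq"] > kept["mapq"]:
            best[key] = read
    return list(best.values())


# Illustrative reads, not real data.
reads = [
    {"name": "r1", "contig": "chr1", "pos": 100, "umi": "ACGT", "mapq": 30},
    {"name": "r2", "contig": "chr1", "pos": 100, "umi": "ACGT", "mapq": 60},
    {"name": "r3", "contig": "chr1", "pos": 100, "umi": "TTGA", "mapq": 20},
    {"name": "r4", "contig": "chr1", "pos": 250, "umi": "ACGT", "mapq": 40},
]
unique_reads = dedup_by_umi(reads)
```

Here r1 and r2 share a position and UMI, so only one of them survives, while r3 (different UMI) and r4 (different position) are kept as distinct molecules.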

Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place

01

reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Group reads based on their UMI and mapping coordinates

01200

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

Simulates sequence reads from a reference genome

01

fastq versions

A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files

01

wiped_fastq report versions

fastqwiper:

A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files.

Convert and filter aligned reads to .npz

0120101

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Builds a YARA index for a reference genome

01

index versions

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Align reads to a reference genome using YARA

0101

bam bai versions

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
