Available Modules

Modules are the building blocks of all nf-core DSL2 pipelines. If you would like to write your own module, you can find more information on the nf-core website.


ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).


Outputs: ancestry_fractions, allele_frequencies, versions
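As an illustration of the model-based approach described above (not code from this module or from ADMIXTURE itself), the sketch below computes the binomial log-likelihood of the kind ADMIXTURE maximizes: each individual's allele frequency at a SNP is a mixture of ancestral-population allele frequencies, weighted by that individual's ancestry fractions. The function name `admixture_loglik` and the matrix names `G`, `Q` and `F` are hypothetical; `Q` and `F` are assumed to play the roles of the ancestry_fractions and allele_frequencies outputs.

```python
import math


def admixture_loglik(G, Q, F):
    """Binomial log-likelihood of genotypes under an ADMIXTURE-style model.

    G[i][j]: count of the counted allele (0, 1 or 2) for individual i at SNP j
    Q[i][k]: ancestry fraction of individual i in population k (rows sum to 1)
    F[k][j]: frequency of the counted allele at SNP j in population k
    Assumes every mixed frequency lies strictly between 0 and 1.
    """
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            # Individual-specific allele frequency: ancestry-weighted mixture
            p = sum(Q[i][k] * F[k][j] for k in range(len(F)))
            # Genotype modeled as Binomial(2, p); the binomial coefficient is omitted
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll
```

For example, a heterozygous individual fully assigned to a population with allele frequency 0.5 contributes 2·ln(0.5). A full estimator would maximize this quantity over both `Q` and `F` under the sum-to-one and frequency-range constraints.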

Read CEL files into an ExpressionSet and generate a matrix


Outputs: rds, expression, annotation, versions

affy:

Methods for Affymetrix Oligonucleotide Arrays

Annotation and Ranking of Structural Variation


Outputs: tsv, unannotated_tsv, vcf, versions

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

Inputs: none

Outputs: annotations, versions

annotsv:

Annotation and Ranking of Structural Variation

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq


Outputs: translated_mrna, total_mrna, translation, buffering, mrna_abundance, rdata, fold_change_plot, interaction_p_distribution_plot, residual_distribution_summary_plot, residual_vs_fitted_plot, rvm_fit_for_all_contrasts_group_plot, rvm_fit_for_interactions_plot, rvm_fit_for_omnibus_group_plot, simulated_vs_obt_dfbetas_without_interaction_plot, session_info, versions

anota2seq:

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

Inputs: (meta, bam), (meta2, fasta), (meta3, gtf), (meta4, blacklist), (meta5, known_fusions), (meta6, structural_variants), (meta7, tags), (meta8, protein_domains)

Outputs: meta, versions, fusions, fusions_fail

Alignment by Simultaneous Harmonization of Layer/Adjacency Registration


Outputs: tif, versions

Use deamination patterns to estimate contamination in single-stranded libraries


Outputs: txt, versions

authentict:

Estimates present-day DNA contamination in ancient DNA single-stranded libraries.

BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, resulting in a flat coverage distribution.


Outputs: fastq, log, versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.
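
As a rough illustration of the idea behind BBNorm, the Python sketch below assumes a read's depth can be estimated as the median count of its k-mers, and keeps each read with probability `target_depth / depth`. Reads from over-represented regions are thinned, while reads at or below the target are always kept. This is a simplified two-pass sketch under that assumption, not BBNorm's implementation, which has its own k-mer counting structures and heuristics; `normalize_reads` and its parameters are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads: List[str], k: int, target_depth: float, seed: int = 0) -> List[str]:
    """Down-sample reads from high-depth regions toward target_depth (hypothetical helper).

    Pass 1 counts k-mers across all reads. Pass 2 estimates each read's depth as the
    median count of its k-mers and keeps it with probability min(1, target_depth / depth).
    """
    counts = Counter(km for r in reads for km in kmers(r, k))
    rng = random.Random(seed)
    kept = []
    for r in reads:
        kms = kmers(r, k)
        if not kms:
            continue  # read shorter than k: no depth estimate, dropped in this sketch
        depth = statistics.median(counts[km] for km in kms)
        if rng.random() < min(1.0, target_depth / depth):
            kept.append(r)
    return kept
```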

Converts certain output formats to VCF

vcf_gz vcf bcf_gz bcf hap legend samples tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Index VCF tools

csi tbi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

BCFtools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.
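
The IMPUTE2 INFO measure can be computed from genotype probabilities alone. The Python sketch below applies the standard IMPUTE2 information formula to a single biallelic site, assuming FORMAT/GP holds (P(0/0), P(0/1), P(1/1)) for each sample. `impute2_info` is a hypothetical helper, and the plugin's own handling of edge cases (missing GP values, multiallelic sites) may differ.

```python
from typing import Sequence, Tuple


def impute2_info(gp: Sequence[Tuple[float, float, float]]) -> float:
    """IMPUTE2-style info score for one biallelic site (hypothetical helper).

    gp holds one (P(0/0), P(0/1), P(1/1)) triple per sample, as in FORMAT/GP.
    """
    n = len(gp)
    if n == 0:
        raise ValueError("need at least one sample")
    # Expected alt-allele dosage e_i and its second moment f_i for each sample
    e = [p1 + 2 * p2 for _, p1, p2 in gp]
    f = [p1 + 4 * p2 for _, p1, p2 in gp]
    theta = sum(e) / (2 * n)  # estimated alt-allele frequency
    if theta <= 0.0 or theta >= 1.0:
        return 1.0  # monomorphic site: conventionally treated as fully informative
    variance_sum = sum(fi - ei * ei for ei, fi in zip(e, f))
    return 1.0 - variance_sum / (2 * n * theta * (1 - theta))
```

Certain genotype calls give an info score of 1.0, while probabilities that carry no information beyond the allele frequency give 0.0.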

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but the plugin can do much more than that.

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

BCFtools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but the plugin can do much more than that.
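
Setting missing genotypes to ref amounts to rewriting each missing allele (`.`) in the GT field as `0`. The minimal Python sketch below applies just that transformation to GT strings rather than to real VCF records. `missing_to_ref` is a hypothetical helper and is not part of the plugin.

```python
import re


def missing_to_ref(gt: str) -> str:
    """Replace missing alleles ('.') in a GT string with the reference allele '0'.

    Allele separators ('/' unphased, '|' phased) are preserved as they are.
    """
    alleles = re.split(r"([/|])", gt)  # capturing group keeps the separators
    return "".join("0" if a == "." else a for a in alleles)
```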

For each feature in A, finds the closest feature (upstream or downstream) in B.

output versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
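
The closest-feature search can be sketched in a few lines of Python. This brute-force version assumes half-open intervals on a single chromosome, measures distance as the gap in bases (0 for overlapping or book-ended features), and breaks ties by input order. bedtools itself works on sorted files, has strand and tie-handling options, and may report distances using a different convention. All names here are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, b[0] - a[1], a[0] - b[1])


def closest(a_features: List[Interval],
            b_features: List[Interval]) -> List[Tuple[Interval, Optional[Interval], int]]:
    """For each A feature, report the nearest B feature, upstream or downstream.

    Brute force O(|A| * |B|); ties go to the first B feature in input order.
    When B is empty, the match is None and the distance is -1.
    """
    results = []
    for a in a_features:
        best: Optional[Interval] = None
        best_gap = -1
        for b in b_features:
            d = gap(a, b)
            if best is None or d < best_gap:
                best, best_gap = b, d
        results.append((a, best, best_gap))
    return results
```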

Merges methylation information for opposite-strand C's in a CpG context

bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Construct species phylogenies using BUSCO proteins

gene_trees supermatrix versions

busco:

Construct species phylogenies using BUSCO proteins

Construct the database necessary for checkv's quality assessment

NO input

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Determine the allelic profiles of a genome using a pre-defined schema

stats contigs_info alleles log paralogous_counts paralogous_loci cds_coordinates invalid_cds loci_summary_stats versions

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Create a schema to determine the allelic profiles of a genome

schema cds_coordinates invalid_cds versions

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Builds a classic Bloom filter COBS index

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Builds a compact Bloom filter COBS index

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
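
COBS builds on Bloom filters over the k-mers of each document. The Python sketch below shows a single plain Bloom filter with a query score: the fraction of the query's k-mers that the filter reports as present. It does not reproduce COBS's bit-sliced layout, its hashing scheme, or the classic and compact index formats. The class name and parameters are hypothetical.

```python
import hashlib


class KmerBloomFilter:
    """A plain Bloom filter over the k-mers of one document (sketch, not COBS's layout)."""

    def __init__(self, num_bits: int, num_hashes: int, k: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions from salted BLAKE2b digests of the k-mer
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                                     salt=seed.to_bytes(16, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def _kmers(self, seq: str):
        return (seq[i:i + self.k] for i in range(len(seq) - self.k + 1))

    def add_sequence(self, seq: str) -> None:
        for kmer in self._kmers(seq):
            for pos in self._positions(kmer):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, kmer: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(kmer))

    def score(self, query: str) -> float:
        """Fraction of the query's k-mers that the filter reports as present."""
        kmers = list(self._kmers(query))
        if not kmers:
            return 0.0
        return sum(self.contains(km) for km in kmers) / len(kmers)
```

Because Bloom filters have no false negatives, a query taken from an indexed sequence always scores 1.0. Unrelated queries can still score above 0 through false positives, which is why such indexes report match scores rather than exact membership.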

Dump a cooler's data to a text stream.

bedpe versions

cooler:

Sparse binary format for genomic interaction matrices

Structural variant calling with cuteSV

vcf versions

Call structural variants

bcf csi versions

delly:

Structural variant discovery by integrated paired-end and split-read analysis

Runs a differential expression analysis with DESeq2

results dispersion_plot rdata size_factors normalised_counts rlog_counts vst_counts model session_info versions

deseq2:

Differential gene expression analysis based on the negative binomial distribution

SV callers like LUMPY look at split reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example, we will be skeptical of deletion calls that do not have lower-than-average coverage compared to regions with similar GC content.

vcf versions
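
The depth check described above can be sketched as a fold change: mean depth inside the event divided by the median depth of background windows with similar GC content. The Python function below is a hypothetical illustration of that idea, not the tool's code. In a diploid genome, a true heterozygous deletion should give roughly 0.5 and a homozygous deletion roughly 0, so a deletion call with a value near 1.0 is the suspicious case.

```python
import statistics
from typing import List, Tuple


def depth_fold_change(event_depths: List[float], event_gc: float,
                      genome_bins: List[Tuple[float, float]],
                      gc_tolerance: float = 0.05) -> float:
    """Mean depth inside an SV divided by the median depth of GC-matched bins.

    genome_bins holds (gc_fraction, mean_depth) pairs for background windows;
    only windows whose GC is within gc_tolerance of the event's GC are used.
    """
    matched = [d for gc, d in genome_bins if abs(gc - event_gc) <= gc_tolerance]
    if not matched:
        raise ValueError("no background bins with similar GC content")
    return statistics.fmean(event_depths) / statistics.median(matched)
```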

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

vcf tbi versions

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

tsv json versions

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.

EMM typing of Streptococcus pyogenes assemblies

tsv versions

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

vcf tbi tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Searches a term in a public NCBI database

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using unique identifiers (UIDs)

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using a UID

txt versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Estimation of the unfolded site frequency spectrum

sfs_out pvalues_out versions

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

dropset okayset versions

evigene:

EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.

Estimate repeat sizes using NGS data

vcf json bam versions

Merge STR profiles into a multi-sample STR profile

merged_profiles versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Compute genome-wide STR profile

locus_tsv motif_tsv str_profile versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

A program that counts sequence occurrences in FASTQ files.

count_matrix stats distribution_plot reads_plot reads_plot_percentage versions

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the FASTQ format, such as barcodes in Bar-seq experiments. 2FAST2Q can tolerate sequence mismatches, filter on Phred scores, and find and extract unknown sequences delimited by known sequences. It can extract multiple features per read using either fixed positions or delimiting search sequences.
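
Fixed-position feature counting with mismatch tolerance and a quality filter can be sketched as below. The sketch assumes that all features share one length, that quality strings are Phred+33 encoded, and that features start at a single fixed offset. It does not cover 2FAST2Q's search-sequence mode or its output files, and every name and default here is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def match_feature(segment: str, features: Dict[str, str], max_mismatches: int) -> Optional[str]:
    """Name of the unique feature within max_mismatches of segment, else None."""
    hits = [name for name, seq in features.items()
            if len(seq) == len(segment) and hamming(seq, segment) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def count_features(reads: Iterable[Tuple[str, str]], features: Dict[str, str], start: int,
                   max_mismatches: int = 1, min_phred: int = 30) -> Counter:
    """Count fixed-position features in (sequence, quality) reads.

    A read is skipped if the extracted segment is truncated or any of its
    bases has a Phred score (Phred+33 decoding) below min_phred.
    """
    length = len(next(iter(features.values())))
    counts: Counter = Counter()
    for seq, qual in reads:
        segment = seq[start:start + length]
        quals = [ord(c) - 33 for c in qual[start:start + length]]
        if len(segment) < length or min(quals) < min_phred:
            continue
        name = match_feature(segment, features, max_mismatches)
        if name is not None:
            counts[name] += 1
    return counts
```

Requiring a unique match means a segment within the mismatch limit of two different features is discarded rather than counted twice.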

Distance-based phylogeny with FastME

nwk stats matrix bootstrap versions

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Efficient compression tool for protein structures

fcz versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Decompression tool for foldcomp compressed structures

pdb versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Creates a database for Foldmason.

db versions

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Aligns protein structures using foldmason

msa_3di msa_aa versions

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Renders a visualization report using foldmason

html versions

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Create a database from protein structures

db versions

foldseek:

Foldseek: fast and accurate protein structure search

Search for protein structural hits against a foldseek database of protein structures

aln versions

foldseek:

Foldseek: fast and accurate protein structure search

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
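
The two-stage bootstrap resampling described above maps directly onto two multinomial draws. The NumPy sketch below produces one bootstrap replicate of per-site base frequencies. In Freyja, each replicate would then be demixed into lineage abundances, and that step is not shown. The input layout (a sites-by-bases count matrix) and the function name are assumptions.

```python
import numpy as np


def bootstrap_site_frequencies(base_counts: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One bootstrap replicate of per-site base frequencies (hypothetical helper).

    base_counts: (n_sites, n_bases) observed read counts per base at each site.
    Returns resampled base frequencies with the same shape; rows stay zero where
    the resampled depth is zero.
    """
    depth = base_counts.sum(axis=1)
    total = int(depth.sum())
    # Stage 1: redistribute all reads across sites in proportion to observed depth
    new_depth = rng.multinomial(total, depth / total)
    freqs = np.zeros(base_counts.shape, dtype=float)
    for i, n in enumerate(new_depth):
        if n == 0:
            continue  # also covers sites with zero observed depth (probability 0)
        # Stage 2: resample bases at this site (binomial when only two bases are seen)
        p = base_counts[i] / depth[i]
        freqs[i] = rng.multinomial(n, p) / n
    return freqs
```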

GangSTR is a tool for genome-wide profiling of tandem repeats from short reads.

vcf samplestats versions

Performs local realignment around indels to correct for mapping errors

bam versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

intervals versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

vcf versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Estimates the parameters for the DRAGstr model

dragstr_model versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers paired-end and split-read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals containing germline and artifactual sites for use with Mutect2.

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING: this tool is still experimental and should not be used in a production setting. Gathers paired-end and split-read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the location and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft-clipped reads and the orientation of the clipping.

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits CRAM files efficiently by taking advantage of their container based structure

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their CIGAR string

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
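
At its core, splitting at N operators means walking the CIGAR and starting a new segment whenever a skipped reference region is encountered. The Python sketch below returns each segment's reference start and sub-CIGAR. Unlike GATK SplitNCigarReads, it does not soft-clip the rest of the read in each piece, handle overhangs, or update mate or MAPQ fields. The function name is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases
REF_CONSUMING = set("MDN=X")


def split_n_cigar(ref_start: int, cigar: str) -> List[Tuple[int, str]]:
    """Split an alignment at each N (skipped-region) operator.

    Returns (reference start, sub-CIGAR) pairs, one per segment between N runs.
    """
    segments = []
    current: List[str] = []
    seg_start = pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N":
            if current:
                segments.append((seg_start, "".join(current)))
            current = []
            pos += length  # skip the intron on the reference
            seg_start = pos
            continue
        current.append(f"{length}{op}")
        if op in REF_CONSUMING:
            pos += length
    if current:
        segments.append((seg_start, "".join(current)))
    return segments
```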

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Quickly estimate coverage from a whole-genome BAM or CRAM index. A BAM index has 16 KB resolution, so that is the resolution of the output, but it provides what appears to be a high-quality coverage estimate in seconds per genome.

output ped bed bed_index roc html png versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Quickly generate evenly sized (by amount of data) regions across a number of BAM/CRAM files

bed versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
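
Splitting a region evenly by amount of data can be sketched as cutting a per-bin data profile at cumulative quantiles. In the Python sketch below, `bin_sizes` stands in for a per-bin data amount on one chromosome (for example, summed over the input files). How goleft actually derives these values from the indexes, and how it handles multiple chromosomes, is not reproduced. The function is hypothetical.

```python
from typing import List, Tuple


def even_regions(bin_sizes: List[int], bin_width: int, n_regions: int) -> List[Tuple[int, int]]:
    """Split consecutive bins into n_regions half-open [start, end) intervals
    holding roughly equal amounts of data, cutting only at bin boundaries.
    """
    total = sum(bin_sizes)
    target = total / n_regions
    regions: List[Tuple[int, int]] = []
    start, acc = 0, 0
    for i, size in enumerate(bin_sizes):
        acc += size
        # Close a region once the cumulative total reaches the next quantile boundary
        if acc >= target * (len(regions) + 1) and len(regions) < n_regions - 1:
            end = (i + 1) * bin_width
            regions.append((start, end))
            start = end
    regions.append((start, len(bin_sizes) * bin_width))  # remainder goes to the last region
    return regions
```

Regions are equal in data, not in length: a dense stretch of bins produces short regions and a sparse stretch produces long ones.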

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Collapse redundant transcript models in Iso-Seq data.

bed bed_trans_reads local_density_error polya read strand_check trans_report versions varcov variants

tama_collapse.py:

Collapse similar gene models

GenomeTools gt-gff3validator utility to strictly validate a GFF3 file

success_log error_log versions

gt:

The GenomeTools genome analysis system

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

gbk svg tsv versions

PacBio structural variant calling tool

vcf csv versions

Pre-compute the graph index structure.

graph versions

hlala:

HLA typing from short and long reads

Create a tag directory with the HOMER suite

tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Search covariance models against a sequence database

output alignments target_summary versions

infernal:

Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.

Strain-level comparisons across multiple inStrain profiles

compare comparisons_table pooled_snv snv_keys snv_info versions

instrain:

Calculation of strain-level metrics

inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification.

profile snvs gene_info genome_info linkage mapping_info scaffold_info versions

instrain:

Calculation of strain-level metrics

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. It can handle alignments the size of bacterial genomes.

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

Jointly Accurate Sv Merging with Intersample Network Edges

vcf versions

Construct KMCP database from k-mer files

kmcp log versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Bayesian reconstruction of ancient DNA fragments

bam fq_pass fq_fail unmerged_r1_fq_pass unmerged_r1_fq_fail unmerged_r2_fq_pass unmerged_r2_fq_fail log versions

LoFreq subcommand to remove variants with low coverage or potential strand bias

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions, which was the format used in Manta versions <= 1.4.0.

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

runtime_log fragmisincorporation_plot length_plot misincorporation lgdistribution dnacomp stats_out_mcmc_hist stats_out_mcmc_iter stats_out_mcmc_trace stats_out_mcmc_iter_summ_stat stats_out_mcmc_post_pred stats_out_mcmc_correct_prob dnacomp_genome rescaled pctot_freq pgtoa_freq fasta folder versions

Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free quantification. Paths to FASTA and raw files need to be marked by "PLACEHOLDER".

maxquant_txt versions

maxquant:

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.

Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.

tif versions

mcstaging:

Staging modules for MCMICRO

Strain-level metagenomic assignment

wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

A tool to estimate bacterial species abundance

results versions

midas:

An integrated pipeline for estimating strain-level genomic variation from metagenomic data

miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.

outputs versions

mirdeep2:

miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
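
Read collapsing, the first step described above, merges identical sequences and records how many times each occurred. The Python sketch below assumes reads are plain sequence strings. The `<prefix>_<index>_x<count>` header is modeled on miRDeep2's collapsed-read naming, but it should be treated as a hypothetical format rather than mapper.pl's exact output.

```python
from collections import Counter
from typing import Iterable, List


def collapse_reads(reads: Iterable[str], prefix: str = "seq") -> List[str]:
    """Collapse identical read sequences into FASTA lines carrying their counts.

    Sequences are upper-cased before comparison; empty lines are ignored.
    Records are ordered by decreasing count, with ties broken by sequence.
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = []
    for i, (seq, n) in enumerate(ordered):
        lines.append(f">{prefix}_{i}_x{n}")
        lines.append(seq)
    return lines
```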

A tool to reconstruct plasmids in bacterial assemblies

chromosome contig_report plasmids mobtyper_results versions

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

Detection of MSI regions with MSIsensor2.

msi distribution somatic versions

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE) and other sample types. The original MSIsensor is specifically designed for tumor/normal paired sequencing data.

Aligns protein structures using mTM-align

alignment structure versions

mTM-align:

Algorithm for structural multiple sequence alignments

pigz:

Parallel implementation of the gzip algorithm.

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

full_alignment info_txt snp_alignment snp_genome_alignment snpstatistics snptable snptable_snpeff snptable_uncertainty structure_genotypes structure_genotypes_nomissing json versions

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay that works with fluorescent barcodes. Each barcode is assigned to an mRNA/miRNA and can be counted after binding to its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.

normalized_counts normalized_counts_wo_HK versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay that works with fluorescent barcodes. Each barcode is assigned to an mRNA/miRNA and can be counted after binding to its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.

nacho_qc_reports nacho_qc_png nacho_qc_txt versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Construct a dynamic succinct variation graph in ODGI format from a GFAv1.

og versions

odgi:

An optimized dynamic genome/graph implementation

Calculates a distribution of the mass error from given mass spectra and IDs.

frag_err prec_err versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses
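
Mass errors are commonly expressed in parts per million relative to the theoretical m/z, ppm = (observed - theoretical) / theoretical × 10^6, and the resulting distribution is then summarized. The Python sketch below computes that quantity for already-matched observed/theoretical pairs. It does not model how the OpenMS tool matches peaks to identifications or whether it reports ppm or Da for a given setting. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def ppm_errors(observed: Sequence[float], theoretical: Sequence[float]) -> List[float]:
    """Relative mass errors in parts per million: (obs - theo) / theo * 1e6."""
    return [(o - t) / t * 1e6 for o, t in zip(observed, theoretical)]


def summarize(errors: Sequence[float]) -> Dict[str, float]:
    """Location and spread of a mass-error distribution."""
    return {
        "median": statistics.median(errors),
        "mean": statistics.fmean(errors),
        "stdev": statistics.stdev(errors) if len(errors) > 1 else 0.0,
    }
```

A median error far from zero points to a systematic calibration offset, while a wide spread around zero reflects random measurement noise.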

Assign restriction fragments to pairs

restrict versions

pairtools:

CLI tools to process mapped Hi-C data
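
Assigning a pair to restriction fragments reduces to a binary search of each read position among the sorted cut sites of its chromosome. The Python sketch below assumes that fragments run from one cut site to the next, with the chromosome ends as outer boundaries, and that both ends lie on the same chromosome. The function names are hypothetical, and the sketch does not reproduce the pairtools column layout.

```python
import bisect
from typing import List, Tuple

Fragment = Tuple[int, int, int]  # (fragment index, start, end)


def assign_fragment(pos: int, cut_sites: List[int], chrom_length: int) -> Fragment:
    """Return the restriction fragment containing pos.

    cut_sites must be sorted; fragments are [0, s0), [s0, s1), ..., [s_last, chrom_length).
    """
    idx = bisect.bisect_right(cut_sites, pos)
    start = cut_sites[idx - 1] if idx > 0 else 0
    end = cut_sites[idx] if idx < len(cut_sites) else chrom_length
    return idx, start, end


def annotate_pair(pos1: int, pos2: int, cut_sites: List[int],
                  chrom_length: int) -> Tuple[Fragment, Fragment, bool]:
    """Fragments for both ends of an intrachromosomal pair, plus whether they coincide."""
    f1 = assign_fragment(pos1, cut_sites, chrom_length)
    f2 = assign_fragment(pos2, cut_sites, chrom_length)
    return f1, f2, f1[0] == f2[0]
```

Pairs whose ends fall in the same fragment are typically treated as uninformative in Hi-C analysis, which is one reason to annotate fragments in the first place.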

Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file

tsv versions

panacus:

panacus is a tool for computing counting statistics for GFA files

Determines the depth in a BAM/CRAM file

depth binned_depth versions

paragraph:

Graph realignment tools for structural variants

Genotype structural variants using paragraph and grmpy

vcf json versions

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

graph versions

paragraph:

Graph realignment tools for structural variants

Assign PBP type of Streptococcus pneumoniae assemblies

tsv blast versions

pbsv/call - PacBio structural variant (SV) calling and analysis tools

vcf versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

pbsv - PacBio structural variant (SV) signature discovery tool

svsig versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Runs PEKA CLIP peak k-mer analysis

cluster distribution rtxn pdf tsites oxn clust versions

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.

results versions

Collect metrics about the insert size distribution of a paired-end library.

metrics histogram versions

picard:

Java tools for working with NGS data in the BAM format
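
Insert-size metrics can be derived from the TLEN field of properly paired reads. The Python sketch below computes the median, the median absolute deviation, and a histogram, counting each pair once by keeping only positive TLEN values (the two mates carry +x and -x). Picard applies additional read filters and splits results by pair orientation; neither is reproduced here, and the function name is hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable


def insert_size_metrics(tlens: Iterable[int]) -> Dict[str, object]:
    """Median, median absolute deviation (MAD), and histogram of insert sizes."""
    sizes = [t for t in tlens if t > 0]  # one value per pair
    if not sizes:
        raise ValueError("no positive template lengths")
    median = statistics.median(sizes)
    mad = statistics.median(abs(s - median) for s in sizes)
    return {"median": median, "mad": mad, "histogram": Counter(sizes)}
```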

Automatically improve draft assemblies and find variation among strains, including large event detection

improved_assembly vcf change_record tracks_bed tracks_wig versions

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-generation sequence data

bp cem del dd int_final inv li rp si td versions

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-generation sequence data

Main caller script for peak calling

divergent_TREs bidirectional_TREs unidirectional_TREs peakcalling_log versions

pints:

Peak Identifier for Nascent Transcripts Starts (PINTS)

Recodes plink bfiles into a new text fileset applying different modifiers

ped map txt raw traw beagledat chrdat chrmap geno pheno pos phase info lgen list gen gengz sample rlist strctin tped tfam vcf vcfgz versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

xml txt versions

Run all Portcullis steps in one go

010101

log pass_junctions_bed pass_junctions_tab intron_gff exon_gff spliced_bam spliced_bai versions

portcullis:

Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.

RAxML-NG is a phylogenetic tree inference tool which uses maximum-likelihood (ML) optimality criterion.

0

phylogeny phylogeny_bootstrapped versions

Quality control of riboseq bam data

01201

distribution pdf offset versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Infer strandedness from sequencing reads

010

txt versions

rseqc:

The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequencing data, especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

010

txt versions

rseqc:

The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequencing data, especially RNA-seq data.

Uses the RTN R package for transcriptional regulatory network inference (TNI).

01

tni tni_perm tni_bootstrap tni_filtered versions

rtn:

RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons

Use seqkit to find/replace strings within sequences and sequence headers

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
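The find/replace behaviour described above can be approximated in plain Python. This is a minimal sketch, not seqkit's implementation: it assumes the default mode edits headers, and the hypothetical `in_sequence` flag edits sequences instead (similar in spirit to seqkit's option for targeting sequences). The helper names `parse_fasta` and `replace_fasta` are illustrative only, and Python's `re` writes capture groups as `\1`, which may differ from seqkit's own replacement syntax.

```python
import re
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def replace_fasta(text: str, pattern: str, replacement: str,
                  in_sequence: bool = False) -> str:
    """Apply a regex substitution to every header (default) or every sequence."""
    regex = re.compile(pattern)
    out = []
    for header, seq in parse_fasta(text):
        if in_sequence:
            seq = regex.sub(replacement, seq)
        else:
            header = regex.sub(replacement, header)
        out.append(f">{header}\n{seq}")
    return "\n".join(out) + "\n"
```

For example, `replace_fasta(text, r"\s.+", "")` strips everything after the first whitespace in each header.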

PileupCaller is a tool to create genotype calls from BAM files using read-sampling methods

0100

eigenstrat plink freqsum versions

sequencetools:

Tools for population genetics on sequencing data

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

tsv txt versions

seroba:

SeroBA is a k-mer based pipeline to identify the serotype from Illumina NGS reads for given references.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

Serovar prediction of Salmonella assemblies

01

tsv allele_fasta allele_json cgmlst_csv versions

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

structural-variant calling with sniffles

012010100

vcf tbi snf versions

Rapidly extracts SNPs from a multi-FASTA alignment.

0

fasta constant_sites versions constant_sites_string
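Extracting SNPs from a multi-FASTA alignment comes down to a column scan. The sketch below assumes the alignment is already loaded into a dict of equal-length strings and uses a simplified rule for gaps and ambiguity codes that may not match snp-sites exactly. The `"A,C,G,T"` count string is modelled on what the `constant_sites_string` output appears to contain; that format is an assumption.

```python
from collections import Counter
from typing import Dict, Tuple


def snp_sites(alignment: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Return SNP-only sequences and an 'A,C,G,T' constant-site count string.

    A column is a SNP when it holds at least two distinct A/C/G/T bases; it is
    constant when every sequence carries the same A/C/G/T base. Columns with
    gaps or ambiguity codes plus a single base are counted as neither.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    variable_cols = []
    constant: Counter = Counter()
    for i in range(length):
        column = {s[i].upper() for s in seqs}
        bases = column & set("ACGT")
        if len(bases) > 1:
            variable_cols.append(i)
        elif len(column) == 1 and bases:
            constant[next(iter(bases))] += 1
    snps = {name: "".join(seq[i] for i in variable_cols)
            for name, seq in alignment.items()}
    constant_string = ",".join(str(constant[b]) for b in "ACGT")
    return snps, constant_string
```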

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

012010101

extract versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Serotype prediction of Streptococcus suis assemblies

01

tsv versions

Advanced sequence file format conversions

01000

cram gzi versions

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.

0101

vcf versions

Tandem repeat genotyper for long reads

012010101

vcf tbi versions

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

0123400

vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

01234567800

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Merges the annotation gtf file and the stringtie output gtf files

00

gtf versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Transcript assembly and quantification for RNA-Seq

010

transcript_gtf abundance coverage_gtf ballgown versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Converts a bedpe file to a VCF file (beta version)

01

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

0120000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering
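Filtering on size and ignored regions can be sketched as below. This is a minimal illustration, not the SURVIVOR code: records are assumed to be `(chrom, POS, SVLEN)` tuples, only the start position is tested against the ignore list, and `max_size=None` is this sketch's own convention for "no upper limit". SURVIVOR's handling of SVs that partially overlap an ignored region may differ.

```python
from typing import List, Optional, Tuple

SV = Tuple[str, int, int]       # (chrom, 1-based VCF POS, SVLEN)
Region = Tuple[str, int, int]   # BED-style (chrom, 0-based start, end)


def filter_svs(records: List[SV], min_size: int, max_size: Optional[int],
               ignore: List[Region]) -> List[SV]:
    """Keep SVs with min_size <= |SVLEN| <= max_size whose POS lies outside ignored regions."""
    kept = []
    for chrom, pos, svlen in records:
        size = abs(svlen)
        if size < min_size or (max_size is not None and size > max_size):
            continue
        # 1-based POS p falls in the half-open BED interval [start, end) when start < p <= end
        if any(c == chrom and start < pos <= end for c, start, end in ignore):
            continue
        kept.append((chrom, pos, svlen))
    return kept
```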

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

01010100

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01000

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

01234010101010101

sv indel germ_indel germ_sv som_indel som_sv unfiltered_sv unfiltered_indel unfiltered_germ_indel unfiltered_germ_sv unfiltered_som_indel unfiltered_som_sv raw_calls discordants log versions

SVbenchmark compares a set of "test" structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

0123450101

fns fps distances log report versions

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

Build a structural variant database

010

db versions

svdb:

structural variant database software

The merge module merges structural variants within one or more VCF files.

0100

vcf tbi csi versions

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

01000000

vcf versions

svdb:

structural variant database software

Performs tests on BAF files

01234

metrics versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.
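Counting SVTYPEs per sample needs only the INFO `SVTYPE` key and each sample's `GT` field. The sketch below is not svtk's implementation; it assumes a sample "carries" a record when any called allele in its genotype is non-reference, and it labels records without `SVTYPE` as a hypothetical `UNKNOWN` type.

```python
from collections import Counter
from typing import List


def count_svtypes(vcf_text: str) -> Counter:
    """Return a Counter keyed by (sample, SVTYPE) over non-reference genotypes."""
    samples: List[str] = []
    counts: Counter = Counter()
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        # INFO entries are key=value pairs or bare flags
        info = dict(kv.split("=", 1) if "=" in kv else (kv, True)
                    for kv in fields[7].split(";"))
        svtype = info.get("SVTYPE", "UNKNOWN")
        gt_idx = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            alleles = data.split(":")[gt_idx].replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[(sample, svtype)] += 1
    return counts
```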

Convert an RdTest-formatted bed to the standard VCF format.

0120

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

0101

vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert a VCF file to a BEDPE file.

01

bedpe versions

svtools:

Tools for processing and analyzing structural variants

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

0123

vcf tbi versions

Aligns sequences using T_COFFEE

01010120

alignment lib versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Computes a consensus alignment using T_COFFEE

01010

alignment eval versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats the header of PDB files with t-coffee

01

formatted_pdb versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Computes the irmsd score for a given alignment and the structures.

01012

irmsd versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequences

pigz:

Parallel implementation of the gzip algorithm.

Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package

01010120

alignment versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

01

formatted_file versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Computes the coverage of different regions from the bam file.

0101

cov wig versions

tiddit:

TIDDIT - structural variant calling.

Identify chromosomal rearrangements.

0120101

vcf ploidy versions

sv:

Search for structural variants.

Searches a genome for a telomere string such as TTAGGG

010

tsv bedgraph versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
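A windowed telomere-repeat search like the one described above can be sketched by counting the motif and its reverse complement in fixed windows. This is an illustration, not tidk's algorithm: the 10 kb default window is an arbitrary choice, and hits spanning a window boundary are missed.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomere_windows(seq: str, motif: str = "TTAGGG",
                     window: int = 10000) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, forward_count, reverse_count) for each window."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)  # TTAGGG -> CCCTAA
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # str.count counts non-overlapping occurrences within the window only
        rows.append((start, start + len(chunk), chunk.count(motif), chunk.count(rc)))
    return rows
```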

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

01

pep gff3 cds dat folder versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

010

pep gff3 cds bed versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Tandem repeat genotyping from PacBio HiFi data

0123010101

vcf bam versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Merge TRGT VCFs from multiple samples

0120101

vcf versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Visualize tandem repeats genotyped by TRGT

012345010101

plot versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Given baseline and comparison sets of variants, calculate recall, precision and F-measure

0123450101

fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary versions

truvari:

Structural variant comparison tool for VCFs
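The summary metrics follow from the matched and unmatched counts. This sketch assumes precision is computed from comparison-side true positives and recall from baseline-side true positives, which is how the separate `tp_base` and `tp_comp` outputs suggest truvari counts matches; treat that pairing as an assumption and check truvari's documentation before relying on it.

```python
from typing import Dict


def bench_metrics(tp_base: int, tp_comp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from benchmark counts (0.0 when undefined)."""
    precision = tp_comp / (tp_comp + fp) if tp_comp + fp else 0.0
    recall = tp_base / (tp_base + fn) if tp_base + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```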

Across multiple VCFs, calculate their intersection/consistency.

01

consistency versions

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjoint genomic regions

01

vcf versions

truvari:

Structural variant comparison tool for VCFs

Cluster contigs from multiple assemblies by similarity

012

cluster_dir versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Subsample a long-read sequencing fastq file for multiple assemblies

01

subreads versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Aligns protein structures using UPP

01010

alignment versions

upp:

SATe-enabled phylogenetic placement

To evaluate candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

010101

alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

010101010

cgh versions

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
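Regenerating AC and NS from genotypes is a per-record tally. The sketch below assumes AC is the count of each ALT allele across called genotypes and NS is the number of samples with at least one called allele. vcflib may define NS or handle missing data differently, so this is only an approximation of the module's behaviour.

```python
from typing import List, Tuple


def ac_ns(genotypes: List[str], n_alt: int) -> Tuple[List[int], int]:
    """Return (AC per ALT allele, NS) for one record's GT strings."""
    ac = [0] * n_alt
    ns = 0
    for gt in genotypes:
        # treat phased and unphased genotypes alike; '.' marks a missing allele
        called = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if called:
            ns += 1
        for allele in called:
            idx = int(allele)
            if idx > 0:
                ac[idx - 1] += 1
    return ac, ns
```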

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

0100

vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Indexes a variation graph, producing XG and vg index files

01

xg vg_index versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration-dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and "dot plot" files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end-of-file condition is encountered.
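The input convention described above (a `>` name line, the two sequences joined by `&`, and an optional terminating `@` line) can be produced with a small helper. The function name `cofold_input` is hypothetical; the sketch only builds the text and does not run RNAcofold.

```python
from typing import List, Tuple


def cofold_input(pairs: List[Tuple[str, str, str]]) -> str:
    """Build RNAcofold stdin text from (name, seq_a, seq_b) triples."""
    lines = []
    for name, seq_a, seq_b in pairs:
        lines.append(f">{name}")
        # the two molecules are concatenated with '&' to mark the chain break
        lines.append(f"{seq_a}&{seq_b}")
    lines.append("@")  # a line holding only '@' ends input
    return "\n".join(lines) + "\n"
```

If RNAcofold is installed, the resulting string could be passed as `input=` to `subprocess.run(["RNAcofold"], input=..., text=True, capture_output=True)`.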

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. Unless specified otherwise via command-line arguments, input is accepted from stdin or read from an input file, and output is printed to stdout. If the -p option is given, it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

calculate locally stable secondary structures of RNAs

0

rnalfold_txt versions

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structures with a maximal base pair span. For a sequence of length n and a base pair span of L, the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to "scan" very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

01

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

0101

bins membership summary versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Merge strictly identical sequences contained in the input file. Identical sequences are defined as having the same length and the same string of nucleotides (case-insensitive; T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
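The identity rule above (same length, same nucleotides, case-insensitive, T equal to U) translates into a normalised dictionary key. This sketch is in the spirit of vsearch's full-length dereplication but is not its implementation: it keeps the first-seen header and sequence as the representative and appends an abundance tag similar to the `;size=N` style vsearch writes when size output is requested.

```python
from typing import Dict, List, Tuple


def dereplicate(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse full-length identical sequences, keeping the first-seen header and sequence."""
    groups: Dict[str, List] = {}
    for name, seq in records:
        # identity is case-insensitive and treats U as T
        key = seq.upper().replace("U", "T")
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [name, 1, seq]
    # decreasing abundance; sorted() is stable, so ties keep first-seen order
    ordered = sorted(groups.values(), key=lambda g: -g[1])
    return [(f"{name};size={count}", seq) for name, count, seq in ordered]
```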

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

010

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

010

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
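The two sort orders can be illustrated with Python's stable `sorted`. This sketch assumes abundance is read from a `;size=N` header annotation and treats headers without one as abundance 1. vsearch's own tie-breaking rules are not reproduced here.

```python
import re
from typing import Callable, List, Tuple

SIZE_RE = re.compile(r";size=(\d+)")


def abundance(header: str) -> int:
    """Abundance from a ';size=N' annotation, defaulting to 1 when absent."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def sort_fasta(records: List[Tuple[str, str]], by: str = "size") -> List[Tuple[str, str]]:
    """Sort (header, sequence) pairs by decreasing abundance or decreasing length."""
    key: Callable[[Tuple[str, str]], int]
    if by == "size":
        def key(rec: Tuple[str, str]) -> int:
            return -abundance(rec[0])
    elif by == "length":
        def key(rec: Tuple[str, str]) -> int:
            return -len(rec[1])
    else:
        raise ValueError("by must be 'size' or 'length'")
    return sorted(records, key=key)
```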

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

010000

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

01200

vcf tbi graph versions

A large variant benchmarking tool analogous to hap.py for small variants.

01234

report bench_vcf bench_vcf_tbi versions

The xeniumranger import-segmentation module allows you to specify 2D nuclei and/or cell segmentation results for assigning transcripts to cells and to recalculate all Xenium Onboard Analysis (XOA) outputs that depend on segmentation. Segmentation results can be generated by community-developed tools or taken from a prior Xenium segmentation result.

01000000

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.

010

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

0100

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.

010000

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
