Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • assembly 28
  • quality control 24
  • bam 20
  • metagenomics 18
  • contamination 18
  • genomics 17
  • vcf 16
  • genome 16
  • fasta 15
  • qc 13
  • contigs 13
  • structural variants 11
  • binning 11
  • mags 11
  • fastq 10
  • gatk4 10
  • quality 10
  • long reads 10
  • alignment 9
  • bins 9
  • bed 8
  • cram 8
  • classification 8
  • taxonomic classification 8
  • completeness 8
  • metagenome 8
  • checkm 8
  • annotation 7
  • nanopore 7
  • cnv 7
  • somatic 7
  • mag 7
  • download 6
  • cna 6
  • single 6
  • tumor-only 6
  • database 5
  • coverage 5
  • kmer 5
  • QC 5
  • isolates 5
  • ont 5
  • fragment 5
  • gridss 5
  • index 4
  • sort 4
  • filter 4
  • bacteria 4
  • variants 4
  • transcriptome 4
  • virus 4
  • skani 4
  • cut 4
  • umitools 4
  • containment 4
  • ancestry 4
  • sam 3
  • align 3
  • merge 3
  • trimming 3
  • gvcf 3
  • isoseq 3
  • table 3
  • antimicrobial resistance 3
  • aDNA 3
  • population genetics 3
  • dedup 3
  • sketch 3
  • snp 3
  • prokaryote 3
  • NCBI 3
  • de novo 3
  • sourmash 3
  • de novo assembly 3
  • riboseq 3
  • family 3
  • fcs-gx 3
  • das tool 3
  • das_tool 3
  • proteome 3
  • long_read 3
  • chimeras 3
  • minimap2 3
  • amplicon sequencing 3
  • checkv 3
  • informative sites 3
  • kinship 3
  • identity 3
  • relatedness 3
  • cut up 3
  • observations 3
  • uLTRA 3
  • map 2
  • taxonomic profiling 2
  • pacbio 2
  • conversion 2
  • ancient DNA 2
  • copy number 2
  • rnaseq 2
  • phylogeny 2
  • methylation 2
  • compression 2
  • long-read 2
  • consensus 2
  • depth 2
  • DNA methylation 2
  • scWGBS 2
  • WGBS 2
  • haplotype 2
  • filtering 2
  • amr 2
  • cluster 2
  • bisulfite sequencing 2
  • aligner 2
  • archaeogenomics 2
  • damage 2
  • phasing 2
  • palaeogenomics 2
  • sequence 2
  • validation 2
  • biscuit 2
  • segmentation 2
  • plasmid 2
  • antimicrobial peptides 2
  • deduplication 2
  • antimicrobial resistance genes 2
  • distance 2
  • FASTQ 2
  • arg 2
  • adapters 2
  • microbiome 2
  • bedpe 2
  • malt 2
  • preprocessing 2
  • ngscheckmate 2
  • matching 2
  • bacterial 2
  • dictionary 2
  • RNA 2
  • rna_structure 2
  • microbes 2
  • somatic variants 2
  • intervals 2
  • PacBio 2
  • miRNA 2
  • dist 2
  • lossless 2
  • GC content 2
  • megan 2
  • nanostring 2
  • nacho 2
  • mRNA 2
  • signature 2
  • FracMinHash sketch 2
  • hostile 2
  • decontamination 2
  • human removal 2
  • screening 2
  • removal 2
  • cleaning 2
  • contig 2
  • scaffold 2
  • filtermutectcalls 2
  • single cells 2
  • genome bins 2
  • metagenomes 2
  • statistics 1
  • classify 1
  • split 1
  • k-mer 1
  • variant 1
  • taxonomy 1
  • sentieon 1
  • convert 1
  • proteomics 1
  • clustering 1
  • bedtools 1
  • build 1
  • bisulfite 1
  • reporting 1
  • wgs 1
  • bisulphite 1
  • methylseq 1
  • picard 1
  • illumina 1
  • stats 1
  • phage 1
  • sequences 1
  • imaging 1
  • 5mC 1
  • metrics 1
  • mapping 1
  • pairs 1
  • samtools 1
  • matrix 1
  • expression 1
  • transcript 1
  • bcf 1
  • germline 1
  • annotate 1
  • gene 1
  • decompression 1
  • ncbi 1
  • gff3 1
  • spatial 1
  • newick 1
  • umi 1
  • evaluation 1
  • bismark 1
  • hmmsearch 1
  • reads 1
  • json 1
  • mitochondria 1
  • differential 1
  • bedGraph 1
  • short-read 1
  • prediction 1
  • splicing 1
  • vsearch 1
  • extract 1
  • benchmark 1
  • deamination 1
  • visualization 1
  • cat 1
  • amps 1
  • tabular 1
  • detection 1
  • text 1
  • mutect2 1
  • summary 1
  • counts 1
  • svtk 1
  • structural 1
  • antibiotic resistance 1
  • compare 1
  • profiling 1
  • reference-free 1
  • genome assembler 1
  • fai 1
  • ampir 1
  • dna 1
  • diamond 1
  • normalization 1
  • compress 1
  • hic 1
  • deep learning 1
  • pypgx 1
  • enrichment 1
  • happy 1
  • HiFi 1
  • hmmcopy 1
  • image 1
  • parsing 1
  • clean 1
  • xeniumranger 1
  • SV 1
  • mtDNA 1
  • sample 1
  • abundance 1
  • sequencing 1
  • snps 1
  • deeparg 1
  • macrel 1
  • mlst 1
  • amplify 1
  • DRAMP 1
  • angsd 1
  • UMI 1
  • rsem 1
  • mapper 1
  • genome mining 1
  • RNA-seq 1
  • neubi 1
  • seqtk 1
  • amplicon sequences 1
  • host 1
  • transcripts 1
  • genome assembly 1
  • mzml 1
  • variant_calling 1
  • shapeit 1
  • khmer 1
  • png 1
  • screen 1
  • minhash 1
  • maximum likelihood 1
  • k-mer frequency 1
  • barcode 1
  • cgMLST 1
  • variation 1
  • ampgram 1
  • amptransformer 1
  • WGS 1
  • image_processing 1
  • dereplicate 1
  • de novo assembler 1
  • small genome 1
  • functional analysis 1
  • rrna 1
  • salmon 1
  • tnhaplotyper2 1
  • mitochondrion 1
  • metamaps 1
  • adapter 1
  • gene labels 1
  • sequenzautils 1
  • rename 1
  • smrnaseq 1
  • fixmate 1
  • dict 1
  • pharmacogenetics 1
  • assembly evaluation 1
  • short reads 1
  • xenograft 1
  • graft 1
  • fetch 1
  • metagenomic 1
  • identifier 1
  • tab 1
  • gatk 1
  • gwas 1
  • BAM 1
  • correction 1
  • calling 1
  • estimation 1
  • recombination 1
  • eigenstrat 1
  • vector 1
  • tnseq 1
  • raw 1
  • mgf 1
  • parquet 1
  • parser 1
  • GFF/GTF 1
  • assay 1
  • verifybamid 1
  • DNA contamination estimation 1
  • hifi 1
  • Assembly 1
  • quality assurnce 1
  • qa 1
  • long read alignment 1
  • genome polishing 1
  • assembly polishing 1
  • chloroplast 1
  • pangenome-scale 1
  • all versus all 1
  • mashmap 1
  • wavefront 1
  • admixture 1
  • taxonomic composition 1
  • microRNA 1
  • prepare 1
  • catpack 1
  • vsearch/dereplicate 1
  • drug categorization 1
  • nuclear contamination estimate 1
  • metagenome assembler 1
  • regtools 1
  • leafcutter 1
  • reference panels 1
  • quality_control 1
  • nucBed 1
  • AT content 1
  • nucleotide content 1
  • controlstatistics 1
  • emoji 1
  • host removal 1
  • omics 1
  • biological activity 1
  • prior knowledge 1
  • mouse 1
  • nanopore sequencing 1
  • cobra 1
  • predict 1
  • case/control 1
  • clahe 1
  • association 1
  • GWAS 1
  • extension 1
  • cram-size 1
  • single-stranded 1
  • ancientDNA 1
  • size 1
  • authentict 1
  • translation 1
  • contiguate 1
  • MMseqs2 1
  • InterProScan 1
  • busco 1
  • antimicrobial reistance 1
  • maxbin2 1
  • getpileupsummaries 1
  • metagenome-assembled genomes 1
  • cross-samplecontamination 1
  • calculatecontamination 1
  • megahit 1
  • taxonomic assignment 1
  • denovo 1
  • debruijn 1
  • daa 1
  • rma6 1
  • 3D heat map 1
  • contour map 1
  • Merqury 1
  • annotateintervals 1
  • AMP 1
  • determinegermlinecontigploidy 1
  • peptide prediction 1
  • collectsvevidence 1
  • contaminant 1
  • cancer genome 1
  • somatic structural variations 1
  • mobile element insertions 1
  • sequencing summary 1
  • block-compressed 1
  • assembler 1
  • de Bruijn 1
  • random 1
  • generate 1
  • mitochondrial 1
  • bgc 1
  • haplotype resolution 1
  • svannotate 1
  • gccounter 1
  • splitcram 1
  • repeat content 1
  • genome heterozygosity 1
  • genome size 1
  • gunc 1
  • hmtnote 1
  • readcountssummary 1
  • getpileupsumaries 1
  • germlinevariantsites 1
  • germlinecnvcaller 1
  • germline contig ploidy 1
  • pixel_classification 1
  • multicut 1
  • pixel classification 1
  • probability_maps 1
  • printsvevidence 1
  • bam2seqz 1
  • rare variants 1
  • insert size 1
  • repair 1
  • paired 1
  • read pairs 1
  • vcf2bed 1
  • fracminhash sketch 1
  • features 1
  • subcontigs 1
  • nucleotide composition 1
  • concoct 1
  • duplicate marking 1
  • ARGs 1
  • antibiotic resistance genes 1
  • faqcs 1
  • cache 1
  • porechop_abi 1
  • pmdtools 1
  • depth information 1
  • structural variation 1
  • duphold 1
  • rtg 1
  • quast 1
  • contact 1
  • pretext 1
  • jpg 1
  • bmp 1
  • contact maps 1
  • eigenstratdatabasetools 1
  • reference 0
  • variant calling 0
  • gff 0
  • gtf 0
  • MSA 0
  • gfa 0
  • count 0
  • VCF 0
  • single-cell 0
  • imputation 0
  • graph 0
  • bcftools 0
  • sv 0
  • variation graph 0
  • indexing 0
  • visualisation 0
  • databases 0
  • protein 0
  • bqsr 0
  • tsv 0
  • serotype 0
  • demultiplex 0
  • openms 0
  • markduplicates 0
  • base quality score recalibration 0
  • protein sequence 0
  • repeat 0
  • histogram 0
  • searching 0
  • example 0
  • structure 0
  • pangenome graph 0
  • plot 0
  • neural network 0
  • mappability 0
  • LAST 0
  • bwa 0
  • plink2 0
  • low-coverage 0
  • machine learning 0
  • genotype 0
  • seqkit 0
  • cooler 0
  • gzip 0
  • iCLIP 0
  • mmseqs2 0
  • db 0
  • hmmer 0
  • ucsc 0
  • complexity 0
  • feature 0
  • genotyping 0
  • peaks 0
  • kraken2 0
  • msa 0
  • blast 0
  • mkref 0
  • glimpse 0
  • pangenome 0
  • demultiplexing 0
  • cnvkit 0
  • profile 0
  • report 0
  • multiple sequence alignment 0
  • low frequency variant calling 0
  • kmers 0
  • scRNA-seq 0
  • duplicates 0
  • mirna 0
  • ptr 0
  • diversity 0
  • mem 0
  • concatenate 0
  • interval 0
  • single cell 0
  • fastx 0
  • csv 0
  • kallisto 0
  • call 0
  • MAF 0
  • indels 0
  • coptr 0
  • wxs 0
  • idXML 0
  • mpileup 0
  • 3-letter genome 0
  • clipping 0
  • merging 0
  • query 0
  • view 0
  • ccs 0
  • bigwig 0
  • read depth 0
  • fungi 0
  • peak-calling 0
  • CLIP 0
  • circrna 0
  • rna 0
  • microarray 0
  • bin 0
  • ganon 0
  • ATAC-seq 0
  • add 0
  • microsatellite 0
  • union 0
  • retrotransposon 0
  • miscoding lesions 0
  • isomir 0
  • palaeogenetics 0
  • archaeogenetics 0
  • bgzip 0
  • telomere 0
  • interval_list 0
  • paf 0
  • redundancy 0
  • haplotypecaller 0
  • resistance 0
  • HMM 0
  • chromosome 0
  • gsea 0
  • logratio 0
  • STR 0
  • hybrid capture sequencing 0
  • copy number alteration calling 0
  • chunk 0
  • biosynthetic gene cluster 0
  • bcl2fastq 0
  • propr 0
  • DNA sequencing 0
  • quantification 0
  • BGC 0
  • public datasets 0
  • ranking 0
  • phylogenetic placement 0
  • targeted sequencing 0
  • genmod 0
  • transcriptomics 0
  • DNA sequence 0
  • bedgraph 0
  • fgbio 0
  • arriba 0
  • fastk 0
  • spark 0
  • html 0
  • structural_variants 0
  • C to T 0
  • insert 0
  • fam 0
  • bim 0
  • fusion 0
  • SNP 0
  • small indels 0
  • subsample 0
  • pangolin 0
  • panel 0
  • pan-genome 0
  • pairsam 0
  • duplication 0
  • prokaryotes 0
  • replace 0
  • covid 0
  • benchmarking 0
  • lineage 0
  • polishing 0
  • indel 0
  • PCA 0
  • fingerprint 0
  • prokka 0
  • regions 0
  • typing 0
  • genomes 0
  • entrez 0
  • eukaryotes 0
  • scores 0
  • mcmicro 0
  • aln 0
  • bwameth 0
  • npz 0
  • windowmasker 0
  • hi-c 0
  • bakta 0
  • vrhyme 0
  • nucleotide 0
  • highly_multiplexed_imaging 0
  • mkfastq 0
  • image_analysis 0
  • cellranger 0
  • gene expression 0
  • zip 0
  • unzip 0
  • uncompress 0
  • untar 0
  • mask 0
  • kraken 0
  • guide tree 0
  • transposons 0
  • complement 0
  • roh 0
  • organelle 0
  • remove 0
  • converter 0
  • gatk4spark 0
  • comparisons 0
  • combine 0
  • comparison 0
  • quality trimming 0
  • score 0
  • adapter trimming 0
  • popscle 0
  • pileup 0
  • genotype-based deconvoltion 0
  • bamtools 0
  • bracken 0
  • hidden Markov model 0
  • archiving 0
  • sylph 0
  • notebook 0
  • reports 0
  • ataqv 0
  • repeat expansion 0
  • virulence 0
  • krona chart 0
  • survivor 0
  • cool 0
  • pseudoalignment 0
  • dump 0
  • CRISPR 0
  • krona 0
  • prefetch 0
  • spaceranger 0
  • wastewater 0
  • wig 0
  • atac-seq 0
  • tabix 0
  • ambient RNA removal 0
  • chip-seq 0
  • ligate 0
  • population genomics 0
  • cfDNA 0
  • gstama 0
  • profiles 0
  • ichorcna 0
  • mash 0
  • tama 0
  • pigz 0
  • bustools 0
  • refine 0
  • resolve_bioscience 0
  • gene set 0
  • trancriptome 0
  • gene set analysis 0
  • spatial_transcriptomics 0
  • lofreq 0
  • krakentools 0
  • phase 0
  • haplotypes 0
  • split_kmers 0
  • interactive 0
  • reformat 0
  • serogroup 0
  • polyA_tail 0
  • hla 0
  • primer 0
  • hlala 0
  • hla_typing 0
  • hlala_typing 0
  • iphop 0
  • checksum 0
  • corrupted 0
  • tree 0
  • mapcounter 0
  • haplogroups 0
  • find 0
  • krakenuniq 0
  • instrain 0
  • pair 0
  • long terminal repeat 0
  • trgt 0
  • regression 0
  • taxids 0
  • SimpleAF 0
  • taxon name 0
  • zlib 0
  • differential expression 0
  • vg 0
  • vcflib 0
  • orthologs 0
  • taxon tables 0
  • otu tables 0
  • standardisation 0
  • standardise 0
  • standardization 0
  • repeats 0
  • svdb 0
  • ome-tif 0
  • MCMICRO 0
  • interactions 0
  • join 0
  • reformatting 0
  • function 0
  • pharokka 0
  • bloom filter 0
  • k-mer index 0
  • COBS 0
  • archive 0
  • xz 0
  • mudskipper 0
  • long terminal retrotransposon 0
  • transcriptomic 0
  • kma 0
  • parallelized 0
  • orthology 0
  • genetics 0
  • rgfa 0
  • small variants 0
  • multiallelic 0
  • nucleotides 0
  • cnvnator 0
  • proportionality 0
  • orf 0
  • leviosam2 0
  • lift 0
  • registration 0
  • mirdeep2 0
  • cancer genomics 0
  • homoploymer 0
  • ped 0
  • Duplication purging 0
  • purge duplications 0
  • library 0
  • preseq 0
  • import 0
  • doublets 0
  • variant pruning 0
  • anndata 0
  • bfiles 0
  • subset 0
  • read-group 0
  • duplicate 0
  • GPU-accelerated 0
  • graph layout 0
  • nextclade 0
  • msisensor-pro 0
  • micro-satellite-scan 0
  • tumor 0
  • msi 0
  • instability 0
  • MSI 0
  • Read depth 0
  • RNA sequencing 0
  • soft-clipped clusters 0
  • snpsift 0
  • snpeff 0
  • effect prediction 0
  • shigella 0
  • switch 0
  • ancient dna 0
  • Streptococcus pneumoniae 0
  • transformation 0
  • salmonella 0
  • varcal 0
  • fusions 0
  • Pharmacogenetics 0
  • retrotransposons 0
  • collate 0
  • bam2fq 0
  • frame-shift correction 0
  • long-read sequencing 0
  • scaffolding 0
  • rtgtools 0
  • sequence analysis 0
  • junctions 0
  • runs_of_homozygosity 0
  • polish 0
  • taxonomic profile 0
  • concordance 0
  • duplex 0
  • deconvolution 0
  • bayesian 0
  • merge mate pairs 0
  • reads merging 0
  • unaligned 0
  • realignment 0
  • GEO 0
  • trim 0
  • microscopy 0
  • expansionhunterdenovo 0
  • repeat_expansions 0
  • metadata 0
  • microbial 0
  • allele-specific 0
  • emboss 0
  • panelofnormals 0
  • MaltExtract 0
  • HOPS 0
  • authentication 0
  • edit distance 0
  • joint genotyping 0
  • secondary metabolites 0
  • NRPS 0
  • RiPP 0
  • interval list 0
  • evidence 0
  • antibiotics 0
  • antismash 0
  • RNA-Seq 0
  • simulate 0
  • artic 0
  • aggregate 0
  • demultiplexed reads 0
  • concat 0
  • tbi 0
  • CNV 0
  • sra-tools 0
  • settings 0
  • blastn 0
  • version 0
  • cnv calling 0
  • immunoprofiling 0
  • structural-variant calling 0
  • cvnkit 0
  • vdj 0
  • eCLIP 0
  • splice 0
  • parse 0
  • fasterq-dump 0
  • awk 0
  • intersect 0
  • intersection 0
  • normalize 0
  • norm 0
  • scatter 0
  • reheader 0
  • validate 0
  • samplesheet 0
  • format 0
  • eido 0
  • windows 0
  • blastp 0
  • deseq2 0
  • rna-seq 0
  • region 0
  • heatmap 0
  • sizes 0
  • bases 0
  • spatial_omics 0
  • random forest 0
  • allele 0
  • UMIs 0
  • gem 0
  • ChIP-seq 0
  • baf 0
  • genomad 0
  • getfasta 0
  • derived alleles 0
  • tnfilter 0
  • covariance model 0
  • dereplication 0
  • microbial genomics 0
  • jaccard 0
  • overlap 0
  • array_cgh 0
  • cytosure 0
  • decomposeblocksub 0
  • ancestral alleles 0
  • gprofiler2 0
  • gost 0
  • genomecov 0
  • closest 0
  • rad 0
  • bamtobed 0
  • sorting 0
  • structural variant 0
  • bam2fastx 0
  • bam2fastq 0
  • immcantation 0
  • airrseq 0
  • site frequency spectrum 0
  • immunoinformatics 0
  • f coefficient 0
  • bioawk 0
  • unionBedGraphs 0
  • reverse complement 0
  • simulation 0
  • hmmfetch 0
  • decompose 0
  • pca 0
  • pruning 0
  • subtract 0
  • linkage equilibrium 0
  • slopBed 0
  • transmembrane 0
  • genome graph 0
  • chunking 0
  • homozygous genotypes 0
  • decoy 0
  • heterozygous genotypes 0
  • htseq 0
  • inbreeding 0
  • shiftBed 0
  • multinterval 0
  • sompy 0
  • overlapped bed 0
  • maskfasta 0
  • peak picking 0
  • drep 0
  • homology 0
  • co-orthology 0
  • clumping fastqs 0
  • deduping 0
  • plastid 0
  • smaller fastqs 0
  • resfinder 0
  • resistance genes 0
  • dbsnp 0
  • standardize 0
  • quarto 0
  • masking 0
  • python 0
  • r 0
  • low-complexity 0
  • coexpression 0
  • correlation 0
  • corpcor 0
  • trio binning 0
  • tandem repeats 0
  • phylogenetics 0
  • minimum_evolution 0
  • parallel 0
  • csi 0
  • Read coverage histogram 0
  • biallelic 0
  • sequence similarity 0
  • spectral clustering 0
  • agat 0
  • longest 0
  • comparative genomics 0
  • isoform 0
  • autozygosity 0
  • homozygosity 0
  • deep variant 0
  • variancepartition 0
  • mutect 0
  • idx 0
  • update header 0
  • intron 0
  • dream 0
  • md 0
  • transform 0
  • gaps 0
  • introns 0
  • nm 0
  • uq 0
  • install 0
  • joint-genotyping 0
  • genotypegvcf 0
  • BCF 0
  • short 0
  • file manipulation 0
  • plink2_pca 0
  • propd 0
  • vcf2db 0
  • gemini 0
  • melon 0
  • maf 0
  • lua 0
  • toml 0
  • plant 0
  • vcfbreakmulti 0
  • uniq 0
  • deduplicate 0
  • SINE 0
  • VCFtools 0
  • network 0
  • downsample bam 0
  • wget 0
  • mkvdjref 0
  • construct 0
  • graph projection to vcf 0
  • cellpose 0
  • extractunbinned 0
  • linkbins 0
  • sintax 0
  • vsearch/sort 0
  • subsample bam 0
  • downsample 0
  • usearch 0
  • unmarkduplicates 0
  • bedtobigbed 0
  • genepred 0
  • refflat 0
  • gtftogenepred 0
  • ucsc/liftover 0
  • chromap 0
  • mobile genetic elements 0
  • genome annotation 0
  • trna 0
  • covariance models 0
  • umicollapse 0
  • snv 0
  • scanner 0
  • scRNA-Seq 0
  • crispr 0
  • antibody capture 0
  • files 0
  • antigen capture 0
  • helitron 0
  • multiomics 0
  • remove samples 0
  • upd 0
  • uniparental 0
  • disomy 0
  • domains 0
  • nucleotide sequence 0
  • tnscope 0
  • copyratios 0
  • comp 0
  • denoisereadcounts 0
  • readwriter 0
  • dnamodelapply 0
  • dnascope 0
  • tblastn 0
  • bedcov 0
  • groupby 0
  • genotype dosages 0
  • vcf file 0
  • postprocessing 0
  • bgen 0
  • subtyping 0
  • confidence 0
  • blat 0
  • alr 0
  • clr 0
  • Salmonella enterica 0
  • boxcox 0
  • sorted 0
  • bgen file 0
  • Escherichia coli 0
  • createreadcountpanelofnormals 0
  • workflow_mode 0
  • yahs 0
  • whamg 0
  • wham 0
  • compartments 0
  • copy-number 0
  • copy number analysis 0
  • gender determination 0
  • topology 0
  • copy number alterations 0
  • copy number variation 0
  • geo 0
  • workflow 0
  • mapad 0
  • adna 0
  • c to t 0
  • cumulative coverage 0
  • proteus 0
  • readproteingroups 0
  • calder2 0
  • eigenvectors 0
  • hicPCA 0
  • sliding 0
  • cadd 0
  • snakemake 0
  • distance-based 0
  • long read 0
  • homologs 0
  • telseq 0
  • mzML 0
  • multiqc 0
  • mass_error 0
  • search engine 0
  • poolseq 0
  • variant-calling 0
  • stardist 0
  • Staging 0
  • vsearch/fastqfilter 0
  • fastqfilter 0
  • ATACseq 0
  • shift 0
  • ATACshift 0
  • http(s) 0
  • utility 0
  • setgt 0
  • jvarkit 0
  • translate 0
  • tar 0
  • tarball 0
  • adapterremoval 0
  • CRISPRi 0
  • HLA 0
  • tag2tag 0
  • nanoq 0
  • Read filters 0
  • Read trimming 0
  • Read report 0
  • hhsuite 0
  • ATLAS 0
  • uniques 0
  • Illumina 0
  • functional 0
  • impute-info 0
  • tags 0
  • sequencing_bias 0
  • mkarv 0
  • hashing-based deconvolution 0
  • rank 0
  • 16S 0
  • java 0
  • script 0
  • post mortem damage 0
  • xml 0
  • svg 0
  • standard 0
  • haplotag 0
  • atlas 0
  • staging 0
  • targz 0
  • Computational Immunology 0
  • bias 0
  • scanpy 0
  • resegment 0
  • morphology 0
  • fix 0
  • post Post-processing 0
  • malformed 0
  • partitioning 0
  • chip 0
  • updatedata 0
  • run 0
  • model 0
  • AMPs 0
  • allele counts 0
  • antimicrobial peptide prediction 0
  • plotting 0
  • amp 0
  • pdb 0
  • recovery 0
  • mgi 0
  • Staphylococcus aureus 0
  • affy 0
  • block substitutions 0
  • relabel 0
  • cell segmentation 0
  • Bioinformatics Tools 0
  • bclconvert 0
  • Immune Deconvolution 0
  • elfasta 0
  • elprep 0
  • doublet 0
  • patterns 0
  • source tracking 0
  • regex 0
  • nuclear segmentation 0
  • paired reads re-pairing 0
  • installation 0
  • doublet_detection 0
  • barcodes 0
  • doCounts 0
  • subsetting 0
  • logFC 0
  • significance statistic 0
  • p-value 0
  • scvi 0
  • solo 0
  • import segmentation 0
  • redundant 0
  • hmmpress 0
  • identity-by-descent 0
  • go 0
  • scimap 0
  • Bayesian 0
  • structural-variants 0
  • bamtools/split 0
  • tag 0
  • cell_barcodes 0
  • haploype 0
  • mygene 0
  • yaml 0
  • associations 0
  • impute 0
  • bedgraphtobigwig 0
  • bamtools/convert 0
  • reference compression 0
  • pile up 0
  • reference panel 0
  • bacphlip 0
  • virulent 0
  • rna velocity 0
  • spatial_neighborhoods 0
  • Indel 0
  • grea 0
  • seqfu 0
  • multi-tool 0
  • background_correction 0
  • illumiation_correction 0
  • hardy-weinberg 0
  • hwe statistics 0
  • hwe equilibrium 0
  • reference-independent 0
  • genotype likelihood 0
  • collapse 0
  • liftover 0
  • probabilistic realignment 0
  • n50 0
  • cell_type_identification 0
  • cell_phenotyping 0
  • machine_learning 0
  • element 0
  • trimBam 0
  • bamUtil 0
  • shuffleBed 0
  • SNV 0
  • refresh 0
  • temperate 0
  • read group 0
  • bwamem2 0
  • bwameme 0
  • grabix 0
  • ribosomal 0
  • 10x 0
  • background 0
  • regulatory network 0
  • transcription factors 0
  • paraphase 0
  • selector 0
  • Pacbio 0
  • quality check 0
  • realign 0
  • circular 0
  • phylogenies 0
  • hmmscan 0
  • spot 0
  • orthogroup 0
  • sage 0
  • mass spectrometry 0
  • featuretable 0
  • extraction 0
  • guidetree 0
  • AC/NS/AF 0
  • functional enrichment 0
  • autofluorescence 0
  • paired reads merging 0
  • overlap-based merging 0
  • check 0
  • lifestyle 0
  • hamming-distance 0
  • hashing-based deconvoltion 0
  • gnu 0
  • coreutils 0
  • generic 0
  • transposable element 0
  • retrieval 0
  • cycif 0
  • vcflib/vcffixup 0
  • junction 0
  • droplet based single cells 0
  • lexogen 0
  • genotype-based demultiplexing 0
  • donor deconvolution 0
  • cellsnp 0
  • trimfq 0
  • bigbed 0
  • cmseq 0
  • duplicate removal 0
  • bedtointervallist 0
  • mash/sketch 0
  • calibratedragstrmodel 0
  • reduced 0
  • representations 0
  • mass-spectroscopy 0
  • mcr-1 0
  • MD5 0
  • 128 bit 0
  • asereadcounter 0
  • Neisseria meningitidis 0
  • vqsr 0
  • variant quality score recalibration 0
  • targets 0
  • cnnscorevariants 0
  • collectreadcounts 0
  • ploidy 0
  • collapsing 0
  • legionella 0
  • clinical 0
  • pneumophila 0
  • createsomaticpanelofnormals 0
  • limma 0
  • Listeria monocytogenes 0
  • createsequencedictionary 0
  • condensedepthevidence 0
  • lofreq/call 0
  • lofreq/filter 0
  • qualities 0
  • estimate 0
  • dragstr 0
  • functional genomics 0
  • sgRNA 0
  • CRISPR-Cas9 0
  • maximum-likelihood 0
  • rra 0
  • composestrtablefile 0
  • short variant discovery 0
  • combinegvcfs 0
  • DNA damage 0
  • NGS 0
  • damage patterns 0
  • smudgeplot 0
  • unionsum 0
  • train 0
  • graph drawing 0
  • SNP table 0
  • single molecule 0
  • NextGenMap 0
  • ngm 0
  • Neisseria gonorrhoeae 0
  • gender 0
  • zipperbams 0
  • graph construction 0
  • ubam 0
  • Beautiful stand-alone HTML report 0
  • squeeze 0
  • odgi 0
  • combine graphs 0
  • graph stats 0
  • graph unchopping 0
  • graph formats 0
  • graph viz 0
  • tumor/normal 0
  • hla-typing 0
  • ILP 0
  • HLA-I 0
  • unmapped 0
  • GATK UnifiedGenotyper 0
  • bioinformatics tools 0
  • metaphlan 0
  • bootstrapping 0
  • methylation bias 0
  • mbias 0
  • heattree 0
  • gangstr 0
  • microrna 0
  • gene-calling 0
  • target prediction 0
  • mitochondrial genome 0
  • reference genome 0
  • gamma 0
  • UShER 0
  • mosdepth 0
  • mitochondrial to nuclear ratio 0
  • otu table 0
  • bacterial variant calling 0
  • germline variant calling 0
  • somatic variant calling 0
  • variant caller 0
  • rust 0
  • microsatellite instability 0
  • fq 0
  • lint 0
  • scan 0
  • mtnucratio 0
  • ratio 0
  • adapter removal 0
  • spliced 0
  • flip 0
  • txt 0
  • abricate 0
  • amrfinderplus 0
  • fARGene 0
  • rgi 0
  • ibd 0
  • hbd 0
  • beagle 0
  • genome profile 0
  • Haemophilus influenzae 0
  • file parsing 0
  • gawk 0
  • extractvariants 0
  • variantrecalibrator 0
  • recalibration model 0
  • variantfiltration 0
  • svcluster 0
  • splitintervals 0
  • readcounter 0
  • site depth 0
  • HMMER 0
  • amino acid 0
  • shiftintervals 0
  • compound 0
  • extract_variants 0
  • Hidden Markov Model 0
  • gene model 0
  • Haplotypes 0
  • Imputation 0
  • joint-variant-calling 0
  • GNU 0
  • merge compare 0
  • genomes on a tree 0
  • low coverage 0
  • gget 0
  • genome statistics 0
  • genome manipulation 0
  • genome summary 0
  • tama_collapse.py 0
  • gfastats 0
  • TAMA 0
  • gvcftools 0
  • Mykrobe 0
  • gstama/merge 0
  • Salmonella Typhi 0
  • gstama/polyacleanup 0
  • GTDB taxonomy 0
  • genome taxonomy database 0
  • archaea 0
  • gunzip 0
  • models 0
  • shiftfasta 0
  • reorder 0
  • Klebsiella 0
  • readorientationartifacts 0
  • learnreadorientationmodel 0
  • indexfeaturefile 0
  • kallisto/index 0
  • quant 0
  • digital normalization 0
  • k-mer counting 0
  • effective genome size 0
  • pneumoniae 0
  • jupytext 0
  • panelofnormalscreation 0
  • kegg 0
  • kofamscan 0
  • jointgenotyping 0
  • combining 0
  • genomicsdbimport 0
  • genomicsdb 0
  • gatherbqsrreports 0
  • tranche filtering 0
  • filtervarianttranches 0
  • filterintervals 0
  • estimatelibrarycomplexity 0
  • duplication metrics 0
  • papermill 0
  • Jupyter 0
  • annotations 0
  • shiftchain 0
  • pos 0
  • haemophilus 0
  • selectvariants 0
  • revert 0
  • panel_of_normals 0
  • IDR 0
  • igv 0
  • igv.js 0
  • js 0
  • genome browser 0
  • Python 0
  • reblockgvcf 0
  • printreads 0
  • interproscan 0
  • preprocessintervals 0
  • postprocessgermlinecnvcalls 0
  • genomic islands 0
  • insertion 0
  • snvs 0
  • mutectstats 0
  • mergebamalignment 0
  • leftalignandtrimvariants 0
  • jasminesv 0
  • jasmine 0
  • PCR/optical duplicates 0
  • upper-triangular matrix 0
  • sequencing adapters 0
  • custom 0
  • sertotype 0
  • interleave 0
  • header 0
  • seq 0
  • na 0
  • selection 0
  • random draw 0
  • pseudohaploid 0
  • pseudodiploid 0
  • freqsum 0
  • gc_wiggle 0
  • induce 0
  • sex determination 0
  • sequence headers 0
  • genetic sex 0
  • relative coverage 0
  • Cores 0
  • Segmentation 0
  • error 0
  • TMA dearray 0
  • de-novo 0
  • longread 0
  • sha256 0
  • 256 bit 0
  • UNet 0
  • shinyngs 0
  • cls 0
  • grep 0
  • boxplot 0
  • scramble 0
  • amplicon 0
  • ampliconclip 0
  • scatterplot 0
  • calmd 0
  • corrrelation 0
  • faidx 0
  • track 0
  • readgroup 0
  • paired-end 0
  • cluster analysis 0
  • subseq 0
  • clusteridentifier 0
  • peak-caller 0
  • cut&tag 0
  • cut&run 0
  • chromatin 0
  • seacr 0
  • pcr duplicates 0
  • assembly-binning 0
  • applyvarcal 0
  • cutesv 0
  • VQSR 0
  • variant recalibration 0
  • gct 0
  • exploratory 0
  • density 0
  • sambamba 0
  • rdtest2vcf 0
  • spatype 0
  • spa 0
  • streptococcus 0
  • sccmec 0
  • variantcalling 0
  • Sample 0
  • protein coding genes 0
  • detecting svs 0
  • short-read sequencing 0
  • polymorphic sites 0
  • svtk/baftest 0
  • baftest 0
  • countsvtypes 0
  • rdtest 0
  • antitarget 0
  • polymorphic 0
  • decompress 0
  • polymut 0
  • polya tail 0
  • fast5 0
  • chromosome_visualization 0
  • Mycobacterium tuberculosis 0
  • chromosomal rearrangements 0
  • eucaryotes 0
  • coding 0
  • cds 0
  • transcroder 0
  • access 0
  • cload 0
  • mcool 0
  • sliding window 0
  • genomic bins 0
  • makebins 0
  • CRAM 0
  • SMN1 0
  • SMN2 0
  • POA 0
  • sniffles 0
  • core 0
  • snippy 0
  • enzyme 0
  • digest 0
  • cooler/balance 0
  • hash sketch 0
  • dbnsfp 0
  • predictions 0
  • SNPs 0
  • invariant 0
  • constant 0
  • partition histograms 0
  • rRNA 0
  • ribosomal RNA 0
  • target 0
  • export 0
  • signatures 0
  • flagstat 0
  • ligation junctions 0
  • genetic 0
  • deletions 0
  • insertions 0
  • tandem duplications 0
  • CoPRO 0
  • GRO-cap 0
  • PRO-cap 0
  • CAGE 0
  • NETCAGE 0
  • RAMPAGE 0
  • csRNA-seq 0
  • STRIPE-seq 0
  • PRO-seq 0
  • GRO-seq 0
  • picard/renamesampleinvcf 0
  • exclude 0
  • variant identifiers 0
  • str 0
  • indep 0
  • indep pairwise 0
  • recode 0
  • whole genome association 0
  • identifiers 0
  • scoring 0
  • variant genetic 0
  • sortvcf 0
  • pcr 0
  • pbp 0
  • pairtools 0
  • pairstools 0
  • restriction fragments 0
  • select 0
  • groupreads 0
  • duplexumi 0
  • consensus sequence 0
  • public 0
  • paragraph 0
  • graphs 0
  • pbbam 0
  • pbmerge 0
  • subreads 0
  • pair-end 0
  • liftovervcf 0
  • read 0
  • pedigrees 0
  • ENA 0
  • motif 0
  • ChIP-Seq 0
  • phantom peaks 0
  • prophage 0
  • identification 0
  • illumina datasets 0
  • phylogenetic composition 0
  • SRA 0
  • ANI 0
  • hybrid-selection 0
  • mate-pair 0
  • percent on target 0
  • multimapper 0
  • read distribution 0
  • subsampling 0
  • long uncorrected reads 0
  • rhocall 0
  • R 0
  • escherichia coli 0
  • bamstat 0
  • strandedness 0
  • experiment 0
  • read_pairs 0
  • fragment_size 0
  • inner_distance 0
  • PEP 0
  • sequence-based 0
  • mapping-based 0
  • segment 0
  • integrity 0
  • blastx 0
  • pedfilter 0
  • rocplot 0
  • rtg-tools 0
  • salsa 0
  • salsa2 0
  • LCA 0
  • Ancestor 0
  • neighbour-joining 0
  • endogenous DNA 0
  • circos 0
  • Streptococcus pyogenes 0
  • swissprot 0
  • genbank 0
  • gene finding 0
  • embl 0
  • intervals coverage 0
  • split by chromosome 0
  • deletion 0
  • genomic intervals 0
  • schema 0
  • normal database 0
  • panel of normals 0
  • cutoff 0
  • eklipse 0
  • haplotype purging 0
  • duplicate purging 0
  • false duplications 0
  • assembly curation 0
  • Haplotype purging 0
  • False duplications 0
  • Assembly curation 0
  • pep 0
  • purging 0
  • integron 0

Contiguate a draft genome assembly

results versions

Screen assemblies for antimicrobial resistance against multiple databases

report versions

abricate:

Mass screening of contigs for antibiotic resistance genes

Screen assemblies for antimicrobial resistance against multiple databases

report versions

abricate:

Mass screening of contigs for antibiotic resistance genes

ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).

ancestry_fractions allele_frequencies versions

The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, and the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass the test.

passed_gff failed_gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
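
The ORF-length test described above can be sketched in a few lines of Python. This is only an illustration of the pass/fail split, not AGAT's implementation: the function name `split_by_orf_length` and the input dictionary of precomputed ORF lengths are assumptions made for the example, and no GFF parsing is attempted.

```python
def split_by_orf_length(orf_lengths, threshold=100):
    """Partition gene models by ORF length in amino acids.

    orf_lengths is a hypothetical mapping of gene model ID -> ORF length (aa).
    A model passes only if its ORF is strictly longer than the threshold,
    mirroring the default "> 100" test.
    """
    passed, failed = {}, {}
    for gene_id, length_aa in orf_lengths.items():
        target = passed if length_aa > threshold else failed
        target[gene_id] = length_aa
    return passed, failed


# Made-up example data: geneB sits exactly on the threshold and therefore fails.
orf_lengths = {"geneA": 250, "geneB": 100, "geneC": 42}
passed, failed = split_by_orf_length(orf_lengths)
print(sorted(passed))  # ['geneA']
print(sorted(failed))  # ['geneB', 'geneC']
```

The two returned dictionaries play the role of the passed_gff and failed_gff outputs listed for the module.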

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

sample_dir contig_gbks db_tsv tsv faa sample_log full_log db db_txt db_fasta db_mmseqs versions

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

A tool to estimate nuclear contamination in males based on heterozygosity on the X chromosome.

txt versions

angsd:

ANGSD: Analysis of next generation Sequencing Data

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

translated_mrna total_mrna translation buffering mrna_abundance rdata fold_change_plot interaction_p_distribution_plot residual_distribution_summary_plot residual_vs_fitted_plot rvm_fit_for_all_contrasts_group_plot rvm_fit_for_interactions_plot rvm_fit_for_omnibus_group_plot simulated_vs_obt_dfbetas_without_interaction_plot session_info versions

anota2seq:

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

0100

tsv versions

Use deamination patterns to estimate contamination in single-stranded libraries

010101

txt versions

authentict:

Estimates present-day DNA contamination in ancient DNA single-stranded libraries.

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

012

primary_filtered_bam contamination_bam versions

barrnap uses a HMMER profile to find rRNAs in reads or contig FASTA files

012

gff versions

Profiles the nucleotide content of intervals in a fasta file.

012

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Merges methylation information for opposite-strand C's in a CpG context

010101

bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Perform basic quality control on a BAM file generated with Biscuit

010101

reports versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Relates methylation calls back to genomic cytosine contexts.

010101

coverage report summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Benchmarking Universal Single Copy Orthologs

meta fasta mode lineage busco_lineages_path config_file

meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_proteins versions

Benchmarking Universal Single Copy Orthologs

0100000

batch_summary short_summaries_txt short_summaries_json log full_table missing_busco_list single_copy_proteins seq_dir translated_dir busco_dir downloaded_lineages single_copy_faa single_copy_fna versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Download database for BUSCO

0

download_dir versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

BUSCO plot generation tool

0

png versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

0100

report assembly contigs corrected_reads corrected_trimmed_reads metadata contig_position contig_info versions

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101010101

orf2lca bin2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101

orf2lca contig2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Downloads the required files for either Nr or GTDB for building into a CAT database

01

rawdb versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Creates a CAT_pack database based on input FASTAs

01000

db taxonomy versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101001010101010101

rat_log complete_abundance contig_abundance read2classification alignment_diamond contig2classification cat_log orf2lca faa gff unmapped_diamond unmapped_fasta unmapped2classification versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Summarises results from CAT/BAT/RAT classification steps

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

0100

checkm_output marker_file checkm_tsv versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

01230

output fasta versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 database download

0

database versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

CheckM2 bin quality prediction

0101

checkm2_output checkm2_tsv versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

Construct the database necessary for checkv's quality assessment

NO input

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Assess the quality of metagenome-assembled viral genomes.

010

quality_summary completeness contamination complete_genomes proviruses viruses versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

010

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Determine the allelic profiles of a genome using a pre-defined schema

0101

stats contigs_info alleles log paralogous_counts paralogous_loci cds_coordinates invalid_cds loci_summary_stats versions

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.

01010101000

self_circular extended_circular extended_partial extended_failed orphan_end all_cobra_assemblies joining_summary log versions

cobra-meta:

COBRA is a tool to get higher quality viral genomes assembled from metagenomes.

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Generate the input coverage table for CONCOCT using a BED file

0123

tsv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Cut up fasta file in non-overlapping or overlapping parts of equal length.

010

fasta bed versions

concoct:

Clustering cONtigs with COverage and ComposiTion
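
Cutting a sequence into equal-length parts, with or without overlap, can be sketched as follows. This is a simplified illustration and not CONCOCT's `cut_up_fasta.py`: the function name `cut_up` and its parameters are hypothetical, and the sketch simply leaves the final part shorter when the sequence length is not a multiple of the step, which may differ from what the real script does.

```python
def cut_up(sequence, chunk_size, overlap=0):
    """Split sequence into parts of chunk_size; consecutive parts share `overlap` bases."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    parts = []
    for start in range(0, len(sequence), step):
        parts.append((start, sequence[start:start + chunk_size]))
        # Stop once a part reaches the end of the sequence.
        if start + chunk_size >= len(sequence):
            break
    return parts


non_overlapping = cut_up("ACGTACGTAC", chunk_size=4)
overlapping = cut_up("ACGTACGTAC", chunk_size=4, overlap=2)
print(non_overlapping)
print(overlapping)
```

Recording the start offset of every part is what makes it possible to relate the pieces back to the original contig later.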

Creates a FASTA file for each new cluster assigned by CONCOCT

012

fasta versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Merge consecutive parts of the original contigs that were cut up by cut_up_fasta.py

01

csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC

012

p_value_txt versions

controlfreec/assesssignificance:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Copy number and genotype annotation from whole genome and whole exome sequencing data

0123456000000000

bedgraph control_cpn sample_cpn gcprofile_cpn BAF CNV info ratio config versions

controlfreec/freec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

01

bed versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Format Freec output to circos input format

01

circos versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

0123

png_baf png_ratio_log2 png_ratio versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

012

png_baf png_ratio_log2 png_ratio versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Map reads to contigs and estimate coverage

010100

coverage versions

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

Controllable lossy compression of BAM/CRAM files

0100

bam cram sam bed versions

DAS Tool binning step.

01230

log summary contig2bin eval bins pdfs fasta_proteins candidates_faa fasta_archaea_scg fasta_bacteria_scg b6 seqlength versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

010

fastatocontig2bin versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

010

scaffolds2bin versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take sign and weight into account. It can be used with any omics data type, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phosphoproteomics.

0100

dc_estimate dc_pvals versions

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

0120

daa daa_tsv arg potential_arg versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

Assemble bacterial isolate genomes from Nanopore reads

012

contigs log raw_contigs gfa txt versions

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example, we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar GC content.

01234500

vcf versions

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

0123

tsv json versions

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

0123

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

0120000010

vcf tbi tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Run falco on sequenced reads

01

html txt versions

fastqc:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

Perform adapter and quality trimming on sequencing reads with reporting

01

reads stats debug statspdf reads_fail reads_unpaired log versions

A tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

Perform adapter/quality trimming on sequencing reads

010000

reads json html log reads_fail reads_merged versions

Run FastQC on sequenced reads

01

html zip versions

fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.

01

tsv versions

Run NCBI's FCS adaptor on assembled genomes

01

cleaned_assembly adaptor_report log pipeline_args skipped_trims versions

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.

010

fcs_gx_report taxonomy_report versions

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to remove foreign contamination from genome assemblies

012

cleaned contaminants versions

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Fetches the NCBI FCS-GX database using a provided manifest URL

0

database versions

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to screen and remove foreign contamination from genome assemblies

01200

fcsgx_report taxonomy_report log hits versions

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Filtlong filters long reads based on quality measures or short read data.

012

reads log versions

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

0

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Annotates intervals with GC content, mappability, and segmental-duplication content

0101010101010101

annotated_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

01234000

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Determines the baseline contig ploidy for germline samples given counts data

0123010

calls model versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

01234

cohortcalls cohortmodel casecalls versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

012301010100

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

0120000

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits CRAM files efficiently by taking advantage of their container based structure

01

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

0120

genes features clusters gbk json versions

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

01

linear_plot_png transformed_linear_plot_png log_plot_png transformed_log_plot_png model summary lookup_table fitted_histogram_png versions

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GenomeTools gt-stat utility to show statistics about features contained in GFF3 files

01

stats versions

gt:

The GenomeTools genome analysis system

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

0

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes

0

db versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Merging of CheckM and GUNC results in one summary table

012

tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Detection of Chimerism and Contamination in Prokaryotic Genomes

010

maxcss_level_tsv all_levels_tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial versions.

01

txt html versions

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

Whole-genome assembly using PacBio HiFi reads

01201201201

raw_unitigs bin_files processed_unitigs primary_contigs alternate_contigs hap1_contigs hap2_contigs corrected_reads read_overlaps log versions

gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference

01

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
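
The idea of reporting GC content in non-overlapping windows can be shown with a short Python sketch. It is only an illustration of the concept, not the gcCounter implementation: the function name `gc_per_window` is hypothetical, and the handling of a shorter final window and of ambiguous bases is an assumption of this sketch.

```python
def gc_per_window(sequence, window_size):
    """Return the GC fraction of each consecutive, non-overlapping window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        gc = window.count("G") + window.count("C")
        # Assumption: the last window may be shorter and is scored on its own length.
        fractions.append(gc / len(window))
    return fractions


gc_fractions = gc_per_window("GGCCATATGC", window_size=4)
print(gc_fractions)
```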

Human mitochondrial variants annotation using HmtVar. Contains a .plk file with annotations, so it can be run offline.

01

vcf versions

hmtnote:

Human mitochondrial variants annotation using HmtVar.

Removes host reads from sequencing reads with Hostile

0100

fastq json versions

hostile:

Hostile: accurate host decontamination

Downloads required reference genomes for Hostile

NO input

reference versions

hostile:

Hostile: accurate host decontamination

A Python application to generate self-contained HTML reports for variant review and other genomic applications

0123012

report versions

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this Nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

0120

mask versions

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this Nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

01230

output versions

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.

012000000000000

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

Generate a consensus sequence from a BAM file using iVar

0100

fasta qual mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

0120

bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Call variants from a BAM file using iVar

010000

tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Extract BED file from hts files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

01

bed versions

jvarkit:

Java utilities for Bioinformatics.

Removes low abundance k-mers from FASTA/FASTQ files

01

trimmed versions

khmer:

khmer k-mer counting library
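
One way to picture low-abundance k-mer removal is to count every k-mer across the reads and then truncate each read where its first rare k-mer begins to contribute new bases. The sketch below is an assumption-laden illustration, not khmer's code: it uses exact counts from `collections.Counter` rather than khmer's own counting structures, and the names `trim_low_abundance`, `k` and `cutoff` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """List all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trim_low_abundance(reads, k, cutoff):
    """Truncate each read just before the last base of its first k-mer seen < cutoff times."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    trimmed = []
    for read in reads:
        keep = len(read)
        for i, km in enumerate(kmers(read, k)):
            if counts[km] < cutoff:
                keep = i + k - 1  # keep only bases covered by abundant k-mers
                break
        trimmed.append(read[:keep])
    return trimmed


# Hypothetical reads: the third read ends in bases that appear only once.
reads = ["ACGTAC", "ACGTAC", "ACGTTT"]
trimmed_reads = trim_low_abundance(reads, k=3, cutoff=2)
print(trimmed_reads)
```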

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.

01

smorfs all_orfs amp_prediction readme_file log_file versions

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

0000

index versions log

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

010

rma6 alignments log versions

malt:

A tool for mapping metagenomic data

Screens query sequences against large sequence databases

0101

screen versions

mash:

Fast sequence distance estimator that uses MinHash

Mashmap is an approximate long read or contig mapper based on Jaccard similarity

0101

paf versions
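
The Jaccard similarity that the Mashmap description refers to can be illustrated on k-mer sets. The sketch below computes the exact Jaccard index of two short sequences; it is not Mashmap's algorithm, which works at a much larger scale, and the function names and example sequences are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k):
    """Exact Jaccard index of the k-mer sets of sequences a and b."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two sequences without any k-mers score 0
    return len(set_a & set_b) / len(union)


similarity = jaccard("ACGTACGT", "ACGTTCGT", k=3)
print(similarity)
```

A value close to 1 means the two sequences share most of their k-mers, which is the signal an approximate mapper can use to decide where a read or contig plausibly maps.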

MaxBin is software for clustering metagenomic contigs

0123

binned_fastas summary abundance log marker_counts unbinned_fasta tooshort_fasta marker_bins marker_genes versions

An ultra-fast metagenomic assembler for large and complex metagenomics

012

contigs k_contigs addi_contigs local_contigs kfinal_contigs log versions

pigz:

Parallel implementation of the gzip algorithm.

Analyses a DAA file and exports information in text format

010

txt_gz megan versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Analyses an RMA file and exports information in text format

010

txt megan_summary versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101000

hist log_stderr versions

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

A reimplementation of KatGC to work with FastK databases

012

filled_gc_plot line_gc_plot stacked_gc_plot versions

merquryfk:

FastK based version of Merqury

FastK based version of Merqury

012340101

stats bed assembly_qv spectra_cn_fl spectra_cn_ln spectra_cn_st qv spectra_asm_fl spectra_asm_ln spectra_asm_st phased_block_bed phased_block_stats continuity_N block_N block_blob hapmers_blob versions

merquryfk:

FastK based version of Merqury

Depth computation per contig step of metabat2

012

depth versions

metabat2:

Metagenome binning

Metagenome binning of contigs

012

tooshort lowdepth unbinned membership fasta versions

metabat2:

Metagenome binning

Strain-level metagenomic assignment

012340

wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

Minia is a short-read assembler based on a de Bruijn graph

01

contigs unitigs h5 versions

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

A Python workflow that assembles mitogenomes from PacBio HiFi reads

010000

fasta stats gb gff all_potential_contigs contigs_annotations contigs_circularization contigs_filtering coverage_mapping coverage_plot final_mitogenome_annotation final_mitogenome_choice final_mitogenome_coverage potential_contigs reads_mapping_and_assembly shared_genes versions

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

A tool to reconstruct plasmids in bacterial assemblies

01

chromosome contig_report plasmids mobtyper_results versions

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

A bioinformatics tool for working with modified bases

0120101

bed bedgraph log versions

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data

Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.

01

img_clahe versions

molkartgarage:

One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

normalized_counts normalized_counts_wo_HK versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned a mRNA/miRNA, which can be counted after bonding with its target. As a result each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

nacho_qc_reports nacho_qc_png nacho_qc_txt versions

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data. For more information, see vignette("NACHO") and vignette("NACHO-analysis").

DNA contaminant removal using NanoLyse

010

fastq log versions

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is a software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

01

html png txt log versions

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

0101

vecscreen_output versions

ncbitools:

"NCBI libraries for biology applications (text-based utilities)"

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

010101

pt versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

01

pdf corr_matrix all matched versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

A tool for indexing and querying a block-compressed text file containing pairs of genomic coordinates

01

index versions

Alignment with PacBio's minimap2 frontend

0101

bam versions

pbmm2:

A minimap2 frontend for PacBio native data formats

Creates a sequence dictionary for a reference sequence.

01

reference_dict versions

picard:

Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

Run association tests for a genome-wide association study (GWAS)

0123010101

assoc log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

pmdtools command to filter ancient DNA molecules from others

01200

bam versions

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Polishing genome assemblies with short reads.

01010

fasta versions debug

polypolish:

Polishing genome assemblies with short reads.

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

01

reads log versions

Converts SAM/BAM/CRAM/pairs files into a genome contact map

01012

pretext versions

A module to generate images from Pretext contact maps.

01

image versions

write your description here

01

html json versions

Compute summary statistics for control gene from BAM files.

01200

control_stats versions

pypgx:

A Python package for pharmacogenomics research

Evaluate alignment data

010

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

012000

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

0101

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Quality Assessment Tool for Genome Assemblies

010101

results tsv transcriptome misassemblies unaligned versions

Extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format.

012

junc versions

regtools:

RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.

Predict antibiotic resistance from protein or nucleotide data

0100

json tsv tmp tool_version db_version versions

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Quality control of riboseq bam data

012012012010101

predictions all transprofile versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Quality control of riboseq bam data

01201

distribution pdf offset versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

sdf versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

This module combines samtools and samblaster so that samblaster can filter or tag alignments while both input and output stay in BAM format. Samblaster input must contain a sequence header, so the input is piped through the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions

List CRAM Content-ID and Data-Series sizes

01

size versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

bam cram sam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

TNhaplotyper2 performs somatic variant calling on tumor-normal matched pairs.

01230101010101010100

orientation_data contamination_data contamination_segments stats vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Convert FASTA/Q to tabular format, and provide various information, like sequence length, GC content/GC skew.

01

text versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
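As a rough illustration of the kind of per-sequence table described above (length, GC content, GC skew), here is a minimal Python sketch. It is not seqkit's implementation: the function names are hypothetical, and expressing GC skew as the percentage 100 * (G - C) / (G + C) is an assumption about the convention.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases among all bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew as a percentage: 100 * (G - C) / (G + C) (assumed convention)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return 100.0 * (g - c) / (g + c)


def fasta_to_rows(text: str):
    """Parse FASTA text into (name, sequence, length, GC%, GC skew) rows."""
    rows = []
    name, chunks = None, []

    def flush():
        # Emit the record collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            rows.append((name, seq, len(seq), gc_content(seq), gc_skew(seq)))

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()
    return rows


if __name__ == "__main__":
    for row in fasta_to_rows(">s1\nGGCC\nAATT\n>s2\nGGGC\n"):
        print("\t".join(str(field) for field in row))
```

Each row corresponds to one tab-separated line of the tabular output; multi-line FASTA records are joined before the statistics are computed.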

Generates a BED file containing genomic locations of lengths of N.

01

bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.
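The idea of a BED file listing runs of N can be sketched in a few lines of Python. This is an illustrative sketch only, not how seqtk is implemented; the function name and the `min_len` parameter are hypothetical. It relies on the BED convention of 0-based, half-open intervals.

```python
import re


def n_runs_to_bed(name: str, seq: str, min_len: int = 1):
    """Return (name, start, end) BED intervals for runs of N/n in seq.

    Coordinates are 0-based and half-open, as in the BED format.
    Runs shorter than min_len (a hypothetical threshold) are skipped.
    """
    intervals = []
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            intervals.append((name, match.start(), match.end()))
    return intervals


if __name__ == "__main__":
    for chrom, start, end in n_runs_to_bed("chr1", "ACNNNGTnnA"):
        print(f"{chrom}\t{start}\t{end}")
```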

Sequence quality metrics for FASTQ and uBAM files.

01

json html versions

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

01200

seqz versions

sequenzautils:

Sequenza-utils provides three main command-line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires the AVX2 CPU feature.

01234012301

phased_variant versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Validate consistency of feature and sample annotations with matrices and contrasts

0120101

sample_meta feature_meta assays contrasts versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Assemble bacterial isolate genomes from Illumina paired-end reads

01

contigs corrections log raw_contigs gfa versions

Simple ANI calculation between reference and query genomes.

0101

dist versions

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

Memory-efficient ANI database queries with skani.

0101

search versions

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

Storing skani sketches/indices on disk.

01

sketch_dir sketch markers versions

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

All-to-all ANI computation.

01

triangle versions

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

012010101

extract versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Compare many FracMinHash signatures generated by sourmash sketch.

01000

matrix labels csv versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

0100000

result unassigned matches prefetch prefetchcsv versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.
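Finding a small set of reference genomes that together contain the k-mers of a metagenome can be approximated greedily: repeatedly pick the reference that explains the most still-unexplained k-mers. The Python sketch below illustrates that idea on plain sets of hashes. It is an assumption-laden toy, not sourmash's code or API; `greedy_gather` and its arguments are hypothetical names.

```python
def greedy_gather(query, references):
    """Greedy approximation of a minimum set of references covering the query.

    query: set of k-mer hashes observed in the metagenome.
    references: dict mapping a genome name to its set of k-mer hashes.
    Returns (picked, remaining), where picked is a list of
    (genome name, number of newly explained hashes) and remaining is the
    set of query hashes no reference could explain.
    """
    remaining = set(query)
    available = {name: set(hashes) for name, hashes in references.items()}
    picked = []
    while remaining and available:
        # Sort names so ties are broken deterministically.
        best = max(sorted(available), key=lambda n: len(available[n] & remaining))
        overlap = available[best] & remaining
        if not overlap:
            break  # nothing left that any reference can explain
        picked.append((best, len(overlap)))
        remaining -= overlap
        del available[best]
    return picked, remaining


if __name__ == "__main__":
    refs = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {3, 4}}
    print(greedy_gather({1, 2, 3, 4, 5, 6, 7, 8}, refs))
```

In this toy run, genomes B and D are never reported because everything they share with the query is already explained by A and C, and the hashes 7 and 8 stay unassigned.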

Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.

010

result versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Assembles a small genome (bacterial, fungal, viral)

012300

scaffolds contigs transcripts gene_clusters gfa warnings log versions

Fast, efficient, lossless compression of FASTQ files.

012

spring versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Fast, efficient, lossless decompression of FASTQ files.

010

fastq versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Advanced sequence file format conversions

01000

cram gzi versions

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.

01

results_xlsx summary_tsv detailed_summary_tsv resfinder_tsv plasmidfinder_tsv mlst_tsv settings_txt pointfinder_tsv versions

staramr:

Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.

SummarizedExperiment container

010101

rds log versions

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Parses a Thermo RAW file containing mass spectra to an open file format

01

spectra versions

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

01

classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

A post sequencing QC tool for Oxford Nanopore sequencers

01

report_data report_html plots_html plotly_js versions

Cluster contigs from multiple assemblies by similarity

012

cluster_dir versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

01001

bam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

00

index versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

0100

sam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
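Deduplication by mapping coordinate and UMI can be sketched as grouping reads on a (position, UMI) key and keeping one read per group. The Python sketch below is a deliberately simplified exact-match version: the `Read` record and the choice to keep the read with the highest mapping quality are assumptions for illustration, and UMI-tools itself can additionally merge UMIs that differ by sequencing errors, which this sketch does not attempt.

```python
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record used only for this sketch.
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str
    mapq: int


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, UMI), preferring higher MAPQ.

    Exact UMI matching only; no error-aware UMI clustering is performed.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return list(best.values())


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, "+", "ACGT", 30),
        Read("r2", "chr1", 100, "+", "ACGT", 60),  # duplicate of r1, better MAPQ
        Read("r3", "chr1", 100, "+", "TTTT", 20),  # same position, different UMI
        Read("r4", "chr1", 200, "+", "ACGT", 10),  # same UMI, different position
    ]
    print([r.name for r in dedup_exact(reads)])
```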

Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place

01

reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
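Moving a UMI from the read sequence into the read name can be illustrated with a short Python sketch. It assumes the simplest layout, a fixed-length UMI at the 5' end of the read, and an underscore as the separator in the read name; the function name and its parameters are hypothetical and do not mirror the UMI-tools command-line options.

```python
def extract_umi(header: str, seq: str, qual: str, umi_len: int):
    """Move the first umi_len bases of a read into its name.

    Assumes the UMI is the first umi_len bases of the read. Returns the new
    header plus the trimmed sequence and quality strings.
    """
    if len(seq) < umi_len:
        raise ValueError("read is shorter than the UMI length")
    umi = seq[:umi_len]
    # Append the UMI to the read identifier, keeping any trailing comment.
    read_id, _, comment = header.partition(" ")
    new_header = f"{read_id}_{umi}"
    if comment:
        new_header += f" {comment}"
    return new_header, seq[umi_len:], qual[umi_len:]


if __name__ == "__main__":
    print(extract_umi("@read1 1:N:0", "ACGTTTGGCC", "IIIIIIIIII", 4))
```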

Group reads based on their UMI and mapping coordinates

01200

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

012

bam log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

01200

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To judge candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

010101

alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
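The input layout described above (an optional name line starting with >, two sequences joined by &, and a terminating @ line) can be assembled with a small Python helper. This is only a sketch of the text format as stated in the description; `cofold_input` is a hypothetical helper and does not call RNAcofold.

```python
def cofold_input(pairs):
    """Build stdin text for a batch of two-sequence dimer predictions.

    pairs: iterable of (name, seq_a, seq_b). Each record becomes a '>' name
    line followed by the two sequences joined with '&'. A final '@' line
    marks the end of input, as described for RNAcofold.
    """
    lines = []
    for name, seq_a, seq_b in pairs:
        for seq in (seq_a, seq_b):
            if not seq or "&" in seq:
                raise ValueError("sequences must be non-empty and contain no '&'")
        lines.append(f">{name}")
        lines.append(f"{seq_a}&{seq_b}")
    lines.append("@")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(cofold_input([("pair1", "GGGAAACCC", "GGGUUUCCC")]), end="")
```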

calculate locally stable secondary structures of RNAs

0

rnalfold_txt versions

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L, the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.

Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
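The definition of "strictly identical" used for merging above (same length, same nucleotide string, case-insensitive, with T and U treated as the same) can be expressed as a normalisation key. The Python sketch below groups sequences on that key; it illustrates the definition only, is not vsearch's implementation, and reports clusters in first-seen order, which is an assumption of this sketch.

```python
def dereplicate(records):
    """Merge strictly identical sequences.

    records: iterable of (name, sequence).
    Two sequences are identical when they match after upper-casing and
    replacing U with T, which also implies they have the same length.
    Returns (representative name, representative sequence, abundance)
    tuples in first-seen order.
    """
    clusters = {}
    for name, seq in records:
        key = seq.upper().replace("U", "T")
        if key not in clusters:
            clusters[key] = (seq, [])  # first sequence seen is the representative
        clusters[key][1].append(name)
    return [(names[0], rep, len(names)) for rep, names in clusters.values()]


if __name__ == "__main__":
    reads = [("a", "ACGT"), ("b", "acgu"), ("c", "ACG"), ("d", "ACGT")]
    for name, seq, size in dereplicate(reads):
        print(f">{name};size={size}\n{seq}")
```

Here "acgu" merges with "ACGT", while the shorter "ACG" stays in its own cluster because a prefix match is not an exact match.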

a pangenome-scale aligner

0123400

paf versions

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

0100

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.
