Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • bam 164
  • fasta 162
  • fastq 113
  • vcf 110
  • genomics 95
  • metagenomics 72
  • index 69
  • genome 68
  • reference 64
  • gatk4 60
  • bed 57
  • alignment 57
  • assembly 54
  • cram 53
  • sam 47
  • sort 45
  • structural variants 37
  • annotation 36
  • align 35
  • database 33
  • variant calling 33
  • merge 32
  • map 28
  • bacteria 27
  • filter 27
  • statistics 26
  • gff 23
  • variants 23
  • coverage 22
  • qc 22
  • quality control 21
  • gtf 21
  • nanopore 19
  • classify 19
  • cnv 18
  • gfa 18
  • download 18
  • k-mer 18
  • split 17
  • taxonomic profiling 17
  • contamination 16
  • convert 15
  • binning 15
  • quality 15
  • classification 15
  • taxonomy 15
  • sentieon 14
  • somatic 14
  • ancient DNA 14
  • clustering 14
  • count 14
  • VCF 13
  • contigs 13
  • conversion 13
  • copy number 13
  • variant 13
  • trimming 12
  • variation graph 12
  • isoseq 12
  • MSA 12
  • reporting 12
  • gvcf 12
  • bisulfite 12
  • bedtools 12
  • imputation 12
  • graph 12
  • picard 11
  • sv 11
  • methylation 11
  • methylseq 11
  • databases 11
  • proteomics 11
  • bisulphite 11
  • pacbio 11
  • rnaseq 11
  • expression 10
  • cna 10
  • 5mC 10
  • QC 10
  • table 10
  • serotype 10
  • illumina 10
  • phage 10
  • indexing 10
  • visualisation 10
  • compression 10
  • consensus 10
  • antimicrobial resistance 10
  • sequences 10
  • stats 10
  • phylogeny 9
  • bqsr 9
  • build 9
  • tsv 9
  • haplotype 9
  • protein 9
  • pangenome graph 9
  • protein sequence 9
  • WGBS 9
  • plot 9
  • searching 9
  • aDNA 9
  • markduplicates 9
  • histogram 9
  • scWGBS 9
  • bins 9
  • DNA methylation 9
  • demultiplex 9
  • amr 9
  • metrics 9
  • mappability 8
  • validation 8
  • pairs 8
  • iCLIP 8
  • kmer 8
  • matrix 8
  • completeness 8
  • palaeogenomics 8
  • damage 8
  • checkm 8
  • archaeogenomics 8
  • wgs 8
  • cooler 8
  • mmseqs2 8
  • long-read 8
  • biscuit 8
  • repeat 8
  • bisulfite sequencing 8
  • depth 8
  • virus 8
  • filtering 8
  • LAST 8
  • bcftools 8
  • low-coverage 8
  • annotate 8
  • bwa 8
  • base quality score recalibration 8
  • bismark 7
  • db 7
  • metagenome 7
  • aligner 7
  • phasing 7
  • mag 7
  • mkref 7
  • blast 7
  • transcriptome 7
  • imaging 7
  • openms 7
  • umi 7
  • transcript 7
  • dedup 7
  • decompression 7
  • differential 7
  • samtools 7
  • peaks 7
  • glimpse 7
  • ucsc 7
  • evaluation 7
  • mags 7
  • newick 7
  • genotyping 6
  • deduplication 6
  • antimicrobial resistance genes 6
  • long reads 6
  • ncbi 6
  • antimicrobial peptides 6
  • kmers 6
  • cnvkit 6
  • NCBI 6
  • feature 6
  • gff3 6
  • genotype 6
  • seqkit 6
  • mitochondria 6
  • prokaryote 6
  • bedGraph 6
  • hmmsearch 6
  • plasmid 6
  • kraken2 6
  • complexity 6
  • pangenome 6
  • gene 6
  • tumor-only 6
  • prediction 6
  • low frequency variant calling 6
  • cluster 6
  • single-cell 6
  • germline 6
  • single 6
  • benchmark 5
  • svtk 5
  • enrichment 5
  • multiple sequence alignment 5
  • snp 5
  • riboseq 5
  • MAF 5
  • fragment 5
  • kallisto 5
  • splicing 5
  • short-read 5
  • isolates 5
  • de novo assembly 5
  • interval 5
  • 3-letter genome 5
  • msa 5
  • amps 5
  • report 5
  • arg 5
  • adapters 5
  • example 5
  • sourmash 5
  • gzip 5
  • de novo 5
  • csv 5
  • population genetics 5
  • mem 5
  • deamination 5
  • mutect2 5
  • single cell 5
  • duplicates 5
  • microbiome 5
  • antibiotic resistance 5
  • json 5
  • mapping 5
  • demultiplexing 5
  • clipping 5
  • structure 5
  • extract 5
  • view 5
  • idXML 5
  • counts 5
  • fastx 4
  • CLIP 4
  • segmentation 4
  • cnv calling 4
  • microarray 4
  • merging 4
  • peak-calling 4
  • bedgraph 4
  • interval_list 4
  • copy number alteration calling 4
  • genmod 4
  • propr 4
  • fgbio 4
  • malt 4
  • normalization 4
  • sequencing 4
  • ranking 4
  • profiling 4
  • BGC 4
  • family 4
  • matching 4
  • ganon 4
  • circrna 4
  • reference-free 4
  • compare 4
  • ATAC-seq 4
  • biosynthetic gene cluster 4
  • taxonomic classification 4
  • fungi 4
  • happy 4
  • diamond 4
  • HiFi 4
  • resistance 4
  • retrotransposon 4
  • public datasets 4
  • hmmcopy 4
  • hmmer 4
  • detection 4
  • deep learning 4
  • archaeogenetics 4
  • cut 4
  • palaeogenetics 4
  • miscoding lesions 4
  • sequence 4
  • tabular 4
  • phylogenetic placement 4
  • umitools 4
  • ont 4
  • compress 4
  • STR 4
  • ccs 4
  • text 4
  • hic 4
  • query 4
  • ngscheckmate 4
  • vsearch 4
  • mpileup 4
  • DNA sequence 4
  • containment 4
  • targeted sequencing 4
  • FASTQ 4
  • summary 4
  • paf 4
  • haplotypecaller 4
  • genome assembler 4
  • diversity 4
  • ampir 4
  • hybrid capture sequencing 4
  • ancestry 4
  • logratio 4
  • DNA sequencing 4
  • indels 4
  • quantification 4
  • parsing 4
  • bgzip 4
  • concatenate 4
  • add 4
  • C to T 3
  • lossless 3
  • das tool 3
  • informative sites 3
  • ataqv 3
  • das_tool 3
  • unzip 3
  • ligate 3
  • remove 3
  • prefetch 3
  • observations 3
  • long_read 3
  • SNP 3
  • notebook 3
  • replace 3
  • uLTRA 3
  • minimap2 3
  • CRISPR 3
  • amplify 3
  • DRAMP 3
  • angsd 3
  • neubi 3
  • insert 3
  • converter 3
  • uncompress 3
  • zip 3
  • mlst 3
  • gsea 3
  • spark 3
  • neural network 3
  • archiving 3
  • chimeras 3
  • bim 3
  • bacterial 3
  • fam 3
  • variant_calling 3
  • eukaryotes 3
  • prokaryotes 3
  • typing 3
  • bin 3
  • genotype-based deconvoltion 3
  • plink2 3
  • benchmarking 3
  • snps 3
  • vrhyme 3
  • genome mining 3
  • SV 3
  • telomere 3
  • PacBio 3
  • popscle 3
  • deeparg 3
  • entrez 3
  • kinship 3
  • identity 3
  • HMM 3
  • RNA 3
  • genomes 3
  • combine 3
  • comparisons 3
  • scores 3
  • chromosome 3
  • polishing 3
  • arriba 3
  • fingerprint 3
  • PCA 3
  • profile 3
  • relatedness 3
  • gridss 3
  • fusion 3
  • rna_structure 3
  • indel 3
  • macrel 3
  • hi-c 3
  • abundance 3
  • shapeit 3
  • bcl2fastq 3
  • krona 3
  • UMI 3
  • virulence 3
  • score 3
  • cat 3
  • panel 3
  • png 3
  • wig 3
  • pan-genome 3
  • chip-seq 3
  • atac-seq 3
  • pairsam 3
  • bracken 3
  • spaceranger 3
  • adapter trimming 3
  • prokka 3
  • kraken 3
  • quality trimming 3
  • gatk4spark 3
  • gene expression 3
  • redundancy 3
  • chunk 3
  • microbes 3
  • small indels 3
  • wastewater 3
  • aln 3
  • checkv 3
  • cellranger 3
  • tabix 3
  • bamtools 3
  • organelle 3
  • bcf 3
  • reports 3
  • cut up 3
  • highly_multiplexed_imaging 3
  • mcmicro 3
  • image_analysis 3
  • wxs 3
  • spatial 3
  • complement 3
  • cool 3
  • msi 3
  • bigwig 3
  • amplicon sequencing 3
  • dump 3
  • roh 3
  • reads 3
  • survivor 3
  • rsem 3
  • duplication 3
  • npz 3
  • subsample 3
  • mkfastq 3
  • host 3
  • structural_variants 3
  • pseudoalignment 3
  • clean 3
  • genome assembly 3
  • bakta 3
  • krona chart 3
  • windowmasker 3
  • image_processing 2
  • gene set 2
  • gem 2
  • tnhaplotyper2 2
  • mitochondrion 2
  • evidence 2
  • rgfa 2
  • genomad 2
  • contig 2
  • nucleotides 2
  • cnvnator 2
  • ChIP-seq 2
  • gatk 2
  • baf 2
  • panelofnormals 2
  • dictionary 2
  • gene set analysis 2
  • joint genotyping 2
  • phase 2
  • Read depth 2
  • SimpleAF 2
  • comparison 2
  • proportionality 2
  • guide tree 2
  • concordance 2
  • rtgtools 2
  • gstama 2
  • resolve_bioscience 2
  • nextclade 2
  • ancient dna 2
  • switch 2
  • shigella 2
  • msisensor-pro 2
  • micro-satellite-scan 2
  • homoploymer,microsatellite 2
  • spatial_transcriptomics 2
  • profiles 2
  • graph layout 2
  • effect prediction 2
  • snpeff 2
  • snpsift 2
  • cancer genomics 2
  • join 2
  • megan 2
  • checksum 2
  • tree 2
  • FracMinHash sketch 2
  • Streptococcus pneumoniae 2
  • junctions 2
  • mash 2
  • rna 2
  • import 2
  • scaffolding 2
  • adapter 2
  • variant pruning 2
  • bam2fq 2
  • collate 2
  • fixmate 2
  • preseq 2
  • library 2
  • read-group 2
  • sequenzautils 2
  • soft-clipped clusters 2
  • ped 2
  • fusions 2
  • varcal 2
  • GPU-accelerated 2
  • salmonella 2
  • seqtk 2
  • sample 2
  • transformation 2
  • signature 2
  • purge duplications 2
  • tama 2
  • hidden Markov model 2
  • polyA_tail 2
  • refine 2
  • maximum likelihood 2
  • iphop 2
  • instrain 2
  • untar 2
  • cfDNA 2
  • ichorcna 2
  • mask 2
  • Duplication purging 2
  • mapcounter 2
  • hlala_typing 2
  • hla_typing 2
  • hlala 2
  • hla 2
  • vcflib 2
  • vg 2
  • trancriptome 2
  • runs_of_homozygosity 2
  • duplicate purging 2
  • small genome 2
  • interactive 2
  • de novo assembler 2
  • filtermutectcalls 2
  • lofreq 2
  • serogroup 2
  • barcode 2
  • primer 2
  • preprocessing 2
  • pair 2
  • html 2
  • krakenuniq 2
  • taxon tables 2
  • visualization 2
  • krakentools 2
  • taxonomic profile 2
  • standardise 2
  • standardisation 2
  • khmer 2
  • otu tables 2
  • scaffold 2
  • scRNA-seq 2
  • standardization 2
  • bustools 2
  • structural-variant calling 2
  • duplex 2
  • fastk 2
  • transposons 2
  • concat 2
  • aggregate 2
  • sketch 2
  • artic 2
  • GEO 2
  • simulate 2
  • recombination 2
  • eCLIP 2
  • intervals 2
  • UMIs 2
  • metagenomic 2
  • parse 2
  • unaligned 2
  • salmon 2
  • trim 2
  • RNA-Seq 2
  • correction 2
  • intersection 2
  • retrotransposons 2
  • genome bins 2
  • polish 2
  • long terminal repeat 2
  • mtDNA 2
  • demultiplexed reads 2
  • pileup 2
  • short reads 2
  • scatter 2
  • pharokka 2
  • eigenstrat 2
  • emboss 2
  • bloom filter 2
  • k-mer index 2
  • function 2
  • COBS 2
  • intersect 2
  • tbi 2
  • normalize 2
  • norm 2
  • xenograft 2
  • read depth 2
  • identifier 2
  • reheader 2
  • tab 2
  • metadata 2
  • repeat_expansions 2
  • expansionhunterdenovo 2
  • xz 2
  • orf 2
  • format 2
  • cvnkit 2
  • blastn 2
  • estimation 2
  • eido 2
  • antismash 2
  • windows 2
  • reads merging 2
  • mzml 2
  • single cells 2
  • orthology 2
  • nucleotide 2
  • bedpe 2
  • long terminal retrotransposon 2
  • heatmap 2
  • regression 2
  • realignment 2
  • interactions 2
  • ampgram 2
  • regions 2
  • settings 2
  • bayesian 2
  • sra-tools 2
  • gwas 2
  • spatial_omics 2
  • somatic variants 2
  • random forest 2
  • metagenomes 2
  • fasterq-dump 2
  • awk 2
  • bwameth 2
  • interval list 2
  • proteome 2
  • amptransformer 2
  • deconvolution 2
  • image 2
  • parallelized 2
  • vdj 2
  • antibiotics 2
  • RiPP 2
  • merge mate pairs 2
  • NRPS 2
  • leviosam2 2
  • secondary metabolites 2
  • lift 2
  • mudskipper 2
  • transcriptomic 2
  • blastp 2
  • metamaps 2
  • union 2
  • deseq2 2
  • authentication 2
  • zlib 2
  • edit distance 2
  • MaltExtract 2
  • rna-seq 2
  • HOPS 2
  • structural 2
  • immunoprofiling 2
  • fai 2
  • freqsum 1
  • induce 1
  • gc_wiggle 1
  • sex determination 1
  • genetic sex 1
  • transposable element 1
  • coreutils 1
  • generic 1
  • retrieval 1
  • bam2seqz 1
  • relative coverage 1
  • htseq 1
  • rare variants 1
  • predictions 1
  • sniffles 1
  • core 1
  • snippy 1
  • snp-dists 1
  • distance-matrix 1
  • nanopore sequencing 1
  • taxon name 1
  • taxids 1
  • dbnsfp 1
  • SNPs 1
  • copy number, BAM, CRAM, SMN1, SMN2 1
  • pile up 1
  • invariant 1
  • constant 1
  • go 1
  • mygene 1
  • cell_barcodes 1
  • tag 1
  • prior knowledge 1
  • functional analysis 1
  • POA 1
  • rna velocity 1
  • error 1
  • boxplot 1
  • gnu 1
  • de-novo 1
  • longread 1
  • sha256 1
  • 256 bit 1
  • shinyngs 1
  • exploratory 1
  • hashing-based deconvoltion 1
  • hamming-distance 1
  • density 1
  • cobra 1
  • variation 1
  • check 1
  • overlap-based merging 1
  • paired reads merging 1
  • pseudodiploid 1
  • features 1
  • sliding window 1
  • grea 1
  • extension 1
  • translation 1
  • trimfq 1
  • pseudohaploid 1
  • spot 1
  • featuretable 1
  • mass spectrometry 1
  • sage 1
  • orthologs 1
  • orthogroup 1
  • rtg 1
  • pedfilter 1
  • rocplot 1
  • rtg-tools 1
  • salsa 1
  • mapping-based 1
  • circular 1
  • salsa2 1
  • realign 1
  • quality check 1
  • LCA 1
  • Ancestor 1
  • multimapper 1
  • flagstat 1
  • sambamba 1
  • duplicate marking 1
  • integrity 1
  • sequence-based 1
  • ampliconclip 1
  • Read filters 1
  • uniques 1
  • subsampling 1
  • dereplicate 1
  • long uncorrected reads 1
  • drug categorization 1
  • rhocall 1
  • Read report 1
  • Read trimming 1
  • R 1
  • bamstat 1
  • nanoq 1
  • read distribution 1
  • duplicate 1
  • strandedness 1
  • experiment 1
  • read_pairs 1
  • fragment_size 1
  • redundant 1
  • WGS 1
  • cgMLST 1
  • inner_distance 1
  • extraction 1
  • amplicon 1
  • calmd 1
  • random draw 1
  • grep 1
  • Pacbio 1
  • applyvarcal 1
  • AC/NS/AF 1
  • VQSR 1
  • vcflib/vcffixup 1
  • variant recalibration 1
  • subseq 1
  • cellsnp 1
  • donor deconvolution 1
  • sequence headers 1
  • guidetree 1
  • sertotype 1
  • genotype-based demultiplexing 1
  • interleave 1
  • lexogen 1
  • droplet based single cells 1
  • rename 1
  • header 1
  • busco 1
  • seq 1
  • selection 1
  • assembly-binning 1
  • seacr 1
  • dict 1
  • transcriptomics 1
  • faidx 1
  • insert size 1
  • repair 1
  • paired 1
  • read pairs 1
  • size 1
  • cram-size 1
  • selector 1
  • paraphase 1
  • transcription factors 1
  • regulatory network 1
  • chromatin 1
  • readgroup 1
  • 10x 1
  • ribosomal 1
  • grabix 1
  • scramble 1
  • peak-caller 1
  • cut&tag 1
  • bwameme 1
  • bwamem2 1
  • cut&run 1
  • biological activity 1
  • rdtest2vcf 1
  • rRNA 1
  • copy number variation 1
  • wham 1
  • sequence similarity 1
  • copy-number 1
  • copy number analysis 1
  • homology 1
  • co-orthology 1
  • gender determination 1
  • immunoinformatics 1
  • copy number alterations 1
  • airrseq 1
  • spectral clustering 1
  • immcantation 1
  • yahs 1
  • geo 1
  • mapad 1
  • bam2fastq 1
  • bam2fastx 1
  • adna 1
  • c to t 1
  • mapper 1
  • structural variant 1
  • whamg 1
  • comparative genomics 1
  • readproteingroups 1
  • sintax 1
  • joint-genotyping 1
  • construct 1
  • install 1
  • graph projection to vcf 1
  • introns 1
  • gaps 1
  • extractunbinned 1
  • transform 1
  • linkbins 1
  • idx 1
  • wavefront 1
  • mutect 1
  • deep variant 1
  • archive 1
  • vsearch/sort 1
  • amplicon sequences 1
  • usearch 1
  • long read alignment 1
  • pangenome-scale 1
  • all versus all 1
  • mashmap 1
  • proteus 1
  • calling 1
  • VCFtools 1
  • peak picking 1
  • boxcox 1
  • Escherichia coli 1
  • Read coverage histogram 1
  • tnfilter 1
  • derived alleles 1
  • ancestral alleles 1
  • site frequency spectrum 1
  • reverse complement 1
  • simulation 1
  • hmmfetch 1
  • alr 1
  • reformatting 1
  • decompose 1
  • multiallelic 1
  • small variants 1
  • transmembrane 1
  • sompy 1
  • genome graph 1
  • tnseq 1
  • removal 1
  • decoy 1
  • clr 1
  • blat 1
  • eigenvectors 1
  • dnamodelapply 1
  • hicPCA 1
  • sliding 1
  • snakemake 1
  • workflow 1
  • workflow_mode 1
  • createreadcountpanelofnormals 1
  • copyratios 1
  • denoisereadcounts 1
  • readwriter 1
  • dnascope 1
  • confidence 1
  • gost 1
  • registration 1
  • groupby 1
  • tnscope 1
  • bgen 1
  • gprofiler2 1
  • vector 1
  • cytosure 1
  • array_cgh 1
  • chloroplast 1
  • genotypegvcf 1
  • deduplicate 1
  • ribosomal RNA 1
  • probabilistic realignment 1
  • rrna 1
  • rdtest 1
  • vcf2bed 1
  • decompress 1
  • machine_learning 1
  • cell_phenotyping 1
  • cell_type_identification 1
  • n50 1
  • seqfu 1
  • liftover 1
  • baftest 1
  • polya tail 1
  • fast5 1
  • collapse 1
  • genotype likelihood 1
  • reference-independent 1
  • hwe equilibrium 1
  • hwe statistics 1
  • hardy-weinberg 1
  • predict 1
  • Mycobacterium tuberculosis 1
  • countsvtypes 1
  • svtk/baftest 1
  • homologs 1
  • genetics 1
  • omics 1
  • structural-variants 1
  • signatures 1
  • Bayesian 1
  • hash sketch 1
  • fracminhash sketch 1
  • scimap 1
  • spatial_neighborhoods 1
  • associations 1
  • case/control 1
  • clahe 1
  • GWAS 1
  • spatype 1
  • streptococcus 1
  • sccmec 1
  • variantcalling 1
  • association 1
  • refresh 1
  • detecting svs 1
  • short-read sequencing 1
  • svdb 1
  • multi-tool 1
  • nucleotide sequence 1
  • uniq 1
  • plastid 1
  • mgf 1
  • files 1
  • raw 1
  • resistance genes 1
  • resfinder 1
  • upd 1
  • kma 1
  • uniparental 1
  • disomy 1
  • parallel 1
  • scRNA-Seq 1
  • snv 1
  • downsample 1
  • downsample bam 1
  • subsample bam 1
  • vcf2db 1
  • gemini 1
  • maf 1
  • lua 1
  • toml 1
  • vcflib/vcfbreakmulti 1
  • parquet 1
  • parser 1
  • distance-based 1
  • coexpression 1
  • chromosomal rearrangements 1
  • eucaryotes 1
  • sequencing adapters 1
  • minimum_evolution 1
  • phylogenetics 1
  • bedgraphtobigwig 1
  • assay 1
  • corpcor 1
  • correlation 1
  • bigbed 1
  • umicollapse 1
  • bedtobigbed 1
  • r 1
  • python 1
  • genepred 1
  • refflat 1
  • gtftogenepred 1
  • quarto 1
  • ucsc/liftover 1
  • standardize 1
  • dbsnp 1
  • rad 1
  • Segmentation 1
  • neighbour-joining 1
  • duplexumi 1
  • percent on target 1
  • cache 1
  • str 1
  • faqcs 1
  • antibiotic resistance genes 1
  • ARGs 1
  • ANI 1
  • SRA 1
  • ENA 1
  • fetch 1
  • public 1
  • consensus sequence 1
  • groupreads 1
  • Streptococcus pyogenes 1
  • unmapped 1
  • ubam 1
  • zipperbams 1
  • single molecule 1
  • generate 1
  • lint 1
  • fq 1
  • rust 1
  • variant caller 1
  • somatic variant calling 1
  • germline variant calling 1
  • bacterial variant calling 1
  • bootstrapping 1
  • endogenous DNA 1
  • swissprot 1
  • gamma 1
  • machine learning 1
  • custom 1
  • version 1
  • na 1
  • cls 1
  • gct 1
  • cutesv 1
  • pcr duplicates 1
  • paired-end 1
  • track 1
  • corrrelation 1
  • scatterplot 1
  • cumulative coverage 1
  • blastx 1
  • genbank 1
  • segment 1
  • escherichia coli 1
  • PEP 1
  • samplesheet 1
  • validate 1
  • schema 1
  • pep 1
  • eigenstratdatabasetools 1
  • eklipse 1
  • circos 1
  • deletion 1
  • split by chromosome 1
  • embl 1
  • UShER 1
  • gene-calling 1
  • Cores 1
  • snvs 1
  • panelofnormalscreation 1
  • germline contig ploidy 1
  • germlinecnvcaller 1
  • germlinevariantsites 1
  • getpileupsumaries 1
  • readcountssummary 1
  • indexfeaturefile 1
  • learnreadorientationmodel 1
  • readorientationartifacts 1
  • leftalignandtrimvariants 1
  • mergebamalignment 1
  • mutectstats 1
  • postprocessgermlinecnvcalls 1
  • genomicsdbimport 1
  • preprocessintervals 1
  • printreads 1
  • printsvevidence 1
  • reblockgvcf 1
  • revert 1
  • selectvariants 1
  • shiftchain 1
  • shiftfasta 1
  • shiftintervals 1
  • site depth 1
  • splitcram 1
  • splitintervals 1
  • svannotate 1
  • jointgenotyping 1
  • genomicsdb 1
  • gangstr 1
  • collectreadcounts 1
  • heattree 1
  • targets 1
  • annotateintervals 1
  • variant quality score recalibration 1
  • vqsr 1
  • allele-specific 1
  • asereadcounter 1
  • bedtointervallist 1
  • calculatecontamination 1
  • cross-samplecontamination 1
  • getpileupsummaries 1
  • calibratedragstrmodel 1
  • cnnscorevariants 1
  • collectsvevidence 1
  • gatherbqsrreports 1
  • combinegvcfs 1
  • short variant discovery 1
  • composestrtablefile 1
  • dragstr 1
  • condensedepthevidence 1
  • createsequencedictionary 1
  • createsomaticpanelofnormals 1
  • determinegermlinecontigploidy 1
  • duplication metrics 1
  • estimatelibrarycomplexity 1
  • filterintervals 1
  • filtervarianttranches 1
  • tranche filtering 1
  • CNV 1
  • TMA dearray 1
  • variantfiltration 1
  • element 1
  • autofluorescence 1
  • lifestyle 1
  • temperate 1
  • virulent 1
  • bacphlip 1
  • graft 1
  • mouse 1
  • bamtools/convert 1
  • yaml 1
  • bamtools/split 1
  • bamUtil 1
  • trimBam 1
  • illumiation_correction 1
  • background 1
  • background_correction 1
  • microscopy 1
  • clumping fastqs 1
  • smaller fastqs 1
  • deduping 1
  • csi 1
  • BCF 1
  • update header 1
  • biallelic 1
  • homozygosity 1
  • autozygosity 1
  • sorting 1
  • bamtobed 1
  • cycif 1
  • single-stranded 1
  • genomecov 1
  • nuclear contamination estimate 1
  • contiguate 1
  • antimicrobial reistance 1
  • adapterremoval 1
  • admixture 1
  • reference panels 1
  • affy 1
  • Staphylococcus aureus 1
  • allele 1
  • amp 1
  • antimicrobial peptide prediction 1
  • AMPs 1
  • model 1
  • post Post-processing 1
  • allele counts 1
  • ancientDNA 1
  • doCounts 1
  • installation 1
  • HLA 1
  • RNA-seq 1
  • utility 1
  • http(s) 1
  • mkarv 1
  • atlas 1
  • post mortem damage 1
  • sequencing_bias 1
  • ATLAS 1
  • read group 1
  • authentict 1
  • closest 1
  • getfasta 1
  • UNet 1
  • antitarget 1
  • qa 1
  • quality assurnce 1
  • chromap 1
  • duplicate removal 1
  • chromosome_visualization 1
  • splice 1
  • polymut 1
  • polymorphic 1
  • polymorphic sites 1
  • protein coding genes 1
  • cmseq 1
  • access 1
  • export 1
  • antibody capture 1
  • target 1
  • CNV calling 1
  • partition histograms 1
  • concoct 1
  • nucleotide composition 1
  • subcontigs 1
  • cooler/balance 1
  • cload 1
  • digest 1
  • enzyme 1
  • makebins 1
  • genomic bins 1
  • mcool 1
  • crispr 1
  • antigen capture 1
  • overlap 1
  • file manipulation 1
  • jaccard 1
  • chunking 1
  • maskfasta 1
  • overlapped bed 1
  • multinterval 1
  • shiftBed 1
  • region 1
  • sizes 1
  • bases 1
  • slopBed 1
  • subtract 1
  • unionBedGraphs 1
  • bioawk 1
  • Salmonella enterica 1
  • multiomics 1
  • subtyping 1
  • BAM 1
  • tblastn 1
  • postprocessing 1
  • cadd 1
  • calder2 1
  • topology 1
  • compartments 1
  • domains 1
  • Assembly 1
  • hifi 1
  • cellpose 1
  • mkvdjref 1
  • svcluster 1
  • recalibration model 1
  • quast 1
  • hla-typing 1
  • Neisseria gonorrhoeae 1
  • gender 1
  • graph construction 1
  • graph drawing 1
  • squeeze 1
  • odgi 1
  • combine graphs 1
  • graph stats 1
  • graph unchopping 1
  • graph formats 1
  • graph viz 1
  • tumor/normal 1
  • ILP 1
  • NextGenMap 1
  • HLA-I 1
  • PCR/optical duplicates 1
  • flip 1
  • upper-triangular matrix 1
  • ligation junctions 1
  • pairtools 1
  • pairstools 1
  • restriction fragments 1
  • select 1
  • covid 1
  • pangolin 1
  • lineage 1
  • paragraph 1
  • ngm 1
  • sequencing summary 1
  • pbbam 1
  • mitochondrial genome 1
  • 3D heat map 1
  • contour map 1
  • Merqury 1
  • assembly evaluation 1
  • smudgeplot 1
  • ploidy 1
  • unionsum 1
  • metaphlan 1
  • methylation bias 1
  • mbias 1
  • microrna 1
  • mirna 1
  • target prediction 1
  • reference genome 1
  • mobile element insertions 1
  • mosdepth 1
  • otu table 1
  • scan 1
  • mtnucratio 1
  • ratio 1
  • mitochondrial to nuclear ratio 1
  • bioinformatics tools 1
  • Beautiful stand-alone HTML report 1
  • GATK UnifiedGenotyper 1
  • SNP table 1
  • contaminant_removal 1
  • cancer genome 1
  • somatic structural variations 1
  • graphs 1
  • pbbam/pbmerge 1
  • k-mer frequency 1
  • contact maps 1
  • indep 1
  • indep pairwise 1
  • recode 1
  • identifiers 1
  • scoring 1
  • variant genetic 1
  • pmdtools 1
  • porechop_abi 1
  • contact 1
  • pretext 1
  • jpg 1
  • bmp 1
  • gene finding 1
  • dna sequencing 1
  • microbial 1
  • intervals coverage 1
  • genomic intervals 1
  • normal database 1
  • panel of normals 1
  • cutoff 1
  • haplotype purging 1
  • false duplications 1
  • assembly curation 1
  • Haplotype purging 1
  • False duplications 1
  • Assembly curation 1
  • split assembly 1
  • exclude 1
  • GRO-seq 1
  • subreads 1
  • liftovervcf 1
  • pbp 1
  • pair-end 1
  • read 1
  • pedigrees 1
  • motif 1
  • ChIP-Seq 1
  • phantom peaks 1
  • prophage 1
  • identification 1
  • illumina datasets 1
  • phylogenetic composition 1
  • hybrid-selection 1
  • mate-pair 1
  • pcr 1
  • PRO-seq 1
  • picard/renamesampleinvcf 1
  • sortvcf 1
  • deletions 1
  • insertions 1
  • tandem duplications 1
  • CoPRO 1
  • GRO-cap 1
  • PRO-cap 1
  • CAGE 1
  • NETCAGE 1
  • RAMPAGE 1
  • csRNA-seq 1
  • STRIPE-seq 1
  • GC content 1
  • Neisseria meningitidis 1
  • variantrecalibrator 1
  • mitochondrial 1
  • gunc 1
  • gunzip 1
  • gvcftools 1
  • extract_variants 1
  • extractvariants 1
  • abricate 1
  • amrfinderplus 1
  • fARGene 1
  • rgi 1
  • ibd 1
  • hbd 1
  • beagle 1
  • haplogroups. 1
  • genome taxonomy database 1
  • Haemophilus influenzae 1
  • haplotype resolution 1
  • gccounter 1
  • readcounter 1
  • Hidden Markov Model 1
  • hmtnote 1
  • annotations 1
  • pos 1
  • haemophilus 1
  • panel_of_normals 1
  • IDR 1
  • igv 1
  • igv.js 1
  • archaea 1
  • GTDB taxonomy 1
  • genome browser 1
  • genome manipulation 1
  • gawk 1
  • txt 1
  • file parsing 1
  • bgc 1
  • genome profile 1
  • compound 1
  • models 1
  • genome size 1
  • genome heterozygosity 1
  • repeat content 1
  • Salmonella Typhi 1
  • gfastats 1
  • genome summary 1
  • genome statistics 1
  • gstama/polyacleanup 1
  • transcripts 1
  • gget 1
  • low coverage 1
  • Sample 1
  • Haplotypes 1
  • Imputation 1
  • GNU 1
  • merge compare 1
  • genomes on a tree 1
  • tama_collapse.py 1
  • gene model 1
  • TAMA 1
  • gstama/merge 1
  • js 1
  • multicut 1
  • rma6 1
  • mash/dist 1
  • lofreq/filter 1
  • qualities 1
  • AMP 1
  • peptide prediction 1
  • functional genomics 1
  • sgRNA 1
  • CRISPR-Cas9 1
  • maximum-likelihood 1
  • rra 1
  • DNA damage 1
  • NGS 1
  • damage patterns 1
  • screen 1
  • lofreq/call 1
  • taxonomic assignment 1
  • mash/sketch 1
  • minhash 1
  • reduced 1
  • representations 1
  • maxbin2 1
  • metagenome-assembled genomes 1
  • MD5 1
  • 128 bit 1
  • megahit 1
  • denovo 1
  • debruijn 1
  • daa 1
  • call 1
  • Listeria monocytogenes 1
  • pixel classification 1
  • kallisto/index 1
  • pixel_classification 1
  • probability_maps 1
  • population genomics 1
  • dna 1
  • interproscan 1
  • genomic islands 1
  • insertion sequences 1
  • jasminesv 1
  • jasmine 1
  • Python 1
  • Jupyter 1
  • jupytext 1
  • papermill 1
  • quant 1
  • limma 1
  • digital normalization 1
  • effective genome size 1
  • screening assemblies 1
  • Klebsiella pneumoniae 1
  • kegg 1
  • kofamscan 1
  • combining 1
  • reorder 1
  • spliced 1
  • train 1
  • adapter removal 1
  • collapsing 1
  • legionella 1
  • Illumina 1
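
The keyword list above is case-sensitive, so some keywords appear more than once (for example `vcf` and `VCF`, or `qc` and `QC`). As a minimal sketch, assuming the list has been saved as plain text lines in the `• keyword count` shape shown above, the counts could be folded together per keyword like this. The function name `fold_keyword_counts` and the sample lines are illustrative only and are not part of nf-core. A module may carry both spellings of a keyword, so a folded total is an upper bound on the number of distinct modules.

```python
import re
from collections import Counter

# Matches one keyword line such as "  • bam 164": keyword text, then a count.
TAG_LINE = re.compile(r"^\s*•\s+(?P<keyword>.+?)\s+(?P<count>\d+)\s*$")


def fold_keyword_counts(lines):
    """Sum the listed counts per keyword, ignoring letter case."""
    totals = Counter()
    for line in lines:
        match = TAG_LINE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a keyword entry
        totals[match.group("keyword").lower()] += int(match.group("count"))
    return totals


# A few entries copied from the list above.
sample = [
    "  • vcf 110",
    "  • VCF 13",
    "  • qc 22",
    "  • QC 10",
    "  • 5mC 10",
]
totals = fold_keyword_counts(sample)
print(totals.most_common())
```

Folding only on letter case is a deliberately conservative choice: spelling variants such as `kmer` and `k-mer` stay separate, because merging them would need a hand-maintained synonym table.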

contiguate draft genome assembly

meta scaffold fasta

meta results versions

Screen assemblies for antimicrobial resistance against multiple databases

meta assembly databasedir

meta versions report

abricate:

Mass screening of contigs for antibiotic resistance genes

Screen assemblies for antimicrobial resistance against multiple databases

meta assembly

meta versions summary

abricate:

Mass screening of contigs for antibiotic resistance genes

A NATA-accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes

meta fasta

versions matches partials virulence txt out

abritamr:

A pipeline for running AMRfinderPlus and collating results into functional classes

Trim sequencing adapters and collapse overlapping reads

meta reads adapterlist

singles_truncated discarded paired_truncated collapsed collapsed_truncated paired_interleaved settings versions

Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.

meta fastq

meta versions fixed_fastq

ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).

meta bed_ped_geno bim_map fam K

meta versions Q-ancestry-fractions P-allele-frequencies

Read CEL files into an ExpressionSet and generate a matrix

meta samplesheet celfiles_dir description

meta expression annotation rds versions

affy:

Methods for Affymetrix Oligonucleotide Arrays

Converts a GFF/GTF file into a proper GTF file

meta gff

output_gtf log versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files

Converts a GFF/GTF file into a TSV file

meta gff

tsv versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files

Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file

meta gxf

output_gff log versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files

Adds intron features to a GTF/GFF file that lacks them.

meta gff config

versions gff

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Removes features based on a kill list. The default behaviour is to look at each feature's ID: if the ID (case insensitive) appears in the kill list, the feature is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.

meta gff kill_list config

meta versions gff

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.

meta gffs config

meta versions gff

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Provides different types of statistics in text format from a GFF/GTF annotation file

meta gff

stats_txt versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files

Provides basic statistics in text format from a GFF/GTF annotation file

meta gff

stats_txt versions

agat:

AGAT is a toolkit for manipulating and extracting information from GFF/GTF files

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants

meta fasta

meta summary results_dir versions

ALE: assembly likelihood estimator.

meta asm bam

meta ale versions

Generates a count of coverage of alleles

meta input input_index loci fasta

meta versions allelecount

A tool to parse and summarise results from antimicrobial peptides tools and present functional classification.

meta amp_input faa_input opt_amp_db

meta versions sample_dir txt csv faa summary_csv summary_html log results_db results_db_dmnd results_db_fasta results_db_tsv

A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.

summary_file

cluster_tsv rep_cluster_tsv log versions

ampcombi2/cluster:

A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.

A submodule that merges all output summary tables from ampcombi2/parsetables into one summary file.

summaries

tsv log versions

ampcombi2/complete:

Merges the per-sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

meta amp_input faa_input gbk_input opt_amp_db

meta sample_dir contig_gbks txt tsv faa sample_log full_log results_db results_db_dmnd results_db_fasta results_db_tsv versions

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

A fast and user-friendly method to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs.

meta faa model min_length min_probability

meta versions amps_faa amps_tsv

AMPlify is an attentive deep learning model for antimicrobial peptide prediction.

meta faa model_dir

meta versions tsv

amplify:

Attentive deep learning model for antimicrobial peptide prediction

Post-processing script of the MaltExtract component of the HOPS package

maltextract_results taxon_list filter

versions json summary_pdf tsv candidate_pdfs

Identify antimicrobial resistance in gene or protein sequences

meta fasta db

meta versions report mutation_report tool_version db_version

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Identify antimicrobial resistance in gene or protein sequences

NO input

meta versions db

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

A tool to estimate nuclear contamination in males based on heterozygosity in the X chromosome.

meta icounts hapmap_file

meta versions txt

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates base frequency statistics across reference positions from BAM.

meta bam bai minqfile

meta versions depth_sample depth_global qs pos counts icounts

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates genotype likelihoods from BAM files.

meta bam meta2 fasta meta3 error_file

meta versions genotype_likelihood

angsd:

ANGSD: Analysis of next generation Sequencing Data

Annotation and Ranking of Structural Variation

meta sv_vcf sv_vcf_index candidate_small_variants meta2 annotations meta3 candidate_genes meta4 false_positive_snv meta5 gene_transcripts

meta versions tsv unannotated_tsv vcf

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

NO input

versions annotations

annotsv:

Annotation and Ranking of Structural Variation

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

meta sample_treatment_col reference target meta2 samplesheet counts

meta translated_mrna total_mrna translation buffering mrna_abundance rdata fold_change_plot interaction_p_distribution_plot residual_distribution_summary_plot residual_vs_fitted_plot rvm_fit_for_all_contrasts_group_plot rvm_fit_for_interactions_plot rvm_fit_for_omnibus_group_plot simulated_vs_obt_dfbetas_without_interaction_plot session_info versions

anota2seq:

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

meta sequence_input databases antismash_dir gff

meta versions clusterblast_file html_accessory_files knownclusterblast_html knownclusterblast_dir knownclusterblast_txt svg_files_clusterblast svg_files_knownclusterblast gbk_input json_results log zip gbk_results clusterblastoutput html knownclusterblastoutput json_sideloading

antismashlite:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

database_css database_detection database_modules

versions database antismash_dir

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

meta bam

meta extracted_reads_fastq log intermediate_sam intermediate_bam intermediate_sorted_bam versions

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

meta input_tsv tool db

meta tsv versions

CLI Download utility

meta source_url

meta downloaded_file versions

Download and prepare database for Ariba analysis

meta db_name

versions db

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Query input FASTQs against Ariba formatted databases

meta reads db

versions results

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

NO input

versions reference

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Simulation tool to generate synthetic Illumina next-generation sequencing reads

meta fasta sequencing_system fold_coverage read_length

versions meta fastq aln sam

art:

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles.

Aggregates fastq files with demultiplexed reads

meta fastq_dir

meta fastq versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Run the alignment/variant-call/consensus logic of the artic pipeline

meta fastq fast5_dir sequencing_summary primer_scheme_fasta primer_scheme_bed medaka_model_file medaka_model_string scheme scheme_version

meta results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Computes copy number profiles of tumour cells.

args meta input_normal index_normal input_tumor index_tumor allele_files loci_files bed_file fasta gc_file rt_file

meta allelefreqs metrics png purityploidy segments versions

Alignment by Simultaneous Harmonization of Layer/Adjacency Registration

meta images

meta tif versions

Assembly summary statistics in JSON format

meta assembly

meta versions json

ataqv function of a corresponding ataqv tool

meta bam bai peak_file organism mito_name tss_file excl_regs_file autosom_ref_file

meta json problems versions

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

mkarv function of a corresponding ataqv tool

json

versions html

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

Generates a VCF file from a BAM file using various calling methods

meta bam bai fasta fai recal pmd known_alleles method

meta versions bam

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimate the post-mortem damage patterns of DNA

meta bam bai fasta fai pool_rg_txt

meta versions empiric exponential counts table

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimates the sequencing bias based on known invariant sites

meta bam bai empiric alleles invariant_sites

meta versions recal_patterns

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Splits single-end read groups by length and merges paired-end reads

meta bam bai read_group_setting blacklist

meta versions bam filelist

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Generate tables of feature metadata from GTF files

meta meta2 gtf fasta

versions feature_annotation filtered_cdna

atlasgeneannotationmanipulation:

Scripts for manipulating gene annotation

Use deamination patterns to estimate contamination in single-stranded libraries

meta bam config positions

meta versions txt

authentict:

Estimates present-day DNA contamination in ancient DNA single-stranded libraries.

Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.

meta image meta2 markerfile

meta versions backsub_tif meta2 markerout

A bacteriophage lifestyle prediction tool

meta fasta

meta versions bacphlip_results hmmsearch_results

Annotation of bacterial genomes (isolates, MAGs) and plasmids

meta fasta db proteins prodigal_tf

meta versions txt tsv gff gbff embl fna faa ffn hypotheticals_tsv hypotheticals_faa

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.

Downloads BAKTA database from Zenodo

NO input

versions db

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data

meta bam index

meta versions fastq

bam2fastx:

Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files

Removes unused references from the header of sorted BAM/CRAM files.

meta bam

meta versions bam

This module is used to clip primer sequences from your alignments.

meta bam bai bedpe

meta versions bam bai

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

meta primary_aligned_bam contaminant_aligned_bam

versions primary_filtered_bam contamination_bam

Computes mapping statistics from a BAM file

meta bam

meta versions json

bamstats:

A command line tool to compute mapping statistics from a BAM file

Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis

meta bam

meta versions fastq

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

meta bam

meta versions out

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

meta bam

meta versions bam

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

meta bam

meta versions stats

bamtools:

C++ API & command-line toolkit for working with BAM data

Trims the ends of reads in a SAM/BAM file, either by changing read ends to ‘N’ and quality to ‘!’ or by soft clipping

meta bam trim_left trim_right

meta versions bam

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.

Render an assembly graph in GFA 1.0 format to PNG and SVG image formats

meta gfa

meta png svg versions

bandage:

Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily

barrnap uses an HMMER profile to find rRNAs in reads or contig FASTA files

meta reads dbname

meta versions gff

Demultiplex Element Biosciences bases files

meta run_manifest run_dir

meta versions sample_fastq sample_json qc_report run_stats generated_run_manifest metrics unassigned

BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.

meta image

meta versions fields

Align short or PacBio reads to a reference genome using BBMap

meta fastq ref

meta versions bam

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Adapter and quality trimming of sequencing reads

meta reads contaminants

meta reads versions log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Merging overlapping paired reads into a single read.

meta reads interleave

meta merged unmerged ihist versions log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.

meta fastq

meta versions fastq

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Split sequencing reads by mapping them to multiple references simultaneously

meta reads index primary_ref other_ref_names other_ref_paths only_build_index

meta versions index primary_fastq all_fastq stats

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates 30% smaller, faster gzipped FASTQ files and removes duplicates

meta reads

meta reads versions log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Filter out sequences by sequence header name(s)

meta reads names_to_filter output_format interleaved_output

meta versions reads log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.

fasta

versions db

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.

meta bam

meta stats hist versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Compares query sketches to reference sketches hosted on a remote server via the Internet.

meta file

meta versions hits

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Add or remove annotations.

meta input index annotations annotations_index header_lines

meta vcf csi tbi versions

annotate:

Add or remove annotations.

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

meta vcf index regions targets samples

meta vcf csi tbi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Concatenate VCF files

meta vcfs tbi

meta vcf csi tbi versions

concat:

Concatenate VCF files.

Creates a consensus sequence by applying VCF variants to a reference FASTA file

meta vcf tbi fasta

meta fasta versions

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.

Converts certain output formats to VCF

meta input input_index meta2 fasta bed

meta versions vcf_gz vcf bcf_gz bcf hap legend sample

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Filters VCF files

meta vcf

meta vcf csi tbi versions

filter:

Apply fixed-threshold filters to VCF files.

Indexes VCF files

meta vcf

meta versions csi tbi

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Apply set operations to VCF files

meta vcfs tbis

meta results versions

isec:

Computes intersections, unions and complements of VCF files.

Merge VCF files

meta vcfs tbis meta2 fasta meta3 fai bed

meta vcf_gz vcf bcf_gz bcf versions

merge:

Merge VCF files.

Generates genotype likelihoods at each genomic position with coverage

meta bam intervals meta fasta save_mpileup

meta vcf tbi stats mpileup versions

mpileup:

Generates genotype likelihoods at each genomic position with coverage.

Normalize VCF file

meta vcf tbi meta2 fasta

meta vcf csi tbi versions

norm:

Normalize VCF files.

Split VCF by chunks or regions, creating multiple VCFs.

meta vcf tbi sites_per_chunk scatter scatter_file regions targets

meta versions scatter csi tbi

pluginscatter:

Split VCF by chunks or regions, creating multiple VCFs.

Split VCF by sample, creating single- or multi-sample VCFs.

meta vcf tbi samples groups regions targets

meta versions vcf

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

Extracts fields from VCF or BCF files and outputs them in user-defined format.

meta vcf tbi regions targets samples

meta output versions

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Reheader a VCF file

meta vcf header samples meta2 fai

meta versions vcf

reheader:

Modify header of VCF/BCF files, change sample names.

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

meta vcf af_file af_file_tbi genetic_map regions_file samples_file targets_file

meta versions roh

roh:

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

Sorts VCF files

meta vcf

meta versions vcf csi tbi

sort:

Sort VCF files by coordinates.

Split a vcf file into files per chromosome

meta vcf tbi

meta split_vcf versions

bcftools:

Sort VCF files by coordinates.

Generates stats from VCF files

meta vcf tbi regions targets samples exons fasta

meta stats versions

stats:

Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

meta vcf index regions targets samples

meta vcf csi tbi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Demultiplex Illumina BCL files

meta samplesheet run_dir

versions fastq fastq_idx undetermined undetermined_idx reports stats interop

Demultiplex Illumina BCL files

meta samplesheet run_dir

versions fastq fastq_idx undetermined undetermined_idx reports logs interop

Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.

meta vcf ref genmap exclsamples exclmarkers

meta versions vcf log

beagle5:

Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.

Convert a BED file to a VCF file according to a YAML config

meta bed config meta2 fai

meta versions vcf

Convert BAM/GFF/GTF/GVF/PSL files to bed

meta input

meta versions bed

bedops:

High-performance genomic feature operations.

Convert gtf format to bed format

meta gtf

bed versions

gtf2bed:

The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
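
The coordinate shift described above can be sketched in a few lines of Python. This is only an illustration of the 1-based closed to 0-based half-open conversion; the function names are hypothetical, and the column layout of the output line is a simplified assumption rather than the exact output of the real gtf2bed script.

```python
def gtf_to_bed_interval(start, end):
    """Convert a 1-based, closed [start, end] interval to 0-based, half-open [start-1, end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def gtf_line_to_bed(line):
    """Turn one tab-separated GTF record into a simplified 6-column BED-like line.

    Assumption: the feature type is used as the BED name column; the real
    tool emits more columns and uses other identifiers.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, feature = fields[0], fields[2]
    start, end = int(fields[3]), int(fields[4])
    score, strand = fields[5], fields[6]
    bed_start, bed_end = gtf_to_bed_interval(start, end)
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, score, strand])


if __name__ == "__main__":
    record = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";'
    print(gtf_line_to_bed(record))
```

Only the start coordinate changes, so the interval length is the same in both conventions: `end - start + 1` in GTF equals `end - (start - 1)` in BED.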

Converts a bam file to a bed12 file.

meta bam

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

For each feature in A, finds the closest feature (upstream or downstream) in B.

meta input_1 input_2 fasta_fai

meta versions output

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.

meta bed sizes

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Computes both the depth and breadth of coverage of features in file B on the features in file A

meta input_A input_B genome_file

meta bed versions

bedtools:

A powerful toolset for genome arithmetic

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

meta intervals scale sizes extension

meta genomecov versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Extracts sequences in a FASTA file based on intervals defined in a feature file.

meta bed fasta

meta fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.

meta bed summary_column

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

meta intervals1 intervals2 meta2 chrom_sizes

meta intersect versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Calculates the Jaccard statistic between two feature files.

meta input_a input_b meta2 genome_file

meta versions tsv

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
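
As a rough illustration of what a Jaccard statistic over two interval sets measures, the sketch below counts the bases covered by both sets and divides by the bases covered by either set. It assumes 0-based, half-open intervals on a single chromosome and uses a deliberately simple per-base set representation; it is not how bedtools computes the value, and the function names are hypothetical.

```python
def covered_bases(intervals):
    """Return the set of base positions covered by 0-based, half-open intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def jaccard(intervals_a, intervals_b):
    """Bases shared by A and B divided by bases covered by A or B (0.0 if both are empty)."""
    a = covered_bases(intervals_a)
    b = covered_bases(intervals_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    a = [(0, 10)]
    b = [(5, 15)]
    # 5 shared bases out of 15 covered bases in total
    print(jaccard(a, b))
```

The per-base set keeps the example short but uses memory proportional to the covered length, so it only suits small inputs.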

Makes adjacent or sliding windows across a genome or BED file.

meta regions

meta versions bed

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

meta intervals1 intervals2 meta2 chrom_sizes

meta mapped versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Masks sequences in a FASTA file based on intervals defined in a feature file.

meta bed fasta

meta fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.

meta bed

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
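
The merging of overlapping or book-ended features described above can be sketched as a single pass over sorted intervals. This is a minimal illustration assuming 0-based, half-open `(start, end)` tuples on one chromosome; the function name is hypothetical, and it ignores strand and the extra columns a real interval file may carry.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (touching) intervals into spanning intervals."""
    merged = []
    for start, end in sorted(intervals):
        # "<=" also merges book-ended intervals, where one ends exactly where the next starts
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    features = [(10, 12), (1, 5), (5, 8), (11, 15)]
    print(merge_intervals(features))
```

Sorting first means each interval only has to be compared with the most recently merged one.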

Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.

meta beds chrom_sizes

meta versions bed

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Shifts each feature by specific number of bases

meta bed chrom_sizes

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)

meta bed

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
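
Extending a feature by a number of bases on each side can be sketched as below. The example assumes 0-based, half-open coordinates and clamps the result to the chromosome bounds; the function name and the `chrom_size` argument are illustrative assumptions, and strand-aware extension is left out.

```python
def slop(start, end, left, right, chrom_size):
    """Extend [start, end) by `left` bases upstream and `right` bases downstream.

    The result is clamped to [0, chrom_size] so it never runs off the chromosome.
    """
    new_start = max(0, start - left)
    new_end = min(chrom_size, end + right)
    return new_start, new_end


if __name__ == "__main__":
    # Extends past the start of the chromosome, so the start is clamped to 0
    print(slop(5, 10, left=10, right=3, chrom_size=100))
    # Extends past the end of the chromosome, so the end is clamped to 100
    print(slop(90, 98, left=2, right=5, chrom_size=100))
```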

Sorts a feature file by chromosome and other criteria.

meta intervals genome_file

meta sorted versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Split BED files into several smaller BED files

meta bed

meta versions beds

bedtools:

A powerful toolset for genome arithmetic

Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.

meta intervals1 intervals2

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Combines multiple BedGraph files into a single file

meta bedgraph meta2 chrom_sizes

meta bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats.

meta input

meta versions output

Locate and tag duplicate reads in a BAM file

meta bam

meta bam metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Merge a list of sorted bam files

meta bam

meta bam bam_index checksum versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Parallel sorting and duplicate marking

meta bams meta2 fasta

meta bam bam_index cram metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Use k-mers to rapidly subtype S. enterica genomes

meta seqs scheme_metadata

meta versions summary kmer_results simple_summary

Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.

meta reads index

meta bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

meta reads index

meta bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Summarize and/or filter reads based on bisulfite conversion rate

meta bam bai index

meta bsconv_bam versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.

meta bam bai snp_bed index

meta epiread_bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Indexes a reference genome for use with Biscuit

fasta

index versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Merges methylation information for opposite-strand C's in a CpG context

meta bed index

meta mergecg_bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

meta normal_bams normal_bais tumor_bam tumor_bai index

meta versions vcf

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Perform basic quality control on a BAM file generated with Biscuit

meta bam

biscuit_qc_reports versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.

meta vcf

meta bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Performs alignment of BS-Seq reads using bismark

meta reads index

meta bam unmapped report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Relates methylation calls back to genomic cytosine contexts.

meta coverage_file index

meta coverage report summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Removes alignments to the same position in the genome from the Bismark mapping output.

meta bam

meta bam report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.

fasta

index versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Extracts methylation information for individual cytosines from alignments.

meta bam index

meta bedgraph methylation_calls coverage report mbias versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Collects bismark alignment reports

meta align_report splitting_report dedup_report mbias fasta

meta report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

bam align_report dedup_report splitting_report mbias

summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Retrieve entries from a BLAST database

meta entry entry_batch meta2 db

meta fasta text versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

meta fasta meta2 db

meta txt versions

blast:

BLAST finds regions of similarity between biological sequences.

BLASTP (Basic Local Alignment Search Tool- Protein) compares an amino acid (protein) query sequence against a protein database

meta fasta meta2 db out_ext

meta xml tsv csv versions

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

Builds a BLAST database

meta fasta

meta db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

meta fasta meta2 db

meta txt versions

blast:

Protein to Translated Nucleotide BLAST.

Downloads a BLAST database from NCBI

meta name

meta db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a sequence subject

meta query meta2 subject

meta versions psl

Align reads to a reference genome using bowtie

meta reads meta2 index save_unaligned

bam fastq log versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create bowtie index for reference genome

meta fasta

meta index versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

meta reads meta2 index meta3 fasta save_unaligned sort_bam

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie2 index for reference genome

meta fasta

meta index versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

meta kraken_report database

meta versions reports txt

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

meta kraken2db

meta versions db

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

meta input

meta versions txt

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Benchmarking Universal Single Copy Orthologs

meta fasta mode lineage busco_lineages_path config_file

meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_proteins versions

Benchmarking Universal Single Copy Orthologs

meta fasta mode lineage busco_lineages_path config_file

meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_dir versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

BUSCO plot generation tool

short_summary_txt

png versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Find SA coordinates of the input reads for bwa short-read mapping

meta reads meta2 index

meta versions sai

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA index for reference genome

meta fasta

meta index versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

meta reads meta2 index fasta sort_bam

bam cram csi crai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

meta reads sai meta2 index

meta versions bam

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

meta reads sai meta2 index

meta versions bam

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-mem2 index for reference genome

meta fasta

meta index versions

bwamem2:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

meta reads meta2 index meta3 fasta sort_bam

meta sam bam cram crai csi versions

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-MEME index for reference genome

meta fasta

meta versions index

bwameme:

Faster BWA-MEM2 using learned-index

Performs fastq alignment to a fasta reference using BWA-MEME

meta reads meta2 index meta3 fasta sort_bam mbuffer sort_threads

meta sam bam cram crai csi versions

bwameme:

Faster BWA-MEM2 using learned-index

Performs alignment of BS-Seq reads using bwameth

meta reads index

meta bam versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Performs indexing of c2t converted reference genome

fasta

index versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome.

meta vcf annotation_dir

meta versions tsv

Hierarchical Hi-C compartment computation

meta input resolution

meta versions output intermediate_data

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

meta reads mode genomesize

meta versions report assembly contigs corrected_reads corrected_trimmed_reads metadata contig_position contig_info

A module for concatenation of gzipped or uncompressed files

meta files_in

versions file_out

cat:

Just concatenation

Concatenates fastq files

meta reads

meta reads versions

cat:

The cat utility reads files sequentially, writing them to the standard output.

Cluster protein sequences using sequence similarity

meta sequences

meta fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Cluster nucleotide sequences using sequence similarity

meta sequences

meta versions fasta clusters

cdhit:

Clusters and compares protein or nucleotide sequences

Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth

meta img_data signature high_thresholds low_thresholds

meta versions celltypes quality

cellpose segments cells in images

meta image model

meta versions mask flows

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.

meta reads reference

outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.

bcl csv

fastq versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.

gtf

gtf versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.

fasta gtf reference_name

reference versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.

reference_name genes fasta seqs

reference versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.

meta gex_fastqs vdj_fastqs ab_fastqs beam_fastqs cmo_fastqs gex_reference gex_frna_probeset gex_targetpanel vdj_reference vdj_primer_index fb_reference beam_antigen_panel beam_control_panel cmo_reference cmo_barcodes cmo_barcode_assignment frna_sampleinfo

config outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.

meta reads reference

outs versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.

meta lib_csv reference

outs versions

cellrangerarc:

Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.

bcl csv

fastq versions

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.

gtf

gtf versions

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.

fasta gtf motifs reference_config reference_name

reference versions

cellrangerarc:

Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.

Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.

meta reads reference

outs versions

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.

bcl csv

fastq versions

cellranger-atac:

Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.

fasta gtf motifs reference_config reference_name

reference versions

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.

meta bam bai region_vcf barcode

meta versions base cell sample allele_depth depth_coverage depth_other

cellsnp:

Efficient genotyping bi-allelic SNPs on single cells

Build centrifuge database for taxonomic profiling

meta fasta conversion_table taxonomy_tree name_table size_table

meta versions cf

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

meta reads db save_unaligned save_aligned

meta report results sam fastq_unmapped fastq_mapped versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

meta report db

meta versions kreport

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

meta fasta fasta_ext db

meta versions checkm_output checkm_output checkm_tsv

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

meta analysis_dir marker_file coverage_file exclude_marker_file

meta versions output fasta

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 database download

NO input

meta versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

CheckM2 bin quality prediction

meta dbmeta fasta db

meta versions checkm2_output checkm2_tsv

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

A simple program to parse Illumina NGS data and check it for quality criteria

run_dir checkqc_config

versions report

Construct the database necessary for checkv's quality assessment

NO input

versions checkv_db

checkv:

Assess the quality of metagenome-assembled viral genomes.

Assess the quality of metagenome-assembled viral genomes.

meta fasta db

meta versions quality_summary completeness contamination complete_genomes proviruses viruses

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

meta fasta db

meta versions checkv_db

checkv:

Assess the quality of metagenome-assembled viral genomes.

Create a schema to determine the allelic profiles of a genome

meta fasta prodigal_tf cds

versions meta schema cds_coordinates invalid_cds

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Filter and trim long read data.

meta fastq

meta versions fastq

zcat:

zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

meta reads meta2 fasta meta3 index barcodes whitelist chr_order pairs_chr_order

meta versions bed bam tagAlign pairs

chromap:

Fast alignment and preprocessing of chromatin profiles

Indexes a fasta reference genome ready for chromatin profiling.

meta fasta

versions meta index

chromap:

Fast alignment and preprocessing of chromatin profiles

Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.

meta meta2 meta3 meta4 meta5 meta6 meta7 autozyg coverage exome fracsnp ideogram regions sites

meta versions plots

Annotate circRNAs detected in the output from CIRCexplorer2 parse

meta junctions fasta gene_annotation

meta txt versions

circexplorer2:

Circular RNA analysis toolkits

CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.

meta fusions

meta bed versions

circexplorer2:

Circular RNA analysis toolkit

A method to improve mappings on circular genomes, using the BWA mapper.

meta reference meta2 elongation_factor meta3 target

meta versions fasta

circulargenerator:

Creates a modified reference genome, elongated by a specified number of bases

Realign reads mapped with BWA to elongated reference genome

meta bam meta2 fasta meta3 elongation_factor

meta bam versions

circularmapper:

A method to improve mappings on circular genomes such as Mitochondria.

binning of metagenomic sequences

meta fasta

meta versions fasta bins fm index links result

Runs the Clippy CLIP peak caller

meta bed gtf fai

peaks summits version

Predict recombination events in bacterial genomes

meta msa newick

meta versions emsim em fasta newick pos_ref status

Align sequences using Clustal Omega

meta fasta meta2 tree compress

meta alignment versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

Renders a guidetree in clustalo

meta fasta

meta tree versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

Calculates polymorphic site rates over protein coding genes

meta bam bai gff fasta

meta versions polymut

cmseq:

Set of utilities on sequences and BAM files

Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.

meta fasta meta2 exclude_bed

meta bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Derive off-target (“antitarget”) bins from target regions.

meta targets

meta bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

meta tumor normal meta2 fasta meta3 fasta_fai meta4 targets meta5 reference panel_of_normals

meta bed cnn cnr cns pdf png versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number

meta cns vcf

meta versions output

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
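
As a rough illustration of what "deriving an absolute integer copy number from a log2 ratio" means, the sketch below rounds `ploidy * 2**log2` to the nearest integer. It assumes a diploid baseline and a pure sample; the function name and the example segments are hypothetical, and this plain rounding is not necessarily the rule CNVkit itself applies.

```python
def absolute_copy_number(log2_ratio: float, ploidy: int = 2) -> int:
    """Round ploidy * 2**log2_ratio to the nearest non-negative integer.

    Assumption: neutral copy number equals `ploidy` and the sample is pure.
    """
    return max(0, round(ploidy * 2 ** log2_ratio))


# Hypothetical segments: (chromosome, start, end, log2 ratio)
segments = [
    ("chr1", 0, 1000, 0.0),
    ("chr1", 1000, 2000, 0.585),
    ("chr2", 0, 500, -1.0),
]

for chrom, start, end, log2 in segments:
    print(chrom, start, end, absolute_copy_number(log2))
```

A log2 ratio of 0 maps back to the baseline ploidy, a ratio of -1 to a single-copy loss, and a ratio near 0.585 to a single-copy gain.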

Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.

meta cns

meta versions cns

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

meta cnr cns

meta txt versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Compile a coverage reference from the given files (normal samples).

fasta targets antitargets

meta cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Transform bait intervals into targets more suitable for CNVkit.

meta baits meta2 annotation

meta bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.

meta meta2 meta3 meta4 bam bai root fasta fai

output_meta versions root tab

cnvnator:

Tool for calling copy number variations.

convert2vcf.pl is a command line tool to convert CNVnator calls to VCF format.

meta calls

meta versions vcf

cnvnator:

Tool for calling copy number variations.

command line tool for calling CNVs in whole genome sequencing data

meta pytor bin_sizes

meta pytor versions

cnvpytor:

calling CNVs using read depth

calculates read depth histograms

meta pytor bin_sizes

meta pytor versions

cnvpytor:

calling CNVs using read depth

command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.

meta input_file index fasta fai

meta pytor versions

cnvpytor -rd:

calling CNVs using read depth

partitioning read depth histograms

meta pytor bin_sizes

meta partitions versions

cnvpytor:

calling CNVs using read depth

view function to generate vcfs

meta pytor_files bin_sizes output_format

meta tsv vcf xls versions

cnvpytor:

calling CNVs using read depth

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.

meta fasta coverage query bam assembler mink maxk

meta extended_assemblies extended_circular extended_partial extended_failed orphan_end all_assemblies joining_summary log versions

cobra-meta:

COBRA is a tool to get higher quality viral genomes assembled from metagenomes.

Builds a classic bloom filter COBS index

meta input

meta index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Builds a compact bloom filter COBS index

meta input

meta index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

meta coverage_file fasta

meta versions args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv

concoct:

Clustering cONtigs with COverage and ComposiTion

Generate the input coverage table for CONCOCT using a BEDFile

meta bed bamfiles baifiles

meta versions tsv

concoct:

Clustering cONtigs with COverage and ComposiTion

Cut up fasta file in non-overlapping or overlapping parts of equal length.

meta fasta bed

meta versions fasta bed

concoct:

Clustering cONtigs with COverage and ComposiTion
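
The idea of cutting a sequence into fixed-length parts, with or without overlap, can be sketched as follows. This is only an illustration under stated assumptions: `cut_up` is a hypothetical helper, not the CONCOCT script, and here the final part is simply allowed to be shorter than the others, which may differ from how the real tool treats the tail.

```python
def cut_up(sequence: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split `sequence` into parts of `chunk_size`, each sharing `overlap` bases
    with the previous part. The last part may be shorter (assumption)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sequence), step):
        chunks.append(sequence[start:start + chunk_size])
        # Stop once a part reaches the end of the sequence.
        if start + chunk_size >= len(sequence):
            break
    return chunks


print(cut_up("ACGTACGTAC", 4))             # non-overlapping parts
print(cut_up("ACGTACGTAC", 4, overlap=2))  # parts overlapping by 2 bases
```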

Creates a FASTA file for each new cluster assigned by CONCOCT

meta original_fasta csv

meta versions fasta

concoct:

Clustering cONtigs with COverage and ComposiTion

Merge consecutive parts of the original contigs originally cut up by cut_up_fasta.py

meta clustering_csv

meta versions csv

concoct:

Clustering cONtigs with COverage and ComposiTion

Calculate confidence scores from Kraken2 output

meta kraken_result kraken_taxon_db

meta score versions

Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC

meta cnvs ratio

meta versions p_value_txt

controlfreec/assesssignificance:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Copy number and genotype annotation from whole genome and whole exome sequencing data

args meta mateFile_normal mateFile_tumor cpn_normal cpn_tumor minipileup_normal minipileup_tumor fasta fai snp_position known_snps known_snps_tbi chr_directory mappability target_bed

meta versions bedgraph control_cpn sample_cpn gcprofile_cpn BAF CNV info ratio config

controlfreec/freec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

meta ratio

meta versions bed

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Format Freec output to circos input format

meta ratio

meta versions circos

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

meta ratio baf ploidy

meta versions png_baf png_ratio_log2 png_ratio

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

meta ratio baf

meta versions png_baf png_ratio_log2 png_ratio

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Run matrix balancing on a cool file

meta cool resolution

meta versions cool

cooler:

Sparse binary format for genomic interaction matrices

Create a cooler from genomic pairs and bins

meta pairs index cool_bin chromsizes

meta version cool cool_bin

cooler:

Sparse binary format for genomic interaction matrices

Generate fragment-delimited genomic bins

fasta chromsizes enzyme

versions bed

cooler:

Sparse binary format for genomic interaction matrices

Dump a cooler’s data to a text stream.

meta cool resolution

meta versions bedpe

cooler:

Sparse binary format for genomic interaction matrices

Generate fixed-width genomic bins

chromsize cool_bin

versions bed

cooler:

Sparse binary format for genomic interaction matrices
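
To make "fixed-width genomic bins" concrete, here is a minimal sketch that tiles each chromosome with bins of a given width and emits BED-style half-open intervals. The function name `make_bins` and the chromosome sizes are hypothetical; it illustrates the concept only and does not call cooler.

```python
def make_bins(chromsizes: dict[str, int], bin_size: int) -> list[tuple[str, int, int]]:
    """Tile each chromosome with consecutive bins of `bin_size` bases.

    Intervals are 0-based and half-open, as in BED; the last bin on a
    chromosome is truncated at the chromosome end.
    """
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, bin_size):
            bins.append((chrom, start, min(start + bin_size, length)))
    return bins


# Hypothetical chromosome sizes, for illustration only.
for chrom, start, end in make_bins({"chr1": 250, "chr2": 100}, 100):
    print(f"{chrom}\t{start}\t{end}")
```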

Merge multiple coolers with identical axes

meta cool

meta versions cool

cooler:

Sparse binary format for genomic interaction matrices

Generate a multi-resolution cooler file by coarsening

meta cool

meta versions mcool

cooler:

Sparse binary format for genomic interaction matrices

Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.

image meta

versions cores masks tma_map centroids meta

Compress files with crabz

meta file

meta versions archive

crabz:

Like pigz, but rust

Decompress files with crabz

meta archive

meta versions file

crabz:

Like pigz, but rust

remove false positives of functional crispr genomics due to CNVs

meta count_file library_file

meta versions norm_count_file

crisprcleanr:

Analysis of CRISPR functional genomics, remove false positive due to CNVs.

Controllable lossy compression of BAM/CRAM files

meta input keepbed bedout

meta versions bam cram sam bed

Concatenate two or more CSV (or TSV) tables into a single table

meta csv in_format out_format

meta versions csv

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Join two or more CSV (or TSV) tables by selected fields into a single table

meta csv

meta versions csv

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Splits CSV/TSV into multiple files according to column values

meta csv in_format out_format

meta versions split_csv

csvtk:

CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.

Custom module to add a new fasta file to an old one and update an associated GTF

meta meta2 fasta gtf add_fasta biotype

meta fasta gtf versions

custom:

Custom module to add a new fasta file to an old one and update an associated GTF

Custom module used to dump software versions within the nf-core pipeline template

versions

yml mqc_yml versions

custom:

Custom module used to dump software versions within the nf-core pipeline template

Generates a file of chromosome sizes and a FASTA index file

meta fasta

meta sizes fai gzi versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

meta gtf fasta

meta gtf versions

gtffilter:

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

filter a matrix based on a minimum value and numbers of samples that must pass.

meta abundance samplesheet_meta samplesheet minimum_abundance minimum_samples minimum_proportion grouping_variable minimum_proportion_not_na minimum_samples_not_na most_variant_features

versions meta filtered tests

matrixfilter:

filter a matrix based on a minimum value and numbers of samples
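
The filtering rule described here (keep a feature only if enough samples reach a minimum value) can be sketched as below. The parameter names `minimum_abundance` and `minimum_samples` mirror the module inputs listed above; the function itself, the inclusive `>=` comparisons, and the toy matrix are assumptions made for illustration and are not the module's actual implementation.

```python
def filter_rows(
    matrix: dict[str, list[float]],
    minimum_abundance: float,
    minimum_samples: int,
) -> dict[str, list[float]]:
    """Keep features whose value is >= minimum_abundance in at least
    minimum_samples samples (inclusive comparisons are an assumption)."""
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(v >= minimum_abundance for v in values) >= minimum_samples
    }


# Hypothetical feature-by-sample abundance matrix.
abundance = {
    "geneA": [5.0, 0.0, 7.0],
    "geneB": [0.0, 0.0, 9.0],
    "geneC": [1.0, 1.0, 1.0],
}
print(filter_rows(abundance, minimum_abundance=2.0, minimum_samples=2))
```

With these toy values only `geneA` passes, because it is the only feature at or above 2.0 in at least two samples.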

Test for the presence of suitable NCBI settings or create them on the fly.

NO input

versions ncbi_settings

sratools:

SRA Toolkit and SDK from NCBI

Make a GSEA class file (.cls) from tabular inputs

meta samples

meta cls versions

custom:

Make a GSEA class file (.cls) from tabular inputs

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

meta tabular

meta gct versions

tabulartogseagct:

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.

meta gtf meta2 quants quant_type id extra

meta tx2gene versions

custom:

Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications

Perform adapter/quality trimming on sequencing reads

meta reads

meta reads log versions

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

structural-variant calling with cutesv

meta bam bai meta2 fasta

meta vcf versions

A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage

meta bam fasta fai specieslist

versions results

DAS Tool binning step.

meta contigs bins proteins db_directory

meta version log summary contig2bin eval bins pdfs fasta_proteins fasta_archaea_scg fasta_bacteria_scg b6 seqlength

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

meta fasta extension

meta versions fastatocontig2bin

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

meta fasta extension

meta versions scaffolds2bin

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.

meta config_file table

versions report

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows any enrichment method to be tested flexibly with any prior knowledge resource and incorporates methods that take sign and weight into account. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.

meta mat net args

meta dc_estimate dc_pvals versions

DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).

meta bam

meta versions bam json hist log

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

NO input

versions db

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

meta fasta model db

meta versions daa daa_tsv arg potential_arg

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.

NO input

versions deepbgc_db

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.

meta genome

meta versions readme log json bgc_gbk bgc_tsv full_gbk pfam_tsv bgc_png pr_png roc_png score_png

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

Deepcell/mesmer segmentation for whole-cell

meta img meta2 membrane_img

meta mask versions

mesmer:

Deep cell is a collection of tools to segment imaging data

A Deep Learning Model for Transmembrane Topology Prediction and Classification

meta fasta

meta gff3 line3 md csv png versions

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.

meta input input_index fasta fasta_fai

meta versions bigWig bedgraph

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

calculates scores per genome regions for other deeptools plotting utilities

meta bigwig bed

meta matrix table versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Computes read coverage for genomic regions (bins) across the entire genome.

meta bam bais labels

meta matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

meta matrix method plot_type

meta pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Plots cumulative read coverage for each BAM file

meta bam bais

meta pdf matrix metrics versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a heatmap

meta matrix

meta pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

meta matrix

meta pdf tab versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a profile plot

meta matrix

meta pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

meta input index interval meta2 fasta meta3 fai meta4 gzi

meta vcf gvcf version

Call structural variants

meta input input_index vcf vcf_index exclude_bed meta2 fasta meta3 fai

meta versions bcf csi

delly:

Structural variant discovery by integrated paired-end and split-read analysis

Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.

meta input_raw_gene_bc_matrices_h5 input_hto_csv_file output_name generate_gender_plot genome generate_diagnostic_plots

meta zarr out_zarr versions

runs a differential expression analysis with DESeq2

meta contrast_variable reference target meta2 samplesheet counts meta3 control_genes_file meta4 transcript_lengths_file

results dispersion_plot rdata size_factors normalised_counts rlog_counts vst_counts model session_info versions

deseq2:

Differential gene expression analysis based on the negative binomial distribution

Queries a DIAMOND database using blastp mode

meta fasta meta2 db out_ext blast_columns

meta blast xml txt daa sam tsv paf versions

diamond:

Accelerated BLAST compatible local sequence aligner

Queries a DIAMOND database using blastx mode

meta fasta meta2 db out_ext blast_columns

meta blast xml txt daa sam tsv paf log versions

diamond:

Accelerated BLAST compatible local sequence aligner

calculate clusters of highly similar sequences

meta db

meta versions tsv

diamond:

Accelerated BLAST compatible local sequence aligner

Builds a DIAMOND database

meta fasta taxonmap taxonnodes taxonnames

meta db versions

diamond:

Accelerated BLAST compatible local sequence aligner

Performs fastq alignment to a reference using DRAGMAP

meta reads meta2 hashmap meta3 fasta

bam versions

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

meta fasta

meta hashmap versions

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Assemble bacterial isolate genomes from Nanopore reads

meta shortreads longreads

meta versions contigs log raw_contigs txt gfa

Export assembly segment sequences in GFA 1.0 format to FASTA format

meta gfa

meta fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped BED format

meta bed

meta bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

meta gff3

meta gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped BED format

meta bed

meta bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped GFF3 format

meta gff3

meta gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

meta aligment_file aligment_file_index sv_variants snp_variants snp_variants

meta versions vcf

Assessment of duplication rates in RNA-Seq datasets

meta bam meta2 gtf

meta scatter2d boxplot hist dupmatrix intercept_slope multiqc session_info versions

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

meta input index fasta meta2 fai

meta vcf tbi versions

In silico prediction of E. coli serotype

meta fasta

meta versions log tsv txt

Fast genome-wide functional annotation through orthology assignment.

meta fasta eggnog_db eggnog_data_dir eggnog_diamond_db

meta annotations orthologs hits versions

Convert any PEP project or Nextflow samplesheet to any format

samplesheet format pep_input_base_dir

versions samplesheet_converted

eido:

Convert any PEP project or Nextflow samplesheet to any format

Validate samplesheet or PEP config against a schema

samplesheet schema pep_input_base_dir

versions log

validate:

Validate samplesheet or PEP config against a schema.

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

meta geno snp ind

meta versions tsv json

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.

tool for detection and quantification of large mtDNA rearrangements.

meta bam bai ref_gb

meta csv circos versions

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

meta bam run_haplotypecaller run_bqsr reference_sequences filter_regions_bed reference_elfasta known_sites target_regions_bed intermediate_bqsr_tables bqsr_tables_only get_activity_profile get_assembly_regions

meta versions bam metrics recall gvcf table activity_profile assembly_regions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split bam/sam chunks in one file

meta bam

meta versions bam

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

meta bam

meta versions bam

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.

meta fasta

meta consensus versions

emboss:

The European Molecular Biology Open Software Suite
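
As a rough illustration of the idea behind cons, the sketch below derives a consensus from an alignment by plain per-column majority vote. This is a deliberate simplification: the actual program uses sequence weights and a scoring matrix, neither of which is modelled here, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(alignment, gap="-"):
    """Return a per-column majority-vote consensus of equal-length aligned sequences.

    Simplified stand-in: no sequence weights and no scoring matrix are used.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        # Gaps do not vote; a column made only of gaps stays a gap.
        residues = [c for c in column if c != gap]
        if not residues:
            consensus.append(gap)
            continue
        # Most frequent residue wins; ties go to the residue seen first.
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT-A", "ACGTTA", "ACCT-A"])` returns `"ACGTTA"`.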

the revseq program from emboss reverse complements a nucleotide sequence

meta sequences

meta versions revseq

emboss:

The European Molecular Biology Open Software Suite
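
To make the operation concrete, here is a minimal sketch of reverse complementing a nucleotide sequence. It is not how EMBOSS implements revseq; the function name `reverse_complement` is hypothetical, and only A, C, G, T and N are handled (other IUPAC codes pass through unchanged, which is an assumption of this sketch).

```python
# Complement table for unambiguous DNA bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("AACG")` returns `"CGTT"`.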

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

meta sequence out_ext

meta versions outseq

emboss:

The European Molecular Biology Open Software Suite

EMM typing of Streptococcus pyogenes assemblies

meta fasta

meta versions tsv

endorS.py calculates endogenous DNA from samtools flagstat files and prints to screen

meta stats_raw stats_qualityfiltered stats_deduplicated

meta versions json

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

meta assembly species cache_version

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

meta input feature_file

meta versions output

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

meta vcf custom_extra_files genome species cache_version cache meta2 fasta extra_files

vcf tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Searches a term in a public NCBI database

meta database term

meta versions result_xml

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using Unique Identifier(s)

meta database uid uids_file

meta versions xml

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using a UID

meta xml_input pattern element sep

meta versions xtract_table

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

phylogenetic placement of query sequences in a reference tree

meta queryaln referencealn referencetree bfastfile binaryfile

meta epang jplace log versions

epang:

Massively parallel phylogenetic placement of genetic sequences

splits an alignment into reference and query parts

meta refaln fullaln

meta query reference versions

epang:

Massively parallel phylogenetic placement of genetic sequences

estimation of the unfolded site frequency spectrum

meta e_config data seed

meta versions sfs_out pvalues_out

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

meta fasta

meta dropset okayset versions

evigene:

EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.

Estimate repeat sizes using NGS data

meta bam bai meta2 fasta meta3 fasta_fai meta4 variant_catalog

meta versions bam vcf json

Merge STR profiles into a multi-sample STR profile

meta manifest meta2 fasta meta3 fasta_fai

meta versions merged_profiles

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Compute genome-wide STR profile

meta alignment_file alignment_index meta2 fasta meta3 fasta_fai

meta versions locus_tsv motif_tsv str_profile

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Run falco on sequenced reads

meta reads

meta html txt txt versions

fastqc:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

Aligns sequences using FAMSA

meta fasta meta2 tree compress

meta alignment versions

famsa:

Algorithm for large-scale multiple sequence alignments

Renders a guidetree in famsa

meta fasta

meta tree versions

famsa:

Algorithm for large-scale multiple sequence alignments

Perform adapter and quality trimming on sequencing reads with reporting

meta reads

meta versions reads reads_fail reads_unpaired stats debug statspdf log

tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

meta input hmm_model

meta versions log txt hmm orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp

Alignment-free computation of average nucleotide Identity (ANI)

meta query reference

meta ani versions

Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.

meta fasta

meta success_log error_log versions

fasta_validate:

Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!

Quickly compute statistics over a fasta file in windows.

meta fasta

meta versions freq mononuc dinuc trinuc tetranuc
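
To illustrate what "statistics in windows" can mean, the sketch below computes one such statistic, GC fraction, over consecutive non-overlapping windows of a sequence. It is an independent illustration and not the module's implementation; the function name `gc_per_window` and the default window size are assumptions.

```python
def gc_per_window(sequence, window=1000):
    """Yield (start, end, gc_fraction) for consecutive non-overlapping windows.

    Coordinates are 0-based, end-exclusive; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    seq = sequence.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, start + len(chunk), gc / len(chunk)
```

For example, `list(gc_per_window("GGCCAATT", window=4))` gives `[(0, 4, 1.0), (4, 8, 0.0)]`.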

A fast K-mer counter for high-fidelity shotgun datasets

meta reads

meta versions hist ktab prof

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A fast K-mer counter for high-fidelity shotgun datasets

meta histogram

meta versions hist

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A tool to merge FastK histograms

meta fastk_hist fastk_ktab fastk_prof

meta versions fastk_hist fastk_ktab fastk_prof

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

Distance-based phylogeny with FastME

meta infile topo

versions nwk stats matrix bootstrap

Perform adapter/quality trimming on sequencing reads

meta reads adapter_fasta discard_trimmed_pass save_trimmed_fail save_merged

meta reads json html log versions reads_fail reads_merged

Run FastQC on sequenced reads

meta reads

meta html zip versions

FASTQ summary statistics in JSON format

meta reads

meta versions json

Build fastq screen config file from bowtie index files

genome_names indexes

versions database

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Align reads to multiple reference genomes using fastq-screen

meta reads database

fastq_screen versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Produces a Newick format phylogeny from a multiple sequence alignment. Capable of bacterial genome size alignments.

alignment

versions phylogeny

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

meta fastx

meta versions fasta

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
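
The collapsing step described in this entry can be sketched as follows: identical sequences are merged into one record and the number of merged reads is kept. This is an illustrative sketch, not the FASTX-Toolkit code; the function names and the `>rank-count` header layout used by `to_fasta` are assumptions made for the example.

```python
from collections import Counter


def collapse_sequences(sequences):
    """Merge identical sequences; return (sequence, count) pairs, most abundant first."""
    return Counter(sequences).most_common()


def to_fasta(collapsed):
    """Render collapsed pairs as FASTA text with hypothetical '>rank-count' headers."""
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + ("\n" if lines else "")
```

For example, `collapse_sequences(["ACGT", "TTTT", "ACGT"])` returns `[("ACGT", 2), ("TTTT", 1)]`.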

Run NCBI's FCS adaptor on assembled genomes

meta assembly

meta versions cleaned_assembly adaptor_report log pipeline_args skipped_trims

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.

meta assembly database

meta versions fcs_gx_report taxonomy_report

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

A command line tool that makes it easier to find sequencing data from the SRA / GEO / ENA.

ids

versions json

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

meta bam min_reads min_baseq

meta versions bam

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

meta grouped_bam min_reads min_baseq

meta bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Collects a suite of metrics to QC duplex sequencing data.

meta grouped_bam interval_list

meta versions family_sizes duplex_family_sizes duplex_yield_metrics umi_counts duplex_qc duplex_umi_counts

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

r-ggplot2:

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

Using the fgbio tools, converts FASTQ files sequenced into unaligned BAM or CRAM files possibly moving the UMI barcode into the RX field of the reads

reads

meta version bam cram

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

meta bam meta2 fasta min_reads min_baseq max_base_error_rate

meta bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

meta bam strategy

meta versions bam histogram

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs
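
The grouping logic described in this entry (order by mapping position, then sub-group reads sharing the same end positions by UMI) can be sketched as below. This is a simplified illustration, not fgbio's implementation: reads are plain `(name, start, end, umi)` tuples, UMIs are matched exactly with no error tolerance, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_position_and_umi(reads):
    """Group reads by (start, end) and then by exact UMI sequence.

    reads: iterable of (name, start, end, umi) tuples (a simplified stand-in
    for aligned templates). Returns {(start, end, umi): [names]}.
    """
    groups = defaultdict(list)
    # Visit reads from the earliest to the latest mapping position.
    for name, start, end, umi in sorted(reads, key=lambda r: (r[1], r[2])):
        groups[(start, end, umi)].append(name)
    return dict(groups)
```

With exact matching, two reads at the same coordinates whose UMIs differ by a single sequencing error land in different groups; tolerating such mismatches would need a different matching rule than the one sketched here.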

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.

meta bam

meta bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM

meta mapped_bam meta2 unmapped_bam meta3 fasta meta4 dict

meta bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Filtlong filters long reads based on quality measures or short read data.

meta shortreads longreads

meta versions reads log

Perform merging of mate paired-end sequencing reads

meta reads

meta merged notcombined histogram versions

De novo assembler for single molecule sequencing reads

meta reads mode

meta versions fasta gfa gv txt log json

Efficient compression tool for protein structures

meta pdb

meta fcz versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Decompression tool for foldcomp compressed structures

meta fcz

meta pdb versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Create a database from protein structures

meta pdb

meta db versions

foldseek:

Foldseek: fast and accurate protein structure search

Search for protein structural hits against a foldseek database of protein structures

meta pdb meta_db db

meta aln versions

foldseek:

Foldseek: fast and accurate protein structure search

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

meta

meta fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

fq lint is a FASTQ file pair validator.

meta fastq

versions

fq:

fq is a library to generate and validate FASTQ file pairs.

fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.

meta fastq

meta versions fastq

fq:

fq is a library to generate and validate FASTQ file pairs.

Demultiplex fastq files

meta sample_sheet fastq_readstructure_pairs

meta versions sample_fastq metrics most_frequent_unmatched

A haplotype-based variant detector

meta input_1 input_1_index input_2 input_2_index target_bed ref_meta fasta ref_idx_meta fasta_fai samples_meta samples populations_meta populations cnv_meta cnv

meta versions vcf

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

meta variants depths repeats barcodes lineages_meta

meta lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
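
The two-stage resampling described in this entry can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not Freyja's code: the input layout `{site: {base: count}}` and the function name `bootstrap_sample` are hypothetical, and the number of trials in the second stage is taken to be the depth drawn for that site in the first stage.

```python
import random
from collections import Counter


def bootstrap_sample(site_base_counts, rng):
    """Draw one bootstrap replicate from per-site base counts.

    site_base_counts: {site: {base: count}} (hypothetical layout); the total
    depth over all sites must be greater than zero.
    rng: a random.Random instance, so replicates can be made reproducible.
    """
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total = sum(depths)

    # Stage 1: multinomial over sites, p(site) = site depth / total depth.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total))

    replicate = {}
    for site in sites:
        bases = list(site_base_counts[site])
        weights = [site_base_counts[site][b] for b in bases]
        n = new_depths.get(site, 0)
        # Stage 2: multinomial over bases at this site, p(base) = base
        # frequency, with n trials (binomial when only two bases are present).
        drawn = Counter(rng.choices(bases, weights=weights, k=n)) if n else Counter()
        replicate[site] = {b: drawn.get(b, 0) for b in bases}
    return replicate
```

Calling it repeatedly with the same `random.Random(seed)` instance yields a series of replicates whose total depth always equals that of the input.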

specify the relative abundance of each known haplotype

meta variants depths barcodes lineages_meta

meta demix versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

downloads new versions of the curated SARS-CoV-2 lineage file and barcodes

db_name

barcodes lineages_topology lineages_meta versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Calls variants and collects sequencing depth information for them

meta bam fasta

meta variants depths versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Cluster genome FASTA files by average nucleotide identity

meta bins qc_table qc_format

meta tsv dereplicated_bins versions

Gene Allele Mutation Microbial Assessment

meta fasta db

meta versions gamma psl gff fasta

gamma:

Tool for Gene Allele Mutation Microbial Assessment

GangSTR is a tool for genome-wide profiling of tandem repeats from short reads.

meta aligment_files alignment_indices ref_regions fasta fasta_fai

meta versions vcf samplestats

Build ganon database using custom reference sequences.

meta input taxonomy_files genome_size_files

meta versions db info

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

meta fastqs db

meta versions tre report one all unc log

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

meta rep db

meta versions tre

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

meta tre

meta versions txt

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

assigns taxonomy to query sequences in phylogenetic placement output

meta jplace

meta examineassign profile labelled_tree per_query versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

grafts query sequences from phylogenetic placement on the reference tree

meta jplace

meta versions newick

gappa:

Genesis Applications for Phylogenetic Placement Analysis

colours a phylogeny with placement densities

meta jplace

meta versions newick nexus phyloxml svg colours log

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Performs local realignment around indels to correct for mapping errors

meta bam bai intervals meta2 fasta meta3 fai meta4 dict meta5 known_vcf

meta versions bam bai

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

meta bam bai meta2 fasta meta3 fai meta4 dict meta5 known_vcf

meta versions intervals

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

meta bam bai meta2 fasta meta3 fai meta4 dict meta5 intervals meta6 contamination meta7 dbsnp meta8 comp

meta versions vcf

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Assigns all the reads in a file to a single new read-group

meta meta2 meta3 bam fasta fasta_index

meta versions bam bai cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotates intervals with GC content, mappability, and segmental-duplication content

meta intervals meta2 fasta meta3 fasta_fai meta4 dict meta5 mappable_regions meta6 mappable_regions_tbi meta7 segmental_duplication_regions meta8 segmental_duplication_regions_tbi

meta versions annotated_intervals

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

meta vcf vcf_tbi recal recal_index tranches fasta fai dict

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

meta input input_index vcf tbi meta2 fasta meta3 fai meta4 dict intervals

meta versions csv

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

meta bed meta2 dict

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

meta pileup matched

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

estimates the parameters for the DRAGstr model

meta bam bam_index intervals fasta fasta_fai dict strtablefile

meta versions dragstr_model

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

meta vcf tbi aligned_input intervals fasta fai dict architecture weights

meta versions vcf tbi

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

meta meta2 meta3 meta4 input input_index intervals fasta fai dict

meta versions hdf5 tsv

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
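
The counting rule in the read-count description above (a read contributes to the interval that contains its start position) can be sketched in Python. This is an illustration only, not GATK's implementation: the function name `count_read_starts`, the single-contig input, and the 1-based inclusive coordinates are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def count_read_starts(read_starts, intervals):
    """Count, for each interval, how many read start positions fall inside it.

    Assumptions (hypothetical, for illustration): all positions are on one
    contig and intervals are 1-based, inclusive (begin, end) pairs.
    """
    starts = sorted(read_starts)
    counts = []
    for begin, end in intervals:
        # Number of start positions s with begin <= s <= end.
        counts.append(bisect_right(starts, end) - bisect_left(starts, begin))
    return counts
```

For example, `count_read_starts([5, 10, 10, 25], [(1, 10), (11, 20), (21, 30)])` counts three read starts in the first interval, none in the second, and one in the third.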

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

meta input input_index site_depth_vcf site_depth_vcf_index fasta fasta_fai dict

meta versions split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index

gatk4:

Genome Analysis Toolkit (GATK4)

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

meta vcf vcf_idx fasta fai dict

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the DRAGstr model during single-sample auto-calibration with CalibrateDragstrModel.

fasta fasta_fai dict

versions str_table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges adjacent DepthEvidence records

meta depth_evidence depth_evidence_index fasta fasta_fai dict

meta versions condensed_evidence condensed_evidence_index

gatk4:

Genome Analysis Toolkit (GATK4)

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

meta counts

meta versions pon

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

meta fasta

dict versions

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals containing germline and artifactual sites for use with mutect2.

meta genoomicsdb meta2 fasta meta3 fai meta4 dict

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Denoises read counts to produce denoised copy ratios

meta meta2 counts pon

meta versions standardized denoised

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

meta meta2 counts bed exclude_beds contig_ploidy_table ploidy_model

meta versions calls model

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the numbers of unique molecules in a sequencing library.

meta input fasta fai dict

meta versions metrics

gatk4:

Genome Analysis Toolkit (GATK4)

Converts FastQ file to SAM/BAM format

meta reads

meta bam versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

meta intervals meta2 read_counts meta3 annotated_intervals

meta versions interval_list

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2. Can optionally use the outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

meta vcf vcf_tbi stats orientationbias segmentation table estimate meta2 fasta meta3 fai meta4 dict

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

meta vcf tbi resources resources_index fasta fai dict

meta versions vcf tbi

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers scattered BQSR recalibration reports into a single file

meta table

meta table versions

gatk4:

Genome Analysis Toolkit (GATK4)

write your description here

meta pileup

meta table versions

gatk4:

Genome Analysis Toolkit (GATK4)

Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.

meta vcf tbi wspace interval_file interval_value run_intlist run_update wspace input_map

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

meta gvcf gvcf_index intervals intervals_index fasta fai dict dbsnp dbsnp_tbi

meta vcf tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

meta tsv intervals model ploidy

meta versions cohortcalls cohortmodel casecalls

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

meta input input_index intervals meta2 fasta meta3 fai meta4 dict variants variants_tbi

pileup versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

meta input input_index intervals dragstr_model meta2 fasta meta3 fai meta4 dict meta5 dbsnp meta6 dbsnp_tbi

meta versions vcf tbi bam

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an index for a feature file, e.g. VCF or BED file.

meta feature_file

meta index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts a Picard IntervalList file to a BED file.

meta interval

meta bed versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits the interval list file into unique, equally sized interval files and places them in a directory

meta interval_list

meta versions interval_list

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts

meta f1r2

artifactprior versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

meta vcf tbi intervals fasta fai dict

meta versions vcf tbi

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fasta_fai

meta versions bam cram bai crai metrics

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

meta aligned unaligned meta2 fasta meta3 dict

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges mutect2 stats generated on different intervals/regions

meta stats

meta versions stats

gatk4:

Genome Analysis Toolkit (GATK4)

Merges several vcf files

meta vcf meta2 dict

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

meta input input_index intervals meta2 fasta meta3 fai meta4 dict germline_resource germline_resource_tbi panel_of_normals panel_of_normals_tbi

vcf tbi stats f1r2 versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

meta ploidy calls model

meta versions denoised segments intervals

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Prepares bins for coverage collection.

meta fasta meta2 fai meta3 dict meta4 intervals meta5 exclude_intervals

meta versions interval_list

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

meta input index meta2 fasta meta3 fai meta4 dict

meta versions bam cram sam

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

meta evidence_files evidence_indices bed fasta fasta_fai dict

meta versions printed_evidence printed_evidence

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

meta gvcf tbi intervals fasta fai dict dbsnp dbsnp_tbi

meta versions gvcf tbi

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

meta bam

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

meta bam

fastq versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

meta vcf vcf_idx intervals

meta vcf vcf_tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a fasta with the bases shifted by offset

meta fasta meta2 fasta_fai meta3 dict

meta versions dict intervals shift_back_chain shift_fa shift_intervals

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence

meta site_depths site_depths_indices vcf tbi fasta fasta_fai dict

meta versions baf baf_tbi

gatk4:

Genome Analysis Toolkit (GATK4)

Splits CRAM files efficiently by taking advantage of their container based structure

meta cram

meta versions split_crams

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Split intervals into sub-interval files.

meta interval meta2 fasta meta3 fai meta4 dict

meta bed versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits reads that contain Ns in their cigar string

meta bam bai intervals meta2 fasta meta3 fai meta4 dict

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

meta vcf tbi bed fasta fasta_fai dict

meta versions annotated_vcf index

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

meta vcfs indices ploidy_table fasta fasta_fai dict

meta versions clustered_vcf clustered_vcf_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

meta vcf vcf_tbi meta2 fasta meta3 fai meta4 dict

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results, for example 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

meta vcf tbi resource_vcf resource_tbi labels fasta fai dict

recal idx tranches plots version

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.

meta input program_file

meta versions output

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

meta input hmm model_dir

meta versions genes features clusters gbk json

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

Convert a mappability file to bedgraph format

meta map meta2 index

meta versions bedgraph sizes

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

meta fasta

meta versions gem log

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Define the mappability of a reference

meta index read_length

meta versions map

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

meta fasta

meta versions gem info

gem3:

The GEM indexer (v3).

Performs fastq alignment to a fasta reference using gem3-mapper

meta meta2 fastq gem sort_bam

meta versions bam

gem3:

The GEM indexer (v3).

A derivative of GenomeScope2.0 modified to work with FastK

meta fastk_histex_histogram

meta versions linear_plot transformed_linear_plot log_plot transformed_log_plot model summary kmer_cov

Create an index file for genmap

meta fasta

meta versions index

genmap:

Ultra-fast computation of genome mappability.

Create mappability files for a genome

meta index meta2 regions

meta versions wig bedgraph txt csv

genmap:

Ultra-fast computation of genome mappability.

Annotates regions, frequencies, and CADD scores

meta input_vcf

meta versions vcf

genmod:

Annotate genetic inheritance models in variant files

Score compounds

meta input_vcf

meta versions vcf

genmod:

Annotate genetic inheritance models in variant files

Annotate models of inheritance

meta input_vcf reduced_penetrance family_file

meta versions vcf

genmod:

Annotate genetic inheritance models in variant files

Score the variants of a vcf based on their annotation

meta input_vcf family_file score_config

meta versions vcf

genmod:

Annotate genetic inheritance models in variant files

Download geNomad databases and related files

NO input

versions genomad_db

genomad:

Identification of mobile genetic elements

Identify mobile genetic elements present in genomic assemblies

meta fasta genomad_db score_calibration

meta aggregated_classification taxonomy provirus compositions calibrated_classification plasmid_fasta plasmid_genes plasmid_proteins plasmid_summary virus_fasta virus_genes virus_proteins virus_summary versions

genomad:

Identification of mobile genetic elements

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

meta histogram

meta versions linear_plot_png linear_plot_png transformed_linear_plot_png log_plot_png transformed_log_plot_png model summary lookup_table fitted_histogram_png

Genotype Salmonella Typhi from Mykrobe results

meta json

meta versions tsv

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

Peak-calling for ChIP-seq and ATAC-seq enrichment experiments

meta treatment_bam control_bam blacklist_bed save_pvalues save_pileup save_bed save_duplicates

meta peaks bedgraph_pvalues bedgraph_pileup bed_intervals duplicates version

geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA

geo_accession

versions geo_accession samples

Retrieves GEO data from the Gene Expression Omnibus (GEO)

meta querygse

rds expression annotation versions

geoquery:

Get data from NCBI Gene Expression Omnibus (GEO)

Downloads databases needed for running getorganelle

organelle_type

versions organelle_type db

getorganelle:

Get organelle genomes from genome skimming data

Assembles organelle genomes from genomic data

meta fastq organelle_type db

meta versions fasta etc

getorganelle:

Get organelle genomes from genome skimming data

Collapse walk-preserving shared affixes in variation graphs in GFA format

meta gfa

meta gfa affixes versions

A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

meta assembly out_fmt genome_size target agpfile include_bed exclude_bed instructions

meta versions assembly_summary assembly

Converts GFA or rGFA files to FASTA

meta gfa

meta versions fasta

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Summary statistics for GFA files

meta gfa

meta versions stats

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Compare, merge, annotate and estimate accuracy of generated gtf files

meta gtfs fasta fai reference_gtf

meta annotated_gtf combined_gtf tmap refmap loci stats tracking versions

Validate, filter, convert and perform various other operations on GFF files

meta gff fasta

meta gtf gffread_gff gffread_fasta versions

gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

meta files

meta versions output file

gget:

gget enables efficient querying of genomic databases

Defines chunks where to run imputation

meta input region

meta versions txt

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.

meta region freq truth estimate min_prob min_dp bins

meta versions errors_cal errors_grp errors_spl rsquared_grp rsquared_spl

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
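
The r2 measure named in the concordance description above is a squared Pearson correlation between imputed dosages and high-confidence genotype calls. A minimal Python sketch of that statistic follows. It is not GLIMPSE's code: the function name `squared_correlation` is hypothetical, and the sketch assumes one flat list of values, leaving out the grouping of sites into MAF bins.

```python
import math


def squared_correlation(dosages, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Assumption (for illustration): both arguments are equally long sequences
    of numbers, e.g. dosages in [0, 2] and genotypes coded as 0, 1 or 2.
    """
    if len(dosages) != len(truth) or len(dosages) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_t = sum(truth) / n
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(dosages, truth))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_d == 0 or var_t == 0:
        return math.nan  # correlation is undefined when one side has no variation
    return (cov * cov) / (var_d * var_t)
```

In a binned analysis, a function like this would be applied separately to the sites that fall into each MAF bin.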

Concatenates imputation chunks in a single VCF/BCF file ligating phased information.

meta input_list input_index

meta versions merged_variants

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Main GLIMPSE algorithm. Performs phasing and imputation, refining genotype likelihoods

meta input input_index samples_file input_region output_region reference reference_index map

meta versions phased_variants

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Generates haplotype calls by sampling haplotype estimates

meta input

meta versions haplo_sampled

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Defines chunks where to run imputation

meta input input_index region meta2 map model

meta versions chunk_chr

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Program to compute the genotyping error rate at the sample or marker level.

meta region freq truth estimate samples groups bins ac_bins allele_counts min_val_gl min_val_dp

meta versions errors_cal errors_grp errors_spl rsquare_grp rsquare_spl rsquare_per_site

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.

meta input_list input_index

meta versions merged_variants

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool for imputation and phasing from a VCF file or directly from BAM files.

meta input input_index samples_file input_region output_region meta2 reference reference_index map fasta_reference fasta_reference_index

meta versions phased_variants stats_coverage

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool to create a binary reference panel for quick reading time.

meta reference reference_index input_region output_region meta2 map

meta versions bin_ref

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Merge gVCF files and perform joint variant calling

meta gvcfs

versions bcf

Writes a sorted concatenation of file/s

meta input

meta sorted versions

sort:

Writes a sorted concatenation of file/s

Split a file into consecutive or interleaved sections

meta input

meta split versions

gnu:

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.

Query metadata for any taxon across the tree of life.

meta taxon taxa_file

meta versions taxonsearch

goat:

goat-cli is a command line interface to query the Genomes on a Tree Open API.

Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.

meta bams indexes fai

meta output bams

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Quickly generate evenly sized (by amount of data) regions across a number of bam/cram files

meta bai meta2 fai

meta bed versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Runs a functional enrichment analysis with gprofiler2

meta de_file contrast_variable reference target background_file gmt_file

meta all_enrich rds plot_png plot_html sub_enrich sub_plot filtered_gmt session_info versions

gprofiler2:

An R interface corresponding to the 2019 update of g:Profiler web tool.

Checks if the input file is bgzip compressed or not

input

versions compress_bgzip

grabix:

a wee tool for random access into BGZF files.
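
A check of this kind can be approximated by inspecting the gzip header: a BGZF block is a gzip member whose extra field carries a `BC` subfield of length 2. The sketch below is an assumption about how such a test could look, not grabix's own code; `looks_like_bgzf` is a hypothetical helper name.

```python
import gzip


def looks_like_bgzf(header: bytes) -> bool:
    """Return True if header starts a gzip member with a BGZF 'BC' extra subfield."""
    # gzip magic (1f 8b), deflate method (08), and the FEXTRA flag bit (0x04).
    if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    xlen = int.from_bytes(header[10:12], "little")
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields: 2-byte identifier, 2-byte length, payload.
    while pos + 4 <= len(extra):
        ident = extra[pos:pos + 2]
        slen = int.from_bytes(extra[pos + 2:pos + 4], "little")
        if ident == b"BC" and slen == 2:
            return True
        pos += 4 + slen
    return False


# The 28-byte empty BGZF block used as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"  # gzip header with FEXTRA set, XLEN = 6
    "4243" "0200" "1b00"                 # 'BC' subfield, length 2, block size
    "0300" "00000000" "00000000"         # empty deflate block, CRC32, ISIZE
)

print(looks_like_bgzf(BGZF_EOF))               # BGZF block
print(looks_like_bgzf(gzip.compress(b"ACGT")))  # plain gzip, no extra field
```

To test a file, read its first few dozen bytes in binary mode and pass them to the function.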

A versatile pairwise aligner for genomic and spliced nucleotide sequences

meta fastq fasta index

meta sam versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

fasta

gmidx versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Tools for population-scale genotyping using pangenome graphs.

meta bam bai ref ref_fai region_file

meta versions vcf tbi

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

Tools for population-scale genotyping using pangenome graphs.

meta vcf

meta versions vcf tbi

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

meta inputs fasta fasta_fai bwa_index

meta versions vcf

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

meta meta2 meta3 meta4 vcf bedpe bed fasta fasta_fai bwa_index

meta versions bedpe bed

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

meta meta2 vcf pondir

meta versions high_conf_sv all_sv

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

run the Broad Gene Set Enrichment tool in GSEA mode

meta gct cls gene_sets reference target chip

meta rpt index_html heat_map_corr_plot report_tsvs_ref report_htmls_ref report_tsvs_target report_htmls_target ranked_gene_list gene_set_sizes butterfly_plot histogram heatmap pvalues_vs_nes_plot ranked_list_corr gene_set_tsv gene_set_html gene_set_heatmap snapshot gene_set_enplot gene_set_dist archive versions

gsea:

Gene Set Enrichment Analysis (GSEA)

Collapse redundant transcript models in Iso-Seq data.

meta bam fasta

meta versions bed bed_trans_reads local_density_error polya read strand_check trans_report varcov variants

tama_collapse.py:

Collapse similar gene model

Merge multiple transcriptomes while maintaining source information.

meta bed

meta bed gene_report merge trans_report versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)

meta fasta

meta versions fasta report tails

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files

meta gff3

meta gt_gff3 error_log versions

gt:

The GenomeTools genome analysis system

GenomeTools gt-gff3validator utility to strictly validate a GFF3 file

meta gff3

meta success_log error_log versions

gt:

The GenomeTools genome analysis system

Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility

meta index

meta tabout gff3 fasta inner_fasta versions

gt:

The GenomeTools genome analysis system

GenomeTools gt-stat utility to show statistics about features contained in GFF3 files

meta gff3

meta stats versions

gt:

The GenomeTools genome analysis system

Computes enhanced suffix array using GenomeTools gt-suffixerator utility

meta fasta mode

meta index versions

gt:

The GenomeTools genome analysis system

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

meta bins database mash_db

meta versions summary tree markers msa user_msa filtered log warnings failed

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

Sort GTF files in chr/pos/feature order

gtf

versions gtf
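
A chr/pos/feature ordering can be expressed as a three-part sort key. The sketch below is an assumption about what such an ordering could look like, not the module's implementation: `FEATURE_RANK` is a hypothetical ranking, and chromosome names are compared as plain strings (so `chr10` sorts before `chr2`).

```python
# Hypothetical feature ranking; the real tool may order features differently.
FEATURE_RANK = {"gene": 0, "transcript": 1, "exon": 2, "CDS": 3}


def sort_gtf_lines(lines):
    """Keep comment lines first, then sort records by (seqname, start, feature rank)."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        fields = line.split("\t")
        # GTF columns: seqname, source, feature, start, end, ...
        rank = FEATURE_RANK.get(fields[2], len(FEATURE_RANK))
        return (fields[0], int(fields[3]), rank)

    return header + sorted(records, key=key)


example = [
    "chr2\tsrc\tgene\t5\t50\t.\t+\t.\tgene_id \"g2\";",
    "chr1\tsrc\texon\t10\t20\t.\t+\t.\tgene_id \"g1\";",
    "chr1\tsrc\tgene\t10\t90\t.\t+\t.\tgene_id \"g1\";",
]
for line in sort_gtf_lines(example):
    print(line)
```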

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

alignment

versions fasta embl_predicted gff embl_branch vcf stats phylip tree tree_labelled

Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes

db_name

versions db

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Merging of CheckM and GUNC results in one summary table

meta gunc_file checkm_file

meta versions tsv

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Detection of Chimerism and Contamination in Prokaryotic Genomes

meta fasta db

meta versions maxcss_levels_tsv all_levels_tsv

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Compresses and decompresses files.

meta archive

gunzip versions

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

meta gvcf

meta versions vcf

gvcftools:

gvcftools is a package of small utilities for creating and analyzing gVCF files
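
The idea of dropping non-variant blocks can be sketched as a filter on the ALT column. This is a simplified illustration, not the gvcftools algorithm, which may apply different criteria; the function name is hypothetical, and a record is treated as non-variant when ALT contains only `.`, `<NON_REF>`, or `<*>`.

```python
# Placeholder ALT values that mark a reference-only (non-variant) gVCF record.
NON_VARIANT_ALTS = {".", "<NON_REF>", "<*>"}


def drop_non_variant_records(vcf_lines):
    """Keep header lines and records that carry at least one real ALT allele."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        alt = line.split("\t")[4]  # VCF columns: CHROM POS ID REF ALT ...
        if any(allele not in NON_VARIANT_ALTS for allele in alt.split(",")):
            kept.append(line)
    return kept


gvcf = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200",
    "chr1\t201\t.\tG\tT,<NON_REF>\t50\t.\t.",
]
for line in drop_non_variant_records(gvcf):
    print(line)
```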

Tool to convert and summarize ABRicate outputs using the hAMRonization specification

meta report format software_version reference_db_version

meta versions json tsv

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.

meta report format software_version reference_db_version

meta versions json tsv

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize DeepARG outputs using the hAMRonization specification

meta report format software_version reference_db_version

meta versions json tsv

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize fARGene outputs using the hAMRonization specification

meta report format software_version reference_db_version

meta versions json tsv

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize RGI outputs using the hAMRonization specification.

meta report format software_version reference_db_version

meta versions json tsv

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to summarize and combine all hAMRonization reports into a single file

reports format

versions json tsv html

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

meta vcf map exclude

meta versions hbd ibd log

Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial reference versions.

meta vcf

meta versions txt html

classification into haplogroups

meta inputfile format

meta versions txt

haplogrep2:

A tool for mtDNA haplogroup classification.

Somatic VCF Feature Extraction tool from hap.py.

meta meta2 meta3 vcf regions_bed targets_bed bam fasta fasta_fai

meta features versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

meta query_vcf truth_vcf regions_bed targets_bed fasta fasta_fai false_positives_bed stratification_tsv stratification_beds

meta summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv json runinfo vcf tbi versions

happy:

Haplotype VCF comparison tools

Pre.py is a preprocessing tool made to preprocess VCF files for Hap.py

meta meta2 meta3 vcf bed fasta fasta_fai

meta vcf versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variations.

meta query_vcf truth_vcf regions_bed targets_bed fasta fasta_fai false_positives_bed stratification_tsv bams

meta features metrics stats versions

sompy:

Haplotype VCF comparison tools somatic variant comparison

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

meta fasta database_dir model_fp

meta versions gbk svg tsv

Computes PCA eigenvectors for a Hi-C matrix.

meta matrix

meta versions results pca1 pca2

hicexplorer:

Set of programs to process, analyze and visualize Hi-C and capture Hi-C data

Whole-genome assembly using PacBio HiFi reads

meta reads paternal_kmer_dump maternal_kmer_dump use_parental_kmers hic_read1 hic_read2

meta versions raw_unitigs processed_unitigs primary_contigs alternate_contigs paternal_contigs maternal_contigs corrected_reads source_overlaps reverse_overlaps log

Align RNA-Seq reads to a reference with HISAT2

meta reads meta2 index meta3 splicesites

meta bam summary versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Builds HISAT2 index for reference genome

meta fasta meta2 gtf meta3 splicesites

meta index versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Extracts splice sites from a GTF file

meta gtf

meta versions splicesites

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Pre-compute the graph index structure.

graph

versions folder

hlala:

HLA typing from short and long reads

Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.

meta bam graph

meta versions folder

hlala:

HLA typing from short and long reads

gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference

meta fasta

meta versions wig

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
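
Computing GC content in non-overlapping windows amounts to slicing the sequence into fixed-size chunks and counting G and C in each. The sketch below illustrates that idea only; it is not the HMMcopy code, the function name is hypothetical, and it makes the simplifying assumptions that ambiguous bases count as non-GC and that a shorter final window is still reported.

```python
def gc_fraction_windows(sequence, window):
    """Return the GC fraction of each non-overlapping window of the sequence."""
    fractions = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        # The last chunk may be shorter than `window`; divide by its real length.
        fractions.append(gc / len(chunk))
    return fractions


print(gc_fraction_windows("GGCCAATTGCAT", 4))
```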

Perl script (generateMap.pl) that generates the mappability of a genome given a certain read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised.

meta fasta

meta versions bigwig

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file

meta bigwig

meta versions wig

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

readCounter function from HMMcopy utilities, used to generate read counts in windows

meta bam

meta versions wig

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Mask multiple sequence alignments

meta unmaskedaln fmask_rf fmask_all gmask_rf gmask_all pmask_rf pmask_all maskfile

meta versions maskedaln fmask_rf fmask_all gmask_rf gmask_all pmask_rf pmask_all

hmmer:

Biosequence analysis using profile hidden Markov models

Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that it comes last. See the tool's help for possible values.

meta seqfile

meta versions seqreformated

hmmer:

Biosequence analysis using profile hidden Markov models

hmmalign from the HMMER suite aligns a number of sequences to an HMM profile

meta fasta hmm

meta versions sthlm

hmmer:

Biosequence analysis using profile hidden Markov models

create an hmm profile from a multiple sequence alignment

meta alignment mxfile

versions hmm

hmmer:

Biosequence analysis using profile hidden Markov models

extract hmm from hmm database file or create index for hmm database

meta hmm key keyfile

meta versions hmm index

hmmer:

Biosequence analysis using profile hidden Markov models

search profile(s) against a sequence database

meta hmmfile seqdb write_align write_target write_domain

meta versions output alignments target_summary domain_summary

hmmer:

Biosequence analysis using profile hidden Markov models

Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline

meta vcf

meta versions vcf

hmtnote:

Human mitochondrial variants annotation using HmtVar.

Annotate peaks with HOMER suite

meta peaks fasta gtf

meta annotated_peaks annotation_stats versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Find peaks with HOMER suite

meta tagDir

meta peaks versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Create a tag directory with the HOMER suite

meta bam fasta

meta tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Create a UCSC bed graph with the HOMER suite

meta tagDir

meta bedGraph versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Converting from HOMER peak to BED file formats

meta tagDir

meta bed versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Serotype prediction of Haemophilus parasuis assemblies

meta fasta

meta versions tsv

count how many reads map to each feature

meta meta2 input index gtf

meta txt

htseq/count:

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

This tool takes a background VCF, such as gnomAD, that has full-genome coverage (though in some cases users will instead want whole-exome coverage) and uses that as an expectation of variants.

meta vcf tbi meta2 background_vcf background_tbi

meta versions tsv

htsnimtools:

useful command-line tools written to show-case hts-nim

HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.

meta reads meta2 umis

meta dedup annotated stats log versions

Assembly polisher using short (and long) reads

meta reads meta2 sr_bam draft genome_size reads_coverage

meta fasta versions

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals

wigs gc_wig map_wig centromere

versions rds txt

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA

meta wig gc_wig map_wig panel_of_normals centromere

meta versions cna_seg ichorcna_params genome_plot

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

Plot a metagene of cross-link events/sites around various transcriptomic landmarks.

meta bed segmentation

meta tsv versions

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount peaks on a BED file of crosslinks

meta bed sigxls

meta peaks versions

icount:

Computational pipeline for analysis of iCLIP data

Formats a GTF file for use with iCount sigxls

meta gtf fai

gtf versions

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount sigxls on a BED file of crosslinks

meta bed segmentation

meta peaks scores versions

icount:

Computational pipeline for analysis of iCLIP data

Report proportion of cross-link events/sites on each region type.

meta bed segmentation

meta summary_type summary_subtype summary_gene versions

icount:

Computational pipeline for analysis of iCLIP data

Demultiplex paired-end FASTQ files from QuantSeq-Pool

meta reads samplesheet

versions fastq undetermined stats

Measures reproducibility of ChIP-seq, ATAC-seq peaks using IDR (Irreproducible Discovery Rate)

peaks peak_type prefix

versions idr log png

igv.js is an embeddable interactive genome visualization component

meta alignment index

meta browser align_files index_files versions

igv:

Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser HTML file. To visualise it, the files have to be served. Check the documentation at https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements.

A Python application to generate self-contained HTML reports for variant review and other genomic applications

meta sites tracks tracks_indices meta2 fasta fai

versions report

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

meta h5 meta2 ilp meta3 probs

meta versions out_tiff

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

meta input_img meta2 ilp

meta versions output

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Strain-level comparisons across multiple inStrain profiles

meta meta2 profiles bams stb_file

meta versions compare

instrain:

Calculation of strain-level metrics

inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons; analysis of coverage, microdiversity, and linkage; and sensitive SNP detection with gene localization and synonymous/non-synonymous identification.

meta bam genome_fasta genes_fasta stb_file

meta versions profile

instrain:

Calculation of strain-level metrics

Produces protein annotations and predictions from an amino acids FASTA file

meta fasta interproscan_database

tsv xml gff3 json versions

Download, extract, and check md5 of iPHoP databases

NO input

iphop_db versions

iphop:

Predict host genus from genomes of uncultivated phages.

Predict phage host using iPHoP

meta fasta iphop_db

meta versions iphop_genus iphop_genome iphop_detailed_output

iphop:

Predict host genus from genomes of uncultivated phages.

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.

meta alignment tree tree_te lmclust mdef partitions_equal partitions_proportional partitions_unlinked guide_tree sitefreq_in constraint_tree trees_z suptree trees_rf

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

Genomic island prediction in bacterial and archaeal genomes

meta genome

gff log versions

Identify insertion sites positions in bacterial genomes

meta reads reference query

meta versions results

IsoSeq - Cluster - Cluster trimmed consensus sequences

meta bam

meta versions bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

meta bam primers

meta bam pbi consensusreadset summary report versions

isoseq:

IsoSeq - Scalable De Novo Isoform Discovery

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

meta bam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

meta bam primers

meta bam pbi consensusreadset summary report versions

isoseq3:

IsoSeq3 - Scalable De Novo Isoform Discovery

Extract UMI and cell barcodes

meta bam design

meta versions bam pbi

isoseq3:

Iso-Seq - Scalable De Novo Isoform Discovery

Generate a consensus sequence from a BAM file using iVar

meta bam fasta save_mpileup

meta fasta qual mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

meta bam bai bed

meta bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Call variants from a BAM file using iVar

meta bam fasta fai gff save_mpileup

meta tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

meta vcfs bams sample_dists meta2 fasta meta3 fasta_fai chr_norm

meta versions vcf

Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.

meta notebook parameters input_files

meta report versions

jupytext:

Jupyter notebooks as plain text scripts or markdown documents

papermill:

Parameterize, execute, and analyze notebooks

nbconvert:

Convert notebooks to other formats

Taxonomic classification of metagenomic sequence data using a protein reference database

meta reads db

meta versions results

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.

meta tsv

meta versions txt

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Summarise Kaiju classification results into a table at a given taxonomic rank

meta results taxon_rank

meta versions results

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Merge two tab-separated output files of Kaiju and Kraken in the column format

meta kaiju kraken db

meta merged versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Make Kaiju FMI-index file from a protein FASTA file

meta fasta

meta versions fmi

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Aligns sequences using kalign

meta fasta compress

meta alignment versions

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

Create kallisto index

meta fasta

meta index versions

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

meta reads index gtf chromosomes fragment_length fragment_length_sd

meta versions log abundance abundance_hdf5 run_info

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

quantifies scRNA-seq data from fastq files using kb-python.

meta reads index t2g t1c t2c workflow_mode technology

meta count versions

kb:

kallisto and bustools are wrapped in an easy-to-use program called kb

index creation for kb count quantification of single-cell data.

fasta gtf workflow_mode

versions kb_ref_idx t2g cdna intron cdna_t2c intron_t2c

kb:

kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.

Creates a histogram of the number of distinct k-mers having a given frequency.

meta reads

meta versions hist json png ps pdf jellyfish_hash

kat:

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts
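
The histogram described above (how many distinct k-mers occur once, twice, and so on) can be sketched with two counting passes. This is an illustration of the concept rather than KAT's implementation: the function name is hypothetical, and unlike typical k-mer counters the sketch does not merge a k-mer with its reverse complement.

```python
from collections import Counter


def kmer_spectrum(sequences, k):
    """Map each frequency to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Second pass: count how many distinct k-mers share each frequency.
    return Counter(counts.values())


spectrum = kmer_spectrum(["AAAAC", "ACGT"], 2)
for frequency in sorted(spectrum):
    print(frequency, spectrum[frequency])
```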

Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.

pe_reads se_reads name

versions reads

khmer:

khmer k-mer counting library

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

fasta kmer_size

report kmers versions

khmer:

khmer k-mer counting library

Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).

meta fastas

meta versions txt

Generate k-mers (sketches) from FASTA/Q sequences

meta sequences

meta versions outdir info

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

meta compute_dir

meta versions kmcp log

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

meta search_out

meta versions result

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

meta search_results db

meta versions profile

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

meta reads db

meta result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Produces annotation using kofamscan against a Profile database and a KO list

meta fasta profiles ko_list

txt tsv versions

Adds fasta files to a Kraken2 taxonomic database

meta fasta taxonomy_names taxonomy_nodes accession2taxid

meta db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

meta db cleaning

meta db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Classifies metagenomic sequence data

meta reads db save_output_fastqs save_reads_assignment

meta classified_reads_fastq unclassified_reads_fastq classified_reads_assignment report versions

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Takes multiple kraken-style reports and combines them into a single report file

meta kreports

meta versions txt

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Extract reads classified at any user-specified taxonomy IDs.

meta taxid classified_reads_assignment classified_reads_fastq report

meta extracted_kraken2_reads versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Takes a Kraken report file and prints out a krona-compatible TEXT file

meta kreport

meta versions krona

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Download and build (custom) KrakenUniq databases

meta custom_library_dir custom_taxonomy_dir

meta versions db

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Download KrakenUniq databases and related files

meta pattern

meta versions output

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Classifies metagenomic sequence data using unique k-mer counts

meta sequences sequence_type db ram_chunk_size save_output_reads report_file save_output

meta classified_reads unclassified_reads classified_assignment report versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

KronaTools Update Taxonomy downloads a taxonomy database

NO input

versions db

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.

meta database report

versions html

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Creates a Krona chart from text files listing quantities and lineages.

meta report

meta versions html

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Update Taxonomy downloads a taxonomy database

NO input

versions db

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Makes a dotplot (Oxford Grid) of pair-wise sequence alignments

meta maf format annot_a annot_b

meta png gif versions

last:

LAST finds & aligns related regions of sequences.

Aligns query sequences to target sequences indexed with lastdb

meta fastx param_file index

meta versions maf multiqc

last:

LAST finds & aligns related regions of sequences.

Prepare sequences for subsequent alignment with lastal.

meta fastx

meta versions index

last:

LAST finds & aligns related regions of sequences.

Converts MAF alignments to another format.

meta maf format

meta versions axt_gz blast_gz blasttab_gz chain_gz gff_gz html_gz psl_gz sam_gz tab_gz

last:

LAST finds & aligns related regions of sequences.

Reorder alignments in a MAF file

meta maf

meta versions maf

last:

LAST finds & aligns related regions of sequences.

Post-alignment masking

meta maf

meta versions maf

last:

LAST finds & aligns related regions of sequences.

Find split or spliced alignments in a MAF file

meta maf

meta versions maf multiqc

last:

LAST finds & aligns related regions of sequences.

Find suitable score parameters for sequence alignment

meta fastx index

meta versions param_file multiqc

last:

LAST finds & aligns related regions of sequences.

Align sequences using learnMSA

meta fasta compress

meta alignment versions

learnmsa:

learnMSA: Learning and Aligning large Protein Families

Bayesian reconstruction of ancient DNA fragments

meta reads

meta versions bam fq_pass fq_fail unmerged_r1_fq_pass unmerged_r1_fq_fail unmerged_r2_fq_pass unmerged_r2_fq_fail log

Typing of clinical and environmental isolates of Legionella pneumophila

meta seqs

meta versions tsv

Index chain files for lift over

meta fai chain

meta clft versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

Converts aligned short- and long-read records from one reference to another

meta input meta_ref clft

meta bam versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

Uses Liftoff to accurately map annotations in GFF or GTF between assemblies of the same or closely related species

meta target_fa ref_fa ref_annotation ref_db

meta gff3 polished_gff3 unmapped_txt versions

lima - The PacBio Barcode Demultiplexer and Primer Remover

meta ccs primers

meta bam pbi xml json clips counts guess report summary fasta fastagz fastq fastqgz versions

Runs a differential expression analysis with limma

meta contrast_variable reference target meta2 samplesheeet intensities

results md_plot rdata model session_info versions

limma:

Linear Models for Microarray Data

Serogrouping Listeria monocytogenes assemblies

meta fasta

meta versions tsv

Lofreq subcommand to insert base and indel alignment qualities

meta bam fasta

meta versions bam

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments

meta bam intervals fasta

meta versions vcf

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

It predicts variants using multiple processors

meta bam bai intervals meta2 fasta meta3 fai

meta versions vcf

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors

Lofreq subcommand to remove variants with low coverage or strand bias potential

meta vcf

meta versions vcf

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Inserts indel qualities in a BAM file

meta bam meta2 fasta

meta versions bam

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

meta meta2 meta3 tumor tumor_index normal normal_index fasta fai target_bed

meta versions vcf

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

meta bam meta2 fasta

meta versions bam

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

meta bam bai snps svs mods meta2 fasta meta3 fai

meta versions vcf

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

meta fasta

meta scn gff versions

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain

meta fasta

meta versions gff3 scn

LTR_HARVEST_parallel:

A Perl wrapper for LTR_harvest

gt:

The GenomeTools genome analysis system

Identifies LTR retrotransposons using LTR_retriever

meta genome harvest finder mgescan non_tgca

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments

meta fasta pass_list annotation_out monoploid_seqs

meta log lai_out versions

lai:

Assessing genome assembly quality using the LTR Assembly Index (LAI)

Identifies LTR retrotransposons using LTR_retriever

meta genome harvest finder mgescan non_tgca

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.

meta fasta

meta versions amp_prediction smorfs all_orfs readme_file log_file

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

meta ipbam controlbam macs2_gsize

versions peak xls gapped bed bdg

macs2:

Model Based Analysis for ChIP-Seq data

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

meta ipbam controlbam macs3_gsize

meta versions peak xls gapped bed bdg

macs3:

Model Based Analysis for ChIP-Seq data

Multiple sequence alignment using MAFFT

meta fasta meta2 add meta3 addfragments meta4 addfull meta5 addprofile meta6 addlong compress

meta versions fas

pigz:

Parallel implementation of the gzip algorithm.

MAGeCK count for functional genomics; reads are usually mapped to a specific sgRNA

meta library inputfile

meta versions norm count

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Maximum-likelihood estimation of gene essentialities

meta count_table design_matrix

meta versions gene_summary sgrna_summary

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

MAGeCK test performs a robust rank aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.

meta count_table

meta versions gene_summary sgrna_summary r_script

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Multiple Sequence Alignment using Graph Clustering

meta meta2 fasta tree compress

meta versions alignment

magus:

Multiple Sequence Alignment using Graph Clustering

Multiple Sequence Alignment using Graph Clustering

meta fasta

meta versions tree

magus:

Multiple Sequence Alignment using Graph Clustering

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

fastas gff mapping_db

versions index log

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

meta fastqs index

versions rma6 alignments log

malt:

A tool for mapping metagenomic data

Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening

meta rma6 taxon_list ncbi_dir

versions results

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions, the format used in Manta versions <= 1.4.0.

meta vcf meta2 fasta

meta versions vcf tbi

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

meta input index target_bed target_bed_tbi meta2 fasta meta3 fai config

meta candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

meta input_normal input_index_normal input_tumor input_index_tumor target_bed target_bed_tbi meta2 fasta meta3 fai config

meta candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

meta input input_index target_bed target_bed_tbi meta2 fasta meta3 fai config

meta candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Create mapAD index for reference genome

meta fasta

meta versions index

mapad:

An aDNA aware short-read mapper

Map short-reads to an indexed reference genome

meta reads meta2 index mismatch_parameter double_stranded_library five_prime_overhang three_prime_overhang deam_rate_double_stranded deam_rate_single_stranded indel_rate

meta versions bam

mapad:

An aDNA aware short-read mapper

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

meta bam fasta

meta versions runtime_log fragmisincorporation_plot length_plot misincorporation pctot_freq pgtoa_freq dnacomp lgdistribution stats_out_mcmc_hist stats_out_mcmc_iter stats_out_mcmc_trace stats_out_mcmc_iter_summ_stat stats_out_mcmc_post_pred stats_out_mcmc_correct_prob dnacomp_genome rescaled fasta folder

Calculate Mash distances between reference and query sequences

meta reference query

meta versions dist

mash:

Fast sequence distance estimator that uses MinHash
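
To make the idea of a Mash distance concrete, here is a minimal Python sketch of a MinHash bottom-s sketch and the distance formula D = -(1/k) ln(2j / (1 + j)) derived from a Jaccard estimate j. It is an illustration only, not the module's or the tool's code: the function names are hypothetical, MD5 is used as a stand-in hash, and reverse-complement (canonical) k-mers are ignored.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (MD5 is a stand-in hash)."""
    return {
        int.from_bytes(hashlib.md5(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=5, size=50):
    """Bottom-s sketch: the `size` smallest k-mer hashes, in sorted order."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(a, b, size=50):
    """Estimate Jaccard similarity from the smallest hashes of the merged sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Distance D = -(1/k) * ln(2j / (1 + j)); defined as 1.0 when nothing is shared."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Identical sequences give a Jaccard estimate of 1 and a distance of 0; sequences sharing no k-mers give a distance of 1.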

Screens query sequences against large sequence databases

meta query sequence_sketch

meta versions screen

mash:

Fast sequence distance estimator that uses MinHash

Creates vastly reduced representations of sequences using MinHash

meta reads

meta mash stats versions

mash:

Fast sequence distance estimator that uses MinHash

Quickly create a tree using Mash distances

meta seqs

meta versions tree matrix

MaxBin is software for clustering metagenomic contigs

meta contigs reads abund

meta versions binned_fastas summary log marker unbinned_fasta tooshort_fasta marker_genes

Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER"

meta raw fasta parfile

meta versions maxquant_txt

maxquant:

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.

Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.

meta image meta2 mask meta3 markerfile

meta versions csv

Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation

meta fasta

meta versions tsv fa

Create MD5 (128-bit) checksums

meta files

meta versions checksum
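
As a small illustration of what an MD5 checksum is, the following Python sketch computes the 128-bit digest (32 hexadecimal characters) of some bytes or of a file using the standard library. The function names are hypothetical and this is not the module's own script, which is not shown on this page.

```python
import hashlib


def md5_checksum(data):
    """Return the hexadecimal MD5 digest (128 bits = 32 hex characters) of bytes."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is suitable for detecting accidental corruption of files, not for security purposes.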

A tool to create consensus sequences and variant calls from nanopore sequencing data

meta reads assembly

meta versions assembly

An ultra-fast metagenomic assembler for large and complex metagenomics

meta reads

meta versions contigs k_contigs addi_contigs local_contigs kfinal_contigs

Analyses a DAA file and exports information in text format

meta daa megan_summary

meta versions txt_gz megan

megan:

A tool for studying the taxonomic content of a set of DNA reads

Analyses an RMA file and exports information in text format

meta rma6 megan_summary

meta versions txt megan_summary

megan:

A tool for studying the taxonomic content of a set of DNA reads

Serotyping of Neisseria meningitidis assemblies

meta fasta

meta versions tsv

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

meta fasta_assembly meta1 meryl_db_reads lookup_table seqmers peak

meta versions hist log_stderr

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

k-mer based assembly evaluation.

meta meryl_db assembly

meta versions assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_cn_hist spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_asm_hist assembly_qv scaffold_qv read_ploidy

A script to generate hap-mer dbs for trios

meta child_meryl maternal_meryl paternal_meryl

meta versions mat_hapmer_meryl pat_hapmer_meryl inherited_hapmers_fl_png inherited_hapmers_ln_png inherited_hapmers_st_png

merqury:

Evaluate genome assemblies with k-mers and more.

k-mer based assembly evaluation.

meta meryl_db assembly

meta versions assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_cn_hist spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_asm_hist assembly_qv scaffold_qv read_ploidy hapmers_blob_png

merqury:

Evaluate genome assemblies with k-mers and more.

A reimplementation of Kat Comp to work with FastK databases

meta fastk1_hist fastk1_ktab fastk2_hist fastk2_ktab

meta versions filled_png line_png stacked_png filled_pdf line_pdf stacked_pdf

merquryfk:

FastK based version of Merqury

A reimplementation of KatGC to work with FastK databases

meta fastk_hist fastk_ktab

meta versions filled_gc_plot_png filled_gc_plot_pdf line_gc_plot_png line_gc_plot_pdf stacked_gc_plot_png stacked_gc_plot_pdf

merquryfk:

FastK based version of Merqury

FastK based version of Merqury

meta fastk_hist fastk_ktab assembly haplotigs

meta versions stats bed spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_cn_fl_pdf spectra_cn_ln_pdf spectra_cn_st_pdf spectra_asm_fl_pdf spectra_asm_ln_pdf spectra_asm_st_pdf assembly_qv qv

merquryfk:

FastK based version of Merqury

An improved version of Smudgeplot using FastK

meta fastk_hist fastk_ktab

meta versions filled_ploidy_plot_png filled_ploidy_plot_pdf line_ploidy_plot_png line_ploidy_plot_pdf stacked_ploidy_plot_png stacked_ploidy_plot_pdf

merquryfk:

FastK based version of Merqury

A genomic k-mer counter (and sequence utility) with nice features.

meta reads kvalue

meta versions meryl_db

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

meta meryl_db kvalue

meta versions hist

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

meta meryl_dbs kvalue

meta versions meryl_db

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

Depth computation per contig step of metabat2

meta bam bai

meta versions depth

metabat2:

Metagenome binning

Metagenome binning of contigs

meta fasta depth

meta versions fasta tooshort lowdepth unbinned membership

metabat2:

Metagenome binning

Annotation of eukaryotic metagenomes using MetaEuk

meta fasta database

meta versions faa codon tsv gff

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Strain-level metagenomic assignment

meta classification_res meta_file meta_unmappedreadsLengths para_file database_folder

meta versions wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Maps long reads to a metamaps database

meta reads database

meta versions classification_res meta_file meta_unmappedreadsLengths para_file

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Build MetaPhlAn database for taxonomic profiling.

NO input

db versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn4

meta profiles

meta versions txt

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

meta input metaphlan_db

meta versions profile biom bowtie2out

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn3

meta profiles

meta versions txt

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

meta input metaphlan_db

meta versions profile biom bowtie2out

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Extracts per-base methylation metrics from alignments

meta fasta fai bam bai

meta bedgraph methylkit versions

methyldackel:

Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.

Generates methylation bias plots from alignments

meta fasta fai bam bai

meta txt versions

methyldackel:

Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.

A tool to estimate bacterial species abundance

meta reads db mode

meta versions results

midas:

An integrated pipeline for estimating strain-level genomic variation from metagenomic data

Marks duplicate spots along gridline edges.

meta spot_table

meta versions marked_dups_spots

mindagap:

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

meta panorama

meta versions tiff

mindagap:

Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.

Minia is a short-read assembler based on a de Bruijn graph

meta reads

meta contigs unitigs h5 versions

A very fast OLC-based de novo assembler for noisy long reads

meta reads paf

meta versions gfa assembly

A versatile pairwise aligner for genomic and spliced nucleotide sequences

meta reads meta2 reference bam_format bam_index_extension cigar_paf_format cigar_bam

meta paf bam index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Provides fasta index required by minimap2 alignment.

meta fasta

meta index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

meta pep meta2 ref

meta paf gff versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

Provides fasta index required by miniprot alignment.

meta fasta

meta index versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

miRanda is an algorithm for finding genomic targets for microRNAs

meta query mirbase

meta txt versions

Download a mitochondrial genome to be used as reference for MitoHiFi

species

versions fasta gb

findMitoReference.py:

Fetch mitochondrial genome in Fasta and Genbank format from NCBI

A Python workflow that assembles mitogenomes from PacBio HiFi reads

input ref_fa ref_gb input_mode code

versions fasta gb gff all_potential_contigs contigs_annotations contigs_circularization contigs_filtering coverage_mapping coverage_plot final_mitogenome_annotation final_mitogenome_choice final_mitogenome_coverage potential_contigs reads_mapping_and_assembly shared_genes

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

Run Torsten Seemann's classic MLST on a genome assembly

meta fasta

meta versions tsv

Cluster sequences using MMSeqs2 cluster.

meta input_db

meta db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create an MMseqs database from an existing FASTA/Q file

meta sequence

meta db versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Creates sequence index for mmseqs database

meta db

versions db_indexed

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create a tsv file from a query and a target database as well as the result database

meta db_result meta2 db_query meta3 db_target

meta tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Download an mmseqs-formatted database

database

versions database

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

meta fasta meta2 db_target

meta tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences in linear time using MMSeqs2 linclust.

meta input_db

meta db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Search and calculate a score for similar sequences in a query and a target database.

meta query_db meta target_db

meta versions db_search

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Computes the lowest common ancestor by identifying the query sequence homologs against the target database.

meta db_query db_target

meta db_taxonomy versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Converts expandable profile databases to the MMseqs2 database format

db

versions db_exprofile

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

A tool to reconstruct plasmids in bacterial assemblies

meta fasta

meta versions chromosome contig_report plasmids mobtyper_results

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

A bioinformatics tool for working with modified bases

meta bam bai meta2 fasta meta3 bed

meta versions bed bedgraph log

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data

Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.

meta image

meta versions img_clahe

molkartgarage:

One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.

Calculates genome-wide sequencing coverage.

meta bam bai bed meta2 fasta

meta global_txt regions_txt summary_txt per_base_bed per_base_csi per_base_d4 regions_bed regions_csi quantized_bed quantized_csi thresholds_bed thresholds_csi versions

Download the mOTUs database

motus_downloaddb

versions db

motus:

The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.

Taxonomic meta-omics profiling using universal marker genes

input db profile_version_yml

versions txt biom

motus:

Marker gene-based OTU (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

meta reads db

meta versions out bam mgc log

motus:

Marker gene-based OTU (mOTU) profiling

Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data

meta normal_bam normal_bai tumor_bam tumor_bai homopolymers

meta versions txt txt txt txt

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

Scan a reference genome to get microsatellite & homopolymer information

meta fasta

meta versions txt

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

msi

msisensor2 detection of MSI regions.

meta tumor_bam normal_bam intervals models

meta msi distribution somatic versions

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

msi

msisensor2 detection of MSI regions.

fasta output

versions output

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next-generation sequencing data. It accepts whole-genome sequencing, whole-exome sequencing and target-region (panel) sequencing data as input

meta normal normal_index tumor tumor_index intervals fasta msisensor_scan

meta output_report output_dis output_germline output_somatic versions list

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next-generation sequencing data. It accepts whole-genome sequencing, whole-exome sequencing and target-region (panel) sequencing data as input

meta fasta

meta versions list

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

Aligns protein structures using mTM-align

meta pdbs compress

meta alignment structure versions

mTM-align:

Algorithm for structural multiple sequence alignments

pigz:

Parallel implementation of the gzip algorithm.

A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.

meta bam mt_id

meta versions mtnucratio json

Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.

meta bam gtf index

meta versions bam rad

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.

gtf

versions index

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Aggregate results from bioinformatics analyses across many samples into a single report

multiqc_files multiqc_config extra_multiqc_config multiqc_logo replace_names sample_names

report data plots versions

multiqc:

MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

vcfs fasta snpeff_results gff allele_freqs genotype_quality coverage homozygous_freq heterozygous_freq gff_exclude

versions bam full_alignment info_txt snp_alignment snp_genome_alignment snpstatistics snptable snptable_snpeff snptable_uncertainty structure_genotypes structure_genotypes_nomissing json

MUMmer is a system for rapidly aligning entire genomes

meta ref query

meta versions coords

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

fasta

aligned_fasta msf clustalw phyi phys html tree log versions

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.

meta fasta compress

meta versions alignment

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

Fetch the GO concepts for a list of genes

meta gene_list

meta gmt tsv versions

AMR predictions for supported species

meta seqs species

meta versions csv json

mykrobe:

Antibiotic resistance prediction in minutes

Compare multiple runs of long read sequencing data and alignments

meta filelist

versions meta report_html lengths_violin_html log_length_violin_html n50_html number_of_reads_html overlay_histogram_html overlay_histogram_normalized_html overlay_log_histogram_html overlay_log_histogram_normalized_html total_throughput_html quals_violin_html overlay_histogram_identity_html overlay_histogram_phredscore_html percent_identity_violin_html active_pores_over_time_html cumulative_yield_plot_gigabases_html sequencing_speed_over_time_html stats_txt

Filtering and trimming of Oxford Nanopore Sequencing data

meta reads summary_file

meta versions filtreads log_file

DNA contaminant removal using NanoLyse

meta fastq fasta

meta fastq log versions

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

meta bam bai

meta insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

meta fastq summary_txt

meta html png txt log versions

Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.

meta ontreads output_format

meta stats reads versions

Performs fastq alignment to a reference using NARFMAP

meta reads meta2 hashmap

bam versions

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

meta fasta

meta hashmap versions

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

A tool to quickly download assemblies from NCBI's Assembly database

meta accessions taxids groups

meta versions gbk fna rm features gff faa gpff wgs_gbk cds rna rna_fna report stats

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

meta fasta_file adapters_database_file

meta versions vecscreen_output

ncbitools:

"NCBI libraries for biology applications (text-based utilities)"

Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

dataset reference tag

versions prefix

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

meta dataset fasta

meta versions csv json json_tree tsv

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

Performs fastq alignment to a fasta reference using NextGenMap

meta reads fasta

meta bam versions

nextgenmap:

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime

Serotyping Neisseria gonorrhoeae assemblies

meta fasta

meta versions tsv

Merging paired-end reads and removing sequencing adapters.

meta reads

meta versions merged_reads unstitched_read1 unstitched_read2

Determines the gender of a sample from the BAM/CRAM file.

meta bam bai meta2 fasta meta3 fasta method

meta versions tsv

ngsbits:

Short-read sequencing tools

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.

meta reads meta2 snp_pt

meta vaf versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.

meta files meta2 snp_bed meta3 fasta

versions pdf corr_matrix matched all vcf

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

meta bed meta2 fasta bowtie_index

meta versions pt

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

meta vafs

meta pdf corr_matrix matched all versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

write your description here

meta reads format mode

meta versions npa npc npl npo

Visualise metagenome redundancy curve from a single Nonpareil npo file

meta npo

meta versions png

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Calculate metagenome redundancy curve from FASTQ files

meta reads format mode

meta versions npa npc npl npo

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Visualise metagenome redundancy curves from multiple Nonpareil npo files in a single image

meta npos

meta versions png

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

meta ref query

meta versions delta coords

An nf-core module for OATK

meta reads mito_hmm mito_hmm_h3f mito_hmm_h3i mito_hmm_h3m mito_hmm_h3p pltd_hmm pltd_hmm_h3f pltd_hmm_h3i pltd_hmm_h3m pltd_hmm_h3p

meta versions mito_fasta pltd_fasta mito_bed pltd_bed mito_gfa pltd_gfa annot_mito_txt annot_pltd_txt clean_gfa final_gfa initial_gfa multiplex_gfa unzip_gfa

Construct a dynamic succinct variation graph in ODGI format from a GFAv1.

meta graph

meta versions og

odgi:

An optimized dynamic genome/graph implementation

Draw previously-determined 2D layouts of the graph with diverse annotations.

meta graph lay

meta versions png

odgi:

An optimized dynamic genome/graph implementation

Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.

meta graph

meta versions lay tsv

odgi:

An optimized dynamic genome/graph implementation

Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.

meta graph

meta versions sorted_graph

odgi:

An optimized dynamic genome/graph implementation

Squeezes multiple graphs in ODGI format into the same file in ODGI format.

meta graphs

meta graph versions

odgi:

An optimized dynamic genome/graph implementation

Metrics describing a variation graph and its path relationship.

meta graph

meta versions tsv yaml

odgi:

An optimized dynamic genome/graph implementation

Merge unitigs into a single node preserving the node order.

meta graph

meta versions unchopped_graph

odgi:

An optimized dynamic genome/graph implementation

Project a graph into other formats.

meta graph

meta versions gfa

odgi:

An optimized dynamic genome/graph implementation

Visualize a variation graph in 1D.

meta graph

meta versions png

odgi:

An optimized dynamic genome/graph implementation

Calls CNVs in bam files from tumor patients

meta normal normal_index tumor tumor_index bed fasta

png profile summary versions

Create a decoy peptide database from a standard FASTA database.

meta fasta

meta versions fasta

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Filters peptide/protein identification results by different criteria.

meta id_file filter_file

meta versions filtered

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Merges several idXML files into one idXML file.

meta idxmls

meta versions idxml

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Split a merged identification file into its originating identification files

meta merged_idxml

meta versions idxmls

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Switches between different scores of peptide or protein hits in identification data

meta idxml

meta versions idxml

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)

meta mzml

meta versions mzml

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Refreshes the protein references for all peptide hits.

meta id_file id_fasta

meta versions id_file_pi

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Perform HLA-I typing of sequencing data

meta bam bai

meta versions hla_type coverage_plot

OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.

meta fastas

meta versions orthofinder

A program to convert bam into paf.

meta bam

meta paf versions

paftools:

A program to manipulate paf files / convert to and from paf.

A tool for indexing and querying a block-compressed text file containing pairs of genomic coordinates

meta pair

meta versions index

Find and remove PCR/optical duplicates

meta input

meta versions pairs stat

pairtools:

CLI tools to process mapped Hi-C data

Flip pairs to get an upper-triangular matrix

meta sam chromsizes

meta versions flip

pairtools:

CLI tools to process mapped Hi-C data

Merge multiple pairs/pairsam files

meta allpairs

meta versions pairs

pairtools:

CLI tools to process mapped Hi-C data

Find ligation junctions in .sam, make .pairs

meta bam chromsizes

meta versions pairsam stat

pairtools:

CLI tools to process mapped Hi-C data

Assign restriction fragments to pairs

meta pairs frag

meta versions restrict

pairtools:

CLI tools to process mapped Hi-C data

Select pairs according to given condition by options.args

meta input

meta versions selected unselected

pairtools:

CLI tools to process mapped Hi-C data

Sort a .pairs/.pairsam file

meta input

meta versions sorted

pairtools:

CLI tools to process mapped Hi-C data

Split a .pairsam file into .pairs and .sam.

meta pairs

meta versions pairs bam

pairtools:

CLI tools to process mapped Hi-C data

Calculate pairs statistics

meta pairs

meta versions stat

pairtools:

CLI tools to process mapped Hi-C data

Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file

meta gfa bed_subset bed_exclude tsv_groupby

meta versions tsv html

panacus:

panacus is a tool for computing counting statistics for GFA files

Create visualizations from a tsv coverage histogram created with panacus.

meta tsv

meta versions image

panacus:

panacus is a tool for computing counting statistics for GFA files

A fast and scalable tool for bacterial pangenome analysis

meta gff

meta versions results aln

panaroo:

panaroo - an updated pipeline for pangenome investigation

Phylogenetic Assignment of Named Global Outbreak LINeages

meta fasta

report versions

pangolin:

Phylogenetic Assignment of Named Global Outbreak LINeages

NVIDIA Clara Parabricks GPU-accelerated application of Base Quality Score Recalibration (BQSR).

meta input input_index bqsr_table interval_file fasta

meta versions bam bai

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated annotation of variant calls based on the dbSNP database

meta vcf_file dbsnp_file tabix_file

meta versions ann_vcf

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating DeepVariant.

meta ref_meta input input_index interval_file fasta

meta vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.

meta reads interval_file meta2 fasta meta3 index known_sites

meta versions bam bai qc_metrics bqsr_table duplicate_metrics

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs

meta ref_meta input fasta

meta versions vcf

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK HaplotypeCaller.

meta ref_meta input input_index interval_file fasta

meta versions vcf

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.

meta gvcf

meta gvcf_index versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

meta tumor_bam tumor_bam_index normal_bam normal_bam_index interval_file ref_meta fasta panel_of_normals panel_of_normals_index

meta vcf stats versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Paraclu finds clusters in data attached to sequences.

meta bed min_cluster

meta versions bed

Determines the depth in a BAM/CRAM file

meta meta2 meta3 input input_index fasta fasta_fai

meta versions depth binned_depth

paragraph:

Graph realignment tools for structural variants

Genotype structural variants using paragraph and grmpy

meta variants variants_index reads reads_index manifest meta2 fasta meta3 fasta_fai

meta versions vcf json

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

meta vcf

meta versions graph

paragraph:

Graph realignment tools for structural variants

HiFi-based caller for highly homologous genes

meta bam bai meta2 fasta meta3 config

meta versions bam bai json vcf

Serogroup Pseudomonas aeruginosa assemblies

meta fasta

meta versions tsv blast details

The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.

meta bam

meta versions bam pbi

pbbam:

PacBio BAM C++ library

PacBio ccs - Generate Highly Accurate Single-Molecule Consensus Reads

meta bam pbi chunk_num chunk_on

meta versions bam pbi report_txt report_json metrics

Assign PBP type of Streptococcus pneumoniae assemblies

meta fasta db

meta versions tsv blast

Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq

meta bam pbi

meta versions fastq

pbtk:

pbtk - PacBio BAM toolkit

Minimalistic tool which creates an index file that enables random access into PacBio BAM files

meta bam

meta versions pbi

pbtk:

pbtk - PacBio BAM toolkit

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.

meta reads

meta versions assembled unassembled discarded

Manipulation, validation and exploration of pedigrees

meta vcf ped vcf_tbi

meta ped html csv png versions

Runs PEKA CLIP peak k-mer analysis

meta peaks crosslinks fasta fai gtf

cluster distribution pdf versions

"This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays."

meta bam

meta versions spp pdf rdata

Install databases necessary for Pharokka's functional analysis

NO input

pharokka_db versions

pharokka:

Fast Phage Annotation Program

Functional annotation of phages

meta phage_fasta pharokka_db

meta log cds_functions card vfdb mash reoriented versions

pharokka:

Fast Phage Annotation Program

Predict prophages in bacterial genomes

meta gbk

meta coordinates gbk log information bacteria_fasta bacteria_gbk phage_fasta phage_gbk prophage_gff prophage_tbl prophage_tsv versions

phispy:

Prophage finder using multiple metrics

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.

meta reads sliva_db univec_db

meta results versions

Assigns all the reads in a file to a single new read-group

meta meta2 meta3 reads fasta fasta_index

meta versions bam bai cram

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates an interval list from a bed file and a reference dict

meta bed meta2 dict arguments_file

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

meta sam

meta versions sam

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

meta bam bai bait_intervals target_intervals meta2 fasta meta3 fai meta4 dict

meta versions metrics

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the insert size distribution of a paired-end library.

meta bam

meta versions pdf metrics

picard:

Java tools for working with NGS data in the BAM format

Collect multiple metrics from a BAM file

meta bam bai meta2 fasta meta3 fai

meta metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics from a RNAseq BAM file

meta bam ref_flat gene_pred fasta rrna_intervals

meta metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

meta bam bai meta2 fasta meta3 fai intervallist

meta metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates a sequence dictionary for a reference sequence.

meta fasta

meta versions reference_dict

picard:

Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

Checks that all data in the set of input files appear to come from the same individual

meta input1 input1_index input2 input2_index haplotype_map meta2 fasta fasta_index

meta crosscheck_metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.

meta reference haplotype_map

fingerprint versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Converts a FASTQ file to an unaligned BAM or SAM file.

meta reads

meta versions bam

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

meta bam filter readlist

meta bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Verify mate-pair information between mates and fix if needed

meta bam

meta versions bam

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Lifts over a VCF file from one reference build to another.

meta input_vcf meta2 fasta meta3 dict meta4 chain

meta versions vcf_lifted vcf_unlifted

picard:

Move annotations from one assembly to another

Locate and tag duplicate reads in a BAM file

meta reads meta2 fasta meta3 fai

meta bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Merges multiple BAM files into a single file

meta bam

meta bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the name of the sample in the VCF file

meta vcf

meta versions vcf

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases

meta fasta meta2 fai meta3 dict

meta versions intervals

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts BAM/SAM files based on a variety of picard specific criteria

meta bam sort_order

meta versions bam

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts VCF files

meta vcf meta2 fasta meta3 dict

meta versions vcf

picard:

Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format

Compresses files with pigz.

meta raw_file

meta archive versions

pigz:

Parallel implementation of the gzip algorithm.

Decompresses files with pigz.

meta zip

meta file versions

pigz:

Parallel implementation of the gzip algorithm.

Automatically improve draft assemblies and find variation among strains, including large event detection

meta fasta meta2 bam bai pilon_mode

meta versions improved_assembly change_record vcf tracks_bed tracks_wig

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

meta bam bai bed fasta fai

meta versions bp cem del dd int_final inv li rp si td

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

Main caller script for peak calling

meta bams

meta versions divergent_TREs bidirectional_TREs unidirectional_TREs peakcalling_log

pints:

Peak Identifier for Nascent Transcripts Starts (PINTS)

Pangenome toolbox for bacterial genomes

meta gff

meta versions results aln

Identify plasmids in bacterial sequences and assemblies

meta seqs

meta versions json txt tsv genome_seq plasmid_seq

Assembles bacterial plasmids

meta scaffold fasta

meta html tab images logs data database fasta_files kmer versions

Platypus is a tool that efficiently and accurately calls genetic variants from next-generation DNA sequencing data

meta tumor_file tummor_file_bai control_file control_file_bai fasta fai skipregions_file

meta vcf tbi log version

Analyses binary variant call format (BCF) files using plink

meta bcf

meta versions bed bim fam

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

meta meta2 meta3 meta4 bed bim fam bcf vcf phe

meta versions epi episummary log nosex

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Exclude variant identifiers from plink bfiles

meta bed bim fam variants

meta versions bed bim fam

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Subset plink bfiles with a text file of variant identifiers

meta bed bim fam variants

meta versions bed bim fam

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

meta meta2 meta3 meta4 bed bim fam bcf vcf phe

meta versions fepi fepisummary flog fnosex

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Generate GWAS association studies

meta meta2 meta3 meta4 bed bim fam bcf vcf phe

meta assoc log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Generate Hardy-Weinberg statistics for provided input

meta meta2 meta3 bed bim fam vcf bcf

meta versions hwe

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.

meta bed bim fam window_size variant_count variance_inflation_factor

meta versions prunein pruneout

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.

meta bed bim fam window_size variant_count r2_threshold

meta versions prunein pruneout

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
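
The greedy pruning described in the entry above can be sketched roughly as follows. This is an illustrative simplification, not PLINK's implementation: the function names (`pearson_r2`, `greedy_prune`) and the toy genotype data are hypothetical, the tie-breaking rule (always drop the later variant of a correlated pair) is an assumption, and the parameters only loosely mirror the module's `window_size`, `variant_count` and `r2_threshold` inputs.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no correlation signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def greedy_prune(genotypes, window_size, step, r2_threshold):
    """Slide a window over the variants and drop one variant of each pair
    whose r^2 exceeds the threshold (assumption: the later one is dropped).

    genotypes: one list of allele dosages (0/1/2) per variant.
    Returns (kept_indices, pruned_indices).
    """
    removed = set()
    n = len(genotypes)
    start = 0
    while start < n:
        window = [i for i in range(start, min(start + window_size, n))
                  if i not in removed]
        for pos, i in enumerate(window):
            if i in removed:
                continue
            for j in window[pos + 1:]:
                if j in removed:
                    continue
                if pearson_r2(genotypes[i], genotypes[j]) > r2_threshold:
                    removed.add(j)
        start += step
    kept = [i for i in range(n) if i not in removed]
    return kept, sorted(removed)


# Toy data: variants 0 and 1 are perfectly correlated, variant 2 is not.
toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 0]]
kept, pruned = greedy_prune(toy, window_size=3, step=1, r2_threshold=0.5)
print(kept, pruned)
```

In this sketch the kept and pruned index lists play the role of the module's `prunein` and `pruneout` outputs.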

LD analysis in PLINK examines genetic variant associations within populations

meta meta2 meta3 meta4 bed bim fam vcf bcf snpfile

meta versions ld log nosex

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Recodes plink bfiles into a new text fileset applying different modifiers

meta bed bim fam

meta versions ped map txt raw traw beagle-dat chr-dat chr-map geno pheno pos phase info lgen list gen genz sample rlist strctin tped tfam vcf vcfgz

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Analyses variant calling files using plink

meta vcf

meta versions bed bim fam

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Subset plink pfiles with a text file of variant identifiers

meta pgen psam pvar variants

meta versions extract_pgen extract_psam extract_pvar

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Apply a scoring system to each sample in a plink 2 fileset

meta pgen psam pvar scorefile

meta versions score

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Import variant genetic data using plink2

meta vcf

meta versions pgen psam pvar

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

pmdtools command to filter ancient DNA molecules from others

meta bam bai threshold reference

meta versions bam

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

meta reads

meta versions xml txt

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.

meta plp_prefix bam donor_genotype

meta versions demuxlet_result

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to pileup reads and corresponding base quality for each overlapping SNP and each barcode.

meta bam vcf

meta versions cel plp var umi

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.

meta plp n_sample

meta versions result vcf lmix singlet_result singlet_vcf

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

meta reads

meta versions reads log

Adapter removal and demultiplexing of Oxford Nanopore reads

meta reads

meta versions reads log

porechop:

Adapter removal and demultiplexing of Oxford Nanopore reads

Software for predicting library complexity and genome coverage in high-throughput sequencing

meta bam

meta versions ccurve log

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Software for predicting library complexity and genome coverage in high-throughput sequencing

meta bam

meta versions lc_extrap log

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Calculate pairwise nucleotide identity with respect to a reference sequence

meta fasta meta2 reference compress

versions valid_fasta valid_fasta report log

Filter reads by quality score.

meta reads

meta versions reads logs log_tab

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

Converts SAM/BAM/CRAM/pairs into a genome contact map

metainputinputinputinput

meta versions pretext

A module to generate images from Pretext contact maps.

meta pretext_map

meta versions image

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

meta reads

meta versions good_reads single_reads bad_reads log

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

meta genome output_format

meta versions nucleotide_fasta amino_acid_fasta all_gene_annotations gene_annotations

Whole genome annotation of small genomes (bacterial, archaeal, viral)

meta fasta proteins prodigal_tf

meta versions gff gbk fna faa ffn sqn fsa tbl err log txt tsv

Perform Gene Ratio Enrichment Analysis

meta meta2 adj gmt

meta enrichedGO versions

grea:

Gene Ratio Enrichment Analysis

Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)

meta count

meta logratio session_info versions

propr:

Logratio methods for omics data
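
As a rough illustration of the two transformations named in the entry above, the following sketch computes CLR and ALR for a single sample. It is not propr's code (propr is an R package; Python is used here only for illustration), and the function names and example counts are hypothetical. It assumes strictly positive values, so zeros would need to be replaced or offset beforehand.

```python
import math


def clr(composition):
    """Centered logratio: log of each part minus the mean log of all parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(composition, ref_index):
    """Additive logratio: log of each part relative to one reference part.

    The reference part itself is dropped, so the result has one fewer entry.
    """
    ref_log = math.log(composition[ref_index])
    return [math.log(x) - ref_log
            for i, x in enumerate(composition) if i != ref_index]


sample = [10.0, 20.0, 40.0]  # hypothetical counts for three features
print(clr(sample))
print(alr(sample, ref_index=2))
```

Both transformations depend only on ratios between parts, so rescaling a sample (for example by sequencing depth) leaves the result unchanged.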

Perform differential proportionality analysis

meta count meta2 samplesheet

meta propd results fdr adj warnings session_info versions

propr:

Logratio methods for omics data

Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.

meta count

meta propr matrix fdr adj session_info versions

propr:

Logratio methods for omics data

corpcor:

Efficient Estimation of Covariance and (Partial) Correlation

Proteinortho is a tool to detect orthologous genes within different species.

meta fasta_files

versions orthologgroups orthologgraph blastgraph

Reads a MaxQuant proteinGroups file with Proteus

meta samplesheet intensities meta2 contrast_variable

dendro_plot mean_var_plot raw_dist_plot norm_dist_plot raw_rdata norm_rdata raw_tab norm_tab session_info versions

proteus:

R package for analysing proteomics data

PureCLIP is a tool to detect protein-RNA interaction footprints from single-nucleotide CLIP-seq data, such as iCLIP and eCLIP.

meta meta2 ipbam controlbam ipbai controlbai input_control genome_fasta

meta versions crosslinks peaks

Calculate interval coverage for each sample. N.B. the tool cannot handle files staged as symlinks; stageInMode should be set to 'link'.

meta bam bai intervals

meta txt png loess_qc_txt loess_txt versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Generate on and off-target intervals for PureCN from a list of targets

meta target_bed meta2 fasta genome

meta txt bed versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.

meta coverage_files normal_vcf genome assay

rds png bias_rds bias_bed low_cov_bed versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Run PureCN workflow to normalize, segment and determine purity and ploidy

meta intervals coverage normaldb genome

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample.seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Calculate coverage cutoffs to determine when to purge duplicated sequence.

meta stat

meta versions cutoff log

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Separates out sequences purged of falsely duplicated sequences.

meta assembly bed

meta versions haplotigs purged

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Plots the read coverage from a purge dups statistics file and cutoffs.

meta statfile cutoff

meta versions png

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Create a read depth histogram and base-level read depth for an assembly based on PacBio data

meta paf_alignment

meta versions stat basecov

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Purge haplotigs and overlaps for an assembly

meta basecov cutoff paf

meta versions bed log

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Split fasta file by 'N's to aid in self alignment for duplicate purging

meta assembly

meta versions split_fasta

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
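
The splitting step described in this entry can be illustrated with a minimal Python sketch. It is not the purge_dups implementation: the function name `split_on_n_runs` and the `name:start-end` naming of the pieces are assumptions made for this example only.

```python
import re


def split_on_n_runs(name, sequence):
    """Split one FASTA record on runs of N/n.

    Returns a list of (piece_name, subsequence) tuples. Piece names use
    1-based inclusive coordinates, which is a hypothetical convention.
    """
    pieces = []
    # Every maximal stretch of non-N characters becomes one piece.
    for match in re.finditer(r"[^Nn]+", sequence):
        start, end = match.start(), match.end()
        pieces.append((f"{name}:{start + 1}-{end}", match.group()))
    return pieces


# Example call on a toy scaffold with two gaps.
example_pieces = split_on_n_runs("scaffold_1", "ACGTNNNNGGCCNNA")
```

Each returned piece could then be written out as its own FASTA record before self-alignment.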

write your description here

meta summary

meta html json versions

Damage parameter estimation for ancient DNA

meta bam bai

meta versions csv

pydamage:

Damage parameter estimation for ancient DNA

Damage parameter estimation for ancient DNA

meta csv

meta versions csv

pydamage:

Damage parameter estimation for ancient DNA

Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.

meta fasta output_format

meta versions annotations faa fna score

Demultiplexer for Nanopore samples

meta reads

meta reads versions

Evaluate alignment data

meta bam gff

meta results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

metabacrammgfffasta

meta results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

meta bam meta2 gtf

meta results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Render a Quarto notebook, including parametrization.

meta notebook parameters input_files extensions

meta html notebook artifacts params_yaml extensions versions

papermill:

Parameterize, execute, and analyze notebooks

Quality Assessment Tool for Genome Assemblies

consensus fasta gff

quast report misassemblies transcriptome unaligned versions

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

meta bams bais reference_haplotype_file reference_legend_file chr regions_start regions_end buffer ngen genetic_map_file meta2 posfile phasefile meta3 fasta

meta versions vcf tbi rdata plots

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

Consensus module for raw de novo DNA assembly of long uncorrected reads

meta reads assembly paf

meta versions improved_assembly

Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.

alignment

versions phylogeny stockholm_alignment

Randomly subsample sequencing reads to a specified coverage

meta reads genome_size depth_cutoff

meta versions reads
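
As a rough illustration of subsampling to a target coverage: the base budget is the genome size multiplied by the desired coverage, and reads are drawn in random order until that budget is met. The sketch below is a toy under these assumptions, not the tool's own implementation; `subsample_to_coverage` and its `seed` parameter are hypothetical names.

```python
import random


def subsample_to_coverage(reads, genome_size, coverage, seed=0):
    """Randomly keep reads until roughly genome_size * coverage bases are kept.

    `reads` is a list of read sequences (strings). If the input holds fewer
    bases than the budget, every read is returned.
    """
    target_bases = genome_size * coverage
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    order = list(range(len(reads)))
    rng.shuffle(order)

    kept, total_bases = [], 0
    for index in order:
        if total_bases >= target_bases:
            break
        kept.append(reads[index])
        total_bases += len(reads[index])
    return kept


# Twenty 10-base reads, a 10-base "genome", 5x target: five reads are kept.
toy_reads = ["ACGTACGTAC"] * 20
toy_subset = subsample_to_coverage(toy_reads, genome_size=10, coverage=5)
```

Because the last read kept may overshoot the budget, the achieved coverage can end up slightly above the requested value.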

De novo genome assembler for long uncorrected reads.

meta reads

meta versions fasta gfa

RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion.

alignment

versions phylogeny phylogeny_bootstrapped

Create a database for RepeatModeler

meta fasta

meta db versions

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

Performs de novo transposable element (TE) family identification with RepeatModeler

meta db

meta fasta stk log versions

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria

meta fastq fasta db_point db_res

meta json disinfinder_kma pheno_table_species pheno_table pointfinder_kma pointfinder_prediction pointfinder_results pointfinder_table resfinder_blast resfinder_hit_in_genome_seq resfinder_kma resfinder_resistance_gene_seq resfinder_results_table resfinder_results_tab resfinder_results versions

resfinder:

ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria

Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data

card

versions db tool_version db_version

rgi:

This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.

Predict antibiotic resistance from protein or nucleotide data

meta fasta card wildcard

meta versions json tsv tmp tool_version db_version

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Markup VCF file using rho-calls.

meta meta2 vcf tbi roh bed

meta vcf versions

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Call regions of homozygosity and make tentative UPD calls

meta vcf roh

meta versions bed wig

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Quality control of riboseq bam data

meta bam_ribo bai_ribo meta2 bam_ti bai_ti meta3 fasta gtf meta4 candidate_orfs meta5 para_ribo meta6 para_ribo

meta predictions all transprofile versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Quality control of riboseq bam data

meta bam bai meta2 gtf

meta txt pdf offset versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Accurate detection of short and long active ORFs using Ribo-seq data

meta bam_ribo bai_ribo meta2 candidate_orfs

meta protocol bam_summary read_length_dist metagene_profile_5p metagene_profile_3p metagene_plots psite_offsets pos_wig neg_wig orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Accurate detection of short and long active ORFs using Ribo-seq data

meta fasta gtf

meta candidate_orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data

meta meta2 meta3 bam gtf fasta

meta best_offset offset offset_plot psites codon_coverage_rpf codon_coverage_psite cds_coverage cds_window_coverage ribowaltz_qc versions

Render an rmarkdown notebook. Supports parametrization.

meta notebook parameters input_files

meta report session_info versions

rmarkdown:

Dynamic Documents for R

Calculate pan-genome from annotated bacterial assemblies in GFF3 format

meta gff

meta versions results aln

Ribosomal RNA extraction from a GTF file.

gtf

versions rrna_gtf

Calculate expression with RSEM

meta reads index

counts_gene counts_transctips stat logs versions bam_star bam_genome bam_transcript

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Prepare a reference genome for RSEM

fasta gtf

rsem transcript_fasta versions

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Generate statistics from a bam file

meta bam

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Infer strandedness from sequencing reads

meta bam bed

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate inner distance between read pairs.

meta bam bed

distance freq mean pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
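
To make the notion of inner distance concrete, the sketch below computes it for one read pair from genomic coordinates alone: the gap between the end of the upstream mate and the start of the downstream mate, negative when the mates overlap. This is a simplification and an assumption of the example. It ignores introns, whereas a transcript-aware tool would use the gene model supplied as `bed`. The function name `inner_distance` is hypothetical.

```python
def inner_distance(read1, read2):
    """Return the gap between two mates given as (start, end) tuples.

    Coordinates are 0-based, half-open intervals on the same reference.
    A negative value means the two mates overlap.
    """
    # Order the mates by start position so the result is symmetric.
    upstream, downstream = sorted([read1, read2])
    return downstream[0] - upstream[1]


# Mates 50 bases apart, and mates overlapping by 10 bases.
gap_separated = inner_distance((100, 150), (200, 250))   # 50
gap_overlapping = inner_distance((100, 150), (140, 190))  # -10
```

Collecting this value over many pairs gives the distribution that outputs such as `distance`, `freq` and `mean` summarize.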

Compare detected splice junctions to reference gene model

meta bam bed

bed interact_bed xls pdf events_pdf rscript log versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Compare detected splice junctions to reference gene model

meta bam bed

pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

meta bam bed

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate read duplication rate

meta bam bed

seq_xls pos_xls pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate TIN (transcript integrity number) from RNA-seq reads

meta bam bai bed

txt xls versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

meta input1 input2 sam_rg

meta versions sdf

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Converts a PED file to VCF headers

meta input

meta versions output

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Plot ROC curves from vcfeval ROC data files, either to an image or to an interactive GUI. The interactive GUI is not available when running under Nextflow.

meta input

meta versions png svg

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set

meta query_vcf query_vcf_index truth_vcf truth_vcf_index truth_bed evaluation_bed sdf

meta versions tp_vcf tp_tbi fn_vcf fn_tbi fp_vcf fp_tbi baseline_vcf baseline_tbi snp_roc non_snp_roc weighted_roc summary phasing

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Uses the RTN R package for transcriptional regulatory network inference (TNI).

expression_matrix

tni tni_perm tni_bootstrap tni_filtered versions

rtn:

RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons

Sage is search software for proteomics data

meta "*.mzML" meta2 fasta_proteome meta3 base_config

meta versions results_tsv results_json results_pin tmt_tsv lfq_tsv

sageproteomics:

Proteomics searching so fast it feels like magic.

Create index for salmon

genome_fasta transcriptome_fasta

index versions

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

Gene/transcript quantification with Salmon

meta reads index gtf transcript_fasta alignment_mode lib_type

results json_info versions

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

SALSA, a tool to scaffold long read assemblies with Hi-C

meta fasta index bed

meta fasta agp agp_original_coordinates versions

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

meta bam bai database

meta versions csv json bam

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files
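
The idea of assigning a multi-mapped read to the lowest common ancestor (LCA) of its hits can be sketched on a toy taxonomy. Everything below is illustrative: the `parent` dictionary, the taxon names and the helper functions are assumptions for this example and do not reflect the tool's data structures.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all given taxa.

    Assumes `taxa` is non-empty and that all taxa share a root.
    """
    taxa = list(taxa)
    # Start from the first lineage (leaf to root) and keep only shared nodes.
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        other = set(lineage(taxon, parent))
        common = [node for node in common if node in other]
    return common[0]


# Toy taxonomy: child -> parent.
toy_parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}
```

A read hitting both Escherichia species would be assigned to `Escherichia`, while a read hitting `Escherichia coli` and `Salmonella enterica` would be assigned to `Enterobacteriaceae`.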

Outputs some statistics drawn from read flags.

meta bam

meta versions stats

sambamba:

Tools for working with SAM/BAM data

Find and mark duplicate reads in a BAM file

meta bam

meta versions bam bai

sambamba:

process your BAM data faster!

This module combines samtools and samblaster in order to use samblaster's ability to filter or tag SAM files, with the advantage of keeping both input and output in BAM format. Samblaster input must contain a sequence header, so the input is piped through the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

meta bam

meta versions bam

Clips read alignments where they match BED file defined regions

meta bam bed save_cliprejects save_clipstats

meta versions bam stats rejects_bam

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format

meta inputbam split

meta versions reads

samtools:

Tools for dealing with SAM, BAM and CRAM files

Calculates MD and NM tags

meta bam fasta

meta versions bam

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
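
For orientation, the MD tag encodes runs of matching bases interleaved with the reference base at each mismatch, and NM is the edit distance to the reference. The sketch below derives both for the simplest case only, an ungapped alignment with no insertions, deletions or clipping, which is an assumption of this example. The function name `md_and_nm` is hypothetical and this is not how samtools implements the calculation.

```python
def md_and_nm(reference, read):
    """Compute (MD string, NM count) for an ungapped alignment.

    `reference` and `read` must be the aligned bases, equal in length.
    """
    if len(reference) != len(read):
        raise ValueError("ungapped alignment requires equal lengths")

    md_parts = []
    match_run = 0
    mismatches = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == read_base:
            match_run += 1
        else:
            # Emit the preceding match count, then the reference base.
            md_parts.append(str(match_run))
            md_parts.append(ref_base)
            match_run = 0
            mismatches += 1
    md_parts.append(str(match_run))  # MD always ends with a number
    return "".join(md_parts), mismatches


# One mismatch at the fourth base: MD is "3T4" and NM is 1.
toy_md, toy_nm = md_and_nm("ACGTACGT", "ACGAACGT")
```

With gapped alignments NM would also count inserted and deleted bases, and MD would mark deletions with a `^` prefix. Both cases are left out here.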

Concatenate BAM or CRAM file

meta input_files

meta bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Shuffles and groups reads together by their names

meta input

meta versions output

samtools:

Tools for dealing with SAM, BAM and CRAM files

The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format

meta input meta2 fasta interleave

meta fastq fastq_interleaved fastq_other fastq_singleton versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Convert and then index a CRAM -> BAM or BAM -> CRAM file

meta input index fasta

meta bam cram bai crai version

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces a histogram or table of coverage per chromosome

meta input input_index meta2 fasta fai

meta versions coverage

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

List CRAM Content-ID and Data-Series sizes

meta cram

meta size versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Computes the depth at each position or region.

meta1 bam meta2 intervals

meta1 versions tsv

samtools:

Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region
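
Per-position depth is simply the number of alignments covering each reference position. A minimal sketch, assuming alignments are given as 0-based, half-open `(start, end)` intervals on a single reference sequence (the function name `depth_per_position` and this input format are assumptions of the example, and real tools report 1-based positions and apply read filters):

```python
from collections import Counter


def depth_per_position(alignments):
    """Count how many alignments cover each position.

    `alignments` is an iterable of (start, end) tuples, 0-based, half-open.
    Returns a dict mapping covered positions to their depth.
    """
    depth = Counter()
    for start, end in alignments:
        for position in range(start, end):
            depth[position] += 1
    return dict(sorted(depth.items()))


# Two overlapping alignments: position 2 is covered twice.
toy_depth = depth_per_position([(0, 3), (2, 5)])
```

Restricting the result to a region then amounts to keeping only the positions that fall inside the requested interval.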

Create a sequence dictionary file from a FASTA file

meta fasta

meta dict versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index FASTA file

meta fasta meta2 fai

meta fa fai gzi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts a SAM/BAM/CRAM file to FASTA

meta input interleave

meta versions fasta interleaved singleton other

samtools:

Tools for dealing with SAM, BAM and CRAM files

Converts a SAM/BAM/CRAM file to FASTQ

meta input interleave

meta versions fastq interleaved singleton other

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

meta bam

meta versions bam

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type

meta bam bai

meta flagstat versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
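
Counting alignments per FLAG type relies on the bitwise FLAG field of each SAM record. The sketch below decodes the flag bits defined in the SAM specification and tallies them over a list of flag values. The names `SAM_FLAGS` and `count_flags` and the label strings are assumptions of this example, and the output is a plain counter rather than the report format the tool produces.

```python
from collections import Counter

# Flag bits as defined in the SAM specification; labels are illustrative.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def count_flags(flags):
    """Tally how many records have each flag bit set."""
    counts = Counter()
    for flag in flags:
        for bit, label in SAM_FLAGS.items():
            if flag & bit:
                counts[label] += 1
    return counts


# 99 and 147 are a typical properly paired read1/read2; 4 is an unmapped read.
toy_counts = count_flags([99, 147, 4])
```

In this toy input, `paired` and `proper_pair` are each counted twice and `unmapped` once.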

Filter/convert SAM/BAM/CRAM file

meta input

meta readgroup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Reports alignment summary statistics for a BAM/CRAM/SAM file

meta bam bai

meta idxstats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts FASTQ files to unmapped SAM/BAM/CRAM

meta reads

meta versions sam bam cram

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index SAM/BAM/CRAM file

meta bam

meta bai crai csi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Mark duplicate alignments in a coordinate-sorted file

meta input fasta meta2

meta versions output

samtools:

Tools for dealing with SAM, BAM and CRAM files

Merge BAM or CRAM file

meta input_files meta2 fasta meta3 fai

meta bam cram versions csi crai

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

BAM

meta input fasta intervals

meta mpileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.

meta bam

meta versions bam

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

meta input meta2 fasta

meta bam cram csi crai metrics versions

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Sort SAM/BAM/CRAM file

meta bam meta2 fasta

meta bam cram crai csi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces comprehensive statistics from SAM/BAM/CRAM file

meta input input_index meta2 fasta

meta stats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Filter/convert SAM/BAM/CRAM file

meta input index meta2 fasta qname

meta bam cram sam bai csi crai unselected unselected_index versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

SCIMAP is a suite of tools that enables spatial single-cell analyses

meta cellByFeature

meta versions annotedDataCsv annotedDataH5ad

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data.

SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.

meta phenotyped

meta versions spatial_lda_output composition_plot motif_location_plot

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.

Use pangenome outputs for GWAS

meta genes traits tree

meta versions csv

The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier

meta clusters fasta mei_ref

meta versions meis_tab dels_tab vcf

scramble:

Soft Clipped Read Alignment Mapper

The cluster_identifier tool of Scramble identifies soft clipped clusters

meta input input_index fasta

meta versions clusters

scramble:

Soft Clipped Read Alignment Mapper

Call peaks using SEACR on sequenced reads in bedgraph format

meta bedgraph ctrlbedgraph threshold

meta bed versions

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

meta reads fasta index

meta alignment trans_alignments single_bed multi_bed versions

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Generate genome indices for segemehl align

fasta

index versions

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Metagenomic binning with self-supervised learning

meta bam fasta

meta versions csv h5 output_prerecluster_bins output_recluster_bins tsv

semibin:

Metagenomic binning with semi-supervised siamese neural network

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

meta vcf vcf_tbi recal recal_index tranches meta2 fasta meta3 fai

meta vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Create BWA index for reference genome

meta fasta

meta index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

meta reads meta2 index meta3 fasta meta4 fasta_fai

meta bam bai versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects multiple quality metrics from a bam file

meta meta2 meta3 bam bai fasta fai

meta versions mq_metrics qd_metrics gc_summary gc_metrics aln_metrics is_metrics

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

meta bam bai meta2 fasta meta3 fasta_fai

meta cram crai bam bai score metrics metrics_multiqc_tsv versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Modifies the input VCF file by adding the MLrejected FILTER to the variants

meta meta2 meta3 meta4 vcf idx fasta fai ml_model

meta versions vcf index

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

meta bam bai intervals meta2 fasta meta3 fai meta4 dbsnp meta5 dbsnp_tbi meta6 ml_model ml_model pcr_indel_model emit_vcf emit_gvcf

meta vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

meta gvcfs tbis intervals fasta fai dbsnp dbsnp_tbi

meta vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

meta input input_index intervals fasta fai dbsnp dbsnp_tbi emit_vcf emit_gvcf

meta vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying the Base Quality Score Recalibration to a file.

meta meta2 meta3 input index fasta fai

meta output index output_index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

meta meta2 meta3 vcf vcf_tbi stats contamination segments orientation_priors fasta fai

vcf vcf_tbi stats versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

meta meta2 meta3 meta4 meta5 meta6 meta7 meta8 input input_index intervals dict fasta fai germline_resource germline_resource_tbi panel_of_normals panel_of_normals_tbi emit_orientation_data emit_contamination_data

meta orientation_data contamination_data contamination_segments vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

meta meta2 meta3 meta4 meta5 meta6 meta7 bam bai fasta fai cosmic cosmic_tbi pon pon_tbi dbsnp dbsnp_tbi interval

meta vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

meta vcf tbi resource_vcf resource_tbi labels fasta fai

recal idx tranches plots version

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects whole genome quality metrics from a bam file

meta meta2 meta3 meta4 bam bai fasta fai interval_list

meta versions wgs_metrics

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.

meta fastas

meta versions fasta

seqfu:

DNA sequence utilities for FASTX files
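The dereplication described for this module can be illustrated with a minimal Python sketch. It is not seqfu's implementation: the `;size=N` header suffix and the function name `dereplicate` are assumptions chosen for illustration, and the real tool's header format may differ.

```python
import re

# Hypothetical header annotation for the copy count (an assumption,
# not necessarily the format seqfu writes).
SIZE_RE = re.compile(r";size=(\d+)$")

def dereplicate(records):
    """Collapse identical sequences and sum their copy counts.

    records: iterable of (name, sequence) tuples.
    Returns one (name;size=N, sequence) tuple per distinct sequence,
    in order of first appearance.
    """
    counts = {}      # sequence -> total copies seen
    first_name = {}  # sequence -> name of the first record carrying it
    for name, seq in records:
        match = SIZE_RE.search(name)
        size = int(match.group(1)) if match else 1  # already dereplicated input
        key = seq.upper()
        if key not in counts:
            counts[key] = 0
            first_name[key] = SIZE_RE.sub("", name)
        counts[key] += size
    return [(f"{first_name[k]};size={counts[k]}", k) for k in counts]
```

Because existing counts are parsed and summed, running the function on its own output leaves the totals unchanged, which mirrors the "dereplicate already dereplicated files" behaviour described above.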

Statistics for FASTA or FASTQ files

meta files

meta versions stats multiqc

seqfu:

Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files

Concatenating multiple uncompressed sequence files together

meta input

meta fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTQ to FASTA format

meta fastq

meta versions fasta

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTA/Q to tabular format, and provide various information, like sequence length, GC content/GC skew.

meta fastx

meta text versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
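As a rough illustration of the per-sequence values mentioned for this module (length, GC content, GC skew), here is a small Python sketch. It uses the common definitions GC content = (G + C) / length and GC skew = (G - C) / (G + C); the function name is hypothetical, and seqkit's exact scaling and rounding of these columns may differ.

```python
def sequence_stats(seq):
    """Return (length, GC content in percent, GC skew) for one sequence."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    length = len(s)
    gc_content = 100.0 * (g + c) / length if length else 0.0
    # Skew is undefined when there are no G or C bases; report 0.0 in that case.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return length, gc_content, gc_skew
```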

Select sequences from a large file based on name/ID

meta sequence pattern

meta versions filter

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

match up paired-end reads from two fastq files

meta reads

meta versions reads unpaired_reads

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Use seqkit to find/replace strings within sequences and sequence headers

meta fastx

meta versions fastx

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

meta fastx

meta fastx log versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

meta fastx

meta fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Use seqkit to generate sliding windows of input fasta

meta fastx

meta fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
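The sliding-window idea behind this module can be sketched in a few lines of Python. This is only an illustration of the concept: the function name and the choice to drop a trailing partial window are assumptions, not a description of how seqkit handles its options.

```python
def sliding_windows(seq, window, step):
    """Return (start, end, subsequence) for each full window.

    Coordinates are 0-based and half-open; a trailing window shorter
    than `window` is dropped in this sketch.
    """
    return [
        (start, start + window, seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]
```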

Sorts sequences by id/name/sequence/length

meta fastx

meta fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Split single or paired-end fastq.gz files

meta reads

meta reads versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

simple statistics of FASTA/Q files

meta reads

meta versions stats

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert tabular format (first two/three columns) to FASTA/Q format.

meta text

meta fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Salmonella serotype prediction from reads and assemblies

meta seqs

meta versions log tsv txt

Generates a BED file containing genomic locations of lengths of N.

meta fasta

meta bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.
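To make the "BED file of N locations" output concrete, the following Python sketch reports every run of N bases in a sequence as a BED-style interval (0-based start, exclusive end). It is a simplified illustration with a hypothetical function name, not seqtk's own algorithm or option handling.

```python
import re

def n_runs_to_bed(name, seq):
    """Return (name, start, end) for each run of N/n bases in seq."""
    return [(name, m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]
```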

Interleave paired-end reads from FASTQ files

meta reads

meta versions reads

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.

Rename sequence names in FASTQ or FASTA files.

meta sequences

meta versions sequences

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.

Subsample reads from FASTQ files

meta reads sample_size

meta versions reads

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.
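One common way to subsample a fixed number of reads in a single pass is reservoir sampling. The Python sketch below illustrates that technique under stated assumptions: the function name and default seed are hypothetical, and it is not claimed to reproduce seqtk's output for a given seed.

```python
import random

def reservoir_sample(items, k, seed=100):
    """Pick k items uniformly at random from an iterable in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir
```

Using the same seed on both files of a paired-end sample selects the same record positions, which keeps mates in sync.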

Common transformation operations on FASTA or FASTQ files.

meta sequences

meta versions sequences

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

sequences filter_list

versions sequences

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Trim low quality bases from FastQ files

meta reads

meta versions reads

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

meta mpileup snpfile calling_method output_format

meta versions eigenstrat plink freqsum

sequencetools:

Tools for population genetics on sequencing data

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

meta normalbam tumourbam fasta wigfile

meta versions seqz

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

meta fasta

meta versions wig

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.

Induce a variation graph in GFA format from alignments in PAF format

meta paf fasta

meta gfa versions

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

meta reads

meta versions tsv txt

seroba:

SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.

Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.

meta depth

meta versions json tsv

Demultiplex bgzip'd fastq files

meta sample_sheet fastqs_dir

meta versions sample_fastq metrics most_frequent_unmatched per_project_metrics per_sample_metrics sample_barcode_hop_metrics

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.

meta input_list input_list_index

meta versions merged_variants

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.

meta input input_index pedigree region reference reference_index scaffold scaffold_index map

meta phased_variants versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires a CPU with AVX2 support.

meta input_plain input_plain_index input_region pedigree scaffold scaffold_index scaffold_region map

meta phased_variants versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Program to compute switch error rate and genotyping error rate given simulated or trio data.

meta estimate estimate_index region pedigree truth truth_index freq freq_index

meta versions errors

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)
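The switch error rate mentioned for this module can be illustrated with a simplified Python sketch. It assumes a single sample, sites given in genomic order as phased allele pairs, and the textbook definition (fraction of consecutive heterozygous site pairs whose relative phase disagrees with the truth). The function name is hypothetical, and SHAPEIT5's own program handles many details (trios, frequency bins, genotype errors) that are ignored here.

```python
def switch_error_rate(truth, estimate):
    """Fraction of adjacent heterozygous site pairs with a phase switch.

    truth, estimate: lists of (hap0_allele, hap1_allele) per site,
    in the same order.
    """
    orientation = []  # True if the estimated phase matches truth at a site
    for (t0, t1), (e0, e1) in zip(truth, estimate):
        if t0 == t1 or e0 == e1:
            continue  # homozygous sites carry no phase information
        if {t0, t1} != {e0, e1}:
            continue  # genotype mismatch: not a phasing error
        orientation.append(e0 == t0)
    if len(orientation) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)
```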

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

meta reads

meta versions assembly gfa results

Print SHA256 (256-bit) checksums.

meta file

meta versions checksum

md5sum:

Create an SHA256 (256-bit) checksum.
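For readers who want to check a checksum independently of the module, a SHA256 digest can be computed with Python's standard `hashlib`. This is a minimal sketch; the function name and chunk size are illustrative choices, and the module itself runs a command-line tool rather than this code.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```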

Determine Shigella serotype from Illumina or Oxford Nanopore reads

meta reads

meta versions tsv hits

Determine Shigella serotype from assemblies or Illumina paired-end reads

meta seqs

meta versions tsv

build and deploy Shiny apps for interactively mining differential abundance data

meta meta2 sample feature_meta assay_files contrasts differential_results

meta data app versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make plots for interpretation of differential abundance statistics

meta meta2 differential_results sample feature_meta assay_file

meta volcanos_png volcanos_html versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots

meta sample feature_meta assay_files

boxplots_png boxplots_html densities_png densities_html pca2d_png pca2d_html pca3d_png pca3d_html mad_png mad_dendro dendro versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

validate consistency of feature and sample annotations with matrices and contrasts

meta meta2 meta3 meta4 sample feature_meta assay_files contrasts

meta sample_meta feature_meta assays contrasts versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Assemble bacterial isolate genomes from Illumina paired-end reads

meta reads

meta versions contigs corrections log raw_contigs gfa

A windowed adaptive trimming tool for FASTQ files using quality

meta reads qual_type

meta single_trimmed paired1_trimmed paired2_trimmed log versions

Indexing of transcriptome for gene expression quantification using SimpleAF

meta genome_fasta meta2 genome_gtf meta3 transcript_fasta

meta index transcript_tsv salmon versions

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

meta reads meta2 index meta3 txp2gene chemistry meta4 whitelist

meta alevin_results versions

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

Serovar prediction of salmonella assemblies

meta fasta

meta versions tsv allele_json allele_fasta cgmlst_csv

Fast, efficient, lossless compression of FASTQ files.

meta fastq

meta versions sfq

tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.

bam bai meta

meta run_metrics smncopynumber versions

Linearize and simplify variation graph in GFA format using blocked partial order alignment

meta gfa

meta gfa versions maf

smoove simplifies and speeds up calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

meta input index exclude_beds fasta fai

meta versions vcf

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on an input snakefile. Expect many limitations.

meta inputs meta2 snakefile

meta outputs snakemake_dir versions

Performs fastq alignment to a fasta reference using SNAP

meta reads meta2 index

meta versions bam bai

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

Create a SNAP index for reference genome

meta2 fasta altcontigfile nonaltcontigfile altliftoverfile

index versions

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

structural-variant calling with sniffles

meta bam bai meta2 fasta

meta vcf snf versions

Core-SNP alignment from Snippy outputs

meta vcf aligned_fa reference

meta versions aln full_aln tab vcf txt

snippy:

Rapid bacterial SNP calling and core genome alignments

Rapid haploid variant calling

meta reads index

meta versions tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt

snippy:

Rapid bacterial SNP calling and core genome alignments

Pairwise SNP distance matrix from a FASTA sequence alignment

meta alignment

meta tsv versions
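A pairwise SNP distance matrix of the kind this module produces can be sketched in Python as follows. The sketch counts positions where two aligned sequences both carry A, C, G or T and differ; treating gaps and ambiguous bases as "ignored" is an assumption made here for illustration, and the function name is hypothetical.

```python
def snp_distance_matrix(alignment):
    """Return {name: {name: distance}} for an alignment.

    alignment: dict mapping sequence name to an aligned sequence
    (all sequences the same length).
    """
    valid = set("ACGT")

    def distance(a, b):
        # Count columns where both bases are unambiguous and different.
        return sum(
            1
            for x, y in zip(a.upper(), b.upper())
            if x != y and x in valid and y in valid
        )

    names = list(alignment)
    return {p: {q: distance(alignment[p], alignment[q]) for q in names} for p in names}
```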

Genetic variant annotation and functional effect prediction toolbox

meta vcf db

cache versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Genetic variant annotation and functional effect prediction toolbox

meta vcf db cache

vcf report summary_html genes_txt versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Annotate a VCF file with another VCF file

meta vcf vcf_tbi meta2 database dbs_tbi

meta versions vcf

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

meta vcf vcf_tbi meta2 database dbs_tbi

meta vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Splits/Joins VCF(s) file into chromosomes

meta vcf

meta versions out_vcfs

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Rapidly extracts SNPs from a multi-FASTA alignment.

alignment

versions fasta constant_sites constant_sites_string
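The idea of extracting SNP sites from a multi-FASTA alignment can be illustrated with a short Python sketch: keep only the columns in which at least two different unambiguous bases occur. This is a simplified assumption about what counts as a variable site (gaps and N are ignored when deciding), and the function name is hypothetical.

```python
def variable_sites(seqs):
    """Return (column indices, reduced sequences) for variable columns.

    seqs: list of aligned sequences of equal length.
    """
    columns = []
    for i, column in enumerate(zip(*seqs)):
        bases = {b.upper() for b in column} & set("ACGT")
        if len(bases) > 1:  # at least two distinct unambiguous bases
            columns.append(i)
    reduced = ["".join(s[i] for i in columns) for s in seqs]
    return columns, reduced
```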

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

meta query_somalier_files meta2 labelled_somalier_files labels_tsv

meta versions tsv html

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

meta input input_index meta2 fasta meta3 fai meta4 sites

meta versions extract

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

meta extract ped sample_groups

versions html pairs_tsv samples_tsv

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Local sequence alignment tool for filtering, mapping and clustering.

meta reads meta2 fastas meta3 index

meta reads log meta2 index versions

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Compare many FracMinHash signatures generated by sourmash sketch.

meta signatures file_list save_numpy_matrix save_csv

meta versions matrix csv labels

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

meta signature db save_unassigned save_matches_sig save_prefetch save_prefetch_csv

meta versions result matches unassigned prefetch prefetchcsv

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.

meta signatures

meta versions signature_index

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a signature (a group of FracMinHash sketches) of a sequence using sourmash

meta sequence

meta versions signatures

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.
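The FracMinHash idea referred to in these modules is to hash every k-mer and keep only the hashes that fall below a fixed fraction (1/scaled) of the hash space. The Python sketch below illustrates that idea under loud assumptions: it uses a truncated SHA256 as a stand-in hash and skips canonical (reverse-complement) k-mers, so its values are not comparable with real sourmash signatures, and the function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 64

def fracminhash(seq, k=21, scaled=1000):
    """Return the set of retained 64-bit k-mer hashes for one sequence."""
    threshold = HASH_SPACE // scaled
    kept = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        # Stand-in hash (assumption): first 8 bytes of SHA256.
        h = int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")
        if h < threshold:
            kept.add(h)
    return kept

def jaccard(a, b):
    """Jaccard similarity of two sketches (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because the threshold is fixed rather than the sketch size, sketches of different sequences built with the same `k` and `scaled` can be compared directly with set operations.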

Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.

meta gather_results taxonomy

meta result versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data

meta reads image cytaimage darkimage colorizedimage alignment slidefile reference probeset

meta outs versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.

gtf

gtf versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.

fasta gtf reference_name

reference versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Assembles a small genome (bacterial, fungal, viral)

meta illumina pacbio nanopore yml hmm

meta scaffolds contigs transcripts gene_clusters gfa log log versions

Computational method for finding spa types.

meta fasta repeats repeat_order

meta versions tsv

split one ubam into multiple, per line, fast

meta bam

meta versions bam

Spotiflow, accurate and efficient spot detection with stereographic flow.

meta image_2d

meta versions spots

Fast, efficient, lossless compression of FASTQ files.

meta fastq1 fastq2

meta versions spring

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Fast, efficient, lossless decompression of FASTQ files.

meta spring write_one_fastq_gz

meta versions fastq

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).

meta sra ncbi_settings certificate

meta versions reads

sratools:

SRA Toolkit and SDK from NCBI

Download sequencing data from the NCBI Sequence Read Archive (SRA).

meta id ncbi_settings certificate

meta sra versions

sratools:

SRA Toolkit and SDK from NCBI

Test for the presence of suitable NCBI settings or create them on the fly.

NO input

versions ncbi_settings

sratools:

SRA Toolkit and SDK from NCBI

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.

meta fasta db

meta versions txt txt txt bam pileup

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Serotype prediction of Streptococcus suis assemblies

meta fasta

meta versions tsv

Advanced sequence file format conversions

meta reads fasta fai gzi

meta versions reads gzi

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Predicts Staphylococcus aureus SCCmec type based on primers.

meta fasta

meta versions tsv

Align reads to a reference genome using STAR

meta reads meta2 index meta3 gtf star_ignore_sjdbgtf seq_platform seq_center

bam log_final log_out log_progress versions bam_sorted bam_transcript bam_unsorted fastq tab junction wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create index for STAR

meta fasta meta2 gtf

meta index versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.

meta genome_fasta

results_xlsx summary_tsv detailed_summary_tsv resfinder_tsv plasmidfinder_tsv mlst_tsv settings_txt pointfinder_tsv versions

staramr:

Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.

Create a counts matrix for single-cell data using STARSolo, handling cell barcodes and UMI information.

meta solotype meta2 index reads

meta log_final log_out log_progress summary versions

Serotype STEC samples from paired-end reads or assemblies

meta seqs

meta versions tsv

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

meta posfile input rdata chromosome_name K nGen meta2 collected_crams collected_crais cramlist meta3 fasta fasta_fai

meta input rdata plots vcf bgen versions

Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.

meta vcf meta2 variant_catalog

meta versions vcf

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

meta input input_index fasta fai target_bed target_bed_index

meta vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

meta input_normal input_index_normal input_tumor input_index_tumor manta_candidate_small_indels manta_candidate_small_indels_tbi fasta fai target_bed target_bed_index

meta vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Merges the annotation gtf file and the stringtie output gtf files

stringtie_gtf annotation_gtf

merged_gtf versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Transcript assembly and quantification for RNA-Seq

meta bam annotation_gtf

meta transcript_gtf coverage_gtf abudance ballgown versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Count reads that map to genomic features

meta bam annotation

meta counts summary versions

featurecounts:

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

SummarizedExperiment container

meta matrix_files meta2 rowdata meta3 coldata

meta rds log versions

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Converts a BEDPE file to a VCF file (beta version)

meta bedpe

meta versions vcf

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

meta vcf bed minsv maxsv minallelefreq minnumreads

meta versions vcf

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

meta vcfs max_distance_breakpoints min_supporting_callers account_for_type account_for_sv_strands estimate_distanced_by_sv_size min_sv_size

meta versions vcf

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

meta fasta meta2 fai meta3 parameters snp_mutation_frequency sim_reads

meta versions parameters vcf bed fasta insertions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

meta vcf minsv maxsv minnumreads

meta versions stats

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

meta meta2 meta3 meta4 meta5 tumorbam tummorbai normalbam normalbai bwa_index fasta fasta_fai dbsnp dbsnp_tbi regions

meta versions sv indel som_sv som_indel germ_sv germ_indel unfiltered_sv unfiltered_indel unfiltered_som_sv unfiltered_som_indel unfiltered_germ_sv unfiltered_germ_indel raw_calls discordants log

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

meta meta2 meta3 test test_tbi truth truth_tbi fasta bed

meta versions fns fps distances log report

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

The merge module merges structural variants within one or more vcf files.

meta priority vcfs

meta versions vcf

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

meta in_occs in_frqs vcf vcf_dbs bedpe_dbs

meta versions out_occs out_frqs vcf

svdb:

structural variant database software

Performs tests on BAF files

meta bed baf baf_index batch

meta versions metrics

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

meta vcf

meta versions counts

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert an RdTest-formatted bed to the standard VCF format.

meta bed samples fasta_fai

meta versions vcf tbi

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

args meta vcf fasta_fai

meta versions standardized_vcf

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

meta vcf tbi

meta versions bed

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

meta meta2 bam vcf fasta fai

meta versions json gt_vcf relevant_bam

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

meta meta2 bam bam_index vcf fasta

meta versions json gt_vcf

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

meta vcf tbi config

meta versions vcf

Compresses/decompresses files

meta input

meta output gzi versions

bgzip:

Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.

bgzip a sorted tab-delimited genome file and then create tabix index

meta tab

meta gz tbi csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

create tabix index from a sorted bgzip tab-delimited genome file

meta tab

meta tbi csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA

meta fast5

meta versions csv_gz

Convert taxon names to TaxIds

meta name names_txt taxdb

meta versions tsv

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

Standardise and merge two or more taxonomic profiles into a single table

meta profiles profiler format taxonomy samplesheet

meta versions merged_profiles

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

Standardise the output of a wide range of taxonomic profilers

meta profile profiler format taxonomy

meta standardised_profile versions

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

A tool to detect resistance and lineages of M. tuberculosis genomes

meta reads

meta versions bam csv json txt vcf

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Aligns sequences using T_COFFEE

meta fasta meta2 tree meta3 template accessory_informations compress

meta alignment lib versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Compares 2 alternative MSAs to evaluate them.

meta msa ref_msa

meta versions scores

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Computes the irmsd score for a given alignment and the structures.

meta msa template structures

meta versions irmsd

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

meta fasta

meta versions formatted_file

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.

meta msa lib

meta versions tcs scores

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Parses a Thermo RAW file containing mass spectra to an open file format

meta raw

meta versions spectra

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

meta fasta

meta classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

Computes the coverage of different regions from the bam file.

meta input meta2 fasta

meta cov wig versions

tiddit:

TIDDIT - structural variant calling.

Identify chromosomal rearrangements.

meta input input_index meta2 fasta meta3 bwa_index

meta vcf ploidy versions

sv:

Search for structural variants.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

meta fasta

meta explore_tsv top_sequence versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
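
As a rough illustration of what a canonical form can mean for a repeat unit, the sketch below takes it to be the lexicographically smallest string among all rotations of the unit and of its reverse complement. That definition is an assumption (it is not taken from the tidk source), though it does reproduce the TTAGG -> AACCT example above. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def canonical_repeat(unit: str) -> str:
    """Assumed canonical form: the smallest rotation of the unit or of its
    reverse complement, compared lexicographically."""
    unit = unit.upper()
    candidates = []
    for strand in (unit, reverse_complement(unit)):
        # Collect every cyclic rotation of this strand.
        for i in range(len(strand)):
            candidates.append(strand[i:] + strand[:i])
    return min(candidates)


if __name__ == "__main__":
    print(canonical_repeat("TTAGG"))
```

Under this definition, any rotation of a repeat on either strand collapses to the same string, which is what makes the reported unit comparable across genomes.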

Plots telomeric repeat frequency against sliding window location using data produced by tidk/search

meta tsv

meta svg versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Searches a genome for a telomere string such as TTAGGG

meta fasta string

meta tsv bedgraph versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA

meta vcf vcf_indels reference fai vcf_output

meta versions fasta vcf ccf log

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

A post sequencing QC tool for Oxford Nanopore sequencers

meta seq_summary fastq bam

meta report_data report_html plots_html plotly_js versions

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

meta fasta

meta versions pep gff3 cds dat folder

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

meta fasta fold

meta versions pep gff3 cds bed

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Trim FastQ files using Trim Galore!

meta reads

meta reads unpaired html zip log versions

Performs quality and adapter trimming on paired end and single end reads

meta reads

meta trimmed_reads unpaired_reads trim_log out_log summary versions

Assembles a de novo transcriptome from RNAseq reads

meta reads

meta versions transcript_fasta log

Run TRUST4 on RNA-seq data

meta bam reads fasta ref

meta tsv airr_tsv report_tsv fasta out fq versions

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

meta vcf tbi truth_vcf tbi bed meta2 fasta meta3 fai

meta versions fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary

truvari:

Structural variant comparison tool for VCFs
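
For readers unfamiliar with the metrics named above, here is a minimal sketch of how recall, precision and f-measure follow from match counts. It assumes recall is computed from baseline-side true positives and false negatives, and precision from comparison-side true positives and false positives, mirroring the tp_base, tp_comp, fn and fp outputs listed above. The function name and argument names are hypothetical and this is not Truvari's own code.

```python
def benchmark_metrics(tp_base: int, tp_comp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision and F1 from variant match counts.

    tp_base: baseline variants matched by a comparison call
    tp_comp: comparison calls matching a baseline variant
    fp:      comparison calls with no baseline match
    fn:      baseline variants with no comparison match
    """
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    print(benchmark_metrics(tp_base=90, tp_comp=90, fp=10, fn=30))
```

The zero-denominator guards are a design choice of this sketch so that empty call sets yield 0.0 rather than raising an error.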

Over multiple vcfs, calculate their intersection/consistency.

meta vcfs

meta versions consistency

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjointed genomic regions

meta vcf

meta versions vcf

truvari:

Structural variant comparison tool for VCFs

Cluster contigs from multiple assemblies by similarity

meta reads assemblies out_dir

meta cluster_dir versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Subsample a long-read sequencing fastq file for multiple assemblies

meta reads out_dir

meta subreads versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support

meta gtfs hints_files keep_gtfs config

meta versions tsebra_gtf tsebra_scores

Import transcript-level abundances and estimated counts for gene-level analysis packages

meta quants meta2 tx2gene meta3 coldata quant_type

meta tpm_gene counts_gene counts_gene_length_scaled counts_gene_scaled lengths_gene tpm_transcript counts_transcript lengths_transcript versions

tximeta:

Transcript Quantification Import with Automatic Metadata

Remove lines from bed file that refer to off-chromosome locations.

meta bedgraph

meta versions bedgraph

ucsc:

Remove lines from bed file that refer to off-chromosome locations.
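
The filtering idea can be sketched in a few lines. This is only an illustration of the behaviour described above, not the UCSC implementation: the function name, the in-memory `chrom_sizes` dictionary, and the choice to silently drop records on unknown chromosomes are all assumptions made for the example.

```python
def clip_bed_lines(lines, chrom_sizes):
    """Keep BED/bedGraph lines whose interval lies fully on the chromosome.

    lines:       iterable of tab-separated records (chrom, start, end, ...)
    chrom_sizes: dict mapping chromosome name to its length in bases
    """
    kept = []
    for line in lines:
        record = line.rstrip("\n")
        fields = record.split("\t")
        if len(fields) < 3:
            continue  # not a valid interval record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # assumption of this sketch: drop unknown chromosomes
        # BED intervals are zero-based and half-open, so end may equal size.
        if 0 <= start < end <= size:
            kept.append(record)
    return kept


if __name__ == "__main__":
    sizes = {"chr1": 100}
    records = ["chr1\t0\t100\t1.0", "chr1\t90\t120\t2.0"]
    print(clip_bed_lines(records, sizes))
```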

Convert a bedGraph file to bigWig format.

meta bedgraph sizes

meta versions bigwig

ucsc:

Convert a bedGraph file to bigWig format.

Convert file from bed to bigBed format

meta bed sizes autosql

meta versions bigbed

ucsc:

Convert file from bed to bigBed format

compute average score of bigwig over bed file

meta bed bigwig

meta versions tab

ucsc:

Compute average score of big wig over each bed, which may have introns.

Convert GTF files to GenePred format

meta gtf

meta genepred refflat versions

ucsc:

Convert GTF files to GenePred format

convert between genome builds

meta bed

meta version lifted unlifted

ucsc:

Move annotations from one assembly to another

Convert ascii format wig file to binary big wig format

meta wig chromsizes

versions bw

ucsc:

Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

meta reads genome pickle db

meta versions bam

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

fasta gtf

versions pickle pickle

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

meta reads genome gtf

meta sam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.

meta fastq

meta fastq no_match_fastq report versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

meta bam bai mode

meta bam log versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

meta bam bai get_output_stats

meta bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place

meta reads

meta reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
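
To make the extraction step concrete, the sketch below moves a fixed-length UMI from the 5' end of a read into the read identifier. It assumes the UMI occupies the first `umi_len` bases and is appended to the identifier with an underscore; the function name and parameters are hypothetical, and real barcode patterns can be more elaborate than this.

```python
def extract_umi(name: str, seq: str, qual: str, umi_len: int):
    """Move the first umi_len bases of a read into its identifier.

    Returns the new header, the trimmed sequence and the trimmed qualities.
    """
    if len(seq) < umi_len or len(seq) != len(qual):
        raise ValueError("read is shorter than the UMI or has mismatched qualities")
    umi = seq[:umi_len]
    # Only the identifier (text before the first space) gets the UMI suffix;
    # any trailing comment in the header is kept unchanged.
    read_id, sep, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}{sep}{comment}"
    return new_name, seq[umi_len:], qual[umi_len:]


if __name__ == "__main__":
    print(extract_umi("@r1 1:N:0", "ACGTTTGGA", "IIIIFFFFF", 4))
```

Keeping the UMI in the read name means it survives alignment, so downstream steps that group or deduplicate reads can recover it from the BAM record.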

Group reads based on their UMI and mapping coordinates

meta bam bai create_bam get_group_info

meta bam log tsv

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

meta bam bai

meta bam log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Assembles bacterial genomes

meta shortreads longreads

meta versions scaffolds gfa log versions

Module to run UniverSC, an open-source pipeline to demultiplex and process single-cell RNA-Seq data

meta reads

outs versions

Extract files.

meta archive

meta untar versions

Extract files.

meta archive

meta files versions

untar:

Extract tar.gz files.

Unzip ZIP archive files

meta archive

meta unzipped_archive versions

Unzip ZIP archive files

meta archive

meta files versions

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.

Simple software to call UPD regions from germline exome/wgs trios.

meta vcf

meta versions bed

The Java port of the VarDict variant caller

meta bams bais bed meta2 fasta meta3 fasta_fai

meta versions vcf

Filtering, downsampling and profiling alignments in BAM/CRAM formats

meta bam

meta versions bam

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

meta normal_vcf tumor_vcf scenario scenario_sample

meta versions vcf_gz bcf_gz vcf bcf

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

meta bam meta2 fasta meta3 fai

meta alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

meta bam candidates alignment_json meta2 fasta meta3 fai

meta versions vcf_gz bcf_gz vcf bcf

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

meta meta2 meta3 meta4 sv_vcf coverage_bed cns snv_vcf blacklist_bed

meta versions cgh

A tool to create a Gemini-compatible DB file from an annotated VCF

meta vcf ped

meta versions db

vcf2maf

meta vcf vep_cache

meta maf versions

quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files

meta vcf vcf_tabix specific_resources toml lua resources

meta versions vcf

If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields

meta vcf tbi

meta versions vcf

vcflib:

Command-line tools for manipulating VCF files
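
The record-splitting idea described above can be sketched as follows. This is a deliberately simplified illustration and not vcflib's implementation: it handles only the eight fixed VCF columns (genotype columns are ignored), and it guesses that an INFO value is allele-specific whenever it has exactly one comma-separated entry per ALT allele, whereas a real tool would consult the header's Number declarations. The function name is hypothetical.

```python
def break_multiallelic(record: str) -> list[str]:
    """Split one VCF data line with several ALT alleles into one line per allele."""
    fields = record.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return [record.rstrip("\n")]

    info_items = [] if fields[7] in (".", "") else fields[7].split(";")
    out = []
    for i, alt in enumerate(alts):
        new_info = []
        for item in info_items:
            key, sep, value = item.partition("=")
            values = value.split(",")
            if sep and len(values) == len(alts):
                # Heuristic of this sketch: one value per ALT allele.
                new_info.append(f"{key}={values[i]}")
            else:
                new_info.append(item)  # flags and site-level values are copied
        # Columns: CHROM POS ID REF | ALT | QUAL FILTER | INFO
        new_fields = fields[:4] + [alt] + fields[5:7] + [";".join(new_info) or "."]
        out.append("\t".join(new_fields))
    return out


if __name__ == "__main__":
    line = "1\t100\t.\tA\tC,G\t50\tPASS\tAF=0.1,0.2;DP=30"
    for row in break_multiallelic(line):
        print(row)
```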

Command line tools for parsing and manipulating VCF files.

meta vcf tbi

meta vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

meta vcf tbi

meta versions vcf

vcflib:

Command-line tools for manipulating VCF files

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

meta vcf tbi

meta versions vcf

vcflib:

Command-line tools for manipulating VCF files

A set of tools written in Perl and C++ for working with VCF files

meta variant_file bed diff_variant_file

meta versions vcf bcf frq frq_count idepth ldepth ldepth_mean gdepth hap_ld geno_ld geno_chisq list_hap_ld list_geno_ld interchrom_hap_ld interchrom_geno_ld tstv tstv_summary tstv_count tstv_qual filter_summary sites_pi windowed_pi weir_fst heterozygosity hwe tajima_d freq_burden lroh relatedness relatedness2 lqual missing_individual missing_site snp_density kept_sites removed_sites singeltons indel_hist hapcount mendel format info genotypes_matrix genotypes_matrix_individual genotypes_matrix_position impute_hap impute_hap_legend impute_hap_indv ldhat_sites ldhat_locs beagle_gl beagle_pl ped map_ tped tfam diff_sites_in_files diff_indv_in_files diff_sites diff_indv diff_discd_matrix diff_switch_error

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks Nextflow's symbolic-link staging logic. If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work dir, it skips the samtools sort step. The cell-sorted BAM file should be sorted with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See the module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cell-sort the input BAM using SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

This is why two BAM files (cell-sorted and original) need to be staged in the work dir. An optional mask must be passed through ext.args with the --mask option. See also the velocyto tutorial.

meta barcodes bam sorted_bam gtf

meta versions loom

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

meta bam bai refvcf

meta versions log selfsm depthsm selfrg depthrg bestsm bestrg

verifybamid:

verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

meta bam bai svd_ud svd_mu svd_bed references refvcf

meta mu ud bed versions log self_sm ancenstry

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

meta input tbis insertions_fasta fasta fasta_fai

meta versions graph

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

meta gfa pb gbwt

meta vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

write your description here

meta input

meta versions xg vg_index

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

calculate secondary structures of two RNAs with dimerization

meta rnacofold_fasta

versions meta rnacofold_csv rnacofold_ps

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

fasta

versions rnafold_txt rnafold_ps

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

calculate locally stable secondary structures of RNAs

fasta

versions rnalfold_txt

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol and the starting position of the local structure.

Use vireo to perform donor deconvolution for multiplexed scRNA-seq data

meta cell_data n_donor donor_file vartrix_data

meta versions summary donor_ids prob_singlets prob_doublets

Extracting sequences that were unbinned by vRhyme into a FASTA file

meta fasta membership

meta versions unbinnned_sequences

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

meta bins

meta versions linked_bins

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

meta reads fasta

meta versions bins membership summary

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

meta fasta

meta aln biom mothur otu bam out blast uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

meta queryfasta db

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

meta fasta sort_arg

meta fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

meta queryfasta db idcutoff outoption user_columns

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

decomposes multiallelic variants into biallelic in a VCF file.

meta vcf intervals

meta versions vcf

vt:

A tool set for short variant discovery in genetic sequence data

normalizes variants in a VCF file

meta vcf tbi intervals meta2 fasta meta3 fai

meta versions vcf fai

vt:

A tool set for short variant discovery in genetic sequence data

a pangenome-scale aligner

meta fasta_gz paf query_self gzi fai fasta_query_list

meta paf versions

simulating sequence reads from a reference genome

meta fasta

meta versions fastq

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

meta bam bai fasta fasta_fai

meta versions vcf tbi

Masks out highly repetitive DNA sequences with low complexity in a genome

meta counts

meta counts versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program to generate frequency counts of repetitive units.

meta ref

meta intervals versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program that takes a counts file and creates a file of genomic co-ordinates to be masked.

meta counts meta ref

meta wm_intervals versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

Convert and filter aligned reads to .npz

meta bam bai meta2 fasta meta3 fasta_fai

meta versions npz

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase

meta npz reference

meta versions gender

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Create a new reference using healthy reference samples

meta inputs

meta versions npz

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Find copy number aberrations

meta npz meta2 reference meta3 blacklist

meta versions aberrations_bed bins_bed segments_bed chr_statistics chr_plots genome_plot

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

A large variant benchmarking tool analogous to hap.py for small variants.

meta query_vcf truth_vcf bed

meta versions report bench_vcf bench_vcf_tbi

Fast lightweight accurate xenograft sorting

host_fasta graft_fasta index nobjects mask

hash info versions

xengsort:

A fast xenograft read sorter based on space-efficient k-mer hashing

Compresses files with xz.

meta raw_file

meta archive versions

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

Decompresses files with xz.

meta archive

meta file versions

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

Performs assembly scaffolding using YaHS

meta hic_map fasta fai

meta versions scaffolds_fasta scaffolds_agp binary

Builds a YARA index for a reference genome

fasta

versions index

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Align reads to a reference genome using YARA

meta reads index

meta versions bam bai

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Compress file lists to produce ZIP archive files

meta files

meta zipped_archive versions

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
