Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • bam 170
  • fasta 166
  • vcf 120
  • fastq 117
  • genomics 110
  • metagenomics 79
  • index 70
  • genome 70
  • reference 65
  • alignment 62
  • gatk4 61
  • bed 60
  • assembly 58
  • cram 56
  • sam 49
  • sort 47
  • annotation 38
  • structural variants 38
  • variant calling 38
  • align 37
  • database 34
  • merge 32
  • filter 31
  • bacteria 28
  • gff 28
  • map 28
  • statistics 27
  • coverage 26
  • variants 24
  • qc 23
  • quality control 22
  • classify 21
  • gtf 21
  • cnv 20
  • download 20
  • nanopore 20
  • taxonomic profiling 18
  • split 18
  • variant 18
  • k-mer 18
  • gfa 18
  • contamination 17
  • sentieon 17
  • taxonomy 16
  • classification 16
  • MSA 16
  • somatic 15
  • convert 15
  • quality 15
  • count 15
  • binning 15
  • copy number 14
  • clustering 14
  • pacbio 14
  • proteomics 14
  • VCF 14
  • ancient DNA 14
  • imputation 13
  • conversion 13
  • bedtools 13
  • contigs 13
  • phylogeny 13
  • single-cell 13
  • bcftools 12
  • trimming 12
  • reporting 12
  • isoseq 12
  • sv 12
  • graph 12
  • bisulfite 12
  • variation graph 12
  • gvcf 12
  • bqsr 11
  • methylseq 11
  • bisulphite 11
  • methylation 11
  • illumina 11
  • build 11
  • consensus 11
  • picard 11
  • cna 11
  • protein 11
  • QC 11
  • rnaseq 11
  • table 11
  • compression 11
  • databases 11
  • indexing 10
  • stats 10
  • imaging 10
  • metrics 10
  • antimicrobial resistance 10
  • serotype 10
  • phage 10
  • visualisation 10
  • sequences 10
  • long-read 10
  • 5mC 10
  • expression 10
  • tsv 10
  • kmer 10
  • histogram 9
  • base quality score recalibration 9
  • demultiplex 9
  • matrix 9
  • plot 9
  • openms 9
  • depth 9
  • neural network 9
  • amr 9
  • scWGBS 9
  • markduplicates 9
  • searching 9
  • aDNA 9
  • pairs 9
  • wgs 9
  • cluster 9
  • pangenome graph 9
  • protein sequence 9
  • haplotype 9
  • WGBS 9
  • DNA methylation 9
  • bins 9
  • low-coverage 8
  • checkm 8
  • palaeogenomics 8
  • samtools 8
  • bwa 8
  • validation 8
  • transcript 8
  • archaeogenomics 8
  • bcf 8
  • damage 8
  • biscuit 8
  • cooler 8
  • iCLIP 8
  • mappability 8
  • bisulfite sequencing 8
  • aligner 8
  • LAST 8
  • machine learning 8
  • db 8
  • metagenome 8
  • mmseqs2 8
  • virus 8
  • repeat 8
  • genotype 8
  • completeness 8
  • filtering 8
  • annotate 8
  • germline 7
  • mkref 7
  • dedup 7
  • mags 7
  • segmentation 7
  • sequence 7
  • long reads 7
  • umi 7
  • transcriptome 7
  • newick 7
  • kraken2 7
  • phasing 7
  • blast 7
  • ncbi 7
  • bismark 7
  • gff3 7
  • differential 7
  • population genetics 7
  • mag 7
  • decompression 7
  • hmmsearch 7
  • complexity 7
  • peaks 7
  • spatial 7
  • evaluation 7
  • gene 7
  • ucsc 7
  • seqkit 7
  • glimpse 7
  • structure 7
  • duplicates 6
  • NCBI 6
  • hmmer 6
  • bedGraph 6
  • mirna 6
  • antimicrobial peptides 6
  • antimicrobial resistance genes 6
  • mitochondria 6
  • snp 6
  • kmers 6
  • feature 6
  • json 6
  • prokaryote 6
  • low frequency variant calling 6
  • example 6
  • deduplication 6
  • scRNA-seq 6
  • demultiplexing 6
  • short-read 6
  • genotyping 6
  • prediction 6
  • cnvkit 6
  • plasmid 6
  • mapping 6
  • pangenome 6
  • vsearch 6
  • gzip 6
  • single 6
  • tumor-only 6
  • report 5
  • summary 5
  • sourmash 5
  • de novo 5
  • de novo assembly 5
  • isolates 5
  • microbiome 5
  • multiple sequence alignment 5
  • single cell 5
  • text 5
  • extract 5
  • diversity 5
  • csv 5
  • call 5
  • antibiotic resistance 5
  • mem 5
  • adapters 5
  • fragment 5
  • 3-letter genome 5
  • detection 5
  • amps 5
  • mutect2 5
  • svtk 5
  • arg 5
  • visualization 5
  • enrichment 5
  • mpileup 5
  • idXML 5
  • riboseq 5
  • benchmark 5
  • splicing 5
  • clipping 5
  • kallisto 5
  • deamination 5
  • counts 5
  • view 5
  • interval 5
  • query 5
  • msa 5
  • MAF 5
  • telomere 4
  • profiling 4
  • DNA sequencing 4
  • sequencing 4
  • xeniumranger 4
  • redundancy 4
  • ganon 4
  • FASTQ 4
  • umitools 4
  • isomir 4
  • read depth 4
  • retrotransposon 4
  • indels 4
  • circrna 4
  • sample 4
  • compare 4
  • interval_list 4
  • peak-calling 4
  • ranking 4
  • targeted sequencing 4
  • bgzip 4
  • genmod 4
  • fastx 4
  • bedgraph 4
  • CLIP 4
  • ont 4
  • cut 4
  • copy number alteration calling 4
  • hybrid capture sequencing 4
  • public datasets 4
  • containment 4
  • bin 4
  • propr 4
  • bigwig 4
  • haplotypecaller 4
  • miscoding lesions 4
  • profile 4
  • quantification 4
  • logratio 4
  • structural 4
  • diamond 4
  • SV 4
  • reference-free 4
  • paf 4
  • ccs 4
  • genome assembler 4
  • compress 4
  • taxonomic classification 4
  • fgbio 4
  • deep learning 4
  • hic 4
  • happy 4
  • STR 4
  • family 4
  • microsatellite 4
  • HiFi 4
  • tabular 4
  • hmmcopy 4
  • phylogenetic placement 4
  • matching 4
  • ngscheckmate 4
  • palaeogenetics 4
  • archaeogenetics 4
  • biosynthetic gene cluster 4
  • ancestry 4
  • BGC 4
  • merging 4
  • concatenate 4
  • malt 4
  • normalization 4
  • ampir 4
  • ATAC-seq 4
  • fungi 4
  • chunk 4
  • DNA sequence 4
  • bcl2fastq 4
  • resistance 4
  • add 4
  • parsing 4
  • microarray 4
  • genomes 3
  • snps 3
  • comparisons 3
  • adapter trimming 3
  • ambient RNA removal 3
  • combine 3
  • amplicon sequencing 3
  • mtDNA 3
  • amplicon sequences 3
  • deeparg 3
  • variant_calling 3
  • preprocessing 3
  • rsem 3
  • observations 3
  • typing 3
  • untar 3
  • score 3
  • virulence 3
  • chromosome 3
  • PCA 3
  • ligate 3
  • pileup 3
  • scores 3
  • transposons 3
  • mlst 3
  • fingerprint 3
  • amplify 3
  • transcripts 3
  • fai 3
  • long_read 3
  • minimap2 3
  • pairsam 3
  • genome assembly 3
  • shapeit 3
  • uLTRA 3
  • bacterial 3
  • dna 3
  • kraken 3
  • genome mining 3
  • pan-genome 3
  • dictionary 3
  • insert 3
  • subsample 3
  • krona 3
  • SNP 3
  • complement 3
  • popscle 3
  • genotype-based deconvoltion 3
  • remove 3
  • wastewater 3
  • fam 3
  • plink2 3
  • bim 3
  • prokaryotes 3
  • indel 3
  • spaceranger 3
  • vrhyme 3
  • relatedness 3
  • spark 3
  • identity 3
  • kinship 3
  • informative sites 3
  • eukaryotes 3
  • replace 3
  • entrez 3
  • reports 3
  • intervals 3
  • das_tool 3
  • converter 3
  • prokka 3
  • lossless 3
  • tabix 3
  • macrel 3
  • structural_variants 3
  • notebook 3
  • quality trimming 3
  • image_analysis 3
  • fastk 3
  • krona chart 3
  • html 3
  • roh 3
  • DRAMP 3
  • UMI 3
  • rna_structure 3
  • RNA 3
  • survivor 3
  • pseudoalignment 3
  • uncompress 3
  • neubi 3
  • fusion 3
  • dump 3
  • highly_multiplexed_imaging 3
  • panel 3
  • aln 3
  • cool 3
  • gatk4spark 3
  • benchmarking 3
  • cellranger 3
  • clean 3
  • organelle 3
  • png 3
  • wig 3
  • wxs 3
  • chimeras 3
  • mcmicro 3
  • chip-seq 3
  • bwameth 3
  • cut up 3
  • archiving 3
  • cfDNA 3
  • angsd 3
  • image 3
  • ataqv 3
  • nucleotide 3
  • PacBio 3
  • cat 3
  • mkfastq 3
  • small indels 3
  • reads 3
  • mzml 3
  • hi-c 3
  • mapper 3
  • bakta 3
  • windowmasker 3
  • arriba 3
  • atac-seq 3
  • rna 3
  • host 3
  • C to T 3
  • prefetch 3
  • HMM 3
  • microbes 3
  • zip 3
  • gene expression 3
  • pypgx 3
  • polishing 3
  • gsea 3
  • npz 3
  • checkv 3
  • bamtools 3
  • duplication 3
  • bracken 3
  • abundance 3
  • das tool 3
  • gridss 3
  • unzip 3
  • CRISPR 3
  • nucleotides 2
  • micro-satellite-scan 2
  • mudskipper 2
  • ped 2
  • cnvnator 2
  • parallelized 2
  • tumor 2
  • joint genotyping 2
  • signature 2
  • svdb 2
  • profiles 2
  • screen 2
  • vg 2
  • transcriptomic 2
  • orthology 2
  • gatk 2
  • hidden Markov model 2
  • hla 2
  • khmer 2
  • regression 2
  • evidence 2
  • interactions 2
  • panelofnormals 2
  • de novo assembler 2
  • small genome 2
  • msi 2
  • zlib 2
  • instability 2
  • mapcounter 2
  • proportionality 2
  • sketch 2
  • MSI 2
  • comparison 2
  • tree 2
  • hlala_typing 2
  • hla_typing 2
  • read-group 2
  • hlala 2
  • mask 2
  • bustools 2
  • gwas 2
  • function 2
  • resolve_bioscience 2
  • FracMinHash sketch 2
  • genomad 2
  • maximum likelihood 2
  • salmon 2
  • iphop 2
  • somatic variants 2
  • instrain 2
  • COBS 2
  • ChIP-seq 2
  • k-mer index 2
  • trancriptome 2
  • trim 2
  • graph layout 2
  • tama 2
  • tnhaplotyper2 2
  • bloom filter 2
  • long terminal retrotransposon 2
  • gstama 2
  • concordance 2
  • long terminal repeat 2
  • graft 2
  • phase 2
  • nextclade 2
  • retrotransposons 2
  • simulate 2
  • gene set 2
  • gene set analysis 2
  • removal 2
  • refine 2
  • msisensor-pro 2
  • baf 2
  • spatial_transcriptomics 2
  • metamaps 2
  • lift 2
  • homoploymer 2
  • leviosam2 2
  • ichorcna 2
  • standardization 2
  • antismash 2
  • GPU-accelerated 2
  • vcflib 2
  • aggregate 2
  • pharokka 2
  • RNA-Seq 2
  • gem 2
  • orf 2
  • xz 2
  • taxonomic profile 2
  • standardise 2
  • standardisation 2
  • otu tables 2
  • taxon tables 2
  • rgfa 2
  • artic 2
  • xenograft 2
  • archive 2
  • polyA_tail 2
  • filtermutectcalls 2
  • reads merging 2
  • bedpe 2
  • structural-variant calling 2
  • metagenomes 2
  • megan 2
  • Streptococcus pneumoniae 2
  • guide tree 2
  • awk 2
  • sequenzautils 2
  • BAM 2
  • blastn 2
  • random forest 2
  • Pharmacogenetics 2
  • fasterq-dump 2
  • sra-tools 2
  • settings 2
  • junctions 2
  • correction 2
  • transformation 2
  • runs_of_homozygosity 2
  • spatial_omics 2
  • long-read sequencing 2
  • MCMICRO 2
  • duplicate 2
  • switch 2
  • ome-tif 2
  • blastp 2
  • deseq2 2
  • rna-seq 2
  • contig 2
  • regions 2
  • mirdeep2 2
  • RNA sequencing 2
  • scaffold 2
  • polish 2
  • smrnaseq 2
  • heatmap 2
  • ancient dna 2
  • lofreq 2
  • frame-shift correction 2
  • sequence analysis 2
  • Read depth 2
  • mash 2
  • fixmate 2
  • recombination 2
  • gene labels 2
  • eCLIP 2
  • immunoprofiling 2
  • fusions 2
  • parse 2
  • fcs-gx 2
  • estimation 2
  • minhash 2
  • edit distance 2
  • genome bins 2
  • vdj 2
  • single cells 2
  • authentication 2
  • soft-clipped clusters 2
  • HOPS 2
  • dict 2
  • SimpleAF 2
  • doublets 2
  • rtgtools 2
  • rename 2
  • proteome 2
  • scaffolding 2
  • seqtk 2
  • salmonella 2
  • checksum 2
  • MaltExtract 2
  • collate 2
  • calling 2
  • anndata 2
  • distance 2
  • cnv calling 2
  • CNV 2
  • varcal 2
  • bam2fq 2
  • cvnkit 2
  • repeats 2
  • serogroup 2
  • windows 2
  • krakenuniq 2
  • effect prediction 2
  • normalize 2
  • merge mate pairs 2
  • short reads 2
  • norm 2
  • scatter 2
  • import 2
  • unaligned 2
  • snpsift 2
  • transcriptomics 2
  • UMIs 2
  • reheader 2
  • duplex 2
  • interactive 2
  • fetch 2
  • GEO 2
  • eido 2
  • snpeff 2
  • cancer genomics 2
  • preseq 2
  • union 2
  • interval list 2
  • variation 2
  • allele-specific 2
  • concat 2
  • tbi 2
  • allele 2
  • realignment 2
  • intersect 2
  • ampgram 2
  • variant pruning 2
  • microbial 2
  • amptransformer 2
  • krakentools 2
  • deconvolution 2
  • bayesian 2
  • join 2
  • antibiotics 2
  • bfiles 2
  • RiPP 2
  • adapter 2
  • library 2
  • shigella 2
  • secondary metabolites 2
  • registration 2
  • dereplicate 2
  • emboss 2
  • purge duplications 2
  • image_processing 2
  • primer 2
  • metadata 2
  • intersection 2
  • eigenstrat 2
  • Duplication purging 2
  • barcode 2
  • validate 2
  • samplesheet 2
  • format 2
  • tab 2
  • GC content 2
  • metagenomic 2
  • expansionhunterdenovo 2
  • mitochondrion 2
  • NRPS 2
  • pair 2
  • repeat_expansions 2
  • demultiplexed reads 2
  • identifier 2
  • gender determination 1
  • VCFtools 1
  • deduplicate 1
  • wavefront 1
  • wham 1
  • vcfbreakmulti 1
  • usearch 1
  • uniq 1
  • DNA contamination estimation 1
  • mashmap 1
  • verifybamid 1
  • toml 1
  • copy number analysis 1
  • long read alignment 1
  • construct 1
  • all versus all 1
  • extractunbinned 1
  • linkbins 1
  • copy-number 1
  • whamg 1
  • pangenome-scale 1
  • sintax 1
  • vsearch/sort 1
  • graph projection to vcf 1
  • fracminhash sketch 1
  • lua 1
  • density 1
  • model 1
  • rare variants 1
  • error 1
  • de-novo 1
  • longread 1
  • sha256 1
  • 256 bit 1
  • shinyngs 1
  • exploratory 1
  • boxplot 1
  • features 1
  • genetic sex 1
  • sliding window 1
  • AMPs 1
  • CRAM 1
  • SMN1 1
  • SMN2 1
  • POA 1
  • sniffles 1
  • core 1
  • snippy 1
  • dist 1
  • relative coverage 1
  • sex determination 1
  • predictions 1
  • sequence headers 1
  • cut&tag 1
  • cut&run 1
  • chromatin 1
  • seacr 1
  • assembly-binning 1
  • applyvarcal 1
  • VQSR 1
  • variant recalibration 1
  • subseq 1
  • grep 1
  • sertotype 1
  • induce 1
  • interleave 1
  • header 1
  • seq 1
  • selection 1
  • random draw 1
  • pseudohaploid 1
  • pseudodiploid 1
  • freqsum 1
  • bam2seqz 1
  • gc_wiggle 1
  • dbnsfp 1
  • SNPs 1
  • maf 1
  • umicollapse 1
  • cds 1
  • transcroder 1
  • sequencing adapters 1
  • bedgraphtobigwig 1
  • bigbed 1
  • bedtobigbed 1
  • genepred 1
  • refflat 1
  • gtftogenepred 1
  • ucsc/liftover 1
  • scRNA-Seq 1
  • eucaryotes 1
  • files 1
  • upd 1
  • uniparental 1
  • disomy 1
  • snv 1
  • downsample 1
  • downsample bam 1
  • subsample bam 1
  • vcf2db 1
  • gemini 1
  • coding 1
  • chromosomal rearrangements 1
  • invariant 1
  • variantcalling 1
  • constant 1
  • rRNA 1
  • ribosomal RNA 1
  • antimicrobial peptide prediction 1
  • signatures 1
  • hash sketch 1
  • spatype 1
  • spa 1
  • streptococcus 1
  • sccmec 1
  • amp 1
  • Mycobacterium tuberculosis 1
  • detecting svs 1
  • short-read sequencing 1
  • svtk/baftest 1
  • baftest 1
  • countsvtypes 1
  • rdtest2vcf 1
  • rdtest 1
  • vcf2bed 1
  • decompress 1
  • polya tail 1
  • fast5 1
  • copy number alterations 1
  • homology 1
  • copy number variation 1
  • sage 1
  • 10x 1
  • regulatory network 1
  • transcription factors 1
  • paraphase 1
  • selector 1
  • cram-size 1
  • size 1
  • quality check 1
  • realign 1
  • circular 1
  • spot 1
  • orthogroup 1
  • orthologs 1
  • mass spectrometry 1
  • grabix 1
  • featuretable 1
  • extraction 1
  • cgMLST 1
  • WGS 1
  • redundant 1
  • nanoq 1
  • Read filters 1
  • Read trimming 1
  • Read report 1
  • drug categorization 1
  • uniques 1
  • Illumina 1
  • functional 1
  • impute-info 1
  • ribosomal 1
  • bwameme 1
  • tag2tag 1
  • gnu 1
  • pile up 1
  • taxids 1
  • taxon name 1
  • nanopore sequencing 1
  • rna velocity 1
  • cobra 1
  • extension 1
  • grea 1
  • translation 1
  • paired reads merging 1
  • overlap-based merging 1
  • check 1
  • hamming-distance 1
  • hashing-based deconvoltion 1
  • coreutils 1
  • bwamem2 1
  • generic 1
  • transposable element 1
  • retrieval 1
  • busco 1
  • droplet based single cells 1
  • lexogen 1
  • genotype-based demultiplexing 1
  • donor deconvolution 1
  • cellsnp 1
  • trimfq 1
  • vcflib/vcffixup 1
  • AC/NS/AF 1
  • Pacbio 1
  • guidetree 1
  • tags 1
  • hashing-based deconvolution 1
  • mygene 1
  • p-value 1
  • elfasta 1
  • elprep 1
  • pharmacogenetics 1
  • controlstatistics 1
  • source tracking 1
  • emoji 1
  • quality_control 1
  • coptr 1
  • ptr 1
  • doublet_detection 1
  • barcodes 1
  • subsetting 1
  • differential expression 1
  • logFC 1
  • scvi 1
  • AT content 1
  • solo 1
  • import segmentation 1
  • nuclear segmentation 1
  • cell segmentation 1
  • contiguate 1
  • relabel 1
  • resegment 1
  • morphology 1
  • hostile 1
  • decontamination 1
  • human removal 1
  • metagenome assembler 1
  • screening 1
  • cleaning 1
  • nucleotide content 1
  • nucBed 1
  • rank 1
  • poolseq 1
  • java 1
  • script 1
  • xml 1
  • svg 1
  • standard 1
  • haplotag 1
  • staging 1
  • Staging 1
  • miRNA 1
  • antimicrobial reistance 1
  • microRNA 1
  • multiqc 1
  • mass_error 1
  • search engine 1
  • variant-calling 1
  • bclconvert 1
  • stardist 1
  • telseq 1
  • vsearch/dereplicate 1
  • vsearch/fastqfilter 1
  • fastqfilter 1
  • ATACseq 1
  • shift 1
  • ATACshift 1
  • setgt 1
  • jvarkit 1
  • translate 1
  • tar 1
  • tarball 1
  • targz 1
  • go 1
  • cell_barcodes 1
  • yahs 1
  • tnfilter 1
  • multiallelic 1
  • small variants 1
  • transmembrane 1
  • genome graph 1
  • tnseq 1
  • decoy 1
  • htseq 1
  • rrna 1
  • sompy 1
  • reformatting 1
  • peak picking 1
  • site frequency spectrum 1
  • ancestral alleles 1
  • derived alleles 1
  • array_cgh 1
  • hmmfetch 1
  • cytosure 1
  • affy 1
  • vector 1
  • gprofiler2 1
  • gost 1
  • rad 1
  • structural variant 1
  • bam2fastx 1
  • bam2fastq 1
  • immcantation 1
  • airrseq 1
  • immunoinformatics 1
  • co-orthology 1
  • clusteridentifier 1
  • decompose 1
  • simulation 1
  • spectral clustering 1
  • denoisereadcounts 1
  • geo 1
  • mapad 1
  • adna 1
  • c to t 1
  • proteus 1
  • readproteingroups 1
  • eigenvectors 1
  • hicPCA 1
  • sliding 1
  • snakemake 1
  • workflow 1
  • workflow_mode 1
  • createreadcountpanelofnormals 1
  • copyratios 1
  • readwriter 1
  • reverse complement 1
  • dnamodelapply 1
  • dnascope 1
  • groupby 1
  • tnscope 1
  • bgen 1
  • chloroplast 1
  • confidence 1
  • blat 1
  • alr 1
  • clr 1
  • boxcox 1
  • Staphylococcus aureus 1
  • Escherichia coli 1
  • Read coverage histogram 1
  • sequence similarity 1
  • comparative genomics 1
  • tag 1
  • machine_learning 1
  • multi-tool 1
  • predict 1
  • hardy-weinberg 1
  • hwe statistics 1
  • hwe equilibrium 1
  • reference-independent 1
  • genotype likelihood 1
  • collapse 1
  • liftover 1
  • probabilistic realignment 1
  • seqfu 1
  • n50 1
  • cell_type_identification 1
  • cell_phenotyping 1
  • clahe 1
  • homologs 1
  • refresh 1
  • association 1
  • GWAS 1
  • case/control 1
  • genetics 1
  • associations 1
  • spatial_neighborhoods 1
  • scimap 1
  • Bayesian 1
  • structural-variants 1
  • omics 1
  • biological activity 1
  • functional analysis 1
  • prior knowledge 1
  • adapterremoval 1
  • nucleotide sequence 1
  • deep variant 1
  • resistance genes 1
  • mutect 1
  • idx 1
  • reference panels 1
  • transform 1
  • gaps 1
  • introns 1
  • install 1
  • joint-genotyping 1
  • genotypegvcf 1
  • admixture 1
  • parallel 1
  • plastid 1
  • kma 1
  • resfinder 1
  • raw 1
  • distance-based 1
  • mgf 1
  • parquet 1
  • parser 1
  • dbsnp 1
  • standardize 1
  • quarto 1
  • python 1
  • r 1
  • coexpression 1
  • correlation 1
  • corpcor 1
  • assay 1
  • phylogenetics 1
  • minimum_evolution 1
  • peak-caller 1
  • prophage 1
  • cluster analysis 1
  • mutectstats 1
  • panelofnormalscreation 1
  • germline contig ploidy 1
  • germlinecnvcaller 1
  • germlinevariantsites 1
  • getpileupsumaries 1
  • readcountssummary 1
  • smaller fastqs 1
  • clumping fastqs 1
  • indexfeaturefile 1
  • learnreadorientationmodel 1
  • readorientationartifacts 1
  • leftalignandtrimvariants 1
  • mergebamalignment 1
  • snvs 1
  • genomicsdbimport 1
  • postprocessgermlinecnvcalls 1
  • preprocessintervals 1
  • printreads 1
  • printsvevidence 1
  • reblockgvcf 1
  • revert 1
  • selectvariants 1
  • shiftchain 1
  • shiftfasta 1
  • shiftintervals 1
  • site depth 1
  • splitcram 1
  • splitintervals 1
  • svannotate 1
  • jointgenotyping 1
  • genomicsdb 1
  • variantfiltration 1
  • collectreadcounts 1
  • heattree 1
  • targets 1
  • annotateintervals 1
  • csi 1
  • variant quality score recalibration 1
  • vqsr 1
  • asereadcounter 1
  • bedtointervallist 1
  • calculatecontamination 1
  • cross-samplecontamination 1
  • getpileupsummaries 1
  • calibratedragstrmodel 1
  • cnnscorevariants 1
  • collectsvevidence 1
  • gatherbqsrreports 1
  • combinegvcfs 1
  • short variant discovery 1
  • composestrtablefile 1
  • dragstr 1
  • condensedepthevidence 1
  • createsequencedictionary 1
  • createsomaticpanelofnormals 1
  • determinegermlinecontigploidy 1
  • duplication metrics 1
  • estimatelibrarycomplexity 1
  • filterintervals 1
  • deduping 1
  • filtervarianttranches 1
  • tranche filtering 1
  • svcluster 1
  • recalibration model 1
  • gene-calling 1
  • extractvariants 1
  • tama_collapse.py 1
  • mouse 1
  • gene model 1
  • TAMA 1
  • gstama/merge 1
  • gstama/polyacleanup 1
  • GTDB taxonomy 1
  • genome taxonomy database 1
  • archaea 1
  • gunc 1
  • gunzip 1
  • gvcftools 1
  • extract_variants 1
  • abricate 1
  • genomes on a tree 1
  • amrfinderplus 1
  • fARGene 1
  • rgi 1
  • ibd 1
  • hbd 1
  • beagle 1
  • mitochondrial 1
  • haplogroups 1
  • bacphlip 1
  • virulent 1
  • Haemophilus influenzae 1
  • haplotype resolution 1
  • temperate 1
  • lifestyle 1
  • bamtools/convert 1
  • merge compare 1
  • variantrecalibrator 1
  • Salmonella Typhi 1
  • gawk 1
  • txt 1
  • file parsing 1
  • bgc 1
  • microscopy 1
  • genome profile 1
  • background_correction 1
  • illumiation_correction 1
  • compound 1
  • models 1
  • element 1
  • genome size 1
  • genome heterozygosity 1
  • repeat content 1
  • Mykrobe 1
  • GNU 1
  • trimBam 1
  • bamUtil 1
  • gfastats 1
  • genome summary 1
  • genome manipulation 1
  • genome statistics 1
  • bamtools/split 1
  • gget 1
  • low coverage 1
  • yaml 1
  • Sample 1
  • Haplotypes 1
  • Imputation 1
  • joint-variant-calling 1
  • gangstr 1
  • gamma 1
  • readcounter 1
  • cls 1
  • enzyme 1
  • postprocessing 1
  • makebins 1
  • genomic bins 1
  • mcool 1
  • UNet 1
  • TMA dearray 1
  • Segmentation 1
  • Cores 1
  • custom 1
  • version 1
  • tblastn 1
  • na 1
  • gct 1
  • cload 1
  • cutesv 1
  • subtyping 1
  • Salmonella enterica 1
  • sorted 1
  • file manipulation 1
  • bioawk 1
  • pcr duplicates 1
  • paired-end 1
  • unionBedGraphs 1
  • subtract 1
  • track 1
  • slopBed 1
  • corrrelation 1
  • scatterplot 1
  • digest 1
  • cooler/balance 1
  • cumulative coverage 1
  • cmseq 1
  • antibody capture 1
  • antigen capture 1
  • qa 1
  • quality assurnce 1
  • multiomics 1
  • chromap 1
  • duplicate removal 1
  • chromosome_visualization 1
  • splice 1
  • mkvdjref 1
  • polymut 1
  • polymorphic 1
  • polymorphic sites 1
  • protein coding genes 1
  • access 1
  • cadd 1
  • antitarget 1
  • cellpose 1
  • export 1
  • target 1
  • hifi 1
  • partition histograms 1
  • Assembly 1
  • domains 1
  • concoct 1
  • compartments 1
  • nucleotide composition 1
  • subcontigs 1
  • topology 1
  • calder2 1
  • bases 1
  • sizes 1
  • UShER 1
  • groupreads 1
  • str 1
  • faqcs 1
  • antibiotic resistance genes 1
  • ARGs 1
  • ANI 1
  • homozygosity 1
  • biallelic 1
  • SRA 1
  • ENA 1
  • public 1
  • update header 1
  • consensus sequence 1
  • duplexumi 1
  • unmapped 1
  • sorting 1
  • ubam 1
  • zipperbams 1
  • single molecule 1
  • generate 1
  • random 1
  • BCF 1
  • lint 1
  • fq 1
  • rust 1
  • variant caller 1
  • somatic variant calling 1
  • germline variant calling 1
  • bacterial variant calling 1
  • bootstrapping 1
  • autozygosity 1
  • bamtobed 1
  • region 1
  • eigenstratdatabasetools 1
  • shiftBed 1
  • multinterval 1
  • overlapped bed 1
  • maskfasta 1
  • blastx 1
  • segment 1
  • chunking 1
  • duphold 1
  • structural variation 1
  • depth information 1
  • escherichia coli 1
  • PEP 1
  • schema 1
  • pep 1
  • eklipse 1
  • closest 1
  • jaccard 1
  • circos 1
  • deletion 1
  • overlap 1
  • split by chromosome 1
  • getfasta 1
  • embl 1
  • genbank 1
  • swissprot 1
  • Streptococcus pyogenes 1
  • endogenous DNA 1
  • percent on target 1
  • cache 1
  • genomecov 1
  • gccounter 1
  • reformat 1
  • scramble 1
  • GRO-cap 1
  • identification 1
  • illumina datasets 1
  • phylogenetic composition 1
  • hybrid-selection 1
  • mate-pair 1
  • liftovervcf 1
  • pcr 1
  • picard/renamesampleinvcf 1
  • sortvcf 1
  • deletions 1
  • insertions 1
  • tandem duplications 1
  • CoPRO 1
  • PRO-cap 1
  • phantom peaks 1
  • CAGE 1
  • NETCAGE 1
  • RAMPAGE 1
  • csRNA-seq 1
  • STRIPE-seq 1
  • PRO-seq 1
  • GRO-seq 1
  • genetic 1
  • exclude 1
  • variant identifiers 1
  • subset 1
  • indep 1
  • indep pairwise 1
  • recode 1
  • crispr 1
  • ChIP-Seq 1
  • identifiers 1
  • pairtools 1
  • graph stats 1
  • graph unchopping 1
  • graph formats 1
  • graph viz 1
  • tumor/normal 1
  • hla-typing 1
  • ILP 1
  • HLA-I 1
  • block-compressed 1
  • HLA 1
  • PCR/optical duplicates 1
  • flip 1
  • upper-triangular matrix 1
  • ligation junctions 1
  • pairstools 1
  • motif 1
  • restriction fragments 1
  • select 1
  • covid 1
  • pangolin 1
  • lineage 1
  • paragraph 1
  • graphs 1
  • pbbam 1
  • pbmerge 1
  • subreads 1
  • pbp 1
  • pair-end 1
  • read 1
  • pedigrees 1
  • whole genome association 1
  • scoring 1
  • odgi 1
  • LCA 1
  • fragment_size 1
  • inner_distance 1
  • read distribution 1
  • sequence-based 1
  • mapping-based 1
  • nuclear contamination estimate 1
  • integrity 1
  • rtg 1
  • pedfilter 1
  • rocplot 1
  • rtg-tools 1
  • salsa 1
  • salsa2 1
  • Ancestor 1
  • experiment 1
  • multimapper 1
  • flagstat 1
  • sambamba 1
  • duplicate marking 1
  • amplicon 1
  • ampliconclip 1
  • calmd 1
  • faidx 1
  • insert size 1
  • repair 1
  • paired 1
  • read pairs 1
  • post Post-processing 1
  • readgroup 1
  • read_pairs 1
  • strandedness 1
  • variant genetic 1
  • duplicate purging 1
  • pmdtools 1
  • porechop_abi 1
  • contact 1
  • pretext 1
  • jpg 1
  • bmp 1
  • contact maps 1
  • gene finding 1
  • intervals coverage 1
  • genomic intervals 1
  • normal database 1
  • panel of normals 1
  • cutoff 1
  • haplotype purging 1
  • false duplications 1
  • allele counts 1
  • assembly curation 1
  • Haplotype purging 1
  • False duplications 1
  • Assembly curation 1
  • purging 1
  • installation 1
  • doCounts 1
  • quast 1
  • neighbour-joining 1
  • subsampling 1
  • long uncorrected reads 1
  • rhocall 1
  • R 1
  • bamstat 1
  • combine graphs 1
  • squeeze 1
  • HMMER 1
  • adapter removal 1
  • effective genome size 1
  • Klebsiella 1
  • pneumoniae 1
  • ancientDNA 1
  • kegg 1
  • kofamscan 1
  • combining 1
  • authentict 1
  • read group 1
  • bias 1
  • reorder 1
  • spliced 1
  • train 1
  • collapsing 1
  • digital normalization 1
  • legionella 1
  • clinical 1
  • pneumophila 1
  • limma 1
  • Listeria monocytogenes 1
  • ATLAS 1
  • lofreq/call 1
  • lofreq/filter 1
  • qualities 1
  • AMP 1
  • peptide prediction 1
  • sequencing_bias 1
  • functional genomics 1
  • sgRNA 1
  • k-mer counting 1
  • single-stranded 1
  • maximum-likelihood 1
  • pixel_classification 1
  • amino acid 1
  • Hidden Markov Model 1
  • hmtnote 1
  • annotations 1
  • pos 1
  • haemophilus 1
  • panel_of_normals 1
  • IDR 1
  • igv 1
  • igv.js 1
  • js 1
  • genome browser 1
  • multicut 1
  • pixel classification 1
  • probability_maps 1
  • quant 1
  • population genomics 1
  • interproscan 1
  • genomic islands 1
  • insertion 1
  • autofluorescence 1
  • jasminesv 1
  • jasmine 1
  • Python 1
  • Jupyter 1
  • jupytext 1
  • papermill 1
  • cycif 1
  • background 1
  • kallisto/index 1
  • CRISPR-Cas9 1
  • rra 1
  • graph drawing 1
  • Beautiful stand-alone HTML report 1
  • de Bruijn 1
  • microrna 1
  • target prediction 1
  • mitochondrial genome 1
  • reference genome 1
  • mosdepth 1
  • otu table 1
  • microsatellite instability 1
  • scan 1
  • mtnucratio 1
  • ratio 1
  • mitochondrial to nuclear ratio 1
  • bioinformatics tools 1
  • GATK UnifiedGenotyper 1
  • mbias 1
  • SNP table 1
  • contaminant 1
  • cancer genome 1
  • somatic structural variations 1
  • mobile element insertions 1
  • sequencing summary 1
  • NextGenMap 1
  • ngm 1
  • Neisseria gonorrhoeae 1
  • gender 1
  • http(s) 1
  • utility 1
  • RNA-seq 1
  • graph construction 1
  • assembler 1
  • methylation bias 1
  • DNA damage 1
  • MD5 1
  • NGS 1
  • damage patterns 1
  • estimate 1
  • post mortem damage 1
  • taxonomic assignment 1
  • mash/sketch 1
  • reduced 1
  • representations 1
  • maxbin2 1
  • atlas 1
  • metagenome-assembled genomes 1
  • mkarv 1
  • mass-spectroscopy 1
  • mcr-1 1
  • 128 bit 1
  • metaphlan 1
  • megahit 1
  • denovo 1
  • debruijn 1
  • daa 1
  • rma6 1
  • Neisseria meningitidis 1
  • k-mer frequency 1
  • 3D heat map 1
  • contour map 1
  • Merqury 1
  • assembly evaluation 1
  • smudgeplot 1
  • ploidy 1
  • unionsum 1
  • scanpy 1

Contiguate a draft genome assembly

results versions

Screen assemblies for antimicrobial resistance against multiple databases

report versions

abricate:

Mass screening of contigs for antibiotic resistance genes

Screen assemblies for antimicrobial resistance against multiple databases

report versions

abricate:

Mass screening of contigs for antibiotic resistance genes

A NATA-accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes

matches partials virulence out txt versions

abritamr:

A pipeline for running AMRfinderPlus and collating results into functional classes

Trim sequencing adapters and collapse overlapping reads

singles_truncated discarded paired_truncated collapsed collapsed_truncated paired_interleaved settings versions

Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.

fixed_fastq versions

ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).

ancestry_fractions allele_frequencies versions

Read CEL files into an ExpressionSet and generate a matrix

rds expression annotation versions

affy:

Methods for Affymetrix Oligonucleotide Arrays

Converts a GFF/GTF file into a proper GTF file

output_gtf log versions

agat:

AGAT is a toolkit for manipulating and getting information from GFF/GTF files

Converts a GFF/GTF file into a TSV file

tsv versions

agat:

AGAT is a toolkit for manipulating and getting information from GFF/GTF files

Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file

output_gff log versions

agat:

AGAT is a toolkit for manipulating and getting information from GFF/GTF files

Adds intron features to a GTF/GFF file that lacks them.

gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

This script removes features based on a kill list. By default it looks at each feature's ID: if a feature's ID (case-insensitive) is listed in the kill list, the feature is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.

gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
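
The kill-list behaviour described for this module can be illustrated with a minimal Python sketch. This is not AGAT's implementation: the `Feature` class, the `filter_by_kill_list` function and the example IDs are hypothetical names introduced only for illustration, and the sketch assumes each feature has at most one parent. It shows the three rules from the description: case-insensitive ID matching, removal of all subfeatures of a removed feature, and removal of a feature once all of its children are gone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical, simplified annotation feature (not AGAT's data model)."""
    id: str
    parent: Optional[str] = None  # ID of the parent feature; None for level1


def filter_by_kill_list(features, kill_list):
    """Return the features that survive the kill list."""
    kill = {k.lower() for k in kill_list}  # IDs are matched case-insensitively

    # Map each parent ID to the IDs of its direct children.
    children = {}
    for f in features:
        if f.parent is not None:
            children.setdefault(f.parent, []).append(f.id)

    removed = set()

    # Rule 1 and 2: remove listed features together with all their subfeatures.
    for f in features:
        if f.id.lower() in kill:
            stack = [f.id]
            while stack:
                cur = stack.pop()
                if cur not in removed:
                    removed.add(cur)
                    stack.extend(children.get(cur, []))

    # Rule 3: a feature whose children have all been removed is removed too.
    changed = True
    while changed:
        changed = False
        for parent_id, kids in children.items():
            if parent_id not in removed and all(k in removed for k in kids):
                removed.add(parent_id)
                changed = True

    return [f for f in features if f.id not in removed]


example = [
    Feature("gene1"), Feature("mrna1", "gene1"), Feature("exon1", "mrna1"),
    Feature("gene2"), Feature("mrna2", "gene2"),
    Feature("exon2", "mrna2"), Feature("exon3", "mrna2"),
]
survivors = filter_by_kill_list(example, ["GENE1", "exon2"])
print([f.id for f in survivors])  # ['gene2', 'mrna2', 'exon3']
```

In the example, `GENE1` matches `gene1` despite the different case and takes `mrna1` and `exon1` with it, while `mrna2` survives because only one of its two exons was listed.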

This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.

gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Provides different types of statistics in text format from a GFF/GTF annotation file

stats_txt versions

agat:

AGAT is a toolkit for manipulating and getting information from GFF/GTF files

Provides basic statistics in text format from a GFF/GTF annotation file

stats_txt versions

agat:

AGAT is a toolkit for manipulating and getting information from GFF/GTF files

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants

summary results_dir versions

ALE: assembly likelihood estimator.

ale versions

Generates a count of coverage of alleles

allelecount versions

A tool to parse and summarise results from antimicrobial peptides tools and present functional classification.

sample_dir txt csv faa summary_csv summary_html log results_db results_db_dmnd results_db_fasta results_db_tsv versions

A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.

0

cluster_tsv rep_cluster_tsv log versions

ampcombi2/cluster:

A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.

A submodule that merges all output summary tables from ampcombi/parsetables in one summary file.

0

tsv log versions

ampcombi2/complete:

This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

01000

sample_dir contig_gbks txt tsv faa sample_log full_log results_db results_db_dmnd results_db_fasta results_db_tsv versions

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.

01000

amps_faa amps_tsv versions

AMPlify is an attentive deep learning model for antimicrobial peptide prediction.

010

tsv versions

amplify:

Attentive deep learning model for antimicrobial peptide prediction

Post-processing script of the MaltExtract component of the HOPS package

000

json summary_pdf tsv candidate_pdfs versions

Identify antimicrobial resistance in gene or protein sequences

010

report mutation_report versions tool_version db_version

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Identify antimicrobial resistance in gene or protein sequences

NO input

db versions

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

A tool to estimate nuclear contamination in males based on heterozygosity on the X chromosome.

0101

txt versions

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates base frequency statistics across reference positions from BAM.

0123

depth_sample depth_global qs pos counts icounts versions

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates genotype likelihoods from BAM files.

010101

genotype_likelihood versions

angsd:

ANGSD: Analysis of next generation Sequencing Data

Module to subset AnnData object to cells with matching barcodes from the csv file

012

h5ad versions

anndata:

An annotated data matrix.

Get the size (n_cells or n_genes) of an anndata object stored as a h5ad file

010

size versions

anndata:

An annotated data matrix.

Annotation and Ranking of Structural Variation

012301010101

tsv unannotated_tsv vcf versions

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

NO input

annotations versions

annotsv:

Annotation and Ranking of Structural Variation

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

0123012

translated_mrna total_mrna translation buffering mrna_abundance rdata fold_change_plot interaction_p_distribution_plot residual_distribution_summary_plot residual_vs_fitted_plot rvm_fit_for_all_contrasts_group_plot rvm_fit_for_interactions_plot rvm_fit_for_omnibus_group_plot simulated_vs_obt_dfbetas_without_interaction_plot session_info versions

anota2seq:

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

0100

clusterblast_file html_accessory_files knownclusterblast_html knownclusterblast_dir knownclusterblast_txt svg_files_clusterblast svg_files_knownclusterblast gbk_input json_results log zip gbk_results clusterblastoutput html knownclusterblastoutput json_sideloading versions

antismashlite:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

000

database antismash_dir versions

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

01

extracted_reads_fastq log intermediate_sam intermediate_bam intermediate_sorted_bam versions

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

0100

tsv versions

CLI Download utility

01

downloaded_file versions

Download and prepare database for Ariba analysis

01

db versions

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Query input FASTQs against Ariba formatted databases

0101

results versions

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

0101010000

fusions fusions_fail versions

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

0

blacklist cytobands protein_domains known_fusions versions

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Simulation tool to generate synthetic Illumina next-generation sequencing reads

01000

fastq aln sam versions

art:

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using user own read error model or quality profiles.

Aggregates fastq files with demultiplexed reads

01

fastq versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Run the alignment/variant-call/consensus logic of the artic pipeline

0100000000

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Copy number profiles of tumour cells.

01234000000

allelefreqs bafs cnvs logrs metrics png purityploidy segments versions

Alignment by Simultaneous Harmonization of Layer/Adjacency Registration

0100

tif versions

Assembly summary statistics in JSON format

01

json versions

ataqv function of a corresponding ataqv tool

012300000

json problems versions

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

mkarv function of a corresponding ataqv tool

0

html versions

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

Generates a VCF file from a BAM file using various calling methods

012340000

vcf versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimate the post-mortem damage patterns of DNA

012300

empiric exponential counts table versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Gives an estimation of the sequencing bias based on known invariant sites

0123400

recal_patterns versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Splits single-end read groups by length and merges paired-end reads

01234

bam txt versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Generate tables of feature metadata from GTF files

0101

feature_annotation filtered_cdna versions

atlasgeneannotationmanipulation:

Scripts for manipulating gene annotation

Use deamination patterns to estimate contamination in single-stranded libraries

010101

txt versions

authentict:

Estimates present-day DNA contamination in ancient DNA single-stranded libraries.

Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.

0101

backsub_tif markerout versions

A bacteriophage lifestyle prediction tool

01

bacphlip_results hmmsearch_results versions

Annotation of bacterial genomes (isolates, MAGs) and plasmids

01000

embl faa ffn fna gbff gff hypotheticals_tsv hypotheticals_faa tsv txt versions

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.

Downloads BAKTA database from Zenodo

NO input

db versions

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data

012

fastq versions

bam2fastx:

Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files

Removes unused references from the header of sorted BAM/CRAM files.

01

bam versions

This module is used to clip primer sequences from your alignments.

0123

bam bai versions

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

012

primary_filtered_bam contamination_bam versions

Computes mapping statistics from a BAM file

01

json versions

bamstats:

A command line tool to compute mapping statistics from a BAM file

Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis

01

fastq versions

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

data versions

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

bam versions

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

stats versions

bamtools:

C++ API & command-line toolkit for working with BAM data

Trims the ends of reads in a SAM/BAM file, either by changing read ends to ‘N’ and quality to ‘!’ or by soft clipping

0123

bam versions

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.
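
The masking mode described above (bases to `N`, qualities to `!`) can be sketched on a single read. This is an illustration, not bamUtil's code: `mask_read_ends` is a hypothetical helper, it ignores strand and soft clipping, and it works on plain strings rather than BAM records.

```python
def mask_read_ends(seq, qual, left, right):
    """Mask `left` bases at the start and `right` bases at the end of a read.

    Masked bases become 'N' and their qualities become '!' (Phred 0).
    Hypothetical helper for illustration; strand is not considered.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    n = len(seq)
    left = min(left, n)            # never mask more than the read length
    right = min(right, n - left)   # avoid overlapping the left mask
    keep_end = n - right
    new_seq = "N" * left + seq[left:keep_end] + "N" * right
    new_qual = "!" * left + qual[left:keep_end] + "!" * right
    return new_seq, new_qual


masked = mask_read_ends("ACGTACGT", "IIIIIIII", 2, 3)
```

The read length is unchanged, which is why masking (unlike hard trimming) leaves alignment coordinates intact.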

Render an assembly graph in GFA 1.0 format to PNG and SVG image formats

01

png svg versions

bandage:

Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily

Barrnap uses an HMMER profile to find rRNAs in reads or contig FASTA files

012

gff versions

Demultiplex Element Biosciences bases files

012

sample_fastq sample_json qc_report run_stats generated_run_manifest metrics unassigned versions

BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.

01

fields versions

Align short or PacBio reads to a reference genome using BBMap

010

bam log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Adapter and quality trimming of sequencing reads

010

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Merging overlapping paired reads into a single read.

010

merged unmerged ihist versions log

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.

01

fastq log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Split sequencing reads by mapping them to multiple references simultaneously

0100010

index primary_fastq all_fastq stats log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates 30% smaller, faster gzipped FASTQ files and removes duplicates

01

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Filter out sequences by sequence header name(s)

01000

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.

0

index versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.

01

covstats hist versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Compares query sketches to reference sketches hosted on a remote server via the Internet.

01

hits versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Add or remove annotations.

012340

vcf tbi csi versions

annotate:

Add or remove annotations.

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Concatenate VCF files

012

vcf tbi csi versions

concat:

Concatenate VCF files.

Creates a consensus sequence by applying VCF variants to a reference FASTA file

01234

fasta versions

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.

Converts certain output formats to VCF

012010

vcf_gz vcf bcf_gz bcf hap legend samples versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools Haplotype-aware consequence caller

01010101

vcf tbi csi versions

csq:

Haplotype-aware consequence caller

Filters VCF files

012

vcf tbi csi versions

filter:

Apply fixed-threshold filters to VCF files.

Index VCF tools

01

csi tbi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Apply set operations to VCF files

012

results versions

isec:

Computes intersections, unions and complements of VCF files.

Merge VCF files

012010101

vcf index versions

merge:

Merge VCF files.

Generates genotype likelihoods at each genomic position with coverage

012010

vcf tbi stats mpileup versions

mpileup:

Generates genotype likelihoods at each genomic position with coverage.

Normalize VCF file

01201

vcf tbi csi versions

norm:

Normalize VCF files.

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

01200

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available

Split VCF by chunks or regions, creating multiple VCFs.

01200000

scatter tbi csi versions

pluginscatter:

Split VCF by chunks or regions, creating multiple VCFs.

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

0120000

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

Split VCF by sample, creating single- or multi-sample VCFs.

0120000

vcf tbi csi versions

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, e.g. LPL,LAD.

01200

vcf tbi csi versions

view:

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, e.g. LPL,LAD.

Extracts fields from VCF or BCF files and outputs them in user-defined format.

012000

output versions

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Reheader a VCF file

012301

vcf index versions

reheader:

Modify header of VCF/BCF files, change sample names.

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

012010000

roh versions

roh:

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

Sorts VCF files

01

vcf tbi csi versions

sort:

Sort VCF files by coordinates.

Split a vcf file into files per chromosome

012

split_vcf versions

bcftools:

Sort VCF files by coordinates.

Generates stats from VCF files

0120101010101

stats versions

stats:

Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Demultiplex Illumina BCL files

012

fastq fastq_idx undetermined undetermined_idx reports stats interop versions

Demultiplex Illumina BCL files

012

fastq fastq_idx undetermined undetermined_idx reports logs interop versions

Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.

010000

vcf log versions

beagle5:

Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.

Convert a BED file to a VCF file according to a YAML config

01201

vcf versions

Convert BAM/GFF/GTF/GVF/PSL files to bed

01

bed versions

bedops:

High-performance genomic feature operations.

Convert gtf format to bed format

01

bed versions

gtf2bed:

The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
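
The coordinate change described above comes down to subtracting one from the start and leaving the end untouched. A minimal sketch, assuming only the two coordinate columns matter (`gtf_interval_to_bed` is a hypothetical name, not part of BEDOPS):

```python
def gtf_interval_to_bed(start, end):
    """Convert a 1-based closed [start, end] interval to 0-based half-open.

    Hypothetical helper for illustration; real gtf2bed also reorders and
    sorts the remaining columns.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


bed_start, bed_end = gtf_interval_to_bed(1, 10)
```

The feature length is the same in both systems: `end - start + 1` in GTF equals `bed_end - bed_start` in BED.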

Converts a bam file to a bed12 file.

01

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

For each feature in A, finds the closest feature (upstream or downstream) in B.

0120

output versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
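
The complement operation can be sketched for a single chromosome. This is an illustration rather than the bedtools implementation: `complement` is a hypothetical function, intervals are 0-based half-open `(start, end)` tuples, and the chromosome length is passed in directly instead of being read from a genome file.

```python
def complement(intervals, chrom_size):
    """Return the regions of [0, chrom_size) not covered by any interval."""
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))
    return gaps


uncovered = complement([(10, 20), (15, 30), (50, 60)], 100)
```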

Computes both the depth and breadth of coverage of features in file B on the features in file A

0120

bed versions

bedtools:

A powerful toolset for genome arithmetic

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

012000

genomecov versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Extracts sequences from a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

01201

intersect versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Calculates the Jaccard statistic between two feature files.

01201

tsv versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
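
For interval sets, the Jaccard statistic is the number of base pairs shared by both sets divided by the number of base pairs covered by either set. A minimal single-chromosome sketch, assuming 0-based half-open `(start, end)` tuples; the function names are hypothetical and this is not the bedtools code:

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals):
    """Total number of bases covered by already merged intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(a, b):
    """Base pairs in the intersection divided by base pairs in the union."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = covered_length(a) + covered_length(b) - shared
    return shared / union if union else 0.0


score = jaccard([(0, 10)], [(5, 15)])
```

In this example 5 bases are shared out of 15 covered in total, so the statistic is one third; identical sets score 1 and disjoint sets score 0.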

Makes adjacent or sliding windows across a genome or BED file.

01

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

01201

mapped versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Masks sequences in a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.

01

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
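
The merge behaviour, including the book-ended case where one feature ends exactly where the next begins, can be sketched as follows. The `merge_features` function is a hypothetical illustration for one chromosome with 0-based half-open `(start, end)` tuples, not the bedtools implementation:

```python
def merge_features(intervals):
    """Merge overlapping or book-ended intervals into spanning intervals."""
    merged = []
    for start, end in sorted(intervals):
        # `<=` rather than `<` so that book-ended features are joined as well.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spans = merge_features([(1, 5), (5, 8), (10, 12), (11, 15)])
```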

Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Profiles the nucleotide content of intervals in a fasta file.

012

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Shifts each feature by specific number of bases

0101

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)

010

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
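
Extending a feature in each direction amounts to shifting both coordinates and clamping them to the chromosome bounds. A minimal sketch, assuming 0-based half-open coordinates and ignoring strand; `slop` here is a hypothetical function whose `left` and `right` parameters mirror the `-l` and `-r` options mentioned above:

```python
def slop(start, end, chrom_size, left=0, right=0):
    """Extend an interval by `left` and `right` bases, clamped to the chromosome."""
    new_start = max(0, start - left)
    new_end = min(chrom_size, end + right)
    return new_start, new_end


extended = slop(5, 10, 100, left=10, right=3)
```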

Sorts a feature file by chromosome and other criteria.

010

sorted versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Split BED files into several smaller BED files

012

beds versions

bedtools:

A powerful toolset for genome arithmetic

Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.

012

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
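
The subtract operation can be sketched for one chromosome: every part of an A interval that overlaps any B interval is cut out and the remaining pieces are reported. The `subtract` function below is a hypothetical illustration using 0-based half-open `(start, end)` tuples, not the bedtools code:

```python
def subtract(a, b):
    """Return the portions of each interval in `a` not overlapped by `b`."""
    result = []
    b_sorted = sorted(b)
    for start, end in a:
        cursor = start  # left edge of the part of A not yet processed
        for b_start, b_end in b_sorted:
            if b_end <= cursor:
                continue  # this B interval lies entirely to the left
            if b_start >= end:
                break     # this and all later B intervals lie to the right
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


remaining = subtract([(0, 100)], [(10, 20), (50, 60)])
```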

Combines multiple BedGraph files into a single file

0101

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats.

01

output versions

Locate and tag duplicate reads in a BAM file

01

bam metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Merge a list of sorted bam files

01

bam bam_index checksum versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Parallel sorting and duplicate marking

0101

bam bam_index cram metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Use k-mers to rapidly subtype S. enterica genomes

010

summary kmer_results simple_summary versions

Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.

010

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

010

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Summarize and/or filter reads based on bisulfite conversion rate

0120

bsconv_bam versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.

01230

epiread_bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Indexes a reference genome for use with Biscuit

0

index versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Merges methylation information for opposite-strand C's in a CpG context

010

mergecg_bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

012340

vcf versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Perform basic quality control on a BAM file generated with Biscuit

010

biscuit_qc_reports versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.

01

bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Performs alignment of BS-Seq reads using bismark

010101

bam report unmapped versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Relates methylation calls back to genomic cytosine contexts.

010101

coverage report summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Removes alignments to the same position in the genome from the Bismark mapping output.

01

bam report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.

01

index versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.
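
The two bisulfite-converted versions of a genome are a C-to-T converted copy and a G-to-A converted copy. A minimal sketch on a single sequence string (`bisulfite_convert` is a hypothetical helper; the real module also writes the converted genomes to disk and builds aligner indexes):

```python
def bisulfite_convert(sequence):
    """Return the (C->T, G->A) in-silico converted versions of a sequence."""
    upper = sequence.upper()
    c_to_t = upper.replace("C", "T")  # models conversion on the forward strand
    g_to_a = upper.replace("G", "A")  # models conversion on the reverse strand
    return c_to_t, g_to_a


converted = bisulfite_convert("ACGTcg")
```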

Extracts methylation information for individual cytosines from alignments.

0101

bedgraph methylation_calls coverage report mbias versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Collects bismark alignment reports

01234

report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

00000

summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Retrieve entries from a BLAST database

01201

fasta text versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

txt versions

blast:

BLAST finds regions of similarity between biological sequences.

BLASTP (Basic Local Alignment Search Tool- Protein) compares an amino acid (protein) query sequence against a protein database

01010

xml tsv csv versions

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

Builds a BLAST database

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

txt versions

blast:

Protein to Translated Nucleotide BLAST.

Downloads a BLAST database from NCBI

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a sequence subject

0101

psl versions

Align reads to a reference genome using bowtie

01010

bam log fastq versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create bowtie index for reference genome

01

index versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

01010100

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie index for reference genome

01

index versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

010

reports txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

01

db bracken_files versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

01

txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Benchmarking Universal Single Copy Orthologs

meta fasta mode lineage busco_lineages_path config_file

meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_proteins versions

Benchmarking Universal Single Copy Orthologs

010000

batch_summary short_summaries_txt short_summaries_json full_table missing_busco_list single_copy_proteins seq_dir translated_dir busco_dir versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

BUSCO plot generation tool

0

png versions

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Find SA coordinates of the input reads for bwa short-read mapping

0101

sai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA index for reference genome

01

index versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

bam cram csi crai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-mem2 index for reference genome

01

index versions

bwamem2:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

sam bam cram crai csi versions

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-MEME index for reference genome

01

index versions

bwameme:

Faster BWA-MEM2 using learned-index

Performs fastq alignment to a fasta reference using BWA-MEME

010101000

sam bam cram crai csi versions

bwameme:

Faster BWA-MEM2 using learned-index

Performs alignment of BS-Seq reads using bwameth

010101

bam versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Performs indexing of c2t converted reference genome

01

index versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome.

010

tsv versions

Analysis of gene family evolution

010

cafe versions cafe_base_count cafe_significant_trees cafe_report cafe_results

Hierarchical Hi-C compartment computation

010

output_folder intermediate_data_folder versions

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

0100

report assembly contigs corrected_reads corrected_trimmed_reads metadata contig_position contig_info versions

A module for concatenation of gzipped or uncompressed files

01

file_out versions

cat:

Just concatenation
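Concatenating gzipped files works without decompressing them first, because a gzip stream may consist of several members placed back to back. The sketch below illustrates that property with Python's standard library only. The helper name `concat_gzip` and the sample records are hypothetical, and this is not the module's actual implementation.

```python
import gzip


def concat_gzip(chunks):
    """Join already-compressed gzip byte strings without recompressing.

    Hypothetical helper: the result is a multi-member gzip stream.
    """
    return b"".join(chunks)


# Two independently compressed FASTQ-like records (made-up data).
part_a = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
part_b = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

merged = concat_gzip([part_a, part_b])

# Decompressing the merged stream yields both records in order.
print(gzip.decompress(merged).decode())
```

Mixing compressed and uncompressed inputs would not work with plain byte concatenation, so a real workflow has to decide on one output form first.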

Concatenates fastq files

01

reads versions

cat:

The cat utility reads files sequentially, writing them to the standard output.

Cluster protein sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Cluster nucleotide sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth

01000

celltypes quality versions

Module to use CellBender to remove ambient RNA from single-cell RNA-seq data

0123

h5ad versions

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data

01

h5 filtered_h5 posterior_h5 barcodes metrics report pdf log checkpoint versions

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

cellpose segments cells in images

010

mask flows versions

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.

010

outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.

012

fastq undetermined_fastq reports stats interop versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.

0

gtf versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.

000

reference versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.

0000

reference versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.

00101010101010000000000000

config outs versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.

010

outs versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.

01230

outs lib versions

cellrangerarc:

Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.

00

versions fastq

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.

0

gtf versions

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.

00000

reference config versions

cellrangerarc:

Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.

Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.

010

outs versions

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.

00

versions fastq

cellranger-atac:

Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.

00000

reference versions

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Cellsnp-lite is a C/C++ tool for efficient genotyping of bi-allelic SNPs in single cells. Mode A of cellsnp-lite can be used after read alignment to obtain SNP x cell pileup matrices of UMI or read counts for each allele of given or detected SNPs in droplet-based single-cell data.

01234

base cell sample allele_depth depth_coverage depth_other versions

cellsnp:

Efficient genotyping of bi-allelic SNPs in single cells

Build centrifuge database for taxonomic profiling

010000

cf versions

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

01000

report results sam fastq_mapped fastq_unmapped versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

010

kreport versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

0100

checkm_output marker_file checkm_tsv versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

01230

output fasta versions

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 database download

0

database versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

CheckM2 bin quality prediction

0101

checkm2_output checkm2_tsv versions

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

A simple program to parse Illumina NGS data and check it for quality criteria

010

report versions

Construct the database necessary for checkv's quality assessment

NO input

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Assess the quality of metagenome-assembled viral genomes.

010

quality_summary completeness contamination complete_genomes proviruses viruses versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

010

checkv_db versions

checkv:

Assess the quality of metagenome-assembled viral genomes.

Create a schema to determine the allelic profiles of a genome

0100

schema cds_coordinates invalid_cds versions

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Filter and trim long read data.

010

fastq versions

zcat:

zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

0101010000

bed bam tagAlign pairs versions

chromap:

Fast alignment and preprocessing of chromatin profiles

Indexes a fasta reference genome ready for chromatin profiling.

01

index versions

chromap:

Fast alignment and preprocessing of chromatin profiles

Chromograph is a python package to create PNG images from genetics data such as BED and WIG files.

01010101010101

plots versions

Annotate circRNAs detected in the output from CIRCexplorer2 parse

0100

txt versions

circexplorer2:

Circular RNA analysis toolkit

CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.

01

junction versions

circexplorer2:

Circular RNA analysis toolkit

A method to improve mappings on circular genomes, using the BWA mapper.

010101

fasta elongated versions

circulargenerator:

Creates a modified reference genome, elongated by a specified number of bases

Realign reads mapped with BWA to elongated reference genome

01010101

bam versions

circularmapper:

A method to improve mappings on circular genomes such as Mitochondria.

binning of metagenomic sequences

01

fasta bins fm index links result versions

Runs the Clippy CLIP peak caller

0100

peaks summits intergenic_gtf versions

Predict recombination events in bacterial genomes

012

emsim em status newick fasta pos_ref versions

Align sequences using Clustal Omega

01010

alignment versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

Renders a guide tree with clustalo

01

tree versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

Calculates polymorphic site rates over protein coding genes

01234

polymut versions

cmseq:

Set of utilities on sequences and BAM files

Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.

0101

bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Derive off-target (“antitarget”) bins from target regions.

01

bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

012010101010

bed cnn cnr cns pdf png versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number

012

cns versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
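As an illustration of what turning a log2 ratio into an integer copy number involves, the sketch below uses one simple conversion: scale the reference ploidy by 2 to the power of the log2 ratio, then round. It is a minimal sketch that assumes a pure sample and a fixed ploidy. The function name `absolute_copy_number` is hypothetical, and this is not necessarily the method the module applies by default.

```python
def absolute_copy_number(log2_ratio: float, ploidy: int = 2) -> int:
    """Estimate an integer copy number from a segment's log2 ratio.

    Assumption: 100% sample purity and a known reference ploidy.
    A log2 ratio of 0 means "same as reference", i.e. `ploidy` copies.
    """
    estimate = ploidy * 2 ** log2_ratio
    # Copy number cannot be negative; round to the nearest integer.
    return max(0, round(estimate))


# Made-up segment log2 ratios, for illustration only.
for ratio in (-1.0, 0.0, 1.0):
    print(ratio, absolute_copy_number(ratio))
```

Real samples usually contain normal-cell contamination, which shrinks observed log2 ratios toward 0, so a purity correction is normally needed before rounding.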

Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.

01

output versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

012

tsv cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Compile a coverage reference from the given files (normal samples).

000

cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Transform bait intervals into targets more suitable for CNVkit.

0101

bed versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.

012010101

root tab versions

cnvnator:

Tool for calling copy number variations.

convert2vcf.pl is a command line tool to convert CNVnator calls to VCF format.

01

vcf versions

cnvnator:

Tool for calling copy number variations.

Command line tool for calling CNVs in whole genome sequencing data

010

pytor versions

cnvpytor:

calling CNVs using read depth

calculates read depth histograms

010

pytor versions

cnvpytor:

calling CNVs using read depth

command line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.

01200

pytor versions

cnvpytor -rd:

calling CNVs using read depth

partitioning read depth histograms

010

pytor versions

cnvpytor:

calling CNVs using read depth

view function to generate vcfs

0100

vcf tsv xls versions

cnvpytor:

calling CNVs using read depth

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.

01010101000

self_circular extended_circular extended_partial extended_failed orphan_end all_cobra_assemblies joining_summary log versions

cobra-meta:

COBRA is a tool to get higher quality viral genomes assembled from metagenomes.

Builds a classic bloom filter COBS index

01

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Builds a compact bloom filter COBS index

01

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

args_txt clustering_csv log_txt original_data_csv pca_components_csv pca_transformed_csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Generate the input coverage table for CONCOCT using a BED file

0123

tsv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Cut up fasta file in non-overlapping or overlapping parts of equal length.

010

fasta bed versions

concoct:

Clustering cONtigs with COverage and ComposiTion
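The idea of cutting a sequence into fixed-length parts, with or without overlap, can be sketched as a sliding window. The example below is a simplified illustration, not the CONCOCT script itself. The function name `cut_up` and its parameters are hypothetical, and the last part is simply left shorter, which may differ from how the real tool treats the tail of a contig.

```python
def cut_up(seq: str, chunk_size: int, overlap: int = 0):
    """Split `seq` into windows of `chunk_size` bases.

    Consecutive windows share `overlap` bases. Returns a list of
    (start, subsequence) tuples. Assumption: a trailing window that is
    shorter than `chunk_size` is kept as-is.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    parts = []
    for start in range(0, len(seq), step):
        parts.append((start, seq[start:start + chunk_size]))
        # Stop once a window has reached the end of the sequence.
        if start + chunk_size >= len(seq):
            break
    return parts


print(cut_up("ABCDEFGHIJ", 4))             # non-overlapping parts
print(cut_up("ABCDEFGHIJ", 4, overlap=2))  # overlapping parts
```

Recording the start coordinate of each part is what makes it possible to map results on the pieces back to the original contig later.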

Creates a FASTA file for each new cluster assigned by CONCOCT

012

fasta versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Merge consecutive parts of the original contigs that were cut up by cut_up_fasta.py

01

csv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Calculate confidence scores from Kraken2 output

010

score versions

Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC

012

p_value_txt versions

controlfreec/assesssignificance:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Copy number and genotype annotation from whole genome and whole exome sequencing data

0123456000000000

bedgraph control_cpn sample_cpn gcprofile_cpn BAF CNV info ratio config versions

controlfreec/freec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

01

bed versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Format Freec output to circos input format

01

circos versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

0123

png_baf png_ratio_log2 png_ratio versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

012

png_baf png_ratio_log2 png_ratio versions

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Run matrix balancing on a cool file

012

cool versions

cooler:

Sparse binary format for genomic interaction matrices

Create a cooler from genomic pairs and bins

01230

cool versions

cooler:

Sparse binary format for genomic interaction matrices

Generate fragment-delimited genomic bins

000

bed versions

cooler:

Sparse binary format for genomic interaction matrices

Dump a cooler’s data to a text stream.

012

bedpe versions

cooler:

Sparse binary format for genomic interaction matrices

Generate fixed-width genomic bins

012

bed versions

cooler:

Sparse binary format for genomic interaction matrices

Merge multiple coolers with identical axes

01

cool versions

cooler:

Sparse binary format for genomic interaction matrices

Generate a multi-resolution cooler file by coarsening

01

mcool versions

cooler:

Sparse binary format for genomic interaction matrices

Indexes a directory of fasta files for use with CoPTR

01

index_dir versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.

01

cores masks tma_map centroids versions

Map reads to contigs and estimate coverage

010100

coverage versions

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

Compress files with crabz

01

archive versions

crabz:

Like pigz, but rust

Decompress files with crabz

01

file versions

crabz:

Like pigz, but rust

Remove false positives caused by CNVs from CRISPR functional genomics data

01200

norm_count_file versions

crisprcleanr:

Analysis of CRISPR functional genomics data, removing false positives due to CNVs.

Controllable lossy compression of BAM/CRAM files

0100

bam cram sam bed versions

Concatenate two or more CSV (or TSV) tables into a single table

0100

csv versions

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Join two or more CSV (or TSV) tables by selected fields into a single table

01

csv versions

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Splits CSV/TSV into multiple files according to column values

0100

split_csv versions

csvtk:

CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.

Custom module to add a new fasta file to an old one and update an associated GTF

012010

fasta gtf versions

custom:

Custom module to add a new fasta file to an old one and update an associated GTF

Custom module used to dump software versions within the nf-core pipeline template

0

yml mqc_yml versions

custom:

Custom module used to dump software versions within the nf-core pipeline template

Filters a differential expression table based on logFC and adjusted p-value thresholds

010000

filtered versions

pandas:

Python library for data manipulation and analysis
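A threshold filter of this kind can be sketched with the standard library alone. The example below assumes the table has already been read into a list of dictionaries. The column names `log2FoldChange` and `padj`, the default thresholds, and the use of the absolute fold change are all assumptions for illustration, not the module's documented behaviour.

```python
import math


def filter_differential(rows, min_abs_logfc=1.0, max_padj=0.05,
                        logfc_col="log2FoldChange", padj_col="padj"):
    """Keep rows whose |logFC| and adjusted p-value pass both thresholds.

    Rows with missing or non-numeric values (e.g. "NA") are dropped.
    Column names and defaults are hypothetical.
    """
    kept = []
    for row in rows:
        try:
            logfc = float(row[logfc_col])
            padj = float(row[padj_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or unparsable value
        if math.isnan(logfc) or math.isnan(padj):
            continue
        if abs(logfc) >= min_abs_logfc and padj <= max_padj:
            kept.append(row)
    return kept


# Made-up rows, for illustration only.
table = [
    {"gene_id": "g1", "log2FoldChange": "2.3", "padj": "0.001"},
    {"gene_id": "g2", "log2FoldChange": "0.4", "padj": "0.001"},
    {"gene_id": "g3", "log2FoldChange": "-1.8", "padj": "0.2"},
    {"gene_id": "g4", "log2FoldChange": "-3.1", "padj": "NA"},
]
print([row["gene_id"] for row in filter_differential(table)])
```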

Generates a FASTA file of chromosome sizes and a fasta index file

01

sizes fai gzi versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

0101

gtf versions

gtffilter:

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

Filter a matrix based on a minimum value and the number of samples that must pass.

0101

filtered tests session_info versions

matrixfilter:

Filter a matrix based on a minimum value and the number of samples
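The filtering rule described here (keep a feature only if enough samples reach a minimum value) can be sketched as follows. This is an illustration under stated assumptions: the matrix is held as a dictionary of feature name to per-sample values, and the function name `filter_matrix` and its parameters are hypothetical rather than taken from the module.

```python
def filter_matrix(matrix, min_value, min_samples):
    """Keep features where at least `min_samples` values are >= `min_value`.

    `matrix` maps a feature name to a list of per-sample values.
    """
    kept = {}
    for feature, values in matrix.items():
        passing = sum(1 for value in values if value >= min_value)
        if passing >= min_samples:
            kept[feature] = values
    return kept


# Made-up counts for three samples, for illustration only.
counts = {
    "geneA": [0, 5, 12],
    "geneB": [0, 0, 1],
    "geneC": [10, 10, 10],
}
print(list(filter_matrix(counts, min_value=5, min_samples=2)))
```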

Test for the presence of suitable NCBI settings or create them on the fly.

NO input

ncbi_settings versions

sratools:

SRA Toolkit and SDK from NCBI

Make a GSEA class file (.cls) from tabular inputs

01

cls versions

custom:

Make a GSEA class file (.cls) from tabular inputs

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

01

gct versions

tabulartogseagct:

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.

0101000

tx2gene versions

custom:

Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications

Perform adapter/quality trimming on sequencing reads

01

reads log versions

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

structural-variant calling with cutesv

01201

vcf versions

A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage

01000

results versions

DAS Tool binning step.

01200

log summary contig2bin eval bins pdfs fasta_proteins candidates_faa fasta_archaea_scg fasta_bacteria_scg b6 seqlength versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

010

fastatocontig2bin versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

010

scaffolds2bin versions

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.

0

report versions

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take sign and weight into account. It can be used with any omics data type, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phosphoproteomics.

010

dc_estimate dc_pvals versions

DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).

01

bam json hist log versions

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

NO input

db versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

0120

daa daa_tsv arg potential_arg versions

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.

NO input

db versions

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.

010

readme log json bgc_gbk bgc_tsv full_gbk pfam_tsv bgc_png pr_png roc_png score_png versions

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

Deepcell/mesmer segmentation for whole-cell

0101

mask versions

mesmer:

Deep cell is a collection of tools to segment imaging data

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

vcf vcf_tbi gvcf gvcf_tbi versions

A Deep Learning Model for Transmembrane Topology Prediction and Classification

01

gff3 line3 md csv png versions

This tool filters alignments in a BAM/CRAM file according to the specified parameters.

012

bam logs versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.

01200

bigwig bedgraph versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

calculates scores per genome regions for other deeptools plotting utilities

010

matrix table versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Computes read coverage for genomic regions (bins) across the entire genome.

0123

matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

0100

pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots cumulative reads coverages by BAM file

012

pdf matrix metrics versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a heatmap

01

pdf table versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

01

pdf tab versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a profile plot

01

pdf table versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

Call variants from the examples produced by make_examples

01

call_variants_tfrecords versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Transforms the input alignments to a format suitable for the deep neural network variant caller

012301010101

examples gvcf versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012010101

vcf vcf_tbi gvcf gvcf_tbi versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

vcf vcf_tbi gvcf gvcf_tbi versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01

report versions

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Call structural variants

0123450101

bcf csi versions

delly:

Structural variant discovery by integrated paired-end and split-read analysis

Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.

0120000

zarr out_zarr versions

runs a differential expression analysis with DESeq2

01230120101

results dispersion_plot rdata size_factors normalised_counts rlog_counts vst_counts model session_info versions

deseq2:

Differential gene expression analysis based on the negative binomial distribution

Queries a DIAMOND database using blastp mode

010100

blast xml txt daa sam tsv paf versions

diamond:

Accelerated BLAST compatible local sequence aligner

Queries a DIAMOND database using blastx mode

010100

blast xml txt daa sam tsv paf log versions

diamond:

Accelerated BLAST compatible local sequence aligner

calculate clusters of highly similar sequences

01

tsv versions

diamond:

Accelerated BLAST compatible local sequence aligner

Builds a DIAMOND database

01000

db versions

diamond:

Accelerated BLAST compatible local sequence aligner

Doublet detection in single-cell RNA-seq data

01

h5ad predictions versions

Performs fastq alignment to a reference using DRAGMAP

0101010

sam bam cram crai csi log versions

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

01

hashmap versions

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Assemble bacterial isolate genomes from Nanopore reads

012

contigs log raw_contigs gfa txt versions

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

01234500

vcf versions

Assessment of duplication rates in RNA-Seq datasets

0101

scatter2d boxplot hist dupmatrix intercept_slope multiqc session_info versions

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

vcf tbi versions

In silico prediction of E. coli serotype

01

log tsv txt versions

Fast genome-wide functional annotation through orthology assignment.

010001

annotations orthologs hits versions

Convert any PEP project or Nextflow samplesheet to any format

000

versions samplesheet_converted

eido:

Convert any PEP project or Nextflow samplesheet to any format

Validate samplesheet or PEP config against a schema

000

versions log

validate:

Validate samplesheet or PEP config against a schema.

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

0123

tsv json versions

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.

tool for detection and quantification of large mtDNA rearrangements.

0120

deletions genes circos versions

Convert a file in FASTA format to the ELFASTA format

01

elfasta log versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

012345601010100000

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split bam/sam chunks in one file

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.

01

consensus versions

emboss:

The European Molecular Biology Open Software Suite

the revseq program from emboss reverse complements a nucleotide sequence

01

revseq versions

emboss:

The European Molecular Biology Open Software Suite

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

010

outseq versions

emboss:

The European Molecular Biology Open Software Suite

EMM typing of Streptococcus pyogenes assemblies

01

tsv versions

endorS.py calculates endogenous DNA from samtools flagstat files and prints the result to screen

0123

json versions

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

0123

cache versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

010

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

0120000010

vcf tab json report versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Searches a term in a public NCBI database

010

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using Unique Identifier(s)

0120

xml versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using an UID

01000

txt versions

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

phylogenetic placement of query sequences in a reference tree

012300

epang jplace log versions

epang:

Massively parallel phylogenetic placement of genetic sequences

splits an alignment into reference and query parts

012

query reference versions

epang:

Massively parallel phylogenetic placement of genetic sequences

estimation of the unfolded site frequency spectrum

0123

sfs_out pvalues_out versions

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

01

dropset okayset versions

evigene:

EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.

Estimate repeat sizes using NGS data

012010101

vcf json bam versions

Merge STR profiles into a multi-sample STR profile

010101

merged_profiles versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Compute genome-wide STR profile

0120101

locus_tsv motif_tsv str_profile versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Run falco on sequenced reads

01

html txt versions

fastqc:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

Aligns sequences using FAMSA

01010

alignment versions

famsa:

Algorithm for large-scale multiple sequence alignments

Renders a guidetree in famsa

01

tree versions

famsa:

Algorithm for large-scale multiple sequence alignments

Perform adapter and quality trimming on sequencing reads with reporting

01

reads stats debug statspdf reads_fail reads_unpaired log versions

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

Alignment-free computation of average nucleotide Identity (ANI)

010

ani versions

Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure.

01

success_log error_log versions

fasta_validate:

Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!

Quickly compute statistics over a fasta file in windows.

01

freq mononuc dinuc trinuc tetranuc versions

A fast K-mer counter for high-fidelity shotgun datasets

01

hist ktab prof versions

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A fast K-mer counter for high-fidelity shotgun datasets

01

hist versions

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A tool to merge FastK histograms

0123

hist ktab prof versions

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

Distance-based phylogeny with FastME

012

nwk stats matrix bootstrap versions

Perform adapter/quality trimming on sequencing reads

010000

reads json html log reads_fail reads_merged versions

Run FastQC on sequenced reads

01

html zip versions

fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.

01

tsv versions

FASTQ summary statistics in JSON format

01

json versions

Build fastq screen config file from bowtie index files

00

database versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Align reads to multiple reference genomes using fastq-screen

010

txt png html fastq versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Produces a Newick format phylogeny from a multiple sequence alignment. Capable of bacterial genome size alignments.

0

phylogeny versions

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

01

fasta versions

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
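
As a rough illustration of what "collapsing identical sequences while maintaining read counts" means, here is a minimal Python sketch. It is not the fastx implementation: the function names and the `rank-count` header convention are assumptions made for this example only.

```python
from collections import Counter


def collapse_sequences(sequences):
    """Collapse identical sequences, keeping how many reads each one represents."""
    counts = Counter(sequences)
    # Most abundant first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Hypothetical naming scheme: "<rank>-<read count>".
    return [(f"{rank}-{count}", seq) for rank, (seq, count) in enumerate(ranked, start=1)]


def to_fasta(records):
    """Render (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_sequences(reads)
print(to_fasta(collapsed), end="")
```

Each unique sequence appears once in the output, and the read count survives in the record name, so downstream tools can still weight sequences by abundance.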

Run NCBI's FCS adaptor on assembled genomes

01

cleaned_assembly adaptor_report log pipeline_args skipped_trims versions

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.

010

fcs_gx_report taxonomy_report versions

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Fetches the NCBI FCS-GX database using a provided manifest URL

0

database versions

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to screen and remove foreign contamination from genome assemblies

01200

fcsgx_report taxonomy_report log hits versions

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

A command line tool that makes it easier to find sequencing data from the SRA / GEO / ENA.

0

json versions

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

0100

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Collects a suite of metrics to QC duplex sequencing data.

010

family_sizes duplex_family_sizes duplex_yield_metrics umi_counts duplex_qc duplex_umi_counts versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

r-ggplot2:

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

Using the fgbio tools, converts sequenced FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads

01

bam cram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

0101000

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

010

bam histogram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.

01

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM

01010101

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Filtlong filters long reads based on quality measures or short read data.

012

reads log versions

Perform merging of mate paired-end sequencing reads

01

merged notcombined histogram versions

De novo assembler for single molecule sequencing reads

010

fasta gfa gv txt log json versions

Efficient compression tool for protein structures

01

fcz versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Decompression tool for foldcomp compressed structures

01

pdb versions

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Aligns protein structures using foldmason

010

msa_3di msa_aa versions

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Create a database from protein structures

01

db versions

foldseek:

Foldseek: fast and accurate protein structure search

Search for protein structural hits against a foldseek database of protein structures

0101

aln versions

foldseek:

Foldseek: fast and accurate protein structure search

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

0

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

fq lint is a FASTQ file pair validator.

01

lint versions

fq:

fq is a library to generate and validate FASTQ file pairs.

fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.

01

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Demultiplex fastq files

012

sample_fastq metrics most_frequent_unmatched versions

A haplotype-based variant detector

0123450101010101

vcf versions

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

012000

lineages summarized versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
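
The two-stage resampling described for the bootstrap module can be sketched in plain Python. This is an illustrative sketch, not Freyja's code: the function name, the input layout (site to per-base read counts), and the choice to use the depth resampled in the first stage as the number of trials in the second stage are all assumptions.

```python
import random
from collections import Counter


def bootstrap_sites(site_base_counts, seed=0):
    """One bootstrap replicate of per-site base counts (hypothetical data layout)."""
    rng = random.Random(seed)
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total = sum(depths)

    # Stage 1: multinomial draw of the total reads across sites, with event
    # probabilities proportional to each site's share of the total depth.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total))

    # Stage 2: per site, multinomial draw of bases with probabilities given by
    # the observed base frequencies (a binomial when a site has two bases).
    resampled = {}
    for site in sites:
        bases = list(site_base_counts[site])
        freqs = [site_base_counts[site][b] for b in bases]
        n = new_depths[site]
        draws = rng.choices(bases, weights=freqs, k=n) if n else []
        resampled[site] = dict(Counter(draws))
    return resampled


observed = {101: {"A": 90, "G": 10}, 202: {"C": 40}, 303: {"T": 25, "C": 25}}
replicate = bootstrap_sites(observed, seed=42)
print(replicate)
```

Repeating this with different seeds gives a set of replicates, and demixing each replicate would then yield a spread of lineage abundance estimates.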

specify the relative abundance of each known haplotype

01200

demix versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

downloads new versions of the curated SARS-CoV-2 lineage file and barcodes

0

barcodes lineages_topology lineages_meta versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Calls variants and reports sequencing depth information for each variant

010

variants versions

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Cluster genome FASTA files by average nucleotide identity

0123

tsv dereplicated_bins versions

Gene Allele Mutation Microbial Assessment

010

gamma psl gff fasta versions

gamma:

Tool for Gene Allele Mutation Microbial Assessment

GangSTR is a tool for genome-wide profiling tandem repeats from short reads.

012300

vcf samplestats versions

Build ganon database using custom reference sequences.

01000

db info versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

010

tre report one all unc log versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

010

tre versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

01

txt versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

assigns taxonomy to query sequences in phylogenetic placement output

012

examineassign profile labelled_tree per_query krona sativa versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Grafts query sequences from phylogenetic placement on the reference tree

01

newick versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

colours a phylogeny with placement densities

01

newick nexus phyloxml svg colours log versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Performs local realignment around indels to correct for mapping errors

012301010101

bam versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

01201010101

intervals versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

01201010101010101

vcf versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotates intervals with GC content, mappability, and segmental-duplication content

0101010101010101

annotated_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

012340101010

csv versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

012300000

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

0101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

estimates the parameters for the DRAGstr model

0120000

dragstr_model versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

0123400000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

0123010101

hdf5 tsv versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
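
The counting rule stated for the read-count module (a read contributes to an interval when its start lies inside that interval) can be sketched as follows. This is a simplified illustration rather than GATK code: it assumes a single contig, sorted non-overlapping intervals, and half-open `(start, end)` coordinates, and the function name is hypothetical.

```python
from bisect import bisect_right


def count_read_starts(intervals, read_starts):
    """Count read starts per interval.

    intervals: sorted, non-overlapping (start, end) pairs, end exclusive (assumed).
    read_starts: iterable of read start positions on the same contig.
    """
    starts = [start for start, _ in intervals]
    counts = [0] * len(intervals)
    for pos in read_starts:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            counts[i] += 1
    return counts


intervals = [(0, 100), (100, 200), (300, 400)]
read_starts = [5, 99, 100, 150, 250, 399, 400]
counts = count_read_starts(intervals, read_starts)
print(counts)
```

Reads starting in the gap between intervals (250) or exactly at an exclusive end (400) are not counted, which is why only five of the seven starts contribute.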

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

01234000

split_read_evidence split_read_evidence_index paired_end_evidence paired_end_evidence_index site_depths site_depths_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

012000

combined_gvcf versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the DRAGstr model during single-sample auto-calibration with CalibrateDragstrModel.

000

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges adjacent DepthEvidence records

012000

condensed_evidence condensed_evidence_index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

01

pon versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

01

dict versions

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals containing germline and artifactual sites for use with mutect2.

01010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Denoises read counts to produce denoised copy ratios

0101

standardized denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

0123010

calls model versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the numbers of unique molecules in a sequencing library.

01000

metrics versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts FastQ file to SAM/BAM format

01

bam versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

010101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

012300000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers scattered BQSR recalibration reports into a single file

01

table versions

gatk4:

Genome Analysis Toolkit (GATK4)

write your description here

010

table versions

gatk4:

Genome Analysis Toolkit (GATK4)

Merges GVCFs from multiple samples for use in joint genotyping or somatic panel-of-normals creation.

012345000

genomicsdb updatedb intervallist versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

012340101010101

vcf tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

01234

cohortcalls cohortmodel casecalls versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

012301010100

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

012340101010101

vcf tbi bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an index for a feature file, e.g. VCF or BED file.

01

index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Converts a Picard IntervalList file to a BED file.

01

bed versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits the interval list file into unique, equally-sized interval files and places them in a directory

01

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts

01

artifactprior versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

0123000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

0100

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

0120101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges mutect2 stats generated on different intervals/regions

01

stats versions

gatk4:

Genome Analysis Toolkit (GATK4)

Merges several vcf files

0101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

01230101010000

vcf tbi stats f1r2 versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

0123

intervals segments denoised versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Prepares bins for coverage collection.

0101010101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

012010101

bam cram sam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING: this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. The outputs are a file containing the locations and orientations of read pairs marked as discordant, and a file containing the clipping locations of all soft-clipped reads and the orientation of the clipping.

0120000

printed_evidence printed_evidence_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

012300000

vcf versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

01

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

01

fastq versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

0123

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a fasta with the bases shifted by offset

010101

shift_fa shift_fai shift_back_chain dict intervals shift_intervals versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence

01201000

baf baf_tbi versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits CRAM files efficiently by taking advantage of their container based structure

01

split_crams versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Split intervals into sub-interval files.

01010101

split_intervals versions

gatk4:

Genome Analysis Toolkit (GATK4)

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123000

annotated_vcf index versions

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

0120000

clustered_vcf clustered_vcf_index versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

012010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module, because the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

012000000

recal idx tranches plots versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

012345010101

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

012300000

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01000

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.

010

output versions

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

0120

genes features clusters gbk json versions

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

Convert a mappability file to bedgraph format

0101

bedgraph sizes versions

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

01

index log versions

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Define the mappability of a reference

010

map versions

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

01

index info versions

gem3:

The GEM indexer (v3).

Performs fastq alignment to a fasta reference using gem3-mapper

01010

bam versions

gem3:

The GEM indexer (v3).

A derivative of GenomeScope2.0 modified to work with FastK

01

linear_plot log_plot model summary transformed_linear_plot transformed_log_plot kmer_cov versions

Creates an index file for genmap

01

index versions

genmap:

Ultra-fast computation of genome mappability.

Creates mappability files for a genome

0101

wig bedgraph txt csv versions

genmap:

Ultra-fast computation of genome mappability.

Annotates regions, frequencies, and CADD scores

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score compounds

01

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Annotates models of inheritance

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Score the variants of a vcf based on their annotation

0120

vcf versions

genmod:

Annotate genetic inheritance models in variant files

Download geNomad databases and related files

NO input

genomad_db versions

genomad:

Identification of mobile genetic elements

Identify mobile genetic elements present in genomic assemblies

010

aggregated_classification taxonomy provirus compositions calibrated_classification plasmid_fasta plasmid_genes plasmid_proteins plasmid_summary virus_fasta virus_genes virus_proteins virus_summary versions

genomad:

Identification of mobile genetic elements

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

01

linear_plot_png transformed_linear_plot_png log_plot_png transformed_log_plot_png model summary lookup_table fitted_histogram_png versions

Genotype Salmonella Typhi from Mykrobe results

01

tsv versions

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

Peak-calling for ChIP-seq and ATAC-seq enrichment experiments

0120

peak versions bedgraph_pvalues bedgraph_pileup bed_intervals duplicates

geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA

0

samples versions

Retrieves GEO data from the Gene Expression Omnibus (GEO)

01

rds expression annotation versions

geoquery:

Get data from NCBI Gene Expression Omnibus (GEO)

Downloads databases needed for running getorganelle

0

db versions

getorganelle:

Get organelle genomes from genome skimming data

Assembles organelle genomes from genomic data

0101

fasta etc versions

getorganelle:

Get organelle genomes from genome skimming data

Collapse walk-preserving shared affixes in variation graphs in GFA format

01

gfa affixes versions

A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

010000000

assembly_summary assembly versions

Converts GFA or rGFA files to FASTA

01

fasta versions

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Summary statistics for GFA files

01

stats versions

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Compare, merge, annotate and estimate accuracy of generated gtf files

0101201

annotated_gtf combined_gtf tmap refmap loci stats tracking versions

Validate, filter, convert and perform various other operations on GFF files

010

gtf gffread_gff gffread_fasta versions

gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

01

files output versions

gget:

gget enables efficient querying of genomic databases

Defines chunks where to run imputation

0123

chunk_chr versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.

01234567000

errors_cal errors_grp errors_spl rsquare_grp rsquare_spl versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
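
The r2 value mentioned in the concordance entry above is a squared Pearson correlation between imputed dosages and high-confidence genotype calls. The sketch below shows that calculation for a single set of sites. It is an illustration only, not GLIMPSE's own code: the function name `dosage_r2` is hypothetical, and the per-MAF-bin grouping described in the entry is left out.

```python
import math


def dosage_r2(imputed, truth):
    """Squared Pearson correlation between two equal-length sequences.

    imputed: imputed dosages, e.g. values in [0, 2] (assumed encoding).
    truth:   high-confidence genotype calls coded as 0, 1 or 2 (assumed).
    """
    n = len(imputed)
    if n == 0 or n != len(truth):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either side has no variance.
        return math.nan
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    print(dosage_r2([0.1, 0.9, 1.8, 0.0], [0, 1, 2, 0]))
```

To reproduce a per-bin report, one would group sites by minor allele frequency first and call the function once per group.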

Concatenates imputation chunks in a single VCF/BCF file ligating phased information.

012

merged_variants versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Main GLIMPSE algorithm; performs phasing and imputation by refining genotype likelihoods

012345678

phased_variants versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Generates haplotype calls by sampling haplotype estimates

01

haplo_sampled versions

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Defines chunks where to run imputation

012340

chunk_chr versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Program to compute the genotyping error rate at the sample or marker level.

0123456780123400

errors_cal errors_grp errors_spl rsquare_grp rsquare_spl rsquare_per_site versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Ligation of multiple phased BCF/VCF files into a single whole chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing.

012

merged_variants versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool for imputation and phasing from vcf file or directly from bam files.

0123456789012

phased_variants stats_coverage versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool to create a binary reference panel for quick reading time.

0123401

bin_ref versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Merges gVCF files and performs joint variant calling

0101

bcf versions

GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).

0120000

barcodes matrix features classification_report config_report summary_report versions

Writes a sorted concatenation of file/s

01

sorted versions

sort:

Writes a sorted concatenation of file/s

Split a file into consecutive or interleaved sections

01

split versions

gnu:

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.

Query metadata for any taxon across the tree of life.

012

taxonsearch versions

goat:

goat-cli is a command line interface to query the Genomes on a Tree Open API.

Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.

01201

output ped bed bed_index roc html png versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Quickly generate evenly sized (by amount of data) regions across a number of bam/cram files

01010

bed versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Runs a functional enrichment analysis with gprofiler2

0100

all_enrich rds plot_png plot_html sub_enrich sub_plot filtered_gmt session_info versions

gprofiler2:

An R interface corresponding to the 2019 update of g:Profiler web tool.

Checks if the input file is bgzip compressed or not

01

compress_bgzip versions

grabix:

a wee tool for random access into BGZF files.
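
As background to the bgzip check described above: a BGZF block is a gzip member whose header sets the extra-field flag and carries a `BC` subfield of length 2, as defined in the SAM/BAM specification. The sketch below tests the first bytes of a file for that signature. It is an illustration of the format, not grabix's own code, and the function name `is_bgzf` is hypothetical.

```python
import struct


def is_bgzf(header: bytes) -> bool:
    """Return True if the bytes start with a BGZF block header."""
    if len(header) < 12:
        return False
    id1, id2, method, flags = header[0], header[1], header[2], header[3]
    # gzip magic bytes, deflate method, and the FEXTRA flag (bit 2) must be set.
    if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 4:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False


if __name__ == "__main__":
    # The standard 28-byte BGZF end-of-file marker block.
    eof_block = bytes.fromhex(
        "1f8b08040000000000ff0600424302001b0003000000000000000000"
    )
    print(is_bgzf(eof_block))
```

A plain gzip file normally fails this test because its header does not set the extra-field flag.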

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0100

sam versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0

index versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Tools for population-scale genotyping using pangenome graphs.

01201010

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

Tools for population-scale genotyping using pangenome graphs.

01

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Runs the Broad Gene Set Enrichment tool in GSEA mode

0123010

rpt index_html heat_map_corr_plot report_tsvs_ref report_htmls_ref report_tsvs_target report_htmls_target ranked_gene_list gene_set_sizes histogram heatmap pvalues_vs_nes_plot ranked_list_corr butterfly_plot gene_set_tsv gene_set_html gene_set_heatmap snapshot gene_set_enplot gene_set_dist archive versions

gsea:

Gene Set Enrichment Analysis (GSEA)

Collapse redundant transcript models in Iso-Seq data.

010

bed bed_trans_reads local_density_error polya read strand_check trans_report versions varcov variants

tama_collapse.py:

Collapse similar gene models

Merge multiple transcriptomes while maintaining source information.

010

bed gene_report merge trans_report versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Helper script that removes remaining polyA sequences from full-length non-chimeric reads (PacBio isoseq3)

01

fasta report tails versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files

01

gt_gff3 error_log versions

gt:

The GenomeTools genome analysis system

GenomeTools gt-gff3validator utility to strictly validate a GFF3 file

01

success_log error_log versions

gt:

The GenomeTools genome analysis system

Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility

01

tabout gff3 fasta inner_fasta versions

gt:

The GenomeTools genome analysis system

GenomeTools gt-stat utility to show statistics about features contained in GFF3 files

01

stats versions

gt:

The GenomeTools genome analysis system

Computes enhanced suffix array using GenomeTools gt-suffixerator utility

010

index versions

gt:

The GenomeTools genome analysis system

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

010100

summary tree markers msa user_msa filtered failed log warnings versions

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

Sort GTF files in chr/pos/feature order

0

gtf versions
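
As a rough illustration of what sorting in chr/pos/feature order can mean, here is a minimal Python sketch. It is not the module's implementation: the `FEATURE_RANK` table and the `sort_gtf_lines` helper are hypothetical names introduced for this example, and chromosomes are compared as plain strings (so `chr10` sorts before `chr2`).

```python
# Hypothetical ranking of feature types; not taken from any specific tool.
FEATURE_RANK = {"gene": 0, "transcript": 1, "exon": 2, "CDS": 3}


def sort_gtf_lines(lines):
    """Return header lines unchanged, followed by records sorted by
    (chromosome, start position, feature rank)."""
    headers = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line):
        # GTF columns: seqname, source, feature, start, end, ...
        fields = line.split("\t")
        chrom, feature, start = fields[0], fields[2], int(fields[3])
        # Unknown feature types are placed after the ranked ones.
        return (chrom, start, FEATURE_RANK.get(feature, len(FEATURE_RANK)))

    return headers + sorted(records, key=key)


example = [
    "#!genome-build demo",
    'chr2\tsrc\tgene\t50\t900\t.\t+\t.\tgene_id "g2";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
    'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "g1";',
]
sorted_lines = sort_gtf_lines(example)
for line in sorted_lines:
    print(line)
```

In this toy input the `chr1` gene is placed before the `chr1` exon that starts at the same position, and both precede the `chr2` record.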

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

0

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes

0

db versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Merging of CheckM and GUNC results in one summary table

012

tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Detection of Chimerism and Contamination in Prokaryotic Genomes

010

maxcss_level_tsv all_levels_tsv versions

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Compresses and decompresses files.

01

gunzip versions

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

01

vcf versions

gvcftools:

gvcftools is a package of small utilities for creating and analyzing gVCF files

Tool to convert and summarize ABRicate outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize DeepARG outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize fARGene outputs using the hAMRonization specification

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize RGI outputs using the hAMRonization specification.

01000

json tsv versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to summarize and combine all hAMRonization reports into a single file

00

json tsv html versions

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

0100

hbd ibd log versions

Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports rCRS and RSRS mitochondrial versions.

01

txt html versions

classification into haplogroups

010

txt versions

haplogrep2:

A tool for mtDNA haplogroup classification.

Somatic VCF feature extraction tool from hap.py.

012340101

features versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

Pre.py is a tool for preprocessing VCF files for Hap.py

0120101

preprocessed_vcf versions

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is a part of hap.py that compares somatic variations.

012340101010101

features metrics stats versions

sompy:

Haplotype VCF comparison tools: somatic variant comparison

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

0100

gbk svg tsv versions

Computes PCA eigenvectors for a Hi-C matrix.

01

results pca1 pca2 versions

hicexplorer:

Set of programs to process, analyze and visualize Hi-C and capture Hi-C data

Whole-genome assembly using PacBio HiFi reads

01012012

raw_unitigs corrected_reads source_overlaps reverse_overlaps processed_contigs processed_unitigs primary_contigs alternate_contigs paternal_contigs maternal_contigs log versions

PacBio structural variant calling tool

01201201

vcf csv versions

Align RNA-Seq reads to a reference with HISAT2

010101

bam summary fastq versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Builds HISAT2 index for reference genome

010101

index versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Extracts splice sites from a GTF file

01

txt versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Pre-compute the graph index structure.

01

graph versions

hlala:

HLA typing from short and long reads

Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.

0123

results extraction extraction_mapped extraction_unmpapped hla fastq reads_per_level remapped versions

hlala:

HLA typing from short and long reads

gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference

01

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised at all.

01

bigwig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file

01

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

readCounter function from HMMcopy utilities, used to generate read counts in windows

012

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Mask multiple sequence alignments

012345670

maskedaln fmask_rf fmask_all gmask_rf gmask_all pmask_rf pmask_all versions

hmmer:

Biosequence analysis using profile hidden Markov models

Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that this comes last. See the tool's help for possible values.

01

seqreformated versions

hmmer:

Biosequence analysis using profile hidden Markov models

hmmalign from the HMMER suite aligns a number of sequences to an HMM profile

010

sthlm versions

hmmer:

Biosequence analysis using profile hidden Markov models

create an hmm profile from a multiple sequence alignment

010

hmm hmmbuildout versions

hmmer:

Biosequence analysis using profile hidden Markov models

extract hmm from hmm database file or create index for hmm database

01000

hmm index versions

hmmer:

Biosequence analysis using profile hidden Markov models

R script that scores output from multiple runs of hmmer/hmmsearch

01

hmmrank versions

hmmer:

Biosequence analysis using profile hidden Markov models

R:

A Language and Environment for Statistical Computing

Tidyverse:

Tidyverse: R packages for data science

search profile(s) against a sequence database

012345

output alignments target_summary domain_summary versions

hmmer:

Biosequence analysis using profile hidden Markov models

Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline

01

vcf versions

hmtnote:

Human mitochondrial variants annotation using HmtVar.

Annotate peaks with HOMER suite

0100

txt stats versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Find peaks with HOMER suite

01

txt versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Create a tag directory with the HOMER suite

010

tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Create a UCSC bed graph with the HOMER suite

01

bedGraph versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Converting from HOMER peak to BED file format

01

bed versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Downloads required reference genomes for Hostile

NO input

reference versions

hostile:

Hostile: accurate host decontamination

Serotype prediction of Haemophilus parasuis assemblies

01

tsv versions

count how many reads map to each feature

01201

txt versions

htseq/count:

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

This tool takes a background VCF, such as gnomAD, that has full-genome coverage (though in some cases users will instead want whole-exome coverage) and uses it as an expectation of variants.

012012

tsv versions

htsnimtools:

Useful command-line tools written to showcase hts-nim

HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.

0101

log dedup annotated stats versions

Assembly polisher using short (and long) reads

0101000

fasta versions

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals

000000

rds txt versions

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA

010000000

rdata seg cna_seg seg_txt corrected_depth ichorcna_params plots genome_plot versions

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

Plot a metagene of cross-link events/sites around various transcriptomic landmarks.

010

tsv versions

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount peaks on a BED file of crosslinks

012

peaks versions

icount:

Computational pipeline for analysis of iCLIP data

Formats a GTF file for use with iCount sigxls

010

gtf regions versions

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount sigxls on a BED file of crosslinks

010

sigxls scores versions

icount:

Computational pipeline for analysis of iCLIP data

Report proportion of cross-link events/sites on each region type.

010

summary_type summary_subtype summary_gene versions

icount:

Computational pipeline for analysis of iCLIP data

Demultiplex paired-end FASTQ files from QuantSeq-Pool

012

fastq undetermined stats versions

Measures reproducibility of ChIP-seq and ATAC-seq peaks using IDR (Irreproducible Discovery Rate)

000

idr log png versions

igv.js is an embeddable interactive genome visualization component

012

browser align_files index_files versions

igv:

Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser HTML file. To visualise it, the files have to be served. See https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements.

A Python application to generate self-contained HTML reports for variant review and other genomic applications

0123012

report versions

Ilastik is a tool that uses machine learning algorithms to classify pixels and to segment, track, and count cells in images. Ilastik has a graphical user interface for interactively labelling pixels; this Nextflow module, however, uses the --headless mode to apply pixel classification to an input image using a pre-trained .ilp file.

010101

out_tiff versions

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Ilastik is a tool that uses machine learning algorithms to classify pixels and to segment, track, and count cells in images. Ilastik has a graphical user interface for interactively labelling pixels; this Nextflow module, however, uses the --headless mode to apply pixel classification to an input image using a pre-trained .ilp file.

0101

output versions

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Strain-level comparisons across multiple inStrain profiles

0120

compare comparisons_table pooled_snv snv_keys snv_info versions

instrain:

Calculation of strain-level metrics

inStrain is a Python program for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification.

01000

profile snvs gene_info genome_info linkage mapping_info scaffold_info versions

instrain:

Calculation of strain-level metrics

Produces protein annotations and predictions from an amino acid FASTA file

010

tsv xml gff3 json versions

Download, extract, and check md5 of iPHoP databases

NO input

iphop_db versions

iphop:

Predict host genus from genomes of uncultivated phages.

Predict phage host using iPHoP

010

iphop_genus iphop_genome iphop_detailed_output versions

iphop:

Predict host genus from genomes of uncultivated phages.

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of handling bacterial genome size alignments.

012000000000000

phylogeny report mldist lmap_svg lmap_eps lmap_quartetlh sitefreq_out bootstrap state contree nex splits suptree alninfo partlh siteprob sitelh treels rate mlrate exch_matrix log versions

Quantification of transposable elements expression in scRNA-seq

0100

versions results counts log tmp

Genomic island prediction in bacterial and archaeal genomes

01

gff log versions

Identify insertion site positions in bacterial genomes

0123

results versions

IsoSeq - Cluster - Cluster trimmed consensus sequences

01

bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi versions

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

010

bam pbi consensusreadset summary report versions

isoseq:

IsoSeq - Scalable De Novo Isoform Discovery

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

meta bam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

meta bam primers

meta bam pbi consensusreadset summary report versions

isoseq3:

IsoSeq3 - Scalable De Novo Isoform Discovery

Extract UMI and cell barcodes

010

bam pbi versions

isoseq3:

Iso-Seq - Scalable De Novo Isoform Discovery

Generate a consensus sequence from a BAM file using iVar

0100

fasta qual mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

0120

bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Call variants from a BAM file using iVar

010000

tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.

0100

report artifacts versions

jupytext:

Jupyter notebooks as plain text scripts or markdown documents

papermill:

Parameterize, execute, and analyze notebooks

nbconvert:

Convert notebooks to other formats

Extract BED file from HTS files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

01

bed versions

jvarkit:

Java utilities for Bioinformatics.

Convert sam files to tsv files

01230123

tsv versions

jvarkit:

Java utilities for Bioinformatics.

Convert VCF to a user friendly table

012301

output versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Filtering VCF with dynamically-compiled java expressions

01230101010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

annotate VCF files for poly repeats

01010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Plot whole genome coverage from BAM/CRAM file as SVG

012010101

output versions

jvarkit:

Java utilities for Bioinformatics.

Taxonomic classification of metagenomic sequence data using a protein reference database

010

results versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.

010

txt versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Creates a summary table from Kaiju output files

0100

summary versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Merge two tab-separated output files of Kaiju and Kraken in the column format

0120

merged versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Make Kaiju FMI-index file from a protein FASTA file

01

fmi versions

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Aligns sequences using kalign

010

alignment versions

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

Create kallisto index

01

index versions

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

01010000

results json_info log versions

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

quantifies scRNA-seq data from fastq files using kb-python.

01000000

count versions matrix

kb:

kallisto and bustools are wrapped in an easy-to-use program called kb

index creation for kb count quantification of single-cell data.

000

versions index t2g cdna intron cdna_t2c intron_t2c

kb:

kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.

Creates a histogram of the number of distinct k-mers having a given frequency.

01

hist json png ps pdf jellyfish_hash versions

kat:

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts
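
To make the idea of a k-mer frequency histogram concrete, here is a minimal Python sketch. It does not use KAT or jellyfish; the `kmer_spectrum` helper is a hypothetical name, and the sketch counts k-mers on the forward strand only, without the canonical (strand-collapsed) counting that dedicated tools may apply.

```python
from collections import Counter


def kmer_spectrum(sequences, k):
    """Map each occurrence count to the number of distinct k-mers
    that were seen exactly that many times."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Histogram over the per-k-mer counts.
    return Counter(counts.values())


spectrum = kmer_spectrum(["ACGTACGT", "ACGTTT"], k=4)
for frequency in sorted(spectrum):
    print(frequency, spectrum[frequency])
```

For these two toy sequences, five distinct 4-mers occur once and one (`ACGT`) occurs three times, so the histogram has entries at frequencies 1 and 3.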

Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.

000

reads versions

khmer:

khmer k-mer counting library

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

00

report kmers versions

khmer:

khmer k-mer counting library

Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).

01

txt versions

Generate k-mers (sketches) from FASTA/Q sequences

01

outdir info versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

01

kmcp log versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

01

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

010

profile versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

010

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Produces annotation using kofamscan against a Profile database and a KO list

0100

txt tsv versions

Adds fasta files to a Kraken2 taxonomic database

01000

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

010

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Downloads and builds Kraken2 standard database

0

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Classifies metagenomic sequence data

01000

classified_reads_fastq unclassified_reads_fastq classified_reads_assignment report versions

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Takes multiple kraken-style reports and combines them into a single report file

01

txt versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Extract reads classified at any user-specified taxonomy IDs.

0010101

extracted_kraken2_reads versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Takes a Kraken report file and prints out a krona-compatible TEXT file

01

txt versions

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Download and build (custom) KrakenUniq databases

0123

db versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Download KrakenUniq databases and related files

0

output versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Classifies metagenomic sequence data using unique k-mer counts

012000000

classified_reads unclassified_reads classified_assignment report versions

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

KronaTools Update Taxonomy downloads a taxonomy database

NO input

db versions

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.

010

html versions

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Creates a Krona chart from text files listing quantities and lineages.

01

html versions

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Update Taxonomy downloads a taxonomy database

NO input

db versions

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Makes a dotplot (Oxford Grid) of pair-wise sequence alignments

012010

gif png versions

last:

LAST finds & aligns related regions of sequences.

Aligns query sequences to target sequences indexed with lastdb

0120

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Prepare sequences for subsequent alignment with lastal.

01

index versions

last:

LAST finds & aligns related regions of sequences.

Converts MAF alignments to another format.

010

axt_gz blast_gz blasttab_gz chain_gz gff_gz html_gz psl_gz sam_gz tab_gz versions

last:

LAST finds & aligns related regions of sequences.

Reorder alignments in a MAF file

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Post-alignment masking

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Find split or spliced alignments in a MAF file

01

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Find suitable score parameters for sequence alignment

010

param_file multiqc versions

last:

LAST finds & aligns related regions of sequences.

Align sequences using learnMSA

010

alignment versions

learnmsa:

learnMSA: Learning and Aligning large Protein Families

Bayesian reconstruction of ancient DNA fragments

01

bam fq_pass fq_fail unmerged_r1_fq_pass unmerged_r1_fq_fail unmerged_r2_fq_pass unmerged_r2_fq_fail log versions

Typing of clinical and environmental isolates of Legionella pneumophila

01

tsv versions

Index chain files for lift over

010

clft versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

Converting aligned short and long reads records from one reference to another

0101

bam versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

Uses Liftoff to accurately map annotations in GFF or GTF between assemblies of the same or closely related species

01000

gff3 polished_gff3 unmapped_txt versions

lima - The PacBio Barcode Demultiplexer and Primer Remover

010

counts report summary versions bam pbi fasta fastagz fastq fastqgz xml json clips guess

runs a differential expression analysis with Limma

0123012

results md_plot rdata model session_info normalised_counts versions

limma:

Linear Models for Microarray Data

Serogrouping Listeria monocytogenes assemblies

01

tsv versions

Lofreq subcommand to insert base and indel alignment qualities

010

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments

0120

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

It predicts variants using multiple processors

01230101

vcf tbi versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors

Lofreq subcommand to remove variants with low coverage or strand bias potential

01

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Inserts indel qualities in a BAM file

0101

bam versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0123450101

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0101

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

bam log versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

vcf versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

01

scn gff versions

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain

01

gff3 scn versions

LTR_HARVEST_parallel:

A Perl wrapper for LTR_harvest

gt:

The GenomeTools genome analysis system

Identifies LTR retrotransposons using LTR_retriever

meta genome harvest finder mgescan non_tgca

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments

01000

log lai_out versions

lai:

Assessing genome assembly quality using the LTR Assembly Index (LAI)

Identifies LTR retrotransposons using LTR_retriever

010000

log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputting all predicted antimicrobial peptides found.

01

smorfs all_orfs amp_prediction readme_file log_file versions

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

0120

peak xls versions gapped bed bdg

macs2:

Model Based Analysis for ChIP-Seq data

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

0120

peak xls versions gapped bed bdg

macs3:

Model Based Analysis for ChIP-Seq data

Multiple sequence alignment using MAFFT

0101010101010

fas versions

pigz:

Parallel implementation of the gzip algorithm.

MAGeCK count for functional genomics; reads are usually mapped to a specific sgRNA

010

count norm versions

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Maximum-likelihood analysis of gene essentialities

010

gene_summary sgrna_summary versions

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

MAGeCK test performs robust rank aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.

01

gene_summary sgrna_summary r_script versions

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Multiple Sequence Alignment using Graph Clustering

01010

alignment versions

magus:

Multiple Sequence Alignment using Graph Clustering

Multiple Sequence Alignment using Graph Clustering

01

tree versions

magus:

Multiple Sequence Alignment using Graph Clustering

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

000

index versions log

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

010

rma6 alignments log versions

malt:

A tool for mapping metagenomic data

Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening

0100

results versions

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

vcf tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012345601010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi diploid_sv_vcf diploid_sv_vcf_tbi somatic_sv_vcf somatic_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

0123401010

candidate_small_indels_vcf candidate_small_indels_vcf_tbi candidate_sv_vcf candidate_sv_vcf_tbi tumor_sv_vcf tumor_sv_vcf_tbi versions

manta:

Structural variant and indel caller for mapped sequencing data

Create mapAD index for reference genome

01

index versions

mapad:

An aDNA aware short-read mapper

Map short-reads to an indexed reference genome

01010000000

bam versions

mapad:

An aDNA aware short-read mapper

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

010

runtime_log fragmisincorporation_plot length_plot misincorporation lgdistribution dnacomp stats_out_mcmc_hist stats_out_mcmc_iter stats_out_mcmc_trace stats_out_mcmc_iter_summ_stat stats_out_mcmc_post_pred stats_out_mcmc_correct_prob dnacomp_genome rescaled pctot_freq pgtoa_freq fasta folder versions

Calculate Mash distances between reference and query sequences

010

dist versions

mash:

Fast sequence distance estimator that uses MinHash

Screens query sequences against large sequence databases

0101

screen versions

mash:

Fast sequence distance estimator that uses MinHash

Creates vastly reduced representations of sequences using MinHash

01

mash stats versions

mash:

Fast sequence distance estimator that uses MinHash
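As a rough illustration of the MinHash idea behind these Mash entries, the sketch below keeps only the smallest k-mer hash values of each sequence and estimates Jaccard similarity from them. It is a minimal sketch under stated assumptions, not Mash's implementation: the function names, the use of MD5 as the hash, and the tiny default k and sketch size are all chosen here for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=4, sketch_size=8):
    """Bottom-s sketch: the sketch_size smallest 64-bit k-mer hashes.

    Assumption: MD5 truncated to 8 bytes stands in for a real
    MinHash hash function; this is for illustration only.
    """
    hashes = {
        int.from_bytes(hashlib.md5(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:sketch_size]


def jaccard_estimate(sketch_a, sketch_b, sketch_size=8):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:sketch_size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Two sketches of the same sequence give an estimate of 1.0, and sequences that share no k-mers give 0.0. Larger sketch sizes trade memory for a tighter estimate.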

Mashmap is an approximate long read or contig mapper based on Jaccard similarity

0101

paf versions

Quickly create a tree using Mash distances

01

tree matrix versions

MaxBin is software for clustering metagenomic contigs

0123

binned_fastas summary abundance log marker_counts unbinned_fasta tooshort_fasta marker_bins marker_genes versions

Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER"

0120

maxquant_txt versions

maxquant:

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.

Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.

010101

csv versions

Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation

01

tsv fa versions

Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.

01

tif versions

mcstaging:

Staging modules for MCMICRO

Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.

01

tif versions

mcstaging:

Staging modules for MCMICRO

Create MD5 (128-bit) checksums

010

checksum versions
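For readers who want to see what such a checksum step amounts to, here is a minimal sketch using Python's standard `hashlib`. The function name `md5_checksum` and the chunk size are assumptions made for this example; the module itself may well call a command-line utility instead.

```python
import hashlib


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest (128 bits, 32 hex characters) of a file.

    The file is read in chunks so large inputs do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is suitable for detecting accidental corruption during transfer, not for security purposes.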

A tool to create consensus sequences and variant calls from nanopore sequencing data

012

assembly versions

An ultra-fast metagenomic assembler for large and complex metagenomics

012

contigs k_contigs addi_contigs local_contigs kfinal_contigs log versions

pigz:

Parallel implementation of the gzip algorithm.

Analyses a DAA file and exports information in text format

010

txt_gz megan versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Analyses an RMA file and exports information in text format

010

txt megan_summary versions

megan:

A tool for studying the taxonomic content of a set of DNA reads

Serotyping of Neisseria meningitidis assemblies

01

tsv versions

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101000

hist log_stderr versions

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

k-mer based assembly evaluation.

meta meryl_db assembly

meta versions assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_cn_hist spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_asm_hist assembly_qv scaffold_qv read_ploidy

A script to generate hap-mer dbs for trios

0100

mat_hapmer_meryl pat_hapmer_meryl inherited_hapmers_fl_png inherited_hapmers_ln_png inherited_hapmers_st_png versions

merqury:

Evaluate genome assemblies with k-mers and more.

k-mer based assembly evaluation.

012

assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_hist spectra_cn_ln_png spectra_cn_st_png spectra_asm_fl_png spectra_asm_hist spectra_asm_ln_png spectra_asm_st_png assembly_qv scaffold_qv read_ploidy hapmers_blob_png versions

merqury:

Evaluate genome assemblies with k-mers and more.

A reimplementation of Kat Comp to work with FastK databases

01234

filled_png line_png stacked_png filled_pdf line_pdf stacked_pdf versions

merquryfk:

FastK based version of Merqury

A reimplementation of KatGC to work with FastK databases

012

filled_gc_plot_png filled_gc_plot_pdf line_gc_plot_png line_gc_plot_pdf stacked_gc_plot_png stacked_gc_plot_pdf versions

merquryfk:

FastK based version of Merqury

FastK based version of Merqury

0123400

stats bed assembly_qv spectra_cn_fl spectra_cn_ln spectra_cn_st qv spectra_asm_fl spectra_asm_ln spectra_asm_st phased_block_bed phased_block_stats continuity_N block_N block_blob hapmers_blob versions

merquryfk:

FastK based version of Merqury

An improved version of Smudgeplot using FastK

012

filled_ploidy_plot_png filled_ploidy_plot_pdf line_ploidy_plot_png line_ploidy_plot_pdf stacked_ploidy_plot_png stacked_ploidy_plot_pdf versions

merquryfk:

FastK based version of Merqury

A genomic k-mer counter (and sequence utility) with nice features.

010

meryl_db versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.
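To make the notion of k-mer counting concrete, the following is a small sketch that counts canonical k-mers (each k-mer merged with its reverse complement) in a single sequence. It is an illustration only: the helper names are hypothetical, counting canonical k-mers is an assumption made here, and a dedicated counter uses far more compact data structures than a Python `Counter`.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq, k):
    """Count canonical k-mers in seq, skipping any k-mer with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts
```

For example, `count_kmers("ACGT", 2)` counts `AC` twice (once directly and once as the reverse complement of `GT`) and `CG` once.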

A genomic k-mer counter (and sequence utility) with nice features.

010

hist versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

010

meryl_db versions

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

Depth computation per contig step of metabat2

012

depth versions

metabat2:

Metagenome binning

Metagenome binning of contigs

012

tooshort lowdepth unbinned membership fasta versions

metabat2:

Metagenome binning

Annotation of eukaryotic metagenomes using MetaEuk

010

faa codon tsv gff versions

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Strain-level metagenomic assignment

012340

wimp evidence_unknown_species reads2taxon em contig_coverage length_and_id krona versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Maps long reads to a metamaps database

010

classification_res meta_file meta_unmappedreadsLengths para_file versions

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

Build MetaPhlAn database for taxonomic profiling.

NO input

db versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn4

01

txt versions

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn3

01

txt versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Extracts per-base methylation metrics from alignments

01200

bedgraph methylkit versions

methyldackel:

Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.

Generates methylation bias plots from alignments

01200

txt versions

methyldackel:

Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.

A tool to estimate bacterial species abundance

0100

results versions

midas:

An integrated pipeline for estimating strain-level genomic variation from metagenomic data

Marks duplicate spots along gridline edges.

01

marked_dups_spots versions

mindagap:

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

01

tiff versions

mindagap:

Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.

Minia is a short-read assembler based on a de Bruijn graph

01

contigs unitigs h5 versions

A very fast OLC-based de novo assembler for noisy long reads

012

gfa assembly versions

A versatile pairwise aligner for genomic and spliced nucleotide sequences

01010000

paf bam index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Provides fasta index required by minimap2 alignment.

01

index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0101

paf gff versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

Provides fasta index required by miniprot alignment.

01

index versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

miRanda is an algorithm for finding genomic targets for microRNAs

010

txt versions

miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.

0101

outputs versions

mirdeep2:

miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.

miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.

012010123

outputs versions

mirdeep2:

miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs. It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.

mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.

0101012

tsv versions

mirtop:

Small RNA-seq annotation

mirtop export generates files in formats such as fasta or vcf, or files compatible with the isomiRs Bioconductor package

0101012

tsv fasta vcf versions

mirtop:

Small RNA-seq annotation

mirtop gff generates the GFF3 adapter format to capture miRNA variations

0101012

gff versions

mirtop:

Small RNA-seq annotation

mirtop gff gets the number of isomiRs and miRNAs annotated in the GFF file by isomiR category.

01

txt log versions

mirtop:

Small RNA-seq annotation

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

Download a mitochondrial genome to be used as reference for MitoHiFi

01

fasta gb versions

findMitoReference.py:

Fetch mitochondrial genome in Fasta and Genbank format from NCBI

A Python workflow that assembles mitogenomes from PacBio HiFi reads

010000

fasta stats gb gff all_potential_contigs contigs_annotations contigs_circularization contigs_filtering coverage_mapping coverage_plot final_mitogenome_annotation final_mitogenome_choice final_mitogenome_coverage potential_contigs reads_mapping_and_assembly shared_genes versions

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

Run Torsten Seemann's classic MLST on a genome assembly

01

tsv versions

Cluster sequences using MMSeqs2 cluster.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create an MMseqs database from an existing FASTA/Q file

01

db versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Creates sequence index for mmseqs database

01

db_indexed versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create a tsv file from a query and a target database as well as the result database

010101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Download an mmseqs-formatted database

0

database versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

0101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences in linear time using MMSeqs2 linclust.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Search and calculate a score for similar sequences in a query and a target database.

0101

db_search versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Computes the lowest common ancestor by identifying the query sequence homologs against the target database.

010

db_taxonomy versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Converts an expandable profile database to the MMseqs2 database format

0

db_exprofile versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

A tool to reconstruct plasmids in bacterial assemblies

01

chromosome contig_report plasmids mobtyper_results versions

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

A bioinformatics tool for working with modified bases

0120101

bed bedgraph log versions

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data

Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.

01

img_clahe versions

molkartgarage:

One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.

Calculates genome-wide sequencing coverage.

012301

global_txt summary_txt regions_txt per_base_d4 per_base_bed per_base_csi regions_bed regions_csi quantized_bed quantized_csi thresholds_bed thresholds_csi versions

Download the mOTUs database

0

db versions

motus:

The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.

Taxonomic meta-omics profiling using universal marker genes

0100

txt biom versions

motus:

Marker gene-based OTU (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

010

out versions

motus:

Marker gene-based operational taxonomic unit (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

010

out bam mgc log versions

motus:

Marker gene-based OTU (mOTU) profiling

Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data

0123456

output output_dis output_germline output_somatic versions

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

Scan a reference genome to get microsatellite & homopolymer information

01

txt versions

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

msisensor2 detection of MSI regions.

01234500

msi distribution somatic versions

msisensor2:

MSIsensor2 is a novel machine-learning-based algorithm featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

msisensor2 detection of MSI regions.

00

scan versions

msisensor2:

MSIsensor2 is a novel machine-learning-based algorithm featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including cell-free DNA (cfDNA), formalin-fixed paraffin-embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

MSIsensor-pro evaluates microsatellite instability (MSI) for cancer patients with next-generation sequencing data. It accepts whole-genome sequencing, whole-exome sequencing and target region (panel) sequencing data as input

01234500

output_report output_dis output_germline output_somatic versions

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

MSIsensor-pro evaluates microsatellite instability (MSI) for cancer patients with next-generation sequencing data. It accepts whole-genome sequencing, whole-exome sequencing and target region (panel) sequencing data as input

01

list versions

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

Aligns protein structures using mTM-align

010

alignment structure versions

mTM-align:

Algorithm for structural multiple sequence alignments

pigz:

Parallel implementation of the gzip algorithm.

A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.

010

mtnucratio json versions

Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.

01000

bam rad versions

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.

0

index versions

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Aggregate results from bioinformatics analyses across many samples into a single report

000000

report data plots versions

multiqc:

MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

010101010000001

full_alignment info_txt snp_alignment snp_genome_alignment snpstatistics snptable snptable_snpeff snptable_uncertainty structure_genotypes structure_genotypes_nomissing json versions

MUMmer is a system for rapidly aligning entire genomes

012

coords versions

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

01

aligned_fasta phyi phys clustalw html msf tree log versions

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.

010

alignment versions

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

Fetch the GO concepts for a list of genes

01

gmt tsv versions

AMR predictions for supported species

010

csv json versions

mykrobe:

Antibiotic resistance prediction in minutes

Compare multiple runs of long read sequencing data and alignments

01

report_html lengths_violin_html log_length_violin_html n50_html number_of_reads_html overlay_histogram_html overlay_histogram_normalized_html overlay_log_histogram_html overlay_log_histogram_normalized_html total_throughput_html quals_violin_html overlay_histogram_identity_html overlay_histogram_phredscore_html percent_identity_violin_html active_pores_over_time_html cumulative_yield_plot_gigabases_html sequencing_speed_over_time_html stats_txt versions

Filtering and trimming of Oxford Nanopore Sequencing data

010

filtreads log_file versions

DNA contaminant removal using NanoLyse

010

fastq log versions

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

insertions insertions_index deletions deletions_index rearrangements rearrangements_index bp_info bp_info_index versions

nanomonsv:

nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

01

html png txt log versions

Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.

010

stats reads versions

Performs fastq alignment to a reference using NARFMAP

0101010

bam log versions

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

01

hashmap versions

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

A tool to quickly download assemblies from NCBI's Assembly database

0000

gbk fna rm features gff faa gpff wgs_gbk cds rna rna_fna report stats versions

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

0101

vecscreen_output versions

ncbitools:

NCBI libraries for biology applications (text-based utilities)

Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

00

dataset versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

010

csv csv_errors csv_insertions tsv json json_auspice ndjson fasta_aligned fasta_translation nwk versions

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

Performs fastq alignment to a fasta reference using NextGenMap

010

bam versions

nextgenmap:

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime

Serotyping Neisseria gonorrhoeae assemblies

01

tsv versions

Merging paired-end reads and removing sequencing adapters.

01

merged_reads unstitched_read1 unstitched_read2 versions

Determines the gender of a sample from the BAM/CRAM file.

01201010

tsv versions

ngsbits:

Short-read sequencing tools

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.

0101

vaf versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.

010101

corr_matrix matched all pdf vcf versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

010101

pt versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

01

pdf corr_matrix all matched versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Calculate metagenome redundancy curve from FASTQ files

meta reads format mode

meta versions npa npc npl npo

Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file

01

png versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Calculate metagenome redundancy curve from FASTQ files

0100

npa npc npl npo versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files

01

json tsv csv pdf versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image

01

png versions

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

012

delta coords versions

An nf-core module for the OATK

010123401234

mito_fasta pltd_fasta mito_bed pltd_bed mito_gfa pltd_gfa annot_mito_txt annot_pltd_txt clean_gfa final_gfa initial_gfa multiplex_gfa unzip_gfa versions

Construct a dynamic succinct variation graph in ODGI format from a GFAv1.

01

og versions

odgi:

An optimized dynamic genome/graph implementation

Draw previously-determined 2D layouts of the graph with diverse annotations.

012

png versions

odgi:

An optimized dynamic genome/graph implementation

Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.

01

lay tsv versions

odgi:

An optimized dynamic genome/graph implementation

Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.

01

sorted_graph versions

odgi:

An optimized dynamic genome/graph implementation

Squeezes multiple graphs in ODGI format into the same file in ODGI format.

01

graph versions

odgi:

An optimized dynamic genome/graph implementation

Metrics describing a variation graph and its path relationship.

01

tsv yaml versions

odgi:

An optimized dynamic genome/graph implementation

Merge unitigs into a single node preserving the node order.

01

unchopped_graph versions

odgi:

An optimized dynamic genome/graph implementation

Project a graph into other formats.

01

gfa versions

odgi:

An optimized dynamic genome/graph implementation

Visualize a variation graph in 1D.

01

png versions

odgi:

An optimized dynamic genome/graph implementation

Calls CNVs in bam files from tumor patients

0123400

png profile summary versions

Create a decoy peptide database from a standard FASTA database.

01

decoy_fasta versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Filters peptide/protein identification results by different criteria.

012

filtered versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Calculates a distribution of the mass error from given mass spectra and IDs.

012

frag_err prec_err versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Merges several idXML files into one idXML file.

01

idxml versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Split a merged identification file into its originating identification files

01

idxmls versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Switches between different scores of peptide or protein hits in identification data

01

idxml versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)

01

mzml versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Refreshes the protein references for all peptide hits.

0101

id_file_pi versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Annotates MS/MS spectra using Comet.

012

idxml pin versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Perform HLA-I typing of sequencing data

012

hla_type coverage_plot versions

OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.

0101

orthofinder working versions

A program to convert bam into paf.

01

paf versions

paftools:

A program to manipulate paf files / convert to and from paf.

a tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates

01

index versions

Find and remove PCR/optical duplicates

01

pairs stat versions

pairtools:

CLI tools to process mapped Hi-C data

Flip pairs to get an upper-triangular matrix

010

flip versions

pairtools:

CLI tools to process mapped Hi-C data

Merge multiple pairs/pairsam files

01

pairs versions

pairtools:

CLI tools to process mapped Hi-C data

Find ligation junctions in .sam, make .pairs

010

pairsam stat versions

pairtools:

CLI tools to process mapped Hi-C data

Assign restriction fragments to pairs

010

restrict versions

pairtools:

CLI tools to process mapped Hi-C data

Select pairs according to the condition given in options.args

01

selected unselected versions

pairtools:

CLI tools to process mapped Hi-C data

Sort a .pairs/.pairsam file

01

sorted versions

pairtools:

CLI tools to process mapped Hi-C data

Split a .pairsam file into .pairs and .sam.

01

pairs bam versions

pairtools:

CLI tools to process mapped Hi-C data

Calculate pairs statistics

01

stats versions

pairtools:

CLI tools to process mapped Hi-C data

Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file

01000

tsv versions

panacus:

panacus is a tool for computing counting statistics for GFA files

Create visualizations from a tsv coverage histogram created with panacus.

01

image versions

panacus:

panacus is a tool for computing counting statistics for GFA files

A fast and scalable tool for bacterial pangenome analysis

01

results aln versions

panaroo:

panaroo - an updated pipeline for pangenome investigation

Phylogenetic Assignment of Named Global Outbreak LINeages

01

report versions

NVIDIA Clara Parabricks GPU-accelerated apply Base Quality Score Recalibration (BQSR).

0123401

bam bai versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated variant calls annotation based on dbSNP database

0123

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating deepvariant.

012301

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.

010101010

bam bai bqsr_table versions qc_metrics duplicate_metrics

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignments for different parts of the query sequence

0101010

bam bai qc_metrics bqsr_table duplicate_metrics versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs

0101

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK haplotypecaller.

012301

vcf versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.

01

gvcf_index versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

0123450100

vcf stats versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Paraclu finds clusters in data attached to sequences.

010

bed versions

Determines the depth in a BAM/CRAM file

0120101

depth binned_depth versions

paragraph:

Graph realignment tools for structural variants

Genotype structural variants using paragraph and grmpy

0123450101

vcf json versions

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

0101

graph versions

paragraph:

Graph realignment tools for structural variants

HiFi-based caller for highly homologous genes

0120101

json bam bai vcf vcf_index versions

Serogroup Pseudomonas aeruginosa assemblies

01

tsv blast details versions

The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.

01

bam pbi versions

pbbam:

PacBio BAM C++ library

PacBio CCS - Generate Highly Accurate Single-Molecule Consensus Reads

01200

bam pbi report_txt report_json metrics versions

Alignment with PacBio's minimap2 frontend

0101

bam versions

pbmm2:

A minimap2 frontend for PacBio native data formats

Assign PBP type of Streptococcus pneumoniae assemblies

010

tsv blast versions

pbsv/call - PacBio structural variant (SV) calling and analysis tools

0101

vcf versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

pbsv - PacBio structural variant (SV) signature discovery tool

0101

svsig versions

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq

012

fastq versions

pbtk:

pbtk - PacBio BAM toolkit

Minimalistic tool which creates an index file that enables random access into PacBio BAM files

01

pbi versions

pbtk:

pbtk - PacBio BAM toolkit

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.

01

assembled unassembled discarded versions

Manipulation, validation and exploration of pedigrees

0120

html csv ped png versions

Runs PEKA CLIP peak k-mer analysis

0101000

cluster distribution rtxn pdf tsites oxn clust versions

"This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays."

01

spp pdf rdata versions

Install databases necessary for Pharokka's functional analysis

NO input

pharokka_db versions

pharokka:

Fast Phage Annotation Program

Functional annotation of phages

010

cds_final_merged_output cds_functions length_gc_cds_density card vfdb mash reoriented versions

pharokka:

Fast Phage Annotation Program

Predict prophages in bacterial genomes

01

coordinates gbk log information bacteria_fasta bacteria_gbk phage_fasta phage_gbk prophage_gff prophage_tbl prophage_tsv versions

phispy:

Prophage finder using multiple metrics

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.

0100

results versions

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates an interval list from a bed file and a reference dict

0101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

01234010101

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the insert size distribution of a paired-end library.

01

metrics histogram versions

picard:

Java tools for working with NGS data in the BAM format

Collect multiple metrics from a BAM file

0120101

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics from a RNAseq BAM file

01000

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

01201010

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates a sequence dictionary for a reference sequence.

01

reference_dict versions

picard:

Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

Checks that all data in the set of input files appear to come from the same individual

01234501

crosscheck_metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.

0120000

vcf tbi versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Converts a FASTQ file to an unaligned BAM or SAM file.

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

0120

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Verify mate-pair information between mates and fix if needed

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Lifts over a VCF file from one reference build to another.

01010101

vcf_lifted vcf_unlifted versions

picard:

Move annotations from one assembly to another

Locate and tag duplicate reads in a BAM file

010101

bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Merges multiple BAM files into a single file

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the sample name in a VCF file

01

vcf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases

010101

intervals versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts BAM/SAM files based on a variety of picard specific criteria

010

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts vcf files

010101

vcf versions

picard:

Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format

Compresses files with pigz.

01

archive versions

pigz:

Parallel implementation of the gzip algorithm.

Decompresses files with pigz.

01

file versions

pigz:

Parallel implementation of the gzip algorithm.

Automatically improve draft assemblies and find variation among strains, including large event detection

010120

improved_assembly vcf change_record tracks_bed tracks_wig versions

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

012000

bp cem del dd int_final inv li rp si td versions

pindel:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

Main caller script for peak calling

0120

divergent_TREs bidirectional_TREs unidirectional_TREs peakcalling_log versions

pints:

Peak Identifier for Nascent Transcripts Starts (PINTS)

Pangenome toolbox for bacterial genomes

01

results aln versions

Identify plasmids in bacterial sequences and assemblies

01

json txt tsv genome_seq plasmid_seq versions

Assembles bacterial plasmids

010

html tab images logs data database fasta_files kmer versions

Platypus is a tool that efficiently and accurately calls genetic variants from next-generation DNA sequencing data

01234000

vcf tbi log version

Analyses binary variant call format (BCF) files using plink

01

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

epi episummary log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Exclude variant identifiers from plink bfiles

01234

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Subset plink bfiles with a text file of variant identifiers

01234

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

fepi fepisummary flog fnosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Generate GWAS association studies

0123010101

assoc log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Generate Hardy-Weinberg statistics for provided input

01230101

hwe versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.

0123000

prunein pruneout versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.

0123000

prunein pruneout versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
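
The greedy pruning described in this entry can be illustrated with a minimal Python sketch. It is not PLINK's implementation: the function names, the in-memory genotype dictionary, and the rule of dropping the second variant of each offending pair are assumptions made for illustration, and PLINK's own choice of which variant to remove may differ. Only a single window is handled.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: treat as uncorrelated
    return cov * cov / (var_x * var_y)


def greedy_prune(genotypes, threshold):
    """Prune one window; return (kept, pruned) lists of variant IDs.

    genotypes: dict mapping a variant ID to a list of allele counts
               (0/1/2), one per sample. This layout is hypothetical.
    threshold: pairs with r^2 strictly greater than this are in violation.
    """
    kept = list(genotypes)
    pruned = []
    changed = True
    while changed:  # repeat until no offending pair remains
        changed = False
        for a, b in combinations(kept, 2):
            if r_squared(genotypes[a], genotypes[b]) > threshold:
                # Illustrative rule: drop the later variant of the pair.
                kept.remove(b)
                pruned.append(b)
                changed = True
                break
    return kept, pruned


if __name__ == "__main__":
    window = {
        "rs1": [0, 1, 2, 1, 0],
        "rs2": [0, 1, 2, 1, 0],  # identical to rs1, so r^2 = 1
        "rs3": [1, 0, 1, 2, 1],  # uncorrelated with rs1 and rs2
    }
    kept, pruned = greedy_prune(window, threshold=0.5)
    print("prune.in :", kept)
    print("prune.out:", pruned)
```

The two returned lists loosely correspond to the prunein and pruneout outputs named in this entry.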

LD analysis in PLINK examines genetic variant associations within populations

0123010101

ld log nosex versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Recodes plink bfiles into a new text fileset applying different modifiers

0123

ped map txt raw traw beagledat chrdat chrmap geno pheno pos phase info lgen list gen gengz sample rlist strctin tped tfam vcf vcfgz versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Analyses variant calling files using plink

01

bed bim fam versions

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Subset plink pfiles with a text file of variant identifiers

01234

extract_pgen extract_psam extract_pvar versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Apply a scoring system to each sample in a plink 2 fileset

01230

score versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Import variant genetic data using plink2

01

pgen psam pvar pvar_zst versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

pmdtools command to filter ancient DNA molecules from others

01200

bam versions

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

xml txt versions

PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.

0101012

vcf max_cov bad_sites versions

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.

0123

demuxlet_result versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to pile up reads and corresponding base quality for each overlapping SNP and each barcode.

012

cel plp var umi versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.

012

result vcf lmix singlet_result singlet_vcf versions

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

01

reads log versions

Adapter removal and demultiplexing of Oxford Nanopore reads

01

reads log versions

porechop:

Adapter removal and demultiplexing of Oxford Nanopore reads

Software for predicting library complexity and genome coverage in high-throughput sequencing

01

c_curve log versions

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Software for predicting library complexity and genome coverage in high-throughput sequencing

01

lc_extrap log versions

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Calculate pairwise nucleotide identity with respect to a reference sequence

01010

valid_fasta invalid_fasta report log versions

Filter reads by quality score.

01

reads logs versions log_tab

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

Converts SAM/BAM/CRAM/pairs files into a genome contact map

01012

pretext versions

A module to generate images from Pretext contact maps.

01

image versions

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

good_reads single_reads bad_reads log versions

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

010

gene_annotations nucleotide_fasta amino_acid_fasta all_gene_annotations versions

Whole genome annotation of small genomes (bacterial, archaeal, viral)

0100

gff gbk fna faa ffn sqn fsa tbl err log txt tsv versions

frame-shift correction for long read (meta)genomics - fix frameshifts in reads

0101

out_fa versions

proovframe:

frame-shift correction for long read (meta)genomics

frame-shift correction for long read (meta)genomics - maps proteins to reads

012

tsv versions

proovframe:

frame-shift correction for long read (meta)genomics

Perform Gene Ratio Enrichment Analysis

0101

enrichedGO versions session_info

grea:

Gene Ratio Enrichment Analysis

Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)

01

logratio session_info versions

propr:

Logratio methods for omics data
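
The two transformations named in this entry can be written out as a short Python sketch. It follows the standard definitions (CLR subtracts the mean of the logs; ALR subtracts the log of a chosen reference component) and is not propr's own code. The function names and the `ref` parameter are illustrative assumptions, and the input is assumed to be strictly positive, so zero handling is out of scope.

```python
import math


def clr(x):
    """Centered logratio: log(x_i) minus the mean of all log(x_j)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(x, ref=-1):
    """Additive logratio: log(x_i / x_ref) for every component except ref.

    ref is the index of the reference component (hypothetical parameter;
    defaults to the last component).
    """
    ref_index = ref % len(x)
    log_ref = math.log(x[ref_index])
    return [math.log(v) - log_ref for i, v in enumerate(x) if i != ref_index]


if __name__ == "__main__":
    sample = [2.0, 4.0, 8.0]
    print("CLR:", clr(sample))
    print("ALR:", alr(sample))
```

CLR keeps one value per component and the values sum to zero, while ALR returns one value fewer because the reference component is used as the denominator.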

Perform differential proportionality analysis

0101

propd results fdr adj warnings session_info versions

propr:

Logratio methods for omics data

Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.

01

propr matrix fdr adj warnings session_info versions

propr:

Logratio methods for omics data

corpcor:

Efficient Estimation of Covariance and (Partial) Correlation

Proteinortho is a tool to detect orthologous genes within different species.

01

orthologgroups orthologgraph blastgraph versions

Reads a MaxQuant proteinGroups file with Proteus

012

dendro_plot mean_var_plot raw_dist_plot norm_dist_plot raw_rdata norm_rdata raw_tab norm_tab session_info versions

proteus:

R package for analysing proteomics data

PureCLIP is a tool to detect protein-RNA interaction footprints from single-nucleotide CLIP-seq data, such as iCLIP and eCLIP.

012012010

crosslinks peaks versions

Calculate interval coverage for each sample. N.B. the tool cannot handle input files staged as symlinks; stageInMode should be set to 'link'.

0120

txt png loess_qc_txt loess_txt versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Generate on and off-target intervals for PureCN from a list of targets

01010

txt bed versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.

012300

rds png bias_rds bias_bed low_cov_bed versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Run PureCN workflow to normalize, segment and determine purity and ploidy

01200

pdf local_optima_pdf seg genes_csv amplification_pvalues_csv vcf_gz variants_csv loh_csv chr_pdf segmentation_pdf multisample_seg versions

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Calculate coverage cutoffs to determine when to purge duplicated sequence.

01

cutoff log versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Separates out sequences purged of falsely duplicated sequences.

012

haplotigs purged versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Plots the read coverage from a purge dups statistics file and cutoffs.

012

png versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Create a read depth histogram and base-level read depth for an assembly based on PacBio data

01

stat basecov versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Purge haplotigs and overlaps for an assembly

0123

bed log versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Split fasta file by 'N's to aid in self alignment for duplicate purging

01

split_fasta versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

write your description here

01

html json versions

Damage parameter estimation for ancient DNA

012

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Damage parameter estimation for ancient DNA

01

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Compute summary statistics for control gene from BAM files.

01200

control_stats versions

pypgx:

A Python package for pharmacogenomics research

Call SNVs/indels from BAM files for all target genes.

0120100

vcf tbi versions

pypgx:

A Python package for pharmacogenomics research

Prepare a depth of coverage file for all target genes with SV from BAM files.

01200

coverage versions

pypgx:

A Python package for pharmacogenomics research

Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.

010

annotations fna faa score versions

Demultiplexer for Nanopore samples

010

reads versions

Evaluate alignment data

010

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

012000

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

0101

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Render a Quarto notebook, including parametrization.

01000

html notebook artifacts params_yaml extensions versions

papermill:

Parameterize, execute, and analyze notebooks

Quality Assessment Tool for Genome Assemblies

010101

results tsv transcriptome misassemblies unaligned versions

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

0123456789101101201

vcf tbi rdata plots versions

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

Consensus module for raw de novo DNA assembly of long uncorrected reads

0123

improved_assembly versions

Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.

0

stockholm_alignment phylogeny versions

Randomly subsample sequencing reads to a specified coverage

0120

reads versions

De novo genome assembler for long uncorrected reads.

01

fasta gfa versions

RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion.

0

phylogeny phylogeny_bootstrapped versions

Create a database for RepeatModeler

01

db versions

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

Performs de novo transposable element (TE) family identification with RepeatModeler

01

fasta stk log versions

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria

01200

json disinfinder_kma pheno_table_species pheno_table pointfinder_kma pointfinder_prediction pointfinder_results pointfinder_table resfinder_hit_in_genome_seq resfinder_blast resfinder_kma resfinder_resistance_gene_seq resfinder_results_table resfinder_results_tab resfinder_results versions

resfinder:

ResFinder identifies acquired antimicrobial resistance genes in totally or partially sequenced isolates of bacteria

Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data

0

db tool_version db_version versions

rgi:

This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.

Predict antibiotic resistance from protein or nucleotide data

0100

json tsv tmp tool_version db_version versions

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Markup VCF file using rho-calls.

012010

vcf versions

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Call regions of homozygosity and make tentative UPD calls

0101

bed wig versions

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Quality control of riboseq bam data

012012012010101

predictions all transprofile versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Quality control of riboseq bam data

01201

distribution pdf offset versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Accurate detection of short and long active ORFs using Ribo-seq data

01201

protocol bam_summary read_length_dist metagene_profile_5p metagene_profile_3p metagene_plots psite_offsets pos_wig neg_wig orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Accurate detection of short and long active ORFs using Ribo-seq data

012

candidate_orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data

010101

best_offset offset offset_plot psites codon_coverage_rpf codon_coverage_psite cds_coverage cds_window_coverage ribowaltz_qc versions

Render an rmarkdown notebook. Supports parametrization.

0100

report parameterised_notebook artifacts session_info versions

rmarkdown:

Dynamic Documents for R

Calculate pan-genome from annotated bacterial assemblies in GFF3 format

01

results aln versions

Ribosomal RNA extraction from a GTF file.

0

rrna_gtf versions

Calculate expression with RSEM

010

counts_gene counts_transcript stat logs versions bam_star bam_genome bam_transcript

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Prepare a reference genome for RSEM

00

index transcript_fasta versions

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Generate statistics from a bam file

01

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Infer strandedness from sequencing reads

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate inner distance between read pairs.

010

distance freq mean pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

010

xls rscript log bed interact_bed pdf events_pdf versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

010

pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

010

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate read duplication rate

01

seq_xls pos_xls pdf rscript versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate TIN (transcript integrity number) from RNA-seq reads

0120

txt xls versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

sdf versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Converts a PED file to VCF headers

01

output versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI is not available when running under Nextflow.

01

png svg versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set

012345601

tp_vcf tp_tbi fn_vcf fn_tbi fp_vcf fp_tbi baseline_vcf baseline_tbi snp_roc non_snp_roc weighted_roc summary phasing versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Uses the RTN R package for transcriptional regulatory network inference (TNI).

01

tni tni_perm tni_bootstrap tni_filtered versions

rtn:

RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons

sage is a search software for proteomics data

010101

results_tsv results_json results_pin versions tmt_tsv lfq_tsv

sageproteomics:

Proteomics searching so fast it feels like magic.

Create index for salmon

00

index versions

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

gene/transcript quantification with Salmon

0100000

results json_info lib_format_counts versions

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

SALSA, A tool to scaffold long read assemblies with HiC

0120000

fasta agp agp_original_coordinates versions

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

0120

csv json bam versions

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

Outputs some statistics drawn from read flags.

01

stats versions

sambamba:

Tools for working with SAM/BAM data

find and mark duplicate reads in BAM file

01

bam bai versions

sambamba:

process your BAM data faster!

This module combines samtools and samblaster so that samblaster can filter or tag SAM records while both input and output stay in BAM format. Samblaster input must contain a sequence header, so the input is piped through "samtools view -h". Additional samtools arguments can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions
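
The three-stage pipeline described for this module can be sketched in Python as a command builder. This is an illustration under assumptions, not the module's actual script: the helper `build_samblaster_command` and the file names are hypothetical, and the exact flags used by the module may differ. It shows where `args2` (input side), `args` (samblaster) and `args3` (output side) would be placed.

```python
import shlex


def build_samblaster_command(bam, prefix, args="", args2="", args3=""):
    """Compose the shell pipeline: samtools view -h | samblaster | samtools view -b."""
    stages = [
        # Decode BAM to SAM and keep the header, which samblaster requires.
        ["samtools", "view", "-h", *shlex.split(args2), bam],
        # Filter or tag the SAM stream.
        ["samblaster", *shlex.split(args)],
        # Re-encode the SAM stream read from stdin as BAM.
        ["samtools", "view", *shlex.split(args3), "-b", "-"],
    ]
    pipeline = " | ".join(shlex.join(stage) for stage in stages)
    return f"{pipeline} > {shlex.quote(prefix + '.bam')}"


cmd = build_samblaster_command("sample.bam", "sample.tagged", args="--addMateTags")
print(cmd)
```

Building each stage as an argument list and quoting it with `shlex` keeps user-supplied arguments from being misinterpreted by the shell.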

Module to validate Illumina® Sample Sheet v2 files.

010

samplesheet versions

Clips read alignments where they match BED file defined regions

01000

bam stats rejects_bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format

010

reads versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

calculates MD and NM tags

0101

bam versions

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Concatenate BAM or CRAM file

01

bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

shuffles and groups reads together by their names

0101

bam cram sam versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format

01010

fastq fastq_interleaved fastq_other fastq_singleton versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Produces a consensus FASTA/FASTQ/PILEUP

01

fasta fastq pileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

convert and then index CRAM -> BAM or BAM -> CRAM file

0120101

bam cram bai crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

produces a histogram or table of coverage per chromosome

0120101

coverage versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

List CRAM Content-ID and Data-Series sizes

01

size versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Computes the depth at each position or region.

0101

tsv versions

samtools:

Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region

Create a sequence dictionary file from a FASTA file

01

dict versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index FASTA file

0101

fa fai gzi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts a SAM/BAM/CRAM file to FASTA

010

fasta interleaved singleton other versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Converts a SAM/BAM/CRAM file to FASTQ

010

fastq interleaved singleton other versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type

012

flagstat versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
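
Counting alignments per FLAG type amounts to testing each bit of the SAM FLAG field. The following Python sketch illustrates the idea on a plain list of integer flags; it is not how samtools is implemented, the function `count_flags` and the category names are hypothetical, and only the bit values themselves come from the SAM specification.

```python
from collections import Counter

# Bit values of the FLAG field as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def count_flags(flags):
    """Count how many alignments have each FLAG bit set."""
    counts = Counter()
    for flag in flags:
        for bit, name in SAM_FLAGS.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Illustrative flags: a proper pair (99, 147), an unmapped read (4),
# and a duplicate of the first read (1024 + 99).
counts = count_flags([99, 147, 4, 1024 + 99])
print(dict(counts))
```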

filter/convert SAM/BAM/CRAM file

01

readgroup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Reports alignment summary statistics for a BAM/CRAM/SAM file

012

idxstats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

converts FASTQ files to unmapped SAM/BAM/CRAM

01

sam bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index SAM/BAM/CRAM file

01

bai csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

mark duplicate alignments in a coordinate sorted file

0101

bam cram sam versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Merge BAM or CRAM file

010101

bam cram csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

BAM

0120

mpileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.

01

bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

0101

bam cram csi crai metrics versions

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Sort SAM/BAM/CRAM file

0101

bam cram crai csi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces comprehensive statistics from SAM/BAM/CRAM file

01201

stats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

012010

bam cram sam bai csi crai unselected unselected_index versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

SCIMAP is a suite of tools that enables spatial single-cell analyses

01

csv h5ad versions

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data.

SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.

01

spatial_lda_output composition_plot motif_location_plot versions

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.

Use pangenome outputs for GWAS

0120

csv versions

The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier

0100

meis_tab dels_tab vcf versions

scramble:

Soft Clipped Read Alignment Mapper

The cluster_identifier tool of Scramble identifies soft clipped clusters

0120

clusters versions

scramble:

Soft Clipped Read Alignment Mapper

Module to use scAR to remove ambient RNA from single-cell RNA-seq data

012

h5ad versions

scvitools:

scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data

scar:

scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.

Detect doublets in single-cell RNA-Seq data

01

h5ad predictions versions

scvitools:

A scalable toolkit for probabilistic modeling applied to single-cell omics data

Call peaks using SEACR on sequenced reads in bedgraph format

0120

bed versions

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

0100

alignment trans_alignments multi_bed single_bed versions

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Generate genome indices for segemehl align

0

index versions

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

metagenomic binning with self-supervised learning

012

csv model output_fasta recluster_fasta tsv versions

semibin:

Metagenomic binning with semi-supervised siamese neural network

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Create BWA index for reference genome

01

index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

01010101

bam_and_bai versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

012012010101

metrics summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the GATK DepthOfCoverage tool.

01201010101

per_locus sample_summary statistics coverage_counts coverage_proportions interval_summary versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects multiple quality metrics from a bam file

01201010

mq_metrics qd_metrics gc_summary gc_metrics aln_metrics is_metrics mq_plot qd_plot is_plot gc_plot versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

0120101

cram crai bam bai score metrics metrics_multiqc_tsv versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

modifies the input VCF file by adding the MLrejected FILTER to the variants

012010101

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

01230101010101000

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

012301010101

vcf_gz vcf_gz_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

012340101010100

vcf vcf_tbi gvcf gvcf_tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Generate recalibration table and optionally perform base quality recalibration

01201010101010

table table_post recal_alignment csv pdf versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying the Base Quality Score Recalibration to a file.

0120101

output index output_index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

01234560101

vcf vcf_tbi stats versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

01230101010101010100

orientation_data contamination_data contamination_segments stats vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

012010101201201201

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

01200000

recal idx tranches plots versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects whole genome quality metrics from a bam file

012010101

wgs_metrics versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.

01

fastq versions

seqcluster:

Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the samples of the project.

Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.

01

fasta versions

seqfu:

DNA sequence utilities for FASTX files
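
Dereplication with count summing can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function `dereplicate` is hypothetical, and the `;size=N` header annotation is an assumed convention for storing the number of identical sequences.

```python
import re

# Assumed header annotation carrying the copy number, e.g. "read7;size=3".
SIZE_RE = re.compile(r";size=(\d+)")


def dereplicate(records):
    """Collapse identical sequences and sum their copy numbers.

    records: iterable of (name, sequence) tuples.
    Returns a list of (annotated_name, sequence) in first-seen order.
    """
    counts = {}      # sequence -> total copies seen
    first_name = {}  # sequence -> name of its first occurrence
    for name, seq in records:
        match = SIZE_RE.search(name)
        copies = int(match.group(1)) if match else 1
        key = seq.upper()
        if key not in counts:
            counts[key] = 0
            first_name[key] = SIZE_RE.sub("", name)
        counts[key] += copies
    return [(f"{first_name[k]};size={counts[k]}", k) for k in counts]


result = dereplicate(
    [("a", "ACGT"), ("b", "acgt"), ("c;size=3", "ACGT"), ("d", "GGCC")]
)
print(result)
```

Reading an existing `;size=` value before counting is what allows already dereplicated input to be dereplicated again without losing abundance information.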

Statistics for FASTA or FASTQ files

01

stats multiqc versions

seqfu:

Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files

Concatenating multiple uncompressed sequence files together

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTQ to FASTA format

01

fasta versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTA/Q to tabular format, and provide various information, like sequence length, GC content/GC skew.

01

text versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Select sequences from a large file based on name/ID

010

filter versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

match up paired-end reads from two fastq files

01

reads unpaired_reads versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Use seqkit to find/replace strings within sequences and sequence headers

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx log versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Use seqkit to generate sliding windows of input fasta

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Sorts sequences by id/name/sequence/length

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Split single or paired-end fastq.gz files

01

reads versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

simple statistics of FASTA/Q files

01

stats versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert tabular format (first two/three columns) to FASTA/Q format.

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Translate DNA/RNA to protein sequence

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Salmonella serotype prediction from reads and assemblies

01

log tsv txt versions

Generates a BED file containing genomic locations of lengths of N.

01

bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.
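
Locating runs of N and reporting them as BED intervals can be illustrated with the following Python sketch. It is not the tool's own code; the function `n_runs_to_bed` and its `min_len` parameter are hypothetical. BED coordinates are 0-based and half-open, which matches the start and end offsets of a regular-expression match.

```python
import re


def n_runs_to_bed(name, sequence, min_len=1):
    """Return (chrom, start, end) tuples for each run of N of at least min_len bases."""
    rows = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_len:
            # BED intervals are 0-based and half-open, like match offsets.
            rows.append((name, match.start(), match.end()))
    return rows


bed = n_runs_to_bed("chr1", "ACGTNNNNACGNNA")
for chrom, start, end in bed:
    print(f"{chrom}\t{start}\t{end}")
```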

Interleave paired-end reads from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.

Rename sequence names in FASTQ or FASTA files.

01

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.

Subsample reads from FASTQ files

012

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk sample command subsamples sequences.

Common transformation operations on FASTA or FASTQ files.

01

fastx versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

010

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Trim low quality bases from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Sequence quality metrics for FASTQ and uBAM files.

01

json html versions

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

0100

eigenstrat plink freqsum versions

sequencetools:

Tools for population genetics on sequencing data

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

01200

seqz versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

01

wig versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format.
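The windowed GC computation described for gc_wiggle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gc_percent_windows` is hypothetical, windows are non-overlapping, and bases other than A, C, G, T are excluded from the denominator, which may differ from what Sequenza-utils does.

```python
def gc_percent_windows(seq, window):
    """Return (1-based window start, GC percentage) for consecutive windows.

    The GC percentage is None when a window contains no A/C/G/T bases.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            gc = None
        else:
            gc = 100.0 * (chunk.count("G") + chunk.count("C")) / called
        results.append((start + 1, gc))
    return results


for start, gc in gc_percent_windows("GGCCAATTGC", window=4):
    print(start, gc)
```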

Induce a variation graph in GFA format from alignments in PAF format

012

gfa versions

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

tsv txt versions

seroba:

SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

01234501

log read_qual breakpoints_double read_alignments read_ids collapsed_dup loh all_vcf all_breakpoints_clusters_list all_breakpoints_clusters all_plots somatic_vcf somatic_breakpoints_clusters_list somatic_breakpoints_clusters somatic_plots versions

Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.

010

json tsv versions

Demultiplex bgzip'd fastq files

012

sample_fastq metrics most_frequent_unmatched per_project_metrics per_sample_metrics sample_barcode_hop_metrics versions

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.

012

merged_variants versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.

0123401201201

phased_variant versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires a CPU with AVX2 support.

01234012301

phased_variant versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Program to compute switch error rate and genotyping error rate given simulated or trio data.

01234012012

errors versions

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

01

assembly gfa results versions

Print SHA256 (256-bit) checksums.

01

checksum versions

md5sum:

Create an SHA256 (256-bit) checksum.
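For readers who want to verify a checksum outside of a pipeline, a SHA256 digest can be computed with Python's standard `hashlib` module. The helper name `sha256_of_file` and the chunk size are choices made for this sketch; the file is read in blocks so that large inputs do not need to fit in memory.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Demo on a temporary file containing the three bytes "abc"
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"abc")
    demo_path = handle.name
try:
    demo_digest = sha256_of_file(demo_path)
finally:
    os.remove(demo_path)
print(demo_digest)
```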

Determine Shigella serotype from Illumina or Oxford Nanopore reads

01

tsv hits versions

Determine Shigella serotype from assemblies or Illumina paired-end reads

01

tsv versions

build and deploy Shiny apps for interactively mining differential abundance data

01230120

app versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make plots for interpretation of differential abundance statistics

010123

volcanos_png volcanos_html versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots

0123

boxplots_png boxplots_html densities_png densities_html pca2d_png pca2d_html pca3d_png pca3d_html mad_png mad_html dendro versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

validate consistency of feature and sample annotations with matrices and contrasts

0120101

sample_meta feature_meta assays contrasts versions

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Assemble bacterial isolate genomes from Illumina paired-end reads

01

contigs corrections log raw_contigs gfa versions

A windowed adaptive trimming tool for FASTQ files using quality

012

single_trimmed paired_trimmed singleton_trimmed log versions

Indexing of transcriptome for gene expression quantification using SimpleAF

010101

index transcript_tsv salmon versions

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

0120101001

results versions

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

Serovar prediction of salmonella assemblies

01

tsv allele_fasta allele_json cgmlst_csv versions

Fast, efficient, lossless compression of FASTQ files.

01

sfq versions

tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.

012

smncopynumber run_metrics versions

Linearize and simplify variation graph in GFA format using blocked partial order alignment

01

gfa maf versions

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

01230101

vcf versions

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on input snakefile. Expect many limitations.

0101

outputs snakemake_dir versions

Performs fastq alignment to a fasta reference using SNAP

0101

bam bai versions

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

Create a SNAP index for reference genome

01234

index versions

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

structural-variant calling with sniffles

012010100

vcf tbi snf versions

Core-SNP alignment from Snippy outputs

0120

aln full_aln tab vcf txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Rapid haploid variant calling

010

tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Pairwise SNP distance matrix from a FASTA sequence alignment

01

tsv versions
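A pairwise SNP distance matrix can be pictured as counting, for every pair of aligned sequences, the columns where the two bases differ. The Python sketch below assumes that only positions where both sequences carry A, C, G or T are compared; that rule and the function names are assumptions of this example rather than a statement of the tool's exact behaviour.

```python
def snp_distance(seq_a, seq_b):
    """Count alignment columns where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in valid and b in valid
    )


def distance_matrix(alignment):
    """Build a nested dict of pairwise distances from {name: aligned sequence}."""
    names = list(alignment)
    return {
        row: {col: snp_distance(alignment[row], alignment[col]) for col in names}
        for row in names
    }


alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACNTTCGA"}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, *row.values(), sep="\t")
```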

Genetic variant annotation and functional effect prediction toolbox

012

cache versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Genetic variant annotation and functional effect prediction toolbox

01001

vcf report summary_html genes_txt versions

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Annotate a VCF file with another VCF file

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Splits/Joins VCF(s) file into chromosomes

01

out_vcfs versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Rapidly extracts SNPs from a multi-FASTA alignment.

0

fasta constant_sites versions constant_sites_string

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

012010101

extract versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Local sequence alignment tool for filtering, mapping and clustering.

010101

reads log index versions

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Classifies and predicts the origin of metagenomic samples

010000

report versions

Compare many FracMinHash signatures generated by sourmash sketch.

01000

matrix labels csv versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

0100000

result unassigned matches prefetch prefetchcsv versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.

010

signature_index versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a signature (a group of FracMinHash sketches) of a sequence using sourmash

01

signatures versions

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.

010

result versions

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data

0123456700

outs versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.

0

gtf versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.

000

reference versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Assembles a small genome (bacterial, fungal, viral)

012300

scaffolds contigs transcripts gene_clusters gfa warnings log versions

Computational method for finding spa types.

0100

tsv versions

split one ubam into multiple, per line, fast

01

bam versions

Spotiflow, accurate and efficient spot detection with stereographic flow.

01

spots versions

Fast, efficient, lossless compression of FASTQ files.

012

spring versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Fast, efficient, lossless decompression of FASTQ files.

010

fastq versions

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).

0100

reads versions

sratools:

SRA Toolkit and SDK from NCBI

Download sequencing data from the NCBI Sequence Read Archive (SRA).

0100

sra versions

sratools:

SRA Toolkit and SDK from NCBI

Test for the presence of suitable NCBI settings or create them on the fly.

NO input

versions ncbi_settings

sratools:

SRA Toolkit and SDK from NCBI

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.

012

gene_results fullgene_results mlst_results pileup sorted_bam versions

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Serotype prediction of Streptococcus suis assemblies

01

tsv versions

Advanced sequence file format conversions

01000

cram gzi versions

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Predicts Staphylococcus aureus SCCmec type based on primers.

01

tsv versions

Align reads to a reference genome using STAR

010101000

log_final log_out log_progress versions bam bam_sorted bam_sorted_aligned bam_transcript bam_unsorted fastq tab spl_junc_tab read_per_gene_tab junction sam wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create index for STAR

0101

index versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.

01

results_xlsx summary_tsv detailed_summary_tsv resfinder_tsv plasmidfinder_tsv mlst_tsv settings_txt pointfinder_tsv versions

staramr:

Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.

Cell and nuclear segmentation with star-convex shapes

01

mask versions

Create a counts matrix for single-cell data using STARSolo, handling cell barcodes and UMI information.

012001

counts log_final log_out log_progress summary versions

Serotype STEC samples from paired-end reads or assemblies

01

tsv versions

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

012301234560120

input rdata plots vcf bgen versions

Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.

0101

vcf versions

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

0123400

vcf vcf_tbi genome_vcf genome_vcf_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

01234567800

vcf_indels vcf_indels_tbi vcf_snvs vcf_snvs_tbi versions

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Merges the annotation gtf file and the stringtie output gtf files

00

gtf versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Transcript assembly and quantification for RNA-Seq

010

transcript_gtf abundance coverage_gtf ballgown versions

stringtie2:

Transcript assembly and quantification for RNA-Seq

Count reads that map to genomic features

012

counts summary versions

featurecounts:

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

SummarizedExperiment container

010101

rds log versions

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Converts a bedpe file to a VCF file (beta version)

01

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

0120000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

01010100

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01000

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

01234010101010101

sv indel germ_indel germ_sv som_indel som_sv unfiltered_sv unfiltered_indel unfiltered_germ_indel unfiltered_germ_sv unfiltered_som_indel unfiltered_som_sv raw_calls discordants log versions

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

0123450101

fns fps distances log report versions

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

Build a structural variant database

010

db versions

svdb:

structural variant database software

The merge module merges structural variants within one or more vcf files.

0100

vcf tbi csi versions

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

01000000

vcf versions

svdb:

structural variant database software

Performs tests on BAF files

01234

metrics versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert an RdTest-formatted bed to the standard VCF format.

0120

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

010

standardized_vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

gt_vcf json versions

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

0123

vcf versions

Compresses/decompresses files

01

output gzi versions

bgzip:

Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.

bgzip a sorted tab-delimited genome file and then create tabix index

01

gz_tbi gz_csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

create tabix index from a sorted bgzip tab-delimited genome file

01

tbi csi versions

tabix:

Generic indexer for TAB-delimited genome position files.

Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA

01

csv_gz versions

Compress directories into tarballs with various compression options

010

archive versions

Convert taxon names to TaxIds

0120

tsv versions

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

Standardise and merge two or more taxonomic profiles into a single table

010000

merged_profiles versions

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

Standardise the output of a wide range of taxonomic profilers

01000

standardised_profile versions

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

A tool to detect resistance and lineages of M. tuberculosis genomes

01

bam csv json txt vcf versions

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Aligns sequences using T_COFFEE

01010120

alignment lib versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Compares 2 alternative MSAs to evaluate them.

012

scores versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Computes a consensus alignment using T_COFFEE

01010

alignment eval versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Computes the irmsd score for a given alignment and the structures.

00

irmsd versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package

01010120

alignment versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

01

formatted_file versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.

0101

tcs scores versions

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Telseq: a software for calculating telomere length

012010101

output versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Parses a Thermo RAW file containing mass spectra to an open file format

01

spectra versions

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

01

classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

Computes the coverage of different regions from the bam file.

0101

cov wig versions

tiddit:

TIDDIT - structural variant calling.

Identify chromosomal rearrangements.

0120101

vcf ploidy versions

sv:

Search for structural variants.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

01

explore_tsv top_sequence versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
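The TTAGG -> AACCT example is consistent with one common definition of a canonical repeat: the lexicographically smallest string among all rotations of the unit and of its reverse complement. The Python sketch below implements that definition as an assumption; it is not taken from the tidk source, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_repeat(unit):
    """Smallest rotation (lexicographically) of the unit or its reverse complement."""
    unit = unit.upper()
    candidates = []
    for strand in (unit, reverse_complement(unit)):
        for offset in range(len(strand)):
            candidates.append(strand[offset:] + strand[:offset])
    return min(candidates)


print(canonical_repeat("TTAGG"))
```

Under this definition every rotation and either strand of the same repeat collapses to a single representative, which makes repeat counts comparable across genomes.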

Plots telomeric repeat frequency against sliding window location using data produced by tidk/search

01

svg versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Searches a genome for a telomere string such as TTAGGG

010

tsv bedgraph versions

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA

010101010

fasta vcf ccf log versions

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

A post sequencing QC tool for Oxford Nanopore sequencers

01

report_data report_html plots_html plotly_js versions

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

01

pep gff3 cds dat folder versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

010

pep gff3 cds bed versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Trim FastQ files using Trim Galore!

01

reads log unpaired html zip versions

Performs quality and adapter trimming on paired end and single end reads

01

trimmed_reads unpaired_reads trim_log out_log summary versions

Assembles a de novo transcriptome from RNAseq reads

01

transcript_fasta log versions

Run TRUST4 on RNA-seq data

01201010101

tsv airr_files airr_tsv report_tsv fasta out fq outs versions

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

0123450101

fn_vcf fn_tbi fp_vcf fp_tbi tp_base_vcf tp_base_tbi tp_comp_vcf tp_comp_tbi summary versions

truvari:

Structural variant comparison tool for VCFs
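The recall, precision and f-measure mentioned above follow from the counts of true positives, false positives and false negatives. The Python sketch below shows the standard formulas; the split into baseline-side and comparison-side true positives mirrors the `tp_base` and `tp_comp` outputs listed for this module, but how the tool itself combines them is an assumption here, and the function name is hypothetical.

```python
def bench_metrics(tp_base, tp_comp, fp, fn):
    """Return (recall, precision, f1) from benchmark counts.

    recall    = tp_base / (tp_base + fn)
    precision = tp_comp / (tp_comp + fp)
    f1        = harmonic mean of precision and recall
    """
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


recall, precision, f1 = bench_metrics(tp_base=9, tp_comp=9, fp=1, fn=1)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```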

Over multiple vcfs, calculate their intersection/consistency.

01

consistency versions

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjointed genomic regions

01

vcf versions

truvari:

Structural variant comparison tool for VCFs

Cluster contigs from multiple assemblies by similarity

012

cluster_dir versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Subsample a long-read sequencing fastq file for multiple assemblies

01

subreads versions

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support

01000

tsebra_gtf tsebra_scores versions

Import transcript-level abundances and estimated counts for gene-level analysis packages

01010

tpm_gene counts_gene counts_gene_length_scaled counts_gene_scaled lengths_gene tpm_transcript counts_transcript lengths_transcript versions

tximeta:

Transcript Quantification Import with Automatic Metadata

Remove lines from bed file that refer to off-chromosome locations.

010

bedgraph versions

ucsc:

Remove lines from bed file that refer to off-chromosome locations.

Convert a bedGraph file to bigWig format.

010

bigwig versions

ucsc:

Convert a bedGraph file to bigWig format.

Convert file from bed to bigBed format

0100

bigbed versions

ucsc:

Convert file from bed to bigBed format

compute average score of bigwig over bed file

010

tab versions

ucsc:

Compute average score of big wig over each bed, which may have introns.

Convert GTF files to GenePred format

01

genepred refflat versions

ucsc:

Convert GTF files to GenePred format

convert between genome builds

010

lifted unlifted versions

ucsc:

Move annotations from one assembly to another

Convert ascii format wig file to binary big wig format

010

bw versions

ucsc:

Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

01001

bam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

00

index versions

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

0100

sam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.

010

fastq no_match_fastq report versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam fastq log versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

0120

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
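The idea of deduplicating on mapping coordinate plus UMI can be sketched as below. This is a deliberately simplified stand-in: it matches UMIs exactly, whereas UMI-tools also offers error-aware grouping of similar UMIs, which is not modelled here. The read dictionaries and the `dedup_exact` name are hypothetical.

```python
def dedup_exact(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) key.

    Each read is a hypothetical dict with those four keys; UMIs must match
    exactly to be treated as duplicates in this sketch.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```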

Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place

01

reads log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
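As a rough illustration of the extraction step, the sketch below moves a fixed-length UMI from the 5' end of a read into the read name. The fixed length, the 5' position, and the underscore separator are assumptions made for this example; `extract_umi` is a hypothetical helper and does not reproduce the barcode pattern syntax of UMI-tools.

```python
def extract_umi(name, seq, qual, umi_len):
    """Move the first umi_len bases of a read into its name.

    Returns the new (name, sequence, quality) triple. Any comment after the
    first space in the read name is preserved.
    """
    umi = seq[:umi_len]
    read_id, _, rest = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {rest}" if rest else "")
    return new_name, seq[umi_len:], qual[umi_len:]
```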

Group reads based on their UMI and mapping coordinates

01200

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

012

bam log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Assembles bacterial genomes

012

scaffolds gfa log versions

Module to run UniverSC an open-source pipeline to demultiplex and process single-cell RNA-Seq data

010

outs versions

Extract files.

01

untar versions

Extract files.

01

files versions

untar:

Extract tar.gz files.

Unzip ZIP archive files

01

unzipped_archive versions

Unzip ZIP archive files

01

files versions

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.

Simple software to call UPD regions from germline exome/wgs trios.

01

bed versions

Aligns protein structures using UPP

01010

alignment versions

upp:

SATe-enabled phylogenetic placement

The Java port of the VarDict variant caller

01230101

vcf versions

Filtering, downsampling and profiling alignments in BAM/CRAM formats

01

bam versions

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

01200

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To assess candidate indel and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

010101

alignment_properties_json versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

bcf_gz vcf_gz bcf vcf versions

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

010101010

cgh versions

A tool to create a Gemini-compatible DB file from an annotated VCF

012

db versions

vcf2maf

0100

maf versions

quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files

0123000

vcf versions

If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
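Breaking a multi-allelic record can be sketched as follows. The record layout (a plain dict) and the `per_alt_keys` argument are hypothetical: in a real VCF, which INFO fields carry one value per ALT allele is declared in the header, and genotype columns would also need rewriting, which this sketch does not attempt.

```python
def break_multi(record, per_alt_keys):
    """Split one multi-allelic record into one record per ALT allele.

    INFO keys listed in per_alt_keys are assumed to hold one comma-separated
    value per ALT allele; all other INFO values are copied unchanged.
    """
    alts = record["alt"].split(",")
    out = []
    for i, alt in enumerate(alts):
        info = {}
        for key, value in record["info"].items():
            if key in per_alt_keys:
                info[key] = value.split(",")[i]
            else:
                info[key] = value
        out.append({**record, "alt": alt, "info": info})
    return out
```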

Command line tools for parsing and manipulating VCF files.

012

vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
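Deriving AC (alternate allele count) and NS (number of samples with data) from genotypes can be sketched like this. The `ac_and_ns` helper is hypothetical, and counting a sample as having data when at least one of its alleles is called is an assumption of this example rather than documented vcflib behaviour.

```python
import re

def ac_and_ns(genotypes, n_alt):
    """Return (AC per ALT allele, NS) from GT strings such as '0/1' or '1|1'."""
    ac = [0] * n_alt
    ns = 0
    for gt in genotypes:
        # Split on phased or unphased separators and drop missing alleles.
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if alleles:
            ns += 1
        for allele in alleles:
            idx = int(allele)
            if idx > 0:  # index 0 is the reference allele
                ac[idx - 1] += 1
    return ac, ns
```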

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

012

vcf versions

vcflib:

Command-line tools for manipulating VCF files
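The uniq-style behaviour described above, where a record is compared only with the one immediately before it, can be sketched as follows. The dict-based record layout and the `uniq_records` name are hypothetical; the chromosome is included in the comparison key as an assumption.

```python
def uniq_records(records):
    """Drop a record when chrom, pos, REF and ALT match the previous record.

    Like GNU uniq, only adjacent duplicates are removed, so the input is
    expected to be sorted.
    """
    out = []
    prev_key = None
    for rec in records:
        key = (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
        if key != prev_key:
            out.append(rec)
        prev_key = key
    return out
```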

A set of tools written in Perl and C++ for working with VCF files

0100

vcf bcf frq frq_count idepth ldepth ldepth_mean gdepth hap_ld geno_ld geno_chisq list_hap_ld list_geno_ld interchrom_hap_ld interchrom_geno_ld tstv tstv_summary tstv_count tstv_qual filter_summary sites_pi windowed_pi weir_fst heterozygosity hwe tajima_d freq_burden lroh relatedness relatedness2 lqual missing_individual missing_site snp_density kept_sites removed_sites singeltons indel_hist hapcount mendel format info genotypes_matrix genotypes_matrix_individual genotypes_matrix_position impute_hap impute_hap_legend impute_hap_indv ldhat_sites ldhat_locs beagle_gl beagle_pl ped map_ tped tfam diff_sites_in_files diff_indv_in_files diff_sites diff_indv diff_discd_matrix diff_switch_error versions

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links. If velocyto finds a file in the work dir named EXACTLY cellsorted_[ORIGINAL_BAM_NAME], it skips the samtools sort step. The cellsorted BAM file should be sorted by cell barcode with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cellsort input bam using SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

The optional mask must be passed via ext.args with the --mask option. This is why two BAM files (cellsorted and original) need to be staged in the work dir. See also the velocyto tutorial.

01230

loom versions
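The naming rule in the velocyto description can be checked with a small helper before launching a run. This is a sketch under the assumption that the cellsorted file sits in the same directory as the original BAM; `cellsorted_path` and `will_skip_sort` are hypothetical names, not part of velocyto.

```python
from pathlib import Path

def cellsorted_path(bam_path):
    """Return the file name velocyto is described as looking for.

    The name is 'cellsorted_' followed by the original BAM file name,
    placed alongside the original BAM.
    """
    bam = Path(bam_path)
    return bam.with_name(f"cellsorted_{bam.name}")

def will_skip_sort(bam_path):
    """True if a cellsorted companion file already exists next to the BAM."""
    return cellsorted_path(bam_path).is_file()
```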

Detecting and estimating inter-sample DNA contamination became a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

0120

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination became a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

01201200

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

graph versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

0100

vcf versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Create index files for a variation graph

01

xg vg_index versions

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
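The input convention described above (a > name line followed by two sequences joined with &) can be assembled with a few lines of Python. `cofold_input` is a hypothetical helper for building that text; it does not call RNAcofold itself.

```python
def cofold_input(name, seq_a, seq_b):
    """Build one input entry: a '>' name line, then 'seqA&seqB'."""
    for seq in (seq_a, seq_b):
        if not seq or "&" in seq:
            raise ValueError("each sequence must be non-empty and contain no '&'")
    return f">{name}\n{seq_a}&{seq_b}\n"
```

The resulting string could then be written to a file or piped to the program on stdin.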

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

calculate locally stable secondary structures of RNAs

0

rnalfold_txt versions

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.
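For a sense of scale, the stated bounds can be evaluated directly. The sketch below reads them as n + L*L memory units and n*L*L time units and ignores constant factors; `lfold_cost` is a hypothetical helper, not part of ViennaRNA.

```python
def lfold_cost(n, span):
    """Return (memory_units, time_units) for sequence length n and span L.

    Units are abstract: only the growth with n and L is meaningful.
    """
    memory_units = n + span * span
    time_units = n * span * span
    return memory_units, time_units
```

With a fixed span, both quantities grow linearly in n, which is what makes scanning long sequences feasible.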

Use vireo to perform donor deconvolution for multiplexed scRNA-seq data

01234

summary donor_ids prob_singlets prob_doublets versions

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

01

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

0101

bins membership summary versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
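A single-pass, greedy centroid-based scheme can be sketched as follows. This is not how VSEARCH measures similarity: the sketch swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in for pairwise alignment identity, and `greedy_cluster` and its `threshold` parameter are hypothetical names.

```python
from difflib import SequenceMatcher

def greedy_cluster(seqs, threshold=0.9):
    """Assign each sequence to the first centroid it is similar enough to.

    Sequences are processed once, in input order. A sequence that matches no
    existing centroid becomes the centroid of a new cluster.
    """
    centroids = []
    clusters = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            similarity = SequenceMatcher(None, centroid, seq, autojunk=False).ratio()
            if similarity >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid was close enough: start a new cluster.
            centroids.append(seq)
            clusters.append([seq])
    return clusters
```

Because the pass is greedy, the result depends on input order, which is why such tools typically sort sequences (for example by length or abundance) first.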

Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
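The identity rule quoted above (same length, same nucleotides, case insensitive, T and U equivalent) amounts to normalising each sequence before counting it. A minimal sketch, with `dereplicate` as a hypothetical name:

```python
def dereplicate(seqs):
    """Count strictly identical sequences after normalisation.

    Normalisation upper-cases the sequence and treats U as T. The returned
    dict maps each unique sequence to its abundance, in first-seen order.
    """
    counts = {}
    for seq in seqs:
        key = seq.upper().replace("U", "T")
        counts[key] = counts.get(key, 0) + 1
    return counts
```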

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

010

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

010

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
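The two orderings can be sketched as below, assuming abundance is carried in the FASTA header as a `size=N` annotation separated by semicolons. Treating a missing annotation as abundance 1 is an assumption of this example, and the function names are hypothetical.

```python
import re

def abundance(header):
    """Read the size=N annotation from a header, defaulting to 1 if absent."""
    match = re.search(r"(?:^|;)size=(\d+)", header)
    return int(match.group(1)) if match else 1

def sort_by_size(records):
    """Sort (header, sequence) pairs by decreasing abundance."""
    return sorted(records, key=lambda rec: abundance(rec[0]), reverse=True)

def sort_by_length(records):
    """Sort (header, sequence) pairs by decreasing sequence length."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)
```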

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

010000

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

decomposes multiallelic variants into biallelic in a VCF file.

012

vcf versions

vt:

A tool set for short variant discovery in genetic sequence data

normalizes variants in a VCF file

01230101

vcf fai versions

vt:

A tool set for short variant discovery in genetic sequence data

a pangenome-scale aligner

0123400

paf versions

simulating sequence reads from a reference genome

01

fastq versions

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

01200

vcf tbi graph versions

Masks out highly repetitive DNA sequences with low complexity in a genome

01

converted versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program to generate frequency counts of repetitive units.

01

counts versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program that takes a counts file and creates a file of genomic co-ordinates to be masked.

0101

intervals versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

Convert and filter aligned reads to .npz

0120101

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase

0101

gender versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Create a new reference using healthy reference samples

01

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Find copy number aberrations

010101

aberrations_bed bins_bed segments_bed chr_statistics chr_plots genome_plot versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

A large variant benchmarking tool analogous to hap.py for small variants.

01234

report bench_vcf bench_vcf_tbi versions

Fast lightweight accurate xenograft sorting

00000

hash info versions

xengsort:

A fast xenograft read sorter based on space-efficient k-mer hashing

The xeniumranger import-segmentation module allows you to specify 2D nuclei and/or cell segmentation results for assigning transcripts to cells and recalculate all Xenium Onboard Analysis (XOA) outputs that depend on segmentation. Segmentation results can be generated by community-developed tools or taken from a prior Xenium segmentation result.

01000000

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.

010

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

0100

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.

010000

outs versions

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

Compresses files with xz.

01

archive versions

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

Decompresses files with xz.

01

file versions

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.
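For readers working outside the command line, the same .xz container format can be written and read with the `lzma` module from the Python standard library. The sketch below is a round trip in memory; the two wrapper function names are hypothetical.

```python
import lzma

def xz_compress(data: bytes) -> bytes:
    """Compress bytes into the .xz container format."""
    return lzma.compress(data, format=lzma.FORMAT_XZ)

def xz_decompress(blob: bytes) -> bytes:
    """Decompress bytes that were stored in the .xz container format."""
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)
```

For files on disk, `lzma.open` offers a file-like interface to the same format.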

Performs assembly scaffolding using YaHS

0100

scaffolds_fasta scaffolds_agp binary versions

a tool to build k-mer hash table for fasta and fastq files

01

yak versions

yak:

Yet another k-mer analyzer
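What a k-mer count table holds can be illustrated with a plain Python counter. This sketch says nothing about how yak stores or hashes k-mers and does not merge a k-mer with its reverse complement; `count_kmers` is a hypothetical helper.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```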

Builds a YARA index for a reference genome

01

index versions

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Align reads to a reference genome using YARA

0101

bam bai versions

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Compress file lists to produce ZIP archive files

01

zipped_archive versions

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
