Available Modules

Modules are the basic building blocks of DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • bam 178
  • fasta 176
  • genomics 146
  • vcf 128
  • fastq 125
  • metagenomics 94
  • genome 86
  • alignment 76
  • index 73
  • assembly 67
  • reference 67
  • bed 66
  • gatk4 63
  • cram 58
  • sort 57
  • sam 50
  • annotation 43
  • filter 41
  • structural variants 40
  • variant calling 40
  • database 39
  • align 38
  • merge 35
  • gff 33
  • bacteria 32
  • download 29
  • statistics 29
  • coverage 28
  • map 28
  • variants 27
  • classification 26
  • quality control 25
  • qc 25
  • gtf 24
  • nanopore 23
  • classify 23
  • k-mer 22
  • cnv 21
  • taxonomy 20
  • MSA 19
  • variant 19
  • split 19
  • single-cell 19
  • gfa 18
  • contamination 18
  • taxonomic profiling 18
  • pacbio 17
  • somatic 17
  • sentieon 17
  • conversion 17
  • clustering 16
  • convert 16
  • quality 15
  • proteomics 15
  • binning 15
  • count 15
  • long reads 14
  • VCF 14
  • copy number 14
  • ancient DNA 14
  • protein 14
  • build 14
  • rnaseq 14
  • trimming 13
  • phage 13
  • phylogeny 13
  • bcftools 13
  • contigs 13
  • imaging 13
  • bedtools 13
  • kmer 13
  • imputation 13
  • consensus 12
  • variation graph 12
  • mags 12
  • metrics 12
  • bqsr 12
  • cluster 12
  • graph 12
  • databases 12
  • gvcf 12
  • sv 12
  • bisulfite 12
  • isoseq 12
  • example 12
  • reporting 12
  • taxonomic classification 11
  • long-read 11
  • bisulphite 11
  • compression 11
  • picard 11
  • QC 11
  • methylation 11
  • wgs 11
  • cna 11
  • table 11
  • methylseq 11
  • virus 11
  • indexing 11
  • visualisation 11
  • illumina 11
  • mapping 10
  • histogram 10
  • depth 10
  • antimicrobial resistance 10
  • tsv 10
  • serotype 10
  • sequences 10
  • searching 10
  • demultiplex 10
  • protein sequence 10
  • plink2 10
  • stats 10
  • openms 10
  • 5mC 10
  • samtools 9
  • base quality score recalibration 9
  • aDNA 9
  • haplotype 9
  • matrix 9
  • expression 9
  • amr 9
  • pangenome graph 9
  • repeat 9
  • demultiplexing 9
  • validation 9
  • DNA methylation 9
  • filtering 9
  • segmentation 9
  • plot 9
  • bins 9
  • WGBS 9
  • neural network 9
  • structure 9
  • mmseqs2 9
  • scWGBS 9
  • pairs 9
  • markduplicates 9
  • aligner 8
  • bisulfite sequencing 8
  • completeness 8
  • palaeogenomics 8
  • germline 8
  • transcriptome 8
  • metagenome 8
  • annotate 8
  • archaeogenomics 8
  • bwa 8
  • machine learning 8
  • LAST 8
  • phasing 8
  • bcf 8
  • cooler 8
  • damage 8
  • gzip 8
  • db 8
  • HMM 8
  • mappability 8
  • gene 8
  • transcript 8
  • biscuit 8
  • low-coverage 8
  • genotype 8
  • iCLIP 8
  • sequence 8
  • checkm 8
  • seqkit 8
  • glimpse 7
  • umi 7
  • mkref 7
  • genomes 7
  • msa 7
  • complexity 7
  • newick 7
  • ucsc 7
  • prediction 7
  • hmmsearch 7
  • decompression 7
  • duplicates 7
  • mag 7
  • ncbi 7
  • bismark 7
  • peaks 7
  • evaluation 7
  • genotyping 7
  • hmmer 7
  • feature 7
  • gff3 7
  • kraken2 7
  • blast 7
  • dedup 7
  • population genetics 7
  • sketch 7
  • spatial 7
  • cnvkit 6
  • rna 6
  • tumor-only 6
  • report 6
  • bedGraph 6
  • multiple sequence alignment 6
  • prokaryote 6
  • call 6
  • single 6
  • short-read 6
  • csv 6
  • sequencing 6
  • antimicrobial peptides 6
  • scaffold 6
  • splicing 6
  • json 6
  • scRNA-seq 6
  • kmers 6
  • NCBI 6
  • BGC 6
  • low frequency variant calling 6
  • extract 6
  • immunoinformatics 6
  • GPU-accelerated 6
  • fungi 6
  • mirna 6
  • reads 6
  • pangenome 6
  • plasmid 6
  • biosynthetic gene cluster 6
  • snp 6
  • de novo 6
  • deduplication 6
  • antimicrobial resistance genes 6
  • profile 6
  • vsearch 6
  • differential 6
  • structural 6
  • mitochondria 6
  • clipping 6
  • BCR 5
  • ont 5
  • immunology 5
  • riboseq 5
  • cat 5
  • concatenate 5
  • svtk 5
  • FASTQ 5
  • idXML 5
  • sourmash 5
  • diversity 5
  • fastx 5
  • single cell 5
  • contig 5
  • bgzip 5
  • MAF 5
  • reference-free 5
  • indels 5
  • tabular 5
  • text 5
  • cool 5
  • arg 5
  • regions 5
  • fragment 5
  • profiling 5
  • counts 5
  • mutect2 5
  • ptr 5
  • de novo assembly 5
  • compare 5
  • gridss 5
  • benchmark 5
  • isolates 5
  • kallisto 5
  • visualization 5
  • preprocessing 5
  • wxs 5
  • distance 5
  • interval 5
  • coptr 5
  • prokaryotes 5
  • adapters 5
  • merging 5
  • mpileup 5
  • amps 5
  • query 5
  • view 5
  • 3-letter genome 5
  • genome mining 5
  • chromosome 5
  • microbiome 5
  • summary 5
  • deamination 5
  • antibiotic resistance 5
  • RNA-seq 5
  • detection 5
  • mem 5
  • eukaryotes 5
  • paf 4
  • containment 4
  • fai 4
  • hmmcopy 4
  • microscopy 4
  • deep learning 4
  • gsea 4
  • sylph 4
  • antibiotics 4
  • RiPP 4
  • NRPS 4
  • HiFi 4
  • phylogenetic placement 4
  • microsatellite 4
  • happy 4
  • dbCAN 4
  • resistance 4
  • STR 4
  • cut 4
  • malt 4
  • chunk 4
  • transcriptomics 4
  • fgbio 4
  • ccs 4
  • dna 4
  • skani 4
  • embeddings 4
  • compress 4
  • miscoding lesions 4
  • genome assembler 4
  • copy number alteration calling 4
  • bedpe 4
  • reports 4
  • mask 4
  • palaeogenetics 4
  • telomere 4
  • hybrid capture sequencing 4
  • targeted sequencing 4
  • archaeogenetics 4
  • antismash 4
  • hic 4
  • ATAC-seq 4
  • DNA sequencing 4
  • microarray 4
  • ganon 4
  • normalization 4
  • hashing-based deconvolution 4
  • union 4
  • quantification 4
  • bedgraph 4
  • ngscheckmate 4
  • CLIP 4
  • logratio 4
  • umitools 4
  • propr 4
  • scaffolding 4
  • snps 4
  • bcl2fastq 4
  • benchmarking 4
  • CAZyme gene Cluster 4
  • image 4
  • nucleotide 4
  • SV 4
  • diamond 4
  • DNA sequence 4
  • peak-calling 4
  • genmod 4
  • abundance 4
  • ranking 4
  • antibody 4
  • pypgx 4
  • ancestry 4
  • redundancy 4
  • add 4
  • mtDNA 4
  • matching 4
  • secondary metabolites 4
  • circrna 4
  • ampir 4
  • family 4
  • fusion 4
  • xeniumranger 4
  • interval_list 4
  • haplotypecaller 4
  • clean 4
  • retrotransposon 4
  • bin 4
  • public datasets 4
  • enrichment 4
  • CAZyme 4
  • sample 4
  • parsing 4
  • bigwig 4
  • isomir 4
  • entrez 3
  • npz 3
  • organelle 3
  • quality trimming 3
  • adapter trimming 3
  • fetch 3
  • notebook 3
  • windowmasker 3
  • shapeit 3
  • khmer 3
  • typing 3
  • fastk 3
  • long read 3
  • html 3
  • krona chart 3
  • RNA 3
  • pseudoalignment 3
  • krona 3
  • bwameth 3
  • pileup 3
  • rna_structure 3
  • rsem 3
  • decontamination 3
  • host 3
  • fcs-gx 3
  • variant_calling 3
  • chimeras 3
  • neubi 3
  • anndata 3
  • bakta 3
  • miRNA 3
  • amplicon 3
  • amplify 3
  • dictionary 3
  • ligate 3
  • image_analysis 3
  • rtg-tools 3
  • mcmicro 3
  • transcripts 3
  • score 3
  • genome assembly 3
  • duplication 3
  • highly_multiplexed_imaging 3
  • comparison 3
  • DRAMP 3
  • scanpy 3
  • amplicon sequencing 3
  • ataqv 3
  • UMI 3
  • population genomics 3
  • short reads 3
  • prokka 3
  • mapper 3
  • subsample 3
  • SNP 3
  • seqtk 3
  • cfDNA 3
  • wastewater 3
  • gatk4spark 3
  • repeat expansion 3
  • hidden Markov model 3
  • indel 3
  • virulence 3
  • bamtools 3
  • spark 3
  • fusions 3
  • guide tree 3
  • mzml 3
  • PacBio 3
  • somatic variants 3
  • ambient RNA removal 3
  • amplicon sequences 3
  • macrel 3
  • observations 3
  • untar 3
  • kinship 3
  • identity 3
  • relatedness 3
  • popscle 3
  • CRISPR 3
  • checkv 3
  • genotype-based deconvoltion 3
  • polishing 3
  • prefetch 3
  • uncompress 3
  • microbes 3
  • comparisons 3
  • roh 3
  • C to T 3
  • das tool 3
  • combine 3
  • das_tool 3
  • structural_variants 3
  • informative sites 3
  • tabix 3
  • dist 3
  • cut up 3
  • transposons 3
  • mlst 3
  • survivor 3
  • archiving 3
  • read depth 3
  • zip 3
  • unzip 3
  • replace 3
  • msisensor-pro 3
  • dump 3
  • insert 3
  • intervals 3
  • spaceranger 3
  • png 3
  • converter 3
  • wig 3
  • chip-seq 3
  • atac-seq 3
  • kraken 3
  • retrotransposons 3
  • minimap2 3
  • hi-c 3
  • fingerprint 3
  • vrhyme 3
  • lineage 3
  • PCA 3
  • pangolin 3
  • arriba 3
  • bacterial 3
  • fam 3
  • lossless 3
  • mkfastq 3
  • angsd 3
  • pairsam 3
  • uLTRA 3
  • scores 3
  • cellranger 3
  • pan-genome 3
  • deeparg 3
  • covid 3
  • aln 3
  • gene expression 3
  • small indels 3
  • panel 3
  • proteome 3
  • bracken 3
  • complement 3
  • long_read 3
  • remove 3
  • bim 3
  • junctions 2
  • leviosam2 2
  • lift 2
  • nextclade 2
  • homologs 2
  • micro-satellite-scan 2
  • tumor 2
  • repeats 2
  • rrna 2
  • duplicate 2
  • metamaps 2
  • msi 2
  • instability 2
  • tags 2
  • Duplication purging 2
  • haplogroups 2
  • salmon 2
  • ChIP-seq 2
  • Read depth 2
  • graph layout 2
  • polish 2
  • Hi-C 2
  • rgfa 2
  • orf 2
  • kma 2
  • phase 2
  • RNA sequencing 2
  • removal 2
  • gene set analysis 2
  • long terminal retrotransposon 2
  • MSI 2
  • mirdeep2 2
  • runs_of_homozygosity 2
  • small variants 2
  • archaea 2
  • long terminal repeat 2
  • MCMICRO 2
  • ped 2
  • gstama 2
  • tnhaplotyper2 2
  • tama 2
  • concordance 2
  • multiallelic 2
  • trancriptome 2
  • ome-tif 2
  • smrnaseq 2
  • genome taxonomy database 2
  • gene set 2
  • profiles 2
  • homoploymer 2
  • demultiplexed reads 2
  • authentication 2
  • variation 2
  • adapter 2
  • translation 2
  • differential expression 2
  • mash 2
  • bustools 2
  • preseq 2
  • transcriptomic 2
  • xz 2
  • minhash 2
  • archive 2
  • tree 2
  • COBS 2
  • zlib 2
  • taxon name 2
  • library 2
  • screen 2
  • aggregate 2
  • taxids 2
  • pair 2
  • barcode 2
  • subset 2
  • HOPS 2
  • bfiles 2
  • variant pruning 2
  • amptransformer 2
  • RNA-Seq 2
  • serogroup 2
  • import 2
  • simulate 2
  • lofreq 2
  • interactive 2
  • krakenuniq 2
  • ampgram 2
  • artic 2
  • krakentools 2
  • mudskipper 2
  • checksum 2
  • polyA_tail 2
  • reformatting 2
  • read-group 2
  • purge duplications 2
  • edit distance 2
  • reformat 2
  • structural-variants 2
  • primer 2
  • mapcounter 2
  • dereplicate 2
  • functional analysis 2
  • resolve_bioscience 2
  • spatial_transcriptomics 2
  • function 2
  • genetics 2
  • hlala_typing 2
  • hla_typing 2
  • hlala 2
  • hla 2
  • amino acid 2
  • WGS 2
  • regression 2
  • orthology 2
  • refine 2
  • maximum likelihood 2
  • iphop 2
  • parallelized 2
  • interactions 2
  • k-mer index 2
  • instrain 2
  • megan 2
  • ichorcna 2
  • orthologs 2
  • mass spectrometry 2
  • bloom filter 2
  • k-mer frequency 2
  • GC content 2
  • cgMLST 2
  • pharokka 2
  • assembly evaluation 2
  • MaltExtract 2
  • quarto 2
  • nucleotides 2
  • haplotypes 2
  • satellite data 2
  • helitron 2
  • GEO 2
  • Streptococcus pneumoniae 2
  • spatial transcriptomics 2
  • metagenomic 2
  • identifier 2
  • calling 2
  • duplex 2
  • ancient dna 2
  • expansionhunterdenovo 2
  • repeat_expansions 2
  • metadata 2
  • windows 2
  • tab 2
  • intersection 2
  • switch 2
  • CNV 2
  • UMIs 2
  • plant 2
  • cancer genomics 2
  • deconvolution 2
  • bayesian 2
  • salmonella 2
  • SimpleAF 2
  • estimation 2
  • gwas 2
  • rename 2
  • transformation 2
  • merge mate pairs 2
  • sequenzautils 2
  • reads merging 2
  • trim 2
  • bamUtil 2
  • find 2
  • pigz 2
  • svdb 2
  • scanner 2
  • unaligned 2
  • vizgen 2
  • emboss 2
  • recombination 2
  • norm 2
  • spatial_omics 2
  • insilico 2
  • random forest 2
  • version 2
  • metagenomes 2
  • join 2
  • normalize 2
  • settings 2
  • correction 2
  • scatter 2
  • reheader 2
  • effect prediction 2
  • sra-tools 2
  • structural-variant calling 2
  • fasterq-dump 2
  • vg 2
  • snpeff 2
  • heatmap 2
  • cnvnator 2
  • samplesheet 2
  • de novo assembler 2
  • vpt 2
  • shigella 2
  • metacache 2
  • small genome 2
  • eigenstrat 2
  • vcflib 2
  • validate 2
  • signature 2
  • intersect 2
  • format 2
  • eido 2
  • blastp 2
  • deseq2 2
  • rna-seq 2
  • concat 2
  • FracMinHash sketch 2
  • tbi 2
  • standardization 2
  • cvnkit 2
  • region 2
  • gatk 2
  • fixmate 2
  • taxon tables 2
  • otu tables 2
  • xenograft 2
  • registration 2
  • image_processing 2
  • allele 2
  • eCLIP 2
  • dict 2
  • substrings 2
  • gene labels 2
  • soft-clipped clusters 2
  • hostile 2
  • human removal 2
  • awk 2
  • standardisation 2
  • mitochondrion 2
  • collate 2
  • graft 2
  • long-read sequencing 2
  • rtgtools 2
  • genomad 2
  • Pharmacogenetics 2
  • gem 2
  • proportionality 2
  • secondary structure 2
  • frame-shift correction 2
  • sequence analysis 2
  • doublets 2
  • blastn 2
  • BAM 2
  • baf 2
  • pharmacogenetics 2
  • immunoprofiling 2
  • bam2fq 2
  • DNA 2
  • vdj 2
  • screening 2
  • joint genotyping 2
  • panelofnormals 2
  • qualty 2
  • ragtag 2
  • filtermutectcalls 2
  • cleaning 2
  • nacho 2
  • nanostring 2
  • interval list 2
  • mRNA 2
  • allele-specific 2
  • datacube 2
  • copyratios 2
  • samples 2
  • realignment 2
  • varcal 2
  • bases 2
  • microbial 2
  • parse 2
  • splice 2
  • sizes 2
  • genome bins 2
  • snpsift 2
  • corrupted 2
  • evidence 2
  • single cells 2
  • taxonomic profile 2
  • standardise 2
  • trgt 2
  • split_kmers 2
  • resfinder 1
  • ucsc/liftover 1
  • VCFtools 1
  • plastid 1
  • upd 1
  • quality-control 1
  • files 1
  • verifybamid 1
  • install 1
  • dbsnp 1
  • DNA contamination estimation 1
  • mgf 1
  • deduplicate 1
  • downsample bam 1
  • parquet 1
  • parser 1
  • standardize 1
  • raw 1
  • detect 1
  • uniparental 1
  • gaps 1
  • lua 1
  • toml 1
  • parallel 1
  • scRNA-Seq 1
  • downsample 1
  • snv 1
  • resistance genes 1
  • disomy 1
  • introns 1
  • maf 1
  • subsample bam 1
  • tumour contamination 1
  • genotypegvcf 1
  • joint-genotyping 1
  • gemini 1
  • transform 1
  • vcfbreakmulti 1
  • vcf2db 1
  • uniq 1
  • umicollapse 1
  • fusion report 1
  • mashmap 1
  • idx 1
  • workflow_mode 1
  • structural bioinformatics 1
  • fusion_report 1
  • dnamodelapply 1
  • sompy 1
  • readwriter 1
  • denoisereadcounts 1
  • peak picking 1
  • createreadcountpanelofnormals 1
  • workflow 1
  • htseq 1
  • site frequency spectrum 1
  • snakemake 1
  • mkdssp 1
  • sliding 1
  • ancestral alleles 1
  • derived alleles 1
  • tnfilter 1
  • hicPCA 1
  • eigenvectors 1
  • dnascope 1
  • dssp 1
  • proteus 1
  • boxcox 1
  • reverse complement 1
  • Read coverage histogram 1
  • simulation 1
  • hmmfetch 1
  • propd 1
  • decompose 1
  • Escherichia coli 1
  • transmembrane 1
  • clr 1
  • decoy 1
  • alr 1
  • blat 1
  • confidence 1
  • genome graph 1
  • chloroplast 1
  • bgen 1
  • tnseq 1
  • tnscope 1
  • python 1
  • readproteingroups 1
  • c to t 1
  • mutect 1
  • immcantation 1
  • usearch 1
  • structural variant 1
  • bam2fastx 1
  • vsearch/sort 1
  • sintax 1
  • eigenvector 1
  • linkbins 1
  • bam2fastq 1
  • extractunbinned 1
  • pangenome-scale 1
  • airrseq 1
  • graph projection to vcf 1
  • co-orthology 1
  • homology 1
  • sequence similarity 1
  • spectral clustering 1
  • comparative genomics 1
  • construct 1
  • deep variant 1
  • long read alignment 1
  • all versus all 1
  • adna 1
  • copy number variation 1
  • mapad 1
  • geo 1
  • yahs 1
  • array_cgh 1
  • cytosure 1
  • purity 1
  • vector 1
  • normal 1
  • copy number alterations 1
  • wavefront 1
  • gender determination 1
  • copy number analysis 1
  • copy-number 1
  • gprofiler2 1
  • gost 1
  • compartment signal 1
  • wham 1
  • whamg 1
  • rad 1
  • groupby 1
  • circular 1
  • cancer evolution 1
  • mzML 1
  • processing masks 1
  • hmmpress 1
  • hhsuite 1
  • 16S 1
  • CRISPRi 1
  • taxonomic composition 1
  • prepare 1
  • phylogenies 1
  • catpack 1
  • Computational Immunology 1
  • Bioinformatics Tools 1
  • Immune Deconvolution 1
  • binary masks 1
  • doublet 1
  • patterns 1
  • hmmscan 1
  • PCR 1
  • paired reads re-pairing 1
  • tandem repeats 1
  • short 1
  • intron 1
  • masking 1
  • low-complexity 1
  • GFF/GTF 1
  • trio binning 1
  • shuffleBed 1
  • junction 1
  • SNV 1
  • Indel 1
  • host removal 1
  • haploype 1
  • impute 1
  • reference compression 1
  • reference panel 1
  • regex 1
  • fix 1
  • nm 1
  • logFC 1
  • nuclear segmentation 1
  • import segmentation 1
  • solo 1
  • scvi 1
  • p-value 1
  • significance statistic 1
  • subsetting 1
  • relabel 1
  • barcodes 1
  • doublet_detection 1
  • refflat 1
  • quality_control 1
  • emoji 1
  • source tracking 1
  • controlstatistics 1
  • cell segmentation 1
  • resegment 1
  • malformed 1
  • block substitutions 1
  • partitioning 1
  • chip 1
  • area of interest 1
  • updatedata 1
  • run 1
  • pdb 1
  • decomposeblocksub 1
  • morphology 1
  • identity-by-descent 1
  • mgi 1
  • recovery 1
  • leafcutter 1
  • regtools 1
  • plotting 1
  • metagenome assembler 1
  • uq 1
  • md 1
  • elfasta 1
  • covariance models 1
  • metaspace 1
  • rearrangements 1
  • integron 1
  • mobile genetic elements 1
  • genome annotation 1
  • trna 1
  • LTR 1
  • data-download 1
  • assembly correction 1
  • unmarkduplicates 1
  • gtdb taxonomy 1
  • ncbi taxonomy 1
  • remove samples 1
  • gc 1
  • melon 1
  • metabolite annotation 1
  • hwe 1
  • SINE 1
  • long-reads 1
  • TCR 1
  • metabolomics 1
  • spatialdata 1
  • data-visualization 1
  • iterative model refinement 1
  • streamlit 1
  • references 1
  • synteny 1
  • modelsegments 1
  • missingness 1
  • patch 1
  • svdecompose 1
  • starfusion 1
  • mean 1
  • bnd-eval 1
  • BGZF 1
  • rna assembly 1
  • dream 1
  • covariance model 1
  • pro 1
  • homozygous genotypes 1
  • heterozygous genotypes 1
  • micro-satellite 1
  • inbreeding 1
  • dereplication 1
  • ascii_art 1
  • microbial genomics 1
  • drep 1
  • inhouse 1
  • agat 1
  • longest 1
  • isoform 1
  • variancepartition 1
  • f coefficient 1
  • linkage equilibrium 1
  • zarr 1
  • genome polishing 1
  • network 1
  • ome-ngff 1
  • wget 1
  • comp 1
  • bedcov 1
  • info 1
  • assembly polishing 1
  • hello 1
  • genotype dosages 1
  • vcf file 1
  • bgen file 1
  • plink2_pca 1
  • pca 1
  • cowpy 1
  • pruning 1
  • elprep 1
  • nucleotide content 1
  • r 1
  • rna velocity 1
  • paired reads merging 1
  • doublet-detection 1
  • functional enrichment 1
  • grea 1
  • extension 1
  • cobra 1
  • enhancer 1
  • check 1
  • nanopore sequencing 1
  • pile up 1
  • go 1
  • mygene 1
  • tf_affinity 1
  • cell_barcodes 1
  • tag 1
  • overlap-based merging 1
  • hamming-distance 1
  • biological activity 1
  • droplet based single cells 1
  • vcflib/vcffixup 1
  • trimfq 1
  • cellsnp 1
  • donor deconvolution 1
  • genotype-based demultiplexing 1
  • lexogen 1
  • busco 1
  • hashing-based deconvoltion 1
  • InterProScan 1
  • MMseqs2 1
  • retrieval 1
  • transposable element 1
  • generic 1
  • coreutils 1
  • gnu 1
  • prior knowledge 1
  • omics 1
  • Pacbio 1
  • subclonal deconvolution 1
  • hwe statistics 1
  • hardy-weinberg 1
  • abc_model 1
  • gene_regulation 1
  • predict 1
  • multi-tool 1
  • nucleotide sequence 1
  • reference-independent 1
  • distance-based 1
  • minimum_evolution 1
  • phylogenetics 1
  • assay 1
  • corpcor 1
  • correlation 1
  • coexpression 1
  • hwe equilibrium 1
  • genotype likelihood 1
  • Bayesian 1
  • refresh 1
  • scimap 1
  • spatial_neighborhoods 1
  • associations 1
  • case/control 1
  • GWAS 1
  • association 1
  • clahe 1
  • collapse 1
  • machine_learning 1
  • cell_phenotyping 1
  • cell_type_identification 1
  • n50 1
  • seqfu 1
  • probabilistic realignment 1
  • liftover 1
  • AC/NS/AF 1
  • guidetree 1
  • AT content 1
  • microRNA 1
  • poolseq 1
  • search engine 1
  • Open Science Framework 1
  • ome 1
  • mass_error 1
  • multiqc 1
  • tiff 1
  • stardist 1
  • insulation 1
  • Staging 1
  • staging 1
  • haplotag 1
  • clipOverlap 1
  • standard 1
  • svg 1
  • variant-calling 1
  • telseq 1
  • tile 1
  • jvarkit 1
  • nucBed 1
  • bclconvert 1
  • targz 1
  • tarball 1
  • tar 1
  • translate 1
  • setgt 1
  • vsearch/dereplicate 1
  • minimizer 1
  • ATACshift 1
  • shift 1
  • ATACseq 1
  • fastqfilter 1
  • vsearch/fastqfilter 1
  • osf 1
  • extent 1
  • xml 1
  • bwamem2 1
  • cram-size 1
  • sage 1
  • orthogroup 1
  • spot 1
  • realign 1
  • quality check 1
  • size 1
  • selector 1
  • extraction 1
  • paraphase 1
  • transcription factors 1
  • regulatory network 1
  • 10x 1
  • ribosomal 1
  • grabix 1
  • bwameme 1
  • featuretable 1
  • redundant 1
  • script 1
  • SBS 1
  • mutational signatures 1
  • java 1
  • rank 1
  • tag2tag 1
  • impute-info 1
  • functional 1
  • Illumina 1
  • nanoq 1
  • uniques 1
  • bs genome reference 1
  • drug categorization 1
  • Read report 1
  • scrublet 1
  • Read trimming 1
  • Read filters 1
  • gtftogenepred 1
  • sequencing summary 1
  • genepred 1
  • createsequencedictionary 1
  • filtervarianttranches 1
  • filterintervals 1
  • estimatelibrarycomplexity 1
  • duplication metrics 1
  • determinegermlinecontigploidy 1
  • createsomaticpanelofnormals 1
  • condensedepthevidence 1
  • gatherbqsrreports 1
  • dragstr 1
  • composestrtablefile 1
  • short variant discovery 1
  • combinegvcfs 1
  • collectsvevidence 1
  • collectreadcounts 1
  • cnnscorevariants 1
  • tranche filtering 1
  • genomicsdb 1
  • getpileupsummaries 1
  • learnreadorientationmodel 1
  • postprocessgermlinecnvcalls 1
  • snvs 1
  • mutectstats 1
  • mergebamalignment 1
  • leftalignandtrimvariants 1
  • readorientationartifacts 1
  • indexfeaturefile 1
  • genomicsdbimport 1
  • readcountssummary 1
  • getpileupsumaries 1
  • germlinevariantsites 1
  • germlinecnvcaller 1
  • germline contig ploidy 1
  • panelofnormalscreation 1
  • jointgenotyping 1
  • calibratedragstrmodel 1
  • cross-samplecontamination 1
  • printreads 1
  • groupreads 1
  • random 1
  • generate 1
  • single molecule 1
  • zipperbams 1
  • ubam 1
  • unmapped 1
  • duplexumi 1
  • fq 1
  • consensus sequence 1
  • public 1
  • ENA 1
  • SRA 1
  • ANI 1
  • ARGs 1
  • antibiotic resistance genes 1
  • lint 1
  • rust 1
  • calculatecontamination 1
  • heattree 1
  • bedtointervallist 1
  • asereadcounter 1
  • vqsr 1
  • variant quality score recalibration 1
  • annotateintervals 1
  • targets 1
  • gangstr 1
  • variant caller 1
  • gene-calling 1
  • gamma 1
  • UShER 1
  • bootstrapping 1
  • bacterial variant calling 1
  • germline variant calling 1
  • somatic variant calling 1
  • preprocessintervals 1
  • printsvevidence 1
  • str 1
  • gvcftools 1
  • rgi 1
  • fARGene 1
  • amrfinderplus 1
  • abricate 1
  • extractvariants 1
  • extract_variants 1
  • gunzip 1
  • hbd 1
  • gunc 1
  • GTDB taxonomy 1
  • gstama/polyacleanup 1
  • gstama/merge 1
  • TAMA 1
  • gene model 1
  • tama_collapse.py 1
  • ibd 1
  • beagle 1
  • merge compare 1
  • pos 1
  • js 1
  • igv.js 1
  • igv 1
  • IDR 1
  • panel_of_normals 1
  • haemophilus 1
  • annotations 1
  • mitochondrial 1
  • hmtnote 1
  • Hidden Markov Model 1
  • HMMER 1
  • readcounter 1
  • gccounter 1
  • haplotype resolution 1
  • Haemophilus influenzae 1
  • genomes on a tree 1
  • GNU 1
  • reblockgvcf 1
  • svannotate 1
  • txt 1
  • gawk 1
  • variantrecalibrator 1
  • recalibration model 1
  • variantfiltration 1
  • svcluster 1
  • splitintervals 1
  • bgc 1
  • splitcram 1
  • site depth 1
  • shiftintervals 1
  • shiftfasta 1
  • shiftchain 1
  • selectvariants 1
  • revert 1
  • file parsing 1
  • genome profile 1
  • joint-variant-calling 1
  • genome manipulation 1
  • Imputation 1
  • Haplotypes 1
  • Sample 1
  • low coverage 1
  • gget 1
  • genome statistics 1
  • genome summary 1
  • compound 1
  • gfastats 1
  • Mykrobe 1
  • Salmonella Typhi 1
  • repeat content 1
  • genome heterozygosity 1
  • genome size 1
  • models 1
  • faqcs 1
  • cache 1
  • multicut 1
  • deduping 1
  • autozygosity 1
  • homozygosity 1
  • biallelic 1
  • update header 1
  • BCF 1
  • csi 1
  • smaller fastqs 1
  • bamtobed 1
  • clumping fastqs 1
  • background_correction 1
  • illumiation_correction 1
  • element 1
  • trimBam 1
  • bamtools/split 1
  • yaml 1
  • sorting 1
  • closest 1
  • mouse 1
  • slopBed 1
  • Salmonella enterica 1
  • sorted 1
  • file manipulation 1
  • bioawk 1
  • unionBedGraphs 1
  • subtract 1
  • shiftBed 1
  • genomecov 1
  • multinterval 1
  • overlapped bed 1
  • maskfasta 1
  • chunking 1
  • jaccard 1
  • overlap 1
  • getfasta 1
  • bamtools/convert 1
  • bacphlip 1
  • tblastn 1
  • antimicrobial peptide prediction 1
  • doCounts 1
  • allele counts 1
  • nuclear contamination estimate 1
  • post Post-processing 1
  • model 1
  • AMPs 1
  • amp 1
  • HLA 1
  • Staphylococcus aureus 1
  • affy 1
  • reference panels 1
  • admixture 1
  • adapterremoval 1
  • antimicrobial reistance 1
  • contiguate 1
  • installation 1
  • utility 1
  • virulent 1
  • ancientDNA 1
  • temperate 1
  • lifestyle 1
  • autofluorescence 1
  • cycif 1
  • background 1
  • single-stranded 1
  • authentict 1
  • http(s) 1
  • read group 1
  • bias 1
  • ATLAS 1
  • sequencing_bias 1
  • post mortem damage 1
  • atlas 1
  • mkarv 1
  • subtyping 1
  • postprocessing 1
  • percent on target 1
  • cutesv 1
  • cumulative coverage 1
  • scatterplot 1
  • corrrelation 1
  • track 1
  • paired-end 1
  • pcr duplicates 1
  • gct 1
  • segment 1
  • cls 1
  • na 1
  • custom 1
  • Cores 1
  • Segmentation 1
  • TMA dearray 1
  • UNet 1
  • blastx 1
  • duphold 1
  • genomic bins 1
  • deletion 1
  • endogenous DNA 1
  • Streptococcus pyogenes 1
  • swissprot 1
  • genbank 1
  • embl 1
  • split by chromosome 1
  • circos 1
  • structural variation 1
  • eklipse 1
  • eigenstratdatabasetools 1
  • pep 1
  • schema 1
  • PEP 1
  • escherichia coli 1
  • depth information 1
  • mcool 1
  • makebins 1
  • cadd 1
  • multiomics 1
  • chromap 1
  • quality assurnce 1
  • qa 1
  • crispr 1
  • antibody capture 1
  • antigen capture 1
  • mkvdjref 1
  • chromosome_visualization 1
  • cellpose 1
  • hifi 1
  • Assembly 1
  • domains 1
  • compartments 1
  • topology 1
  • calder2 1
  • duplicate removal 1
  • polymut 1
  • enzyme 1
  • partition 1
  • digest 1
  • cload 1
  • cooler/balance 1
  • subcontigs 1
  • nucleotide composition 1
  • concoct 1
  • cnv calling 1
  • polymorphic 1
  • target 1
  • export 1
  • antitarget 1
  • access 1
  • cmseq 1
  • protein coding genes 1
  • polymorphic sites 1
  • genome browser 1
  • pixel classification 1
  • bedtobigbed 1
  • pedfilter 1
  • multimapper 1
  • Ancestor 1
  • LCA 1
  • salsa2 1
  • salsa 1
  • rocplot 1
  • rtg 1
  • sambamba 1
  • integrity 1
  • mapping-based 1
  • sequence-based 1
  • read distribution 1
  • inner_distance 1
  • fragment_size 1
  • read_pairs 1
  • flagstat 1
  • duplicate marking 1
  • strandedness 1
  • cluster analysis 1
  • seacr 1
  • chromatin 1
  • cut&run 1
  • cut&tag 1
  • peak-caller 1
  • clusteridentifier 1
  • scramble 1
  • ampliconclip 1
  • readgroup 1
  • read pairs 1
  • paired 1
  • repair 1
  • insert size 1
  • faidx 1
  • calmd 1
  • experiment 1
  • bamstat 1
  • applyvarcal 1
  • pmdtools 1
  • contact maps 1
  • bmp 1
  • jpg 1
  • pretext 1
  • contact 1
  • porechop_abi 1
  • variant genetic 1
  • intervals coverage 1
  • scoring 1
  • identifiers 1
  • whole genome association 1
  • recode 1
  • indep pairwise 1
  • indep 1
  • variant identifiers 1
  • gene finding 1
  • genomic intervals 1
  • R 1
  • Assembly curation 1
  • rhocall 1
  • long uncorrected reads 1
  • subsampling 1
  • neighbour-joining 1
  • quast 1
  • purging 1
  • False duplications 1
  • normal database 1
  • Haplotype purging 1
  • assembly curation 1
  • false duplications 1
  • duplicate purging 1
  • haplotype purging 1
  • cutoff 1
  • panel of normals 1
  • assembly-binning 1
  • VQSR 1
  • genetic 1
  • fracminhash sketch 1
  • detecting svs 1
  • variantcalling 1
  • sccmec 1
  • streptococcus 1
  • spa 1
  • spatype 1
  • hash sketch 1
  • svtk/baftest 1
  • signatures 1
  • ribosomal RNA 1
  • rRNA 1
  • constant 1
  • invariant 1
  • SNPs 1
  • predictions 1
  • short-read sequencing 1
  • baftest 1
  • snippy 1
  • eucaryotes 1
  • bigbed 1
  • bedgraphtobigwig 1
  • sequencing adapters 1
  • transcroder 1
  • cds 1
  • coding 1
  • chromosomal rearrangements 1
  • countsvtypes 1
  • Mycobacterium tuberculosis 1
  • fast5 1
  • polya tail 1
  • decompress 1
  • vcf2bed 1
  • rdtest 1
  • rdtest2vcf 1
  • dbnsfp 1
  • core 1
  • variant recalibration 1
  • random draw 1
  • induce 1
  • gc_wiggle 1
  • bam2seqz 1
  • freqsum 1
  • pseudodiploid 1
  • pseudohaploid 1
  • selection 1
  • genetic sex 1
  • seq 1
  • header 1
  • interleave 1
  • sertotype 1
  • sequence headers 1
  • grep 1
  • subseq 1
  • sex determination 1
  • relative coverage 1
  • sniffles 1
  • density 1
  • POA 1
  • SMN2 1
  • SMN1 1
  • CRAM 1
  • sliding window 1
  • features 1
  • boxplot 1
  • rare variants 1
  • exploratory 1
  • shinyngs 1
  • 256 bit 1
  • sha256 1
  • longread 1
  • de-novo 1
  • error 1
  • exclude 1
  • GRO-seq 1
  • pixel_classification 1
  • reduced 1
  • MD5 1
  • mcr-1 1
  • mass-spectroscopy 1
  • metagenome-assembled genomes 1
  • maxbin2 1
  • representations 1
  • mash/sketch 1
  • megahit 1
  • taxonomic assignment 1
  • estimate 1
  • damage patterns 1
  • NGS 1
  • DNA damage 1
  • rra 1
  • maximum-likelihood 1
  • 128 bit 1
  • denovo 1
  • sgRNA 1
  • unionsum 1
  • microrna 1
  • de Bruijn 1
  • assembler 1
  • mbias 1
  • methylation bias 1
  • metaphlan 1
  • ploidy 1
  • debruijn 1
  • smudgeplot 1
  • Merqury 1
  • contour map 1
  • 3D heat map 1
  • Neisseria meningitidis 1
  • rma6 1
  • daa 1
  • CRISPR-Cas9 1
  • functional genomics 1
  • mitochondrial genome 1
  • jupytext 1
  • effective genome size 1
  • k-mer counting 1
  • digital normalization 1
  • quant 1
  • kallisto/index 1
  • papermill 1
  • Jupyter 1
  • pneumoniae 1
  • Python 1
  • jasmine 1
  • jasminesv 1
  • insertion 1
  • genomic islands 1
  • interproscan 1
  • probability_maps 1
  • Klebsiella 1
  • kegg 1
  • peptide prediction 1
  • pneumophila 1
  • AMP 1
  • qualities 1
  • lofreq/filter 1
  • lofreq/call 1
  • Listeria monocytogenes 1
  • limma 1
  • clinical 1
  • kofamscan 1
  • legionella 1
  • collapsing 1
  • adapter removal 1
  • train 1
  • spliced 1
  • reorder 1
  • combining 1
  • target prediction 1
  • reference genome 1
  • PRO-seq 1
  • read 1
  • identification 1
  • prophage 1
  • phantom peaks 1
  • ChIP-Seq 1
  • motif 1
  • pedigrees 1
  • pair-end 1
  • phylogenetic composition 1
  • pbp 1
  • subreads 1
  • pbmerge 1
  • pbbam 1
  • graphs 1
  • paragraph 1
  • select 1
  • illumina datasets 1
  • hybrid-selection 1
  • pairstools 1
  • GRO-cap 1
  • STRIPE-seq 1
  • csRNA-seq 1
  • RAMPAGE 1
  • NETCAGE 1
  • CAGE 1
  • PRO-cap 1
  • CoPRO 1
  • mate-pair 1
  • tandem duplications 1
  • insertions 1
  • deletions 1
  • sortvcf 1
  • picard/renamesampleinvcf 1
  • pcr 1
  • liftovervcf 1
  • restriction fragments 1
  • pairtools 1
  • mosdepth 1
  • GATK UnifiedGenotyper 1
  • NextGenMap 1
  • mobile element insertions 1
  • somatic structural variations 1
  • cancer genome 1
  • contaminant 1
  • SNP table 1
  • Beautiful stand-alone HTML report 1
  • Neisseria gonorrhoeae 1
  • bioinformatics tools 1
  • mitochondrial to nuclear ratio 1
  • ratio 1
  • mtnucratio 1
  • scan 1
  • microsatellite instability 1
  • otu table 1
  • ngm 1
  • gender 1
  • ligation junctions 1
  • tumor/normal 1
  • upper-triangular matrix 1
  • flip 1
  • PCR/optical duplicates 1
  • block-compressed 1
  • HLA-I 1
  • ILP 1
  • hla-typing 1
  • graph viz 1
  • graph construction 1
  • graph formats 1
  • graph unchopping 1
  • graph stats 1
  • combine graphs 1
  • odgi 1
  • squeeze 1
  • graph drawing 1
  • braker 1

Contiguate draft genome assembly

Screen assemblies for antimicrobial resistance against multiple databases

Named input: databasedir

abricate:

Mass screening of contigs for antibiotic resistance genes

Screen assemblies for antimicrobial resistance against multiple databases

01

0 0

abricate:

Mass screening of contigs for antibiotic resistance genes

A NATA accredited tool for reporting the presence of antimicrobial resistance genes in bacterial genomes

01

0 0 0 0 0 0

abritamr:

A pipeline for running AMRfinderPlus and collating results into functional classes

Trim sequencing adapters and collapse overlapping reads

01adapterlist

0 0 0 0 0 0 0 0

Fixes prefixes from AdapterRemoval2 output to make sure no clashing read names are in the output. For use with DeDup.

01

0 0

ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study).

0123K

0 0 0

Read CEL files into an ExpressionSet and generate a matrix

01201

0 0 0 0

affy:

Methods for Affymetrix Oligonucleotide Arrays

Takes a bed12 file and converts to a GFF3 file

01

0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF files

Converts a GFF/GTF file into a proper GTF file

01

0 0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF files

Converts a GFF/GTF file into a TSV file

01

0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF files

Fixes and standardizes GFF/GTF files and outputs a cleaned GFF/GTF file

01

0 0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF files

Adds intron features to a GTF/GFF file that lacks them.

01config

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

This script extracts sequences in fasta format according to features described in a gff file.

01fastaconfig

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass the test.

01config

0 0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

The script removes features based on a kill list. The default behaviour is to look at the feature's ID. If the feature has an ID (case insensitive) listed in the kill list, it is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.

01kill_listconfig

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

The script flags the short introns with the attribute . It is useful for avoiding errors when submitting the data to EBI. (Typical EBI error message: ****ERROR: Intron usually expected to be at least 10 nt long. Please check the accuracy)

01config

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Filters GFF records to keep only the longest isoform per gene

01config

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

This script merges different GFF annotation files into one. It uses the AGAT parser, which takes care of duplicated names and fixes other oddities found in those files.

01config

0 0

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Provides different types of statistics in text format from a GFF/GTF annotation file

01

0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF files

Provides basic statistics in text format from a GFF/GTF annotation file

01

0 0

agat:

AGAT is a toolkit for manipulation and getting information from GFF/GTF annotation files

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants

01

0 0 0

ALE: assembly likelihood estimator.

012

0 0

Generates a count of coverage of alleles

012locifasta

0 0

A tool to parse and summarise results from antimicrobial peptides tools and present functional classification.

01faa_inputopt_amp_db

0 0 0 0 0 0 0 0 0 0 0 0

A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster.

summary_file

0 0 0 0

ampcombi2/cluster:

A tool for clustering all AMP hits found across many samples and supporting many AMP prediction tools.

A submodule that merges all output summary tables from ampcombi2/parsetables into one summary file.

summaries

0 0 0

ampcombi2/complete:

This merges the per sample AMPcombi summaries generated by running 'ampcombi2/parsetables'.

A submodule that parses and standardizes the results from various antimicrobial peptide identification tools.

01faa_inputgbk_inputopt_amp_dbopt_amp_db_diropt_interproscan

0 0 0 0 0 0 0 0 0 0 0 0

ampcombi2/parsetables:

A parsing tool to convert and summarise the outputs from multiple AMP detection tools in a standardized format.

A fast and user-friendly method to predict antimicrobial peptides (AMPs) from any given size protein dataset. ampir uses a supervised statistical machine learning approach to predict AMPs.

01modelmin_lengthmin_probability

0 0 0

AMPlify is an attentive deep learning model for antimicrobial peptide prediction.

01model_dir

0 0

amplify:

Attentive deep learning model for antimicrobial peptide prediction

Post-processing script of the MaltExtract component of the HOPS package

maltextract_resultstaxon_listfilter

0 0 0 0 0

Identify antimicrobial resistance in gene or protein sequences

01db

0 0 0 0 0

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Identify antimicrobial resistance in gene or protein sequences

NO input

0 0

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

A module to create antiberta2 embeddings of antibody (BCR) amino acid sequences using amulety.

01chain

0 0

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

A module to create antiberty embeddings of antibody (BCR) amino acid sequences using amulety.

01chain

0 0

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

A module to create BALM paired embeddings of antibody (BCR) amino acid sequences using amulety.

01chain

0 0

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

A module to create esm2 embeddings of antibody (BCR) amino acid sequences using amulety.

01chain

0 0

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

A module to translate BCR and TCR nucleotide sequences into amino acid sequences using amulety and igblast.

01reference_igblast

0 0

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

igblast:

A tool for immunoglobulin (IG, BCR) and T cell receptor (TCR) V domain sequences blasting.

A tool to estimate nuclear contamination in males based on heterozygosity on the X chromosome.

0101

0 0

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates base frequency statistics across reference positions from BAM.

0123

0 0 0 0 0 0 0

angsd:

ANGSD: Analysis of next generation Sequencing Data

Calculates genotype likelihoods from BAM files.

010101

0 0

angsd:

ANGSD: Analysis of next generation Sequencing Data

Module to subset AnnData object to cells with matching barcodes from the csv file

012

0 0

anndata:

An annotated data matrix.

Get the size (n_cells or n_genes) of an anndata object stored as a h5ad file

01size_type

0 0

anndata:

An annotated data matrix.

Accelerating de novo SINE annotation in plant and animal genomes

01mode

0 0 0

Annotation and Ranking of Structural Variation

012301010101

0 0 0 0

annotsv:

Annotation and Ranking of Structural Variation

Install the AnnotSV annotations

NO input

0 0

annotsv:

Annotation and Ranking of Structural Variation

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

0123012

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

anota2seq:

Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

01databasesgff

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

0 0

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters.

01databasesantismash_dir

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

antismashlite:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. This module downloads the antiSMASH databases for conda and docker/singularity runs.

database_cssdatabase_detectiondatabase_modules

0 0 0

antismash:

antiSMASH - the antibiotics and Secondary Metabolite Analysis SHell

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

01

0 0 0 0 0 0

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Normalize antibiotic resistance genes (ARGs) using the ARO ontology (developed by CARD).

01tooldb

0 0

CLI Download utility

01

0 0

Download and prepare database for Ariba analysis

01

0 0

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Query input FASTQs against Ariba formatted databases

0101

0 0

ariba:

ARIBA: Antibiotic Resistance Identification By Assembly

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

010101blacklistknown_fusionscytobandsprotein_domains

0 0 0

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

genome

0 0 0 0 0

arriba:

Fast and accurate gene fusion detection from RNA-Seq data

Simulation tool to generate synthetic Illumina next-generation sequencing reads

01sequencing_systemfold_coverageread_length

0 0 0 0

art:

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using a user's own read error model or quality profiles.

Aggregates fastq files with demultiplexed reads

01

0 0

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Run the alignment/variant-call/consensus logic of the artic pipeline

01012012

0 0 0 0 0 0 0 0 0 0 0 0

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

Copy number profiles of tumour cells.

01234allele_filesloci_filesbed_filefastagc_filert_file

0 0 0 0 0 0 0 0 0

Alignment by Simultaneous Harmonization of Layer/Adjacency Registration

01opt_dfpopt_ffp

0 0

Assembly summary statistics in JSON format

01

0 0

ataqv function of a corresponding ataqv tool

0123organismmito_nametss_fileexcl_regs_fileautosom_ref_file

0 0 0

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

mkarv function of a corresponding ataqv tool

jsons/*

0 0

ataqv:

ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing.

Generates a VCF file from a BAM file using various calling methods

01234fastafaiknown_allelesmethod

0 0

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimate the post-mortem damage patterns of DNA

0123fastafai

0 0 0 0 0

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Gives an estimation of the sequencing bias based on known invariant sites

01234allelesinvariant_sites

0 0

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Splits single-end read groups by length and merges paired-end reads

01234

0 0 0

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Generate tables of feature metadata from GTF files

0101

0 0 0

atlasgeneannotationmanipulation:

Scripts for manipulating gene annotation

Use deamination patterns to estimate contamination in single-stranded libraries

010101

0 0

authentict:

Estimates present-day DNA contamination in ancient DNA single-stranded libraries.

Pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images.

0101

0 0 0

A bacteriophage lifestyle prediction tool

01

0 0 0

Annotation of bacterial genomes (isolates, MAGs) and plasmids

01dbproteinsprodigal_tf

0 0 0 0 0 0 0 0 0 0 0

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.

Downloads BAKTA database from Zenodo

NO input

0 0

bakta:

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data

012

0 0

bam2fastx:

Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files

Removes unused references from the header of sorted BAM/CRAM files.

01

0 0

This module is used to clip primer sequences from your alignments.

0123

0 0 0

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

012

0 0 0

Computes mapping statistics from a BAM file

01

0 0

bamstats:

A command line tool to compute mapping statistics from a BAM file

Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis

01

0 0

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

0 0

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

0 0

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

0 0

bamtools:

C++ API & command-line toolkit for working with BAM data

Clips overlapping read pairs. When two mates overlap, this tool clips the record whose clipped region would have the lowest average quality.

01

0 0 0

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.

Trims the ends of reads in a SAM/BAM file, either by changing read ends to ‘N’ and quality to ‘!’ or by soft clipping

0123

0 0

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.

Render an assembly graph in GFA 1.0 format to PNG and SVG image formats

01

0 0 0

bandage:

Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily

barrnap uses a HMMER profile to find rRNAs in reads or contig FASTA files

012

0 0

Demultiplex Element Biosciences bases files

012

0 0 0 0 0 0 0 0

BaSiCPy is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm.

01

0 0

Align short or PacBio reads to a reference genome using BBMap

01ref

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Adapter and quality trimming of sequencing reads

01contaminants

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Merging overlapping paired reads into a single read.

01interleave

0 0 0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

BBNorm is designed to normalize coverage by down-sampling reads over high-depth areas of a genome, to result in a flat coverage distribution.

01

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Split sequencing reads by mapping them to multiple references simultaneously

01indexprimary_ref01only_build_index

0 0 0 0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates 30% smaller, faster gzipped FASTQ files and removes duplicates

01

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Filter out sequences by sequence header name(s)

01names_to_filteroutput_formatinterleaved_output

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Creates an index from a fasta file, ready to be used by bbmap.sh in mapping mode.

fasta

0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.

01

0 0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Re-pairs reads that became disordered or had some mates eliminated.

01interleave

0 0 0 0

repair:

Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated.

Compares query sketches to reference sketches hosted on a remote server via the Internet.

01

0 0

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Add or remove annotations.

01234header_linesrename_chrs

0 0 0 0

annotate:

Add or remove annotations.

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

012regionstargetssamples

0 0 0 0

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Concatenate VCF files

012

0 0 0 0

concat:

Concatenate VCF files.

Creates a consensus sequence by applying VCF variants to a reference FASTA file

01234

0 0

consensus:

Create consensus sequence by applying VCF variants to a reference fasta file.

Converts certain output formats to VCF

01201bed

0 0 0 0 0 0 0 0 0 0

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools Haplotype-aware consequence caller

01010101

0 0 0 0

csq:

Haplotype-aware consequence caller

Filters VCF files

012

0 0 0 0

filter:

Apply fixed-threshold filters to VCF files.

Indexes VCF files

01

0 0 0

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

Apply set operations to VCF files

012

0 0

isec:

Computes intersections, unions and complements of VCF files.

Merge VCF files

012010101

0 0 0

merge:

Merge VCF files.

Generates genotype likelihoods at each genomic position with coverage

01201save_mpileup

0 0 0 0 0

mpileup:

Generates genotype likelihoods at each genomic position with coverage.

Normalize VCF file

01201

0 0 0 0

norm:

Normalize VCF files.

Compute and fill various INFO tags

012regionstargetssamples

0 0 0 0

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin fill-tags:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The fill-tags plugin computes and fills various INFO tags.

Adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available.

012regionstargets

0 0 0 0

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin impute-info:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The impute-info plugin adds imputation information metrics to the INFO field based on selected FORMAT tags. Only the IMPUTE2 INFO metric from FORMAT/GP tags is currently available

Split VCF by chunks or regions, creating multiple VCFs.

012sites_per_chunkscatterscatter_fileregionstargets

0 0 0 0

pluginscatter:

Split VCF by chunks or regions, creating multiple VCFs.

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

012target_gtnew_gtregionstargets

0 0 0 0

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

Split VCF by sample, creating single- or multi-sample VCFs.

012samplesgroupsregionstargets

0 0 0 0

pluginsplit:

Split VCF by sample, creating single- or multi-sample VCFs.

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.

012regionstargets

0 0 0 0

view:

Converts between similar tags, such as GL,PL,GP or QR,QA,QS or localized alleles, eg LPL,LAD.

Extracts fields from VCF or BCF files and outputs them in user-defined format.

012regionstargetssamples

0 0

query:

Extracts fields from VCF or BCF files and outputs them in user-defined format.

Reheader a VCF file

012301

0 0 0

reheader:

Modify header of VCF/BCF files, change sample names.

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

01201genetic_mapregions_filesamples_filetargets_file

0 0

roh:

A program for detecting runs of homo/autozygosity. Only bi-allelic sites are considered.

Sorts VCF files

01

0 0 0 0

sort:

Sort VCF files by coordinates.

Split a vcf file into files per chromosome

012

0 0

bcftools:

Sort VCF files by coordinates.

Generates stats from VCF files

0120101010101

0 0

stats:

Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012regionstargetssamples

0 0 0 0

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Demultiplex Illumina BCL files

012

0 0 0 0 0 0 0 0

Demultiplex Illumina BCL files

012

0 0 0 0 0 0 0 0

Beagle v5.2 is a software package for phasing genotypes and for imputing ungenotyped markers.

01refpanelgenmapexclsamplesexclmarkers

0 0 0

beagle5:

Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.

Convert a BED file to a VCF file according to a YAML config

01201

0 0

Convert BAM/GFF/GTF/GVF/PSL files to bed

01

0 0

bedops:

High-performance genomic feature operations.

Convert gtf format to bed format

01

0 0

gtf2bed:

The gtf2bed script converts 1-based, closed [start, end] Gene Transfer Format v2.2 (GTF2.2) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
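The coordinate change described above can be illustrated with a minimal Python sketch. It is not the gtf2bed implementation; the function name gtf_to_bed_interval is hypothetical and only the arithmetic stated in the description (start-1, end unchanged) is taken from the source.

```python
from typing import Tuple


def gtf_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed [start, end] interval (GTF style)
    to a 0-based, half-open [start-1, end) interval (BED style).

    Hypothetical helper for illustration only.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the end value stays the same because the
    # closed end in 1-based coordinates equals the open end in 0-based ones.
    return start - 1, end


# A feature covering the first 100 bases of a sequence.
bed_start, bed_end = gtf_to_bed_interval(1, 100)
feature_length = bed_end - bed_start  # length is end - start in half-open form
```

In half-open form the feature length is simply end minus start, which is one reason BED-style coordinates are convenient for interval arithmetic.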

Converts a bam file to a bed12 file.

01

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

For each feature in A, finds the closest feature (upstream or downstream) in B.

012fasta_fai

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Returns all intervals in a genome that are not covered by at least one interval in the input BED/GFF/VCF file.

01sizes

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
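As a rough illustration of what "intervals not covered by at least one input interval" means, here is a small Python sketch for a single chromosome. It is not the bedtools implementation; the function name complement and the use of 0-based, half-open (start, end) tuples are assumptions made for the example, and inputs are assumed to lie within the chromosome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def complement(intervals: List[Interval], chrom_size: int) -> List[Interval]:
    """Return the gaps on one chromosome that no input interval covers.

    Intervals are assumed to be 0-based, half-open and within [0, chrom_size).
    """
    gaps = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))  # uncovered stretch before this interval
        cursor = max(cursor, end)
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))  # uncovered tail of the chromosome
    return gaps


gaps = complement([(10, 20), (15, 30), (50, 60)], chrom_size=100)
```

The chromosome size plays the role of the sizes input listed for this module: without it the final gap cannot be closed.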

Computes both the depth and breadth of coverage of features in file B on the features in file A

012genome_file

0 0

bedtools:

A powerful toolset for genome arithmetic

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

012sizesextensionsort

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Extracts sequences in a FASTA file based on intervals defined in a feature file.

01fasta

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Groups features in a BED file by given column(s) and computes summary statistics for each group to another column.

01summary_col

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

01201

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Calculates the Jaccard statistic between two feature files.

01201

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
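The Jaccard statistic for two interval sets is the number of bases in their intersection divided by the number of bases in their union. The Python sketch below illustrates that definition for a single chromosome; it is not the bedtools code, and the names covered_bases and jaccard as well as the 0-based, half-open tuples are assumptions made for the example.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count bases covered by at least one interval (overlaps counted once)."""
    total = 0
    cursor = None  # end of the region counted so far
    for start, end in sorted(intervals):
        if cursor is None or start > cursor:
            total += end - start
            cursor = end
        elif end > cursor:
            total += end - cursor  # only the part extending past the cursor is new
            cursor = end
    return total


def jaccard(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Jaccard statistic: intersection bases divided by union bases."""
    a, b = list(a), list(b)
    union = covered_bases(a + b)
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = covered_bases(a) + covered_bases(b) - union
    return intersection / union if union else 0.0


score = jaccard([(0, 10)], [(5, 15)])  # 5 shared bases out of 15 in the union
```

A value of 1.0 means the two sets cover exactly the same bases, and 0.0 means they share none.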

Makes adjacent or sliding windows across a genome or BED file.

01

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Allows one to screen for overlaps between two sets of genomic features.

01201

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Masks sequences in a FASTA file based on intervals defined in a feature file.

01fasta

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.

01

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
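A short Python sketch may help show what merging overlapping or book-ended features means. It is not the bedtools implementation; the function name merge_intervals is hypothetical, and intervals are assumed to be 0-based, half-open tuples on one chromosome, so two features are book-ended when one starts exactly where the previous one ends.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals into spanning intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        # start <= previous end covers both overlap and book-ended features
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


merged = merge_intervals([(10, 12), (1, 5), (5, 8), (11, 15)])
```

Sorting first is what makes the single pass sufficient: each interval only ever needs to be compared with the most recently merged one.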

Identifies common intervals among multiple (and subsets thereof) sorted BED/GFF/VCF files.

01chrom_sizes

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Profiles the nucleotide content of intervals in a fasta file.

012

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Shifts each feature by a specified number of bases

0101

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

bedtools shuffle will randomly permute the genomic locations of a feature file among a genome defined in a genome file

0101exclude_fileinclude_file

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Adds a specified number of bases in each direction (unique values may be specified for either -l or -r)

01sizes

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Sorts a feature file by chromosome and other criteria.

01genome_file

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Split BED files into several smaller BED files

012

0 0

bedtools:

A powerful toolset for genome arithmetic

Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.

012

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
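The subtraction described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the bedtools implementation; the function name subtract and the 0-based, half-open tuples on a single chromosome are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Remove every part of the A intervals that overlaps a B interval."""
    b_sorted = sorted(b)
    remaining = []
    for start, end in a:
        cursor = start  # left edge of the part of A not yet examined
        for b_start, b_end in b_sorted:
            if b_end <= cursor:
                continue  # this B interval lies entirely to the left
            if b_start >= end:
                break  # this and all later B intervals lie to the right
            if b_start > cursor:
                remaining.append((cursor, b_start))  # piece of A before the overlap
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((cursor, end))  # tail of A after the last overlap
    return remaining


pieces = subtract([(0, 100)], [(10, 20), (50, 60)])
```

An A feature that is fully covered by B produces no output pieces, while one with no overlap is returned unchanged.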

Combines multiple BedGraph files into a single file

0101

0 0

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats.

01

0 0

Locate and tag duplicate reads in a BAM file

01

0 0 0

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Merge a list of sorted bam files

01

0 0 0 0

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Parallel sorting and duplicate marking

0101

0 0 0 0 0

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification.

01

0 0

Use k-mers to rapidly subtype S. enterica genomes

01scheme_metadata

0 0 0 0

Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.

010101

0 0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

010101

0 0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Summarize and/or filter reads based on bisulfite conversion rate

01010101

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.

0101010101

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Indexes a reference genome for use with Biscuit

01

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Merges methylation information for opposite-strand C's in a CpG context

010101

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

012340101

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Perform basic quality control on a BAM file generated with Biscuit

010101

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes methylation or SNV information from a Biscuit VCF in a standard-compliant BED file.

01

0 0

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Performs alignment of BS-Seq reads using bismark

010101

0 0 0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Relates methylation calls back to genomic cytosine contexts.

010101

0 0 0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Removes alignments to the same position in the genome from the Bismark mapping output.

01

0 0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Converts a specified reference genome into two different bisulfite converted versions and indexes them for alignments.

01

0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Extracts methylation information for individual cytosines from alignments.

0101

0 0 0 0 0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Collects bismark alignment reports

01234

0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Uses Bismark report files of several samples in a run folder to generate a graphical summary HTML report.

bamalign_reportdedup_reportsplitting_reportmbias

0 0

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Retrieve entries from a BLAST database

01201

0 0 0

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

0 0

blast:

BLAST finds regions of similarity between biological sequences.

BLASTP (Basic Local Alignment Search Tool- Protein) compares an amino acid (protein) query sequence against a protein database

0101out_ext

0 0 0 0

blast:

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit.

Builds a BLAST database

01

0 0

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

0 0

blast:

Protein to Translated Nucleotide BLAST.

Downloads a BLAST database from NCBI

01

0 0

blast:

BLAST finds regions of similarity between biological sequences.

Queries a sequence subject

0101

0 0

Align reads to a reference genome using bowtie

0101save_unaligned

0 0 0 0

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create bowtie index for reference genome

01

0 0

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

010101save_unalignedsort_bam

0 0 0 0 0 0 0 0

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie index for reference genome

01

0 0

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

01database

0 0 0

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

01

0 0 0

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

01

0 0

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Gene prediction in novel genomes using RNA-seq and protein homology information

0100000

gtf cds aa log hintsfile gff3 citations versions

Benchmarking Universal Single Copy Orthologs

metafastamodelineagebusco_lineages_pathconfig_file

meta batch_summary short_summaries_txt short_summaries_json busco_dir full_table missing_busco_list single_copy_proteins seq_dir translated_proteins versions

Benchmarking Universal Single Copy Orthologs

01modelineagebusco_lineages_pathconfig_fileclean_intermediates

0 0 0 0 0 0 0 0 0 0 0 0 0 0

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Download database for BUSCO

lineage

0 0

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

BUSCO plot generation tool

short_summary_txt

0 0

busco:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

Construct species phylogenies using BUSCO proteins

01

0 0 0

busco:

Construct species phylogenies using BUSCO proteins

Find SA coordinates of the input reads for bwa short-read mapping

0101

0 0

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA index for reference genome

01

0 0

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

010101sort_bam

0 0 0 0 0

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

01201

0 0

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

01201

0 0

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-mem2 index for reference genome

01

0 0

bwamem2:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

010101sort_bam

0 0 0 0 0 0

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-MEME index for reference genome

01

0 0

bwameme:

Faster BWA-MEM2 using a learned index

Performs fastq alignment to a fasta reference using BWA-MEME

010101sort_bammbuffersamtools_threads

0 0 0 0 0 0

bwameme:

Faster BWA-MEM2 using a learned index

Performs alignment of BS-Seq reads using bwameth

010101

0 0

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Performs indexing of c2t converted reference genome

01

0 0

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.

010101

0 0

Analysis of gene family evolution

01tree

0 0 0 0 0 0

Hierarchical Hi-C compartment computation

01resolution

0 0 0

Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

01modegenomesize

0 0 0 0 0 0 0 0 0

A module for concatenation of gzipped or uncompressed files

01

0 0

cat:

Just concatenation

Concatenates fastq files

01

0 0

cat:

The cat utility reads files sequentially, writing them to the standard output.

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101

0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101010101bin_suffix

0 0 0 0 0 0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101

0 0 0 0 0 0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Downloads the required files for either Nr or GTDB for building into a CAT database

01

0 0 0 0 0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Creates a CAT_pack database based on input FASTAs

01namesnodesacc2tax

0 0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101mode01010101010101

0 0 0 0 0 0 0 0 0 0 0 0 0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Summarises results from CAT/BAT/RAT classification steps

0101

0 0

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Cluster protein sequences using sequence similarity

01

0 0 0

cdhit:

Clusters and compares protein or nucleotide sequences

Cluster nucleotide sequences using sequence similarity

01

0 0 0

cdhit:

Clusters and compares protein or nucleotide sequences

Unsupervised machine learning for cell type identification in multiplexed imaging using protein expression and cell neighborhood information without ground truth

01signaturehigh_thresholdslow_thresholds

0 0 0

Module to use CellBender to remove ambient RNA from single-cell RNA-seq data

0123

0 0

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data

01

0 0 0 0 0 0 0 0 0 0

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

cellpose segments cells in images

01model

0 0 0

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Gene Expression.

01reference

0 0

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to create FASTQs needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkfastq command.

012

0 0 0 0 0 0

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.

gtf

0 0

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkref command.

fastagtfreference_name

0 0

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.

fastagtfseqsreference_name

0 0

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from various Chromium technologies, including Single Cell Gene Expression, Single Cell Immune Profiling, Feature Barcoding, and Cell Multiplexing.

meta010101010101gex_referencegex_frna_probesetgex_targetpanelvdj_referencevdj_primer_indexfb_referencebeam_antigen_panelbeam_control_panelcmo_referencecmo_barcodescmo_barcode_assignmentfrna_sampleinfoskip_renaming

0 0 0

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.

01reference

0 0

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's ARC pipelines to analyze sequencing data produced from Chromium Single Cell ARC. Uses the cellranger-arc count command.

0123reference

0 0 0

cellrangerarc:

Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell ARC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkfastq command.

01csv

0 0

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.

gtf

0 0

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkref command.

fastagtfmotifsreference_configreference_name

0 0 0

cellrangerarc:

Cell Ranger Arc is a set of analysis pipelines that process Chromium Single Cell Arc data.

Module to use Cell Ranger's ATAC pipelines to analyze sequencing data produced from Chromium Single Cell ATAC.

01reference

0 0

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Module to create fastqs needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkfastq command.

bclcsv

0 0

cellranger-atac:

Cell Ranger ATAC by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build the reference needed by the 10x Genomics Cell Ranger ATAC tool. Uses the cellranger-atac mkref command.

fastagtfmotifsreference_configreference_name

0 0

cellranger-atac:

Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.

Cellsnp-lite is a C/C++ tool for efficiently genotyping bi-allelic SNPs on single cells. You can use mode A of cellsnp-lite after read alignment to obtain the SNP x cell pileup UMI or read count matrices for each allele of given or detected SNPs for droplet-based single-cell data.

01234

0 0 0 0 0 0 0

cellsnp:

Efficient genotyping of bi-allelic SNPs on single cells

Build centrifuge database for taxonomic profiling

01conversion_tabletaxonomy_treename_tablesize_table

0 0

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

01dbsave_unalignedsave_aligned

0 0 0 0 0 0

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

01db

0 0

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

01fasta_extdb

0 0 0 0

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

0123exclude_marker_file

0 0 0

checkm:

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

CheckM2 database download

db_zenodo_id

0 0

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

CheckM2 bin quality prediction

0101

0 0 0

checkm2:

CheckM2 - Rapid assessment of genome bin quality using machine learning

A simple program to parse Illumina NGS data and check it for quality criteria

01checkqc_config

0 0

Construct the database necessary for checkv's quality assessment

NO input

0 0

checkv:

Assess the quality of metagenome-assembled viral genomes.

Assess the quality of metagenome-assembled viral genomes.

01db

0 0 0 0 0 0 0

checkv:

Assess the quality of metagenome-assembled viral genomes.

Construct the database necessary for checkv's quality assessment

01db

0 0

checkv:

Assess the quality of metagenome-assembled viral genomes.

Determine the allelic profiles of a genome using a pre-defined schema

0101

0 0 0 0 0 0 0 0 0 0

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Create a schema to determine the allelic profiles of a genome

01prodigal_tfcds

0 0 0 0

chewbbaca:

A complete suite for gene-by-gene schema creation and strain identification.

Filter and trim long read data.

01fasta

0 0

zcat:

zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

010101barcodeswhitelistchr_orderpairs_chr_order

0 0 0 0 0

chromap:

Fast alignment and preprocessing of chromatin profiles

Indexes a fasta reference genome ready for chromatin profiling.

01

0 0

chromap:

Fast alignment and preprocessing of chromatin profiles

Chromograph is a Python package to create PNG images from genetics data such as BED and WIG files.

01010101010101

0 0

Annotate circRNAs detected in the output from CIRCexplorer2 parse

01fastagene_annotation

0 0

circexplorer2:

Circular RNA analysis toolkits

CIRCexplorer2 parses fusion junction files from multiple aligners to prepare them for CIRCexplorer2 annotate.

01

0 0

circexplorer2:

Circular RNA analysis toolkit

A method to improve mappings on circular genomes, using the BWA mapper.

010101

0 0 0

circulargenerator:

Creates a modified reference genome, elongated by a specified number of bases

Realign reads mapped with BWA to elongated reference genome

01010101

0 0

circularmapper:

A method to improve mappings on circular genomes such as Mitochondria.

Clair3 is a germline small variant caller for long reads

0123450101

0 0 0 0 0

Binning of metagenomic sequences

01

0 0 0 0 0 0 0

ClipKIT is a fast and flexible alignment trimming tool that keeps phylogenetically informative sites and removes those that display characteristics of poor phylogenetic signal.

01out_format

0 0 0

Runs the Clippy CLIP peak caller

01gtffai

0 0 0 0

Predict recombination events in bacterial genomes

012

0 0 0 0 0 0 0

Align sequences using Clustal Omega

0101hmm_inhmm_batchprofile1profile2compress

0 0

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

Renders a guide tree in clustalo

01

0 0

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

Calculates polymorphic site rates over protein coding genes

01234

0 0

cmseq:

Set of utilities on sequences and BAM files

Calculate the sequence-accessible coordinates in chromosomes from the given reference genome, output as a BED file.

0101

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Derive off-target (“antitarget”) bins from target regions.

01

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

01201010101panel_of_normals

0 0 0 0 0 0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number

012

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
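The step described in the card above, going from a segment's log2 ratio to an absolute integer copy number, rests on a simple relationship: a log2 ratio of 0 corresponds to the reference ploidy, and each unit of log2 doubles or halves it. The sketch below shows only that basic arithmetic. It is not CNVkit's code: the function name is hypothetical, a pure (uncontaminated) diploid sample is assumed, and the tool itself may apply thresholds or purity adjustments that are ignored here.

```python
def absolute_copy_number(log2_ratio, ploidy=2):
    """Convert a segment's log2 copy ratio to the nearest integer copy number.

    Assumes a pure sample and a known reference ploidy (hypothetical helper,
    for illustration only).
    """
    return round(ploidy * 2 ** log2_ratio)


for log2 in (-1.0, 0.0, 0.585, 1.0):
    print(log2, absolute_copy_number(log2))
```

A log2 ratio of -1 therefore maps to a single-copy loss and +1 to four copies in a diploid genome.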

Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.

01

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Copy number variant detection from high-throughput sequencing data

012

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Compile a coverage reference from the given files (normal samples).

fastatargetsantitargets

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Transform bait intervals into targets more suitable for CNVkit.

0101

0 0

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

CNVnator is a command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads.

012010101

0 0 0

cnvnator:

Tool for calling copy number variations.

convert2vcf.pl is a command-line tool to convert CNVnator calls to VCF format.

01

0 0

cnvnator:

Tool for calling copy number variations.

Command line tool for calling CNVs in whole genome sequencing data

01bin_sizes

0 0

cnvpytor:

calling CNVs using read depth

Calculates read depth histograms

01bin_sizes

0 0

cnvpytor:

calling CNVs using read depth

Command-line tool for CNV/CNA analysis. This step imports the read depth data into a root pytor file.

012fastafai

0 0

cnvpytor -rd:

calling CNVs using read depth

Calculate segmentation for the specified bin size (multiple bin sizes separated by spaces)

01bin_sizes

0 0

cnvpytor:

Calling CNVs using read depth

View function to generate VCFs

01bin_sizesoutput_format

0 0 0 0

cnvpytor:

calling CNVs using read depth

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.

01010101assemblerminkmaxk

0 0 0 0 0 0 0 0 0

cobra-meta:

COBRA is a tool to get higher quality viral genomes assembled from metagenomes.

Builds a classic bloom filter COBS index

01

0 0

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Builds a compact bloom filter COBS index

01

0 0

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Unsupervised binning of metagenomic contigs by using nucleotide composition - kmer frequencies - and coverage data for multiple samples

012

0 0 0 0 0 0 0

concoct:

Clustering cONtigs with COverage and ComposiTion

Generate the input coverage table for CONCOCT using a BED file

0123

0 0

concoct:

Clustering cONtigs with COverage and ComposiTion

Cut up fasta file in non-overlapping or overlapping parts of equal length.

01bed

0 0 0

concoct:

Clustering cONtigs with COverage and ComposiTion
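Cutting a contig into fixed-length parts, with or without overlap, as the card above describes, can be sketched as follows. This is an illustration under assumptions rather than CONCOCT's own script: the function `cut_up` and its parameters are hypothetical, the input is a plain sequence string instead of a FASTA file, and the final part is simply allowed to be shorter than the others.

```python
def cut_up(sequence, chunk_size, overlap=0):
    """Split `sequence` into parts of `chunk_size`, each sharing `overlap` bases
    with the previous part. Hypothetical helper for illustration only.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    if not sequence:
        return []
    step = chunk_size - overlap
    # Stop early enough that no part consists only of bases already covered
    # by the previous part's overlap.
    last_start = max(len(sequence) - overlap, 1)
    return [sequence[i:i + chunk_size] for i in range(0, last_start, step)]


print(cut_up("ACGTACGTAC", 4))
print(cut_up("ACGTACGTAC", 4, overlap=2))
```

With `overlap=0` the parts tile the contig end to end. With a positive overlap each part starts `chunk_size - overlap` bases after the previous one.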

Creates a FASTA file for each new cluster assigned by CONCOCT

012

0 0

concoct:

Clustering cONtigs with COverage and ComposiTion

Merge consecutive parts of the original contigs originally cut up by cut_up_fasta.py

01

0 0

concoct:

Clustering cONtigs with COverage and ComposiTion

Calculate confidence scores from Kraken2 output

01kraken_taxon_db

0 0

Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC

012

0 0

controlfreec/assesssignificance:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Copy number and genotype annotation from whole genome and whole exome sequencing data

0123456fastafaisnp_positionknown_snpsknown_snps_tbichr_directorymappabilitytarget_bedgccontent_profile

0 0 0 0 0 0 0 0 0 0

controlfreec/freec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

01

0 0

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Format Freec output to circos input format

01

0 0

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

0123

0 0 0 0

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Plot Freec output

012

0 0 0 0

controlfreec:

Copy number and genotype annotation from whole genome and whole exome sequencing data.

Run matrix balancing on a cool file

012

0 0

cooler:

Sparse binary format for genomic interaction matrices

Create a cooler from genomic pairs and bins

0123chromsizes

0 0

cooler:

Sparse binary format for genomic interaction matrices

Generate fragment-delimited genomic bins

fastachromsizesenzyme

0 0

cooler:

Sparse binary format for genomic interaction matrices

Dump a cooler’s data to a text stream.

012

0 0

cooler:

Sparse binary format for genomic interaction matrices

Generate fixed-width genomic bins

012

0 0

cooler:

Sparse binary format for genomic interaction matrices
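Fixed-width binning, as named in the card above, amounts to tiling each chromosome with windows of one size and truncating the last window at the chromosome end. The sketch below illustrates the idea only. It does not use the cooler library, and the function `make_bins` and the toy chromosome sizes are hypothetical.

```python
def make_bins(chromsizes, binsize):
    """Tile each chromosome with fixed-width, 0-based half-open bins.

    `chromsizes` maps chromosome name to length (hypothetical helper,
    for illustration only).
    """
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            # The last bin on a chromosome is truncated at the chromosome end.
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


for row in make_bins({"chr1": 25, "chr2": 10}, binsize=10):
    print(*row, sep="\t")
```

Each output row has the chromosome, start, and end columns of a BED-like bin table.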

Merge multiple coolers with identical axes

01

0 0

cooler:

Sparse binary format for genomic interaction matrices

Generate a multi-resolution cooler file by coarsening

01

0 0

cooler:

Sparse binary format for genomic interaction matrices

Calculate the diamond insulation scores and call insulating boundaries

01

0 0 0

cooltools:

Analysis tools for genomic interaction data stored in .cool format

Calculates peak-to-trough ratio (PTR) from metagenomic sequence data

01

0 0

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Computes the coverage map along the reference genome

01

0 0

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Indexes a directory of fasta files for use with CoPTR

01

0 0

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Maps the reads to the reference database

0101

0 0

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Merge reads that were mapped to multiple indices

01

0 0

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Great....yet another TMA dearray program. What does this one do? Coreograph uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types.

01

0 0 0 0 0

Map reads to contigs and estimate coverage

0101bam_inputinterleaved

0 0

coverm:

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications

Print any text in a cow or other characters

01

0 0

In-house generated or curated data can be imported into CRABS.

01010101import_format

0 0

crabs:

Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses

CRABS extracts the amplicon region of the primer set by conducting an in silico PCR.

01

0 0

crabs:

Crabs (Creating Reference databases for Amplicon-Based Sequencing) is a program to download and curate reference databases for eDNA metabarcoding analyses

Compress files with crabz

01

0 0

crabz:

Like pigz, but rust

Decompress files with crabz

01

0 0

crabz:

Like pigz, but rust

Remove false positives of functional CRISPR genomics due to CNVs

012min_readsmin_targeted_genes

0 0

crisprcleanr:

Analysis of CRISPR functional genomics, removing false positives due to CNVs.

Controllable lossy compression of BAM/CRAM files

01keepbedbedout

0 0 0 0 0

Concatenate two or more CSV (or TSV) tables into a single table

01in_formatout_format

0 0

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Join two or more CSV (or TSV) tables by selected fields into a single table

01

0 0

csvtk:

A cross-platform, efficient, practical CSV/TSV toolkit

Splits CSV/TSV into multiple files according to column values

01in_formatout_format

0 0

csvtk:

CSVTK is a cross-platform, efficient and practical CSV/TSV toolkit that allows rapid data investigation and manipulation.

Annotate a VEP annotated VCF with the most severe consequence field

0101

0 0

custom:

Custom module to annotate a VEP annotated VCF with the most severe consequence field

Annotate a VEP annotated VCF with the most severe pLi field

01

0 0

custom:

Custom module to annotate a VEP annotated VCF with the most severe pLi field

Custom module to add a new fasta file to an old one and update an associated GTF

01201biotype

0 0 0

custom:

Custom module to add a new fasta file to an old one and update an associated GTF

Custom module used to dump software versions within the nf-core pipeline template

versions

0 0 0

custom:

Custom module used to dump software versions within the nf-core pipeline template

Filters a differential expression table based on logFC and adjusted p-value thresholds

01012012

0 0

pandas:

Python library for data manipulation and analysis
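Threshold filtering of a differential expression table can be sketched as follows. This sketch uses plain Python rather than pandas so that it runs without extra dependencies. The column names `log2FoldChange` and `padj`, the default thresholds, and the use of the absolute fold change are all assumptions, not the module's documented behaviour.

```python
def filter_de(rows, min_abs_logfc=1.0, max_padj=0.05):
    """Keep rows that pass both thresholds.

    `rows` is a list of dicts; the column names are hypothetical.
    Rows without an adjusted p-value are dropped.
    """
    kept = []
    for row in rows:
        padj = row.get("padj")
        if padj is None:
            continue
        if abs(row["log2FoldChange"]) >= min_abs_logfc and padj <= max_padj:
            kept.append(row)
    return kept


# Toy table, for illustration only.
table = [
    {"gene": "g1", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "g2", "log2FoldChange": -1.5, "padj": 0.2},
    {"gene": "g3", "log2FoldChange": 0.4, "padj": 0.01},
    {"gene": "g4", "log2FoldChange": -3.0, "padj": None},
]
print([row["gene"] for row in filter_de(table)])
```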

Generates a FASTA file of chromosome sizes and a fasta index file

01

0 0 0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files
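A chromosome sizes table can be derived from a fasta index, because the first two columns of a `.fai` file are the sequence name and its length. The sketch below shows that derivation in Python; the function name `fai_to_sizes` and the sample index text are assumptions made for illustration.

```python
def fai_to_sizes(fai_text):
    """Return (name, length) pairs from the text of a .fai index."""
    sizes = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        # Columns: name, length, offset, line bases, line width.
        name, length = line.split("\t")[:2]
        sizes.append((name, int(length)))
    return sizes


# Hypothetical index content, for illustration only.
example_fai = "chr1\t2500\t6\t60\t61\nchr2\t1000\t2554\t60\t61\n"
for name, length in fai_to_sizes(example_fai):
    print(f"{name}\t{length}")
```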

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

0101

0 0

gtffilter:

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
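The filtering idea can be sketched in Python as below. This is an illustration under stated assumptions, not the module's code: the function names are hypothetical, and keeping `#` comment lines is a choice made for this sketch.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names (first word of each '>' header line)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def filter_gtf(gtf_lines, names):
    """Keep comment lines and records whose first column is in `names`."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in names:
            kept.append(line)
    return kept


# Toy inputs, for illustration only.
fasta = [">chr1 test sequence", "ACGT", ">chr2", "GGCC"]
gtf = [
    "#!genome-build toy",
    "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"a\";",
    "chrUn\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"b\";",
]
filtered = filter_gtf(gtf, fasta_sequence_names(fasta))
print("\n".join(filtered))
```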

filter a matrix based on a minimum value and numbers of samples that must pass.

0101

0 0 0 0

matrixfilter:

filter a matrix based on a minimum value and numbers of samples
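The rule "a minimum value in a minimum number of samples" can be written as a short Python sketch. The function name, the argument names, and the use of `>=` for the comparison are assumptions; the module's actual options may differ.

```python
def filter_matrix(matrix, min_value, min_samples):
    """Keep features where at least `min_samples` values are >= `min_value`.

    `matrix` maps a feature name to its list of per-sample values.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(value >= min_value for value in values) >= min_samples
    }


# Toy abundance matrix with three samples, for illustration only.
counts = {
    "geneA": [10, 12, 0],
    "geneB": [0, 1, 0],
    "geneC": [5, 0, 0],
}
print(sorted(filter_matrix(counts, min_value=5, min_samples=2)))
```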

Test for the presence of suitable NCBI settings or create them on the fly.

ids

0 0

sratools:

SRA Toolkit and SDK from NCBI

Make a GSEA annotation file (.chip) from tabular inputs

0101

0 0

custom:

Make a GSEA annotation file (.chip) from tabular inputs

Make a GSEA class file (.cls) from tabular inputs

01

0 0

custom:

Make a GSEA class file (.cls) from tabular inputs

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

01

0 0

tabulartogseagct:

Convert a TSV or CSV with features by row and observations by column to a GCT format file as consumed by GSEA

Make a transcript/gene mapping from a GTF and cross-reference with transcript quantifications.

0101quant_typeidextra

0 0

custom:

"Custom module to create a transcript to gene mapping from a GTF and check it against transcript quantifications"

Perform adapter/quality trimming on sequencing reads

01

0 0 0

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

structural-variant calling with cutesv

01201

0 0

A Java based tool to determine damage patterns on ancient DNA as a replacement for mapDamage

01fastafaispecieslist

0 0

DAS Tool binning step.

0123db_directory

0 0 0 0 0 0 0 0 0 0 0 0 0

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

01extension

0 0

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format

01extension

0 0

dastool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

Datavzrd is a tool to create visual HTML reports from collections of CSV/TSV tables.

meta

0 0

Create deacon index for reference genome

01

0 0

deacon:

Fast alignment-free sequence filter

decoupler is a package containing different statistical methods to extract biological activities from omics data within a unified framework. It allows flexible testing of any enrichment method with any prior knowledge resource and incorporates methods that take sign and weight into account. It can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. Examples are gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics.

01netgtf

0 0 0

DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).

01

0 0 0 0 0

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

NO input

0 0

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

012db

0 0 0 0 0

deeparg:

A deep learning based approach to predict Antibiotic Resistance Genes (ARGs) from metagenomes

Database download module for DeepBGC which detects BGCs in bacterial and fungal genomes using deep learning.

NO input

0 0

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

DeepBGC detects BGCs in bacterial and fungal genomes using deep learning.

01db

0 0 0 0 0 0 0 0 0 0 0 0

deepbgc:

DeepBGC - Biosynthetic Gene Cluster detection and classification

Deepcell/mesmer segmentation for whole-cell

0101

0 0

mesmer:

Deep cell is a collection of tools to segment imaging data

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

0 0 0 0 0

A Deep Learning Model for Transmembrane Topology Prediction and Classification

01

0 0 0 0 0 0

This tool filters alignments in a BAM/CRAM file according to the specified parameters.

012

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.

012fastafasta_fai

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Calculates scores per genomic region for other deeptools plotting utilities

01bed

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Computes read coverage for genomic regions (bins) across the entire genome.

0123

0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

01methodplot_type

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Plots cumulative read coverage for each BAM file

012

0 0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a heatmap

01

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

01

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots values produced by deeptools_computematrix as a profile plot

01

0 0 0

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

(DEPRECATED - see main.nf) DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

0 0 0 0 0

Call variants from the examples produced by make_examples

01

0 0

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Transforms the input alignments to a format suitable for the deep neural network variant caller

012301010101

0 0 0 0

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01234010101

0 0 0 0 0

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

012301010101

0 0 0 0 0

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

01

0 0

deepvariant:

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

Call structural variants

0123450101

0 0 0

delly:

Structural variant discovery by integrated paired-end and split-read analysis

Demultiplexing cell nucleus hashing data, using the estimated antibody background probability.

012output_namegenerate_gender_plotgenomegenerate_diagnostic_plots

0 0 0

runs a differential expression analysis with DESeq2

01230120101

0 0 0 0 0 0 0 0 0 0

deseq2:

Differential gene expression analysis based on the negative binomial distribution

Queries a DIAMOND database using blastp mode

0101outfmtblast_columns

0 0 0 0 0 0 0 0

diamond:

Accelerated BLAST compatible local sequence aligner

Queries a DIAMOND database using blastx mode

0101out_extblast_columns

0 0 0 0 0 0 0 0 0

diamond:

Accelerated BLAST compatible local sequence aligner

calculate clusters of highly similar sequences

01

0 0

diamond:

Accelerated BLAST compatible local sequence aligner

Builds a DIAMOND database

01taxonmaptaxonnodestaxonnames

0 0

diamond:

Accelerated BLAST compatible local sequence aligner

Doublet detection in single-cell RNA-seq data

01

0 0 0

Performs fastq alignment to a reference using DRAGMAP

010101sort_bam

0 0 0 0 0 0 0

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

01

0 0

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Assemble bacterial isolate genomes from Nanopore reads

012

0 0 0 0 0 0

Performs rapid genome comparisons for a group of genomes and visualizes their relatedness

01

0 0

drep:

De-replication of microbial genomes assembled from multiple samples

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

0 0

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped BED format

01

0 0

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

01

0 0

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped BED format

01

0 0

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Split features in gzipped GFF3 format

01

0 0

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Calculates secondary structure assignments from PDB files using mkdssp (DSSP). DSSP is a standard tool for assigning secondary structure to amino acids in protein structures.

01format

0 0

dssp:

Calculates secondary structure information from PDB files.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

012345fastafasta_fai

0 0

Assessment of duplication rates in RNA-Seq datasets

0101

0 0 0 0 0 0 0 0

Dysgu calls structural variants (SVs) from mapped sequencing reads. It is designed for accurate and efficient detection of structural variations.

012012

0 0 0

Perform phasing of genotyped data with or without a reference panel

012345

0 0

In silico prediction of E. coli serotype

01

0 0 0 0

Fast genome-wide functional annotation through orthology assignment.

01eggnog_dbeggnog_data_dir01

0 0 0 0

Convert any PEP project or Nextflow samplesheet to any format

samplesheetformatpep_input_base_dir

0 0

eido:

Convert any PEP project or Nextflow samplesheet to any format

Validate samplesheet or PEP config against a schema

samplesheetschemapep_input_base_dir

0 0

validate:

Validate samplesheet or PEP config against a schema.

Provide the SNP coverage of each individual in an eigenstrat formatted dataset.

0123

0 0 0

eigenstratdatabasetools:

A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.

Perform eigenvalue decomposition on a cooler matrix to calculate compartment signal by finding the eigenvector that correlates best with the phasing track

010

result bigwig versions

cooltools:

Analysis tools for genomic interaction data stored in .cool format

tool for detection and quantification of large mtDNA rearrangements.

012ref_gb

0 0 0 0

Convert a file in FASTA format to the ELFASTA format

01

0 0 0

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

0123456010101run_haplotypecallerrun_bqsrbqsr_tables_onlyget_activity_profileget_assembly_regions

0 0 0 0 0 0 0 0 0

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split bam/sam chunks in one file

01

0 0

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

01

0 0

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment.

01

0 0

emboss:

The European Molecular Biology Open Software Suite

the revseq program from emboss reverse complements a nucleotide sequence

01

0 0

emboss:

The European Molecular Biology Open Software Suite
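Reverse complementing is simple to show in code. The sketch below is a plain Python illustration of the operation, not EMBOSS code; it handles only A, C, G, T and N in either case, which is a simplifying assumption.

```python
# Translation table for the unambiguous bases plus N (assumption: no other
# IUPAC ambiguity codes appear in the input).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.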

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

01out_ext

0 0

emboss:

The European Molecular Biology Open Software Suite

EMM typing of Streptococcus pyogenes assemblies

01

0 0

A taxonomic profiler for metagenomic 16S data optimized for error-prone long reads.

01db

0 0 0 0 0 0

emu:

Emu is a relative abundance estimator for 16S genomic data.

endorS.py calculates endogenous DNA from samtools flagstat files and prints to screen

0123

0 0

Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.

0123

0 0

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

01feature_file

0 0

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.

012genomespeciescache_versioncache01extra_files

0 0 0 0 0 0

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Searches a term in a public NCBI database

01database

0 0

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using Unique Identifier(s)

012database

0 0

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

Queries an NCBI database using a UID

01patternelementsep

0 0

entrezdirect:

Entrez Direct (EDirect) is a method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.

phylogenetic placement of query sequences in a reference tree

0123bfastfilebinaryfile

0 0 0 0

epang:

Massively parallel phylogenetic placement of genetic sequences

splits an alignment into reference and query parts

012

0 0 0

epang:

Massively parallel phylogenetic placement of genetic sequences

estimation of the unfolded site frequency spectrum

0123

0 0 0

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

01

0 0 0

evigene:

EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.

Estimate repeat sizes using NGS data

012010101

0 0 0 0

Merge STR profiles into a multi-sample STR profile

010101

0 0

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Compute genome-wide STR profile

0120101

0 0 0 0

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Run falco on sequenced reads

01

0 0 0

fastqc:

falco is a drop-in C++ implementation of FastQC to assess the quality of sequence reads.

A fasta linter/validator

01

0 0 0

Aligns sequences using FAMSA

0101compress

0 0

famsa:

Algorithm for large-scale multiple sequence alignments

Renders a guidetree in famsa

01

0 0

famsa:

Algorithm for large-scale multiple sequence alignments

Perform adapter and quality trimming on sequencing reads with reporting

01

0 0 0 0 0 0 0 0

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

01hmm_model

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

A program that counts sequence occurrences in FASTQ files.

0101

0 0 0 0 0 0

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.

Alignment-free computation of average nucleotide Identity (ANI)

01reference

0 0

"Python C-extension for a simple validator for fasta files. The module emits the validated file or an error log upon validation failure."

01

0 0 0

fasta_validate:

"Python C-extension for a simple C code to validate a fasta file. It only checks a few things, and by default only sets its response via the return code, so you will need to check that!"

Quickly compute statistics over a fasta file in windows.

01

0 0 0 0 0 0

A fast K-mer counter for high-fidelity shotgun datasets

01

0 0 0 0

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A fast K-mer counter for high-fidelity shotgun datasets

01

0 0

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

A tool to merge FastK histograms

0123

0 0 0 0

fastk:

A fast K-mer counter for high-fidelity shotgun datasets

Distance-based phylogeny with FastME

012

0 0 0 0 0

Perform adapter/quality trimming on sequencing reads

01adapter_fastadiscard_trimmed_passsave_trimmed_failsave_merged

0 0 0 0 0 0 0

Run FastQC on sequenced reads

01

0 0 0

fastqe is a bioinformatics command line tool that uses emojis to represent and analyze genomic data.

01

0 0

FASTQ summary statistics in JSON format

01

0 0

Build fastq screen config file from bowtie index files

genome_namesindexes

0 0

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Align reads to multiple reference genomes using fastq-screen

01database

0 0 0 0 0

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Performs quality control of FASTQ files

01

0 0

fastqutils:

Validation and manipulation of FASTQ files, scRNA-seq barcode pre-processing and UMI quantification.

Produces a Newick format phylogeny from a multiple sequence alignment. Capable of bacterial genome size alignments.

alignment

0 0

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

01

0 0

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
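Collapsing identical sequences while keeping their counts amounts to a frequency tally. The sketch below illustrates the idea in Python; it is not the FASTX-Toolkit implementation, and the ordering (most abundant first, ties broken alphabetically) is an assumption of this sketch.

```python
from collections import Counter


def collapse(sequences):
    """Return (sequence, count) pairs, most abundant first."""
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy reads, for illustration only.
reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
for sequence, count in collapse(reads):
    print(sequence, count)
```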

Run NCBI's FCS adaptor on assembled genomes

01

0 0 0 0 0 0

fcs:

The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly.

Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid.

01gxdb

0 0 0

fcs:

"The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign organisms in genome assemblies to prepare your data for submission. Therefore, the submission process to NCBI is faster and fewer contaminated genomes are submitted. This reduces errors in analyses and conclusions, not just for the original data submitter but for all subsequent users of the assembly."

Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to remove foreign contamination from genome assemblies

012

0 0 0

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Fetches the NCBI FCS-GX database using a provided manifest URL

manifest

0 0

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

Runs FCS-GX (Foreign Contamination Screen - Genome eXtractor) to screen and remove foreign contamination from genome assemblies

012gxdbramdisk_path

0 0 0 0 0

fcsgx:

The NCBI Foreign Contamination Screen. Genomic cross-species aligner, for contamination detection.

A command line tool that makes it easier to find sequencing data from the SRA / GEO / ENA.

ids

0 0

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

01min_readsmin_baseq

0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

01min_readsmin_baseq

0 0

fgbio:

Tools for working with genomic and high throughput sequencing data.

Collects a suite of metrics to QC duplex sequencing data.

01interval_list

0 0 0 0 0 0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

r-ggplot2:

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

Copies the UMI at the end of a BAM file's read names to the RX tag.

012

0 0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses the fgbio tools to convert FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads

01

0 0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

0101min_readsmin_baseqmax_base_error_rate

0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

01strategy

0 0 0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs
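The grouping logic described for this module (position first, then UMI) can be sketched in Python. This is a simplified illustration that only groups exactly matching UMIs; it does not reproduce fgbio's mismatch-tolerant strategies, and the record fields (`name`, `chrom`, `start`, `end`, `umi`) are hypothetical.

```python
from collections import defaultdict


def group_by_umi(templates):
    """Group templates sharing chromosome, end positions and exact UMI.

    Groups are returned in order of mapping position.
    """
    groups = defaultdict(list)
    for template in templates:
        key = (template["chrom"], template["start"], template["end"], template["umi"])
        groups[key].append(template["name"])
    return [names for _, names in sorted(groups.items())]


# Toy templates, for illustration only.
templates = [
    {"name": "r1", "chrom": "chr1", "start": 100, "end": 250, "umi": "AACG"},
    {"name": "r2", "chrom": "chr1", "start": 100, "end": 250, "umi": "AACG"},
    {"name": "r3", "chrom": "chr1", "start": 100, "end": 250, "umi": "TTGA"},
    {"name": "r4", "chrom": "chr1", "start": 50, "end": 200, "umi": "AACG"},
]
print(group_by_umi(templates))
```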

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.

01

0 0

fgbio:

Tools for working with genomic and high throughput sequencing data.

FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM

01010101

0 0

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Filtlong filters long reads based on quality measures or short read data.

012

0 0 0

A module for concatenation of gzipped or uncompressed files that gets around the UNIX terminal argument size limit

01

0 0

find:

GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression

pigz:

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

A module for decompressing a large number of gzipped files, getting around the UNIX terminal argument limit

01

0 0

find:

GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression

pigz:

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

Perform merging of mate paired-end sequencing reads

01

0 0 0 0

De novo assembler for single molecule sequencing reads

01mode

0 0 0 0 0 0 0

Efficient compression tool for protein structures

01

0 0

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Decompression tool for foldcomp compressed structures

01

0 0

foldcomp:

Foldcomp: a library and format for compressing and indexing large protein structure sets

Creates a database for Foldmason.

01

0 0

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Aligns protein structures using foldmason

0101compress

0 0 0

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Renders a visualization report using foldmason

01010101

0 0

foldmason:

Multiple Protein Structure Alignment at Scale with FoldMason

Create a database from protein structures

01

0 0

foldseek:

Foldseek: fast and accurate protein structure search

Search for protein structural hits against a foldseek database of protein structures

0101

0 0

foldseek:

Foldseek: fast and accurate protein structure search

Generate processing masks for a given datacube definition and area of interest. These files can be used to spatially restrict downstream analysis tasks.

aoimask/datacube-definition.prjshapefile_dbfshapefile_prjshapefile_shx

0 0

force:

An all-in-one tool for processing satellite data, specialized in medium-resolution data such as Landsat or Sentinel imagery.

Compute valid tiles for a given datacube definition and area of interest. This list can be used by downstream analysis tasks to limit processing to the area of interest when satellite data covers a larger region.

aoi datacube_definition shapefile_dbf shapefile_prj shapefile_shx

0 0

force:

An all-in-one tool for processing satellite data, specialized in medium-resolution data such as Landsat or Sentinel imagery.

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

meta

0 0

fq:

fq is a library to generate and validate FASTQ file pairs.

fq lint is a FASTQ file pair validator.

01

0 0

fq:

fq is a library to generate and validate FASTQ file pairs.

fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in ext.args.

01

0 0

fq:

fq is a library to generate and validate FASTQ file pairs.

Demultiplex fastq files

0123

0 0 0 0

Bootstrap sample demixing by resampling each site based on a multinomial distribution of read depth across all sites, where the event probabilities were determined by the fraction of the total sample reads found at each site, followed by a secondary resampling at each site according to a multinomial distribution (that is, binomial when there was only one SNV at a site), where event probabilities were determined by the frequencies of each base at the site, and the number of trials is given by the sequencing depth.

012repeatsbarcodeslineages_meta

0 0 0

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.
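
The two-stage resampling described for the bootstrap module can be sketched as follows. This is an illustration using only Python's standard library, not Freyja's implementation: the function name `bootstrap_sites` and the input layout (site position mapped to per-base read counts) are assumptions, and the second stage uses the depth drawn in the first stage as its number of trials.

```python
import random
from collections import Counter


def bootstrap_sites(site_base_counts, rng):
    """Resample per-site base counts in two multinomial stages.

    site_base_counts: dict mapping site -> dict mapping base -> read count
    rng: a random.Random instance (seed it for reproducible replicates)
    """
    sites = list(site_base_counts)
    depths = [sum(site_base_counts[s].values()) for s in sites]
    total_depth = sum(depths)
    if total_depth == 0:
        return {s: {b: 0 for b in site_base_counts[s]} for s in sites}

    # Stage 1: redistribute all reads across sites. Drawing total_depth
    # times with weights proportional to depth is a multinomial sample.
    new_depths = Counter(rng.choices(sites, weights=depths, k=total_depth))

    resampled = {}
    for site in sites:
        bases = list(site_base_counts[site])
        weights = [site_base_counts[site][b] for b in bases]
        n = new_depths.get(site, 0)
        # Stage 2: resample the bases at this site according to their
        # observed frequencies (binomial when only two bases are present).
        draws = Counter(rng.choices(bases, weights=weights, k=n)) if n else Counter()
        resampled[site] = {b: draws.get(b, 0) for b in bases}
    return resampled
```

Each call produces one bootstrap replicate; repeating the call with the same `rng` gives further replicates whose total read count always equals that of the input.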

specify the relative abundance of each known haplotype

012barcodeslineages_meta

0 0

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

downloads new versions of the curated SARS-CoV-2 lineage file and barcodes

db_name

0 0 0 0

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Call variants and report the sequencing depth for each variant

01fasta

0 0

freyja:

Freyja recovers relative lineage abundances from mixed SARS-CoV-2 samples and provides functionality to analyze lineage dynamics.

Build references for fusioncatcher

meta

0 0

fusioncatcher:

Build genome for fusioncatcher

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data

0101

0 0 0 0

fusioncatcher:

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data

fusionreport_detect

0123010

fusion_list fusion_list_filtered report html csv json versions

fusionreport:

Tool for parsing outputs from fusion detection tools

Build DB for fusionreport

NO input

0 0

fusionreport:

Generate an interactive summary report from fusion detection tools.

Cluster genome FASTA files by average nucleotide identity

0123

0 0 0

Gene Allele Mutation Microbial Assessment

01db

0 0 0 0 0

gamma:

Tool for Gene Allele Mutation Microbial Assessment

GangSTR is a tool for genome-wide profiling of tandem repeats from short reads.

0123fastafasta_fai

0 0 0 0

Build ganon database using custom reference sequences.

01input_tsvtaxonomy_filesgenome_size_files

0 0 0

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

01db

0 0 0 0 0 0 0

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

01db

0 0

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

01

0 0

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

assigns taxonomy to query sequences in phylogenetic placement output

012

0 0 0 0 0 0 0

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Grafts query sequences from phylogenetic placement on the reference tree

01

0 0

gappa:

Genesis Applications for Phylogenetic Placement Analysis

colours a phylogeny with placement densities

01

0 0 0 0 0 0 0

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Performs local realignment around indels to correct for mapping errors

012301010101

0 0

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

01201010101

0 0

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

01201010101010101

0 0

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Assigns all the reads in a file to a single new read-group

010101

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Annotates intervals with GC content, mappability, and segmental-duplication content

0101010101010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
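
As a rough illustration of the GC-content annotation, the sketch below computes the GC fraction of one interval of a reference sequence. It is not GATK code: the function name `gc_content`, the 1-based inclusive coordinates, and the choice to ignore non-ACGT bases such as N are all assumptions made for the example.

```python
def gc_content(sequence, start, end):
    """Return the GC fraction of sequence[start..end] (1-based, inclusive).

    Bases other than A, C, G, T are skipped; returns None when the
    interval contains no called bases at all.
    """
    window = sequence[start - 1:end].upper()
    called = [base for base in window if base in "ACGT"]
    if not called:
        return None
    gc = sum(1 for base in called if base in "GC")
    return gc / len(called)
```

For example, `gc_content("ACGTNNGGCC", 1, 4)` evaluates the window `ACGT`, in which two of the four bases are G or C.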

Apply base quality score recalibration (BQSR) to a bam file

01234fastafaidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345fastafaidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the allele-specific read counts for allele-specific expression analysis of RNAseq data

01234010101intervals

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

01230101010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

meta input input_index intervals fasta fai dict known_sites known_sites_tbi

meta versions table

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an interval list from a bed file and a reference dict

0101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

estimates the parameters for the DRAGstr model

012fastafasta_faidictstrtablefile

0 0

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

01234fastafaidictarchitectureweights

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

0123010101

0 0 0

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
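
The counting rule stated above, one count per read whose start lies in the interval, can be sketched in a few lines. This is an illustration and not GATK code: the function name `count_read_starts` is hypothetical, and it assumes a single contig with closed intervals given as `(start, end)` pairs.

```python
from bisect import bisect_left, bisect_right


def count_read_starts(read_starts, intervals):
    """Count read start positions falling inside each closed interval.

    read_starts: iterable of integer start positions on one contig
    intervals: list of (start, end) pairs, both ends inclusive
    """
    ordered = sorted(read_starts)
    counts = []
    for start, end in intervals:
        lo = bisect_left(ordered, start)   # first read start >= start
        hi = bisect_right(ordered, end)    # one past last read start <= end
        counts.append(hi - lo)
    return counts
```

A read that begins in one interval and extends into the next is counted only in the first, because only its start position is considered.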

Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

01234fastafasta_faidict

0 0 0 0 0 0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

012fastafaidict

0 0

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the DRAGstr model during single-sample auto-calibration (CalibrateDragstrModel).

fasta fasta_fai dict

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges adjacent DepthEvidence records

012fastafasta_faidict

0 0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel.

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates a sequence dictionary for a reference sequence

01

0 0

gatk:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a panel of normals constraining germline and artifactual sites for use with mutect2.

01010101

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Denoises read counts to produce denoised copy ratios

0101

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Determines the baseline contig ploidy for germline samples given counts data

012301contig_ploidy_table

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Estimates the numbers of unique molecules in a sequencing library.

01fastafaidict

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Converts FastQ file to SAM/BAM format

01

0 0

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

0123resourcesresources_indexfastafaidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Gathers scattered BQSR recalibration reports into a single file

01

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

write your description here

01dict

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.

012345run_intlistrun_updatewspaceinput_map

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.

012340101010101

0 0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy.

01234

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.

0123010101variantsvariants_tbi

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

012340101010101

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creates an index for a feature file, e.g. VCF or BED file.

01

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Converts a Picard IntervalList file to a BED file.

01

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Splits the interval list file into unique, equally sized interval files and places them under a directory

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Left align and trim variants using GATK4 LeftAlignAndTrimVariants.

0123fastafaidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01fastafasta_fai

0 0 0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

0120101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merges mutect2 stats generated on different intervals/regions

01

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Merges several vcf files

0101

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts copy number ratios (and optionally allelic counts) to copy number segments

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call somatic SNVs and indels via local assembly of haplotypes.

0123010101germline_resourcegermline_resource_tbipanel_of_normalspanel_of_normals_tbi

0 0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

0123

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Prepares bins for coverage collection.

0101010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

012010101

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

WARNING - this tool is still experimental and shouldn't be used in a production setting. Gathers paired-end and split read evidence files for use in the GATK-SV pipeline. Output files are a file containing the location of and orientation of read pairs marked as discordant, and a file containing the clipping location of all soft clipped reads and the orientation of the clipping.

012bedfastafasta_faidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Condenses homRef blocks in a single-sample GVCF

0123fastafaidictdbsnpdbsnp_tbi

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Select a subset of variants from a VCF file

0123

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Create a fasta with the bases shifted by offset

010101

0 0 0 0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

EXPERIMENTAL TOOL! Convert SiteDepth to BafEvidence

01201fastafasta_faidict

0 0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Splits CRAM files efficiently by taking advantage of their container based structure

01

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Split intervals into sub-interval files.

01010101

0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Splits reads that contain Ns in their cigar string

0123010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Adds predicted functional consequence, gene overlap, and noncoding element overlap annotations to SV VCF from GATK-SV pipeline. Input files are an SV VCF, a GTF file containing primary or canonical transcripts, and a BED file containing noncoding elements. Output file is an annotated SV VCF.

0123fastafasta_faidict

0 0 0

gatk4:

Genome Analysis Toolkit (GATK4)

Clusters structural variants based on coordinates, event type, and supporting algorithms

012ploidy_tablefastafasta_faidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

01201010101

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

012resource_vcfresource_tbilabelsfastafaidict

0 0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

012345010101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234fastafaidict

0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generate recalibration table for Base Quality Score Recalibration (BQSR)

0123fastafaidictknown_sitesknown_sites_tbi

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01fastafasta_faidict

0 0 0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.

01program_filedisable_redirect_output

0 0

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

012model_dir

0 0 0 0 0 0

gecco:

Biosynthetic Gene Cluster prediction with Conditional Random Fields.

Convert a mappability file to bedgraph format

0101

0 0 0

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

01

0 0 0

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Define the mappability of a reference

01read_length

0 0

gem2:

GEM2 is a high-performance mapping tool. It also provides a unique tool to evaluate mappability.

Create a GEM index from a FASTA file

01

0 0 0

gem3:

The GEM indexer (v3).

Performs FASTQ alignment to a FASTA reference using gem3-mapper

0101sort_bam

0 0

gem3:

The GEM indexer (v3).

A derivative of GenomeScope2.0 modified to work with FastK

01

0 0 0 0 0 0 0 0

create index file for genmap

01

0 0

genmap:

Ultra-fast computation of genome mappability.

create mappability files for a genome

0101

0 0 0 0 0

genmap:

Ultra-fast computation of genome mappability.
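
As a rough illustration of what "mappability" means in this context, the sketch below scores each position by the inverse of how often the k-mer starting there occurs in the sequence. This is a simplified assumption for illustration only: it uses exact matches on one strand, and the function name `kmer_mappability` is hypothetical. It is not GenMap's actual algorithm, which also tolerates mismatches and is far more efficient.

```python
from collections import Counter


def kmer_mappability(genome: str, k: int) -> list[float]:
    """Exact-match mappability: 1 / (occurrences of the k-mer starting at each position)."""
    # Every k-mer in the genome, in positional order.
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    # A unique k-mer scores 1.0; a k-mer seen n times scores 1/n.
    return [1.0 / counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    # "ACG" occurs twice, so positions 0 and 4 score 0.5.
    print(kmer_mappability("ACGTACGA", 3))
```

Low scores mark positions where a read of length k could not be placed unambiguously.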

for annotating regions, frequencies, cadd scores

01

0 0

genmod:

Annotate genetic inheritance models in variant files

Score compounds

01

0 0

genmod:

Annotate genetic inheritance models in variant files

annotate models of inheritance

012reduced_penetrance

0 0

genmod:

Annotate genetic inheritance models in variant files

Score the variants of a vcf based on their annotation

012score_config

0 0

genmod:

Annotate genetic inheritance models in variant files

Download geNomad databases and related files

NO input

0 0

genomad:

Identification of mobile genetic elements

Identify mobile genetic elements present in genomic assemblies

01genomad_db

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

genomad:

Identification of mobile genetic elements

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach

01

0 0 0 0 0 0 0 0 0

Genotype Salmonella Typhi from Mykrobe results

01

0 0

genotyphi:

Assign genotypes to Salmonella Typhi genomes based on VCF files (mapped to Typhi CT18 reference genome)

Peak-calling for ChIP-seq and ATAC-seq enrichment experiments

012blacklist_bed

0 0 0 0 0 0

geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA

geo_accession

0 0

Retrieves GEO data from the Gene Expression Omnibus (GEO)

01

0 0 0 0

geoquery:

Get data from NCBI Gene Expression Omnibus (GEO)

Downloads databases needed for running getorganelle

organelle_type

0 0

getorganelle:

Get organelle genomes from genome skimming data

Assembles organelle genomes from genomic data

0101

0 0 0

getorganelle:

Get organelle genomes from genome skimming data

Collapse walk-preserving shared affixes in variation graphs in GFA format

01

0 0 0

A single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

01out_fmtgenome_sizetarget01010101

0 0 0

Converts GFA or rGFA files to FASTA

01

0 0

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Summary statistics for GFA files

01

0 0

gfatools:

Tools for manipulating sequence graphs in the GFA and rGFA formats

Compare, merge, annotate and estimate accuracy of generated gtf files

0101201

0 0 0 0 0 0 0 0

Validate, filter, convert and perform various other operations on GFF files

01fasta

0 0 0 0

gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

01

0 0 0

gget:

gget enables efficient querying of genomic databases

Defines chunks where to run imputation

0123

0 0

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Compute the r2 correlation between imputed dosages (in MAF bins) and highly-confident genotype calls from the high-coverage dataset.

01234567min_probmin_dpbins

0 0 0 0 0 0

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.
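
The r2-per-MAF-bin idea described for this module can be sketched as follows. This is a minimal illustration under stated assumptions, not GLIMPSE's implementation: the record layout `(maf, imputed_dosage, true_genotype)`, the half-open bin boundaries, and the function names are all hypothetical.

```python
def pearson_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation; NaN when either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(records, bin_edges):
    """Group (maf, imputed_dosage, true_genotype) records into MAF bins
    [edge_i, edge_{i+1}) and return the r2 for each bin with >= 2 records."""
    n_bins = len(bin_edges) - 1
    dosages = [[] for _ in range(n_bins)]
    truths = [[] for _ in range(n_bins)]
    for maf, dosage, truth in records:
        for i in range(n_bins):
            if bin_edges[i] <= maf < bin_edges[i + 1]:
                dosages[i].append(float(dosage))
                truths[i].append(float(truth))
                break
    return {
        (bin_edges[i], bin_edges[i + 1]): pearson_r2(dosages[i], truths[i])
        for i in range(n_bins)
        if len(dosages[i]) >= 2
    }


if __name__ == "__main__":
    demo = [(0.01, 0.1, 0), (0.02, 0.9, 1), (0.03, 1.8, 2), (0.20, 1.1, 1), (0.30, 0.2, 0)]
    print(r2_by_maf_bin(demo, [0.0, 0.05, 0.5]))
```

Binning by minor allele frequency matters because imputation accuracy typically differs between rare and common variants, so a single pooled r2 would hide that difference.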

Concatenates imputation chunks in a single VCF/BCF file ligating phased information.

012

0 0

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods

012345678

0 0

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Generates haplotype calls by sampling haplotype estimates

01

0 0

glimpse:

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.

Defines chunks where to run imputation

01234model

0 0

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Program to compute the genotyping error rate at the sample or marker level.

0123456780123456

0 0 0 0 0 0 0

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Ligation of multiple phased BCF/VCF files into a single whole-chromosome file. GLIMPSE2 is run in chunks that are ligated into chromosome-wide files while maintaining the phasing.

012

0 0

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool for imputation and phasing from vcf file or directly from bam files.

0123456789012

0 0 0

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Tool to create a binary reference panel for quick reading time.

0123401

0 0

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

merge gVCF files and perform joint variant calling

01201vcf_output

0 0 0

GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).

012type_reportsummary_reportskipexamine

0 0 0 0 0 0 0

Writes a sorted concatenation of file/s

01

0 0

sort:

Writes a sorted concatenation of file/s

Split a file into consecutive or interleaved sections

01

0 0

gnu:

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
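
The difference between "consecutive" and "interleaved" sections can be sketched in a few lines. This is an illustration of the two splitting strategies only, with hypothetical function names; it does not reproduce the command-line behaviour of GNU `split` (where, for instance, `-l` gives consecutive line chunks and `-n r/N` gives round-robin distribution).

```python
def split_consecutive(lines: list[str], lines_per_chunk: int) -> list[list[str]]:
    """Cut the input into back-to-back chunks of at most lines_per_chunk lines."""
    return [lines[i:i + lines_per_chunk] for i in range(0, len(lines), lines_per_chunk)]


def split_interleaved(lines: list[str], n_chunks: int) -> list[list[str]]:
    """Deal lines out round-robin: line 0 to chunk 0, line 1 to chunk 1, and so on."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    print(split_consecutive(data, 2))  # chunks keep neighbouring lines together
    print(split_interleaved(data, 2))  # chunks alternate between lines
```

Interleaving tends to balance chunk sizes and content when the input is ordered, while consecutive splitting preserves locality.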

Query metadata for any taxon across the tree of life.

012

0 0

goat:

goat-cli is a command line interface to query the Genomes on a Tree Open API.

Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.

01201

0 0 0 0 0 0 0 0

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Quickly generate evenly sized (by amount of data) regions across a number of bam/cram files

0101split

0 0

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

runs a functional enrichment analysis with gprofiler2

010101

0 0 0 0 0 0 0 0 0

gprofiler2:

An R interface corresponding to the 2019 update of g:Profiler web tool.

Checks if the input file is bgzip compressed or not

01

0 0

grabix:

a wee tool for random access into BGZF files.
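
One way to tell BGZF (bgzip) data from ordinary gzip is to inspect the first block header: a BGZF block is a gzip member whose extra field carries a `BC` subfield of length 2. The sketch below checks for that signature. It is an assumption-laden illustration based on the BGZF layout in the SAM specification, with a hypothetical function name; it is not how grabix itself performs the check.

```python
import gzip
import struct


def looks_like_bgzf(header: bytes) -> bool:
    """Return True if the bytes start with a gzip header carrying the BGZF 'BC' subfield."""
    if len(header) < 18:
        return False
    # gzip magic (0x1f 0x8b), deflate method (8), and the FEXTRA flag bit (0x04).
    if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8 or not header[3] & 0x04:
        return False
    # XLEN: little-endian length of the extra field, which starts at byte 12.
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack_from("<H", extra, pos + 2)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen  # skip to the next subfield
    return False


if __name__ == "__main__":
    # Plain gzip output has no extra field, so it is not reported as BGZF.
    print(looks_like_bgzf(gzip.compress(b"plain gzip data")))
```

To test a file on disk, reading its first few hundred bytes in binary mode and passing them to the function is enough.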

A versatile pairwise aligner for genomic and spliced nucleotide sequences

01fastaindex

0 0

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

fasta

0 0

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Tools for population-scale genotyping using pangenome graphs.

0120101region_file

0 0 0

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

Tools for population-scale genotyping using pangenome graphs.

01

0 0 0

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

0 0 0

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

0 0

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0123010101

bedpe bed versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

0 0 0

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

run the Broad Gene Set Enrichment tool in GSEA mode

01230101

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

gsea:

Gene Set Enrichment Analysis (GSEA)

Collapse redundant transcript models in Iso-Seq data.

01fasta

0 0 0 0 0 0 0 0 0 0

tama_collapse.py:

Collapse similar gene model

Merge multiple transcriptomes while maintaining source information.

01filelist

0 0 0 0 0

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Helper script, remove remaining polyA sequences from Full Length Non Chimeric reads (Pacbio isoseq3)

01

0 0 0 0

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files

01

0 0 0

gt:

The GenomeTools genome analysis system

GenomeTools gt-gff3validator utility to strictly validate a GFF3 file

01

0 0 0

gt:

The GenomeTools genome analysis system

Predicts LTR retrotransposons using GenomeTools gt-ltrharvest utility

01

0 0 0 0 0

gt:

The GenomeTools genome analysis system

GenomeTools gt-stat utility to show statistics about features contained in GFF3 files

01

0 0

gt:

The GenomeTools genome analysis system

Computes enhanced suffix array using GenomeTools gt-suffixerator utility

01mode

0 0

gt:

The GenomeTools genome analysis system

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

0101use_pplacer_scratch_dirmash_db

0 0 0 0 0 0 0 0 0 0 0

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

Converts the output classifications of GTDB-TK from GTDB taxonomy to NCBI taxonomy

0120101

0 0

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

Sort GTF files in chr/pos/feature order

gtf

0 0
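
A chr/pos/feature ordering can be sketched as a sort on three GTF columns. The key used here (sequence name compared as plain text, then numeric start, then feature type alphabetically) is an assumption for illustration; the actual tool behind this module may order chromosomes naturally and rank features hierarchically. The function name is hypothetical.

```python
def sort_gtf_lines(lines: list[str]) -> list[str]:
    """Return comment lines first, then records sorted by (seqname, start, feature)."""

    def sort_key(line: str):
        fields = line.rstrip("\n").split("\t")
        # GTF columns: 0 = seqname, 2 = feature type, 3 = start (1-based).
        return (fields[0], int(fields[3]), fields[2])

    comments = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    return comments + sorted(records, key=sort_key)


if __name__ == "__main__":
    demo = [
        'chr2\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "g2";',
        'chr1\tsrc\tgene\t50\t300\t.\t+\t.\tgene_id "g1";',
        'chr1\tsrc\texon\t50\t80\t.\t+\t.\tgene_id "g1";',
    ]
    for row in sort_gtf_lines(demo):
        print(row)
```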

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

alignment

0 0 0 0 0 0 0 0 0 0

Download database for GUNC detection of Chimerism and Contamination in Prokaryotic Genomes

db_name

0 0

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Merging of CheckM and GUNC results in one summary table

012

0 0

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Detection of Chimerism and Contamination in Prokaryotic Genomes

01db

0 0 0

gunc:

Python package for detection of chimerism and contamination in prokaryotic genomes.

Compresses and decompresses files.

01

0 0

Removes all non-variant blocks from a gVCF file to produce a smaller variant-only VCF file.

01

0 0

gvcftools:

gvcftools is a package of small utilities for creating and analyzing gVCF files

gzrecover is a program that will attempt to extract any readable data out of a gzip file that has been corrupted

01

0 0

Tool to convert and summarize ABRicate outputs using the hAMRonization specification

01formatsoftware_versionreference_db_version

0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize AMRfinderPlus outputs using the hAMRonization specification.

01formatsoftware_versionreference_db_version

0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize DeepARG outputs using the hAMRonization specification

01formatsoftware_versionreference_db_version

0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize fARGene outputs using the hAMRonization specification

01formatsoftware_versionreference_db_version

0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to convert and summarize RGI outputs using the hAMRonization specification.

01formatsoftware_versionreference_db_version

0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

Tool to summarize and combine all hAMRonization reports into a single file

reportsformat

0 0 0 0

hamronization:

Tool to convert and summarize AMR gene detection outputs using the hAMRonization specification

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

01mapexclude

0 0 0 0

Haplocheck detects contamination patterns in mtDNA and WGS sequencing studies by analyzing the mitochondrial DNA. Haplocheck also works as a proxy tool for nDNA studies and provides users with a graphical report to investigate the contamination further. Internally, it uses the Haplogrep tool, which supports the rCRS and RSRS mitochondrial versions.

01

0 0 0

classification into haplogroups

01format

0 0

haplogrep2:

A tool for mtDNA haplogroup classification.

classification into haplogroups

01

0 0

haplogrep3:

A tool for mtDNA haplogroup classification.

Somatic VCF feature extraction tool from hap.py.

012340101

0 0

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

0 0 0 0 0 0 0 0 0 0 0 0

happy:

Haplotype VCF comparison tools

Pre.py is a tool for preprocessing VCF files for Hap.py

0120101

0 0

happy:

Haplotype VCF comparison tools

Hap.py is a tool to compare diploid genotypes at haplotype level. som.py is the part of hap.py that compares somatic variants.

012340101010101

0 0 0 0

sompy:

Haplotype VCF comparison tools somatic variant comparison

Generating cell hashing calls from a matrix of count data.

0123

0 0 0 0 0 0 0 0

HelitronScanner draw tool for Helitron transposons in genomes

010101

0 0

helitronscanner:

HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes

HelitronScanner scanHead and scanTail tools for Helitron transposons in genomes

01commandlcv_filepathbuffer_size

0 0

helitronscanner:

HelitronScanner uncovers a large overlooked cache of Helitron transposons in many genomes

Fast and sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)

0101

0 0

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

Sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs)

0101

0 0

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

Reformat a Multiple Sequence Alignment (MSA) file

01informatoutformat

0 0

hhsuite:

HH-suite3 for fast remote homology detection and deep protein annotation

Identify cap locus serotype and structure in your Haemophilus influenzae assemblies

01database_dirmodel_fp

0 0 0 0

Computes PCA eigenvectors for a Hi-C matrix.

01

0 0 0 0

hicexplorer:

Set of programs to process, analyze and visualize Hi-C and capture Hi-C data

Whole-genome assembly using PacBio HiFi reads

01201201201

0 0 0 0 0 0 0 0 0 0 0

pacbio structural variant calling tool

01201201

0 0 0

Align RNA-Seq reads to a reference with HISAT2

010101

0 0 0 0

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Builds HISAT2 index for reference genome

010101

0 0

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Extracts splice sites from a GTF file

01

0 0

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

Pre-compute the graph index structure.

01

0 0

hlala:

HLA typing from short and long reads

Performs HLA typing based on a population reference graph and employs a new linear projection method to align reads to the graph.

0123

0 0 0 0 0 0 0 0 0 0 0

hlala:

HLA typing from short and long reads

gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference

01

0 0

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy
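
Computing GC content in non-overlapping windows can be sketched as below. This is a simplified stand-in, not the gcCounter implementation: the choice to ignore non-ACGT characters such as N in the denominator, the NaN for windows without any ACGT bases, and the function name are all assumptions made for this example.

```python
def gc_content_windows(sequence: str, window: int) -> list[tuple[int, int, float]]:
    """Return (start, end, gc_fraction) for consecutive non-overlapping windows.

    Coordinates are 0-based, half-open. Bases other than A/C/G/T are ignored;
    a window with no A/C/G/T bases gets NaN.
    """
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        fraction = gc / acgt if acgt else float("nan")
        results.append((start, start + len(chunk), fraction))
    return results


if __name__ == "__main__":
    for row in gc_content_windows("GGCCAATTGCNN", 4):
        print(row)
```

Per-window GC values of this kind are what downstream copy-number tools use to correct read counts for GC bias.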

Perl script (generateMap.pl) that generates the mappability of a genome for a given read size, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised.

01

0 0

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file

01

0 0

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

readCounter function from HMMcopy utilities, used to generate read counts in windows

012

0 0

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Mask multiple sequence alignments

01234567maskfile

0 0 0 0 0 0 0 0

hmmer:

Biosequence analysis using profile hidden Markov models

Reformats sequence files; see the HMMER documentation for details. The module requires that the format is specified in ext.args in a config file, and that it comes last. See the tool's help for possible values.

01

0 0

hmmer:

Biosequence analysis using profile hidden Markov models

hmmalign from the HMMER suite aligns a number of sequences to an HMM profile

01hmm

0 0

hmmer:

Biosequence analysis using profile hidden Markov models

create an hmm profile from a multiple sequence alignment

01mxfile

0 0 0

hmmer:

Biosequence analysis using profile hidden Markov models

extract hmm from hmm database file or create index for hmm database

01keykeyfileindex

0 0 0

hmmer:

Biosequence analysis using profile hidden Markov models

compress and index profile database for hmmscan

01

0 0

hmmer:

Biosequence analysis using profile hidden Markov models

R script that scores output from multiple runs of hmmer/hmmsearch

01

0 0

hmmer:

Biosequence analysis using profile hidden Markov models

R:

A Language and Environment for Statistical Computing

Tidyverse:

Tidyverse: R packages for data science

search profile(s) against a sequence database

012345

0 0 0 0 0

hmmer:

Biosequence analysis using profile hidden Markov models

iterative searches to detect distant homologs by refining an HMM profile from hits

012345

0 0 0 0 0

hmmer:

Biosequence analysis using profile hidden Markov models

Human mitochondrial variants annotation using HmtVar. Contains .plk file with annotation, so can be run offline

01

0 0

hmtnote:

Human mitochondrial variants annotation using HmtVar.

Annotate peaks with HOMER suite

01fastagtf

0 0 0

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Find peaks with HOMER suite

01uniqmap

0 0

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Create a tag directory with the HOMER suite

01fasta

0 0 0

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

Create a UCSC bed graph with the HOMER suite

01

0 0

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

Converting from HOMER peak to BED file formats

01

0 0

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

write your description here

0101

0 0 0

hostile:

Hostile: accurate host decontamination

Downloads required reference genomes for Hostile

index_name

0 0

hostile:

Hostile: accurate host decontamination

Serotype prediction of Haemophilus parasuis assemblies

01

0 0

Demultiplex samples based on data from cell hashing.

012

0 0 0 0 0

count how many reads map to each feature

01201

0 0

htseq/count:

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

This tool takes a background VCF, such as gnomAD, with whole-genome coverage (in some cases users will instead want whole-exome coverage) and uses it as an expectation of variants.

012012

0 0

htsnimtools:

useful command-line tools written to show-case hts-nim

HUMID is a tool to quickly and easily remove duplicate reads from FastQ files, with or without UMIs.

0101

0 0 0 0 0

Assembly polisher using short (and long) reads

0101draftgenome_sizereads_coverage

0 0

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA. This module generates a panel of normals

wigsgc_wigmap_wigcentromererep_time_wigexons

0 0 0

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

ichorCNA is an R package for calculating copy number alteration from (low-pass) whole genome sequencing, particularly for use in cell-free DNA

01gc_wigmap_wignormal_wignormal_backgroundcentromererep_time_wigexons

0 0 0 0 0 0 0 0 0

ichorcna:

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.

Plot a metagene of cross-link events/sites around various transcriptomic landmarks.

01segmentation

0 0

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount peaks on a BED file of crosslinks

012

0 0

icount:

Computational pipeline for analysis of iCLIP data

Formats a GTF file for use with iCount sigxls

01fai

0 0 0

icount:

Computational pipeline for analysis of iCLIP data

Runs iCount sigxls on a BED file of crosslinks

01segmentation

0 0 0

icount:

Computational pipeline for analysis of iCLIP data

Report proportion of cross-link events/sites on each region type.

01segmentation

0 0 0 0

icount:

Computational pipeline for analysis of iCLIP data

Demultiplex paired-end FASTQ files from QuantSeq-Pool

012

0 0 0 0

Measures reproducibility of ChIP-seq, ATAC-seq peaks using IDR (Irreproducible Discovery Rate)

peakspeak_typeprefix

0 0 0 0

igv.js is an embeddable interactive genome visualization component

012

0 0 0 0

igv:

Create an embeddable interactive genome browser component. Output files are expected to be present in the same directory as the genome browser html file. To visualise it, files have to be served. Check the documentation at: https://github.com/igvteam/igv-webapp for an example and https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements for server requirements

A Python application to generate self-contained HTML reports for variant review and other genomic applications

0123012

0 0

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

012ilp

0 0

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Ilastik is a tool that utilizes machine learning algorithms to classify pixels, segment, track and count cells in images. Ilastik contains a graphical user interface to interactively label pixels. However, this nextflow module will implement the --headless mode, to apply pixel classification using a pre-trained .ilp file on an input image.

0123ilp

0 0

ilastik:

Ilastik is a user friendly tool that enables pixel classification, segmentation and analysis.

Perform immune cell deconvolution using RNA-seq data and various computational methods.

0123gene_symbol_col

0 0 0

Search covariance models against a sequence database

012write_alignwrite_target

0 0 0 0

infernal:

Infernal is for searching DNA sequence databases for RNA structure and sequence similarities.

Strain-level comparisons across multiple inStrain profiles

012stb_file

0 0 0 0 0 0

instrain:

Calculation of strain-level metrics

inStrain is a Python program for the analysis of co-occurring genome populations from metagenomes. It allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous/non-synonymous identification

01genome_fastagenes_fastastb_file

0 0 0 0 0 0 0 0

instrain:

Calculation of strain-level metrics

Detect integrons in DNA sequences

01

0 0 0 0 0

Produces protein annotations and predictions from an amino acids FASTA file

01interproscan_database

0 0 0 0 0

Download, extract, and check md5 of iPHoP databases

NO input

0 0

iphop:

Predict host genus from genomes of uncultivated phages.

Predict phage host using iPHoP

01iphop_db

0 0 0 0

iphop:

Predict host genus from genomes of uncultivated phages.

Produces a Newick format phylogeny from a multiple sequence alignment using the maximum likelihood algorithm. Capable of bacterial genome size alignments.

012tree_telmclustmdefpartitions_equalpartitions_proportionalpartitions_unlinkedguide_treesitefreq_inconstraint_treetrees_zsuptreetrees_rf

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Quantification of transposable elements expression in scRNA-seq

01genomebed

0 0 0 0 0

Genomic island prediction in bacterial and archaeal genomes

01

0 0 0

Identify insertion sites positions in bacterial genomes

0123

0 0

IsoSeq - Cluster - Cluster trimmed consensus sequences

01

0 0 0 0 0 0 0 0 0 0 0 0

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

01primers

0 0 0 0 0 0

isoseq:

IsoSeq - Scalable De Novo Isoform Discovery

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

metabam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

metabamprimers

meta bam pbi consensusreadset summary report versions

isoseq3:

IsoSeq3 - Scalable De Novo Isoform Discovery

Extract UMI and cell barcodes

01design

0 0 0

isoseq3:

Iso-Seq - Scalable De Novo Isoform Discovery

Generate a consensus sequence from a BAM file using iVar

01fastasave_mpileup

0 0 0 0

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

012bed

0 0 0

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Call variants from a BAM file using iVar

01fastafaigffsave_mpileup

0 0 0

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

01230101chr_norm

0 0

Efficiently counts k-mers from DNA sequencing reads using a fast, memory-efficient, parallelized algorithm

01kmer_lengthsize

0 0

jellyfish:

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence
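
The definition above (a k-mer is a substring of length k, and every such substring is counted) translates directly into a short sketch. This is a naive in-memory illustration with a hypothetical function name; it says nothing about Jellyfish's own data structures, which are designed for far larger inputs.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k by sliding a window one base at a time."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACG", 3).items()):
        print(kmer, count)
```

A sequence of length n contributes n - k + 1 k-mers, so the counts always sum to that value.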

Dumps the results from a jellyfish binary file into a human readable format

01

0 0

jellyfish:

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence

Render jupyter (or jupytext) notebooks to HTML reports. Supports parametrization through papermill.

01parametersinput_files

0 0 0

jupytext:

Jupyter notebooks as plain text scripts or markdown documents

papermill:

Parameterize, execute, and analyze notebooks

nbconvert:

Convert notebooks to other formats

Extract BED file from HTS files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

01

0 0

jvarkit:

Java utilities for Bioinformatics.

Convert sam files to tsv files

01230123

0 0

jvarkit:

Java utilities for Bioinformatics.

Convert VCF to a user friendly table

012301

0 0

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Filtering VCF with dynamically-compiled java expressions

01230101010101

0 0 0 0

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

annotate VCF files for poly repeats

0123010101

0 0 0 0

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Plot whole genome coverage from BAM/CRAM file as SVG

012010101

0 0

jvarkit:

Java utilities for Bioinformatics.

Taxonomic classification of metagenomic sequence data using a protein reference database

01db

0 0

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Convert Kaiju's tab-separated output file into a tab-separated text file which can be imported into Krona.

01db

0 0

kaiju:

Fast and sensitive taxonomic classification for metagenomics

write your description here

01dbtaxon_rank

0 0

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Merge two tab-separated output files of Kaiju and Kraken in the column format

012db

0 0

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Make Kaiju FMI-index file from a protein FASTA file

01keep_intermediate

0 0 0 0

kaiju:

Fast and sensitive taxonomic classification for metagenomics

Aligns sequences using kalign

01compress

0 0

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

Create kallisto index

01

0 0

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

0101 gtf chromosomes fragment_length fragment_length_sd

0 0 0 0

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Quantifies scRNA-seq data from FASTQ files using kb-python.

01 index t2g t1c t2c technology workflow_mode

0 0 0

kb:

kallisto and bustools are wrapped in an easy-to-use program called kb

Index creation for kb count quantification of single-cell data.

fasta gtf workflow_mode

0 0 0 0 0 0 0

kb:

kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data.

Creates a histogram of the number of distinct k-mers having a given frequency.

01

0 0 0 0 0 0 0

kat:

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts
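
As a rough illustration of what such a k-mer frequency histogram contains, the following minimal sketch counts k-mers in plain Python. It is not KAT's implementation: the function name `kmer_histogram` is hypothetical, and the sketch counts k-mers exactly as written, without merging reverse complements.

```python
from collections import Counter


def kmer_histogram(sequences, k):
    """Map each frequency to the number of distinct k-mers seen that often.

    Illustrative only: k-mers are counted as written (no canonical form).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k across the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Count how many distinct k-mers share each frequency.
    return dict(sorted(Counter(counts.values()).items()))


hist = kmer_histogram(["ACGTACGT", "ACGTTT"], k=3)
print(hist)  # {1: 4, 3: 2}
```

Here four distinct 3-mers occur once and two (ACG and CGT) occur three times.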

Module that calls normalize-by-median.py from khmer. The module can take a mix of paired end (interleaved) and single end reads. If both types are provided, only a single file with single ends is possible.

012

0 0

khmer:

khmer k-mer counting library

Removes low abundance k-mers from FASTA/FASTQ files

01

0 0

khmer:

khmer k-mer counting library

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

01kmer_size

0 0 0

khmer:

khmer k-mer counting library

Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).

01

0 0

This module wraps the index module of the KMA alignment tool.

01

0 0

kma:

Rapid and precise alignment of raw reads against redundant databases with KMA

Generate k-mers (sketches) from FASTA/Q sequences

01

0 0 0

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

01

0 0 0

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

01

0 0

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

01db

0 0

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

01db

0 0

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Produces annotation using kofamscan against a Profile database and a KO list

01 profiles ko_list

0 0 0

Adds fasta files to a Kraken2 taxonomic database

01 taxonomy_names taxonomy_nodes accession2taxid seqid2taxid

0 0

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

01cleaning

0 0

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Downloads and builds Kraken2 standard database

cleaning

0 0

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Classifies metagenomic sequence data

01 db save_output_fastqs save_reads_assignment

0 0 0 0 0

kraken2:

Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads

Takes multiple kraken-style reports and combines them into a single report file

01

0 0

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Extract reads classified at any user-specified taxonomy IDs.

taxid010101

0 0

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Takes a Kraken report file and prints out a krona-compatible TEXT file

01

0 0

krakentools:

KrakenTools is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results. Please cite the relevant paper if using KrakenTools with any of the listed programs.

Download and build (custom) KrakenUniq databases

0123keep_intermediate

0 0

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Download KrakenUniq databases and related files

pattern

0 0

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

Classifies metagenomic sequence data using unique k-mer counts

012 sequence_type db save_output_reads report_file save_output

0 0 0 0 0

krakenuniq:

Metagenomics classifier with unique k-mer counting for more specific results

KronaTools Update Taxonomy downloads a taxonomy database

NO input

0 0

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Import Taxonomy imports taxonomy classifications and produces an interactive Krona plot.

01taxonomy

0 0

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Creates a Krona chart from text files listing quantities and lineages.

01

0 0

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KronaTools Update Taxonomy downloads a taxonomy database

NO input

0 0

krona:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Makes a dotplot (Oxford Grid) of pair-wise sequence alignments

01201 format filter

0 0 0

last:

LAST finds & aligns related regions of sequences.

Aligns query sequences to target sequences indexed with lastdb

012index

0 0 0

last:

LAST finds & aligns related regions of sequences.

Prepare sequences for subsequent alignment with lastal.

01

0 0

last:

LAST finds & aligns related regions of sequences.

Converts MAF alignments to another format.

01201010101

0 0 0 0 0 0 0 0 0 0 0 0 0

last:

LAST finds & aligns related regions of sequences.

Reorder alignments in a MAF file

01

0 0

last:

LAST finds & aligns related regions of sequences.

Post-alignment masking

01

0 0

last:

LAST finds & aligns related regions of sequences.

Find split or spliced alignments in a MAF file

01

0 0 0

last:

LAST finds & aligns related regions of sequences.

Find suitable score parameters for sequence alignment

01index

0 0 0

last:

LAST finds & aligns related regions of sequences.

Align sequences using learnMSA

01

0 0

learnmsa:

learnMSA: Learning and Aligning large Protein Families

Bayesian reconstruction of ancient DNA fragments

01

0 0 0 0 0 0 0 0 0

Typing of clinical and environmental isolates of Legionella pneumophila

01

0 0

Index chain files for lift over

01chain

0 0

leviosam2:

Fast and accurate coordinate conversion between assemblies

Converting aligned short and long reads records from one reference to another

0101

0 0

leviosam2:

Fast and accurate coordinate conversion between assemblies

Uses Liftoff to accurately map annotations in GFF or GTF between assemblies of the same or closely related species

01 ref_fa ref_annotation ref_db

0 0 0 0

lima - The PacBio Barcode Demultiplexer and Primer Remover

01primers

0 0 0 0 0 0 0 0 0 0 0 0 0 0

Runs a differential expression analysis with limma

012345012

0 0 0 0 0 0 0

limma:

Linear Models for Microarray Data

LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes). It is also used to scaffold contig pairs linked by ARCS/ARKS. This module is for LINKS >=2.0.0 and does not support MPET input.

0101

0 0 0 0 0 0 0 0 0 0 0

Serogrouping Listeria monocytogenes assemblies

01

0 0

Lofreq subcommand to insert base and indel alignment qualities

01fasta

0 0

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments

012fasta

0 0

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

It predicts variants using multiple processors

01230101

0 0 0

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors.

Lofreq subcommand to remove variants with low coverage or strand bias potential

01

0 0

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Inserts indel qualities in a BAM file

0101

0 0

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file.

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0123450101

0 0

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0101

0 0

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

0 0 0

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

0 0

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

"A genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps:

  • Tigmint cuts the draft assembly at potentially misassembled regions
  • ntLink is then used to scaffold the corrected assembly
  • followed by ARKS for further scaffolding (optional)"

0101 command span genomesize longmap

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

01

0 0 0

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

Predicts LTR retrotransposons using the parallel version of GenomeTools gt-ltrharvest utility included in the EDTA toolchain

01

0 0 0

LTR_HARVEST_parallel:

A Perl wrapper for LTR_harvest

gt:

The GenomeTools genome analysis system

Identifies LTR retrotransposons using LTR_retriever

meta genome harvest finder mgescan non_tgca

meta log pass_list pass_list_gff ltrlib annotation_out annotation_gff versions

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

Estimates the mean LTR sequence identity in the genome. The input genome fasta should have short alphanumeric IDs without comments

01 pass_list annotation_out monoploid_seqs

0 0 0

lai:

Assessing genome assembly quality using the LTR Assembly Index (LAI)

Identifies LTR retrotransposons using LTR_retriever

01 harvest finder mgescan non_tgca

0 0 0 0 0 0 0

LTR_retriever:

Sensitive and accurate identification of LTR retrotransposons

A tool that mines antimicrobial peptides (AMPs) from (meta)genomes by predicting peptides from genomes (provided as contigs) and outputs all the predicted anti-microbial peptides found.

01

0 0 0 0 0 0

macrel:

A pipeline for AMP (antimicrobial peptide) prediction

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

012macs2_gsize

0 0 0 0 0 0

macs2:

Model Based Analysis for ChIP-Seq data

Peak calling of enriched genomic regions of ChIP-seq and ATAC-seq experiments

012macs3_gsize

0 0 0 0 0 0

macs3:

Model Based Analysis for ChIP-Seq data

Multiple sequence alignment using MAFFT

0101010101010

fas versions

pigz:

Parallel implementation of the gzip algorithm.

Multiple sequence alignment using MAFFT

010101010101compress

0 0

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

pigz:

Parallel implementation of the gzip algorithm.

Guide tree rendering using MAFFT

01

0 0

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

MAGeCK count for functional genomics; reads are usually mapped to a specific sgRNA

01library

0 0 0

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Maximum-likelihood estimation of gene essentialities

01design_matrix

0 0 0

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Mageck test performs a robust ranking aggregation (RRA) to identify positively or negatively selected genes in functional genomics screens.

01

0 0 0 0

mageck:

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), an algorithm to process, QC, analyze and visualize CRISPR screening data.

Multiple Sequence Alignment using Graph Clustering

0101compress

0 0

magus:

Multiple Sequence Alignment using Graph Clustering

Multiple Sequence Alignment using Graph Clustering

01

0 0

magus:

Multiple Sequence Alignment using Graph Clustering

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

fastas gff mapping_db map_type

0 0 0

malt:

A tool for mapping metagenomic data

MALT, an acronym for MEGAN alignment tool, is a sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics.

01index

0 0 0 0

malt:

A tool for mapping metagenomic data

Tool for evaluation of MALT results for true positives of ancient metagenomic taxonomic screening

01 taxon_list ncbi_dir

0 0

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. This script reformats inversions into single inverted sequence junctions which was the format used in Manta versions <= 1.4.0.

0101

0 0 0

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012340101config

0 0 0 0 0 0 0

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

01234560101config

0 0 0 0 0 0 0 0 0

manta:

Structural variant and indel caller for mapped sequencing data

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.

012340101config

0 0 0 0 0 0 0

manta:

Structural variant and indel caller for mapped sequencing data

Create mapAD index for reference genome

01

0 0

mapad:

An aDNA aware short-read mapper

Map short-reads to an indexed reference genome

0101 mismatch_parameter double_stranded_library five_prime_overhang three_prime_overhang deam_rate_double_stranded deam_rate_single_stranded indel_rate

0 0

mapad:

An aDNA aware short-read mapper

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

01fasta

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Calculate Mash distances between reference and query sequences

01reference

0 0

mash:

Fast sequence distance estimator that uses MinHash

Screens query sequences against large sequence databases

0101

0 0

mash:

Fast sequence distance estimator that uses MinHash

Creates vastly reduced representations of sequences using MinHash

01

0 0 0

mash:

Fast sequence distance estimator that uses MinHash
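
To illustrate the idea behind a MinHash sketch, here is a minimal bottom-s sketch in plain Python. It is a toy under stated assumptions, not Mash's implementation: the function names, the tiny k and sketch size, and the use of SHA-1 as the hash function are all choices made for this example, and reverse complements are not merged.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in a sequence (as written)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=4, size=8):
    """Keep only the `size` smallest k-mer hashes as a reduced representation."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmer_set(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=8):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


a = sketch("ACGTACGTTAGCCGATAGCT")
b = sketch("ACGTACGTTAGCCGATAGCT")
print(jaccard_estimate(a, b))  # 1.0
```

Two identical sequences give an estimate of 1.0, and sequences sharing no k-mers give 0.0; everything in between is an approximation whose accuracy grows with the sketch size.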

Mashmap is an approximate long read or contig mapper based on Jaccard similarity

0101

0 0

Quickly create a tree using Mash distances

01

0 0 0

MaxBin is software for clustering metagenomic contigs

0123

0 0 0 0 0 0 0 0 0 0

Run standard proteomics data analysis with MaxQuant, mostly dedicated to label-free. Paths to fasta and raw files need to be marked by "PLACEHOLDER"

012raw

0 0

maxquant:

MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted.

Mcquant extracts single-cell data given a multi-channel image and a segmentation mask.

010101

0 0

Analysis of mcr-1 gene (mobilized colistin resistance) for sequence variation

01

0 0 0

Staging module for MCMICRO transforming Imaging Mass Cytometry .txt files to .tif files with OME-XML metadata. Includes optional hot pixel removal.

01

0 0

mcstaging:

Staging modules for MCMICRO

Staging module for MCMICRO transforming PhenoImager .tif files into stacked and normalized ome-tif files per cycle, compatible as ASHLAR input.

01

0 0

mcstaging:

Staging modules for MCMICRO

Create MD5 (128-bit) checksums

01as_separate_files

0 0
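
For readers unfamiliar with MD5 checksums, the following minimal sketch computes one with Python's standard `hashlib`. It is only an illustration of the checksum itself, not the module's code; the function names `md5_of_stream` and `md5_of_file` are hypothetical.

```python
import hashlib
import io


def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the MD5 hex digest (32 hex characters = 128 bits) of a binary stream."""
    digest = hashlib.md5()
    # Read in chunks so large files do not need to fit in memory.
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Convenience wrapper that opens a file in binary mode."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


print(md5_of_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

Comparing the digest before and after a transfer is a common way to detect accidental corruption; MD5 is not suitable as a defence against deliberate tampering.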

mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences

01

0 0

A tool to create consensus sequences and variant calls from nanopore sequencing data

012

0 0

An ultra-fast metagenomic assembler for large and complex metagenomics

012

0 0 0 0 0 0 0

pigz:

Parallel implementation of the gzip algorithm.

Analyses a DAA file and exports information in text format

01megan_summary

0 0 0

megan:

A tool for studying the taxonomic content of a set of DNA reads

Analyses an RMA file and exports information in text format

01megan_summary

0 0 0

megan:

A tool for studying the taxonomic content of a set of DNA reads

Performs taxonomic profiling of long metagenomic reads against the melon database

01 database k2_db

0 0 0 0

Serotyping of Neisseria meningitidis assemblies

01

0 0

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101 lookup_table seqmers peak

0 0 0

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.

k-mer based assembly evaluation.

meta meryl_db assembly

meta versions assembly_only_kmers_bed assembly_only_kmers_wig stats dist_hist spectra_cn_fl_png spectra_cn_ln_png spectra_cn_st_png spectra_cn_hist spectra_asm_fl_png spectra_asm_ln_png spectra_asm_st_png spectra_asm_hist assembly_qv scaffold_qv read_ploidy

A script to generate hap-mer dbs for trios

01 maternal_meryl paternal_meryl

0 0 0 0 0 0

merqury:

Evaluate genome assemblies with k-mers and more.

k-mer based assembly evaluation.

012

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

merqury:

Evaluate genome assemblies with k-mers and more.

Produces maternal and paternal FastK kmer tables from maternal, paternal and child FastK tables

010101

0 0 0

merquryfk:

FastK based version of Merqury

A reimplementation of KAT comp to work with FastK databases

01234

0 0 0 0

merquryfk:

FastK based version of Merqury

A reimplementation of KAT GC to work with FastK databases

012

0 0 0 0

merquryfk:

FastK based version of Merqury

FastK based version of Merqury

012340101

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

merquryfk:

FastK based version of Merqury

An improved version of Smudgeplot using FastK

012

0 0 0 0

merquryfk:

FastK based version of Merqury

A genomic k-mer counter (and sequence utility) with nice features.

01kvalue

0 0

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

01kvalue

0 0

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

A genomic k-mer counter (and sequence utility) with nice features.

01kvalue

0 0

meryl:

A genomic k-mer counter (and sequence utility) with nice features.

Depth computation per contig step of metabat2

012

0 0

metabat2:

Metagenome binning

Metagenome binning of contigs

012

0 0 0 0 0 0

metabat2:

Metagenome binning

Taxonomic profiling database building with MetaCache

01 taxonomy seq2taxid

0 0

metacache:

MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin. It aims to reduce the memory requirement usually associated with k-mer based methods while retaining their speed. MetaCache uses locality sensitive hashing to quickly identify candidate regions within one or multiple reference genomes. A read is then classified based on the similarity to those regions.

For an independent comparison to other tools in terms of classification accuracy see the LEMMI benchmarking site.

The latest version of MetaCache classifies around 60 Million reads (of length 100) per minute against all complete bacterial, viral and archaea genomes from NCBI RefSeq Release 97 running with 88 threads on a workstation with 2 Intel(R) Xeon(R) Gold 6238 CPUs.

Metacache query command for taxonomic classification

01 db do_abundances

0 0 0

metacache:

MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin.

Annotation of eukaryotic metagenomes using MetaEuk

01database

0 0 0 0 0

metaeuk:

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Strain-level metagenomic assignment

01234database_folder

0 0 0 0 0 0 0 0

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Maps long reads to a metamaps database

01database

0 0 0 0 0

metamaps:

MetaMaps is a tool for long-read metagenomic analysis

Metagenome assembler for long-read sequences (HiFi and ONT).

01input_type

0 0 0

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

Build MetaPhlAn database for taxonomic profiling.

NO input

0 0

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn4

01

0 0

metaphlan4:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

01metaphlan_db_latest

0 0 0 0

metaphlan:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Merges output abundance tables from MetaPhlAn3

01

0 0

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

01metaphlan_db

0 0 0 0

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Export METASPACE datasets to AnnData and SpatialData objects

ds_id

0 0 0

A module to download dataset results from the METASPACE platform and save them as CSV files, using a containerized Python script. Inputs are provided via a CSV file or a list of datasets, with results saved to a specified output directory.

012

0 0 0

metaspace2020:

Python package providing programmatic access to the METASPACE platform

Extracts per-base methylation metrics from alignments

012fastafai

0 0 0

methyldackel:

Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.

Generates methylation bias plots from alignments

012fastafai

0 0

methyldackel:

Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.

Demultiplex MGI fastq files

012

0 0 0 0 0 0 0 0 0 0

mgikit demultiplex:

Demultiplex MGI fastq files

A tool to estimate bacterial species abundance

0101mode

0 0

midas:

An integrated pipeline for estimating strain-level genomic variation from metagenomic data

Marks duplicate spots along gridline edges.

01

0 0

mindagap:

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

Takes a single panorama image and fills the empty grid lines with neighbour-weighted values.

01

0 0

mindagap:

Mindagap is a collection of tools to process multiplexed FISH data, such as produced by Resolve Biosciences Molecular Cartography.

Minia is a short-read assembler based on a de Bruijn graph

01

0 0 0 0

A very fast OLC-based de novo assembler for noisy long reads

012

0 0 0

Compression of a reference panel for genotype imputation to .msav format

012

0 0

minimac4:

Computationally efficient genotype imputation

Imputation of genotypes using a reference panel

0123456

0 0

minimac4:

Computationally efficient genotype imputation

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0101 bam_format bam_index_extension cigar_paf_format cigar_bam

0 0 0 0

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Provides fasta index required by minimap2 alignment.

01

0 0

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0101

0 0 0

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

Provides fasta index required by miniprot alignment.

01

0 0

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

miRanda is an algorithm for finding genomic targets for microRNAs

01mirbase

0 0

miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.

0101

0 0

mirdeep2:

miRDeep2 Mapper (mapper.pl) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.

miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.

012010123

0 0

mirdeep2:

miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs. It includes three main scripts: miRDeep2.pl, mapper.pl, and quantifier.pl for comprehensive miRNA detection and quantification.

mirtop counts generates a file with the minimal information about each sequence and the count data in columns for each sample.

0101012

0 0

mirtop:

Small RNA-seq annotation

mirtop export generates files such as FASTA or VCF, or files compatible with the isomiRs Bioconductor package

0101012

0 0 0 0

mirtop:

Small RNA-seq annotation

mirtop gff generates the GFF3 adapter format to capture miRNA variations

0101012

0 0

mirtop:

Small RNA-seq annotation

mirtop gff gets the number of isomiRs and miRNAs annotated in the GFF file by isomiR category.

01

0 0 0

mirtop:

Small RNA-seq annotation

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

012mirtrace_species

0 0 0 0 0 0

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

Download a mitochondrial genome to be used as reference for MitoHiFi

01

0 0 0

findMitoReference.py:

Fetch mitochondrial genome in Fasta and Genbank format from NCBI

A Python workflow that assembles mitogenomes from PacBio HiFi reads

01 ref_fa ref_gb input_mode mito_code

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

mitohifi.py:

A Python workflow that assembles mitogenomes from PacBio HiFi reads

Run Torsten Seemann's classic MLST on a genome assembly

01

0 0

Cluster sequences using MMSeqs2 cluster.

01

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create an MMseqs database from an existing FASTA/Q file

01

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Creates sequence index for mmseqs database

01

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Create a tsv file from a query and a target database as well as the result database

010101

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Download an mmseqs-formatted database

database

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences using MMSeqs2 easy cluster.

01

0 0 0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

0101

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences in linear time using MMSeqs2 linclust.

01

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Search and calculate a score for similar sequences in a query and a target database.

0101

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Computes the lowest common ancestor by identifying the query sequence homologs against the target database.

01db_target

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Conversion of expandable profile databases to the MMseqs2 database format

database

0 0

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Subclonal deconvolution of cancer genome sequencing data.

01

0 0 0 0 0 0 0

A tool to reconstruct plasmids in bacterial assemblies

01

0 0 0 0 0

mobsuite:

Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.

A bioinformatics tool for working with modified bases

01201201

0 0 0 0

modkit:

A bioinformatics tool for working with modified bases in Oxford Nanopore sequencing data

Contrast-limited adjusted histogram equalization (CLAHE) on single-channel tif images.

01

0 0

molkartgarage:

One-stop-shop for scripts and tools for processing data for molkart and spatial omics pipelines.

Calculates genome-wide sequencing coverage.

012301

0 0 0 0 0 0 0 0 0 0 0 0 0

Download the mOTUs database

motus_downloaddb_script

0 0

motus:

The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.

Taxonomic meta-omics profiling using universal marker genes

01dbprofile_version_yml

0 0 0

motus:

Marker gene-based OTU (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

01db

0 0

motus:

Marker gene-based operational taxonomic unit (mOTU) profiling

Taxonomic meta-omics profiling using universal marker genes

01db

0 0 0 0 0

motus:

Marker gene-based OTU (mOTU) profiling

Evaluate microsatellite instability (MSI) using paired tumor-normal sequencing data

0123456

0 0 0 0 0

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

Scan a reference genome to get microsatellite & homopolymer information

01

0 0

msisensor:

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions, and differentiate them as somatic or germline.

msisensor2 detection of MSI regions.

012345scanmodels

0 0 0 0 0

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

msisensor2 detection of MSI regions.

fastaoutput

0 0

msisensor2:

MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.

MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input

01234501msisensor_scan

0 0 0 0 0

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

MSIsensor-pro/pro is a tool used to evaluate MSI using single (tumor) sample sequencing data

012010101

0 0 0 0 0

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input

01

0 0

msisensorpro:

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

Aligns protein structures using mTM-align

01compress

0 0 0

mTM-align:

Algorithm for structural multiple sequence alignments

pigz:

Parallel implementation of the gzip algorithm.

A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.

01mt_id

0 0 0

Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.

01indexgtfrad

0 0 0

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.

gtf

0 0

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Aggregate results from bioinformatics analyses across many samples into a single report

multiqc_filesmultiqc_configextra_multiqc_configmultiqc_logoreplace_namessample_names

0 0 0 0

multiqc:

MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.

Identify singlets, doublets and negative cells from multiplexing experiments. Annotate singlets by tags.

012

0 0 0 0

SNP table generator from GATK UnifiedGenotyper with functionality geared for aDNA

01010101allele_freqsgenotype_qualitycoveragehomozygous_freqheterozygous_freq01

0 0 0 0 0 0 0 0 0 0 0 0

MUMmer is a system for rapidly aligning entire genomes

012

0 0

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

01

0 0 0 0 0 0 0 0 0

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permute the guide tree according to a set of flags.

01compress

0 0

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

Pre-filtering and calculating position-specific summary statistics using the Markov substitution model

0123401

0 0

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants

01012

0 0 0

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Fetch the GO concepts for a list of genes

01

0 0 0

AMR predictions for supported species

01species

0 0 0

mykrobe:

Antibiotic resistance prediction in minutes

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned an mRNA/miRNA, which can be counted after bonding with its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

0 0 0

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample specific size factors and normalises the data. For more information vignette("NACHO") and vignette("NACHO-analysis")

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data. NanoString nCounter data is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay and works with fluorescent barcodes. Each barcode is assigned an mRNA/miRNA, which can be counted after bonding with its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.

0101

0 0 0 0

NACHO:

R package that uses two main functions to summarize and visualize NanoString RCC files, namely: load_rcc() and visualise(). It also includes a function normalise(), which (re)calculates sample specific size factors and normalises the data. For more information vignette("NACHO") and vignette("NACHO-analysis")

nail search is a fast and scalable tool for searching protein sequences against protein databases

0101write_align

0 0 0 0

nail:

Profile Hidden Markov Model (pHMM) biological sequence alignment tool

Compare multiple runs of long read sequencing data and alignments

01

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Filtering and trimming of Oxford Nanopore Sequencing data

01summary_file

0 0 0

DNA contaminant removal using NanoLyse

01fasta

0 0 0

Parse all the supporting reads of putative somatic SVs using nanomonsv. After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements. A precursor to "nanomonsv get"

012

0 0 0 0 0 0 0 0 0

nanomonsv:

nanomonsv is software for detecting somatic structural variations from paired (tumor and matched control) cancer genome sequence data.

Run NanoPlot on nanopore-sequenced reads

01

0 0 0 0 0

Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.

01output_format

0 0 0

Performs fastq alignment to a reference using NARFMAP

010101sort_bam

0 0 0

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

Create DRAGEN hashtable for reference genome

01

0 0

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

A tool to quickly download assemblies from NCBI's Assembly database

metaaccessionstaxidsgroups

0 0 0 0 0 0 0 0 0 0 0 0 0 0

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

0101

0 0

ncbitools:

"NCBI libraries for biology applications (text-based utilities)"

Get dataset for SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

datasettag

0 0

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks (C++ implementation)

01dataset

0 0 0 0 0 0 0 0 0 0 0

nextclade:

SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks

Performs fastq alignment to a fasta reference using NextGenMap

01fasta

0 0

bwa:

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime

Serotyping Neisseria gonorrhoeae assemblies

01

0 0

Merging paired-end reads and removing sequencing adapters.

01

0 0 0 0

Annotates GC content fraction to regions in a BED file.

010101

0 0

ngsbits:

Short-read sequencing tools

Annotates a BED file with the average coverage of the regions from one or several BAM/CRAM file(s).

01230101

0 0

ngsbits:

Short-read sequencing tools

Determines the gender of a sample from the BAM/CRAM file.

0120101method

0 0

ngsbits:

Short-read sequencing tools

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates vaf files for individual fastq file(s), ready for the vafncm module.

0101

0 0

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.

010101

0 0 0 0 0 0

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

010101

0 0

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Determining whether sequencing data comes from the same individual by using SNP matching. This module generates PT files from a bed file containing individual positions.

01

0 0 0 0 0

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

write your description here

metareadsformatmode

meta versions npa npc npl npo

Visualise metagenome redundancy curve in PNG format from a single Nonpareil npo file

01

0 0

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Calculate metagenome redundancy curve from FASTQ files

01formatmode

0 0 0 0 0

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Generate summary reports with raw data for Nonpareil NPO curves, including MultiQC compatible JSON/TSV files

01

0 0 0 0 0

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

Visualise metagenome redundancy curves in PNG format from multiple Nonpareil npo files in a single image

01

0 0

nonpareil:

Estimate average coverage and create curves for metagenomic datasets

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

012

0 0 0

An nf-core module for OATK

010123401234

0 0 0 0 0 0 0 0 0 0 0 0 0 0

Construct a dynamic succinct variation graph in ODGI format from a GFAv1.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Draw previously-determined 2D layouts of the graph with diverse annotations.

012

0 0

odgi:

An optimized dynamic genome/graph implementation

Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.

01

0 0 0

odgi:

An optimized dynamic genome/graph implementation

Apply different kinds of sorting algorithms to a graph. The most prominent one is the PG-SGD sorting algorithm.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Squeezes multiple graphs in ODGI format into the same file in ODGI format.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Metrics describing a variation graph and its path relationship.

01

0 0 0

odgi:

An optimized dynamic genome/graph implementation

Merge unitigs into a single node preserving the node order.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Project a graph into other formats.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Visualize a variation graph in 1D.

01

0 0

odgi:

An optimized dynamic genome/graph implementation

Calls CNVs in bam files from tumor patients

01234bedfasta

0 0 0 0

Create a decoy peptide database from a standard FASTA database.

01

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Filters peptide/protein identification results by different criteria.

01

0 0 0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Filters peptide/protein identification results by different criteria.

012

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Calculates a distribution of the mass error from given mass spectra and IDs.

012

0 0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Merges several idXML files into one idXML file.

01

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Split a merged identification file into its originating identification files

01

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Switches between different scores of peptide or protein hits in identification data

01

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

A tool for peak detection in high-resolution profile data (Orbitrap or FTICR)

01

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Refreshes the protein references for all peptide hits.

012

0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Annotates MS/MS spectra using Comet.

012

0 0 0

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Perform HLA-I typing of sequencing data

012

0 0 0

OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics.

0101

0 0 0

A python library and a command-line client for up- and downloading files to and from your Open Science Framework projects

012

0 0

osfclient:

The osfclient is a python library and a command-line client for up- and downloading files to and from your Open Science Framework projects.

A program to convert bam into paf.

01

0 0

paftools:

A program to manipulate paf files / convert to and from paf.

A tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates

01

0 0

Find and remove PCR/optical duplicates

01

0 0 0

pairtools:

CLI tools to process mapped Hi-C data

Flip pairs to get an upper-triangular matrix

01chromsizes

0 0

pairtools:

CLI tools to process mapped Hi-C data

Merge multiple pairs/pairsam files

01

0 0

pairtools:

CLI tools to process mapped Hi-C data

Find ligation junctions in .sam, make .pairs

01chromsizes

0 0 0

pairtools:

CLI tools to process mapped Hi-C data

Assign restriction fragments to pairs

01frag

0 0

pairtools:

CLI tools to process mapped Hi-C data

Select pairs according to given condition by options.args

01

0 0 0

pairtools:

CLI tools to process mapped Hi-C data

Sort a .pairs/.pairsam file

01

0 0

pairtools:

CLI tools to process mapped Hi-C data

Split a .pairsam file into .pairs and .sam.

01

0 0 0

pairtools:

CLI tools to process mapped Hi-C data

Calculate pairs statistics

01

0 0

pairtools:

CLI tools to process mapped Hi-C data

Calculates a coverage histogram from a GFA file and constructs a growth table from this as either a TSV or HTML file

01bed_subsetbed_excludetsv_groupby

0 0

panacus:

panacus is a tool for computing counting statistics for GFA files

Create visualizations from a tsv coverage histogram created with panacus.

01

0 0

panacus:

panacus is a tool for computing counting statistics for GFA files

A fast and scalable tool for bacterial pangenome analysis

01

0 0 0

panaroo:

panaroo - an updated pipeline for pangenome investigation

Phylogenetic Assignment of Named Global Outbreak LINeages

01

report versions

Phylogenetic Assignment of Named Global Outbreak LINeages

01db

0 0

pangolin:

Phylogenetic Assignment of Named Global Outbreak LINeages

Phylogenetic Assignment of Named Global Outbreak LINeages

dbname

0 0

pangolin:

Phylogenetic Assignment of Named Global Outbreak LINeages

NVIDIA Clara Parabricks GPU-accelerated application of Base Quality Score Recalibration (BQSR).

0101010101

0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated annotation of variant calls based on the dbSNP database

0123

0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating DeepVariant.

012301

0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.

0101010101output_fmt

0 0 0 0 0 0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignment for different parts of the query sequence

010101known_sites

0 0 0 0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated joint genotyping, replicating GATK GenotypeGVCFs

0101

0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated germline variant calling, replicating GATK HaplotypeCaller.

012301

0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated gvcf indexing tool.

01

0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated minimap2 for aligning long read sequences against a large reference database using an accelerated KSW2 to convert FASTQ to BAM/CRAM.

01010101output_fmt

0 0 0 0 0 0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated somatic variant calling, replicating GATK Mutect2.

01234501panel_of_normalspanel_of_normals_index

0 0 0

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Paraclu finds clusters in data attached to sequences.

01min_cluster

0 0

Determines the depth in a BAM/CRAM file

0120101

0 0 0

paragraph:

Graph realignment tools for structural variants

Genotype structural variants using paragraph and grmpy

0123450101

0 0 0

paragraph:

Graph realignment tools for structural variants

Convert a VCF file to a JSON graph

0101

0 0

paragraph:

Graph realignment tools for structural variants

HiFi-based caller for highly homologous genes

0120101

0 0 0 0 0 0

Serogroup Pseudomonas aeruginosa assemblies

01

0 0 0 0

The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.

01

0 0 0

pbbam:

PacBio BAM C++ library

PacBio ccs - Generate Highly Accurate Single-Molecule Consensus Reads

012chunk_numchunk_on

0 0 0 0 0 0

Alignment with PacBio's minimap2 frontend

0101

0 0

pbmm2:

A minimap2 frontend for PacBio native data formats

Assign PBP type of Streptococcus pneumoniae assemblies

01db

0 0 0

pbsv/call - PacBio structural variant (SV) calling and analysis tools

0101

0 0

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

pbsv - PacBio structural variant (SV) signature discovery tool

0101

0 0

pbsv:

pbsv - PacBio structural variant (SV) calling and analysis tools

Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq

012

0 0

pbtk:

pbtk - PacBio BAM toolkit

Minimalistic tool which creates an index file that enables random access into PacBio BAM files

01

0 0

pbtk:

pbtk - PacBio BAM toolkit

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.

01

0 0 0 0

Manipulation, validation and exploration of pedigrees

0120101

0 0 0 0 0 0 0 0 0 0 0

Runs PEKA CLIP peak k-mer analysis

0101fastafaigtf

0 0 0 0 0 0 0 0

Per-base metrics on BAM/CRAM files.

0123012

0 0

"This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays."

01

0 0 0 0

Install databases necessary for Pharokka's functional analysis

NO input

0 0

pharokka:

Fast Phage Annotation Program

Functional annotation of phages

01pharokka_db

0 0 0 0 0 0 0 0

pharokka:

Fast Phage Annotation Program

Predict prophages in bacterial genomes

01

0 0 0 0 0 0 0 0 0 0 0 0

phispy:

Prophage finder using multiple metrics

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic dataset.

01silva_dbunivec_db

0 0

Assigns all the reads in a file to a single new read-group

010101

0 0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates an interval list from a bed file and a reference dict

0101

0 0

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

01

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the alignment summary of a paired-end library.

0101

0 0

picard:

Java tools for working with NGS data in the BAM format

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

01234010101

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the insert size distribution of a paired-end library.

01

0 0 0

picard:

Java tools for working with NGS data in the BAM format

Collect multiple metrics from a BAM file

0120101

0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics from a RNAseq BAM file

01ref_flatfastarrna_intervals

0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

0120101intervallist

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Creates a sequence dictionary for a reference sequence.

01

0 0

picard:

Creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

Checks that all data in the set of input files appear to come from the same individual

01234501

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.

012haplotype_mapfastafasta_faisequence_dictionary

0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Converts a FASTQ file to an unaligned BAM or SAM file.

01

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

012filter

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Verify mate-pair information between mates and fix if needed

01

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Lifts over a VCF file from one reference build to another.

01010101

0 0 0

picard:

Move annotations from one assembly to another

Locate and tag duplicate reads in a BAM file

010101

0 0 0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the mean quality by cycle of a paired-end library.

01

0 0 0

picard:

Java tools for working with NGS data in the BAM format

Merges multiple BAM files into a single file

01

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

0 0 0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the name of the sample in the VCF file

01

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases

010101

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

This tool takes in a coordinate-sorted SAM or BAM and calculates the NM, MD, and UQ tags by comparing with the reference.

0101

0 0 0

picard:

Java tools for working with NGS data in the BAM format

Sorts BAM/SAM files based on a variety of Picard-specific criteria

01sort_order

0 0

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts VCF files

010101

0 0

picard:

Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format

Compresses files with pigz.

01

0 0

pigz:

Parallel implementation of the gzip algorithm.

write your description here

01

0 0

pigz:

Parallel implementation of the gzip algorithm.

Automatically improve draft assemblies and find variation among strains, including large event detection

01012pilon_mode

0 0 0 0 0 0

Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

012fastafaibed

0 0 0 0 0 0 0 0 0 0 0

pindel:

Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data

Main caller script for peak calling

012assay_type

0 0 0 0 0

pints:

Peak Identifier for Nascent Transcripts Starts (PINTS)

Pangenome toolbox for bacterial genomes

01

0 0 0

Identify plasmids in bacterial sequences and assemblies

01

0 0 0 0 0 0

Assembles bacterial plasmids

01fasta

0 0 0 0 0 0 0 0 0

Platypus is a tool that efficiently and accurately calls genetic variants from next-generation DNA sequencing data

01234fastafaiskipregions_file

0 0 0 0

Analyses binary variant call format (BCF) files using plink

01

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

0 0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Exclude variant identifiers from plink bfiles

01234

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Subset plink bfiles with a text file of variant identifiers

01234

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Fast Epistasis in PLINK, analyzing how the effects of one gene depend on the presence of others.

0123010101

0 0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Calculates identity-by-descent over autosomal SNPs

0123

0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Run genome-wide association studies (GWAS)

0123010101

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Generate Hardy-Weinberg statistics for provided input

01230101

0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
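As a rough illustration of what a Hardy-Weinberg statistic measures, the sketch below compares observed genotype counts with the counts expected under equilibrium, using a textbook chi-square statistic. The function name and example counts are hypothetical, and this is not a reproduction of the PLINK report or of the test it applies.

```python
def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square statistic comparing genotype counts with Hardy-Weinberg expectations."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Frequency of the reference allele, estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip classes with zero expectation to avoid dividing by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(hwe_chi_square(30, 40, 30))
```

Larger values indicate a stronger departure from the expected genotype proportions.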

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other.

0123window_sizevariant_countvariance_inflation_factor

0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Produce a pruned subset of markers that are in approximate linkage equilibrium with each other. Pairs of variants in the current window with squared correlation greater than the threshold are noted and variants are greedily pruned from the window until no such pairs remain.

0123window_sizevariant_countr2_threshold

0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
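The greedy pruning described above can be sketched in a few lines. This is a simplified illustration, not PLINK's implementation: it walks the variants in order and keeps a variant only if its squared correlation with every previously kept variant in the window stays at or below the threshold. The function names and the list-of-lists genotype encoding (0, 1, 2 alternate-allele counts per sample) are assumptions made for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic variant carries no correlation signal
    return cov * cov / (vx * vy)


def greedy_prune(genotypes, window_size, r2_threshold):
    """Return indices of variants kept after simplified window-based pruning."""
    kept = []
    for i, current in enumerate(genotypes):
        # Only kept variants less than window_size positions back are compared.
        recent = [j for j in kept if i - j < window_size]
        if all(r_squared(genotypes[j], current) <= r2_threshold for j in recent):
            kept.append(i)
    return kept


if __name__ == "__main__":
    example = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],  # identical to the first variant, so it is pruned
        [2, 0, 1, 1, 0, 2],
    ]
    print(greedy_prune(example, window_size=3, r2_threshold=0.5))
```

The window here is counted in variants; a window expressed in kilobases would need variant positions as an extra input.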

LD analysis in PLINK examines genetic variant associations within populations

0123010101

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Recodes plink bfiles into a new text fileset applying different modifiers

0123

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Analyses variant calling files using plink

01

0 0 0 0

plink:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Subset plink pfiles with a text file of variant identifiers

01234

0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Filters plink bfiles or pfiles with filters such as maf or var

0123

0 0 0 0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Calculate Inbreeding data with plink2

0123

0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Filters plink bfiles or pfiles with maf filters

01230

bed bim fam pgen pvar psam versions

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Produce a pruned set of variants in approximate linkage equilibrium

0123winstepr2

0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Perform PCA analysis using PLINK

012345

0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Remove samples from a plink2 dataset

0123sample_exclude_list

0 0 0 0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Apply a scoring system to each sample in a plink 2 fileset

0123scorefile

0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Import variant genetic data using plink2

01

0 0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

Convert from VCF file to BGEN file version 1.2 format preserving dosages.

01234

0 0 0 0

plink2:

Whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner

pmdtools command to filter ancient DNA molecules from others

012thresholdreference

0 0

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

01

0 0 0

Polishing genome assemblies with short reads.

0101save_debug

0 0 0

polypolish:

Polishing genome assemblies with short reads.

PoolSNP is a heuristic SNP caller, which uses an MPILEUP file and a reference genome in FASTA format as inputs.

0101012

0 0 0 0

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is available.

0123

0 0

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to pile up reads and the corresponding base qualities for each overlapping SNP and each barcode.

012

0 0 0 0 0

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Software to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing and external genotyping data for each sample is not available.

012

0 0 0 0 0 0

popscle:

A suite of population scale analysis tools for single-cell genomics data including implementation of Demuxlet / Freemuxlet methods and auxiliary tools

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

01custom_adapters

0 0 0

Adapter removal and demultiplexing of Oxford Nanopore reads

01

0 0 0

porechop:

Adapter removal and demultiplexing of Oxford Nanopore reads

Run all Portcullis steps in one go

010101

0 0 0 0 0 0 0 0

portcullis:

Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.

Software for predicting library complexity and genome coverage in high-throughput sequencing

01

0 0 0

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Software for predicting library complexity and genome coverage in high-throughput sequencing

01

0 0 0

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing

Calculate pairwise nucleotide identity with respect to a reference sequence

0101compress

0 0 0 0 0

Filter reads by quality score.

01

0 0 0 0

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.

Converts SAM/BAM/CRAM/pairs files into a genome contact map

01012

0 0

A module to generate images from Pretext contact maps.

01

0 0

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

0 0 0 0 0

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

01output_format

0 0 0 0 0

Whole genome annotation of small genomes (bacterial, archaeal, viral)

01proteinsprodigal_tf

0 0 0 0 0 0 0 0 0 0 0 0 0

frame-shift correction for long read (meta)genomics - fix frameshifts in reads

0101

0 0

proovframe:

frame-shift correction for long read (meta)genomics

frame-shift correction for long read (meta)genomics - maps proteins to reads

012

0 0

proovframe:

frame-shift correction for long read (meta)genomics

Perform Gene Ratio Enrichment Analysis

0101

0 0 0

grea:

Gene Ratio Enrichment Analysis

Transform the data matrix using centered logratio transformation (CLR) or additive logratio transformation (ALR)

01

0 0 0

propr:

Logratio methods for omics data
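For readers unfamiliar with the two transformations named above, here is a minimal sketch of their standard definitions on a single sample (one row of strictly positive values). It does not use the propr package; the function names are hypothetical, and zero handling, which real data usually requires, is left out.

```python
import math


def clr(row):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in row]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]


def alr(row, ref_index):
    """Additive logratio: log of each part relative to a chosen reference part."""
    ref = math.log(row[ref_index])
    return [math.log(v) - ref for i, v in enumerate(row) if i != ref_index]


if __name__ == "__main__":
    sample = [2.0, 4.0, 8.0]
    print(clr(sample))
    print(alr(sample, ref_index=2))
```

CLR keeps one value per part and the values sum to zero, while ALR drops the reference part and returns one value fewer.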

Perform differential proportionality analysis

0123012

0 0 0 0 0 0 0 0 0

propr:

Logratio methods for omics data

Perform logratio-based correlation analysis to obtain proportionality and basis shrinkage partial correlation coefficients. Standard correlation coefficients can also be computed if required.

01

0 0 0 0 0 0 0

propr:

Logratio methods for omics data

corpcor:

Efficient Estimation of Covariance and (Partial) Correlation

Proteinortho is a tool to detect orthologous genes within different species.

01

0 0 0 0

Reads a MaxQuant proteinGroups file with Proteus

012

0 0 0 0 0 0 0 0 0 0

proteus:

R package for analysing proteomics data

PureCLIP is a tool to detect protein-RNA interaction footprints from single-nucleotide CLIP-seq data, such as iCLIP and eCLIP.

01201201input_control

0 0 0

Calculate interval coverage for each sample. N.B. the tool cannot handle files staged as symlinks; stageInMode should be set to 'link'.

012intervals

0 0 0 0 0

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Generate on and off-target intervals for PureCN from a list of targets

0101genome

0 0 0

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Build a normal database for coverage normalization from all the (GC-normalized) normal coverage files. N.B. as reported in https://www.bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html, it is advised to provide a normal panel (VCF format) to precompute mapping bias for faster runtimes.

0123genomeassay

0 0 0 0 0 0

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Run PureCN workflow to normalize, segment and determine purity and ploidy

012normal_dbgenome

0 0 0 0 0 0 0 0 0 0 0 0

purecn:

Copy number calling and SNV classification using targeted short read sequencing

Calculate coverage cutoffs to determine when to purge duplicated sequence.

01

0 0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Separates out sequences purged of falsely duplicated sequences.

012

0 0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Plots the read coverage from a purge dups statistics file and cutoffs.

012

0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Create read depth histogram and base-level read depth for an assembly based on PacBio data

01

0 0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Purge haplotigs and overlaps for an assembly

0123

0 0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Split fasta file by 'N's to aid in self alignment for duplicate purging

01

0 0

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth
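The idea of splitting a sequence at runs of 'N' can be sketched as follows. This only illustrates the splitting step on a plain string; it is not the purge_dups implementation, and the function name is hypothetical.

```python
import re


def split_by_n(sequence):
    """Split a sequence at every run of N (or n) and drop empty pieces."""
    return [piece for piece in re.split(r"[Nn]+", sequence) if piece]


if __name__ == "__main__":
    print(split_by_n("ACGTNNNNGGCCnTT"))
```

Each returned piece is a gap-free stretch that can be aligned against the others without the gap characters interfering.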

Identify, orient and trim nanopore cDNA reads

01

0 0

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

write your description here

01

0 0 0

Damage parameter estimation for ancient DNA

012

0 0

pydamage:

Damage parameter estimation for ancient DNA

Damage parameter estimation for ancient DNA

01

0 0

pydamage:

Damage parameter estimation for ancient DNA

Compute summary statistics for control gene from BAM files.

012control_gene

0 0

pypgx:

A Python package for pharmacogenomics research

Call SNVs/indels from BAM files for all target genes.

01201

0 0 0

pypgx:

A Python package for pharmacogenomics research

Prepare a depth of coverage file for all target genes with SV from BAM files.

012

0 0

pypgx:

A Python package for pharmacogenomics research

PyPGx pharmacogenomics genotyping pipeline for NGS data.

01234501

0 0 0 0

pypgx:

A Python package for pharmacogenomics research

Pyrodigal is a Python module that provides bindings to Prodigal, a fast, reliable protein-coding gene prediction for prokaryotic genomes.

01output_format

0 0 0 0 0

Demultiplexer for Nanopore samples

01barcode_kit

0 0

Evaluate alignment data

01gff

0 0

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

012gfffastafasta_fai

0 0

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

0101

0 0

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Render a Quarto notebook, including parametrization.

01parametersinput_filesextensions

0 0 0 0 0 0

papermill:

Parameterize, execute, and analyze notebooks

Quality Assessment Tool for Genome Assemblies

010101

0 0 0 0 0 0

QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel.

012345678910111213141501

0 0 0 0 0

quilt:

Read aware low coverage whole genome sequence imputation from a reference panel

Consensus module for raw de novo DNA assembly of long uncorrected reads

0123

0 0

Homology-based assembly patching: Make continuous joins and fill gaps in 'target.fa' using sequences from 'query.fa'

01010101

0 0 0 0 0 0 0 0 0 0

ragtag:

Fast reference-guided genome assembly scaffolding

Scaffolding is the process of ordering and orienting draft assembly (query) sequences into longer sequences. Gaps (stretches of "N" characters) are placed between adjacent query sequences to indicate the presence of unknown sequence. RagTag uses whole-genome alignments to a reference assembly to scaffold query sequences. RagTag does not alter input query sequence in any way and only orders and orients sequences, joining them with gaps.

010101012

0 0 0 0

ragtag:

Fast reference-guided genome assembly scaffolding
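The joining step described above, in which ordered and oriented query sequences are connected by gaps, can be sketched as below. The ordering and orientation are taken as given here (RagTag derives them from whole-genome alignments, which is not shown), and the function names, the placement tuples, and the default gap size of 100 are assumptions for the example.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def join_with_gaps(contigs, placements, gap_size=100):
    """Join contigs in the given order and orientation, separated by runs of N.

    contigs: mapping of contig name to sequence.
    placements: list of (name, strand) pairs, with strand "+" or "-".
    """
    parts = []
    for name, strand in placements:
        sequence = contigs[name]
        parts.append(sequence if strand == "+" else reverse_complement(sequence))
    return ("N" * gap_size).join(parts)


if __name__ == "__main__":
    contigs = {"a": "ACGT", "b": "AAGG"}
    print(join_with_gaps(contigs, [("b", "-"), ("a", "+")], gap_size=3))
```

The query bases themselves are never edited; only their order, strand, and the gaps between them change.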

Produces a Newick format phylogeny from a multiple sequence alignment using a Neighbour-Joining algorithm. Capable of bacterial genome size alignments.

alignment

0 0 0

Randomly subsample sequencing reads to a specified coverage

012depth_cutoff

0 0
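One simple way to subsample reads to a target coverage is to shuffle the reads and keep adding them until the selected bases reach coverage times genome size. The sketch below works on read lengths only; the function name, parameters, and fixed seed are hypothetical, and it is not claimed to match the module's underlying tool.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Return sorted indices of randomly chosen reads totalling about the target bases."""
    target_bases = genome_size * coverage
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded so the selection is reproducible
    chosen, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        chosen.append(index)
        total += read_lengths[index]
    return sorted(chosen)


if __name__ == "__main__":
    picked = subsample_to_coverage([100] * 50, genome_size=1000, coverage=2)
    print(len(picked))
```

If the input holds fewer bases than the target, every read is returned.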

De novo genome assembler for long uncorrected reads.

01

0 0 0

write your description here

01

0 0

RAxML-NG is a phylogenetic tree inference tool which uses maximum-likelihood (ML) optimality criterion.

012

0 0 0

Extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format.

012

0 0

regtools:

RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.

Screening DNA sequences for interspersed repeats and low complexity DNA sequences

01lib

0 0 0 0 0

repeatmasker:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences

A utility script that helps convert old RepeatMasker *.out files to GFF version 3 files.

01

0 0

repeatmasker:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences

Create a database for RepeatModeler

01

0 0

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

Performs de novo transposable element (TE) family identification with RepeatModeler

01

0 0 0 0

repeatmodeler:

RepeatModeler is a de-novo repeat family identification and modeling package.

ResFinder identifies acquired antimicrobial resistance genes in fully or partially sequenced isolates of bacteria

012db_pointdb_res

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

resfinder:

ResFinder identifies acquired antimicrobial resistance genes in fully or partially sequenced isolates of bacteria

Preprocess the CARD database for RGI to predict antibiotic resistance from protein or nucleotide data

card

0 0 0 0

rgi:

This module preprocesses the downloaded Comprehensive Antibiotic Resistance Database (CARD) which can then be used as input for RGI.

Predict antibiotic resistance from protein or nucleotide data

01cardwildcard

0 0 0 0 0 0

rgi:

This tool provides a preliminary annotation of your DNA sequence(s) based upon the data available in The Comprehensive Antibiotic Resistance Database (CARD). Hits to genes tagged with Antibiotic Resistance ontology terms will be highlighted. As CARD expands to include more pathogens, genomes, plasmids, and ontology terms this tool will grow increasingly powerful in providing first-pass detection of antibiotic resistance associated genes. See license at CARD website

Markup VCF file using rho-calls.

01201bed

0 0

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Call regions of homozygosity and make tentative UPD calls

0101

0 0 0

rhocall:

Call regions of homozygosity and make tentative UPD calls.

Quality control of riboseq bam data

012012012010101

0 0 0 0

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Quality control of riboseq bam data

01201

0 0 0 0

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Accurate detection of short and long active ORFs using Ribo-seq data

01201

0 0 0 0 0 0 0 0 0 0 0

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Accurate detection of short and long active ORFs using Ribo-seq data

012

0 0

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data

010101

0 0 0 0 0 0 0 0 0 0

ripgrep recursively searches directories for a regex pattern

01patterncompress

0 0

pigz:

Parallel implementation of the gzip algorithm.

Render an rmarkdown notebook. Supports parametrization.

01parametersinput_files

0 0 0 0 0

rmarkdown:

Dynamic Documents for R

Assess the quality of an RNAseq assembly with or without a reference genome

010101

0 0

Calculate pan-genome from annotated bacterial assemblies in GFF3 format

01

0 0 0

Ribosomal RNA extraction from a GTF file.

gtf

0 0

Calculate expression with RSEM

01index

0 0 0 0 0 0 0 0

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Prepare a reference genome for RSEM

fastagtf

0 0 0

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Generate statistics from a bam file

01

0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Infer strandedness from sequencing reads

01bed

0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate inner distance between read pairs.

01bed

0 0 0 0 0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

01bed

0 0 0 0 0 0 0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

compare detected splice junctions to reference gene model

01bed

0 0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate how mapped reads are distributed over genomic features

01bed

0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate read duplication rate

01

0 0 0 0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Calculate TIN (transcript integrity number) from RNA-seq reads

012bed

0 0 0

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

The bndeval tool of RTG tools. It is used to evaluate called BND type of variants for agreement with a BND baseline variant set

012345

0 0 0 0 0 0 0 0 0 0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Converts a PED file to VCF headers

01

0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Plot ROC curves from vcfeval ROC data files, either to an image or in an interactive GUI. The interactive GUI is not available when running under Nextflow.

01

0 0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The svdecompose tool of RTG tools. It is used to decompose structural variants to BNDs

012

0 0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

The VCFeval tool of RTG tools. It is used to evaluate called variants for agreement with a baseline variant set

012345601

0 0 0 0 0 0 0 0 0 0 0 0 0 0

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Uses the RTN R package for transcriptional regulatory network inference (TNI).

01

0 0 0 0 0

rtn:

RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons

CAZyme annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.

01dbcan_db

0 0 0 0 0

dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

Command from run_dbcan that prepares the database for dbCAN annotation.

NO input

0 0

run_dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

CGC annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.

01012dbcan_db

0 0 0 0 0 0 0 0 0 0

dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

Substrate annotation module for the dbcan pipeline. This module is used to annotate carbohydrate-active enzymes (CAZymes) from genomic data using the dbCAN annotation tool.

01012dbcan_db

0 0 0 0 0 0 0 0 0 0 0 0 0

dbcan:

Standalone version of dbCAN annotation tool for automated CAZyme annotation.

Prediction of a protein's secondary structure from its amino acid sequence

01

0 0

s4pred:

Accurate prediction of a protein's secondary structure from its amino acid sequence

sage is a search software for proteomics data

010101

0 0 0 0 0 0

sageproteomics:

Proteomics searching so fast it feels like magic.

Create index for salmon

genome_fastatranscript_fasta

0 0

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

gene/transcript quantification with Salmon

01indexgtftranscript_fastaalignment_modelib_type

0 0 0 0

salmon:

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data

SALSA, a tool to scaffold long-read assemblies with Hi-C

012bedgfadupfilter_bed

0 0 0 0

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

012database

0 0 0 0

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

Outputs some statistics drawn from read flags.

01

0 0

sambamba:

Tools for working with SAM/BAM data

find and mark duplicate reads in BAM file

01

0 0 0

sambamba:

process your BAM data faster!

This module combines samtools and samblaster to use samblaster's ability to filter or tag SAM files while keeping both input and output in BAM format. Samblaster input must contain a sequence header, so the input is piped through the "samtools view -h" command. Additional samtools arguments can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

0 0

Module to validate Illumina® Sample Sheet v2 files.

01file_schema_validator

0 0

Clips read alignments where they match BED file defined regions

01bedsave_cliprejectssave_clipstats

0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format

01split

0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

reports coverage over regions in a supplied BED file

012010101

0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

Outputs a FASTA file compressed with the BGZF algorithm

01

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

calculates MD and NM tags

0101

0 0

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Concatenate BAM or CRAM file

01

0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

shuffles and groups reads together by their names

0101

0 0 0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format

0101interleave

0 0 0 0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

Produces a consensus FASTA/FASTQ/PILEUP

01

0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

convert and then index CRAM -> BAM or BAM -> CRAM file

0120101

0 0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

produces a histogram or table of coverage per chromosome

0120101

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

List CRAM Content-ID and Data-Series sizes

01

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Computes the depth at each position or region.

0101

0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files; samtools depth – computes the read depth at each position or region

Create a sequence dictionary file from a FASTA file

01

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index FASTA file, and optionally generate a file of chromosome sizes

0101get_sizes

0 0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts a SAM/BAM/CRAM file to FASTA

01interleave

0 0 0 0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

Converts a SAM/BAM/CRAM file to FASTQ

01interleave

0 0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type

012

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
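Counting alignments per FLAG type amounts to testing each bit of the SAM FLAG field. The sketch below uses the bit values defined in the SAM specification, but the function name and the short labels are hypothetical, and it does not reproduce the samtools report format.

```python
from collections import Counter

# Bit values from the SAM specification; the labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def count_flags(flags):
    """Count how many alignments have each FLAG bit set."""
    counts = Counter()
    for flag in flags:
        for bit, name in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(count_flags([99, 147, 4]))
```

A single alignment contributes to several categories at once, because each set bit is counted independently.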

filter/convert SAM/BAM/CRAM file

01

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Reports alignment summary statistics for a BAM/CRAM/SAM file

012

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

converts FASTQ files to unmapped SAM/BAM/CRAM

01

0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index SAM/BAM/CRAM file

01

0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

mark duplicate alignments in a coordinate sorted file

0101

0 0 0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

Merge BAM or CRAM file

01010101

0 0 0 0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

BAM

01201

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.

01

0 0

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Sort SAM/BAM/CRAM file

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces comprehensive statistics from SAM/BAM/CRAM file

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

qname, index_format

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Filter cells and genes in single-cell RNA-seq data using Scanpy

h5ad versions

scanpy:

Single-Cell Analysis in Python

Detect doublets in single-cell RNA-seq data using Scrublet via Scanpy

batch_col

scanpy:

Single-Cell Analysis in Python

Module to use scds for doublet scoring

SCIMAP is a suite of tools that enables spatial single-cell analyses

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data.

SpatialLDA uses an LDA based approach for the identification of cellular neighborhoods, using cell type identities.

scimap:

Scimap is a scalable toolkit for analyzing spatial molecular data. The underlying framework is generalizable to spatial datasets mapped to XY coordinates. The package uses the anndata framework making it easy to integrate with other popular single-cell analysis toolkits. It includes preprocessing, phenotyping, visualization, clustering, spatial analysis and differential spatial testing. The Python-based implementation efficiently deals with large datasets of millions of cells.

Use pangenome outputs for GWAS

tree

The Cluster Analysis tool of Scramble analyses and interprets the soft-clipped clusters found by cluster_identifier

mei_ref

scramble:

Soft Clipped Read Alignment Mapper

The cluster_identifier tool of Scramble identifies soft clipped clusters

scramble:

Soft Clipped Read Alignment Mapper

Module to use scAR to remove ambient RNA from single-cell RNA-seq data

scvitools:

scvi-tools (single-cell variational inference tools) is a package for end-to-end analysis of single-cell omics data

scar:

scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics.

Detect doublets in single-cell RNA-Seq data

scvitools:

A scalable toolkit for probabilistic modeling applied to single-cell omics data

Call peaks using SEACR on sequenced reads in bedgraph format

threshold

seacr:

SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

fasta, index

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Generate genome indices for segemehl align

fasta

segemehl:

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

metagenomic binning with self-supervised learning

semibin:

Metagenomic binning with semi-supervised siamese neural network

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Create BWA index for reference genome

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the Picard CollectVariantCallingMetrics tool.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Accelerated implementation of the GATK DepthOfCoverage tool.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects multiple quality metrics from a bam file

plot_results

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

modifies the input VCF file by adding the MLrejected FILTER to the variants

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

DNAscope algorithm performs an improved version of Haplotype variant calling.

pcr_indel_model, emit_vcf, emit_gvcf

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs Sentieon's haplotyper for germline variant calling.

emit_vcf, emit_gvcf

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Generate recalibration table and optionally perform base quality recalibration

generate_recalibrated_bams

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Tnhaplotyper2 performs somatic variant calling on the tumor-normal matched pairs.

emit_orientation_data, emit_contamination_data

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR) by building a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm

resource_vcf, resource_tbi, labels, fasta, fai

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects whole genome quality metrics from a bam file

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.

seqcluster:

Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and their abundance across all samples in the project.

Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.

seqfu:

DNA sequence utilities for FASTX files
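
The dereplication described above can be illustrated with a minimal Python sketch. It is not seqfu's implementation: the `;size=N` header annotation, the case-insensitive comparison and the function name `dereplicate` are assumptions made for illustration only.

```python
import re

# Assumed header annotation for the number of identical sequences, e.g. "read1;size=3".
SIZE_TAG = re.compile(r";size=(\d+)")


def dereplicate(records):
    """Collapse identical sequences.

    records is an iterable of (header, sequence) pairs. Headers that already
    carry a ";size=N" tag contribute N to the total, so input that was
    dereplicated before is summed rather than counted as one.
    """
    counts = {}  # upper-cased sequence -> [first header seen, total count]
    for header, seq in records:
        match = SIZE_TAG.search(header)
        size = int(match.group(1)) if match else 1
        name = SIZE_TAG.sub("", header)
        key = seq.upper()
        if key in counts:
            counts[key][1] += size
        else:
            counts[key] = [name, size]
    # Dictionaries keep insertion order, so output follows first appearance.
    return [(f"{name};size={total}", seq) for seq, (name, total) in counts.items()]
```

Each unique sequence keeps the name of its first occurrence, and the summed count is written back into the header.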

Statistics for FASTA or FASTQ files

seqfu:

Cross-platform compiled suite of tools to manipulate and inspect FASTA and FASTQ files

Concatenating multiple uncompressed sequence files together

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTQ to FASTA format

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert FASTA/Q to tabular format, and provide various information, like sequence length, GC content/GC skew.

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Select sequences from a large file based on name/ID

pattern

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Subset FASTA/FASTQ files to some number of sequences

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

match up paired-end reads from two fastq files

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Use seqkit to find/replace strings within sequences and sequence headers

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Use seqkit to generate sliding windows of input fasta

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.
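
As a rough illustration of what generating sliding windows over a sequence means, here is a small Python sketch. It does not reproduce seqkit's behaviour or options; the function name, the 0-based end-exclusive coordinates and the choice to drop a trailing partial window are assumptions of this example.

```python
def sliding_windows(seq, window, step):
    """Yield (start, end, subsequence) for each full window.

    Coordinates are 0-based and end-exclusive. A trailing window shorter
    than `window` is dropped in this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, start + window, seq[start:start + window]
```

For example, `list(sliding_windows("ACGTACGT", 4, 2))` yields windows starting at positions 0, 2 and 4.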

Sorts sequences by id/name/sequence/length

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Split single or paired-end fastq.gz files

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

simple statistics of FASTA/Q files

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Convert tabular format (first two/three columns) to FASTA/Q format.

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Translate DNA/RNA to protein sequence

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Salmonella serotype prediction from reads and assemblies

Computes sequence statistics from FASTQ or FASTA files

Generates a BED file containing the genomic locations of stretches of N.

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.

Interleave paired-end reads from FASTQ files

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk mergepe command merges paired-end reads into one interleaved file.

Rename sequence names in FASTQ or FASTA files.

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.

Subsample reads from FASTQ files

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk sample command subsamples sequences.

Common transformation operations on FASTA or FASTQ files.

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

filter_list

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Trim low quality bases from FastQ files

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Sequence quality metrics for FASTQ and uBAM files.

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

snpfile, sample_names_fn

sequencetools:

Tools for population genetics on sequencing data

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

fasta, wigfile

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.
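
To make the windowed GC computation concrete, the following Python sketch calculates the GC percentage over consecutive windows and prints it with a UCSC wiggle `fixedStep` declaration. It is an illustration under stated assumptions, not the gc_wiggle code: the function names, non-overlapping windows, one-decimal formatting and the use of `fixedStep` are all choices made for this example, and the real tool's output layout may differ.

```python
def gc_windows(seq, window):
    """Return the GC percentage of consecutive, non-overlapping windows.

    The last window may be shorter than `window`; its percentage is
    computed over its actual length.
    """
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / len(chunk))
    return values


def to_wiggle(name, seq, window):
    """Format the per-window GC percentages as a fixedStep wiggle block."""
    lines = [f"fixedStep chrom={name} start=1 step={window} span={window}"]
    lines += [f"{value:.1f}" for value in gc_windows(seq, window)]
    return "\n".join(lines)
```

Calling `to_wiggle("chr1", "GGCCAATT", 4)` gives a header line followed by one value per 4-base window.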

Induce a variation graph in GFA format from alignments in PAF format

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

Determine Streptococcus pneumoniae serotype from Illumina paired-end reads

seroba:

SeroBA is a k-mer based pipeline to identify the Serotype from Illumina NGS reads for given references.

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT)

Calculate the relative coverage on the Gonosomes vs Autosomes from the output of samtools depth, with error bars.

sample_list_file

Demultiplex bgzip'd fastq files

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically run to ligate multiple chunks of phased common variants.

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data.

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Tool to phase rare variants onto a scaffold of common variants (output of phase_common / ligate). Requires AVX2 support.

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

Program to compute switch error rate and genotyping error rate given simulated or trio data.

shapeit5:

Fast and accurate method for estimation of haplotypes (phasing)

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using DNA reads generated by Oxford Nanopore flow cells as input. Please note that the assembler is designed to focus on speed, so assembly may be considered somewhat non-deterministic, as the final assembly may vary across executions. See https://github.com/chanzuckerberg/shasta/issues/296.

Print SHA256 (256-bit) checksums.

md5sum:

Create an SHA256 (256-bit) checksum.
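
For readers who want to reproduce such a checksum outside the module, a minimal Python sketch using the standard library's `hashlib` is shown here. The function name and chunk size are illustrative assumptions; the module itself is not implemented this way.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks.

    Reading in fixed-size chunks keeps memory use flat for large files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the 64-character hexadecimal digest of the file contents.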

Determine Shigella serotype from Illumina or Oxford Nanopore reads

Determine Shigella serotype from assemblies or Illumina paired-end reads

build and deploy Shiny apps for interactively mining differential abundance data

contrast_stats_assay

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make plots for interpretation of differential abundance statistics

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Make exploratory plots for analysis of matrix data, including PCA, Boxplots and density plots

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

validate consistency of feature and sample annotations with matrices and contrasts

shinyngs:

Provides Shiny applications for various array and NGS applications. Currently very RNA-seq centric, with plans for expansion.

Assemble bacterial isolate genomes from Illumina paired-end reads

A windowed adaptive trimming tool for FASTQ files using quality

Indexing of transcriptome for gene expression quantification using SimpleAF

simpleaf:

SimpleAF is a tool for quantification of gene expression from RNA-seq data

simpleaf is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

resolution

simpleaf:

SimpleAF is a program to simplify and customize the running and configuration of single-cell processing with alevin-fry.

Serovar prediction of salmonella assemblies

Calculate pairwise distances and basic clustering from SKA sketches

ska:

SKA (Split Kmer Analysis)

Create genome sketch using split k-mers

ska:

SKA (Split Kmer Analysis)

Simple ANI calculation between reference and query genomes.

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

Memory-efficient ANI database queries with skani.

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

Storing skani sketches/indices on disk.

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

All-to-all ANI computation.

skani:

skani is a fast and robust tool for calculating ANI between metagenome assembled genomes and contigs.

Fast, efficient, lossless compression of FASTQ files.

tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.

Linearize and simplify variation graph in GFA format using blocked partial order alignment

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls. Developed by Brent Pedersen.

smoove:

structural variant calling and genotyping with existing tools, but, smoothly

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. This module runs a simple Snakemake pipeline based on the input Snakefile. Expect many limitations.

Performs fastq alignment to a fasta reference using SNAP

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

Create a SNAP index for reference genome

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

structural-variant calling with sniffles

vcf_output, snf_output

Core-SNP alignment from Snippy outputs

reference

snippy:

Rapid bacterial SNP calling and core genome alignments

Rapid haploid variant calling

reference

snippy:

Rapid bacterial SNP calling and core genome alignments

Pairwise SNP distance matrix from a FASTA sequence alignment
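
A pairwise SNP distance matrix of this kind can be sketched in a few lines of Python. This is an illustration only, not the module's underlying tool: the function names are hypothetical, and counting only positions where both sequences carry A, C, G or T (so that gaps and N are ignored) is an assumption of this example.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions where both aligned bases are A, C, G or T."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x in valid and y in valid
    )


def distance_matrix(alignment):
    """Build a symmetric matrix from a dict of name -> aligned sequence."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = snp_distance(alignment[n], alignment[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix
```

The matrix is returned as a nested dictionary, so `matrix["s1"]["s2"]` is the distance between samples `s1` and `s2`.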

Genetic variant annotation and functional effect prediction toolbox

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Genetic variant annotation and functional effect prediction toolbox

db

snpeff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

Annotate a VCF file with another VCF file

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Splits a VCF file by chromosome, or joins split VCF files

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Rapidly extracts SNPs from a multi-FASTA alignment.

alignment

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

sample_groups

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Local sequence alignment tool for filtering, mapping and clustering.

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts aligned and rejected reads into two separate files. Additional applications include clustering and taxonomic assignment, available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Classifies and predicts the origin of metagenomic samples

sources, labels, taxa_sqlite, taxa_sqlite_traverse_pkl

Compare many FracMinHash signatures generated by sourmash sketch.

file_list, save_numpy_matrix, save_csv

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Search a metagenome sourmash signature against one or many reference databases and return the minimum set of genomes that contain the k-mers in the metagenome.

database, save_unassigned, save_matches_sig, save_prefetch, save_prefetch_csv

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a database of sourmash signatures (a group of FracMinHash sketches) to be used as references.

ksize

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Create a signature (a group of FracMinHash sketches) of a sequence using sourmash

sourmash:

Compute and compare FracMinHash signatures for DNA and protein data sets.

Annotate list of metagenome members (based on sourmash signature matches) with taxonomic information.

taxonomy

sourmash:

Compute and compare FracMinHash signatures for DNA data sets.

Module to use the 10x Space Ranger pipeline to process 10x spatial transcriptomics data

reference, probeset

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.

gtf

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Module to build the reference needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkref command.

fasta, gtf, reference_name

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Assembles a small genome (bacterial, fungal, viral)

yml, hmm

mutational signature deconvolution of cancer cells

01

0 0 0 0 0 0 0

sparsesignatures:

SparseSignatures is an R-based computational framework which performs de novo extraction, inference, interpretation, or deconvolution of mutational counts of a large number of patients.

bsgenome.hsapiens.1000genomes.hs37d5:

Reference Genome Sequence (hs37d5), based on NCBI GRCh37

bsgenome.hsapiens.ucsc.hg38:

Full genomic sequences for Homo sapiens (UCSC genome hg38)

Computational method for finding spa types.

01repeatsrepeat_order

0 0

split one ubam into multiple, per line, fast

01

0 0

Spotiflow, accurate and efficient spot detection with stereographic flow.

01

0 0

Fast, efficient, lossless compression of FASTQ files.

012

0 0

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Fast, efficient, lossless decompression of FASTQ files.

01write_one_fastq_gz

0 0

spring:

SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)

Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA).

01ncbi_settingscertificate

0 0

sratools:

SRA Toolkit and SDK from NCBI

Download sequencing data from the NCBI Sequence Read Archive (SRA).

01ncbi_settingscertificate

0 0

sratools:

SRA Toolkit and SDK from NCBI

Test for the presence of suitable NCBI settings or create them on the fly.

NO input

versions ncbi_settings

sratools:

SRA Toolkit and SDK from NCBI

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.

012db_type

0 0 0 0 0 0

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Serotype prediction of Streptococcus suis assemblies

01

0 0

Advanced sequence file format conversions

01fastafaigzi

0 0 0

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Predicts Staphylococcus aureus SCCmec type based on primers.

01

0 0

Align reads to a reference genome using STAR

010101star_ignore_sjdbgtfseq_platformseq_center

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create index for STAR

0101

0 0

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Get the minimal allowed index version from STAR

NO input

0 0

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.

01

0 0 0 0 0 0 0 0 0

staramr:

Scan genome contigs against the ResFinder and PointFinder databases. In order to use the PointFinder databases, you will have to add --pointfinder-organism ORGANISM to the ext.args options.

Cell and nuclear segmentation with star-convex shapes

01

0 0

Framework that scores enhancer–gene interactions using the Activity-By-Contact model and derives transcription factor affinities on gene level

01230101010101

0 0

Download STAR-fusion genome resource required to run STAR-Fusion caller

0101fusion_annot_libdfam_species

0 0

star-fusion:

Fusion calling algorithm for RNAseq data

Create a counts matrix for single-cell data using STARSolo, handling cell barcodes and UMI information.

012opt_whitelist01

0 0 0 0 0 0

Serotype STEC samples from paired-end reads or assemblies

01

0 0

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

012345678910012seed

0 0 0 0 0 0

Annotates output files from ExpansionHunter with the pathologic implications of the repeat sizes.

0101

0 0 0

Tandem repeat genotyper for long reads

012010101

0 0 0

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation

01234fastafai

0 0 0 0 0

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs

012345678fastafai

0 0 0 0 0

strelka:

Strelka calls somatic and germline small variants from mapped sequencing reads

Merges the annotation gtf file and the stringtie output gtf files

stringtie_gtfannotation_gtf

0 0

stringtie2:

Transcript assembly and quantification for RNA-Seq

Transcript assembly and quantification for RNA-Seq

01annotation_gtf

0 0 0 0 0

stringtie2:

Transcript assembly and quantification for RNA-Seq

Count reads that map to genomic features

012

0 0 0

featurecounts:

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads.

SummarizedExperiment container

010101

0 0 0

summarizedexperiment:

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Converts a bedpe file to a VCF file (beta version)

01

0 0

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

012minsvmaxsvminallelefreqminnumreads

0 0

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi-sample VCF file.

01max_distance_breakpointsmin_supporting_callersaccount_for_typeaccount_for_sv_strandsestimate_distanced_by_sv_sizemin_sv_size

0 0

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

010101snp_mutation_frequencysim_reads

0 0 0 0 0 0

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01minsvmaxsvminnumreads

0 0

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

01234010101010101

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

SVbenchmark compares a set of “test” structural variants in VCF format to a known truth set (also in VCF format) and outputs estimates of sensitivity and specificity.

0123450101

0 0 0 0 0 0

svanalyzer:

SVanalyzer: tools for the analysis of structural variation in genomes

Build a structural variant database

01input_type

0 0

svdb:

structural variant database software

The merge module merges structural variants within one or more vcf files.

01input_prioritysort_inputs

0 0 0 0

svdb:

structural variant database software

Query a structural variant database, using a vcf file as query

01in_occsin_frqsout_occsout_frqsvcf_dbsbedpe_dbs

0 0

svdb:

structural variant database software

Performs tests on BAF files

01234

0 0

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

01

0 0

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert an RdTest-formatted bed to the standard VCF format.

012fasta_fai

0 0 0

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

0101

0 0

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

0 0

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert a VCF file to a BEDPE file.

01

0 0

svtools:

Tools for processing and analyzing structural variants

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

0 0 0 0

svtyper:

Compute genotype of structural variants based on breakpoint depth

SVTyper-sso computes structural variant (SV) genotypes based on breakpoint depth on a SINGLE sample

012301

0 0 0

svtyper:

Bayesian genotyper for structural variants

A tool to standardize VCF files from structural variant callers

0123

0 0 0

Sylph profile command for taxonomic profiling

01database

0 0

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

Sketching/indexing sequencing reads

01reference

0 0

sylph:

Sylph quickly enables querying of genomes against even low-coverage shotgun metagenomes to find nearest neighbour ANI.

Merge multiple taxonomic profiles from sylphtax/taxprof into a tsv table

01data_type

0 0

sylphtax:

Integrating taxonomic information into the sylph metagenome profiler.

Incorporates taxonomy into sylph metagenomic classifier

01taxonomy

0 0

sylphtax:

Integrating taxonomic information into the sylph metagenome profiler.

Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.

010101file_type

0 0 0

Compresses/decompresses files

01

0 0 0

bgzip:

Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip.

bgzip a sorted tab-delimited genome file and then create tabix index

01

0 0 0

tabix:

Generic indexer for TAB-delimited genome position files.

create tabix index from a sorted bgzip tab-delimited genome file

01

0 0 0

tabix:

Generic indexer for TAB-delimited genome position files.

A tool for tagging BAM files.

01

0 0

Estimating poly(A)-tail lengths from basecalled fast5 files produced by Nanopore sequencing of RNA and DNA

01

0 0

Compress directories into tarballs with various compression options

01compress_type

0 0

Convert taxonids to taxon lineages

012taxdb

0 0

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

Convert taxon names to TaxIds

012taxdb

0 0

taxonkit:

A Cross-platform and Efficient NCBI Taxonomy Toolkit

Standardise and merge two or more taxonomic profiles into a single table

01profilerformattaxonomysamplesheet

0 0

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

Standardise the output of a wide range of taxonomic profilers

01profilerformattaxonomy

0 0

taxpasta:

TAXonomic Profile Aggregation and STAndardisation

A tool to detect resistance and lineages of M. tuberculosis genomes

01

0 0 0 0 0 0

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Aligns sequences using T_COFFEE

0101012compress

0 0 0

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Compares 2 alternative MSAs to evaluate them.

012

0 0

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Computes a consensus alignment using T_COFFEE

0101compress

0 0 0

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats the header of PDB files with t-coffee

01

0 0

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Computes the irmsd score for a given alignment and the structures.

01012

0 0

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package

0101012compress

0 0

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

01

0 0

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Compute the TCS score for a MSA or for a MSA plus a library file. Outputs the tcs as it is and a csv with just the total TCS score.

0101

0 0 0

tcoffee:

A collection of tools for Multiple Alignments of DNA, RNA, Protein Sequence

pigz:

Parallel implementation of the gzip algorithm.

Telseq: a software for calculating telomere length

012010101

0 0

samtools:

Tools for dealing with SAM, BAM and CRAM files

An accurate and fast method to classify LTR-retrotransposons in plant genomes

0101

0 0 0 0 0 0 0 0

Parses a Thermo RAW file containing mass spectra to an open file format

01

0 0

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

01

0 0 0 0

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

Computes the coverage of different regions from the bam file.

0101

0 0 0

tiddit:

TIDDIT - structural variant calling.

Identify chromosomal rearrangements.

0120101

0 0 0

sv:

Search for structural variants.

tidk explore attempts to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT).

01

0 0 0

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes
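
The canonical form mentioned for tidk explore can be illustrated with a short Python sketch. It assumes that "canonical" means the alphabetically smallest string among all rotations of the repeat unit and of its reverse complement; that assumption reproduces the TTAGG -> AACCT example, but it is not taken from the tidk source code. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def rotations(seq: str) -> list:
    """Return every cyclic rotation of seq."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(unit: str) -> str:
    """Assumed definition: smallest rotation over both strands."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(reverse_complement(unit)))


print(canonical_repeat("TTAGG"))  # AACCT
```

Under the same assumption, the vertebrate repeat TTAGGG maps to AACCCT.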

Plots telomeric repeat frequency against sliding window location using data produced by tidk/search

01

0 0

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

Searches a genome for a telomere string such as TTAGGG

01string

0 0 0

tidk:

tidk is a toolkit to identify and visualise telomeric repeats in genomes

write your description here

012

0 0 0 0 0

Create fasta consensus with TOPAS toolkit with options to penalize substitutions for typical DNA damage present in ancient DNA

01010101vcf_output

0 0 0 0 0

topas:

This toolkit allows the efficient manipulation of sequence data in various ways. It is organized into modules: The FASTA processing modules, the FASTQ processing modules, the GFF processing modules and the VCF processing modules.

A post sequencing QC tool for Oxford Nanopore sequencers

01

0 0 0 0 0

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

01

0 0 0 0 0 0

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

01fold

0 0 0 0 0

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Tandem repeat genotyping from PacBio HiFi data

0123010101

0 0 0

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Merge TRGT VCFs from multiple samples

0120101

0 0

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Visualize tandem repeats genotyped by TRGT

012345010101

0 0

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

Trim FastQ files using Trim Galore!

01

0 0 0 0 0 0

Performs quality and adapter trimming on paired end and single end reads

01

0 0 0 0 0 0

Assembles a de novo transcriptome from RNAseq reads

01

0 0 0

Detection of tRNA sequences using covariance models

01

0 0 0 0 0 0 0

Run TRUST4 on RNA-seq data

012fastavdj_referencebarcode_whitelistcell_barcode_readumi_readread_format

0 0 0 0 0 0 0 0 0

Given baseline and comparison sets of variants, calculate the recall/precision/f-measure

0123450101

0 0 0 0 0 0 0 0 0 0

truvari:

Structural variant comparison tool for VCFs
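
As a rough illustration of the recall/precision/f-measure calculation described above, the following Python sketch derives the three metrics from counts of true positives, false positives and false negatives. It uses the standard textbook definitions; how Truvari matches variants to obtain those counts is not shown, and the function name is hypothetical.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Standard precision, recall and F1 from match counts.

    tp: comparison calls that match a baseline call
    fp: comparison calls with no baseline match
    fn: baseline calls with no comparison match
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = benchmark_metrics(tp=8, fp=2, fn=2)
print(metrics)
```

The zero-denominator guards are a choice made for this sketch so that empty call sets return 0.0 instead of raising an error.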

Over multiple vcfs, calculate their intersection/consistency.

01

0 0

truvari:

Structural variant comparison tool for VCFs

Normalization of SVs into disjointed genomic regions

01

0 0

truvari:

Structural variant comparison tool for VCFs

Cluster contigs from multiple assemblies by similarity

012

0 0

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Subsample a long-read sequencing fastq file for multiple assemblies

01

0 0

trycycler:

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes

Transcript Selector for BRAKER (TSEBRA) combines gene predictions by selecting transcripts based on their extrinsic evidence support

01hints_fileskeep_gtfsconfig

0 0 0

Import transcript-level abundances and estimated counts for gene-level analysis packages

0101quant_type

0 0 0 0 0 0 0 0 0

tximeta:

Transcript Quantification Import with Automatic Metadata

Remove lines from bed file that refer to off-chromosome locations.

01sizes

0 0

ucsc:

Remove lines from bed file that refer to off-chromosome locations.
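
The idea of removing off-chromosome lines can be sketched in Python as follows. This is an illustration only, not the UCSC implementation: it assumes tab-separated BED lines and a dictionary of chromosome sizes, and it drops lines on chromosomes missing from that dictionary, which is a choice made here. The function name is hypothetical.

```python
def clip_bed(lines, chrom_sizes):
    """Keep BED lines whose interval lies within the named chromosome."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        # Assumption: unknown chromosomes are treated as off-chromosome.
        if size is not None and start >= 0 and end <= size:
            kept.append(line)
    return kept


sizes = {"chr1": 100}
bed = ["chr1\t0\t50", "chr1\t90\t120", "chr2\t0\t10"]
print(clip_bed(bed, sizes))
```

Only the first line survives in this example, because the second runs past the end of chr1 and the third names a chromosome with no known size.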

Convert a bedGraph file to bigWig format.

01sizes

0 0

ucsc:

Convert a bedGraph file to bigWig format.

Convert file from bed to bigBed format

01sizesautosql

0 0

ucsc:

Convert file from bed to bigBed format

compute average score of bigwig over bed file

01bigwig

0 0

ucsc:

Compute average score of big wig over each bed, which may have introns.

Convert GTF files to GenePred format

01

0 0 0

ucsc:

Convert GTF files to GenePred format

convert between genome builds

01chain

0 0 0

ucsc:

Move annotations from one assembly to another

Convert ascii format wig file to binary big wig format

01sizes

0 0

ucsc:

Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

0101012

0 0

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Index gtf file for reads alignment

0101

0 0 0

ultra:

Splice aligner of long transcriptomic reads to genome.

uLTRA aligner - A wrapper around minimap2 to improve small exon detection

01genomegtf

0 0

ultra:

Splice aligner of long transcriptomic reads to genome.

Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.

01barcode_file

0 0 0 0

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

012mode

0 0 0 0

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

012get_output_stats

0 0 0 0 0 0

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place

01

0 0 0

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
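
A minimal Python sketch of the extraction step: it assumes a fixed-length UMI at the 5' end of the read and appends it to the read identifier with an underscore, which is the naming convention UMI-tools uses. Barcode patterns, sample barcodes and paired-end handling are left out, and the function name is hypothetical.

```python
def extract_umi(name: str, sequence: str, quality: str, umi_length: int):
    """Move the first umi_length bases of a read into its name."""
    umi = sequence[:umi_length]
    # Keep any comment after the first space (e.g. "1:N:0") untouched.
    identifier, sep, comment = name.partition(" ")
    new_name = f"{identifier}_{umi}{sep}{comment}"
    # Trim the same number of characters from bases and qualities.
    return new_name, sequence[umi_length:], quality[umi_length:]


record = extract_umi("read1 1:N:0", "ACGTTTGGCC", "IIIIIFFFFF", 4)
print(record)
```

Trimming the quality string together with the sequence keeps the FASTQ record consistent.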

Group reads based on their UMI and mapping coordinates

012create_bamget_group_info

0 0 0 0

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

012

0 0 0

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Assembles bacterial genomes

012

0 0 0 0

Module to run UniverSC an open-source pipeline to demultiplex and process single-cell RNA-Seq data

0101technology

0 0

Extract files.

01

0 0

Extract files.

01

0 0

untar:

Extract tar.gz files.

Unzip ZIP archive files

01

0 0

Unzip ZIP archive files

01

0 0

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.

Simple software to call UPD regions from germline exome/wgs trios.

01

0 0

Aligns protein structures using UPP

0101compress

0 0

upp:

SATe-enabled phylogenetic placement

The Java port of the VarDict variant caller

01230101

0 0

Runs a differential expression analysis with dream() from variancePartition R package

012345012

0 0 0

dream:

Differential expression for repeated measures

Filtering, downsampling and profiling alignments in BAM/CRAM formats

01

0 0

Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing

012scenarioscenario_sample_name

0 0

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.

010101

0 0

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Obtains per-sample observations for the actual calling process with varlociraptor calls

012340101

0 0

varlociraptor:

Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.

Convert VCF with structural variations to CytoSure format

01010101blacklist_bed

0 0

A tool to create a Gemini-compatible DB file from an annotated VCF

012

0 0

vcf2maf

01fastavep_cache

0 0

quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files

0123tomlluaresources

0 0 0

If multiple alleles are specified in a single record, break the record into several lines preserving allele-specific INFO fields

012

0 0

vcflib:

Command-line tools for manipulating VCF files
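
The record-splitting behaviour described above can be sketched in Python on a simplified record. This is not vcflib code: the record is modelled as a plain dictionary, genotype columns are ignored, and the caller states which INFO keys carry one value per alternate allele (an assumption standing in for the Number=A declarations in a real VCF header).

```python
def break_multi(record: dict, per_allele_keys: set) -> list:
    """Split a multi-allelic record into one record per ALT allele."""
    split_records = []
    for index, alt in enumerate(record["alts"]):
        info = {}
        for key, value in record["info"].items():
            # Per-allele fields keep only the value for this allele;
            # site-level fields are copied unchanged.
            info[key] = value[index] if key in per_allele_keys else value
        split_records.append({**record, "alts": [alt], "info": info})
    return split_records


site = {"chrom": "chr1", "pos": 100, "ref": "A", "alts": ["T", "G"],
        "info": {"AF": [0.1, 0.2], "DP": 30}}
for rec in break_multi(site, per_allele_keys={"AF"}):
    print(rec)
```

Each output record has a single ALT allele with its own AF value, while DP is repeated on both lines.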

Command line tools for parsing and manipulating VCF files.

012

0 0

vcflib:

Command line tools for parsing and manipulating VCF files.

Generates a VCF stream where AC and NS have been generated for each record using sample genotypes.

012

0 0

vcflib:

Command-line tools for manipulating VCF files

List unique genotypes. Like GNU uniq, but for VCF records. Remove records which have the same position, ref, and alt as the previous record.

012

0 0

vcflib:

Command-line tools for manipulating VCF files
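
The uniq-like behaviour can be sketched in a few lines of Python. The sketch assumes tab-separated VCF data lines and compares each record only with the one immediately before it, as GNU uniq does; the chromosome is included in the comparison key as part of the position. The function name is hypothetical and this is not vcflib code.

```python
def uniq_records(lines):
    """Drop a record if CHROM, POS, REF and ALT match the previous one."""
    kept = []
    previous_key = None
    for line in lines:
        fields = line.split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != previous_key:
            kept.append(line)
        previous_key = key
    return kept


records = [
    "chr1\t10\t.\tA\tT",
    "chr1\t10\trs1\tA\tT",  # same site and alleles as the line above
    "chr1\t10\t.\tA\tG",
    "chr1\t10\t.\tA\tT",    # kept: not adjacent to its first occurrence
]
print(uniq_records(records))
```

Because only neighbouring records are compared, the input is expected to be sorted for all duplicates to be removed.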

A set of tools written in Perl and C++ for working with VCF files

01beddiff_variant_file

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The align command performs pairwise sequence alignments of viral genomes and provides similarity measures like ANI and coverage (alignment fraction)

0101save_alignment

0 0 0 0

vclust:

Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.

Vclust cluster performs threshold-based clustering by assigning a genome sequence to a cluster if its similarity (e.g., ANI) to the cluster meets or exceeds a user-defined threshold.

0101metrictaniganiani

0 0 0

vclust:

"Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes."

The prefilter command creates a pre-alignment filter that reduces the number of genome pairs to be aligned by filtering out dissimilar sequences before the alignment step.

01

0 0

vclust:

Fast and accurate tool for calculating ANI and clustering virus genomes and metagenomes.
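
The threshold-based clustering described for Vclust cluster can be illustrated with a greedy Python sketch. Vclust offers several clustering algorithms, and this is not its implementation: the sketch assumes a precomputed table of pairwise ANI values and compares each genome only with the first member (the representative) of every existing cluster. All names are hypothetical.

```python
def threshold_cluster(genomes, ani, threshold):
    """Greedy clustering against cluster representatives.

    genomes: identifiers in processing order
    ani: dict mapping frozenset({a, b}) to a similarity in [0, 1]
    threshold: minimum similarity needed to join a cluster
    """
    clusters = []  # first element of each list is the representative
    for genome in genomes:
        for cluster in clusters:
            representative = cluster[0]
            if ani.get(frozenset((genome, representative)), 0.0) >= threshold:
                cluster.append(genome)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append([genome])
    return clusters


pairwise = {
    frozenset(("g1", "g2")): 0.97,
    frozenset(("g1", "g3")): 0.80,
    frozenset(("g2", "g3")): 0.96,
}
print(threshold_cluster(["g1", "g2", "g3"], pairwise, threshold=0.95))
```

In this example g3 starts its own cluster even though it is similar to g2, because only the representative g1 is consulted; other linkage rules would group the genomes differently.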

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks the Nextflow logic of symbolic links. If velocyto finds a file in the work dir named EXACTLY cellsorted_[ORIGINAL_BAM_NAME], it skips the samtools sort step. The cellsorted BAM file should be cell sorted with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See module test for an example with the SAMTOOLS_SORT nf-core module. Config example to cellsort input bam using SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

For velocyto to find the cellsorted file, two BAM files (cellsorted and original) need to be staged in the work dir. An optional mask must be passed through ext.args with the --mask option. See also the velocyto tutorial.

0123gtf

0 0

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

012refvcf

0 0 0 0 0 0 0 0

verifybamid:

verifyBamID is a software that verifies whether the reads in particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis.

012012refvcfreferences

0 0 0 0 0 0 0

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.

Constructs a graph from a reference and variant calls or a multiple sequence alignment file

01230101

0 0

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

Deconstruct snarls present in a variation graph in GFA format to variants in VCF format

01pbgbwt

0 0

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

write your description here

01

0 0 0

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

calculate secondary structures of two RNAs with dimerization

01

0 0 0

viennarna:

calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

0 0 0

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

calculate locally stable secondary structures of RNAs

fasta

0 0

viennarna:

calculate locally stable secondary structures of RNAs

Compute locally stable RNA secondary structure with a maximal base pair span. For a sequence of length n and a base pair span of L the algorithm uses only O(n+L*L) memory and O(n*L*L) CPU time. Thus it is practical to “scan” very large genomes for short RNA structures. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure, its energy in kcal/mol, and the starting position of the local structure.

Use vireo to perform donor deconvolution for multiplexed scRNA-seq data

01234

0 0 0 0 0

The module prepares the specification JSON file for Vizgen's post-processing tool cell segmentation workflow.

012algorithm_jsonimages_regex

0 0

vizgenpostprocessing:

Vizgen's post-processing tool

The module runs the segmentation algorithm on a specific tile using Vizgen's post-processing tool.

0123algorithm_jsoncustom_weights

0 0

vizgenpostprocessing:

Vizgen's post-processing tool

Extracting sequences that were unbinned by vRhyme into a FASTA file

0101

0 0

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Linking bins output by vRhyme to create one sequence per bin

01

0 0

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Binning virus genomes from metagenomes

0101

0 0 0 0

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

0 0 0 0 0 0 0 0 0 0 0 0 0

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Merge strictly identical sequences contained in filename. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

0 0 0 0

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
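
The identity rule stated for dereplication (same length, same nucleotide string, case insensitive, T and U treated as equal) can be sketched in Python as below. This only illustrates the rule, not VSEARCH itself; the function name is hypothetical and FASTA parsing is left out.

```python
from collections import Counter


def dereplicate(sequences):
    """Merge identical sequences and count how often each occurs."""
    counts = Counter()
    for sequence in sequences:
        # Normalise case and treat U as T before comparing.
        key = sequence.upper().replace("U", "T")
        counts[key] += 1
    # Most abundant sequences first.
    return counts.most_common()


print(dereplicate(["ACGT", "acgu", "ACG"]))
```

Here ACGT and acgu collapse into a single entry with an abundance of 2, while the shorter ACG stays separate because its length differs.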

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

0 0 0

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

01db

0 0

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort FASTA entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

01sort_arg

0 0

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Compare target sequences to FASTA-formatted query sequences using global pairwise alignment.

01dbidcutoffoutoptionuser_columns

0 0 0 0 0 0 0 0 0 0

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Decomposes multiallelic variants into biallelic variants in a VCF file.

012

0 0

vt:

A tool set for short variant discovery in genetic sequence data
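
The idea behind splitting a multiallelic record into biallelic ones can be illustrated with a deliberately simplified Python sketch. It is not how vt works internally: the hypothetical `decompose_multiallelic` function below only splits the comma-separated ALT column of a site-only VCF line and copies every other column unchanged, whereas a real tool also has to rewrite per-allele INFO values and sample genotypes.

```python
def decompose_multiallelic(vcf_line):
    """Split one tab-separated VCF data line into one line per ALT allele.

    Simplifying assumption: the record has no per-allele INFO fields and no
    sample columns that would need rewriting; all other columns are copied.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # column 5 of a VCF data line is ALT
    decomposed = []
    for alt in alts:
        record = list(fields)
        record[4] = alt
        decomposed.append("\t".join(record))
    return decomposed


if __name__ == "__main__":
    line = "1\t100\t.\tA\tC,G\t50\tPASS\t."
    for out_line in decompose_multiallelic(line):
        print(out_line)
```

A record that is already biallelic passes through as a single unchanged line.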

Decomposes biallelic block substitutions into their constituent SNPs.

0123

0 0

vt:

A tool set for short variant discovery in genetic sequence data

Normalizes variants in a VCF file.

01230101

0 0 0

vt:

A tool set for short variant discovery in genetic sequence data

The VueGen nf-core module is designed to automate report generation from outputs produced by other modules, subworkflows, or pipelines. The module integrates the VueGen Python library and customizes it for compatibility with the Nextflow environment. VueGen automates the creation of reports from bioinformatics outputs, supporting formats like PDF, HTML, DOCX, ODT, PPTX, Reveal.js, Jupyter notebooks, and Streamlit web applications.

input_typeinput_pathreport_type

0 0

A pangenome-scale aligner.

01234query_selffasta_query_list

0 0

The non-interactive network downloader

01

0 0

Simulates sequence reads from a reference genome.

01

0 0

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

012fastafasta_fai

0 0 0 0

Masks out highly repetitive DNA sequences with low complexity in a genome

01

0 0

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program to generate frequency counts of repetitive units.

01

0 0

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program that takes a counts file and creates a file of genomic coordinates to be masked.

0101

0 0

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter

01

0 0

fastqgather:

A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter.

A tool of the wipertools suite that splits FASTQ files into chunks

01num_splits

0 0

fastqscatter:

A tool of the wipertools suite that splits FASTQ files into chunks.

A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files

01

0 0 0

fastqwiper:

A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files.

A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper

01

0 0

reportgather:

A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper.

Convert and filter aligned reads to .npz

0120101

0 0

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Returns the gender of a .npz resulting from convert, based on a Gaussian mixture model trained during the newref phase

0101

0 0

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Create a new reference using healthy reference samples

01

0 0

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Find copy number aberrations

010101

0 0 0 0 0 0 0

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

A large variant benchmarking tool analogous to hap.py for small variants.

01234

0 0 0 0

Fast lightweight accurate xenograft sorting

host_fastagraft_fastaindexnobjectsmask

0 0 0

xengsort:

A fast xenograft read sorter based on space-efficient k-mer hashing

The xeniumranger import-segmentation module allows you to specify 2D nuclei and/or cell segmentation results for assigning transcripts to cells, and to recalculate all Xenium Onboard Analysis (XOA) outputs that depend on segmentation. Segmentation results can be generated by community-developed tools or taken from a prior Xenium segmentation result.

01expansion_distancecoordinate_transformnucleicellstranscript_assignmentviz_polygons

0 0

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger relabel module allows you to change the gene labels applied to decoded transcripts.

01gene_panel

0 0

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger rename module allows you to change the sample region_name and cassette_name throughout all the Xenium Onboard Analysis output files that contain this information.

01region_namecassette_name

0 0

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

The xeniumranger resegment module allows you to generate a new segmentation of the morphology image space by rerunning the Xenium Onboard Analysis (XOA) segmentation algorithms with modified parameters.

01expansion_distancedapi_filterboundary_staininterior_stain

0 0

xeniumranger:

Xenium Ranger is a set of analysis pipelines that process Xenium In Situ Gene Expression data to relabel, resegment, or import new segmentation results from community-developed tools. Xenium Ranger provides flexible off-instrument reanalysis of Xenium In Situ data. Relabel transcripts, resegment cells with the latest 10x segmentation algorithms, or import your own segmentation data to assign transcripts to cells.

Compresses files with xz.

01

0 0

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

Decompresses files with xz.

01

0 0

xz:

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2.

Performs assembly scaffolding using YaHS

01fastafai

0 0 0 0

A tool to build a k-mer hash table for FASTA and FASTQ files.

01

0 0

yak:

Yet another k-mer analyzer

Builds a YARA index for a reference genome

01

0 0

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Align reads to a reference genome using YARA

0101

0 0 0

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.

Compress file lists to produce ZIP archive files

01

0 0

unzip:

p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
