Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • bam 175
  • cram 57
  • sam 47
  • fastq 31
  • fasta 28
  • map 23
  • vcf 21
  • alignment 21
  • genomics 18
  • sort 15
  • align 15
  • gatk4 14
  • coverage 14
  • bed 12
  • variant calling 12
  • genome 11
  • reference 11
  • statistics 11
  • pacbio 10
  • merge 9
  • depth 9
  • samtools 9
  • markduplicates 9
  • index 8
  • filter 8
  • bisulfite 8
  • picard 8
  • metrics 8
  • qc 7
  • convert 7
  • copy number 7
  • isoseq 7
  • methylation 7
  • bisulphite 7
  • methylseq 7
  • 5mC 7
  • bwa 7
  • consensus 6
  • bqsr 6
  • WGBS 6
  • scWGBS 6
  • DNA methylation 6
  • variants 5
  • cnv 5
  • sentieon 5
  • quality 5
  • mapping 5
  • bisulfite sequencing 5
  • biscuit 5
  • aligner 5
  • dedup 5
  • duplicates 5
  • mem 5
  • clipping 5
  • split 4
  • ancient DNA 4
  • long-read 4
  • QC 4
  • stats 4
  • base quality score recalibration 4
  • umi 4
  • bismark 4
  • short-read 4
  • deduplication 4
  • fgbio 4
  • ccs 4
  • hmmcopy 4
  • metagenomics 3
  • structural variants 3
  • quality control 3
  • contamination 3
  • binning 3
  • trimming 3
  • bedtools 3
  • gvcf 3
  • bedGraph 3
  • cnvkit 3
  • 3-letter genome 3
  • riboseq 3
  • counts 3
  • view 3
  • pypgx 3
  • STR 3
  • family 3
  • umitools 3
  • chromosome 3
  • ancestry 3
  • bamtools 3
  • pileup 3
  • informative sites 3
  • kinship 3
  • identity 3
  • relatedness 3
  • amplicon sequencing 3
  • indel 3
  • insert 3
  • fingerprint 3
  • variant 2
  • clustering 2
  • imputation 2
  • rnaseq 2
  • mags 2
  • sv 2
  • matrix 2
  • cluster 2
  • histogram 2
  • transcriptome 2
  • mappability 2
  • damage 2
  • genotyping 2
  • population genetics 2
  • low frequency variant calling 2
  • json 2
  • merging 2
  • de novo assembly 2
  • mpileup 2
  • coptr 2
  • ptr 2
  • preprocessing 2
  • HiFi 2
  • paf 2
  • interval_list 2
  • chunk 2
  • clean 2
  • add 2
  • UMI 2
  • rsem 2
  • angsd 2
  • RNA-seq 2
  • spark 2
  • replace 2
  • bwameth 2
  • aln 2
  • gatk4spark 2
  • regions 2
  • sequenzautils 2
  • mapcounter 2
  • read-group 2
  • bam2fq 2
  • collate 2
  • dict 2
  • mudskipper 2
  • transcriptomic 2
  • fixmate 2
  • polyA_tail 2
  • refine 2
  • primer 2
  • Pharmacogenetics 2
  • UMIs 2
  • duplex 2
  • unaligned 2
  • realignment 2
  • BAM 2
  • assembly 1
  • gff 1
  • bacteria 1
  • classification 1
  • nanopore 1
  • classify 1
  • taxonomic profiling 1
  • conversion 1
  • count 1
  • contigs 1
  • illumina 1
  • compression 1
  • indexing 1
  • serotype 1
  • antimicrobial resistance 1
  • expression 1
  • pairs 1
  • plot 1
  • aDNA 1
  • neural network 1
  • haplotype 1
  • archaeogenomics 1
  • low-coverage 1
  • machine learning 1
  • phasing 1
  • bcf 1
  • palaeogenomics 1
  • sequence 1
  • LAST 1
  • genotype 1
  • glimpse 1
  • peaks 1
  • vsearch 1
  • mitochondria 1
  • splicing 1
  • extract 1
  • reads 1
  • pangenome 1
  • snp 1
  • profile 1
  • detection 1
  • deamination 1
  • MAF 1
  • visualization 1
  • microbiome 1
  • gridss 1
  • cat 1
  • fragment 1
  • ont 1
  • ngscheckmate 1
  • matching 1
  • rna 1
  • haplotypecaller 1
  • compress 1
  • miscoding lesions 1
  • palaeogenetics 1
  • archaeogenetics 1
  • bin 1
  • bigwig 1
  • quantification 1
  • SV 1
  • telomere 1
  • fai 1
  • resistance 1
  • sample 1
  • uLTRA 1
  • host 1
  • minimap2 1
  • typing 1
  • long_read 1
  • fusion 1
  • subsample 1
  • arriba 1
  • dictionary 1
  • mapper 1
  • mlst 1
  • repeat expansion 1
  • hi-c 1
  • chip-seq 1
  • PCA 1
  • atac-seq 1
  • converter 1
  • ancient dna 1
  • lift 1
  • leviosam2 1
  • GPU-accelerated 1
  • import 1
  • orf 1
  • salmon 1
  • barcode 1
  • soft-clipped clusters 1
  • expansionhunterdenovo 1
  • repeat_expansions 1
  • pharmacogenetics 1
  • reheader 1
  • eigenstrat 1
  • graft 1
  • trim 1
  • scatter 1
  • bayesian 1
  • short reads 1
  • estimation 1
  • splice 1
  • heatmap 1
  • xenograft 1
  • artic 1
  • aggregate 1
  • demultiplexed reads 1
  • gatk 1
  • RNA-Seq 1
  • mapad 1
  • adna 1
  • c to t 1
  • unmarkduplicates 1
  • junction 1
  • copy-number 1
  • wham 1
  • whamg 1
  • bgen 1
  • readwriter 1
  • md 1
  • nm 1
  • uq 1
  • snv 1
  • downsample 1
  • downsample bam 1
  • subsample bam 1
  • Mycobacterium tuberculosis 1
  • umicollapse 1
  • chromosomal rearrangements 1
  • bedcov 1
  • verifybamid 1
  • DNA contamination estimation 1
  • xml 1
  • svg 1
  • multi-tool 1
  • predict 1
  • haplotag 1
  • genotype likelihood 1
  • probabilistic realignment 1
  • tag 1
  • bwamem2 1
  • Pacbio 1
  • bwameme 1
  • cell_barcodes 1
  • realign 1
  • circular 1
  • size 1
  • cram-size 1
  • paraphase 1
  • rna velocity 1
  • 10x 1
  • rad 1
  • bam2fastx 1
  • bam2fastq 1
  • leafcutter 1
  • regtools 1
  • shift 1
  • ATACshift 1
  • ATACseq 1
  • telseq 1
  • elprep 1
  • quality_control 1
  • controlstatistics 1
  • elfasta 1
  • collectreadcounts 1
  • calibratedragstrmodel 1
  • targets 1
  • gangstr 1
  • consensus sequence 1
  • groupreads 1
  • duplexumi 1
  • unmapped 1
  • ubam 1
  • zipperbams 1
  • revert 1
  • printreads 1
  • mergebamalignment 1
  • split by chromosome 1
  • trimBam 1
  • bamUtil 1
  • bamtools/split 1
  • yaml 1
  • bamtools/convert 1
  • mouse 1
  • genomecov 1
  • bamtobed 1
  • allele counts 1
  • doCounts 1
  • HLA 1
  • read group 1
  • post mortem damage 1
  • atlas 1
  • paired-end 1
  • pcr duplicates 1
  • track 1
  • corrrelation 1
  • cumulative coverage 1
  • scatterplot 1
  • subcontigs 1
  • sorted 1
  • cmseq 1
  • protein coding genes 1
  • polymorphic sites 1
  • polymorphic 1
  • polymut 1
  • duplicate removal 1
  • chromap 1
  • contact 1
  • pmdtools 1
  • bamstat 1
  • sortvcf 1
  • picard/renamesampleinvcf 1
  • pcr 1
  • mate-pair 1
  • hybrid-selection 1
  • bam2seqz 1
  • gc_wiggle 1
  • freqsum 1
  • pseudodiploid 1
  • pseudohaploid 1
  • random draw 1
  • CRAM 1
  • SMN2 1
  • SMN1 1
  • rtg 1
  • multimapper 1
  • calmd 1
  • ampliconclip 1
  • amplicon 1
  • duplicate marking 1
  • sambamba 1
  • flagstat 1
  • Ancestor 1
  • insert size 1
  • LCA 1
  • faidx 1
  • repair 1
  • paired 1
  • readgroup 1
  • read pairs 1
  • collapsing 1
  • adapter removal 1
  • qualities 1
  • damage patterns 1
  • NGS 1
  • DNA damage 1
  • readcounter 1
  • gccounter 1
  • jasmine 1
  • jasminesv 1
  • tumor/normal 1
  • gender 1
  • ngm 1
  • paragraph 1
  • subreads 1
  • pbmerge 1
  • pbbam 1
  • NextGenMap 1
  • mbias 1
  • methylation bias 1
  • mitochondrial to nuclear ratio 1
  • ratio 1
  • mtnucratio 1
  • mosdepth 1

Calculates base frequency statistics across reference positions from BAM.

depth_sample depth_global qs pos counts icounts versions

angsd:

ANGSD: Analysis of next generation Sequencing Data
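
To make "base frequency statistics across reference positions" concrete, here is a minimal Python sketch. It is not ANGSD's implementation: the `(start, sequence)` read representation and the function name `base_frequencies` are assumptions made for illustration, and indels, clipping, and quality filters are ignored.

```python
from collections import Counter, defaultdict


def base_frequencies(reads):
    """Return per-position base frequencies.

    `reads` is a list of (start, sequence) pairs: a hypothetical, gap-free
    stand-in for BAM alignments (no indels, clipping, or quality filters).
    """
    counts = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1

    # Convert raw counts to frequencies by dividing by the depth at each position.
    freqs = {}
    for pos, ctr in counts.items():
        depth = sum(ctr.values())
        freqs[pos] = {base: n / depth for base, n in ctr.items()}
    return freqs


# Toy data: three reads overlapping reference positions 0 to 5.
toy_reads = [(0, "ACGT"), (1, "CGTA"), (2, "GGAC")]
freqs = base_frequencies(toy_reads)
```

In this toy input, position 3 is covered by three reads, two carrying T and one carrying G, so its frequencies are 2/3 and 1/3.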

Calculates genotype likelihoods from BAM files.

genotype_likelihood versions

angsd:

ANGSD: Analysis of next generation Sequencing Data

Extracts reads mapped to chromosome 6 and any HLA decoys or chromosome 6 alternates.

extracted_reads_fastq log intermediate_sam intermediate_bam intermediate_sorted_bam versions

arcashla:

arcasHLA performs high resolution genotyping for HLA class I and class II genes from RNA sequencing, supporting both paired and single-end samples.

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.

meta bam meta2 fasta meta3 gtf meta4 blacklist meta5 known_fusions meta6 structural_variants meta7 tags meta8 protein_domains

meta versions fusions fusions_fail

Run the alignment/variant-call/consensus logic of the artic pipeline

results bam bai bam_trimmed bai_trimmed bam_primertrimmed bai_primertrimmed fasta vcf tbi json versions

artic:

ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore

copy number profiles of tumour cells.

allelefreqs bafs cnvs logrs metrics png purityploidy segments versions

Generates a VCF file from a BAM file using various calling methods.

vcf versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Estimate the post-mortem damage patterns of DNA

empiric exponential counts table versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Splits single-end read groups by length and merges paired-end reads.

bam txt versions

atlas:

ATLAS, a suite of methods to accurately genotype and estimate genetic diversity

Conversion of PacBio BAM files into gzipped fastq files, including splitting of barcoded data

012

fastq versions

bam2fastx:

Converting and demultiplexing of PacBio BAM files into gzipped fasta and fastq files

removes unused references from header of sorted BAM/CRAM files.

01

bam versions

This module is used to clip primer sequences from your alignments.

0123

bam bai versions

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome. For instance, filtering out mouse reads from patient derived xenograft mouse models (PDX).

012

primary_filtered_bam contamination_bam versions

write your description here

01

json versions

bamstats:

A command line tool to compute mapping statistics from a BAM file

Tool for converting 10x BAMs produced by Cell Ranger, Space Ranger, Cell Ranger ATAC, Cell Ranger DNA, and Long Ranger back to FASTQ files that can be used as inputs to re-run analysis

01

fastq versions

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

data versions

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

bam versions

bamtools:

C++ API & command-line toolkit for working with BAM data

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

01

stats versions

bamtools:

C++ API & command-line toolkit for working with BAM data

trims the end of reads in a SAM/BAM file, changing read ends to ‘N’ and quality to ‘!’, or by soft clipping

0123

bam versions

bamutil:

Programs that perform operations on SAM/BAM files, all built into a single executable, bam.

Align short or PacBio reads to a reference genome using BBMap

010

bam log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Calculates per-scaffold or per-base coverage information from an unsorted sam or bam file.

01

covstats hist versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.

Convert BAM/GFF/GTF/GVF/PSL files to bed

01

bed versions

bedops:

High-performance genomic feature operations.

Converts a bam file to a bed12 file.

01

bed versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

computes both the depth and breadth of coverage of features in file B on the features in file A

0120

bed versions

bedtools:

A powerful toolset for genome arithmetic

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

012000

genomecov versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
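As a rough illustration of the histogram summary described for this module, the sketch below computes how many bases of a single sequence are covered at each depth. It is a simplified stand-in, not the bedtools implementation: the function name `depth_histogram`, the single-sequence input, and the 0-based half-open intervals are all assumptions made for the example.

```python
from collections import Counter


def depth_histogram(features, genome_length):
    """Count bases at each coverage depth on one sequence.

    features: (start, end) intervals, 0-based and half-open (assumed here),
    each with end <= genome_length.
    """
    # Difference array: +1 where a feature starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for start, end in features:
        diff[start] += 1
        diff[end] -= 1

    histogram = Counter()
    depth = 0
    for position in range(genome_length):
        depth += diff[position]  # running sum gives per-base depth
        histogram[depth] += 1
    return dict(histogram)


# Hypothetical toy data: two overlapping features on a 10 bp sequence.
hist = depth_histogram([(0, 4), (2, 6)], genome_length=10)
print(hist)
```

The same running depth could be emitted per position for a per-base report, or run-length encoded for a bedGraph-style summary.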

Locate and tag duplicate reads in a BAM file

01

bam metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Merge a list of sorted bam files

01

bam bam_index checksum versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Parallel sorting and duplicate marking

0101

bam bam_index cram metrics versions

biobambam:

biobambam is a set of tools for early stage alignment file processing.

Aligns single- or paired-end reads from bisulfite-converted libraries to a reference genome using Biscuit.

010101

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

A fast, compact one-liner to produce duplicate-marked, sorted, and indexed BAM files using Biscuit

010101

bam bai versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. By default, samblaster reads SAM input from stdin and writes SAM to stdout.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Summarize and/or filter reads based on bisulfite conversion rate

01010101

bam versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Summarizes read-level methylation (and optionally SNV) information from a Biscuit BAM file in a standard-compliant BED format.

0101010101

bed versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Computes cytosine methylation and callable SNV mutations, optionally in reference to a germline BAM to call somatic variants

012340101

vcf versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Perform basic quality control on a BAM file generated with Biscuit

010101

reports versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Performs alignment of BS-Seq reads using bismark

010101

bam report unmapped versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Relates methylation calls back to genomic cytosine contexts.

010101

coverage report summary versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Removes alignments to the same position in the genome from the Bismark mapping output.

01

bam report versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Extracts methylation information for individual cytosines from alignments.

0101

bedgraph methylation_calls coverage report mbias versions

bismark:

Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Align reads to a reference genome using bowtie

01010

bam log fastq versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

01010100

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Performs fastq alignment to a fasta reference using BWA

0101010

bam cram csi crai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

sam bam cram crai csi versions

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA-MEME

010101000

sam bam cram crai csi versions

bwameme:

Faster BWA-MEM2 using learned-index

Performs alignment of BS-Seq reads using bwameth

010101

bam versions

bwameth:

Fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome.

Performs preprocessing and alignment of chromatin fastq files to fasta reference files using chromap.

0101010000

bed bam tagAlign pairs versions

chromap:

Fast alignment and preprocessing of chromatin profiles

Realign reads mapped with BWA to elongated reference genome

01010101

bam versions

circularmapper:

A method to improve mappings on circular genomes such as Mitochondria.

Calculates polymorphic site rates over protein coding genes

01234

polymut versions

cmseq:

Set of utilities on sequences and BAM files

Copy number variant detection from high-throughput sequencing data

012010101010

bed cnn cnr cns pdf png versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number

012

cns versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
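To make the idea of deriving an integer copy number from a segment's log2 ratio concrete, here is a minimal sketch. It assumes a pure sample and a diploid reference and simply rounds `ploidy * 2**log2`; this is an illustrative simplification, not necessarily the method or defaults that CNVkit applies, and the function name is hypothetical.

```python
def log2_to_copy_number(log2_ratio, ploidy=2):
    """Convert a segment log2 ratio to an integer copy number.

    Assumes 100% tumour purity and a reference at `ploidy` copies
    (both are assumptions of this sketch).
    """
    copies = ploidy * 2 ** log2_ratio
    return max(0, round(copies))


# Hypothetical segment log2 ratios.
segments = {"seg_neutral": 0.0, "seg_loss": -1.0, "seg_gain": 0.585, "seg_amp": 1.0}
calls = {name: log2_to_copy_number(value) for name, value in segments.items()}
print(calls)
```

Real samples usually need purity and sex-chromosome corrections before rounding, which this sketch leaves out.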

Copy number variant detection from high-throughput sequencing data

012

tsv cnn versions

cnvkit:

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Generate the input coverage table for CONCOCT using a BEDFile

0123

tsv versions

concoct:

Clustering cONtigs with COverage and ComposiTion

Maps the reads to the reference database

0101

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Merge reads that were mapped to multiple indices

01

bam versions

coptr:

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reads.

Controllable lossy compression of BAM/CRAM files

0100

bam cram sam bed versions

Generates a FASTA file of chromosome sizes and a fasta index file

01

sizes fai gzi versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

DeDup is a tool for read deduplication in paired-end read merging (e.g. for ancient DNA experiments).

01

bam json hist log versions

DeepSomatic is an extension of deep learning-based variant caller DeepVariant that takes aligned reads (in BAM or CRAM format) from tumor and normal data, produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports somatic variants in a standard VCF or gVCF file.

0123401010101

vcf vcf_tbi gvcf gvcf_tbi versions

This tool filters alignments in a BAM/CRAM file according to the specified parameters.

012

bam logs versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.

01200

bigwig bedgraph versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Computes read coverage for genomic regions (bins) across the entire genome.

0123

matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Visualises sample correlations using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

0100

pdf matrix versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

plots cumulative reads coverages by BAM file

012

pdf matrix metrics versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Generates principal component analysis (PCA) plot using a compressed matrix generated by multibamsummary or multibigwigsummary as input.

01

pdf tab versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Performs fastq alignment to a reference using DRAGMAP

0101010

sam bam cram crai csi log versions

dragmap:

Dragmap is the Dragen mapper/aligner Open Source Software.

Convert a file in FASTA format to the ELFASTA format

01

elfasta log versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

012345601010100000

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Merge split bam/sam chunks in one file

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Split bam file into manageable chunks

01

bam versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Estimate repeat sizes using NGS data

012010101

vcf json bam versions

Compute genome-wide STR profile

0120101

locus_tsv motif_tsv str_profile versions

expansionhunterdenovo:

ExpansionHunter Denovo (EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

0100

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

Collects a suite of metrics to QC duplex sequencing data.

010

family_sizes duplex_family_sizes duplex_yield_metrics umi_counts duplex_qc duplex_umi_counts versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

r-ggplot2:

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

Using the fgbio tools, converts FASTQ files sequenced into unaligned BAM or CRAM files possibly moving the UMI barcode into the RX field of the reads

01

bam cram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

0101000

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. (!) Note: the MQ tag is required on reads with mapped mates (!) This can be added using samblaster with the optional argument --addMateTags.

010

bam histogram versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs
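The grouping described for this module (templates keyed by the 5’ positions of their reads, then sub-grouped by UMI) can be sketched as follows. This is a simplified illustration that uses exact UMI matching only and ignores strand, mapping quality and UMI error correction; the tuple layout and function name are assumptions, not fgbio's data model.

```python
from collections import defaultdict


def group_by_position_and_umi(templates):
    """Assign a group id to each template.

    templates: iterable of (name, r1_start, r2_start, umi) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # First level: templates sharing the same pair of 5' positions.
    by_position = defaultdict(list)
    for name, r1_start, r2_start, umi in templates:
        by_position[(r1_start, r2_start)].append((name, umi))

    assignments = {}
    next_id = 0
    # Visit positions from earliest to latest, then split by exact UMI.
    for position in sorted(by_position):
        umi_to_id = {}
        for name, umi in by_position[position]:
            if umi not in umi_to_id:
                umi_to_id[umi] = next_id
                next_id += 1
            assignments[name] = umi_to_id[umi]
    return assignments


# Hypothetical templates: t1 and t2 share positions and UMI.
templates = [
    ("t1", 100, 300, "ACGT"),
    ("t2", 100, 300, "ACGT"),
    ("t3", 100, 300, "TTTT"),
    ("t4", 50, 220, "ACGT"),
]
groups = group_by_position_and_umi(templates)
print(groups)
```

Here `t1` and `t2` end up in the same group, while `t3` (different UMI) and `t4` (different positions) each get their own.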

Sorts a SAM or BAM file. Several sort orders are available, including coordinate, queryname, random, and randomquery.

01

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

FGBIO tool to zip together an unmapped and mapped BAM to transfer metadata into the output BAM

01010101

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

GangSTR is a tool for genome-wide profiling tandem repeats from short reads.

012300

vcf samplestats versions

Performs local realignment around indels to correct for mapping errors

012301010101

bam versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Generates a list of locations that should be considered for local realignment prior to genotyping.

01201010101

intervals versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

SNP and Indel variant caller on a per-locus basis

01201010101010101

vcf versions

gatk:

The full Genome Analysis Toolkit (GATK) framework, license restricted.

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

meta input input_index bqsr_table intervals fasta fai dict

meta versions bam cram

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

estimates the parameters for the DRAGstr model

0120000

dragstr_model versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.

0123010101

hdf5 tsv versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
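The counting rule stated for this module (a read contributes to an interval when its start lies inside it) can be illustrated with a short sketch. It works on plain integers for a single contig and treats intervals as closed, 1-based ranges; those conventions and the function name are assumptions for the example, and it does not read BAM files or apply any read filters.

```python
from bisect import bisect_left, bisect_right


def count_read_starts(read_starts, intervals):
    """Count read starts inside each closed interval [start, end]."""
    starts = sorted(read_starts)
    return [
        # Number of sorted starts s with start <= s <= end.
        bisect_right(starts, end) - bisect_left(starts, start)
        for start, end in intervals
    ]


# Hypothetical read start positions and three adjacent intervals.
starts = [5, 10, 10, 99, 100, 101, 250]
intervals = [(1, 100), (101, 200), (201, 300)]
counts = count_read_starts(starts, intervals)
print(counts)
```

Because only the start position is tested, a read that begins at position 100 and extends into the next interval is counted once, in the first interval.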

Converts FastQ file to SAM/BAM format

01

bam versions

gatk4:

Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Call germline SNPs and indels via local re-assembly of haplotypes

012340101010101

vcf tbi bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

0100

cram bam crai bai metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

meta bam fasta fai dict

meta versions output bam_index

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge unmapped with mapped BAM files

0120101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Print reads in the SAM/BAM/CRAM file

012010101

bam cram sam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reverts SAM or BAM files to a previous state.

01

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Converts BAM/SAM file to FastQ format

01

fastq versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splits reads that contain Ns in their cigar string

0123010101

bam versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and unmarks the marked duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01

bam bai versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply base quality score recalibration (BQSR) to a bam file

01234000

bam cram versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

01000

output bam_index metrics versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Performs fastq alignment to a fasta reference using gem3-mapper

01010

bam versions

gem3:

The GEM indexer (v3).

Tool for imputation and phasing from vcf file or directly from bam files.

0123456789012

phased_variants stats_coverage versions

glimpse2:

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.

Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.

01201

output ped bed bed_index roc html png versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Quickly generate evenly sized (by amount of data) regions across a number of bam/cram files

01010

bed versions

goleft:

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

Tools for population-scale genotyping using pangenome graphs.

01201010

vcf tbi versions

graphtyper:

A graph-based variant caller capable of genotyping population-scale short read data sets while incorporating previously discovered variants.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

01010101

vcf versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Align RNA-Seq reads to a reference with HISAT2

010101

bam summary fastq versions

hisat2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.

gcCounter function from HMMcopy utilities, used to generate GC content in non-overlapping windows from a fasta reference

01

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Perl script (generateMap.pl) that generates the mappability of a genome given a certain size of reads, for input to hmmcopy mapcounter. It takes a very long time on large genomes and is not parallelised at all.

01

bigwig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

mapCounter function from HMMcopy utilities, used to generate mappability in non-overlapping windows from a bigwig file

01

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

readCounter function from HMMcopy utilities, used to generate read counts in windows

012

wig versions

hmmcopy:

C++ based programs for analyzing BAM files and preparing read counts -- used with bioconductor-hmmcopy

Create a tag directory with the HOMER suite

010

tagdir taginfo versions

homer:

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

DESeq2:

Differential gene expression analysis based on the negative binomial distribution

edgeR:

Empirical Analysis of Digital Gene Expression Data in R

IsoSeq - Cluster - Cluster trimmed consensus sequences

01

bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi versions

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

010

bam pbi consensusreadset summary report versions

isoseq:

IsoSeq - Scalable De Novo Isoform Discovery

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

meta bam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Remove polyA tail and artificial concatemers

meta bam primers

meta bam pbi consensusreadset summary report versions

isoseq3:

IsoSeq3 - Scalable De Novo Isoform Discovery

Extract UMI and cell barcodes

010

bam pbi versions

isoseq3:

Iso-Seq - Scalable De Novo Isoform Discovery

Generate a consensus sequence from a BAM file using iVar

0100

fasta qual mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Trim primer sequences from a BAM file with iVar

0120

bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Call variants from a BAM file using iVar

010000

tsv mpileup versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Jointly Accurate Sv Merging with Intersample Network Edges

012301010

vcf versions

Extract BED file from hts files containing a dictionary (VCF, BAM, CRAM, DICT, etc.)

01

bed versions

jvarkit:

Java utilities for Bioinformatics.

Plot whole genome coverage from BAM/CRAM file as SVG

012010101

output versions

jvarkit:

Java utilities for Bioinformatics.

Converts MAF alignments to another format.

012010101

axt_gz bam blast_gz blasttab_gz chain_gz cram gff_gz html_gz psl_gz sam_gz tab_gz versions

last:

LAST finds & aligns related regions of sequences.

Bayesian reconstruction of ancient DNA fragments

01

bam fq_pass fq_fail unmerged_r1_fq_pass unmerged_r1_fq_fail unmerged_r2_fq_pass unmerged_r2_fq_fail log versions

Converting aligned short and long reads records from one reference to another

0101

bam versions

leviosam2:

Fast and accurate coordinate conversion between assemblies

lima - The PacBio Barcode Demultiplexer and Primer Remover

010

counts report summary versions bam pbi fasta fastagz fastq fastqgz xml json clips guess

Lofreq subcommand to insert base and indel alignment qualities

010

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Inserts indel qualities in a BAM file

0101

bam versions

lofreq:

Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its indelqual programme inserts indel qualities in a BAM file.

Lofreq subcommand to call low frequency variants from alignments when tumor-normal paired samples are available

0101

bam versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

0123450101

bam log versions

longphase:

LongPhase is an ultra-fast program for simultaneously co-phasing SNPs, small indels, large SVs, and (5mC) modifications for Nanopore and PacBio platforms.

Map short-reads to an indexed reference genome

01010000000

bam versions

mapad:

An aDNA aware short-read mapper

Computational framework for tracking and quantifying DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

010

runtime_log fragmisincorporation_plot length_plot misincorporation lgdistribution dnacomp stats_out_mcmc_hist stats_out_mcmc_iter stats_out_mcmc_trace stats_out_mcmc_iter_summ_stat stats_out_mcmc_post_pred stats_out_mcmc_correct_prob dnacomp_genome rescaled pctot_freq pgtoa_freq fasta folder versions

Depth computation per contig step of metabat2

012

depth versions

metabat2:

Metagenome binning

Metagenome binning of contigs

012

tooshort lowdepth unbinned membership fasta versions

metabat2:

Metagenome binning

MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

010

profile biom bt2out versions

metaphlan3:

Identify clades (phyla to species) present in the metagenome obtained from a microbiome sample and their relative abundance

Extracts per-base methylation metrics from alignments

01200

bedgraph methylkit versions

methyldackel:

Methylation caller from MethylDackel, a (mostly) universal methylation extractor for methyl-seq experiments.

Generates methylation bias plots from alignments

01200

txt versions

methyldackel:

Read position methylation bias tools from MethylDackel, a (mostly) universal extractor for methyl-seq experiments.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

01010000

paf bam index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Calculates genome-wide sequencing coverage.

012301

global_txt summary_txt regions_txt per_base_d4 per_base_bed per_base_csi regions_bed regions_csi quantized_bed quantized_csi thresholds_bed thresholds_csi versions

Taxonomic meta-omics profiling using universal marker genes

010

out bam mgc log versions

motus:

Marker gene-based OTU (mOTU) profiling

A small Java tool to calculate ratios between MT and nuclear sequencing reads in a given BAM file.

010

mtnucratio json versions

Convert genomic BAM/SAM files to transcriptomic BAM/RAD files.

01000

bam rad versions

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

Build and store a gtf index, which is useful for converting genomic BAM/SAM files to transcriptomic BAM/SAM files.

0

index versions

mudskipper:

mudskipper is a tool for converting genomic BAM/SAM files to transcriptomic BAM/RAD files.

AMR predictions for supported species

010

csv json versions

mykrobe:

Antibiotic resistance prediction in minutes

Compare multiple runs of long read sequencing data and alignments

01

report_html lengths_violin_html log_length_violin_html n50_html number_of_reads_html overlay_histogram_html overlay_histogram_normalized_html overlay_log_histogram_html overlay_log_histogram_normalized_html total_throughput_html quals_violin_html overlay_histogram_identity_html overlay_histogram_phredscore_html percent_identity_violin_html active_pores_over_time_html cumulative_yield_plot_gigabases_html sequencing_speed_over_time_html stats_txt versions

Performs fastq alignment to a reference using NARFMAP

0101010

bam log versions

narfmap:

narfmap is a fork of the Dragen mapper/aligner Open Source Software.

Performs fastq alignment to a fasta reference using NextGenMap

010

bam versions

bwa:

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime

Determines the gender of a sample from the BAM/CRAM file.

01201010

tsv versions

ngsbits:

Short-read sequencing tools

Determines whether sequencing data come from the same individual by using SNP matching. Designed for human data in VCF or BAM files.

010101

corr_matrix matched all pdf vcf versions

ngscheckmate:

NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.

Calls CNVs in bam files from tumor patients

0123400

png profile summary versions

A program to convert bam into paf.

01

paf versions

paftools:

A program to manipulate paf files / convert to and from paf.

Split a .pairsam file into .pairs and .sam.

01

pairs bam versions

pairtools:

CLI tools to process mapped Hi-C data

NVIDIA Clara Parabricks GPU-accelerated application of Base Quality Score Recalibration (BQSR).

0101010101

bam bai versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.

01010101010

bam bai cram crai bqsr_table qc_metrics duplicate_metrics versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

NVIDIA Clara Parabricks GPU-accelerated fast, accurate algorithm for mapping methylated DNA sequence reads to a reference genome, performing local alignment, and producing alignments for different parts of the query sequence

0101010

bam bai qc_metrics bqsr_table duplicate_metrics versions

parabricks:

NVIDIA Clara Parabricks GPU-accelerated genomics tools

Determines the depth in a BAM/CRAM file

0120101

depth binned_depth versions

paragraph:

Graph realignment tools for structural variants

HiFi-based caller for highly homologous genes

0120101

json bam bai vcf vcf_index versions

The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.

01

bam pbi versions

pbbam:

PacBio BAM C++ library

PacBio ccs - Generate Highly Accurate Single-Molecule Consensus Reads

01200

bam pbi report_txt report_json metrics versions

Alignment with PacBio's minimap2 frontend

0101

bam versions

pbmm2:

A minimap2 frontend for PacBio native data formats

Converts PacBio BAM files to fastq.gz using PacBioToolKit (pbtk) bam2fastq

012

fastq versions

pbtk:

pbtk - PacBio BAM toolkit

Minimalistic tool which creates an index file that enables random access into PacBio BAM files

01

pbi versions

pbtk:

pbtk - PacBio BAM toolkit

Per-base metrics on BAM/CRAM files.

012012

tsv versions

Assigns all the reads in a file to a single new read-group

010101

bam bai cram versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Cleans the provided BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collects hybrid-selection (HS) metrics for a SAM or BAM file.

01234010101

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about the insert size distribution of a paired-end library.

01

metrics histogram versions

picard:

Java tools for working with NGS data in the BAM format

Collect multiple metrics from a BAM file

0120101

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics from a RNAseq BAM file

01000

metrics pdf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

01201010

metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Checks that all data in the set of input files appear to come from the same individual

01234501

crosscheck_metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Computes/Extracts the fingerprint genotype likelihoods from the supplied file. It is given as a list of PLs at the fingerprinting sites.

0120000

vcf tbi versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Converts a FASTQ file to an unaligned BAM or SAM file.

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

0120

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Verify mate-pair information between mates and fix if needed

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Locate and tag duplicate reads in a BAM file

010101

bam bai cram metrics versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Merges multiple BAM files into a single file

01

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Samples a SAM/BAM/CRAM file using flowcell position information for the best approximation of having sequenced fewer reads

012

bam bai num_reads versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Changes the name of the sample in the VCF file

01

vcf versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases

010101

intervals versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

This tool takes in a coordinate-sorted SAM or BAM and calculates the NM, MD, and UQ tags by comparing with the reference.

0101

bam bai versions

picard:

Java tools for working with NGS data in the BAM format

Sorts BAM/SAM files based on a variety of picard specific criteria

010

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Sorts vcf files

010101

vcf versions

picard:

Java tools for working with NGS data in the BAM/CRAM/SAM and VCF format

pmdtools command to filter ancient DNA molecules from others

01200

bam versions

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Run all Portcullis steps in one go

010101

log pass_junctions_bed pass_junctions_tab intron_gff exon_gff spliced_bam spliced_bai versions

portcullis:

Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.

Converts SAM/BAM/CRAM/pairs files into a genome contact map

01012

pretext versions

Compute summary statistics for control gene from BAM files.

01200

control_stats versions

pypgx:

A Python package for pharmacogenomics research

Call SNVs/indels from BAM files for all target genes.

0120100

vcf tbi versions

pypgx:

A Python package for pharmacogenomics research

Prepare a depth of coverage file for all target genes with SV from BAM files.

01200

coverage versions

pypgx:

A Python package for pharmacogenomics research

Evaluate alignment data

010

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Evaluate alignment data

012000

results versions

qualimap:

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

Extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format.

012

junc versions

regtools:

RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.

Quality control of riboseq bam data

012012012010101

predictions all transprofile versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Quality control of riboseq bam data

01201

distribution pdf offset versions

ribotish:

Ribo TIS Hunter (Ribo-TISH) identifies translation activities using ribosome profiling data.

Accurate detection of short and long active ORFs using Ribo-seq data

01201

protocol bam_summary read_length_dist metagene_profile_5p metagene_profile_3p metagene_plots psite_offsets pos_wig neg_wig orfs versions

ribotricer:

Python package to detect translating ORF from Ribo-seq data

Calculate expression with RSEM

010

counts_gene counts_transcript stat logs versions bam_star bam_genome bam_transcript

rsem:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Generate statistics from a bam file

01

txt versions

rseqc:

RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.

Converts the contents of sequence data files (FASTA/FASTQ/SAM/BAM) into the RTG Sequence Data File (SDF) format.

0123

sdf versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Calling lowest common ancestors from multi-mapped reads in SAM/BAM/CRAM files

0120

csv json bam versions

sam2lca:

Lowest Common Ancestor on SAM/BAM/CRAM alignment files

Outputs some statistics drawn from read flags.

01

stats versions

sambamba:

Tools for working with SAM/BAM data

Finds and marks duplicate reads in a BAM file

01

bam bai versions

sambamba:

process your BAM data faster!

This module combines samtools and samblaster in order to use samblaster's ability to filter or tag SAM files, with the advantage of keeping both input and output in BAM format. Samblaster input must contain a sequence header; for this reason the input is piped in through the "samtools view -h" command. Additional arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions
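The piping described for the samtools and samblaster module above can be sketched as a dry-run shell helper. This is a minimal sketch, not the module's actual script: the function name `samblaster_pipeline` and its positional parameters are hypothetical, and the two option strings only stand in for what the description calls options.args2 and options.args3. It prints the pipeline instead of running it, so neither tool needs to be installed to try it.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical name): prints the BAM -> samblaster -> BAM
# pipeline rather than executing it.
samblaster_pipeline() {
    local in_bam="$1" out_bam="$2"
    local args2="${3:-}"   # extra samtools options for reading the input BAM
    local args3="${4:-}"   # extra samtools options for writing the output BAM
    # "samtools view -h" emits the header that samblaster requires on its input;
    # the final "samtools view -b" converts samblaster's SAM output back to BAM.
    printf 'samtools view -h %s %s | samblaster | samtools view -b %s -o %s -\n' \
        "$args2" "$in_bam" "$args3" "$out_bam"
}

# Example: use two extra threads on both samtools steps.
samblaster_pipeline input.bam output.bam "-@ 2" "-@ 2"
```

Printing the command first makes it easy to check where each set of extra options lands before running the pipeline on real data.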

Clips read alignments where they match BED file defined regions

01000

bam stats rejects_bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format

010

reads versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

reports coverage over regions in a supplied BED file

012010101

coverage versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

calculates MD and NM tags

0101

bam versions

samtoolscalmd:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Concatenate BAM or CRAM file

01

bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

shuffles and groups reads together by their names

0101

bam cram sam versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format

01010

fastq fastq_interleaved fastq_other fastq_singleton versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Produces a consensus FASTA/FASTQ/PILEUP

01

fasta fastq pileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

convert and then index CRAM -> BAM or BAM -> CRAM file

0120101

bam cram bai crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

produces a histogram or table of coverage per chromosome

0120101

coverage versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

List CRAM Content-ID and Data-Series sizes

01

size versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Computes the depth at each position or region.

0101

tsv versions

samtools:

Tools for dealing with SAM, BAM and CRAM files; samtools depth computes the read depth at each position or region

Create a sequence dictionary file from a FASTA file

01

dict versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index FASTA file, and optionally generate a file of chromosome sizes

01010

fa fai sizes gzi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Converts a SAM/BAM/CRAM file to FASTA

010

fasta interleaved singleton other versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Converts a SAM/BAM/CRAM file to FASTQ

010

fastq interleaved singleton other versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Samtools fixmate is a tool that can fill in information (insert size, cigar, mapq) about paired end reads onto the corresponding other read. Also has options to remove secondary/unmapped alignments and recalculate whether reads are proper pairs.

01

bam cram sam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type

012

flagstat versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

01

readgroup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Reports alignment summary statistics for a BAM/CRAM/SAM file

012

idxstats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

converts FASTQ files to unmapped SAM/BAM/CRAM

01

sam bam cram versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Index SAM/BAM/CRAM file

01

bai csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

mark duplicate alignments in a coordinate sorted file

0101

bam cram sam versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Merge BAM or CRAM file

010101

bam cram csi crai versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

BAM

0120

mpileup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Replace the header in the bam file with the header generated by the command. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.

01

bam versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Collate/Fixmate/Sort/Markdup SAM/BAM/CRAM file

0101

bam cram csi crai metrics versions

samtools_cat:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_collate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_fixmate:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_sort:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

samtools_markdup:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Sort SAM/BAM/CRAM file

0101

bam cram crai csi versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Produces comprehensive statistics from SAM/BAM/CRAM file

01201

stats versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

0120100

bam cram sam bai csi crai unselected unselected_index versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

The cluster_identifier tool of Scramble identifies soft clipped clusters

0120

clusters versions

scramble:

Soft Clipped Read Alignment Mapper

Performs fastq alignment to a fasta reference using Sentieon's BWA MEM

01010101

bam_and_bai versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects multiple quality metrics from a bam file

01201010

mq_metrics qd_metrics gc_summary gc_metrics aln_metrics is_metrics mq_plot qd_plot is_plot gc_plot versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Runs the sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup which in turn marks or removes duplicate reads.

0120101

cram crai bam bai score metrics metrics_multiqc_tsv versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Merges BAM files and/or converts them into CRAM files. Also outputs the result of applying Base Quality Score Recalibration to a file.

0120101

output index output_index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Collects whole genome quality metrics from a bam file

012010101

wgs_metrics versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Sequence quality metrics for FASTQ and uBAM files.

01

json html versions

PileupCaller is a tool to create genotype calls from bam files using read-sampling methods

0100

eigenstrat plink freqsum versions

sequencetools:

Tools for population genetics on sequencing data

Sequenza-utils bam2seqz processes BAM and Wiggle files to produce a seqz file

01200

seqz versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The bam2seqz program processes a paired set of BAM/pileup files (tumour and matching normal), together with genome-wide GC-content information, to extract the common positions with A and B allele frequencies.

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

01

wig versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file formats, such as FASTA and BAM, into input files for the Sequenza R package. The gc_wiggle program takes a FASTA file as input, computes the GC percentage across the sequences and returns a file in the UCSC wiggle format.

Tool to call the copy number of full-length SMN1, full-length SMN2, as well as SMN2Δ7–8 (SMN2 with a deletion of Exon7-8) from a whole-genome sequencing (WGS) BAM file.

012

smncopynumber run_metrics versions

Performs fastq alignment to a fasta reference using SNAP

0101

bam bai versions

snapaligner:

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

Rapid haploid variant calling

010

tab csv html vcf bed gff bam bai log aligned_fa consensus_fa consensus_subs_fa raw_vcf filt_vcf vcf_gz vcf_csi txt versions

snippy:

Rapid bacterial SNP calling and core genome alignments

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

01012

tsv html versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

012010101

extract versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

0120

html pairs_tsv samples_tsv versions

somalier:

Somalier can extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF or from jointly-called VCFs

split one ubam into multiple, per line, fast

01

bam versions

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.

012

gene_results fullgene_results mlst_results pileup sorted_bam versions

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Advanced sequence file format conversions

01000

cram gzi versions

scramble:

Staden Package 'io_lib' (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Align reads to a reference genome using STAR

010101000

log_final log_out log_progress versions bam bam_sorted bam_sorted_aligned bam_transcript bam_unsorted fastq tab spl_junc_tab read_per_gene_tab junction sam wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

0123456789100120

input rdata plots vcf bgen versions

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data

01230101

json gt_vcf bam versions

svtyper:

Compute genotype of structural variants based on breakpoint depth

A tool for tagging BAM files.

01

bam versions

A tool to detect resistance and lineages of M. tuberculosis genomes

01

bam csv json txt vcf versions

tbprofiler:

Profiling tool for Mycobacterium tuberculosis to detect drug resistance and lineage from WGS data

Telseq: a software for calculating telomere length

012010101

output versions

samtools:

Tools for dealing with SAM, BAM and CRAM files

Computes the coverage of different regions from the bam file.

0101

cov wig versions

tiddit:

TIDDIT - structural variant calling.

Tandem repeat genotyping from PacBio HiFi data

0123010101

vcf bam versions

trgt:

Tandem repeat genotyping and visualization from PacBio HiFi data

uLTRA aligner - A wrapper around minimap2 to improve small exon detection - Map reads on genome

01001

bam versions

ultra:

Splice aligner of long transcriptomic reads to genome.

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

bam fastq log versions

Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.

bam log tsv_edit_distance tsv_per_umi tsv_umi_per_position versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
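The idea of deduplicating by mapping coordinate and UMI can be illustrated with a small sketch. This is not how UMI-tools is implemented: it only collapses reads whose UMIs match exactly, whereas UMI-tools also offers network-based methods that tolerate sequencing errors in the UMI. The `Read` record, its fields, and the choice to keep the read with the highest mapping quality are assumptions made for illustration.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a pysam or UMI-tools type)."""
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str
    mapq: int


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, UMI) group.

    Within a group, the read with the highest mapping quality is kept;
    ties keep the read seen first.
    """
    best_by_key = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        best = best_by_key.get(key)
        if best is None or read.mapq > best.mapq:
            best_by_key[key] = read
    return list(best_by_key.values())


reads = [
    Read("r1", "chr1", 100, "+", "ACGT", 30),
    Read("r2", "chr1", 100, "+", "ACGT", 60),  # same position and UMI as r1
    Read("r3", "chr1", 100, "+", "TTGA", 60),  # same position, different UMI
    Read("r4", "chr1", 250, "+", "ACGT", 20),  # same UMI, different position
]
kept = dedup_exact(reads)
print([r.name for r in kept])
```

Reads r1 and r2 share both position and UMI, so only one of them survives; r3 and r4 differ in UMI or position and are kept as distinct molecules.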

Group reads based on their UMI and mapping coordinates

log bam tsv versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

Make the output from umi_tools dedup or group compatible with RSEM

bam log versions

umi_tools:

UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes

The Java port of the VarDict variant caller

vcf versions

Filtering, downsampling and profiling alignments in BAM/CRAM formats

bam versions

Velocyto is a library for the analysis of RNA velocity. The velocyto.py CLI uses Path(resolve_path=True), which breaks Nextflow's symbolic-link logic. If velocyto finds a file named EXACTLY cellsorted_[ORIGINAL_BAM_NAME] in the work directory, it skips the samtools sort step. The cell-sorted BAM file should be produced with:

    samtools sort -t CB -O BAM -o cellsorted_input.bam input.bam

See the module test for an example with the SAMTOOLS_SORT nf-core module. Example config to cell-sort the input BAM using SAMTOOLS_SORT:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "cellsorted_${bam.baseName}" }
        ext.args = '-t CB -O BAM'
    }

An optional mask must be passed through ext.args with the --mask option. Because velocyto resolves symbolic links, two BAM files (cell-sorted and original) need to be staged in the work directory. See also the velocyto tutorial.
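The naming rule for the cell-sorted file can be checked with a short Python sketch. It only derives the expected file name and assembles the samtools command quoted above as a string; the helper names `cellsorted_path` and `sort_command` are hypothetical and are not part of velocyto or nf-core.

```python
import shlex
from pathlib import PurePosixPath


def cellsorted_path(bam: str) -> str:
    """Return the path velocyto looks for: cellsorted_<original BAM name>."""
    path = PurePosixPath(bam)
    return str(path.with_name(f"cellsorted_{path.name}"))


def sort_command(bam: str) -> str:
    """Build the samtools command that sorts a BAM by the CB (cell barcode) tag."""
    args = ["samtools", "sort", "-t", "CB", "-O", "BAM",
            "-o", cellsorted_path(bam), bam]
    return " ".join(shlex.quote(arg) for arg in args)


print(cellsorted_path("input.bam"))
print(sort_command("input.bam"))
```

The prefix is applied to the file name only, so a BAM in a subdirectory keeps its directory and gains the `cellsorted_` prefix on the base name.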

loom versions

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

log selfsm depthsm selfrg depthrg bestsm bestrg versions

verifybamid:

verifyBamID is software that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high-quality sequence reads and reliable downstream analysis.

log ud bed mu self_sm ancestry versions

verifybamid2:

A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
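A single-pass, greedy centroid-based clustering of the kind mentioned above can be sketched as follows. This is an illustration of the general technique, not VSEARCH's implementation: VSEARCH computes identity from its own pairwise alignments and orders sequences by length or abundance before clustering, whereas this sketch takes sequences in the order given and uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity. The function names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.8):
    """Assign each sequence to the first centroid it matches, in one pass.

    A sequence that matches no existing centroid at or above the
    threshold becomes the centroid of a new cluster.
    Returns a list of (centroid, members) tuples.
    """
    clusters = []
    for seq in seqs:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No centroid was similar enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
for centroid, members in greedy_cluster(seqs):
    print(centroid, members)
```

Because each sequence is compared only against existing centroids and never revisited, the result depends on input order, which is why real tools sort the input first.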

The wham suite consists of two programs, wham and whamg. wham, the original tool, is a very sensitive method with a high false discovery rate. The second program, whamg, is more accurate and better suited for general structural variant (SV) discovery.

vcf tbi graph versions

Convert and filter aligned reads to .npz

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes

Align reads to a reference genome using YARA

bam bai versions

yara:

Yara is an exact tool for aligning DNA sequencing reads to reference genomes.
