Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

The script reads a GFF annotation file and creates two output files: one contains the gene models whose ORF passes the test, and the other contains the rest. By default the test is "> 100", meaning that all gene models with an ORF longer than 100 amino acids pass the test.

passed_gff failed_gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
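
The ORF-length test described for this module can be illustrated with a minimal Python sketch. It is not AGAT's implementation: the function name split_by_orf_length, the min_aa parameter, and the input dictionary of precomputed ORF lengths are all hypothetical, and the sketch assumes the default test is a strict "greater than" comparison as the description states.

```python
from typing import Dict, List, Tuple


def split_by_orf_length(
    orf_lengths: Dict[str, int], min_aa: int = 100
) -> Tuple[List[str], List[str]]:
    """Partition gene-model IDs by ORF length in amino acids.

    A model passes only if its ORF is strictly longer than min_aa,
    mirroring the default "> 100" test. Input order is preserved.
    """
    passed: List[str] = []
    failed: List[str] = []
    for gene_id, length in orf_lengths.items():
        if length > min_aa:
            passed.append(gene_id)
        else:
            failed.append(gene_id)
    return passed, failed


# Hypothetical ORF lengths (amino acids) for three gene models.
example = {"geneA": 250, "geneB": 100, "geneC": 42}
passed, failed = split_by_orf_length(example)
print(passed)  # ['geneA']
print(failed)  # ['geneB', 'geneC']
```

Note that a model with an ORF of exactly 100 amino acids lands in the "failed" group, because the test is strict.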

The script removes features based on a kill list. By default it looks at each feature's ID: if the ID (case-insensitive) is listed in the kill list, the feature is removed. Warning: removing a level1 or level2 feature automatically removes all linked subfeatures, and removing all children of a feature automatically removes that feature too.

gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
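
The cascading behaviour of the kill list (subfeatures follow a removed parent, and a parent with no surviving children is removed too) can be sketched in Python as follows. This is an illustration under stated assumptions, not AGAT's code: the function apply_kill_list and the child-to-parent dictionary are hypothetical stand-ins for a parsed GFF feature hierarchy.

```python
from typing import Dict, Iterable, List, Optional, Set


def apply_kill_list(
    parents: Dict[str, Optional[str]], kill_list: Iterable[str]
) -> Set[str]:
    """Return the IDs of the features that survive the kill list.

    parents maps each feature ID to its parent ID (None for top level).
    IDs are matched case-insensitively against the kill list.
    """
    wanted = {entry.lower() for entry in kill_list}
    removed = {fid for fid in parents if fid.lower() in wanted}

    # Build the reverse mapping: parent ID -> list of child IDs.
    children: Dict[str, List[str]] = {}
    for fid, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(fid)

    # Repeat until no further feature is removed.
    changed = True
    while changed:
        changed = False
        for fid, parent in parents.items():
            if fid in removed:
                continue
            parent_gone = parent is not None and parent in removed
            kids = children.get(fid, [])
            all_kids_gone = bool(kids) and all(k in removed for k in kids)
            if parent_gone or all_kids_gone:
                removed.add(fid)
                changed = True
    return set(parents) - removed


# Hypothetical hierarchy: gene -> mRNA -> exon.
features = {
    "gene1": None, "mrna1": "gene1", "exon1": "mrna1",
    "gene2": None, "mrna2": "gene2", "exon2": "mrna2",
    "mrna3": "gene2", "exon3": "mrna3",
}
print(sorted(apply_kill_list(features, ["GENE1", "exon2"])))
# ['exon3', 'gene2', 'mrna3']
```

In the example, killing gene1 takes its mRNA and exon with it, while killing exon2 removes mrna2 (its only child is gone) but leaves gene2, which still has mrna3.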

Filters GFF records to keep only the longest isoform per gene

gff versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.
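
Keeping only the longest isoform per gene amounts to a per-group maximum. The Python sketch below shows the idea on a hypothetical list of (gene ID, transcript ID, length) tuples. How length is measured (for example CDS length versus full transcript length) and how ties are broken are assumptions of this sketch, not statements about the module.

```python
from typing import Dict, List, Tuple


def longest_isoform_per_gene(
    isoforms: List[Tuple[str, str, int]]
) -> Dict[str, str]:
    """Map each gene ID to the transcript ID of its longest isoform.

    isoforms holds (gene_id, transcript_id, length) tuples.
    On a tie, the isoform seen first is kept (an assumption).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, transcript_id, length in isoforms:
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (transcript_id, length)
    return {gene_id: tx for gene_id, (tx, _) in best.items()}


# Hypothetical isoforms for two genes.
example = [
    ("gene1", "tx1.1", 900),
    ("gene1", "tx1.2", 1500),
    ("gene2", "tx2.1", 300),
]
print(longest_isoform_per_gene(example))
# {'gene1': 'tx1.2', 'gene2': 'tx2.1'}
```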

Generate tables of feature metadata from GTF files

feature_annotation filtered_cdna versions

atlasgeneannotationmanipulation:

Scripts for manipulating gene annotation

Bamcmp (Bam Compare) is a tool for assigning reads between a primary genome and a contamination genome, for instance to filter out mouse reads from patient-derived xenograft (PDX) mouse models.

primary_filtered_bam contamination_bam versions

Filter out sequences by sequence header name(s)

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.
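
Filtering sequences out by header name can be sketched in plain Python for FASTA input. This does not call or reproduce BBMap: parse_fasta and drop_by_name are hypothetical helpers, and matching on the first whitespace-delimited token of the header is an assumption made for the example.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_by_name(
    records: Iterable[Tuple[str, str]], names: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yield only the records whose name is NOT in names.

    The name is taken as the first whitespace-delimited token of the
    header (an assumption of this sketch).
    """
    for header, sequence in records:
        tokens = header.split()
        name = tokens[0] if tokens else ""
        if name not in names:
            yield header, sequence


# Hypothetical two-record FASTA input.
fasta = [">seq1 sample A\n", "ACGT\n", "AC\n", ">seq2\n", "GG\n"]
print(list(drop_by_name(parse_fasta(fasta), {"seq1"})))
# [('seq2', 'GG')]
```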

This command replaces the former bcftools view caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the -c option.

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Filters VCF files

012

vcf tbi csi versions

filter:

Apply fixed-threshold filters to VCF files.

Sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

0120000

vcf tbi csi versions

bcftools:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

bcftools plugin setGT:

Bcftools plugins are tools that can be used with bcftools to manipulate variant calls in Variant Call Format (VCF) and BCF. The setGT plugin sets genotypes according to the specified criteria and filtering expressions. For example, missing genotypes can be set to ref, but much more than that.

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

012000

vcf tbi csi versions

view:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Summarize and/or filter reads based on bisulfite conversion rate

01010101

bam versions

biscuit:

A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data

Module to use CellBender to estimate ambient RNA from single-cell RNA-seq data

01

h5 filtered_h5 posterior_h5 barcodes metrics report pdf log checkpoint versions

cellbender:

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.

Module to build a filtered GTF needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkgtf command.

0

gtf versions

cellranger:

Cell Ranger by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Module to build a filtered gtf needed by the 10x Genomics Cell Ranger Arc tool. Uses the cellranger-arc mkgtf command.

0

gtf versions

cellrangerarc:

Cell Ranger Arc by 10x Genomics is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.

Filter and trim long read data.

010

fastq versions

zcat:

zcat uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

gzip:

Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77).

Builds a classic bloom filter COBS index

01

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
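
For readers unfamiliar with the data structure behind this index, here is a minimal sketch of a classic Bloom filter over k-mers. It is a generic textbook illustration, not the COBS implementation or its file format; the bit-array size, the number of hash functions and the use of seeded SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive `num_hashes` bit positions by hashing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Yield all overlapping k-mers of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


bloom = BloomFilter()
for kmer in kmers("ACGTACGTGA", 4):
    bloom.add(kmer)
print("ACGT" in bloom)  # True
```

A Bloom filter never reports a false negative, but it can report false positives, which is why a positive answer only means the k-mer is possibly present.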

Builds a compact bloom filter COBS index

01

index versions

cobs:

Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)

Filters a differential expression table based on logFC and adjusted p-value thresholds

01012012

filtered versions

pandas:

Python library for data manipulation and analysis
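
The filtering step described in this entry can be sketched in plain Python (used here instead of pandas so the example has no dependencies). The column names `log2FoldChange` and `padj`, the default thresholds, the use of the absolute fold change and the inclusive comparisons are all assumptions; the module's actual parameters may differ.

```python
def filter_de_table(rows, min_abs_logfc=1.0, max_padj=0.05):
    """Keep rows with |logFC| >= min_abs_logfc and padj <= max_padj.

    `rows` is a list of dictionaries, one per gene. Column names and
    thresholds are hypothetical.
    """
    return [
        row for row in rows
        if abs(row["log2FoldChange"]) >= min_abs_logfc
        and row["padj"] <= max_padj
    ]


# Hypothetical differential expression results.
rows = [
    {"gene_id": "g1", "log2FoldChange": 2.5, "padj": 0.001},
    {"gene_id": "g2", "log2FoldChange": -1.8, "padj": 0.03},
    {"gene_id": "g3", "log2FoldChange": 0.4, "padj": 0.0001},
    {"gene_id": "g4", "log2FoldChange": 3.0, "padj": 0.2},
]
print([row["gene_id"] for row in filter_de_table(rows)])  # ['g1', 'g2']
```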

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file

0101

gtf versions

gtffilter:

Filter a gtf file to keep only regions that are located on a chromosome represented in a given fasta file
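
A minimal sketch of the idea: collect the sequence names from the FASTA headers, then keep only GTF records whose first column matches one of them. This is an illustration, not the module's script; it works on lists of lines rather than files, and keeping comment lines is an assumption.

```python
def fasta_sequence_names(fasta_lines):
    """Return the set of sequence names (first word of each '>' header)."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def filter_gtf_lines(gtf_lines, seq_names):
    """Keep comment lines and records whose seqname is in `seq_names`."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in seq_names:
            kept.append(line)
    return kept


# Hypothetical inputs, for illustration only.
fasta_lines = [">chr1 example description", "ACGT", ">chr2", "GGCC"]
gtf_lines = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'chrM\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g2";',
    'chr2\tsrc\tgene\t1\t100\t.\t-\t.\tgene_id "g3";',
]
names = fasta_sequence_names(fasta_lines)
print(len(filter_gtf_lines(gtf_lines, names)))  # 2
```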

filter a matrix based on a minimum value and numbers of samples that must pass.

0101

filtered tests session_info versions

matrixfilter:

filter a matrix based on a minimum value and numbers of samples
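
The rule in this entry (a minimum value that must be reached in a minimum number of samples) can be sketched as below. The parameter names, the defaults and the inclusive comparison are assumptions, and the matrix is a plain dictionary rather than the file format the module reads.

```python
def filter_matrix(matrix, minimum_value=1.0, minimum_samples=2):
    """Keep features reaching `minimum_value` in at least `minimum_samples` samples.

    `matrix` maps a feature name to its list of per-sample values.
    """
    return {
        feature: values
        for feature, values in matrix.items()
        if sum(value >= minimum_value for value in values) >= minimum_samples
    }


# Hypothetical abundance matrix with three samples per feature.
matrix = {"geneA": [0, 5, 7], "geneB": [0, 0, 3], "geneC": [2, 2, 2]}
print(list(filter_matrix(matrix, minimum_value=2, minimum_samples=2)))
# ['geneA', 'geneC']
```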

This tool filters alignments in a BAM/CRAM file according to the specified parameters.

012

bam logs versions

deeptools:

A set of user-friendly tools for normalization and visualization of deep-sequencing data

Filter features in gzipped BED format

01

bed versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

Filter features in gzipped GFF3 format

01

gff3 versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

SV callers like lumpy look at split-reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example we will be skeptical of deletion calls that do not have lower than average coverage compared to regions with similar gc-content.

01234500

vcf versions

Filter, sort and markdup sam/bam files, with optional BQSR and variant calling.

012345601010100000

bam logs metrics recall gvcf table activity_profile assembly_regions versions

elprep:

elPrep is a high-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines. It can be used as a drop-in replacement for SAMtools/Picard/GATK4.

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

010

outseq versions

emboss:

The European Molecular Biology Open Software Suite

Filter variants based on Ensembl Variant Effect Predictor (VEP) annotations.

010

output versions

ensemblvep:

VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Uses evigene/scripts/prot/tr2aacds.pl to filter a transcript assembly

01

dropset okayset versions

evigene:

EvidentialGene is a genome informatics project for "Evidence Directed Gene Construction for Eukaryotes", for constructing high quality, accurate gene sets for animals and plants (any eukaryotes), being developed by Don Gilbert at Indiana University, gilbertd at indiana edu.

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

Uses FGBIO FilterConsensusReads to filter consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

0101000

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Filtlong filters long reads based on quality measures or short read data.

012

reads log versions

Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.

012345000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.

012

contamination segmentation versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply a Convolutional Neural Net to filter annotated variants

0123400000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters intervals based on annotations and/or count statistics.

010101

interval_list versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.

01234567010101

vcf tbi stats versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Apply tranche filtering

012300000

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Filter variants

01201010101

vcf tbi versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module, because the Gaussian mixture model requires a large number of samples to produce optimal results (for example, 30 samples for exome data). For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

012000000

recal idx tranches plots versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Extract fields from a VCF file to a tab-delimited table

012345010101

table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Validate, filter, convert and perform various other operations on GFF files

010

gtf gffread_gff gffread_fasta versions

runs a functional enrichment analysis with gprofiler2

010101

all_enrich rds plot_png plot_html sub_enrich sub_plot filtered_gmt session_info versions

gprofiler2:

An R interface corresponding to the 2019 update of g:Profiler web tool.

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

0101

high_conf_sv all_sv versions

gridss:

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

010100

summary tree markers msa user_msa filtered failed log warnings versions

gtdbtk:

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy GTDB.

Convert VCF to a user friendly table

012301

output versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Filtering VCF with dynamically-compiled java expressions

01230101010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

annotate VCF files for poly repeats

01010101

vcf tbi csi versions

jvarkit:

Java utilities for Bioinformatics.

bcftools:

View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF

Removes low abundance k-mers from FASTA/FASTQ files

01

trimmed versions

khmer:

khmer k-mer counting library

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

00

report kmers versions

khmer:

khmer k-mer counting library

Lofreq subcommand to remove variants with low coverage or strand bias potential

01

vcf versions

lofreq:

A fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data

Compare k-mer frequency in reads and assembly to devise the metrics K and QV

0101000

hist log_stderr versions

merfin:

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (quality value of Merqury) for each scaffold/contig and genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess of their expected multiplicity predicted from the reads.

A python workflow that assembles mitogenomes from Pacbio HiFi reads

010000

fasta stats gb gff all_potential_contigs contigs_annotations contigs_circularization contigs_filtering coverage_mapping coverage_plot final_mitogenome_annotation final_mitogenome_choice final_mitogenome_coverage potential_contigs reads_mapping_and_assembly shared_genes versions

mitohifi.py:

A python workflow that assembles mitogenomes from Pacbio HiFi reads

pre-filtering and calculating position-specific summary statistics using the Markov substitution model

0123401

txt versions

MuSE:

Somatic point mutation caller based on Markov substitution model for molecular evolution

Filtering and trimming of Oxford Nanopore Sequencing data

010

filtreads log_file versions

Nanoq implements ultra-fast read filters and summary reports for high-throughput nanopore reads.

010

stats reads versions

Filters peptide/protein identification results by different criteria.

01

mzml featurexml consensusxml versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Filters peptide/protein identification results by different criteria.

012

filtered versions

openms:

OpenMS is an open-source software C++ library for LC-MS data management and analyses

Select pairs according to given condition by options.args

01

selected unselected versions

pairtools:

CLI tools to process mapped Hi-C data

Filters SAM/BAM files to include/exclude either aligned/unaligned reads or based on a read list

0120

bam versions

picard:

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

pmdtools command to filter ancient DNA molecules from others

01200

bam versions

pmdtools:

Compute postmortem damage patterns and decontaminate ancient genomes

Run all Portcullis steps in one go

010101

log pass_junctions_bed pass_junctions_tab intron_gff exon_gff spliced_bam spliced_bai versions

portcullis:

Portcullis is a tool that filters out invalid splice junctions from RNA-seq alignment data. It accepts BAM files from various RNA-seq mappers, analyzes splice junctions and removes likely false positives, outputting filtered results in multiple formats for downstream analysis.

Filter reads by quality score.

01

reads logs versions log_tab

presto:

A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
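
Filtering reads by quality score typically means comparing a per-read summary of Phred scores against a cutoff. The sketch below uses the mean Phred score with the common offset of 33 and a cutoff of 20; it is not pRESTO's implementation, and the summary statistic, the offset and the threshold are all assumptions.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def filter_reads_by_quality(reads, min_mean_quality=20):
    """Keep the IDs of reads whose mean Phred score reaches the cutoff.

    `reads` is a list of (read_id, sequence, quality_string) tuples.
    """
    return [
        read_id
        for read_id, _sequence, quality in reads
        if mean_phred(quality) >= min_mean_quality
    ]


# Hypothetical reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "!!!!")]
print(filter_reads_by_quality(reads))  # ['r1']
```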

PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program. It can be used to filter, reformat or trim genomic and metagenomic sequence data

01

good_reads single_reads bad_reads log versions

Perform differential proportionality analysis

0123012

results_genewise genewise_plot rdata results_pairwise results_pairwise_filtered adjacency fdr session_info versions

propr:

Logratio methods for omics data

Damage parameter estimation for ancient DNA

012

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Damage parameter estimation for ancient DNA

01

csv versions

pydamage:

Damage parameter estimation for ancient DNA

Converts a PED file to VCF headers

01

output versions

rtgtools:

RealTimeGenomics Tools -- Utilities for accurate VCF comparison and manipulation

Uses the RTN R package for transcriptional regulatory network inference (TNI).

01

tni tni_perm tni_bootstrap tni_filtered versions

rtn:

RTN: Reconstruction of Transcriptional regulatory Networks and analysis of regulons

This module combines samtools and samblaster in order to use samblaster's capability to filter or tag SAM files, with the advantage of maintaining both input and output in BAM format. Samblaster input must contain a sequence header: for this reason it has been piped with the "samtools view -h" command. Additional desired arguments for samtools can be passed using options.args2 for the input BAM file and options.args3 for the output BAM file.

01

bam versions

filter/convert SAM/BAM/CRAM file

01

readgroup versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

filter/convert SAM/BAM/CRAM file

0120100

bam cram sam bai csi crai unselected unselected_index versions

samtools:

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.

Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm

0123450101

vcf tbi versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

modifies the input VCF file by adding the MLrejected FILTER to the variants

012010101

vcf index versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Filters the raw output of sentieon/tnhaplotyper2.

01234560101

vcf vcf_tbi stats versions

sentieon:

Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.

Select sequences from a large file based on name/ID

010

filter versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Subset FASTA/FASTQ files to some number of sequences

012

subset versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx log versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Common transformation operations on FASTA or FASTQ files.

01

fastx versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

010

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Annotate a VCF file with another VCF file

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

The dbNSFP is an integrated database of functional predictions from multiple algorithms

012012

vcf versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Splits/Joins VCF(s) file into chromosomes

01

out_vcfs versions

snpsift:

SnpSift is a toolbox that allows you to filter and manipulate annotated files

Local sequence alignment tool for filtering, mapping and clustering.

010101

reads log index versions

SortMeRNA:

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Module to build a filtered GTF needed by the 10x Genomics Space Ranger tool. Uses the spaceranger mkgtf command.

0

gtf versions

spaceranger:

Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on total mRNA. Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images. Space Ranger allows users to map the whole transcriptome in formalin fixed paraffin embedded (FFPE) and fresh frozen tissues to discover novel insights into normal development, disease pathology, and clinical translational research. Space Ranger provides pipelines for end to end analysis of Visium Spatial Gene Expression experiments.

Converts a bedpe file to a VCF file (beta version)

01

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Filter a vcf file based on size and/or regions to ignore

0120000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Compare or merge VCF files to generate a consensus or multi sample VCF files.

01000000

vcf versions

survivor:

Toolset for SV simulation, comparison and filtering

Simulate an SV VCF file based on a reference genome

01010100

parameters vcf bed fasta insertions versions

survivor:

Toolset for SV simulation, comparison and filtering

Report multiple stats over a VCF file

01000

stats versions

survivor:

Toolset for SV simulation, comparison and filtering

SvABA is an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements

01234010101010101

sv indel germ_indel germ_sv som_indel som_sv unfiltered_sv unfiltered_indel unfiltered_germ_indel unfiltered_germ_sv unfiltered_som_indel unfiltered_som_sv raw_calls discordants log versions

Performs tests on BAF files

01234

metrics versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Count the instances of each SVTYPE observed in each sample in a VCF.

01

counts versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert an RdTest-formatted bed to the standard VCF format.

0120

vcf tbi versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Convert SV calls to a standardized format.

0101

vcf versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Converts VCFs containing structural variants to BED format

012

bed versions

svtk:

Utilities for consolidating, filtering, resolving, and annotating structural variants.

Filtering, downsampling and profiling alignments in BAM/CRAM formats

01

bam versions

Command line tools for parsing and manipulating VCF files.

012

vcf versions

vcflib:

Command line tools for parsing and manipulating VCF files.

A set of tools written in Perl and C++ for working with VCF files

0100

vcf bcf frq frq_count idepth ldepth ldepth_mean gdepth hap_ld geno_ld geno_chisq list_hap_ld list_geno_ld interchrom_hap_ld interchrom_geno_ld tstv tstv_summary tstv_count tstv_qual filter_summary sites_pi windowed_pi weir_fst heterozygosity hwe tajima_d freq_burden lroh relatedness relatedness2 lqual missing_individual missing_site snp_density kept_sites removed_sites singeltons indel_hist hapcount mendel format info genotypes_matrix genotypes_matrix_individual genotypes_matrix_position impute_hap impute_hap_legend impute_hap_indv ldhat_sites ldhat_locs beagle_gl beagle_pl ped map_ tped tfam diff_sites_in_files diff_indv_in_files diff_sites diff_indv diff_discd_matrix diff_switch_error versions

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Convert and filter aligned reads to .npz

0120101

npz versions

wisecondorx:

WIthin-SamplE COpy Number aberration DetectOR, including sex chromosomes
