Available Modules

Modules are the building blocks of all DSL2 nf-core pipelines. If you would like to write your own module, you can find more information on the nf-core website.

  • fasta 54
  • fastq 27
  • genomics 25
  • metagenomics 21
  • alignment 21
  • reference 21
  • genome 16
  • index 15
  • align 14
  • assembly 11
  • classification 11
  • sequences 10
  • classify 9
  • bam 8
  • map 8
  • LAST 8
  • MSA 7
  • build 7
  • database 6
  • taxonomy 6
  • phylogeny 6
  • cluster 6
  • db 6
  • vsearch 6
  • sort 5
  • filter 5
  • gff 5
  • taxonomic profiling 5
  • clustering 5
  • long reads 5
  • mags 5
  • bwa 5
  • kraken2 5
  • blast 5
  • kmers 5
  • MAF 5
  • bed 4
  • sam 4
  • k-mer 4
  • databases 4
  • taxonomic classification 4
  • searching 4
  • protein sequence 4
  • mmseqs2 4
  • sequence 4
  • msa 4
  • profiling 4
  • fastx 4
  • ganon 4
  • nucleotide 4
  • annotation 3
  • bacteria 3
  • indexing 3
  • seqkit 3
  • population genetics 3
  • report 3
  • short-read 3
  • multiple sequence alignment 3
  • phylogenetic placement 3
  • paf 3
  • amplicon sequences 3
  • windowmasker 3
  • bracken 3
  • aln 3
  • nanopore 2
  • download 2
  • contamination 2
  • gfa 2
  • convert 2
  • binning 2
  • single-cell 2
  • trimming 2
  • contigs 2
  • isoseq 2
  • protein 2
  • repeat 2
  • reads 2
  • mem 2
  • antibiotic resistance 2
  • interval 2
  • kallisto 2
  • abundance 2
  • bin 2
  • mask 2
  • vrhyme 2
  • seqtk 2
  • rna_structure 2
  • RNA 2
  • cellranger 2
  • immunoprofiling 2
  • amino acid 2
  • ragtag 2
  • reformatting 2
  • identifier 2
  • smrnaseq 2
  • dereplicate 2
  • gatk4 1
  • merge 1
  • coverage 1
  • quality control 1
  • split 1
  • proteomics 1
  • copy number 1
  • bedtools 1
  • graph 1
  • reporting 1
  • variation graph 1
  • consensus 1
  • illumina 1
  • table 1
  • QC 1
  • long-read 1
  • haplotype 1
  • plot 1
  • filtering 1
  • histogram 1
  • example 1
  • validation 1
  • metagenome 1
  • transcript 1
  • umi 1
  • feature 1
  • sketch 1
  • rna 1
  • plasmid 1
  • NCBI 1
  • profile 1
  • extract 1
  • mirna 1
  • duplicates 1
  • clipping 1
  • benchmark 1
  • microbiome 1
  • FASTQ 1
  • distance 1
  • adapters 1
  • query 1
  • fgbio 1
  • add 1
  • cut 1
  • dna 1
  • DNA sequence 1
  • sample 1
  • retrotransposon 1
  • containment 1
  • happy 1
  • HiFi 1
  • typing 1
  • polishing 1
  • scaffold 1
  • amplicon sequencing 1
  • mlst 1
  • pseudoalignment 1
  • replace 1
  • scaffolding 1
  • guide tree 1
  • kraken 1
  • eukaryotes 1
  • adapter trimming 1
  • remove 1
  • quality trimming 1
  • complement 1
  • minhash 1
  • immunoinformatics 1
  • mash 1
  • long terminal retrotransposon 1
  • vdj 1
  • trancriptome 1
  • tama 1
  • translation 1
  • primer 1
  • pair 1
  • screen 1
  • blastn 1
  • rename 1
  • sequenzautils 1
  • transformation 1
  • nucleotides 1
  • gstama 1
  • graft 1
  • removal 1
  • adapter 1
  • subset 1
  • rrna 1
  • metagenomic 1
  • UMIs 1
  • version 1
  • duplex 1
  • estimation 1
  • recombination 1
  • emboss 1
  • covariance models 1
  • trna 1
  • genome annotation 1
  • mobile genetic elements 1
  • vsearch/fastqfilter 1
  • fastqfilter 1
  • vsearch/dereplicate 1
  • immunology 1
  • BCR 1
  • vsearch/sort 1
  • eucaryotes 1
  • metagenome assembler 1
  • coding 1
  • cds 1
  • transcroder 1
  • patch 1
  • long-reads 1
  • extractunbinned 1
  • linkbins 1
  • integron 1
  • sintax 1
  • usearch 1
  • collapse 1
  • Pacbio 1
  • low-complexity 1
  • retrieval 1
  • CRISPRi 1
  • taxonomic composition 1
  • trimfq 1
  • masking 1
  • vector 1
  • standard 1
  • pdb 1
  • reverse complement 1
  • microRNA 1
  • parallel 1
  • uniques 1
  • transform 1
  • gaps 1
  • dragstr 1
  • composestrtablefile 1
  • antibiotic resistance genes 1
  • consensus sequence 1
  • ARGs 1
  • swissprot 1
  • random 1
  • generate 1
  • embl 1
  • gstama/polyacleanup 1
  • genbank 1
  • maskfasta 1
  • getfasta 1
  • genomecov 1
  • segment 1
  • mkvdjref 1
  • postprocessing 1
  • cmseq 1
  • protein coding genes 1
  • polymorphic sites 1
  • polymorphic 1
  • polymut 1
  • assembly curation 1
  • false duplications 1
  • duplicate purging 1
  • haplotype purging 1
  • porechop_abi 1
  • induce 1
  • gc_wiggle 1
  • seq 1
  • selection 1
  • header 1
  • interleave 1
  • sequence headers 1
  • grep 1
  • subseq 1
  • train 1
  • spliced 1
  • reduced 1
  • representations 1
  • mash/sketch 1
  • taxonomic assignment 1
  • estimate 1
  • reorder 1
  • HMMER 1
  • kallisto/index 1
  • quant 1
  • TCR 1

This script extracts sequences in FASTA format according to features described in a GFF file.

0100

fasta versions

agat:

Another Gff Analysis Toolkit (AGAT). Suite of tools to handle gene annotations in any GTF/GFF format.

Identify antimicrobial resistance in gene or protein sequences

010

report mutation_report versions tool_version db_version

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

Identify antimicrobial resistance in gene or protein sequences

NO input

db versions

amrfinderplus:

AMRFinderPlus finds antimicrobial resistance and other genes in protein or nucleotide sequences.

A module to translate BCR and TCR nucleotide sequences into amino acid sequences using amulety and igblast.

010

repertoire_translated versions

amulety:

Python package to create embeddings of BCR and TCR amino acid sequences.

igblast:

A tool for BLAST searches of immunoglobulin (IG, BCR) and T cell receptor (TCR) V domain sequences.

This module is used to clip primer sequences from your alignments.

0123

bam bai versions

Barrnap uses an HMMER profile to find rRNAs in reads or contig FASTA files.

012

gff versions

Filter out sequences by sequence header name(s)

01000

reads log versions

bbmap:

BBMap is a short read aligner, as well as various other bioinformatic tools.
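
To make the "filter out sequences by header name" operation concrete, here is a minimal Python sketch. It is not BBMap's implementation: the function name `filter_by_name` and its `include` parameter are hypothetical, and records are assumed to be already parsed into `(header, sequence)` pairs.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def filter_by_name(
    records: Iterable[Record],
    names: Iterable[str],
    include: bool = False,
) -> List[Record]:
    """Drop (or, with include=True, keep only) records whose name is listed.

    The name is taken to be the first whitespace-delimited token of the
    header; this is an assumption of the sketch, not a documented rule.
    """
    wanted = set(names)
    kept = []
    for header, seq in records:
        tokens = header.split()
        name = tokens[0] if tokens else ""
        # A record survives when its membership in `wanted` matches `include`.
        if (name in wanted) == include:
            kept.append((header, seq))
    return kept
```

With the default `include=False` the listed names are removed; passing `include=True` inverts the filter and keeps only the listed names.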

Computes histograms (default), per-base reports (-d) and BEDGRAPH (-bg) summaries of feature coverage (e.g., aligned sequences) for a given genome.

012000

genomecov versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
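
The coverage histogram described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how bedtools computes it: the helper names `per_base_depth` and `depth_histogram` are hypothetical, and intervals are assumed to be 0-based and half-open on a single sequence of known length.

```python
from collections import Counter
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple


def per_base_depth(length: int, intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the number of intervals covering each position 0..length-1."""
    # Difference array: +1 where an interval starts, -1 just past its end.
    diff = [0] * (length + 1)
    for start, end in intervals:
        start = max(0, start)
        end = min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the difference array gives the depth per position.
    return list(accumulate(diff[:-1]))


def depth_histogram(depths: Iterable[int]) -> Dict[int, int]:
    """Map each depth value to the number of positions having that depth."""
    return dict(sorted(Counter(depths).items()))
```

For example, two overlapping features `(0, 5)` and `(3, 8)` on a 10-base sequence leave two bases uncovered, six bases at depth 1, and two bases at depth 2.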

Extracts sequences from a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Masks sequences in a FASTA file based on intervals defined in a feature file.

010

fasta versions

bedtools:

A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.

Retrieve entries from a BLAST database

01201

fasta text versions

blast:

BLAST finds regions of similarity between biological sequences.

Queries a BLAST DNA database

0101

txt versions

blast:

BLAST finds regions of similarity between biological sequences.

Builds a BLAST database

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Downloads a BLAST database from NCBI

01

db versions

blast:

BLAST finds regions of similarity between biological sequences.

Align reads to a reference genome using bowtie

01010

bam log fastq versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create bowtie index for reference genome

01

index versions

bowtie:

bowtie is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Align reads to a reference genome using bowtie2

01010100

sam bam cram csi crai log fastq versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Builds bowtie2 index for reference genome

01

index versions

bowtie2:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Re-estimate taxonomic abundance of metagenomic samples analyzed by kraken.

010

reports txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Extends a Kraken2 database to be compatible with Bracken

01

db bracken_files versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Combine output of metagenomic samples analyzed by bracken.

01

txt versions

bracken:

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

Find SA coordinates of the input reads for bwa short-read mapping

0101

sai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA index for reference genome

01

index versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA

0101010

bam cram csi crai versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert paired-end bwa SA coordinate files to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Convert bwa SA coordinate file to SAM format

01201

bam versions

bwa:

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create BWA-mem2 index for reference genome

01

index versions

bwamem2:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Performs fastq alignment to a fasta reference using BWA-mem2

0101010

sam bam cram crai csi versions

bwa:

BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101

txt versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. MAGs / bins).

0101010101

orf2lca bin2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification of long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101

orf2lca contig2classification log diamond faa gff versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Taxonomic classification plus read-based abundance estimation from long DNA sequences and metagenome assembled genomes (e.g. contigs, MAGs / bins).

0101010101001010101010101

rat_log complete_abundance contig_abundance read2classification alignment_diamond contig2classification cat_log orf2lca faa gff unmapped_diamond unmapped_fasta unmapped2classification versions

catpack:

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Cluster protein sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Cluster nucleotide sequences using sequence similarity

01

fasta clusters versions

cdhit:

Clusters and compares protein or nucleotide sequences

Module to build the VDJ reference needed by the 10x Genomics Cell Ranger tool. Uses the cellranger mkvdjref command.

0000

reference versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Module to use Cell Ranger's pipelines to analyze sequencing data produced from Chromium Single Cell Immune Profiling.

010

outs versions

cellranger:

Cell Ranger processes data from 10X Genomics Chromium kits. cellranger vdj takes FASTQ files from cellranger mkfastq or bcl2fastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

Build centrifuge database for taxonomic profiling

010000

cf versions

centrifuge:

Classifier for metagenomic sequences

Classifies metagenomic sequence data

01000

report results sam fastq_mapped fastq_unmapped versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

Creates Kraken-style reports from centrifuge out files

010

kreport versions

centrifuge:

Centrifuge is a classifier for metagenomic sequences.

binning of metagenomic sequences

01

fasta bins fm index links result versions

Align sequences using Clustal Omega

010100000

alignment versions

clustalo:

Latest version of Clustal: a multiple sequence alignment program for DNA or proteins

pigz:

Parallel implementation of the gzip algorithm.

Calculates polymorphic site rates over protein coding genes

01234

polymut versions

cmseq:

Set of utilities on sequences and BAM files

Perform adapter/quality trimming on sequencing reads

01

reads log versions

cutadapt:

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

calculate clusters of highly similar sequences

01

tsv versions

diamond:

Accelerated BLAST compatible local sequence aligner

Export assembly segment sequences in GFA 1.0 format to FASTA format

01

fasta versions

dshbio:

Reads, features, variants, assemblies, alignments, genomic range trees, pangenome graphs, and a bunch of random command line tools for bioinformatics. LGPL version 3 or later.

The revseq program from EMBOSS reverse-complements a nucleotide sequence

01

revseq versions

emboss:

The European Molecular Biology Open Software Suite
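As an illustration of what reverse-complementing a nucleotide sequence means, here is a minimal Python sketch. It is not the EMBOSS implementation; the function name `reverse_complement` is hypothetical, and only A, C, G, T and N are handled (other IUPAC codes pass through unchanged, which is an assumption made for brevity).

```python
# Complement table for unambiguous DNA bases plus N.
# Any other character (e.g. further IUPAC codes) is left unchanged: an assumption of this sketch.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the resulting string.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))
```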

Reads in one or more sequences, converts, filters, or transforms them and writes them out again

010

outseq versions

emboss:

The European Molecular Biology Open Software Suite

phylogenetic placement of query sequences in a reference tree

012300

epang jplace log versions

epang:

Massively parallel phylogenetic placement of genetic sequences

splits an alignment into reference and query parts

012

query reference versions

epang:

Massively parallel phylogenetic placement of genetic sequences

Aligns sequences using FAMSA

01010

alignment versions

famsa:

Algorithm for large-scale multiple sequence alignments

Tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output.

010

log txt hmm hmm_genes orfs orfs_amino contigs contigs_pept filtered filtered_pept fragments trimmed spades metagenome tmp versions

A program that counts sequence occurrences in FASTQ files.

0101

count_matrix stats distribution_plot reads_plot reads_plot_percentage versions

2FAST2Q:

2FAST2Q is ideal for CRISPRi-Seq, and for extracting and counting any kind of information from reads in the fastq format, such as barcodes in Bar-seq experiments. 2FAST2Q can work with sequence mismatches, Phred-score, and be used to find and extract unknown sequences delimited by known sequences. 2FAST2Q can extract multiple features per read using either fixed positions or delimiting search sequences.

Build fastq screen config file from bowtie index files

00

database versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Align reads to multiple reference genomes using fastq-screen

010

txt png html fastq versions

fastqscreen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)

01

fasta versions

fastx:

A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing

Uses FGBIO CallDuplexConsensusReads to call duplex consensus sequences from reads generated from the same double-stranded source molecule.

0100

bam versions

fgbio:

A set of tools for working with genomic and high throughput sequencing data, including UMIs

Calls consensus sequences from reads with the same unique molecular tag.

0100

bam versions

fgbio:

Tools for working with genomic and high throughput sequencing data.

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina. While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome. This requires a seed (--seed) to be supplied in ext.args.

0

fastq versions

fq:

fq is a library to generate and validate FASTQ file pairs.

Build ganon database using custom reference sequences.

01000

db info versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Classify FASTQ files against ganon database

010

tre report one all unc log versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a ganon report file from the output of ganon classify

010

tre versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

Generate a multi-sample report file from the output of ganon report runs

01

txt versions

ganon:

ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently

assigns taxonomy to query sequences in phylogenetic placement output

012

examineassign profile labelled_tree per_query krona sativa versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

Grafts query sequences from phylogenetic placement on the reference tree

01

newick versions

gappa:

Genesis Applications for Phylogenetic Placement Analysis

This tool looks for low-complexity STR sequences along the reference that are later used to estimate the Dragstr model during single sample auto calibration CalibrateDragstrModel.

000

str_table versions

gatk4:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0100

sam versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0

index versions

graphmap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Helper script that removes remaining polyA sequences from full-length non-chimeric reads (PacBio IsoSeq3)

01

fasta report tails versions

gstama:

Gene-Switch Transcriptome Annotation by Modular Algorithms

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.

0

fasta gff vcf stats phylip embl_predicted embl_branch tree tree_labelled versions

Hap.py is a tool to compare diploid genotypes at haplotype level. Rather than comparing VCF records row by row, hap.py will generate and match alternate sequences in a superlocus. A superlocus is a small region of the genome (sized between 1 and around 1000 bp) that contains one or more variants.

012340101010101

summary_csv roc_all_csv roc_indel_locations_csv roc_indel_locations_pass_csv roc_snp_locations_csv roc_snp_locations_pass_csv extended_csv runinfo metrics_json vcf tbi versions

happy:

Haplotype VCF comparison tools

hmmalign from the HMMER suite aligns a number of sequences to an HMM profile

010

sto versions

hmmer:

Biosequence analysis using profile hidden Markov models

Detect integrons in DNA sequences

01

gbk integrons summary out versions

IsoSeq - Cluster - Cluster trimmed consensus sequences

01

bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi versions

isoseq:

IsoSeq - Cluster - Cluster trimmed consensus sequences

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

metabam

meta version bam pbi cluster cluster_report transcriptset hq_bam hq_pbi lq_bam lq_pbi singletons_bam singletons_pbi

isoseq3:

IsoSeq3 - Cluster - Cluster trimmed consensus sequences

Trim primer sequences from a BAM file with iVar

0120

bam log versions

ivar:

iVar - a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Aligns sequences using kalign

010

alignment versions

kalign:

Kalign is a fast and accurate multiple sequence alignment algorithm.

Create kallisto index

01

index versions

kallisto:

Quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Computes equivalence classes for reads and quantifies abundances

01010000

results json_info log versions

kallisto:

Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Generate k-mers (sketches) from FASTA/Q sequences

01

outdir info versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Construct KMCP database from k-mer files

01

kmcp log versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Merge search results from multiple databases.

01

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Generate taxonomic profile from search results

010

profile versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Search sequences against database

010

result versions

kmcp:

Accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

Adds fasta files to a Kraken2 taxonomic database

010000

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Builds Kraken2 database

010

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Downloads and builds Kraken2 standard database

0

db versions

kraken2:

Kraken2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Makes a dotplot (Oxford Grid) of pair-wise sequence alignments

0120100

gif png versions

last:

LAST finds & aligns related regions of sequences.

Aligns query sequences to target sequences indexed with lastdb

0120

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Prepare sequences for subsequent alignment with lastal.

01

index versions

last:

LAST finds & aligns related regions of sequences.

Converts MAF alignments into another format.

012010101

axt_gz bam blast_gz blasttab_gz chain_gz cram gff_gz html_gz psl_gz sam_gz tab_gz versions

last:

LAST finds & aligns related regions of sequences.

Reorder alignments in a MAF file

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Post-alignment masking

01

maf versions

last:

LAST finds & aligns related regions of sequences.

Find split or spliced alignments in a MAF file

01

maf multiqc versions

last:

LAST finds & aligns related regions of sequences.

Find suitable score parameters for sequence alignment

010

param_file multiqc versions

last:

LAST finds & aligns related regions of sequences.

Align sequences using learnMSA

01

alignment versions

learnmsa:

learnMSA: Learning and Aligning large Protein Families

LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (eg. ONT reads, PacBio reads, other draft genomes, etc). It is also used to scaffold contig pairs linked by ARCS/ARKS. This module is for LINKS >=2.0.0 and does not support MPET input.

0101

log pairing_distribution pairing_issues scaffolds_csv scaffolds_fasta bloom scaffolds_graph assembly_correspondence simplepair_checkpoint tigpair_checkpoint versions

Finds full-length LTR retrotransposons in genome sequences using the parallel version of LTR_Finder

01

scn gff versions

LTR_FINDER_parallel:

A Perl wrapper for LTR_FINDER

LTR_Finder:

An efficient program for finding full-length LTR retrotransposons in genome sequences

Multiple sequence alignment using MAFFT

0101010101010

fas versions

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

pigz:

Parallel implementation of the gzip algorithm.

Guide tree rendering using MAFFT

01

tree versions

mafft:

Multiple alignment program for amino acid or nucleotide sequences based on fast Fourier transform

Calculate Mash distances between reference and query sequences

010

dist versions

mash:

Fast sequence distance estimator that uses MinHash

Screens query sequences against large sequence databases

0101

screen versions

mash:

Fast sequence distance estimator that uses MinHash

Creates vastly reduced representations of sequences using MinHash

01

mash stats versions

mash:

Fast sequence distance estimator that uses MinHash
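The idea of a "vastly reduced representation" based on MinHash can be sketched in Python as a bottom-k sketch: hash every k-mer and keep only the smallest hash values. This is an illustration only, not Mash's implementation. SHA-1 is used here as a stand-in hash, k-mers are not canonicalized, and the names `sketch` and `jaccard_estimate` as well as the default parameters are hypothetical.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int = 5, size: int = 8) -> list:
    """Bottom-k MinHash sketch: the `size` smallest k-mer hash values, sorted."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a: list, b: list, size: int = 8) -> float:
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    # Take the `size` smallest hashes of the merged sketches and count
    # how many of them occur in both sketches.
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    s1 = sketch("ACGTACGTTGCAAGCT")
    s2 = sketch("ACGTACGTTGCAAGGT")
    print(jaccard_estimate(s1, s2))
```

Comparing sketches rather than full k-mer sets keeps memory use bounded by the sketch size, at the cost of an approximate similarity value.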

mdust from DFCI Gene Indices Software Tools for masking low-complexity DNA sequences

01

fasta versions

A tool to create consensus sequences and variant calls from nanopore sequencing data

012

assembly versions

Metagenome assembler for long-read sequences (HiFi and ONT).

010

contigs log versions

metamdbg:

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

01010000

paf bam index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

Provides fasta index required by minimap2 alignment.

01

index versions

minimap2:

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

A versatile pairwise aligner for genomic and spliced nucleotide sequences

0101

paf gff versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

Provides fasta index required by miniprot alignment.

01

index versions

miniprot:

A versatile pairwise aligner for genomic and protein sequences.

A tool for quality control and tracing taxonomic origins of microRNA sequencing data

0120

html json tsv all_fa rnatype_unknown_fa versions

mirtrace:

miRTrace is a new quality control and taxonomic tracing tool developed specifically for small RNA sequencing data (sRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). In addition to these routine quality control (QC) analyses, miRTrace can accurately and sensitively resolve taxonomic origins of small RNA-Seq data based on the composition of clade-specific miRNAs. This feature can be used to detect cross-clade contaminations in typical lab settings. It can also be applied for more specific applications in forensics, food quality control and clinical diagnosis, for instance tracing the origins of meat products or detecting parasitic microRNAs in host serum.

Cluster sequences using MMSeqs2 cluster.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Searches for the sequences of a fasta file in a database using MMseqs2

0101

tsv versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Cluster sequences in linear time using MMSeqs2 linclust.

01

db_cluster versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Search and calculate a score for similar sequences in a query and a target database.

0101

db_search versions

mmseqs:

MMseqs2: ultra fast and sensitive sequence search and clustering suite

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options are provided that give you the choice of optimizing accuracy, speed, or some compromise between the two

01

aligned_fasta phyi phys clustalw html msf tree log versions

Muscle is a program for creating multiple alignments of amino acid or nucleotide sequences. This particular module uses the super5 algorithm for very big alignments. It can permutate the guide tree according to a set of flags.

010

alignment versions

muscle -super5:

Muscle v5 is a major re-write of MUSCLE based on new algorithms.

pigz:

Parallel implementation of the gzip algorithm.

NCBI tool for detecting vector contamination in nucleic acid sequences. This tool is older than NCBI's FCS-adaptor, which is for the same purpose

0101

vecscreen_output versions

ncbitools:

"NCBI libraries for biology applications (text-based utilities)"

NUCmer is a pipeline for the alignment of multiple closely related nucleotide sequences.

012

delta coords versions

Paraclu finds clusters in data attached to sequences.

010

bed versions

Identify plasmids in bacterial sequences and assemblies

01

json txt tsv genome_seq plasmid_seq versions

Extension of Porechop whose purpose is to process adapter sequences in ONT reads.

010

reads log versions

Separates out sequences purged of falsely duplicated sequences.

012

haplotigs purged versions

purgedups:

Purge_dups is a package used to purge haplotigs and overlaps in an assembly based on read depth

Homology-based assembly patching: Make continuous joins and fill gaps in 'target.fa' using sequences from 'query.fa'

01010101

patch_fasta patch_agp patch_components_fasta assembly_alignments target_splits_agp target_splits_fasta qry_rename_agp qry_rename_fasta stderr versions

ragtag:

Fast reference-guided genome assembly scaffolding

Scaffolding is the process of ordering and orienting draft assembly (query) sequences into longer sequences. Gaps (stretches of "N" characters) are placed between adjacent query sequences to indicate the presence of unknown sequence. RagTag uses whole-genome alignments to a reference assembly to scaffold query sequences. RagTag does not alter input query sequence in any way and only orders and orients sequences, joining them with gaps.

010101012

corrected_assembly corrected_agp corrected_stats versions

ragtag:

Fast reference-guided genome assembly scaffolding

Screening DNA sequences for interspersed repeats and low complexity DNA sequences

010

masked out tbl gff versions

repeatmasker:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences

Seqcluster collapse reduces computational complexity by collapsing identical sequences in a FASTQ file.

01

fastq versions

seqcluster:

Small RNA analysis from NGS data. Seqcluster generates a list of clusters of small RNA sequences, their genome location, their annotation and the abundance in all the sample of the project.

Dereplicate FASTX sequences, removing duplicate sequences and printing the number of identical sequences in the sequence header. Can dereplicate already dereplicated FASTA files, summing the numbers found in the headers.

01

fasta versions

seqfu:

DNA sequence utilities for FASTX files
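The dereplication described above, including summing counts when the input was already dereplicated, can be sketched as follows. This is not seqfu's code. The `;size=N` header suffix is an assumption borrowed from the USEARCH/VSEARCH convention, and the function name `dereplicate` is hypothetical.

```python
import re

# Assumed header annotation: a trailing ";size=N" (optionally followed by ";").
SIZE_RE = re.compile(r";size=(\d+);?$")


def dereplicate(records):
    """Collapse identical sequences, keeping the first name seen.

    records: iterable of (name, sequence) pairs.
    Returns a list of (name;size=N, sequence) pairs in first-seen order.
    """
    counts = {}
    names = {}
    for name, seq in records:
        match = SIZE_RE.search(name)
        # A record without an annotation counts once; an annotated one counts N times.
        n = int(match.group(1)) if match else 1
        key = seq.upper()
        if key not in counts:
            counts[key] = 0
            names[key] = SIZE_RE.sub("", name)
        counts[key] += n
    return [(f"{names[key]};size={counts[key]}", key) for key in counts]


if __name__ == "__main__":
    reads = [("a", "ACGT"), ("b", "acgt"), ("c;size=3", "TTTT"), ("d", "TTTT")]
    for name, seq in dereplicate(reads):
        print(f">{name}\n{seq}")
```

Because existing `;size=` annotations are added rather than overwritten, running the function on its own output leaves the counts unchanged.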

Select sequences from a large file based on name/ID

010

filter versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Subset FASTA/FASTQ files to some number of sequences

012

subset versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Use seqkit to find/replace strings within sequences and sequence headers

01

fastx versions

seqkit:

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation, written by Wei Shen.

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx log versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...)

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Sorts sequences by id/name/sequence/length

01

fastx versions

seqkit:

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Generates a BED file containing genomic locations of lengths of N.

01

bed versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
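To make the "BED file of N runs" idea concrete, here is a minimal Python sketch that reports every stretch of N bases as a 0-based, half-open interval, which is the BED coordinate convention. It is not seqtk's implementation, the function name `n_runs_to_bed` is hypothetical, and any length thresholds the real tool applies are not modeled.

```python
import re


def n_runs_to_bed(name: str, seq: str) -> list:
    """Return (name, start, end) tuples for each run of N/n in seq.

    Coordinates are 0-based and half-open, as in the BED format.
    """
    return [(name, m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


if __name__ == "__main__":
    for chrom, start, end in n_runs_to_bed("chr1", "ACNNNGTnnA"):
        # BED columns are tab-separated: chrom, start, end.
        print(f"{chrom}\t{start}\t{end}")
```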

Interleave paired-end reads from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk mergepe command merges paired-end reads into one interleaved file.
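Interleaving means alternating records from the two mate files so that each read 1 is immediately followed by its read 2. The Python sketch below illustrates this on lists of lines. It is not how seqtk is implemented, it assumes strict four-line FASTQ records, and the function names are hypothetical.

```python
def fastq_records(lines):
    """Group FASTQ lines into 4-line records (header, sequence, '+', quality)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def interleave(r1_lines, r2_lines):
    """Return the lines of an interleaved FASTQ: R1 record, R2 record, R1 record, ..."""
    r1 = fastq_records(r1_lines)
    r2 = fastq_records(r2_lines)
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of records")
    out = []
    for first, second in zip(r1, r2):
        out.extend(first)
        out.extend(second)
    return out


if __name__ == "__main__":
    r1 = ["@read1/1", "ACGT", "+", "IIII"]
    r2 = ["@read1/2", "TGCA", "+", "JJJJ"]
    print("\n".join(interleave(r1, r2)))
```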

Rename sequence names in FASTQ or FASTA files.

01

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk rename command renames sequence names.

Subsample reads from FASTQ files

012

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. Seqtk sample command subsamples sequences.

Common transformation operations on FASTA or FASTQ files.

01

fastx versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The seqtk seq command enables common transformation operations on FASTA or FASTQ files.

Select only sequences that match the filtering condition

010

sequences versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Trim low quality bases from FastQ files

01

reads versions

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Sequenza-utils gc_wiggle computes the GC percentage across the sequences, and returns a file in the UCSC wiggle format, given a fasta file and a window size.

01

wig versions

sequenzautils:

Sequenza-utils provides 3 main command line programs to transform common NGS file format - such as FASTA, BAM - to input files for the Sequenza R package. The program -gc_wiggle- takes fasta file as an input, computes GC percentage across the sequences and returns a file in the UCSC wiggle format.
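The windowed GC computation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Sequenza-utils code: the function name `gc_windows` is hypothetical, windows are non-overlapping, every base in a window (including N) counts toward the denominator, and writing the result in UCSC wiggle format is not shown.

```python
def gc_windows(seq: str, window: int) -> list:
    """Return (start, gc_percent) for consecutive non-overlapping windows.

    start is a 0-based offset; the last window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, 100.0 * gc / len(chunk)))
    return result


if __name__ == "__main__":
    for start, pct in gc_windows("GGCCAATTGC", 4):
        print(start, pct)
```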

Induce a variation graph in GFA format from alignments in PAF format

012

gfa versions

seqwish:

seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments.

Short Read Sequence Typing for Bacterial Pathogens is a program designed to take Illumina sequence data, a MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc) and report the presence of STs and/or reference genes.

012

gene_results fullgene_results mlst_results pileup sorted_bam versions

srst2:

Short Read Sequence Typing for Bacterial Pathogens

Align reads to a reference genome using STAR

010101000

log_final log_out log_progress versions bam bam_sorted bam_sorted_aligned bam_transcript bam_unsorted fastq tab spl_junc_tab read_per_gene_tab junction sam wig bedgraph

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Create index for STAR

0101

index versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Get the minimal allowed index version from STAR

NO input

index_version versions

star:

STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

Aligns sequences using T_COFFEE

01010120

alignment lib versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Computes a consensus alignment using T_COFFEE

01010

alignment eval versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats the header of PDB files with t-coffee

01

formatted_pdb versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Aligns sequences using the regressive algorithm as implemented in the T_COFFEE package

01010120

alignment versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

pigz:

Parallel implementation of the gzip algorithm.

Reformats files with t-coffee

01

formatted_file versions

tcoffee:

A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures.

Domain-level classification of contigs to bacterial, archaeal, eukaryotic, or organelle

01

classifications log fasta versions

tiara:

Deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data powered by PyTorch.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file.

01

pep gff3 cds dat folder versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies candidate coding regions within transcript sequences. It is used to build a GFF file. You can use this module after transdecoder_longorf.

010

pep gff3 cds bed versions

transdecoder:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Detection of tRNA sequences using covariance models

01

tsv log stats fasta gff bed versions

Calculate secondary structures of two RNAs with dimerization

01

rnacofold_csv rnacofold_ps versions

viennarna:

Calculate secondary structures of two RNAs with dimerization

The program works much like RNAfold, but allows one to specify two RNA sequences which are then allowed to form a dimer structure. RNA sequences are read from stdin in the usual format, i.e. each line of input corresponds to one sequence, except for lines starting with > which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and “dot plot” files containing the pair probabilities, see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. The program will continue to read new sequences until a line consisting of the single character @ or an end of file condition is encountered.
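The input layout described above (a name line starting with >, then the two sequences joined by &) can be sketched in a few lines of Python. This is only an illustration of the text format; the helper name `make_cofold_input` and the example sequences are hypothetical, and the sketch does not run RNAcofold itself.

```python
def make_cofold_input(name, seq_a, seq_b):
    """Build one RNAcofold-style input record as text.

    Hypothetical helper: it only formats the record described above,
    i.e. a '>' name line followed by the two sequences joined by '&'.
    """
    if "&" in seq_a or "&" in seq_b:
        raise ValueError("sequences must not already contain '&'")
    return f">{name}\n{seq_a}&{seq_b}\n"


# Example record for a made-up pair of short RNAs.
record = make_cofold_input("dimer", "GGGAAA", "UUUCCC")
print(record, end="")
```

The resulting text could then be written to a file or piped to the tool on stdin, as the description says sequences are read there.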

Predict RNA secondary structure using the ViennaRNA RNAfold tools. Calculate minimum free energy secondary structures and partition function of RNAs.

01

rnafold_txt rnafold_ps versions

viennarna:

Calculate minimum free energy secondary structures and partition function of RNAs

The program reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. If not specified differently using commandline arguments, input is accepted from stdin or read from an input file, and output printed to stdout. If the -p option was given it also computes the partition function (pf) and base pairing probability matrix, and prints the free energy of the thermodynamic ensemble, the frequency of the mfe structure in the ensemble, and the ensemble diversity to stdout.

Extract sequences that were left unbinned by vRhyme into a FASTA file

0101

unbinned_sequences versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Link bins output by vRhyme to create one sequence per bin

01

linked_bins versions

vrhyme:

vRhyme functions by utilizing coverage variance comparisons and supervised machine learning classification of sequence features to construct viral metagenome-assembled genomes (vMAGs).

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.

01

aln biom mothur otu bam out blast uc centroids clusters profile msa versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
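To make the idea of single-pass, greedy centroid-based clustering concrete, here is a minimal Python sketch. It is not VSEARCH's implementation: the `identity` function is a stand-in built on `difflib`, whereas VSEARCH computes identity from pairwise alignments, and processing the longest sequences first is an assumption made for this example.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Stand-in similarity in [0, 1]; assumed for illustration only.
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches, in one pass."""
    centroids = []   # centroids in the order they were created
    clusters = {}    # centroid -> list of member sequences
    # Assumption: longer sequences are considered first.
    for seq in sorted(sequences, key=len, reverse=True):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is similar enough: seq starts a new cluster.
            centroids.append(seq)
            clusters[seq] = [seq]
    return clusters


clusters = greedy_cluster(["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"], threshold=0.8)
```

Each sequence is compared only against centroids, never against other cluster members, which is what keeps the procedure to a single pass.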

Merge strictly identical sequences contained in the input file. Identical sequences are defined as having the same length and the same string of nucleotides (case insensitive, T and U are considered the same).

01

fasta clustering log versions

vsearch:

A versatile open source tool for metagenomics (USEARCH alternative)
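The identity rule in the dereplication entry above (same length, same nucleotides, case insensitive, T and U equivalent) can be sketched in Python as follows. This is an illustrative sketch rather than VSEARCH's code; the function name `dereplicate` and the output layout of (header, sequence, count) tuples are assumptions.

```python
def dereplicate(records):
    """Merge strictly identical sequences.

    records: iterable of (header, sequence) pairs.
    Returns (header, sequence, count) tuples, most abundant first,
    keeping the first-seen header and sequence of each group.
    """
    groups = {}
    for header, seq in records:
        # Normalise case and treat U as T, as the description specifies.
        key = seq.upper().replace("U", "T")
        if key not in groups:
            groups[key] = [header, seq, 0]
        groups[key][2] += 1
    # sorted() is stable, so ties keep their first-seen order.
    return sorted((tuple(g) for g in groups.values()), key=lambda g: -g[2])


unique = dereplicate([("a", "ACGT"), ("b", "acgu"), ("c", "ACG"), ("d", "ACGU")])
```

Because the comparison key is the whole normalised string, sequences of different lengths can never be merged, which matches the "same length" condition.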

Performs quality filtering and / or conversion of a FASTQ file to FASTA format.

01

fasta log versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Taxonomic classification using the sintax algorithm.

010

tsv versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Sort fasta entries by decreasing abundance (--sortbysize) or sequence length (--sortbylength).

010

fasta versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)
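The two orderings named in the sorting entry above can be sketched in Python. This is a simplified illustration, not the tool's implementation: it assumes abundance is stored in the header as a `;size=N` annotation and that entries without one count as 1; the function names are hypothetical.

```python
import re

# Assumed header annotation carrying the abundance, e.g. "seq1;size=10".
_SIZE = re.compile(r";size=(\d+)")


def abundance(header):
    match = _SIZE.search(header)
    return int(match.group(1)) if match else 1  # assumption: default of 1


def sort_by_size(records):
    """Order (header, sequence) pairs by decreasing abundance."""
    return sorted(records, key=lambda r: abundance(r[0]), reverse=True)


def sort_by_length(records):
    """Order (header, sequence) pairs by decreasing sequence length."""
    return sorted(records, key=lambda r: len(r[1]), reverse=True)


records = [("s1;size=2", "ACGTAC"), ("s2;size=10", "ACG"), ("s3", "ACGT")]
by_size = [header for header, _ in sort_by_size(records)]
by_length = [header for header, _ in sort_by_length(records)]
```

Both functions rely on Python's stable sort, so entries with equal abundance or length stay in their input order.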

Compare target sequences to fasta-formatted query sequences using global pairwise alignment.

010000

aln biom lca mothur otu sam tsv txt uc versions

vsearch:

VSEARCH is a versatile open-source tool for microbiome analysis, including chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pair-wise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics. (USEARCH alternative)

Masks out highly repetitive DNA sequences with low complexity in a genome

01

converted versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program to generate frequency counts of repetitive units.

01

counts versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.

A program that takes a counts file and creates a file of genomic coordinates to be masked.

0101

intervals versions

windowmasker:

A program to mask highly repetitive and low complexity DNA sequences within a genome.
