metacache_build

Taxonomic profiling database building with MetaCache

genomics metagenomics taxonomy short reads long reads kmer k-mer metacache build reference

https://github.com/nf-core/modules/tree/master/modules/nf-core/metacache/build

Description

Taxonomic profiling database building with MetaCache

Input

| Name | Description | Pattern |
|---|---|---|
| `0` () | | |
| `1` () | | |
| `taxonomy` (file) | NCBI taxonomy dump files nodes.dmp and names.dmp (merged.dmp is also matched, if present) | {names,nodes,merged}.dmp |
| `seq2taxid` (file) | NCBI-style accession2taxid tab-separated file with 3 or 4 columns: accession, accession.version, taxid, and gi (optional) | * |
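The two reference inputs fit together roughly as follows: accession2taxid maps a sequence accession to a taxid, and nodes.dmp/names.dmp turn that taxid into a named lineage. The minimal Python sketch below (not part of the module or of MetaCache) parses the NCBI dump layout, where fields are separated by `\t|\t` and each line ends with `\t|`, together with a 3-column accession2taxid table. The accession `TOY0001.1` and the heavily truncated tree are hypothetical; real dump lines carry more columns and real lineages have many more ranks.

```python
import csv
import io

# Hypothetical toy data in NCBI dump layout: fields separated by "\t|\t",
# lines terminated by "\t|". The tree is heavily truncated for illustration.
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "2\t|\t1\t|\tsuperkingdom\t|\n"
    "561\t|\t2\t|\tgenus\t|\n"
    "562\t|\t561\t|\tspecies\t|\n"
)
NAMES_DMP = (
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "2\t|\tBacteria\t|\t\t|\tscientific name\t|\n"
    "561\t|\tEscherichia\t|\t\t|\tscientific name\t|\n"
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n"
    "562\t|\tBacillus coli\t|\t\t|\tsynonym\t|\n"
)
# Hypothetical 3-column accession2taxid table (a 4th gi column is optional).
ACC2TAXID = "accession\taccession.version\ttaxid\nTOY0001\tTOY0001.1\t562\n"


def parse_dmp(handle):
    """Yield the fields of each line of an NCBI .dmp file."""
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        yield line.split("\t|\t")


def load_nodes(handle):
    """Map taxid -> (parent taxid, rank) using the first three nodes.dmp columns."""
    return {int(f[0]): (int(f[1]), f[2]) for f in parse_dmp(handle)}


def load_names(handle):
    """Map taxid -> scientific name, skipping synonyms and other name classes."""
    return {int(f[0]): f[1] for f in parse_dmp(handle) if f[3] == "scientific name"}


def load_accession2taxid(handle):
    """Map accession.version -> taxid; an optional 4th (gi) column is ignored."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip header: accession, accession.version, taxid[, gi]
    return {row[1]: int(row[2]) for row in reader if len(row) >= 3}


def lineage(taxid, nodes, names):
    """Walk parent pointers up to the root (the node that is its own parent)."""
    path = []
    while True:
        parent, rank = nodes[taxid]
        path.append((rank, names.get(taxid, str(taxid))))
        if parent == taxid:
            break
        taxid = parent
    return list(reversed(path))


nodes = load_nodes(io.StringIO(NODES_DMP))
names = load_names(io.StringIO(NAMES_DMP))
acc2tax = load_accession2taxid(io.StringIO(ACC2TAXID))
print(lineage(acc2tax["TOY0001.1"], nodes, names))
```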

Output

| Name | Description | Pattern |
|---|---|---|
| `0` () | | |

Tools

metacache

MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin. It aims to reduce the memory requirements usually associated with k-mer-based methods while retaining their speed. MetaCache uses locality-sensitive hashing to quickly identify candidate regions within one or more reference genomes. A read is then classified based on its similarity to those regions.
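MinHash is one family of locality-sensitive hashing, and the MetaCache authors describe their method as minhashing of reference windows. The Python sketch below illustrates the general idea with a generic bottom-s MinHash index. Each reference is cut into windows, and each window is reduced to the s smallest hashes of its canonical k-mers. A read then votes for the windows that share its sketch values. The window length, k, s, the BLAKE2b hash, and the simple vote-counting classifier are illustrative assumptions. They are not MetaCache's implementation or defaults, and MetaCache assigns reads to taxa rather than to a single reference as this toy does.

```python
import hashlib
import random
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_hash(kmer):
    """64-bit deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sketch(seq, k=16, s=8):
    """Bottom-s MinHash sketch: the s smallest distinct hashes of canonical k-mers."""
    hashes = {kmer_hash(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def build_index(references, window=64, k=16, s=8):
    """Map each sketch value to the (reference, window number) pairs containing it."""
    index = defaultdict(list)
    for name, seq in references.items():
        for w, start in enumerate(range(0, max(len(seq) - k + 1, 1), window)):
            # The chunk holds the k-mers starting at start..start+window-1,
            # so consecutive windows together cover every k-mer once.
            for h in sketch(seq[start:start + window + k - 1], k, s):
                index[h].append((name, w))
    return index


def classify(read, index, k=16, s=8):
    """Return the reference whose window shares the most sketch values, or None."""
    hits = Counter()
    for h in sketch(read, k, s):
        for location in index.get(h, ()):
            hits[location] += 1
    if not hits:
        return None
    (name, _window), _count = hits.most_common(1)[0]
    return name


# Usage on random toy genomes (hypothetical data, fixed seed).
rng = random.Random(7)
refs = {name: "".join(rng.choice("ACGT") for _ in range(2000)) for name in ("A", "B")}
index = build_index(refs)
read = refs["A"][192:192 + 64 + 15]  # exactly window 3 of reference "A"
print(classify(read, index))
```

Because only s hashes are stored per window, memory grows with the number of windows rather than the number of k-mers, which is the trade-off the paragraph above alludes to.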

For an independent comparison with other tools in terms of classification accuracy, see the LEMMI benchmarking site.

The latest version of MetaCache classifies around 60 million reads (of length 100) per minute against all complete bacterial, viral and archaeal genomes from NCBI RefSeq Release 97, running with 88 threads on a workstation with two Intel(R) Xeon(R) Gold 6238 CPUs.
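For a sense of scale, the quoted figure can be converted into per-second and per-thread rates. This back-of-the-envelope Python calculation assumes the work is spread evenly across all 88 threads, which the benchmark statement does not claim.

```python
# Figures quoted above; the even per-thread split is an assumption.
READS_PER_MINUTE = 60_000_000
READ_LENGTH_BP = 100
THREADS = 88

reads_per_second = READS_PER_MINUTE / 60                  # 1,000,000 reads/s
bases_per_second = reads_per_second * READ_LENGTH_BP      # 100,000,000 bp/s
reads_per_thread_per_second = reads_per_second / THREADS  # about 11,364 reads/s

print(f"{reads_per_second:,.0f} reads/s, {bases_per_second:,.0f} bp/s, "
      f"{reads_per_thread_per_second:,.0f} reads/s per thread")
```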

https://github.com/muellan/metacache

License: GPL v3-or-later