Description

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
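
To make the idea of "tagging" concrete, here is a toy sketch in Python. It is not the tool's actual algorithm, which considers more than this (read pairs and clipped bases, for example). It assumes that reads sharing a chromosome, 5' position and strand come from the same fragment, keeps the highest-scoring read in each group, and sets the SAM duplicate flag bit (0x400) on the rest. The `Read` class and its `five_prime_pos` and `base_quality_sum` fields are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10     # SAM flag bit: read aligned to the reverse strand


@dataclass
class Read:
    # Hypothetical, simplified read record used only for this sketch.
    name: str
    chrom: str
    five_prime_pos: int    # assumed to be the 5' alignment position
    flag: int              # SAM bitwise flag
    base_quality_sum: int  # assumed score for choosing the read to keep


def mark_duplicates(reads):
    """Set the duplicate bit on all but the best-scoring read per group."""
    groups = defaultdict(list)
    for read in reads:
        strand = "-" if read.flag & REVERSE_FLAG else "+"
        groups[(read.chrom, read.five_prime_pos, strand)].append(read)

    for group in groups.values():
        best = max(group, key=lambda r: r.base_quality_sum)
        for read in group:
            if read is not best:
                read.flag |= DUPLICATE_FLAG  # tag the read, do not remove it
    return reads


reads = [
    Read("r1", "chr1", 100, 0, 900),
    Read("r2", "chr1", 100, 0, 1200),   # same position and strand as r1
    Read("r3", "chr1", 100, 16, 800),   # same position, reverse strand
    Read("r4", "chr2", 100, 0, 700),
]
mark_duplicates(reads)
duplicates = [r.name for r in reads if r.flag & DUPLICATE_FLAG]
print(duplicates)  # ['r1']
```

Note that duplicates are flagged rather than deleted, so downstream tools can decide whether to skip them.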

Input

Name              Description                                                                      Pattern
meta (map)        Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]
bam (file)        Sorted BAM file                                                                  *.{bam}
fasta (file)      The reference fasta file                                                         *.fasta
fai (file)        Index of reference fasta file                                                    *.fasta.fai
dict (file)       GATK sequence dictionary                                                         *.dict
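
The reference inputs above travel as a set: the fasta file, its index and its sequence dictionary. The following Python sketch derives the companion file names one would expect for a given fasta path. It assumes the common convention of `genome.fasta.fai` and `genome.dict` sitting next to `genome.fasta`, which is consistent with the patterns listed but is not stated on this page. The helper name `expected_companions` and the example path are hypothetical.

```python
from pathlib import Path


def expected_companions(fasta):
    """Return the assumed index and dictionary paths for a reference fasta."""
    fasta = Path(fasta)
    return {
        # Index: the fasta file name with ".fai" appended (matches *.fasta.fai)
        "fai": fasta.with_name(fasta.name + ".fai"),
        # Dictionary: the fasta extension replaced by ".dict" (matches *.dict)
        "dict": fasta.with_suffix(".dict"),
    }


companions = expected_companions("ref/genome.fasta")
print(companions["fai"].name)   # genome.fasta.fai
print(companions["dict"].name)  # genome.dict
```

A check like this can be run before launching a pipeline to catch a missing index or dictionary early.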

Output

Name              Description                                                                      Pattern
meta (map)        Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]
versions (file)   File containing software versions                                                versions.yml
output (file)     Marked duplicates BAM/CRAM file                                                  *.{bam,cram}
bam_index (file)  Optional BAM index file                                                          *.bai

Tools

gatk4 Documentation

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.