Introduction
This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- VCF index - Create an index of the input VCF/gVCF files
- VCF processing - Process VCF/gVCF files to analyse SNP density, InDel sizes, per-base nucleotide diversity, runs of homozygosity (ROH), heterozygosity, and allele frequency
- Pipeline information - Report metrics generated during the workflow execution
VCF index
Use Bgzip to generate the index.
Output files
- Index in tbi format: <filename>.tbi
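As a quick sanity check after a run, the sketch below lists compressed VCF files that have no index next to them. It assumes, purely for illustration, that each index sits in the same directory as its VCF and is named by appending .tbi to the full VCF file name; the function name find_unindexed and the *.vcf.gz pattern are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def find_unindexed(results_dir, pattern="*.vcf.gz"):
    """Return compressed VCFs under results_dir without a sibling .tbi file.

    Assumption: the index for sample.vcf.gz is sample.vcf.gz.tbi in the
    same directory. Adjust if the pipeline places indexes elsewhere.
    """
    missing = []
    for vcf in sorted(Path(results_dir).rglob(pattern)):
        index = vcf.with_name(vcf.name + ".tbi")
        if not index.is_file():
            missing.append(vcf)
    return missing
```

Calling find_unindexed("results") returns an empty list when every matching VCF has an index.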
VCF processing
Output files
- Single nucleotide polymorphism (SNP) density calculated by VCFtools: <filename>.snpden.
- Insertion or deletion (InDel) sizes calculated by VCFtools, output as a histogram file: <filename>.indel.hist.
- Per-site (per-base) nucleotide diversity calculated by VCFtools: <filename>.sites.pi.
- Runs of homozygosity (ROH) generated by BCFtools: <filename>.roh.
- Heterozygosity generated by VCFtools: <filename>.het.
- Allele frequency calculated by VCFtools: <filename>.frq.
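The output files listed above are plain text tables, so they can be read with standard tools. As a minimal sketch, the function below reads a <filename>.het file and derives the observed fraction of heterozygous sites per individual. It assumes the usual tab-separated VCFtools layout with the columns INDV, O(HOM), E(HOM), N_SITES and F; the function name observed_heterozygosity is hypothetical, and the column names should be checked against the files your run produces.

```python
import csv


def observed_heterozygosity(het_path):
    """Map each individual to its fraction of heterozygous sites.

    Assumption: het_path is a tab-separated file with a header row that
    includes the columns INDV, O(HOM) and N_SITES.
    """
    rates = {}
    with open(het_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            n_sites = int(row["N_SITES"])
            if n_sites == 0:
                continue  # no genotyped sites, rate is undefined
            observed_hom = int(row["O(HOM)"])
            rates[row["INDV"]] = (n_sites - observed_hom) / n_sites
    return rates
```

Individuals with zero genotyped sites are skipped rather than reported as 0, so that a missing value is not mistaken for a measured one.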
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow can generate several reports about the running and execution of the pipeline. These reports help you troubleshoot errors in a pipeline run and also provide other information such as launch commands, run times and resource usage.
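For troubleshooting, the trace file can also be inspected programmatically. The sketch below lists tasks that did not finish successfully. It assumes the trace file is tab-separated with a header row containing the name, status and exit columns, and that successful tasks are marked COMPLETED or CACHED; the function name unsuccessful_tasks is hypothetical, and the available columns depend on how tracing is configured for the run.

```python
import csv

# Assumption: these status values mark tasks that need no attention.
OK_STATUSES = {"COMPLETED", "CACHED"}


def unsuccessful_tasks(trace_path):
    """Return (name, status, exit) for every task that did not succeed.

    Assumption: trace_path is a tab-separated file whose header row
    includes the columns name, status and exit.
    """
    with open(trace_path, newline="") as handle:
        rows = csv.DictReader(handle, delimiter="\t")
        return [
            (row["name"], row["status"], row["exit"])
            for row in rows
            if row["status"] not in OK_STATUSES
        ]
```

For example, unsuccessful_tasks("pipeline_info/execution_trace.txt") returns an empty list when every task completed or was served from the cache.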