Introduction
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- GFASTATS - Collect statistics on the curated primary assembly
- MERQURYFK - Generate k-mer plots for the curated assembly using previous run information
- SANGER_TOL_BTK - Run Blobtoolkit to generate plots and short_summary.txt from BUSCO
- SANGER_TOL_CPRETEXT - Run Curationpretext to generate Pretext files and accessory tracks
- Pipeline information - Report metrics generated during the workflow execution
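As a quick sanity check after a run, the top-level directories named in the sections of this document can be looked up with a few lines of Python. This is only a minimal sketch: the function name `missing_outputs` and the `EXPECTED_PATTERNS` list are hypothetical helpers written for illustration, not part of the pipeline, and the patterns simply mirror the directory names listed in this document.

```python
from pathlib import Path

# Directory patterns taken from the "Output files" listings in this document.
# The list itself is an illustrative assumption, not something the pipeline ships.
EXPECTED_PATTERNS = [
    "gfastats",
    "merquryfk",
    "sanger/*_blobtoolkit_out",
    "sanger/*_curationpretext_out",
    "pipeline_info",
]


def missing_outputs(results_dir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match nothing under results_dir."""
    root = Path(results_dir)
    # Path.glob yields nothing for an unmatched pattern, so any() is False.
    return [pattern for pattern in patterns if not any(root.glob(pattern))]


if __name__ == "__main__":
    # "results" is a placeholder for the top-level results directory.
    for pattern in missing_outputs("results"):
        print(f"missing: {pattern}")
```

An empty return value means every expected directory was found; it says nothing about whether the files inside them are complete.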
GFASTATS
Output files
gfastats/
*.assembly.summary
: Assembly metrics of the input primary file.
*_fasta.gz
: GZipped primary assembly file.
GFASTATS is a single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
MERQURYFK
Output files
merquryfk/
*.completeness.stats
:
*{"primary","haplotype",""}_only.bed
:
*{"primary","haplotype",""}.qv
:
*.spectra-asm.{fl,ln,st}.png
:
*{"primary","haplotype"}.spectra-cn.{fl,ln,st}.png
:
MERQURYFK is a FastK based version of Merqury.
Merqury is a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness.
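The idea of comparing assembly k-mers with read k-mers can be illustrated with a toy Python sketch. It is not how MERQURYFK or Merqury is implemented: it works on distinct k-mers held in plain sets, ignores canonical (reverse-complement) k-mers and read-count thresholds, and the function names are hypothetical. The QV estimate follows the formula published for Merqury, where the per-base error rate is 1 - (1 - K_asm_only / K_total)^(1/k).

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def completeness(assembly_kmers, read_kmers):
    """Fraction of read k-mers that are also present in the assembly."""
    if not read_kmers:
        return 0.0
    return len(read_kmers & assembly_kmers) / len(read_kmers)


def consensus_qv(assembly_kmers, read_kmers, k):
    """Phred-scaled quality from k-mers found only in the assembly."""
    asm_only = len(assembly_kmers - read_kmers)
    total = len(assembly_kmers)
    if total == 0 or asm_only == 0:
        return math.inf  # no unsupported k-mers: no evidence of error
    # Probability that a single base is correct, given k-mer support.
    base_correct = (1 - asm_only / total) ** (1 / k)
    error_rate = 1 - base_correct
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    k = 4
    reads = ["ACGTAC", "GTACGG"]  # made-up example reads
    read_set = set().union(*(kmers(read, k) for read in reads))
    asm_set = kmers("ACGTACGT", k)  # made-up example assembly
    print(completeness(asm_set, read_set))
    print(consensus_qv(asm_set, read_set, k))
```

In this toy model, k-mers seen only in the assembly count as likely errors and lower the QV, while read k-mers absent from the assembly lower the completeness.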
SANGER_TOL_BTK
Output files
sanger/*_blobtoolkit_out/
blobtoolkit/plots/*png
: Blobtoolkit plots
blobtoolkit/{ASSEMBLY_NAME}/*.json.gz
: Blobtoolkit dataset for use in BTK_viewer.
busco/*_odb10/*.{tsv,tar.gz,json,txt}
: BUSCO output
multiqc/
: MultiQC plots/data and report.html.
pipeline_info
SANGER_TOL_BTK is a bioinformatics pipeline that can be used to identify and analyse non-target DNA for eukaryotic genomes.
SANGER_TOL_CPRETEXT
Output files
sanger/*_curationpretext_out/
accessory_files/*.{bigWig,bed,bedgraph}
: Track files describing telomere, gap, and coverage data across the genome.
pretext_maps_raw
: Pre-accessory file ingestion pretext files.
pretext_maps_processed
: Post-accessory file ingestion pretext files, i.e. the final output.
pipeline_info
SANGER_TOL_CPRETEXT is a bioinformatics pipeline typically used in conjunction with TreeVal to generate pretext maps (and optionally telomeric, gap, coverage, and repeat density plots which can be ingested into pretext) for the manual curation of high-quality genomes.
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. These reports let you troubleshoot errors in a pipeline run, and also provide other information such as launch commands, run times and resource usage.
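For troubleshooting, the trace file can also be inspected programmatically. The sketch below assumes execution_trace.txt is tab-separated with a header row containing `name` and `status` columns, and that unsuccessful tasks carry the status `FAILED` or `ABORTED`; these match Nextflow's default trace fields as far as we know, but check the header of your own file. The function name `failed_tasks` is a hypothetical helper.

```python
import csv

# Status values treated as failures (an assumption about the trace format).
FAILURE_STATUSES = {"FAILED", "ABORTED"}


def failed_tasks(lines):
    """Return the names of tasks whose status marks them as unsuccessful.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row["name"]
        for row in reader
        if row.get("status") in FAILURE_STATUSES
    ]


if __name__ == "__main__":
    # Path is relative to the top-level results directory.
    with open("pipeline_info/execution_trace.txt", newline="") as handle:
        for name in failed_tasks(handle):
            print(name)
```

Taking an iterable of lines rather than a path keeps the helper easy to try on a few pasted rows before pointing it at a real trace file.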