Introduction
sanger-tol/zippypretext is a bioinformatics pipeline that generates a Hi-C Pretext map from existing mapping information. The pipeline and method were developed primarily to assist curation work in the Tree of Life project. With this approach, users do not need to realign Hi-C reads with minimap2 or bwa-mem2. Instead, the main input is a newly prepared AGP file that reflects the updated sequence locations after the Pretext map has been edited. Because no new mapping is required, the pipeline can save a significant amount of time.
Pipeline summary
- Convert the Pretext AGP file to the correct format using PRETEXT_TO_ASM.
- Use JUICERC to extract the rearranged alignment information.
- Generate interaction pairs based on the new alignment information using MAKE_PAIRS.
- Use the pair information to generate a new Pretext map with PRETEXTMAP.
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a Pretext AGP file (the direct output from PretextView). An example, sample.agp, looks as follows:
##agp-version 2.1
# DESCRIPTION: Generated by PretextView Version 0.2.5
# HiC MAP RESOLUTION: 1086.098755 bp/texel
Scaffold_1 1 3761159 1 W HAP1_SCAFFOLD_1 1 3761159 +
Scaffold_2 1 296504 1 W HAP1_SCAFFOLD_2 1 296504 +
Scaffold_3 1 21722 1 W HAP1_SCAFFOLD_2 733117 754838 +
Scaffold_4 1 311710 1 W HAP1_SCAFFOLD_2 421407 733116 +
Scaffold_5 1 318227 1 W HAP1_SCAFFOLD_2 754839 1073065 +
Scaffold_6 1 124902 1 W HAP1_SCAFFOLD_2 296505 421406 +
Scaffold_7 1 1794235 1 W HAP1_SCAFFOLD_2 1073066 2867300 +
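Before running the pipeline, it can be worth checking that the AGP still accounts for every base of each original scaffold. The sketch below is not part of the pipeline; `agp_check_tiling` is a hypothetical helper name. It assumes the column layout of the example above (column 5 is the component type, columns 6 to 8 are the original scaffold name, start and end) and assumes that no sequence was deliberately removed during curation, so the pieces of each original scaffold should run from base 1 onwards without gaps or overlaps.

```shell
# Hypothetical helper (not part of zippypretext): check that the pieces of each
# original scaffold listed in a PretextView AGP are contiguous, starting at base 1.
agp_check_tiling() {
    # Drop comment lines, keep sequence (W) components only,
    # then order by original scaffold name (column 6) and start (column 7).
    grep -v '^#' "$1" | awk '$5 == "W"' | sort -k6,6 -k7,7n |
    awk 'BEGIN { bad = 0 }
    {
        # A new original scaffold is expected to begin at base 1 (assumption).
        if ($6 != name) { name = $6; expected = 1 }
        if ($7 != expected) {
            printf "%s: expected start %d, found %d\n", name, expected, $7
            bad = 1
        }
        # The next piece should begin right after this one ends.
        expected = $8 + 1
    }
    END { exit bad }'
}
```

Calling `agp_check_tiling sample.agp` on the example above prints nothing and returns a zero exit status, because the six pieces of HAP1_SCAFFOLD_2 cover positions 1 to 2867300 without interruption. A non-zero status with a printed message points to a gap or overlap that may need a second look in PretextView.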
Second, collect the information used for the previous Hi-C alignment. This should include the following files:
sample.fa
sample.fa.fai
alignment.bin
These files are typically located in the TreeVal run folder, under the subfolder hic_file.
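Since the run depends on all of these files being present, a quick check before launching can save a failed submission. The following is a minimal sketch; `check_inputs` is a hypothetical helper, not something the pipeline provides, and the file names are the example names used in this document.

```shell
# Hypothetical helper: report every required input that is missing or empty.
check_inputs() {
    missing=0
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_inputs sample.fa sample.fa.fai alignment.bin sample.agp` returns a non-zero status and names the offending files on standard error if anything is absent.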
Now, you can run the pipeline using:
nextflow run sanger-tol/zippypretext \
-profile <docker/singularity/.../institute> \
--input sample.fa \
--idxfile sample.fa.fai \
--hicmap alignment.bin \
--agp sample.agp \
--outdir <OUTDIR>
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
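As an illustration of the -params-file route, the parameters from the command above could be collected in a YAML file. This is a sketch only: the file name `params.yaml` and the output directory `results` are placeholders chosen here, while the parameter names are the ones used on the command line in this document.

```shell
# Write the pipeline parameters to a YAML file.
# "params.yaml" and "results" are placeholder names, not pipeline defaults.
cat > params.yaml <<'EOF'
input: "sample.fa"
idxfile: "sample.fa.fai"
hicmap: "alignment.bin"
agp: "sample.agp"
outdir: "results"
EOF
```

The pipeline can then be launched with `nextflow run sanger-tol/zippypretext -profile <docker/singularity/.../institute> -params-file params.yaml`, which keeps the parameters under version control alongside the curated AGP.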
Credits
sanger-tol/zippypretext was originally written by Yumi Sims and Chenxi Zhou.
We thank the following people for their extensive assistance in the development of this pipeline: Jim Downie.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.