Introduction

nf-core/nfmicrofinder is a bioinformatics pipeline that aids in the curation of bird genome assemblies by identifying putative microchromosome scaffolds and moving them to the start of the genome assembly FASTA file.

The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It uses Docker/Singularity containers, which makes installation straightforward and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes software dependencies much easier to maintain and update.

Pipeline Summary

  1. Input validation and parameter checks
  2. Index reference genome using Miniprot
  3. Align protein sequences to genome using Miniprot
  4. Filter alignments based on quality thresholds (identity ≥70%, score ≥60)
  5. Sort FASTA file based on filtered alignments to prioritize microchromosomes
  6. Generate final reordered assembly and pipeline reports
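
As a rough illustration of steps 4 and 5, the sketch below applies the two thresholds from the summary (identity ≥70%, score ≥60) and then moves flagged scaffolds to the front of the scaffold order. It is not the pipeline's actual code: the `Alignment` record, its field names, and the rule "a scaffold is flagged if it has at least one passing alignment" are assumptions made for the example, and the real pipeline may parse Miniprot output and rank scaffolds differently.

```python
from dataclasses import dataclass

# Thresholds taken from the pipeline summary.
MIN_IDENTITY = 70.0  # percent identity
MIN_SCORE = 60       # alignment score


@dataclass
class Alignment:
    """Hypothetical alignment record; not Miniprot's real output format."""
    scaffold: str
    identity: float  # percent identity of the protein-to-genome alignment
    score: int


def filter_alignments(alignments):
    """Keep alignments that meet both thresholds (inclusive)."""
    return [
        a for a in alignments
        if a.identity >= MIN_IDENTITY and a.score >= MIN_SCORE
    ]


def reorder_scaffolds(scaffold_names, kept):
    """Place scaffolds with at least one kept alignment first.

    Original order is preserved within the flagged and unflagged groups.
    The "at least one hit" rule is an assumption for this sketch.
    """
    flagged = {a.scaffold for a in kept}
    front = [s for s in scaffold_names if s in flagged]
    rest = [s for s in scaffold_names if s not in flagged]
    return front + rest
```

For example, given scaffolds `["s2", "s1", "s4", "s3"]` where only `s1` and `s3` carry passing alignments, `reorder_scaffolds` returns `["s1", "s3", "s2", "s4"]`.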

Quick Start

  1. Install Nextflow (>=24.10.5)

  2. Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run main.nf -profile test,docker

    Note that it is recommended to use the -profile parameter to specify the container technology of your choice. See the nf-core pipeline documentation for more information.

  4. Start running your own analysis!

    nextflow run main.nf \
        --input genome.fa \
        --pep_file proteins.fa \
        --output_prefix my_analysis \
        --outdir <OUTDIR>

Documentation

The nfmicrofinder pipeline comes with documentation about the pipeline usage and output.

Credits

nfmicrofinder was originally written by Yumi Sims and Will Eagle (@weaglesBio).

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

