Introduction

The Tree of Life projects will generate tens of thousands of high-quality genomes โ€“ more than have ever been sequenced! It is a challenging and extremely exciting task that will shape the future of biology, and the teamโ€™s role is to provide the platform for assembling and analysing those genomes at an unprecedented scale. We are the interface between the Tree of Life teams (assembly production and faculty research) and Sanger IT, working together with the informatics teams of the other programmes.

The team is organised in three poles.

๐Ÿ“‚ Data management: Our data curators and managers maintain the integrity, consistency, and quality, or multiple databases used in production, including Genomes on a Tree (GoaT), Sample Tracking System (STS), Collaborative Open Plant Omics (COPO), and BioSamples.

๐Ÿ’ป Bioinformatics: Our bioinformaticians develop the suite of analysis pipelines that will run on every genome produced in Tree of Life, providing a central database of core results available for all.

๐Ÿ”ฉ Systems: We develop and maintain some core systems used in production, including the execution and tracking of all bioinformatics pipelines, and the deployment of third-party web applications for internal use.

Tech stack

Tech stack

The team uses a wide range of technologies, frameworks and programming languages, including Nextflow, Python, Conda, Jira, LSF, Singularity, and Kubernetes. The technology wheel below shows most of their logos. How many can you recognise?


Projects

Genome After Party

๐Ÿงฎ Tech Stack: Nextflow DSL2, Python, React, SQLAlchemy, and PostgreSQL

Genome After Party is a suite of pipelines to standardise the downstream analyses performed on all genomes produced by the Tree of Life. These include:

Learn more about our pipelines on their dedicated pages. These pipelines are created using Nextflow DSL2 and nf-core template. They are designed for portability, scalability and biodiversity.

A portal is being developed to automate the production of genome note publications. It will execute the Nextflow pipeline and populate an associated database with generated statistics and images. The platform is being designed in collaboration with the Enabling Platforms team to create genome note style publications for both internal Tree of Life assemblies as well as external genome assemblies.

If you have an idea for a new feature โ€“ send us your request.

Members

Matthieu Muffato, Team Lead

     


Matthieu leads the Informatics Infrastructure team, which guides the implementation and delivery of the genome assembly pipelines, and provides support for large-scale genome analyses for the Tree of Life faculty teams. He joined the Wellcome Sanger Institute in February 2021, to form the Informatics Infrastructure team for the Tree of Life programme. He has recruited 7 team members, with skills covering data curation & management, software development & operations, and bioinformatics.

Guoying Qi, DevOps Software Developer

   


Guoying, a DevOps software engineer, has the responsibility of developing and deploying software and web applications for the Tree of Life project across various platforms such as computing farms, Kubernetes, OpenStack, and public clouds.

Priyanka Surana, Senior Bioinformatician

     


Priyanka is a Senior Bioinformatician, overseeing the development of Nextflow pipelines for genome assembly, curation and downstream analyses. She also facilitates the workflows community and is passionate about building networks that support peer learning.

Cibin Sadasivan Baby, Senior Software Developer

   


Cibin, a Senior Software Developer, is tasked with designing and implementing the production systems for TOL-IT. Currently, Cibin is focused on building an automated platform to execute high-throughput genomic pipelines. The ultimate goal of this project is to develop a system capable of efficiently processing large amounts of genomic data.

Cibele Sotero-Caio, Genomic Data Curator

       


Cibele is the data curator for the Genomes on a Tree (GoaT) - a platform developed to support the Tree of Life and other sequencing initiatives of the Earth Biogenome project (EBP).

Paul Davis, Data Manager

       


Paul works on the main ToL Genome Engine. This system was developed by the ToL to manage and track samples from collection, onboarding, processing in the lab, sequencing and finally the publication of the assembly and Genome Note publication. As there are many steps in this process developing methodology to identify issues as early as possible is vital to avoid wasted time and resource. Paul works at all levels of the project fielding questions about data flow, data fixes and helps other ToL staff and project stakeholders with data and information. Paul also interacts with external groups and stakeholders to maintain data integrity in the public domain.

Beth Yates, Bioinformatics Engineer

   


Beth is a Bioinformatics Engineer working on a building a platform to automate the production of Genome Note publications. The Universal Genome Note platform consists of a web portal, database and Nextflow pipelines. Beth is contributing to the genomenote pipeline, this pipeline fetches assembly meta data and generates some of the figures and statistics included in each genome note.

Alumni

  • Zaynab Butt, Informatics and Digital Associate