Unicore enables scalable and accurate phylogenetic reconstruction with structural core genes

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3148057422

About this item

Full title

Unicore enables scalable and accurate phylogenetic reconstruction with structural core genes

Publisher

Cold Spring Harbor: Cold Spring Harbor Laboratory Press

Journal title

bioRxiv, 2024-12

Language

English

More information

Scope and Contents

Contents

The analysis of single-copy core genes, common to most members of a clade, is important for key tasks in biology including phylogenetic reconstruction and assessing genome quality. Core genes are traditionally identified by the analysis of amino acid similarities among proteomes, but can also be defined using structures, which bear potential in deep clades beyond the twilight zone of amino acids. Despite breakthroughs in accurate AI-driven protein structure prediction, obtaining full 3D structural models on a proteomic scale is still prohibitively slow. Here, we present Unicore, a novel method for identifying structural core genes at a scale suitable for downstream phylogenetic analysis. By applying the ProstT5 protein language model to the input proteomes to obtain their 3Di structural strings, Unicore saves over three orders of magnitude in runtime compared to a full 3D prediction. Using Foldseek clustering, Unicore identifies single-copy structures universally present in the species and aligns them using Foldmason. These structural core gene alignments are projected back to amino acid information for downstream phylogenetic analysis. We demonstrate that this approach defines core genes with linear run-time scaling over the number of species, up to 56 times faster than OrthoFinder, while reconstructing phylogenetic relationships congruent with conventional approaches. Unicore is universally applicable to any given set of taxa, even spanning superkingdoms and overcoming limitations of previous methods requiring orthologs of fixed taxonomic scope, and is available as a free and open source software at https://github.com/steineggerlab/unicore.

Competing Interest Statement: M.S. declares an outside interest in Stylus Medicine....
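Two steps named in the abstract, selecting single-copy core clusters and projecting a structural alignment back to amino acids, can be illustrated with a minimal Python sketch. This is not Unicore's implementation: the function names, the input layout (a dictionary of cluster members), and the `min_fraction` threshold are hypothetical choices made for illustration. The projection relies only on the fact that a 3Di string carries one structural state per residue.

```python
from collections import Counter


def single_copy_core_clusters(clusters, species, min_fraction=0.8):
    """Return IDs of clusters that are single-copy in enough species.

    clusters: dict mapping cluster ID -> list of (species, gene_id) members.
    species: all species names in the analysis.
    min_fraction: hypothetical coverage threshold (an assumption for this
        sketch, not a documented Unicore default).
    """
    n_species = len(set(species))
    core = []
    for cluster_id, members in clusters.items():
        # Count how many genes each species contributes to this cluster.
        copies = Counter(sp for sp, _ in members)
        # Only species with exactly one copy count toward coverage.
        single = sum(1 for count in copies.values() if count == 1)
        if n_species and single / n_species >= min_fraction:
            core.append(cluster_id)
    return core


def project_alignment(aligned_3di, amino_acids, gap="-"):
    """Copy the gap pattern of an aligned 3Di string onto its amino acids."""
    n_states = sum(1 for ch in aligned_3di if ch != gap)
    if n_states != len(amino_acids):
        raise ValueError("3Di string and amino acid sequence differ in length")
    residues = iter(amino_acids)
    # Keep gaps in place; replace each 3Di state with the matching residue.
    return "".join(gap if ch == gap else next(residues) for ch in aligned_3di)


# Toy data: three species (A, B, C) and three clusters.
clusters = {
    "c1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
    "c2": [("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c2")],
    "c3": [("A", "a4")],
}
species = ["A", "B", "C"]

strict_core = single_copy_core_clusters(clusters, species, min_fraction=1.0)
relaxed_core = single_copy_core_clusters(clusters, species, min_fraction=0.6)
projected = project_alignment("D-VL", "MKT")
```

With the strict threshold only `c1` qualifies, because `c2` is duplicated in species A; relaxing the threshold admits `c2`, which is still single-copy in two of three species. A threshold below 1.0 matches the abstract's wording that core genes are common to most, not necessarily all, members of a clade.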

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_proquest_journals_3148057422

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3148057422

Other Identifiers

E-ISSN

2692-8205

DOI

10.1101/2024.12.22.629535