Unicore enables scalable and accurate phylogenetic reconstruction with structural core genes
About this item
Publisher
Cold Spring Harbor: Cold Spring Harbor Laboratory Press
Language
English
More information
Scope and Contents
Contents
The analysis of single-copy core genes, common to most members of a clade, is important for key tasks in biology, including phylogenetic reconstruction and assessing genome quality. Core genes are traditionally identified by analyzing amino acid similarities among proteomes, but they can also be defined using structures, which hold potential in deep clades beyond the twilight zone of amino acid similarity. Despite breakthroughs in accurate AI-driven protein structure prediction, obtaining full 3D structural models on a proteomic scale is still prohibitively slow. Here, we present Unicore, a novel method for identifying structural core genes at a scale suitable for downstream phylogenetic analysis. By applying the ProstT5 protein language model to the input proteomes to obtain their 3Di structural strings, Unicore saves over three orders of magnitude in runtime compared to a full 3D prediction. Using Foldseek clustering, Unicore identifies single-copy structures universally present in the species and aligns them using Foldmason. These structural core gene alignments are projected back to amino acid information for downstream phylogenetic analysis. We demonstrate that this approach defines core genes with run-time that scales linearly with the number of species, up to 56 times faster than OrthoFinder, while reconstructing phylogenetic relationships congruent with conventional approaches. Unicore is universally applicable to any given set of taxa, even spanning superkingdoms, overcoming limitations of previous methods that require orthologs of fixed taxonomic scope. It is available as free and open-source software at https://github.com/steineggerlab/unicore.
Competing Interest Statement: M.S. declares an outside interest in Stylus Medicine....
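The core-gene selection step described in the abstract can be illustrated with a minimal Python sketch. It is not Unicore's implementation: the function name `single_copy_core`, the `min_coverage` parameter, and the toy cluster table are all hypothetical, and the input is assumed to be a clustering result already parsed into (species, protein) pairs. A cluster is kept only if no species contributes more than one member and enough of the species are represented.

```python
from collections import Counter

def single_copy_core(clusters, species, min_coverage=1.0):
    """Return IDs of clusters that are single-copy and widely present.

    clusters: dict mapping cluster ID -> list of (species, protein_id) pairs
    species: set of all species names under study
    min_coverage: hypothetical threshold, fraction of species required
    """
    core = []
    for cluster_id, members in clusters.items():
        # Count how many proteins each species contributes to this cluster.
        counts = Counter(sp for sp, _ in members)
        # Reject clusters in which any species appears more than once.
        if any(n > 1 for n in counts.values()):
            continue
        # Keep the cluster if enough of the species are represented.
        covered = len(set(counts) & set(species))
        if covered / len(species) >= min_coverage:
            core.append(cluster_id)
    return sorted(core)

# Toy data (invented for illustration only).
clusters = {
    "c1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
    "c2": [("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c2")],
    "c3": [("A", "a4"), ("B", "b3")],
}
species = {"A", "B", "C"}

strict_core = single_copy_core(clusters, species)
relaxed_core = single_copy_core(clusters, species, min_coverage=0.6)
```

With the default threshold only clusters present exactly once in every species survive; lowering `min_coverage` admits clusters that are missing from some species, matching the abstract's "common to most members of a clade" definition.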
Alternative Titles
Full title
Unicore enables scalable and accurate phylogenetic reconstruction with structural core genes
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_3148057422
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_3148057422
Other Identifiers
E-ISSN
2692-8205
DOI
10.1101/2024.12.22.629535
How to access this item
https://www.proquest.com/docview/3148057422?pq-origsite=primo&accountid=13902