FAS: Assessing the similarity between proteins using multi-layered feature architectures

Motivation: Expert curation to differentiate between functionally diverged homologs and those that may still share a similar function routinely relies on the visual interpretation of domain architecture changes. However, the size of contemporary data sets, which integrate homologs from hundreds to thousands of species, calls for alternative solutions. Scoring schemes that evaluate domain architecture similarities can, in principle, help to automate this procedure. But existing schemes are often too simplistic in their similarity assessment, many require an a priori resolution of overlapping domain annotations, and those that allow overlaps in order to extend the set of annotation sources cannot account for redundant annotations. As a consequence, the gap between automated similarity scoring and similarity assessment based on visual architecture comparison is still too wide to make the integration of both approaches meaningful.

Results: Here, we present FAS, a scoring system for the comparison of multi-layered feature architectures that integrates information from a broad spectrum of annotation sources. Feature architectures are represented as directed acyclic graphs, and redundancies are resolved in the course of the comparison using a score maximization algorithm. A benchmark using more than 10,000 human-yeast ortholog pairs reveals that FAS consistently outperforms existing scoring schemes. Using three examples, we show how automated architecture similarity assessments can be routinely applied in the benchmarking of orthology assignment software, in the identification of functionally diverged orthologs, and in the identification of entries in protein collections that most likely stem from a faulty gene prediction.
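To make the graph idea in the abstract more concrete, the following is a minimal sketch of one way to resolve overlapping annotations by score maximization. It is not the FAS implementation: the `Feature` type, the per-feature `score` values, the rule that two features may follow each other only if they do not overlap, and the example feature names are all assumptions made for illustration. Each feature is treated as a node, an edge connects a feature to every feature that starts after it ends, and the highest-scoring path through the resulting directed acyclic graph is found by dynamic programming.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A hypothetical annotated feature on a protein sequence."""
    name: str
    start: int    # first residue covered (inclusive)
    end: int      # last residue covered (inclusive)
    score: float  # assumed per-feature score, invented for this sketch


def best_path(features: List[Feature]) -> Tuple[float, List[str]]:
    """Return the total score and names of the best chain of
    non-overlapping features (a maximum-weight path in the DAG)."""
    if not features:
        return 0.0, []
    # Sorting by start position yields a topological order, because an
    # edge i -> j exists only if feature i ends before feature j starts.
    order = sorted(features, key=lambda f: (f.start, f.end))
    n = len(order)
    best = [f.score for f in order]  # best path score ending at node j
    prev = [-1] * n                  # predecessor on that best path
    for j in range(n):
        for i in range(j):
            candidate = best[i] + order[j].score
            if order[i].end < order[j].start and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    # Trace back from the node where the best-scoring path ends.
    k = max(range(n), key=best.__getitem__)
    total = best[k]
    path = []
    while k != -1:
        path.append(order[k].name)
        k = prev[k]
    return total, path[::-1]


# Invented example: two redundant kinase annotations from different
# sources, a helix that overlaps both neighbours, and a downstream domain.
example = [
    Feature("pfam_kinase", 10, 100, 1.0),
    Feature("smart_kinase", 12, 98, 0.8),
    Feature("tm_helix", 90, 130, 0.3),
    Feature("sh2", 120, 200, 0.5),
]
total, chosen = best_path(example)
print(total, chosen)
```

In this toy case the two kinase annotations are redundant, so only the higher-scoring one can appear on the selected path. A scoring scheme that compares two proteins would presumably derive the scores from the comparison itself rather than from fixed per-feature values as assumed here.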

Metadata
Author: Julian Dosch, Holger Bergmann, Ngoc Vinh Tran, Ingo Ebersberger
URN: urn:nbn:de:hebis:30:3-730897
DOI: https://doi.org/10.1101/2022.09.01.506207
Parent Title (English): bioRxiv
Document Type: Preprint
Language: English
Date of Publication (online): 2022/09/03
Date of first Publication: 2022/09/03
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Release Date: 2023/06/22
Issue: 2022.09.01.506207
Page Number: 27
HeBIS-PPN: 509407935
Institutes: Biological Sciences
Interdisciplinary Institutions / Biodiversity and Climate Research Centre (BiK-F)
Dewey Decimal Classification: 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Collections: University publications
Licence (German): Creative Commons - CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives 4.0 International