Automated recognition of functional compound-protein relationships in literature

About this item

Full title

Automated recognition of functional compound-protein relationships in literature

Publisher

United States: Public Library of Science

Journal title

PLoS ONE, 2020-03, Vol. 15 (3), p. e0220925

Language

English

More information

Scope and Contents

Contents

Much effort has been invested in identifying protein-protein interactions using text mining and machine learning methods. Extracting functional relationships between chemical compounds and proteins from the literature has received much less attention, and no ready-to-use open-source software has so far been available for this task.
We created a new benchmark dataset of 2,613 sentences from abstracts, annotated with proteins, small molecules, and their relationships. Two kernel methods, the shallow linguistic kernel and the all-paths graph kernel, were applied to classify these relationships as functional or non-functional. Furthermore, the benefit of interaction verbs in sentences was evaluated.
In cross-validation on our benchmark dataset, the all-paths graph kernel (AUC value: 84.6%, F1 score: 79.0%) shows slightly better results than the shallow linguistic kernel (AUC value: 82.5%, F1 score: 77.2%). Both models achieve state-of-the-art performance in the research area of relation extraction. Combining the shallow linguistic and all-paths graph kernels increased the overall performance slightly further. We used each of the two kernels to identify functional relationships in all PubMed abstracts (29 million) and provide the results, including the recorded processing time.
The software for the tested kernels, the benchmark, the processed 29 million PubMed abstracts, all evaluation scripts, as well as the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline....

Identifiers

Primary Identifiers

Record Identifier

TN_cdi_doaj_primary_oai_doaj_org_article_bd52bdcda3eb4a59ac2ec64f731edda5

Permalink

https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_doaj_primary_oai_doaj_org_article_bd52bdcda3eb4a59ac2ec64f731edda5

Other Identifiers

ISSN

1932-6203

E-ISSN

1932-6203

DOI

10.1371/journal.pone.0220925
