College of Computing and Digital Media Dissertations

Ensemble labeling towards scientific information extraction (ELSIE)

Erin Murphy, DePaul University

Date of Award

Fall 11-9-2020

Degree Type

Thesis

Degree Name

Master of Science (MS)

School

School of Computing

First Advisor

Roselyne B. Tchoua, PhD

Second Advisor

Daniela Raicu, PhD

Third Advisor

Jacob Furst, PhD

Abstract

Extracting scientific facts from unstructured text is difficult because of the ambiguity of the language and the complexity of the scientific named entities and relations to be extracted. The extraction of polymer names and their properties illustrates this problem well. Even when the property is a temperature, identifying the polymer name associated with that temperature may require expertise because of acronyms, synonyms, and complicated naming conventions, and because new polymer names are being “introduced” to the vernacular as polymer science advances. While domain-specific machine learning toolkits exist that address these challenges, perhaps the greatest challenge is the lack of labeled data for training these machine learning models, since labeling is time-consuming, error-prone, and costly. Our work repurposes Snorkel, a data programming tool, in a novel approach: as a way to identify sentences that contain the relation of interest in order to generate training data, and as a first step towards extracting the entities themselves. We achieve 94% recall and demonstrate the importance of identifying complex sentences prior to extraction by comparing against a state-of-the-art domain-aware natural language processing toolkit. We also show that our system captures sentences missed by both the toolkit and the expert labelers.
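
The general idea of data programming mentioned in the abstract can be illustrated with a minimal sketch: several noisy heuristic "labeling functions" each vote on whether a sentence is likely to contain a polymer-temperature relation, and the votes are combined into one label. This is not the thesis's implementation. The three heuristics, their regular expressions, and all names below are hypothetical, and a simple majority vote is swapped in for the learned label model that Snorkel provides.

```python
import re
from collections import Counter

# Label values; ABSTAIN means a heuristic has no opinion on the sentence.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


def lf_has_temperature(sentence):
    """Hypothetical heuristic: a number followed by a temperature unit."""
    pattern = r"-?\d+(\.\d+)?\s*(°\s*C|K\b)"
    return POSITIVE if re.search(pattern, sentence) else ABSTAIN


def lf_has_polymer_like_name(sentence):
    """Hypothetical heuristic: a token starting with 'poly' (but not 'polymer')."""
    pattern = r"\bpoly(?!mer)\w+"
    return POSITIVE if re.search(pattern, sentence, re.IGNORECASE) else ABSTAIN


def lf_no_digits(sentence):
    """Hypothetical heuristic: without any digit there is no property value."""
    return NEGATIVE if not re.search(r"\d", sentence) else ABSTAIN


LABELING_FUNCTIONS = [lf_has_temperature, lf_has_polymer_like_name, lf_no_digits]


def majority_vote(sentence):
    """Combine the non-abstaining votes; ties and all-abstain yield ABSTAIN."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if counts[POSITIVE] == counts[NEGATIVE]:
        return ABSTAIN
    return POSITIVE if counts[POSITIVE] > counts[NEGATIVE] else NEGATIVE


if __name__ == "__main__":
    examples = [
        "The glass transition temperature of polystyrene is 100 °C.",
        "Polymers are widely used in packaging.",
        "Samples were weighed 3 times.",
    ]
    for text in examples:
        print(majority_vote(text), text)
```

Sentences labeled POSITIVE by such an ensemble would be candidates for training data or for downstream entity extraction; sentences on which every heuristic abstains stay unlabeled rather than being forced into a class.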

Recommended Citation

Murphy, Erin, "Ensemble labeling towards scientific information extraction (ELSIE)" (2020). College of Computing and Digital Media Dissertations. 25.
https://via.library.depaul.edu/cdm_etd/25

Included in

Data Science Commons, Polymer and Organic Materials Commons
