College of Computing and Digital Media Dissertations

BERT efficacy on scientific and medical datasets: a systematic literature review

Clayton Cohn, DePaul University

Date of Award

Winter 11-17-2020

Degree Type

Thesis

Degree Name

Master of Science (MS)

School

School of Computing

First Advisor

Peter Hastings

Second Advisor

Noriko Tomuro

Third Advisor

Roselyne Tchoua

Abstract

Bidirectional Encoder Representations from Transformers (BERT) [Devlin et al., 2018] has been shown to be effective at modeling a multitude of datasets across a wide variety of Natural Language Processing (NLP) tasks; however, little research has been done regarding BERT’s effectiveness at modeling domain-specific datasets. Scientific and medical datasets in particular present a difficult challenge in NLP, as these corpora are often rife with technical jargon that is largely absent from the canonical corpora on which BERT and other transfer learning models were originally trained. This thesis is a Systematic Literature Review (SLR) of twenty-seven studies that were selected to address the various methods of implementation when applying BERT to scientific and medical datasets. These studies show that despite the datasets’ esoteric subject matter, BERT can be effective at a wide range of tasks when applied to domain-specific datasets. Furthermore, these studies show that domain-specific pretraining, either through additional pretraining or through the use of domain-specific BERT derivatives such as BioBERT [Lee et al., 2019], can further augment BERT’s performance on scientific and medical texts.

Recommended Citation

Cohn, Clayton, "BERT efficacy on scientific and medical datasets: a systematic literature review" (2020). College of Computing and Digital Media Dissertations. 24.
https://via.library.depaul.edu/cdm_etd/24

Included in

Artificial Intelligence and Robotics Commons
