College of Computing and Digital Media Dissertations

Date of Award

Fall 9-24-2024

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

School

School of Computing

First Advisor

Jacob Furst, PhD

Second Advisor

Daniela Stan Raicu, PhD

Third Advisor

Roselyne Tchoua, PhD

Fourth Advisor

Samuel G. Armato III, PhD

Abstract

A dataset becomes meaningful for analysis when it contains representative features. Machine and deep learning models rely on annotated instances for training. The annotation process is usually carried out either by humans (experts or crowdsourcing) or by models. In many cases, variability between human evaluators (inter-observer variability) introduces uncertainty into the learning process. When large datasets lack reliable labels, this inter-observer variability can be quantified with different methods to estimate the ground-truth label (i.e., the reference standard label) for model learning.
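As an illustrative sketch only (not taken from the dissertation), inter-observer variability and a simple reference standard can be quantified from a panel's ratings, for example with per-case agreement and a majority vote; the number of annotators and the labels below are hypothetical.

    import numpy as np

    # Hypothetical panel: rows are cases, columns are annotators (e.g., four readers),
    # values are discrete labels (e.g., malignancy ratings collapsed to 0/1).
    annotations = np.array([
        [1, 1, 1, 1],   # full agreement
        [1, 0, 1, 1],   # partial agreement
        [0, 1, 0, 1],   # split panel (non-consensus)
    ])

    # Per-case agreement: fraction of annotators who chose the majority label.
    agreement = np.array([np.bincount(row).max() / row.size for row in annotations])

    # A simple reference standard estimate: majority vote per case
    # (ties fall to the lower label).
    reference = np.array([np.bincount(row).argmax() for row in annotations])

    print(agreement)   # [1.   0.75 0.5 ]
    print(reference)   # [1 1 0]

More elaborate estimators (e.g., chance-corrected agreement statistics or probabilistic label models) follow the same idea of turning disagreement among annotators into a measurable quantity.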

In health care, with the rise of artificial intelligence in clinical decision support systems, it is critical to pay attention to labels assigned by non-consensus panels during diagnosis. When a panel reaches agreement on a diagnosis, patients benefit because they have a better chance of pursuing treatment; a lack of agreement, on the other hand, leads to further evaluations and tests. Note, however, that complete agreement does not necessarily indicate correctness. Consequently, the reference standard labels used for model learning may not be the optimal solution for achieving the desired patient outcome.

The aim of the proposed work is to explore the relationship between uncertainty in human annotations and uncertainty in models' learning and predictions. In particular, the focus is on how uncertain labels that come from non-consensus panels affect learning in machine and deep learning models, establishing the limits of learning from images and providing recommendations for achieving optimal outputs. The study was conducted on two datasets: a medical dataset, the Lung Image Database Consortium (LIDC), and the Modified National Institute of Standards and Technology (MNIST) dataset, included to assess the generalizability of the findings. Several investigations were conducted on the LIDC to quantify how much of the observer agreement is related to image content. In addition, multiple approaches, such as Monte Carlo dropout and different train, validation, and test splits alongside the baseline model, were used to generate variation in model predictions for comparison with human-assigned labels. The results demonstrate that label uncertainty in datasets leads to uncertainty in the models.
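As a minimal sketch of one of the techniques named above (assuming PyTorch and a placeholder architecture, neither of which is specified by the abstract), Monte Carlo dropout generates variation in a model's predictions by keeping dropout active at inference time and summarizing repeated stochastic forward passes.

    import torch
    import torch.nn as nn

    # Placeholder classifier (not the dissertation's model): a small network with
    # dropout so that repeated forward passes produce different predictions.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),   # e.g., MNIST-sized 28x28 inputs
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(128, 2),         # e.g., two diagnostic classes
    )

    def mc_dropout_predict(model, x, n_passes=50):
        """Monte Carlo dropout: keep dropout active at inference time and
        return the mean and standard deviation of the softmax outputs."""
        model.train()  # train mode keeps dropout layers stochastic
        with torch.no_grad():
            probs = torch.stack(
                [torch.softmax(model(x), dim=1) for _ in range(n_passes)]
            )
        return probs.mean(dim=0), probs.std(dim=0)

    x = torch.randn(8, 1, 28, 28)           # placeholder batch of images
    mean_prob, std_prob = mc_dropout_predict(model, x)
    print(mean_prob.shape, std_prob.shape)  # both (8, 2); the std reflects prediction variability

The per-case standard deviation of the predicted probabilities can then be compared against the per-case disagreement among human annotators.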
