Date of Award
Fall 9-24-2024
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
School
School of Computing
First Advisor
Jacob Furst, PhD
Second Advisor
Daniela Stan Raicu, PhD
Third Advisor
Roselyne Tchoua, PhD
Fourth Advisor
Samuel G. Armato III, PhD
Abstract
A dataset becomes meaningful for analysis when it contains representative features. Machine and deep learning models rely on annotated instances for training. The annotation process is usually carried out either by humans (experts or crowdsourcing) or by models. In many cases, variability among human annotators (inter-observer variability) introduces uncertainty into the learning process. Because large datasets often lack reliable labels, inter-observer variability can be quantified with different methods to estimate the ground-truth label (i.e., the reference standard label) used for model learning.
In health care, with the rise of artificial intelligence in clinical decision support systems, it is critical to pay attention to labels assigned by non-consensus panels during diagnosis. When a panel reaches agreement on a diagnosis, patients benefit because they have a better chance of pursuing treatment; a lack of agreement, on the other hand, leads to further evaluations and tests. Note, however, that complete agreement does not necessarily indicate correctness. Consequently, the reference standard labels used for model learning may not be the optimal solution for achieving the desired patient outcome.
The insight of the proposed work is to explore the relationship between uncertainty in human annotations and uncertainty in models' learning and predictions. In particular, the focus is on how uncertain labels from non-consensus panels affect learning in machine and deep learning models, identifying the limits of learning from images and offering recommendations for achieving optimal outputs. The study was conducted using two datasets: a medical dataset, the Lung Image Database Consortium (LIDC), and the Modified National Institute of Standards and Technology (MNIST) dataset, chosen to test the generalizability of the findings. Several investigations were conducted on the LIDC to quantify how much of the observer agreement is related to image content. In addition, multiple models were used to generate variation in model predictions for comparison with human-assigned labels, including Monte Carlo dropout and different train, validation, and test splits, alongside the baseline model. The results demonstrate that label uncertainty in datasets leads to model uncertainty.
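The abstract mentions Monte Carlo dropout as one way to generate variation in model predictions. The sketch below illustrates that general technique only; it is a minimal, hypothetical example assuming a small Keras classifier on MNIST-shaped inputs, and it is not the dissertation's implementation (all layer sizes, dropout rate, and sample counts are illustrative assumptions).

```python
# Minimal sketch of Monte Carlo dropout for estimating prediction uncertainty.
# Illustrative only; architecture and parameters are assumptions, not the
# dissertation's actual model.
import numpy as np
from tensorflow import keras

def build_model(input_shape=(28, 28), num_classes=10):
    # Small classifier with a dropout layer; keeping dropout active at
    # inference time makes each forward pass stochastic.
    return keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

def mc_dropout_predict(model, x, n_samples=30):
    # Run n_samples stochastic forward passes (training=True keeps dropout on)
    # and return the mean prediction plus the per-class standard deviation as
    # a simple uncertainty estimate.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

# Example usage on a placeholder batch (an untrained model; in practice the
# model would first be fit on the labeled data).
model = build_model()
x = np.random.rand(4, 28, 28).astype("float32")
mean_pred, pred_std = mc_dropout_predict(model, x)
print(mean_pred.shape, pred_std.shape)  # (4, 10) (4, 10)
```

Under this setup, higher per-class standard deviation across the stochastic passes is read as greater model uncertainty, which can then be compared against the disagreement observed among human annotators.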
Recommended Citation
Almansour, Amal, "Exploring how uncertain labels from non-consensus panels affect machine learning" (2024). College of Computing and Digital Media Dissertations. 61.
https://via.library.depaul.edu/cdm_etd/61