Blog

The Risk of Imbalanced Datasets in Chest X-ray Image-based Diagnostics

March 7, 2022

Chest X-ray images often contain overlaid text, called annotations, that identifies the hospital, the patient, and/or the radiologist.

Deep learning models, most of which are black boxes, can therefore rely on these annotations rather than on the disease-specific regions of the image, effectively cheating their way to higher accuracy. This undesirable behavior is especially visible when images from multiple hospitals are combined. For example, we hypothesize that if the majority of Pneumonia cases seen during training come from one hospital (H1), the model will take a shortcut: it will learn to recognize the hospital in test images and label every image from H1 as Pneumonia, thereby achieving higher accuracy.
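The shortcut can be made concrete with a little arithmetic. The sketch below computes the accuracy of a hypothetical rule that never looks at the lungs and simply predicts Pneumonia whenever an image comes from H1. The function name, the counts, and the reading of "90%" and "60%" imbalance as the share of Pneumonia cases drawn from H1 (with the normal cases mirrored in a second hospital, H2) are illustrative assumptions, not figures from the study.

```python
def shortcut_accuracy(pos_h1, pos_h2, neg_h1, neg_h2):
    """Accuracy of a source-only rule: predict Pneumonia iff the image is from H1.

    Arguments are image counts (hypothetical, for illustration only):
      pos_h1, pos_h2 -- Pneumonia images from hospitals H1 and H2
      neg_h1, neg_h2 -- non-Pneumonia images from hospitals H1 and H2
    """
    # The rule is right on Pneumonia images from H1 and on normal images from H2.
    correct = pos_h1 + neg_h2
    total = pos_h1 + pos_h2 + neg_h1 + neg_h2
    return correct / total


# Assumed splits: 1000 Pneumonia and 1000 normal images in each scenario.
scenarios = {
    "90% imbalance": (900, 100, 100, 900),
    "60% imbalance": (600, 400, 400, 600),
    "balanced": (500, 500, 500, 500),
}

for name, counts in scenarios.items():
    print(f"{name}: shortcut accuracy = {shortcut_accuracy(*counts):.2f}")
```

Under these assumptions the source-only rule is correct on 90% of images in the first scenario, but only at chance level once the labels are balanced across hospitals, so the incentive to learn the shortcut disappears.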

Heatmaps of models for 90% (blue) and 60% (yellow) label-imbalance demonstrating spurious learning. With more imbalance, the reliance on the source annotations increases.

In this work, we focus on the Pneumonia detection problem and use two publicly available datasets, CheXpert and ChestX-ray14, to test this hypothesis. We use a more transparent, self-explainable method called Prototypical Relevance Propagation, which explains its decisions while making them, to detect when models rely heavily on the annotations instead of the lungs. We further show that our self-explainable method captures other kinds of artifacts, for example chest tubes and glucose bottles, which encourages the use of more transparent models in intricate areas such as medical image diagnosis.

Publication

Demonstrating The Risk of Imbalanced Datasets in Chest X-ray Image-based Diagnostics by Prototypical Relevance Propagation

February 1, 2022

Abstract

The recent trend of integrating multi-source Chest X-Ray datasets to improve automated diagnostics raises concerns that models learn to exploit source-specific correlations to improve performance by recognizing the source domain of an image rather than the medical pathology. We hypothesize that this effect is enforced by and leverages label-imbalance across the source domains, i.e., prevalence of a disease corresponding to a source. Therefore, in this work, we perform a thorough study of the effect of label-imbalance in multi-source training for the task of pneumonia detection on the widely used ChestX-ray14 and CheXpert datasets. The results highlight and stress the importance of using more faithful and transparent self-explaining models for automated diagnosis, thus enabling the inherent detection of spurious learning. They further illustrate that this undesirable effect of learning spurious correlations can be reduced considerably when ensuring label-balanced source domain datasets.
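One simple way to obtain label-balanced source domains is to downsample every (source, label) group to the size of the smallest one, so that disease prevalence is the same in each source. The sketch below illustrates that idea only; it is not the procedure used in the paper, and the record layout `(image_id, source, label)` and the function name are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def balance_labels_per_source(records, seed=0):
    """Downsample so that every (source, label) group has the same size.

    records: list of (image_id, source, label) tuples (assumed layout).
    Note: a source that has no images at all for some label forms no group
    for it, so it cannot be balanced by this function.
    """
    rng = random.Random(seed)

    # Group records by the (source, label) pair they belong to.
    groups = defaultdict(list)
    for rec in records:
        _, source, label = rec
        groups[(source, label)].append(rec)

    # Every group is cut down to the size of the smallest group.
    target = min(len(group) for group in groups.values())

    balanced = []
    for key in sorted(groups):  # sorted for a reproducible order
        balanced.extend(rng.sample(groups[key], target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced pool: most Pneumonia (label 1) images come from H1.
pool = (
    [(f"h1_pos_{i}", "H1", 1) for i in range(90)]
    + [(f"h1_neg_{i}", "H1", 0) for i in range(10)]
    + [(f"h2_pos_{i}", "H2", 1) for i in range(10)]
    + [(f"h2_neg_{i}", "H2", 0) for i in range(90)]
)

balanced = balance_labels_per_source(pool)
print(Counter((source, label) for _, source, label in balanced))
```

Downsampling discards data, so in practice one might prefer reweighting or stratified sampling; the point here is only that after balancing, knowing the source tells a model nothing about the label.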