Medical Image Classification

Self-supervised learning for the classification of medical images


Motivation

What are the issues with medical imaging?

Labeling Difficulties: Annotating medical images requires trained specialists such as radiologists, which makes labeled data expensive and slow to produce.

Limited Dataset Size: Privacy regulations and the rarity of many conditions keep public medical datasets small compared to natural-image benchmarks.


Experiment

[Figure 1]

The MedMNIST [5] collection consists of 12 preprocessed datasets spanning modalities such as CT, X-ray, ultrasound, and OCT. The datasets cover a variety of tasks, including binary, multi-class, and multi-label classification as well as ordinal regression, and range in size from roughly 100 to over 100,000 samples. Each dataset comes with predefined training, validation, and test splits; we pooled the 12 training splits (337,029 samples) to pre-train the model and evaluated its performance on the validation and test splits.
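The pooling of the 12 training splits into a single pre-training set can be sketched as below. This is a minimal, self-contained illustration: the `ConcatDataset` class is a stand-in for `torch.utils.data.ConcatDataset`, and the per-dataset sample lists are dummies (in practice each would be a dataset loaded via the `medmnist` package; the dataset names are real, the sizes here are not).

```python
# Sketch of pooling the 12 MedMNIST-2D training splits for pre-training.
# The dummy lists below replace real datasets so the example is runnable
# without downloads; sizes are placeholders, not the true split sizes.

class ConcatDataset:
    """Minimal stand-in for torch.utils.data.ConcatDataset."""
    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative sizes map a global index to (dataset, local index).
        self.cum = []
        total = 0
        for d in self.datasets:
            total += len(d)
            self.cum.append(total)

    def __len__(self):
        return self.cum[-1]

    def __getitem__(self, idx):
        for d, bound in zip(self.datasets, self.cum):
            if idx < bound:
                return d[idx - (bound - len(d))]
        raise IndexError(idx)

# The 12 MedMNIST-2D datasets; 10 dummy samples each for illustration.
names = ["pathmnist", "chestmnist", "dermamnist", "octmnist",
         "pneumoniamnist", "retinamnist", "breastmnist", "bloodmnist",
         "tissuemnist", "organamnist", "organcmnist", "organsmnist"]
dummy = {n: [f"{n}_{i}" for i in range(10)] for n in names}

pretrain_pool = ConcatDataset(dummy.values())
print(len(pretrain_pool))  # 120 dummy samples across the 12 datasets
```

With the real splits, the same construction yields the 337,029-sample pre-training pool described above.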


We report two standard evaluation metrics: Accuracy (ACC) and Area Under the ROC Curve (AUC). AUC is a threshold-independent metric commonly used to evaluate models that output continuous prediction scores, whereas ACC is a threshold-based metric that evaluates discrete predicted labels. As a result, ACC is more sensitive to class imbalance than AUC. Since our experiments span datasets of varying size and diversity, reporting both ACC and AUC gives a more complete picture. Although many other metrics exist, we report ACC and AUC so that our results can be benchmarked directly against those in [6]. The ACC and AUC results for each dataset are presented in Table 2.

Table 2 compares the proposed method against previous state-of-the-art (SOTA) methods in terms of AUC and ACC on each MedMNIST-2D dataset. The proposed model achieves similar or slightly lower scores than the latest methods. Note, however, that MedViT uses 224×224 images while we use 96×96 images, so our model incurs a lower computational cost during training. Furthermore, our model matches the performance of the ViT-based models while requiring significantly less computation than the CvT-based models.
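A back-of-the-envelope calculation illustrates why the smaller input size matters for a ViT-style model. The patch size of 16 below is an assumption for illustration (the actual models compared in Table 2 may use different patchings); the point is that sequence length grows with image area, and self-attention cost grows quadratically with sequence length.

```python
def num_tokens(image_size, patch_size=16, cls_token=True):
    """Sequence length seen by the transformer: one token per patch,
    plus an optional classification token."""
    per_side = image_size // patch_size
    return per_side * per_side + (1 if cls_token else 0)

t96, t224 = num_tokens(96), num_tokens(224)
print(t96, t224)                    # 37 vs 197 tokens
print(round((t224 / t96) ** 2, 1))  # self-attention is O(n^2) in tokens: ~28x
```

Under this assumption, a 224×224 input produces roughly 5× more tokens than a 96×96 input, and hence on the order of 28× more work in the attention layers.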