Cross-modality Self-Supervised Learning from optical coherence tomography and color fundus for computer-assisted diagnosis of AMD
Language
EN
Conference paper
This item was published in
ARVO Imaging in the Eye Conference Abstract, ARVO 2024, May 5, 2024, Seattle. Published July 1, 2024, vol. 65, no. 9
English Abstract
Purpose: To improve computer-assisted diagnosis methods by leveraging the multimodal unlabeled datasets naturally produced in ophthalmology practice. We apply Self-Supervised Learning (SSL) pretraining principles to jointly learn modality-invariant features from Optical Coherence Tomography (OCT) and Color Fundus Photography (CFP).

Methods: We used a dataset of 450 pairs of OCT and CFP images, sourced from the Alienor (Antioxydants, Lipides Essentiels, Nutrition et maladies OculaiRes) epidemiological study. Each pair was captured at the same time and from the same eye. CFP images were 3-channel, 1920x991-pixel images; OCT acquisitions consisted of 19 B-scans, each a 32-bit, 1024x496-pixel 2D image. A contrastive self-supervised objective was adapted from SimCLR (Chen et al., 2020), using the normalized temperature-scaled cross-entropy (NT-Xent) loss to maximize cross-modality similarity within pairs. A Convolutional Neural Network (CNN) based on EfficientNetB3 (Tan and Le, 2020) was used as the encoder. A trainable projection head of 3 non-linear layers was added on top to obtain a 512-dimensional shared modality space, and a small module of 2 trainable convolutional layers matched OCT images to the same shape as CFP. The network was trained on augmented views for 140 epochs, using the Adam optimizer and a cosine annealing learning rate scheduler, with a batch size of 16 pairs and 2 views per image, totaling 64 single views per loss computation.

Results: We compiled labeled, open-access datasets into a binary-task dataset of 2200 images indicating the presence or absence of AMD (drusen, exudation, hemorrhage). Our trained SSL model was fine-tuned and evaluated on this dataset using 5-fold cross-validation. First, we froze the EfficientNetB3 encoder and fine-tuned a small 3-layer MLP on top, achieving an AUC of 0.810 (+/- 0.040). Subsequently, the encoder was unfrozen and fully fine-tuned, achieving an AUC of 0.866 (+/- 0.022). This model was compared to and found to outperform other pretraining methods, including ImageNet pretraining (AUC 0.814 +/- 0.031), self-supervision on the CFP modality alone (AUC 0.840 +/- 0.020), and even supervised pretraining on Alienor (AUC 0.827 +/- 0.033).

Conclusions: Our study demonstrates the feasibility of learning valuable features for AMD classification without annotations, using a cross-modality contrastive learning objective on OCT and CFP images.
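The following is a minimal sketch, not the authors' code, of the cross-modality contrastive setup described in Methods. It assumes PyTorch and torchvision's EfficientNetB3; the adapter channel counts, projection-head hidden sizes, and temperature are not stated in the abstract and are illustrative assumptions, and the 2-views-per-image augmentation is simplified to one view per modality.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import efficientnet_b3

class OCTAdapter(nn.Module):
    """Two trainable conv layers mapping a 19-B-scan OCT stack (treated
    here as 19 channels, an assumption) to a 3-channel, CFP-shaped input."""
    def __init__(self, in_ch=19, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(8, out_ch, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class ProjectionHead(nn.Module):
    """Three non-linear layers onto the 512-d shared modality space."""
    def __init__(self, in_dim=1536, hidden=1024, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, out_dim),
        )
    def forward(self, x):
        return self.net(x)

def nt_xent(z, pair_index, temperature=0.1):
    """Normalized temperature-scaled cross entropy: each embedding's
    positive is the view from the paired acquisition (pair_index[i]);
    every other view in the batch acts as a negative."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature          # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))      # exclude self-similarity
    return F.cross_entropy(sim, pair_index)

encoder = efficientnet_b3(weights=None)
encoder.classifier = nn.Identity()          # keep the 1536-d pooled features
adapter, head = OCTAdapter(), ProjectionHead()

def training_step(cfp, oct_vol):
    """cfp: (B, 3, H, W); oct_vol: (B, 19, H, W), assumed resized to CFP size."""
    views = torch.cat([cfp, adapter(oct_vol)], dim=0)   # 2B single views
    z = head(encoder(views))                            # (2B, 512) embeddings
    B = cfp.size(0)
    # Positives: CFP i <-> OCT i (rows 0..B-1 pair with rows B..2B-1).
    pair_index = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return nt_xent(z, pair_index)
```

Treating each CFP/OCT pair as the NT-Xent positive, with all other views in the batch as negatives, is what pushes the encoder toward modality-invariant features; in the study itself this objective is computed over 64 augmented views per batch and optimized with Adam under a cosine annealing schedule, as stated in Methods.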
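A similarly hedged sketch of the two-stage evaluation from Results: a frozen-encoder probe with a 3-layer MLP head, followed by full fine-tuning. The hidden sizes, learning rate, and epoch count are assumptions, and data loading and the 5-fold split are omitted.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

def train_and_eval(encoder, train_loader, val_loader,
                   frozen=True, lr=1e-4, epochs=20):
    """Stage 1 (frozen=True): train only the MLP head on 1536-d features.
    Stage 2 (frozen=False): unfreeze the encoder and fine-tune end-to-end."""
    head = nn.Sequential(
        nn.Linear(1536, 256), nn.ReLU(inplace=True),   # 3-layer MLP head
        nn.Linear(256, 64), nn.ReLU(inplace=True),     # (hidden sizes assumed)
        nn.Linear(64, 1),                              # binary AMD logit
    )
    for p in encoder.parameters():
        p.requires_grad = not frozen
    encoder.train(not frozen)                          # eval mode when frozen
    params = list(head.parameters())
    if not frozen:
        params += list(encoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in train_loader:          # x: images, y: 0/1 AMD labels
            loss = loss_fn(head(encoder(x)).squeeze(1), y.float())
            opt.zero_grad(); loss.backward(); opt.step()
    # Evaluate with AUC, the metric reported in the abstract.
    encoder.eval(); head.eval()
    ys, ps = [], []
    with torch.no_grad():
        for x, y in val_loader:
            ps.append(torch.sigmoid(head(encoder(x))).squeeze(1))
            ys.append(y)
    return roc_auc_score(torch.cat(ys).numpy(), torch.cat(ps).numpy())
```

Running this once per fold and averaging the returned AUCs would correspond to the protocol behind the reported 0.810 (frozen probe) and 0.866 (full fine-tuning) figures.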
This abstract was presented at the 2024 ARVO Imaging in the Eye Conference, held in Seattle, WA, May 4, 2024.