
Authors

Lehan Wang, Weihang Dai, Mei Jin, Chubin Ou, Xiaomeng Li

Abstract

Optical Coherence Tomography (OCT) is a novel and effective screening tool for ophthalmic examination. Since collecting OCT images is more expensive than fundus photographs, existing methods use multi-modal learning to complement limited OCT data with additional context from fundus images. However, the multi-modal framework requires eye-paired datasets of both modalities, which is impractical for clinical use. To address this problem, we propose a novel fundus-enhanced disease-aware distillation model (FDDM) for retinal disease classification from OCT images. Our framework enhances the OCT model during training by utilizing unpaired fundus images and does not require fundus images during testing, which greatly improves the practicality and efficiency of our method for clinical use. Specifically, we propose a novel class prototype matching to distill disease-related information from the fundus model to the OCT model and a novel class similarity alignment to enforce consistency between the disease distributions of both modalities. Experimental results show that our proposed approach outperforms single-modal, multi-modal, and state-of-the-art distillation methods for retinal disease classification.
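For intuition on the two distillation components named in the abstract, here is a minimal pure-Python sketch of class-level prototypes and a KL-based alignment term. The function names, the per-class averaging, and the softmax/KL details are our illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def class_prototypes(features, labels, num_classes):
    """Average the feature vectors of each class into one prototype,
    so distillation matches class-level statistics rather than
    individual (noisy) instances."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        counts[y] += 1
        for i, v in enumerate(f):
            sums[y][i] += v
    return [[v / max(counts[c], 1) for v in sums[c]] for c in range(num_classes)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, e.g. the
    teacher's and student's class-similarity distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

In this reading, prototype matching pulls the student's per-class prototypes toward the teacher's, while similarity alignment penalizes the KL divergence between the two models' softmax-normalized inter-class similarity scores.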

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_60

SharedIt: https://rdcu.be/dnwMk

Link to the code repository

https://github.com/xmed-lab/FDDM

Link to the dataset(s)

https://github.com/li-xirong/mmc-amd

https://ieee-dataport.org/open-access/retinal-fundus-multi-disease-image-dataset-rfmid


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper proposes FDDM, a distillation model that addresses the need for paired fundus and OCT modalities in eye disease classification. Specifically, a teacher network extracts disease-specific features from fundus data and transfers them to an OCT student model without relying on paired training data. Inference requires only OCT data. The approach is evaluated on an in-house dataset and the public MMC-AMD dataset, and compared with single-modal, multi-modal, and knowledge distillation methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed approach enables multi-modal distillation for eye disease classification using unpaired fundus and OCT datasets. Inference can be performed with single-modal OCT data alone.
    2. The proposed CPM module matches class-level features rather than individual instances, reducing instance-level noise. The proposed CSA module exploits the knowledge contained in the relationships among different classes, which is then transferred to the student OCT model.
    3. The approach can extract knowledge from multiple public fundus datasets and achieves classification of 11 types of eye disease.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Fig. 2 (framework overview) could be improved.
    2. The train/test split ratio and the per-disease sample counts should be reported. In the public MMC-AMD experiment, the compared methods should include more SOTA approaches.
    3. The order of Sections 3.3 and 3.4 could be switched.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that code will be made available upon acceptance; I think the results are reproducible with public resources.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Comparison with SOTA methods on public datasets is lacking.
    2. The train/test ratio for the in-house dataset should be reported, and it would be better to also report kappa metrics.
    3. Fig. 2 may need elaboration.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is an interesting paper that solves the issue of needing paired data in multi-modal eye disease classification tasks.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors propose a new distillation model that exploits a retinal multimodal imaging context through a teacher-student strategy to further refine pathological analysis in ophthalmology. The method is interesting and offers considerable results, being tested on both public and private datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Exploiting multimodal imaging contexts is of interest in ophthalmology. The proposed method is sound and offers improved results, and it was validated on both public and private datasets. The comparison with the state of the art positions the proposal well, demonstrating the contribution of exploiting the multimodal context despite using only one modality at inference.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The datasets used are not especially large and do not cover many different pathological scenarios. Further experiments with larger datasets, including a sufficient analysis of the impact on particular diseases, would reinforce the conclusions of the work. The authors state that they do not use paired datasets, but there are some doubts about how the datasets were prepared and used, detailed below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is reproducible, with enough detail in the manuscript to understand the methodology. The authors also indicate their commitment to publishing the code if the work is accepted. The method was validated on a private dataset, but results on a public dataset are also provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I have some specific additional comments of the manuscript that I would like to indicate:

    When you say in the introduction “with a single modality”, what do you refer to? A single cross-sectional cut? Afterwards it is said that OCT is inadequate compared with cheap fundus images, and that is not true: the problem is using only one cross-sectional image, but when entire 3D OCT scans are provided there is a much greater amount of information. When you indicate “without relying on paired training data”, what exactly do you mean? Not paired at all, not even by pathology? Do you feed the training with images from different patients and different diseases? I find it very interesting in this scenario, even for future work, to study data that is not paired at all (patients, diseases, capture devices, etc.), but also images paired by pathological scenario, images paired by patient, and registered images. This would analyze the potential of this strategy in the domain.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the previous comments, I find this work interesting for the audience of the MICCAI conference.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    (1) A novel fundus-enhanced disease-aware distillation model for retinal disease classification that employs class prototype matching and class similarity alignment. (2) The proposed method enables flexible knowledge transfer from any publicly available fundus dataset, thereby reducing the cost of collecting multi-modal data. (3) Good performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Proposes a novel fundus-enhanced disease-aware distillation model, FDDM, for retinal disease classification. (2) Incorporates class prototype matching and class similarity alignment. (3) Does not require paired instances for multi-modal training or inference. (4) Significantly reduces the prerequisites for clinical application, making retinal disease diagnosis more cost-effective. (5) Outperforms existing baselines considerably, as demonstrated in extensive experiments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The KL distillation method is not new. Please refer to the paper: Shu, Changyong, et al. “Channel-wise knowledge distillation for dense prediction.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If the dataset is made publicly available, reproducibility will be supported.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Cite the original KL distillation papers. (2) It would be beneficial if the dataset could be made publicly available.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental results are comprehensive and promising. The proposed method effectively addresses the challenge of unpaired modalities.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a novel fundus-enhanced disease-aware distillation model to address the challenge of needing paired fundus and OCT modalities for eye disease classification. The designed model contains a teacher network that extracts disease-specific features from fundus data and transfers them to an OCT student network. Inference requires only OCT data. The method is novel and performs well on both public and private datasets. The paper is well written with clear logic, and the extensive experiments clearly illustrate the advantages of the method. I recommend accepting this submission. The authors should address the reviewers’ detailed comments in the camera-ready manuscript.




Author Feedback

We appreciate the reviewers for their valuable feedback. Overall, the reviewers considered our motivation clear and novel (R3), and our model interesting (R1, R2) and cost-effective (R3). All reviewers (R1, R2, R3) recognized the effectiveness of our method and the significance of removing the requirement for paired data at inference. Furthermore, the reviewers commended our commitment to releasing our dataset and code to the public, recognizing the benefits this brings to the research community. The remaining concerns relate to the details of our method (R1), dataset specifics (R2), and additional comparisons on public datasets (R2). We address these concerns below with comprehensive explanations.

[R1] Misunderstanding on “OCT is inadequate compared to cheap fundus images” We concur with R1 that OCT, owing to its 3D nature, inherently contains more comprehensive information compared to fundus images for a single patient. However, when we mention that “OCT is inadequate compared to cheap fundus images,” we are emphasizing the relative cost difference in data collection. Fundus data collection is generally less expensive compared to OCT, leading to a limitation in the availability of OCT data compared to fundus photos.

[R1] Clarifications on “paired data” Our method does not rely on pairing operations: we do not depend on paired images from individual patients or on strict pairing of images by pathological scenario. Instead, we independently select images of each disease for each modality during the training stage. This allows us to obtain class-level features without explicit pairing. It is important to note that our strategy does require the datasets to share a label space, but it does not require images paired by patient or matched one-to-one by disease instance.
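The class-wise unpaired sampling described above can be sketched as follows. This is a minimal illustration under our own assumptions about the data layout (per-class lists of image IDs); the function name and structure are hypothetical, not the authors' code.

```python
import random

def sample_unpaired_batch(fundus_by_class, oct_by_class, classes, rng=None):
    """For each shared class label, independently draw one fundus and one
    OCT image ID. The two draws share only the disease label: they may come
    from different patients, devices, or even different datasets."""
    rng = rng or random.Random()
    batch = []
    for c in classes:
        fundus_id = rng.choice(fundus_by_class[c])
        oct_id = rng.choice(oct_by_class[c])
        batch.append((fundus_id, oct_id, c))
    return batch
```

Because the two modalities are matched only at the class level, any public fundus dataset with overlapping disease labels can serve as teacher data.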

[R2] More comparisons on the public dataset. The methods we compared with are the best reported results on the public dataset MMC-AMD. We also reproduced the traditional distillation methods of [21] and [5] on the public dataset for a more comprehensive comparison. We enrich the evaluation on MMC-AMD below, using MAP (%) as the metric:

| Method | OCT CNN | Two-Stream CNN [24] | FitNet [21] | KD [5] | Ours |
| MAP (%) | 87.98 | 86.91 | 90.72 | 90.29 | 92.29 |

Our proposed method outperforms these methods with a MAP of 92.29%, which is 2.00% (92.29% vs 90.29%) and 1.57% (92.29% vs 90.72%) higher than the MAPs of the methods in [5] and [21], respectively.

[R2] Train and test ratio for the in-house dataset. For our in-house dataset, we maintained a training-to-test ratio of approximately 8:2, with the split performed at the patient level. To ensure robustness, results were reported using five-fold cross-validation.

[R2] Kappa metrics. Our method achieved a kappa of 56.98% on our private dataset, a substantial improvement over the best-performing compared method, OCT CNN, which reported a kappa of 51.51%. This further affirms the effectiveness of our proposed method.

[R2] Elaborate figure and switch paragraph order. We thank R2 for the suggestions, and we will consider them in our revision.

[R3] Cite paper; code and data availability. We thank R3 for the suggestions and will cite this paper in our final version. We are committed to making our code, model, and data openly available to the research community.


