
Authors

Yanmiao Bai, Jinkui Hao, Huazhu Fu, Yan Hu, Xinting Ge, Jiang Liu, Yitian Zhao, Jiong Zhang

Abstract

Ultra-wide-field (UWF) fundus photography is a new imaging technique that provides a broader field of view, and it has become a popular and effective tool for the screening and diagnosis of many eye diseases, such as diabetic retinopathy (DR). However, it is practically challenging to train a robust deep learning model for DR grading in UWF images, due to the limited scale of data and manual annotations. By contrast, large-scale, high-quality regular color fundus photography datasets are available in the research community, with either image-level or pixel-level annotations. Consequently, we propose an Unsupervised Lesion-aware TRAnsfer learning framework (ULTRA) for DR grading in UWF images, which leverages a large amount of publicly available, well-annotated regular color fundus images. Inspired by the clinical identification of DR severity, i.e., the decision-making process of ophthalmologists based on the type and number of associated lesions, we design an adversarial lesion map generator to provide auxiliary lesion information for DR grading. A Lesion External Attention Module (LEAM) is introduced to integrate the lesion features into the model, allowing a relatively explainable DR grading. Extensive experimental results show the proposed method is superior to state-of-the-art methods.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_54

SharedIt: https://rdcu.be/cVRsm

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents an unsupervised method for DR grading in ultra-widefield retinal images by utilizing additional narrow-field fundus photography. The paper considers an auxiliary task of lesion detection in narrow-field images, because DR grading is closely related to the type and number of lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method does not require ground truth labeling of UWF images. Instead, it takes advantage of existing labeled datasets of narrow-field images.
    2. The ablation studies validate the effectiveness of the proposed modules.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The loss function for UWF lesion detection is quite weak. It is not very clear how accurate the lesion detections are (for the UWF modality).
    2. The design of LEAM is not fully justified.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper includes the implementation details for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The loss function for optimizing the multi-lesion generation task consists of the BCE loss between GT and predicted lesion mask and the adversarial loss to distinguish between source and target lesion. The supervision for UWF lesion detection is very weak (only the adversarial loss). The paper should include more analysis on the results of UWF lesion detection.
    2. In Fig. 2, please explain the color coding. What do red/yellow/purple/green pixels mean?
    3. The design of the proposed LEAM is not fully justified. The attention map is obtained from features in the lesion module. Why add the attention to the lesion features, as opposed to features from the grading module? How effective is this module compared with simple concatenation?
    4. Please provide analysis and discussion of the limitation of the proposed method (failure case).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall the paper has merits and is of interest to the MICCAI community (unsupervised method/use of additional modality/under-explored modality).

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    Ultra-wide-field (UWF) fundus photography is an emerging imaging technique that can provide a broader field of view, which is useful for diabetic retinopathy (DR) screening and grading. However, due to the unavailability of a large public UWF dataset, it is hard to train an automated system on UWF images. The authors propose a transfer-learning-based method in which well-labeled, publicly available color fundus photography (CFP) images are first used to train the system; an unsupervised transfer learning strategy is then applied to assist the DR grading of UWF images with the help of the well-labeled CFP dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written.
    2. The paper performed a lot of experiments including relevant ablation studies.
    3. The results are better than the other methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper does not say much about the benefit of using UWF over CFP. Is there any specific study the authors can cite showing that UWF's broader view of peripheral retinal pathology resulted in better diagnosis or detection of specific DR biomarkers? This could be cited along with citation 3.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The note on reproducibility seems fine, but the authors used a private dataset that will be hard to access.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Here are some comments along with strong and weak points mentioned before:

    1. It would be good if the authors could show some example images from the different disease classes.
    2. The authors resized the images to 512×512, while the UWF images were originally 3900×3072, which is significantly higher. Can the authors comment on the information loss incurred when downsizing these images? Could this affect the final outcome?
    3. The results, in terms of numbers, are not very high. Is this an issue with UWF images in general? As the authors collected them from a local hospital, is it possible to comment on the accuracy of clinicians' diagnoses with these images?
    4. In Table 1, the precision of M(Ultra) is significantly higher than CycleGAN's, whereas the F1 score is higher for CycleGAN. This is a little confusing. Is this because of the recall?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper's idea is relatively novel, and the paper is well written. Other than some comments and issues with the results, I believe this can be a stepping stone for further successful investigation of this novel imaging system.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposes an unsupervised framework (ULTRA) for the classification of diabetic retinopathy (DR) in ultra-widefield (UWF) images, by transfer learning from labeled (and more common) colour fundus photograph (CFP) images (including pixel annotations for CFP images). The CFP data is used to train an adversarial lesion generation (segmentation) model for the UWF images, whose feature maps are then combined by a lesion external attention module. Classification performance of ULTRA is claimed to be superior to supervised training with CFP labels and to a few other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Large number of CFP involved in training
    • Usage of CFP pixel annotations, not just labels
    • Attempts to generally adapt/transfer rarer (UWF) features into more common (CFP) features
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Most natural baseline (supervised UWF model with label) appears missing
    • Performance improvement appears marginal (and dependent on metric); it is unclear whether the contribution of the method would diminish with more (labeled) UWF data (currently, only 904 UWF images are available, of which about half are normal)
    • No direct (if partial) evaluation of lesion segmentation for UWF data
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The UWF dataset appears to be private. Exact details for the proposed adversarial lesion generation module and lesion external attention module (including hyperparameters) do not appear to be provided, only a general description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The development of a framework for transferring knowledge from a more-common/better-annotated modality, to another (wider-coverage) modality, is desirable and likely has much application outside of ocular imaging. However, the manuscript in its current stage may be somewhat short on technical/implementation detail, and relevant comparisons.

    1. In the Methodology section, it is stated that “…we trained a U-Net to mask out such artifacts [on UWF images]”. How was ground truth obtained for the UWF images, given that it is also stated that the UWF images were not provided with any pixel-level annotations (Section 2.2)?

    2. The training of the models (as in Figure 1) might be clarified further. In particular, is the DR grading module trained jointly with the lesion generation module, or is the lesion generation module trained first and then frozen?

    3. It might be clarified how the source and target image inputs (as shown in Figure 1) are selected for each pair, especially since the CFP and UWF images, coming from different datasets, should not be of the same patient/eye. Then, particularly as the model is trained in an unsupervised manner with respect to UWF, is there any assurance that the paired images are compatible?

    4. Moreover, it is unclear as to whether the (arbitrary?) CFP image should be part of the input to the DR grading module (implied at Point C in Figure 1), if the objective is to obtain a DR class for the target UWF image. Does this imply that the DR class output for the UWF image might be different depending on which CFP image it is paired with? This might be clarified.

    5. For the lesion segmentation for CFP in the lesion generation module, it might be clarified whether different types of lesions (MA/HM/SE/HE) are annotated and classified separately. Moreover, the term for the relevant loss might be standardized (apparently L_Seg in Figure 1, L_CE in the text and Equation 1)

    6. The naming of the lesion generation module might be reconsidered, if its actual function is to provide pixel-level annotations of UWF images, and not to generate new lesions on the UWF images (as from Figure 2)

    7. Moreover, it might be considered to directly evaluate the performance of pixel-level annotations from UWF images, by transfer from CFP images; ground truth on a (small) subset of the UWF images would be sufficient.

    8. The most obvious baseline to compare against, would appear to be a supervised UWF model trained conventionally using the image-level DR labels, either with pretraining from ImageNet or CFP data. This does not appear to have been attempted, from the models/results in Table 1.

    9. It is unclear why increasing the number of available CFP images (from 8,000 to 15,000) would result in reduced performance (Table 2). It is also not clear how the F1 and Kappa metrics could diverge so greatly, and what the interpretation might be (2,000 images having an F1 of 66.53% but a Kappa of only 33.00%, compared to, say, 8,000 images having a barely higher F1 of 67.57% but a Kappa of 51.01%); a synthetic sketch of how these metrics can diverge under class imbalance follows this list. Repeating the individual experiments to estimate the variance of the results might be appropriate.

    10. Implementation details (including training & hyperparameter search methodology) might be provided for the comparison methods.

    11. It might be briefly checked as to how frequently the trained model gets the domain (i.e. source CFP vs. target UWF) incorrect, if activated during testing. This should provide some context as to the extent to which the grading module is actually domain-independent.

    12. Some minor phrasing/spelling issues, e.g.: (Abstract) “newly imaging technique” -> “new imaging technique”; (Abstract) “is practically challenge” -> “is practically challenging”; (Section 2.2, Section 3.1) “This moudle consists of two parts”/“grading moudle” -> “module”; etc.
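    Regarding point 9 above: a small synthetic example (illustrative numbers only, not the paper's data; requires scikit-learn) shows how F1 and Cohen's kappa can diverge under class imbalance — a majority-biased classifier keeps F1 high, while kappa, which discounts chance agreement, stays low.

```python
# Synthetic illustration: F1 vs. Cohen's kappa under class imbalance.
# Kappa subtracts the agreement expected by chance, so it can be low
# even when F1 looks respectable.
from sklearn.metrics import cohen_kappa_score, f1_score

y_true = [0] * 90 + [1] * 10   # 90 normal, 10 diseased
y_pred = [0] * 98 + [1] * 2    # majority-biased: finds only 2 of 10 positives

print(f1_score(y_true, y_pred, average="weighted"))  # ~0.90
print(cohen_kappa_score(y_true, y_pred))             # ~0.31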

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The unsupervised transfer task is of great interest, but the manuscript could do with additional clarification, and experiments relating to the direct evaluation of UWF segmentation & against a supervised baseline.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Authors have addressed previous comment to an extent, but more empirical evidence on baselines/impact of CFP image quantity would make the study more convincing.



Review #4

  • Please describe the contribution of the paper

    This paper proposed a lesion-aware transfer learning framework for diabetic retinopathy grading in ultra-wide-field images. A lesion external attention module was also designed to transfer features between the two modules.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Migrating features from easy-to-obtain data to the analysis of hard-to-obtain data is a great idea to mitigate the impact of lacking data/labels.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. As introduced in Section 2.2, the supervision for target lesion segmentation is L_adv, which is too “weak” to provide sufficient segmentation guidance, and CFP and UWF have distinct domain variance (as evidenced in Table 1), e.g., in intensity distribution, which hinders the convergence of the loss. How is L_adv calculated? Moreover, the input CFP and UWF are unpaired images; how can the lesion generation module be ensured to provide accurate lesion information for UWF images?
    2. The LEAM is effectively a channel-attention and spatial-attention module for feature fusion, which is not new, since previous works have many similar designs. In addition, as claimed in the paper, the LEAM works as a bridge to supplement lesion features to the DR grading module. How can the input feature maps of LEAM be guaranteed to capture precise segmentation representations of the UWF images without any corresponding supervision?
    3. Why does the training performance get worse when more CFP images are involved in training, as shown in Table 2 (15,000 images involved)?
    4. In some extreme cases, the proposed framework can be regarded as a model trained with labeled CFP and tested on unlabeled UWF; a large amount of labeled CFP images can introduce enough prior features for accurate unsupervised prediction on UWF. How, then, can it be proven that the LEAM, and not other components (like the concat operation in Fig. 1 and the adversarial loss), has the key positive impact on the UWF prediction?
    5. Please keep the symbols in the figure and text consistent (do L_seg in Fig. 1 and L_CE in Equation 1 have the same meaning?).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method of this paper is relatively detailed and can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Firstly, it is recommended to use more public datasets for validation of the proposed method. Secondly, the innovation of the method needs to be improved. Finally, more comparative experiments should be conducted to demonstrate the effectiveness of the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution of this paper is insufficient, and so is the experimental verification.

  • Number of papers in your stack

    1

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors address part of my concerns in the rebuttal, so this paper can be considered for acceptance. However, the organization of this paper (motivation and experiment details) still needs careful revision. More citations and explanations are still needed on the effectiveness of using adversarial supervision. In the comparison experiments, the specific settings of each experiment and the differences from other experiments need to be explained more clearly.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed review comments. Reviewers acknowledge the importance of the studied problem and its interest to the MICCAI community. However, reviewers also raise some concerns about the effectiveness of the proposed framework and the experimental results. Therefore, the authors are invited to submit a rebuttal addressing the reviewers' comments. Specifically, the authors should pay attention to the following points.

    1. Discuss the loss function design for UWF lesion detection and the effectiveness of the proposed adversarial module.
    2. Provide a direct (or partial) evaluation of lesion segmentation on UWF data and more analysis of the UWF lesion detection results.
    3. Clarify the design of LEAM.
    4. Discuss the influence of different numbers of CFP images.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank the reviewers for their time and appreciate their positive support of our technical novelty (e.g., “… has merits and is of interest to the community” by R1; “… is a great idea to mitigate the impact of data lacking” by R4) and of the effectiveness of the method (e.g., “… ablation studies validate the effectiveness of the proposed modules” by R1 and R3). Below we provide point-by-point responses to their concerns.

Q1: The adversarial loss may be too weak (R1, R4). A1: 1) The adversarial loss is used to ensure that the UWF lesion map predicted by the generator is close to the CFP lesion map. It enables unsupervised learning on unlabeled UWFs, alongside the BCE loss on labeled CFPs, by taking advantage of the labeled CFP lesion information. Similar ideas have been successfully applied to unsupervised semantic segmentation [26] and medical image segmentation [A]. 2) To demonstrate the effectiveness of the lesion generation module and the adversarial loss, we removed the adversarial loss while training the lesion generator. The results show the trained generator leads to a decrease of 2.14% in accuracy compared to the M_ULTRA method shown in Tab. 1. [A] Unsupervised domain adaptation in brain lesion segmentation with adversarial networks, 2017: 597-609.
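A minimal PyTorch sketch of this setup follows, assuming a lesion generator G and a domain discriminator D (both hypothetical, as is the loss weight lam); it illustrates the output-space adaptation idea in the spirit of [26]/[A], not the authors' exact implementation.

```python
# Sketch: supervised BCE on labeled CFP lesion maps + an adversarial term
# that pushes unlabeled UWF lesion maps to be indistinguishable from CFP
# ones. G, D, and lam are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def generator_step(G, D, cfp_img, cfp_mask, uwf_img, lam=0.01):
    """BCE on labeled CFP lesion maps + adversarial term on unlabeled UWF."""
    src_logits = G(cfp_img)                     # (N, K, H, W) lesion logits
    l_seg = F.binary_cross_entropy_with_logits(src_logits, cfp_mask)
    tgt_logits = G(uwf_img)                     # no pixel labels available
    d_out = D(torch.sigmoid(tgt_logits))
    # Fool D: make UWF lesion maps look "source-like" (CFP = label 1).
    l_adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    return l_seg + lam * l_adv, src_logits.detach(), tgt_logits.detach()

def discriminator_step(D, src_logits, tgt_logits):
    """D learns to separate source (CFP, 1) from target (UWF, 0) lesion maps."""
    d_src = D(torch.sigmoid(src_logits))
    d_tgt = D(torch.sigmoid(tgt_logits))
    return (F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src)) +
            F.binary_cross_entropy_with_logits(d_tgt, torch.zeros_like(d_tgt)))
```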

Q2: No direct evaluation or analysis of the UWF lesion segmentations (R1, R2, R4). A2: In Fig. 2, we show the lesion maps (green: MA, purple: SE, yellow: HM, red: HE). Since lesion generation is an auxiliary task without pixel-level UWF lesion labels, we indirectly evaluate the detection results through the ablation experiments in Tab. 1 (ACC: M_Lesion 64.54% vs. M_Transfer 62.33%), indicating the effectiveness of the proposed lesion module for grading.

Q3: The design and effectiveness of LEAM (R1, R4). A3: 1) The disease grading task is not only constrained by the multiple lesion types of different clinical significance, but also suffers from the complicated background artifacts (e.g., eyelashes and eyelids) and noise in UWF images, particularly in an unsupervised setting. Thus, it is desirable to utilize the filtered lesion attention map to reweight the grading module so that it captures more lesion features related to specific DR severity levels. 2) To demonstrate the effectiveness, we applied simple concatenation by removing the LEAM; the reduced M_ULTRA performs worse than the original network. In Tab. 1, we can observe that the integrated LEAM brings significant gains (M_Lesion: 64.54% vs. M_ULTRA: 67.04%) to the grading model, which verifies the effectiveness of LEAM.
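For concreteness, here is a minimal sketch of what a LEAM-style reweighting could look like, assuming (per the descriptions above) channel and spatial attention derived from lesion features and applied to grading features; the paper gives only a general description, so every layer choice below is an assumption.

```python
# Sketch of a LEAM-style fusion block: lesion features gate grading
# features channel-wise (squeeze-and-excitation) and spatially (7x7 conv
# over pooled maps). All shapes and layer choices are illustrative.
import torch
import torch.nn as nn

class LesionExternalAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel gate: squeeze-and-excitation over the lesion features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial gate: 7x7 conv over avg/max-pooled lesion maps.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, grading_feat, lesion_feat):
        # Reweight grading features by lesion channel statistics ...
        x = grading_feat * self.channel_gate(lesion_feat)
        # ... then by a lesion-derived spatial attention map.
        pooled = torch.cat([lesion_feat.mean(1, keepdim=True),
                            lesion_feat.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_gate(pooled)

# Usage: same-shaped feature maps from the grading and lesion branches.
leam = LesionExternalAttention(channels=256)
g, l = torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32)
fused = leam(g, l)   # (2, 256, 32, 32): lesion-reweighted grading features
```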

Q4: Baseline selection; grading results are not very high (R2, R3). A4: 1) Regarding the concern about missing baselines, we did perform such baseline comparisons in Tab. 1: compared with M_CFP (ACC: 28.81%) and M^_UWF (ACC: 63.43%), our method shows better performance (ACC: 70.08%). 2) Regarding the concern about low grading performance, there are quite limited works specializing in UWF grading, and all of these methods achieve relatively low grading performance; the highest (ACC: 63.16%) was reported by the supervised CycleGAN method [12]. The main cause of the relatively low performance in general is the limited UWF training data and the imbalanced data distribution, particularly for severe cases.

Q5: The influence of different CFP image numbers (R2, R4). A5: We believe that the main reason for the performance degradation when increasing the number of CFP images (from 8,000 to 15,000) is data imbalance. The dataset with 8,000 images is divided into about 1,600 images per category, while the dataset with 15,000 images has the following distribution: Normal: 4,026, NPDRI: 3,875, NPDRII: 4,243, NPDRIII: 1,514, and PDR: 1,342, which may easily lead to model bias during training.

Q6: Training setting (R2). A6: 1) CFPs and UWFs are randomly selected and are not required to be paired for training in our method. 2) The segmentation and grading modules were trained separately in our framework.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses most of the reviewers' comments, and the AC votes for accepting this paper. However, the presentation of the paper should be substantially improved according to the reviewers' comments.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors submitted a strong rebuttal, which answered reviewer questions and eased concerns by pointing out specific tables and figures that address those concerns. Post rebuttal, Reviewer 2 raised their rating from reject to weak reject, and Reviewer 4 raised theirs from weak reject to weak accept. Considering the three weak-accept recommendations post rebuttal, I lean toward accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provide clear argumentation in their rebuttal and point to relevant parts of their manuscript for clarification. As it tackles most of the main points highlighted during the review, this paper deserves acceptance. However, the camera-ready version should address the clarity problems underlined previously.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


