
Authors

Shuai Cheng, Qingshan Hou, Peng Cao, Jinzhu Yang, Xiaoli Liu, Osmar R. Zaiane

Abstract

Early diagnosis and screening of diabetic retinopathy (DR) are critical for reducing the risk of vision loss in patients. In real clinical settings, however, manual annotation of lesion regions in fundus images is time-consuming. Contrastive learning (CL) has recently shown strong self-supervised representation learning ability because it can learn invariant representations without any extra labelled data. In this study, we investigate how CL can be applied to extract lesion features from medical images. However, can the direct introduction of CL into a deep learning framework enhance the representation of lesion characteristics? We show that the answer is no. Because lesion-specific regions occupy only a small, inconspicuous part of medical images, directly introducing CL inevitably leads to false negatives, limiting discriminative representation learning. Essentially, two key issues must be considered: (1) how to construct positives and negatives so as to avoid the false-negative problem, and (2) how to exploit hard negatives to promote the representation quality of lesions. In this work, we present a lesion-aware CL framework for DR grading. Specifically, we design a new strategy for generating positives and negatives that overcomes the false-negative problem in fundus images. Furthermore, a dynamic hard-negative mining method based on knowledge distillation is proposed to improve the quality of the learned embeddings. Extensive experimental results show that our method significantly advances state-of-the-art DR grading methods, reaching a considerable 88.0% ACC / 86.8% Kappa on the EyePACS benchmark dataset. Our code is available at https://github.com/IntelliDAL/Image.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_63

SharedIt: https://rdcu.be/dnwMn

Link to the code repository

https://github.com/IntelliDAL/Image

Link to the dataset(s)

https://kaggle.com/competitions/diabetic-retinopathy-detection


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an approach to better deal with the problem of false negatives in CL for DR: “due to the lesion-specific regions being insignificant in medical images, directly introducing CL would inevitably lead to the effects of false negatives”.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Trying to improve the problem of false negatives. Both false negatives and false positives are indeed an acute problem in lesion detection from eye fundus images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the system is confusing. Instead of a coherent description of the system, the authors try to describe it through details, but a lot of the glue is missing from the description. Furthermore, they use terms that they do not introduce before using them and assume whole parts are known, so it becomes difficult to appreciate the system. Finally, the improvement is quite marginal compared with the closest competitors.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The system is not sufficiently well described, which means it is difficult to replicate. The dataset used is public, so from that perspective it is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    “Extensive experimental results show that our method significantly advances state-of-the-art DR grading methods by a considerable 88.0% ACC/86.8% Kappa on the EyePACS benchmark dataset.”

    -> advances by 86.8%, shouldn’t it be advances TO that value?

    “leveraging only the image-level lesion annotation hinders deep learning algorithms from extracting features of suspicious lesion regions, which further affects the diagnosis of diseases. For these reasons, some previous work [17,19] considers the introduction of pixel-level lesion annotation to improve the model’s feature extraction capability for lesion regions.”

    -> explain what you mean by image-level lesion annotation; Is pixel-level lesion annotation a mask with lesion id per pixel? This needs clarity

    “Despite the patch-level methods have achieved promising results, the large-scale pixel-level annotation process is time-intensive and error-prone which imposes a heavy burden on the ophthalmologist.”

    -> explain what you mean by patch-level methods; do they use pixel-level annotation, as it seems from this phrase? You need to explain things much better…

    ” we first capture lesion regions in fundus images using a pre-trained lesion detector. Based on the detected regions, we construct a lesion patch set and a healthy patch set, respectively. Then, we develop an encoder and a momentum encoder [6] for extracting the features of positives (lesion patches) and negatives (healthy patches).”

    “The introducing momentum encoder enables the contrastive learning to maintain consistency in critical features while creating different perspectives for the positive samples.”

    -> very strange phrase, what do you mean by “the introducing”???

    “Secondly, considering that the critical role of hard negatives in the contrastive learning, we formulate a two-stage scheme based on knowledge distillation [8,21] to dynamically exploit hard negatives, “

    -> that the??? wrong phrasing

    “In stage 1, we construct positives and negatives based on a pre-trained lesion detector pre-trained on a auxiliary dataset (IDRiD [15]) with pixel lesion annotation, to avoid the effect of false negatives on the learned feature embeddings while aligning samples with similar semantic features.”

    -> need to elaborate more on how the lesion detector works, what is its structure and how it works… e.g. is it based on patches, or what?

    “In stage 2, a dynamically sampling method is developed based on knowledge distillation to effectively exploit hard negatives and improve the quality of the learned feature embeddings. “

    -> we need a lot of detail on this…

    “In the last stage, we fine-tune our model on the downstream DR grading task. “

    “to bridge the gap between local patches in the upstream task and global images in the downstream task, we introduce an attention mechanism on the fragmented patches to highlight the contributions of different patches on the grading results.”

    -> ok, elucidate what you mean by downstream and upstream

    Explain adequately what the momentum encoder is doing there, how it works with the encoder and why …

    In Fig. 1, stage 3 (grading) seems to use nothing from the previous stages; you need to clarify that…

    “Specifically, given a training dataset X with five labels (1-4 indicating the increasing severity of DR, 0 indicating healthy). “

    -> very strange end of phrase!!!

    “We first divide dataset X into lesion subset XL and healthy subset XH based on the disease grade labels of X.”

    -> be more specific, is that label 0 versus the rest?

    “Then, we apply a pre-trained detector fdet(·) only on XL and obtain high-confidence detection regions.”

    -> yes, but how does it do that? It seems to be a region detector; how does it work?

    “N = Randcrop(XH)…”

    -> it’s not clear to me why you state that you get negatives from the lesion images????

    “hard negatives,”

    -> please, define hard negatives…

    “and we adjust the update mechanism of the negatives queue(i.e. only enqueue and dequeue N to avoid confusion with P) to better adapt contrastive learning to the medical image analysis task.”

    -> what queue, exactly what are you talking about?
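
    For context, the “queue” appears to be the MoCo-style memory queue of negative keys referenced via [6]. Below is a minimal sketch of such a queue restricted to healthy-patch (negative) keys, as the quoted adjustment describes; the queue size and all names are illustrative assumptions, not taken from the paper.

        import torch

        class NegativeQueue:
            """MoCo-style FIFO queue that stores only negative (healthy-patch) keys."""
            def __init__(self, dim, size=4096):
                self.queue = torch.nn.functional.normalize(torch.randn(size, dim), dim=1)
                self.ptr = 0

            @torch.no_grad()
            def dequeue_and_enqueue(self, neg_keys):
                # Only healthy-patch keys enter the queue, so lesion (positive) patches
                # can never be sampled as negatives in later batches.
                n = neg_keys.size(0)
                idx = (self.ptr + torch.arange(n)) % self.queue.size(0)
                self.queue[idx] = neg_keys
                self.ptr = (self.ptr + n) % self.queue.size(0)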

    “Training the Teacher Network. With the positives P, we obtain two views P̃ = {p̃1, p̃2, p̃3, …, p̃j} and P̃′ = {p̃′1, p̃′2, p̃′3, …, p̃′j} by data augmentation (i.e. color distortion, rotation, cropping followed by resize). Correspondingly, with the negatives N, to increase the diversity of the negatives, we apply a similar data augmentation strategy to obtain the augmented negatives Ñ = {ñ1, ñ2, ñ3, …, ñk} (where k ≫ j). We feed P̃ and P̃′ + Ñ to the encoder En(·) and the momentum encoder MoEn(·) to obtain their embeddings Z = {z1, …, zj | zi = En(p̃i)}, … we calculate the positive and negative similarity matrix by the samples of Z, Z′ and Z̃. According to the similarity matrix, the contrastive loss Lcl-t of the teacher model training process can be defined as:”

    -> you need to introduce each thing you talk about. Please describe what the momentum network is doing, describe how, go on to justify your teacher network and your student network (why, what, how) and so on.
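
    For readers unfamiliar with the setup, a minimal MoCo-style sketch of the teacher training step the quoted passage appears to describe is given below; the function name, tensor names and the temperature value are assumptions for illustration, not the paper’s implementation.

        import torch
        import torch.nn.functional as F

        def teacher_contrastive_loss(encoder, momentum_encoder, pos_view1, pos_view2, negatives, tau=0.07):
            """pos_view1/pos_view2: two augmented views of the lesion patches (B x C x H x W);
            negatives: augmented healthy patches (K x C x H x W), with K >> B."""
            q = F.normalize(encoder(pos_view1), dim=1)                   # queries from the online encoder
            with torch.no_grad():                                        # the momentum branch gets no gradient
                k_pos = F.normalize(momentum_encoder(pos_view2), dim=1)  # keys for the positive pairs
                k_neg = F.normalize(momentum_encoder(negatives), dim=1)  # keys for the negatives
            pos_sim = (q * k_pos).sum(dim=1, keepdim=True)               # B x 1 positive similarities
            neg_sim = q @ k_neg.t()                                      # B x K negative similarity matrix
            logits = torch.cat([pos_sim, neg_sim], dim=1) / tau
            labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)  # positives at index 0
            return F.cross_entropy(logits, labels)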

    “the hard negatives may exhibit more semantically similar to the positives than the normal negatives, indicating that hard negatives provide more potentially useful information for facilitating the following DR grading.”

    -> you need to explain, prior to anything, what hard negatives are and how to get them

    “Meanwhile, the number of hard negatives significantly affects the difficulty of training the model, in other words, the network should be capable of dynamically adjust the optimisation process by controlling the number of hard negatives.”

    -> why, justify or provide some reference…

    “According to the negative similarity matrix Atn produced by the teacher model, we prioritise the negatives that are likely to be confused with the positives in descending order and only select the top δ samples for distillation learning during the student model’s training phase. “

    -> the teacher model produces a negative similarity matrix, but you do not describe the workings, how that is done…
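
    As a point of reference, a generic top-δ hard-negative selection from a teacher similarity matrix could look like the sketch below; the reduction over positives and all names are assumptions, since the paper’s exact ranking rule is what the comment above asks to be described.

        import torch

        def select_hard_negatives(neg_sim_matrix, delta):
            """neg_sim_matrix: B x K teacher similarities between B positives and K negatives."""
            # Score each negative by its similarity to the positive it is most easily confused with,
            # then keep the delta most confusable negatives for the student's distillation step.
            scores = neg_sim_matrix.max(dim=0).values
            hard_idx = torch.argsort(scores, descending=True)[:delta]
            return hard_idx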

    “Considering that the proposed contrastive learning framework is trained with patches, whereas the downstream grading task relies on entire fundus images, an additional attention mechanism is incorporated to break the gap between the inputs of upstream and downstream tasks. Specifically, we first fragment the entire fundus image into patches …”

    -> since the whole approach is ill-described, this is not clear…
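
    For orientation, a generic gated-attention pooling over patch embeddings, which is one way to read the quoted description, is sketched below; the module structure and hidden size are illustrative assumptions, not the paper’s exact attention mechanism.

        import torch
        import torch.nn as nn

        class PatchAttentionPool(nn.Module):
            """Aggregates per-patch embeddings into one image-level embedding for grading."""
            def __init__(self, dim, hidden=128):
                super().__init__()
                self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

            def forward(self, patch_embeddings):                               # N_patches x dim, from the encoder
                weights = torch.softmax(self.score(patch_embeddings), dim=0)   # contribution of each patch
                return (weights * patch_embeddings).sum(dim=0)                 # attention-weighted image embedding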

    “The results show that our framework presents a notably better DR grading performance than the SOTA methods due to improve quality of the learned lesion embeddings by eliminating the false negatives and dynamically mining hard negatives, and in turn enhancing the lesion-awareness of CL, which is beneficial for DR grading.”

    -> The experimental results in Table 1 seem to show that your approach does not improve significantly on some of the others, something like 0.863 to 0.868 (or also 0.859 to 0.868). And what is the statistical significance of that very small difference, since you do not provide statistical-significance information? Therefore your statement seems to be contradicted by the results themselves…

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is due to the issues already noted in the weaknesses section. The description of the system is confusing; instead of a coherent description of the system, the authors describe it through details, but a lot of the glue is missing. They use terms that they do not introduce before using them and assume whole parts are known, so it becomes difficult to appreciate the system. Furthermore, the improvement is quite marginal compared with the closest competitors.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper contributes a lesion-aware contrastive learning (CL) framework for Diabetic Retinopathy (DR) grading. This framework enhances the model’s ability to focus on lesion regions in medical images and mitigates the impact of false negatives on contrastive learning. The approach combines the construction of positives and negatives (CPN) and dynamic hard negatives mining (DHM) to improve the quality of learned feature embeddings and enhance lesion-awareness in the model, resulting in significantly improved DR grading performance on the benchmark dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of using a lesion-aware CL framework based on SimCLR was introduced in “Huang, Y., Lin, L., Cheng, P., Lyu, J., Tang, X.: Lesion-based contrastive learning for diabetic retinopathy grading from fundus images. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. pp. 113–123. Springer (2021)”. However, this paper proposes some new features:

    • The paper presents a novel lesion-aware contrastive learning (CL) framework that enhances the model’s ability to focus on lesion regions in medical images, improving the DR grading performance on patched images compared to existing methods.
    • The proposed method overcomes the false negatives problem by reconstructing positives and negatives, which helps the model to better represent lesion features in the images.
    • The paper introduces a dynamic hard negatives mining scheme based on knowledge distillation, which enables the model to dynamically exploit hard negatives and transfer the learned knowledge to the student network. This approach improves the quality of learned feature embeddings and enhances the lesion-awareness in the model.
    • The authors perform comprehensive evaluations on a benchmark dataset, comparing their method with various state-of-the-art DR grading methods. The results demonstrate that the proposed framework significantly outperforms other methods in terms of both Kappa and Accuracy metrics.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper does not provide enough detail on the pre-trained detector.
    • The authors didn’t mention the number of patches extracted using the pre-trained detector.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details regarding the methods, dataset, and experimental setup, which contributes positively to the reproducibility of the study. The authors have also provided the source code of this method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The experimental results demonstrate the effectiveness of the proposed framework. Overall, the paper is well-written and the methodology is sound.

    • You briefly mention that your approach can be easily applied to other medical image analysis tasks. It would be helpful if you could provide some examples or case studies that demonstrate this transferability, as this would showcase the versatility and potential impact of your method across different applications.
    • The paper could benefit from providing results of the linear and fine-tuned evaluations of the CL model.
    • You need to give more details on the pre-trained detector and the number of patches extracted using this model.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend acceptance of this paper with a score of 6 because it presents a novel lesion-aware contrastive learning framework for diabetic retinopathy grading. The proposed method demonstrates significant improvements in performance compared to state-of-the-art methods, which is a strong aspect of this work. The ablation studies provide valuable insights into the contributions of different components of the proposed framework, and the experiments are well designed and thorough. Despite the above-mentioned weaknesses, I believe the merits of this work outweigh its weaknesses, and the paper presents a valuable contribution to the field of diabetic retinopathy grading using contrastive learning.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors proposed a lesion-aware contrastive learning framework for Diabetic Retinopathy grading, applied to the EyePACS dataset, achieving an accuracy of 88% and a Kappa coefficient of 0.868.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The experimental setup, the comparison with state-of-the-art methods and the outstanding results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The abstract should be short and clear, not a second introduction.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper could be easily reproducible with the parameters and information reported in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper and results are really good, but the authors should rewrite the abstract and release the code to ensure the reproducibility of the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental setup, the dataset used in the paper, the outstanding results and the comparison with state-of-the-art methods are the major factors to recommend this paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a lesion-aware contrastive learning (CL) framework for Diabetic Retinopathy (DR) grading. The paper is sound and shows promising results. However, many concerns were raised in the reviews. R1 gives a negative review and points out several drawbacks, mainly concerning missing details and unclear presentation. The authors should address these concerns in the rebuttal.




Author Feedback

We are very grateful for the efforts of the reviewers and meta-reviewer in evaluating our work, and we are pleased that the reviewers acknowledge its valuable contributions: the paper is well-written and the methodology is sound (R2), the paper and results are really good (R4), and the paper is sound and shows promising results (MR). We address each reviewer (R#) question (Q#) on the weaknesses, grouped topically:

1) Performance improvement. Little improvement over SOTA (R1-Q22): we have included a discussion of the comparison with the SOTA in the revision. Compared with the latest SOTA method [24], our method has two advantages: 1. our method requires only image-level grading labels, while [24] requires pixel-level lesion annotations, which are hard to obtain; 2. our method has significantly fewer parameters than [24] (Params size (MB): 97.49 vs. 416.01; FLOPs (G): 48.39 vs. 199.37). No statistical significance (R1-Q22): we did not conduct significance testing because of the sufficiently large size of the test set (53,576 images), and this metric is also not reported in current studies such as [9,10,12,19,20,24].

2) Writing issues. i. To address the grammar errors (R1-Q1,4,5,11) and the long abstract (R4-Q1), we have double-checked the grammar of the manuscript and rephrased the abstract; we will submit these changes in the revised version. ii. We apologize for any confusion caused by our paper. a) The lesion annotation (R1-Q2): ‘image-level lesion annotation’ was used incorrectly; it should be ‘image-level grading annotation’, which refers to the grading label for the entire image and is typically used for grading tasks. Pixel-level lesion annotation refers to the labeling of individual pixels in images and is typically used for segmentation tasks. b) The lesion detector (R1-Q6,13 & R2-Q3): the lesion detector is a Faster R-CNN trained on the auxiliary dataset IDRiD [15]. This detector is then used to generate suspicious lesion patches from EyePACS images for the subsequent CL. By setting the confidence threshold of the detector to 0.9, we obtained 46,412 suspicious lesion patches for CL (an illustrative sketch of this filtering step is given after this rebuttal). We will add more details in the final version. c) The downstream and upstream (or pretext) tasks (R1-Q8,21): the upstream task trains a general-purpose pre-trained model, and the pre-trained model is fine-tuned in the downstream task to adapt to specific tasks. d) The relation between stage 3 and stages 1 and 2 (R1-Q10): in stage 3, we directly employ and fine-tune the Encoder-S of the student model pre-trained in stages 1 and 2, as explained in Section 2.3 and Fig. 1. e) The divided subsets (R1-Q12): yes, images with label 0 are assigned to the healthy subset XH, and the rest are assigned to the lesion subset XL. f) The hard negatives (R1-Q19): the introduction of hard negatives and the references [2,16] are provided in the first section of the paper. Moreover, the experiments shown in Table 1 of the Supplementary Materials demonstrate the effectiveness of dynamically mining hard negatives in retinal images, and Table 2 further explores the influence of controlling the number of hard negatives.

3) Missing explanations of basic concepts (R1). We apologize for omitting some basic terminology from the fields of contrastive learning and knowledge distillation due to limited space. For these basic terms, we cite relevant references where they first appear, for example: momentum encoder (Q9,17) [6], negative sample (Q14) [6], hard negatives (Q15,18) [2,16], negatives queue (Q16) [6], teacher model (Q20) [8,21].

4) Reproducibility (R4). We have uploaded the code to the CMT system, but we removed the code link from the original paper because of the MICCAI anonymity policy. The link will be provided in the final version of the paper.

Once again, we thank the reviewers for their valuable feedback, and we will do our best to polish the writing and clarify any remaining ambiguities.
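
Illustrative sketch of the confidence-thresholded patch extraction described in point 2)ii.b) of the rebuttal above. Torchvision’s generic Faster R-CNN stands in for the IDRiD-trained detector, and 0.9 is the threshold quoted in the rebuttal; everything else (names, weights) is an assumption for illustration only.

    import torch
    import torchvision

    # COCO-pretrained stand-in; the paper's detector is instead trained on IDRiD lesion annotations.
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    def extract_lesion_patches(image, score_thresh=0.9):
        """image: 3 x H x W float tensor of a fundus photograph from the lesion subset XL."""
        with torch.no_grad():
            pred = detector([image])[0]                  # dict with 'boxes', 'scores', 'labels'
        keep = pred["scores"] >= score_thresh            # keep only high-confidence detections
        patches = []
        for box in pred["boxes"][keep].round().long():
            x1, y1, x2, y2 = box.tolist()
            patches.append(image[:, y1:y2, x1:x2])       # crop the suspicious lesion region
        return patches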




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed the concerns raised by the reviewers well. I recommend acceptance.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper is generally well written with the methods being easy to understand and follow. The authors have addressed the reviewers’ concerns in the rebuttal. It reaches the minimum requirement for publication.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Pros:

    • Topic: The proposed method presents a novel framework for the DR grading task, compared with various SOTA methods.
    • Novelty: Several novel improvements are proposed to reduce the false-negative rate.

    Cons:

    • Clarity: Some details of the method are not well described.

    After rebuttal:

    • The authors failed to convince the reviewer who gave the low score, but to me, the clinical need and the novelty are sufficient for a conference paper.
    • The two positive reviews are generally consistent in acknowledging the contribution of this work.


