
Authors

Bingzhi Chen, Zhanhao Ye, Yishu Liu, Zheng Zhang, Jiahui Pan, Biqing Zeng, Guangming Lu

Abstract

Deep learning-based AI diagnostic models rely heavily on high-quality, exhaustively annotated data for algorithm training but suffer from noisy label information. To enhance the model’s robustness and prevent noisy label memorization, this paper proposes a robust Semi-supervised Contrastive Learning paradigm called SSCL, which can efficiently merge semi-supervised learning and contrastive learning for combating medical label noise. Specifically, the proposed SSCL framework consists of three well-designed components: the Mixup Feature Embedding (MFE) module, the Semi-supervised Learning (SSL) module, and the Similarity Contrastive Learning (SCL) module. By taking the hybrid augmented images as inputs, the MFE module with a momentum update mechanism is designed to mine abstract distributed feature representations. Meanwhile, a flexible pseudo-labeling promotion strategy is introduced into the SSL module, which can refine the supervised information of the noisy data with pseudo-labels based on initial categorical predictions. Benefiting from the measure of similarity between classification distributions, the SCL module can effectively capture more reliable confident pairs, further reducing the effects of label noise on contrastive learning. Furthermore, a noise-robust loss function is also leveraged to ensure that samples with correct labels dominate the learning process. Extensive experiments on multiple benchmark datasets demonstrate the superiority of SSCL over state-of-the-art baselines.
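For context, the “momentum update mechanism” mentioned in the abstract is the MoCo-style exponential moving average that Reviewer #2 also notes below. A minimal sketch of such an update, assuming a conventional coefficient m = 0.999; all names here are illustrative, not details from the paper:

    import torch

    def ema_update(encoder, momentum_encoder, m=0.999):
        """Momentum (EMA) update: theta_k <- m * theta_k + (1 - m) * theta_q."""
        with torch.no_grad():
            for p_q, p_k in zip(encoder.parameters(),
                                momentum_encoder.parameters()):
                p_k.mul_(m).add_(p_q, alpha=1.0 - m)

    # Usage sketch: the momentum branch starts as a frozen copy of the
    # online encoder and is updated after every optimizer step, e.g.:
    #   import copy
    #   momentum_encoder = copy.deepcopy(encoder)
    #   for p in momentum_encoder.parameters():
    #       p.requires_grad = False
    #   ...
    #   ema_update(encoder, momentum_encoder)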

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_54

SharedIt: https://rdcu.be/dnwdB

Link to the code repository

https://github.com/Binz-Chen/MICCAI2023_SSCL

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Authors propose a semi-supervised contrastive learning (SSCL) framework to tackle label noise in medical imaging tasks. The framework has three components. First, the Mixup Feature Embedding: an encoder with three different output branches from different Mixup augmentations of the input images, a projection head, a classifier head, and an information bottleneck (from a momentum encoder). Second, the semi-supervised learning module outputs pseudo-labels and enables assigning a confidence value to (noisy) labels. Third, the similarity contrastive learning module includes a contrastive loss that enforces pairs of images with high label confidence to have similar representations. The proposed framework is evaluated on classification tasks on CIFAR-10, CIFAR-100, and two medical datasets, ISIC-19 and BUSI, with different noisy-label ratios (20%, 30%, 50%, and 80%). Authors compare to the supervised baseline and four comparison methods and show competitive results, especially at high noise ratios.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Organization is easy to follow, and overall the paper is well-written;
    • Strong evaluation of results on two public medical datasets, as well as CIFAR-10 and CIFAR-100, in comparison to current SOTA methods, plus a set of ablation studies.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of discussion on the limitations of the proposed method. The approach has three components and four loss terms, and has been tested on rather big 2D datasets. Authors could further comment on the computational burden of the method and its behaviour in the context of limited data or limited labels.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is well-described; some hyperparameters are not part of the ablations (omega, mu). Training hyperparameters are described in supplementary materials, and code will be made available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Major comments:

    • Contribution #3: Extensive experiments and comparisons are not a contribution in themselves. Authors should specify to what extent those experiments demonstrate any characteristics of the proposed approach.
    • In Section 3, some mathematical notations are missing definitions; some are present in Figure 1 but not defined in the main text, and vice versa (see also the minor comments).
    • The total loss used is the sum of 4 terms; authors defined only one regularization parameter, for L_IB. Why only for this term? Would the method benefit from weighting the other losses as well?
    • In Figure 3, authors compare their obtained representation with the one obtained from “vanilla ResNet-18”. Authors need to specify what “vanilla” corresponds to: pretrained how? On ImageNet? Supervised with the noisy labels?

    Minor comments:

    • Figure 1: caption lacks description; authors could add at least one sentence to describe each proposed module.
    • Eq. (2): thetas are not defined; I guess they represent the encoder parameters of the “K” and “V” outputs, respectively.
    • Eq. (8): scalar product/cosine similarity notation is not defined.
    • No reference is made to additional results on the supplementary materials.
    • Section 4.2, line 3: The reference to Table 1 is wrong.
    • Section 4.3, line 1: is “the threshold t” a typo for “tau” from Eq. (7)?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall the paper is well-written, and the methodology is clear. Experiments demonstrate a strong evaluation of the proposed approach, with extensive experiments and relevant comparison to prior work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Authors’ rebuttal provided some clarifications about computational costs and explanations of the experimental results. However, answers are mainly assertions without providing additional concrete elements, e.g.:

    • “We intentionally focused on optimizing the regularization parameter for IB loss to reduce the complexity associated with parameter selection. This setting can achieve the optimal trade-off between model complexity and generalization performance.” A (tiny) ablation would be needed to justify this statement.
    • “Gamma is a dynamic threshold that is applied to ensure that top-k confident instances belonging to the n-th class are selected; k is determined by the predefined noise rate.” Then what is the rule for choosing k from the predefined noise rate? Moreover, in real scenarios, the noise rate may not be known. I have therefore kept the same rating after the rebuttal.



Review #2

  • Please describe the contribution of the paper

    The paper aims to propose a robust framework for combating medical label noise. The proposed framework utilizes several established techniques: MixUp, semi-supervised learning (specifically pseudo-labeling), and MoCo-like contrastive learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper tries to address an important problem for the medical imaging community
    2. The evaluation section is strong. The proposed method is compared to multiple strong baselines on multiple datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The performance gain from the proposed method is only moderate, especially considering the complexity of the method (for example, the total loss consists of four different losses; in practice we might imagine the need to carefully balance the weights of the different losses for a given problem or dataset). As shown in Table 1, the proposed method is better than the second-best method by less than 1.5% about 2/3 of the time, especially when NR is in the range 0.2–0.5, which I believe is more practical than a noise ratio of 0.8.

    2. The confidence threshold gamma is said to be ‘dynamic’ in Sec 3.2, but the paper does not describe how it is chosen, adjusted, or learned.

    3. The paper lacks a clear explanation of the intuition behind several important design choices. For example, why do we think the IB loss and the queue loss have anything to do with combating label noise?

    Other minor issues:

    1. The authors say at the very beginning that there are advances in combating conventional label noise but few studies dedicated to medical label noise. However, throughout the paper, the authors do not explain what is special about ‘medical label noise’ versus ‘conventional label noise’.
    2. The paper claims to focus on medical label noise, but CIFAR-10 and CIFAR-100 are used for benchmarking?
    3. The paper does not describe what dataset is used in Fig. 2(a); moreover, a proper figure caption is missing.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While I appreciate that the paper considers multiple baselines, I couldn’t find sufficient experimental details in either the main paper or the supplement for these compared algorithms. For example, did you do a hyperparameter search for each algorithm? Did you try to ensure the hyperparameter search is relatively fair for each compared algorithm? Did you use the same backbone, or ensure the backbone used is also relatively fair, for the different algorithms? Etc.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I believe the following are some straightforward steps to make the paper stronger: (1) better explain the intuition behind the different design choices; (2) include proper figure captions to make the figures self-contained.

    clarification question:

    1. Is there a typo in the notation of Eq. (4)? How should I read this equation?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite its contributions, the paper has several limitations that need to be addressed. One of the main weaknesses is the only moderate performance gain from the proposed method, particularly given the method’s complexity. The performance improvement is less than 1.5% for about two-thirds of the cases compared to existing methods, especially in the realistic noise-ratio settings. Additionally, the paper lacks a clear explanation of several important design choices. Furthermore, some important details are missing, for example how the dynamic confidence threshold gamma is chosen and adjusted (if regarded as a hyperparameter) or learned (if regarded as a parameter), and what is special about medical label noise compared to conventional label noise, etc. (See details above.)

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal partially addresses my concerns. I keep my score unchanged.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a new framework to benefit training with noisy data. The method combines three novel modules: 1) the Mixup Feature Embedding (MFE) module, for enhancing the image representation; 2) the Semi-supervised Learning (SSL) module, for refining the results using pseudo-labels; 3) the Similarity Contrastive Learning (SCL) module, for capturing confident pairs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    As summarized above, the method combines three novel modules (MFE, SSL, and SCL). This article is meaningful, with sufficient experimentation and structured writing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The topic of the article is very meaningful, but the experiments seem to lean more towards natural datasets than medical datasets.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experiment should be reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. What is the difference between “strong Aug” and “weak Aug”? Is it controlled by the parameter \lambda only? Are the samples x_a and x_b the same in this case?

    2. Why does the author name the mini-batch of image-label pairs as q, k, v? These are the names of the basic elements in a Transformer, but the usage here seems different.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This article is meaningful, with sufficient experimentation and structured writing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a semi-supervised contrastive learning (SSCL) framework to tackle label noise in medical imaging tasks. This paper received mixed ratings. The authors should address the questions/concerns in the rebuttal, mainly (1) computational costs, as pointed out by R#1; (2) explanation of the experimental results, as pointed out by R#2 and R#3.




Author Feedback

We express our gratitude to all the reviewers for their insightful feedback. We sincerely appreciate their recognition of our paper on the following points: (1) Novelty and Contribution: “Strong paper with minor weakness” and “A new framework to benefit training with noisy data”; (2) Meaningful Topic: “The topic is very meaningful”; (3) Clear Organization: “The paper is well-written”; and (4) Strong Validation: “Strong evaluation of results on two public medical datasets”. Furthermore, we would like to address the questions raised by the reviewers:

Response to Meta-Reviewer #1:

(1) Computational Costs Analysis (Reviewer#1): Following existing works, we exclusively employed the ResNet-18 architecture or its variants as the backbone network in our method, to mitigate the demand for computational resources. Compared with state-of-the-art works, i.e., CTRR and Sel-CL, our model achieves a reliable improvement while maintaining a similar complexity level, without adding any additional computational demands (Params: 61.13M; test time per image: 0.014 seconds). Therefore, the computational cost of our approach is reasonable and acceptable.

(2) Explanation of Experimental Results (Reviewer#2): To demonstrate the robustness and generalization of our approach in learning with noisy labels, we have conducted extensive experiments on both medical and conventional noisy datasets. Our approach exhibits notable advancements in classification accuracy compared to the current best-performing method, with an average improvement of 1.2% on ISIC19 and an average improvement of 2.6% on BUSI. Moreover, our method consistently outperforms state-of-the-art methods on CIFAR-10 and CIFAR-100 datasets.

(3) Clarification of Experimental Details (Reviewer#3): In Eq. (1), lambda serves as a control parameter to adjust the mixing strength of training samples from each mini-batch, i.e., x_a and x_b. This mechanism enables the generation of a diverse set of mixup augmentations, including weak-augmented and strong-augmented images.
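A minimal sketch of this mixing step, assuming the standard Beta-distributed mixup coefficient (the Beta prior, alpha, and all names are illustrative assumptions, not details confirmed by the paper):

    import numpy as np

    def mixup(x_a, x_b, y_a, y_b, alpha=1.0):
        # lambda ~ Beta(alpha, alpha) controls the mixing strength of the
        # two mini-batches; values near 0 or 1 yield weak mixing, values
        # near 0.5 strong mixing.
        lam = np.random.beta(alpha, alpha)
        x_mix = lam * x_a + (1.0 - lam) * x_b
        # one-hot labels are mixed with the same coefficient
        y_mix = lam * y_a + (1.0 - lam) * y_b
        return x_mix, y_mix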

Additional Response to Reviewer#1 (Weak accept; Confident, Ranking: 1/4)

(1) Vanilla ResNet-18: The “vanilla” ResNet-18 model was initially pre-trained on the ImageNet dataset and subsequently fine-tuned using the noisy labels, ensuring compatibility with our approach.
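For concreteness, a minimal sketch of such a baseline using torchvision (the exact fine-tuning recipe is not specified in the rebuttal and is assumed here):

    import torch.nn as nn
    from torchvision.models import resnet18, ResNet18_Weights

    num_classes = 10  # set to the task at hand, e.g., 10 for CIFAR-10

    # ImageNet-pretrained backbone with the classifier head replaced,
    # then fine-tuned directly on the (noisy) labels.
    model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)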

(2) Regularization of IB loss: We intentionally focused on optimizing the regularization parameter for IB loss to reduce the complexity associated with parameter selection. This setting can achieve the optimal trade-off between model complexity and generalization performance.

Additional Response to Reviewer#3 (Strong accept; Very confident; Ranking: 1/5)

(1) Clarification of Notations: The representation of the mini-batch of image-label pairs as q, k, and v is purely a notation choice. It does not have any direct association with the Transformer architecture.

Additional Response to Reviewer#2 (Confident but not absolutely certain)

(1) Improvement in Range of 0.2-0.5: Through extensive experiments on various medical datasets, we observed superior performance in practical scenarios where noise levels ranged from 0.2 to 0.5, which demonstrates the superiority of our approach.

(2) Dynamic Threshold of Gamma: Gamma is a dynamic threshold that is applied to ensure that the top-k confident instances belonging to the n-th class are selected; k is determined by the predefined noise rate.

(3) Design of Queue and IB: The core purpose of the queue is to promote contrastive learning performance, while the IB loss encourages the model to capture more reliable and discriminative features, thereby mitigating the detrimental effects of label noise.

(4) Fairness in Hyperparameter Settings: To make a fair comparison, we carefully selected the same backbone network and the optimal hyperparameters for each algorithm.

(5) Typo in Eq. (4): We have revised Eq. (4) as D_n^c = {(x_i, y_i) | y_i · p_i > γ_n}, ensuring its correctness in our manuscript.
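Combining responses (2) and (5), a minimal sketch of the per-class confident-set selection follows. The rule k = (1 - noise_rate) * |class n| is an assumption inferred from the rebuttal, not an equation from the paper, and all names are illustrative:

    import torch

    def select_confident(probs, labels, noise_rate, num_classes):
        # probs:  (N, C) softmax predictions p_i
        # labels: (N,)   given, possibly noisy, labels y_i
        # Returns a boolean mask implementing
        # D_n^c = {(x_i, y_i) | y_i · p_i > gamma_n}, where gamma_n is the
        # k-th largest confidence among samples labeled n.
        mask = torch.zeros_like(labels, dtype=torch.bool)
        for n in range(num_classes):
            idx = (labels == n).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            conf = probs[idx, n]  # y_i · p_i for one-hot y_i
            k = max(1, int((1.0 - noise_rate) * idx.numel()))  # assumed rule
            gamma_n = conf.topk(k).values.min()  # dynamic per-class threshold
            mask[idx] = conf >= gamma_n
        return mask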




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While not all concerns raised by R#2 are addressed in the rebuttal, it appears that the key questions have been answered. Accept.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studied the noisy label learning problem in medical images and proposed a semi-supervised contrastive learning method to select and generate reliable pseudo-labels and select confident pairs for contrastive learning. The method is evaluated on ISIC-19, BUSI datasets, CIFAR-10, and CIFAR-100 datasets, obtaining better results than baselines. The rebuttal provided clarifications about some important details of the work. The authors are encouraged to provide more explanations about how the dynamic threshold is determined for each class, i.e., is there an equation for computing the dynamic threshold?



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After carefully reviewing the authors’ feedback and the final decisions of the reviewers, it is evident that the majority of the reviewers lean towards accepting the paper, recognising the value of the work and its experimental results.

    However, one reviewer remains inclined towards rejection, albeit acknowledging the contribution of the paper. This reviewer raised concerns regarding the limited performance gain and some lack of clarity in the design choices.

    Upon considering the reviewers’ feedback and opinions, the Meta Reviewer observes a generally positive reception of the paper and agrees with this sentiment. The majority of reviewers acknowledge the value of the work and the experimental results, which indicates a strong case for acceptance.


