
Authors

Xuan Chen, Weiheng Fu, Tian Li, Xiaoshuang Shi, Hengtao Shen, Xiaofeng Zhu

Abstract

The presence of corrupted labels is a common problem in the medical image datasets due to the difficulty of annotation. Meanwhile, corrupted labels might significantly deteriorate the performance of deep neural networks (DNNs), which have been widely applied to medical image analysis. To alleviate this issue, in this paper, we propose a novel framework, namely Co-assistant Networks for Label Correction (CNLC), to simultaneously detect and correct corrupted labels. Specifically, the proposed framework consists of two modules, i.e., noise detector and noise cleaner. The noise detector designs a CNN-based model to distinguish corrupted labels from all samples, while the noise cleaner investigates class-based GCNs to correct the detected corrupted labels. Moreover, we design a new bi-level optimization algorithm to optimize our proposed objective function. Extensive experiments on three popular medical image datasets demonstrate the superior performance of our framework over recent state-of-the-art methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_16

SharedIt: https://rdcu.be/dnwAN

Link to the code repository

https://github.com/shannak-chen/CNLC

Link to the dataset(s)

https://www.isic-archive.com/

https://web.inf.ufpr.br/vri/breast-cancer-database/


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel co-assistant framework for correcting corrupted labels in medical image datasets. The proposed framework is composed of two modules, a noise detector and a noise cleaner.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed approach designs a noise detector to discriminate corrupted samples from all samples, and then investigates a noise cleaner to correct the detected corrupted labels. The cross-entropy loss in the CNN is regularized with an extra loss that smooths the update of model parameters, which prevents the model from overfitting to corrupted labels to some extent.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    How can the authors be confident in the corrected labels? The assumption that all labels in the raw datasets are clean is questionable, as is the practice of adding corrupted labels at different noise rates. Moreover, the proposed classification strategy does not guarantee correct labeling.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Three public databases are used to validate the proposed approach: BreakHis (breast cancer histopathological images), ISIC (skin images), and NIHCC (frontal-view X-ray images).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In this paper, the authors propose a novel co-assistant framework for correcting corrupted labels in medical image datasets. The proposed framework is composed of two modules, a noise detector and a noise cleaner.

    The paper is well written.

    Strengths: The proposed approach designs a noise detector to discriminate corrupted samples from all samples, and then investigates a noise cleaner to correct the detected corrupted labels. The cross-entropy loss in the CNN is regularized with an extra loss that smooths the update of model parameters, which prevents the model from overfitting to corrupted labels to some extent.

    Weaknesses: How can the authors be confident in the corrected labels? The assumption that all labels in the raw datasets are clean is questionable, as is the practice of adding corrupted labels at different noise rates. Moreover, the proposed classification strategy does not guarantee correct labeling.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The cross-entropy loss in the CNN is regularized with an extra loss that smooths the update of model parameters, which prevents the model from overfitting to corrupted labels to some extent.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a neural network architecture to detect and correct faulty labelling of medical images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It is certainly true that labelling medical images may be error prone and a system to detect and correct these errors appears interesting and useful.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The reason for a great deal of labelling uncertainty is that human experts do not agree. It is very common for mislabellings to be detected as errors by a classifier, but I see no great need to automate this process as the corrections need only be done once. Moreover, I would be happy to have such outliers classified by a human.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This is fine

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Uncertainty in labelling is very common in medical images and often domain experts disagree. As medical data are typically extremely important and costly, I would feel more comfortable with human expert review. This process only needs to be performed once, so automation is not indicated.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Motivation for this work is not as clear as it should be. The analysis seems somewhat shallow.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a Co-assistant Network for Label Correction, which consists of a noise detector and a noise cleaner. Extensive experiments are conducted on three medical image classification datasets to verify its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors conduct a detailed baseline study and clearly state the comparison with each baseline.
    2. The paper is clearly written and easy to follow.
    3. The ablation study clearly verifies the effectiveness of each component.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For the noise detector module, I am curious how the hyper-parameter n1 was set to 5%. Since the rate of noisy labels is unknown, fixing it at 5% is not reasonable. An ablation study on n1 is necessary for the experimental setting.
    2. The description of semi-supervised learning in Section 2.2 is not clear. This paper aims to solve the label-noise problem, but unlabeled data is involved. What is the exact setting of this paper?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The author provides sufficient implementation details which helps reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Since the overall algorithm is complex, a more thorough ablation study is necessary.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel, but the experiments could be more thorough.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviews are mixed. The authors may answer the reviewers’ questions during rebuttal.




Author Feedback

We thank all reviewers for their valuable comments. Our responses are as follows.

To R1: Q1. Assuming that all labels in the used raw datasets are clean is questionable. We agree with your viewpoint. However, to the best of our knowledge, all existing methods related to corrupted labels are based on this assumption [1]. Additionally, to better verify the effectiveness of our method, we adopt popular datasets that have high-quality labels. For example, the labels in NIHCC were checked and corrected by at least three senior radiologists [2].

Q2. How could authors be confident about the new label correction? We take a two-step strategy to guarantee the quality of the new labels. First, the noise detector identifies the candidates for relabeling, i.e., the samples most likely to be corrupted. Second, the noise cleaner uses a reconstructed dataset, rather than the original labels, to predict new labels for the candidates.

[1] Han B, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels. NeurIPS, 2018.
[2] Tang Y X, et al. Automated abnormality classification of chest radiographs using deep convolutional neural networks. NPJ Digital Medicine, 2020.
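The evaluation protocol discussed in the response to R1 (treat the raw labels as clean, then corrupt a fraction ε of them) can be illustrated with a minimal sketch. This is not the authors' code: the function name `inject_symmetric_noise`, the choice of symmetric (uniform) flipping, and the fixed seed are all assumptions made for illustration; the rebuttal does not state which noise model was used.

```python
import numpy as np

def inject_symmetric_noise(labels, eps, num_classes, seed=0):
    """Flip a fraction `eps` of labels to a different class (assumed noise model).

    Returns the noisy labels and a boolean mask marking which samples were flipped.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Choose exactly round(eps * n) samples to corrupt, without repetition.
    n_flip = int(round(eps * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)

    # Adding a non-zero offset modulo num_classes guarantees a different class.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes

    is_corrupted = np.zeros(len(labels), dtype=bool)
    is_corrupted[flip_idx] = True
    return noisy, is_corrupted

# Toy binary example at eps = 0.4 (the noise rate quoted in the rebuttal).
clean_labels = np.array([0, 1] * 50)
noisy_labels, is_corrupted = inject_symmetric_noise(clean_labels, eps=0.4, num_classes=2)
print("flipped:", int(is_corrupted.sum()), "of", len(clean_labels))
```

Because the mask of flipped samples is known, any later detection step can be scored against it, which is only meaningful to the extent that the raw labels really are clean.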

To R2: Q1: There is no great need to automate corrections. We do not agree with this viewpoint, because: First, annotating medical data by doctors is very expensive, time-consuming, and error-prone, especially for large-scale medical data.

Second, the major goal of label correction is to boost the model performance, and detecting corrupted labels is only one contribution of our work.

Third, numerous previous works on label correction have been published in top medical conferences and journals, such as Co-Correcting [3] and LCC [4], which suggests that label correction is a significant problem.

Q2: Motivation is not clear. In fact, the motivation is clearly stated in the second paragraph of the introduction. Specifically, our work aims to overcome the limitations of robustness-based methods and label correction methods, i.e., robustness-based methods cannot detect and correct the corrupted labels, while label correction methods neither enhance the robustness of the model itself nor consider the relations among samples.

[3] Liu J, et al. Co-correcting: noise-tolerant medical image classification via mutual label correction. TMI, 2021.
[4] Guo K, et al. LCC: towards efficient label completion and correction for supervised medical image learning in smart diagnosis. J. Netw. Comput. Appl., 2019.

To R4: Q1: How to determine the hyper-parameter n1 as 5%? First, n1 denotes the percentage of noisy samples selected by the proposed method. In theory, it should be less than the actual noise rate (which is smaller than 50% for binary classification). In our experiments, we search for the best n1 in the range [1%, 25%].

For example, on BreakHis at ε=0.4, the precision of the selected samples (i.e., the fraction that truly have wrong labels) is 95.4% at n1=1%, 89.9% at n1=5%, 87.7% at n1=15%, and 85.9% at n1=25%. The corresponding testing accuracy on BreakHis at ε=0.4 is 83.9% at n1=1%, 85.6% at n1=5%, 84.2% at n1=15%, and 75.1% at n1=25%.

Based on the above results, the proposed method obtains the best model performance at n1=5%, because the selected samples are both sufficient in number and accurately identified. Similar observations can be found on the other datasets. Therefore, we select n1=5%.
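The precision figures quoted above can be understood through a minimal sketch of how such a measurement might be computed. This is an illustration under stated assumptions, not the authors' implementation: ranking samples by per-sample loss is assumed here as the suspicion score (the rebuttal does not specify the criterion), and the helper names `select_suspects` and `selection_precision` and the synthetic losses are hypothetical.

```python
import numpy as np

def select_suspects(per_sample_loss, n1):
    """Return indices of the n1 fraction of samples with the largest loss.

    Using the loss as the suspicion score is an assumption of this sketch.
    """
    per_sample_loss = np.asarray(per_sample_loss)
    k = max(1, int(round(n1 * len(per_sample_loss))))
    # argsort is ascending, so the last k indices have the largest losses.
    return np.argsort(per_sample_loss)[-k:]

def selection_precision(selected, is_corrupted):
    """Fraction of selected samples whose labels are truly corrupted."""
    return float(np.mean(np.asarray(is_corrupted)[selected]))

# Synthetic data: 40% of the samples are marked corrupted (eps = 0.4), and
# corrupted samples are given a higher loss on average (toy assumption).
rng = np.random.default_rng(0)
n = 1000
is_corrupted = np.zeros(n, dtype=bool)
is_corrupted[:400] = True
loss = rng.normal(loc=np.where(is_corrupted, 2.0, 1.0), scale=0.5)

for n1 in (0.01, 0.05, 0.15, 0.25):
    selected = select_suspects(loss, n1)
    precision = selection_precision(selected, is_corrupted)
    print(f"n1={n1:.0%}: selected={len(selected)}, precision={precision:.3f}")
```

A sweep of this kind only measures how clean the selected set is; the testing accuracy reported in the rebuttal additionally depends on how the downstream model uses the selected samples, which this sketch does not model.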

Q2: Why is unlabeled data involved in a label-noise problem? First, the unlabeled data are generated by the noise detector, which divides samples into clean, uncertain, and corrupted. Because it is hard to tell whether uncertain samples are clean or corrupted, we treat them as unlabeled data to boost the performance of the noise cleaner.

Second, semi-supervised methods usually perform better than supervised methods given the same number of labeled training samples, so we train the noise cleaner in a semi-supervised manner that also exploits the unlabeled data.
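The three-way split described in this answer can be sketched as follows. This is a hedged illustration only: the function `partition_by_score`, the use of a single suspicion score with two fractional cut-offs, and the convention of marking unlabeled samples with -1 are assumptions of this sketch, not details confirmed by the rebuttal.

```python
import numpy as np

def partition_by_score(score, clean_frac, corrupted_frac):
    """Split sample indices into (clean, uncertain, corrupted) by suspicion score.

    Lowest scores are treated as clean, highest as corrupted, and the rest
    as uncertain. Requires clean_frac + corrupted_frac <= 1.
    """
    score = np.asarray(score)
    n = len(score)
    order = np.argsort(score)  # ascending: least suspicious first
    n_clean = int(round(clean_frac * n))
    n_corrupted = int(round(corrupted_frac * n))
    clean = order[:n_clean]
    uncertain = order[n_clean:n - n_corrupted]
    corrupted = order[n - n_corrupted:]
    return clean, uncertain, corrupted

def build_semi_supervised_labels(labels, clean, uncertain, corrupted):
    """Keep labels of clean samples; mark the rest as unlabeled (-1, assumed)."""
    targets = np.full(len(labels), -1, dtype=int)
    targets[clean] = np.asarray(labels)[clean]
    # Uncertain samples act as unlabeled data; corrupted samples are the
    # candidates whose labels a cleaner model would later predict.
    return targets

scores = np.array([0.05, 0.90, 0.10, 0.50, 0.95, 0.20, 0.60, 0.15, 0.40, 0.30])
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
clean, uncertain, corrupted = partition_by_score(scores, clean_frac=0.5, corrupted_frac=0.2)
targets = build_semi_supervised_labels(labels, clean, uncertain, corrupted)
print("clean:", sorted(clean.tolist()))
print("uncertain:", sorted(uncertain.tolist()))
print("corrupted:", sorted(corrupted.tolist()))
print("targets:", targets.tolist())
```

Under this reading, only the clean subset supplies supervision, which is why a semi-supervised learner is a natural fit for the cleaner.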




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers still have concerns after the rebuttal and have decided to reject the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    rebuttal



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposed a noisy label detection and correction framework with uncertainty. The main criticism from the reviewers was to justify the necessity and rationale of the design. The authors’ rebuttal covered the major points.


