
Authors

Xiaoqing Guo, Yixuan Yuan

Abstract

Noisy labels collected with limited annotation cost prevent medical image segmentation algorithms from learning precise semantic correlations. Previous segmentation methods for learning with noisy labels operate merely in a pixel-wise manner to preserve semantics, such as pixel-wise label correction, and neglect the pair-wise manner. In fact, we observe that the pair-wise manner, which captures affinity relations between pixels, can greatly reduce the label noise rate. Motivated by this observation, we present a novel perspective for noise mitigation by incorporating both pixel-wise and pair-wise manners, where supervisions are derived from noisy class and affinity labels, respectively. Unifying the pixel-wise and pair-wise manners, we propose a robust Joint Class-Affinity Segmentation (JCAS) framework to combat label noise issues in medical image segmentation. Considering that the affinity in the pair-wise manner incorporates contextual dependencies, a differentiated affinity reasoning (DAR) module is devised to rectify the pixel-wise segmentation prediction by reasoning about intra-class and inter-class affinity relations. To further enhance noise resistance, a class-affinity loss correction (CALC) strategy is designed to correct supervision signals via the modeled noise distributions of class and affinity labels. Meanwhile, the CALC strategy couples the pixel-wise and pair-wise manners through a theoretically derived consistency regularization. Extensive experiments under both synthetic and real-world noisy labels corroborate the efficacy of the proposed JCAS framework, with a minimal gap to the upper-bound performance. The source code is available at https://github.com/CityU-AIM-Group/JCAS.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_56

SharedIt: https://rdcu.be/cVRwH

Link to the code repository

https://github.com/CityU-AIM-Group/JCAS

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper addresses medical image segmentation with noisy labels. It introduces a joint class-affinity segmentation model to consider both pixel-wise label correction and pair-wise pixel relations to reduce the label noise rate. A DAR is proposed for affinity reasoning and an affinity-based loss function is designed for regularization.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of exploring the pair-wise pixel interdependencies for reducing noise rate sounds reasonable. The experiments are extensive, and the results look good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. My major concern with the paper regards the affinity in Sec. 2. It seems to me that P’ is computed for each pair of pixels in an image; thus, it can measure both the intra-class and inter-class affinities. However, why is it claimed to only “reveal the intra-class affinity relations” in Sec. 2.1? It is also not clear how the reverse version P_re’ measures the inter-class affinity.
    2. The ablation study is not comprehensive. The necessity of intra-class and inter-class relation learning (Eqn. 1) is not well investigated. The class-level loss correction and affinity-level loss correction are also not studied separately.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The algorithm is simple and should be easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    More detailed descriptions should be given to elaborate how P’ is computed. Why does it only reveal intra-class relations? And how does its reverse version capture inter-class relations?

    More ablation experiments should be conducted: without intra-class relation learning, without inter-class relation learning, without class-level loss correction, and without affinity-level loss correction.

    Since P’ is normalized, it may not be necessary to normalize P_re’.

    Fig. 2 should be improved. It is not easy to see from the figure how P’ is derived.

    Eq. 1 is very similar to GCNs. Does it perform in an iterative manner or just one-step inference?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper overall is easy to understand and the results also look good. However, some important details (especially the affinity computation) are not clearly presented and the ablation experiments are not thorough, making it hard to evaluate the effectiveness of some essential designs. Thus, at the current stage, I recommend “weak reject”.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal has made some clarifications on the details of the algorithm, which address my concerns. Thus, I raise my score to “weak accept”.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a new method for robust medical image segmentation under noisy labels. The core of the proposed method is a novel Joint Class-Affinity Segmentation (JCAS) framework, which takes into account both pixel-wise class and pairwise affinity supervisions. Specifically, to rectify the pixel-wise segmentation mask, a differentiated affinity reasoning (DAR) module is developed. To effectively train JCAS, a class-affinity loss correction (CALC) strategy is introduced to correct supervision signals. Experimental results show that the proposed method outperforms previous works under both synthetic and real-world noisy labels on one dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper takes into account the pixel pairwise affinity relationships for dealing with noisy segmentation masks. This perspective is novel and interesting. (2) The proposed DAR module is novel and simple. (3) The Class-Affinity Consistency Regularization loss is novel. In particular, this loss is developed with theoretical rationale. (4) The proposed method achieves the best overall segmentation results compared to previous works. This demonstrates the advantages of the proposed designs. (5) The proposed method is well motivated. The use of pairwise affinity is reasonable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The writing of this paper needs to be improved. Some descriptions are unclear and somewhat misleading. For example, the paper mentions that the affinity map measures the similarity between two pixels and reveals the intra-class affinity relations. But it is unclear why this map captures the intra-class affinity relations. In other words, where do these relations appear? Does this map also capture inter-class relations? (2) Although the ablation study demonstrates that the proposed CALC is effective, it is unclear whether the Class-Affinity Consistency Regularization is indeed effective. It is possible that only the class-level loss correction and affinity-level loss correction play roles in CALC. Additional ablation studies are necessary. (3) The class-level loss correction and affinity-level loss correction are not novel; they come from existing works. (4) The computation of pairwise affinity maps is quadratic, and thus could be expensive. It is unclear from the experiments whether the pairwise affinity map requires a lot of computation and memory. Some statistical numbers would be helpful to illustrate this.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper provides some details on the method implementation. But such details still may not be sufficient to reproduce the method. It would be helpful if the authors would release their source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) There are typos in the paper. For example, in equation 1 the P(k1) should be P’(k1). On Page 7, “results in Fig. 3” -> “results in Fig. 4”. On page 8, “curves in Fig. 4” -> “curves in Fig. 5”. (2) Additional ablation studies are needed to illustrate the benefits of Class-Affinity Consistency Regularization. (3) The writing of the paper should be improved. For example, it is necessary to clarify the “inter-class” and “intra-class” concepts. Also, why would the affinity label reduce the noise rate (see Fig.1, page 2)? (4) For completeness, it is better to also compare the proposed method with [15] and [20] in the experiment.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall I think the proposed method is novel, in terms of motivation and method design. The proposed method is demonstrated to be more effective on one medical image dataset (although the number of datasets used is limited). However, this paper still lacks some critical ablation studies to justify its technical contributions.

    The major factors I take into account for scoring: novelty and experimental evaluation.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal, I think the authors have addressed most of my concerns. Therefore I would like to increase my previous rating for this paper. My remaining concerns for this paper include:

    1. The writing of the paper is not clear in some places.
    2. The DAR module increases the computational cost by about 1.5 times in FLOPs, which is not “negligible”.



Review #4

  • Please describe the contribution of the paper

    This paper presents a novel segmentation framework for learning with noisy labels on a surgical instrument dataset. The proposed method is built on affinity representation learning in both inter-class and intra-class manners. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms comparison methods under various types of noisy labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed Differentiated Affinity Reasoning and Class-Affinity Loss Correction modules are novel and effective.

    • The proposed framework is shown to be effective under various types of noisy labels.

    • Overall, the paper is well organized and clearly presented with nice illustrations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • For the overall loss function, the effectiveness of using different weighting factors has not been discussed.

    • Some references to the figures are incorrect. For example, in the last paragraph of Section 3, the Jac curves are shown in Fig. 5, instead of Fig. 4.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experimental details are clear. Code is not available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Please include a computational complexity analysis of the proposed method.

    • For the visualization comparison (Fig. 4 in the main paper and Fig. 1 in the supplementary material), it would be good to include some discussions on how the proposed method outperforms other comparison methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and has been shown to be effective under various types of noisy labels.

  • Number of papers in your stack

    8

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There are non-converging review recommendations. The authors are encouraged especially to address the issues raised by the reviewers, including the empirical evaluations (e.g. ablation tests on whether the Class-Affinity Consistency Regularization is effective) and the presentation (e.g. how P’ and its reverse P_re’ measure intra-/inter-class affinity), among others.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We sincerely thank all reviewers for their invaluable comments and for acknowledging that our method is well-motivated and novel. The code will be released for reproducibility. The common questions are answered first, followed by responses to individual review comments.

Q1(AC&R2&R3): Clarifications about intra- and inter-class affinity relations. A1: The affinity map (P’) measures feature similarity between any two pixels. Since intra-class pixels share similar semantic representations, intra-class pixel pairs usually show large similarity scores in P’, which highlights pixel pairs belonging to the same class. Hence, P’ reveals intra-class affinity relations. In contrast, the reversed affinity map, which measures dissimilarity, emphasizes pixel pairs belonging to different classes; thus P_re’ reveals inter-class affinity relations.
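For concreteness, a minimal sketch of how such affinity maps can be constructed (assuming dot-product similarity between flattened pixel features and row-wise softmax normalization; tensor shapes, names, and the exact similarity measure are illustrative and may differ from the paper's implementation):

    import torch
    import torch.nn.functional as F

    def affinity_maps(feat):
        # feat: (B, C, H, W) pixel features; returns P' and P_re', both (B, N, N) with N = H*W,
        # where every row sums to 1.
        b, c, h, w = feat.shape
        x = feat.flatten(2).transpose(1, 2)                   # (B, N, C) per-pixel feature vectors
        sim = torch.bmm(x, x.transpose(1, 2))                 # (B, N, N) pairwise similarity scores
        p = F.softmax(sim, dim=-1)                            # P': large entries for same-class pairs
        p_re = (1.0 - p) / (1.0 - p).sum(-1, keepdim=True)    # P_re': renormalized rows of 1 - P'
        return p, p_re

Intra-class pairs dominate the rows of P', while P_re' inverts the similarity so that cross-class pairs receive the large weights.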

Q2(AC&R2&R3): More ablation studies. A2: To comprehensively demonstrate each proposed component, we ablate intra-class affinity reasoning, inter-class affinity reasoning, class-level loss correction (LC), affinity-level LC, and class-affinity consistency regularization (CACR) under ellipse noise, obtaining 55.795%, 56.180%, 55.203%, 55.179%, and 56.612% Jac, respectively. Ablating each component degrades performance compared to the 58.452% Jac achieved by our full method, verifying the effectiveness of each individual component in mitigating the label noise issue in surgical instrument segmentation.

Q3(R3&R4): Computational complexity. A3: The differentiated affinity reasoning (DAR) module increases the memory overhead from 780.71MB (Baseline [3]) to 784.67MB and the FLOPs from 58.88G (Baseline [3]) to 74.84G. It is clear that the increased memory overhead is negligible.

Q4(R2): Normalization of P_re’. A4: Since each row of P’ is normalized to sum to 1, each row of 1-P’ sums to n-1 rather than 1. Thus, we apply normalization to 1-P’ to obtain P_re’.
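Written out explicitly in A4's notation (with n denoting the number of pixels, so each row of P' has n entries):

    \sum_{j=1}^{n} (1 - P'_{ij}) = n - \sum_{j=1}^{n} P'_{ij} = n - 1,
    \qquad \text{hence} \qquad
    P'_{re,ij} = \frac{1 - P'_{ij}}{n - 1}.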

Q5(R2): How does Eq. 1 perform? A5: Eq. 1 performs one-step inference.
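As a rough sketch, a single (non-iterative) propagation step, analogous to one GCN layer, amounts to one matrix multiplication per affinity map (names and shapes are illustrative; the way DAR differentiates and fuses the intra- and inter-class results follows the paper and is not shown here):

    def one_step_reasoning(pred, affinity):
        # pred: (B, N, K) pixel-wise class probabilities; affinity: (B, N, N), rows summing to 1.
        # Each pixel aggregates the predictions of its related pixels in a single step,
        # with no unrolling over multiple iterations.
        return torch.bmm(affinity, pred)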

Q6(R3): Reason for the reduced noise rate of affinity labels. A6: The affinity label indicates whether the pixels in a pair share the same class. Therefore, even if one or both pixels in a pair are mislabeled, the affinity label of this pair may still be correct, thereby reducing the noise rate. An example is shown in Fig. 1.
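A toy numerical illustration of A6 (hypothetical labels for four pixels, not data from the paper): when mislabeling is consistent, e.g. an entire region annotated with the wrong class, the pixel-wise noise rate can be high while the pair-wise affinity labels remain largely or even entirely correct.

    import numpy as np

    true_cls  = np.array([0, 0, 1, 1])   # ground-truth classes of 4 pixels
    noisy_cls = np.array([2, 2, 1, 1])   # both class-0 pixels consistently mislabeled as class 2

    # affinity label: 1 if a pair of pixels shares the same class, else 0
    affinity = lambda y: (y[:, None] == y[None, :]).astype(int)

    pixel_noise_rate = (true_cls != noisy_cls).mean()                      # 0.5
    pair_noise_rate  = (affinity(true_cls) != affinity(noisy_cls)).mean()  # 0.0
    print(pixel_noise_rate, pair_noise_rate)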

Q7(R3): Comparison with [15, 20]. A7: To demonstrate the effectiveness of JCAS framework in tackling label noise issue, we have quantitatively compared it with label noise algorithms [18, 22, 10] under ellipse noise, as in Table 1. Herein, we further reimplement two additional label noise methods [15, 20] using the same backbone [3] for a fair comparison with our JCAS. It is observed that JCAS performs favorably against [15, 20] under ellipse noise with improvements of 9.187% and 5.876% in Jac.

Q8(R3): The perspective of jointly using pixel- and pair-wise manners to tackle the label noise issue is novel and interesting. In methodology, the DAR module and CACR loss are also novel, but the novelty of the class- and affinity-level LC is not clear. A8: As for class-level LC, we model the label noise distribution in noisy class labels via a noise transition matrix (NTM) and exploit volume minimization [10] for NTM estimation to alleviate the label noise issue. Different from [10], which is designed for image classification, our class-level LC is performed at the pixel level for image segmentation. As for affinity-level LC, we make the first effort to exploit affinity relations between pixels within an image for noise mitigation, and affinity-level LC is the first attempt to correct pair-wise supervision signals in image segmentation. Moreover, class- and affinity-level LC are mutually beneficial via the proposed CACR loss.
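For reference, a minimal per-pixel sketch of the forward (noise-transition-matrix) loss-correction idea that class-level LC builds on (generic form only; the volume-minimization estimation of T from [10] is omitted, and all names and shapes here are assumptions rather than the authors' code):

    import torch
    import torch.nn.functional as F

    def corrected_ce(logits, noisy_labels, T):
        # logits: (B, K, H, W); noisy_labels: (B, H, W); T: (K, K) noise transition matrix,
        # T[i, j] = P(observed label j | true class i). The predicted clean-class posterior
        # is pushed through T so the loss is computed against the noisy annotations.
        probs = F.softmax(logits, dim=1).permute(0, 2, 3, 1)     # (B, H, W, K)
        noisy_probs = probs @ T                                  # (B, H, W, K)
        log_noisy = noisy_probs.clamp_min(1e-8).log().permute(0, 3, 1, 2)
        return F.nll_loss(log_noisy, noisy_labels)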

Q9(R4): Weighting factor lambda of the CACR loss. A9: We tune the weighting factor of the CACR loss under ellipse noise; lambda values of 0.1, 0.01, 0.001, 0.0001, and 0.0 yield Dice scores of 71.040%, 71.384%, 71.193%, 70.974%, and 69.601%, respectively. Hence, we empirically set lambda to 0.01.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper deals with surgical instrument segmentation with noisy labels using deep learning methods, with good empirical results and favorable ratings from all three reviewers. Meanwhile, the reviewers have raised a number of concerns, including the empirical evaluations (e.g. ablation tests on whether the Class-Affinity Consistency Regularization is effective) and the presentation (e.g. how P’ and its reverse P_re’ measure intra-/inter-class affinity), among others. The authors need to go through the issues raised by the reviewers carefully and address them properly.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the main concerns raised by the reviewers. Please take the reviewers final comments into consideration while preparing for the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper received two acceptance recommendations and one rejection initially. The rebuttal addressed most concerns, satisfying all reviewers. The final version should address all reviewer comments and suggestions, particularly in terms of writing clarity and experimental details.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


