
Authors

Agostina J. Larrazabal, César Martínez, Jose Dolz, Enzo Ferrante

Abstract

Modern deep neural networks have achieved remarkable progress in medical image segmentation tasks. However, it has recently been observed that they tend to produce overconfident estimates, even in situations of high uncertainty, leading to poorly calibrated and unreliable models. In this work we introduce Maximum Entropy on Erroneous Predictions (MEEP), a training strategy for segmentation networks that selectively penalizes overconfident predictions, focusing only on misclassified pixels. Our method is agnostic to the neural architecture, does not increase model complexity, and can be coupled with multiple segmentation loss functions. We benchmark the proposed strategy on two challenging segmentation tasks: white matter hyperintensity lesions in magnetic resonance images (MRI) of the brain, and atrial segmentation in cardiac MRI. The experimental results demonstrate that coupling MEEP with standard segmentation losses leads to improvements not only in model calibration, but also in segmentation quality.
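
As a rough illustration of the strategy described above, the sketch below adds an entropy-maximization term, restricted to misclassified pixels, on top of a standard segmentation loss for the binary case. It is a minimal reconstruction from the abstract rather than the authors' implementation; the function name meep_style_loss, the weight lam, and the use of binary cross-entropy as the base loss are illustrative assumptions (the released code in the repository linked below is the authoritative reference).

    import torch
    import torch.nn.functional as F

    def meep_style_loss(logits, target, lam=0.1, eps=1e-8):
        """Sketch of a MEEP-style objective for binary segmentation
        (reconstructed from the abstract, not the authors' code)."""
        # Per-pixel foreground probabilities.
        probs = torch.sigmoid(logits)

        # Standard segmentation loss; the paper couples the regularizer with
        # several losses, binary cross-entropy is just one possible choice.
        seg_loss = F.binary_cross_entropy(probs, target.float())

        # Mask of misclassified pixels: hard prediction disagrees with the label.
        pred = (probs > 0.5).float()
        wrong = (pred != target.float()).float()

        # Shannon entropy of each per-pixel Bernoulli prediction.
        entropy = -(probs * torch.log(probs + eps)
                    + (1.0 - probs) * torch.log(1.0 - probs + eps))

        # Maximize entropy only on erroneous pixels by subtracting its mean there.
        reg = (entropy * wrong).sum() / (wrong.sum() + eps)
        return seg_loss - lam * reg

In a training loop this would be used as, e.g., loss = meep_style_loss(model(images), masks) followed by loss.backward().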

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_27

SharedIt: https://rdcu.be/dnwAZ

Link to the code repository

https://github.com/agosl/Maximum-Entropy-on-Erroneous-Predictions/

Link to the dataset(s)

WMH Segmentation dataset: https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/AECRSD

LA Segmentation dataset: https://arxiv.org/abs/2004.12314


Reviews

Review #1

  • Please describe the contribution of the paper

    The main idea of this work is a method that penalizes overconfident estimates on incorrect class predictions in image segmentation. This is achieved by specifically regularizing misclassified pixel predictions with low entropy towards a high-entropy, uniform prediction across segmentation labels. The authors note that this is similar to minimizing the Kullback-Leibler divergence between misclassified pixel label probabilities and a uniform distribution. The proposed approach is evaluated favorably against the same segmentation models with other loss functions (no entropy penalization or uniform entropy penalization) on two different segmentation targets in MRI scans.
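
    For reference, the equivalence alluded to here is the standard identity relating the KL divergence to a uniform distribution and the Shannon entropy: for a prediction $p = (p_1, \dots, p_K)$ over $K$ labels on a misclassified pixel,

        $\mathrm{KL}(p \,\|\, \mathcal{U}) = \sum_{k=1}^{K} p_k \log \frac{p_k}{1/K} = \log K - H(p),$

    so minimizing this KL term is, up to the constant $\log K$, the same as maximizing the entropy $H(p)$ of that prediction.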

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    For the real dataset, the method seems to work quite well both in overall segmentation accuracy, as well as in obtaining more qualitatively appropriate uncertainty estimates. This paper flowed very well and was clearly written, and comparisons made in evaluating their method seemed appropriate given their entropy penalty terms.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors presented two similar formulations of a penalizing loss towards misclassified predictions with low entropy, but one would hope for a more exhaustive comparison of the two. One other weakness is that this is only applied to MRI scans; it would be more convincing to demonstrate on a variety of image modalities, since it appears that the approach is not necessarily tailored to the application area.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    From the description, given that one is familiar with a neural network for medical image segmentation, this work seems quite reproducible, as one can simply compute the proposed penalization terms and incorporate them into the loss when training the model. Model architecture details are fairly well described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    While the discussion of the two different loss formulations that push misclassified pixels towards high-entropy distributions was clear, it would be clearer if the authors had a good sense of why they behave differently (beyond the gradient dynamics), or, alternatively, of when it might make sense to prefer one over the other when choosing which loss to use.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this paper is a valuable contribution and is clearly written; I think improved scores would come from extending their evaluation to other datasets, and providing more understanding of the two loss functions (specifically, which may be preferable, since they appear to give quite different uncertainty estimates).

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors improve segmentation performance and model calibration by adding a regularizer to the loss function to maximize the entropy of misclassified pixels. In other words, the model is encouraged to predict maximum uncertainty (p=0.5) on difficult parts of the image, and discouraged from the extremes (p=1.0 and p=0.0). Thorough experiments are performed with a convincing comparison to SOTA alternatives.
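
    For the binary case described here, the per-pixel quantity being maximized on misclassified pixels is the Bernoulli entropy

        $H(p) = -\,p \log p - (1 - p)\log(1 - p),$

    which is maximal at $p = 0.5$ and vanishes at the extremes $p \in \{0, 1\}$, matching the behaviour described above.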

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Simple, intuitive idea iterating on previous literature, evaluated thoroughly and convincingly. The novelty is iterating on reference [17] by applying their regularization selectively.
    • Thorough evaluation against SOTA methods, performed on two architectures.
    • General, so can be applied to any model architecture.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The improvements over reference [17] seem clear but slight. Could the authors verify their significance with a statistical test, e.g., by marking (italics/underline/symbol) the best results that are significantly better?
    • The novelty is only a slight iteration on a previous work.
    • Not much theoretical justification of why the author’s method is better than [17], only empirical.
    • Figure 2 is not referenced in the text, its caption is unclear, and it is too small.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Equations are clearly stated and seem clear to implement.
    • The code is shared.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The authors could provide some theoretical proofs or insights into why their method improves on reference [17].
    • Fig. 2 could be removed since it is not referenced.
    • For a journal extension, the authors could compare against a SOTA architecture, perhaps a transformer. Current SOTA methods on these datasets from previous literature are not mentioned in the comparisons.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The paper is clear and well written. The idea is well communicated, justified and evaluated.
    • The results are a convincing improvement on previous approaches.
    • On the other hand, the method is only a slight iteration on a previous approach [17], so the novelty is limited. However, to the best of the reviewer’s knowledge, [17] had not been applied to segmentation before, so the application of [17] to this task is also a novelty.
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes Maximum Entropy on Erroneous Predictions (MEEP), a training strategy for segmentation networks that selectively penalizes overconfident predictions, focusing only on misclassified pixels, with the aim of avoiding overconfident estimates.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of introducing Maximum Entropy on Erroneous Predictions is a simple and effective approach. The paper is well organized and the idea is well explained.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results in Fig. 2 are not consistent with Table 1, e.g., for L_ce+PS; the authors should explain this. There is also related calibration research, such as the calibration methods introduced in Guo’s paper [1]; a comparison with, and discussion of, these state-of-the-art methods is recommended.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Why is there no analysis regarding Fig. 2, especially an explanation of Fig. 2(c)?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea behind this paper is simple and well explained. The experiments are promising and conclusive.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes to enforce segmentation networks to produce fully uncertain predictions on erroneous voxels by maximizing entropy on those voxels. R2 mentioned some concerns regarding the degree of novelty with respect to Pereyra et al., but recommended Strong Acceptance, as did R3. Since R1 also liked the proposed approach, and I do not find any reason to diverge from this consensus, I am supporting direct acceptance of this work.




Author Feedback

N/A


