
Authors

Yidong Zhao, Changchun Yang, Artur Schweidtmann, Qian Tao

Abstract

The self-configuring nnU-Net has achieved leading performance in a large range of medical image segmentation challenges. It is widely considered the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate the nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of the weight space for Bayesian uncertainty estimation. Different from previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net by ensembling multi-modal posterior models. We applied our method to the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_51

SharedIt: https://rdcu.be/cVVp7

Link to the code repository

N/A

Link to the dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/

https://www.ub.edu/mnms


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method to estimate the nnU-Net uncertainty for medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Comprehensive literature review;
    2. Novel with utilization of network checkpoints at various training epochs;
    3. Comprehensive evaluations;
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. “Efficient” in the title is not well supported by the results.
    2. Only 3D data is evaluated. 2D data segmentation should be included.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. “Efficient” in the title is not well supported by the results.
    2. nnU-Net has proven powerful on both 3D and 2D data. However, only 3D data are evaluated here; 2D segmentation should also be included.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is good and the field is interesting.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The novelty is limited though the authors have pointed out the difference.



Review #2

  • Please describe the contribution of the paper

    This paper leads to several nice contributions:
    • it provides a novel VI approximation method,
    • it provides an uncertainty estimation scheme for the nnU-net architecture,
    • it boosts this same architecture in the context of biomedical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is that it fills a gap in the DL community by providing an uncertainty estimation method for the nnU-net architecture.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see any particular weakness in this paper.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    No special recommendation to make.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper seems to be a very important contribution in the domain of biomedical image segmentation, since it provides an uncertainty scheme for the state-of-the-art architecture that nnU-net is, which was not obvious to develop (as explained in the paper). Furthermore, its performance is boosted, leading to a new state-of-the-art segmentation method.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper presents an uncertainty estimation method that employs posterior sampling of the weight space and validates it on two public datasets under the nnU-Net framework. The uncertainty is estimated by ensembling multiple snapshots (checkpoints) from a single model training run under a cyclic learning rate schedule. The obtained results outperform three commonly used baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-organized and easy to follow.
    2. The idea is neat and it can have broader impact once the code is available, as it is integrated into the popular nnU-Net framework.
    3. The improvement in the ECE metric is significant, although the gain in the segmentation performance is relatively marginal compared to Deep Ensemble.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Too limited novelty. The method proposed in this paper is essentially similar to “Snapshot ensembles” (ICLR 2017), cited as [13] in the paper. However, this paper does not discuss it in related work and only mentions [13] in the cyclical learning rate setting (section 2.3). To me, the single-modal posterior sampling is similar to NoCycle Snapshot Ensemble, and the multi-modal posterior sampling is similar to Snapshot Ensemble. In my reviewing process, I was waiting for the EXPLICIT discussion and comparison of the differences between the proposed method and Snapshot ensembles but got rather disappointed. The differences are small, e.g., the use case (medical images) and the cyclical schedule (from cosine LR to a proposed one). Besides, there is no ablation experiment on the proposed cyclical schedule.

    If the authors would clarify the difference between the proposed method and Snapshot ensembles, I would consider adjusting my rating.

    Minor issue: The name “multi-modal” can be confusing in the medical domain.

    Snapshot ensembles: Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J.E., Weinberger, K.Q.: Snapshot ensembles: Train 1, get m for free. ICLR 2017.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No issue here.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. A paper with limited novelty is fine, as long as it honestly claims its contributions. Do not overclaim.

    2. An extensive ablation study or study on important factors can improve the contributions of this paper. For example, why cosine lr fails for nnU-Net training? How do the hyper-parameters like the number of checkpoints, the gamma parameter, or training epochs affect the quality of estimated uncertainty?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I’m okay with a paper with limited novelty but with extensive experiments and ablation studies to distill the key factors that can contribute to the community. The authors cite Snapshot ensemble [13] in their paper but present no discussion/comparison between the proposed method and Snapshot ensemble. In my opinion, this looks like a purposeful omission. That’s why I downgrade my rating.

    I would raise my score if the authors could address my concerns.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Based on the response, I would like to raise my score. Authors are encouraged to fully discuss Snapshot in the revised paper.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a method to estimate the nnU-Net uncertainty for medical image segmentation. Good evaluation. There are concerns about methodological novelty raised by Reviewer 3.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank all the reviewers for their expert review and constructive comments. We greatly appreciate the reviewers’ positive remarks that our paper presented a “novel” (R1 and R2) and “neat” (R3) idea to incorporate uncertainty estimation into the influential nnU-Net architecture, with a “comprehensive literature review and evaluation” (R1), and that our work “seems to be a very important contribution in the domain of biomedical image segmentation” (R2).

The major criticism to which we would like to respond, raised by R3, concerns novelty and similarity to Snapshot Ens., cited as [13]. We did not purposefully avoid discussing NoCycle/Snapshot Ens. [13]: we believe that the proposed single-/multi-modal sampling strategies have not been previously presented or validated, and they differ from NoCycle/Snapshot Ens. in several respects. First, our single-modal sampling takes place densely in the last phase of each training cycle, during which the LR is constant and SGD explores a high-posterior region. In contrast, NoCycle saves checkpoints sparsely under standard LR decay, consuming as many epochs as the cyclical variant. Our method has the theoretical grounding that the convergence phase of SGD at a constant LR approximates local posterior sampling, as thoroughly discussed by Mandt et al. 2017, cited as [20] (Related Works). Second, Snapshot Ens. [13] uses only one checkpoint per cycle and ignores local weight uncertainty, while our multi-modal sampling combines both local and global weight variability (the latter by exploring more modes via the cyclical scheme). Finally and importantly, NoCycle was designed to show that checkpoints on a standard SGD trajectory lack diversity; the use of snapshots for uncertainty estimation was not discussed in [13]. However, our results reveal that the weight uncertainty captured by SGD in a single training cycle can effectively propagate to the prediction, as shown in Fig. 1 and Table 2. We believe the most inspiring part of [13] is its use of a cyclical LR to explore more posterior modes and capture global uncertainty, which is why we cite it in the cyclical training section. We thank R3 for initiating this discussion, and in the final version we will add an elaborated discussion of novelty and a methods comparison: (1) our method is a theoretically grounded posterior sampling approximation, unlike NoCycle; (2) Snapshot Ens. ignores local weight uncertainty, whereas our method combines both local and global uncertainty (cf. Fig. 2 (b)); (3) our work focuses on uncertainty estimation, and the ablation study showed that a significant ECE improvement was contributed by local uncertainty (Table 2, single-modal), while cyclical training further improved ECE by exploring multiple posterior modes (for corner cases, e.g. Fig. 3 (d), better uncertainty thanks to the multi-modal diversity).
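The sampling scheme described above can be illustrated with a short sketch. The schedule shape, cycle length, and snapshot spacing below are illustrative assumptions, not the paper's exact hyper-parameters: within each cycle the LR decays and then stays constant, checkpoints are saved densely during the constant-LR phase (local uncertainty), repeated cycles visit multiple posterior modes (global uncertainty), and the snapshot predictions are averaged.

```python
import numpy as np

def cyclical_lr(epoch, cycle_len=100, lr_max=0.01, lr_const=0.001, decay_frac=0.7):
    """Hypothetical cyclical schedule: the LR decays from lr_max to lr_const
    within each cycle, then stays constant so SGD explores a high-posterior
    region (the exact schedule in the paper may differ)."""
    t = epoch % cycle_len
    decay_epochs = int(decay_frac * cycle_len)
    if t < decay_epochs:
        # decay phase (linear here purely for illustration)
        return lr_max - (lr_max - lr_const) * t / decay_epochs
    return lr_const  # constant-LR phase: snapshots are saved here

def checkpoint_epochs(n_epochs, cycle_len=100, decay_frac=0.7, every=5):
    """Epochs at which snapshots are taken: densely during each
    constant-LR phase of each cycle."""
    decay_epochs = int(decay_frac * cycle_len)
    return [e for e in range(n_epochs)
            if e % cycle_len >= decay_epochs
            and (e % cycle_len - decay_epochs) % every == 0]

def ensemble_softmax(prob_maps):
    """Posterior-ensemble prediction: average the per-snapshot softmax maps
    (shape: snapshots x classes x voxels) and take the entropy of the mean
    as a per-voxel uncertainty measure."""
    mean_p = np.mean(prob_maps, axis=0)
    entropy = -np.sum(mean_p * np.log(mean_p + 1e-12), axis=0)
    return mean_p, entropy
```

Under these assumptions, two snapshots that disagree completely on a voxel yield a mean probability of 0.5 per class and maximal entropy, which is exactly the behavior that lets weight-space diversity propagate into the predicted uncertainty map.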

R1 raised a concern about efficiency. In our work, efficiency refers to the training efficiency of weight posterior inference, in comparison to SOTA VI or Deep Ens., which demand modification of the network architecture or linearly growing training time. We will clarify this point in the revision. R3 raised a concern about the LR schedule. We carried out extensive experiments but failed to tune cosine annealing on nnU-Net for multi-modal exploration while keeping the loss bounded; the proposed scheme can drive weights out of a local mode (cf. Fig. 2) without loss overshoot. R1 and R3 also raised concerns about the range of experiments conducted. Due to the space limit, we focused on the major contribution of our work: showing that local weight uncertainty captured by SGD, combined with cyclical training, can better explore the posterior P(w|D) and deliver improved uncertainty estimation and segmentation performance over SOTA methods including MC-dropout and Deep Ens.
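Both the reviews and the rebuttal lean on the ECE metric as the main calibration result. For reference, the standard binned expected calibration error (a generic formulation, not necessarily the authors' exact evaluation code) can be computed as:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: partition predictions into confidence bins and
    average the |accuracy - confidence| gap, weighted by bin population."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # empirical accuracy in this bin
            conf = confidences[mask].mean()  # mean predicted confidence
            ece += mask.sum() / n * abs(acc - conf)
    return ece
```

A perfectly calibrated model (e.g. 95% accuracy at 95% confidence) scores an ECE of 0, while an overconfident model (high confidence, low accuracy) scores close to its confidence-accuracy gap; lower is better, which is the direction of the improvement reported in the paper.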




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All the reviewers’ concerns have been addressed, and the reviewers converged to an accept decision after the rebuttal. The one reviewer who had a strong reject decision changed it to weak accept after the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a very good paper. As the reviewers wrote, the authors’ contribution is wide and includes:
    – a novel method to estimate the uncertainty of the nnU-net architecture, utilizing network checkpoints at various training epochs;
    – a comprehensive literature review;
    – a comprehensive evaluation.

    This is a well-organized and easy-to-follow paper. I trust it will have a broad impact on the MICCAI community that uses nnU-net.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a posterior sampling based method to estimate segmentation uncertainty, applied to the nnU-Net model. There was a concern regarding novelty, which was very well addressed by the authors in their rebuttal, and all reviewers are in favor of accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


