
Authors

Luisa Neubig, Andreas M. Kist

Abstract

Semantic segmentation is an important task in medical imaging. Typically, encoder-decoder architectures, such as the U-Net, are used in various variants to approach this task. Normalization methods, such as Batch or Instance Normalization are used throughout the architectures to adapt to data-specific noise. However, it is barely investigated which normalization method is most suitable for a given dataset and if a combination of those is beneficial for the overall performance. In this work, we show that by using evolutionary algorithms we can fully automatically select the best set of normalization methods, outperforming any competitive single normalization method baseline. We provide insights into the selection of normalization and how this compares across imaging modalities and datasets. Overall, we propose that normalization should be managed carefully during the development of the most recent semantic segmentation models as it has a significant impact on medical image analysis tasks, contributing to a more efficient analysis of medical data. Our code is openly available after peer review.
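The layer-wise search described in the abstract can be illustrated with a small genetic-algorithm sketch. This is not the authors' implementation. The five candidate methods come from the author feedback further down this page, and the nine positions and population of 20 come from Review #1. The fitness function, the elitist selection, the one-point crossover, and the mutation rate are all placeholder assumptions; a real run would train a U-Net per configuration and score it on validation data.

```python
import random

# Candidate methods as listed in the author feedback; the short codes are illustrative.
METHODS = ["BN", "IN", "LN", "GN", "FRN"]
NUM_LAYERS = 9   # assumed from the 5^9 search space mentioned in Review #1
POP_SIZE = 20    # population size mentioned in Review #1


def random_config(rng):
    """Draw one normalization method per layer position."""
    return tuple(rng.choice(METHODS) for _ in range(NUM_LAYERS))


def toy_fitness(config):
    """Placeholder score (fraction of IN layers); stands in for a trained model's metric."""
    return sum(1 for m in config if m == "IN") / NUM_LAYERS


def crossover(a, b, rng):
    """One-point crossover between two parent configurations."""
    cut = rng.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]


def mutate(config, rng, rate=0.1):
    """Resample each position with probability `rate` (hypothetical value)."""
    return tuple(rng.choice(METHODS) if rng.random() < rate else m for m in config)


def evolve(generations=20, seed=0, fitness=toy_fitness):
    """Return the best configuration found in each generation."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(POP_SIZE)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(ranked[0])
        parents = ranked[: POP_SIZE // 2]  # elitism: the top half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(POP_SIZE - len(parents))
        ]
        population = parents + children
    return best_per_generation
```

Because the top half of each generation is carried over unchanged, the best score in this sketch cannot decrease from one generation to the next.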

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_67

SharedIt: https://rdcu.be/dnwEk

Link to the code repository

https://github.com/neuluna/ga-unet

Link to the dataset(s)

https://www.bagls.org/

https://datasets.simula.no/kvasir-seg/

http://medicaldecathlon.com/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method to automatically select the best normalization method for semantic segmentation of medical images using evolutionary algorithms. The paper reveals that the choice of normalization method has a significant impact on the performance of semantic segmentation models, and provides a more effective solution for medical image analysis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper proposes a strategy for automatically selecting the best normalization method, which outperforms the baseline of any single normalization method, and demonstrates its effectiveness on different datasets and imaging modes. (2) The paper provides extensive visualization experiments to demonstrate the authors’ motivation and strategy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Searching for the best normalization configuration is not a new topic. As far as I know, [1] used an evolutionary algorithm to search for the best normalization layers and activation functions. Why do the authors not cite and discuss this work? [1] Evolving Normalization-Activation Layers, NeurIPS 2020.

    (2) The proposed strategy only outperforms the well-designed network on some datasets, and the performance gap is not significant. Overall, IN may still be the better choice for most datasets.

    (3) The search space contains 5^9 candidates, while the proposed method initializes a small population of only 20 candidates. I think it is hard to evolve the architectures with such a small population.

    (4) The paper proposes selecting the optimal strategy in the first generation, but why does it perform poorly on the heart dataset? What could be the reason for this?
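The scale behind point (3) can be checked with a few lines of arithmetic. The sketch below assumes 20 generations (inferred from the "Gen20" label used elsewhere on this page) and treats every candidate in every generation as a new, distinct evaluation, which gives an upper bound on how much of the search space is visited.

```python
methods = 5      # candidate normalization methods
layers = 9       # normalization positions, per the 5^9 figure in the review
population = 20  # population size stated in the review
generations = 20 # assumption, inferred from "Gen20"

search_space = methods ** layers            # 5^9 = 1,953,125 configurations
max_evaluated = population * generations    # at most 400 distinct candidates
fraction = max_evaluated / search_space     # share of the search space visited

print(f"search space: {search_space}")
print(f"max evaluated: {max_evaluated}")
print(f"fraction visited: {fraction:.6%}")
```

Under these assumptions the search evaluates well under a tenth of a percent of all configurations, which is the concern the reviewer raises.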

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I contend that this paper exhibits a high degree of reproducibility, as the authors adhered to the standard experimental procedures and analysis methods, and furnished the pertinent code and data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    see above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of novelty and of convincing experimental results.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method for automatically selecting the best set of normalization methods for semantic segmentation in medical imaging using evolutionary algorithms. Compared to single normalization methods, the proposed method provides better results with insights into normalization selection across imaging modalities and datasets. Furthermore, the code is openly available after peer review, making it accessible to researchers.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1- The paper proposes a novel approach for automatically selecting the best set of normalization methods for semantic segmentation in medical imaging using evolutionary algorithms. 2- The proposed evoNMS method outperforms any single normalization method baseline and can potentially provide the best-performing set of normalization patterns for any given dataset. 3- Normalization is shown to significantly impact medical image analysis tasks, making the proposed method a valuable contribution to the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1- Lack of analysis: The main weakness of the paper is the lack of detailed analysis of the performance of different normalization methods across datasets. 2- The ranking row in Table 2 is misleading, and limits the credibility of the proposed method’s performance and its generalizability to other datasets. 3- The paper would benefit from a more thorough discussion of related work on the automated selection of normalization methods in medical imaging and comparing other evolutionary methods in the results section.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper meets the standard requirement in terms of reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    To address the weaknesses in the paper, the authors should provide a more in-depth analysis of the performance of different normalization methods across datasets. While Table 2 shows that IN normalization methods perform the best for 6 datasets compared with Gen1 and Gen20 methods which performed best on 1 and 4 datasets, respectively, the ranking row is misleading and limits the credibility of the proposed method’s performance and its generalizability to other datasets. Adding the average to better represent the ranking would improve the paper’s evaluation. Additionally, the paper would benefit from a more thorough discussion of related work on the automated selection of normalization methods in medical imaging and comparing other evolutionary methods in the results section. This would help to position the proposed approach in the context of existing methods and highlight its contributions to the field. Overall, these improvements would enhance the quality of the paper and improve its potential impact in the field of medical image analysis.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed evoNMS method outperforms baseline methods and has the potential to provide the best-performing normalization patterns for any given dataset. However, the paper has weaknesses such as the lack of detailed analysis of the performance of different normalization methods across datasets and the insufficient discussion of related work on the automated selection of normalization methods in medical imaging and comparison with other evolutionary methods. Addressing these issues would improve the paper’s credibility and increase its impact in the field of medical image analysis. Therefore, I recommend a weak accept.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel normalization method search approach using an evolutionary algorithm. They demonstrate that the proposed methodology, evoNMS, discovers effective network architectures that achieve on-par or better semantic segmentation performance on 11 biomedical imaging datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel approach and analysis. Conveys the importance of normalization layers in the deep learning networks.
    • Demonstrate performance on 11 datasets.
    • Extensive ablation experiments to support the novel contributions of their work
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The training process is not explicitly laid out. Are the performances reported on the test sets of the datasets? In the Table 2 caption, what does “mean value of five individual runs” refer to?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All the datasets used, except one, are public. The authors mention that the code will be available after peer review, so the paper should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Good paper with extensive analysis. A possible additional experiment: a Grad-CAM-based analysis of the selected normalization layers.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel method. Strong performance on multiple datasets and extensive ablation studies done.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes an evolutionary algorithm to automatically select normalization methods for data-specific noise. The paper demonstrates that the performance of semantic segmentation models in medical image analysis can be significantly influenced by the normalization method selected, and offers a more efficient solution. This paper may be considered for publication if the following major concerns are adequately addressed in the rebuttal phase:

    1. Lack of innovation in searching for normalization method.
    2. Lack of generality for other networks and datasets
    3. Lack of analysis of the performance of different normalization methods across datasets.
    4. More thorough discussion on related work




Author Feedback

We appreciate the thoughtful evaluation and constructive feedback provided by the meta-reviewer(s) (MR) and individual reviewers (R). We address the major issues raised by the MR and refer to specific statements from each R using their respective ID.

  1. Lack of innovation in searching for normalization methods: Our study presents a distinctive combination of (i) medical imaging, (ii) normalization methods, (iii) semantic segmentation, and (iv) neural architecture search. We acknowledge the importance of finding the optimal normalization method, as mentioned by R1. However, our study stands out as it contributes uniquely to this search by providing a thorough analysis of the most effective layer-wise normalization configuration across medical datasets, rather than proposing a new normalization method (as emphasized by R1 with Liu et al. [N1]). In addition, we highlight that this is the first study to systematically compare instance, batch, layer, group, and filter response normalization in terms of their layer-specific contribution.
  2. Lack of generality for other applications: We conducted experiments on eleven medical datasets to evaluate our findings. We provide evidence that it is the dataset rather than the imaging modality defining the normalization method configuration (Fig. 4B). We acknowledge R1’s observation of the strong performance of IN in most cases. Fig. 4A presents this significant finding across a wide range of medical datasets. In the biomedical domain, the U-Net architecture is widely used as a foundation for semantic segmentation tasks. We show that EvoNMS-generated U-Nets are on par with or outperform U-Net-based state-of-the-art architectures (e.g., nnU-Net). We are confident that our results can be extrapolated to other semantic segmentation architectures that incorporate normalization methods.
  3. Lack of analysis of the performance: We appreciate the feedback from R2 that a more detailed analysis is needed. In response, we present in Fig. 2 of the Appendix the evaluation of different normalization methods using the DC, HD95, and BBIoU metrics on three datasets. In accordance with R1, we are happy to report the average performance across all datasets of BN, IN, and evoNMS (GEN20) in terms of DC↑ [0.514, 0.799, 0.823], HD95↓ [185.992, 4.121, 4.323], and BBIoU↑ [0.446, 0.770, 0.773]. We still outperform or are on par with our baselines and will report the above metrics in the revised manuscript. We also show the relationship between the normalization methods in different datasets in Fig. 3 and provide a summary of this information in Fig. 4A.
  4. Thorough discussion on related work: We thank the MR, R1, and R2 for the suggestion to highlight more related work. We provide a comprehensive overview, including previous works on evolutionary optimization of the U-Net for retinal vessel segmentation ([13, 16]), the need for normalization methods for high generalizability in deep neural networks ([5, 9, 17, 3, 15]), and studies exploring optimal normalization patterns through switchable normalization ([11]). We will also include important work on the combination of normalization and activation functions [N1] and on how domain-independent normalization helps to improve unsupervised adversarial domain adaptation for improved generalization capability [N2]. We also proved that not only the architecture [N3] but also the combination of normalization methods has a large impact on the generalizability of a neural network.

Regarding minor comments, we clarify that EvoNMS generates a single best key configuration in each generation; the best configuration in a given generation (GEN) is trained five times from scratch and the results are averaged (R3). We did not claim to find the absolute best configuration in the first generation, but rather selected the best-performing configuration to demonstrate increasing generalization ability over subsequent generations (R1).

[N1] Liu et al., NeurIPS 2020; [N2] Romijnders et al., WACV 2019; [N3] Liu et al., Auto-DeepLab, CVPR 2019
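The averages quoted in point 3 of the author feedback can be compared per metric with a short script. This is only a reading aid: the numbers are copied from the rebuttal, while the dictionary layout and the `best_method` helper are illustrative names, not part of the authors' code.

```python
# Averages as quoted in the author feedback, ordered as (BN, IN, evoNMS GEN20).
METHODS = ("BN", "IN", "evoNMS (GEN20)")

# metric -> (values, higher_is_better), following the arrows in the rebuttal
REPORTED = {
    "DC": ((0.514, 0.799, 0.823), True),
    "HD95": ((185.992, 4.121, 4.323), False),
    "BBIoU": ((0.446, 0.770, 0.773), True),
}


def best_method(values, higher_is_better):
    """Return the method name with the best value for one metric."""
    pick = max if higher_is_better else min
    return METHODS[values.index(pick(values))]


best = {metric: best_method(vals, hib) for metric, (vals, hib) in REPORTED.items()}

for metric, name in best.items():
    print(f"{metric}: best average is {name}")
```

On these figures evoNMS (GEN20) has the best average DC and BBIoU, while IN has a slightly lower average HD95, which matches the rebuttal's wording of "outperform or are on par".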




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper develops an evolutionary algorithm to automatically select the normalization methods for data-specific noise in semantic segmentation of medical images. The paper shows the effectiveness of normalization methods to boost the performance of semantic segmentation in medical image analysis. However, the author’s rebuttal does not solve the key concerns, including insufficient innovation and generality to be published in MICCAI.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work studies the influence of different layer normalization methods on segmentation performance and proposes an evolutionary scheme to find the best kind of normalization for a segmentation problem. It provides results on 11 different datasets, making the evaluation very comprehensive. While technically not an overly strong contribution is made, I still think that this evaluation is meaningful and will provide a reproducible way to slightly improve segmentation performance in practice. The findings of this work are therefore, in my opinion, of interest to the MICCAI community, so I tend to (weakly) vote for acceptance of this work.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In this paper, the authors propose using an evolutionary search method to find the optimal normalization layer for the segmentation model. The paper is well written. They evaluated the method on 8 different tasks, which is very good for MICCAI. The rebuttal addressed most concerns from the reviewers. The key points should be added to the final version. In addition, it would be good if the authors could discuss the search time, which is a big issue for evolutionary search algorithms.


