
Authors

Chen Chen, Zeju Li, Cheng Ouyang, Matthew Sinclair, Wenjia Bai, Daniel Rueckert

Abstract

Convolutional neural networks (CNNs) have achieved remarkable segmentation accuracy on benchmark datasets where training and test sets are from the same domain, yet their performance can degrade significantly on unseen domains, which hinders the deployment of CNNs in many clinical scenarios. Most existing works improve model out-of-domain (OOD) robustness by collecting multi-domain datasets for training, which is expensive and may not always be feasible due to privacy and logistical issues. In this work, we focus on improving model robustness using a single-domain dataset only. We propose a novel data augmentation framework called MaxStyle, which maximizes the effectiveness of style augmentation for model OOD performance. It attaches an auxiliary style-augmented image decoder to a segmentation network for robust feature learning and data augmentation. Importantly, MaxStyle augments data with improved image style diversity and hardness, by expanding the style space with noise and searching for the worst-case style composition of latent features via adversarial training. With extensive experiments on multiple public cardiac and prostate MR datasets, we demonstrate that MaxStyle leads to significantly improved out-of-distribution robustness against unseen corruptions as well as common distribution shifts across multiple, different, unseen sites and unknown image sequences under both low- and high-training data settings. The code will be available at https://github.com/cherise215/MaxStyle.
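
As a rough illustration of what "expanding the style space with noise" can mean, the sketch below treats the per-channel mean and standard deviation of a feature map as its style, mixes those statistics between samples in a batch, and then perturbs them. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `channel_stats`, `style_augment`, `lam`, `noise_mu`, `noise_sigma`, and `perm` are hypothetical, and the noise terms are plain inputs here, whereas in an adversarial setup they would be the quantities being searched.

```python
import numpy as np

def channel_stats(feats, eps=1e-6):
    """Per-sample, per-channel mean and std of a (batch, channels, H, W) array."""
    mu = feats.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(feats.var(axis=(2, 3), keepdims=True) + eps)
    return mu, sigma

def style_augment(feats, lam, noise_mu, noise_sigma, perm):
    """Mix style statistics across the batch, then perturb them.

    Assumptions (illustrative only): `lam` is the mixing weight in [0, 1],
    `perm` is a permutation of the batch indices, and `noise_mu` /
    `noise_sigma` are additive perturbations of the mixed statistics.
    """
    mu, sigma = channel_stats(feats)
    normalized = (feats - mu) / sigma              # remove the original style
    mixed_mu = lam * mu + (1 - lam) * mu[perm]     # interpolate between samples
    mixed_sigma = lam * sigma + (1 - lam) * sigma[perm]
    new_mu = mixed_mu + noise_mu                   # step outside the interpolation range
    new_sigma = mixed_sigma + noise_sigma
    return new_sigma * normalized + new_mu         # re-apply the new style
```

With `lam = 1` and zero noise the function returns its input unchanged, which is a convenient sanity check; non-zero noise moves the statistics beyond what interpolation between existing samples can reach.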

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_15

SharedIt: https://rdcu.be/cVRyt

Link to the code repository

https://github.com/cherise215/MaxStyle

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript describes a method for training a model that is more robust to domain shifts using single-domain data only. The authors do so by building two additions on top of the MixStyle method: 1) positioning the MixStyle layers in the decoder instead of the encoder, and 2) introducing adversarial noise in the MixStyle layers to encourage more robust feature extraction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The method is innovative, especially the part where an adversarial mechanism is used to further improve the MixStyle layer perturbation. 2) The experiments conducted are robust. The Reviewer appreciates the amount of MR data that the Authors were able to collect and test. The Reviewer also appreciates the comparison of training in a high-data regime vs a low-data regime, as well as the inclusion of a prostate dataset in the Supplement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The Reviewer does not find many major weaknesses except one: during result comparison, it would be good to include a brief description of what those compared methods are. In particular, the Authors should describe what the baseline is.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Before the publication of the Authors’ code, it is difficult to replicate the work directly. However, the description of the method is clear, and the foundation on which this manuscript is based, namely the MixStyle method, is easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In Fig. 2, the Reviewer sees that the two backpropagation updates are both “Maximize L-seg”. Should one of them be “Minimize” because of the adversarial nature of the training process?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is innovative, the experiments are robust, the results are promising, and the presentation is clear.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
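
The question Review #1 raises about "Maximize" vs "Minimize" can be made concrete with a toy example. In adversarial augmentation schemes in general, one step moves the augmentation parameters to increase the loss (a harder example), and the next step moves the model parameters to decrease the loss on that harder example. The sketch below is a hypothetical scalar stand-in, not the paper's training loop and not a statement about what its Fig. 2 should say: `seg_loss`, `style_shift`, `weight`, and the learning rates are all assumptions made for illustration.

```python
def seg_loss(weight, style_shift, x=1.0, target=2.0):
    """Toy squared error standing in for a segmentation loss."""
    pred = weight * (x + style_shift)
    return (pred - target) ** 2

def grad_style(weight, style_shift, x=1.0, target=2.0):
    """Derivative of seg_loss with respect to the style perturbation."""
    return 2 * (weight * (x + style_shift) - target) * weight

def grad_weight(weight, style_shift, x=1.0, target=2.0):
    """Derivative of seg_loss with respect to the model weight."""
    return 2 * (weight * (x + style_shift) - target) * (x + style_shift)

def adversarial_round(weight, style_shift, lr_style=0.1, lr_weight=0.1):
    """One round: ascend the loss in style, then descend it in the weight."""
    # Step 1 (maximize): gradient ASCENT on the style perturbation.
    hard_shift = style_shift + lr_style * grad_style(weight, style_shift)
    # Step 2 (minimize): gradient DESCENT on the weight, using the hard example.
    new_weight = weight - lr_weight * grad_weight(weight, hard_shift)
    return new_weight, hard_shift
```

Starting from `weight = 1.0` and `style_shift = 0.0`, the first step raises the toy loss and the second lowers it again, which is the opposing pair of updates the reviewer is asking about.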



Review #2

  • Please describe the contribution of the paper

    This paper proposes a style augmentation method with an adversarial training scheme.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of adversarial learning between segmentor and style augmentor to generate “hard” style for segmentation is kind of interesting.

    2. The organization is good, which is easy to follow.

    3. The effectiveness of the method is evaluated from many aspects.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is one issue in Table 1. Since the images generated by the style augmentor are fed to the segmentor for training, why does the performance drop compared with the baseline on IID data? In my opinion, the model has learned the variants and should achieve better results than the baseline.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is highly reproducible with detailed information in paper and the code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. What is the baseline compared in Table 1? The authors should state this clearly. Is it just an encoder-decoder segmentation method?

    2. The results on prostate segmentation should be in the main text, or the title should refer to cardiac segmentation instead of medical image segmentation.

    3. Page 6, there is a missing right parenthesis in Eq. (5).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The idea is interesting to augment style with adversarial learning.
    2. Extensive experiments are conducted to show effectiveness of the proposed method.
    3. The paper is well organized starting from preliminaries to the improved version.
    4. One issue has not been explained in the paper, which makes the result not convincing and needs to be clarified.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a data augmentation framework which achieves out-of-domain robustness using a single-domain dataset. The proposed method maximizes the effectiveness of style augmentation by producing a worst-case style composition via adversarial training. Experimental results show that the proposed model achieves slightly better performance than the baselines in terms of Dice score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of expanding the style space with additional noise and searching for a harder style composition is interesting and somewhat novel. Extensive experimental results and ablation studies show the superiority and effectiveness of the proposed model. Also, the clarity and organization of the paper are good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although Fig. 1 shows visualizations of MixStyle and MixStyle-DA, it would be helpful to add a few more visualizations from other baseline methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will provide the code. Used datasets are public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The idea of expanding the style space with additional noise and searching for a harder style composition is interesting and somewhat novel. Although Fig. 1 shows visualizations of MixStyle and MixStyle-DA, it would be helpful to add a few more visualizations from other baseline methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of expanding the style space with additional noise and searching for a harder style composition is interesting and somewhat novel. Extensive experimental results and ablation studies show the superiority and effectiveness of the proposed model. Also, the clarity and organization of the paper are good.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper clearly presents an effective method for single-domain generalization that can be of interest to the MICCAI community. The reviewers agreed on the importance of the topic, the novelty of the work presented, the robustness of the experimental evaluation, and the clarity of the presentation. Minor concerns were raised regarding the presentation of the experimental results, such as the description of the methods under comparison, an explanation for the performance drop compared with the baseline on IID data, and more visualizations from other baseline methods, which I expect the authors to address in their final revision.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank all the reviewers and AC for constructive comments and for having a consensus on the technical novelty (R1: innovative, R2: interesting, R3: interesting and novel), the effectiveness of our proposed solution, MaxStyle, and the robustness of the experimental evaluation, as well as the clarity of the presentation. As highlighted by both AC and the reviewers, our work presents a very effective method to improve cross-domain performance using single domain data only. Extensive experiments show that MaxStyle outperforms existing competing methods by a large margin on two different segmentation tasks. MaxStyle is a plug-in module, which can be easily integrated into general segmentation networks to boost model robustness. We hope that MaxStyle will enable more data-efficient, robust, and reliable deep models for the use of researchers and clinicians.

Following the suggestions from the reviewers, we will add visualizations of results for the different methods for comparison, as well as a detailed description of the baseline methods and an explanation of the marginal intra-domain (IID) performance drop on the cardiac segmentation task. We would like to clarify here that our method does not necessarily sacrifice the IID performance. As presented in Table S3, on the prostate segmentation task, our method can significantly improve the IID performance compared to the baseline method (average Dice score: 0.8597 vs 0.8277). For the cardiac segmentation task, our method significantly improves the OOD performance (+25% with 10 training subjects, +11% with 70 subjects) while only slightly sacrificing the IID performance (0.8104 vs 0.8108 using 10 training subjects, 0.8727 vs 0.8820 using 70 training subjects). We hypothesize that the IID performance degradation is task- and dataset-dependent. We will add this to the discussion section of the revised manuscript.
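
To put the IID figures quoted above side by side, the short sketch below computes the absolute change in average Dice for each setting. It assumes that every "a vs b" pair in the feedback is ordered MaxStyle first and baseline second, which matches the surrounding wording but is not stated explicitly; the names `IID_DICE` and `iid_delta` are hypothetical.

```python
# (MaxStyle, baseline) average Dice pairs as quoted in the author feedback.
# Assumption: each "a vs b" pair is ordered MaxStyle first, baseline second.
IID_DICE = {
    "prostate": (0.8597, 0.8277),
    "cardiac, 10 subjects": (0.8104, 0.8108),
    "cardiac, 70 subjects": (0.8727, 0.8820),
}

def iid_delta(task):
    """Absolute change in average Dice (MaxStyle minus baseline), 4 decimals."""
    ours, baseline = IID_DICE[task]
    return round(ours - baseline, 4)

if __name__ == "__main__":
    for task in IID_DICE:
        print(f"{task}: {iid_delta(task):+.4f}")
```

Under that ordering assumption, the changes are +0.0320 for prostate, -0.0004 for cardiac with 10 subjects, and -0.0093 for cardiac with 70 subjects, so the cardiac IID drops are under 0.01 Dice in both settings.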


