Authors
Hyuna Cho, Yubin Han, Won Hwa Kim
Abstract
Modern deep learning methods for semantic segmentation require labor-intensive labeling of large-scale datasets with dense pixel-level annotations. Recent data augmentation methods such as dropping or mixing image patches and adding random noise offer effective ways to address the labeling issue for natural images. However, they can only be restrictively applied to medical image segmentation as they carry risks of distorting or ignoring the underlying clinical information of local regions of interest in an image. In this paper, we propose a novel data augmentation method for medical image segmentation that does not lose the semantics of the key objects (e.g., polyps). This is achieved by perturbing the objects with quasi-imperceptible adversarial noise and training a network to expand discriminative regions under the guidance of anti-adversarial noise. Such guidance is realized by a consistency regularization between the two contrasting data, whose strength is automatically and adaptively controlled according to their prediction uncertainty. Our proposed method significantly outperforms various existing methods with high sensitivity and Dice scores, and extensive experimental results with multiple backbones on two datasets validate its effectiveness.
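As a reading aid only, the following is a minimal PyTorch-style sketch of the mechanism the abstract describes: an object-masked adversarial (and anti-adversarial) perturbation plus an uncertainty-weighted consistency term. The function names, FGSM-like single-step perturbation, masking scheme, and entropy-based weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def object_masked_perturbation(model, image, mask, label, epsilon=0.01, sign=+1):
    """Perturb only the object region (mask == 1).

    sign=+1 ascends the segmentation loss (adversarial, harder sample);
    sign=-1 descends it (anti-adversarial, easier sample).
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    grad = torch.autograd.grad(loss, image)[0]
    noise = epsilon * grad.sign() * mask          # restrict the noise to the object
    return (image + sign * noise).detach()

def uncertainty_weighted_consistency(model, x_adv, x_anti):
    """Pull predictions on the adversarial sample toward those on the
    anti-adversarial sample, down-weighted where the anti-adversarial
    prediction is uncertain (high pixel-wise entropy)."""
    p_adv = torch.softmax(model(x_adv), dim=1)
    with torch.no_grad():
        p_anti = torch.softmax(model(x_anti), dim=1)
        entropy = -(p_anti * p_anti.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)
        weight = torch.exp(-entropy)              # confident pixels -> stronger regularization
    return (weight * (p_adv - p_anti).pow(2)).mean()
```

In this sketch the two calls would be combined with an ordinary supervised segmentation loss during training; the exact form of the uncertainty weighting in the paper may differ.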
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43901-8_53
SharedIt: https://rdcu.be/dnwD1
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #5
- Please describe the contribution of the paper
The paper proposes a data augmentation method for polyp image segmentation that aims to address an important limitation of several existing methods. In particular, existing approaches tend to distort or ignore the clinical information of local regions of interest in an image when applying transformations like dropping, mixing image patches, or adding random noise. In contrast, the proposed method employs object-level adversarial perturbations to augment the dataset while preserving the semantics of key objects. Additionally, it utilizes object-level anti-adversarial perturbations to generate easier-to-predict samples, which are used to guide training via a consistency regularisation loss. The proposed approach demonstrates promising results in improving the performance of polyp image segmentation.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea of using adversarial perturbations to generate hard-to-predict samples while preserving the semantics of key objects, and anti-adversarial perturbations to generate easier-to-predict samples that guide training, is very interesting and has not been explored in the field of medical image segmentation.
- The results of the study are promising. The proposed method has been evaluated on two publicly available datasets, and it consistently outperforms previous studies across various backbones and evaluation metrics.
- The paper is well organised and easy to follow, and the proposed method is well motivated.
- The paper includes detailed ablation studies that analyse the impact of critical parameter values on performance.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The anti-adversarial consistency regularisation proposed in the paper significantly increases the complexity of the method but does not yield a commensurate performance improvement. The method achieves an mIoU of 92.15% with a common supervised segmentation loss and 92.43% when incorporating the anti-adversarial consistency regularisation loss.
- The hyper-parameter settings of previous methods were adopted from their original papers, which raises concerns about the fairness of the comparisons. Basic hyper-parameter optimisation should have been performed to ensure that optimal values of key parameters are selected for all methods on the current task.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The method and experimental setup are well described in the paper, and the authors intend to make their code publicly available.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- In Section 3, the authors claim that “using the pseudo-label from anti-adversary as a perturbation of the ground truth, the network is supervised by diverse and realistic labels that contain auxiliary information that the originally given labels do not provide”. However, it is not clear why the pseudo-labels contain this auxiliary information, and the authors need to elaborate further to support this claim. It would also be useful to include a visual comparison between the pseudo-labels and the ground truth.
- In Fig. 2, the consistency regularization loss is denoted by Lcr, while Rcon is used in the rest of the paper. To ensure consistency throughout the paper, one of these notations should be changed.
- It would be beneficial to include some qualitative results demonstrating the effectiveness of the proposed method compared to previous methods.
- See also the comment about the experimental settings of previous methods in the weaknesses.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The paper proposes a very interesting idea that has not been explored in image segmentation.
- The proposed method consistently demonstrates improved performance over different baselines across experiments using various neural network architectures.
- Several ablation studies have been conducted to analyze the individual components of the proposed method.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #4
- Please describe the contribution of the paper
The authors of this work have developed a data augmentation technique to address the requirement for large labeled datasets. Specifically, they suggest perturbing the objects of interest with adversarial noise and training a network to expand discriminative regions with the assistance of anti-adversarial noise. This guidance is achieved through consistency regularization between two contrasting sets of data, with the strength of the regularization controlled by prediction uncertainty. The efficacy of this approach was evaluated on two datasets, and comparisons with several existing methods have been presented.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is written in a clear and coherent manner, making it easy to follow and understand.
- The authors introduce an anti-adversarial consistency loss that effectively enhances the performance of image segmentation in the face of adversarial attacks.
- The related works section is well structured and offers valuable insight into the current state of research in this area. The contribution of this paper is well justified and well positioned within the literature.
- The method section clearly explains the proposed approach, and the experiments are thoughtfully designed and executed.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The authors state that their aim is to develop a method that addresses the challenge of image segmentation in limited annotation settings. However, the use of a large data fraction (80%, i.e., 800 out of 1000 images) for demonstrating the proposed approach and comparing with existing methods raises some concerns. Even the baseline U-Net with basic augmentations achieves high segmentation Dice scores of 0.94 and 0.92 on the Kvasir and ETIS datasets, respectively. It would be more appropriate to evaluate the proposed method with a lower fraction of labeled data, such as 5% or 10%, to determine whether it can provide robust gains in limited annotation scenarios.
- Although the proposed method exhibits improvements over the compared methods, the differences in Dice score are relatively small. Specifically, the difference between the proposed method and the U-Net baseline with basic augmentations is only 0.02 Dice for both datasets. To determine the statistical significance of these improvements, the authors could consider conducting significance tests such as the Wilcoxon signed-rank test (see the sketch after this list). These tests could be performed between the proposed method and the best-performing work from the literature, which would provide a more rigorous evaluation of the proposed approach.
- The authors miss some citations from the literature that use data augmentation in limited-label scenarios to provide robust image segmentation. These works leverage unlabeled data and limited labeled data to generate synthetic data that yields gains in image segmentation in limited annotation scenarios. It would be beneficial if the authors could include these works in the revised version of their paper. Some such works are listed below:
  [1] “Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation”, CVPR, https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Data_Augmentation_Using_Learned_Transformations_for_One-Shot_Medical_Image_Segmentation_CVPR_2019_paper.pdf
  [2] “Semi-supervised and Task-Driven Data Augmentation”, IPMI, https://link.springer.com/chapter/10.1007/978-3-030-20351-1_3
  [3] “A learning strategy for contrast-agnostic MRI segmentation”, PMLR, http://proceedings.mlr.press/v121/billot20a
  [4] “Unsupervised Deep Learning for Bayesian Brain MRI Segmentation”, MICCAI, https://link.springer.com/chapter/10.1007/978-3-030-32248-9_40
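A minimal sketch of the significance test suggested above, assuming paired per-image Dice scores for the proposed method and the strongest baseline on the same test images; the score arrays below are hypothetical placeholders, not values from the paper.

```python
from scipy.stats import wilcoxon

# Hypothetical per-image Dice scores (same test images, same order for both methods).
dice_proposed = [0.95, 0.93, 0.91, 0.96, 0.94, 0.92, 0.90, 0.97]
dice_baseline = [0.94, 0.92, 0.91, 0.95, 0.93, 0.90, 0.89, 0.96]

# Paired, non-parametric test of whether the per-image differences are centered at zero.
stat, p_value = wilcoxon(dice_proposed, dice_baseline)
print(f"Wilcoxon statistic = {stat:.3f}, p = {p_value:.4f}")
```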
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors have provided enough detail in the paper. It would be great if they could release the code after publication.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The comments for this section have been provided above in the weaknesses section (Section 6).
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors’ stated goal is to develop a method that addresses the need for a large number of annotations. However, their evaluation is conducted at a high data fraction of 80%, which equates to a substantial number of annotated images (800 out of 1000). To better assess the proposed method’s robustness, it would be advisable for the authors to evaluate it on smaller fractions of labeled data, such as 5-10% or less.
Moreover, the segmentation performance improvements achieved by the proposed method are relatively small compared to the baseline approaches (around 0.02 Dice improvement). Therefore, it is essential that the authors perform statistical significance tests to determine whether these gains are significant.
- Reviewer confidence
Somewhat confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes a novel data augmentation method for medical image segmentation that preserves the semantics of the key objects, and evaluates its effectiveness.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea is novel.
- The performance of the proposed method outperforms other augmentation methods.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Firstly, the authors only evaluated their method on one dataset, which may not be sufficient to demonstrate its effectiveness in a variety of scenarios. It would be more compelling to use multiple datasets, including 3D data, as well as different medical imaging modalities such as CT and MRI, to better evaluate the proposed method’s robustness and generalizability.
Moreover, the authors’ proposed method appears to share some similarities with a previously published work [1]. As such, it is essential for the authors to clarify the differences between their approach and [1], and include a thorough comparison in their experimental results. This will ensure that the paper’s contribution is adequately distinguished from the previous work.
Also, the ablation study for the perturbation budget is missing.
[1] Realistic Adversarial Data Augmentation for MR Image Segmentation, MICCAI-20
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
This method might be easy to reproduce.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
As far as I know, adversarial training usually decreases performance. Why does using adversarial training help the augmentation here? Please clarify.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Missing references and a limited evaluation dataset.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper presents a novel method for data augmentation using adversarial perturbations. The method is well explained, and the evaluation is convincing. Please consider taking care of the minor questions/points of the reviewers for the camera-ready version.
Author Feedback
We thank all reviewers for their favorable and constructive reviews. Here we clarify the questions and concerns.
Q) Rev #2 and #4: Some references are missing. A) We appreciate Rev. #2 and #4 for suggesting great references. We will add them to the paper and discuss their methods.
Q) Rev #2: An ablation study for the perturbation budget is needed. A) We provided the ablation studies on the effect of noise magnitude and perturbation steps in the supplementary material due to the page limit.
Q) Rev #2: Why does adversarial training boost performance? A) While adversarial training itself may decrease performance due to the increased difficulty of the training task, learning on the augmented dataset comprising both clean and adversarial samples enables the model to learn more robust and discriminative features. This facilitates improved feature representation and better generalization performance as the model can learn a more comprehensive representation of the data.
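For illustration only, a hedged sketch of the training signal this answer describes: supervising on both the clean image and its adversarially perturbed copy with the same ground truth. The model, batch tensors, and the weighting factor lambda_adv are placeholder assumptions rather than the authors' code.

```python
import torch.nn.functional as F

def combined_step(model, x_clean, x_adv, y, lambda_adv=1.0):
    """One loss over the augmented pair: the clean sample plus its adversarially
    perturbed copy, both supervised with the same label y."""
    loss_clean = F.cross_entropy(model(x_clean), y)
    loss_adv = F.cross_entropy(model(x_adv), y)
    return loss_clean + lambda_adv * loss_adv
```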
Q) Rev #5: Performance gain with AAC is marginal & Include additional analyses on the pseudo-label. A) We will add a quantitative ablation study of the AAC (i.e., R_con) under diverse settings and include visualizations of the pseudo-labels in the revised paper to provide further in-depth analyses of AAC.
Q) Rev #5: Inconsistency in Fig. 2. A) We appreciate the reviewer for pointing out the error. We will fix the inconsistency in Fig.2 and its caption in the revised paper.
Q) Rev #2/4/5: Include additional experiments (e.g., an additional dataset, different settings, and more qualitative and statistical comparisons). A) We appreciate the reviewers for these suggestions and agree that such experiments would help verify the effectiveness of our method. However, given the short 8-page limit, it is difficult to include all these analyses in the main paper; we did our best to demonstrate the effectiveness of our model over 9 baselines using 4 backbones and 4 metrics on two datasets (e.g., ETIS with only 156 training samples). We appreciate the reviewers' feedback and will gladly include those experiments in detail in the journal version of our paper.