Authors

Fu Wang, Zeyu Fu, Yanghao Zhang, Wenjie Ruan

Abstract

Adversarial training has been demonstrated to be one of the most effective approaches to training deep neural networks that are robust to malicious perturbations. Research on effectively applying it to produce robust 3D medical image segmentation models is ongoing. While a few empirical studies have been done in this area, developing effective adversarial training methods for complex segmentation models and high-volume 3D examples is challenging and requires theoretical support. In this paper, we consider the robustness of 3D segmentation tasks from a PAC-Bayes generalisation perspective and show that reducing the trained models’ Lipschitz constant benefits the models’ robustness performance. Through empirical investigation, we show that adjusting the adversarial iteration can help to reduce the model’s Lipschitz constant, enabling a self-adaptive adversarial training strategy. Empirical studies on the medical segmentation decathlon dataset have been done to demonstrate the efficiency of the proposed adversarial training method. Our implementation is available at https://github.com/TrustAI/SEAT.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_69

SharedIt: https://rdcu.be/dnwB3

Link to the code repository

https://github.com/TrustAI/SEAT

Link to the dataset(s)

http://medicaldecathlon.com/

Reviews

Review #3

Please describe the contribution of the paper
The contribution of this paper comes from three parts:
1. The authors show that the adversarial training effect on 3D segmentation tasks can be improved by reducing the norm of the trained models’ gradient based on the PAC-Bayes generalization framework.
2. The authors demonstrate that dynamically adjusting the adversarial iteration can achieve a better regularizing effect on the gradient norm than fixing the iteration, as existing methods do not work on 3D tasks.
3. The authors design a SElf-adaptive Adversarial Training strategy, SEAT for short, and empirically prove its effectiveness on the MSD dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The authors propose a novel self-adaptive adversarial training strategy (SEAT) that can effectively improve the robustness of 3D medical image segmentation models.
2. The authors provide a theoretical analysis of the PAC-Bayes generalization framework to show how reducing the Lipschitz constant of the trained model can narrow down the generalization gap and improve the effect of adversarial training.
3. The authors conduct extensive empirical studies on the Medical Segmentation Decathlon (MSD) dataset to demonstrate the effectiveness of their proposed method in improving segmentation accuracy and robustness.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The proposed SEAT method may require additional computational resources and time compared to traditional adversarial training methods.
2. The empirical studies were only conducted on the MSD dataset, and it is unclear how well the proposed method would generalize to other datasets or medical imaging tasks.
3. The paper does not compare the performance of their proposed method with other state-of-the-art adversarial training methods for 3D medical image segmentation.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provides some details on the experimental setup and methodology, which could help with reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- While you provide some details on the experimental setup and methodology, there is room for improvement in terms of providing more detailed information and resources to aid in replication. Providing a step-by-step guide for reproducing your results or pre-trained models/weights would greatly enhance the reproducibility of your work.
- It would be helpful if you could compare the performance of your proposed SEAT method with other state-of-the-art adversarial training methods for 3D medical image segmentation. This would provide a better understanding of how your method compares to existing techniques.
- While you do not explicitly mention any weaknesses or limitations of your proposed method, it would be helpful if you could discuss potential limitations or future directions for research in this area.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper appears to be well-written and presents a novel approach for improving 3D medical image segmentation models. The authors provide theoretical analysis and empirical studies to support their proposed method, which adds credibility to their work.
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #1

Please describe the contribution of the paper

For robust 3D medical image segmentation, the paper provides PAC-Bayesian generalization bounds and a related adversarial learning method. Experiments on a Medical Segmentation Decathlon dataset are used to validate the effectiveness of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Research on robust 3D medical image segmentation is valuable and rarely studied. 2) The paper provides theoretical evidence, i.e., PAC-Bayesian generalization bounds.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) After reading the PAC-Bayesian generalization bounds and proposed adversarial learning method in Section 3, I am still not sure how the theoretical support and proposed method contribute to medical image segmentation beyond general image segmentation. 2) According to the experimental results in Figure 2, the proposed method outperforms previous methods on limited tasks only.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provides sufficient implementation details.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Refine contributions of the PAC-Bayesian generalization bounds and proposed adversarial learning method for medical image segmentation.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Research on robust 3D medical image segmentation is valuable and rarely studied. The paper also provides theoretical evidence, i.e., PAC-Bayesian generalization bounds.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

In this paper, the authors apply the PAC-Bayes framework to adversarially train a 3D medical image segmentation model. Besides, the authors propose a self-adaptive strategy to determine the number of update iterations for better adversarial training effect.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

In Fig. 1 of Sec 3.2, the authors present an interesting visualization showing that a model with a lower average gradient norm usually has higher adversarial robustness.
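
The link between gradient norm and robustness noted here can be illustrated on a toy linear scorer, for which the input gradient is simply the weight vector. The sketch below is an assumption-laden illustration and is not taken from the paper or its repository: the two weight vectors, the input, and the budget `eps` are made-up values, and the function names are hypothetical. It uses the standard fact that the largest change of `w.x` under an L-infinity perturbation of size `eps` is `eps * ||w||_1`.

```python
def score(x, weights):
    """Linear score f(x) = w.x; its gradient with respect to x is w."""
    return sum(xi * w for xi, w in zip(x, weights))


def worst_case_shift(weights, eps):
    """Largest change of f under |delta_i| <= eps, which equals eps * ||w||_1."""
    return eps * sum(abs(w) for w in weights)


def sign_attack(x, weights, eps):
    """Move each coordinate by eps along the sign of the gradient."""
    return [xi + eps * (1 if w > 0 else -1 if w < 0 else 0)
            for xi, w in zip(x, weights)]


# Illustrative values only (assumptions, not results from the paper).
steep = [4.0, -2.0, 1.0]    # toy model with a large gradient norm
flat = [0.5, -0.25, 0.25]   # toy model with a small gradient norm
x = [1.0, 1.0, 1.0]
eps = 0.1

shifts = {}
for name, w in (("steep", steep), ("flat", flat)):
    # The sign attack attains the worst-case bound for a linear model.
    shifts[name] = score(sign_attack(x, w, eps), w) - score(x, w)
    print(name, round(shifts[name], 6), round(worst_case_shift(w, eps), 6))
```

For deep segmentation networks the relationship is only approximate, since the gradient changes with the input, but the same first-order intuition motivates regularizing the gradient norm.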
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The effectiveness of the proposed SEAT method has not been well verified. The proposed SEAT method is compared with only one previous baseline. It is suggested to compare with 1-2 other latest adversarial training methods.
2. The number of update iterations in the FREE method is fixed. How is the iteration number in the FREE method determined? Has this hyper-parameter been well tuned and set to its best value? Could the authors provide the results of the FREE method with different values of the iteration number?
3. How are the values of the hyper-parameters in the proposed method determined, such as the maximum iteration, the update frequency, and the relax factor? How do the results change when these hyper-parameters (for example, the relax factor) are set to different values?
4. To show whether the comparison is fair, it is suggested to report the hyper-parameter settings of the FREE and SEAT methods in the form of a table.
5. Although Sec 3.2 discusses the limitations of existing methods (spectral normalization, penalizing the gradient norm), the discussion is not convincing. In the case of 3D medical image segmentation, do existing methods require too much GPU usage or too much computing time? The authors should estimate and report these computation costs with numerical values. It is suggested to present experimental evidence showing that existing methods are prohibitive.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Unsatisfactory. They claim that they reported ‘The range of hyper-parameters considered, method to select the best hyper-parameter’, ‘the mean & variation of results’, ‘the average running time’ and ‘the memory footprint’. However, I cannot find this information at all. By the way, the authors claim that they will release the training code, evaluation code and model weights after acceptance.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Please see the weaknesses.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The effectiveness of the proposed method has not been well verified. It is unclear whether the comparison is fair. It is unclear how the key hyper-parameters in the comparison were determined. The stated limitations of existing methods are not convincing, due to the lack of experimental results. If the authors provide reasonable responses, the decision may be changed.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper provides some theoretical insights into the robustness of segmentation models and proposes a self-adaptive adversarial training method called SEAT, based on the findings that regularizing the gradient norm can improve robustness and that adversarial training can reduce the gradient norm. Compared with previous work, SEAT can yield similar performance but is less computationally expensive. Effectiveness is shown by experiments on the Medical Segmentation Decathlon (MSD) datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper is well organized and written, and the logic is clear. It provides some theoretical insights and, based on the findings of previous works, introduces adversarial training. In addition, based on the experimental results, the adversarial training algorithm is further improved and the authors propose their SEAT algorithm. Results are shown on the Medical Segmentation Decathlon (MSD) datasets.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The theoretical insights are less novel. The value of this research for 3D medical segmentation is limited.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This work is relatively more reproducible because the code and the model will be available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The theoretical part of PAC-Bayes Generalisation Bounds might be reorganized. Some contents are not necessary.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The logic is clear and the experimental results prove the statements.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper provides theoretical insights into the robustness of segmentation models and proposes a self-adaptive adversarial training method called SEAT. This is based on findings that regularizing the gradient norm improves robustness and adversarial training reduces the gradient norm. The SEAT method offers similar performance to previous work but is less computationally expensive, with effectiveness demonstrated on the Medical Segmentation Decathlon (MSD) datasets. The paper is well-organized, clear, and logically presented, providing valuable theoretical insights. It innovatively introduces adversarial training, further enhancing this algorithm based on experimental results, and proposes the SEAT algorithm. The authors provide a theoretical analysis of the PAC-Bayes generalization framework and demonstrate the effectiveness of their method in improving segmentation accuracy and robustness through extensive empirical studies on the MSD dataset.

The reviewers noted several weaknesses of the paper.
1. The effectiveness of the proposed SEAT method has not been adequately verified, with only one previous baseline used for comparison. More comparisons with other latest adversarial training methods are suggested. The lack of comparison with other state-of-the-art adversarial training methods for 3D medical image segmentation is noted. However, as a conference paper, this may be acceptable.
2. The paper doesn’t detail the hyper-parameter setting of the FREE and SEAT methods, and it’s unclear how well the proposed method would generalize to other datasets or medical imaging tasks. Additionally, the proposed SEAT method may require additional computational resources and time compared to traditional adversarial training methods.
3. The weak connection between the theoretical analysis and medical image segmentation also confused certain reviewers, which raises a concern. It needs clarification.
Overall, the paper offers good contributions despite some concerns on the details of the work.

Author Feedback

We sincerely appreciate all reviewers for their valuable and encouraging feedback. In the following, we address the comments.

About the PAC-Bayesian generalization bound, we noticed that this theory has been adopted to explain and improve the adversarial training towards classification models. But it has yet to be studied in the segmentation tasks in general. In this work, we have demonstrated how to formulate the segmentation performance to align with the PAC-Bayesian theory’s definition. We have also shown that the insights from previous work [1], specifically that regularizing the gradient norm improves the generalizability of the adversarially trained models, remain valid for 3D medical segmentation tasks. Along with the problem formulation, we noted that the key difference between the classification and segmentation tasks when applying the PAC-Bayesian bound stems from the model architecture. We intend to continue investigating the impact of the architecture perspective.
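
As a point of reference for the discussion above, one textbook form of a PAC-Bayes bound (McAllester's bound) is reproduced below. It is a generic statement and not the specific bound derived in the paper; the symbols are assumptions introduced here. The loss is assumed to take values in $[0,1]$, $P$ is a prior over hypotheses fixed before seeing the data, $S$ is an i.i.d. sample of size $m$, and $\delta \in (0,1)$. Then, with probability at least $1-\delta$ over the draw of $S$, for every posterior $Q$:

```latex
L_{\mathcal{D}}(Q) \;\le\; L_{S}(Q) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m}{\delta}}{2(m-1)}}
```

Here $L_{\mathcal{D}}(Q)$ is the expected loss and $L_{S}(Q)$ the empirical loss of a hypothesis drawn from $Q$. Applying a bound of this kind requires a loss bounded in $[0,1]$, which is why a segmentation metric has to be formulated to match the theory's definition before the bound can be used.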

Concerning the computational cost, it is true that additional computation is required to calculate the norm of the gradient and update both the threshold and the number of adversarial steps. However, compared with backpropagation, these operations are significantly less costly, given that the gradient has already been computed for updating the model’s parameters and adversarial perturbation. Moreover, by performing fewer backpropagation operations, our method actually reduces the computational complexity of the adversarial training algorithm.
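
To illustrate why this bookkeeping is cheap relative to backpropagation, the following is a minimal sketch of a step-count controller driven by a gradient norm. It is not the SEAT update rule from the paper or its repository: the class name, the fields `max_steps` and `relax`, and the specific rule (add a step when the norm exceeds a running threshold, otherwise tighten the threshold and drop a step) are all hypothetical assumptions chosen only to show that each update amounts to one norm computation and one scalar comparison.

```python
from dataclasses import dataclass


def l2_norm(grad):
    """L2 norm of a flat gradient that has already been computed."""
    return sum(g * g for g in grad) ** 0.5


@dataclass
class AdaptiveSteps:
    """Hypothetical controller for the number of adversarial steps."""
    max_steps: int = 5                 # assumed cap on adversarial iterations
    relax: float = 0.9                 # assumed relax factor for the threshold
    steps: int = 1                     # current number of adversarial steps
    threshold: float = float("inf")    # running gradient-norm threshold

    def update(self, grad_norm: float) -> int:
        """Adjust the step count from one observed gradient norm."""
        if grad_norm > self.threshold:
            # Norm above the threshold: spend one more adversarial step.
            self.steps = min(self.steps + 1, self.max_steps)
        else:
            # Norm at or below the threshold: tighten it and use fewer steps.
            self.threshold = self.relax * grad_norm
            self.steps = max(self.steps - 1, 1)
        return self.steps


# Made-up gradients, used only to exercise the controller.
ctrl = AdaptiveSteps(max_steps=3, relax=0.9)
history = [ctrl.update(l2_norm(g)) for g in ([3.0, 4.0], [3.0, 4.0], [0.3, 0.4])]
print(history)
```

In an actual training loop the gradient would come from the backward pass already performed for the parameter and perturbation updates, so the controller adds no extra backpropagation.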

On the experimental front, to the best of our knowledge, Daza et al. [2] was the only work that studied the adversarial robustness of 3D medical segmentation models at the time we prepared this paper. Consequently, we reproduced their method on our machine and presented their performance as a baseline in this paper. In future work, we aim to include other model architectures and implement other adversarial training methods to provide a more comprehensive empirical study of adversarial robustness in the medical segmentation scenario.

As for implementation details, we adopted the code developed in [2] and followed their practice to set up most of the hyperparameters. We will make our code publicly available on GitHub, and a link will be included in the camera-ready version of this paper.

[1] Farnia, F., Zhang, J.M., Tse, D.: Generalizable adversarial training via spectral normalization. In: ICLR (2019)
[2] Daza, L.A., Pérez, J.C., Arbeláez, P.: Towards robust general medical image segmentation. In: MICCAI (2021)


Self-adaptive Adversarial Training for Robust Medical Segmentation