Authors

Ainkaran Santhirasekaram, Avinash Kori, Mathias Winkler, Andrea Rockall, Ben Glocker

Abstract

The reliability of segmentation models in the medical domain depends on the model’s robustness to perturbations in the input space. Robustness is a particular challenge in medical imaging exhibiting various sources of image noise, corruptions, and domain shifts. Obtaining robustness is often attempted via simulating heterogeneous environments, either heuristically in the form of data augmentation or by learning to generate specific perturbations in an adversarial manner. We propose and justify that learning a discrete representation in a low dimensional embedding space improves robustness of a segmentation model. This is achieved with a dictionary learning method called vector quantisation. We use a set of experiments designed to analyse robustness in both the latent and output space under domain shift and noise perturbations in the input space. We adapt the popular UNet architecture, inserting a quantisation block in the bottleneck. We demonstrate improved segmentation accuracy and better robustness on three segmentation tasks.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_63

SharedIt: https://rdcu.be/cVRwQ

Link to the code repository

https://github.com/AinkaranSanthi/Vector-Quantisation-for-Robust-Segmentation

Link to the dataset(s)

https://www.synapse.org/#!Synapse:syn3193805/wiki/89480

https://nihcc.app.box.com/s/r8kf5xcthjvvvf6r7l1an99e1nj4080m

http://db.jsrt.or.jp/eng.php

https://wiki.cancerimagingarchive.net/display/Public/NCI-ISBI+2013+Challenge+-+Automated+Segmentation+of+Prostate+Structures

Reviews

Review #2

Please describe the contribution of the paper

The paper provides a novel method to make a standard U-Net segmentation model more robust by incorporating a discrete bottleneck using Vector Quantization. The paper claims to perform well within a certain degree of perturbation and domain shift under a certain set of assumptions and also provides a simple theoretical justification for it.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Vector Quantization has been widely used in Speech Disentanglement and Image Generation. The paper provides a novel formulation that incorporates a discrete bottleneck (VQ) in a segmentation framework to increase the robustness of the trained model (to noise perturbation of the input as well as domain shift in the datasets).
2. Thorough experiments have been performed to justify the claim in both binary and multi-class settings.
3. Perturbation study of the latent space corroborates the robustness claim.
4. Provides a simple bound on the amount of perturbation needed to change the output of the quantization block.
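To make the idea in point 4 concrete, the following is a minimal sketch of nearest-codeword quantisation together with a generic robustness margin. It is not the bound derived in the paper: it only uses the triangle inequality, which guarantees that a latent perturbation with Euclidean norm below half the gap between the nearest and second-nearest codeword distances cannot change the selected codeword. The function names and the three-entry codebook are hypothetical.

```python
import math

def nearest_codeword(z, codebook):
    """Return (index, distance) of the codebook vector closest to z."""
    dists = [math.dist(z, e) for e in codebook]
    k = min(range(len(codebook)), key=dists.__getitem__)
    return k, dists[k]

def safe_radius(z, codebook):
    """Half the gap between the nearest and second-nearest distances.

    Any perturbation of z with Euclidean norm strictly below this value
    leaves the selected codeword unchanged (triangle inequality).
    This is a generic margin, not the paper's bound.
    """
    d = sorted(math.dist(z, e) for e in codebook)
    return 0.5 * (d[1] - d[0])

# Hypothetical 2-D codebook and latent vector, for illustration only.
codebook = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
z = (1.0, 0.0)

k, _ = nearest_codeword(z, codebook)
r = safe_radius(z, codebook)

# A perturbation inside the radius keeps the assignment ...
k_small, _ = nearest_codeword((z[0] + 0.9, z[1]), codebook)
# ... while one outside the radius is allowed to change it.
k_large, _ = nearest_codeword((z[0] + 1.1, z[1]), codebook)
```

The margin depends on where the encoder output sits relative to the codebook, which is one reason the number and placement of codebook vectors matter for robustness.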
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. While the paper succeeds in justifying the robustness claim, it is still not clear why performance increases over the baseline U-Net on a single domain. The Conclusion suggests that a potential reason is the codebook behaving as an atlas in the latent space for segmentation of anatomical parts (which are very structured), but no experiments or visualizations are provided to support this. The paper also says that, because anatomy is structured, the quantization bottleneck helps capture this structure by limiting the latent space through discretization. This reasoning is not convincing without a visualization of the codebook to support it.
2. Working only with the basic U-Net architecture does not demonstrate applicability to more sophisticated segmentation architectures.
3. There are no results on datasets where the segmentation region is small, which limits the evaluation of the technique. For instance, I would like to see some results on the ACDC dataset (Automated Cardiac Disease Diagnosis), where the pixels of interest are far fewer than the background pixels.
4. The method specifically uses Group Normalization and Swish activation, which differ from the traditional U-Net architecture. I would like to know whether the success of the method is contingent on using Group Normalization and Swish activation.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. It would strengthen the paper’s claim of capturing anatomical structure if we could see some visualization of the codebook vectors and how they relate to the increased performance over the baseline in a single domain.
2. Results on the ACDC dataset would strengthen the paper.
3. Some ablation studies are missing, e.g., an ablation study on the number of vectors in the codebook. Does increasing the number of vectors lead to a “more complete” codebook and hence improve performance in the domain shift experiments?
4. Please refer to the points in the “weaknesses” section.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

In the case of a single domain, the increase in performance is not clearly explained, which casts doubt on the technique. I am open to increasing my rating if my questions and doubts are answered.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper proposes a quantisation block that learns a discrete representation in the embedding space in the bottleneck of UNet architecture to improve the robustness under domain shift and noise perturbations in segmentation models.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper introduces vector quantisation, a method proposed for generative models (i.e., VQ-VAE), into the U-Net architecture to improve the robustness of the segmentation model. The author shows that vector quantisation of latent features could effectively address domain shift and noise perturbations in the input space.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

If the paper aims to claim that vector quantisation could be effective in the segmentation task universally, the author should validate this on a more general platform, such as nnUNet. Additionally, the author should try at least one more architecture besides U-Net to demonstrate generalization.

As VQ-VAE has claimed, the discrete representations fit for discrete latent variables, like planning and language tasks. However, the author claims that the VQ is also an

The author claims that the network could be made robust by minimizing Φ(x+ε)−Φ(x); this assumption may hold for noise perturbations. However, for domain shift, the problem should become Φ(f(x))−Φ(x) instead of simply adding a small value of ε. The function f() can range from a renormalization function to a nonlinear mapping, as shown in [1] and [2].

There are many other works addressing the domain shift problem in medical images, such as [3] and [4], yet the author only compares VQ-Net against U-Net. What is the advantage of VQ over domain adaptation methods?

In the domain shift experiment for lung segmentation, as shown in Figure 1, I did not see an obvious difference between NIH and JSRT. And in Table 2, the segmentation performance improves from JSRT to NIH despite the domain shift. Therefore this experiment is not convincing enough to demonstrate the effectiveness of VQ under domain shift.

Reference: [1] Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation [2] Synergistic Image and Feature Adaptation: Towards Cross-Modality Domain Adaptation for Medical Image Segmentation [3] The domain shift problem of medical image segmentation and vendor-adaptation by Unet-GAN [4] A Closer Look at Domain Shift for Deep Learning in Histopathology
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

If the paper aims to claim that vector quantisation could be effective in the segmentation task universally, the author should validate this on a more general platform, such as nnUNet. Additionally, the author should try at least one more architecture besides U-Net to demonstrate generalization. The author also does not provide enough evidence to demonstrate the effectiveness of VQ under domain shift. Please see the comments below.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The author claims that the network could be made robust by minimizing Φ(x+ε)−Φ(x); this assumption may hold for noise perturbations. However, for domain shift, the problem should become Φ(f(x))−Φ(x) instead of simply adding a small value of ε. The function f() can range from a renormalization function to a nonlinear mapping, as shown in [1] and [2]. Therefore, it is hard to believe that VQ could solve the domain shift problem in general, as domain shift can arise for different reasons, such as equipment settings or variation in imaging conditions. I would therefore suggest that the author mainly validate the work on the second task, to demonstrate robustness under noise or degradation.
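For concreteness, the two settings contrasted here can be written side by side. This is a notational sketch only: the norm, the noise budget δ, and the family of admissible mappings 𝓕 are assumptions introduced for illustration and do not come from the paper.

```latex
% Noise robustness: worst-case change of the output under a small additive perturbation.
\[
  \sup_{\|\epsilon\| \le \delta} \bigl\| \Phi(x + \epsilon) - \Phi(x) \bigr\|
\]

% Domain shift: worst-case change under a mapping f drawn from a family F
% (e.g. intensity renormalisation or a nonlinear mapping).
\[
  \sup_{f \in \mathcal{F}} \bigl\| \Phi\bigl(f(x)\bigr) - \Phi(x) \bigr\|
\]
```

Additive noise is the special case f(x) = x + ε, so a guarantee stated for small ε need not extend to a general f.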

There are many other works addressing the domain shift problem in medical images, such as [3] and [4], yet the author only compares VQ-Net against U-Net. What is the advantage of VQ over domain adaptation methods? Are there any further experiments to demonstrate this?

In the domain shift experiment for lung segmentation, as shown in Figure 1, I did not see an obvious difference between NIH and JSRT. And in Table 2, the segmentation performance improves from JSRT to NIH despite the domain shift. Therefore this experiment is not convincing enough to demonstrate the effectiveness of VQ under domain shift.

Minor Comments: Maybe the name VQ-Unet is better than VQ-Net in this paper, since the network develops from U-Net. This follows the same naming rule from VAE to VQ-VAE.

For Table 2, it would be better if the author could reorganize the data. For example, the changes in Dice score and HD95 could be presented in separate tables, making it easier for the reader to compare the differences caused by the domain shift.

Reference: [1] Zakazov, Ivan, et al. “Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021. [2] Chen, Cheng, et al. “Synergistic image and feature adaptation: Towards cross-modality domain adaptation for medical image segmentation.” Proceedings of the AAAI conference on artificial intelligence. Vol. 33. No. 01. 2019. [3] Yan, Wenjun, et al. “The domain shift problem of medical image segmentation and vendor-adaptation by Unet-GAN.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019. [4] Stacke, Karin, et al. “A closer look at domain shift for deep learning in histopathology.” arXiv preprint arXiv:1909.11575(2019).
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The author claims that introducing VQ can effectively solve the problem of domain shift in medical images. However, neither the theoretical assumptions nor the experiments support this claim.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

This manuscript describes an application of quantized low-dimensional space in medical image segmentation to achieve more robust segmentations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Vector quantization of the low dimensional space has been shown to be promising in many areas and I think it is not too popular in medical image segmentation.
- Strong evaluation: various datasets, two metrics, and two ways to measure the robustness against variance: 1) via adding noise (Section 3.3) and 2) by testing on data from a different domain (Section 3.2).
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Use of UNet (from 2015) as a baseline instead of a more recent approach, such as nnUNet or TransUNet.
- I found Section 3.1 a bit confusing.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility statements look reasonable. Authors mentioned in both the reproducibility statement and the paper that the code will be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- I think the manuscript would be clearer if “sg” in Eq. 2 were explained in more detail, as in [19] (right below their Eq. 3). Related to this, above Eq. 2 we can read that the gradients from the encoder need to be copied to the decoder. At first, I thought that this was to ignore the codebook from Φ_q, but I think this codebook gets updated by backpropagation as well, right? (Eq. 2, third term).
- I found Sections 2.3 and 3.1 a bit difficult to read and understand. First, “r” is defined in Section 2.3 (Eqs. 3-4), only for the reader to discover later in Section 3.1 that “r is obsolete”.
- Typo above Assumption 3: “and thereby enforce” -> “and thereby enforces”
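As a reference point for the “sg” question above, the stop-gradient operator and the training objective can be sketched as follows. This follows the original VQ-VAE formulation with the reconstruction term swapped for a generic segmentation loss; the symbol 𝓛_seg, the weight β, and the order of the terms are assumptions here and may not match the paper’s Eq. 2.

```latex
% Stop-gradient: identity in the forward pass, zero derivative in the backward pass.
\[
  \operatorname{sg}[u] = u, \qquad \frac{\partial \operatorname{sg}[u]}{\partial u} = 0
\]

% Objective: task loss + codebook term + commitment term.
\[
  \mathcal{L} = \mathcal{L}_{\mathrm{seg}}\bigl(\hat{y}, y\bigr)
  + \bigl\| \operatorname{sg}[z_e(x)] - e \bigr\|_2^2
  + \beta \, \bigl\| z_e(x) - \operatorname{sg}[e] \bigr\|_2^2
\]

% Straight-through estimator: the decoder sees e, the encoder receives the decoder's gradient.
\[
  z_q(x) = z_e(x) + \operatorname{sg}\bigl[e - z_e(x)\bigr]
\]
```

Under this formulation the codebook vector e receives gradients only from the codebook term, while the task-loss gradient bypasses the non-differentiable nearest-neighbour lookup and reaches the encoder unchanged.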
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work shows that the quantized low-dimensional space in the encoder of autoencoder-like networks is beneficial to gain robustness. Authors show this via trustworthy experiments.
Number of papers in your stack

7
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper provides a novel method to make a standard U-Net segmentation model more robust by incorporating a discrete bottleneck using Vector Quantization.

The paper provides a novel formulation that incorporates a discrete bottleneck (VQ) in a segmentation framework to increase the robustness of the trained model (to noise perturbation of the input as well as domain shift in the datasets).

If the paper aims to claim that vector quantisation could be effective in the segmentation task universally, the author should validate this on a more general platform, such as nnUNet. Additionally, the author should try at least one more architecture besides U-Net to demonstrate generalization.

The author claims that the network could be made robust by minimizing Φ(x+ε)−Φ(x); this assumption may hold for noise perturbations. However, for domain shift, the problem should become Φ(f(x))−Φ(x) instead of simply adding a small value of ε. The function f() can range from a renormalization function to a nonlinear mapping.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

6

Author Feedback

We would firstly like to thank all the reviewers for their time and effort in providing us with both constructive and useful feedback. We note 4 main areas of improvement:

We completely agree that our method of applying a discrete bottleneck using vector quantisation to the UNet architecture to improve robustness would be better validated by also applying it to other encoder-decoder segmentation architectures. We have already carried out experiments which showed more robust segmentation performance when applying vector quantisation to the bottleneck of the TransUNet [1], and we aim to include these results in the extension of this paper. We also note the reviewers’ recommendation to apply vector quantisation to the nnUNet [2] and analyse robustness, which we also hope to include in the paper extension.

We note that the reviewers provide a better mathematical expression of a domain shift, namely Φ(f(x))−Φ(x) instead of simply adding a small value of ε. We will make this recommended change in the camera-ready paper.

We would like to clarify that our baseline UNet architecture used Swish activation and group normalisation which we found to demonstrate similar performance to a UNet architecture with ReLU activation and batch normalisation.

Does increasing the number of codebook vectors lead to a “more complete” codebook and hence improve performance in the domain shift experiments? This is a hypothesis we had in mind when developing our methodology and a very valid point which is not mentioned in our paper. The ideal codebook will contain the minimal number of codebook vectors needed to achieve the best possible segmentation performance, as this would achieve better robustness. This analysis will form the basis of future work, as we aim to learn a codebook which is maximally and uniformly spread on the unit hypersphere with a minimum number of codebook vectors. Here, we also aim to provide visual explanations of individual codebook vectors to support further claims made in our paper.
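One simple way to quantify how well a codebook is spread on the unit hypersphere is the smallest angle between any two normalised codebook vectors. The sketch below is only an illustrative diagnostic under that assumption; it is not the analysis planned for the paper extension, and the function names and example codebooks are hypothetical.

```python
import math

def normalise(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def min_pairwise_angle(codebook):
    """Smallest angle (radians) between any two unit-normalised codebook vectors.

    Larger values indicate a codebook that is more evenly spread.
    """
    unit = [normalise(e) for e in codebook]
    best = math.pi
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            cos = max(-1.0, min(1.0, cos))  # guard against rounding error
            best = min(best, math.acos(cos))
    return best

# Hypothetical 2-D codebooks: one evenly spread, one with three near-duplicates.
spread = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -1.0)]
clustered = [(1.0, 0.0), (1.0, 0.1), (1.0, -0.1), (-1.0, 0.0)]

angle_spread = min_pairwise_angle(spread)
angle_clustered = min_pairwise_angle(clustered)
```

For a fixed number of codebook vectors, a larger minimum angle leaves more room around each codeword, so this quantity could be tracked alongside segmentation accuracy when varying the codebook size.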

References:

[1] Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille AL, Zhou Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. 2021 Feb 8.

[2] Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, Wasserthal J, Koehler G, Norajitra T, Wirkert S, Maier-Hein KH. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. arXiv preprint arXiv:1809.10486. 2018 Sep 27.


Vector Quantisation for Robust Segmentation