
Authors

Hritam Basak, Zhaozheng Yin

Abstract

Although unsupervised domain adaptation (UDA) is a promising direction for alleviating domain shift, UDA methods fall short of their supervised counterparts. In this work, we investigate the relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation, where access to a few labeled target samples can improve the adaptation performance substantially. Specifically, we propose a two-stage training process. First, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) along with a pixel-level feature consistency constraint. The proposed CL enforces the encoder to learn discriminative content-specific but domain-invariant semantics on a global scale from the source and target images, whereas consistency regularization enforces the mining of local pixel-level information by maintaining spatial sensitivity. This pre-trained encoder, along with a decoder, is further fine-tuned for the downstream task (i.e., pixel-level segmentation) in a semi-supervised setting. Furthermore, we experimentally validate that our proposed method can easily be extended to UDA settings, adding to the superiority of the proposed strategy. Upon evaluation on two domain adaptive image segmentation tasks, our proposed method outperforms the SoTA methods in both SSDA and UDA settings. Code will be released.


Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_25

SharedIt: https://rdcu.be/dnwC9

Link to the code repository

https://github.com/hritam-98/GFDA-disentangled

Link to the dataset(s)

https://datasets.simula.no/kvasir-seg/

https://polyp.grand-challenge.org/CVCClinicDB/


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel semi-supervised domain adaptation method for medical image segmentation, which utilizes disentangled contrastive learning and consistency regularization. The effectiveness of the method is demonstrated through experiments on two segmentation tasks with publicly available data. The proposed method achieves superior performance compared to state-of-the-art methods in both semi-supervised and unsupervised settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The use of Gaussian Fourier Domain Adaptation (GFDA) for style transfer is an innovative contribution. This technique provides smoother frequency transitions and high-quality synthetic image pairs for contrastive learning.
    2. The ablation study is well-motivated and informative. This provides good insights into the impact of individual components of the model. It supports the argument made earlier in the paper about introducing both SCL and CCL for disentangled contrastive learning.
    3. The comparison experiments involve a good number of methods from the literature, enhancing the credibility of the proposed method over state-of-the-art techniques for both semi-supervised and unsupervised settings.
    4. The use of contrastive learning is an interesting aspect of this work. It has shown effectiveness in various domains, and the proposed disentangled contrastive learning provides a new aspect for future research in medical image analysis.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper’s clarity could be improved. Specifically, Section 2.3 needs more explanation to be accessible to a general audience. The authors could provide more insights and diagrams if necessary to introduce the consistency constraint more effectively.
    2. While GFDA is an interesting approach, there is a concern that disentangled features may not purely capture content and style information. Given the current way that positive and negative pairs are generated, there is a fear that style features (domain-specific) may only capture differences in Fourier frequencies, rather than actual styles (which, in medical imaging, are usually related to scanning parameters).
    3. The authors did not show any medical images in this paper. For a MICCAI paper, showcasing such images is essential to demonstrate the clinical impact and to provide insights to an audience actively working in the medical domain.
    4. The paper does not report standard deviations in its results. While the mean values are appealing compared with other methods, without showing standard deviations and conducting appropriate statistical tests, the results are less convincing.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good reproducibility. Public dataset for development and evaluation. Code is available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. While the ablation study suggests the superiority and validity of disentangling content and style in UDA/SSDA, it would be valuable for the authors to quantitatively evaluate [1,2] the impact of disentanglement on domain adaptation performance. Such an analysis would strengthen the paper’s contribution and provide further insights for future research. Additionally, alternative disentanglement methods [1,3] could be explored to complement the proposed approach.
    2. It is highly recommended that the authors include representative medical images/segmentation results in their paper. Doing so would allow for more constructive discussions about the strengths and limitations of the proposed method and highlight potential directions for future research.
    3. To increase the credibility of the superior performance claimed by the paper, it is highly recommended that the authors conduct statistical comparisons between different methods.
    4. The authors should further explore the impact of the parameters in GFDA, such as $\sigma$. This parameter controls the cut-off frequencies of “style” and “content” in medical images and may vary significantly for different datasets, organs, and imaging modalities. For example, 2D-acquired MRI may have high-resolution frequencies in-plane and low frequency through-plane. Exploring how to adjust GFDA accordingly would also improve the applicability of the proposed method in clinical settings.
    5. The authors stated in the introduction that existing domain adaptation methods enforce the model to learn low-level nuisance variability that may not be relevant to the task, without elaborating on the definition of nuisance variables in the context of this paper. The authors are encouraged to provide more detailed explanations on this claim to strengthen the argument. Possible references [4,5]. They should provide examples of what they consider as nuisance variability in medical image segmentation and explain how learning them may negatively impact the model’s ability to generalize to different domains. Additionally, they should also discuss how the proposed disentangled contrastive learning method helps alleviate the issue of nuisance variability and provide evidence of its effectiveness in addressing this issue.

    [1] Zuo et al. “Disentangling a Single MR Modality.” DALI 2022.
    [2] Carbonneau et al. “Measuring Disentanglement: A Review of Metrics.” IEEE TNNLS 2022.
    [3] Ouyang et al. “Representation Disentanglement for Multi-modal Brain MRI Analysis.” IPMI 2021.
    [4] Johansson et al. “Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects.” JMLR 2022.
    [5] Lokhande et al. “Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets.” CVPR 2022.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a novel semi-supervised domain adaptation method for medical image segmentation, which combines disentangled contrastive learning and a consistency regularization. The proposed method outperforms state-of-the-art methods both in semi-supervised and unsupervised settings on two publicly available datasets. The use of GFDA for style transfer between domains and the intriguing contrastive learning aspect are unique and interesting contributions. While the clarity of the paper could be further improved, the ablation study and comparison experiments are well-motivated and informative. Overall, the paper provides valuable insights into the domain adaptation field for medical image segmentation and has the potential to present at MICCAI.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper proposes a semi-supervised model for addressing the domain adaptation problem. In particular, the authors propose to first pre-train the feature encoder by doing style contrastive learning and content contrastive learning. To disentangle the features, the authors propose to use a Gaussian mask in the Fourier space for transforming the source and target domain images. In addition, a feature propagation module is used to enforce the encoder to contain the structural information. Finally, the authors use a teacher-student model to train the encoder and decoder for segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The approaches are intuitive and well-motivated.
    2. The experiments are very extensive. The authors compare to several strong baselines and conduct experiments on two public datasets.
    3. The ablation results show the effectiveness of each component.
    4. The results are strong.
    5. The visuals help to understand the paper.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. I think the authors need to clarify very clearly which ideas are novel and which techniques are taken directly from the literature. Please cite the source of each technique in the corresponding section of the method part. If I understand correctly, the Gaussian masking might be novel, but the content/style contrastive learning (i.e., constructing the positive/negative pairs) may be very similar to prior art. I found [1] doing a similar Fourier-based style augmentation; I suggest the authors clarify. Similar questions hold for the dense feature propagation.
    2. The Gaussian masking is interesting. However, I think more discussions need to be included on the hyperparameter setting i.e. finding the optimal mean and variance for the Gaussian distribution.
    3. Using the teacher-student framework does not really help much according to Table 3, yet training the two branches requires considerably more compute. This raises the question: why is it worth doing so? Wouldn’t it be better to train the decoder jointly with the contrastive learning part, so that the model can be trained end-to-end rather than in two stages? I think the authors need to expand on this, either with corresponding experiments or with a more in-depth discussion.
    4. Artifacts in the transformed images are obvious (see Fig. 1 in the appendix). I think this will be problematic. For example, if we consider augmenting the source images with the target-domain styles, these artifacts are not part of the target styles at all.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the authors mention that the code will be released, they did not specify this in the reproducibility checklist. I suggest the authors release the code for public benefit if accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the major weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, considering the extensive results and interesting ideas such as the Gaussian masking and the Fourier-based augmentation, I recommend acceptance. However, I strongly suggest the authors address the questions raised.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper studies semi-supervised domain adaptive medical image segmentation. A two-stage training process is proposed. First, a Fourier-DA-based self-learning paradigm using a domain-content disentangled contrastive learning method is applied, together with a pixel-level feature consistency constraint. Second, teacher-student-style semi-supervised fine-tuning is adopted. Experiments on endoscopy and brain CT demonstrate the effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper studies the SSDA problem, which is a practical setting in medical image segmentation.
    2. The proposed scheme is clear: it first bridges the domain gap via contrastive learning, then uses a semi-supervised method for further fine-tuning. The results also validate the effectiveness of the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The motivation of Gaussian FDA is not clear and is not ablated. Why does the proposed method not use the original FDA, which replaces the low-frequency part directly?
    • Some other papers that are closely related to the proposed method and also use FDA are not discussed, such as [1-3]:
    [1] Yao, Huifeng, Xiaowei Hu, and Xiaomeng Li. “Enhancing Pseudo Label Quality for Semi-supervised Domain-Generalized Medical Image Segmentation.” AAAI 2022.
    [2] Liu, Xinyu, et al. “Consolidated Domain Adaptive Detection and Localization Framework for Cross-Device Colonoscopic Images.” Medical Image Analysis 71 (2021): 102052.
    [3] Liu, Quande, et al. “FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space.” CVPR 2021.
    • What are the major differences between the proposed method and FSM [24] in stage 1?
    • The colored table numbers are not aligned vertically. Please adjust it.
    • The figure resolution is extremely low. Please change to higher resolution images.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible according to paper description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See Weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is clear and effective. However, discussion of closely related work and methodological comparisons is lacking.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper targets semi-supervised domain adaptation, an emerging topic with great potential in medical image analysis. The use of Gaussian Fourier SSDA brings new ideas to this task. All reviewers acknowledged the method and evaluations.

    However, as pointed out by all reviewers, the clarity should be improved; e.g., the motivation of the Gaussian part should be clearly discussed. It is fine to build your work on existing methods. In addition, the authors are advised to use vector (PDF) or high-resolution images for the figures.




Author Feedback

We thank the Reviewers (R) for their insightful comments (C)! We are encouraged that they found our work to be intuitive and well-motivated (R1, R2), validated extensively (R1, R2), innovative (R1, R2, R3), and practically relevant (R2, R3). Below, we address the reviewers’ questions (Q).

Q1. Motivation for GFDA and style-content disentanglement is unclear (R1, R3)
Ans: FDA helps bridge the large domain gap between the source and target domains, which, if untreated, hurts the overall segmentation performance. The low-frequency component in Fourier space can be attributed to the differential appearance (color, illumination) between source and target images. Hence, altering this component in frequency space transforms the source image style toward the target style, and vice-versa. However, because the swapping window is rectangular, it causes an abrupt frequency change, which, after the IFFT, produces incoherent patches in the transformed images (Fig. 1, Supplementary). We therefore propose a Gaussian window instead, for a smoother frequency transition. The mean and SD of this Gaussian kernel are set by cross-validation. We aim to learn content-specific features invariant to image styles, which motivates the style-content disentanglement. Specifically, the style CL is designed so that the model can learn to differentiate between the source and target styles and extract discriminative feature representations. Content CL, on the other hand, helps the model extract content-specific information irrespective of image style. Together, they enable the model to understand the difference between source and target representations and identify important artefacts in the images.

Q2. Major difference from FSM in stage 1 (R3)
Ans: Although FSM focuses on source-to-target style transfer, its approach differs significantly from ours. FSM adopts a diffusion-like image synthesis algorithm in which random noise is iteratively transformed into a target-style image through neural style transfer. This requires two-stage training: coarse generation of BN constraints, followed by fine image generation. In contrast, our proposed method uses Gaussian Fourier Domain Adaptation, which requires no training for image style transfer. This is simpler and faster, and requires only one non-trainable parameter: the SD of the Gaussian mask. We employ contrastive training to disentangle style and content features by designing two different loss functions. FSM, on the other hand, uses a single convolutional encoder whose shallow and deep features serve as the style and content losses, respectively. Despite being simpler and more straightforward, our proposed method outperforms FSM significantly on the SSDA-based polyp segmentation task (Table 1).

Q3. Motivation of the student-teacher network, and why not CL only? (R1)
Ans: We followed the standard and most popular SemiSL framework, which is based on the student-teacher paradigm. This also lets us perform a fair comparison with other SemiSL works in the literature that follow similar pipelines. CL is a self-supervised framework and cannot be used directly for semi-supervised tasks. As we focus on learning from limited annotations through SemiSL, we adopt this paradigm.

Q4. Style and content CL may not fully capture disentangled features (R2)
Ans: We completely agree. Nevertheless, we propose disentangling style and content information and employing the two contrastive losses to enforce global representation learning in Stage 1. The learned encoder thereafter extracts discriminative style- and content-specific information, which is then utilized in Stage 2, i.e., the downstream task of pixel-level segmentation. Although we cannot fully distinguish differential features originating from scanning parameters or imaging physics in this work, we propose to identify the visual disparities between different imaging modalities (T1, T2, T1CE, FLAIR, etc.) using GFDA and learn them through the proposed CL framework.
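For concreteness, the Gaussian-windowed amplitude swap described in the rebuttal (a smooth Gaussian blend of low-frequency amplitudes in place of FDA’s rectangular window, with the source phase kept as content) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors’ released code: the function names (`gfda`, `gaussian_mask`), the single `sigma` parameter, and the exact blending formula are ours.

```python
import numpy as np

def gaussian_mask(shape, sigma):
    """Low-frequency Gaussian window, centered on the DC bin (after fftshift)."""
    h, w = shape
    yy = np.arange(h) - h // 2
    xx = np.arange(w) - w // 2
    Y, X = np.meshgrid(yy, xx, indexing="ij")
    return np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))

def gfda(src, tgt, sigma=8.0):
    """Transfer target 'style' (low-frequency amplitude) onto a source image
    via a smooth Gaussian blend instead of FDA's rectangular swap."""
    F_src = np.fft.fftshift(np.fft.fft2(src))
    F_tgt = np.fft.fftshift(np.fft.fft2(tgt))
    amp_src, pha_src = np.abs(F_src), np.angle(F_src)
    amp_tgt = np.abs(F_tgt)
    m = gaussian_mask(src.shape, sigma)           # ~1 near DC, -> 0 at high freq
    amp_mix = m * amp_tgt + (1.0 - m) * amp_src   # smooth amplitude blend
    F_mix = amp_mix * np.exp(1j * pha_src)        # keep source phase (content)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_mix)))
```

Because the mask falls off smoothly, there is no hard frequency cut-off, which is the property the rebuttal credits for avoiding incoherent patches after the inverse FFT; `sigma` plays the role of the cut-off scale that reviewers asked to see ablated.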


