
Authors

Xiaoyi Sun, Zhizhe Liu, Shuai Zheng, Chen Lin, Zhenfeng Zhu, Yao Zhao

Abstract

To overcome the barriers of multimodality and scarcity of annotations in medical image segmentation, many unsupervised domain adaptation (UDA) methods have been proposed, especially for cardiac segmentation. However, these methods may not completely avoid the interference of domain-specific information. To tackle this problem, we propose a novel Attention-enhanced Disentangled Representation (ADR) learning model for UDA in cardiac segmentation. To thoroughly remove domain shift and mine more precise domain-invariant features, we first put forward a strategy that proceeds from coarse image-level alignment to fine removal of the remaining domain shift. Unlike previous dual-path disentanglement methods, we present channel-wise disentangled representation learning to promote mutual guidance between domain-invariant and domain-specific features. Meanwhile, the Hilbert-Schmidt independence criterion (HSIC) is adopted to enforce independence between the disentangled features. Furthermore, we propose an attention bias for adversarial learning in the output space to enhance the learning of task-relevant domain-invariant features. To obtain more accurate predictions during inference, an information fusion calibration (IFC) is also proposed. Extensive experiments on the MMWHS 2017 dataset demonstrate the superiority of our method. Code is available at https://github.com/Sunxy11/ADR.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_71

SharedIt: https://rdcu.be/cVRXJ

Link to the code repository

https://github.com/Sunxy11/ADR

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper
    1. The paper addresses the problem of domain-specific information remaining in the domain-invariant features, especially under large domain shifts.
    2. The Hilbert-Schmidt independence criterion (HSIC) is used to enforce independence and complementarity between the disentangled features.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel channel-wise disentangled representation learning was presented as opposed to dual-path disentanglement.
    2. An attention bias for adversarial learning was proposed to emphasize task-relevant domain invariant features
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The attention bias module and the use of Hilbert-Schmidt independence criterion (HSIC) are not new (see ref [14] for the Hilbert-Schmidt independence criterion).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    • The authors shared their code, so the work is highly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper proposes a new and innovative UDA method for segmentation by means of an attention-enhanced disentangled framework. It presents a few key innovations for the considered application, though not entirely novel ones, including (1) disentangling the embedding space channel-wise into domain-invariant and domain-specific subspaces; and (2) an attention bias that boosts the capture of task-relevant domain-invariant features. The experiments are comprehensive and the results are encouraging.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel channel-wise disentangled representation learning alongside extensive experimental results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper proposes a UDA framework for cardiac segmentation built on: i) alignment of imaging characteristics; ii) channel-wise disentanglement; iii) an attention bias for adversarial learning. The proposed method achieves better performance when adapting between MRI and CT on the cardiac segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper is well-motivated, well-written, and easy to understand.
    2. The topic of unsupervised domain adaptation and its application to cross-modality segmentation is highly important in clinical practice.
    3. The proposed method combines disentangled representation learning with an attention mechanism. Both are hot topics with extensive literature, and both are interpretable. Such a combination has limited novelty, but the paper demonstrates its effectiveness for unsupervised segmentation of medical images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The experimental results are not solid enough. Since the main innovation of this paper is the framework rather than the pre-training manner, all UDA methods involved in the comparison should be evaluated under the same configuration. According to the description in Section 2.5, the pre-training parameters and learning rates are not the same (the proposed ADR is fine-tuned on the basis of SIFA, while the others are trained from scratch).
    2. The compared UDA methods are not SOTA now. It is recommended to add more comparisons with SOTA methods.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors included model details in Section 2 and released the main code on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Although "Alignment of Imaging Characteristics" is not the main contribution, it is recommended to add references on cross-domain alignment (Section 2.3).
    2. Some figures are too small to show all details; improvement or a re-design is recommended (Figs. 2-5).
    3. As the main contribution, how the HSIC is applied to disentangle the representations extracted by the encoder is not clearly explained. It is recommended to provide more detail than Eq. (2).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic of unsupervised domain adaptation and application to cross-modality segmentation is interesting. The weakness of this paper is that the experiment results are not solid.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal addressed some problems, but the fairness statement about the experimental comparison did not convince me, so I did not give a higher score.



Review #3

  • Please describe the contribution of the paper

    This work presents an Attention-enhanced Disentangled Representation (ADR) learning framework for cross-domain cardiac segmentation, in which the Hilbert-Schmidt independence criterion (HSIC) is adopted for feature disentanglement and an attention bias module is used for the alignment of task-relevant regions. The proposed method demonstrates superior performance on the MMWHS 2017 dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem of cross-modality segmentation is important and interesting. The proposed HSIC and attention bias modules are shown to be useful.
    2. Fig. 1 is clear and aids understanding.
    3. The paper is well written and nicely organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Section 2.2, the authors only use a GAN to achieve the image-to-image translation, which in theory cannot preserve the anatomical structure during translation. That is precisely why the cycle-consistency loss was proposed in CycleGAN [1]. In this case, the subsequent cross-modality segmentation of anatomical structures does not seem to make sense.
    2. Before the feature disentanglement, the authors first perform image translation, which really confuses me. Disentangled representation learning is based on the assumption that images from different domains share the same domain-invariant features while having their own domain-specific features [2,3]. In this work, the domain-invariant and domain-specific features can be considered the shared anatomical content and the specific image style (CT or MR). Taking MR as the source domain, the style of the translated image x^{s->t} should look like a CT image, as shown in Fig. 1. In that case, for two CT images (a real CT and a pseudo CT), how does one disentangle their content and style, given that they share the same content and style? It therefore seems contradictory to perform image translation before feature disentanglement.
    3. In Section 2.5, the generator and discriminator used in image alignment are fine-tuned with a learning rate of 1e-10. My question is: with such a small learning rate, is the optimization of these two networks effectively negligible? Are the translated images visually better or worse? The authors should plot the loss curves of these two modules and visualize the translated images.

    [1] Zhu, Jun-Yan, et al. “Unpaired image-to-image translation using cycle-consistent adversarial networks.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
    [2] Lee, Hsin-Ying, et al. “DRIT++: Diverse image-to-image translation via disentangled representations.” International Journal of Computer Vision 128.10 (2020): 2402-2417.
    [3] Huang, Xun, et al. “Multimodal unsupervised image-to-image translation.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I’m not sure it can be reproduced, as some modules seem unreasonable in my opinion. But the authors provide the code, which is a plus.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The authors should discuss the issues I mentioned.
    2. I encourage the authors to visualize the domain-invariant features and domain-specific features and discuss the level of disentanglement between domain-invariant and domain-specific features.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some modules seem to be unreasonable as I mentioned before.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    Regarding Q2, the authors claimed in their rebuttal that their model is pre-trained with a cycle-consistency loss. However, the key cycle-consistency loss was not mentioned in the submitted manuscript, which would mislead readers into thinking that a simple GAN can be used to perform image-to-image translation, as described in Section 2.2. Besides, similar to the feedback for Q2, the authors’ answer to Q6 also seems to contradict the manuscript. Specifically, in Section 2.3 they claimed that the domain-specific features denote the image style, such as CT or MRI, while in the rebuttal the domain-specific features denote operator, noise, or even artifacts. Actually, judging from the submitted manuscript, the image translation and the feature disentanglement are indeed contradictory, as I discussed earlier. Therefore, I still stand by my decision.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed comments. The reviewers and I acknowledge the practical significance of the studied UDA problem, and the whole framework (channel-wise disentanglement) is interesting and new. However, R2 raises concerns about the fairness of the comparison and the lack of comparison with SOTA methods, and R3 raises concerns about the rationale of the proposed framework. Thus, the authors are invited for a rebuttal to address the reviewers’ concerns. Specifically, the authors should pay attention to the following points. Note that the authors are not allowed to add new experimental results.

    1. The rationale for using a GAN for image-to-image translation, and the fine-tuning of the image alignment.
    2. The fairness of the experimental comparison.
    3. The compared UDA methods are not SOTA.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

Thanks to the AC and the reviewers for their time and insightful comments. They found our framework interesting (MR), innovative (MR&R2), and effective (R1&R2&R3), but also pointed out some issues. We clarify the main points below:

Q1(R1&R2): Novelty. The use of the existing HSIC and attention does not diminish the novelty of this paper. As R2 stated, our main innovation is the novel framework for UDA. 1) Unlike dual-path disentanglement [18,22], we propose a novel channel-wise disentanglement that, without relying on costlier image reconstruction, promotes interleaving and mutual guidance between domain-invariant and domain-specific features. Here, HSIC is used to ensure independence between the disentangled features. 2) Feature attention is applied to adversarial learning in the output space to enhance the learning of task-relevant features.
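
As a minimal sketch of the channel-wise idea (illustrative only; the tensor shape, variable names, and even half/half split are assumptions, not the paper's exact design):

    import torch

    feat = torch.randn(4, 256, 32, 32)        # (B, C, H, W) output of a shared encoder (hypothetical shape)
    c = feat.shape[1] // 2                    # assume an even channel split
    f_inv, f_spec = feat[:, :c], feat[:, c:]  # domain-invariant vs. domain-specific channels
    # f_inv feeds the segmentation head; an independence loss such as
    # HSIC (see Q5) keeps f_inv and f_spec statistically independent.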

Q2(MR&R3): About image-to-image translation. To perform image-level alignment, the parameters of the GAN-based image-to-image translation network were pre-trained with a cycle-consistency loss. Therefore, fine-tuning with a small learning rate not only ensures the preservation of anatomical structure, but also reduces model complexity and makes training easier. We observe in the experimental record that the generator loss decreases. Meanwhile, Fig. 1 visualizes an example image and its grayscale distribution before and after alignment.
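
A minimal sketch of the cycle-consistency term referred to here (CycleGAN-style L1; the stand-in generators below are placeholders, not the actual translation networks):

    import torch
    import torch.nn as nn

    # Placeholder generators; the real ones are image-to-image translation CNNs.
    G_t = nn.Conv2d(1, 1, 3, padding=1)    # source -> target style (stand-in)
    G_s = nn.Conv2d(1, 1, 3, padding=1)    # target -> source style (stand-in)

    x_s = torch.randn(2, 1, 64, 64)        # a batch of source-domain slices
    x_rec = G_s(G_t(x_s))                  # translate to target style, then back
    loss_cyc = (x_rec - x_s).abs().mean()  # L1 cycle-consistency, as in CycleGAN
    # A small loss_cyc encourages the translation to preserve anatomy.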

Q3(MR&R2): Fairness. Because of fine-tuning, the learning rates of G_t and D_t are set smaller than usual (see Q2 for details) to avoid instability in the network updates. The ablation experiments in Fig. 4 show that the performance of Base+Gen, i.e., fine-tuning on the basis of SIFA, on the CT->MR task is slightly lower than that of SIFA, but the final Dice score of our full model is nearly 3% higher than that of SIFA. In this sense, we consider the comparison fair.

Q4(MR&R2): Whether the compared UDA methods are SOTA. We compared some highly cited methods [8,10,19,23], which are popularly used as baselines in the field of medical UDA, as well as the main leading papers [2,3] and the recent SOTA method [4]. Experimental results in both adaptation directions demonstrate the effectiveness of our method. If the chair agrees, we can add comparisons with more SOTA methods.

Q5(R2): How HSIC is applied to disentanglement. As illustrated in Fig. 1, the disentanglement is performed in a channel-wise manner, which is completely different from traditional dual-path disentanglement. Here, HSIC is applied as an independence loss L_HSIC to ensure independence between the disentangled subspaces, which is more in line with the goal of disentanglement.
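
For concreteness, here is a minimal sketch of a biased, linear-kernel HSIC estimator used as an independence penalty (the kernel choice and the flattening of features are illustrative assumptions, not necessarily Eq. (2) verbatim):

    import torch

    def hsic_loss(x, y):
        # Biased HSIC estimator with linear kernels.
        # x: (n, d1), y: (n, d2) flattened feature matrices.
        n = x.shape[0]
        Kx, Ky = x @ x.T, y @ y.T                    # Gram matrices
        H = torch.eye(n, device=x.device) - 1.0 / n  # centering matrix I - (1/n) * ones
        return torch.trace(Kx @ H @ Ky @ H) / (n - 1) ** 2

    # e.g., penalize dependence between the two channel groups (see the Q1 sketch):
    # loss = hsic_loss(f_inv.flatten(1), f_spec.flatten(1))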

Q6(R3): Image translation and disentanglement are contradictory. Perhaps our motivation was not clearly explained. In fact, these two parts are not contradictory but complementary. As we know, it is very difficult to achieve the desired effect by directly performing feature-level alignment under a large domain shift. Unlike previous methods, we propose a coarse-to-fine progressive alignment. Here, image translation addresses the uneven distribution at the image level (e.g., brightness) for coarse alignment. On this basis, disentanglement further removes the remaining domain shift (such as operator, noise, or even artifacts) via fine alignment at the feature level, where such fine alignment is implemented in the task-relevant domain-invariant subspace.

Q7(R3): Visualization and degree of separation of features. Thanks for your valuable suggestion. We reviewed the experimental records and found that the HSIC loss converged rapidly, reaching around 1e-4. We also visualized the disentangled features on the test dataset with t-SNE; visually they are well separated. In addition, we measured the degree of separation by the Davies-Bouldin index (DBI; average intra-class distance divided by inter-class center distance, the smaller the better): MR->CT: 0.29, CT->MR: 0.34. We will discuss this further in the revised version.
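
For two clusters, the DBI reduces to the ratio described in the parenthetical above; a minimal NumPy sketch (the function name and inputs are hypothetical):

    import numpy as np

    def dbi_two_clusters(a, b):
        # a, b: arrays of shape (n_i, d) holding the two feature clusters.
        ca, cb = a.mean(0), b.mean(0)               # cluster centers
        sa = np.linalg.norm(a - ca, axis=1).mean()  # average intra-class distance
        sb = np.linalg.norm(b - cb, axis=1).mean()
        return (sa + sb) / np.linalg.norm(ca - cb)  # smaller = better separated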




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper has a new and novel design in the method part. However, as pointed out by R3, the motivation behind the method design is somewhat unclear. The rebuttal has partially addressed this motivation, while the notions of “domain-specific” and “domain-shared” features remain unclear.

    This is a borderline paper; I vote to accept it if we still have room, considering the new method design. But the authors need to further clarify the motivation of their method design in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presented an Attention-enhanced Disentangled Representation (ADR) learning framework for cross-domain cardiac segmentation, in which novel channel-wise disentangled representation learning was presented as opposed to dual-path disentanglement, and an attention bias for adversarial learning was proposed to emphasize task-relevant domain-invariant output. Specifically, the Hilbert-Schmidt independence criterion (HSIC) is adopted for channel-wise feature disentanglement. The reviews are mixed; therefore, I read the paper myself. In my opinion, although some of the techniques have been proposed before in the literature, their combination and application to unsupervised domain adaptation have some novelty. Overall, the paper’s merits outweigh its limitations; therefore, I recommend accepting this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper has received mixed comments from reviewers and meta-reviewers. In particular, R3 has major issues with the explanation of the methods and the rationale behind them. I also think that one of the key points of the paper, “Q6(R3): Image translation and disentanglement are contradictory,” is not well answered by the authors in the rebuttal. After reading the explanation, it is not clear how the method serves the purpose of style and content disentanglement. My decision is to reject the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


