
Authors

Yili Lin, Dong Nie, Yuting Liu, Ming Yang, Daoqiang Zhang, Xuyun Wen

Abstract

Domain shift is a big challenge when deploying deep learning models in real-world applications due to various data distributions. The recent advances of domain adaptation mainly come from explicitly learning domain invariant features (e.g., by adversarial learning, metric learning and self-training). While they cannot be easily extended to multi-domains due to the diverse domain knowledge. In this paper, we present a novel multi-target domain adaptation (MTDA) algorithm, i.e., prompt-DA, through implicit feature adaptation for medical image segmentation. In particular, we build a feature transfer module by simply obtaining the domain-specific prompts and utilizing them to generate the domain-aware image features via a specially designed simple feature fusion module. Moreover, the proposed prompt-DA is compatible with the previous DA methods (e.g., adversarial learning based) and the performance can be continuously improved. The proposed method is evaluated on two challenging domain-shift datasets, i.e., the Iseg2019 (domain shift in infant MRI of different ages), and the BraTS2018 dataset (domain shift between high-grade and low-grade gliomas). Experimental results indicate our proposed method achieves state-of-the-art performance in both cases, and also demonstrate the effectiveness of the proposed prompt-DA. The experiments with adversarial learning DA show our proposed prompt-DA can go well with other DA methods. Our code is available at https://github.com/MurasakiLin/prompt-DA.
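
As a reading aid only, the following is a minimal PyTorch sketch of the kind of design the abstract describes: a learnable per-domain prompt fused with the bottleneck features of a segmentation network. It is an assumption-based illustration, not the authors' implementation; all module names, shapes, and the fusion rule are hypothetical.

    # Minimal, hypothetical PyTorch sketch of a per-domain prompt fused with
    # U-Net bottleneck features. Names, shapes and the fusion rule are assumptions
    # made for illustration; this is not the authors' implementation.
    import torch
    import torch.nn as nn

    class DomainPromptFusion(nn.Module):
        def __init__(self, num_domains: int, channels: int):
            super().__init__()
            # one learnable prompt embedding per (target) domain
            self.prompts = nn.Embedding(num_domains, channels)
            # simple fusion: project [features ; prompt] back to the feature width
            self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)

        def forward(self, feats: torch.Tensor, domain_id: torch.Tensor) -> torch.Tensor:
            # feats: (B, C, D, H, W) bottleneck features; domain_id: (B,) long tensor
            p = self.prompts(domain_id)                      # (B, C)
            p = p[:, :, None, None, None].expand_as(feats)   # broadcast over the volume
            return self.fuse(torch.cat([feats, p], dim=1))   # domain-aware features

    # usage: fused = DomainPromptFusion(num_domains=2, channels=256)(bottleneck, ids)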

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_68

SharedIt: https://rdcu.be/dnwdP

Link to the code repository

https://github.com/MurasakiLin/prompt-DA

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The proposed approach aims at performing domain adaptation for medical image segmentation using “prompt learning”. Here the prompts are automatically generated and aim at encoding the domain information. Experiments are performed on two MR brain datasets where the domain gap is the age (infant brain) and the glioma grade (high grade vs low grade).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed approach aims at integrating prompt learning, a concept primarily used in NLP, for a visual domain adaptation task.
    • An ablation study is presented.
    • Experiments are conducted on two datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper is hard to follow.
    • Figures are too small and cannot be read on a printed version of the manuscript.
    • It is unclear how the prompt generator helps to perform domain adaptation. The domain-specific information seems to be injected at the bottleneck of a UNet network and can thus be bypassed by the skip connections. Results in the ablation study even suggest that the prompt-learning approach alone doesn’t allow for performing domain adaptation.
    • Improvements are marginal.
    • No statistical tests are performed.
    • Experiments are conducted on simple problems where the domain gap is relatively small.
    • Missing comparison with nnUnet that uses intensive data augmentation.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In its current form, this work is not reproducible. The authors claimed that they will release their code if accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Abstract:

    • “The recent advances of domain adaptation mainly come from explicitly learning domain invariant features (e.g., by adversarial learning, metric learning and self-training).” Image translation is currently the SOTA for hard domain adaptation problems in medical image segmentation. Aligning feature distribution has, for example, been shown to be outperformed by image translation in the crossMoDA challenge.

    Introduction:

    • The paragraph describing prompt learning is unclear. It will greatly help the reader if the notion of prompts is clearly defined.

    Methodology:

    • The figures are too small and cannot be read if the paper is printed.
    • donot –> do not
    • How does prompt generation help to perform domain adaptation?
    • It seems that the domain adaptation information is injected at the bottleneck of a UNet-style network. If this is the case, this means that it doesn’t affect the information passing through the skip connections. The authors should clarify this point.

    Experiment section:

    • No statistical tests are performed
    • No cross-validation is performed
    • Improvements are marginal.
    • Comparison with SOTA for image segmentation is missing (nnUnet)
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is hard to follow. The methodology doesn’t seem to be adapted for a domain adaptation task. Observed improvements are marginal, and no statistical tests are performed.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    I acknowledge the authors’ efforts to improve their paper. Notably, they conducted statistical tests, although I would appreciate clarification on which specific tests were performed, as well as additional comparisons with the SOTA.

    However, it is crucial to note that significant revisions were required, particularly to improve the manuscript’s clarity and provide a comprehensive description of the proposed approach. Although the authors mentioned that they would provide “more details” regarding prompt learning in the final version, the current statement lacks specificity and makes it difficult to evaluate. In particular, the rebuttal does not answer one of the main concerns that was raised: “the methodology doesn’t seem to be adapted for a domain adaptation task.”



Review #2

  • Please describe the contribution of the paper

    The authors propose a domain adaptation method for medical image segmentation based on prompt learning. They introduce a prompt generation module whose output is fused with the latent space of a U-Net by a fusion module. This approach can be combined with other DA methods such as adversarial learning. They outperform SOTA methods on segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is described in good detail. The evaluation of the method and comparison to SOTA is well done, and important ablation studies are performed. Overall, the paper is easy to follow and well written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • For me, it is not quite clear why we can treat low-grade and high-grade gliomas as two different domains. Is there any other paper that uses the same approach, or any medical justification for that?
    • What are the advantages of the proposed method over “ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation” by Hu et al. (2022)? It is not clear how these methods compare and what the differences are.
    • In Tables 1-4, it is not clear what scores are reported. Please indicate this in the captions. What measure is used, and is it the mean over the test set? Was cross-validation performed?
    • In Tables 1 and 2, it is unclear which backbone was used, 3D UNet or TransUNet.
    • Since TransUNet is a 2D approach, how are the segmentation scores computed? Slice-wise or aggregated over the 3D volume?
    • In Figure 1, it would be good to show where the fusion function psi, as well as W_s and W_c of Equation 5, appear.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors promise to share the code as well as the dataset upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    • Please address the points listed under “weaknesses”.
    • There are a few typos (spaces missing between some words). Please correct them.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper tackles a problem that is of importance for many medical applications. The method is well presented and well evaluated, and can easily be integrated in existing segmentation networks. Comparison to SOTA is sufficient, and a nice ablation study is performed.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The authors propose a solution to tackle the issue of domain shift between training and testing distributions (which leads to a decrease in the performance of machine learning models). The proposed solution is a domain-aware learning framework that utilizes prompt learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Innovative use of prompt learning. The prompt learning module can be added to any domain adaptation technique, which renders the method widely applicable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is hard to understand how the prompt learning module really works. Clarification questions to the authors: Can these category-specific prompts be visualized (semantics and spatial information)? How do you evaluate whether these prompts are domain-specific?
    • The related work section is limited; I believe this is a rich field. If you are not able to include it in the baseline comparisons, please cite and explain the differences between your paper and ProSFDA, https://www.sciencedirect.com/science/article/pii/S1361841523000506, as well as other more recent DA papers.
    • On the brain segmentation task, the improvement is very small.
    • The reported metric is limited to IoU. Please report Dice to improve comparability to other papers (e.g., https://arxiv.org/pdf/2012.12570.pdf, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246057).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claimed that they will release the source code upon publication. There is no clear declaration of which software frameworks and versions were used, and the descriptions lack the details necessary for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Major:

    • How does this method compare to ProSFDA (https://arxiv.org/abs/2211.11514)? How about other, newer SOTA baselines, e.g., SynthSeg? Please explain and report these as SOTA baseline comparisons.
    • Please report Dice to improve comparability to other published work (e.g., https://arxiv.org/pdf/2012.12570.pdf, https://doi.org/10.1016/j.media.2023.102789, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246057).

    Minor:

    • Provide some visualizations of the learned prompts if possible.
    • Incomplete sentence “While they cannot be easily …”
    • Abbreviations (HGG and LGG) are used without being defined (Section 3.1, Table 2).
    • Incomplete citation (ref: #7)
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method, if not entirely novel, is fairly interesting and would be of interest to the MICCAI community. If the authors provide Dice scores (regardless of whether the reported values outperform the SOTA approaches) and add a newer approach to their baseline comparisons, my reject decision will change to accept.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I am satisfied with the proposed edits and authors’ answers.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a domain adaptation method for medical image segmentation based on prompt learning. Reviewer 1 recommends strong rejection due to lack of clarity, marginal improvements, and lack of statistical tests. Reviewer 2 recommends acceptance but has concerns about the justification for treating low-grade and high-grade gliomas as different domains and the lack of clarity in some tables and figures. Reviewer 5 finds the method innovative and widely applicable but raises concerns about the limited explanation of how the prompt learning module works, small improvement on brain segmentation, and lack of DICE scores for comparison. Therefore, based on the varying strengths and weaknesses identified by the reviewers, I recommend inviting the authors to submit a rebuttal to address the concerns raised.




Author Feedback

We thank the reviewers and meta-reviewers for their valuable feedback. We appreciate that they agree on the novelty and interest of the model (R2, R5), that the paper is well written and easy to follow (R2, R5), that it is widely applicable by combining with other DA methods (R2, R5), and that the experiments are adequate (R1, R2). Detailed responses are given below. We will release our code at the earliest time. For reviewing purposes, we share the code at: https://github.com/AnonymousMICCAI2023/promptDA.

Q1 [R1,R5] Clarity about Prompt Learning: More details to help understand the prompt learning will be provided in the final version. Specifically, we will show typical visualized domain-specific prompts. We will also verify whether the learned prompts are domain-specific: (a) check the accuracy of the domain classification task; (b) cluster the prompts to see if they are well separated.
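
As an illustration of check (b), the sketch below clusters learned prompt vectors and compares the clusters with the domain labels. It assumes the prompts can be exported as an (N, C) array with one domain label per prompt; function and variable names are hypothetical and not taken from the shared code.

    # Hypothetical check for (b): cluster the learned prompt vectors and compare
    # the clusters with the domain labels. Array shapes are assumptions.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score, silhouette_score

    def prompt_separability(prompts: np.ndarray, domain_labels: np.ndarray) -> dict:
        """prompts: (N, C) learned prompt vectors; domain_labels: (N,) integer labels."""
        k = len(np.unique(domain_labels))
        clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(prompts)
        return {
            "ari": adjusted_rand_score(domain_labels, clusters),    # 1.0 = clusters match domains
            "silhouette": silhouette_score(prompts, domain_labels), # > 0 = domains are separated
        }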

Q2 [R1] Domain Gap and Performance Gain: The domain gap of the infant brain MRI at different ages is large, while the gliomas dataset has a relatively mild domain gap. The performance gain is substantial rather than marginal: we achieve 5.46% and 4.75% improvements on the 12-month-old (12m) and 3-month-old (3m) infant brain datasets, respectively. Note that the improvement is statistically significant (p < 0.05). Also, our method outperforms the compared SOTA DA methods on the two tasks.

Q3 [R1,R2,R5] Literature Overview for DA: We will add related work descriptions in the final version.

Q4 [R2,R5] Compare to ProSFDA and Other SOTAs: ProSFDA and our work are quite different. ProSFDA generates the prompt in the image space and uses it in a two-stage manner to close the gap, whereas ours is generated in the latent space via an explicit task and fused to close the gap in a single stage. Moreover, ours is fully compatible with other DA methods, while ProSFDA is not. Experiments were conducted on the multi-center OC/OD segmentation dataset (RIGA+) used in ProSFDA. For fairness, the experimental settings all follow ProSFDA, and the results are: ProSFDA: [(95.29, 85.61), (94.71, 85.33), (95.47, 85.53)]; ours: [(96.23, 87.69), (95.85, 86.66), (96.16, 86.62)].

We also compare our method with other recent SOTAs (on the infant brain tasks). 1. ADR [2]: [12m: (71.81, 77.02, 76.65), 3m: (70.16, 72.04, 62.98)]; ADR underperforms ours by 2.77% and 1.04%, respectively. 2. SynthSeg [3]: [12m: (76.14, 79.91, 68.43), 3m: (72.41, 76.59, 60.21)]; we found SynthSeg to be a powerful domain generalization model: it outperforms our method in certain categories (i.e., White Matter) but underperforms ours on the others. 3. nnUNet: [12m: (65.66, 73.23, 66.74), 3m: (55.54, 63.67, 67.19)]; simply using nnUNet does not work well for this DA task. [2] Attention-enhanced disentangled representation learning for UDA in cardiac seg, MICCAI, 2022. [3] SynthSeg: Seg of brain MRI scans of any contrast and resolution without retraining, MIA, 2023.

Q5 [R2] Issue about Using Gliomas Dataset: The gliomas dataset is a good case for testing the intra-modality DA task, since LGG and HGG have different sizes and distributions of tumor regions. Some DA papers, for example [1], also adopted this dataset for evaluation. [1] Intramodality DA using self ensembling and adversarial training, MICCAI, 2019.

Q6 [R5] Lack of DICE metric: We actually used the Dice score as the evaluation metric, and the numbers in the tables are all computed with it. However, we mistakenly wrote Dice as mIoU in the paper.

Q7 [R2] Details about Experiments and Results: We will state the metric in the table captions. The reported performance is averaged over the test sets in a cross-validation manner. In Tables 1 and 2, the backbone was 3D-UNet. As for TransUNet, we run inference in a slice-wise manner and aggregate the slices into one volume; the Dice scores are calculated on the volumes. We will follow your suggestion to add the symbols in Fig. 1. All these details will be provided in the final version.
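
For concreteness, the following is a minimal sketch of slice-wise inference aggregated into one volume with a volume-level Dice, under the assumption of a 2D model that takes single-channel slices; the function names and shapes are hypothetical and not from the authors' code.

    # Hypothetical sketch: slice-wise 2D inference aggregated into a 3D volume,
    # followed by a volume-level Dice. The 2D model and shapes are assumptions.
    import torch

    @torch.no_grad()
    def predict_volume(model_2d, volume: torch.Tensor) -> torch.Tensor:
        # volume: (D, 1, H, W) slices -> stacked per-slice label maps of shape (D, H, W)
        preds = [model_2d(volume[d:d + 1]).argmax(dim=1) for d in range(volume.shape[0])]
        return torch.cat(preds, dim=0)

    def dice(pred: torch.Tensor, target: torch.Tensor, label: int) -> float:
        # volume-level Dice for one label
        p, t = (pred == label), (target == label)
        inter = (p & t).sum().item()
        return 2.0 * inter / max(p.sum().item() + t.sum().item(), 1)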

Q8 [R1,R5] Writing issues: We will thoroughly check the writing issues and correct them in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work aims to address the challenge of domain shift in deep learning models for real-world applications, particularly in medical image segmentation. The proposed multi-target domain adaptation algorithm, called prompt-DA, aims to implicitly adapt features by leveraging domain-specific prompts and a feature fusion module. Despite the authors’ efforts in addressing reviewer concerns, such as improving clarity on prompt learning, providing additional comparisons against SOTA methods, and addressing the lack of experiment details, some gaps remain that need to be filled in order to achieve acceptance at MICCAI this time, such as providing a comprehensive description of the proposed approach and establishing the validity of the proposed method for a domain adaptation task. Therefore, based on the aforementioned reasons, this paper is recommended for rejection.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper does have several merits in terms of novelty and evaluation. Overall, it can bring some insights to the MICCAI community, though significant revision is needed according to the rebuttal feedback. The authors are encouraged to carefully revise the manuscript if it is accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript introduces a prompt learning-based domain adaptation method for medical image segmentation, which uses a prompt generator to produce domain-specific information and then adopts a fusion module to learn domain-aware feature representations. The experiments (especially the ablation study) demonstrate the effectiveness of the prompt learning-based domain adaptation. The rebuttal has addressed most of the reviewers’ concerns, such as marginal improvements, the lack of statistical tests, and the comparison with recent state-of-the-art methods. However, the rebuttal does not fully address the clarity issue, and some technical details are missing. The authors are encouraged to improve the presentation in the revised version.


