
Authors

Qinji Yu, Nan Xi, Junsong Yuan, Ziyu Zhou, Kang Dang, Xiaowei Ding

Abstract

Unsupervised domain adaptation (UDA) has gained increasing interest for its capacity to transfer knowledge learned from a labeled source domain to an unlabeled target domain. However, typical UDA methods require concurrent access to both the source and target domain data, which largely limits their application in medical scenarios where source data are often unavailable due to privacy concerns. To tackle the source data-absent problem, we present a novel two-stage source-free domain adaptation (SFDA) framework for medical image segmentation, where only a well-trained source segmentation model and unlabeled target data are available during domain adaptation. Specifically, in the prototype-anchored feature alignment stage, we first utilize the weights of the pre-trained pixel-wise classifier as source prototypes, which preserve the information of the source features. Then, we introduce a bi-directional transport to align the target features with the class prototypes by minimizing the expected transport cost. On top of that, a contrastive learning stage is further devised to utilize pixels with unreliable predictions for a more compact target feature distribution. Extensive experiments on a cross-modality medical segmentation task demonstrate the superiority of our method in large domain discrepancy settings over state-of-the-art SFDA approaches and even some UDA methods.
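To make the first stage more concrete, below is a minimal, illustrative PyTorch sketch (not the authors' released code) of prototype-anchored feature alignment: the frozen pixel-wise classifier weights act as class prototypes, and simplified target-to-prototype and prototype-to-target losses are formed from cosine distances. The function name, tensor shapes, temperature, and the softmax-based transport plans are assumptions made for illustration; the paper's full formulation (Eq. (1)-(2)) also takes the marginal label distribution into account, which is omitted here.

```python
# Illustrative sketch only: prototype-anchored feature alignment using the
# frozen classifier weights as class prototypes. Shapes and the simplified
# transport losses are assumptions, not the paper's exact Eq. (1)-(2).
import torch
import torch.nn.functional as F

def prototype_alignment_losses(feats, classifier_weight, temperature=0.1):
    """feats: (N, D) target pixel features; classifier_weight: (C, D) source
    classifier weights reused as class prototypes."""
    protos = F.normalize(classifier_weight, dim=1)        # (C, D)
    feats = F.normalize(feats, dim=1)                     # (N, D)
    cost = 1.0 - feats @ protos.t()                       # cosine distance, (N, C)

    # Target-to-prototype (T2P): each pixel transports mass to prototypes
    # via a softmax over similarities; minimize the expected cost.
    t2p_plan = torch.softmax(-cost / temperature, dim=1)  # rows sum to 1
    loss_t2p = (t2p_plan * cost).sum(dim=1).mean()

    # Prototype-to-target (P2T): each prototype transports mass to pixels,
    # encouraging every class prototype to claim some target features.
    p2t_plan = torch.softmax(-cost / temperature, dim=0)  # columns sum to 1
    loss_p2t = (p2t_plan * cost).sum(dim=0).mean()

    return loss_t2p, loss_p2t

# Usage on dummy data: 1024 pixel features of dimension 64, 5 classes.
feats = torch.randn(1024, 64)
w = torch.randn(5, 64)
l_t2p, l_p2t = prototype_alignment_losses(feats, w)
```

In practice one would extract `feats` from the segmentation decoder for the target pixels and keep the source classifier weights frozen while the feature extractor is adapted with these losses.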

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_1

SharedIt: https://rdcu.be/dnwLb

Link to the code repository

https://github.com/CSCYQJ/MICCAI23-ProtoContra-SFDA

Link to the dataset(s)

https://chaos.grand-challenge.org/

https://www.synapse.org/#!Synapse:syn3193805/wiki/217789


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript describes a novel source-free unsupervised domain adaptation framework for cross-modality medical image segmentation. The framework is composed of (1) feature alignment between the source classifier weights (serving as prototypes) and the target features, and (2) contrastive learning in which negative samples are selected based on the predictive probability and the rank of the categorical prediction. The proposed framework is evaluated on cross-domain segmentation datasets between abdominal CT and MRI.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is technically sound. Interestingly, the prototype-anchored feature alignment takes the marginal distributions of labels into consideration. The uncertainty-based criterion, which considers the rank of the prediction when selecting negative examples, is also insightful.

    The experiments are comprehensive. Ablation studies have been conducted for most of the key components, demonstrating the effectiveness of each component.

    The paper is well-written. The illustrations are neat and self-explanatory.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some key design choices should be elaborated on: e.g., in Eq. (1), both cosine distances and unnormalized inner products between the prototypes and the features are employed (it looks like cosine distances are used to adjust the softmax functions). Could the authors explain more about the reasoning behind these design choices?

    Convergence: The authors are encouraged to discuss whether the proposed framework converges naturally, or whether additional interventions such as early stopping are needed.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Judging from the checklist, there seem to be no major issues with reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    For a future extension, the authors are encouraged to supplement the following information:

    1. Discussing pros and cons in the clinical context for the related topics: unsupervised domain adaptation, source-free unsupervised domain adaptation, test-time domain adaptation, and domain generalization.

    2. Elaborating on the mechanism behind key design choices in more detail: for example, the forms of Eqs. (1)-(2).

    3. Evaluating the proposed framework on more datasets and/or segmentation network backbones.

    4. Ablating the proposed uncertainty-based negative sample selection rule. It appears to be very insightful, but does it really bring a significant benefit compared with vanilla pixel-level/feature-level contrastive learning?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is technically sound and it demonstrates improved segmentation accuracy compared with existing works.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Handling distributional shifts (domain adaptation/source-free adaptation & domain generalization) in medical image computing remains a critical but unsolved problem. Despite the marginal improvement in methodology, the manuscript exhibits more merits than drawbacks. Some of my concerns are properly addressed in the rebuttal. I would therefore maintain my rating.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a two-stage source-free domain adaptation (SFDA) framework, comprising a prototype-anchored feature alignment stage and a contrastive learning stage, for medical image segmentation. In the prototype-anchored feature alignment stage, the weights of the pre-trained pixel-wise classifier are utilized as source prototypes to preserve the information of the source features, and bi-directional transport is introduced to align the target features with the class prototypes by minimizing the expected cost. In the contrastive learning stage, pixels with unreliable predictions are used to obtain a more compact target feature distribution. Extensive experiments on a cross-modality medical segmentation task validate the proposed SFDA framework in large domain discrepancy settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a two-stage SFDA framework for medical image segmentation, comprising a prototype-anchored feature alignment stage and a contrastive learning stage. The paper is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The motivation (i.e., unreliable predictions are usually confused among only a few classes rather than all classes) and the methodology of the proposed contrastive learning stage are similar to those in [1]. Although the SFDA task addressed in this paper differs from the SSL task in [1], unreliable predictions on unlabeled (target) data are the core problem to be solved in both tasks; therefore, the novelty of the second contribution is limited. There is no ablation analysis on L_{T2P} and L_{P2T}, and no sensitivity analysis on the hyperparameter r_{l}. The proposed method aims to solve the unreliable-prediction problem, which may be caused by class imbalance. However, according to Fig. 1, the class ‘liver’ is a minority class, yet its performance is unsatisfactory and even inferior to that of the baseline models.

    [1] Wang Y, Wang H, Shen Y, et al. Semi-supervised semantic segmentation using unreliable pseudo-labels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022: 4248-4257.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be made public upon acceptance, as claimed by the authors.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It is suggested to provide an ablation analysis on L_{T2P} and L_{P2T}. A sensitivity analysis on the hyperparameter r_{l} should also be provided. Please state the reason why the performance of the minority class ‘liver’ is unsatisfactory.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the comments in Sections 6 and 9.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    1. The experiments do not show how the proposed method solves the unreliable-prediction problem or the class-imbalance problem, which are the motivation of the proposed method. 2. As I mentioned before, the motivation and methodology of the proposed contrastive learning stage are similar to those in [18]. I do not think the modification is significantly different. Therefore, I prefer to keep my score of weak reject.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a two-stage SFDA framework for medical image segmentation. Specifically, in the Prototype-anchored Feature Alignment (PFA) stage, the authors adopt the weights of the pre-trained pixel-wise classifier as source prototypes and introduce the bi-directional transport to align the target features with them. In the Contrastive Learning (CL) stage, they follow a semi-supervised training mechanism to make the target feature distribution more compact. The proposed method has been evaluated on a cross-modality abdominal multi-organ segmentation benchmark.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The use of the pre-trained pixel-wise classifier as source prototypes in SFDA is promising, and the designed target-to-prototype (T2P) loss function is interesting. The ablation experiments (Table 1) in the supplementary also validate its effectiveness.
    • The overall segmentation performance of the proposed method is superior to other SFDA methods on the cross-modality abdominal multi-organ segmentation benchmark.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The motivation in the introduction section does not fully match the method proposed in this paper. The conclusion of the second paragraph states that the performance of previous SFDA methods is limited by domain discrepancy. However, the proposed method does not directly address the issue of domain discrepancy, but rather is more of a source-free semi-supervised method. Therefore, the motivation of the proposed framework for SFDA should be further clarified.
    • The definition of L_{P2T} is somewhat unclear, and the ablation experiments in Table 1 of the supplementary show that L_{P2T} can be harmful to the segmentation performance; this should be further discussed.
    • It seems that the CL stage is simply adopted from [18]. What is the major difference between them?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The code is not provided.
    • The detailed U-Net structure (e.g., down-sampling layers, BN or IN for normalization) is not provided.
    • The detailed training strategy is provided.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The conclusion of the second paragraph states that the performance of previous SFDA methods is limited by domain discrepancy. However, the proposed method does not directly address the issue of domain discrepancy, but rather is more of a source-free semi-supervised method. It is encouraged that the authors further clarify the motivation of the proposed framework for SFDA.
    • The definition of L_{P2T} is a bit unclear, and it would help to further discuss the impact of L_{P2T}.
    • It would be relevant to clarify whether the experiments were carried out on 2D or 3D images, as well as to discuss the computational complexity of the CL stage, given its resource-intensive nature.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The design of the target-to-prototype (T2P) loss function and the superior overall performance over other SFDA methods support the acceptance of this paper. However, the limited novelty of the CL stage and the lack of clarity regarding the motivation lead me to lower the rating.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ efforts in addressing my concerns; I prefer not to change my decision.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper describes a source-free domain adaptation (SFDA) method for cross-modality medical image segmentation. The method integrates two parts: (i) matching the source classifier weights (which serve as prototypes) with the target features; and (ii) a contrastive learning stage, in which negative samples are selected based on the predictive probability and the rank of the categorical prediction. The method is evaluated in a cross-domain (CT-MRI) segmentation setting. The paper is technically sound, with good experiments and ablations. The proposed target-to-prototype loss function is also interesting.

    I would suggest that the authors clarify/discuss the following aspects in the rebuttal:

    • Could the authors provide the ablations requested by Reviewer 2?
    • It seems the proposed uncertainty-based negative sample selection does not really bring a significant benefit. Could the authors comment on this?
    • It seems that L_{P2T} could be detrimental to performance. It will be great to discuss this.
    • Is the contrastive learning stage the same as in [18]? Please clarify the differences with [18].




Author Feedback

We thank the reviewers for acknowledging that the method is technically sound and interesting (R1&R3&AC), the experiments are comprehensive (R1&AC), and the paper is easy to follow (R1&R2). To AC&R1,

  1. Motivation of uncertainty-based negative sample selection. We emphasize that many previous contrastive learning methods for segmentation ignore false negative samples, so unreliable pixels may be wrongly pushed away by the contrastive loss. We find that when negative samples are simply selected from reliable pixels (i.e., vanilla pixel-level CL), the mean Dice only increases from 82% to 82.9% (+0.9%). In comparison, when our uncertainty-based selection rule is used to select unreliable (uncertain) pixels as negative samples, the mean Dice increases from 82% to 86.1% (+4.1%), and the under-segmentation of the liver is largely improved, from 83.9% to 89.9% (examples in Fig. 3(b) and more in Fig. 1 of the supplementary; see also the sketch after this list). To AC&R2&R3,
  2. The difference with [18]. [18] targets the semi-supervised setting, where both labeled and unlabeled data are available for constructing the query, positive, and negative samples. In our source-free setting, however, no labeled data are available, so we take the source prototypes from the preceding PFA stage as positive samples instead of using the center of the query samples as in [18]. Note that this modification not only reduces the computational complexity of the CL stage but also further enhances the domain alignment between source and target. It also brings a 2.5% improvement in mean Dice.
  3. Ablations on L_{T2P} and L_{P2T} and the impact of L_{P2T}. We have shown the overall ablation results for each component, including L_{T2P} and L_{P2T}, in Tab. 1 of the supplementary. When L_{T2P} is combined with L_{P2T}, the mean Dice improves (76%->82%), especially for minority classes (e.g., L. Kidney: 64%->81%, R. Kidney: 67%->83%). As pointed out in Sec. 2.1, L_{P2T} ensures that the prototype of a minority class (e.g., kidney) is assigned sufficient target features. Thus, optimizing it alone (without combining it with L_{T2P}) encourages cluster sizes (i.e., class proportions) to be uniform, which explains the reduced Dice score of a majority class like the liver. To R2,
  4. Sensitivity analysis on r_{l}. We add a sensitivity analysis of r_{l} for the ‘MRI to CT’ direction. The mean Dice for r_{l} ranging from 1 to 5 is 83.7%, 85.3%, 86.1% (ours), 85.6%, and 84.1%. Intuitively, when r_{l} is small, false negative samples are not filtered out, and when r_{l} is large, negative samples become irrelevant to the corresponding query samples, making the discrimination less informative. Due to the page limit, we will add this analysis to our supplementary.
  5. The class ratio of ‘liver’. We must clarify that the liver is the majority class compared to the other organs (e.g., liver: ~9% in class ratio; kidney: ~1%), and Fig. 1(c) shows the category-wise probability of an unreliable pixel prediction, not the class ratio of each organ. To R1,
  6. The choice of cosine distance. If we apply LogSoftmax to define the point-to-point transport cost instead of the currently used cosine distance, the mean Dice drops by 3.2%.
  7. Model convergence. Early stopping is needed to prevent model collapse, as the loss decreases quickly within the first 200 iterations of the PFA stage. To R3,
  8. The motivation of the proposed framework. In light of the substantial domain disparity (e.g., MRI to CT), our first stage, the PFA stage, facilitates the adaptation of the source model to the target data, thereby achieving implicit feature alignment. Following this, we can derive more reliable pseudo-labels to effectively choose query samples during the CL stage, consequently amplifying the efficacy of target feature representation learning and further reducing the domain discrepancy.
  9. 2D or 3D. We split the 3D volumes into 2D slices for model training and re-stack the 2D predictions into 3D volumes for metric evaluation.
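As a supplement to points 1 and 4 above, the following is a minimal, hypothetical PyTorch sketch of an uncertainty-based negative-sample selection rule with a rank threshold r_{l}; the entropy threshold, function names, and exact filtering criterion are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch: high-entropy (unreliable) pixels are kept as negative
# candidates, and a pixel may serve as a negative for class c only if c is
# NOT among its top-r_l predicted classes, filtering likely false negatives.
import torch

def select_negatives(probs, query_class, r_l=3, entropy_quantile=0.8):
    """probs: (N, C) softmax predictions for N target pixels.
    Returns a boolean mask of pixels usable as negatives for `query_class`."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (N,)
    unreliable = entropy >= torch.quantile(entropy, entropy_quantile)

    # Rank of every class for each pixel: rank 0 = most probable class.
    ranks = probs.argsort(dim=1, descending=True).argsort(dim=1)  # (N, C)
    not_confused_with_c = ranks[:, query_class] >= r_l            # c outside top-r_l

    return unreliable & not_confused_with_c

# Example: 6 pixels, 5 classes; negatives for class 2.
probs = torch.softmax(torch.randn(6, 5), dim=1)
mask = select_negatives(probs, query_class=2)
```

This matches the intuition stated in point 4: a larger r_{l} discards more potential false negatives for the query class, but if it is too large the remaining negatives become irrelevant to the query samples.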




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Most of the concerns are properly addressed in the rebuttal.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The motivation is not clearly addressed in the proposed method. In addition, the novelty with respect to recent methods is not well clarified in the rebuttal.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a source-free domain adaptation framework for medical image segmentation, addressing the limitations of traditional unsupervised domain adaptation methods by utilizing only a well-trained source segmentation model and unlabeled target data. The rebuttal has adequately addressed the major concerns of the three reviewers and the AC, including the motivation of the uncertainty-based negative sample selection, additional ablation studies, and technical details of the proposed approach. Thus, this paper is recommended for acceptance.


