
Authors

Kyungsu Lee, Haeyun Lee, Georges El Fakhri, Jonghye Woo, Jae Youn Hwang

Abstract

Unsupervised domain adaptation (UDA) has attracted much attention in imaging-based diagnosis, due partly to the difficulty in labeling a large number of datasets in target domains, which otherwise adversely affects the diagnostic performance of well-trained deep learning models in a source domain. UDA has enabled deep learning models to make use of large-scale datasets that are acquired in various domains for model deployment. However, UDA has deficiencies in carrying out adaptive feature extraction, when dealing with data without their labels in a target unseen domain. To alleviate this, we propose advanced test-time fine-tuning UDA to better utilize latent features of datasets in an unseen target domain at diagnosis. Specifically, our framework is based on an auto-encoder-based network architecture that fine-tunes the model itself, where our framework learns knowledge pertaining to an unseen target domain at the fine-tuning phase. Additionally, a re-initialization module is introduced to inject randomness into network parameters so that our framework is optimized to a local minimum that is well-suited for an unseen target domain. We also provide a mathematical justification to demonstrate the benefits of our framework for better feature extraction. We carried out experiments on UDA segmentation tasks using breast cancer datasets acquired from multiple domains. Experimental results showed that our framework achieved state-of-the-art performance, compared with other competing UDA models, in segmenting breast cancer on ultrasound images from an unseen domain, which supports its clinical potential in better diagnosing breast cancer in various target domains.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_52

SharedIt: https://rdcu.be/dnwdz

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an advanced test-time fine-tuning unsupervised domain adaptation (UDA) method for breast cancer segmentation. The method is trained with supervision in the source domain and then fine-tuned with self-supervision in the target domain.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is different from most domain adaptation methods because it does not generate pseudo-labels in the target domain, rather it fine-tunes the network itself. The experimental evaluation is particularly strong, with results of two methods on three datasets and an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Test-time fine-tuning is different but not entirely novel (see some references below). However, the authors do not review prior work on test-time fine-tuning for DA.

    Test-Time Training with Self-Supervision for Generalization under Distribution Shifts, ICML 2020; Model Adaptation: Unsupervised Domain Adaptation Without Source Data, CVPR 2020; Self-supervised Test-Time Adaptation for Medical Image Segmentation, MLCN 2022; Tent: Fully Test-Time Adaptation by Entropy Minimization, ICLR 2021; Uncertainty Reduction for Model Adaptation in Semantic Segmentation, CVPR 2021

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is not yet available but the authors state, in the reproducibility checklist, that it will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Expand the related-work paragraph with papers on test-time fine-tuning and highlight the differences of the proposed method. Correct typos such as Ceof → Coef, Usion → Fusion, Fusionnet → FusionNet.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some relevant references are missing. It is difficult to assess the novelty of the method without a review of prior work on test-time fine-tuning.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    My opinion did not change after reading the authors’ rebuttal.



Review #2

  • Please describe the contribution of the paper

    The authors propose a new self-supervised domain adaptation framework for the breast cancer application. The model's novelties are sufficient, with well-defined encoder, decoder, and domain adaptation components together with a GAN network. The results are also good overall across three datasets. The authors also provide detailed ablation studies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The main strength of this work, I think, is its state-of-the-art performance on three datasets.
    • The proposed method, though not really novel, is sufficient for the MICCAI community. Also, the idea of injecting random noise in Eq. (5) is interesting to me.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are some confusing points to the Reviewers:

    1. Why do the authors concatenate the outputs of the segmentation decoder (D_{Seg}) and D_{FT} before feeding them into the header H (Line 13 in the Algorithm, Figure 1 right side)? D_{FT} and the encoder E are used to reconstruct input images (images in the target domain), while D_{Seg} and E are used to generate mask predictions. Can you elaborate on this point? An ablation study showing the benefits of this strategy would also be valuable.

    2. It is hard for the reader to understand the subsection “Benefits of our dual-pipeline”. Again, the root problem is the concatenation of the outputs of the two decoders.

    3. The authors provide detailed ablation results; however, it is difficult to tell the differences between the cases. This reviewer believes that using equation numbers might be a better option. In addition, can the authors provide ablation results for using Eq. (5)?

    4. The first paragraph of the Introduction is wordy and does not contain much important information. The authors should compress it and focus on the key points.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is fine with me.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I think the presentation should be improved on the points mentioned in (6). Also, make the introduction and related work shorter and more concise, focusing on highlighting the main differences between the proposed method and other approaches.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper's contributions outweigh its weaknesses, though the presentation has to be improved.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors address most of my concerns. If the writing can be improved, I think this would be fine for MICCAI.



Review #3

  • Please describe the contribution of the paper
    • Proposal of an advanced test-time fine-tuning unsupervised domain adaptation method.
    • Evaluation on UDA segmentation tasks using breast cancer datasets; the experimental results demonstrate satisfactory performance.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Important Problem: Test-time fine-tuning Unsupervised Domain Adaptation is an important problem with real-world applications.
    • Neat Method: The methodology proposed in the paper is neat and effective, utilizing an auto-encoder-based network architecture for fine-tuning the model itself during diagnosis, and a re-initialization module for injecting randomness into network parameters to optimize the framework for the unseen target domain.
    • Elegant Analysis: A mathematical justification is provided, which enhances the credibility and rigor of the approach.
    • Comprehensive ablation study: The paper includes a comprehensive ablation study, including both numerical experiments in table 1 and t-SNE visualization in figure 6 in Sec. 3.3 ablation study, demonstrating the effectiveness of parameter fluctuation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Writing clarity: the writing should be improved; these issues make the paper difficult to read and understand. (a) Some symbols and formulas are unclearly defined. For example, it is unclear what \hat{s} in Equation 1 means; if it is the segmentation ground truth, it does not match \hat{y} in Figure 1. The usage of \supseteq in Formula 2 is not right. Please check all other formulas. (b) The function of the discriminator C is unclear. Why perform adversarial learning between the image and the generated image? This should be explained with at least one sentence. (c) In Sec. 3.3, PF should be marked as the abbreviation of parameter fluctuation, and the meaning of Offset should be explained.

    • Limited comparison: other SOTA test-time domain adaptation methods should be compared. One is provided for reference [1].

    • Lack of resource consumption reporting: Please report the test-time memory and time consumption, which are crucial metrics for evaluating the feasibility of the proposed approach in real-world applications.

    • Mathematical justification issues: the convolution operation is usually followed by a non-linear layer, which should be considered. Also, the justification steps are not easy to follow; the authors could clarify them to improve the readability and understandability of the paper.

    [1] Hu, Minhao, et al. “Fully test-time adaptation for image segmentation.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24. Springer International Publishing, 2021.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All relevant checklist items are checked, but the hyper-parameters and the training process should be reported, at least in an appendix.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Please pay more attention to paper clarity.
    2. Report the resource consumption.
    3. Complete the mathematical justification.
    4. Provide a more comprehensive method comparison.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper's motivation and idea are sound and important, and the proposed approach has potential real-world applications. I like it. However, the writing could be improved to make the paper more accessible and easier to follow. Additionally, the paper has some insufficiencies, such as the lack of resource-consumption reporting and the limited comparison with other state-of-the-art methods. I would improve my score if the identified issues are addressed.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    After carefully considering the rebuttal from the authors, I still believe that the clarity of this paper needs significant improvement. I found it difficult to understand the whole process, and the authors should prioritize enhancing the description of their method. Therefore, I have decided to decrease my score to a strong reject.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers criticized the paper for its lack of a detailed review of the related work and comparisons for innovations. The reviewers also ask for improving the writing and reporting resource consumption. I suggest that you take the rebuttal opportunity to win over all reviewers by carefully addressing each of the weaknesses indicated by all three reviewers.




Author Feedback

For Rvw #1: We appreciate your suggestions and will incorporate the listed references to enhance the manuscript. However, we would like to highlight that our paper offers more than a simple implementation of test-time DA (TTDA). It introduces two novel contributions: (1) a mathematical justification of the dual pipeline for TTDA; and (2) the PF method, which randomizes the distribution of the parameters based on that mathematical justification. We kindly request that the reviewer consider these techniques as substantial contributions to the advancement of TTDA. Additionally, we will thoroughly proofread the manuscript to improve its overall quality.
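
As we read it, the PF mentioned above perturbs every parameter with zero-mean random noise, whereas the OFFSET ablation baseline discussed later in the feedback adds the same constant to every parameter. A minimal sketch of that distinction, where the function names and noise scale are our assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def parameter_fluctuation(params, sigma=0.01):
    """Hypothetical PF: add zero-mean Gaussian noise to every weight
    tensor, randomizing the parameter distribution before fine-tuning."""
    return [w + rng.normal(scale=sigma, size=w.shape) for w in params]

def offset(params, c=0.01):
    """OFFSET ablation baseline: shift every weight by the same constant."""
    return [w + c for w in params]

weights = [np.ones((3, 3)), np.zeros(4)]
pf_w = parameter_fluctuation(weights)
off_w = offset(weights)
# PF roughly preserves each tensor's mean but adds variance;
# OFFSET changes the mean and adds no variance.
```

The contrast motivates why PF can push the model toward a different local minimum while OFFSET merely translates the whole parameter set.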

For Rvw #2:

3-1 and 3-2. The architectures of the two decoders are basically the same. However, since the headers for image reconstruction and segmentation-mask generation are different (different outputs), a new header incorporating D_{FT} and D_{Seg} is devised to leverage the outputs of both decoders. Moreover, D_{gen} = D_{FT} is fine-tuned during the fine-tuning step, so D_{FT} learns the knowledge of the input domain via image reconstruction. The two distinct sources of knowledge from D_{Seg} and D_{FT} enable the network to exploit target-domain knowledge and produce precise predictions. The subsection on the dual pipeline gives the mathematical justification, in terms of information theory, for this enhancement of target-domain knowledge. We will clarify the subsection based on the reviewers' constructive comments.

3-3. We apologize for the confusion; some descriptions are indeed missing from the ablation-study results. We will add the missing descriptions and clarify the table by using equation numbers, as the reviewer suggested. Regarding Eq. (5), it denotes the parameter fluctuation (PF), and the table contains “pre-train+PF” and “pre-train+PF+fine-tuning”. PF alone cannot yield a prediction, since PF is a randomization method and a model that has not been pre-trained cannot make precise predictions.

3-4 and 6. We will improve the quality of the manuscript by removing redundant sentences and shortening the paragraphs for conciseness. In the extra pages, we will provide a clearer mathematical justification and a table of mathematical notations, as requested by Rvw #3.
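
The dual-pipeline wiring the rebuttal describes — a shared encoder E, a segmentation decoder D_{Seg}, a reconstruction decoder D_{FT}, and a header H fed with the concatenation of both decoder outputs — can be sketched as follows. This is a toy illustration with made-up linear layers and shapes, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w_e):
    return np.tanh(x @ w_e)                  # E: shared latent features

def decode_seg(z, w_s):
    return 1 / (1 + np.exp(-(z @ w_s)))      # D_Seg: mask probabilities

def decode_ft(z, w_f):
    return z @ w_f                           # D_FT: image reconstruction

def header(f_seg, f_ft, w_h):
    # the point raised by the reviewer: both decoder outputs are fused
    f = np.concatenate([f_seg, f_ft], axis=-1)
    return 1 / (1 + np.exp(-(f @ w_h)))      # H: final mask prediction

x = rng.normal(size=(2, 8))                  # two toy "images" of 8 features
w_e = rng.normal(size=(8, 4)); w_s = rng.normal(size=(4, 8))
w_f = rng.normal(size=(4, 8)); w_h = rng.normal(size=(16, 8))

z = encoder(x, w_e)
recon = decode_ft(z, w_f)      # reconstruction loss fine-tunes E and D_FT
mask = header(decode_seg(z, w_s), decode_ft(z, w_f), w_h)
```

Because D_FT is the branch updated at test time, concatenating its output gives H a view of target-domain information that the frozen segmentation branch alone would lack.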

For Rvw #3

  1. Clarity: (a) \bar{s} is a label (Algorithm I), and \hat{y} is a prediction; we apologize for the typo involving ⊆. (b) The discriminator aims to discriminate between the original and generated images, thereby driving the auto-encoder to generate an image identical to the original. This procedure enables the network to learn the knowledge of the input domain. (c) OFFSET denotes adding a constant offset value to all parameters; we apologize for the missing definition. Again, we apologize for the missing descriptions and errors, and we will thoroughly proofread the manuscript to improve its overall quality.

  2. Comparison: Thank you for the recommendation; we will update the manuscript with a comparison to [1]. As a brief illustration, our model provides a 0.102 improvement in Dice coefficient with a fast FPS of 7.2 compared to [1]. Because of the TTFT method, we report FPS rather than FLOPs.

  3. Computation: With an image size of 512, compared to U-Net, whose parameter count and FPS are 31.0M and 27.1, ours are 50.4M and 18.7. Thanks to PF and partial tuning, our model achieves a fast prediction time. Our model has two advantages: (1) it provides precise predictions with few parameters (others > 150M); and (2) it provides predictions at a speed similar to non-TTFT models. We will update the experiments in the final version of the manuscript.

  4. Math: Thank you for the comments; we understand that the mathematical expressions are wordy and need improvement. We also acknowledge the non-linearity in CNNs, which introduces a gap between the theory in Eq. (4) and real-world behavior. We will discuss this gap in the limitations, add a table of mathematical notations, and include additional paragraphs of proofs in the appendix.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers’ recommendations are split: R1: weak accept; R2: accept; and R3: reject. I agree with R2 and R3 that the paper needs to be improved significantly in writing for the camera-ready version before publishing it at MICCAI.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper has some merits even though the novelty is limited. The experimental evaluation is particularly strong, with state-of-the-art performance on three datasets and ablation studies. I encourage the authors to address all reviewers' comments in the final version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a test-time fine-tuning UDA framework that utilizes the latent features of the data in the unseen target domain by fine-tuning the model itself during diagnosis. The paper received quite diverse reviews, with the reviewers highlighting a missing discussion of prior work, limited comparisons, mathematical-justification issues, and a not very clear description of the method with a missing ablation. The authors submitted a rebuttal to address these points. The meta-reviewer, while agreeing with R3 about the clarity of the method, gives more weight to the good performance of the method and the importance of the problem, making it an interesting contribution to MICCAI.


