Authors

Yuhao Huang, Xin Yang, Xiaoqiong Huang, Xinrui Zhou, Haozhe Chi, Haoran Dou, Xindi Hu, Jian Wang, Xuedong Deng, Dong Ni

Abstract

Deep classifiers may encounter significant performance degradation when processing unseen testing data from varying centers, vendors, and protocols. Ensuring the robustness of deep models against these domain shifts is crucial for their widespread clinical application. In this study, we propose a novel approach called Fourier Test-time Adaptation (FTTA), which employs a dual-adaptation design to integrate input and model tuning, thereby jointly improving the model robustness. The main idea of FTTA is to build a reliable multi-level consistency measurement of paired inputs for achieving self-correction of prediction. Our contribution is two-fold. First, we encourage consistency in global features and local attention maps between the two transformed images of the same input. Here, the transformation refers to Fourier-based input adaptation, which can transfer one unseen image into source style to reduce the domain gap. Furthermore, we leverage style-interpolated images to enhance the global and local features with learnable parameters, which can smooth the consistency measurement and accelerate convergence. Second, we introduce a regularization technique that utilizes style interpolation consistency in the frequency domain to encourage self-consistency in the logit space of the model output. This regularization provides strong self-supervised signals for robustness enhancement. FTTA was extensively validated on three large classification datasets with different modalities and organs. Experimental results show that FTTA is general and outperforms other strong state-of-the-art methods.
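The Fourier-based input adaptation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the `beta` window size, and the `lam` interpolation weight are assumptions made for illustration, following the general idea of exchanging low-frequency amplitude between two images while keeping the phase of the image being adapted.

```python
import numpy as np


def fourier_style_transfer(content, style, beta=0.1, lam=1.0):
    """Blend the low-frequency amplitude of `style` into `content`.

    content, style: 2-D arrays of equal shape (e.g. a test image and a source image).
    beta: fraction of each axis covered by the swapped low-frequency window (hypothetical name).
    lam:  interpolation weight; 0 keeps the content amplitude, 1 uses the style amplitude.
    """
    if content.ndim != 2 or content.shape != style.shape:
        raise ValueError("expected two 2-D arrays of the same shape")
    if not (0.0 <= beta <= 1.0 and 0.0 <= lam <= 1.0):
        raise ValueError("beta and lam must lie in [0, 1]")

    # Centered spectra: after fftshift the zero frequency sits at (h // 2, w // 2).
    f_content = np.fft.fftshift(np.fft.fft2(content))
    f_style = np.fft.fftshift(np.fft.fft2(style))
    amp_c, phase_c = np.abs(f_content), np.angle(f_content)
    amp_s = np.abs(f_style)

    h, w = content.shape
    ch, cw = h // 2, w // 2
    bh, bw = int(h * beta / 2), int(w * beta / 2)  # half-size of the window
    rows = slice(ch - bh, ch + bh + 1)
    cols = slice(cw - bw, cw + bw + 1)

    # Interpolate amplitudes inside the symmetric low-frequency window only.
    amp_mixed = amp_c.copy()
    amp_mixed[rows, cols] = (1.0 - lam) * amp_c[rows, cols] + lam * amp_s[rows, cols]

    # Recombine with the content phase and return to image space.
    mixed = amp_mixed * np.exp(1j * phase_c)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```

In this sketch, intermediate values of `lam` produce style-interpolated images between the original input and its fully transferred version; how the paper actually samples styles and interpolation weights is not specified on this page.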

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_22

SharedIt: https://rdcu.be/dnwAT

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper proposes a Fourier Test-time Adaptation (FTTA) framework to improve classification accuracy in medical images where a domain gap exists. The authors utilize Fourier-based domain adaptation to reduce the domain gap. Meanwhile, multi-level consistencies, including smooth global and local consistency and logit space consistency, are proposed to regularize the outputs of the prediction model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper adopts Fourier-based input adaptation to transfer the style of source domain images to target domain inputs, reducing the domain gap between the training source data and unseen target domain data. (2) The paper proposes using learnable weights for global and local feature integration to smooth the hard consistency, reducing the adaptation difficulties. (3) The paper proposes to regularize the predictions of the model by enforcing consistency between the style-interpolated images and the linear combination of the testing images and transformed images.
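One possible reading of point (3) is a mixup-style constraint: the logits of a style-interpolated image should match the same interpolation applied to the logits of its two endpoint images. The sketch below illustrates that reading only. The function name, the argument names, and the choice of a mean-squared distance are assumptions, not details taken from the paper.

```python
def interpolation_consistency(logits_mix, logits_a, logits_b, lam):
    """Mean squared gap between the logits of a style-interpolated image
    and the lam-weighted combination of the logits of its two endpoints.

    logits_mix: logits predicted for the interpolated image
    logits_a, logits_b: logits predicted for the two endpoint images
    lam: interpolation weight in [0, 1] used to build the interpolated image
    """
    if not (len(logits_mix) == len(logits_a) == len(logits_b)):
        raise ValueError("all logit vectors must have the same length")
    if not logits_mix:
        raise ValueError("logit vectors must be non-empty")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Target: the same linear interpolation, applied in logit space.
    target = [(1.0 - lam) * a + lam * b for a, b in zip(logits_a, logits_b)]
    return sum((m - t) ** 2 for m, t in zip(logits_mix, target)) / len(target)
```

A value of zero means the model behaves linearly along the interpolation path between the two styles; larger values would be penalized during adaptation under this reading.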

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The title of the paper contains the keyword “Test-Time”, indicating the method should not have access to the training source data in the adaptation process. However, Fourier-based adaptation needs training data in the source domain to transfer style, which seems to conflict with the title and is not clearly explained. (2) The consistency loss functions should be formulated more clearly and in more detail, such as the formulation of the integration of local and global features in the “Smooth Consistency for Global and Local Constraints” section. (3) The procedure for selecting representative styles of the source domain is not thoroughly explained. The differences between the two groups of target images after Fourier-based transformation are not adequately discussed either. (4) Comparison experiments are mainly conducted on the private dataset, which limits the evidence for the framework’s validity. (5) There are some typos. For example, in Formula 1, the high-frequency component should be that of the target image instead of the source image.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work is likely to be reproducible, and the code is going to be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Authors may consider giving a more specific formulation of the loss functions, as mentioned above. (2) A detailed description of how representative styles in the source domain are selected should be provided. Adding visualizations would help. (3) Authors are encouraged to run more comparison experiments on public datasets to make the work sounder. A quantitative comparison for Fig. 4 is also recommended.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study is technically sound, and the general idea is well described. The proposed domain adaptation framework is practical and innovative and addresses the problem of domain gap in a reasonable manner. The presentation of the work is complete, coherent, and well-structured. However, using data from the source domain for Fourier-based style transfer risks violating the premise of Test-Time Adaptation (TTA). Moreover, more details in the formulation should be provided to present the work more clearly.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a Fourier test-time adaptation framework (FTTA) to improve the model’s robustness against the domain shift problem, which jointly updates the input and model for online refinement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The ablation study conducted and the comparison with other methods appear to be promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Adding some details could improve the manuscript. (More details in the comments section)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The fact that the authors have stated their intention to release the code is promising, and I am looking forward to examining the code in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The manuscript is technically sound. Here are a few questions and comments that I believe could enhance its quality:

    1. In Fig. 3, a testing image xt is transferred into two source-like images (xt1 and xt2). In this case, is xt2 the same as xt? In the figure, they look similar. Additionally, I would like to know which image(s) from the source domain were used for this transformation, and how the authors chose those images.

    2. In Fig. 3, using linear style interpolation, two groups of images are obtained for subsequent smooth consistency measurement at global features and local visual attention. It seems that some details are missing regarding this part. For instance, how many images does each group include?

    3. The authors noted that they employed standard data augmentation techniques such as rotation, flip, contrast transformation, etc., during the training process. It would be preferable if they provided a comprehensive list of all the techniques used rather than using “etc.” Alternatively, if they utilized a specific augmentation package or library, they could specify it.

    4. It is stated that although Fourier domain adaptation methods proposed in [21, 22] are effective, they require obtaining sufficient target data in advance. However, I do not perceive this as an inherent constraint in those methods. Even a single target image’s low-frequency spectrum can be utilized to adapt all source images using those techniques. While additional target images may be beneficial, they are not mandatory. Moreover, it is unclear how the proposed method resolved this concern and needed fewer images from the target domain, compared to those methods. It would be valuable if the authors could provide insights on this matter.

    5. Please acknowledge the highly relevant work in this domain: Sharifzadeh, M., Tehrani, A.K., Benali, H. and Rivaz, H., 2021, September. Ultrasound domain adaptation using frequency domain analysis. In 2021 IEEE International Ultrasonics Symposium (IUS) (pp. 1-4). IEEE.

    6. Regarding Table 2, the authors have indicated that their proposed method (referred to as “Ours”) achieved significant improvements over the Baseline. However, it is unclear whether this improvement was statistically significant. If it was, it would be advantageous if the authors could also include the p-value in their report.

    7. In equation (1), since the low-frequency content of x_s is replaced with x_t, should not the obtained amplitude frequency be annotated as A-x_s’, instead of A-x_t’? Because now it is a transferred version of the source images (s’).

    8. “… to only one given test image once Recently …” –> Period is missing, and probably “once” needs to be removed?
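For the p-value requested in comment 6, one common choice when two classifiers are evaluated on the same test samples is McNemar's exact test on the paired correct/incorrect outcomes. The sketch below is illustrative only: the function and argument names are hypothetical, and this page does not state which test, if any, the authors used.

```python
from math import comb


def mcnemar_exact(baseline_correct, adapted_correct):
    """Two-sided exact McNemar test on paired per-sample correctness flags.

    Both arguments are equal-length sequences of booleans, one entry per
    test sample, saying whether each model classified that sample correctly.
    Returns the p-value under the null that both models err equally often.
    """
    if len(baseline_correct) != len(adapted_correct):
        raise ValueError("both sequences must cover the same test samples")

    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(baseline_correct, adapted_correct) if x and not y)
    c = sum(1 for x, y in zip(baseline_correct, adapted_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0

    # Binomial(n, 0.5) tail up to the smaller discordant count, doubled.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

The test uses only the samples on which the two models disagree, which suits a before/after comparison of a baseline and its test-time-adapted version on one test set.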

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript is technically sound, and the evaluations are satisfactory.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a novel approach called Fourier Test-time Adaptation (FTTA) to improve the robustness of deep models against domain shifts. FTTA employs a dual-adaptation design to integrate input and model tuning, thereby jointly improving the model’s robustness. It introduces a reliable multi-level consistency measurement of paired inputs for achieving self-correction of prediction. Extensive experiments on three large datasets validate that FTTA is effective and efficient, achieving state-of-the-art results over strong TTA competitors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed Fourier Test-time Adaptation (FTTA) approach is a novel and general framework to improve classification robustness.
    2. FTTA introduces a reliable multi-level consistency measurement of paired inputs for achieving self-correction of prediction.
    3. This paper provides extensive experiments on three datasets, which validate that FTTA is effective and efficient, achieving state-of-the-art results over strong TTA competitors.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited explanation of the proposed method: This paper provides a brief overview of the proposed FTTA approach, but it lacks detailed explanations of some key aspects. Please see the detailed questions in the comments section below.
    2. The proposed method is not well motivated in the paper.
    3. This paper does not include a convincing ablation study to investigate the contribution of each component in FTTA to its overall performance.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the reproducibility checklist filled out by the authors, they have provided sufficient details about their experimental setup.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. What is the difference between TTT and the TTA proposed in this paper? Does the TTA also need training with a backward pass on a loss at test time?
    2. The proposed method incurs a much higher computation cost and resembles an ensemble method with Fourier augmentation. Could you please show a comparison of computation cost?
    3. Are the evaluation results of FTTA stable? It seems that performance is highly affected by the Fourier augmentation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The proposed method is novel and effective.
    2. The motivation of this paper and the detailed design of FTTA are unclear.
    3. The experimental results do not convince me to use TTA in this way.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper introduces a Fourier Test-time Adaptation (FTTA) framework, a novel approach aimed at improving classification accuracy in medical images with domain gaps. The authors leverage Fourier-based domain adaptation to decrease the domain gap and propose multi-level consistencies for output regularization of the prediction model. The paper introduces a reliable multi-level consistency measurement that aids in self-correcting predictions. Comprehensive experiments on three datasets have shown that FTTA is effective and efficient, achieving superior results over strong TTA competitors.

    The weaknesses noted by the reviewers are minor or generic issues. The paper is viewed favorably, presenting valuable contributions with some minor concerns. It is recommended for provisional acceptance.




Author Feedback

We thank the reviewers for their constructive comments. We will address them in the camera-ready/journal version of the paper.


