
Authors

Negin Ghamsarian, Javier Gamazo Tejero, Pablo Márquez-Neila, Sebastian Wolf, Martin Zinkernagel, Klaus Schoeffmann, Raphael Sznitman

Abstract

Models capable of leveraging unlabeled data are crucial in overcoming large distribution gaps between datasets acquired across different imaging devices and configurations. In this regard, self-training techniques based on pseudo-labeling have been shown to be highly effective for semi-supervised domain adaptation. However, the unreliability of pseudo-labels can hinder the capability of self-training techniques to induce abstract representations from the unlabeled target dataset, especially in the case of large distribution gaps. Since neural network performance should be invariant to image transformations, we exploit this fact to identify uncertain pseudo-labels. Indeed, we argue that transformation-invariant detections can provide more reasonable approximations of the ground truth. Accordingly, we propose a semi-supervised learning strategy for domain adaptation termed transformation-invariant self-training (TI-ST). The proposed method assesses pixel-wise pseudo-label reliability and filters out unreliable detections during self-training. We perform comprehensive evaluations for domain adaptation using three different modalities of medical images, two different network architectures, and several alternative state-of-the-art domain adaptation methods. Experimental results confirm the superiority of our proposed method in mitigating the lack of target domain annotation and boosting segmentation performance in the target domain.
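To make the core mechanism concrete, below is a minimal PyTorch-style sketch of the transformation-invariant pseudo-label filtering described in the abstract. It is an illustrative reading of the idea, not the authors' released implementation; `model`, `transform`, and the confidence threshold `tau` are assumptions.

```python
import torch
import torch.nn.functional as F

def ti_st_loss(model, x_unlabeled, transform, tau=0.9):
    """Sketch of transformation-invariant self-training (TI-ST).

    Pixels whose confident class prediction is unchanged under a
    non-spatial transformation (e.g., color jitter) are treated as
    reliable pseudo-labels; all other pixels are masked out of the
    self-training loss. Threshold and exact criterion are illustrative.
    """
    with torch.no_grad():
        p_orig = torch.softmax(model(x_unlabeled), dim=1)          # (B, C, H, W)
        p_aug = torch.softmax(model(transform(x_unlabeled)), dim=1)

        conf_orig, label_orig = p_orig.max(dim=1)                   # (B, H, W)
        conf_aug, label_aug = p_aug.max(dim=1)

        # Reliability mask: confident on both views and class-consistent.
        mask = (label_orig == label_aug) & (conf_orig > tau) & (conf_aug > tau)

    # Self-train on the original view with the filtered pseudo-labels only.
    logits = model(x_unlabeled)
    loss = F.cross_entropy(logits, label_orig, reduction="none")    # (B, H, W)
    return (loss * mask.float()).sum() / mask.float().sum().clamp(min=1.0)
```

In a full training loop, this unsupervised term would be added to the usual supervised loss computed on the labeled source-domain batch.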

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_32

SharedIt: https://rdcu.be/dnwcJ

Link to the code repository

https://github.com/Negin-Ghamsarian/Transformation-Invariant-Self-Training-MICCAI23

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method to filter out unreliable pseudo-labels based on transformation-invariant self-training, under the assumption that a pixel's segmentation label should not change under image transformations. The authors cast the problem as a domain adaptation task; however, the proposed method is more closely related to semi-supervised learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The presentation of the manuscript is clear.
    2. The proposed method is evaluated on public datasets and outperforms some state-of-the-art domain adaptation methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. This work is more closely related to semi-supervised learning than to domain adaptation. Most unsupervised domain adaptation methods do not leverage sample labels. The core task of this work is how to assign reliable pseudo-labels to unannotated data, which is the task of semi-supervised learning or learning from noisy labels.
    2. Because of this mis-framing of the problem, many related works in semi-supervised learning and noisy-label learning are omitted from the introduction and the comparison experiments (though some semi-supervised methods are included in Table 1).
    3. The novelty of the proposed method is weak. Transformation invariance has been widely exploited in deep learning in different ways, e.g., data augmentation, and more prominently in contrastive learning. This work presents a slightly different way to incorporate transformation invariance into the loss function. I cannot completely deny its novelty, but it is weak.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have not released the code yet. But, since the presentation of this paper is clear, I believe this work is reproducible by following the instructions outlined in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Perform a thorough review of semi-supervised learning and noisy-label learning, and perform comparison experiments with the state of the art in those fields.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the weakness outlined above.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The rebuttal did not address my concerns. First, “noisy label” here refers to noise in the pseudo-labels, not noise in the ground-truth labels. Many noisy-label-learning methods can learn useful information from noisy labels, and some can even automatically correct the label noise during learning. Second, this work does not try to narrow the domain gap, which is the main task in domain adaptation. The noise in the pseudo-labels can be caused by the domain gap or by the scarcity of labeled training data. Any semi-supervised learning method (not developed for domain adaptation) can be applied to the setting of this work. Conversely, the proposed method can be applied to the standard semi-supervised learning setting, where the noise in pseudo-labels is caused by the scarcity of training samples (not by the domain gap). Therefore, I think this work is more related to semi-supervised learning. In any case, terminology is a minor issue. The main concerns are the novelty and the performance gap to the SOTA, which were not addressed satisfactorily, especially the concern about novelty. Furthermore, it is still unclear why spatial transformations are not considered in this work. Some spatial transformations (e.g., flipping) can be performed efficiently; therefore, efficiency is not a good argument against using spatial transformations. Based on the above arguments, I maintain my rating of weak reject.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel transformation-invariant self-training (TI-ST) strategy to assess the reliability of pseudo-labels in the unsupervised domain adaptation paradigm. Since there is limited work exploring reliability assessment by the trained model itself, TI-ST is proposed to explore this direction. Experimental results on three medical datasets demonstrate the superiority of the proposed method compared with seven alternative methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The problem formulation involves semi-supervised learning and unsupervised domain adaptation, which is highly relevant for clinical medical image analysis. 2) The proposed strategy implements the reliability assessment in a one-stage framework, instead of the commonly used teacher-student paradigm, to achieve efficient training and testing in terms of time and computation. 3) The experimental design is relatively complete, involving three different modalities and seven state-of-the-art alternative methods, which effectively demonstrates the superior performance of the proposed TI-ST strategy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The compared methods contain only one domain adaptation-based algorithm ([17], published in 2019). More recent domain adaptation methods should be included in the results section to better assess the proposed strategy. 2) Experiments are only conducted for the pre-defined source and target datasets. For example, the “Spectralis” dataset is used as the source in the OCT experiment, and the “Topcon” dataset denotes the target domain. How about the results of exchanging them (Topcon as the source and Spectralis as the target domain)? 3) Why are only non-spatial transformations applied to the target domain dataset? In the literature on semi-supervised segmentation, spatial transformations such as random flips or rotations are also often applied to the unlabeled domain (dataset) to constrain the consistency between two transformed inputs, as long as the corresponding inverse transformation exists. 4) Is there a typo in Equation (4)? What does mu mean?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper provides most implementation details, including the datasets used, preprocessing, batch size, learning-rate schedule, loss functions, and optimizer parameters.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) Involve more domain adaptation-based methods in the results section to provide more evidence for the SOTA performance of the proposed method. 2) More transformations may be introduced in the target domain dataset to achieve stronger constraints on transformation-invariant learning.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a simple but effective strategy to assess pseudo-label reliability through a transformation-invariant self-training paradigm. The novelty is limited, but the experimental performance is impressive, especially the complete and informative results analysis, segmentation visualizations, and ablation studies.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Dear AC, after comprehensive consideration, we still adhere to our previous judgment: more experiments and comparisons are needed to support the claimed novelty.

    Thanks



Review #3

  • Please describe the contribution of the paper

    This paper proposes a self-training strategy that defines the pseudo-label using two versions of each image (original and non-spatially augmented). The method improves performance, compared to supervised learning, across three datasets with different modalities. It also achieves the best or comparable performance against other semi-supervised training methods across these datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is simple and the results demonstrate its effectiveness. In particular, compared to mean-teacher or pseudo-pixel-based methods, the proposed method doesn’t require additional device memory (as mean teacher needs to maintain two models) or heuristic-based algorithms (such as pseudo-pixel selection).

    The experiments are extensive:

    • two different backbone models are considered
    • three different modalities are considered, which makes the conclusions strong and robust.

    Overall the paper is also well written and easy to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main concern is regarding the supervised baselines. The reported Dice scores are all relatively low, and no other metrics, such as Hausdorff distance, are reported.

    Precisely,

    • OCT [1] reported Dice scores >60%, but the best model in this paper reaches only 50%.
    • MRI [15] reported Dice scores >90%, but the best model in this paper reaches only 74%.

    Although the methods are different, a gap of 10-20% Dice is still concerning.

    [1] https://ieeexplore.ieee.org/document/8653407
    [15] https://arxiv.org/abs/2002.03366

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is easy to understand; therefore, implementing it would not be difficult. Otherwise, the authors have included sufficient details regarding hyperparameters in the paper.

    According to the reproducibility checklist, the authors promised to release code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is self-contained. Potentially, including more metrics such as surface Dice and Hausdorff distance would be beneficial. Otherwise, an analysis explaining the relatively low Dice scores overall would be welcome.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a simple yet effective method, and extensive ablation studies demonstrate superior performance without the need for additional memory or heuristic algorithms. Despite somewhat concerning baseline performance, this paper is a self-contained, well-written study.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper received mixed reviews, with two (weak) accept and one weak reject recommendations. The idea of self-training with pseudo-label reliability assessment, the effectiveness of the proposed method, and the clarity of the presentation are recognized by the reviewers. The area chairs considered the paper and the reviewers’ comments and agree with the following concerns: (1) comparisons with SOTA UDA methods are lacking to some extent, and there is a large performance gap compared with SOTA results; (2) some evaluation details are lacking and the evaluation is not comprehensive; (3) the differences/contributions w.r.t. noisy-label learning methods are not clear. The authors are invited to provide a rebuttal addressing these questions.




Author Feedback

We thank all reviewers for their constructive feedback.

[R1, R2] Category of the proposed framework: Domain Adaptation (DA) methods can be supervised, semi-supervised, or unsupervised, depending on the labeled data available in the target domain. There are some general inconsistencies in the usage of the terms “unsupervised” and “semi-supervised”: some papers define semi-supervised domain adaptation as methods assuming few labeled and many unlabeled data in the target domain, and unsupervised domain adaptation as assuming no labeled data is available for the target domain. However, we agree that paradigms involving unlabeled data or a limited amount of labeled data from the target domain should both be referred to as semi-supervised. We thank the reviewers and will adapt the introduction accordingly. Under this paradigm, all evaluated baselines are semi-supervised methods that leverage labeled data in a supervised manner and propose strategies to learn from unlabeled data. The labeled data can be a small subset of the unlabeled data (domain generalization problem) or data acquired from other sites/devices/scanners introducing a larger domain gap (domain adaptation problem). Indeed, semi-supervised learning strategies are independent of the domains of the labeled and unlabeled data.

[R1, R2, Meta R] Baselines: We regret that, in adherence to the rebuttal guidelines, we are unable to present additional experimental results here. Nonetheless, we draw the attention of reviewers to our extensive evaluations showing the superiority of the proposed method against SoTA semi-supervised learning methods such as CPS (CVPR 2021), ST (CVPR 2022), and RL (MICCAI 2021), which have proven to be very strong baselines.

[R1, Meta R] Semi-supervised domain adaptation vs. noisy-label learning: Noisy-label learning refers to the condition where the available labels are not accurate or of high quality, whereas semi-supervised DA assumes that the available labels are reliable and accurate, and its objective is to bridge the distribution gap between the labeled and unlabeled data.

[R1] Novelty: We agree with R1 that data augmentation is widely exploited to improve supervised learning performance and is central to contrastive learning. However, the novelty of our proposed method lies in leveraging transformation invariance as a self-assessment strategy for indicating pseudo-label reliability. This novelty leads to the superiority of the proposed strategy, as we show extensively in our experiments.

[R2] Source and target combinations: Considering three datasets from different scanners for RETOUCH and six datasets from different sites for MRI, we had six and twelve choices of (source, target) pairs for RETOUCH and MRI, respectively. Due to space constraints, we reported results for one combination per modality.

[R2] Spatial transformations: We agree with R2 that spatial transformations could further filter out unreliable pseudo-labels. However, using spatial transformations also increases computational time, since the same spatial transformations (or their inverses) must also be applied to the detections so that the two predictions are compared in the same frame. Hence, we opted for non-spatial transformations for computational efficiency and simplicity.
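For illustration, R2's suggestion could be folded into the same reliability check by applying the inverse spatial transformation to the prediction of the transformed view before comparison, as in the following sketch using horizontal flipping. This is an illustrative assumption, not part of the paper's method; `model` and the threshold `tau` are hypothetical.

```python
import torch

def flip_consistency_mask(model, x, tau=0.9):
    """Sketch: reliability mask from an invertible spatial transform.

    The prediction on the horizontally flipped image is flipped back
    (the inverse transformation) so that both predictions are compared
    in the original image frame. Threshold `tau` is illustrative.
    """
    with torch.no_grad():
        p_orig = torch.softmax(model(x), dim=1)
        p_flip = torch.softmax(model(torch.flip(x, dims=[-1])), dim=1)
        p_flip = torch.flip(p_flip, dims=[-1])  # undo the flip on the prediction

        conf_o, lab_o = p_orig.max(dim=1)
        conf_f, lab_f = p_flip.max(dim=1)
        return (lab_o == lab_f) & (conf_o > tau) & (conf_f > tau), lab_o
```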

[R3, Meta R] Performance gap: (1) In OCT [1], training and testing are performed on the same dataset (e.g., train and test on Topcon). In our case, the supervised results refer to cross-domain segmentation performance (training on Spectralis and testing on Topcon). (2) In MRI [15], the authors explore multi-source learning, assuming labeled data from multiple sources are available, which can improve generalization performance. They train the network on the labeled data from all sites and test on each site separately. (3) The numbers of labeled images we use to train the networks for the RETOUCH and MRI datasets (provided in the experimental setup) are far lower because we perform four-fold validation.

[R2] Typo: We thank the reviewer for identifying the typo.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studies the problem of cross-domain medical image analysis and proposes a domain adaptation method based on self-training with transformation-invariant, highly confident predictions in the target domain. The method is evaluated on three different tasks and reports better results than baselines. The rebuttal addresses a number of the reviewers' concerns. Based on the rebuttal, a remaining major concern is the novelty of the method; using two augmentations of the data to perform contrastive learning is quite common in SSL, and it is not clear how such a method advances the state of the art.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal is convincing, in my opinion. The paper has some merits and the reviewers acknowledged the effectiveness of the method and clarity of the presentation.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After reading all the reviewers' comments and the paper itself, I agree with R1 that this paper causes confusion in its background setting. Given the incremental technical novelty and the lack of extensive comparison with SOTA methods (together with the performance gap), I lean toward rejecting this manuscript. I hope the above comments are helpful for further improving the manuscript.


