
Authors

Alexander Bigalke, Lasse Hansen, Mattias P. Heinrich

Abstract

Recent deep learning-based methods for medical image registration achieve results that are competitive with conventional optimization algorithms at reduced run times. However, deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data. While typical intensity shifts can be mitigated by keypoint-based registration, these methods still suffer from geometric domain shifts, for instance, due to different fields of view. As a remedy, in this work, we present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain. We build on a keypoint-based registration model, combining graph convolutions for geometric feature learning with loopy belief optimization, and propose to reduce the domain shift through self-ensembling. To this end, we embed the model into the Mean Teacher paradigm. We extend the Mean Teacher to this context by 1) adapting the stochastic augmentation scheme and 2) combining learned feature extraction with differentiable optimization. This enables us to guide the learning process in the unlabeled target domain by enforcing consistent predictions of the learning student and the temporally averaged teacher model. We evaluate the method for exhale-to-inhale lung CT registration under two challenging adaptation scenarios (DIR-Lab 4D CT to COPD, COPD to Learn2Reg). Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data. Source code is available at https://github.com/multimodallearning/registration-da-mean-teacher.
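The two core Mean Teacher ingredients the abstract refers to, a temporally averaged (EMA) teacher and a consistency loss between student and teacher predictions, can be illustrated with a minimal NumPy sketch. This is a toy illustration, not the authors' implementation; the function names and the MSE choice of consistency loss are assumptions:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Temporal averaging: after each training step, the teacher's weights
    track an exponential moving average of the student's weights."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consistency_loss(student_disp, teacher_disp):
    """Mean squared difference between the displacement vectors predicted by
    student and teacher for the same (unlabeled) keypoint cloud."""
    return float(np.mean((student_disp - teacher_disp) ** 2))

# Toy example: identical predictions yield zero consistency loss.
disp = np.zeros((5, 3))  # 5 keypoints, 3D displacements
loss = consistency_loss(disp, disp)
```

On unlabeled target-domain data, only the consistency loss drives the student's gradients; the teacher is never updated by backpropagation, only through `ema_update`.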

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_27

SharedIt: https://rdcu.be/cVRS9

Link to the code repository

https://github.com/multimodallearning/registration-da-mean-teacher

Link to the dataset(s)

https://learn2reg.grand-challenge.org/Learn2Reg2021/

https://med.emory.edu/departments/radiation-oncology/research-laboratories/deformable-image-registration/downloads-and-reference-data/index.html


Reviews

Review #1

  • Please describe the contribution of the paper

    This work builds on the mean-teacher framework (which was originally proposed for semi-supervised learning): it adapts the perturbations (e.g., scaling, translation), enforces consistency, and introduces a GCN to extract features. They also adapt the LBP-based optimization to obtain the displacement fields. As such, the method performs well on exhale-to-inhale lung registration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written and easy to follow.
    2. Adapting the mean-teacher consistency for registration is interesting.
    3. The method is simple and shows effectiveness in exhale-to-inhale lung registration.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Unclear and somewhat misleading motivation. Regarding the domain shift, it is somewhat incorrect to argue that “The domain shift consists in exhale scans from the target domain exhibiting a cropped field of view such that upper and lower parts of the lungs are partially cut off”. This is not the commonly understood notion of domain shift (e.g., CT vs. MRI, with obviously different intensity distributions). In particular, the cut-off problem appears to stem from the pre-alignment preprocessing in the Learn2Reg challenge. I think it is somewhat misleading to frame this issue as a domain shift problem.

    2. This work is built on top of the mean-teacher framework (which was originally proposed for semi-supervised learning). Adapting the mean-teacher method to registration is interesting, yet the motivation is weak and ambiguous here. The paper essentially follows a semi-supervised paradigm and uses another dataset as unlabeled data. I would therefore simply regard the method as a semi-supervised registration method; the claim of tackling a domain adaptation problem is not strong. Moreover, in the image registration community, acquiring ground truth is often infeasible, especially for methods that are not landmark-based, which is why unsupervised registration has become popular. Thus, this method may lack practical value.

    3. Following the above concerns, the experiments are weak. In particular, since the proposed approach is inherently a semi-supervised registration method that learns from both labeled and unlabeled data (without landmark labels), it is unfair to compare it only against source-only and target-only models. The method exploits more data (i.e., the unlabeled data) during training, so the improvements are not surprising. When training only with limited labeled data, a model may struggle with overfitting, so a performance drop on the test set is to be expected. It is therefore hard to evaluate the efficacy of this paper.

    4. This is a discussion point regarding the motivation for using the mean-teacher framework in registration, since I noticed an interesting work [1] that utilizes a mean-teacher design to achieve adaptive regularization weighting during training for unsupervised registration. From [1], another insight into using a mean-teacher design in registration is that the solutions to this ill-posed problem may vary greatly between training steps; [1] therefore enforces consistency to exploit the “temporal” information related to the ill-posedness. The authors could further discuss such similar works, even though they have different motivations. [1] Xu, Z, et al. “Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration.” arXiv preprint arXiv:2107.02433 (2021).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good. Authors have provided the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The motivation should be further clarified and reconsidered. In my view, this is inherently a semi-supervised registration method that attempts to utilize unlabeled data via consistency. Claiming that this is a domain adaptation problem will seem strange to the image registration community.
    2. The experiments should be redesigned to support the arguments and motivations.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation is a little misleading with insufficient examples and experiments to support the arguments.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    6

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision

    I agree with the extended discussion and summary by the AC. The concern regarding (I) whether this problem can be categorized as DA remains unclear; this is hard to define (as also pointed out by the AC), despite the authors' rebuttal response. I think the problem definition and motivation should be clearly reconsidered. There is also no consensus on the DA claim in the Learn2Reg lung registration challenge. Considering the (potentially) misleading claim, I suggest the authors use datasets with a commonly defined domain shift to support their contribution.

    This also holds for the evaluation (I appreciate the good extended discussion by the AC). If this domain shift is not commonly accepted as such in registration (or even in segmentation), then the method is inherently semi-supervised registration. Thus, the AC's point that “a sense of how much improvement is due to the increased number of data versus truly due to the novel method needs to be discussed” is very important. Yet this is hard to address in a rebuttal.

    Overall, I appreciate some merits. I suggest a modified (and safer) claim for a future resubmission.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel approach to domain-adaptive registration. They introduce a method based on keypoint registration, the Mean Teacher paradigm, and a graph convolutional network. They experiment on different lung registration datasets and compare with recently published methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a new method using the Mean Teacher for registration adaptation. The method is novel and simple and obtains very good results on two public datasets.

    The authors propose adaptations of the Mean Teacher framework to make it work in the context of registration problems.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed pipeline requires ground truth deformations phi for the supervised loss. The authors do not explain how they obtain/calculate these ground truth deformations: with a classic registration algorithm or a deep learning-based one? Can this method be extended to unsupervised registration?

    The optimisation is performed without a regularisation loss on the produced deformation. Is this a deliberate choice of the authors? Concerning regularisation, the authors do not discuss the regularity of the deformations in terms of folding or the standard deviation of the Jacobian. Does the proposed method produce smooth or noisy deformations?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method seems fully reproducible, and the code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The quality of the paper could be improved by discussing the following points :

    • Could the proposed Mean Teacher approach be extended with a CNN instead of a GCN, and with full volumes instead of point clouds?

    • The regularity of the deformations produced by the proposed method (negative Jacobians/foldings).

    • The adaptation to unsupervised registration without ground truths.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel, obtained very good results and the paper is very well written and presented.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    A new registration method to cope with domain shift is proposed, combining a keypoint-based registration model with self-ensembling within the mean teacher framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is based on the mean teacher framework alongside domain adaptation, addressing two main drawbacks of deep learning: the assumption that source and target domain data are i.i.d., and the need for massive training data.
    2. Domain adaptation methods for registration are few compared with UDA for segmentation and classification.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The utility of reverse augmentation is not clearly demonstrated via an ablation study.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    • The authors shared their code, so the work is highly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper fills a gap by establishing local correspondences in the context of UDA for registration, which is new and innovative. Specifically, the authors combine an optimization-based registration approach (i.e., LBP) with the mean teacher framework. It is surprising that the proposed method even outperforms the target-only methods (including VM++ and target-only), which are typically considered an upper bound in other UDA tasks (e.g., segmentation or classification).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The paper deals with the area of UDA for registration which is underexplored.
    2. The methods are new and solid, built upon well-validated approaches.
    3. The experiments are thorough and the results are convincing.
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Domain adaptive keypoint-based lung registration with the Mean Teacher

    This submission tackles keypoint registration between different domains under a semi-supervised approach. The originality resides in using a mean-teacher framework to leverage semi-supervised learning of the deformation field and graph convolutional networks on keypoints. The evaluation is on CT lung datasets under two scenarios: different cropped fields of view and different breathing cycles. The reviewers' scores are extreme and disparate and need consensus. The authors are therefore invited to address the following key concerns in a rebuttal:

    • domain adaptation - in medical image segmentation or classification, domain adaptation is often associated with differences in intensity distributions between a source and a target set of images. Such domain shift is typically addressed in image registration with mutual information or adequate metric learning. The domain shift considered in this submission focuses on differences in image cropping (R1) and changes in the output distributions of displacements, so claiming the paper as a generic domain adaptation method may be a stretch.

    • semi- vs. unsupervised registration - the community is currently focused on learning-based, fully unsupervised registration, since ground-truth displacements may be unrealistic to acquire. In such a context, fully supervised methods may be considered limited, and relying on additional unlabeled data in a semi-supervised context is normally expected to provide additional performance (R1, R2). A sense of how much improvement is due to the increased amount of data versus truly due to the novel method needs to be discussed, or at least evaluated. R3 also raises the question of why the proposed method goes beyond an upper bound in an unsupervised approach.

    • validation - the current evaluation uses only displacement errors (TRE) as an assessment (R2). This is insufficient to evaluate the quality of a deformation field. Jacobian maps are typically provided to assess smoothness and the absence of folded transformations. A better sense of such an assessment, even if still insufficient, could be given by the standard deviations and maximal displacement errors (std. dev. and max. of TREs).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank the reviewers for their time and effort.

R1 questions the positioning of our work as domain adaptation (DA) as the domain shifts (DS) in our work (varying geometries of the input due to varying fields of view (FOV) and breathing types) differ from the commonly studied intensity shifts. We argue that the considered DS are still clinically relevant and challenging: 1) R1 attributes the cropped FOV to pre-processing. This is incorrect. Rather, varying FOVs are due to different imaging protocols among clinical sites and thus highly relevant for medical DA. 2) Our experiments prove the severity of both DS by revealing a huge gap between source- and target-only models. Based on the above criticism, R1 regards our method as a semi-supervised (SS) method. We respectfully disagree. SS settings assume iid data while we consider shifted domains. Thus, we are addressing DA and our method should also be evaluated in this context.

R1 states that “acquiring ground truth (GT) is always infeasible, especially for not-landmark-based methods” and criticizes a lack of practical value of our method compared to unsupervised learning (UL) methods. We agree that GT labels are scarce and that UL is an indispensable technique to overcome the lack of GT. However, we respectfully disagree that acquiring GT is infeasible. Even if costly, dense landmark correspondences can be manually annotated, and interpolation enables supervision of methods that are not keypoint-based. As direct supervision with GT can improve performance, especially in challenging tasks, we argue that GT should be exploited whenever available. DA enables flexible usage of GT by transferring knowledge from labeled source to unlabeled target domains, thus offering high practical benefit. In summary, we consider DA and UL as complementary means to overcome scarce GT, with this work focusing on DA.

R1 notes that our method uses more data than the comparison methods, such that its efficacy is hard to evaluate. Given our work's focus on DA, we used the usual evaluation scheme for DA, where source- and target-only models are standard baselines. Our main contribution is to close the gap from the source to the target model without accessing target GT. Our method uses the same target data as the target model but, importantly, without GT, plus labeled data from a shifted domain. Directly weighing the importance of the missing target labels against the additional shifted data is difficult. But from a DA perspective, it is well known that matching the target model's performance is extremely difficult, and doing so clearly proves the effectiveness of our method. (Surpassing the target model is indeed surprising and may be due to the overall small dataset sizes.) We further agree that a comparison with other (DA) methods, using the same data as our method, is interesting. However, since DA for registration is barely explored, we did not find a comparison method for our DA setting in the literature, and the application of standard DA methods failed (Sec. 3.1-baselines-4). Finally, combining DA and UL is an exciting direction, which is beyond the scope of this work and left for future work.

R2 notes the absence of a regularization loss and criticizes the lack of an evaluation criterion to assess the quality of the predicted deformation fields. Indeed, a regularization loss is not needed because LBP includes a regularization cost that “enforces smoothness of the predicted displacement field” (Sec 2.2). This term is independent of the learned features, such that all models predict smooth deformations, as confirmed by std. devs of the log-Jacobian of 0.036, 0.035, 0.034 and percentages of folding voxels of 7e-6, 0., 1e-4, calculated for the source-only model, the target-only model, and our model under the 4DCT->COPD setting. Due to their similarity, these scores provide little additional value for model comparison, which is why we did not report them. If advised, we will add them. We also show the cumulative distributions of TREs in Fig. 2.
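The smoothness scores quoted in the rebuttal above (standard deviation of the log-Jacobian determinant and the fraction of folding voxels) can be computed from a dense 3D displacement field roughly as follows. This is an illustrative NumPy sketch under assumed unit voxel spacing and central finite differences, not the authors' evaluation code:

```python
import numpy as np

def jacobian_determinant(disp):
    """Jacobian determinant of the deformation phi(x) = x + u(x) for a
    displacement field disp of shape (D, H, W, 3), using central finite
    differences and assuming unit voxel spacing."""
    D, H, W, _ = disp.shape
    J = np.zeros((D, H, W, 3, 3))
    for i in range(3):
        grads = np.gradient(disp[..., i], axis=(0, 1, 2))  # du_i/dx_j
        for j in range(3):
            # Add the identity to turn du/dx into dphi/dx.
            J[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)

def smoothness_metrics(disp, eps=1e-9):
    """Std of the log-Jacobian and fraction of voxels with det(J) <= 0."""
    det = jacobian_determinant(disp)
    folding_fraction = float(np.mean(det <= 0))
    sd_log_jac = float(np.std(np.log(np.clip(det, eps, None))))
    return sd_log_jac, folding_fraction

# Identity deformation: det(J) = 1 everywhere, so both metrics are zero.
sd, fold = smoothness_metrics(np.zeros((4, 4, 4, 3)))
```

A log-Jacobian std near zero indicates nearly volume-preserving, smooth deformations, while any voxel with a non-positive determinant corresponds to local folding of the transformation.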




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Domain adaptive keypoint-based lung registration with the Mean Teacher

    The rebuttal reveals opposing views. There are doubts about a possible overclaim: whether handling differences in image cropping constitutes a domain adaptation problem. While R1 makes very valid points (it is a stretch to claim a general domain adaptation paper), R2 and R3 have also highlighted the merit of developing a method that works on target data without deformation fields. The employed approach resonates with, and is applied to, segmentation or classification problems (mean teacher for semi-supervised learning, domain adaptation from source to targets without labels), but had remained unapplied to registration.

    A decision needs to be made between contrasting views. The rebuttal has partially addressed the concern about this exact overclaim, stating that differences in image cropping create a shift in distributions even within the same image domain. I would tend to disagree. Domain adaptation makes more sense with a reasonable domain change, such as a significant distribution shift (different imaging modalities or sites, which could also include changes in image cropping or breathing cycles). My concern is that the current title and narrative could, as is, be perceived as misleading, and the rebuttal does not offer to change anything on this matter. A suggestion is to better situate the paper in the title, perhaps around varying fields of view.

    The approach could be original if situated within its shifting context, which may need a discussion or comparison within a semi-supervised context, with or without varying image cropping. Any extension submission is strongly encouraged in that direction in order to fully appreciate the mean-teacher-based registration approach. As is, the paper could be perceived as misleading, at least to 1/3 of the reviewers.

    That said, the claim on domain adaptation may be considered secondary, and the paper keeps value in adapting the mean-teacher framework to registration, which has merit; it is not necessarily highly innovative, as it uses existing concepts, but it could have impact.

    For all these reasons and situating this work with respect to the other submissions, the recommendation is Acceptance.

    The final decision will be a consensus with the other co-meta-reviews.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although there was a strong negative review, the authors produced a thorough rebuttal. Overall, I believe this paper leans toward accept. The authors should nevertheless take the thorough feedback and improve the paper in the camera-ready version; at the end of the day, that is the goal of the review process, and I believe the paper has room to do so here.

    Congratulations to the authors.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Strong and objective rebuttal to R1; the other reviewers recommended acceptance, which I also agree with.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


