
Authors

Zhe Xu, Jie Luo, Donghuan Lu, Jiangpeng Yan, Sarah Frisken, Jayender Jagadeesan, William M. Wells III, Xiu Li, Yefeng Zheng, Raymond Kai-yu Tong

Abstract

In order to tackle the difficulty associated with the ill-posed nature of the image registration problem, regularization is often used to constrain the solution space. For most learning-based registration approaches, the regularization usually has a fixed weight and only constrains the spatial transformation. Such convention has two limitations: (i) Besides the laborious grid search for the optimal fixed weight, the regularization strength of a specific image pair should be associated with the content of the images, thus the “one value fits all” training scheme is not ideal; (ii) Only spatially regularizing the transformation may neglect some informative clues related to the ill-posedness. In this study, we propose a mean-teacher based registration framework, which incorporates an additional temporal consistency regularization term by encouraging the teacher model’s prediction to be consistent with that of the student model. More importantly, instead of searching for a fixed weight, the teacher enables automatically adjusting the weights of the spatial regularization and the temporal consistency regularization by taking advantage of the transformation uncertainty and appearance uncertainty. Extensive experiments on the challenging abdominal CT-MRI registration show that our training strategy can promisingly advance the original learning-based method in terms of efficient hyperparameter tuning and a better tradeoff between accuracy and smoothness.
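As a rough illustration of the mean-teacher idea in the abstract, the sketch below shows a teacher whose weights track an exponential moving average (EMA) of the student's weights, together with a mean-squared-error consistency term between two predictions (the author feedback states that L_c is measured by MSE). The function names, the flat-list representation of weights and predictions, and the decay value `alpha` are assumptions made for illustration; none of them are taken from the paper.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight towards the matching student weight.

    alpha is the EMA decay; its value here is an assumption, not the paper's.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def mse(pred_a, pred_b):
    """Mean squared error between two equally sized flat predictions."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy weights (hypothetical values): the teacher starts at zero and
# drifts towards the fixed student weights over three updates.
teacher_w = [0.0, 0.0]
student_w = [1.0, -2.0]
for _ in range(3):
    teacher_w = ema_update(teacher_w, student_w, alpha=0.5)

# Toy consistency term between a "student" and a "teacher" prediction.
consistency = mse([1.0, 2.0], [1.0, 4.0])
print(teacher_w, consistency)
```

In a real training loop the student would be updated by gradient descent and the teacher only through the EMA step, so the teacher behaves like a temporal ensemble of past student states.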

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_2

SharedIt: https://rdcu.be/cVRSI

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors investigated deformation regularization weighting in the context of VoxelMorph-based non-rigid abdominal CT-MRI registration. They proposed a mean-teacher approach that dynamically adjusts the weights of the spatial and temporal consistency regularization based on the transformation uncertainty and appearance uncertainty. Experiments involving 10 intra-patient test CT-MRI scans showed that the proposed approach seemingly outperformed the state-of-the-art SyN, Deeds and DIF-VM methods according to organ-wise Dice, average surface distance and deformation Jacobian evaluation metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • novel formulation of dynamic regularization weighting, exploiting temporal information across the training steps
    • reported the state-of-the-art registration results for the 10 abdominal CT-MRI pairs
    • well written and clearly organized manuscript
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • the ablation study does not seem to have a clear conclusion
    • the statistical significance of the obtained “VM (AS+ATC)” results should be verified with respect to the methods in comparison in order to enable clear conclusions
    • it should be investigated how the initial regularization weight impacts the final result (i.e. is hyperparameter tuning still required with the proposed approach?)
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • PyTorch version not reported.
    • Data description is scarce (no scanner info, instructions to annotators, their degree and level of experience not reported). Review board approval is claimed to have been obtained, but is not referenced (id number).
    • Placeholder for code link not included in the manuscript, but authors claim code will be made available upon acceptance.
    • Relevant statistical significance tests not performed.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This is a well-written manuscript that introduces a novel formulation of dynamic regularization weighting, exploiting temporal information across the training steps, and reports state-of-the-art registration results for the 10 abdominal CT-MRI pairs. The main drawback is that the evaluation is missing an analysis of the statistical significance of the reported differences in performance between methods, which would allow firm conclusions to be drawn.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • novel training scheme for learning based registration with state-of-the-art results on 10 abdominal CT-MRI pairs
    • well written manuscript
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
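Review #1 asks for a significance analysis of the differences between methods on the 10 test pairs. One way this could be done is an exact paired sign-flip permutation test on per-case metric differences, sketched below with the standard library only. This is an illustrative choice, not a test the reviewer or the authors specified (a Wilcoxon signed-rank test is a common alternative), and the Dice values in the example are made-up toy numbers, not results from the paper.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test for paired samples.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every one of the 2^n sign patterns is enumerated.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point round-off.
        if statistic >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Toy per-case Dice scores for two hypothetical methods (not paper data).
dice_a = [0.9, 0.8, 0.7, 0.6]
dice_b = [0.8, 0.6, 0.4, 0.2]
p_value = paired_sign_flip_test(dice_a, dice_b)
print(p_value)
```

With 10 paired cases the enumeration covers only 1024 sign patterns, so the exact test is cheap; the smallest two-sided p-value it can produce at that sample size is 2/1024.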



Review #2

  • Please describe the contribution of the paper

    This paper presented a double-uncertainty guided spatial and temporal consistency regularization weighting strategy for the Mean-Teacher (MT) based registration framework, avoiding the grid searching for the optimal regularization weight.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work exploited the self-ensembling teacher model with the transformation and appearance uncertainty, avoiding grid searching for the optimal fixed regularization weight in the deep registration model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A discussion of the mean field or images is required to justify the uncertainty computation. Descriptions of the threshold selection and the loss function are lacking.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper has provided details about the models, datasets, and evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The average deformation fields and deformed images of N stochastic forward passes on the teacher model with random dropout were used to define the uncertainty. How can the mean field or the mean deformed image be guaranteed to be consistent with the ground-truth field or image in deformable registration? It would be helpful to discuss the mean field or images to justify the uncertainty computation.
    2. The thresholds \tau_1 and \tau_2 are important for updating the regularization weight and the appearance consistency terms. We noticed that \tau_1 is ten times \tau_2 in the experiments. Were the thresholds set empirically?
    3. In Fig. 3, \lambda_{\phi} and \lambda_c seem to converge during training. How would the backbone network, such as VM, perform if it used the converged weights? Variant II used fixed weights in the ablation study, but the selected values are not consistent with Fig. 3 (b, d).
    4. It would be helpful to describe the loss functions, such as the temporal consistency regularization L_c.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work provided a double-uncertainty-guided spatial and temporal consistency regularization weighting strategy to relieve the weight tuning. Experiments on abdominal CT-MRI registration have shown promising advances by the proposed strategy.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
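To make the uncertainty computation discussed in Review #2 more concrete, the sketch below computes a voxel-wise mean and standard deviation over N stochastic predictions and then turns the fraction of voxels whose uncertainty exceeds a threshold into a weight. Only the use of a mean and a standard deviation over N passes comes from the reviews and the author feedback. The thresholding rule, the `scale` factor, the function names and the toy values are assumptions made for illustration and may differ from the paper's actual formulation.

```python
from statistics import mean, pstdev


def voxelwise_mean_std(passes):
    """Mean and standard deviation per voxel over N predictions.

    `passes` is a list of N flat lists, one per stochastic forward pass.
    """
    columns = list(zip(*passes))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def fraction_above(uncertainty, tau):
    """Share of voxels whose uncertainty is strictly above tau."""
    return sum(1 for u in uncertainty if u > tau) / len(uncertainty)


def adaptive_weight(uncertainty, tau, scale=1.0):
    """Hypothetical rule: weight grows with the share of uncertain voxels."""
    return scale * fraction_above(uncertainty, tau)


# Two toy "passes" over three voxels (hypothetical values).
passes = [[0.0, 1.0, 2.0],
          [0.0, 1.0, 4.0]]
mean_field, std_field = voxelwise_mean_std(passes)
weight = adaptive_weight(std_field, tau=0.1, scale=3.0)
print(mean_field, std_field, weight)
```

With only a handful of passes (Review #3 notes that 6 were used) the standard deviation is itself a noisy estimate, which is one reason the choice of threshold may matter.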



Review #3

  • Please describe the contribution of the paper

    This paper proposes a student-teacher model for image registration, a typically ill-posed problem. Separate student and teacher models are implemented; the former is a typical registration network which features spatial regularization to improve the ill-posed nature of the model. The novelty is in the teacher model (and its interaction with the student model), which imposes temporal regularization for consistency with the student model, and allows tuning of weights for spatial regularization in the student model. Their method is evaluated on CT-MRI images of the abdomen from a partner hospital.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors do effectively describe the method, which is able to produce uncertainty maps in the image registration process. Adaptive weighting allows images which are more difficult to register to receive more attention in subsequent steps of the iterative algorithm. Method outperforms other image registration methods in various metrics.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Description of the implementation was somewhat hard to follow, especially as it appears that it requires manual specification of numerous parameters (e.g., thresholds). How might these things change for a different dataset? It also seems like uncertainty results for registration are mainly based on 6 forward passes, which would appear fairly small.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data is not currently publicly available (awaiting IRB approval), no discussion of code availability and this seems like it would be fairly difficult to implement from scratch.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Some of the parameter settings seem fairly arbitrary, for instance thresholds for focusing on the most uncertain predictions, and would necessitate further study; are results robust to these choices? I also don’t know what the baseline runtime is for these procedures to compare the newly proposed methods to, just that they are faster.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This contribution could be deemed innovative, but perhaps requires more detailed study of various choices and model implementation. It was not easy to follow details of the model itself.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper’s contribution is to establish the exponential moving average strategy (self-ensembling) in a Student/Teacher network setup to find optimal regularisation weighting. The work is evaluated on a reasonably sized abdominal MR/CT dataset and makes adequate references to related work. The reviewers positively remark on the novel formulation and clear organisation, but criticise the relatively high number of hyper-parameters, the inconclusive ablation and the (so far) limited reproducibility. I would recommend applying the method to public challenge datasets (e.g. Learn2Reg 2021 Task 1) in the future.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We are grateful that reviewers and ACs recognized our work on this underexplored yet important topic, i.e., adaptive weighting for the registration regularization term. We also appreciate all the constructive comments from reviewers.

  1. Regarding the thresholds \tau_1 and \tau_2 by R2
    • Since the images are normalized and the warping step involves interpolation, we found that the standard deviation (over N passes) is relatively small. Thus we set the threshold \tau_2 to 1%. For the displacement field, the uncertainty fluctuates more and is higher; thus, we set \tau_1 to 10%. Both thresholds were set empirically. Their influence will be investigated further.
  2. R2: In Fig. 3, \lambda_{\phi} and \lambda_c seem to converge in the training process. How about the backbone network, such as the VM, using the converged weights?
    • VM improves slightly when using the converged lambda indicated by our method (such as \lambda_{\phi} = 2.3). However, with a typical grid search scheme, it takes much more time to find such a weight (e.g., a first-round search range of {1, 2, 3, 4, 5} and a second round of {2.1, 2.2, …, 2.9}). This is also the dilemma we want to tackle.
  3. R2: Variant II used fixed weights in the ablation study, while the selected values are not consistent with Fig. 3 (b, d).
    • For this variant, we intended to set the fixed weights based on experience from the previous grid search scheme. It would be somewhat tricky to use the precise converged values “searched” by our complete version. Therefore, we set \lambda_{\phi} to 3, following the typical grid search scheme for VM.
  4. R2: It would be helpful to describe the loss functions, such as the temporal consistency regularization L_c.
    • Page 4: L_c is measured by the mean squared error (MSE).
  5. R3: uncertainty results for registration are mainly based on 6 forward passes, which would appear fairly small.
    • It is a tradeoff as too many forward passes will be computationally inefficient.
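
The cost of the two-round grid search mentioned in point 2 above can be made explicit with a small sketch. It mirrors the example ranges from the feedback (a coarse round over {1, 2, 3, 4, 5}, then a fine round in steps of 0.1 above the coarse winner) and counts how many evaluations are needed. The `evaluate` callback stands in for a full train-and-validate run, and the toy objective peaking at 2.3 is a hypothetical stand-in, not a measured validation curve.

```python
def coarse_to_fine_search(evaluate):
    """Two-round grid search; returns (best_weight, number_of_evaluations).

    `evaluate(weight)` returns a score where higher is better. In practice
    each call would correspond to training a model with that fixed weight.
    """
    evaluations = 0

    def score(weight):
        nonlocal evaluations
        evaluations += 1
        return evaluate(weight)

    coarse = [1, 2, 3, 4, 5]
    best_coarse = max(coarse, key=score)
    # Simplification: refine only above the coarse winner, as in the
    # example ranges quoted in the author feedback.
    fine = [round(best_coarse + i / 10, 1) for i in range(1, 10)]
    best_fine = max(fine, key=score)
    return best_fine, evaluations


# Hypothetical objective whose optimum sits at 2.3.
best_weight, num_runs = coarse_to_fine_search(lambda w: -(w - 2.3) ** 2)
print(best_weight, num_runs)
```

Under these assumptions the search needs 5 + 9 = 14 full evaluations to reach a one-decimal weight, which is the kind of overhead an adaptive weighting scheme aims to avoid.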


