
Authors

Jiashun Chen, Donghuan Lu, Yu Zhang, Dong Wei, Munan Ning, Xinyu Shi, Zhe Xu, Yefeng Zheng

Abstract

Recently, deep-learning-based approaches have been widely studied for the deformable image registration task. However, most efforts directly map the composite image representation to the spatial transformation through a convolutional neural network, ignoring its limited ability to capture spatial correspondence. On the other hand, although the Transformer can better characterize the spatial relationship with its attention mechanism, its long-range dependency may be harmful to the registration task, where voxels separated by too large a distance are unlikely to be corresponding pairs. In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task. The Deformer module is designed to facilitate the mapping from image representation to spatial transformation by formulating the displacement vector prediction as the weighted summation of several bases. With the multi-scale framework predicting the displacement fields in a coarse-to-fine manner, superior performance can be achieved compared with traditional and learning-based approaches. Comprehensive experiments on two public datasets are conducted to demonstrate the effectiveness of the proposed Deformer module as well as the multi-scale framework.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_14

SharedIt: https://rdcu.be/cVRSU

Link to the code repository

https://github.com/CJSOrange/DMR-Deformer

Link to the dataset(s)

LPBA40: https://drive.google.com/file/d/1308rPiQBZTa13tI-0KbGYUv41G88ejjf/view

Neurite-OASIS: https://drive.google.com/file/d/1VmwQs2nCsRHEHKUtRUAIE-DJqX6XD4iq/view


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a new deformer-based multi-scale registration method (DMR) for medical deformable image registration. The proposed Deformer module leverages the multi-head attention strategy to capture the long-range dependency between high-level features. The multi-scale architectural design and auxiliary loss from different levels further improve the registration performance. The method is evaluated on two public brain MRI datasets (LPBA40 and OASIS). Extensive quantitative and qualitative evaluations demonstrate that the DMR performs favourably against state-of-the-art methods (2 conventional methods, 2 CNN-based methods and 1 CNN+Transformer based method).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method proposes a multi-scale structure with multi-head attention to capture the long-range dependencies between high-level embeddings. The results in Tables 2 and 3 demonstrate that the proposed Deformer block outperforms existing Transformer and CNN blocks by a significant margin.

    Strong evaluation. This paper comprehensively evaluates the registration accuracy, diffeomorphic properties and smoothness of the deformation field of the proposed method.

    The paper provides a sufficient ablation study to justify the hyperparameter choices and architectural design of the proposed DMR model.

    The writing is clear and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The idea of using multi-head attention to capture the long-range dependencies in image registration is not particularly novel. Recent Transformer-based methods [1,2] share a similar design.

    Model complexity. It is worth noting that multi-head attention is notoriously computationally intensive, yet the runtime and model complexity are ignored in this work. The results would be more convincing if the model complexity, i.e., the number of learnable parameters and FLOPs, and the runtime of each method were reported.

    The authors argue that the attention mechanism in the DIR task may lead to inferior performance as corresponding voxels should only be found in a limited local range. However, the paper offers no particular solution to this issue, and the proposed Deformer module also leverages global multi-head attention to model the long-range dependencies of the features, which contradicts their statement.

    The sample size of the test set is not sufficient. Only 1 atlas and results on 28 (8 + 20) scans are reported. The paper would benefit from including more atlases for evaluation to reduce the statistical bias.

    References

    [1] Chen, Junyu, et al. “ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration.” MIDL 2021.
    [2] Zhang, Yungeng, Yuru Pei, and Hongbin Zha. “Learning dual transformer network for diffeomorphic registration.” MICCAI 2021.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good reproducibility. Many technical details of the proposed method are reported, and the authors state in the reproducibility checklist that the code will be released if the work is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The qualitative results in Fig. 2 are upside down and stretched. I prefer the qualitative results in the supplementary material instead.

    Minor: Avoid “Title Suppressed Due to Excessive Length”. (The running head.)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-written and the proposed method is adequately evaluated. Although there are some concerns about the model complexity and novelty, this paper provides a comprehensive evaluation and ablation study to demonstrate the effectiveness and the registration performance of the proposed method. I believe this paper is of interest to the MICCAI community.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal and the other reviews, I’ve decided to maintain my score (weak accept). While the authors did a good job of highlighting the novel points of their approach, there are several concerns regarding the scale of the experiments and the ablation study of the novel points. Specifically, only 28 (8 + 20) scans and 1 atlas are included in the test set. The paper would benefit from including more atlases or performing cross-validation to reduce the statistical bias of the experiments. Moreover, the intuition of learning the basis vectors from only the moving image remains unclear to me after the rebuttal. Finally, the authors report the result of their weakly-supervised variant in the Learn2Reg challenge in the rebuttal. However, a common baseline, VoxelMorph with a weakly-supervised Dice loss, can achieve 84.1% Dice (Team: 3idiots), while the proposed method only achieves 84.2% Dice with significantly more learnable parameters and FLOPs than VoxelMorph. In this case, the performance gain of the proposed method over VoxelMorph is less convincing. Yet, since the Deformer is modularized and showed a clear performance gain over the CNN (residual) and Transformer blocks in Table 2, it has the potential to be adapted to other image registration networks. I believe the methodological contributions of this paper are of interest to the MICCAI community.

    I highly recommend the authors carefully address the questions raised by Reviewer 4 in the future/camera-ready version.

    Typo: Page 3, “sofmax function” -> “softmax function”.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a deep learning network to perform deformable registration. Instead of convolution operations, a Transformer-style module is used in the framework. Experiments are performed on two public brain MR image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is clear to read.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of novelty. Transformers have been widely used in medical image registration; the authors should compare with existing methods, e.g.: Zhang, Y., Pei, Y. and Zha, H., 2021, September. Learning dual transformer network for diffeomorphic registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 129-138). Springer, Cham.
    2. The reviewer has concerns about the validity of the experiments. Although LPBA40 with its 56 annotated regions is widely used for registration evaluation, the validation does not seem well founded. As revealed in Wei et al., the annotation of LPBA40 is of low quality; without refinement, it is better to avoid using it directly to evaluate registration methods. Wei, D., Zhang, L., Wu, Z., Cao, X., Li, G., Shen, D. and Wang, Q., 2020. Deep morphological simplification network (MS-Net) for guided registration of brain magnetic resonance images. Pattern Recognition, 100, p.107171.
    3. Regarding Fig. 2, it is still hard to identify the regions of improvement.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Should be good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please try to evaluate the method over broader clinical registration scenarios, in addition to brain image registration.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of novelty.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    It proposes a novel Deformer module along with a multi-scale framework for the deformable image registration task. The Deformer module is designed to facilitate the mapping from image representation to spatial transformation by formulating the displacement vector prediction as the weighted summation of several bases. With the multi-scale framework to predict the displacement fields in a coarse-to-fine manner, superior performance can be achieved compared with traditional and learning-based approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The ablation experiments and comparisons with other models show that the Deformer module really works. It successfully improves performance in the registration area.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is still an encoder-decoder architecture that adds several Deformer modules. It would have more novelty with a more elaborate model design.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the reproducibility of the paper is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I think the authors’ work is valuable because it outperforms VoxelMorph, which is one of the most popular benchmarks in medical image registration. One very interesting point in this paper is that the encoder-decoder framework with added Deformer modules outperforms both CNN and Transformer models. Although the model does not involve an elaborate architectural design, the results are provocative enough for researchers. I encourage the authors to make the codebase open source on GitHub so that people can build new benchmarks in the registration area.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Registration is sensitive to computation time, so I think the authors should add GFLOPs and FPS information for every model.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    The authors propose a novel deep learning-based method for medical image registration, including the following key contributions: 1) Instead of directly regressing displacement fields, the authors propose to predict voxel-wise displacement vectors as the weighted sum over a set of learned basis vectors. This is realized by a two-stream Deformer module with one stream predicting the basis vectors (from moving features only) and the other stream predicting the weights (from fixed and moving features). 2) This idea is embedded into a multi-scale model, predicting displacement fields at 4 resolutions. Displacement fields at each level are supervised by an auxiliary loss and a refining fusion network is proposed to merge the predictions from coarse to fine. The method is evaluated for the task of brain MRI registration on two public datasets. Results show superior performance over 6 competing methods.
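
    To make the two-stream design concrete, below is a minimal PyTorch sketch of a Deformer-style block as described above (attention weights from moving+fixed features, bases from moving features only, displacement as their weighted sum). All names and shape conventions (e.g., DeformerBlock, num_heads, num_bases, how heads are combined) are illustrative assumptions, not the authors' implementation.

        import torch
        import torch.nn as nn

        class DeformerBlock(nn.Module):
            """Sketch of weighted-sum-of-bases displacement prediction (assumed layout)."""

            def __init__(self, channels, num_heads=4, num_bases=8):
                super().__init__()
                self.num_heads, self.num_bases = num_heads, num_bases
                # Stream 1: basis vectors proposed from moving features only.
                self.base_proj = nn.Linear(channels, num_heads * num_bases * 3)
                # Stream 2: attention weights from concatenated moving and fixed features.
                self.weight_proj = nn.Linear(2 * channels, num_heads * num_bases)

            def forward(self, feat_moving, feat_fixed):
                # feat_*: (B, V, C) with V = D*H*W voxels; the linear layers act on each
                # voxel independently, so no long-range interaction is introduced here.
                B, V, _ = feat_moving.shape
                bases = self.base_proj(feat_moving)
                bases = bases.view(B, V, self.num_heads, self.num_bases, 3)
                weights = self.weight_proj(torch.cat([feat_moving, feat_fixed], dim=-1))
                weights = weights.view(B, V, self.num_heads, self.num_bases).softmax(dim=-1)
                # Displacement = attention-weighted sum of bases, averaged over heads.
                return (weights.unsqueeze(-1) * bases).sum(dim=3).mean(dim=2)  # (B, V, 3)

    In such a per-voxel formulation, the attention only re-weights candidate bases at the same voxel, which is consistent with the rebuttal's statement (Q1 below) that the Deformer limits the field of view rather than exploiting long-range dependency.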

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Predicting displacement fields as the weighted sum over basis vectors is (to my knowledge) a novel idea. I like the idea as it can potentially simplify the complex direct regression of offset vectors.

    • The proposed multi-scale fusion scheme appears also novel to me. It can be seen as complementing the skip connections in the U-Net-like base architecture by the displacement fields predicted by the Deformer modules. According to experimental results, this improves performance. Compared to a basic U-Net, the proposed scheme clearly leverages more information (including auxiliary losses) and compared to iterative methods like LapIRN [22 in paper], it only requires a single forward pass through a single model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I recommend comparing the proposed method to stronger competitors than those selected by the authors. In particular, the proposed network w/o Deformer modules (last entry in Tab. 3) is equivalent to a basic U-Net architecture (similar to VoxelMorph) but is still clearly superior to all considered learning-based competitors. This raises concerns about the quality of the competitors. Specifically, I am missing comparisons to other multi-scale methods [1, 2, 3] and the strongest methods of the Learn2Reg Challenge Task 3 [1, 4].

    • The motivation for the selected design of the Deformer Module and corresponding ablation studies are not fully convincing. Please see point 8 for detailed comments and questions.

    • The motivation for the proposed multi-scale fusion scheme is not clearly elaborated. Why are the predicted displacement fields lifted to a higher number of features and combined with the feature maps? According to Fig. 1, I initially expected the fusion network to exclusively operate on predicted displacement fields. Why is it useful to instead concatenate predicted displacement fields with the feature maps, what is the intuition behind that?

    • As for evaluation on OASIS, it is not clear whether the authors employ the official data split used in the challenge. I would recommend using this split as it allows comparison to the Leaderboard and to results reported in the Learn2Reg paper [4].

    [1] Mok and Chung. “Large deformation diffeomorphic image registration with Laplacian pyramid networks.” MICCAI 2020.
    [2] De Vos et al. “A deep learning framework for unsupervised affine and deformable image registration.” Medical Image Analysis 2019.
    [3] Hering et al. “mlVIRNET: Multilevel variational image registration network.” MICCAI 2019.
    [4] Siebert et al. “Fast 3D registration with accurate optimisation and little learning for Learn2Reg 2021.” arXiv preprint arXiv:2112.03053 (2021).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used public datasets and announced to publish code after review. This will likely allow for reproducing the results. Beyond, the authors describe sufficient details of their method and experimental setup.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Regarding the design of the Deformer module (a small numerical sketch of point 3 follows at the end of this comments list):

      1) Predicting displacement fields as a weighted sum over a set of basis vectors is a nice idea, but the authors do not verify whether this is indeed superior to direct regression. I am missing an ablation experiment with a single-stream Deformer module directly regressing displacement fields.

      2) What is the motivation for a two-stream architecture? To predict suitable attention weights, the associated branch needs to know the corresponding base vectors, which, however, is not the case. If the branch implicitly anticipates the base vectors predicted by the other stream, the other stream would be redundant.

      3) What is the motivation to learn the base vectors instead of using a fixed set of basis vectors? To me, learned (and input-dependent) basis vectors are not really basis vectors. Deformer-B is a good starting point to answer this question but does not really provide a fair comparison, for two reasons: a) it only includes 3 instead of K*N basis vectors; b) since the attention weights are the result of a softmax and thus in [0, 1], the length of the final displacement vectors cannot exceed the length of a unit vector, so displacements are restricted to neighboring voxels. Consistently, Deformer-B achieves the same performance as using no Deformer at all (Tab. 3, last entry). Overall, a stronger baseline (with more basis vectors and more flexible lengths) is needed here.

      4) What do the Deformer modules learn? Do they really need to learn displacement fields, or could they learn other useful features? It would be interesting to report the DSC of the displacement fields at the different resolutions. Moreover, what happens when setting the auxiliary losses to 0 while keeping the Deformer modules? Do they still contribute useful features?

      5) Why are moving features alone sufficient to propose displacement bases? While this choice is confirmed by the ablation experiment, it appears unintuitive to me, and an explanation would be helpful.

      6) The final deformation fields output by the refinement network are obtained by ordinary direct regression. Why is the proposed weighted summation not used at this stage?

    • Typos and presentation: Table 1 seems a bit misplaced in the methods section and would better fit one or two pages later. Further typos: p.2: “matrix production” –> matrix product / multiplication; p.2: “the identical transformation” –> identity transformation; p.3: “as the the weighted summation”; p.6: missing period after “Datasets”; p.6: “propose approach” –> proposed approach; p.7: “For ablation study, We”; p.8: “on the each pair”.

    • I found the description of the refining network difficult to follow. The supplementary material helps, but the description in the main paper could be clarified.

    • 40 scans from LPBA40 are divided into 19, 3, and 8 scans. What about the remaining 10 scans?

    • Idea for future work: Currently, the refining network concatenates displacement fields predicted by the Deformer module with the feature maps from moving and fixed scan. Would it be possible / useful to use the displacement fields to warp the moving features to the fixed features instead?
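
    Regarding point 3 in the Deformer-design questions above (the numerical sketch referenced there): assuming, as described, three fixed unit bases with softmax-normalized weights, the following toy check illustrates why the resulting displacement can never exceed unit length. The setup is hypothetical and not the authors' Deformer-B code.

        import torch

        bases = torch.eye(3)                   # fixed bases (1,0,0), (0,1,0), (0,0,1)
        logits = torch.randn(100000, 3)        # arbitrary per-voxel weight logits
        weights = logits.softmax(dim=-1)       # softmax weights in [0, 1], summing to 1
        disp = weights @ bases                 # per-voxel displacement vectors
        print(disp.norm(dim=-1).max())         # always <= 1.0, i.e., within one voxel

    Because the non-negative weights sum to one, the displacement is a convex combination of the unit bases, which is exactly the restriction to neighboring voxels pointed out in point 3.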

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I perceive both methodological contributions (attention-based displacement prediction with Deformer module & multi-scale approach) as interesting and novel. However, as detailed in the weaknesses section, I am missing stronger baselines to support the value of the proposed multi-scale scheme and I am missing more convincing ablation experiments / argumentation to justify the design of the Deformer module.

  • Number of papers in your stack

    1

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors convincingly addressed part of my concerns while the following two concerns remain:

    1) Why is the authors’ baseline model w/o Deformer clearly superior to all learning-based competitors? This still calls the quality of the learning-based competitors into question. The effectiveness of the Deformer could be demonstrated more convincingly by embedding the Deformer into an existing architecture (e.g., VoxelMorph).

    2) Why does Deformer-B achieve the same performance as using no Deformer at all, without improving on this baseline? For me, this is unexpected and needs to be discussed, as it otherwise calls the “correctness” / “fairness” of this comparison into question.

    Despite the two concerns, I updated my rating to “weak accept” because I perceive the idea of the Deformer as novel and interesting, and the experimental results partly demonstrate its effectiveness.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents interesting ideas to decompose the displacement field prediction, but fair comparisons to Learn2Reg OASIS are missing (it’s reasonable enough to come close to Learn2Reg winners LapIRN/ConvexAdam but this needs to be included). Further improvement with a (weakly-supervised) Dice loss could be expected. So far a number of important ablations and discussion to related work are missing. There are numerous comments from the reviewers that should be addressed during the rebuttal and may improve some of the current weaknesses.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We are glad that reviewers find our work “novel and effective” (R4), “clear” (R1, R3), “strong evaluation with sufficient ablation” (R1, R2) and “interesting” (R2, R4). Our responses to major concerns are as follows.

Q1: Clarify the difference and comparison between Transformer and Deformer (R1, R3, R4).

We would like to correct some misunderstandings and point out some ablation studies missed by the reviewers. First, unlike the Transformer, which relies on long-range dependency to compute attention, the linear layers of the proposed Deformer module are applied to each voxel-wise feature separately. Thus, we avoid the long-range dependency by limiting the field-of-view of each voxel-wise feature, so that corresponding voxels can only be found in a limited local range and overly large deformations are prevented, as stated in the second paragraph of the introduction and the last paragraph on page 3. Second, the multi-head mechanism is not the main contribution of our study. As presented in Table 1 of the supplementary, the multi-head mechanism only improves the Dice by 0.8%. As pointed out by R2 and R4, our main contribution is the Deformer module, which predicts displacement fields as the weighted sum over basis vectors. The comparison with the Transformer, along with variants including direct regression of displacement fields (Deformer-B), is shown in Table 2 on page 8, further demonstrating the effectiveness of the Deformer module. In addition, the effect of the number of basis vectors is presented in Table 2 of the supplementary.

Q2: Number of learning parameters and FLOPS, and running time (R1, R2).

We will add the number of parameters, GPU memory and average running time to register each pair of scans on the Neurite-OASIS dataset as follows: VoxelMorph: 573K/34.9G/0.56s; DMR (proposed): 7.9M/120.3G/0.63s; ViT-V-Net: 31.6M/60.5G/0.85s. For scenarios with limited computational resources, we can reduce these three metrics to 7.2M/46.7G/0.61s by using a single head with little performance degradation, as shown in Table 1 of the supplementary.
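
(A minimal sketch of how such figures can be gathered in PyTorch for any of the compared models; the model interface and input shape below are placeholder assumptions, not the authors' measurement code.)

    import time
    import torch

    def complexity_report(model, input_shape=(1, 2, 160, 192, 224), runs=10, device="cuda"):
        """Parameter count, peak GPU memory (GiB) and average forward time (s).
        Assumes the model takes one tensor of concatenated moving/fixed volumes."""
        model = model.to(device).eval()
        n_params = sum(p.numel() for p in model.parameters())
        x = torch.randn(*input_shape, device=device)
        torch.cuda.reset_peak_memory_stats(device)
        with torch.no_grad():
            model(x)                               # warm-up pass
            torch.cuda.synchronize(device)
            start = time.time()
            for _ in range(runs):
                model(x)
            torch.cuda.synchronize(device)
        runtime = (time.time() - start) / runs
        peak_mem = torch.cuda.max_memory_allocated(device) / 1024 ** 3
        return n_params, peak_mem, runtime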

Q3: More comparisons to other methods and experiments with a weakly-supervised Dice loss (R3 and R4).

The ‘multi-scale’ methods mentioned by R4, e.g., LapIRN, DLIR and mlVIRNET, are all multi-stage methods. Their computation of the deformation field at the next stage depends on the registration result of the previous stage, and the stages cannot be parallelized, while our method only requires a single forward pass. In addition, the weakly-supervised Dice loss is heavily used in Learn2Reg, while our method is unsupervised. As suggested, we added the auxiliary Dice loss and achieved 84.2% Dice, which is comparable to the Learn2Reg winner LapIRN (86.2% Dice with multiple cascaded networks). We will add comparisons with DLIR and mlVIRNET (which use different datasets and are not open-source) in the final version.

Q4: The design of Deformer (R4).

First, direct regression is denoted as Deformer-B in Table 2, which removes the left branch of the Deformer module and is equivalent to fixing the displacement bases as (0, 0, 1), (0, 1, 0) and (1, 0, 0). Second, the intuition of learning the basis vectors from only the moving image is that the displacement base set is an inherent property of each image and should be adjusted to different images, as demonstrated in Table 2. Third, an intermediate displacement field is learned by the Deformer module at each scale, as shown in Figure 2 of the supplementary. The average DSCs with these displacement fields on the Neurite-OASIS test set are 69.2%, 54.5%, 37.4% and 17.8% from fine to coarse scales, respectively. The Deformer module is not used to learn the final displacement field because it is designed to operate on the original image features, whereas the inputs of the refining network are already displacement fields. Also, we thank R4 for pointing out the mistake about LPBA40 (the split should be 29, 3 and 8 scans).

Other minor concerns will also be carefully revised.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have largely addressed the remaining concerns in their rebuttal and added a few additional interesting insights from their ablation studies. Overall, the proposed Deformer module is a solid contribution, and the validation is reasonable for a conference paper (including one manually labelled brain dataset and one challenge benchmark). For future work, a comparison to TransMorph is recommended.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The novelty of the proposed method, compared to the original Transformer, is its locality, which is conceptually sound. While it showed significant performance improvement over existing methods, the datasets used in the experiments exhibit rather small inter-subject shape differences, which many other datasets would not. Although more intensive validation on such challenging data and against other SOTA methods is desirable, the paper presents sufficient novelty and a good depth of work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    It seems like there is agreement that this paper just passes acceptance to MICCAI. Congratulations to the authors.

    I would encourage the authors to take into account the thorough reviews (even the ones after the response), which indicate that there is substantial improvement that could be done in several aspects, including experiments.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    -


