
Authors

Haiqiao Wang, Dong Ni, Yi Wang

Abstract

Transformer structures have been widely used in computer vision and have recently made an impact in the area of medical image registration. However, most registration networks use the Transformer in a straightforward way. These networks often merely use the attention mechanism to boost feature learning, as segmentation networks do, without being specifically designed for the registration task. In this paper, we propose a novel motion decomposition Transformer (ModeT) to explicitly model multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation. The proposed ModeT naturally transforms the multi-head neighborhood attention relationship into a multi-coordinate relationship to model multiple motion modes. Then, a competitive weighting module (CWM) fuses the multiple deformation sub-fields to generate the resulting deformation field. Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets show that our method outperforms current state-of-the-art registration networks and Transformers, demonstrating the potential of our ModeT for the challenging non-rigid deformation estimation problem. The benchmarks and our code are publicly available at https://github.com/ZAX130/SmileCode.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_70

SharedIt: https://rdcu.be/dnwxn

Link to the code repository

https://github.com/ZAX130/SmileCode

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper proposes a new approach to deformable image registration using the Motion Decomposition Transformer (ModeT), which is designed to explicitly model multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation. The proposed method separates the tasks of feature extraction and deformation estimation in deep-learning-based registration networks, making the registration procedure more sensible. The ModeT employs a multi-head neighborhood attention mechanism to identify various motion patterns of a voxel in the low-resolution feature map. Then, with the help of a competitive weighting module and pyramid structure, the motion modes contained in a voxel can be gradually fused and determined in the coarse-to-fine pyramid decoder.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper proposes a novel approach to deformable image registration using the Motion Decomposition Transformer (ModeT), which explicitly models multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation.
    • Separates the tasks of feature extraction and deformation estimation.
    • Multi-head neighborhood attention mechanism.
    • With the help of a competitive weighting module and pyramid structure, the motion modes contained in a voxel can be gradually fused and determined in the coarse-to-fine pyramid decoder, resulting in more accurate and robust registration.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed method is evaluated on only two public brain magnetic resonance imaging (MRI) datasets, which may limit the generalizability of the results to other types of medical images.
    • “Motion decomposition” is not a new research topic and there are many similar works, such as a recent paper: Sun, M., Wang, W., Zhu, X., & Liu, J. (2023). MOSO: Decomposing MOtion, Scene and Object for Video Prediction. arXiv preprint arXiv:2303.03684.
    • Multi-head neighborhood attention and the pyramid structure are also not new; the paper lacks convincing arguments to support them, and there is no discussion of an ablation study.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors stated that the code will be made available, so reproducing the results seems possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Lacks some details and ablation studies, and some parts are not particularly innovative. Specific concerns are mentioned in the Weaknesses section.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is limited and the discussion is not adequate. The disadvantages outweigh the advantages to some extent.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper proposes a deep learning-based deformable registration network for medical images that employs a motion decomposition Transformer (ModeT) and a competitive weighting module (CWM) to improve the accuracy and interpretability of the registration process. The proposed network was evaluated on two publicly available brain MRI datasets and compared against several state-of-the-art registration methods. The results showed that the proposed method outperformed all comparison methods in terms of DSC and ASSD metrics and achieved satisfactory performance in terms of the percentage of voxels with non-positive Jacobian determinant.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – Clear and detailed explanation of the proposed method, including the ModeT and CWM components.

    – Thorough evaluation of the proposed method on two publicly available datasets and comparison against several state-of-the-art registration methods.

    – Use of multiple evaluation metrics, including DSC, ASSD, and percentage of voxels with non-positive Jacobian determinant, to assess the performance of the proposed method.
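    For reference, two of these metrics can be computed as in the minimal sketch below. It assumes integer label maps and a dense displacement field u in voxel units with channels ordered (z, y, x), and it approximates the Jacobian of φ(x) = x + u(x) with central differences via `np.gradient`; the paper's exact discretization may differ. The function names are illustrative only, and ASSD (which needs surface extraction and distance transforms) is omitted.

```python
import numpy as np


def dice(seg_a, seg_b, label):
    """Dice similarity coefficient (DSC) for one anatomical label."""
    a, b = seg_a == label, seg_b == label
    denom = a.sum() + b.sum()
    # Convention (assumption): two empty masks count as perfect overlap.
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def percent_nonpositive_jacobian(disp):
    """Percentage of voxels where det(J) <= 0 for phi(x) = x + u(x).

    disp: (3, D, H, W) displacement field in voxel units, channels (z, y, x).
    """
    # grads[i, j] = d u_i / d x_j, estimated with central differences.
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0) for i in range(3)])
    jac = grads + np.eye(3)[:, :, None, None, None]  # J = I + grad(u)
    # Move the 3x3 matrix axes last so np.linalg.det works voxel-wise.
    det = np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))
    return 100.0 * np.mean(det <= 0)
```

    A zero displacement field gives det(J) = 1 everywhere and therefore 0% folding, while a field that flips an axis (for example u_x = -2x) gives det(J) = -1 and 100% folding.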

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The related work part is mainly focused on Transformer and deformation registration. The paper does not provide a thorough analysis or comparison with other existing methods that may use similar techniques regarding motion, which may raise questions about the uniqueness and originality of the proposed approach.

    – Lack of discussion on the limitations or potential drawbacks of the proposed method.

    – Lack of analysis of the computational efficiency of the proposed method relative to the comparison methods. Since Transformers require substantial computational resources, it would be interesting to see the computational cost, for example, training and testing time.

    – Limited discussion on the clinical relevance or impact of the proposed method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors stated that they will publish the code after acceptance, which is a positive sign for reproducibility. Additionally, the paper names the datasets and evaluation metrics used in the experiments, which allows other researchers to replicate the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In terms of the proposed method, it would be beneficial if the authors could provide more insight into the novelty of their approach, especially regarding the ModeT and CWM components. While the paper describes these components in detail, it would be useful to know how they differ from existing methods and how they contribute to the improved performance of the proposed method.

    Another potential improvement could be to provide more information on the hyperparameter settings used in the experiments, especially regarding the regularization term λ and the neighborhood size n. The authors briefly mention that these were set to 1 and 3, respectively, but it would be helpful to know if different values were explored and why these particular values were chosen.
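    The roles of the two hyperparameters can be written out as follows. This is a generic sketch, not the paper's stated loss: it assumes the common unsupervised objective in which λ weights a smoothness regularizer against an image-similarity term (both terms are left abstract here), and a cubic 3D neighborhood of side n for the attention. The symbols I_f, I_m, φ, and N(p) are notation introduced for this illustration.

```latex
\begin{aligned}
\mathcal{L}(\phi) &= \mathcal{L}_{\mathrm{sim}}\bigl(I_f,\, I_m \circ \phi\bigr) + \lambda\, \mathcal{L}_{\mathrm{reg}}(\phi), && \lambda = 1,\\
\lvert \mathcal{N}(p) \rvert &= n^3 = 3^3 = 27 && \text{candidate offsets per voxel } p.
\end{aligned}
```

    Under this reading, a larger n enlarges the range each attention step can capture but grows the candidate set cubically (n = 5 would give 125 offsets per voxel).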

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the strengths and weaknesses of the paper, I would recommend weak accept. The proposed method appears to be a significant improvement over existing registration methods and the evaluation is thorough and convincing. However, the authors could provide more discussion on the limitations and potential drawbacks of the proposed method and further analyze its computational efficiency. Additionally, the authors could include more discussion on the clinical relevance or impact of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel deep network structure for deformable image registration that draws inspiration from two previous algorithms, Deformer and Coordinate Translator (citations [4] and [17] in the manuscript). The proposed network structure is motivated by the authors' observation that low-resolution feature maps of the fixed and moving images correspond to multiple possible types or patterns of motion. The algorithm's performance was evaluated on two datasets, and the results show improved registration performance. Overall, the paper presents an interesting approach to deformable image registration using deep learning techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is inspired by two previously proposed algorithms, Deformer and Coordinate Translator (citations [4] and [17] in the manuscript). It uses a fusion strategy similar to that of [17] to combine the features extracted from the fixed and moving images and to weight a predefined map of displacements. However, the proposed method differs from [17] in two major aspects. First, the authors introduce the idea of multi-head attention from the Transformer architecture to account for the multiple possible motions that may occur between the images. Second, a learnable positional bias is introduced to make the use of the predefined displacements more flexible. Additionally, the authors use displacement vectors instead of absolute coordinates, and a competitive weighting module is proposed to combine the displacement fields from the multiple attention heads. The experimental results demonstrate that the proposed method outperforms several state-of-the-art algorithms, including [4] and [17], showing the effectiveness of the proposed modifications.
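    The mechanism described above (multi-head cross-attention over a predefined neighborhood of displacements, plus a learnable positional bias) can be illustrated with the minimal NumPy sketch below. It is not the authors' code, which lives in the SmileCode repository; the function names are hypothetical, and it assumes no learned query/key projections, zero padding at the volume border, scaled dot-product logits, a bias of shape (num_heads, n³), and a per-head displacement equal to the attention-weighted mean of the neighborhood offsets.

```python
import itertools

import numpy as np


def neighborhood_offsets(n=3):
    """All integer offsets (dz, dy, dx) in an n x n x n window centred on the voxel."""
    r = n // 2
    return np.array(list(itertools.product(range(-r, r + 1), repeat=3)))  # (n**3, 3)


def multihead_neighborhood_displacements(f_fix, f_mov, num_heads, n=3, pos_bias=None):
    """Hypothetical sketch: one displacement sub-field per attention head.

    f_fix, f_mov: (C, D, H, W) feature maps; C must be divisible by num_heads.
    pos_bias: optional (num_heads, n**3) positional bias added to the logits.
    Returns an array of shape (num_heads, 3, D, H, W) in voxel units.
    """
    C, D, H, W = f_fix.shape
    d = C // num_heads
    offsets = neighborhood_offsets(n)
    K, r = len(offsets), n // 2
    # Zero-pad the moving features so every neighbour exists at the border (assumption).
    mov = np.pad(f_mov, ((0, 0), (r, r), (r, r), (r, r)))
    # Moving features seen from each fixed voxel at every offset: (K, C, D, H, W).
    neigh = np.stack([mov[:, r + dz:r + dz + D, r + dy:r + dy + H, r + dx:r + dx + W]
                      for dz, dy, dx in offsets])
    # Split channels into heads: queries (h, d, ...), keys (K, h, d, ...).
    q = f_fix.reshape(num_heads, d, D, H, W)
    k = neigh.reshape(K, num_heads, d, D, H, W)
    # Scaled dot-product logits per head and neighbour: (h, K, D, H, W).
    logits = np.einsum("hdzyx,khdzyx->hkzyx", q, k) / np.sqrt(d)
    if pos_bias is not None:
        logits = logits + pos_bias[:, :, None, None, None]
    # Softmax over the neighbourhood axis (numerically stabilised).
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    # Each head's displacement = attention-weighted mean of the offset vectors.
    return np.einsum("hkzyx,kc->hczyx", attn, offsets.astype(float))
```

    Because each head normalizes its attention independently, different heads can settle on different offsets at the same voxel, which is how this sketch yields several candidate motions per voxel rather than one.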

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper proposes a Competitive Weighting Module (CWM) to combine the displacement fields from multiple attention heads. However, the module lacks proper motivation. The only input to CWM is the displacement vectors, which do not contain any information about image intensities or features. This suggests that CWM relies solely on the displacement vectors to judge their plausibility, which potentially limits its overall performance.
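    To make this concern concrete, here is a minimal sketch of a competitive weighting step whose only input is the stack of displacement sub-fields: each head is scored per voxel from its displacement vector, a softmax across heads turns the scores into competing weights, and the fused field is the weighted sum. The function name and the linear scorer (`w`, `b`) are hypothetical stand-ins for whatever layers the paper's CWM actually uses; the point is only that no image intensity or feature enters the weights.

```python
import numpy as np


def competitive_fuse(sub_fields, w, b=0.0):
    """Hypothetical competitive weighting of per-head displacement sub-fields.

    sub_fields: (num_heads, 3, D, H, W) displacements, e.g. one per attention head.
    w: (3,) scoring weights applied to each voxel's displacement vector; b: bias.
    Returns the fused (3, D, H, W) field and the (num_heads, D, H, W) weights.
    """
    # Score each head at each voxel from its displacement vector alone.
    scores = np.einsum("hczyx,c->hzyx", sub_fields, np.asarray(w, dtype=float)) + b
    # Softmax across heads: the heads compete for each voxel.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    fused = np.einsum("hzyx,hczyx->czyx", weights, sub_fields)
    return fused, weights
```

    With this design, two heads proposing opposite motions can only be arbitrated by properties of the vectors themselves (direction, magnitude), which is exactly the limitation the review points out.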

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Given that the authors plan to release the source code of this work, the results presented in the paper can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should consider addressing the limitation discussed above to further strengthen the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach, which combines the multi-head attention mechanism with the cross attention to output displacements, is a relatively new technique in the area of deformable image registration. The paper presents experimental results that demonstrate improved performance compared to existing similar methods. Overall, the paper represents a promising contribution to the field.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The majority of reviewers recommend acceptance based on the good mix of novelty, clear motivation, and experimental results. In summary, the paper makes a relevant contribution to decoupling feature extraction from motion prediction, building upon the Coordinate Translator (and related work). Reviewer #1 notes that, while the related work focuses on Transformer-based deep-learning networks, prior work that explored multiple modes of deformation, e.g. probabilistic registration (cf. https://doi.org/10.1016/j.media.2015.09.005, which also achieved >72% Dice on LPBA), was omitted. The discussion should therefore be extended accordingly. I recommend acceptance and encourage the authors to apply their method to other clinically relevant tasks (e.g. the Learn2Reg and BraTS-Reg challenges).




Author Feedback

We would like to thank the anonymous reviewers and meta-reviewer for the constructive comments. We will include more related work and discussion in the final version.


