
Authors

Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, Jinman Kim

Abstract

Image registration is a fundamental requirement for medical image analysis. Deep registration methods based on deep learning have been widely recognized for their capabilities to perform fast end-to-end registration. Many deep registration methods achieved state-of-the-art performance by performing coarse-to-fine registration, where multiple registration steps were iterated with cascaded networks. Recently, Non-Iterative Coarse-to-finE (NICE) registration methods have been proposed to perform coarse-to-fine registration in a single network and showed advantages in both registration accuracy and runtime. However, existing NICE registration methods mainly focus on deformable registration, while affine registration, a common prerequisite, is still reliant on time-consuming traditional optimization-based methods or extra affine registration networks. In addition, existing NICE registration methods are limited by the intrinsic locality of convolution operations. Transformers may address this limitation for their capabilities to capture long-range dependency, but the benefits of using transformers for NICE registration have not been explored. In this study, we propose a Non-Iterative Coarse-to-finE Transformer network (NICE-Trans) for image registration. Our NICE-Trans is the first deep registration method that (i) performs joint affine and deformable coarse-to-fine registration within a single network, and (ii) embeds transformers into a NICE registration framework to model long-range relevance between images. Extensive experiments with seven public datasets show that our NICE-Trans outperforms state-of-the-art registration methods on both registration accuracy and runtime.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_71

SharedIt: https://rdcu.be/dnwxo

Link to the code repository

https://github.com/MungoMeng/Registration-NICE-Trans

Link to the dataset(s)

https://adni.loni.usc.edu/

https://fcon_1000.projects.nitrc.org/indi/abide/

http://fcon_1000.projects.nitrc.org/indi/adhd200/

https://brain-development.org/ixi-dataset/

https://mindboggle.info/data.html

https://surfer.nmr.mgh.harvard.edu/fswiki/Buckner40Adni60Testing

https://www.loni.usc.edu/research/atlas_downloads


Reviews

Review #1

  • Please describe the contribution of the paper
    • The authors improve their previous registration approach by using a transformer instead of convolutions in the decoder.
    • The affine registration is predicted together with the deformable registration in one network.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is technically sound, well-written and easy to follow.
    • The motivation is clear.
    • The authors performed an extensive evaluation (including the negative Jacobian determinant!) and compared their method against several other deep-learning and conventional methods.
    • In an ablation study, the authors also explore the effect of replacing the conv layers with transformer blocks in the encoder, in the decoder, and in both, showing that in their architecture a transformer in the encoder does not seem to help.
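
For readers unfamiliar with the two evaluation measures named in this review, the following is a minimal sketch of how a Dice score and a negative-Jacobian-determinant fraction are commonly computed. It assumes a displacement field stored as a `(3, D, H, W)` NumPy array in voxel units; the function names and array layout are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice overlap of one label between two segmentation arrays."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label absent from both segmentations
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def negative_jacobian_fraction(disp):
    """Fraction of voxels where the mapping phi(x) = x + disp(x) folds.

    disp: array of shape (3, D, H, W), displacement in voxels (assumed layout).
    """
    # grads[i][j] approximates d disp_i / d x_j with finite differences.
    grads = [np.gradient(disp[i]) for i in range(3)]
    jac = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            # Jacobian of phi is the identity plus the displacement gradient.
            jac[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    det = np.linalg.det(jac)
    return float((det < 0).mean())
```

A fraction of zero means the finite-difference Jacobian is non-negative everywhere, which is usually read as the deformation being free of folding at the voxel scale.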
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper’s proposed changes to the architecture appear to be incremental in nature. While the use of a transformer has become a standard technique, the paper’s results only marginally outperform previous works. The presentation of this paper won’t be that different from the one from last year besides saying that there is this change of one conv layer to a trans-block. (The additional affine transformation is nice.)
    • The clinical relevance of an image registration method for intra-patient brain MR, and even more so for liver CT, is not clear. Please explain why such registration is helpful. For population studies, you could also just segment the structures and then compare volumes etc. Why do we need voxel-wise correspondences? It would be interesting to see if this method also works well for other applications (nothing for the rebuttal, of course, but in general).
    • Failure cases and limitations of the work aren’t discussed.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The answers given in this list seem to be mostly correct. However, the authors sometimes answer "not applicable", and I don’t understand why this point, for example, is not applicable:

    An analysis of situations in which the method failed. [Not Applicable]

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Is it possible to register images with a different field of view? In the current state, the images are aligned by center of mass initialization and then cropped into the same image size. This works nicely for brain MR images, however, for other applications it’s not that easy. Can the affine registration handle larger initial misalignments?
    • The authors write “These NICE registration methods show advantages in both registration accuracy and runtime, and thus are regarded as state-of-the-art” This is quite a strong claim. It was only shown that on the one task of inter-patient MR brain registration, the NICE registration performs slightly better than the methods for comparison.
    • The proposed method is only evaluated on one specific task which is in general fine. To claim state-of-the-art it might make sense to evaluate on more tasks especially also on intra-patient registration.

    The proposed architecture is quite similar to Eppenhof et al 2020 (https://ieeexplore.ieee.org/abstract/document/8902170). Of course they haven’t used transformers yet but also conv layers.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper differs only minimally from the paper presented at MICCAI last year. The results could be improved a little by adding a transformer block instead of the conv layer, but that was it. I see little added value in presenting this paper at MICCAI (again). Please don’t get me wrong. I think the work is good and we should continuously try to improve our methods, but the question is how often we publish these intermediate steps and what are interesting and relevant results for the community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    It’s a bad argument to say that everyone does it that way and that’s why they do it. It would be nice if the authors would take the review seriously and see it not as an attack but as a help to become even better.

    I don’t see any improvements to the manuscript in the rebuttal and therefore I will keep my score.



Review #2

  • Please describe the contribution of the paper

    The paper presents an extension of a non-iterative coarse to fine network for medical image registration. The methodological extension is two-fold. First, a transformer backbone has been added to the decoder part of the original approach and second the network optimises both affine and deformable transformations in a unified framework. The paper also presents an extensive validation on structural brain MR images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper are as follows:

    • The paper is well articulated, well written and easy to follow.
    • The validation is extensive, both in the number of datasets and in the alternative approaches that are evaluated.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The validation does not necessarily highlight the benefit of the proposed method.
    • The validation could have been made fairer in some aspects. For example, not all approaches used the same loss functions (e.g. different measures of similarity and regularisation terms), which makes it difficult to contrast approaches.
    • Lack of clinical context.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be released if accepted, the baseline method implementations are already available online, and the evaluation relies on public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    As mentioned above, the paper is well articulated and easy to follow, and the methodological choices are well justified and sound. The two main contributions, the combined affine and deformable scheme and the use of transformers, are justified by stating that affine registration is a key step to initialise deformable registration and that transformers make it possible to model long-range relationships between the input images. While I do not challenge these claims, I feel that the validation does not really highlight these contributions. Indeed, all brain images are here initially aligned using a centre of mass approach and they all roughly cover the same field of view. As a result, I suspect that all affine transformations will be close to identity. This is highlighted by the little impact that the affine step has on all evaluation measures. For example, there does not seem to be any catastrophic failure when using deformable only. Second, brains across individuals are relatively similar (when compared to other organs), and this diminishes the argument for needing to model long-range relationships between images, as convolutions across the different layers are likely to be sufficient. As a result, I found it difficult to assess whether the improvements in evaluation measures are due to the methodological extension or to the use of different measures of similarity (e.g. CC versus LNCC), the different regularisation terms, the task-specific hyper-parameter tuning of the proposed method (against default values for others) or the different network capacity of the evaluated approaches. I would thus encourage the authors to use datasets that really emphasize the added value of their contributions. One could for example consider images with different fields of view (e.g. spine imaging, VERSE dataset) and of organs that present greater differences across subjects (e.g. abdominal imaging).

    Lastly, as the differences in Dice between the approaches are relatively small, one could have discussed the potential impact of a 1-3% increase in Dice on a clinical application.

    Very minor notes:

    • I believe I_f should be P_f at the top of page 4
    • Section 2.2 (page 4). “features of the last L_d” should be L_a.
    • The CPU timings of the proposed approach between Table 1 (Nice-Trans) and Table 2 (Trans-decoder) are different.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main concern is that the validation does not really highlight the added value of the proposed method.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This contribution is based on the NICE (Non Iterative Coarse-to-finE) network, but replaces all convolutional blocks in the decoder arm with SWin blocks. Furthermore, the coarsest level predicts a set of affine parameters rather than a displacement field, making the network a joint affine+nonlinear registration network. The network is trained on ADNI+ABIDE+ADHD+IXI and tested on Mindboggle+Buckner+LPBA, and reaches state-of-the-art accuracy.
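
To make the "affine parameters rather than a displacement field" point concrete, here is a minimal sketch of how a predicted 3×4 affine matrix could be turned into a dense displacement field, so that it can be handled like the fields predicted at finer levels. The function name, the centring convention and the `(3, D, H, W)` layout are assumptions made for illustration; the paper's own implementation may differ.

```python
import numpy as np

def affine_to_displacement(affine, shape):
    """Dense displacement field induced by an affine map (illustrative helper).

    affine: (3, 4) matrix [A | t] applied to voxel coordinates measured
            from the volume centre (assumed convention).
    shape:  (D, H, W) of the target grid.
    Returns an array of shape (3, D, H, W) with displacements in voxels.
    """
    axes = [np.arange(s, dtype=float) for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"))       # (3, D, H, W)
    centre = (np.array(shape, dtype=float) - 1.0) / 2.0
    x = grid - centre[:, None, None, None]                    # centred coordinates
    A, t = affine[:, :3], affine[:, 3]
    mapped = np.tensordot(A, x, axes=([1], [0])) + t[:, None, None, None]
    return mapped - x                                         # displacement = phi(x) - x

# Identity affine gives a zero field; a pure translation gives a constant field.
identity = np.hstack([np.eye(3), np.zeros((3, 1))])
zero_field = affine_to_displacement(identity, (4, 5, 6))
```

Expressing the affine step as a displacement field is one common design choice because the same spatial-transformer warping code can then be reused for both the affine and the deformable outputs.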

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. It follows good practice and reproducibility guidelines and provides details about the architecture and hyperparameters that allow the network to be easily reimplemented. While the novelty of the proposed network is relatively low (it combines an established coarse-to-fine registration architecture with SWin blocks that have been popularized last year), it does reach state of the art accuracy, with a sound methodology and validation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed methodology is very incremental (plugging the latest fancy block into the most popular registration architecture). This critique is not specific to this paper, but I do wish the community moved away from chasing .01 Dice improvement on a relatively artificial (inter-subject brain registration) and simple (within-modality, heavily pre-processed images) task. It is unclear how much rotation is naturally present in the training and testing datasets, and whether any augmentation (artificial rigid and/or elastic transformations) has been used. It would be interesting to know the sensitivity of the affine component of the network to initial misalignment.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Does preprocessing include rigid and/or affine alignment? How much misalignment is there in the training set?
    • There are problems for which the affine component (and in particular scaling) is much more problematic than in brain registration (e.g. chest/abdomen). These tasks may be more interesting for a joint affine+nonlinear network
    • One issue I have with most coarse-to-fine architecture is that they tend to not recompute features in the decoder path. However, features are in general not rotation invariant (i.e. (K * x) o phi != K * (x o phi), where phi is more than a translation). When images are pre-aligned, it is not a big problem, but I wonder if it becomes one when images are initially rotated with respect to each other, as is the case here.
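
The relation (K * x) o phi != K * (x o phi) in the last comment can be checked numerically. The sketch below is a toy 2D illustration, assuming circular boundary conditions and a hand-picked gradient kernel (both are choices made here, not taken from the paper): filtering commutes with a translation but not with a 90-degree rotation.

```python
import numpy as np

def circ_filter(x, k):
    """Circular cross-correlation of a 2D image x with a small kernel k."""
    out = np.zeros_like(x, dtype=float)
    kh, kw = k.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll by -offset reads the neighbour at +offset (wrap-around).
            offset = (i - kh // 2, j - kw // 2)
            out += k[i, j] * np.roll(x, shift=(-offset[0], -offset[1]), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
# Horizontal gradient kernel: deliberately not rotation symmetric.
k = np.array([[0.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

def translate(im):
    return np.roll(im, shift=(3, -2), axis=(0, 1))

def rotate(im):
    return np.rot90(im)

# Filter-then-warp versus warp-then-filter.
commutes_translation = np.allclose(circ_filter(translate(x), k), translate(circ_filter(x, k)))
commutes_rotation = np.allclose(circ_filter(rotate(x), k), rotate(circ_filter(x, k)))
```

In this toy setting `commutes_translation` is true and `commutes_rotation` is false, which is the reviewer's concern in miniature: warping encoder features with a rotation is not the same as re-extracting features from the rotated image.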
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and easy to follow, and reaches SOTA results on the test datasets. However, in my opinion, it is quite incremental, which is why I give a “weak reject”.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal does not strongly sway my score. I do change it to 5 for consistency across similar papers that I have reviewed.

    Nonetheless, there is a consensus among reviewers that this paper (and other quite similar ones that I’ve reviewed) is frustrating because while the science is relatively sound, the methods are extremely incremental, are restricted to toy problems without real world application, and only yield small improvements above SOTA.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviews are quite borderline. The reviewers agree that the methodological contributions are marginal given the extensive literature on the topic. Similarly, the resulting improvements are minimal and lack context: is there a variance analysis or statistical testing? Were the hyperparameters adjusted while looking at the validation datasets? Are the fairly minimal improvements meaningful?

    I looked at the paper myself and agree with these comments. I think the authors need to explain very carefully the value of the contributions and results, and describe carefully their process for validating experimental results and their statistical significance.




Author Feedback

We thank the meta-reviewer (MR) and reviewers (R1/R2/R3) for the comments, and we present our responses (RE) as follows:

1: The proposed method is incremental (MR/R1/R3); similar to Eppenhof et al’s study (R1). RE: Our study identified two important findings that advance the image registration community, and we respectfully suggest that they are not incremental contributions. First, we propose to perform both affine and deformable coarse-to-fine registration with a single unified network, which shows the feasibility of unifying the two separate registration steps. This is important as it can free the time/resources needed by separate affine registration. Second, our integration of transformers into the NICE registration architecture revealed insights on how transformers facilitate registration. By embedding transformers at different network parts (Table 2), we found that transformers benefit registration in modeling inter-image relevance but not in exploring intra-image representations. In addition, we would like to differentiate our method from Eppenhof et al’s PTN. Apart from not using transformers, PTN also cannot realize NICE registration and joint affine registration.

2: Lack of context for inter-patient brain registration (MR/R1/R2); Evaluation on other tasks (R1/R2/R3). RE: In this study, we followed the commonly used registration evaluation procedure [7-9, 12-18, 20, 21 in the paper] and regarded inter-patient brain registration as the evaluation task. Inter-patient brain registration is a challenging task as there exist both large and subtle deformations between different scans (exemplified in Fig.2), and it is also a preferential evaluation task as it has been well-benchmarked by recent coarse-to-fine registration studies [12-18 in the paper]. We will evaluate our method on other tasks in our future study.

3: The improvements in DSC are small (MR/R2); statistical test (MR). RE: Our method was evaluated on highly-competitive, well-benchmarked registration datasets (Mindboggle/Buckner/LPBA) [12-18 in the paper], and the most recent SOTA methods published in MICCAI/MedIA [13, 14, 18, 20, 21 in the paper] were included into comparison. Under this challenging setup, small improvements are expected and our method still achieved a statistically significant improvement (P<0.05) in DSC. In addition to the DSC improvements, our method also provides convenience by jointly performing affine and deformable registration.
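
The rebuttal reports P<0.05 without naming the test. As one possible way to check a paired improvement in DSC, the sketch below runs an exact two-sided sign-flip permutation test on per-pair differences. The function name and the DSC values are invented purely for illustration; they are not results from the paper, and the authors may have used a different test.

```python
from itertools import product

def paired_sign_flip_p(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float round-off
            count += 1
    return count / total

# Toy per-pair DSC values, made up for this example (NOT from the paper).
dsc_new = [0.78, 0.81, 0.76, 0.80, 0.79, 0.83, 0.77, 0.82]
dsc_old = [0.77, 0.79, 0.75, 0.78, 0.78, 0.81, 0.76, 0.80]
p_value = paired_sign_flip_p(dsc_new, dsc_old)
```

Because the test is paired, a small but consistent gain across cases can be significant even when the mean difference is only one or two Dice points, which is why significance alone does not settle the reviewers' question about practical relevance.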

4: Evaluation is unfair; validation did not necessarily highlight the contributions (R2). RE: To ensure fairness in our evaluation, we used the official open-source codes with the same experimental settings for all comparison methods. Despite the various loss/regularization terms among comparison methods, all these methods were optimized for brain registration according to their papers. Regarding the concern that our datasets cannot highlight our contributions, we would like to suggest that brain registration is also a challenging evaluation task for affine registration (e.g., Mok et al.’s C2FViT in CVPR 2022), with considerable misalignments among brain MRIs (even after the center of mass initialization).

5: Misalignments in the dataset (R3); the sensitivity of affine registration (R1/R3). RE: We did not perform any data augmentation or rigid/affine registration during preprocessing. Therefore, the misalignments in the datasets are inherent among brain MRIs. Our experiments showed that our method worked well with the inherent misalignments (Table 1). The sensitivity of affine registration to different degrees of misalignments will be explored in our future study.

6: Other comments: hyperparameter analyses (MR), failure cases (R1), inappropriate statements (R1), and edit errors (R2). RE: Hyperparameter analyses were presented in the supplementary materials (Table S2/S3); Failure cases will be added/discussed in the supplementary materials; inappropriate statements and edit errors will be revised in the camera-ready version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper remains quite borderline. Overall, one reviewer raised their score but remained quite concerned about aspects of the paper.

    Overall the rebuttal reads along the lines of saying that many previous papers have carried on experiments in this manner. This is somewhat reasonable, but also carries quite a bit of risk, as there is much more noise than signal out there. Overall, I agree with R1’s post-rebuttal comment that while this is likely a borderline-accept paper, the reasoning in the process could use improvement.

    I really hope that the authors can take into account the main concern from the reviewers – where although this may pass a certain ‘correctness bar’, it would be great if the authors clarified their impact and contributions before the camera ready and in further work.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There are no major concerns with the paper; however, the technical contribution is deemed incremental by all reviewers. Compared to classical methods and SOTA DL-based brain registration tools, the improvements are rather small, the application is not clinically relevant, and the methodological differences to concurrent MICCAI submissions (all evaluated in part on the same brain datasets) are neither explained nor obvious. Both reviewers who rated at least two of those similar papers expressed their frustration about this, and I share their view. I recommend rejecting the paper, since the community does not directly benefit from a circle of papers that incrementally improve upon one another from year to year (or yield comparable scores for concurrent submissions).



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The novelty of the proposed method and its performance improvement are marginal compared to existing NICE methods. The rebuttal did not provide sufficient information to address reviewers’ detailed critiques, such as experimental settings, methodological details, and preprocessing.


