
Authors

Yi Qin, Xiaomeng Li

Abstract

Unsupervised deformable image registration is one of the challenging tasks in medical imaging. Obtaining a high-quality deformation field while preserving deformation topology remains demanding amid a series of deep-learning-based solutions. Meanwhile, the diffusion model’s latent feature space shows potential in modeling the deformation semantics. To fully exploit the diffusion model’s ability to guide the registration task, we present two modules: Feature-wise Diffusion-Guided Module (FDG) and Score-wise Diffusion-Guided Module (SDG). Specifically, FDG uses the diffusion model’s multi-scale semantic features to guide the generation of the deformation field. SDG uses the diffusion score to guide the optimization process for preserving deformation topology with barely any additional computation. Experimental results on the 3D medical cardiac image registration task validate our model’s ability to effectively provide refined deformation fields with preserved topology. Code is available at: https://github.com/xmed-lab/FSDiffReg.git.
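The abstract describes SDG only at a high level, and the author feedback further down names the resulting loss L_scoreNCC. As a rough illustration of what "using the diffusion score to guide the optimization" can look like, the sketch below reweights a normalized cross-correlation (NCC) loss with a per-voxel map derived from a score tensor. Everything specific here is an assumption, not the FSDiffReg implementation: the helper names `score_weight_map` and `weighted_ncc_loss` are hypothetical, the weight is simply the score magnitude rescaled to [1, 2], and a single global NCC is used where registration code usually computes a windowed (local) NCC.

```python
import numpy as np


def score_weight_map(score: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical weight map: per-voxel L2 norm of the score, rescaled to [1, 2].

    score has shape (C, D, H, W); the result has shape (D, H, W).
    This mapping is an illustrative assumption, not the paper's formula.
    """
    mag = np.linalg.norm(score, axis=0)
    mag = (mag - mag.min()) / (mag.max() - mag.min() + eps)
    return 1.0 + mag


def weighted_ncc_loss(fixed: np.ndarray, warped: np.ndarray,
                      weight: np.ndarray, eps: float = 1e-8) -> float:
    """Global weighted NCC loss, 1 - NCC, with voxel statistics weighted by `weight`."""
    w = weight / (weight.sum() + eps)          # normalize weights to sum to 1
    mu_f = (w * fixed).sum()
    mu_w = (w * warped).sum()
    df, dw = fixed - mu_f, warped - mu_w
    cov = (w * df * dw).sum()
    var_f = (w * df ** 2).sum()
    var_w = (w * dw ** 2).sum()
    ncc = cov / np.sqrt(var_f * var_w + eps)
    return float(1.0 - ncc)                    # 0 for identical images, 2 for inverted ones


rng = np.random.default_rng(0)
fixed = rng.normal(size=(8, 8, 8))
score = rng.normal(size=(1, 8, 8, 8))          # stand-in for a diffusion score of a 1-channel image
w = score_weight_map(score)
print(weighted_ncc_loss(fixed, fixed, w))      # close to 0
```

Voxels with a large score magnitude count more in the similarity term, which is one plausible reading of a "reweighting modifier"; the actual weighting in the paper may differ.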

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_62

SharedIt: https://rdcu.be/dnwxf

Link to the code repository

https://github.com/xmed-lab/FSDiffReg.git

Link to the dataset(s)

https://acdc.creatis.insa-lyon.fr/description/databases.html


Reviews

Review #3

  • Please describe the contribution of the paper

    The authors propose a novel solution for unsupervised deformable image registration in cardiac images. The proposed framework utilizes two modules: Feature-wise Diffusion-Guided Module (FDG) and Score-wise Diffusion-Guided Module (SDG). FDG guides the deformation field generation by utilizing multi-scale intermediate diffusion features, while SDG optimizes the deformation topology preservation using the diffusion score. The paper claims that extensive experiments show impressive improvements over all baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper proposes an improved solution, building on recent diffusion work, for unsupervised deformable image registration in cardiac images that utilizes two modules: Feature-wise Diffusion-Guided Module (FDG) and Score-wise Diffusion-Guided Module (SDG).
    • The proposed framework effectively guides deformation field generation by utilizing multi-scale intermediate diffusion features and optimizes deformation topology preservation using the diffusion score.
    • The paper claims that extensive experiments show impressive improvements over all baselines.
    • Potential impact: The proposed framework has potential applications in medical imaging and could improve the accuracy of deformable image registration.
    • The paper is well-organized and clearly presents the proposed framework, experimental setup, and results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The use of multiple “-wise” schemes, such as feature-wise and score-wise guidance, is not uncommon, and the authors have essentially combined them within a diffusion model. Therefore, the novelty of this approach is debatable.
    • The authors’ method may be sensitive to the weight of L_scoreNCC, and this aspect is not discussed.
    • It is unclear whether the method would be effective for other types of medical images, and there is no discussion of next steps for future research.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is average, with some missing details, but the authors state that they will provide the training code and testing models, which seems promising.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Beyond the accuracy improvements brought by the two “-wise” modules, it would be more clinically and scientifically relevant, and more interesting, to investigate whether they affect generalization.
    • Appropriate statistical analysis, taking into account the sample size used in the experiments, would increase confidence in the reported results.
    • Both the set of compared methods and the experimental data are relatively small, so the claims lack strong support.
    • It would be better to include some discussion of how the diffusion model’s latent feature space helps model deformation semantics.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper has some drawbacks and is not very novel, but its approach is reasonable. I believe the advantages outweigh the disadvantages to some extent.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    FDG uses the diffusion decoder to generate latent features that help predict the deformation at multiple scales. SDG uses the diffusion score to guide the optimization process to preserve deformation topology.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have introduced new ways to utilize information from diffusion models, exploiting both the features and the score to guide multi-scale DVF estimation and weight adjustment. While this rationale and these design choices warrant further justification, I applaud the effort.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are many encoder-decoder schemes that can generate multi-scale features, and there are also many attention schemes. While the authors have empirically illustrated that (f-m) and the diffusion latent generally highlight similar spatial locations, it remains to be rationalized (1) why these regions are the performance bottleneck in registration, since (f-m) depends on both the motion magnitude and the local intensity gradient: while the former presents a challenge, the latter typically drives the deformation estimation favorably; and (2) why the score function, which is the gradient of the log pdf, should act as attention in registration, since it does not naturally lend itself to that role. I cannot help but wonder whether weighting w.r.t. the score is preferable to alternative attention maps.
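To make point (2) concrete: in DDPM-style models the "score" is not learned directly but is read off the noise predictor through the standard identity ∇ log q(x_t) ≈ -ε̂(x_t, t) / sqrt(1 - ᾱ_t). The sketch below checks that identity on a toy 1D Gaussian case where the exact score is known in closed form. Whether FSDiffReg obtains its score exactly this way is an assumption; the helper `score_from_eps` is a hypothetical name.

```python
import numpy as np


def score_from_eps(eps_hat: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Standard DDPM identity: grad_x log q(x_t) is approximated by -eps_hat / sqrt(1 - alpha_bar_t)."""
    return -eps_hat / np.sqrt(1.0 - alpha_bar)


# Toy case with closed-form answers:
# x_0 ~ N(0, 1) and x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, so x_t ~ N(0, 1).
alpha_bar = 0.3
x_t = np.linspace(-3.0, 3.0, 7)

# Here eps and x_t are jointly Gaussian with Cov(eps, x_t) = sqrt(1 - alpha_bar) and Var(x_t) = 1,
# so the optimal (MSE) noise predictor is E[eps | x_t] = sqrt(1 - alpha_bar) * x_t.
eps_opt = np.sqrt(1.0 - alpha_bar) * x_t

exact_score = -x_t  # d/dx log N(x; 0, 1) = -x
print(np.allclose(score_from_eps(eps_opt, alpha_bar), exact_score))  # True
```

The score is a gradient field pointing toward higher data density, which is why using it as a spatial weighting (rather than, say, a learned attention map) is a design choice that benefits from justification.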

    Similarly, while the diffusion features have flavors of multi-scale, they are not the only ones. What about the alternatives?

    In terms of results, it is very surprising that the ablation variant without either FDG or SDG (Table 2, first row), which effectively removes the major improvements, still outperforms all the other SOTA benchmarks.

    What is the justification for averaging the deformation fields derived at the different feature levels?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    OK

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please support your choice of using the diffusion as the backbone for generating the multi-scale feature and the reweighting modifier S. What about simpler alternatives such as VAE for the former, and attention maps for the latter?

    Please clarify why your barebone network (without FDG or SDG) performs better than the benchmarks in terms of DSC. What was done differently that contributes to this performance?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The ideas are generally sound but could use more justification. The experiments and reporting are appropriate. There is some skepticism about the design choices and the fairness of the comparison that could use more clarification.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors present a novel diffusion model for registration. The novelty lies in extending previous work with feature- and score-based diffusion-guided modules (FDG and SDG), which provide improved guidance focused on significant image regions and preserve the topology of the computed deformation vector fields.

    The approach is validated in an ablation study and compared with other state-of-the-art methods for cardiac image registration. The results show improved Dice scores and more realistic deformations, as reflected by a lower number of negative Jacobian determinants.
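For readers unfamiliar with how Dice is used in registration: it is usually computed per anatomical structure between the warped moving segmentation and the fixed segmentation, then averaged over structures. A minimal sketch, assuming integer label maps; `dice` and `mean_dice` are hypothetical helper names, and returning 1.0 when a label is absent from both maps is one convention among several.

```python
import numpy as np


def dice(seg_a: np.ndarray, seg_b: np.ndarray, label: int) -> float:
    """Dice overlap 2|A and B| / (|A| + |B|) for one label in two integer label maps."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # label absent from both maps: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def mean_dice(seg_a: np.ndarray, seg_b: np.ndarray, labels) -> float:
    """Average Dice over the given structure labels."""
    return float(np.mean([dice(seg_a, seg_b, lab) for lab in labels]))


# Two toy 3D label maps whose label-1 regions overlap by half.
a = np.zeros((4, 4, 4), dtype=int)
a[:2] = 1
b = np.zeros((4, 4, 4), dtype=int)
b[1:3] = 1
print(dice(a, b, 1))  # 0.5
```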

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel and original extension of diffusion-guided image registration.

    • Presentation of two complementary techniques (FDG and SDG), each of which yields an improvement. The advantage of combining the two methods is demonstrated in an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The experiments are somewhat limited to registration of the heart with only minor changes in the morphology and appearance of the structures.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code, data, and models will be published in full, and the hyperparameters for training will be given in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • In part 2.1, the loss L_register is mentioned but not defined. Please include an explanation of L_register in the paper.

    • According to Eq. (2), F_R^i consists of the feature maps from the previous step (i-1). However, in Figure 2, it looks like the feature maps from the current step (i) are concatenated because the same index i is used in the feature map in the feature-wise diffusion-guided module and in the three feature maps in the registration decoder. Please clarify this in the paper.

    • Also, how is the first feature map F_R created?

    • It is unclear what value for gamma was used in the experiments. Please clarify this in the paper.

    • The discussion of the results is somewhat brief. In particular, I would appreciate it if you could elaborate on the significance of the improvements obtained and on the differences shown in Figure 3, as they are not very clear.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • New, interesting, and original method; novel diffusion model
    • Results on only a single dataset
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers all found the paper sufficiently novel, with promising experimental results, to recommend acceptance. The proposed attention guidance appears to strike a good balance between improving accuracy and reducing the complexity of the estimated deformations. The adaptations for diffusion-based 3D image registration are evaluated on only a single dataset, and extending the validation in future work (e.g., on the Learn2Reg challenge) is recommended. I encourage the authors to incorporate the reviewers’ minor suggestions in the final version and also to report the standard deviation of the Jacobian determinants, because reporting only the fraction of negative values may place heavy emphasis on a few outliers.
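Both statistics the meta-review mentions come from the same per-voxel quantity, det(I + ∇u) for a displacement field u. The fraction of negative determinants counts only folded voxels, while the standard deviation summarizes how much local volume change varies across the whole field. A minimal sketch, assuming a displacement field of shape (3, D, H, W) in voxel units and central differences via `np.gradient` (other works use forward differences, which gives slightly different numbers); the helper names are hypothetical.

```python
import numpy as np


def jacobian_determinant(disp: np.ndarray) -> np.ndarray:
    """det(I + grad u) per voxel for a displacement field of shape (3, D, H, W)."""
    # grads[i, j] = d u_i / d x_j, estimated with central differences.
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0) for i in range(3)], axis=0)
    jac = grads + np.eye(3)[:, :, None, None, None]     # J = I + grad(u), shape (3, 3, D, H, W)
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))  # det over trailing (3, 3) axes


def jacobian_stats(disp: np.ndarray) -> dict:
    """Fraction of negative determinants (folding) and their standard deviation."""
    det = jacobian_determinant(disp)
    return {"frac_neg": float((det < 0).mean()), "std": float(det.std())}


# Identity transform: det = 1 everywhere, so no folding and zero spread.
print(jacobian_stats(np.zeros((3, 6, 6, 6))))
```

A field can have almost no negative determinants yet a large determinant spread (strong local compression and expansion), which is the situation the meta-review's request guards against.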




Author Feedback

We appreciate all the valuable comments from the reviewers and the AC. Overall, the reviewers found our method novel and original (R1) and our idea generally sound (R2). R3 acknowledged that our paper is well organized, with a clear framework, experimental setup, and results, and highlighted its potential contributions to the deformable image registration community. The main concerns raised by the reviewers relate to the standard deviation of Jacobian determinants (AC), results on additional datasets (R1), design choices (R2), and statistical analysis (R3). Below, we address these concerns.

->[AC] Report the SD of Jacobian determinants. The SDs of the Jacobian determinants are 0.176 (Ours), 0.178 (DiffuseMorph), 0.183 (VM), and 0.182 (VM-Diff).

->[R1] The definition of L_register and hyperparameter gamma. L_register is the same as L_scoreNCC, and gamma was selected as 1 in our experiments.

->[R1] The generating source of feature maps F_R. The F_R^i is generated from F_R^{i-1} (previous step i-1), F_E^i, and F_G^i (current step i), and the first feature map F_R^0 is the direct output z of the encoder.
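The answer above fixes which inputs feed F_R^i but not how they are combined. The sketch below shows one common way such a decoder step is built, under loudly stated assumptions: F_R^{i-1} sits at half the resolution of level i and is upsampled by nearest-neighbour repetition, the three maps are concatenated along the channel axis, and a 1×1×1 convolution (written as a matrix product) plus ReLU stands in for the learned layers. All shapes and the helper names `upsample2x` and `next_register_feature` are hypothetical.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, D, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)


def next_register_feature(f_r_prev, f_e, f_g, weight):
    """Hypothetical step: F_R^i = ReLU(conv1x1(concat(up(F_R^{i-1}), F_E^i, F_G^i))).

    weight: (C_out, C_in) matrix standing in for a learned 1x1x1 convolution.
    """
    stacked = np.concatenate([upsample2x(f_r_prev), f_e, f_g], axis=0)  # channel concatenation
    out = np.einsum("oc,cdhw->odhw", weight, stacked)                    # per-voxel channel mixing
    return np.maximum(out, 0.0)                                          # ReLU


rng = np.random.default_rng(0)
z = rng.normal(size=(16, 4, 4, 4))     # F_R^0: the encoder output z, per the answer above
f_e1 = rng.normal(size=(8, 8, 8, 8))   # F_E^1: encoder feature at level 1 (assumed shape)
f_g1 = rng.normal(size=(8, 8, 8, 8))   # F_G^1: diffusion-decoder feature at level 1 (assumed shape)
w1 = 0.1 * rng.normal(size=(8, 16 + 8 + 8))
f_r1 = next_register_feature(z, f_e1, f_g1, w1)
print(f_r1.shape)  # (8, 8, 8, 8)
```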

->[R1] The significance of the improvements and visual differences of the results. The major improvements observed include better alignment of the myocardium and an overall visual similarity to the fixed target image. We will enhance the clarity of these improvements in Figure 3.

->[R2] The rationale for using the diffusion model. While alternatives such as VAEs do exist, the diffusion model is better suited to this task because 1) its training target is simpler than those of the alternatives while its learned semantics remain rich, making it suitable for tasks with limited available data, and 2) prior work, such as DiffuseMorph, has demonstrated that the diffusion model can capture more meaningful deformation semantics than other generative methods.
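The "simpler training target" refers to the standard noise-prediction objective of denoising diffusion models: corrupt x_0 to x_t in closed form, then regress the added noise with a plain MSE. A minimal sketch, assuming the common linear β schedule from the original DDPM paper (1e-4 to 0.02 over 1000 steps); whether FSDiffReg uses this schedule is not stated here, the denoiser's conditioning on the image pair is omitted, and `linear_alpha_bar` and `ddpm_loss` are hypothetical names.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ddpm_loss(x0, t, eps_model, alpha_bar, rng) -> float:
    """Simplified DDPM objective: mean of (eps - eps_model(x_t, t))^2."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps  # closed-form forward noising
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))


rng = np.random.default_rng(0)
ab = linear_alpha_bar()
x0 = rng.normal(size=(32, 32, 32))
t = 500

# An "oracle" that knows x0 recovers the noise exactly, so its loss is ~0;
# a model that always predicts zero noise scores ~1 (the variance of eps).
oracle = lambda x_t, step: (x_t - np.sqrt(ab[step]) * x0) / np.sqrt(1.0 - ab[step])
zero_model = lambda x_t, step: np.zeros_like(x_t)
print(ddpm_loss(x0, t, oracle, ab, rng), ddpm_loss(x0, t, zero_model, ab, rng))
```

Compared with a VAE, there is no encoder posterior, KL term, or balancing weight to tune; the target is a single regression, which is the sense in which the authors call it simpler.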

->[R2] The rationale for feature-level deformation field fusion. Features obtained at different levels of the network attend to information at different scales and granularities. A commonly adopted way to merge the resulting deformation fields is to average them. In our ablation study, we evaluated deriving the deformation field directly (without FDG), and the results were inferior to those of our proposed method.
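Averaging fields predicted at different levels first requires bringing them to a common grid. A minimal sketch, assuming displacement fields of shape (3, D, H, W) in voxel units of their own grid, isotropic integer downsampling factors, and nearest-neighbour upsampling for brevity (trilinear interpolation is more typical). Rescaling the displacement values by the upsampling factor is also an assumption about how the levels are made comparable; `upsample_field` and `fuse_fields` are hypothetical names.

```python
import numpy as np


def upsample_field(disp: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (3, D, H, W) displacement field.

    Displacements are measured in voxels of their own grid, so they are scaled
    by `factor` when expressed on the finer grid.
    """
    up = disp.repeat(factor, axis=1).repeat(factor, axis=2).repeat(factor, axis=3)
    return up * factor


def fuse_fields(fields, full_shape):
    """Bring every level's field to full resolution, then take the voxel-wise average."""
    resampled = [upsample_field(f, full_shape[0] // f.shape[1]) for f in fields]
    return np.mean(resampled, axis=0)


# The same 1-voxel shift expressed at full, half, and quarter resolution.
full = np.ones((3, 8, 8, 8))
half = np.full((3, 4, 4, 4), 0.5)
quarter = np.full((3, 2, 2, 2), 0.25)
fused = fuse_fields([full, half, quarter], (8, 8, 8))
print(fused.shape, float(fused.mean()))  # (3, 8, 8, 8) 1.0
```

Plain averaging treats every level as equally reliable; learned or confidence-based weights are an alternative, which is essentially what R2's question probes.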

->[R2] The meaning of f-m in Figure 1. The (f-m) figure is used to empirically demonstrate the areas where large deformations are likely to occur. Although (f-m) does reflect the local intensity gradient, these differences are more likely to correspond to linear deformations. Beyond these primary illustrations, our method also aims to eliminate non-linear deformations.
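R2's objection and the answer above both rest on a first-order Taylor expansion: if f(x) = m(x + u(x)) for a small displacement u, then f(x) - m(x) ≈ m'(x) u(x), so the difference image mixes motion magnitude with the local intensity gradient. A minimal 1D toy check, assuming a sinusoidal "moving image" and a constant shift; it illustrates the expansion only, not the paper's data.

```python
import numpy as np

# 1D toy: moving image m(x) = sin(x); fixed image f(x) = m(x + u) for a small constant shift u.
x = np.linspace(0.0, 2.0 * np.pi, 200)
u = 0.05
m = np.sin(x)
f = np.sin(x + u)

diff = f - m                  # the (f - m) map
first_order = np.cos(x) * u   # m'(x) * u: intensity gradient times displacement

# The residual of the first-order approximation is O(u^2).
print(np.max(np.abs(diff - first_order)))

# Where the gradient vanishes (x near pi/2), f - m is close to 0 even though every point moved by u.
idx = int(np.argmin(np.abs(x - np.pi / 2)))
print(diff[idx])
```

Flat regions therefore look "unchanged" in (f-m) regardless of how far they moved, which is why (f-m) alone is an imperfect proxy for where large deformations occur.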

->[R2] Why were the results without the proposed FDG or SDG better than those of the baseline methods? The variant trained without FDG still uses the denoising diffusion decoder but generates the deformation field directly from the encoded features. Therefore, the image-pair features encoded under denoising diffusion training still carry strong semantics. However, the results without FDG or SDG showed only marginal improvement over the baselines, which indicates the importance of feature-level deformation field generation and the reweighting scheme.

->[R3] The weight of L_scoreNCC. The weight of L_scoreNCC is empirically set to 20 in our experiments, and we will add a quantitative plot for a sensitivity analysis.

->[R3] The statistical analysis of the results. The results of paired t-test and Wilcoxon signed rank test all showed p<0.005, which indicates significant statistical differences between the results of our method and all baseline methods. We will add this result to the final version.
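Both tests are paired: they compare per-case metrics (e.g., Dice) of two methods on the same test cases, in the same order. The sketch below shows how such tests are typically run with SciPy's `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`. The data are synthetic and only illustrate the mechanics; they are not the paper's results, and `compare_paired` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def compare_paired(ours, baseline) -> dict:
    """Paired t-test and Wilcoxon signed-rank test on per-case metrics of two methods."""
    ours, baseline = np.asarray(ours, dtype=float), np.asarray(baseline, dtype=float)
    t_res = stats.ttest_rel(ours, baseline)   # assumes roughly normal paired differences
    w_res = stats.wilcoxon(ours, baseline)    # rank-based, no normality assumption
    return {"t_pvalue": float(t_res.pvalue), "wilcoxon_pvalue": float(w_res.pvalue)}


# Synthetic per-case scores for 50 test pairs (illustrative only).
rng = np.random.default_rng(0)
baseline = rng.uniform(0.70, 0.85, size=50)
ours = baseline + rng.normal(0.02, 0.01, size=50)
print(compare_paired(ours, baseline))
```

Reporting both tests, as the authors do, is a common safeguard: the t-test is more powerful when its normality assumption holds, and the Wilcoxon test remains valid when it does not.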

->[R3] The generalization of our method.
Our proposed work models non-linear deformation semantics using the diffusion model, so it is reasonable to expect it to generalize to other registration tasks and image types. We will discuss this as a future direction in the conclusion.


