Authors

Baochang Zhang, Shahrooz Faghihroohi, Mohammad Farid Azampour, Shuting Liu, Reza Ghotbi, Heribert Schunkert, Nassir Navab

Abstract

The accurate estimation of X-ray source pose in relation to pre-operative images is crucial for minimally invasive procedures. However, existing deep learning-based automatic registration methods often have one or more limitations, including heavy reliance on subsequent conventional refinement steps, requiring manual annotation for training, or ignoring the patient’s anatomical specificity. To address these limitations, we propose a patient-specific and self-supervised end-to-end framework. Our approach utilizes the patient’s preoperative CT to generate simulated X-rays that include patient-specific information. We propose a self-supervised regression neural network trained on the simulated patient-specific X-rays to predict the six-degrees-of-freedom pose of the X-ray source. In our proposed network, a regularized autoencoder and a multi-head self-attention mechanism are employed to encourage the model to automatically capture patient-specific salient information that supports accurate pose estimation, and an incremental learning strategy is adopted for network training to avoid over-fitting and promote network performance. Meanwhile, a novel refinement model is proposed, which provides a way to obtain gradients with respect to the pose parameters to further refine the pose predicted by the regression network. Our method achieves a mean projection distance of 3.01mm with a success rate of 100% on simulated X-rays, and a mean projection distance of 1.55mm on X-rays. The code is available at github.com/BaochangZhang/PSSS_registration

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_49

SharedIt: https://rdcu.be/dnwPu

Link to the code repository

https://github.com/BaochangZhang/PSSS_registration

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an X-ray/CT registration approach with an autoencoder-based network that helps supervise the learning process with gradient information, followed by an attention block. The result is then refined over 100 iterations to find the pose parameters that generate the DRR image closest to the input image.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main contribution of this paper is the refinement model, which is based on generating DRR images via DeepDRR and then iteratively optimizing the pose parameters with an NCC loss. The refinement method proves to contribute most of the performance gain over simple pose regression models with off-the-shelf backbone networks. This paper also claims contributions including self-supervision, a regression model with an autoencoder, multi-head self-attention, and an incremental learning strategy, but there is a lack of evidence for the novelty and importance of these claims.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- In the introduction, this paper emphasizes that one of the drawbacks of the existing method [2] is that “the final performance of this automatic registration method still relies on conventional refinement methods, which limits the computational efficiency of deep learning-based registration” and claims that the proposed method “overcome the aforementioned limitations”. In fact, the proposed method also relies heavily on a refinement method, from which the majority of the computational cost and performance gain comes. The method in [2] is even faster than the proposed method.
- There is no direct comparison against any of the 2D/3D medical image registration methods listed in the introduction. This paper only compares the proposed method against simple regression approaches with off-the-shelf backbones. There is no evidence that the proposed method is superior to existing methods.
- One of the major contributions listed in this paper, “self-supervision”, which means using synthetic DRR images as training data, has been widely used in 2D/3D medical image registration since deep learning was first introduced in this field. This simply cannot be listed as a major contribution of this paper.
- The other major contribution: “regularized autoencoder and multi-head self-attention mechanism are embedded to encourage the model to capture patient-specific salient information automatically, therefore improving the robustness of registration”. To my understanding, the model is trained only on each individual patient’s CT model and then tested on the same CT model, or on X-ray images from the same patient. In short, the model is trained only on an individual patient. How can the proposed components encourage the model to capture even more specificity when the model is 100% trained on that specific patient? With the training and testing setup of this paper, this claim is quite absurd.
- What is the necessity of the “Incremental Learning Strategy” when you can simply generate more data at very little extra cost?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility is ok since most of the details are included in this paper. The code and data will not be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

In the paper, the authors should emphasize the main advantages of the proposed method and prove them with strong reasoning and evidence. For example, if the proposed method is not significantly faster and less dependent on refinement models, then the authors should not make such a claim. Also, strong reasoning and evidence should be presented through rich and solid experiments that prove the claims.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

2
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents an approach with several components to accomplish 2D/3D image registration. The claimed contributions are either not significant enough to be claimed as contributions, or lack the solid reasoning and evidence needed to prove their importance. These major weaknesses need significantly more experiments and writing, and cannot simply be addressed during rebuttal. Therefore I recommend rejection of this paper.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

3
[Post rebuttal] Please justify your decision
1. Comparisons: The added experiments demonstrate the effectiveness of the proposed method over the method in [2].
2. Refinement method: I agree that the wording makes it confusing; the authors should modify it accordingly in the final draft.
3. Contributions: a) I don’t agree that “self-supervision” can be listed as a major contribution in the broad 2D/3D medical image registration domain. There are existing 2D/3D medical image registration methods in other applications that rely heavily on synthetic data for training, and employ some additional labeled data to refine and improve the performance. Perhaps a more proper description is that, in this specific application and problem, the proposed automatic training without manual labels is sufficient. Again, the authors should use more precise wording and avoid making broad and confusing claims. b) It is not clear how “incremental learning” is better at avoiding overfitting than generating more data, since the training is patient-specific, i.e. training and testing are done on the same patient. If the claimed contribution is true, it should be backed with experimental evidence, i.e. a generalization comparison between “incremental training” and training with more data.
In summary, the authors addressed some concerns about the paper, but there are still major weaknesses that have not been properly addressed. Therefore I recommend rejection of this paper.

Review #2

Please describe the contribution of the paper

This is a nice research topic: the paper proposes a robust rigid 3D/2D registration of pre-operative CT images against intra-operative calibrated X-rays. The method relies on two steps: an automated initialization step and a refinement step. The automated initialization step is performed using a CNN-based regression network trained on a subject-specific training dataset derived solely by data augmentation from the pre-op scan of the patient. The regression CNN embeds state-of-the-art techniques: the latent space of a regularized autoencoder is combined with an attention mechanism to regress the 6-DoF parameters of the rigid transformation. Then, for the refinement step, a 3D/2D optimization is performed using a recent technique with differentiable DRR generation (i.e., the optimization is guided by the derivative, in contrast to conventional derivative-free optimization). The evaluation used 6 datasets with virtual (DRR) X-rays and one dataset with real fluoroscopic images (phantom).
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

-Originality of using the scan of the patient to obtain a patient-specific CNN (which addresses the issue of collecting a huge dataset), even if this was already proposed (but in another context)

-The combination of recent techniques: attention mechanisms for the init. step + Deep DRR for the fast refinement step

-The reached performances are interesting, method seems robust
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

-The real X-rays used for the validation were X-rays of a pelvis phantom in which only the skin is included in the soft tissues (and these are homogeneous). As a result, no organs are present, and the authors cannot claim that the system is transferable to real fluoroscopic images (which involve noise, the presence of surgical tools, wires, low contrast, superimposition of various soft tissues, gas bubbles, etc.). In summary, the virtual X-ray (DRR) generated from the phantom CT scan and the real X-ray of the phantom are far more similar to each other than an X-ray of a real patient is to a virtual X-ray (DRR) generated from that patient’s CT scan. Generally, performance drops due to the image style difference between DRRs and real X-rays.

-Perhaps it would be useful to list some additional constraints on using the method: what is the tolerance on the overlap of the fields of view of the CT and the fluoroscopy? Should the fields of view be approximately the same, or should that of the CT be larger?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Sufficient information is provided to reproduce the experiments, for instance using a public CT dataset.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

In 2.2 refinement model:

“The ZNCC is then used as the loss function to minimize the distance between the fixed image and the moving image. With the powerful PyTorch auto-grad engine, the six pose parameters are learned iteratively. For each refinement process, the refinement model is online trained for 100 iterations using an Adam optimizer with a learning rate of 5.0 for translational parameters and 0.05 for rotational parameters, and outputs the pose with the minimal loss score.”

This is no longer training but a derivative-guided optimization to maximize the image similarity. Suggestions:

“the six pose parameters are learned iteratively” –> “the six pose parameters are optimized iteratively using Adam optimizer…”

“the refinement model is online trained for 100 iterations” –> the optimal refinement is found using 100 iterations…
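The optimization described in the quoted passage can be illustrated with a minimal sketch. It is not the authors' implementation: the image loss (ZNCC between the X-ray and a differentiable DRR) is replaced here by a hypothetical toy loss, the squared distance to a hidden target pose, so that the gradient can be written by hand instead of coming from PyTorch autograd. The ordering of the pose vector (three translations, then three rotations) and the function names are assumptions. Only the 100 iterations, the Adam update, the two learning rates (5.0 and 0.05) and the "keep the pose with minimal loss" rule come from the quoted text.

```python
import math


def toy_loss_and_grad(pose, target):
    """Hypothetical stand-in for the image loss: squared distance to a target pose."""
    diff = [p - t for p, t in zip(pose, target)]
    loss = sum(d * d for d in diff)
    grad = [2.0 * d for d in diff]  # analytic gradient of the toy loss
    return loss, grad


def refine_pose(init_pose, target, iters=100, lr_trans=5.0, lr_rot=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """Optimize six pose parameters with Adam and return the lowest-loss pose seen.

    Assumed layout: indices 0-2 are translations, indices 3-5 are rotations.
    """
    pose = list(init_pose)
    lrs = [lr_trans] * 3 + [lr_rot] * 3  # one learning rate per parameter group
    m = [0.0] * 6  # first-moment estimates
    v = [0.0] * 6  # second-moment estimates
    best_pose, best_loss = list(pose), float("inf")
    for step in range(1, iters + 1):
        loss, grad = toy_loss_and_grad(pose, target)
        if loss < best_loss:  # remember the pose with the minimal loss so far
            best_loss, best_pose = loss, list(pose)
        for i in range(6):
            m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** step)  # bias-corrected moments
            v_hat = v[i] / (1.0 - beta2 ** step)
            pose[i] -= lrs[i] * m_hat / (math.sqrt(v_hat) + eps)
    return best_pose, best_loss
```

Because no network weights are updated and only the six pose values change, this is an optimization rather than training, which is the point of the wording suggestions above.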

“The quantitative and qualitative evaluation results of our proposed method illustrate its superiority and its ability to generalize to real X-rays even when trained solely on DRRs.” –> It is unclear which method it is compared against to conclude superiority.

It would be nice to elaborate on the applicability to real X-rays of real patients (and to anticipate issues and/or propose solutions).

typo: “our proposed method illustrate” –> illustrates

typo: 3.1 Datasets: “Our method on is” –> remove “on”
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Good paper that combines recent state-of-the-art techniques, the supplementary video is appreciated to summarize the method and results.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

8
[Post rebuttal] Please justify your decision

The authors adequately answered the reviewers’ comments.

The final version should mention “domain randomization” as a perspective for reducing the gap between DRR and real X-ray images, which could allow methods validated on phantoms to reach the same range of performance on real X-rays.

Review #3

Please describe the contribution of the paper

The paper proposed a pipeline of 2D-3D registration between X-ray and CT. The pipeline includes a regression network to predict projection pose which is trained by an incremental learning strategy, and then the predicted pose is used to initialize a follow-up refinement step by finding optimal pose based on the NCC loss.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The proposed 2D-3D registration pipeline is novel by reasonably comprising two steps for a coarse-to-fine estimation of projection pose;
2. The training strategy for the 1st-stage pose estimation is based on an incremental learning strategy to avoid overfitting;
3. A similarity metric loss (ZNCC) is proposed to constrain the features extracted by the encoder to contain the necessary structural information;
4. An online refinement model of pose estimation from 1st-stage was proposed to generate fast DRRs and auto-grad for learning pose parameters iteratively.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. According to the results shown in Table 2, the proposed refinement model plays a very important role in optimizing the pose parameters, and the paper lacks a comparison of the proposed pipeline with the refinement model initialized by another method to demonstrate the contribution of the regression neural network.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Clear pipeline introduction, although the introduction of the proposed refinement model is somewhat too brief to follow its performance; clear delineation of experiments, including the DRR and X-ray cases.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

A nice paper including technical innovation and comprehensive experiments, and it tried solving a clinical issue e.g. spine intra-operative alignment between CT and X-ray images for navigation purpose, and the results seem to be promising.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. Novel design of the pipeline to predict projection pose in a coarse-to-fine strategy;
2. Several innovative points about pose regression network framework in the coarse stage, including a training strategy to avoid potential over-fitting;
3. An online refinement model of pose estimation from the coarse stage was proposed to generate fast DRRs and auto-grad for learning pose parameters iteratively, which is very important for performance according to the results in Table 2;
4. The demonstration of the effectiveness of the coarse pose estimation stage through comparison with other methods is not very comprehensive.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

The issues were well explained.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper introduces a patient-specific, self-supervised model for automatic X-ray/CT registration, leveraging a unique combination of an autoencoder-based network, attention block, and a two-step pose refinement process. Two reviewers provided a positive review of this work. However, there are some important questions raised as well. Therefore the authors should be given a chance to address the major points before this work can be accepted. Specifically, about the weakness points raised by the reviewers summarized below.

- Overreliance on refinement methods, which contradicts the paper’s claim of overcoming such dependency.
- Absence of direct comparison against any existing 2D/3D medical image registration methods.
- Evaluation using real X-ray images of a pelvis phantom is not representative of actual patient images with varied internal structures and potential artifacts.

Author Feedback

We appreciate all (meta-) reviewers’ (MR,R1,2,3) constructive comments. They found our approach unique (MR), interesting and novel (R2,3) with comprehensive experiments (R3), addressing an impactful clinical topic (R2,3) and achieving robust (R2) and promising (R3) performance. All (R1,2,3) recognized the importance and novelty of the proposed refinement method. We elaborate on their comments below.

Comparisons (R1&MR): We have compared with a regression-based 2D/3D registration method [15] (TMI 2018), as shown in Table 1. We also compared with [2] (TMI 2021), which achieved an SR of about 91.9% and an mTRE of 4.21 on the 6 DRR datasets. Our method achieved better scores (an SR of 100% and an mTRE of 2.67) on DRRs. Additionally, we compared our refinement method with that used in [2] on our X-ray dataset, given the same initial pose setting. In terms of NCC, SSIM, CSS and runtime, [2] achieved 0.9773, 0.9342, 0.9375 and 6.5s respectively; our method achieved 0.9880, 0.9469, 0.9503 (shown in Table 3) and 2.5s (shown in Conclusion) respectively.
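For readers unfamiliar with the NCC score reported above, the following is a minimal sketch of how such a similarity can be computed. It assumes the zero-normalized form (ZNCC) that Review #2 quotes from Section 2.2 of the paper; whether the NCC values in the rebuttal use exactly this definition is not stated here. The function name `zncc` and the list-of-rows image representation are illustrative choices, not taken from the authors' code.

```python
import math


def zncc(image_a, image_b):
    """Zero-normalized cross-correlation of two equally sized 2D images.

    Images are lists of rows of pixel intensities. The result lies in [-1, 1];
    1 means the images match up to a positive gain and an intensity offset.
    """
    xs = [p for row in image_a for p in row]
    ys = [p for row in image_b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and of equal size")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        raise ValueError("ZNCC is undefined for a constant image")
    return cov / denom
```

Subtracting the mean and dividing by the standard deviation makes the score insensitive to global brightness and contrast changes, which is one reason such a metric is a natural fit for comparing a DRR with an X-ray.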

Clarifying the view on refinement method (R1&MR): We want to emphasize that when talking about ‘conventional refinement’ methods referencing [2], we meant the traditional derivative-free optimization rather than a machine learning-based method as proposed here. This seems to have caused some confusion. Relying on refinement does not pose a problem, provided it is consistent and intelligent. In fact, both initialization and refinement steps are vital for accurate registration. As noticed by R2, we propose differentiable DeepDRR for pose refinement (i.e., the pose parameters are learnable, and the optimization is guided by the derivative in contrast to traditional derivative-free optimization); we show under Comparisons above that the proposed differentiable DeepDRR method outperforms that used in [2].

Concerning real X-ray (R2&MR): We fully agree that the validation on X-ray of phantom cannot fully represent the performance on X-ray of real patients, but it shows that the proposed method has high potential. There are recent methods, e.g., Gao et al. (Nat. Mach. Intell. 2023) suggesting domain randomization to reduce the gap between DRR and real X-ray images, which could allow methods validated on phantoms to perform better also on real X-ray.

Clarifying our contributions (R1): 1) Our work is not the first one to use DRR for 2D/3D registration, but the one to employ it for automatic X-ray/CT registration in the abdominal context without the need for manual annotation either on CT or on X-ray images, as recognized by R2. 2) For the proposed regression model, Table 2 shows that each component plays an important role. Fig. 2 and the supplementary video show the location of the learned salient attention. The claim is not that self-attention makes the model ‘more patient specific’ but that it enables the model to capture salient information for each patient. 3) The regression model is trained via an incremental learning strategy, which brings some further improvement as shown in Table 2. In fact, compared with generating more data, an incremental learning strategy is often memory-saving and avoids overfitting, as also mentioned by R3.

Effectiveness of regression model (R3): The regression model aims to provide a good initial pose which plays a vital role before refinement. Compared with [15], the proposed regression model raises the SR from 67.13% to 95.25%, which means it is more robust and could lead to better initial poses.
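As a rough illustration of what the SR figures summarize, the sketch below computes a success rate as the percentage of registrations whose error falls below a threshold. This is an assumption about the general form of the metric: the threshold value and the exact error measure used in the paper are not given in this rebuttal, so `threshold_mm` is a hypothetical parameter and the function names are illustrative.

```python
def success_rate(errors_mm, threshold_mm):
    """Percentage of registrations whose error is below threshold_mm.

    threshold_mm is a hypothetical parameter; the rebuttal does not state
    the threshold used in the paper.
    """
    if not errors_mm:
        raise ValueError("at least one registration error is required")
    successes = sum(1 for e in errors_mm if e < threshold_mm)
    return 100.0 * successes / len(errors_mm)


def mean_error(errors_mm):
    """Mean of the per-case registration errors."""
    if not errors_mm:
        raise ValueError("at least one registration error is required")
    return sum(errors_mm) / len(errors_mm)
```

Under such a definition, a higher SR for the regression stage means that more cases start the refinement from a pose already close to the true one.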

Discussion and Correction (R2): We will dedicate a detailed discussion to the constraints under which the method performs well (e.g., tolerance of superimposition of field of view for CT/X-ray). These are important, even if in clinical routine such constraints are often satisfied. Thanks again for the constructive suggestions on expression and typos.

Reproducibility (R1): We will release the code upon acceptance, as we agreed in the system during submission.

Thanks again for your insightful comments and constructive suggestions.

Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed most of the reviewers’ concerns satisfactorily. They clarified the comparisons made, justified the use of the refinement method, provided explanations for their unique contributions, and acknowledged the limitations of their validation approach. However, they may need to provide further evidence for their performance claims, particularly in relation to real-world application. Overall, the paper introduces novel elements and addresses a significant problem, although there are certain limitations. Based on the strengths and the author’s responses to the concerns raised, I recommend accepting the paper.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes an X-ray/CT registration framework. The overall writing of this work is good and the main framework is easy to follow. I agree with Reviewer 1 that this manuscript over-claims considerably and lacks full comparisons with other methods. Therefore, my final rating is reject.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Overall, a very compelling paper with a clear rationale and an interesting loss function used for 2D/3D registration of X-ray images. In my opinion, the paper is fairly incremental in nature given its dependency on previously proposed methods, and the fact that the method is not yet fully validated on real X-ray images really limits its impact. This is a very borderline paper, but following the rebuttal, the reviewers were still positive and would tend to lean slightly toward the accept side.

A Patient-Specific Self-supervised Model for Automatic X-ray/CT Registration