Authors

Nazim Haouchine, Reuben Dorent, Parikshit Juvekar, Erickson Torio, William M. Wells III, Tina Kapur, Alexandra J. Golby, Sarah Frisken

Abstract

We present a novel method for intraoperative patient-to-image registration by learning Expected Appearances. Our method uses patient-specific preoperative imaging to synthesize expected views through a surgical microscope for a predicted range of transformations. Our method estimates the camera pose by minimizing the dissimilarity between the intraoperative 2D view through the optical microscope and the synthesized expected textures. In contrast to conventional methods, our approach transfers the processing tasks to the preoperative stage, thereby reducing the impact of low-resolution, distorted, and noisy intraoperative images that often degrade the registration accuracy. We applied our method in the context of neuronavigation during brain surgery. We evaluated our approach on synthetic data and on retrospective data from 6 clinical cases. Our method outperformed state-of-the-art methods and achieved accuracies that met current clinical standards.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_22

SharedIt: https://rdcu.be/dnwOW

Link to the code repository

https://github.com/rouge1616/ExApp/

Link to the dataset(s)

https://drive.google.com/drive/u/2/folders/1T2NS_BftaxE6yYZj3I1LdspuqNKwSyCl

Reviews

Review #1

Please describe the contribution of the paper

The author(s) proposed a method to rigidly align a preoperative MRI scan with a single intraoperative RGB camera image of the cortical brain surface for neurosurgical applications (e.g., surgical navigation using Augmented Reality views). The core of the proposed method is a patient-specific pose regression network, trained on a set of synthesized RGB cortical surface images. The method was tested retrospectively on 6 patient datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- To my knowledge the proposed method is novel. Although methods which use cortical brain surface images to register, measure or correct for brain shift were previously discussed, these methods require either 3D surface information or pre-processing (segmentation) of the intraoperative image. In contrast, the proposed method estimates the camera position and orientation intraoperatively based on the original camera image. As the authors discussed, eliminating the preprocessing step will improve the update frequency for real-time application (the authors reported a possible update frequency of about 22 Hz).
- The method was tested with retrospective patient datasets. Although testing was performed on only 6 patient datasets, results still indicate a potential to be generalizable to patient population.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The proposed method only estimates the rigid alignment between camera view and MRI volume. However, brain shift can have a significant influence on the final registration accuracy. In fact, the authors compare the results of the proposed method to 2 rigid registration methods which were suggested as an initial registration for the non-rigid brain-shift deformation correction.
- In its current state, the method might have limited clinical feasibility as it assumes a predefined focal length of the intraoperative camera. To the best of my understanding, to ensure correct estimation of camera pose, focal length of the synthesised training set images and intraoperative images needs to be identical. Focal length defined prior to the surgery might interfere with the ability to adjust the camera optimally intraoperatively.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The author(s) have provided sufficient details in the paper to reproduce the implementations, as well as have made code and data publicly available. The work has therefore a high reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Although I found the manuscript in general easy to follow, please note that there are many spelling, punctuation and “copy-paste” mistakes in the text.
- The choice of the 5.27 mm ± 0.75 mm accuracy threshold for registration errors requires a more detailed justification. The referenced study measured the initial registration error using registration between intraoperative US and preoperative MRI. However, the same and other commercial systems also provide options for intraoperative registration with less noisy data and have reported higher accuracy. Would the required accuracy threshold not also depend on the type of clinical application?
- Results chapter, subsection “Metrics”: I believe the standard error for translation was supposed to be 5mm, not 5cm.
- When generating the synthetic training images, a deformation function D is applied to simulate brain shift. How was this function defined?
- In figure 5: I would find it very helpful if the original intraoperative images without the AR overlay could be added to this figure (maybe above the top row). The overlay hides the relevant features of the vessels in the original image.
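The translation-and-angle criterion discussed in the “Metrics” comment above can be made concrete with a minimal sketch. It assumes poses are given as a translation in millimetres and a 3×3 rotation matrix, and that a pose counts as correct when both errors fall under a threshold (3 mm and 3 degrees here, taken from the wording quoted elsewhere in these reviews). The function names are hypothetical and are not taken from the authors' code.

```python
import math

def translation_error_mm(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_est, t_gt)

def rotation_error_deg(R_est, R_gt):
    """Angle of the relative rotation R_est^T R_gt, in degrees."""
    # trace(R_est^T R_gt) equals the element-wise (Frobenius) inner product.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def within_threshold(t_est, R_est, t_gt, R_gt, max_mm=3.0, max_deg=3.0):
    """True when both translation and angular errors are under the thresholds."""
    return (translation_error_mm(t_est, t_gt) <= max_mm
            and rotation_error_deg(R_est, R_gt) <= max_deg)

def rot_z(deg):
    """Helper for the example: rotation about the z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Reporting the fraction of test images for which `within_threshold` holds gives a single accuracy figure per threshold pair, which makes the choice between centimetre and millimetre thresholds explicit.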
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite the weaknesses I mentioned above, I find the work interesting, novel and of interest to the MICCAI community. However, the manuscript will need some editing before inclusion in the proceedings.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

Although the paper has weaknesses, with the changes and additions discussed by the authors in the feedback, I believe the paper presents a worthwhile contribution to the MICCAI conference.

Review #2

Please describe the contribution of the paper

This paper presents an intra-operative 2D/3D registration method that estimates the poses of intra-operative images of the brain surface with respect to the corresponding pre-operative 3D mesh model segmented from MRI scans. Pre-operative 3D mesh models together with sampled camera poses and human brain surface textures are used to train a deep neural network that can synthesize simulated images. After that, the simulated images together with camera poses are used to train a pose regression network which maps each input image to its corresponding camera pose. Thus, the trained pose regression network can be used to estimate poses for intra-operative images. Synthetic datasets are used to train the pose regression network. Also, simulation and real experiments are performed to test the accuracy of the proposed pose regression network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Clinical significance was well illustrated. 2. The proposed method can avoid feature extraction and pose initialization of intra-operative images.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The paper requires a significant overhaul for clarity as it is not well-written. Numerous typos and grammatical errors make it difficult to understand, and many sentences require rephrasing. 2. The mathematical formulations have not been correctly utilized. 3. The paper as a whole is challenging to follow and comprehend.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility depends on whether the code and dataset are to be open sourced. Otherwise, training two networks for generating synthetic datasets and pose estimation, and getting the whole pipeline working, would take a lot of fine-tuning effort.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Firstly, I would like to say that the idea of estimating poses for intra-operative brain surface images without using distinct features is interesting. However, this paper lacks clarity, with unclear method descriptions and mathematical formulations in need of rectification and improvement. There are also numerous typos and grammar errors, along with many long sentences that are difficult to follow.

(1) “intraoperative”, “preoperative”, “pre-computed”, “pre-generated”, etc. should be hyphenated or not hyphenated consistently throughout the text. For example, sometimes preoperative/intraoperative are used, and sometimes pre-operative/intra-operative are used (in the title of Fig. 1). Likewise, choose either “data set” or “dataset” and use it consistently.

(2) many words are misspelled, such as “intraopearative”, “pairwisie” and “trasnformation”.

(3) some abbreviations, such as “CBCT” and “DoF”, should be defined before they are used or replaced with their full names; “w.r.t” should be “w.r.t.”.

(4) singular and plural are misused. For example, in the sentence “In more general applications, case-centered training of DNNs are gaining in popularity and demonstrate remarkable results [16].”, “are” should be “is” and “demonstrate” should be “demonstrates”. In the sentence “Code and data is publicly available”, “is” should be “are”. In the sentence “The ground-truth pose was obtained by manually aligning the 3D meshes on their corresponding images.”, “pose” should be “poses” and “was” should be “were”. In the sentence “The model training and validation was performed on the synthesized images while the model testing was performed on the real images.”, “was” should be “were”. In the sentence “These methods uses learning-based models to extract binary images and probability maps of cortical vessels to drive the registration.”, “uses” should be “use”.

(5) some punctuation marks are wrongly used. For example, in the sentence “We believe that our method bring numerous advantages and makes progress towards providing accurate surgical guidance, in neurosurgery but also in other surgeries that could benefit from this work.”, the comma “,” is not correctly used. In the third sentence of the Abstract, there should be a comma after “stage” to separate the two clauses, or use “and” or “, thus reducing”. In the fifth sentence, the preposition “on” should be “to”, and the second “we” should be deleted. In “here R ∈ ℝ^{3×3} and t ∈ ℝ^3 represent a 3D rotation and 3D translation, respectively and A is the camera”, there should be a comma after “respectively”. In “As illustrated in Fig. 1, given a 3D surface mesh of the cortical vessels M, derived from a 3D preoperative scan and a 2D monocular single-shot image of the brain surface I,”, there should be a comma after “scan”; otherwise, it says that the mesh M is derived from a 3D preoperative scan and a 2D monocular single-shot image. In the last sentence of the first paragraph, “making” cannot be the first word of the sentence. Etc.

(6) some sentences need rephrasing and are not easy to understand. For example, “The network architecture of PΩ consists of 3 blocks composed of two convolutional layers one ReLU activation.” “We use the set of generated Expected Appearances TP = {(Ii; pi)}i; with and optimize the following loss function over the parameters Ω of the network PΩ :” “It is also patient-specific from to presence of M.” “We also adapted the standard 5cm-5deg translation and angular error to neurosurgery and reduced it to 3mm-3deg.” “our method outperformed ProbSEG and BinSEG with an average ADM errors 3.26±1.04mm compared to of 4.13±0.70mm compared to 8.67±2.84mm respectively, below reported current neuronavigation errors [4].”

(7) some sentences or words are repeated. “Several methods exists to optimize Θ however they require a large set of annotated data [22] [3] or perform only on modalities with similar sensors [12] [25]. optimize Θ however they require a large set of annotated data [22] [3] or perform only on modalities with similar sensors [12] [25].” This sentence is repeated twice. “On the other hand, the test set consisted of the real images of of the brain surface acquired using the surgical camera and are never used in the training.” Double using “of”.

(8) some mathematical formulations are not correct and precise. For example, stating the camera pose as “R” \in R^{3 \times 3} is not enough; it belongs to the special orthogonal group SO(3). The reprojection error function is not written correctly, since [R,t] is a 3 \times 4 matrix, u_ci is 3 \times 1, and v_i is 2 \times 1. Consider using homogeneous coordinates for all the expressions in this paper.
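The dimension argument in point (8) can be illustrated with a small sketch. It assumes a pinhole model in which a 3×3 intrinsic matrix (called `A` here, following the symbol quoted from the paper) multiplies the 3×4 matrix [R|t], and 3D points are lifted to 4×1 homogeneous vectors so that the product is a 3×1 vector that is divided by its last entry to give a 2×1 pixel. The function names and the mean-distance error are illustrative assumptions, not the paper's formulation.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def is_rotation(R, tol=1e-9):
    """Check membership in SO(3): R^T R = I and det(R) = +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(det3(R) - 1.0) <= tol

def projection_matrix(A, R, t):
    """Build the 3x4 matrix A [R | t]."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]  # 3x4
    return [[sum(A[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]

def project(A, R, t, X):
    """Project a 3D point X to a 2D pixel using homogeneous coordinates."""
    P = projection_matrix(A, R, t)
    Xh = list(X) + [1.0]                                   # 4x1 homogeneous point
    u = [sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3)]  # 3x1
    return (u[0] / u[2], u[1] / u[2])                      # 2x1 after dehomogenising

def reprojection_error(A, R, t, points3d, pixels):
    """Mean pixel distance between projected 3D points and observed pixels."""
    dists = [math.dist(project(A, R, t, X), v) for X, v in zip(points3d, pixels)]
    return sum(dists) / len(dists)
```

Written this way, every product has matching inner dimensions, and the rotation constraint is checked explicitly rather than implied by R being a 3×3 array.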

I have some doubts about the proposed method’s clinical applicability. In my view, the paper did not sufficiently discuss the advantages of using this system in clinical practice. For instance: (1) To my understanding, the accuracy of the method can be further improved by sampling more camera poses for each pre-operative mesh model. However, the paper did not illustrate how the number of sampled camera poses affects the accuracy. (2) There are structural and textural differences between synthetic and real images. As the network is trained only on synthetic images, I believe that the accuracy will decrease if the model is directly applied to real images. This concern is not addressed in the paper. (3) Although the authors introduced a deformation function to account for potential intra-operative brain deformations, they did not provide details about the function or how it affects the final result.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

2
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is in need of substantial revisions to improve its clarity, as it is not well-written. There are numerous typos and grammatical errors present which impede comprehension, and a considerable number of sentences necessitate rephrasing.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

1
[Post rebuttal] Please justify your decision

This paper exhibits a notable lack of clarity, characterized by ambiguous method descriptions and mathematical formulations that require rectification and enhancement. Moreover, numerous typographical and grammatical errors are present, compounded by the excessive use of lengthy sentences, which impede comprehension. Consequently, these deficiencies prevent the paper from meeting the standard requirements of the MICCAI conference. Regrettably, the authors’ rebuttal comments have failed to address my concerns regarding the methodological issues and clinical applications identified in this study.

Review #3

Please describe the contribution of the paper

A novel 3D/2D registration method of preoperative scans with intraoperative images of the brain surface acquired by a surgical camera was proposed.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The technical content of this manuscript is novel. It does not rely on processing the intraoperative camera-derived images to extract image features, a step which limits the application of traditional works.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The dataset for training and testing is too small.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is unclear.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

This paper describes a novel 3D/2D registration method of preoperative scans with intraoperative images of the brain surface acquired by a surgical camera.

Major strengths: 1) The method that generates expected appearances for pose estimation of the virtual camera is interesting and novel. 2) The method is technically sound and the evaluation satisfactory. 3) The paper is very well written and organized, it is therefore easy to follow and read.

Major weaknesses: The dataset for training and testing is too small. The robustness of the proposed method for new clinical cases is questionable.

In summary, the manuscript is good and novel. I suggest a strong accept of this manuscript.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of the method.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

7
[Post rebuttal] Please justify your decision

I maintain my original decision

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents an intra-operative 2D/3D registration method based on a pose regression framework (trained on synthetic images) to align intra-operative images of the cortical surface with corresponding pre-operative 3D mesh models segmented from MRI scans.

There are a number of strengths of this paper including novelty of the method, clinical significance, possibility for real-time application of the method, tested on real data of 6 patients. Furthermore, the code and data are being made available to the community.

At the same time, the reviewers agree that the paper needs significant editing: formulas are missing (e.g. the deformation function), additional figures are needed (Fig. 5, see R1), and a discussion of the accuracy, of training on synthetic images and, in turn, of robustness is suggested.

Author Feedback

We are glad that reviewers found our work “novel” (R1,R3), “of interest to the MICCAI community” (R1) and showing “clinical significance” (R2). The reviewers also found our method to be “technically sound” (R3) and “interesting” (R2, R3) and our evaluation to be “satisfactory” (R3) with “a potential to be generalizable to patient population” (R1).

Reviewers gave very constructive comments. We address here the principal comments/questions:

1 - Typos and grammatical errors: We acknowledge the presence of typos and grammatical errors and thank the reviewers (in particular R2) for listing them. We will thoroughly correct them, and improve the overall clarity of the paper by rephrasing some sentences.

2 - Clinical feasibility: We strongly believe that our work is an important step forward towards making Augmented Reality a clinically-viable navigation solution during neurosurgery. Our method eliminates intraoperative image processing that is the weakest point in deploying Augmented Reality clinically.

Accuracy with real images (R2): As mentioned in the paper (Section 3, paragraph Datasets), our method is directly applied to the REAL images: “We evaluated the pose regressor network on both synthetic and real data … the model testing was performed on the real images.” The training is performed on synthetic images by transferring appearances from other patients, but we ALSO performed tests on 6 human datasets of real surgical views during neurosurgery that were excluded from the synthesized set. Our results outperform SOTA.

Focal length (R1): We agree that focal length is important to ensure correct estimation of camera pose. In this work it is assumed to be known (as in much AR research). The network estimates a 6-DoF pose but could also estimate the focal length as an additional output. The training set must then include changes in the focal length, obtained by varying the focal length during the sampling of the projections. We consider this future work.

Number of samples (R2): The number of samples is not that important as long as they spatially cover the range of plausible camera locations/orientations. We followed the standard practices set by NeRF [16] (~100 views) and restricted the views to the clinically-plausible poses. The number of samples does not impact the accuracy; the number of textures does, as shown in Figure 4. The clinical feasibility is therefore not impacted by the number of samples.

We will add a Discussion section to discuss the clinical feasibility of our approach.
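The view-sampling strategy described above (about 100 views restricted to plausible poses, optionally with a varying focal length) could be sketched as follows. This is only an illustration under stated assumptions: the camera is placed on a spherical cap above a target at the origin and looks at it, and the tilt limit, working distance and focal range are hypothetical placeholders, not values from the paper.

```python
import math
import random

def sample_camera_poses(n_views=100, max_tilt_deg=30.0,
                        distance_mm=(200.0, 300.0), focal_mm=None, seed=0):
    """Sample camera positions on a spherical cap around the +z axis.

    All numeric defaults are hypothetical. If `focal_mm` is a (min, max)
    tuple, a focal length is sampled per view; otherwise it is left as None.
    """
    rng = random.Random(seed)
    cos_max = math.cos(math.radians(max_tilt_deg))
    poses = []
    for _ in range(n_views):
        # Sampling cos(theta) uniformly gives uniform area density on the cap.
        cos_t = rng.uniform(cos_max, 1.0)
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        d = rng.uniform(*distance_mm)
        position = (d * sin_t * math.cos(phi), d * sin_t * math.sin(phi), d * cos_t)
        # Unit viewing direction pointing from the camera to the target at the origin.
        view_dir = tuple(-c / d for c in position)
        focal = rng.uniform(*focal_mm) if focal_mm else None
        poses.append({"position": position, "view_dir": view_dir, "focal_mm": focal})
    return poses
```

Calling `sample_camera_poses(focal_mm=(20.0, 40.0))` would add focal-length variation to the same set of views, which is the kind of extension the focal-length response describes as future work.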

3 - Deformation: Our contribution in this paper focused on estimating a rigid (6-DoF) camera pose from a single view by learning expected appearances. Deformation was not considered and is part of our future work. The deformation function D is simply used for data augmentation purposes during the data generation step, to add a small amount of non-isotropic geometric variance to the synthesized projections, as described in [Ronneberger2015]. We will better explain the function in the paper. We focused on evaluating the appearance invariance in our experiments, which is the core contribution of the paper.
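A deformation used purely for augmentation, as described above, might look like the following sketch: random displacements are drawn on a coarse grid and interpolated to give a smooth warp of 2D coordinates. The grid size, the standard deviation and the use of bilinear interpolation are illustrative assumptions chosen for brevity; they are not the authors' definition of D.

```python
import random

def make_coarse_displacements(grid=3, sigma=10.0, seed=0):
    """Random (dx, dy) displacements on a coarse grid x grid lattice (pixels)."""
    rng = random.Random(seed)
    return [[(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(grid)]
            for _ in range(grid)]

def displacement_at(field, x, y, width, height):
    """Bilinearly interpolate the coarse field at pixel (x, y).

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    n = len(field)
    gx = x / (width - 1) * (n - 1)   # position in grid units
    gy = y / (height - 1) * (n - 1)
    j0 = min(int(gx), n - 2)         # left/top cell index, kept inside the grid
    i0 = min(int(gy), n - 2)
    fx, fy = gx - j0, gy - i0
    out = []
    for c in range(2):               # interpolate dx, then dy
        top = field[i0][j0][c] + (field[i0][j0 + 1][c] - field[i0][j0][c]) * fx
        bot = field[i0 + 1][j0][c] + (field[i0 + 1][j0 + 1][c] - field[i0 + 1][j0][c]) * fx
        out.append(top + (bot - top) * fy)
    return tuple(out)

def deform_points(points, field, width, height):
    """Apply the smooth displacement field to a list of 2D points."""
    warped = []
    for x, y in points:
        dx, dy = displacement_at(field, x, y, width, height)
        warped.append((x + dx, y + dy))
    return warped
```

Because the field is coarse and interpolated, neighbouring points move together, so projected vessel outlines are bent slightly rather than scrambled.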

Other:

The neuronavigation errors reported in [4] are indeed measured using iUS but do, in our opinion, best represent the initialization errors of neuronavigation systems since they are measured after dura opening and before resection. We will add more details to justify our choice. (R1).

We will add a row in Figure 5 with the original images without augmentation. (R1).

We will use homogeneous coordinates for all the mathematical expressions in the paper. (R2).

The size of the training set is sufficient since it encapsulates an adequate number of views and textures. It is important to recall that a model is trained for each case on a patient-specific training set. We will clarify this in the paper (R3).

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

I would lean towards agreeing with R1 and R3 that the paper would be interesting to the community and that, with the changes proposed by the authors in the rebuttal, it is acceptable to MICCAI.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposed a 2D-3D registration framework for camera pose estimation. The overall pipeline is easy to follow, while the writing of this paper is very poor. I agree with Reviewer 2 that this manuscript requires significant polishing before submission.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

After reading the paper, as well as the reviews and rebuttal, I agree that the approach has some interesting ideas and tackles an important clinical problem, particularly by eliminating the preprocessing step, which would improve the refresh frequency of registration. However, I have serious doubts about the findings and conclusions that can be drawn from this paper with such a limited dataset (n=6) for a synthesis DL pipeline. I do not agree with the authors that this small cohort is sufficient to capture the intrinsic variability of views and textures for neurosurgical applications. R2 also points out serious clarity issues which should be addressed. I suggest exploring this further with a more extensive dataset and lean towards reject.

Learning Expected Appearances for Intraoperative Registration during Neurosurgery