Authors
Soorena Salari, Amirhossein Rasoulian, Hassan Rivaz, Yiming Xiao
Abstract
Homologous anatomical landmarks between medical scans are instrumental in quantitative assessment of image registration quality in various clinical applications, such as MRI-ultrasound registration for tissue shift correction in ultrasound-guided brain tumor resection. While manually identified landmark pairs between MRI and ultrasound (US) have greatly facilitated the validation of different registration algorithms for the task, the procedure requires significant expertise, labor, and time, and can be prone to inter- and intra-rater inconsistency. So far, many traditional and machine learning approaches have been presented for anatomical landmark detection, but they primarily focus on mono-modal applications. Unfortunately, despite the clinical needs, inter-modal/contrast landmark detection has very rarely been attempted. Therefore, we propose a novel contrastive learning framework to detect corresponding landmarks between MRI and intra-operative US scans in neurosurgery. Specifically, two convolutional neural networks were trained jointly to encode image features in MRI and US scans to help match the US image patches that contain the landmarks corresponding to those in the MRI. We developed and validated the technique using the public RESECT database. With a mean landmark detection accuracy of 5.88±4.79 mm against 18.78±4.77 mm with SIFT features, the proposed method offers promising results for MRI-US landmark detection in neurosurgical applications for the first time.
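As context for the framework sketched in the abstract, a minimal dual-encoder contrastive setup might look as follows (PyTorch). The architecture, patch size, embedding dimension, temperature, and InfoNCE-style loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of dual-encoder contrastive patch matching (PyTorch).
# All details (patch size, channels, embedding dim, temperature, loss)
# are assumptions for illustration, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEncoder(nn.Module):
    """Small CNN mapping a 2D image patch to an L2-normalized embedding."""
    def __init__(self, in_ch=1, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)

def info_nce(z_mri, z_us, tau=0.1):
    """Symmetric InfoNCE loss: the i-th MRI and US patches in a batch are
    positives; all other pairings serve as negatives."""
    logits = z_mri @ z_us.t() / tau              # (B, B) similarity matrix
    labels = torch.arange(z_mri.size(0))
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# One toy training step on random 32x32 patch pairs.
mri_enc, us_enc = PatchEncoder(), PatchEncoder()
loss = info_nce(mri_enc(torch.randn(16, 1, 32, 32)),
                us_enc(torch.randn(16, 1, 32, 32)))
loss.backward()
```

At inference, the two encoders can be used independently: the MRI landmark patch becomes a query embedding, and candidate US patches are ranked by cosine similarity.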
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43996-4_64
SharedIt: https://rdcu.be/dnwQc
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
This submission concerns a novel application of Contrastive Learning to the problem of automatic landmark detection in multi-modal images. Using the RESECT dataset with MRI and US images of the brain, the authors use two separate encoders to find patches in US that have the highest similarity to an initially queried landmark in MR. Landmark localisation accuracy is used for validation, and comparisons are performed with 3D SIFT features.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is very well presented and easy to follow.
- The problem of landmark detection for validation of registration algorithms is well motivated.
- The Contrastive Learning approach using patches seems to be novel, and an interesting workaround to solve the problem.
- The obtained results are, in principle, superior to a classical SIFT approach.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- If I understood correctly, the main weakness I find is the 5 x 5 x 5 voxel neighbourhood employed during inference. This raises some questions on the validity of the presented results.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The paper is thorough and contains pretty much all details necessary to be reproduced. The only minor details I did not find were the ranges of the image augmentations used in training.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
I find the paper to be very clear and to introduce a novel and interesting framework for landmark detection in US and MR images. However, I have a major concern about how the method was tested:
- The methods section states that during inference, the search for candidate patches is only performed in a 5x5x5 voxel neighbourhood from the MRI landmark. Considering the resolution of 0.5 mm stated in the Dataset description, does this mean that landmarks that were shifted more than half of the diagonal of the patch (around 2 mm) cannot be accurately identified? If this is the case, the results for large and medium shift may not be valid.
- With such a small limit, this implies that the method is only valid if a very accurate registration between the MRI and the US is already in place – since the objective of the framework is to validate the registration, the problem could become circular for clinical application.
- Finally, was SIFT applied with the same search constraint? If not, the comparison may not be fair. To strengthen the paper, the authors should clarify these details.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My score is based on the potential reliance of the framework on a very accurate registration, which may make the problem circular.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
The authors have clarified the unit of the search range (5 x 5 x 5 mm) and therefore addressed my main concern with this submission (even though larger ranges would be desired in the presence of large brain shift). I understand the authors have constrained the SIFT search as well, but there should be more detail on the exact process through which this was achieved. Overall, I believe that even though results are not optimal, the paper presents an interesting approach to the multi-modal landmark detection problem.
Review #2
- Please describe the contribution of the paper
The authors presented a contrastive learning framework to detect corresponding landmarks between MRIs and iUS images in neurosurgery. The approach was trained and evaluated using RESECT; a comparison was performed against a SIFT method. The study reported a mean landmark detection accuracy of 5.88 mm with the proposed approach, vs. 18.78 mm with SIFT features.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- An automated and accurate method to identify corresponding features is an important and useful tool to extract features or landmarks from medical images to enhance surgical navigation. This task is particularly challenging when considering cross-modality applications, here e.g. MR and ultrasound. This work represents an interesting and potentially significant contribution to this overall effort.
- It appears the authors will make their code available; that initiative should be commended. Moreover, the use of RESECT/EASY-RESECT, known public datasets, facilitates comparisons to better gauge the performance.
- The use of contrastive learning here is an interesting and notable contribution in terms of feasibility investigation on this topic and in this general domain.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- One of the major concerns is that the dataset leveraged here has a limited sample size (n=22), even with data augmentation techniques. Without additional data, the subject-wise division of 70%:15%:15% into training, validation, and testing sets raises questions about the generalizability of the framework when applied to a larger population with heterogeneity in disease states and scanner (MR and US) behaviors (including modality and protocol considerations, e.g., in this work T2 is favored over T1).
- While it outperforms the SIFT features compared in the study, the average accuracy of 5.88 mm in locating corresponding landmarks raises questions of clinical usefulness; the lack of discussion of a clinical accuracy threshold makes it difficult to gauge the performance.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The work should be reproducible with reasonable efforts considering the checklist provided by the authors.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Additional analysis of the results in Table 1, grouped by the brain shift categories of small, medium, and large, would further help understand the impact of brain shift on landmark detection for both the proposed approach and SIFT.
- Another feature detector, in addition to SIFT, could be considered to further illustrate the performance of the proposed approach, especially considering the performance of SIFT here is so poor.
- A discussion of Subject 12 would be interesting, as it is the only case where SIFT slightly outperforms the proposed approach.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The recommendation is based on the unique approach and the potential impact on multi-modal-imaging-enabled surgical guidance; admittedly, the results could mature further, and the clinical usefulness of the proposed approach in its current form remains a question.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
The authors’ feedback and responses to comments are appreciated; the recommendation remains the same. The concern regarding usefulness and adequacy of results in their current state remains.
Review #3
- Please describe the contribution of the paper
The authors learn anatomical keypoints on brain ultrasound and MR images that can be matched, with the ultimate goal of registration and tackling the brain shift problem. They have taken a 2.5D approach with contrastive learning on patches and worked on the EASY-RESECT dataset. In the evaluation, they compare to classical SIFT features.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors are tackling an important difficult problem. They demonstrate feasibility of the contrastive learning paradigm for multi-modal feature learning. Additionally, they use 2.5D learning to leverage the efficiency in speed and memory consumption of 2D networks over 3D ones. They mention that the 3D approach overfits, speculating that it’s due to the limited amount of data, which is a valuable finding. There are statistical tests showing that their results are significantly better than their chosen baseline.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The reported standard deviation of the aggregate SIFT comparison seems incorrect; the number should be higher.
The comparison to SIFT is sub-optimally chosen. SIFT was, first, created with natural images in mind and, second, created for mono-modal applications. There are better choices. For example:
- Mattias P. Heinrich et al. "MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration." Medical Image Analysis, 2012
- Evan M. Yu, Alan Q. Wang, Adrian V. Dalca, and Mert R. Sabuncu. "KeyMorph: Robust Multi-modal Affine Registration via Unsupervised Keypoint Detection." In: MIDL 2021
- Markova et al. "Global Multi-modal 2D/3D Registration via Local Descriptors Learning." MICCAI 2022
- F. Zhu, M. Ding, and X. Zhang. "Self-similarity inspired local descriptor for non-rigid multi-modal image registration." Information Sciences 372 (2016)
The work’s goal is to help in solving the issue of brain shift during the surgical removal of gliomas. However, the experiments are solely on images taken prior to resection (mentioned in the discussion). The authors say that one might need more elaborate strategies, like segmentation of the removed area, to tackle the problem. But then the question is: how is the proposed method to be integrated in a helpful manner? Furthermore, what is the setting for the evaluation of the errors? In the brain shift scenario, one has a good registration prior to resection, and then this registration needs to be adjusted during and after resection.
Section 3.3 explains the method. There is an assumption that the candidate landmark lies within a limited range, so that one searches in a 5x5x5 neighborhood for the corresponding US landmark. This raises a couple of questions:
- Is this assumption applied in the baseline SIFT method? It would be a fairer comparison if it were.
- This implies that one is indeed measuring the error after resection, when one can assume that prior to it the keypoints were perfectly aligned and an adjustment is now needed. But this is in conflict with the statement in the discussion section that the experiments were performed only on images acquired prior to resection.
It is unclear on which images the evaluation was performed.
Additionally, the current iUS-to-iUS registration for brain shift is more accurate. While this can be explained by its being a mono-modal registration, what might be the motivation to register a during- or post-resection iUS to the preoperative MR instead of to the iUS itself? (reference: Pirhadi, A., Salari, S., Ahmad, M.O. et al. Robust landmark-based brain shift correction with a Siamese neural network in ultrasound-guided brain tumor resection. Int J CARS 18, 501–508 (2023))
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
A public dataset is used and it is stated that the code is going to be released. Reproducibility is ensured.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The description of the setting and the motivations for it could be clearer. On what kind of images is the evaluation performed? Only in the discussion do we learn that only preoperative images were used; but then, where does the assumption in Section 3.3 come from that the candidate landmark lies within a limited range, so that one searches in a 5x5x5 neighborhood for the corresponding US landmark? Is this assumption also applied in the SIFT matching?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While it is an important problem, the method is worth trying, and the work is done on a public dataset, the weaknesses seem predominant if not clarified otherwise. The work concerns brain shift, but it has been done only on preoperative images. It is multi-modal feature localization for subsequent registration, but the SIFT baseline is not the most appropriate choice for comparison with a multi-modal approach. Additionally, an assumption is employed in the method for the matching of the features that seems to have been omitted for SIFT. Potentially, SIFT would have considerably better performance if the assumption were added to its matching as well.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The manuscript received mixed reviews.
There is agreement among reviewers that the method contains innovative aspects, is well motivated, and is clearly presented. The availability of code is perceived as another strength.
There are, however, also several concerns. The chief concerns include:
- Questions around the validity of the results (use of a small receptive field, which may invalidate results for medium and large shifts)
- The adequacy of the method's performance is unclear: errors are large and, while better than SIFT, probably not close to clinical utility
- In addition to the above, the strength of the baseline approaches is unclear
- Concerns around small dataset size
Compelling answers to the above concerns, especially in regard to evaluation validity, are necessary.
Author Feedback
We thank the reviewers for their comments. All raised points will be addressed in the revised paper.
Small dataset (R2): The RESECT dataset has patients with brain tumors of diverse sizes, shapes, and locations. Our work proposes a 2.5D patch-based approach with 2D networks that helps mitigate the demand for large datasets. Here, we obtain image patches around landmarks (~16 landmarks/patient), where local anatomical variability is limited, further increasing the sample size. Finally, we used full cross-validation to perform training and testing to allow a proper assessment of all patients, ensuring the robustness of our solution.
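As a rough illustration of the 2.5D patch idea described in this response (three orthogonal 2D slices centred on a landmark, stacked as channels so that 2D networks can be used), here is a minimal sketch; the patch size, slice layout, and plane names are assumptions, not the paper's exact settings.

```python
# Hypothetical sketch of 2.5D patch extraction: three orthogonal slices
# centred on a landmark, stacked as channels for consumption by 2D CNNs.
# Plane names depend on the volume's axis convention.
import numpy as np

def extract_25d_patch(volume, center, half=16):
    """volume: 3D array; center: (x, y, z) voxel index.
    Returns a (3, 2*half, 2*half) array of three orthogonal slices."""
    x, y, z = center
    h = half
    axial    = volume[x - h:x + h, y - h:y + h, z]
    coronal  = volume[x - h:x + h, y,           z - h:z + h]
    sagittal = volume[x,           y - h:y + h, z - h:z + h]
    return np.stack([axial, coronal, sagittal], axis=0)

vol = np.random.rand(128, 128, 128).astype(np.float32)
patch = extract_25d_patch(vol, (64, 64, 64))   # shape: (3, 32, 32)
```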
Clinical adequacy (R2): Our work’s motivation is not for registration but for validating registration algorithms for brain shift compensation. Manual landmark pair annotation is highly time-consuming and thus unfit for intraoperative use. In the literature, the error for automatic medical landmark detection (often for CT & MRI with good contrast and resolution) is 3~4 mm (e.g., Ghesu FC et al. MICCAI 2016), while the ideal accuracy should be 1~2 mm. Nevertheless, the current accuracy is enough to automatically detect large registration errors. Despite the need for MRI-iUS landmark detection for clinical use, it is challenging due to the poor quality of iUS. Here, we present a novel framework for a difficult and important clinical problem, and the results are highly promising, with potential for future improvement.
Motivation for MR-iUS registration in brain shift correction & concern about using pre-resection images (R3): Despite the challenges with MR-iUS registration, MRI provides a more detailed view of the patient’s anatomy and is better at visualizing soft tissues than US alone. Also, the largest brain shift often occurs after craniotomy (before resection starts) and sets the stage for the rest of the surgery. This is why we focus on pre-resection iUS data in the paper. The smaller brain shift during and after resection can be estimated with iUS-iUS registration and aggregated with the pre-resection shift estimated through MRI-iUS registration. However, compared with direct registration of during- and after-resection iUS to pre-op MRI, the aggregation strategy may accumulate residual errors at the two stages and can be less efficient.
Landmark search range and validity of results (R1, R3): We apologize for the oversight. In the paper, we forgot to add the unit for the search range. During inference, we searched within a range of [-5,5] mm in each direction in iUS around the MRI landmark locations to find the best match (highest similarity). Note that this search range is a parameter adjustable by the user (e.g., surgeons/clinicians). Also, for SIFT-based landmarks, we did attempt to use constraints during landmark matching. Due to the inherent nature of SIFT, it can pre-select keypoint candidates based on feature strength. Also, we tried to remove keypoints in iUS that were distant from the reference MRI landmarks. However, with the consideration of adequate feature strength, we saw no difference in the results of landmark matching.
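To make the clarified procedure concrete, the following sketch shows one way the inference-time search could work: embed the MRI landmark patch once, then scan candidate US patches on a voxel grid spanning +/-5 mm around the MRI landmark and keep the most similar one. The names (find_match, extract_patch, spacing_mm) and the exhaustive single-patch loop are illustrative assumptions; in practice candidates would be batched.

```python
# Illustrative sketch of the +/-5 mm inference-time search (assumed details).
import itertools
import torch

def find_match(mri_patch, us_volume, landmark_vox, mri_enc, us_enc,
               extract_patch, range_mm=5.0, spacing_mm=0.5):
    """Return the US voxel within +/-range_mm of the MRI landmark whose
    patch embedding is most similar to the MRI landmark patch embedding."""
    r = int(range_mm / spacing_mm)               # +/-10 voxels at 0.5 mm
    with torch.no_grad():
        q = mri_enc(mri_patch.unsqueeze(0))      # (1, D) query embedding
        best_sim, best_pos = -float("inf"), None
        # 21^3 = 9261 candidates here; batching them is much faster.
        for dx, dy, dz in itertools.product(range(-r, r + 1), repeat=3):
            pos = tuple(c + d for c, d in zip(landmark_vox, (dx, dy, dz)))
            cand = torch.as_tensor(extract_patch(us_volume, pos),
                                   dtype=torch.float32)
            z = us_enc(cand.unsqueeze(0))        # (1, D) candidate embedding
            sim = (q @ z.t()).item()             # cosine sim (unit-norm)
            if sim > best_sim:
                best_sim, best_pos = sim, pos
    return best_pos, best_sim
```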
Strength of SIFT as the baseline approach (R2, R3): SIFT is a well-known tool for keypoint detection and image registration. Specifically, it has been widely used in multi-modal medical registration and keypoint correspondences for image-guided neurosurgery for brain shift correction. For instance, please see the below papers:
Chauvin et al., 2022. Registering Image Volumes using 3D SIFT and Discrete SP-Symmetry. arXiv preprint arXiv:2205.15456.
Luo et al. A feature-driven active framework for ultrasound-based brain shift compensation. MICCAI 2018.
Toews et al. Efficient and robust model-to-image alignment using 3D scale-invariant features. Medical Image Analysis 2013.
Also, Chauvin et al. (2022) showed that SIFT could extract contrast-invariant feature descriptors, which motivated us to utilize 3D SIFT.
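Since widely available SIFT implementations (e.g., OpenCV) are 2D, the sketch below is only a 2D analogue of the spatially constrained SIFT matching described in this response, not the authors' 3D pipeline; constrained_sift_match and radius_px are hypothetical names, and a fuller version would also restrict the MRI keypoints to the landmark neighbourhood.

```python
# Hedged 2D analogue of spatially constrained SIFT matching (the paper
# uses 3D SIFT; OpenCV only ships 2D SIFT, so this is illustrative only).
import cv2
import numpy as np

def constrained_sift_match(img_mri, img_us, landmark_xy, radius_px=10):
    """Match SIFT keypoints, keeping only US keypoints within radius_px
    of the reference MRI landmark (mirroring the +/-5 mm constraint)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_mri, None)
    kp2, des2 = sift.detectAndCompute(img_us, None)
    if des1 is None or des2 is None:
        return None
    keep = [i for i, kp in enumerate(kp2)
            if np.linalg.norm(np.asarray(kp.pt) - landmark_xy) <= radius_px]
    if not keep:
        return None
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2[keep])
    if not matches:
        return None
    best = min(matches, key=lambda m: m.distance)  # lowest descriptor distance
    return kp2[keep[best.trainIdx]].pt             # matched US location (x, y)
```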
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Responding reviewers were satisfied with the responses, and from my assessment, the rebuttal adequately addresses the concerns of R3. The reviewers, however, also note that several limitations remain.
Consequently, and because the rebuttal does not specify how these responses will be incorporated into the final submission, I strongly urge the authors to carefully revise their manuscript to reflect both their answers and the remaining limitations in adequate form.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This paper proposed an effective method for landmark detection. The contrastive learning framework benefits cross-modal tasks. As pointed out by the reviewers, the main problem is the weak baseline involved in the experiments. Besides, issues with the datasets and validation were also raised by the reviewers. The rebuttal cannot fully address the issues above, which leads to my final rating: reject.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
While I appreciate the authors’ rebuttal, the responses to the concerns about the usefulness and adequacy of the results and the fairness of the comparison (performance comparison with SIFT under the same matching assumption) are not sufficiently justified. Overall, the novelty and the impact remain low for this paper, even after the rebuttal.