Authors

Viktoria Markova, Matteo Ronchetti, Wolfgang Wein, Oliver Zettinig, Raphael Prevost

Abstract

Multi-modal registration is a required step for many image-guided procedures, especially ultrasound-guided interventions that require anatomical context. While a number of such registration algorithms are already available, they all require a good initialization to succeed due to the challenging appearance of ultrasound images and the arbitrary coordinate system they are acquired in. In this paper, we present a novel approach to solve the problem of registration of an ultrasound sweep to a pre-operative image. We learn dense keypoint descriptors from which we then estimate the registration. We show that our method overcomes the challenges inherent to registration tasks with freehand ultrasound sweeps, namely, the multi-modality and multi-dimensionality of the data in addition to lack of precise ground truth and low amounts of training examples. We derive a registration method that is fast, generic, fully automatic, does not require any initialization and can naturally generate visualizations aiding interpretability and explainability. Our approach is evaluated on a clinical dataset of paired MR volumes and ultrasound sequences.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_26

SharedIt: https://rdcu.be/cVRS8

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a rigid-registration neural network based on descriptors (local features) extracted from 2D US and 3D MR images. The method is adapted from LoFTR (Local Feature matching with Transformers). Each image is processed by a separate U-Net-like neural network to produce feature maps. A similarity matrix is filled with the dot products of pairs of descriptors, then filtered with a “double softmax” to isolate significant “matching” features. This matrix is the basis of the loss function of an end-to-end registration network. The pose is finally estimated from the matches with RANSAC. The method is evaluated on MR + US liver images of 16 patients. Several versions of the method are compared, and they outperform a baseline method from ImFusion with statistical significance. While the registration error remains relatively large (on a difficult task), the proposed method is interesting at least as an initialization for more accurate methods that require a close initial pose.
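    The “double softmax” filtering summarized above can be illustrated with a minimal sketch in plain Python. It follows the dual-softmax matching described for LoFTR (row-wise softmax times column-wise softmax, then mutual nearest neighbours above a confidence threshold) and is not the authors' code, which is not available. The function names, the toy descriptors, and the 0.2 threshold are assumptions made for this example.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def similarity(desc_a, desc_b):
    """sim[i][j] = dot product of descriptor i of A and descriptor j of B."""
    return [[sum(x * y for x, y in zip(a, b)) for b in desc_b] for a in desc_a]


def dual_softmax(sim):
    """conf[i][j] = softmax over row i at j, times softmax over column j at i."""
    rows = [softmax(row) for row in sim]
    cols = [softmax(list(col)) for col in zip(*sim)]  # cols[j][i]
    return [[rows[i][j] * cols[j][i] for j in range(len(sim[0]))]
            for i in range(len(sim))]


def mutual_matches(conf, threshold=0.2):
    """Keep (i, j) pairs that are each other's best match and confident enough."""
    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=lambda k: row[k])
        best_i = max(range(len(conf)), key=lambda k: conf[k][j])
        if best_i == i and row[j] >= threshold:
            matches.append((i, j))
    return matches


# Toy descriptors (hypothetical values): two US keypoints, three MR keypoints.
us_desc = [[4.0, 0.0], [0.0, 4.0]]
mr_desc = [[0.0, 4.0], [4.0, 0.0], [1.0, 1.0]]

conf = dual_softmax(similarity(us_desc, mr_desc))
print(mutual_matches(conf))
```

    In this toy case the third MR descriptor is equally similar to both US descriptors, so its confidence stays low and it is not matched. The surviving pairs are the kind of input a RANSAC pose estimator would consume.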

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the method is concise and well described
    • it is adaptable to different image dimensions and modalities
    • neither prior segmentation nor pose is needed, although this information could be taken into account if available.
    • the results are very interesting, especially with a far initial pose
    • nice mitigation of the US images FOV
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • processing successive images in a sweep is a plus, but the “Multiple frames” section is really brief
    • the generation of your ground truth pose is unclear, which could impact both the training and the results. You use uniformly distributed keypoints to avoid the need for annotated landmarks, but do these points reflect true matches, even with your “softer loss” (page 3)? The estimation of the ground truth pose by manual registration is also questionable. Was it done and validated by clinicians?
    • the maximum errors (Fig. 2) remain very large, which is a strong limitation for any clinical use. The failure cases should at least be discussed (image properties, far initial pose, …?)
    • it would be interesting to evaluate your method on a dataset with annotations available, to better characterize your results on TRE. You should find several registration challenges with appropriate data (e.g. Learn2Reg or CuRIOUS).
    • while your study is a very good first step, rigid registration is inherently limited when the underlying deformation is non-rigid. Why not first evaluate on a more suitable task?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is very well described and could easily be reproduced. The evaluation dataset is not public, though.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    “Despite all these advantages, to the best of our knowledge, these approaches have been applied the medical domain only to a limited extend. ..” -> An example of MR/US registration based on SIFT descriptors could be: https://doi.org/10.1007/s11548-018-1786-7. There are others.

    “Multi-modality and multi-dimensionality Different modalities exhibit different visual appearances and also emphasize different structures. We overcome this issue by jointly training two distinct networks (a 2D model for US and a 3D one for MR) to produce cross-modality descriptors.” -> the two networks will provide descriptors for each modality, but are the extracted features the same? Probably not, at least due to the different physics behind each modality. This could be discussed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method seems sound, and results are promising on a challenging task.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents an automated 2D/3D registration between US and MR images for ultrasound-guided interventional applications. The algorithm follows a classical feature/landmark matching approach but with features learnt from 2 Unet architectures for US and MR images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very nicely written. The motivation is well explained and connects well with the method and results sections.

    The authors clearly have a good understanding of the clinical problem, and the proposed method addresses the problem well.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method used is not entirely novel since it is largely based on the LoFTR algorithm. On the other hand, it has been well adapted to solving the problem for medical images.

    The results are encouraging but the method still does not reach the very high accuracy needed for most image-guided interventions.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The steps involved in the experiment design are clearly explained but lack some detail. The code base seems to be unavailable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be nice to see how the algorithm generalizes to different ultrasound scanning settings, e.g. different depth, frequency, etc.

    The dataset sample size is a bit small, which is why the authors used cross-validation. But it would be nice to have a larger set for a proper validation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the method is not particularly novel and the results still need some improvement, the paper is very clearly written and has very good clinical relevance.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution is the adaptation of the LoFTR algorithm to multi-modal data and to imprecise ground truth.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Well written
    • Consistent bibliography
    • Clear contribution

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Validation with only 16 patients
    • Incremental improvement of existing work (it is an improvement of the LoFTR algorithm)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The description is quite short and probably difficult to reproduce, see for example section 2.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The goal is to improve multi-modal registration. The work focuses on US and MRI abdominal images. The task is challenging because US data are noisy and because the abdominal region is highly deformable. The proposed method is based on keypoint extraction and matching. The main contribution is the adaptation of the LoFTR algorithm to multimodal data and to imprecise ground truth. First, two models (U-Nets) are trained, one for 2D images and one for 3D data, to estimate cross-modality descriptors (matching scores are computed as the scalar product between features extracted with the two different models). Second, the imprecision of the ground truth is handled through data augmentation.

    The approach is validated on 16 patients with ablation studies.

    Questions :

    • “In order to not consider the ultrasound geometry but rather the structural features, we further augment the ultrasound data by cropping it in a random polygon shape”: it is unclear what is really done and, more importantly, why.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the approach is a little bit limited.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper received consistently positive remarks from all reviewers on the writing and on the execution of the proposed adaptation of the LoFTR method to medical imaging. The work is likely to be of interest, from both an application and a methodological viewpoint, to the MIC as well as the CAI communities. There are no clear weaknesses or questions to be addressed in a potential rebuttal, therefore the paper may be accepted as is.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the reviewers for their positive and constructive feedback. To answer the posed questions:

  • The registration was done manually by a domain expert.
  • We acknowledge that the method is not ready for clinical practice, and this work should be considered a feasibility study for a different approach to multimodal registration. Global ultrasound-to-MR registration is an open problem, in particular in the abdominal region, where the ultrasound probe pressure causes significant deformation.
  • We observed that most of the failure cases were among intercostally acquired sweeps. These sweeps are particularly hard to register because of the shadows produced by the ribs. Furthermore, our dataset contains few intercostal examples, so our model exhibited lower performance in those cases. We expect that this issue could be mitigated with additional training data.
  • Regarding the question on the PolyCrop augmentation, we should first clarify what we meant by “ultrasound geometry”. Ultrasound frames taken with a curvilinear probe have an intrinsic shape (fan geometry) that cannot cover a whole rectangular image, so black pixels are added around the ultrasound data to obtain a rectangular image. This process creates strong edges that are consistent across images. The trained model can use these edges as spatial clues, for example to infer that a certain part of the image is in the top left of the ultrasound frame. This would prevent the model from generalizing to different sweep motions. Our PolyCrop augmentation randomizes the location of these edges, rendering them uninformative.
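
The PolyCrop idea described in the last point can be sketched roughly as follows. This is a minimal illustration in plain Python under stated assumptions, not our actual implementation: the helper names, the number of vertices, the way the polygon is sampled around the image center, and the use of 0 for masked pixels are all hypothetical choices made for the example.

```python
import math
import random


def random_polygon(height, width, n_vertices=6, rng=random):
    """Sample a polygon around the image center (assumed sampling scheme).

    One vertex is drawn per angular sector, so the polygon is star-shaped
    around the center and never self-intersects for n_vertices >= 5.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_radius = min(cy, cx)
    sector = 2.0 * math.pi / n_vertices
    polygon = []
    for k in range(n_vertices):
        angle = k * sector + rng.uniform(0.0, sector)
        radius = rng.uniform(0.5, 1.0) * max_radius
        polygon.append((cx + radius * math.cos(angle),
                        cy + radius * math.sin(angle)))
    return polygon


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test for a point against a simple polygon."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def poly_crop(image, n_vertices=6, rng=random):
    """Return a copy of `image` (list of rows) with pixels outside a random
    polygon set to 0, so the image border no longer follows the fan shape."""
    height, width = len(image), len(image[0])
    polygon = random_polygon(height, width, n_vertices, rng)
    return [[image[r][c] if point_in_polygon(c, r, polygon) else 0
             for c in range(width)]
            for r in range(height)]


# Usage on a dummy 32x32 frame of ones, with a seeded generator.
frame = [[1] * 32 for _ in range(32)]
cropped = poly_crop(frame, rng=random.Random(0))
print(sum(map(sum, cropped)), "of", 32 * 32, "pixels kept")
```

Because the polygon changes at every call, the boundary between image content and black padding lands in a different place for each training sample, which is the property the augmentation relies on.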

We will include these details in the revised version.


