
Authors

Matteo Ronchetti, Julia Rackerseder, Maria Tirindelli, Mehrdad Salehi, Nassir Navab, Wolfgang Wein, Oliver Zettinig

Abstract

We propose a novel method to automatically calibrate tracked ultrasound probes. To this end we design a custom phantom consisting of nine cones with different heights. The tips are used as key points to be matched between multiple sweeps. We extract them using a convolutional neural network to segment the cones in every ultrasound frame and then track them across the sweep. The calibration is robustly estimated using RANSAC and later refined employing image-based techniques. Our phantom can be 3D-printed and offers many advantages over state-of-the-art methods. The phantom design and algorithm code are freely available online. Since our phantom does not require a tracking target on itself, ease of use is improved over currently used techniques. The fully automatic method generalizes to new probes and different vendors, as shown in our experiments. Our approach produces results comparable to calibrations obtained by a domain expert.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_9

SharedIt: https://rdcu.be/cVRUP

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The author designs a phantom consisting of nine cones with different heights for ultrasound calibration. In addition, with the tip detection, the probe can be calibrated without requiring a tracking target on the phantom.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is a new calibration phantom for US probe. The tip of the phantom can be detected using machine learning automatically for easier probe calibration.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

This paper designs a phantom consisting of nine cones with different heights for ultrasound calibration. However, the choice of the nine cone heights should be further discussed and analyzed: why is it the best design in this study?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

(1) The designed phantom consists of nine cones. The relationship between the number of cones and the calibration accuracy should be analyzed. (2) The authors state "we simulate Gaussian and speckle noise of various scales"; please add more explanation. (3) The authors use 4 pairs of tips to produce a calibration hypothesis. Would using more pairs of tips yield higher calibration accuracy? (4) Is the proposed calibration phantom suitable for various ultrasound probes, e.g., linear and curved probes?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The scientific rationale behind the designed phantom needs to be further verified.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

The influence of the US image segmentation on the probe calibration has not been evaluated. In addition, different CNN models should be tested. The authors should give the standard deviation of the calibration errors. More information about the generation of the ground-truth labels is also necessary. These issues have not been solved.



Review #2

  • Please describe the contribution of the paper

The authors present a novel calibration technique that allows feature- and intensity-based calibration. Their proposed method is fully automatic: it extracts the locations of the cones in their proposed phantom and, using segmentation, tracks them across the sweep.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Their goal is not achieving better accuracy than current state-of-the-art methods; instead, they compare their results to the “Expert” calibration without human intervention. Accordingly, their method achieved results similar to the expert calibration in both average error and distribution shape.
    • Their proposed phantom design is easy to manufacture
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Augmentation method is not explained well
    • Testing is performed on devices from three manufacturers only.
    • The weaknesses of the method are not described well.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The phantom CAD model and machine learning model are freely available online. They mention a reference implementation that is also available on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Explaining the augmentation method
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel phantom design that is easy to manufacture, and open-source code.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper presents a procedure for combining phantom-based and CNN image-based calibration methods to provide accurate, automatic calibration of tracked ultrasound probes. The proposed method was also tested on a dataset using an ultrasound probe not used in development, demonstrating generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Generalizability: The inclusion of multiple set-ups with different probes and vendors in the dataset is a key strength of this work. As well, the experiment explicitly introducing images from a new system and quantifying the error strengthens the claim of generalizability.

    Dissemination: The commitment to providing openly available plans for 3D printing the phantom in different scales and the code is another major strength of this work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Need: Although the limitations of current phantoms are described in the Introduction, it is not made clear how the proposed phantom would overcome some of these limitations (e.g., manufacturing tolerances will still affect the present design), and therefore it is unclear why a new phantom is needed rather than adding an image-based refinement step to an existing phantom.

    Limitations: No discussion of the limitations of the proposed method is included in the paper.

    Details of CNN: The authors describe using a CNN for segmentation of the images for image-based refinement; however, many details about the implementation are missing, most critically a description of the dataset splits used for training and testing. More information on the generation of the ground truth labels is also necessary.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper demonstrates a strong commitment to reproducibility by making the 3D printing plans and code freely available; however, key implementation details, such as hyperparameters and details of the architectures and training performed are missing from the paper itself.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overall, this paper was interesting and well-written, solving a useful challenge; however, the following need to be addressed to properly understand the work.

    Major Revisions:
    1) Introduction: Although thorough and clearly written, with the limitations of current methods described, it is not entirely clear how the proposed approach will overcome the limitations of previous methods, or why a new phantom is needed rather than combining image-based detection with any of the existing phantom approaches.
    2) The paper should acknowledge and discuss the limitations of the proposed approach.
    3) Approach: Details of the CNN implementation should be provided, most notably the dataset split.
    4) Approach: Data labelling - details of how the tracked sweeps and label map are registered are not provided. This is critical to explain, as it affects the ground truths used for training and evaluation. Were the automatic segmentation maps validated in some way prior to use?

    Minor revisions:
    • Fig. 1: as the two sub-figures (a and b) are extremely similar, including both does not add much value. It would be better to show a photo of the physical phantom in (a) instead.
    • Experiments and Results: given the large disparity between the average and median distance metrics (~10 mm vs. ~1 mm), a rationale for the outliers should be provided, as well as a normality test on the data to aid in the interpretation of the results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and the techniques proposed would be of interest to the MICCAI community; however, the amount of critical information and discussion that is missing in the current version must be addressed in order for the work to be properly understood.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a new method for fully automated calibration of ultrasound. The reviewers have found value in the manuscript but ask for more clarity on some aspects that should be addressable. I suggest authors focus on:

    • what are the weakness of the method
    • describe better the augmentation approach
    • explain the choice of tip lengths

    Also, please read the reviews carefully and address all of the reviewers’ comments.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank the reviewers for their friendly and constructive feedback. We have already improved our manuscript within the format requirements and can thus confidently say that the following comments will be reflected in the final paper version, should it be accepted.

We agree with all reviewers that the method’s weaknesses should be clarified. First, our approach is relatively complex (3D-printed phantom, CNN, RANSAC, intensity-based refinement), though we believe the initial setup effort amortizes quickly with regular use, and making code and models open source mitigates this issue as well. Second, the algorithm needs the cone’s tip and base to be visible in the same frame to estimate its height. This limits the range of angles the probe can cover and might affect the accuracy of the produced calibration; see also the discussion of cone heights further down. Third, the technique may not be suitable for more specialized transducer geometries such as endoscopic or intravascular probes, but it poses no limitation for most conventional extra-corporeal transducer types. Fourth, the technique depends on the CNN performance; re-training and/or refinement may be needed for robust detection on certain US scanners and more exotic probes.

Related to the weaknesses is the question of how the proposed approach overcomes the limitations of previous methods. In particular, the question was raised why we could not combine any of the existing phantom approaches with image-based calibration. For this to work and out-compete our method, an existing phantom would have to (a) not require being tracked itself, (b) be designed in such a way that a globally unique calibration can be automatically obtained from any supported scan configuration, and (c) provide sufficient echogenicity in multiple directions without depth-dependent reverberation artifacts to support intensity-based refinement. To the best of our knowledge, no such phantom exists. For instance, the commonly used N-wire phantom fulfills (a) and (b), but the thin wires are not suitable for the image-based approach due to the limited overlap. Nevertheless, we did perform experiments with the N-wire phantom and obtained comparable calibration errors: 1.11-2.54 mm for the wire phantom and 0.88-1.93 mm for our method without image-based refinement. Since our goal was to be as good as reasonably possible, and thus non-inferior to expert calibrations in a fraction of the time, and because an intensity-based refinement after N-wire calibration would have required additional scans on yet another phantom, we ultimately did not see a benefit in adding this complementary data to the manuscript. Instead, we continued evaluations after the submission deadline on different hardware and will provide more elaborate supplementary material together with the source code on GitHub.

Multiple reviewers asked about the data augmentation, which was performed as follows. We re-scaled every frame to 1mm spacing, cropped a random 128x128 section, applied Cutout with random intensity, added speckle noise at a resolution of 64x32 and Gaussian noise at full resolution. Cutout augmentation is used to emulate having some parts of the cones not visible, while the speckle noise is sampled at a lower resolution to create larger scale speckles.
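The pipeline above could be sketched as follows. This is a minimal illustration with numpy, not the authors' implementation: the function name, crop/cutout sizes beyond those stated, and all noise amplitudes are assumptions; only the 128x128 crop, the Cutout step, the 64x32 speckle grid, and the full-resolution Gaussian noise come from the text.

```python
import numpy as np

def augment(frame, rng=np.random.default_rng()):
    """Illustrative sketch of the described augmentation.

    Assumes `frame` has already been re-scaled to 1 mm spacing.
    Noise scales and cutout sizes are example values, not the
    authors' exact settings.
    """
    h, w = frame.shape
    # Random 128x128 crop.
    y = rng.integers(0, h - 128 + 1)
    x = rng.integers(0, w - 128 + 1)
    crop = frame[y:y + 128, x:x + 128].astype(np.float32)

    # Cutout: overwrite a random rectangle with a random intensity,
    # emulating partially invisible cones.
    cy, cx = rng.integers(0, 128, size=2)
    ch, cw = rng.integers(8, 32, size=2)
    crop[cy:min(cy + ch, 128), cx:min(cx + cw, 128)] = rng.uniform(0, 255)

    # Multiplicative speckle noise sampled at 64x32, then upsampled
    # (nearest-neighbour repeat here) to create larger-scale speckles.
    speckle = rng.normal(1.0, 0.2, size=(64, 32))
    crop *= np.kron(speckle, np.ones((2, 4)))  # -> 128x128

    # Additive Gaussian noise at full resolution.
    crop += rng.normal(0.0, 5.0, size=crop.shape)
    return crop
```

Sampling the speckle field on a coarse grid and upsampling it is what produces speckles larger than one pixel; sampling at full resolution would instead yield per-pixel noise indistinguishable from the Gaussian term.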

Finally, we want to clarify that cone heights are spaced uniformly in the range 20-60mm. A wider range would make cones more distinguishable but would also limit the range of usable imaging depths. We thus chose this range to make the phantom usable for conventional linear, sector and convex probes. We have found that using 9 cones is a good compromise between the number of keypoints and the number of mismatches due to similar cone heights. Unfortunately, the paper format does not leave sufficient space for an in-depth analysis of this question.
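For concreteness, if "spaced uniformly in the range 20-60mm" is read as nine equally spaced heights with inclusive endpoints (our assumption; the authors' exact values are not given here), the heights would be:

```python
import numpy as np

# Nine cone heights, uniformly spaced over 20-60 mm (endpoints included).
heights = np.linspace(20, 60, 9)
# -> [20. 25. 30. 35. 40. 45. 50. 55. 60.]  (step of 5 mm)
```

A 5 mm step between neighbouring cones illustrates the trade-off discussed above: a smaller step (more cones in the same range) would increase the number of keypoints but also the risk of mismatches between similar cone heights.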




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have focused on the major criticisms in their rebuttal and have successfully addressed them. Some of the original requests by the reviewers, such as comparisons to other methods and further experiments, remain; however, the rebuttal should not add experiments, and I believe the paper holds as is.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses the remaining issues well. The paper brings an interesting contribution, especially given the authors’ decision to make the phantom blueprints and their code available. This is an incremental but useful contribution to the community which the AC recommends to accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I thank the authors for their effort in addressing the questions raised by the reviewers. Although many of the issues have been addressed (such as details of data augmentation), I still think this work has limited impact. For example, the algorithm needs the cone’s tip and base to be visible in the same frame to estimate its height. This limits the range of angles that the probe can cover and might affect the accuracy of the produced calibration. In addition, re-training and/or refinement is likely needed for detection on different US scanners.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


