
Authors

Yordanka Velikova, Walter Simson, Mehrdad Salehi, Mohammad Farid Azampour, Philipp Paprottka, Nassir Navab

Abstract

Abdominal aortic aneurysm (AAA) is a vascular disease in which a section of the aorta enlarges, weakening its walls and potentially rupturing the vessel. Abdominal ultrasound has been utilized for diagnostics, but due to its limited image quality and operator dependency, CT scans are usually required for monitoring and treatment planning. Recently, abdominal CT datasets have been successfully utilized to train deep neural networks for automatic aorta segmentation. Knowledge gathered from this solved task could therefore be leveraged to improve US segmentation for AAA diagnosis and monitoring. To this end, we propose CACTUSS: a common anatomical CT-US space, which acts as a virtual bridge between CT and US modalities to enable automatic AAA screening sonography. CACTUSS makes use of publicly available labelled data to learn to segment based on an intermediary representation which inherits properties from both US and CT. We train a segmentation network in this new representation and employ an additional image-to-image translation network which enables our model to perform on real B-mode images. Quantitative comparisons against fully supervised methods demonstrate the capabilities of CACTUSS in terms of Dice Score and diagnostic metrics, showing that our method also meets the clinical requirements for AAA scanning and diagnosis.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_47

SharedIt: https://rdcu.be/cVRuy

Link to the code repository

https://github.com/danivelikova/cactuss

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a pipeline to segment the (healthy) aorta in US images without having to use labelled US data. They do so by training a contrastive generative network to translate between an intermediate representation (IR, derived from CT) and ultrasound images. Given that the CT scans are labelled, they can then train a network to segment the aorta in the IR.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clearly written, well explained, and good figures make the concept easy to understand.
    • Novel IR methodology using conv. ray tracing instead of simple edge detection
    • Clinically acceptable results on volunteer data
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No patient data with AAA diagnosis or borderline AAA. Therefore the generalizability of the proposed method for actually detecting an aorta >8mm diameter is questionable, as this case is not tested.
    • No mention of statistical testing on the results.
    • Focus on Dice score is not correct for this clinical application, as distance errors are more relevant to the AAA diagnosis.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good, but it would of course be better if the code and the US data could be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Section 2 - What is meant by ‘anisotropic properties’ of the IR?
    • Section 2 - The authors should briefly explain the conv ray-tracing approach in addition to citing [16]
    • Section 2.1 - Was the test set of 100 frames acquired from a separate volunteer?
    • Section 2.1 - No unhealthy (i.e. patients with AAA) in CT or US datasets.
    • Figure 3 - Labels of what is simulated US/IR (‘fake’) and what is real would be helpful
    • Section 2.4 - What about an experiment where no IR is required? Could you use CUT to go between abdominal CT and US directly instead of using an IR? What would the performance be? I expect the authors did some preliminary testing with this even if it wasn’t a thorough experiment, and it would be good to report these preliminary results.
    • Table 3 - Was any statistical testing done for significance?
    • Table 3 - On the Siemens machine, an MAE of 7.6 ± 1.5 suggests that in some cases the mean error exceeds the 8mm clinically acceptable threshold. Is this correct?
    • Table 5 - Please include MAE as well.
    • Section 3 - Overall I would argue that DSC is not a good measure for AAA diagnosis compared to a distance error such as MAE or Hausdorff. I would emphasize the distance metrics more than DSC. In particular, for the Siemens machine it seems that a supervised U-net is better at distance errors (although I am not sure if this is statistically significant, as mentioned above).
    • Section 3 - FID was measured but not reported. Would be interesting to read how well CUT does in this application.
    • Section 3 - A better way to present the results in Table 3 would be to do a Bland-Altman style plot where it is clear if/how the error changes as the ground-truth diameter varies.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend an accept because the authors describe a clear methodology and novel pipeline. The one major weakness is that there is no AAA patient data, and therefore it remains to be seen if this technique would work for AAA diagnosis for the proposed clinica

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered
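
Review #1 suggests a Bland-Altman style presentation of the diameter errors. As a minimal sketch of what such an analysis computes (not the authors' evaluation code), the snippet below derives the bias and the 95% limits of agreement from paired diameter measurements. The function name `bland_altman` and its argument names are hypothetical, and the limits assume approximately normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman(predicted_mm, reference_mm):
    """Return Bland-Altman statistics for paired diameter measurements.

    predicted_mm / reference_mm are equally long sequences of diameters
    (hypothetical inputs: one predicted and one ground-truth value per frame).
    """
    if len(predicted_mm) != len(reference_mm):
        raise ValueError("measurement lists must have the same length")
    if len(predicted_mm) < 2:
        raise ValueError("at least two pairs are needed for a standard deviation")

    # x-axis of the plot: mean of the two measurements for each pair
    pair_means = [(p + r) / 2 for p, r in zip(predicted_mm, reference_mm)]
    # y-axis of the plot: signed difference for each pair
    diffs = [p - r for p, r in zip(predicted_mm, reference_mm)]

    bias = mean(diffs)   # systematic over- or under-estimation
    spread = stdev(diffs)  # sample standard deviation of the differences

    return {
        "pair_means": pair_means,
        "diffs": diffs,
        "bias": bias,
        # 95% limits of agreement under a normality assumption
        "limits_of_agreement": (bias - 1.96 * spread, bias + 1.96 * spread),
    }
```

Plotting `diffs` against `pair_means` with horizontal lines at the bias and the two limits would show whether the error grows with the ground-truth diameter, which is the question the reviewer raises.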



Review #2

  • Please describe the contribution of the paper

    This paper describes a way to bridge between CT and US to enable AAA screening using US. This is done through an Intermediate Representation of the anatomy. Simulated US is generated from CT, and an image-to-image translation network is then trained on it together with unpaired real US images. In addition, a segmentation network is trained on the simulated US images. This framework can then take a real ultrasound image and segment it for the purpose of AAA screening and diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of unpaired image-to-image translation network is interesting for this application. It demonstrates the feasibility of using ultrasound for AAA diagnosis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The real ultrasound images came from relatively young individuals. These people are probably “easy” to scan. People who need AAA screening are much older and they are probably more “difficult” to scan and could have worse image quality. This could greatly impact the performance.

    A commercial solution for AAA diagnosis is already available using 3D ultrasound alone (no RUS or camera). There is no mention of this.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is adequate.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The author should try ultrasound images with different body habitus for the learning.

    I am not sure if the proposed solution is a good candidate for using ultrasound for AAA diagnosis.

    Some minor typos (e.g. “Germangy”)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The data used is not quite suitable for applications like AAA diagnosis. This is, however, a good feasibility study of a cross-modality application.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors demonstrate a framework for automatic aortic measurements on abdominal US doppler studies using a domain adaptation technique leveraging an intermediate representation space and deep learning models trained on labeled CT studies of the aorta. They then show favorable performance of their approach compared to a U-Net model trained only on a small number of labeled US studies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Accurate domain adaptation of deep learning models between CT and US modality holds considerable value. CT-based models are more plentiful, well curated, and reproducible. US can be performed in any setting (office, emergency room, OR), requires less capital and can be leveraged by robotic interventions to provide real-time guidance. The approach of using an intermediate representation space that can leverage CT-based labels for US applications is novel to this reviewer. Importantly, the accuracy for a specific clinical task (aorta measurement) is reported as better than if a model built only with US imaging were used.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A relatively small number of patients were used for evaluation and many of the same patients appear to have been used to train the comparator US U-net model and serve as test patients. The full heterogeneity of anatomy encountered in practice cannot be captured with so few patients and so failure modes appear unexplored.

    The clinical need for automatic aortic segmentation on US is not entirely clear. US doppler studies are typically performed by a trained US technologist for whom it takes minimal effort to make aortic measurements. While CT has superior accuracy, discrepancies in US measurements of aortas in a screening context are rarely clinically meaningful.

    The framework is complex and it can be challenging to follow which component is being discussed (US simulation vs CUT vs IR space aorta segmentation), especially with regard to data used for training/validation/evaluation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors reference the US simulation algorithm and CUT network and provide parameters for each. As noted in their checklist, code is not provided. Overall, this limits reproducibility of their work. It would be of great interest to the community if they provided access to their framework to allow others to leverage well-curated CT (or MR?!) datasets for US applications.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overall this is impressive work with a novel concept that, if further validated in a larger and more diverse number of patients and for different clinical tasks, could create a paradigm for accurate US imaging segmentation and classification models. Further understanding of how training sample sizes and hyperparameters affect the performance of the CUT and segmentation models would be enlightening.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors were the use of an intermediate space generated from US simulation and linked with real US images through a CUT algorithm that is a concept novel to this reviewer and appears to perform at a clinically acceptable level. Further validation in a larger, diverse patient cohort is needed.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: The use of unpaired image-to-image translation network is interesting for this application. It demonstrates the feasibility of using ultrasound for abdominal aortic aneurysm (AAA) diagnosis. In addition, the paper is well-written and is easy to understand. Novel intermediate representation (IR) methodology using convolutional ray tracing instead of simple edge detection.

    Weaknesses that should be addressed in the revision: The real ultrasound images came from relatively young individuals. However, people who need AAA screening are much older and they are likely more difficult to scan and could have worse image quality. In addition, the number of subjects is relatively small. No patient data with AAA diagnosis or borderline AAA. Therefore the generalizability of the proposed method for actually detecting an aorta >8mm diameter is questionable, as this case is not tested. There is no mention of statistical testing on the results. Focus on Dice score is not correct for this clinical application, as distance errors (such as Hausdorff distance) are more relevant to the AAA diagnosis. Please address all comments in your rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9




Author Feedback

We thank the reviewers for their constructive feedback. We appreciate that they recognize the novelty of the approach and impact for the MICCAI community (R1, R3, MR1), as well as the clarity of the manuscript (R1, R3, MR1).

The main contribution of CACTUSS is an intermediate representation (IR) that allows one to map readily available CT anatomical labels to unlabeled ultrasound data for any desired anatomy, and could “create a paradigm for accurate US segmentation and classification models” (R3). An initial feasibility study is successfully performed in-vivo on volunteers for aorta segmentation and shows the strength of the proposed IR method (R3, MR1), its feasibility (R2, MR1) and clinical acceptability (R1, R3).

Since we use open-source CT data [1] and test our solution on volunteer US images, we have not performed an explicit evaluation on AAA patients (R1, R2 and MR1). However, our initial experimental results on AAA sample images show that CACTUSS is able to successfully generate an IR for AAA B-mode images independently of anatomical size and shape. Our code for generating the IR from US data is now publicly available, so the reviewers as well as the community can directly verify that the IR generation also performs well on patient data with AAA samples. The desired segmentation performance can be achieved by retraining the segmentation network on any in-distribution data. Therefore, the community can see the application of CACTUSS also for AAA cases as well as its potential to generalize to other applications and anatomies. This is the expected behavior and further demonstrates the adaptivity of the proposed method.

On the number of volunteers in the in-vivo study performed in this work (R2, R3 and MR1): The number of volunteers is large enough to show initial clinical acceptability (R1,R3) and feasibility (R2) of CACTUSS. In order to make our assessments of the algorithm more rigorous, the evaluation is performed in a patient-wise split cross validation (R3). This will be further clarified in the manuscript.
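
To illustrate what a patient-wise split means in practice, here is a minimal sketch (an assumption about the general technique, not the authors' actual evaluation code). All frames belonging to one volunteer are kept in the same fold, so no subject contributes to both training and testing. The function name `patient_wise_folds` and the round-robin assignment of subjects to folds are hypothetical choices made for this example.

```python
from collections import defaultdict


def patient_wise_folds(patient_ids, n_folds):
    """Yield (train_indices, test_indices) for a patient-wise k-fold split.

    patient_ids[i] is the (hypothetical) subject identifier of frame i.
    Every frame of a given subject lands in exactly one test fold.
    """
    # Group frame indices by subject
    frames_by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        frames_by_patient[pid].append(idx)

    patients = sorted(frames_by_patient)
    if not 2 <= n_folds <= len(patients):
        raise ValueError("n_folds must be between 2 and the number of subjects")

    for k in range(n_folds):
        # Round-robin assignment: every n_folds-th subject goes to fold k
        test_patients = set(patients[k::n_folds])
        test_idx = [i for i, pid in enumerate(patient_ids) if pid in test_patients]
        train_idx = [i for i, pid in enumerate(patient_ids) if pid not in test_patients]
        yield train_idx, test_idx
```

Splitting by frame instead of by subject would let near-identical neighbouring frames of the same sweep appear on both sides of the split, which tends to inflate the reported scores. Libraries such as scikit-learn offer grouped splitters (for example `GroupKFold`) that serve the same purpose.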

Further, we acknowledge that an extensive statistical evaluation is relevant (R1, MR1). As noted by (R1, MR1), we report the Dice score as a metric to evaluate the accuracy of the segmentation, and in order to evaluate the clinical acceptability of CACTUSS, we report the MAE and standard deviation of the anterior-posterior diameter of the aorta compared to ground truth labels for every experiment. This is currently the most reliable diagnostic metric used in clinical practice as reported in the current medical literature [2], since there is no universally accepted standard for ultrasound assessment of AAA.
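
The two kinds of metric discussed here can be sketched as follows. This is a simplified illustration under stated assumptions, not the evaluation code of the paper: masks are binary nested lists, image rows are assumed to run along the anterior-posterior (depth) direction, and the diameter is taken as the largest vertical extent of the mask in any column. The helper names `dice_score`, `ap_diameter_mm`, and `diameter_mae`, as well as the `row_spacing_mm` parameter, are hypothetical.

```python
from statistics import mean, pstdev


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * inter / total


def ap_diameter_mm(mask, row_spacing_mm):
    """Largest vertical extent of the mask in any column, in millimetres.

    Assumes rows correspond to the anterior-posterior direction.
    """
    best = 0
    for column in zip(*mask):
        rows = [i for i, value in enumerate(column) if value]
        if rows:
            best = max(best, rows[-1] - rows[0] + 1)
    return best * row_spacing_mm


def diameter_mae(preds, truths, row_spacing_mm):
    """Mean and (population) standard deviation of absolute diameter errors."""
    errors = [
        abs(ap_diameter_mm(p, row_spacing_mm) - ap_diameter_mm(t, row_spacing_mm))
        for p, t in zip(preds, truths)
    ]
    return mean(errors), pstdev(errors)
```

The two metrics answer different questions: Dice measures area overlap, while the diameter error is directly comparable to a clinical tolerance in millimetres, so a segmentation can score well on one and poorly on the other.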

Minor comments will be clarified in the paper.

Overall, the novelty of the method was acknowledged by all reviewers and we believe that our proposed Common Anatomical CT-US Space (CACTUSS) concept offers new paths to the MICCAI community.

[1] https://www.synapse.org/#!Synapse:syn3193805/wiki/89480

[2] Hartshorne, T., McCollum, C., Earnshaw, J., Morris, J., Nasim, A.: Ultrasound measurement of aortic diameter in a national screening programme. European Journal of Vascular and Endovascular Surgery (2011). https://doi.org/10.1016/j.ejvs.2011.02.030




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I would like to thank the authors for addressing all comments raised by the reviewers and in my original rebuttal. This is an interesting paper and is of interest to the MICCAI community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper received good reviews from all three reviewers. One of the major concerns was the small dataset and the lack of patient data with AAA. However, I am glad that the authors have already shared the source code with the general community and have invited the community to evaluate the proposed methodology on AAA data available elsewhere. The method is sufficiently novel to accept for MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main contribution of this work is a new framework for abdominal aortic aneurysms (AAA) that uses an intermediate representation learnt from CT labels and ultrasound to improve the segmentation of the aorta on ultrasound images. This framework has the potential to be applied to other ultrasound applications.

    Key strengths:

    • Using an intermediate representation learnt from CT-based labels and US is novel and contributes to solving the challenge of data scarcity in ultrasound.

    Key weaknesses:

    • No statistical testing is included in the results.
    • The impact on AAA remains unclear due to the limitations of the dataset used.

    Review comments & Scores: R1&2&3&MR1 agree that the methodology proposed is novel. Data issues were raised by MR1 and addressed in the rebuttal by the authors. The authors also clarified that currently DICE is the metric used in clinical practice, therefore the chosen metric in this study.

    Rebuttal: MR1 highlighted that data is limited; however, I believe this is often the case in ultrasound-based applications and out of the scope of this rebuttal. Authors recognised that statistical evaluation is relevant. I agree with the authors that this initial feasibility study is sufficient to demonstrate the potential of the proposed method.

    Evaluation & Justification: Despite the dataset limitations and the lack of statistical analysis, I believe this work presents an interesting and novel methodology that may be of interest to the MICCAI community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


