Authors

Yordanka Velikova, Mohammad Farid Azampour, Walter Simson, Vanessa Gonzalez Duque, Nassir Navab

Abstract

Anatomical segmentation of organs in ultrasound images is essential to many clinical applications, particularly for diagnosis and monitoring. Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance. Yet, in ultrasound, due to characteristic properties such as speckle and clutter, it is challenging to obtain accurate segmentation boundaries, and precise pixel-wise labeling of images is highly dependent on the expertise of physicians. In contrast, CT scans have higher resolution and improved contrast, easing organ identification. In this paper, we propose a novel approach for learning to optimize task-based ultrasound image representations. Given annotated CT segmentation maps as a simulation medium, we model acoustic propagation through tissue via ray-casting to generate ultrasound training data. Our ultrasound simulator is fully differentiable and learns to optimize the parameters for generating physics-based ultrasound images guided by the downstream segmentation task. In addition, we train an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on US images in an end-to-end training setting. The proposed method is evaluated on aorta and vessel segmentation tasks and shows promising quantitative results. Furthermore, we also present qualitative results of optimized image representations on other organs.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_42

SharedIt: https://rdcu.be/dnwdo

Link to the code repository

https://github.com/danivelikova/lotus

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a task-based ultrasound image representation derived from CT scans. In the study, the authors suggest a fully differentiable and learnable ultrasound simulator. Additionally, they train an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on ultrasound images. The segmentation performance shows improvement compared to baseline results.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. They propose a fully differentiable and learnable ultrasound simulator for task-optimized ultrasound image representation.
2. A segmentation task-specific ultrasound image was synthesized from CT scan labels, and end-to-end training was utilized.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. It seems obvious that good performance can be achieved when performing a segmentation task using an ultrasound representation from a task-specific ray-casting simulator. The ultrasound representation obtained from the trained ray-casting simulator appears to have extreme contrast, with certain organs being enhanced or attenuated. For example, in Fig. 3, the results of a simulator specialized for kidney segmentation tasks show that the kidney area is enhanced. In fact, this enhanced ultrasound image creates a sense of discrepancy with the actual situation.
2. No new network is proposed. Unpaired image-to-image translation was performed using the CUT network presented in Ref [8].
3. The results of the performance comparison with existing networks do not sufficiently prove that the network you proposed is better. You should compare your proposed network with state-of-the-art networks in the field of semantic segmentation to demonstrate its improvements.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

As noted in the paper, the source code and data used for these experiments are publicly available. Therefore, the reproducibility of the paper is good.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. You should compare your proposed network with state-of-the-art networks in the field of semantic segmentation to demonstrate its improvements.
2. You should propose a network that performs a specialized unpaired image-to-image translation task for the task-based ultrasound (US) representation workflow you are suggesting. Claiming novelty is difficult when using existing networks.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposal of a fully differentiable ray-casting approach for ultrasound simulation and the construction of a task-specific learning workflow was commendable. However, the resulting ultrasound images had a sense of discrepancy from the actual ultrasound images, as certain organs were excessively enhanced or attenuated. Additionally, acknowledging novelty is difficult, as the network used for unpaired image-to-image translation utilized the existing CUT from Ref. [8].
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

Concerns about the weaknesses raised are alleviated, as the novelty claimed in the paper is well explained in the rebuttal.

Review #2

Please describe the contribution of the paper

The paper proposes a task-specific optimization of ultrasound image representation, focusing on vessel and aorta segmentation. It introduces a ray casting-based differentiable ultrasound generation methodology, which, in conjunction with an unsupervised image-to-image translation network, facilitates significant image optimization. This optimized image is then employed to train the segmentation network, enhancing the applicability of task-specific image representation. The proposed methodology outperforms the supervised approach and using just the image-to-image translation network (frozen differentiable renderer experiment) on real images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper’s primary strength lies in its proposed end-to-end learning method, combining differentiable rendering, unsupervised image-to-image translation, and direct learning from task-specific loss, in this case, segmentation for ultrasound images.
2. The utilization of CT label maps to generate simulated images through differentiable rendering is another strong point, enabling the training process to proceed without requiring any annotation, which can be a significant bottleneck.
3. Finally, the performance comparison and qualitative results in the paper provide clear evidence of the benefits of this approach, indicating its potential applicability to other tasks in ultrasound imaging.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Although a supervised U-Net can be considered a reasonable baseline, the paper would benefit from a comparison with existing state-of-the-art techniques. The study could also be strengthened by evaluating on public datasets, if available, to demonstrate the generalizability and robustness of the proposed method.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper is well-described for potential reproduction; using public datasets, if available, would have further facilitated this process.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Further detail on the training of the image-to-image translation network and on the segmentation network used in the end-to-end approach would make the paper more complete. Additionally, a brief discussion of the method’s robustness against small perturbations, noise, and unencountered artifacts would provide valuable insight into potential limitations, as optimized image representations can lead to unintended consequences in application.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The innovative concept of optimizing ultrasound image representation in an end-to-end manner, utilizing differentiable rendering based on the task, holds the potential for broad impact. The proposed method clearly outperforms the supervised approach and image-to-image translation in vessel and aorta segmentation.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

7
[Post rebuttal] Please justify your decision

The authors introduce a novel method for optimizing task-specific image representation for ultrasound segmentation, which is supported by successful experimental results. There are areas to improve, such as a more thorough robustness analysis and establishing a stronger baseline. However, the current version of the paper, along with the authors’ responses, clearly highlights its value. This serves as a strong basis for acceptance.

Review #3

Please describe the contribution of the paper

The authors proposed a framework that learns to optimize task-based ultrasound representations and was evaluated for aorta and vessel segmentation tasks. To simulate ultrasound images, the authors used CT segmentation maps and a ray-casting model. They also trained an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on US images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The concept of learning to generate the most optimal simulated image, which can yield the best segmentation results, is interesting.
- I commend the authors for their efforts in producing the dataset, which comprises manually segmented in-vivo ultrasound images (500 for the aorta, 400 for vessels), and I look forward to its public release in the future.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

In some parts of the manuscript, it is unclear where previous work ends and where the contributions of the study begin. A substantial portion of the manuscript reiterates concepts from other papers, such as “contrastive learning for unpaired image-to-image translation” (CUT). These could simply be cited, leaving more room to highlight the unique contributions of the present study.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors mentioned that the code and dataset are publicly available. It is hoped that the dataset includes the manually segmented in-vivo ultrasound images (500 for the aorta, 400 for vessels) mentioned in the manuscript, as this would be a valuable contribution. In case the authors are not planning to release that data, it would be beneficial for the reproducibility of their work to use other publicly available datasets such as Breast Ultrasound Dataset B [1] or Dataset BUSI [2].
[1] Yap, M.H., Pons, G., Marti, J., Ganau, S., Sentis, M., Zwiggelaar, R., Davison, A.K., Marti, R. (2017). Automated Breast Ultrasound Lesions Detection using Convolutional Neural Networks. IEEE Journal of Biomedical and Health Informatics. doi: 10.1109/JBHI.2017.2731873
[2] Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A. (2020). Dataset of breast ultrasound images. Data in Brief, 28, 104863. doi: 10.1016/j.dib.2019.104863
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Major Comments:
1. The authors indicate that the differentiable ultrasound renderer is based on the formulation of ray-based simulation of ultrasound images introduced by B. Burger et al., 2012. However, the specific modifications made to this formulation in comparison to the previous work are not delineated. The authors have employed the pronoun “we” throughout this subsection to present the methodology, suggesting sole responsibility for the explained method.
2. Similarly, in section 2.2, the authors have reported incorporating a modified unsupervised network, CUT (T. Park et al., 2020), for unpaired image-to-image translation in the Real → Reconstructed US case. Nonetheless, this section only provides a summary of the CUT method without explicitly stating the specific modifications made to it.
3. The paper mentions that in-vivo ultrasound images were collected from 11 individuals, and for each set of annotated images, 100 images were randomly chosen as test sets for both segmentation tasks. However, if the random selection was carried out at the frame level, as noted in the manuscript, frames from the same subject could end up in both the training and test sets, and this data leakage could inflate the reported results. Moreover, in section 3 (Experiments), it is also mentioned that the methods were tested on three hold-out subjects, and the average DSC was reported. It is not clear whether this part contradicts or complements the previous part. It would be beneficial if the authors could provide more clarification on how they selected the test set.
4. It is noted that a small subset of 10 labeled images from the actual US domain were used as a stopping indicator for the complete training pipeline. Additionally, on page 7, it is stated that the networks for the supervised approach were trained for 120 epochs. If the stopping strategy was only employed for one set of experiments and not for the supervised approach, is it reasonable to compare the results? What if the supervised model was not the most optimal after precisely 120 epochs?
5. It is explicitly mentioned that the proposed method leverages the higher resolution and improved contrast of CT scans to simulate ultrasound data, which appears to be inaccurate. As per my understanding, the method relies solely on the segmentation masks of a CT dataset, which could even be simulated without the need for higher-resolution or higher-contrast CT scans from real subjects. It would be beneficial if the authors could comment on this matter.
Minor Comment: On page 7: “For the supervised approach, we trained and tested the networks, for 120 epochs, with a learning rate of 10^-3 and the Adam optimizer.” The word “tested” should probably be removed, as it does not fit in the context of 120 epochs, learning rate, and optimizer.
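The leakage concern in major comment 3 can be made concrete with a short sketch of a subject-level split, in which every frame of a given subject lands on the same side of the train/test boundary. This is only an illustration and not the authors' procedure: the function name, the subject IDs, and the frame counts per subject are hypothetical, and only the figures of 11 subjects and 3 hold-out subjects are taken from the manuscript as quoted above.

```python
import random


def subject_level_split(frame_subjects, n_test_subjects, seed=0):
    """Split frame indices so that no subject appears in both sets.

    frame_subjects: list where entry i is the (hypothetical) subject ID
    of frame i.
    """
    subjects = sorted(set(frame_subjects))
    rng = random.Random(seed)
    # Choose whole subjects for the test set, not individual frames.
    test_subjects = set(rng.sample(subjects, n_test_subjects))
    train_idx = [i for i, s in enumerate(frame_subjects) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(frame_subjects) if s in test_subjects]
    return train_idx, test_idx, test_subjects


# Illustrative data only: 11 subjects with 10 frames each.
frame_subjects = [s for s in range(11) for _ in range(10)]
train_idx, test_idx, test_subjects = subject_level_split(
    frame_subjects, n_test_subjects=3
)
```

A frame-level random split would instead sample indices directly from the pooled frames, so near-identical neighbouring frames of one subject could appear in both sets.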
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The idea is interesting and has been presented well. However, there are concerns about the fairness of the comparison and some lack of clarity regarding the distinctions between the contributions of this study and those of previous work.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The key novelty is an unsupervised domain adaptation based segmentation approach for ultrasound images using the information from CT scans, applied to great vessel segmentation. The approach handles the issue of differences between simulated and real ultrasound images by employing a differentiable image simulator that also utilizes the intermediate simulations from CT for learning. The CUT approach, which uses a contrastive loss, is used to synthesize images from CT to address the problem of introducing artifacts occurring on US images. However, the details of how the synthesis adds such artifacts and noise are unclear from the description. As reviewers point out, the stopping criterion seems somewhat ad hoc, and its rationale could be motivated better. The reviewers also point to the lack of comparison to other state-of-the-art methods. It would help to at least discuss some of the state-of-the-art methods and the conceptual differences between them and the current approach, to explain the novelty and the gaps addressed by this approach.

To address in rebuttal: Please clarify details of the synthesis approach, especially how artifacts and noise are introduced into the US scans. Also please address the concerns of reviewers regarding the rationale of the selected segmentation method and compare/discuss with respect to state of the art methods. Finally, please explain the rationale for the stopping criteria.

Author Feedback

We thank the reviewers for their constructive feedback. We appreciate that they recognize the novelty of the approach and its impact for the MICCAI community (R2, R3, MR), as well as the clarity of the manuscript (R1, R2, R3, MR).

The main contribution of LOTUS is a segmentation approach that avoids hard and costly annotation of ultrasound images by using only available CT labels and intermediary ultrasound image representations optimized for transferring such labels to the final task of ultrasound segmentation. This is achieved by developing a differentiable rendering pipeline based on the downstream task together with an unsupervised image-to-image translation in an end-to-end manner, which “holds potential for broad impact” (R2). Quantitative experiments performed in vivo on volunteers show that the proposed methodology outperforms a supervised approach trained on 500 labeled US images, as well as using just the image-to-image translation network on real images, and thus “provide clear evidence of the benefits of this approach, indicating its potential applicability to other tasks in ultrasound imaging.” (R2).

Since our primary objective is to develop a segmentation approach that alleviates the need for labeling and supervision in the ultrasound domain, direct comparison with existing (supervised) segmentation methods is done only as an indication of the limits of performance of the proposed methodology and not with the objective of outperforming them (R1, R2, and MR). While supervised or semi-supervised methods might outperform our method given large amounts of data, the need for ultrasound labeling and for an expert to segment the ground-truth set still persists. Furthermore, we would like to clarify that the choice of the segmentation network (U-Net) was not the focus of this work, but was made because it is widely used for segmentation. We agree that other architectures could also be utilized, as long as the same one is used in both the proposed framework and the supervised one. We believe that the comparisons will remain valid. We appreciate the reviewers’ suggestions and will ensure that the revised manuscript better highlights the unique contributions of LOTUS in ultrasound segmentation without explicit ultrasound annotations.

Furthermore, we would like to clarify that our approach does not focus on introducing a new image-to-image translation network (R1). Instead, we model the entire framework of transferring CT labels to a new representation and the end-to-end training, which incorporates the intermediate space along with the image-to-image network that is optimized dynamically.

On the choice of the stopping criteria (R3, MR) for our pipeline: while we experimented with different automatic strategies to determine the optimal epoch, such as KL divergence, Wasserstein distance, etc., none of them consistently yielded accurate results. Thus, we adopted this approach of a small subset of labeled images after the segmentation network had converged to ensure robustness during inference. This will be further clarified in the manuscript.
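The stopping strategy described above, choosing the training epoch using a small labeled subset, might look roughly like the following sketch. It assumes that the Dice score is the monitored metric and that the epoch with the highest mean Dice is kept; neither detail is stated in the rebuttal, and the function names `dice_score` and `select_checkpoint` are hypothetical.

```python
import numpy as np


def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def select_checkpoint(per_epoch_preds, labels):
    """Return the epoch index with the highest mean Dice on the labeled subset.

    per_epoch_preds: one list of predicted masks per epoch, aligned with labels.
    labels: list of ground-truth masks for the small validation subset.
    """
    mean_dice = [
        float(np.mean([dice_score(p, t) for p, t in zip(preds, labels)]))
        for preds in per_epoch_preds
    ]
    best_epoch = int(np.argmax(mean_dice))
    return best_epoch, mean_dice
```

Applying the same selection rule to the supervised baseline would address the fairness question raised by R3 about the fixed 120-epoch schedule.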

Additionally, we acknowledge the importance of discussing the robustness of our method against artifacts (R2, MR). Currently, our model primarily incorporates the basic physics of ultrasound imaging without explicitly considering artifacts. We appreciate the reviewer’s suggestion and agree that exploring the robustness of our method against artifacts would be valuable; we will take this feedback into consideration for future improvements.

Regarding the higher resolution and improved contrast of CT scans (R3): CT typically offers better resolution and contrast compared to ultrasound, thus we believe using CT labels yields more accurate outcomes.

Minor comments will be clarified in the paper.

Once again, the novelty of the method was acknowledged by all reviewers, and we believe that the proposed LOTUS: Learning to Optimize Task-based US representations concept offers new paths to the MICCAI community.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors responded to the major concerns of the reviewers. The paper has clear novelty in terms of the application and in the use of unsupervised domain adaptation for US image segmentation. The authors are strongly encouraged to update and clarify the limitations of their approach including the lack of analysis of robustness of the method with noise and artifacts as well as clearly explain the rationale for the selected method and clarify key differences with current methods.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper presents an unsupervised domain adaptation based segmentation approach for ultrasound images, specifically focusing on great vessel segmentation. By incorporating information from CT scans and utilizing a differentiable image simulator, the proposed method addresses the challenge of differences between simulated and real ultrasound images. The paper introduces a novel method for optimizing task-specific image representation in ultrasound segmentation, demonstrating promising results in experimental evaluations. There are areas that can be improved, but the concerns raised by reviewers have been adequately addressed, and the novelty of the proposed approach is clearly explained.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

After carefully considering the reviewers’ feedback and the authors’ rebuttal, a unanimous decision has been reached among the reviewers to accept the paper. One reviewer, in particular, increased their score from 3 to 5, indicating a positive change in their assessment. Furthermore, all reviewers express their appreciation for the paper’s contribution to the field.

The authors have diligently addressed all the concerns and provided clarifications in their rebuttal, which have satisfied the reviewers. As a result, the Meta Reviewer recommends accepting the paper for publication. The unanimous consensus among the reviewers, combined with the careful addressing of concerns, solidifies the decision to accept the paper and acknowledges its value in advancing the field.
