Authors

Walter Simson, Louise Zhuang, Sergio J. Sanabria, Neha Antil, Jeremy J. Dahl, Dongwoon Hyun

Abstract

Ultrasound images are distorted by phase aberration arising from local sound speed variations in the tissue, which lead to inaccurate time delays in beamforming and loss of image focus. Whereas state-of-the-art correction approaches rely on simplified physical models (e.g. phase screens), we propose a novel physics-based framework called differentiable beamforming that can be used to rapidly solve a wide range of imaging problems. We demonstrate the generalizability of differentiable beamforming by optimizing the spatial sound speed distribution in a heterogeneous imaging domain to achieve ultrasound autofocusing using a variety of physical constraints based on phase shift minimization, speckle brightness, and coherence maximization. The proposed method corrects for the effects of phase aberration in both simulation and in-vivo cases by improving image focus while simultaneously providing quantitative speed-of-sound distributions for tissue diagnostics, with accuracy improvements with respect to previously published baselines. Finally, we provide a broader discussion of applications of differentiable beamforming in other ultrasound domains.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_41

SharedIt: https://rdcu.be/dnwwU

Link to the code repository

https://github.com/waltsims/dbua

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a differentiable autofocusing “layer” (differentiable delay-and-sum beamformer) which they use to optimize the speed-of-sound map to optimize for speckle brightness, coherence, and minimize phase-shifts.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- I enjoyed reading the paper. It is very nicely written: all the details are explained well.
- The paper solves an interesting and useful problem – phase aberrations caused by local speed-of-sound variations.
- I appreciated the criteria of phase-error minimization for local speed-of-sound optimization.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
I have a significant concern regarding the novelty of the current work:
- The paper’s primary claimed novelty is the differentiable delay-and-sum (DAS) beamformer, which enables optimization of local speed-of-sound parameters. Regrettably, the differentiable DAS is not new and was a core contribution of Vedula et al. 2019 [1], which the authors may have overlooked in their literature review. Vedula et al. 2019 introduced a “dynamic focusing layer,” which is precisely the differentiable DAS proposed in the current paper.
- However, Vedula et al. employ differentiable DAS to optimize transmit beam patterns, while the current work uses it for a distinct purpose: reducing phase aberrations. I encourage the authors to reference Vedula et al. 2019 and revise their novelty statements accordingly.
Broader Questions:
- I am interested in understanding how the proposed method of phase-error minimization compares to supervised speed-of-sound (SoS) map estimation approaches (Feigin et al. 2019). I would assume that SoS prediction networks would provide accurate SoS maps as long as the training set is sufficiently diverse and rich. Is the main advantage of the proposed approach that it does not require training? How significant is the limitation imposed by the requirement for training in supervised learning approaches?
[1] Vedula et al., Learning beamforming in ultrasound imaging, Proc. Medical Imaging in Deep Learning (MIDL), 2019.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The proposed method is straightforward to implement and should be reproducible. Although the datasets are synthetic and may require access to raw data from ultrasound machines as well as ground-truth speed-of-sound maps of the phantoms, open-source benchmarks like CUBDL exist. Therefore, obtaining such raw data should not pose a significant challenge.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Please see above.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I really enjoyed reading the paper. My only concern is the novelty w.r.t. Vedula et al. 2019 as discussed in the “Weaknesses” section. Personally, I really like this line of work. I think “phase-error minimization” using differentiable DAS, as done in the paper, is a meaningful contribution. I would be happy to raise the score if the novelty statements are adjusted.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors propose a novel physics-based framework, called differentiable beamforming, for rapid reconstruction of the sound speed map. Speckle brightness or coherence factor (CF) maximization and phase shift minimization are used to optimize the sound speed map, which in turn benefits the reconstruction of echo images. Several methods were compared against the new approach in simulations and experiments. The results show that the proposed method outperforms the CUTE method in different types of imaging cases (homogeneous and heterogeneous), and outperforms the speckle brightness and CF methods in the heterogeneous case.
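To make the coherence factor in this summary concrete, here is a minimal sketch using the common definition CF = |sum of s_i|^2 / (N * sum of |s_i|^2) over N delayed channel signals. The function names, the 16-element aperture, and the linear phase error are illustrative assumptions and are not taken from the paper.

```python
import cmath

def coherence_factor(channels):
    """Ratio of coherent to incoherent energy across delayed channel signals."""
    n = len(channels)
    coherent = abs(sum(channels)) ** 2
    incoherent = n * sum(abs(s) ** 2 for s in channels)
    return coherent / incoherent if incoherent > 0 else 0.0

def toy_channels(n, phase_step_rad):
    # Hypothetical data: unit-amplitude signals whose residual phase error
    # grows linearly across the aperture (0.0 means perfectly focused).
    return [cmath.exp(1j * phase_step_rad * (i - (n - 1) / 2)) for i in range(n)]

cf_focused = coherence_factor(toy_channels(16, 0.0))
cf_aberrated = coherence_factor(toy_channels(16, 0.3))
print(cf_focused, cf_aberrated)
```

Perfectly aligned channels give a CF of one, and residual delay errors lower it, which is why maximizing CF can serve as a focusing criterion.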
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors incorporate modern differentiable operations and differentiable programming technology, using JAX, into ultrasound DAS beamforming, which is technically novel and interesting. The idea itself is concise but not simple, providing a paradigm for reconstructing the sound speed map via a quick implementation of the gradient descent method. The selection of the loss functions is novel too.
- It is validated on both synthetic and in-vivo data, both demonstrating performance boost.
- The presentation of this paper is clear to follow.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The key innovation of this work is mostly attributed to the differentiable programming by JAX, whilst the source code does not seem to be provided, leading to less reproducibility for the proposed method. Moreover, it would be interesting to see how the automatic differentiation works in the paper.
- The analysis in Section 4 seems simplistic. Further quantitative analysis of resolution, image quality (SSIM or PSNR, etc.) and/or reference image analysis could be included. Figure 4 lacks a comparison with other methods.
- I am confused about why the differentiable beamformer performs better than the standard sound speed characterization. As we may know, many full waveform inversion (FWI) methods have been implemented in ultrasound imaging, providing supreme image quality and resolution. Please clarify this.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The source code and data are not available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. Missing definitions: In Eq.2, the superscripts N_t and N_r are not defined. In the last sentence of section 2.2, what does “s” mean? In Eq.7, what operation does angle_E represent?
2. In section 2.2, the authors mention that “DAS is composed of elementary differentiable operations…”, I do not get the idea of this sentence, please clarify and add some reference(s).
3. “In this work, we will show the the promise of …”, please remove extra “the”.
4. In section 2.4, the reference 10 is perhaps inappropriate, and when we refer to CF, we are more likely to cite the reference “Coherence Factor of Speckle from a Multi-Row Probe”.
5. In section 3.1, “The loss was … on a regular 1521 grid spanning the image.” I do not get the idea of this sentence, especially the “1521 grid spanning the image.”
6. “For the phase error loss, 17-element…for beamforming.” If I understand correctly, the authors chose the center of the transducer as the midpoint, and selected 17 elements (subarray) symmetrically on the left and right to simultaneously send and receive ultrasound signals and then beamforming at a constant sound speed (e.g., 1480 m/s), then the subarrays slide (to maintain symmetry) for the next beamforming, and the phase error can be calculated once for each two imaging step; when the sound speed changes (e.g., 1485 m/s), the above process is repeated until the phase error is minimized, and then the sound speed at that point is the estimated tissue sound speed. If so, there is a question, assuming there are 3 sub-array imaging processes (Ta, Ra), (Tb, Rb), (Tc, Rc), then how to calculate the phase error now, calculate a vs. b, b vs. c, and then add them up according to Equations 7 and 8? or calculate a vs. b, b vs. c, a vs. c, then summation? or something else? If I have misunderstood, please explain in detail to clear my confusion.
7. At the same time, according to the estimated sound speed for the final beamforming, then what is the final beamforming strategy, is it STA (i.e., one element transmits and all receive) or 17 elements transmit/receive? I think this is not very clear in the current manuscript.
8. The strategy of in vivo data collection is not described.
9. In Results section, “On the corrected images, image brightness is enhanced,…” it is not obvious in Fig.3(a), but can be clearly seen in (b).
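On comment 2 above, the claim that DAS is composed of elementary differentiable operations can be illustrated with a small sketch. The paper relies on JAX for automatic differentiation; the version below is a stand-in that uses hand-written forward-mode dual numbers in plain Python so that each step of the chain rule is visible. The geometry, pulse shape, element count, and all names are hypothetical.

```python
import math

class Dual:
    """A value and its derivative with respect to one scalar (forward-mode AD)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)
    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.der - other.der)
    def __mul__(self, other):
        other = lift(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)
    __rmul__ = __mul__
    def __rtruediv__(self, numerator):
        # d/dc (k / c) = -k * c' / c^2 for a constant numerator k
        return Dual(numerator / self.val, -numerator * self.der / self.val ** 2)

def lift(x):
    return x if isinstance(x, Dual) else Dual(x)

FS = 40e6        # sampling rate [Hz] (assumed)
C_TRUE = 1540.0  # sound speed used to synthesize the toy traces [m/s]
DEPTH = 0.03     # depth of a single point target [m]
SIGMA = 0.2e-6   # width of the Gaussian echo envelope [s]
ELEMENTS_X = [(i - 3.5) * 3e-3 for i in range(8)]  # 8 elements, 3 mm pitch

def path_length(x):
    # Plane-wave transmit straight down, then receive path back to the element at x.
    return DEPTH + math.hypot(DEPTH, x)

# Synthetic envelope traces: one Gaussian echo per element at the true arrival time.
TRACES = [[math.exp(-((n / FS - path_length(x) / C_TRUE) / SIGMA) ** 2)
           for n in range(2048)] for x in ELEMENTS_X]

def interp(trace, index):
    # Linear interpolation: the integer part is piecewise constant,
    # so the derivative flows only through the fractional part.
    i = math.floor(index.val)
    frac = index - i
    return trace[i] + frac * (trace[i + 1] - trace[i])

def das_pixel(c):
    """Delay-and-sum value at the target pixel for an assumed sound speed c."""
    c = lift(c)
    total = Dual(0.0)
    for x, trace in zip(ELEMENTS_X, TRACES):
        delay = path_length(x) / c             # division
        total = total + interp(trace, delay * FS)  # scaling, interpolation, sum
    return total

out = das_pixel(Dual(1530.0, 1.0))  # seed dc/dc = 1
print(out.val, out.der)
```

Seeding the derivative with 1.0 propagates d/dc through the division, the scaling by the sampling rate, the interpolation, and the sum. In this toy setup the derivative is positive below the true sound speed and negative above it, which is the signal a gradient-based optimizer would follow.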
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a paper of strong technical novelty. But there is some space to enhance its technical details, for example the implementation of automatic differentiation and the extension of experimental analysis.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

Happy to see that the source code and data would be conditionally available. I enjoyed reading the discussion on the technical novelty but still hope R2’s concerns will be carefully considered in your future work when there are no character limits. All in all, I believe this work may potentially benefit the ultrasound imaging community.

Review #3

Please describe the contribution of the paper

The authors propose a method for sound speed estimation in ultrasound imaging. The estimated spatial sound speed maps are used to improve the beamforming and consequently the quality of the reconstructed B-mode ultrasound images. The results show that the proposed method accurately estimates the sound speed in numerical phantoms.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- manuscript is well-written and easy to follow.
- proposed method outperformed the CUTE technique in numerical experiments (k-wave simulations).
- proposed approach is intuitive and straightforward to use. It can also be modified to include estimation of other image reconstruction related parameters.
- proposed method is optimization driven and can be used to estimate the sound speed maps based on a single data acquisition. Therefore, proposed method mitigates some problems related to techniques based on convolutional neural networks. First, the proposed method does not require training data in the standard sense, presumably addressing some of the out-of-distribution inference problems known for convolutional networks. Second, the proposed method can work with various ultrasound image generation techniques (e.g. different transmit/receive events), not only the ones used to generate the training data as in the case of the convolutional networks.
- authors investigated several loss functions.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- authors do not discuss the limitations of the proposed method. For example, real-time utilization of the proposed method would probably be difficult due to the requirement to optimize the sound speed maps separately for each data acquisition. The optimization time required for a single image is not mentioned. A well-trained convolutional network would probably be much more efficient in applications with respect to time.
- authors do not perform experiments with real tissue-mimicking phantoms, with known sound speed spatial distribution. Only qualitative results obtained using liver tissue are presented.
- proposed method could be better described. Authors did not describe how the training was performed (e.g. iterations, learning rate). How the gradient descent method was implemented, did the authors update the entire map in a single iteration? It would be difficult to reproduce the proposed method based on the current descriptions.
- authors could investigate different acquisition approaches and investigate for which approach the proposed method works best. Additional experiments could be performed to provide more convincing and quantitative results, aside the experiments with simple numerical phantoms.
- it is unclear how the results presented in Table 1 were calculated. Presumably, the calculations were done over the entire image. However, the CUTE technique is not suitable for the calculations of the sound speed close to the transducer surface.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors do not plan to release the code, therefore the reproducibility would be based on the descriptions of the proposed method. These, however, could be more detailed.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
I believe that the paper should be extended by including additional experiments. My suggestions for the MICCAI submission are as follows:
- please, provide information about the time required for a single sound speed map reconstruction. Specify the GPU used for the experiments. Discuss the limitations.
- please, specify how the sound speed values were determined in Table 1. The CUTE technique is not really suitable for the calculations of the sound speed close to the transducer surface. I would suggest to exclude the first 5 mm from the calculations (Fig. 3). It would be great if authors could add one more row (Table 1) with the estimates for the circular inclusion region only (Fig. 3).
To further improve the submission and extend it to a journal paper my suggestions are as follows:
- include numerical experiments with tissue mimicking phantoms presenting more complex geometries (e.g. several inclusions).
- compare different image generation methods. This would confirm that the proposed optimization technique is general.
- compare the proposed method with a convolutional neural network pre-trained on numerical data. Please note that a CNN can also be fine-tuned in an unsupervised way using the same loss functions as in the manuscript.
- include more data from patients. For example, compare liver from a healthy patient with a liver from a patient with fatty liver disease. Moreover, generate a speed of sound parametric map for a b-mode image presenting a tumor (e.g. breast/liver mass).
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I believe that this is a nice paper, presenting an interesting approach to sound speed estimation in ultrasound. The manuscript is lacking in the experimental evaluation of the proposed method. However, I believe that the method itself is worth presenting at the conference.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision
I would like to thank the authors for addressing my comments.

Minor issues/comments:
- I agree with the authors that the work of Vedula et al presents a different approach to image reconstruction, where the speed of sound is assumed to be constant for the entire medium.
- the descriptions of the proposed method still seem somewhat limited. For example, the authors did not address the comments of the second reviewer about the beamforming method, which I consider to be important. While the authors plan to release the code to clarify the implementations, including additional descriptions within the manuscript would make the paper much better.
- run time required for a single image is around 300 seconds, which makes it difficult to apply the proposed method for real-time imaging. It is good that the authors now mention this limitation in the manuscript.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The three reviewers agree that this work has merit, but point to several issues that should be addressed. First, reproducibility of differentiable programming by JAX is an important concern. There is a concern about the similarity of the contribution on differentiable DAS to (Vedula et al. 2019). Vedula et al. employ differentiable DAS to optimize transmit beam patterns, while the current work uses it for the distinct purpose of reducing phase aberrations. There are also important additional comments to improve the validations and some major issues about clarity of the presentation that should be addressed.

Author Feedback

We thank the reviewers for their thorough review of our submission, “Differentiable Beamforming for Ultrasound Autofocusing” (DBUA), their positive feedback, and constructive comments. We appreciate the recognition of our work as “solving a useful problem” (R1), being technically novel with “strong technical novelty” (R2), and outperforming the baseline (R2, R3). At the same time, all three reviewers commented on the presentation as clear (R1 “excellent clarity”) and “well/nicely-written” (R1, R3) and have found the work “worth presenting at the conference” (R3).

MR1 and R1 highlight that Vedula et al., 2019 [1] have incorporated DAS components into a CNN architecture to learn beamforming parameters. We recognize that [1] has apparent similarities to our proposed DBUA. However, [1] differs substantially from DBUA in architecture, purpose and execution. First, [1] described a supervised learning approach that uses a neural network architecture and training data to learn transmit and receive beamforming parameters over a training dataset. DBUA does not involve any “learning” over a training dataset and instead aims to characterize the tissue properties for a given acquisition instance using differentiable numerical operations. Second, the CNN-based reconstruction in [1] used “beamformed SLA [Single Line Acquisition]” as the ground truth reference for the neural network; critically, this ground truth was calculated with a constant sound speed “assumed to be 1540 m/s”, which is precisely the assumption that our approach seeks to address. Finally, our main contribution is the joint quantitative estimation of tissue parameters and autofocusing of ultrasound reconstruction using a novel phase-error minimization loss, which is derived from the physics of wave propagation and the statistics of ultrasound backscatter and is independent of any B-mode training samples. Unlike [1], this loss does not require a ground truth for comparison and instead measures the deviation from a physics model. We view the overall goal and optimization strategy of the proposed DBUA as substantially different from [1], but will include a citation of [1] and remark on similarities and differences in the related work section of the camera-ready version.

Regarding the evaluation of CUTE: as mentioned by R3, the CUTE baseline does not produce accurate estimates in the first 5 mm of the imaging domain. For our baseline evaluation, we calculated the error over the whole imaging domain, which demonstrates a major advantage of DBUA, namely sound speed reconstruction without the 5 mm limitation. Additional evaluation against CUTE, excluding the first 5 mm, shows that DBUA still outperforms CUTE for all experimental scenarios. Specifically, without the first 5 mm, the MAE for the quadrant phantom is 66.3±37.5 m/s for CUTE (DBUA 36.8±28.4), and for the inclusion layer it is 16.0±14.1 for CUTE (DBUA 7.4±5.0). The complete results will be added to the supplementary material of our paper.
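The near-field exclusion described above amounts to masking rows of the sound speed map by depth before computing the error. A minimal sketch follows, assuming the ± values are standard deviations of the absolute error. The function name, grid, and sound speed values are toy assumptions for illustration and are unrelated to the reported results.

```python
from statistics import mean, pstdev

def mae_stats(estimate, truth, depths_mm, min_depth_mm=0.0):
    """Mean and standard deviation of |estimate - truth| over rows with depth >= min_depth_mm."""
    errors = [
        abs(e - t)
        for est_row, true_row, depth in zip(estimate, truth, depths_mm)
        if depth >= min_depth_mm
        for e, t in zip(est_row, true_row)
    ]
    return mean(errors), pstdev(errors)

# Toy sound speed maps [m/s]: one row per depth, two lateral positions per row.
depths_mm = [0.0, 2.5, 5.0, 7.5, 10.0]
truth = [[1540.0, 1540.0] for _ in depths_mm]
estimate = [
    [1600.0, 1600.0],  # large near-field error
    [1600.0, 1600.0],
    [1550.0, 1550.0],
    [1530.0, 1530.0],
    [1550.0, 1530.0],
]

mae_full, std_full = mae_stats(estimate, truth, depths_mm)
mae_deep, std_deep = mae_stats(estimate, truth, depths_mm, min_depth_mm=5.0)
print(mae_full, std_full, mae_deep, std_deep)
```

Reporting both numbers side by side separates errors that stem from the near-field limitation from errors in the rest of the imaging domain.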

Regarding the run time required for a single image (R3), our GPU-based implementation runs in ~300 seconds on an NVIDIA RTX A6000. This information will be added to the manuscript.

To ensure the reproducibility of the work, as was highlighted by MR1, R1, and R3, we will make our code, simulated data, and all parameters (e.g. iterations, learning rate) publicly available upon acceptance.

All typographical errors and notation clarifications will be addressed in the camera-ready version.

We finally thank all reviewers for their ideas on future evaluations of DBUA (R2: metrics, R3: tissue-mimicking phantoms), and for acknowledging our approach as “novel” (R2), a “meaningful contribution” (R1) and “worth presenting at the conference” (R3).

[1] S. Vedula, O. Senouf, G. Zurakhov, A. Bronstein, O. Michailovich, and M. Zibulevsky, ‘Learning beamforming in ultrasound imaging’, in Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning, 2019

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have addressed all the comments, especially the one about similarity to Vedula et al. (MIDL 2019). I therefore recommend accepting the paper.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a physics-based differentiable US beamforming model that can be used to estimate speed of sound in different visualized tissues and correct B-scan reconstructions for aberrations. The method is quantitatively tested on several phantoms, and qualitatively tested on an in-vivo scan.

Strengths: All reviewers acknowledge that the paper is clear and easy to follow, and that the results are convincing. Key reviewer concerns were well addressed in the rebuttal.

Weaknesses: A few technical details of the methodology could have been explained in more detail. Hopefully a code release can attenuate this issue.

Overall I believe this paper presents a clear contribution with solid results and I recommend acceptance.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have provided substantial and valid explanations for the novelty of their work compared with that of Vedula et al., 2019. Overall, this work is of good quality and suitable for acceptance at MICCAI.
