Authors

Mingyuan Luo, Xin Yang, Hongzhang Wang, Liwei Du, Dong Ni

Abstract

Freehand 3D ultrasound (US) has important clinical value due to its low cost and unrestricted field of view. Recently, deep learning algorithms have removed its dependence on bulky and expensive external positioning devices. However, improving reconstruction accuracy is still hampered by difficult elevational displacement estimation and large cumulative drift. In this context, we propose a novel deep motion network (MoNet) that integrates images and a lightweight sensor known as the inertial measurement unit (IMU) from a velocity perspective to alleviate the obstacles mentioned above. Our contribution is two-fold. First, we introduce IMU acceleration for the first time to estimate elevational displacements outside the plane. We propose a temporal and multi-branch structure to mine the valuable information of low signal-to-noise ratio (SNR) acceleration. Second, we propose a multi-modal online self-supervised strategy that leverages IMU information as weak labels for adaptive optimization to reduce drift errors and further ameliorate the impacts of acceleration noise. Experiments show that our proposed method achieves superior reconstruction performance, exceeding state-of-the-art methods across the board.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_28

SharedIt: https://rdcu.be/cVRvT

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a new deep learning model, MoNet, for integrating US images. They use IMU data to train MoNet.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Clear usage of IMU for the first time in this literature
- MoNet for reconstruction of image sequences
- Combining ResNet and LSTM for using temporal information in 3D reconstruction
- Use of 6DOF for reconstruction
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The movement records from the IMU look very noisy; a better IMU sensor and possibly a Kalman filter could be used to address this. The scale of location and position is in meters, which does not make sense in this study.
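
To illustrate the Kalman filter suggestion above, here is a minimal sketch of a scalar Kalman filter applied to one noisy acceleration axis. It assumes a random-walk state model, and the names `kalman_smooth`, `process_var` and `meas_var` as well as their values are hypothetical choices for illustration. It is not the authors' pipeline, and a real IMU would likely need a multi-axis model with bias terms.

```python
import random


def kalman_smooth(measurements, process_var=1e-3, meas_var=0.25):
    """Scalar Kalman filter with a random-walk state model.

    process_var and meas_var are illustrative tuning values, not values
    taken from the paper or from any specific IMU datasheet.
    """
    estimate = measurements[0]  # initial state guess: first sample
    error_var = 1.0             # initial uncertainty of that guess
    filtered = []
    for z in measurements:
        # Predict: the state is assumed to stay put, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


if __name__ == "__main__":
    random.seed(0)
    true_value = 0.2  # synthetic constant acceleration
    noisy = [true_value + random.gauss(0.0, 0.5) for _ in range(200)]
    smooth = kalman_smooth(noisy)

    def mse(xs):
        return sum((x - true_value) ** 2 for x in xs) / len(xs)

    print("raw MSE:", mse(noisy), "filtered MSE:", mse(smooth))
```

A larger `process_var` makes the filter follow fast probe motion more closely at the cost of passing more noise through, so both values would have to be tuned on real recordings.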
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- The supplementary data show that the authors built a GUI for their work. I suggest publishing that GUI together with the model for the research community.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- The authors claim that the method performs much better than others, and this is clear from the quantitative and qualitative data. However, the explanation of why the method works better than others is weak.
- I suggest that the authors publish their GUI and their results for other researchers.
- Details of the IMU and of the network implementation should be added to the paper.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

8
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I really enjoyed reading your research report. It combines sensors (IMU) with deep learning, which is a promising solution for 3D reconstruction from 2D images. I am not a fan of estimating 6DOF using machine learning and statistical methods. The authors address exactly this issue, especially given that they used an EM sensor in the comparison section. The experimental results and the video in the supplementary data clearly show the authors’ effort to improve 3D reconstruction.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

Conventional freehand ultrasound is turned into a tomographic imaging modality by combining IMU sensors with a novel deep motion network (denoted MoNet by the authors) for almost seamless 3D reconstruction. Thus, no external devices are needed anymore. Both key contributions of this paper concern the IMU: on the one hand, obtaining the elevational displacement of the planes, and on the other, reducing the sensor drift. The work thus addresses the current main challenges of freehand ultrasound reconstruction, namely elevational displacement estimation and cumulative drift.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The presentation of the methodology is very precise, sound and clear. The state of the art and the problems of approaches from related work are precisely delineated. The paper therefore seems to address the need, or missing link, left by existing solutions very well. The visualizations (cf. Fig. 5) are very nice, useful and impressive.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The calibration process is mentioned but not explained. Similarly, it is unclear how the ultrasound and IMU sensors are synchronized at the highest precision. The evaluation compares SOTA methods and the proposed MoNet approach on two in-house datasets. It would be nice to also test in depth on other, external benchmarking datasets, in the sense of an away game, if possible given the input data requirements.
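
The synchronization question can be made concrete with a small sketch. One generic approach, which is an assumption here and not something the paper or the reviews describe, is to resample the IMU stream at the image timestamps by linear interpolation. The function name `interpolate_imu` and the sample values are hypothetical.

```python
from bisect import bisect_left


def interpolate_imu(imu_times, imu_values, frame_times):
    """Linearly interpolate scalar IMU samples at each frame timestamp.

    imu_times must be sorted in ascending order. Frame times outside the
    IMU range are clamped to the nearest available sample.
    """
    resampled = []
    for t in frame_times:
        j = bisect_left(imu_times, t)
        if j == 0:
            resampled.append(imu_values[0])
        elif j == len(imu_times):
            resampled.append(imu_values[-1])
        else:
            t0, t1 = imu_times[j - 1], imu_times[j]
            w = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
            resampled.append((1 - w) * imu_values[j - 1] + w * imu_values[j])
    return resampled


if __name__ == "__main__":
    imu_t = [0.00, 0.01, 0.02, 0.03]   # hypothetical 100 Hz IMU clock
    imu_a = [0.0, 0.2, 0.1, -0.1]      # one acceleration axis
    frame_t = [0.005, 0.025]           # hypothetical image timestamps
    print(interpolate_imu(imu_t, imu_a, frame_t))
```

This only handles sampling on a shared clock. Any constant latency between the two devices would still have to be estimated separately, which is the detail the review asks the authors to report.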
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Hard to judge without the two utilized datasets and the IMU-enriched ultrasound device.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Fig. 3 is hard to interpret due to noisy sensors. Maybe an additional trend line or a visualization of the local variability can help the reader understand how much the pose gets stabilized across neighboring frames. The statement “All images were scaled down to 0.6 times their original size” needs some clarification: why? Because of the input tensor size of the network?
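
As a sketch of the trend line and local variability suggested above, a centered moving mean and moving standard deviation can be computed per sample and drawn on top of the raw signal. The function name `rolling_stats` and the window size are hypothetical choices, and the signal below is made up for illustration.

```python
from statistics import mean, pstdev


def rolling_stats(signal, window=5):
    """Return (trend, spread): centered moving mean and std per sample.

    The window is truncated at the sequence boundaries, so the first and
    last few values are computed from fewer samples.
    """
    half = window // 2
    trend, spread = [], []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        trend.append(mean(chunk))
        spread.append(pstdev(chunk))
    return trend, spread


if __name__ == "__main__":
    noisy_axis = [0.1, -0.3, 0.4, 0.0, 0.2, -0.1, 0.5, 0.3]
    trend, spread = rolling_stats(noisy_axis, window=3)
    for t, s in zip(trend, spread):
        print(round(t, 3), round(s, 3))
```

Plotting `trend` as a line with a band of `trend ± spread` around it would show at a glance where the estimate is stable across neighboring frames.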

Which PyTorch version is used?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It is very sound, the method is nice and the results are very impressive. Furthermore, relevance for clinical use is high.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper describes a method to reconstruct 3D ultrasound volumes from 2D video clips and IMU signals with a deep learning-based approach. In particular, two ideas related to the usage of the IMU information are investigated: (1) the network uses not only the orientation but also the acceleration signal from the IMU (which is a challenge since it is very noisy); (2) the prediction is refined online by trying to match the IMU signal. The method is evaluated on two datasets (carotid and arm) and yields better estimates than three baselines.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper addresses a relevant and challenging problem, namely the use of IMU acceleration data for 3D ultrasound reconstruction, which has not been done before.
- The method outperforms several baselines, and an ablation study is performed to investigate the effect of the two contributions.
- The references and figures are adequate and generally well-made.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- I am a bit skeptical about the justification of some parts of the method.
- The chosen notations make the paper a bit harder to understand (more details in Section 8).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Training/evaluation code, as well as pretrained models and data, are specified as “available” but no link has been provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
The main focus of the paper is the usage of the acceleration signal of the IMU, so I will first focus my main comments on this part. While the experiments do show that this is effective, I am a bit skeptical about the justification of some parts of the method.
- I am not sure I understand Eq. 2, in particular why removing the average signal would reduce the effect of the noise. Do you assume a noise model with a non-zero average? Also, what is the point of subtracting the gravity vector, since the output vector is enforced to have zero mean? It seems to me that the equation could be simplified into: A_i <- A_i - 1/N \sum_n A_n.
- While I do understand the inspiration behind Equation 3 (v_i = v_{i-1} + a_{i-1}), I am not sure it really makes sense for feature maps. Of course the network can be trained with this model, but I am wondering whether this really brings something compared to just concatenating them.
- About the self-supervised step:
  - are some layers frozen or are they all adapted?
  - why doesn’t the network prediction collapse to the raw IMU data and completely ignore the original image content?
  - It must be noted that this step requires more computational power and is therefore not as easy to deploy (compared to just running an inference).
- It would be interesting to study quantitatively how well the acceleration is estimated (for instance report correlations between accelerations). Unlike what is stated in the paper, the bottom row of Figure 3 does not seem very convincing to me: I don’t see any particular trend between the blue and red plots.
- One key drawback of most current methods is the difficulty to detect “turning points” (having the user go back and forth to extend the covered area for instance), due to the implicit ambiguity of the direction of the probe movement. Since this method uses IMU acceleration, it raises some hope that such points could be detected and estimated, yet this is not discussed at all.
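
The remark on Eq. 2 can be checked numerically. The sketch below assumes that the gravity term is the same constant vector for every sample, which is how the comment reads it and is not confirmed by the paper. Under that assumption, subtracting gravity and then the mean gives the same result as subtracting the mean alone. The function names and sample values are hypothetical.

```python
import math


def center(samples):
    """Subtract the per-axis mean from a list of 3-axis samples."""
    n = len(samples)
    means = [sum(s[k] for s in samples) / n for k in range(3)]
    return [[s[k] - means[k] for k in range(3)] for s in samples]


def remove_gravity_then_center(samples, gravity):
    """Subtract a constant gravity vector, then the per-axis mean."""
    shifted = [[s[k] - gravity[k] for k in range(3)] for s in samples]
    return center(shifted)


if __name__ == "__main__":
    # Made-up accelerometer readings in m/s^2, gravity along the z axis.
    accel = [[0.1, -0.2, 9.9], [0.3, 0.0, 9.7], [-0.1, 0.2, 9.8]]
    a = center(accel)
    b = remove_gravity_then_center(accel, [0.0, 0.0, 9.81])
    same = all(
        math.isclose(p, q, abs_tol=1e-9)
        for row_a, row_b in zip(a, b)
        for p, q in zip(row_a, row_b)
    )
    print("identical up to rounding:", same)
```

If gravity were instead rotated into the sensor frame separately for each sample, the subtracted vector would no longer be constant and the two versions would differ, so the answer depends on a detail the authors would need to clarify.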
General remarks
- The notations are not very standard and make some equations a bit confusing:
  - angles are represented by E (instead of Greek letters like phi, theta, etc.) or sometimes with alpha,
  - inverse transformations with a * (instead of ^{-1}),
  - accelerations sometimes with A, sometimes with a.
- The results are reported in a Table with only the mean and standard deviation, so no information on the distribution is available: we don’t know whether one method is more robust or more subject to outliers than another one. Replacing it with boxplots or violin plots would give more information to the reader.
- Future work is not discussed at all in the conclusion. I think it would be interesting to see how the self-supervised step is able to help the generalization of the network (to new US systems, sonographers, motion speed, anatomies). Besides, being able to detect turning points would be a big deal and would give much more impact to this approach.
- In my experience, the performance of IMUs differs a lot across models. I think it would be important to report the brand and model used for the study.
- The authors used a statistical test to prove the significance of their results, which is commendable. However, the t-test that they used assumes the results to follow a Gaussian distribution, which is neither guaranteed nor discussed. A better alternative would have been a Wilcoxon signed rank test, which does not assume a particular distribution.
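
For readers unfamiliar with the test recommended in the last remark, the following is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed-rank test on paired errors. The function name and the numbers are hypothetical and do not come from the paper. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied magnitudes share average ranks.
    All sign patterns are enumerated, so keep the number of pairs small
    (roughly below 20). Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


if __name__ == "__main__":
    # Made-up per-scan drift errors for a baseline and a proposed method.
    baseline = [12.1, 9.8, 15.3, 11.0, 13.4, 10.2]
    proposed = [10.5, 9.9, 12.7, 9.1, 11.8, 9.0]
    print(wilcoxon_signed_rank(baseline, proposed))
```

Because the test only uses the ranks of the paired differences, a single outlier scan has limited influence on the result, which is the robustness argument made in the review.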
Minor comments
- The input of the network is not clear - are the pairs of images encoded as 2 channels?
- Was the ResNet pre-trained (for instance on ImageNet)?
- “Based on ResNet, LSTM” -> LSTM modules are not necessarily based on ResNets
- It would be interesting to know whether the authors came up with this architecture after a lot of trials (i.e. this is the best position for those LSTM modules), or whether they just added them here because it makes sense and provided better results right away.
- What are “loop scans” exactly? I don’t see any such motion in the 6 examples shown. Could you add one?
- Figure 3: the blue line hides the red one and makes the figure difficult to read. Consider adding some transparency (for instance alpha=0.75 in Matplotlib).
- It seems a bit surprising to only use a batch size of 1. Isn’t it a problem for the LSTM modules?
- Figure 5 is nice, but it would be even better with a legend to distinguish between the different colors (more visual than reading the colors in the caption)
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper deals with a niche but active area of research. Using the acceleration from an IMU had never been proven to be useful so far, so this paper does contribute to the state of the art. As I mentioned in the other subsections, the paper would benefit from polishing and from rewriting some parts, but overall I recommend acceptance.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The authors propose a method to integrate IMU signals into freehand ultrasound reconstruction, and show that they achieve state-of-the-art performance. All reviewers have praised the quality of the paper and raised no major concerns. The authors could improve the paper further by looking into the following:
1. Give further details of the IMU make/model, and any other relevant experimental / implementation details missed.
2. Improve the reasoning/justification behind the model design.
3. Comment on the noisy nature of IMU signals and how this is addressed.
4. Review and check mathematical notation as indicated by reviewer #3.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Author Feedback

N/A

Deep Motion Network for Freehand 3D Ultrasound Reconstruction