Authors

Jiamin Liang, Xin Yang, Yuhao Huang, Kai Liu, Xinrui Zhou, Xindi Hu, Zehui Lin, Huanjia Luo, Yuanji Zhang, Yi Xiong, Dong Ni

Abstract

Ultrasound (US) is widely used due to its advantages of real-time imaging, freedom from radiation and portability. In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information. This is challenging for novices to learn because collecting adequate videos is clinically impractical. In this paper, we propose a novel framework to synthesize high-fidelity US videos. Specifically, the synthesized videos are generated by animating source content images based on the motion of given driving videos. Our highlights are three-fold. First, leveraging the advantages of self- and fully-supervised learning, our proposed system is trained in a weakly-supervised manner for keypoint detection. These keypoints then provide vital information for handling complex, highly dynamic motions in US videos. Second, we decouple content and texture learning using dual decoders to effectively reduce the model learning difficulty. Last, we adopt an adversarial training strategy with GAN losses to further improve the sharpness of the generated videos, narrowing the gap between real and synthesized videos. We validate our method on a large in-house pelvic dataset with highly dynamic motion. Extensive evaluation metrics and a user study demonstrate the effectiveness of our proposed method.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_30

SharedIt: https://rdcu.be/cVRvV

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a framework to generate videos from static ultrasound content images. The work builds on self-supervised and fully-supervised learning approaches. First, a keypoint detector network is trained in a weakly-supervised manner; then a dense motion network is used to learn the occlusion map and heatmap; finally, an encoder with two outputs (content and texture) is designed to generate the final prediction. Quantitative and qualitative results are demonstrated and compared to other works.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Generating video from static images for US procedures that can be used for training junior medics. Considering the effect of deformation and occlusion in the design and using a discriminator to increase the quality of the final video.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- How are the occlusion and deformation ground truths obtained for this work?
- Detecting sparse features as keypoints within US should be a challenging task. How robust is the proposed module to noise and outliers?
- What level of similarity is required between the source and driving images for this framework to avoid failure?
- Why has a single module, such as optical flow, not been used instead of the keypoint and dense motion modules?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

According to the authors, the code and data will be made available; in any case, the work should not be challenging to replicate.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- Quantitative results for the keypoint detector are essential to provide a better understanding of its effect on the final results.
- More clarification about VGG loss should be given; “For texture part, due to unavailability of its ground truth, we adopt feature reconstruction VGG loss [5] to constrain the similarity of driving frame and the final prediction with texture information.” It seems from eq 5 that this loss is directly applied to derived and source images with the hope that it can improve the texture output.
- Contrastive loss could also be explored in future work.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes an interesting framework for US imaging that can be used in other domains as well.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Review #2

Please describe the contribution of the paper

In this paper, the authors proposed a weakly-supervised learning-based framework to synthesize high-fidelity ultrasound videos, by animating the static source images based on the motion of the driving videos.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is clear and well-organized. The embedded demo is helpful for understanding the task and observing the results.

Unsupervised image animation is currently a popular topic in the general CV domain, but it is indeed less explored in the medical domain due to practical issues such as noise and varying sizes, as mentioned in the paper. Accordingly, the authors resort to a weakly-supervised approach for motion transfer, which is more suitable for more complicated medical problems and could be generalizable to other tasks.

The proposed framework has a two-branch architecture to learn content and texture separately. I think it is an interesting idea and could have the potential to help better understand the motion transfer process.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It is not quite fair to compare the proposed model with FOMM, given that FOMM learns features in a totally unsupervised manner. It would be more convincing if the authors conducted more extensive ablations examining the contribution of each proposed component in the network.

The novelty of this paper is somewhat limited theoretically, given that FOMM has already been well explored in the CV domain.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The author checked all reproducibility questions. I think it is easy to implement the code given the description provided in this paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

There is no explicit definition for the model variants: ours-P, ours-PT, ours-PTG.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The practical use of the proposed weakly-supervised approach in US synthesis could be impactful. The paper is well-written and the proposed model has achieved clear improvements over SOTAs.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

I agree with the other reviewers' concerns about the novelty of this paper, yet I tend to accept it given the improvements it gains and its practical importance. Overall, I personally think this paper’s quality is fair enough for the MICCAI community.

Review #4

Please describe the contribution of the paper

In this paper, a conditional ultrasound video generation model is proposed. In the training phase, the keypoint and motion module takes the source and driving images as input and predicts an occlusion map and optical flow. The optical flow and occlusion map are then used for frame generation, supervised by GAN and reconstruction losses.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Introduces animation generation techniques to medical image analysis. The major framework comes from existing methods, but this is a good application.
2. Achieves good video quality. The video quality is measured by quantitative and qualitative evaluation and a user study.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Novelty seems to be limited. The major frameworks in Fig. 2 and Fig. 3 come from [11], including the self-supervised keypoint detection, the optical flow and occlusion prediction, and the occlusion-aware conditional generation. The novel parts are (a) introducing the manual labels and (b) introducing the GAN loss and VGG loss.
2. Lack of ablation study. There are limited experiments showing that the newly added components work. For example, compared with [11], it is unknown whether the manual labels and GAN loss help to improve the final results.
3. Some acronyms are not explained. In Table 1, it is not clear what P, PT, and PTG mean.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Data and code will be available, according to the response.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

(1) Add more explanations to make the paper self-contained. (2) Consider more ablation studies in future work.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The major concern lies in the novelty.
Number of papers in your stack

8
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a generative model that can animate a static ultrasound source image based on the motion of a driving ultrasound video. The reviewers agree that the method and motivation are interesting but have found some substantial problems regarding a) quantitative results for the keypoint detector, b) fairness of the comparisons, c) ablation studies, and d) limited methodological novelty in comparison to FOMM. I would ask the authors to comment on these issues in a rebuttal and to better justify some of their claims.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

8

Author Feedback

We thank all the reviewers (R) for reviewing and recognizing our work. Our novelty is clarified thoroughly. The required information has been provided. Full code has been released and the writing will be improved.

Q1: Novelty. (MetaR2, R2, R4) A1: Our work has remarkable novelty in methodology, results and generality, being the first investigation of the challenging task of ultrasound (US) animation. Our work is significantly different from FOMM. FOMM pioneered the animation of static images in the CV field, but suffers from two main drawbacks in US animation: (1) The keypoint detection in FOMM is noisy due to the intensive speckle noise, content ambiguity and complex motion in US. (2) The single decoder in FOMM severely blends the low- and high-frequency (LF and HF) features. The synthesis results of FOMM on US tend to be blurry due to the loss of HF information. The problem accumulates and worsens as the animation proceeds. Our novel contributions address these problems: (1) We adopt a simple but effective weakly-supervised method by introducing a few supervised keypoints. This design ensures the correct detection of vital areas throughout the motion and further stabilizes the neighboring synthesis. It contributes a 12.5% improvement (Tab. 1) over FOMM on the FVD metric. (2) We propose a dual-decoder structure (Fig. 3) to decouple the LF content from the HF texture and better combat the blur. (3) We introduce frame-wise VGG and GAN losses to further improve the sharpness and reduce the blur accumulation across frames. We are generalizing our method to other tasks, such as neonatal hip joint US video animation.

Q2: Code. (R1, R2, R4) A2: Our full code repository, including code and testing videos, has been released via the anonymized link (https://github.com/miccai1283/miccai_1283). An online demo is also available: http://120.25.78.227:8080/UVS

Q3: Experiments. (R2, R4) A3: (1) We apologize for the missing explanations of “ours-P”, “ours-PT” and “ours-PTG”, which caused confusion about our ablation studies. These three acronyms denote our ablation variants, which gradually add keypoint supervision (‘-P’), the texture decoder (‘-T’) and the GAN loss (‘-G’) to the plain FOMM. (2) Experiments show that our proposed modules effectively improve the performance over FOMM (Tab. 1, Fig. 4). (3) We believe all the comparisons are fair since we carefully used the same settings, including the number of self-supervised keypoints, model backbone, learning rate, etc., for FOMM and our methods.
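
To make the naming scheme in A3 concrete, the following is a minimal sketch of how the variant names could map to cumulative component switches. The class `Variant`, the function `parse_variant` and the flag names are hypothetical and are not taken from the released code; only the meaning of the suffix letters P, T and G comes from the rebuttal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Which components are enabled on top of plain FOMM (hypothetical flags)."""
    keypoint_supervision: bool  # '-P' in the rebuttal
    texture_decoder: bool       # '-T' in the rebuttal
    gan_loss: bool              # '-G' in the rebuttal


# Suffix letters as explained in A3; the flag names are illustrative only.
SUFFIX_TO_FLAG = {
    "P": "keypoint_supervision",
    "T": "texture_decoder",
    "G": "gan_loss",
}


def parse_variant(name: str) -> Variant:
    """Map a name such as 'ours-PT' to its enabled components.

    Any name without the 'ours-' prefix (e.g. 'FOMM') enables nothing.
    """
    flags = dict.fromkeys(SUFFIX_TO_FLAG.values(), False)
    if name.lower().startswith("ours-"):
        for letter in name.split("-", 1)[1]:
            flags[SUFFIX_TO_FLAG[letter.upper()]] = True
    return Variant(**flags)


if __name__ == "__main__":
    for name in ("FOMM", "ours-P", "ours-PT", "ours-PTG"):
        print(name, parse_variant(name))
```

Read this way, each row of the ablation adds exactly one component to the previous row, so the difference between consecutive rows can be attributed to that component.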

Q4: Results of keypoint detector. (MetaR2, R1, R4) A4: Quantitative evaluation of the detected supervised keypoints is provided. The standard Percentage of Correct Keypoints (PCK) is 94.26% within 30 pixels, and the mean Euclidean distance (MED) error is 17.81±9.03 pixels. The results indicate robust prediction of the supervised keypoints.
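
For readers unfamiliar with the two metrics in A4, the sketch below shows one common way to compute them from predicted and annotated keypoints. It is an illustration under stated assumptions, not the authors' evaluation script: the function names are hypothetical, a keypoint is assumed to count as correct when its error is at most the threshold, and the spread after the ± is assumed to be a population standard deviation.

```python
import math


def keypoint_errors(pred, gt):
    """Euclidean distance (in pixels) between each predicted and annotated keypoint."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def pck(errors, threshold):
    """Percentage of Correct Keypoints: share of errors at or below the threshold.

    Assumption: 'within 30 pixels' is read as error <= 30.
    """
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)


def med(errors):
    """Mean Euclidean distance and its population standard deviation (assumed)."""
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Toy coordinates for illustration only, not data from the paper.
    pred = [(10, 10), (50, 40), (100, 100)]
    gt = [(13, 14), (50, 40), (100, 140)]
    errors = keypoint_errors(pred, gt)
    mean, std = med(errors)
    print(f"PCK@30 = {pck(errors, 30):.2f}%, MED = {mean:.2f} +/- {std:.2f} px")
```

Note that PCK depends strongly on the chosen threshold relative to the image resolution, which is why the threshold (30 pixels here) is reported alongside the percentage.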

Q5: Method details. (R1) A5: We apologize for the insufficient details due to limited space. (1) We have the supervised keypoint annotations but no ground truth for the occlusion and deformation maps (OM and DM), as these are difficult to obtain. Hence, we resort to weakly-supervised methods based on the intermediate OM and DM for synthesis. (2) Directly predicting optical flow is noisy in US. It also needs extra effort to describe and convey the needed motion cues from driving to source. We will add the comparison in the journal version. (3) To prevent failure, we suggest choosing driving and source data from the same modality to ensure consistent structure representation. They also should not vary too much in resolution or lack the necessary anatomical structures.

Q6: Method details. (R1, R2, R4) A6: We thank the reviewers for their suggestions. In our journal work, we will explore the contrastive loss together with the current reconstruction loss. This may help improve the synthesis quality. We will also apply our method to more modalities (like colonoscopy) and higher-dimensional data (like 4D cine).

Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper seems interesting and weaknesses are addressed in the rebuttal. R2 lowered their score but still votes for weak accept. Overall the paper leans towards a weak accept.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

NR

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have addressed the main concerns by the reviewers, and particularly that about novelty. Interestingly, the rebuttal process has not made reviewers lean more towards acceptance, but rather the opposite with one reviewer downgrading their score (although still in acceptance). Overall, the paper is interesting for the MICCAI community and this AC leans towards acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This manuscript presents a method to generate ultrasound videos using a GAN-based approach that takes as input a source image (for style) and a driving video (for motion) and uses two decoders for generating the motion and the texture/style information.

The goal is an interesting and relevant task for MICCAI. Some of the writing is unclear, such as the difference between Ours-P, Ours-PT, and Ours-PTG, but the rebuttal addressed this well and makes it clear that it is an ablation study of sorts (at least showing the serial additive benefit of each element). The main comparison is with FOMM, which is related to the presented approach, although the rebuttal clearly explains the differences and the authors demonstrate an added value over FOMM. Overall, most of the major criticisms were addressed by the rebuttal, and those not addressed are requests for a more detailed evaluation of the robustness of the model as well as a more detailed ablation study. However, I think that even without these additional experiments the method shows promise and has merit to be included in MICCAI.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

6

Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling