Authors

Ziyang Zheng, Jiewen Yang, Xinpeng Ding, Xiaowei Xu, Xiaomeng Li

Abstract

Cardiac structure segmentation from echocardiogram videos plays a crucial role in diagnosing heart disease. Combining multi-view echocardiogram data is essential to enhance the accuracy and robustness of automated methods. However, due to the visual disparity of the data, deriving cross-view context information remains a challenging task, and unsophisticated fusion strategies can even lower performance. In this study, we propose a novel global-local fusion (GL-Fusion) network that uses both global-based and local-based multi-view information to improve the accuracy of echocardiogram analysis. Specifically, a multi-view global-based fusion module (MGFM) is proposed to mine global context information and to constrain feature pairs in the same phase of different cardiac cycles. Additionally, a multi-view local-based fusion module (MLFM) is designed to extract correlations between local features of cardiac structures in different views. Furthermore, we collect a multi-view echocardiogram video dataset (MvEVD) to evaluate our method. Our method achieves an average Dice score of 81.57%, a 7.11% improvement over the baseline method, and outperforms other existing state-of-the-art methods. To our knowledge, this is the first exploration of a multi-view method for echocardiogram video segmentation.
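The abstract states that MGFM constrains feature pairs in the same phase of different cardiac cycles. As a rough illustration only, the Python sketch below penalizes the mean squared distance between per-frame feature vectors that lie one cardiac cycle apart. It assumes a fixed, known cycle length `period` (in frames); `period`, `phase_consistency_loss`, and `squared_distance` are hypothetical names, and the paper's actual cycle loss may be formulated differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phase_consistency_loss(frame_feats, period):
    """Mean squared distance between the features of frames t and t + period.

    frame_feats: list of T per-frame feature vectors from one view.
    period: assumed cardiac-cycle length in frames (hypothetical parameter;
        real cycle lengths vary and would need to be estimated).
    """
    if period < 1:
        raise ValueError("period must be a positive number of frames")
    # Pair every frame with the frame that should be in the same phase
    # of the next cycle under the fixed-period assumption.
    pairs = [
        (frame_feats[t], frame_feats[t + period])
        for t in range(len(frame_feats) - period)
    ]
    if not pairs:
        # The clip is shorter than one cycle, so no same-phase pairs exist.
        return 0.0
    return sum(squared_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, features that repeat exactly every four frames give a loss of 0.0 with `period=4`. Because real echocardiogram cycles vary in length, a practical version would need to estimate phase correspondences rather than assume a constant period.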

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_8

SharedIt: https://rdcu.be/dnwCI

Link to the code repository

https://github.com/xmed-lab/GL-Fusion

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a novel method for segmentation of cardiac structures by fusing information across multiple views. They demonstrate significant performance improvements over baseline methods in experiments on a single-institution dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The motivation of this work and the network design are good: appropriately fusing information from multiple views to improve performance is potentially very useful, and it is not obvious how best to achieve it. The performance gains over single-view models are significant.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The major weakness of this paper is that the methods are very challenging to understand. The description of the methods is vague and mathematically imprecise. For example, the entire explanation of the multi-view global fusion module is this: “we here introduce the view-wise attention module to aggregate the cross-view information (see Figure 2)”. It is impossible to understand from this what the authors have actually done. The diagrams help somewhat but are still imprecise: what are the shapes and dimensions of the tensors involved? What is meant by the “circle containing a cross” operation? When sigmoids are used, is this the softmax function or the sigmoid? If softmax, over which dimension(s)? What is the network architecture of the encoder and decoders? I do not believe I could come close to reproducing these experiments based on the explanation in the paper. The explanations would also benefit from editing by a native English speaker.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have made their code and some other information available. The data is not available, limiting reproducibility. As mentioned above, the level of detail in the paper falls far short of what would be needed to reproduce the experiments if the code were not provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

In 2.1, the videos are inconsistently denoted X and V.

Is the number of frames T the same for all views? I would assume not but as written it is implied that they are the same.

The effects of SGCM and TCC are reported in Table 2a, but these acronyms are never defined in the manuscript and I have no idea what they refer to. There are a number of language errors in the manuscript, for example “spare” instead of “sparse”, “the cardiac” instead of “the heart”, and “argumentation” instead of “augmentation”.

Percentage performance improvements should be reported.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is too vaguely described to be understandable. If this could be rectified, this could be a strong paper given the good motivation and the excellent results.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

In this paper, the authors introduce a novel GL-Fusion network that combines both global-based and local-based multi-view information for more accurate echocardiogram analysis. To achieve this, the authors propose a multi-view global-based fusion module (MGFM) that captures global context information and constrains feature pairs within the same phase of different cardiac cycles. Moreover, the authors design a multi-view local-based fusion module (MLFM) that extracts the correlations of local features of cardiac structures in different views. Additionally, the authors create a multi-view echocardiogram video dataset (MvEVD) to evaluate the performance of their proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This study presents a novel approach to multi-view echocardiogram video segmentation, which, to the best of the authors’ knowledge, is the first of its kind. The proposed method, GL-Fusion, incorporates a multi-view local-global fusion module that combines information from different views to improve the representation of each view. Additionally, a dense cycle loss is designed to enforce feature similarity based on temporal cyclicality, utilizing unlabelled data.
2. Extensive experiments are conducted, and the results demonstrate that the proposed method outperforms existing methods, with an average Dice score of 0.81.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The English writing needs improvement, and the typos should be corrected.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

good reproducibility
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The English writing needs improvement, and the typos should be corrected.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. This study presents a novel approach to multi-view echocardiogram video segmentation, which, to the best of the authors’ knowledge, is the first of its kind. The proposed method, GL-Fusion, incorporates a multi-view local-global fusion module that combines information from different views to improve the representation of each view. Additionally, a dense cycle loss is designed to enforce feature similarity based on temporal cyclicality, utilizing unlabelled data.
2. Extensive experiments are conducted, and the results demonstrate that the proposed method outperforms existing methods, with an average Dice score of 0.81.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper proposed the GL-fusion for Multi-view Echocardiogram Video Segmentation via extracting and fusing the global information and local information of the data. Besides, a multi-view echocardiogram video dataset called MvEVD is collected to validate the performance of GL-fusion. Experiments demonstrate the superiority of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

As claimed in the paper, this could be the first study to examine multi-view echocardiogram video segmentation.

Local and global information of multi-view data are taken into account simultaneously.

This paper analyzes the challenges of applying existing methods to echocardiogram video segmentation, which makes the motivation of this work persuasive.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Some existing multi-view segmentation methods should be included for comparison.

The illustration of the methodology could be clearer and more detailed.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

High reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Could the collected dataset “MvEVD” be made publicly accessible? If so, it would benefit the community. The introduction of the methodology could also be made clearer.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is specifically designed based on the characteristics of data.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

For echo imaging analysis, fusing information from multiple views is a sound plan, and the experimental results validate the authors’ claims. However, the methodology needs more clarification and detail.

Author Feedback

We thank the area chair and reviewers for their constructive and insightful comments on our manuscript. Overall, all three reviewers agreed that multi-view information fusion for echocardiogram videos is a novel approach with good motivation, and R1 and R2 note that our fusion method outperforms existing methods and the single-view baseline. R2 and R3 also emphasize our contribution of a new dataset, “MvEVD”, which enables new research on multi-view echocardiogram video segmentation, and they consider GL-Fusion to be highly reproducible, which makes this paper more reliable. Thank you for your advice and careful review. We are now revising the paper to make it clearer and more detailed, and all errors will be corrected in the final version.

->[R1, R2, R3] A general explanation of our method: In the “MvEVD” dataset, each patient sample comprises echocardiogram videos from three views. The same number of frames T is sampled from each view, and the frames are passed through a “DeepLabV3” encoder to extract features. We then process the resulting feature maps from the multiple views with novel multi-view global-based and local-based fusion modules. The global-based fusion module concatenates the multi-view feature maps and applies a shared-weight self-attention mechanism to fuse their features. Furthermore, we use a cycle loss that exploits the characteristics of the heartbeat cycle to train the model jointly on the unlabeled frames. In the local-based fusion module, we obtain local feature masks using the “DeepLabV3” decoder and a center block. These masks highlight higher-intensity features closer to the object center while discarding background information farther from the center; this selection is based on the understanding that morphological information should remain consistent near the center. The global-based fusion module, on the other hand, also takes background information into account and is therefore more complex and lacks structural connections. To fuse the multi-view local features, we use a self-attention module. Finally, both the global and local features of annotated frames are supervised with a binary cross-entropy loss against the ground-truth annotations.
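The rebuttal describes the global-based fusion module as concatenating the multi-view feature maps and applying shared-weight self-attention, and the local-based module as fusing local features with self-attention. A minimal plain-Python sketch of that idea follows. It assumes each view's feature map has already been flattened into a list of feature vectors (tokens) and uses single-head scaled dot-product attention; the function names, the single head, and the absence of positional or view embeddings are assumptions, not details confirmed by the paper or its code repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    q = [linear(Wq, t) for t in tokens]
    k = [linear(Wk, t) for t in tokens]
    v = [linear(Wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Attention weights of this query over every token (all views).
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def global_fusion(view_tokens, Wq, Wk, Wv):
    """Concatenate the tokens of all views, attend jointly, split back per view.

    view_tokens: one list of feature vectors per view (hypothetically, the
    flattened spatial positions of that view's encoder feature map).
    The same Wq, Wk, Wv are applied to every view (shared weights).
    """
    tokens = [t for view in view_tokens for t in view]
    fused = self_attention(tokens, Wq, Wk, Wv)
    result, start = [], 0
    for view in view_tokens:
        result.append(fused[start:start + len(view)])
        start += len(view)
    return result
```

Because one set of projection matrices (`Wq`, `Wk`, `Wv`) is shared by all views, every token can attend to tokens from any view, and the output is split back so that each view keeps its own fused features.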

->[R3] Response regarding the “MvEVD” dataset: We plan to make this dataset public. We are currently applying for a permission license from the data-collection agencies and for ethical approval from the medical centers. All details will be added to the final version.

->[R1] Symbols and inconsistent notation: We apologize for omitting legends from Figure 2. The “circle containing a cross” operation is element-wise (Hadamard) multiplication, and the “circle containing a +” is element-wise addition. σ denotes the sigmoid function. We will add the shapes and dimensions of the tensors and the legends to Figure 2, and we will correct the inconsistent notation.

->[R2] Typing errors of MGFM and MLFM: Thank you for pointing out this typing error; the ablation entries “SCGM” and “TCC” should read “MGFM” and “MLFM”.

->[R3] Comparison with existing methods: We have added an existing multi-view segmentation method to the comparison; other methods could not be included because their code has not been released, so they cannot be reproduced. TransFusion [1] achieves an overall Dice score of 72.78%, while ours achieves 81.57%. Per view, TransFusion reaches 78.79%, 80.23%, and 59.31%, while ours reaches 82.61%, 83.55%, and 78.59% on PLVLA, LVSA, and A4C, respectively.

Method            PLVLA    LVSA     A4C      Avg Dice
TransFusion [1]   78.79%   80.23%   59.31%   72.78%
Ours              82.61%   83.55%   78.59%   81.57%

[1] Liu, D., et al.: TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers. In: MICCAI 2022.
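According to the rebuttal, Figure 2 uses σ for the sigmoid, a circled cross for element-wise multiplication, and a circled plus for element-wise addition. The sketch below shows one common way these three operators combine: a sigmoid gate that scales one feature map before it is added to another. The composition in `gated_residual` (and all function names) is a hypothetical example; the rebuttal defines only the individual operators, not how Figure 2 wires them together.

```python
import math


def sigmoid(z):
    """Logistic sigmoid (the σ in Figure 2), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def elementwise_mul(a, b):
    """Circled-cross operator: element-wise product of two equal-shape 2-D maps."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def elementwise_add(a, b):
    """Circled-plus operator: element-wise sum of two equal-shape 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gated_residual(feature, other, gate_logits):
    """Compute feature ⊕ (σ(gate_logits) ⊗ other) for equal-shape 2-D maps.

    The gate lies in (0, 1) and scales `other` before it is added back.
    This particular composition is a hypothetical example.
    """
    gate = [[sigmoid(z) for z in row] for row in gate_logits]
    return elementwise_add(feature, elementwise_mul(gate, other))
```

With all-zero gate logits, σ gives 0.5 everywhere, so `gated_residual([[1.0, 2.0]], [[4.0, 4.0]], [[0.0, 0.0]])` returns `[[3.0, 4.0]]`; strongly negative logits close the gate and leave `feature` essentially unchanged.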

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The issues with the method description and figures have been clarified in the authors’ feedback.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The topic of the paper is interesting, and the authors have adequately addressed the main comments stemming from the reviews. Such a revised version should be an interesting paper to be published in the main proceedings.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This work addresses multi-view echocardiogram video segmentation with a proposed global-local fusion network. The major concern is that the methodology description is not very clear and the writing of the manuscript is poor. Nevertheless, I think the clinical application of this work, which is a very challenging task in clinical practice, is quite interesting.


GL-Fusion: Global-Local Fusion Network for Multi-view Echocardiogram Video Segmentation