Authors

Sarina Thomas, Andrew Gilbert, Guy Ben-Yosef

Abstract

Accurate and consistent predictions of echocardiography parameters are important for cardiovascular diagnosis and treatment. In particular, segmentations of the left ventricle can be used to derive ventricular volume, ejection fraction (EF) and other relevant measurements. In this paper we propose a new automated method called EchoGraphs for predicting ejection fraction and segmenting the left ventricle by detecting anatomical keypoints. Models for direct coordinate regression based on Graph Convolutional Networks (GCNs) are used to detect the keypoints. GCNs can learn to represent the cardiac shape based on local appearance of each keypoint, as well as global spatial and temporal structures of all keypoints combined. We evaluate our EchoGraphs model on the EchoNet benchmark dataset. Compared to semantic segmentation, GCNs show accurate segmentation and improvements in robustness and inference run-time. EF is computed simultaneously to segmentations and our method also obtains state-of-the-art ejection fraction estimation. Source code is available online: https://github.com/guybenyosef/EchoGraphs
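For orientation, ejection fraction follows from the left-ventricular volumes at end-diastole (ED) and end-systole (ES). The sketch below uses the standard clinical definitions; the function names and the example volumes are illustrative assumptions and are not taken from the paper or its code.

```python
def stroke_volume(edv_ml: float, esv_ml: float) -> float:
    """Blood volume ejected per beat, in mL (EDV minus ESV)."""
    return edv_ml - esv_ml


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Share of the end-diastolic volume ejected per beat, in percent."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * stroke_volume(edv_ml, esv_ml) / edv_ml


# Hypothetical volumes in mL, chosen only for illustration.
edv, esv = 120.0, 50.0
print(stroke_volume(edv, esv))                # 70.0 mL
print(round(ejection_fraction(edv, esv), 1))  # 58.3 percent
```

Stroke volume is an absolute quantity in mL, whereas EF is a ratio, so two ventricles with the same stroke volume can have different EF values.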

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_37

SharedIt: https://rdcu.be/cVRv2

Link to the code repository

https://github.com/guybenyosef/EchoGraphs

Link to the dataset(s)

https://echonet.github.io/dynamic/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a graph convolutional network (GCN) for segmenting the myocardial border of the left ventricle in single-frame echo. They also propose a multi-frame model with a GCN branch for ED/ES segmentation and other branches for EF regression and ED/ES frame classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Application of GCN for efficient LV segmentation in echo is interesting and relevant.

    • A public dataset is used for the experiments and the source code will be released.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method fails to achieve improved results compared to existing methods (both in table 1 and table 2).

    • The paper lacks an ablation study for the multi-frame GCN model. The GCN branch is parallel to the EF regressor; the results of the EF regressor with and without the GCN are not presented.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be released and the used dataset is public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The naming of the 4th row of table 1 is misleading (calling it “GCN - Regression”). This row is basically a CNN video encoder plus an EF regressor (it does not have GCN layers). The GCN branch is a parallel path to the EF regressor and does not directly contribute to the regressed EF results.

    • The authors can add an ablation study to show to what extent the GCN path contributes to the regressed EF results (reporting EF regression with and without the GCN path in the multi-frame GCN - Fig 1 and Table 2). If the ablation results support the paper’s claim and verify that the GCN improves the EF regression results, I’d change my review rating to “accept”.

    • The paper could include power analysis to show whether the differences across reported results are statistically significant.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the weaknesses. My main concerns are lack of the ablation study and significance of results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the provided answers. The rebuttal has mainly addressed my comments regarding ablation studies and power analysis.



Review #2

  • Please describe the contribution of the paper

    In this work, the authors propose a CNN + GCN based approach for joint LV segmentation and EF estimation from cardiac ultrasound images. The segmentation is done via keypoint regression, as opposed to semantic segmentation. This seems to be the key novelty here. Spiral Net, a type of GCN, is used for the keypoint regression. They train a single-frame model for keypoint regression. They also train a multi-frame model, which does both keypoint regression and EF estimation. EF can be estimated either from the keypoints or by direct regression. The direct regression seems more accurate. The keypoints that are output help enhance explainability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The main interesting aspect is the adaptation of the mesh-based formulation of spiral net to contours (which makes sense, since contours are even simpler than meshes), and using that to do keypoint regression of the LV boundaries. This sort of approach would be ideal if estimating well-defined keypoints in an anatomy was necessary.

    • Having the keypoints/segmentations is also helpful in enhancing the explainability of the model. Clinicians generally like it better if they can visualize the LV contours used for EF calculation.

    • Different components of the method are presented as building blocks and can be mixed and matched as needed.

    • Results are presented for both keypoint regression errors and the EF errors.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The keypoint based approach may not be easy to emulate in a different setting with a different set of training data. Do the keypoints have to be in anatomical correspondence? I guess so, given their name? If so, it’d be quite hard to build a large dataset with keypoints that are registered to one another. Or is it that only apex/base keypoints are labeled? Perhaps this part is just unclear to me.

    • While more interesting, and also explainable, the keypoint based approach doesn’t seem to give better EF results.

    • About spiral net - I think the reason it’s preferred in the original literature is that it bakes in a stronger inductive bias about the graph. Prior to this, message passing was done by aggregating messages from neighbors in a permutation-invariant way. I thought that with spiral net, you don’t need this requirement because now you have a well-defined neighborhood encoding (just like in a CNN)? It’d be better to state that explicitly in the paper as opposed to just mentioning the efficiency.

    • Because there’s the single frame method and the multiple frame method with keypoints, ed/es classification, direct regression, it’s a bit hard to follow right away.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Reproducibility seems fine. Components are modular. Code is shared online.
    • One issue is that there are multiple ways their approach could be used so, for someone else to compare to this work, it could be a bit tedious.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • It wasn’t clear how exactly the ED/ES classification is ultimately used. In the multi-frame GCN setup, the output is (40 X 2) keypoints for ED and ES. Is the graph also composed of 2 frames only - the ED and ES? And is the information simply concatenated to the feature or used explicitly to form the graph from 2 frames only?

    • Your main figure (figure 1) has B, C, W, H variables which don’t seem to be defined elsewhere. What is B, by the way? Batch size?

    • In 2.1 (Encoder) section - you say the original image/video needs to be compressed to meet the input requirements. This could perhaps be phrased a bit better. Because you’re not just compressing - you’re trying to learn relevant features that are useful downstream, right?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It’s a good paper, with a really nice formulation for the keypoint regression. However, the keypoints are hard to come by in real life - at least the way I’m understanding them; I may be wrong about it.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors propose a DL model for the segmentation of the LV in four-chamber-view using echocardiography. Thanks to the combination with graph neural-networks, they are able to recover 2D contours instead of just a segmentation mask. The model is fast thanks to the use of light architectures, and is evaluated against the SOA.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Strong comparison against the SOA
    • The research is clinically relevant.
    • Authors focus on prediction time, which is critical for real-world applications but often omitted in research.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see any major weaknesses in the paper, but explanations on the methodology could have been improved.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Details on the implementation are in the manuscript and supplementary material. Authors plan to release the code repository. They use public datasets. Note to the authors: You can use https://anonymous.4open.science/ for providing an anonymous link to your repository during review.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Major comments:

    • Current experiments make it difficult to correctly assess the exact contribution of the GCN head. A comparison of the proposed model with MobileNetv2/ResNet18 without the GCN part is missing.

    • In the EF prediction, the comparison between your EF regression against the SOA volume based methods is unfair.

    • Again regarding the EF prediction, why do the authors think that using the whole video gives worse results? Moreover, the best results need to be highlighted in bold in the Table.

    • Why are the SOA networks different in the EF prediction and segmentation task?

    Minor comments

    • (pg 1) “EF describes the blood volume pumped by the heart in each cycle”: That is the stroke volume. Ejection fraction is as defined in pg 4.

    • More details are needed on the graph convolutional point decoder. What are spiral connections?

    • Figure 1: please state what B, C, and W are in the legend.

    • “Q1 How accurate, efficient and robust is segmentation of a single frame”, Fig 2: “the GCN (with MobileNet2 backbone) is more robust”: the Hausdorff distance should be added to Table 1; also, quantify the number of outliers.

    • Details of the machine settings are important when reporting timing.

    • Looking at the processing time per frame in Table 1, it seems that the authors managed to reach real time. Does it still hold even when adding the image preprocessing time/loading time of the network? If so, I would add it to the manuscript since it is an important achievement.
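The Hausdorff distance requested in the minor comments above can be sketched for two keypoint contours as follows. This is a discrete point-set version, which only approximates the distance between the continuous contours; the function names and the toy coordinates are assumptions for illustration and do not come from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mean_keypoint_distance(a: List[Point], b: List[Point]) -> float:
    """Average distance between keypoints paired by index."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Toy contours (hypothetical pixel coordinates): one keypoint is an outlier.
predicted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 4.0)]

print(mean_keypoint_distance(predicted, reference))  # 0.75
print(hausdorff(predicted, reference))               # 3.0
```

A single displaced keypoint barely moves the mean distance but fully determines the Hausdorff distance, which is why the latter is the more natural measure for a robustness claim.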

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the fact that their model has the potential to be used and deployed in clinical practice due to its high computational speed. 2D echocardiography is the main imaging modality in cardiology, and one of its drawbacks is the difficulty of quantification. Therefore, I think that this contribution is an important step towards overcoming that issue. As stated by the authors in their text, the fact that a 2D contour of the endocardium is provided as output can be useful for many other applications.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers find that the use of CNN + GCN with key point regression for LV segmentation is interesting and clinically relevant. Nevertheless, there are concerns about how the key points are chosen and about the results of the EF regression.

    The meta-reviewer also has some concerns related to the missing important details of GCN.

    1. While the nodal outputs are the 2D coordinates, what are the initial features of the nodes?
    2. How is the output of the CNN encoder used by the GCN nodes? For example, do the encoder features become the nodal features? Or is there a special way of using them?

    Such details cannot be found in the submission and in the references, and they are essential as GCN is a relatively new technique especially to the MICCAI audience.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the reviewers and the meta-reviewer for their constructive feedback to improve the manuscript. We address the major points and explain how we would revise the manuscript accordingly.

We agree with the reviewers that including an ablation study with direct regression would better demonstrate the benefits of the GCN head. We performed an ablation study showing that directly regressing EF without the GCN head results in an MAE of 4.28, an R2 of 0.72 and an RMS of 5.75 when applied to the single heartbeat. These results show that combining the EF head with the GCN is better than direct regression, supporting our claim that the GCN helps to create predictions that are not only more explainable but also more accurate. We hypothesize that the accuracy improvement is due to extended structural learning of the GCN to model the contour keypoints, which is also useful for learning the EF (Rev1, Rev3). We tried to make a fair comparison to the SOA since all selected models are not based on volumes but regress the EF directly, either by assuming an arbitrary sequence start (comparable to Tab2-’ED/ES classifier’) or ED/ES selection based on volumes (comparable to Tab2-’peak computation’). This also explains why SOA models are different for segmentation and EF prediction, because the segmentation is only used for ED/ES classification as opposed to our method that optimizes both (Rev3).
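The three EF error measures quoted above can be computed as in the following sketch. It assumes that "RMS" denotes the root-mean-square error and that "R2" is the coefficient of determination; the helper name and the four EF values are hypothetical and are not the authors' data.

```python
import math
from typing import Dict, Sequence


def ef_regression_metrics(y_true: Sequence[float],
                          y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R^2 between reference and predicted EF (in percent)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical reference and predicted EF values, for illustration only.
reference_ef = [60.0, 55.0, 35.0, 70.0]
predicted_ef = [58.0, 57.0, 40.0, 69.0]
metrics = ef_regression_metrics(reference_ef, predicted_ef)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because the RMSE squares each error, it is never smaller than the MAE, and a large gap between the two points to a few large errors.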

We agree that significance tests would benefit the study, and we found the Dice results were statistically different using a Wilcoxon signed-rank test. This will be included in the results (Rev1). We acknowledge that there is room for improvement in how the GCN and its configurations are described. Given the limited space, we cannot extend the description but would like to carefully revise the section to improve overall comprehension. In a revised version, we will add details on the spiral convolutions and emphasize that the inductive bias given by the fixed ordering of the nodes during message passing is the main advantage (Rev2, Rev3). The initial feature vector of each graph node is the CNN encoder output. This vector is concatenated with the feature vectors of neighboring nodes following the order of the spiral sequence (Meta). We will also make this clearer in the manuscript.
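The concatenation along a fixed spiral order described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not confirmed by the manuscript: the contour is treated as a closed ring, the spiral order is "self, then alternating left and right neighbors", the layer is a single shared linear map, and the per-node input features are placeholders.

```python
from typing import List


def spiral_indices(node: int, num_nodes: int, hops: int) -> List[int]:
    """Fixed neighbor order on a ring: self, then -1, +1, -2, +2, ..."""
    order = [node]
    for k in range(1, hops + 1):
        order.append((node - k) % num_nodes)
        order.append((node + k) % num_nodes)
    return order


def spiral_conv(features: List[List[float]], weight: List[List[float]],
                bias: List[float], hops: int) -> List[List[float]]:
    """Concatenate features along each node's spiral, then apply one
    linear map that is shared by all nodes."""
    num_nodes = len(features)
    outputs = []
    for node in range(num_nodes):
        concat: List[float] = []
        for j in spiral_indices(node, num_nodes, hops):
            concat.extend(features[j])
        outputs.append([
            sum(w * x for w, x in zip(row, concat)) + b
            for row, b in zip(weight, bias)
        ])
    return outputs


# Four nodes with one placeholder feature each; weights of 1 sum the spiral.
features = [[1.0], [2.0], [3.0], [4.0]]
summed = spiral_conv(features, weight=[[1.0, 1.0, 1.0]], bias=[0.0], hops=1)
print(summed)  # [[7.0], [6.0], [9.0], [8.0]]

# Because the order is fixed, each position gets its own weight: this
# weight row picks out only the left neighbor of every node.
left_only = spiral_conv(features, weight=[[0.0, 1.0, 0.0]], bias=[0.0], hops=1)
print(left_only)  # [[4.0], [1.0], [2.0], [3.0]]
```

The second call shows the inductive bias in question: a permutation-invariant aggregation such as a sum or mean could not distinguish the left neighbor from the right one, whereas the fixed spiral order can.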

The ED/ES classifier serves two purposes: 1) its output is concatenated to the GCN input feature vector to add information about which keyframes the keypoints belong to; 2) it directly outputs whether and where a keyframe is present in the sequence. If present, the graph output represents the contour of the respective keyframe. If the classification head detects that no keyframe is present, the GCN output should not be used. In this work, the graph always consists of 2 frames given that we only have 2 frames labeled. Technically, an extension to more frames is possible and may be further explored in the future. We experimented with a 2-frame approach in which only labeled keyframes were used. Given that the SOA methods work on sequences, we refrained from adding it to the manuscript. However, we observed that additional frames are beneficial since they seem to add more context (Rev2).

We agree that keypoint selection in the medical domain is not trivial, especially without unique anatomical landmarks. We used the original 40 points of the EchoNet dataset but those only approximate the contour and are not exact correspondences. We also explored other datasets with coarse equidistantly sampled keypoints based on the basal and apex point as reference and conclude that keypoints do not need to be exact for the GCN to learn a shape representation (Rev2). There are many different model configurations which may be confusing to a reader. We will add a config file to our repository in which we provide parameters needed to reproduce the experiments (Rev2).
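One way to obtain equidistantly sampled keypoints of the kind mentioned above is to resample an annotated contour by arc length. The sketch below is an assumed procedure for illustration only; it is not taken from the authors' repository, and the coordinates are a toy polyline rather than a real base-apex-base contour.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_equidistant(contour: List[Point], num_points: int) -> List[Point]:
    """Resample an open polyline to `num_points` points that are equally
    spaced along its arc length, keeping the first and last point."""
    if len(contour) < 2 or num_points < 2:
        raise ValueError("need at least two contour points and two samples")
    # Cumulative arc length at every original vertex.
    cumulative = [0.0]
    for p, q in zip(contour, contour[1:]):
        cumulative.append(cumulative[-1] + math.dist(p, q))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("contour has zero length")

    resampled = []
    seg = 0  # index of the segment that contains the current target
    for i in range(num_points):
        target = total * i / (num_points - 1)
        while seg < len(contour) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if seg_len == 0 else (target - cumulative[seg]) / seg_len
        (x0, y0), (x1, y1) = contour[seg], contour[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled


# Toy polyline with segment lengths 4 and 3, resampled to 8 keypoints.
polyline = [(0.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
keypoints = resample_equidistant(polyline, 8)
print(keypoints)
```

Keypoints produced this way are consistent in index order across images but are not anatomical correspondences, which matches the rebuttal's point that exact correspondence is not required.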

We hope we could respond to the reviewers’ main concerns and are looking forward to the final feedback.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers find that the use of CNN + GCN with key point regression for LV segmentation is interesting and clinically relevant.

    In the rebuttal, the main concerns such as the clarity of the methodology, key points selection, and the benefits of using the GCN head have been addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have satisfactorily addressed the concerns of AC and reviewers, and I would like to recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal satisfactorily addresses the reviewer’s comments about ablation studies and power analysis. The reviewer who had rejected the paper in the first round now considers that it merits presentation at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


