Authors
Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao
Abstract
The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle.
The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing.
However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.
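For context, the isotropic label smoothing the abstract refers to is usually a Gaussian heatmap centred on each annotated pixel. The sketch below is illustrative only and is not taken from any of the compared works; the function name `gaussian_heatmap`, the 5 x 5 grid and `sigma=1.0` are assumptions chosen for brevity.

```python
import math

def gaussian_heatmap(height, width, landmark, sigma):
    """Return a height x width target with an isotropic Gaussian centred on landmark (row, col)."""
    r0, c0 = landmark
    return [
        [
            math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2.0 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]

# Hypothetical 5 x 5 image with a single landmark at its centre.
heatmap = gaussian_heatmap(5, 5, (2, 2), sigma=1.0)

# The target value depends only on the distance to the landmark, not on the direction.
same_distance = (heatmap[2][3], heatmap[3][2], heatmap[1][2], heatmap[2][1])
```

Because the target depends only on distance, a prediction displaced along the ventricular wall receives the same target value as one displaced across it, which is the sense in which such smoothing ignores anatomical information.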
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43901-8_22
SharedIt: https://rdcu.be/dnwC6
Link to the code repository
https://github.com/MasoudMo/echoglad
Link to the dataset(s)
https://data.unityimaging.net/
Reviews
Review #1
- Please describe the contribution of the paper
The authors tackle the problem of the automated detection of four left ventricle landmarks on parasternal long axis echo 2D frames using hierarchical graph neural networks (HGNN). The landmarks can be used to provide measurements of myocardium dimensions that are potentially related to heart failure. The HGNN uses nonlinear message passing to combine one pixel-level graph of the initial 2D frame with K downscaled versions (auxiliary graphs) to implement a coarse-to-fine detection of the four landmarks. The model is evaluated on in- and out-of-distribution datasets, both quantitatively and qualitatively, in terms of error (distance) and detection performance. It is also compared to 5 previous works and to ablated versions.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The HGNN is well adapted to solve the task.
- The paper is relatively well written and clear to describe a relatively complex model.
- The evaluation is excellent in terms of in- and out-of-distribution testing and comparisons, which highlights the relevance of the approach used.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Although well written overall, some specific parts of the approach remain unclear. In particular:
- How is the initial CNN combined with the K auxiliary graphs to create the “Main Model”?
- In Section 3.4, the process used to extract node features remains unclear: “the grey-scale image is initially expanded in the channel dimension using a CNN” (and “d-dimensional embeddings in Section 3.5”). More details are necessary.
- The message passing aggregation function GNN_l(G^i) in Eq. (1) is not made explicit, although it is very important for understanding the learning process of the HGNN.
- The loss functions lack details in Section 3.6: is BCE used for match versus no-match? Is the L2 loss based on Euclidean distance in physical space?
- The paper lacks a discussion. The conclusion is very short. It is crucial to interpret the results and discuss the specific benefits of HGNNs. All the elements needed to justify HGNNs are already there, but no discussion summarizes or interprets the results.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors mention the publication of their code in Section 4.2. They evaluate their approach on one public and one private dataset. Therefore, evaluating their code on the public dataset should be feasible, provided that they clarify the aspects mentioned above. Implementation details are provided in Section 4.2.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The paper would be clear if the aspects mentioned above were clarified. A discussion and an extended conclusion are needed, including on the potential extension to videos. The authors will have to shorten other sections to free up space. Some further minor details:
- it remains unclear whether the layers l of the graph directly correspond to the auxiliary graphs k
- caption Fig. 2: “granulaity” -> “granularity”
- providing the figure and table numbers in supp. material would help (instead of e.g. “A figure in the supp. material further clarifies how the model generates landmark location heatmaps on different scales.”)
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is a well-suited application and thorough evaluation of GNNs, with some novelty in the hierarchical design. It certainly has merit and could have impact if the points mentioned above are clarified.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
In the revised version, the authors may be able to clarify why node aggregation is not necessary, as well as to write an informative short discussion about the results and the relevance of the proposed model. This would address my two main concerns.
Review #2
- Please describe the contribution of the paper
This work presents a novel strategy for measuring the left ventricle’s dimensions in parasternal long-axis (PLAX) echocardiographic images, consisting of a hierarchical graph neural network (GNN) for landmark detection that employs multi-scale loss for hierarchical supervision. The strategy was validated in two datasets under in-distribution (IN) and out-of-distribution (OOD) settings, achieving interesting results against the state-of-the-art (SoTA).
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novelty: the proposed strategy is methodologically novel, employing graph neural networks for left ventricle landmark detection, and taking advantage of a hierarchical output representation to tackle the sparse annotation problem (instead of relying on Gaussian label smoothing strategies).
- Ablation study and state-of-the-art comparison: the authors have adequately compared their strategy against baseline variants of their method (including vanilla U-Net, a main-task GNN and a single-scale loss variant) and SoTA approaches (5 distinct methods), including both quantitative and qualitative results.
- Applicability: the proposed hierarchical GNN scheme for landmark detection seems generic and sufficiently interesting to a panoply of clinical applications that rely on similar detection tasks.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Lack of methodological and experimental details: despite the overall good description and appealing figures (both in main text and supplementary file), certain aspects of the proposed methodology and experimental design lack sufficient detail to be fully understood and potentially reproduced (these may later be perceivable from the code made publicly available but currently omitted for the sake of anonymization).
- Lack of discussion: no discussion is made about the achieved results, study limitations or potential future work.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors employ a private dataset of nearly 30000 PLAX echo images. However, no description is given regarding study design (multi-center or single-center), population enrollment and demographics (normal subjects vs. patients with left ventricular pathologies, and how many of each), image acquisition parameters (spatial and temporal resolution, ultrasound scanners used, etc.) or labelling details (e.g. instructions given to annotator(s), how many annotations were involved and whether there was any type of consensus, etc.). Moreover, no indication of ethical approval is given. Moreover, as commented above, certain methodological details are lacking. For example, how is the multi-scale loss defined? Do the different scales equally affect the final loss or were distinct weights given to the different resolutions? Were all resolutions (K=7 plus main one) used for loss computation? Regarding inference, were the different scales combined somehow, or do you simply compute the metrics over the landmarks extracted by the main graph? The same can be said about the experiments: the results reported in Table 1 and 2 of the main text, and Table 1 of the supplementary file, were obtained with the respective methods’ official implementation or did the authors implement them? Were all parameters kept the same or were they adapted somehow to handle your private database?
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
In addition to the comments raised above, some comments follow:
- Regarding the work of Gilbert et al., please verify your statement, as the original work states that heatmaps are “rotated such that the long axis was orthogonal to the direction of measurement” (and not statistically based). This also means that this strategy penalizes the points along the ventricular wall differently (less), which according to your comment in the caption of Fig. 1 is relevant. Is the Gaussian label smoothing issue still a problem in this specific case? Please comment.
- When collectively denoting all graphs as G_i, do you mean that V_gi represents the nodes at all levels? If so, does h_out represent the output at all scales? If so, how does equation (3) work (given the softmax operation)? If not, please clarify the formulation of your hierarchical message passing and multi-scale loss function.
- A simple training-validation scheme was used in your ablation study. Please consider using a cross-validation scheme to increase the reader’s confidence in the obtained results.
- Why were the thresholds 2- and 6-mm set? Is there a clinical reasoning for them? Have they been used before? Please clarify and/or consider using a multi-threshold metric (see Maier-Hein et al. arXiv 2022 or Reinke et al. arXiv 2023).
Minor remarks:
- The letters “l” and “s” are used twice across the manuscript: one to define neighboring nodes in per-resolution graphs (section 3.3) and later, respectively, to define GNN layers or nodes in the hierarchical graph G_i (section 3.4). Please correct.
- In equation (2), please include the superscript of h_nodes. Since “l” starts at 1, and there are L GNN layers, the MLP should be applied to your L+1 output, right? Consider starting the count in 0 (the features z from section 3.4), so the superscript number represents the output of the respective GNN layer (as intended by the description below equation 2).
- In page 5, correct “measurments” to “measurements”.
- Please define the abbreviation UIC on page 6 before using it on page 7.
- In page 7, please correct “6 x 6” to “16 x 16”, and formally state “(L=3)” for GCN layers.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I found the manuscript well written, methodologically novel and sound, and adequately supported through an ablation study and SoTA comparison. Despite the few weaknesses described above (lack of some methodological/dataset/experimental details), these seem feasible to be corrected in the rebuttal phase, which would result in an interesting proceeding paper.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
The paper focuses on detecting four landmark locations (and their related distance measurements) to assess the left ventricle chamber function. A hierarchical graph neural network is constructed for this purpose. The experiments were performed on a public dataset and a self-collected one.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The methodology is straightforward, which is well summarized in the overview figure. The key parts are a pyramid of feature extraction with graph CNN, followed by MLP.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
How the ground truth for the four landmarks is obtained is not clear. From Fig. 1, there seems to be no gold standard to determine the precise locations of the four landmarks. There is potential for subjective error from different human annotators. This variance is also reflected in the two experiments. In addition to the three line-segment distances from the four landmarks, is segmentation of the chamber, septal, and posterior wall a more accurate option for the assessment?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors indicate that the code will be available from GitHub, though there is no indication regarding the self-collected dataset.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Please refer to the item 6 weakness.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Please refer to the item 6 weakness.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
A novel strategy is presented for measuring the left ventricle’s dimensions in parasternal long-axis (PLAX) echocardiographic images, consisting of a hierarchical graph neural network (GNN) for landmark detection that employs multi-scale loss for hierarchical supervision. The strategy was validated in two datasets under in-distribution (IN) and out-of-distribution (OOD) settings, achieving interesting results against the state-of-the-art (SoTA). The reviewers find the paper of interest: it includes novel methodology for an important task, with strong experiments and results. The main issue raised is a lack of methodological detail, and additional clarifications are requested, especially by Reviewer 3. Please address these in a rebuttal.
Author Feedback
We appreciate the insightful feedback provided by the reviewers. They highlight the novelty of our approach, and our detailed and comprehensive evaluation against state of the art methods. Below, we aim to clarify the comments related to the evaluation criteria we have used (R3), and further details on our methodology (R1 & R2).
Section 1: Evaluation Criteria and Gold Standard
Reviewer 3 questioned the origin of the gold standard and its accuracy. In routine clinical practice, LV measurements are determined by a sonographer and subsequently verified by an echocardiographer. This is indeed the procedure followed for the EchoNet LVH dataset. As noted by the reviewer, this method can have inter-observer variability. However, it is the distances between landmarks (LVID, LVPW, and IVS) that play a significant role in diagnosing LVH: the formulas recommended in the American Society of Echocardiography clinical guidelines approximate LV mass from these measurements rather than from the exact landmark locations [1]. Additionally, to address the noted variability, we have introduced a less stringent metric, the success detection rate (SDR), which forgives minor errors. In summary, these measurements are a recognized part of standard clinical practice in LVH diagnosis, and our model evaluation closely aligns with real-world clinical applications.
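To make the dependence on distances concrete, the cube formula associated with [1] is commonly written as LV mass (g) = 0.8 x 1.04 x [(IVS + LVID + LVPW)^3 - LVID^3] + 0.6, with end-diastolic measurements in cm. The sketch below assumes that form; the function name `lv_mass_grams` and the example values are illustrative and do not come from the paper.

```python
def lv_mass_grams(ivs_cm, lvid_cm, lvpw_cm):
    """Cube-formula LV mass estimate in grams from linear measurements in cm.

    Assumes the commonly quoted form 0.8 * 1.04 * ((IVS + LVID + LVPW)^3 - LVID^3) + 0.6.
    """
    outer = ivs_cm + lvid_cm + lvpw_cm
    return 0.8 * 1.04 * (outer ** 3 - lvid_cm ** 3) + 0.6

# Hypothetical measurements: only the three distances enter the estimate,
# not the absolute positions of the four landmarks.
baseline = lv_mass_grams(1.0, 5.0, 1.0)
# Same cavity and posterior wall, septum over-measured by 2 mm.
shifted = lv_mass_grams(1.2, 5.0, 1.0)
```

Since the measurements are cubed, a small error in a wall thickness changes the estimate noticeably, whereas moving a pair of landmarks without changing their separation leaves it unchanged.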
Reviewer 2 asked about our choice of 2 mm and 6 mm as the thresholds for SDR. Our decision was based on the healthy ranges for IVS (0.6-1.1 cm), LVID (2.0-5.6 cm), and LVPW (0.6-1.1 cm). Hence, the 2 mm threshold provides a stringent evaluation of the models, while the 6 mm threshold facilitates the assessment of out-of-distribution performance.
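As a minimal sketch of how the two metrics relate, assuming per-sample errors are already expressed in millimetres and that an error equal to the threshold counts as a success (both are assumptions, as are the function names and example errors):

```python
def mean_absolute_error(errors_mm):
    """Average magnitude of the errors, in mm."""
    return sum(abs(e) for e in errors_mm) / len(errors_mm)

def success_detection_rate(errors_mm, threshold_mm):
    """Fraction of errors whose magnitude does not exceed the threshold."""
    hits = sum(1 for e in errors_mm if abs(e) <= threshold_mm)
    return hits / len(errors_mm)

# Hypothetical errors for four test samples.
errors = [1.0, 3.0, 5.0, 7.0]
mae = mean_absolute_error(errors)            # 4.0
sdr_2mm = success_detection_rate(errors, 2)  # 0.25
sdr_6mm = success_detection_rate(errors, 6)  # 0.75
```

The same set of errors scores very differently at the two thresholds, which is why the stricter one separates models in distribution while the looser one remains informative when errors are larger.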
Section 2: Methodology Questions
- Reviewer 2 asked for clarification on our multi-scale loss. We utilized a multi-scale binary cross-entropy loss, treating all levels equally for the sake of simplicity. We did this because of the following duality: the coarser-scale landmarks must be easier for the model to find (so intuitively, the model should be penalized if it makes a mistake there), while the pixel-level landmarks are of the main interest (and therefore the pixel-level mistakes should also be penalized). Hence, instead of weighting different scales differently, we introduced a measurement MSE loss which exclusively penalizes main pixel-level predictions.
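The combination described above can be sketched as follows. This is an assumption-laden illustration rather than the released implementation: it assumes per-node probabilities, averages the per-level BCE with equal weights, and adds an MSE term on pixel-level measurements with a hypothetical `mse_weight`.

```python
import math

def bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the nodes of one graph level."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def multi_scale_bce(probs_per_level, targets_per_level):
    """Equal-weight average of the per-level BCE (all scales treated alike)."""
    losses = [bce(p, t) for p, t in zip(probs_per_level, targets_per_level)]
    return sum(losses) / len(losses)

def measurement_mse(pred_mm, true_mm):
    """MSE on measurements derived from the pixel-level predictions only."""
    return sum((p - t) ** 2 for p, t in zip(pred_mm, true_mm)) / len(pred_mm)

def total_loss(probs_per_level, targets_per_level, pred_mm, true_mm, mse_weight=1.0):
    """Multi-scale BCE plus a weighted measurement MSE (weight is hypothetical)."""
    return (multi_scale_bce(probs_per_level, targets_per_level)
            + mse_weight * measurement_mse(pred_mm, true_mm))

# Hypothetical example: a 2 x 2 coarse level and a 4 x 4 fine level, one landmark,
# with an uninformative prediction of 0.5 at every node.
coarse_targets = [1, 0, 0, 0]
fine_targets = [0] * 16
fine_targets[5] = 1
probs = [[0.5] * 4, [0.5] * 16]
loss = total_loss(probs, [coarse_targets, fine_targets], pred_mm=[42.0], true_mm=[41.0])
```

Keeping the scale weights equal and routing the extra emphasis on the finest level through a separate measurement term avoids tuning one weight per resolution.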
In relation to the question about inference time, during inference, the model outputs multi-scale landmark locations, but only landmarks on the pixel-level graph are utilized. That being said, discrepancies between coarse and fine-level landmarks can be employed for sample rejection, indicating low model confidence.
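One way such a coarse-versus-fine consistency check could look is sketched below. The rule used here (reject when the pixel-level landmark does not fall inside the cell predicted at a coarser level) is an assumption for illustration, as are the function names, the 224-pixel image size and the grid sizes.

```python
def coarse_cell(landmark_rc, image_size, grid_size):
    """Map a pixel-level (row, col) to the cell containing it on a grid_size x grid_size grid."""
    r, c = landmark_rc
    return (r * grid_size // image_size, c * grid_size // image_size)

def should_reject(fine_landmark_rc, coarse_cells_by_grid, image_size, max_cell_offset=0):
    """Flag a sample when any coarse prediction disagrees with the pixel-level landmark.

    coarse_cells_by_grid maps a grid size to the (row, col) cell predicted at that level.
    """
    for grid_size, predicted_cell in coarse_cells_by_grid.items():
        implied = coarse_cell(fine_landmark_rc, image_size, grid_size)
        offset = max(abs(implied[0] - predicted_cell[0]),
                     abs(implied[1] - predicted_cell[1]))
        if offset > max_cell_offset:
            return True
    return False

# Hypothetical predictions for one landmark on a 224 x 224 frame.
fine = (100, 60)
consistent = should_reject(fine, {2: (0, 0), 8: (3, 2)}, image_size=224)
inconsistent = should_reject(fine, {2: (1, 1), 8: (3, 2)}, image_size=224)
```

Here `consistent` is False because every coarse cell contains the pixel-level landmark, while `inconsistent` is True because the 2 x 2 level points to a different quadrant.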
- Reviewer 1 raised concerns about the initial CNN network’s use. This network functions similarly to 1×1 convolutions to expand the depth dimension of images. In this way, the GNN’s node features are not limited to a depth of 1 (as in a grayscale image), which allows the model to capture complex relationships through the enhanced depth of pixel features.
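The effect of a 1×1 convolution on a single-channel image can be sketched in a few lines: every pixel is mapped independently to a d-dimensional vector by the same weights. This is only an illustration of that idea, not the network used in the paper; the function name, the weights and d = 3 are assumptions.

```python
def pointwise_expand(image, weights, biases):
    """Apply a 1x1 convolution to a single-channel image.

    image:   H x W nested list of grey-scale values
    weights: d scalars, one per output channel
    biases:  d scalars, one per output channel
    returns: H x W x d nested list (one d-dimensional feature per pixel/node)
    """
    return [
        [[w * pixel + b for w, b in zip(weights, biases)] for pixel in row]
        for row in image
    ]

# Hypothetical 2 x 2 image expanded to d = 3 channels.
image = [[1.0, 2.0], [3.0, 4.0]]
features = pointwise_expand(image, weights=[1.0, -1.0, 0.5], biases=[0.0, 0.0, 0.0])
# features[0][1] is the feature vector of the pixel with value 2.0: [2.0, -2.0, 1.0]
```

Each pixel keeps its spatial position, so the output can be read directly as one feature vector per node of the pixel-level graph.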
Finally, regarding the aggregation method of the GNN, since our final task is node classification and not a graph-level task, we have not employed an aggregation method. This will be made clearer in the revised text.
In response to additional feedback regarding the need for a detailed conclusion and discussion, we will address this by linking our findings with our ablation studies in the discussion section. We will also elucidate why our proposed architecture works and discuss possible extensions of our work to video data, given the additional space after paper acceptance. Lastly, we have a well-documented codebase that enables the reproduction of our results and facilitates a better understanding of the proposed model.
[1] Devereux et al. Echocardiographic assessment of left ventricular hypertrophy: Comparison to necropsy findings. The American Journal of Cardiology, 57(6):450–458, 1986.
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
All reviewers found the paper ready for acceptance at this time. One reviewer increased the score following the rebuttal.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
A hierarchical graph neural network was proposed to measure the left ventricle via landmark detection on echocardiography images. The method was evaluated on both in-domain and out-of-domain data and achieves good results compared with SOTA methods. Reviewers noted a number of unclear parts regarding experimental setup and methodology. The authors provided a good clarification of these sections and made some proposals on how to revise the final paper. The overall contribution seems relevant for the MICCAI community.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The method is considered novel, and the key issue is clarification of the methodology. The authors have responded to these issues, and their responses appear to satisfactorily address reviewer 3’s concerns.