
Authors

Kavitha Vimalesvaran, Fatmatülzehra Uslu, Sameer Zaman, Christoforos Galazis, James Howard, Graham Cole, Anil A Bharath

Abstract

Cardiac magnetic resonance (CMR) is the gold standard for quantification of cardiac volumes, function, and blood flow. Tailored MR pulse sequences define the contrast mechanisms, acquisition geometry and timing which can be applied during CMR to achieve unique tissue characterisation. It is impractical for each patient to have every possible acquisition option. We target the aortic valve in the three-chamber (3-CH) cine CMR view. Two major types of anomalies are possible in the aortic valve. Stenosis: the narrowing of the valve, which prevents adequate outflow of blood; and insufficiency (regurgitation): the inability to stop the back-flow of blood into the left ventricle. We develop and evaluate a deep learning system to accurately classify aortic valve abnormalities to enable further directed imaging for patients who require it. Inspired by low-level image processing tasks, we propose a multi-level network that generates heat maps to locate the aortic valve leaflets’ hinge points and aortic stenosis or regurgitation jets. We trained and evaluated all our models on a dataset of clinical CMR studies obtained from three NHS hospitals (n = 1,017 patients). Our results (mean accuracy = 0.93 and F1 score = 0.91) show that expert-guided deep learning-based feature extraction and a classification model provide a feasible strategy for prescribing further, directed imaging, thus improving the efficiency and utility of CMR scanning.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_54

SharedIt: https://rdcu.be/cVD7a

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors describe a multi-stage machine learning approach to detect pathologies of the aortic valve from cine MRI imaging sequences. This is intended to help medical staff determine which other types of imaging a specific patient requires, potentially reducing imaging duration and improving quality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The presented method is very interesting and relies on a simple, yet obviously powerful, machine learning approach. The authors describe a multi-step network to first detect features of potential aortic valve pathologies and then to classify them. The paper is well written, easy to understand and the methodology is explained properly. I would specifically like to point out the importance of explainability in AI, which the authors have addressed with their multi-level approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see any major weaknesses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems relatively straightforward to reproduce. It would be great if the dataset could be made available to the public, since annotating data as was done in this study requires significant effort.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I have no major points of criticism. Minor issues:

    • Please add units to Table 1. Also, it would be nice to give the resolution of the Cine MRI sequences to relate pixels to mm.
    • Did you perform an ablation study?
    • It would be interesting to see in which cases classification failed. Could you give an example image, maybe?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper tackles an important medical imaging topic, the topic is addressed in a convincing way. The algorithmic details are adequate and the paper is well-written and explains everything in detail.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    Overall paper describes an approach for aortic valve disease classification based on cardiac MR. Using deep learning based heat map regression, features are derived from anatomical landmarks & contours (aortic hinges & leaflets) and pathophysiological dynamics (stenotic and regurgitant jets). Based on these features random forests are trained to obtain an estimate of whether or not aortic valve disease is present.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Disease classification and the combination of machine learning and explainability are generally interesting to the MICCAI community. The evaluation is done on a reasonably sized training and testing sets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Many details are not actually clear from the manuscript and the presentation is rather confusing at times: 1) What does curve tracking mean and how is this implemented? 2) What quantitative criteria do you use to obtain “probability of being a pathological curve” 3) How are features (angles, length, distance to image patch center, probability) computed from the heat maps? 4) “coronary cusp leaflet” is a confusing formulation. Does this model only one aortic leaflet? Does it model contours or a landmark (located on the free edge of the leaflet)? 5) The number of frames to be used also depends on cardiac phases, as aortic stenosis is only apparent during systole, and regurgitation only during diastole, which are typically of different duration. Also patients may rarely suffer from both stenosis and regurgitation, more frequently from only one of these conditions. How is this handled?

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Mainly the open points on the description should be clarified, as right now I would not be able to implement exactly the same as many details are not really described. I realize this may be tricky given the size limit of the paper, maybe it would be possible to replace some text with a more descriptive figure.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    6) What is the sensitivity w.r.t. hinge detection? Practically this is a surrogate for the distance to hinges (since crop window is computed based on these), no? 7) Please provide the image resolution and relate of the error in mm to the overall size of anatomy and subcomponents. 8) The sentence in Fig 4 “The green curve shows the predicted an AV-regurgitation curve.” seems incomplete - please correct this, message not clear. 9) The processing is based on 2D slices only where as jets may have 3D shape, please comment on potential sensitivity w.r.t. imaging limitations and variations in acquisition

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall quite interesting; nonetheless, many parts are confusing and the paper lacks clarity overall, which could be solved with better presentation (e.g., graphical rather than textual descriptions).

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This work presents a multi-stage deep/machine learning strategy for classifying aortic valve abnormalities from a single cine CMR view (3-chamber). The authors propose to regress heatmaps with the location of the aortic valve hinge points, followed by regression of curves representing the valve leaflets and pathological blood jets. By applying a ridge detection method to these curves, several handcrafted features are extracted per frame, later aggregated into a per-video feature set and used for classification with a random forest (RF) classifier. The approach was validated on data from 3 centers, with interesting results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty: the proposed approach presents novel aspects, both in terms of application (first attempt to classify aortic valve abnormalities from a single standard CMR view) and methodology (an ingenious multi-stage hybrid [deep/machine learning] strategy that mimics clinicians reasoning and provides an explainable framework). Explainability: a key aspect for clinical practice integration, which is here accounted for with a tractable machine learning classifier (RF). Applicability: the proposed curve-based heatmap regression (followed by a ridge detection method) seems generic and sufficiently interesting to a panoply of clinical applications.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of methodological/implementation details: certain modules/aspects of the proposed methodology lack sufficient details to be fully understood and potentially reproduced. These include details regarding heatmap creation, the curve tracking algorithm, the proposed network, feature extraction, etc. (see specific comments below). Limited results supporting algorithmic decisions: did replacing the conv block with dense blocks effectively improve heatmap regression? Does a three-stage heatmap regression (Fig. 3) outperform a single- or dual-stage network?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors employ a novel multi-center dataset comprising 1017 patients. However, limited description is given regarding image acquisition parameters, instructions given to annotators (e.g., which rules were used to define a pathological jet and its extent?), methods employed for quality control (any type of consensus?), etc. Moreover, it is unclear whether the dataset used for training heatmap regression (80 patients) is a subset of the main dataset (1017 patients) or an independent one. If the former, how was that handled when evaluating the accuracy of the classifier? Several methodological/implementation details are lacking, hampering reproduction of the authors’ method/results. See “Weaknesses” and associated specific comments below.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In addition to the comments raised above, some comments follow:

    • The term “curve” may be difficult to understand when first reading the manuscript, especially in the abstract. To improve comprehension, consider replacing the term in the abstract (referring to blood jet or similar) and then properly define it in the main text.
    • Please provide further detail regarding your proposed networks. Are the number of levels and filters per level equal to the original U-net? What is the initial localization network’s input size?
    • How are the curve-based heatmaps created? Perhaps a Gaussian convolved with a binary mask of the ground-truth curves?
    • Since the heatmap can present insignificantly low values, was a minimum threshold value used to “stop” the curve tracking (i.e. ridge detection) algorithm?
    • How is the orientation of the curve defined? A vector connecting the initial and end point (as suggested by Fig. 4), or perhaps a line fitted to all curve points?
    • How is the “probability of being a pathological curve” defined? From the text after eq. (2), it seems to be the estimated mean probability of the regressed map over the curve, rather than a true “probability” of being pathological (vs. healthy).
    • No reference to “random forest” is given in the “Method” section. Please add it there (including certain implementation details found in section 3).
    • Was a patient-disjoint split used to separate annotated frames into training, validation, and test sets?
    • Consider decreasing the length of the abstract (the first sentences are unessential).
    • Given the variable pixel spacing of CMR images (and potential resampling used during your pipeline), consider reporting the localization accuracy in mm.

    • Please correct English grammar/spelling mistakes/typos. A few examples follow:
      • p.1: where it reads “leaflet hinge points” should read “leaflets’ hinge points”.
      • p.2: remove “of” from “assess of blood flow”.
      • p.5: remove “an” from “predicted an AV-regurgitant curve” in Fig. 4 caption.
      • p.7: where it reads “utlises” should read “utilises”.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the manuscript well written and methodologically sound. Despite its simplicity, the proposed pipeline is ingenious (with certain modules potentially applicable to other clinical tasks), provides interpretable results, and shows good overall performance. The few weaknesses described above (lack of methodological details limiting reproducibility, limited discussion of the results, etc.) seem feasible to correct in the rebuttal phase, which could result in an interesting proceedings paper.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers are satisfied with the paper in terms of novelty and evaluation; however, very detailed questions have been raised with respect to the method. I therefore suggest the authors address these questions in the rebuttal phase to improve reproducibility. Overall, an already very convincing paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR




Author Feedback

We would like to thank all the reviewers for their careful evaluation of our work, constructive suggestions, and provisional acceptance. We have now clarified and addressed the comments.

R1&R3: The in-plane voxel size of the images varies from 1.17×1.17 mm to 1.56×1.56 mm, with an 8 mm slice thickness. Our localisation network takes an input image of 256×320 pixels. For curve estimation, we cropped input frames to a size of 96×96 pixels. We will provide acquisition parameters as supplementary material. We performed an ablation study (Table 3 added with results) comparing U-Net and other versions of our network (single and double networks) for heat map regression. Our final network (Figure 3 updated with further details), with 3 sub-networks, better approximates heat maps, with smaller mean absolute errors. We have added examples of failure cases (Figure 6).

R2&R3: The aortic valve has 3 leaflets, but only 2 are seen in the 3CH cine: the non-coronary cusp and the right coronary cusp. Each cusp inserts into the base of the aorta at its respective hinge point. Curves in this study simply define a thin collection of connected pixels representing the locations of the right coronary cusp, non-coronary cusp, stenotic jet and regurgitant jet. With regards to curve tracking, we use a simple tracking algorithm that detects curves by tracing ridge points in the generated heat maps. The start location for tracking is the location of the maximum value in the related heat map. We then check whether this maximum value is above an initial threshold. If so, we search for the next point with a step size of 3 pixels in the best direction, found by comparing values sampled from the heat map at angles from 0 to 359 degrees. Curve tracking continues while the probability of the last traced point is above a stop threshold of 0.1. The orientation of a curve is defined by its first and last points. We will include the algorithm in the appendix.
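The tracking procedure above can be sketched as follows. This is a minimal illustration of the described steps (argmax start point, 3-pixel steps over directions 0–359°, 0.1 stop threshold), not the authors' implementation; the function name, the 0.5 start threshold, and the visited-point check used to prevent backtracking are assumptions.

```python
import numpy as np

def trace_ridge(heatmap, start_thresh=0.5, stop_thresh=0.1, step=3.0):
    """Trace a curve by following ridge points in a regressed heat map:
    start at the global maximum, then repeatedly step ~3 px in the
    direction (0-359 degrees) with the highest sampled value, stopping
    once that value drops below the stop threshold."""
    h, w = heatmap.shape
    y, x = np.unravel_index(int(np.argmax(heatmap)), heatmap.shape)
    if heatmap[y, x] < start_thresh:
        return []  # no confident starting point -> no curve detected
    points = [(int(y), int(x))]
    visited = {points[0]}
    while True:
        best_val, best_pt = -1.0, None
        for deg in range(360):  # sample candidate directions
            a = np.deg2rad(deg)
            ny = int(round(y + step * np.sin(a)))
            nx = int(round(x + step * np.cos(a)))
            if not (0 <= ny < h and 0 <= nx < w) or (ny, nx) in visited:
                continue  # outside image, or would retrace the curve
            if heatmap[ny, nx] > best_val:
                best_val, best_pt = heatmap[ny, nx], (ny, nx)
        if best_pt is None or best_val < stop_thresh:
            break  # ridge has faded out
        y, x = best_pt
        points.append(best_pt)
        visited.add(best_pt)
    return points
```

On a synthetic heat map with a bright horizontal ridge, the trace starts at the ridge maximum and steps along it until the values fall below the stop threshold.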

Then we compute features from the predicted curves. One of the four features is the “probability of being a pathological curve”, which should technically read “probability of being a curve”; it is used to discriminate false, artefactual curves from true ones. We compute it by treating the generated heat maps as probability maps of curve locations and averaging the probabilities sampled at the traced curve locations.
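Two of these per-curve features, as described here and in the tracking paragraph, could be computed as below. This is a sketch under the stated definitions (mean sampled heat-map value; orientation from first to last traced point); the function names are hypothetical.

```python
import numpy as np

def curve_probability(heatmap, points):
    """'Probability of being a curve': the mean heat-map value sampled
    at the traced curve locations; low values flag artefactual traces."""
    if not points:
        return 0.0
    return float(np.mean([heatmap[y, x] for y, x in points]))

def curve_orientation(points):
    """Orientation (degrees) of the vector from the first traced point
    to the last, per the definition given in the rebuttal."""
    (y0, x0), (y1, x1) = points[0], points[-1]
    return float(np.degrees(np.arctan2(y1 - y0, x1 - x0)))
```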

With regards to hinge point detection, our network presents good localisation performance, with a mean distance error of 3.5±4.2 mm (mistakenly written as pixels), given that the average size of an input image is 500×400 mm. We also use a large cropping window to tolerate any faulty detection.

Our current method presents an interpretable and accurate classification of abnormal aortic valves from a single cine. We aim to perform multi-class classification (including co-existing disease) and other views/planes in future work.

R3: We utilized two separate datasets. The first comprises expert-annotated frames from 80 patients and was used for the heat map regression task. For this task, we curated a training dataset of 1221 unique frames with expert-derived annotations of key landmarks. A sample of 100 frames, selected by stratified random sampling, was double-labelled by two experts in a blinded manner, showing good inter-rater reliability. Data splitting was patient-wise. The binary annotations were smoothed with a Gaussian-like kernel with σ = 5 pixels to generate heat maps.
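The described conversion of binary annotations into heat-map regression targets can be sketched as below, assuming a standard Gaussian filter with σ = 5 px; normalising the peak to 1 is an assumption, not stated in the rebuttal.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_curve_heatmap(mask, sigma=5.0):
    """Smooth a binary curve/landmark annotation with a Gaussian kernel
    (sigma = 5 px per the rebuttal) to obtain a soft heat-map target.
    Peak normalisation to 1 is an assumed convention."""
    hm = gaussian_filter(mask.astype(np.float32), sigma=sigma)
    peak = float(hm.max())
    return hm / peak if peak > 0 else hm
```

The resulting map peaks on the annotated pixels and decays smoothly away from them, which is what makes ridge tracing on the regressed output feasible.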

Pathological jets were only labelled if clearly and obviously seen, with the understanding that experts may have slightly different interpretations, especially for challenging cases. This reflects everyday clinical practice. If the structures were not clear enough for visualization, annotators were asked to say “not clearly seen”. The second dataset, comprising 1017 independent patients, was used for pathology classification; only 90 of these patients had mixed valve disease.


