
Authors

Pooneh Roshanitabrizi, Holger R. Roth, Alison Tompsett, Athelia Rosa Paulli, Kelsey Brown, Joselyn Rwebembera, Emmy Okello, Andrea Beaton, Craig Sable, Marius George Linguraru

Abstract

Rheumatic heart disease (RHD) is a common medical condition in children in which acute rheumatic fever causes permanent damage to the heart valves, thus impairing the heart’s ability to pump blood. Doppler echocardiography is a popular diagnostic tool used in the detection of RHD. However, the execution of this assessment requires the work of skilled physicians, which poses a problem of accessibility, especially in low-income countries with limited access to clinical experts. This paper presents a novel, automated, deep learning-based method to detect RHD using color Doppler echocardiography clips. We first homogenize the analysis of ungated echocardiograms by identifying two acquisition views (parasternal and apical), followed by extracting the left atrium regions during ventricular systole. Then, we apply a model ensemble of multi-view 3D convolutional neural networks and a multi-view Transformer to detect RHD. This model allows our analysis to benefit from the inclusion of spatiotemporal information and uses an attention mechanism to identify the relevant temporal frames for RHD detection, thus improving the ability to accurately detect RHD. The performance of this method was assessed using 2,136 color Doppler echocardiography clips acquired at the point of care of 591 children in low-resource settings, showing an average accuracy of 0.78, sensitivity of 0.81, and specificity of 0.74. These results are similar to RHD detection conducted by expert clinicians and superior to the state-of-the-art approach. Our novel model thus has the potential to improve RHD detection in patients with limited access to clinical experts.
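Since no code is released (see below), here is a minimal, hedged sketch of the ensemble decision the abstract describes, assuming the preprocessed clips are 16-frame, 3-channel, 64x64 tensors and that "maximum voting" denotes a majority vote over clip-level binary predictions; the tiny stand-in networks are placeholders, not the authors' architectures:

```python
import torch
import torch.nn as nn

# Minimal sketch only: the paper's architectures are not public, so these
# tiny stand-in networks are placeholders, and "maximum voting" is read as
# a majority vote over clip-level binary predictions (an assumption).
cnn3d_branch = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 1))
transformer_branch = nn.Sequential(  # stand-in for the multi-view Transformer
    nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 1))

def predict_subject(clips: torch.Tensor) -> int:
    """clips: (N, 3, 16, 64, 64) -- all systolic left-atrium clips of one subject."""
    with torch.no_grad():
        votes = [torch.sigmoid(m(clips)).squeeze(1) > 0.5   # clip-level votes
                 for m in (cnn3d_branch, transformer_branch)]
    return int(torch.cat(votes).float().mean().item() > 0.5)  # 1 = RHD detected

print(predict_subject(torch.randn(4, 3, 16, 64, 64)))
```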

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_57

SharedIt: https://rdcu.be/cVD7d

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper describes a deep learning pipeline for automatic processing of colour Doppler images for the purpose of detecting and grading rheumatic heart disease. The pipeline works with non-ECG-gated data and is intended for use in low resource settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I like the focus of the work on the low resource setting, which is an issue that is often overlooked in the research literature.

    The paper is generally well written and easy to understand.

    The authors have put together a pipeline that seems to be robust and performs similarly to human experts.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some details of the training/validation procedure were unclear.

    Details of statistical testing were not presented clearly.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall reasonably good. The model was clearly described although code has not been made available.

    One point of concern though - in the checklist, the authors answered “Yes” to “The range of hyper-parameters considered, method to select the best hyper-parameter configuration, and specification of all hyper-parameters used to generate results.” I agree that they specified final hyperparameter values but I did not see any description of the range of values tested or how the hyperparameters were optimised (see detailed comments below).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I enjoyed reading the paper and found it to be mostly well-written and easy to follow. But I was left slightly frustrated by an occasional lack of detail, especially with regard to the training/validation procedure.

    For example, in Section 2, the authors give details of their dataset of 2136 Doppler videos. They also mention that 95 of these were annotated with view information, systole frames and left atrium segmentations. Later (Section 4), it is stated that the training/validation/test sets were 5108/1277/1510 images. Why switch from talking about videos to images in this way? I found this slightly confusing. Also, the numbers given in Section 4 are for the rheumatic heart disease (RHD) detection task, but no mention is made of the training/validation of the preprocessing steps (for which only 95 videos were available). What training/validation split was used here? And was there any overlap between these 95 and the data used for training/testing the RHD task? I.e. is it possible that some of the RHD test set had been “seen” before when training the preprocessing steps?

    Regarding hyperparameter optimisation, as noted above the authors stated their final values and (at least for the RHD detection task) which data were used to optimise them but did not mention the ranges of values tested or the strategy used for optimisation.

    For the model description, I also found the text slightly unclear. The first model uses 3D CNNs but the input data are 4D (64x64x3x16) so presumably one dimension was handled by defining multiple input channels? Which one? Later, for the transformer model it is stated that 3 colour channels are used so I presume the same approach is adopted for the 3D CNN model but please state this explicitly. And does every video clip have 16 frames, all of which are of the same view? This should be mentioned in Section 2 if so. Finally, the results of the 3D CNN and transformer models are combined using a “maximum voting strategy”. I presume this strategy only makes sense when there are multiple A4CC/PLAXC videos for a subject? E.g. if there is just one of each and they disagree how does maximum voting help? What do you do if the votes are split equally?
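For concreteness, here is a sketch of the convention I assume is used: the 3 color channels become the input channels of the 3D convolution, so the kernel slides over time, height, and width. The clip shape is taken from the paper; the rest is illustrative:

```python
import torch
import torch.nn as nn

# Illustrative only: one way to feed a 64x64x3x16 clip (H, W, color, time)
# to a 3D CNN is to treat the 3 color channels as input channels, so the
# kernel convolves over (time, height, width).
clip = torch.randn(1, 3, 16, 64, 64)  # (batch, color channels, T, H, W)
conv = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
print(conv(clip).shape)  # torch.Size([1, 32, 16, 64, 64])
```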

    The results of the statistical testing are also not clearly presented. What was being compared to what? In Table 2, there are two symbols (*, **) indicating p-values of 0.03 and 0.04 respectively. The second column from the right has both of these symbols - so how can a statistical test have two different p-values? I’m sure I have misunderstood something but this is because the results have not been clearly presented in my opinion. Finally, there are no asterisks in the rightmost column - does this mean no statistically significant difference was found on the test set?

    Other more minor comments:

    • The Introduction mentioned that RHD is often associated with mitral regurgitation (MR) and cited some prior work on detecting MR. But then the rest of the paper is all about RHD. Is there a clinical need to detect RHD? Or MR? Or both? In the data used in this paper, what proportion of the subjects had MR and were there subjects who were either MR +ve/RHD -ve or MR -ve/RHD +ve?
    • The last sentence of the Introduction seems like it should have come earlier, not as the closing sentence of this section. Also in this sentence: “life even” should be “life and even”.
    • In the introductory text of Section 3.2 I would recommend the authors explain why they are introducing two different models (3D CNN and transformer based) as I was confused by this at first. They could mention here that they will be used as an ensemble later.
    • In Section 4, the authors state that they use 5-fold cross validation for hyperparameter optimisation. But then which model was used for final testing? Were the best hyperparameters used to retrain using the entire training set?
    • The last paragraph of the Discussion highlights a very important point in my opinion. The authors might want to comment on what level of expertise is required to acquire images that are of good enough quality to be processed robustly by their pipeline. What evidence do they have that “minimal training” will be enough to acquire such images?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I liked the work done by the authors and the paper was written well. But there were too many details left out or presented with a lack of clarity for this to be of a standard to be accepted at the MICCAI main conference.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I think the authors have done a good job with the rebuttal. The clarifications they provided about the data, training and hyperparameters were very useful. If the authors can make the (relatively minor) revisions to the paper to clarify these points then I am happy to raise my recommendation to Weak Accept.



Review #2

  • Please describe the contribution of the paper

This work presents and evaluates a deep learning method for diagnosing rheumatic heart disease (RHD) from Doppler echocardiography. It starts with data homogenization that identifies two specific echo acquisition views and localizes the left atrium during ventricular systole. An ensemble model then predicts RHD: a multi-view 3D CNN analyzes all systolic frames as volume data, while a multi-view Transformer evaluates the images frame by frame. The results demonstrate the benefit of combining these networks into an ensemble.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very well written.
    • The clinical motivation to use ungated Doppler to diagnose RHD in low-resource settings is compelling and significant.
    • The deep learning strategy is logically motivated.
    • Evaluation is on a relatively large point-of-care dataset.
    • Performance of automated RHD diagnosis is on par with expert clinical assessment.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The deep learning strategy is an integration of existing methods.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The paper provides a substantial methodological description.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please refer to the comments on strengths and weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Excellent presentation, compelling clinical problem, logical methodology, good dataset and evaluation.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors propose a pipeline for RHD diagnosis from color Doppler echocardiograms (CDE) in low-resource settings. They preprocess the ungated CDEs with deep-network-based view selection, frame-of-interest selection, and left atrium (ROI) localization to generate ROI video clips in the A4CC and PLAXC views. Two networks, a two-view 3D CNN and a two-view 2D CNN + Transformer, then predict the RHD probability in parallel, and a maximum vote over the two probabilities gives the final prediction. The authors validate their method on a 591-patient dataset and achieve better results than a previous RHD prediction method.

    This study focuses on CDE obtained with hand-held ultrasound devices, which are low-priced and easy to deploy in low- and middle-income countries.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem is relatively new and interesting and has not received much attention, yet it has great potential and impact. There are few works on automatic RHD diagnosis from CDE; the most closely related is [Ref. 19] in the paper. Because the study targets CDE from hand-held ultrasound devices, which are low-priced and easy to deploy in low- and middle-income countries, the work is very practical and can benefit people, especially children, living in low-income conditions. The community would likely pay more attention to this kind of study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Organization: In Section 3.1, the authors spend too much space on the implementation details of the preprocessing networks, leaving too little room in the experiments section for comparisons with other methods.
    2. Lack of technical novelty: a. All models in this paper come from existing methods. Both view classification and frame selection use ResNet, and ROI localization uses a VGG-based LinkNet [Ref. 26]. The prediction models are a 3D CNN and a DenseNet + Transformer, both widely used in the community, and a straightforward maximum vote aggregates the two RHD scores. b. The whole framework is complicated: the pipeline is long, so minor errors at the beginning can propagate and grow large by the end.
    3. Unsound evaluation: a. The authors only report results for their proposed method. There is no comparison with other methods, for example, other echocardiogram classification methods, ultrasound classification methods, or even video action-recognition methods for natural videos. b. The authors only evaluate on fully preprocessed data. Experiments on data with less preprocessing (view selection only; view classification + frame selection; view selection + ROI localization) would help readers understand how challenging the original task is. These could form part of the ablation studies.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors filled out the reproducibility checklist but do not plan to release their code or dataset.

    The implementation details of all models in the paper are well described, so it is easy to follow the paper and reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Suggestion:

    1. If possible, move all implementation details in Section 3.1 and the preprocessing experiments to the supplementary material, which would free up room for more comparison experiments.
    2. If possible, run ablation studies for preprocessing.
    3. If possible, run comparison experiments with other methods.
    4. If possible, provide inter-observer annotation results.

    Future works:

    1. It would be helpful to develop a single- or two-stage model, which would greatly speed up training and inference.
    2. It would be helpful to develop a model with fewer parameters that can be deployed on mobile devices, such as smartphones and laptops without a GPU. In that setting, handheld probes could be connected to mobile devices and predict the probability directly after imaging.
    3. It would be helpful if the authors could release the dataset in the future, which would attract more researchers to this problem.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors are:

    1. a new and exciting task
    2. impact on the community and society

    The paper brings a good problem and application to the community, which pushes the rating to the positive side. However, some non-negligible weaknesses keep the score at a ‘weak accept’.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors provide the details of validation in their rebuttal.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors present a novel Doppler processing pipeline targeted at low-resource settings. All reviewers value the scope of the paper and the fact that it is well written. At the same time, the reviewers have pointed out a lack of clarity in specific sections. I suggest the authors focus on addressing those points, particularly the description of the training/validation procedure and the statistical testing. Moreover, I suggest the authors read the many insightful comments of the reviewers carefully and try to address them all.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10




Author Feedback

We thank the reviewers for the thorough and positive evaluation and their constructive feedback. Please find below the answers to your questions and concerns:

• Reviewer #1 and the Meta-Reviewer requested clarifications about the training/validation procedure. We trained/validated the preprocessing steps on 95 of the 591 subjects, which had expert-annotated labels, applying 5-fold cross-validation with an 80:20 split ratio and balanced numbers of positive/negative cases; we then tested on the remaining 496 of the 591 subjects. The 591 subjects contributed 8,403 videos, including 2,136 videos in the A4CC or PLAXC view. The number of frames per video varied, as is typical in clinical practice, and the frames used also varied with the task: the first frame was used to detect the view, all frames were used for frame selection, and only the frames during ventricular systole were used to localize the atrium and detect RHD. The training/validation/test images for each task were: 1) view detection: 1,578/390/6,435; 2) frame selection: 9,806/2,450/84,698; 3) atrium localization: 2,512/628/29,332. We found a typo in the paper: RHD was detected using videos (4D data), not images. Regarding the concern about data overlap, we emphasize that images from the same patient were never shared between the training, validation, and test sets.

• Reviewer #1 asked about hyperparameters. We considered the number of epochs, batch size, and learning rate in the ranges [20, 600], {10, 32, 64}, and {1e-5, 1e-4, 1e-3}, respectively. The best configuration was selected by the maximum model accuracy on the validation data, and we used these hyperparameters to retrain the model on the entire training set (a sketch of this procedure follows below).
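A schematic of this selection procedure (the specific epoch values sampled within [20, 600] and the train_and_score callable are simplified for illustration, not our exact implementation):

```python
from itertools import product

import numpy as np
from sklearn.model_selection import StratifiedKFold

def grid_search(X, y, train_and_score):
    """X, y: numpy arrays of inputs and binary labels.
    train_and_score(X_tr, y_tr, X_va, y_va, epochs, batch, lr) -> validation
    accuracy (a simplified stand-in for one training run)."""
    grid = product([20, 100, 300, 600],   # epochs, sampled within [20, 600]
                   [10, 32, 64],          # batch size
                   [1e-5, 1e-4, 1e-3])    # learning rate
    # 5 folds give the 80:20 split; stratification balances +/- cases per fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    best_cfg, best_acc = None, -np.inf
    for cfg in grid:
        accs = [train_and_score(X[tr], y[tr], X[va], y[va], *cfg)
                for tr, va in cv.split(X, y)]
        if np.mean(accs) > best_acc:
            best_cfg, best_acc = cfg, float(np.mean(accs))
    return best_cfg  # then retrain once on all of (X, y) with this configuration
```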
• Reviewer #1 is correct: every preprocessed video had 16 frames of one of the two views. We used multiple input color channels, hence the 3D architecture for 4D data. Finally, subjects have more than one video for each A4CC/PLAXC view, which can also include multiple ventricular systoles; the maximum voting strategy used all of these data.

• Reviewer #1 asked about the statistical tests. The “*” and “**” symbols mark the comparisons on the validation data between the model ensemble and the 3D CNNs and Transformers, respectively, using the Wilcoxon signed-rank test (a sketch follows this response). This will be clarified in the table footnote. We only reported ensemble results on the test data.

• Reviewer #1 asked us to clarify RHD vs. MR detection. MR is a typical manifestation of RHD; however, MR may not be seen in all RHD cases, e.g., those with aortic regurgitation (ref. [3]). In our dataset, 22 subjects did not have MR, including 17 normal and 5 RHD cases.

• Reviewer #1 requested evidence about minimal training for data acquisition. Ref. [6] shows that nurses without prior experience were trained within 4-5 days.

• Regarding technical novelty, we agree with Reviewers #2 and #3 that our method is built on state-of-the-art approaches. We emphasize that we proposed a strong and unique clinical application with life-saving potential in low-resource settings. To succeed in this complex setting with hand-held devices, technical novelty was also needed, as itemized in the Introduction: harmonization of highly variable data, and the extraction and combination of spatiotemporal information with an attention mechanism and early fusion of multiple views for effective decision making.

• Reviewer #3 was concerned about propagating errors. We provided results at each step of the approach to show the impact of errors (Table 1). Note that the final results, with aggregated errors, are comparable to the work of experts.

• Reviewer #3 suggested more evaluations. Note that we compared our method with expert clinicians and with the state of the art in ref. [19], which is the only related approach for our application. We agree that additional ablation studies would be valuable; however, the rebuttal guidelines prevent us from presenting new experimental results or substantially changing the structure of the paper.
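For illustration, a minimal sketch of the paired Wilcoxon comparison described above (the scores below are placeholders, not our results):

```python
from scipy.stats import wilcoxon

# Sketch of the paired test: ensemble vs. a single branch on matched
# validation scores. The numbers are illustrative placeholders only.
ensemble_scores = [0.80, 0.79, 0.77, 0.81, 0.78, 0.83, 0.76]
cnn3d_scores    = [0.73, 0.78, 0.72, 0.77, 0.75, 0.81, 0.70]
stat, p = wilcoxon(ensemble_scores, cnn3d_scores)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p:.3f}")  # '*' marks p < 0.05
```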




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed all major points in their rebuttal and now all reviewers recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers agree to accept the paper after the rebuttal. My only comment would be to improve the figures and to use vector graphics instead of low-resolution pixel images.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal is well articulated and all reviewers agree to accept this paper after the rebuttal. The authors are requested to improve the quality of the figures.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


