
Authors

Dilek M. Yalcinkaya, Khalid Youssef, Bobak Heydari, Orlando Simonetti, Rohan Dharmakumar, Subha Raman, Behzad Sharif

Abstract

Dynamic contrast-enhanced (DCE) cardiac magnetic resonance imaging (CMRI) is a widely used modality for diagnosing myocardial blood flow (perfusion) abnormalities. During a typical free-breathing DCE-CMRI scan, close to 300 time-resolved images of myocardial perfusion are acquired at various contrast “wash in/out” phases. Manual segmentation of myocardial contours in each time-frame of a DCE image series can be tedious and time-consuming, particularly when non-rigid motion correction has failed or is unavailable. While deep neural networks (DNNs) have shown promise for analyzing DCE-CMRI datasets, a “dynamic quality control” (dQC) technique for reliably detecting failed segmentations is lacking. Here we propose a new space-time uncertainty metric as a dQC tool for DNN-based segmentation of free-breathing DCE-CMRI datasets by validating the proposed metric on an external dataset and establishing a human-in-the-loop framework to improve the segmentation results. In the proposed approach, we referred the top 10% most uncertain segmentations as detected by our dQC tool to the human expert for refinement. This approach resulted in a significant increase in the Dice score (p<0.001) and a notable decrease in the number of images with failed segmentation (16.2% to 11.3%) whereas the alternative approach of randomly selecting the same number of segmentations for human referral did not achieve any significant improvement. Our results suggest that the proposed dQC framework has the potential to accurately identify poor-quality segmentations and may enable efficient DNN-based analysis of DCE-CMRI in a human-in-the-loop pipeline for clinical interpretation and reporting of dynamic CMRI datasets.
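The referral strategy described in the abstract (send the top 10% most uncertain segmentations to a human expert) can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation; the function and variable names are hypothetical, and the per-frame uncertainty scores are assumed to come from the dQC metric:

```python
import numpy as np

def select_for_referral(uncertainty, budget=0.10):
    """Return indices of the most uncertain frames, up to the referral budget.

    uncertainty : 1-D array of per-frame dQC scores (higher = less certain)
    budget      : fraction of frames a human expert can review (e.g. 0.10)
    """
    n_refer = max(1, int(round(budget * len(uncertainty))))
    # argsort sorts ascending; take the last n_refer indices (largest scores)
    return np.argsort(uncertainty)[-n_refer:]

# Toy scores for a 10-frame series; with a 10% budget, one frame is referred.
scores = np.array([0.02, 0.31, 0.05, 0.44, 0.07, 0.01, 0.28, 0.03, 0.09, 0.12])
referred = select_for_referral(scores, budget=0.10)  # -> index of the 0.44 frame
```

The paper's comparison baseline would replace `np.argsort` with a random choice of the same number of frames.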

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_44

SharedIt: https://rdcu.be/dnwBE

Link to the code repository

https://github.com/dyalcink/dQC

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The study focuses on the challenges of manual segmentation of myocardial borders in each time frame, and the lack of reliable quality control techniques for deep neural network (DNN)-based segmentation of DCE-CMRI datasets. The authors propose and evaluate a novel space-time uncertainty metric as a quality control tool for DNN-based segmentation of DCE-CMRI datasets. The proposed metric is validated on an external free-breathing DCE-CMRI dataset, and a human-in-the-loop framework is established to improve the segmentation result with the guidance of the proposed QC tool. The study finds that referring the top 10% most uncertain time frames to the human observer for refinement significantly increases the Dice score for these uncertain frames.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, this study proposes a quality-control technique for DNN-based segmentation of DCE-CMRI datasets and demonstrates its effectiveness in improving segmentation results. The study has important implications for improving the efficiency and accuracy of DCE-CMRI analysis, which can ultimately lead to better diagnosis and treatment of ischemic heart disease.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The study has some limitations regarding novelty in both metrics and methodology, as well as efficiency and accuracy.

    1. The metrics used in the study rely heavily on human involvement, which is a time-consuming process.
    2. The methods used in the study are not particularly innovative in terms of efficiency or accuracy.
    3. The study lacks comparison to state-of-the-art methods, making it difficult to determine the method’s relative effectiveness compared to other approaches.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is not available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The study’s performance was not very impressive. For instance, randomly selecting the same number of time frames from each test case for referral to the human reader did not result in a significant change in the Dice score. This finding suggests that the method may not be effective in improving performance.
    2. Please add more discussion of the potential application. The method involves human correction, which is time consuming. Is it effective in real clinical study?
    3. Please compare with other methods.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The innovation of the method and the experimental results.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Most problems have been solved.



Review #2

  • Please describe the contribution of the paper

    The paper proposes and evaluates a novel “Quality Control (QC)” metric to improve the automatic segmentation of dynamic cardiac MR. The QC tool is based on a 2.5D neural network that segments the target tissue (myocardium) patch-wise. The agreement between the patches’ segmentations is then used as a quality metric for each MR frame. Based on this metric, the authors also propose a human-in-the-loop segmentation correction that aims at improving the clinical procedure.
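The patch-agreement idea this review describes can be sketched as follows. This is a hedged illustration under assumptions, not the authors' implementation: per-pixel binary votes from overlapping patches are accumulated, and the pixel-wise vote variance (disagreement) is averaged to yield a frame-level uncertainty score; `predict_patch` stands in for the trained DNN:

```python
import numpy as np

def patch_agreement_uncertainty(frame, predict_patch, patch=32, stride=16):
    """Uncertainty of one frame from disagreement between overlapping patches.

    frame         : 2-D image of shape (H, W)
    predict_patch : callable mapping a (patch, patch) crop to a binary mask
    Returns the mean per-pixel vote variance over pixels seen by >1 patch.
    """
    H, W = frame.shape
    votes = np.zeros((H, W))   # sum of positive votes per pixel
    counts = np.zeros((H, W))  # number of patches covering each pixel
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            mask = predict_patch(frame[y:y + patch, x:x + patch])
            votes[y:y + patch, x:x + patch] += mask
            counts[y:y + patch, x:x + patch] += 1
    seen = counts > 1
    p = votes[seen] / counts[seen]       # fraction of patches voting "myocardium"
    return float(np.mean(p * (1 - p)))   # Bernoulli variance: 0 when patches agree

# Toy predictor: deterministic thresholding, so all patches agree and the
# uncertainty is exactly 0.
frame = np.random.default_rng(0).random((64, 64))
u = patch_agreement_uncertainty(frame, lambda crop: (crop > 0.5).astype(float))
```

Applying this per time-frame of a DCE series would give the dynamic (space-time) uncertainty profile that the dQC tool thresholds.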

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is novel. The authors evaluate the segmentation agreement for the same pixels between different patches, so that the consistency of one semantic class can be estimated by changing the receptive field.
    • The clinical application can be valuable. The authors justify the constraint of dynamic MR and design a new procedure (human-in-the-loop segmentation correction) within the obtained metric (QC) to overcome this limit.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The quantitative evaluation can be better presented in the manuscript, e.g. a table listing all the comparative results in Section 3.2 will help the reader better follow the paper.
    • The classification of “Difficulty grade” in Section 3.3 may be subjective.
    • (Minor concern) The loop over patches for segmentation may be time-consuming. It would be better to report the inference time per acquisition.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is in-house and the code has not been made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • I would suggest the name “2.5D model” instead of 3D for the DNN since the model considers frames as channels.
    • The patch-wise segmentation may degrade the segmentation quality, so that “S” in Eq. 1 can be imprecise. I would suggest training a vanilla U-Net for frame-wise segmentation to obtain the value of “S”.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method and the application is novel while the method and the evaluation can be improved as noted in Weakness.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns in the rebuttal. I believe that the paper has been improved, but I would like to maintain my recommendation.



Review #3

  • Please describe the contribution of the paper

    A novel space-time uncertainty metric as a QC tool for a deep-learning segmentation task on a DCE-CMRI dataset. The tool is paired with a human-in-the-loop framework to retrieve guidance from clinicians, improve the segmentation results, and raise the Dice score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of a decision support tool is exactly how DL models should be utilised, and with a human in the loop this paper is interesting and worthy of inclusion.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I feel a diagram to show your pipeline/steps would have been nice to include.

    Furthermore, I do not see all the training parameters or how they were identified as optimal.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is no code available, and the work is therefore not as easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In the conclusion there is mention of active learning. I believe active learning removes some of the images that are highly uncertain, although depending on the implementation perhaps this is not the case. It is nevertheless an interesting path that I would be keen to see, as it would be good to make a model more generalisable and confident in correct predictions rather than discard data it could see in real life.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is interesting, and the direction and aim of what it is trying to achieve is an important aspect of DL work in the field. However, there are some minor drawbacks and improvements that can be made, which would provide a more complete analysis, as suggested in the comments above.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Happy that the authors considered the recommendations for improvements, and happy to accept.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose and evaluate a novel space-time uncertainty metric as a quality control tool for DNN-based segmentation of DCE-CMRI datasets. The proposed metric is validated on an external free-breathing DCE-CMRI dataset, and a human-in-the-loop framework is established to improve the segmentation result with the guidance of the proposed QC tool. The study finds that referring the top 10% most uncertain time frames to the human observer for refinement significantly increases the Dice score for these uncertain frames.

    Strengths of the paper:

    • The proposed method improves the efficiency and accuracy of DCE-CMRI analysis, which can ultimately lead to better diagnosis and treatment of ischemic heart disease.
    • Authors proposed a decision support tool, which utilised a human in the loop for validation.
    • The paper proposes and evaluates a novel metric “Quality Control (QC)”
    • The clinical application can be valuable.

    Weaknesses of the paper:

    • The study has some limitations regarding novelty in both metrics and methodology, as well as efficiency and accuracy.
    • The metrics used in the study rely heavily on human involvement, which is a time-consuming process.
    • The study lacks comparison to state-of-the-art methods, making it difficult to determine the method’s relative effectiveness compared to other approaches.
    • The paper would benefit from adding a diagram to show the pipeline/steps proposed.
    • It would be good to compare the proposed method with other methods.
    • Please add more details of the training parameters and how hyper-parameter tuning was performed.
    • The quantitative evaluation can be better presented in the manuscript
    • The classification of “Difficulty grade” in Section 3.3 may be subjective.

    Recommendation: The paper proposes a novel quality-control method based on uncertainty for segmentation of DCE-CMRI. Reviewers believe there are some points that will need to be revised or included for acceptance.




Author Feedback

We are grateful to the reviewers for their insightful remarks and are highly encouraged that the reviewers have found our proposed technique novel (all reviewers) and considered it to be “exactly how DL models should be utilised with a human in the loop” (R3) that “can be valuable in clinical applications” (R2). Responses to major comments:

— R1: “The study’s performance was not very impressive… randomly selecting the same number of time frames from each test case for referral to the human reader did not result in a significant change in the Dice score… the method may not be effective in improving performance.” This is a misinterpretation of the key result in our work, which is likely due to the convoluted way we had described our results. We will fully update the text to rectify this issue. Our contributions are twofold: (C1) we propose a novel spatiotemporal quality control (QC) metric for test-time evaluation of free-breathing segmentations in a model-agnostic fashion; (C2) we show the impact of this metric on analysis of external DCE-CMRI datasets in a human-in-the-loop setting. Specifically, in a scenario where only 10% of the dataset can be referred to the human reader, we show that although random selection of cases does not improve performance (p>0.5 for Dice), our QC-guided selection results in a significant increase in Dice (p<0.001). Also, the rate of failed segmentations for our approach improves to 11.3% (after human correction of the QC-selection), which is 3.1% lower than what random selection achieves (14.4% failure). Reducing the failure rate is especially important in DCE-CMRI since myocardial blood flow analysis is highly sensitive to failed segmentations, i.e. a 3.1% reduction here is impactful.

— R1: “The metrics used in the study rely heavily on human involvement, which is a time-consuming process.” It is true that segmentation of DCE-CMRI datasets as performed in clinical practice is time consuming. Indeed this is our motivation which aims to reduce the human involvement by an order of magnitude by judiciously selecting a small % of the external dataset to be referred to the reader to increase the efficiency of clinician-A.I. collaboration (by 10 fold if 10% of cases are referred). We will clarify this in the revision.

— R1 suggested comparison with other methods which we agree is important to show. However, we are not aware of an alternative technique to our 1st contribution (C1 above) except for other CMRI applications such as T1 mapping and cine imaging (doi: 10.1109/TBME.2022.3232730). These approaches have only been applied to 2D datasets whereas ours is designed for 2D+time DCE-CMRI and temporal uncertainty localization. For our 2nd contribution (C2), the most relevant work is in brain MRI (doi: 10.1148/ryai.2021200152) and uses random selection of a subset of the external dataset, which our results suggest is inferior for spatiotemporal DCE-CMRI datasets.

— R1 concerns re: innovation: To the best of our knowledge our work is the first using the discrepancy between patch-based segmentations to extract a dynamic QC metric for temporal uncertainty localization. Further, our results describe an impactful application of this QC metric.

— R2 comment re: difficulty grade: Since the guidelines by the leading society (doi: 10.1186/s12968-021-00827-z) do not specify an objective grading system, we devised the “difficulty grade” based on direct clinical feedback (25 cases) from 2 experienced cardiologists. In the revision we will acknowledge that any grading system may induce subjectivity.

— R2 & R3: As suggested, we will improve Fig. 1 (to fully illustrate the pipeline) and will make the code publicly available. Per R2, we will represent our DNN model as 2D+time instead of 3D and add the inference time on a modern workstation. We will provide details of hyperparameters (R3), add a table for quantitative results (R2) & results for training of a vanilla U-Net for frame-wise segmentation (R2).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The original paper was received positively with a number of concerns on metrics, methodology, and comparison to state-of-the-art methods. The rebuttal has addressed the reviewers’ major concerns on methodology and metrics; however, there was no comparison to other methods. While I agree that previous works do not focus on the spatio-temporal aspect, it would have been interesting to have a comparison to techniques for other CMRI applications such as T1 mapping and cine imaging. For these reasons, the recommendation is toward weak reject.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors did a very good job in this rebuttal, which has diminished the major concerns of the reviewers from the original review.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work focused on solving the lack of reliable quality control for the automatic segmentation of dynamic contrast-enhanced MRI. The authors proposed a novel metric, namely quality control, to achieve this goal and improve the segmentation accuracy. I think the methodology and clinical application of this work both look quite interesting for the community. The concerns raised by the reviewers on performance and comparison experiments have been well addressed. I therefore recommend accepting this interesting work.


