
Authors

Mahdi Gilany, Paul Wilson, Amoon Jamzad, Fahimeh Fooladgar, Minh Nguyen Nhat To, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi

Abstract

MOTIVATION: Detection of prostate cancer during transrectal ultrasound-guided biopsy is challenging. The highly heterogeneous appearance of cancer, the presence of ultrasound artefacts, and noise all contribute to these difficulties. Recent advancements in high-frequency ultrasound imaging (micro-ultrasound) have drastically increased the capability of tissue imaging at high resolution. Our aim is to investigate the development of a robust deep learning model specifically for micro-ultrasound-guided prostate cancer biopsy. For the model to be clinically adopted, a key challenge is to design a solution that can confidently identify cancer while learning from coarse histopathology measurements of biopsy samples that introduce weak labels. METHODS: We use a dataset of micro-ultrasound images acquired from 194 patients who underwent prostate biopsy. We train a deep model using a co-teaching paradigm to handle label noise, together with an evidential deep learning method for uncertainty estimation. We evaluate the performance of our model using the clinically relevant metric of accuracy vs. confidence. RESULTS: Our model achieves a well-calibrated estimation of predictive uncertainty, with an area under the curve of 88%. The use of co-teaching and evidential deep learning in combination yields significantly better uncertainty estimation than either alone. We also provide a detailed comparison against the state of the art in uncertainty estimation.
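
One plausible reading of the "accuracy vs. confidence" evaluation named in the abstract: discard predictions whose confidence falls below a threshold, compute accuracy on what remains, and sweep the threshold to trace a curve (the reported 88% AUC would summarize such a curve). A minimal NumPy sketch under that reading; function and variable names are illustrative, not the authors' code:

    import numpy as np

    def accuracy_vs_confidence(confidences, predictions, labels, thresholds):
        """At each threshold, reject predictions whose confidence falls
        below it and report accuracy on the retained subset."""
        accs = []
        for t in thresholds:
            keep = confidences >= t
            if keep.sum() == 0:
                accs.append(np.nan)  # nothing retained at this threshold
            else:
                accs.append((predictions[keep] == labels[keep]).mean())
        return np.array(accs)

    # Toy example: predictions correct with probability equal to their
    # confidence, so accuracy should rise with the threshold (good calibration).
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, 1000)
    labels = rng.integers(0, 2, 1000)
    correct = rng.uniform(size=1000) < conf
    preds = np.where(correct, labels, 1 - labels)
    curve = accuracy_vs_confidence(conf, preds, labels, np.linspace(0.5, 0.95, 10))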

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_40

SharedIt: https://rdcu.be/cVRv5

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This study proposes a learning model for PCa detection using micro-ultrasound that can provide an estimate of its predictive confidence and is robust to weak labels and OOD data. The proposed model uses a co-teaching paradigm to handle noise in labels, together with an evidential deep learning method for uncertainty estimation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main strength of the paper is that the problem addressed is directly clinically relevant, and the evaluation experiments also take clinical concerns into account.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main weakness of the paper is that a comparison with other studies should be added, especially for the final heatmap.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

(1) This paper proposes a micro-ultrasound PCa detection model that is robust to weak labels and OOD samples. Weak labels and OOD samples are common problems in medical classification tasks; it would therefore be better to provide a comparison with DNN models that have been proposed to address them. (2) The clinical evaluation metrics used are an important part of this manuscript. Please add some references. (3) For accuracy and calibration error, please add the standard deviation. (4) In Section 3.3, the heatmap can provide good biopsy targets, which is critical for the adoption of precision biopsy targeting using TRUS. This finding is important, but it would be better to add a comparison of heatmaps produced by different models.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is clinically interesting.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

The authors presented the development of a deep learning model with a focus on micro-ultrasound-guided prostate cancer biopsy. This framework was tested on a dataset of micro-ultrasound images from 194 patients. The results from the approaches assessed (evidential deep learning (EDL) and EDL + co-teaching) show promise in aiding the detection of prostate cancer.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The development of a predictive algorithm to assist in the detection of prostate cancer using micro-ultrasound, without the need for additional MRI, which can be expensive and prone to co-registration error, is a welcome direction.
2. The paper is well organized and presented. The “Methodology” section, aided by the well-executed Figure 1, is especially easy to follow.
    3. Given the available data, the authors presented a thoughtful and well-designed validation experiment.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. A major component of the study is the comparison of different uncertainty methods. While this reviewer appreciates the trends and comparisons of the different methods presented in Figure 2, the lack of data in tabular form makes it challenging to assess the full performance profile. In particular, the upper bound of the confidence threshold for the Evidential approach in Figure 2(a) is interesting in comparison to its counterparts.
2. There appears to be a lack of discussion of the performance difference between EDL and EDL + Co-teaching in terms of sensitivity and specificity, and of whether such differences are clinically relevant, especially given that the impact of co-teaching is of particular interest in the study.
3. For the results presented in Figure 3, additional information and discussion could be provided if the heatmaps were compared across confidence thresholds via quantitative overlap measures such as Jaccard or Dice against a baseline configuration (a minimal sketch of such measures follows this list).
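
For concreteness, a hedged sketch of the overlap measures suggested in point 3, applied to heatmaps binarized at a chosen confidence threshold (names are illustrative, not from the paper):

    import numpy as np

    def dice(a, b):
        """Dice overlap between two binary masks (e.g., heatmaps binarized
        at a confidence threshold); assumes at least one mask is non-empty."""
        inter = np.logical_and(a, b).sum()
        return 2.0 * inter / (a.sum() + b.sum())

    def jaccard(a, b):
        """Jaccard (IoU) overlap between two binary masks."""
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union
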
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility would be a challenge that requires significant effort.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. This reviewer would encourage the authors to minimize the use of acronyms, which would enhance the overall readability of the paper.
2. Figure 2(c), compared to its (a) and (b) counterparts, lacks a legend.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper provides an interesting methodology with strong validation design and moderately strong validation results.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

The authors present a framework for training a classification model to detect prostate cancer from micro-ultrasound images which accounts for limitations in the quality of the ground truth, specifically weak and out-of-distribution labels. Furthermore, the inference component of the framework provides uncertainty estimates for predictions at adjustable levels of confidence.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main strengths are the methods for adapting to sub-optimal training data, which reflects a challenge in most real-world machine learning tasks in medical imaging, and the means of interpretability provided by adjustable confidence thresholds, which integrate better with an operator’s decision making.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

A performance comparison with conventional ultrasound would be an appropriate gold standard, if data were available in a similar patient population. While performance in terms of confidence and accuracy is compared across model variations, a more clinically relevant assessment should include actual decision making across a group of operators. For instance, if an operator were given control to adjust the confidence threshold for given images in a test set: 1) would they take a biopsy, and 2) where would they take the biopsy?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have provided code and details on parameter selection that may allow others to reproduce performance, although only if similar data and ground-truth information can be found elsewhere.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is very clearly written. The motivation for adopting their framework appears well informed by the clinical problem and data available for training. I eagerly look forward to performance comparisons with standard of care TRUS and mpMRI-fusion biopsies.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The major factors in my recommendation were the focus on an excellent clinical problem and the appreciation for the advantages of confidence thresholding in order to guide operators when using this model in clinical practice. This level of pragmatism baked into the framework, coupled with the encouraging performance, motivated my high score for this paper.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Clearly presented work with sound methodology and validation. Agreeing with all the reviewers, I also recommend acceptance.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

NR




Author Feedback

Reviewers acknowledged the “clarity of the paper, significance of the clinical problem, level of pragmatism in the methodology, and well-designed experiments”. They asked for more information on comparison with conventional ultrasound and mp-MRI, and with other weak-label and out-of-distribution (OOD) methods. They also asked for clarification of the clinical assessment metrics, and requested minor edits to details of figures, text, and references, which we will address in the camera-ready version.

Comparison with conventional ultrasound and mp-MRI As R3 pointed out, there is no matching data to directly compare the results of this study to conventional ultrasound and/or mp-MRI. However, substantial previous literature and large multi-centre trials (e.g., PROMIS, [Ahmed-2017]) report the low sensitivity of conventional transrectal ultrasound and systematic biopsies at 42-55%. A recent meta-analysis of published studies with micro-ultrasound showed that micro-ultrasound identifies clinically significant cancer that would not have been detected by mp-MRI fusion biopsy or systematic biopsy. It also showed the utility of micro-ultrasound in both biopsy-naïve and repeat-biopsy patients. In synthesized data from 18 studies and 1125 patients, micro-ultrasound-guided prostate biopsy resulted in detection rates for prostate cancer diagnosis comparable to those of mp-MRI [Sountoulides-2021]. We will add this discussion and references to the manuscript within the page limit.

Comparison of weak-label and OOD methods We compare our approach to other methods for weak labels and OOD, separately. For OOD, we limit ourselves to methods that provide prediction confidence and compare several state-of-the-art approaches in that category, including ensemble methods [Lakshminarayanan-2017], Monte-Carlo dropout [Gal-2016], and evidential deep learning [Sensoy-2018]. For weak-label methods, we rely on the findings of [Javadi-2021], showing the success of the co-teaching method, and [To-2021], which found that co-teaching significantly outperformed other methods such as robust loss functions. Therefore, we only implemented this framework. Further comparison with other weak-label methods is a direction that we will consider for future work.
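
For readers unfamiliar with co-teaching, the core idea is that two networks are trained simultaneously, and each selects the small-loss (likely clean-label) samples in a mini-batch to update the other. A minimal PyTorch sketch of one training step, as an illustration of the general technique rather than the authors' implementation:

    import torch
    import torch.nn.functional as F

    def co_teaching_step(net1, net2, opt1, opt2, x, y, forget_rate):
        """One co-teaching update: each network ranks the batch by its own
        per-sample loss, keeps the small-loss fraction, and the *other*
        network is trained on that subset."""
        n_keep = max(1, int((1.0 - forget_rate) * len(y)))

        # Rank samples by loss without building a gradient graph.
        with torch.no_grad():
            loss1 = F.cross_entropy(net1(x), y, reduction="none")
            loss2 = F.cross_entropy(net2(x), y, reduction="none")

        # Indices of the samples each network believes are clean.
        idx1 = torch.argsort(loss1)[:n_keep]
        idx2 = torch.argsort(loss2)[:n_keep]

        # Cross-update: net1 learns from net2's selection, and vice versa.
        opt1.zero_grad()
        F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
        opt1.step()

        opt2.zero_grad()
        F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
        opt2.step()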

Clinical evaluation metrics We evaluate our model using patch-wise and core-wise accuracies, where core-wise predictions are aggregated from patch-wise predictions within a core. These metrics have been previously reported in the literature [e.g., Javadi-2021]. We will add the references in the camera-ready version.
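
As a hedged illustration of this aggregation (the exact rule is not stated here; mean pooling is one common choice), core-wise predictions could be derived from patch-wise probabilities as follows:

    import numpy as np

    def core_wise_predictions(patch_probs, core_ids, threshold=0.5):
        """Aggregate patch-level cancer probabilities into one label per
        biopsy core by averaging within each core and thresholding the mean."""
        per_core = {}
        for prob, core in zip(patch_probs, core_ids):
            per_core.setdefault(core, []).append(prob)
        return {core: int(np.mean(probs) >= threshold)
                for core, probs in per_core.items()}

    # Example: two cores, each with three patches.
    print(core_wise_predictions([0.9, 0.8, 0.7, 0.1, 0.2, 0.3],
                                ["A", "A", "A", "B", "B", "B"]))
    # {'A': 1, 'B': 0}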

Prospective outcomes for clinical decision making Although assessing the outcome of actual decision making based on the proposed methodology will be the most important clinical metric, influencing the decision of clinicians based on our early-stage study needs further approval from the medical ethics board. Our model has only been applied to retrospective clinical data. Prospective studies are the subject of immediate future work.


