
Authors

Asmaa Aljuhani, Ishya Casukhela, Jany Chan, David Liebner, Raghu Machiraju

Abstract

Advances in digital pathology and deep learning have enabled robust disease classification, better diagnosis, and prognosis. In real-world settings, readily available and inexpensive image-level labels from pathology reports are weak, which seriously degrades the performance of deep learning models. Weak image-level labels do not represent the complexity and heterogeneity of the analyzed WSIs. This work presents an importance-based sampling framework for robust histopathology image analysis, Uncertainty-Aware Sampling Framework (UASF). Our experiments demonstrate the effectiveness of UASF when used to grade a highly heterogeneous subtype of soft tissue sarcomas. Furthermore, our proposed model achieves better accuracy when compared to the baseline models by sampling the most relevant tiles.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_36

SharedIt: https://rdcu.be/cVRrT

Link to the code repository

https://github.com/machiraju-lab/UA-CNN

Link to the dataset(s)

cancer.gov/tcga

https://cancerimagingarchive.net/datascope/cptac/home/?filter=%7B%22Tumor%22:%5B%22SAR%22%5D,%22Image_Available%22:%5B%22Yes%22%5D%7D


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a novel methodology for addressing one of the main challenges when analyzing WSIs in computer-aided diagnosis systems, which is focusing on the right patches to classify.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The methodology seems correct and it’s interesting. The paper is clear and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors should compare their approach to other current state-of-the-art techniques, such as applying the Blue Ratio to extract only the most relevant patches from each WSI.

    Apart from that, the number of WSIs used is very small and should be increased.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Currently, there are many publicly available datasets with thousands of WSIs, such as TCGA-PRAD and PANDA. I would recommend that the authors increase the number of WSIs used in order to evaluate their methodology in a more robust and realistic scenario.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper has some flaws and some improvements have been suggested, its current state is good enough. The results may not be as good as they could be (mainly due to the small number of WSIs used), but future work could address this and provide an improvement over current alternatives for detecting relevant areas of the WSI.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
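Reviewer #1 suggests the Blue Ratio as a baseline for extracting the most relevant patches from each WSI. The sketch below shows what such a baseline might look like. It assumes the commonly used per-pixel definition BR = (100·B / (1 + R + G)) · (256 / (1 + R + G + B)) on 8-bit RGB tiles, and it ranks tiles by their mean BR. The function names are hypothetical, and the sketch is not part of the paper's method.

```python
import numpy as np


def blue_ratio(rgb: np.ndarray) -> np.ndarray:
    """Per-pixel blue ratio of an RGB image (H x W x 3, values 0-255)."""
    # Cast first so uint8 inputs do not overflow in the sums below.
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))


def top_k_tiles_by_blue_ratio(tiles, k):
    """Return indices of the k tiles with the highest mean blue ratio."""
    scores = np.array([blue_ratio(t).mean() for t in tiles])
    # argsort is ascending; reverse it for descending order and keep k.
    return np.argsort(scores)[::-1][:k].tolist()
```

Consider a hematoxylin-rich (bluish) tile, a pinkish tile and a white background tile. With these inputs, the bluish tile ranks first and the background tile ranks last. As the authors point out in their feedback, this intensity heuristic says nothing about prediction uncertainty and can miss tissue with low blue intensity, such as fat.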



Review #2

  • Please describe the contribution of the paper

    This paper tackles weakly supervised histology image classification. The authors propose a two-stage training framework to train a tile-level classifier with whole-slide image labels. The main idea is to use an uncertainty-aware CNN (UACNN) trained with noisy labels to sample the most diagnostically relevant tiles for each WSI, and then to train another CNN on the sampled tiles for better tile classification performance. The motivation is clear, and the technique is sound.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – Simplicity. The models used for training are all basic convolutional neural networks, so the method is easy to implement.
    – Significant improvement. The two-stage training strategy delivers a significant accuracy improvement in histology image classification over the baseline defined by the authors.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The novelty may be overstated. Uncertainty has already been used in histopathology image analysis, e.g., [1].
    – The dataset is small and incomplete. A total of 85 WSIs are used in the experiments. The 85 WSIs are categorized into 3 grades, but cancer-free slides, which are usually important for weakly supervised WSI analysis, are absent. This absence strongly affects the models trained with noisy labels, i.e., the baseline methods.
    – Comparison with related works is lacking. Quite a few studies in this domain aim to solve the problem of weakly supervised histopathology image classification, e.g., [2-4]. Their goal is very similar to that of this paper, but none of them is compared against.

    [1] Thiagarajan P, Khairnar P, Ghosh S. Explanation and Use of Uncertainty Obtained by Bayesian Neural Network Classifiers for Breast Histopathology Images. IEEE Transactions on Medical Imaging, 2021.
    [2] Li J, Chen W, Huang X, et al. Hybrid Supervision Learning for Pathology Whole Slide Image Classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2021: 309-318.
    [3] Lerousseau M, Vakalopoulou M, Classe M, et al. Weakly Supervised Multiple Instance Learning Histopathological Tumor Segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2020: 470-479.
    [4] Lerousseau M, Classe M, Battistella E, et al. Weakly Supervised Pan-Cancer Segmentation Tool. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2021: 248-256.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The histopathology images are obtained from TCGA database. The authors say the code will be made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    – Evaluate the proposed method on a larger dataset and compare it with SOTA weakly supervised tile classification/WSI segmentation methods.
    – The ResNet18 trained with noisy labels (WSI labels) for tile classification was used as the baseline; it can be regarded as the lower bound of the proposed method. In addition, a ResNet18 trained with tile-level labels, which can be regarded as the upper bound for weakly supervised methods, should also be evaluated and compared in Table 1. These experiments should not be difficult to conduct, as the slides appear to be finely annotated by pathologists.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    – The dataset is small and incomplete for this study.
    – Comparison with important related works is lacking.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
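Reviewer #2 summarizes the framework as an uncertainty-aware CNN that samples the most relevant tiles, which then train a second-stage classifier. The sketch below illustrates that sampling step under stated assumptions:

- Uncertainty comes from Monte Carlo dropout, i.e., T stochastic forward passes already collected as a (T, N, C) array of softmax outputs.
- The uncertainty measure is predictive entropy.
- Each slide keeps a fixed fraction of its lowest-entropy tiles.

The functions also return the predictive confidence max_c P(y = c | x, D), approximated by the Monte Carlo mean. The function names and `keep_fraction` are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def mc_predictive_stats(mc_probs: np.ndarray):
    """Summarize Monte Carlo dropout samples.

    mc_probs: shape (T, N, C), softmax outputs from T stochastic forward
    passes over N tiles and C classes.
    Returns the predictive mean (N, C), predicted class (N,), its
    probability max_c P(y=c | x, D) (N,), and predictive entropy (N,).
    """
    mean = mc_probs.mean(axis=0)
    pred = mean.argmax(axis=1)
    conf = mean.max(axis=1)
    # A small epsilon avoids log(0) for one-hot predictions.
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)
    return mean, pred, conf, entropy


def sample_low_uncertainty_tiles(entropy, slide_ids, keep_fraction=0.5):
    """For each slide, keep the tiles with the lowest predictive entropy."""
    selected = []
    for slide in np.unique(slide_ids):
        idx = np.flatnonzero(slide_ids == slide)
        n_keep = max(1, int(round(keep_fraction * idx.size)))
        # Sort this slide's tiles by ascending entropy and keep n_keep.
        order = idx[np.argsort(entropy[idx], kind="stable")]
        selected.extend(order[:n_keep].tolist())
    return sorted(selected)
```

In this sketch, the selected tiles would inherit their slide's weak label and form the training set for the second-stage CNN. Thresholding entropy, rather than keeping a fixed fraction, would be an equally plausible design choice.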



Review #3

  • Please describe the contribution of the paper

    This paper presents an uncertainty-guided sampling approach for efficient training of tissue classification in whole-slide imaging. The proposed method uses weak-labeling for tiles with low uncertainty to improve classification accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Uncertainty estimation with Bayesian neural networks is a powerful tool in medical image analysis, which this paper uses to select highly informative samples from otherwise weakly labeled WSI. The method is straightforward and the experimental results suggest its effectiveness.

    I appreciate the use of statistical tests to show the significance of the results. This should become the standard in medical imaging with deep learning.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No major weaknesses, only minor comments (see below).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper uses a publicly available dataset and the methods are described sufficiently well.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The “1” in the second paragraph of Section 1 seems to be a typo.
    • All figure axis labels are very small and can only be read with digital zoom. Try to increase the label size to at least 8 pt.
    • I assume that the prediction probability refers to $\max P(y^* \mid x, D)$, but I could not find this stated in the paper.
    • What do the values in Tab. 1 represent? Mean ± std? Please state this.
    • I would appreciate some sentences or a formula on the loss functions SCE and OR. This would make the paper more self-contained.
    • Beyond the scope of this paper: I wonder how well the uncertainties are calibrated, i.e., how well they correlate with the predictive error.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper uses a well-known method from variational inference to increase efficiency and accuracy in WSI classification. It does not propose a novel method at its core, but presents a solid empirical evaluation of Bayesian methods with a nice application in digital pathology. I vote for accept.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
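Reviewer #3 asks for a sentence or formula on the SCE and OR losses. Assuming SCE denotes the symmetric cross entropy of Wang et al. (ICCV 2019), the loss is $\mathcal{L}_{SCE} = \alpha \mathcal{L}_{CE} + \beta \mathcal{L}_{RCE}$. The reverse term is $\mathcal{L}_{RCE} = -\sum_k p(k \mid x) \log q(k \mid x)$, where $q$ is the one-hot label distribution and $\log 0$ is replaced by a constant $A$.

The NumPy sketch below implements that definition. The default values of `alpha`, `beta` and `log_zero` are placeholders, not the paper's settings. OR is not expanded anywhere on this page, so it is not sketched.

```python
import numpy as np


def symmetric_cross_entropy(probs, labels, alpha=0.1, beta=1.0, log_zero=-4.0):
    """Mean SCE loss, alpha * CE + beta * RCE, over a batch.

    probs:    (N, C) predicted class probabilities (rows sum to 1).
    labels:   (N,) integer class labels.
    log_zero: the constant A substituted for log(0) in the reverse term.
    """
    n = probs.shape[0]
    p_true = probs[np.arange(n), labels]
    # Standard cross entropy -log p(y | x), clipped to avoid log(0).
    ce = -np.log(np.clip(p_true, 1e-7, 1.0))
    # With a one-hot label, log q = 0 for the true class and A elsewhere,
    # so the reverse cross entropy reduces to -A * (1 - p(y | x)).
    rce = -log_zero * (1.0 - p_true)
    return float(np.mean(alpha * ce + beta * rce))
```

The reverse term is bounded by $-A$ per sample, which is why SCE is commonly described as more tolerant of noisy labels than cross entropy alone.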




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Well-written paper with a clear rationale and a straightforward approach. Although uncertainty estimation has been used before in histopathology, this paper proposes a sound framework for using it to select tiles for two-stage MIL training. Like Reviewer #3, I would be interested to know whether the uncertainties are calibrated, as that would be very useful in practice. I am also curious about the runtime of the method. The main weakness of the paper is that it is tested on only one relatively small dataset and does not make comparisons with SOTA alternatives.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2
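Reviewer #3 and the meta-reviewer both ask whether the uncertainties are calibrated. One common check, not one reported in the paper, is the expected calibration error (ECE) over equal-width confidence bins. The sketch below assumes two per-tile inputs:

- a confidence, for example the maximum of the Monte Carlo mean probability;
- a correctness flag against the tile label.

The choice of 10 bins is arbitrary.

```python
import numpy as np


def expected_calibration_error(confidence, correct, n_bins=10):
    """Equal-width-bin ECE: sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|.

    confidence: (N,) predicted-class probability for each tile.
    correct:    (N,) booleans, True where the prediction matched the label.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bins are (lo, hi], except that a confidence of 0 lands in the first bin.
    bins = np.clip(np.digitize(confidence, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of tiles in the bin
    return float(ece)
```

An ECE near 0 means that confidence tracks accuracy. Relating predictive entropy directly to error, as the authors do in their feedback, is a complementary view.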




Author Feedback

Dear Reviewers (R1, R2, R3, and Meta-reviewer),
We truly appreciate your positive and constructive feedback and will carefully incorporate it in the final version. This includes fixing typos, adjusting the figures’ axis labels, adding the loss function and prediction probability formulas (R3), and adding an implementation section reporting model training and prediction performance (Meta-reviewer). We plan to extend our analysis to all 6 subtypes of the TCGA-SARC dataset, covering 200 cases.

Response to Reviewer #1: As recommended by R1, we will compare our uncertainty-aware approach with SOTA methods. It is important to highlight that the suggested Blue Ratio and similar measures are of limited use here. Tissue structure varies widely across subtypes, and the tissue often includes cellular compartments with low blue intensity (e.g., fat cells). Furthermore, these measures do not capture the uncertainty we wish to extract from tile predictions.
Response to Reviewer #2: Regarding the size of the dataset, TCGA-SARC is a relatively small cohort, as sarcomas constitute only about 1% of all adult cancers. Our work presents a method for handling small datasets robustly, which reduces the reliance of weakly supervised methods on large amounts of training data.

R2 noted the lack of normal controls. We would like to clarify that TCGA-SARC contains only tumor slides, although these include non-tumor regions. Diagnostic grades serve as weak labels for entire slides, while the tile labels come from annotations demarcating tumor and normal regions. Therefore, a direct comparison of tumor and grade classifiers would be unreasonable. However, a grade classifier can infer tumor tiles from their low uncertainty.

Further, we incorporated the CPTAC dataset, an additional sarcoma dataset that provides tumor and normal WSI labels. Our approach showed promising results, with 87% accuracy and F1-scores of 0.91 for normal and 0.76 for tumor. We found that tiles with low predictive uncertainty for the three grade classes represent tumor. This additional validation will be incorporated in the final paper.

We also thank R2 for the additional references. Given the nature of our approach and dataset, we focus on processing small, weakly labeled datasets to improve classification accuracy. The referenced papers [2, 3, 4] propose methods for substantially larger datasets. Paper [1] had a similar sample size and a technique similar to our approach; we will discuss the similarities and differences in the related work section.

Response to Reviewer #3: Regarding the correlation between uncertainties and their predictive errors, we found that a quadratic polynomial model fit the data better than a linear model, as the quadratic model had a higher R-squared.
We also noticed that grade 2 showed lower uncertainty than the other classes, as shown in Figure 4. This suggests that adding samples from the minority classes would improve classification performance.

Response to Meta-reviewer: Our initial SOTA comparisons included different loss functions previously used to handle noisy data, such as SCE and OR. However, we acknowledge the value of expanding the comparisons to other weakly supervised approaches, like Chowder [Courtiol et al., 2018], and to widely studied weakly labeled datasets, like the Kather et al. dataset.
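The authors' reply to Reviewer #3 compares linear and quadratic fits of predictive error against uncertainty by R-squared. The sketch below shows one way to run that comparison. It uses hypothetical synthetic data in place of the paper's measurements.

The quadratic model nests the linear one, so its least-squares R-squared can never be lower. For that reason, the sketch also reports adjusted R-squared, which penalizes the extra term.

```python
import numpy as np


def polyfit_r2(x, y, degree):
    """Least-squares polynomial fit; returns (R^2, adjusted R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n, p = x.size, degree  # p = number of predictors (x, x^2, ...)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2


# Hypothetical data: x stands in for per-tile uncertainty, y for an error measure.
x = np.linspace(0.0, 1.0, 20)
y = x ** 2
for degree in (1, 2):
    print(degree, polyfit_r2(x, y, degree))
```

Held-out error, via cross-validation, would be another way to check that the quadratic term is not simply fitting noise.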


