
Authors

Puria Azadi, Jonathan Suderman, Ramin Nakhli, Katherine Rich, Maryam Asadi, Sonia Kung, Htoo Oo, Mira Keyes, Hossein Farahani, Calum MacAulay, Larry Goldenberg, Peter Black, Ali Bashashati

Abstract

The utility of machine learning models in histopathology image analysis for disease diagnosis has been extensively studied. However, efforts to stratify patient risk are relatively under-explored. While most current techniques utilize small fields of view (so-called local features) to link histopathology images to patient outcome, in this work we investigate the combination of global (i.e., contextual) and local features in a graph-based neural network for patient risk stratification. The proposed network not only combines both fine and coarse histological patterns but also utilizes their interactions for improved risk stratification. We compared the performance of our proposed model against the state-of-the-art (SOTA) techniques in histopathology risk stratification in two cancer datasets. Our results suggest that the proposed model is capable of stratifying patients into statistically significant risk groups (p < 0.01 across the two datasets) with clinical utility while competing models fail to achieve a statistical significance endpoint (p = 0.148-0.494).

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_74

SharedIt: https://rdcu.be/dnwKx

Link to the code repository

https://github.com/pazadimo/ALL-IN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a method to predict survival from histopathology images by leveraging local and global associations. The proposed method uses graph networks to capture inter-patch relationships and then performs a fine-to-coarse distillation across multiple scales to make a final survival prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a novel way to use a GNN to extract information at multiple scales using graph cuts and orthogonal loss regularizers. They also perform local-to-coarse distillation using three attention mechanisms. They evaluate their method on two datasets and also present an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. How K, the number of super-nodes, is determined is not clear. It is also not clear how this choice would impact performance.
    2. It looks like super-nodes are always bounded spatially because the initial adjacency matrix is built using spatial neighborhood. Would ‘K’ vary without spatial constraints, or would the performance be better with more recent adjacency bases that use feature correlation?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility might be slightly challenging without knowing all parameters (e.g., the number of super-nodes), but the method is reproducible in a reasonable manner.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. More details/analyses on how adjacency was chosen would help.
    2. Empirical parameters, if any, need to be explicitly specified.
    3. Visualization of super nodes with an actual image would be a good way to convey the workflow
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Authors present valuable work with novelty. Few experimental details/clarifications required.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The response does not clarify all the questions raised in the reviews.



Review #2

  • Please describe the contribution of the paper

    In this paper the authors propose a hierarchical GNN formulation, with both local and global interactions, for survival prediction. The authors use several techniques to generate features for the global nodes, then propose three attention-based mechanisms for integrating this information to perform patient stratification into high-risk/low-risk groups or survival prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Prognostic biomarker discovery is important and potentially clinically relevant.
    • The authors compare three attention policies for combining local and global information, and provide intuition for why one is superior to another.
    • The authors use three datasets from three different sites of origin. One lacks survival data, but likely improves the learned representations for the nodes. This makes their result more likely to generalize.
    • The ablation studies help understand which parts of their formulation contribute to the model performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • This contribution is critically limited by the lack of any explanation of train-test splitting or of how the datasets were used. As-is, it reads like the results of model fitting, but provides little information about whether this model would perform on (even in-distribution) held-out test data. Without this, it is unclear if the result is simply due to added “flexibility” in their model. Additionally, they fit the model to their two survival datasets separately (as far as I can tell).
    • Comparison of p-values significance/non-significance is not a good method for determining superiority of one method to another because it depends on statistical power. A more correct method would be comparing the c-indexes or hazard ratios or odds ratios between the methods and showing that one hazard ratio or odds ratio is statistically significantly different from the other.
    • The method used to compute the reported CIs is not clear. Bootstrapping? Within their proposed methods, the CIs clearly overlap the central values of their other methods, which indicates that we should not presume one method superior to another among the three they propose.
    • Prostate cancer has a clinical grading scheme (Gleason) which is prognostic based on visual morphology. It is unclear whether their method simply recapitulates this scoring. Additional exploration of the biology is needed.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Sufficiently reproducible if code is provided. Paper likely does not provide sufficient detail on model architecture/training to replicate otherwise.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see “weaknesses”. If the datasets were combined for training and analyzed on a held-out test cohort (or cross-validated) this would solve the most prominent concern about the ML. For biological interpretation or evidence of the utility of the proposed biomarker, it would help to include e.g. Gleason grade, age, cancer stage as a covariate in their survival models.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evidence in the paper is insufficient to support the author’s claims.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The authors have done a good job of updating information in the paper to clear up how data splitting was performed, which was the most prominent issue with the original manuscript. I still have concerns regarding the statistical validity of their c-index comparison. Specifically, the authors compare over 5 seeds for each of 5 splits. This experimental structure isn’t amenable to using a t-test to compare the groups since the observations are non-independent. Ideally a paired t-test (and its underlying assumptions) should be applied to compare one model to another on the 5 held-out splits.

    I am also concerned about the interpretation that the model identifies a novel prognostic biomarker absent the inclusion of clinical covariates, even though the patients meet some criteria for low-risk.

    I think these are moderate weaknesses, so I’ve moved my assessment to reflect the added information.
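
The paired comparison suggested in this review can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names `fold_means` and `paired_t_statistic` and the example scores are hypothetical, and averaging over seeds within each fold is an assumed way of reducing the 25 runs per model to 5 paired observations. Only the t statistic and its degrees of freedom are computed; a p-value would be read from a t distribution with those degrees of freedom.

```python
import math
from statistics import mean, stdev


def fold_means(scores_by_fold):
    """Average the c-index over seeds within each fold.

    scores_by_fold[f] holds the scores of all seeds on held-out fold f.
    (Assumption: seeds are collapsed so that each fold gives one observation.)
    """
    return [mean(seed_scores) for seed_scores in scores_by_fold]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models evaluated on the same folds."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


# Hypothetical per-fold c-index values (5 folds x 5 seeds), for illustration only.
model_a = [[0.70] * 5, [0.72] * 5, [0.68] * 5, [0.74] * 5, [0.71] * 5]
model_b = [[0.66] * 5, [0.69] * 5, [0.67] * 5, [0.70] * 5, [0.68] * 5]
t_stat, dof = paired_t_statistic(fold_means(model_a), fold_means(model_b))
```

Pairing by fold removes the fold-to-fold difficulty that both models share, which is the reason a paired test is preferred here over an unpaired one on all 25 runs.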



Review #3

  • Please describe the contribution of the paper

    This paper proposes a graph-based neural network that combines global and local features for prostate cancer risk stratification and survival prediction. In addition, it explores three different fine-coarse feature combination strategies and compares the performance with SOTA approaches on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Main strengths:
    1. Uses a graph-based network to better incorporate local patch information and interactions compared to multiple instance learning.
    2. Explores super-nodes to extract global contextual information across the WSI.
    3. Evaluates three different fine-coarse feature distillation strategies to combine the local and global features.
    4. Investigates the usability of the developed model on two datasets and shows promising clinical utility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Main weaknesses:
    1. No discussion of why the Mixed Guided Attention feature distillation method produces the best results.
    2. No description of the training and testing data split is provided.
    3. No statistical significance evaluation is applied to the c-index measurements.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper proposes a graph-based neural network that combines global and local features for prostate cancer risk stratification and survival prediction, and it shows promising clinical utility. However, I have the following questions and comments:
    1. It is not clear how many patients were included in training and testing. Or was cross-validation used?
    2. It would be clearer if the authors provided the statistical significance of the difference between the proposed method and the SOTA methods in terms of the c-index listed in Table 1. DGC seems to have a large variance, and your best method may not be significantly better than it.
    3. It is not clear how the cut-off for the KM curve analysis is set. Please also plot the number of patients in the high- and low-risk groups below the KM curve at different time points.
    4. In Section 3.4, Sn should be of dimension K x d, not M x d.
    5. In Section 3.5, the notation is very confusing. W_share is not shown in the equation. Please also list the dimensions of W and h_hat, and describe more clearly how the MCA method enables weight sharing across local and global features, given that X and S are of different dimensions.
    6. How do you determine the number of super-nodes? How will different numbers of super-nodes affect the final results?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is clearly organized. It is a novel idea of integrating global and local patch information using graph-based model for prostate cancer survival analysis. The application is of clinical significance.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Summary of the Key Strengths and Weaknesses of the Paper:

    Key Strengths:

    • The paper presents a novel approach that utilizes Graph Neural Networks (GNN) to extract information at multiple scales by employing graph cuts and orthogonal loss regularizers.
    • Prognostic biomarker discovery is highlighted as an important and potentially clinically relevant aspect of the study.
    • The authors compare three attention policies for combining local and global information, providing intuitive explanations for the superiority of one over the others.
    • Two datasets from three different sites of origin are used, even though one lacks survival data. This choice likely improves the learned representations and enhances the generalizability of the results.
    • Ablation studies are conducted to understand the contributions of different components in the model formulation.
    • The utilization of a graph-based network enables better incorporation of local patch information and interactions compared to multiple instance learning approaches.
    • The exploration of super-nodes allows for the extraction of global contextual information from Whole Slide Images (WSI).
    • Three different fine-coarse feature distillation strategies are employed to effectively combine local and global features.
    • The study investigates the usability of the developed model on two datasets and demonstrates promising clinical utility.

    Areas of weakness in the paper that require further attention:

    • The paper lacks clarity in determining the number of super-nodes (K) and how this choice impacts the performance of the model.
    • The method for computing confidence intervals (CIs) is not clearly explained. It is unclear if bootstrapping or any other specific method was used. Additionally, the CIs provided overlap with the central values of other methods, suggesting that no method can be presumed superior based on the proposed CIs.
    • The paper does not sufficiently explore whether the proposed method simply recapitulates the prognostic scoring based on the clinical grading scheme (Gleason) for Prostate cancer. Further investigation into the underlying biology is necessary.

    Recommendations to enhance the study:

    • Provide more detailed analyses on how the adjacency matrix was chosen in the model.
    • Explicitly specify any empirical parameters used in the model.
    • Include visualizations of super-nodes using actual images to improve the understanding of the workflow.
    • Consider combining the datasets for training and analyzing them on a held-out test cohort or performing cross-validation to address concerns about the model’s performance and provide evidence of the utility of the proposed biomarker. Including covariates such as Gleason grade, age, and cancer stage in the survival models would further support biological interpretation.
    • In Section 3.4, ensure that Sn is correctly defined with the appropriate dimensions (K x d).
    • In Section 3.5, clarify the notation by explicitly showing W_share in the equation, listing the dimensions of W and h_hat, and providing a clearer explanation of how the MCA method enables weight sharing across local and global features when X and S have different dimensions.
    • Address the question of how the authors determine the number of super-nodes and investigate how different numbers of super-nodes affect the final results.

    Key points the authors should focus on in their rebuttal responses:

    • Address the insufficiency of evidence in the paper to support the claims made.
    • Provide a clear explanation of the train-test splitting process and how the datasets were used.
    • Include statistical significance measures comparing the proposed method with state-of-the-art methods in terms of the c-index listed in Table 1, particularly considering the observed large variance in DGC and the potential lack of significant improvement over it.
    • Discuss the spatial constraints on super-nodes resulting from the initial adjacency matrix and investigate the variation of ‘K’ without spatial constraints. Additionally, explore the potential performance improvements with more recent adjacency bases that incorporate feature correlation.
    • Address concerns regarding the lack of information on model performance on held-out test data and the separate fitting of the model to the two survival datasets.
    • Emphasise the importance of using appropriate methods (e.g., comparing hazard ratios or odds ratios) instead of relying solely on p-values to determine the superiority of one method over another.
    • Provide an explanation for why the Mixed Guided Attention feature distillation method produces the best results.
    • Include a description of the training and testing data splitting information.
    • Clarify how the cut-off for KM curve analysis is set and consider plotting the patient number of high and low-risk groups below the KM curve at different time points for improved visualisation.




Author Feedback

We thank the reviewers for the positive reception of our work. Please find below our response:

Dataset splitting, training details (R2,R3,MR) We split the data into 5 folds to train the models with patient-wise cross-validation (each run with 4 folds as the train set and 1 held-out fold as the test set). Unlike previous works, we repeated training for models with 5 seeds to account for initialization variability. In total, for each dataset, we trained and tested each model 25 times to eliminate any potential initialization or data bias. We will clarify this in the final version.
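
The protocol described above (patient-wise 5-fold cross-validation repeated with 5 seeds, giving 25 runs per dataset) could be organised along the following lines. This is a hedged sketch rather than the authors' code: `patient_wise_folds`, `cv_runs`, and the toy patient IDs are hypothetical, and keeping the fold assignment fixed while only the training seed changes is an assumption.

```python
import random


def patient_wise_folds(patient_ids, n_folds=5, split_seed=0):
    """Map every unique patient to exactly one fold, so that all slides of
    a patient end up on the same side of any train/test split."""
    unique = sorted(set(patient_ids))
    random.Random(split_seed).shuffle(unique)
    return {pid: i % n_folds for i, pid in enumerate(unique)}


def cv_runs(patient_ids, n_folds=5, seeds=(0, 1, 2, 3, 4), split_seed=0):
    """Enumerate (held-out fold, seed) runs with sample indices for each."""
    fold_of = patient_wise_folds(patient_ids, n_folds, split_seed)
    runs = []
    for test_fold in range(n_folds):
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != test_fold]
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == test_fold]
        for seed in seeds:  # same split, different model initialisation
            runs.append({"test_fold": test_fold, "seed": seed,
                         "train": train, "test": test})
    return runs


# Toy example: 20 hypothetical patients with 2 slides each.
slide_patients = ["p%02d" % (i // 2) for i in range(40)]
runs = cv_runs(slide_patients)
```

Splitting on patient IDs rather than on slides is what keeps slides from one patient out of both the training and the held-out fold.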

Combining datasets (R2,MR) We introduced two datasets related to prostate cancer but with different clinical endpoints and clinical needs (PCa-AS, PCa-BT). PCa-AS includes prostate active surveillance (AS) patients who were monitored without undergoing treatments and their endpoint was cancer progression. However, PCa-BT includes patients who underwent brachytherapy (i.e., radiation therapy), but without guaranteed effectiveness. The endpoint for this set is cancer recurrence. As such, given the differing clinical questions and endpoints, these datasets cannot be combined.

Relation to Gleason groups (R2) We appreciate this very insightful comment. We should highlight that all the patients in our cohort are low-risk prostate cancer patients (based on NCCN guidelines which incorporate Gleason score and other clinical factors). Even though these patients are low-risk based on the current guidelines, a subset of them experience disease progression and our goal was to explore histopathology images with the hope of identifying biomarkers that would aid us in finding at-risk patients. To avoid confusion, we will add this explanation in the revised manuscript.

Comparison of methods’ C-index (CI) (R2,R3,MR) As suggested by the reviewers, we conducted statistical tests (t-test) comparing CIs of our model with baselines, which we will include in the revised paper. Briefly, our results showed that our model’s CI values are statistically better than all baselines in PCa-AS and also superior to all models, except DGC, in PCa-BT. Thus, the CIs (based on “Harrell’s estimator” in scikit-survival (R2)) and the suggested tests confirm the superiority of our model in hazard prediction; however, the clinical utility (i.e., actionability of findings) lies in the patient stratification scenario where high- and low-risk patients may need to be treated differently. We performed LogRank tests to evaluate the models’ ability to separate patients into underlying risk-groups which proved our model achieved statistical significance in stratifying patients for both sets (while all baselines failed in this scenario).
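
For readers unfamiliar with the metric, the idea behind Harrell's concordance index can be sketched in plain Python. This is a simplified illustration and not the scikit-survival implementation the authors mention: the function name `harrell_c_index` is hypothetical, pairs with tied event times are simply skipped, and the example values are invented.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter follow-up time than patient j. The pair is
    concordant when patient i also has the higher predicted risk; ties in
    predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented follow-up times, event indicators (1 = event), and risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.3, 0.1]
c_index = harrell_c_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk, which is why differences between models are usually judged on the c-index itself rather than on whether each model separately reaches significance.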

Cut-off for KM curve (R3,MR) For PCa-BT, we set the cut-off as the ratio of patients with recurrence within 3 years of therapy initiation (based on the related medical studies, e.g., PMID: 25089248) and used the ratio of progressed cases for PCa-AS.
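
One possible reading of this cut-off rule is that the share of patients placed in the high-risk group equals the observed event ratio. The sketch below follows that reading, which is an assumption and may differ from what the authors did; `stratify_by_event_ratio` and the example scores are hypothetical.

```python
def stratify_by_event_ratio(risks, event_ratio):
    """Label the top `event_ratio` share of patients (by predicted risk)
    as high-risk and the rest as low-risk.

    Assumption: the cut-off is chosen so that the size of the high-risk
    group matches the observed ratio of events in the cohort.
    """
    n = len(risks)
    if n < 2:
        raise ValueError("need at least two patients")
    n_high = round(event_ratio * n)
    n_high = max(1, min(n - 1, n_high))  # keep both groups non-empty
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    high = set(order[:n_high])
    return ["high" if i in high else "low" for i in range(n)]


# Invented example: 10 patients, 3 of whom had the event.
risks = [0.10, 0.90, 0.40, 0.80, 0.20, 0.70, 0.30, 0.50, 0.60, 0.05]
events = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0]
groups = stratify_by_event_ratio(risks, sum(events) / len(events))
```

The two resulting groups are what a Kaplan-Meier plot and a log-rank test would then compare.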

Super-nodes (SN) relation to spatial domain and number of SNs (R1,R3,MR) MLP layers (eq 3) are utilized for SN assignment independent of nodes’ positions and edges. This enables the assignment of nodes to the same SN, based on correlation of their features, even if they are not spatially close or directly connected. This distinguishes our minCUT layer from some other coarsening methods like ASAP_Pooling that are limited only to the local clusters. Also, R_minCut increases the probability of assigning neighboring patches to the same SN as closer patches have a higher likelihood of similarity. This ensures SNs capture both spatial and feature correlations. Intuitively, the number of super-nodes “K” should not be very large or small, as the former encourages SNs to only represent local clusters and the latter leads to larger clusters and loses subtle details. We trained the model on 100 SNs as there can be thousands of patches in each slide. We will make the splits as well as our code and the utilized parameters available.
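
The assignment step described above can be illustrated with a small sketch in the style of minCUT pooling: each node receives a soft assignment over K super-nodes computed from its features alone, and super-node features are the assignment-weighted sums of node features. This is not the authors' implementation. A single linear layer stands in for the MLP of eq 3, the regularisation terms are omitted, and all names (`assign_and_pool`, `w`) and sizes are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def assign_and_pool(x, w):
    """x: N x d node features, w: d x K weights (stand-in for the MLP).

    Returns the N x K soft assignment matrix and the K x d super-node
    features. No adjacency or position information is used here, so
    distant nodes with similar features can share a super-node.
    """
    s = [softmax(row) for row in matmul(x, w)]
    pooled = matmul(transpose(s), x)
    return s, pooled


# Toy sizes for illustration: 6 patches, 4 features, 3 super-nodes.
rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
s, pooled = assign_and_pool(x, w)
```

The pooled matrix has one row per super-node, that is, shape K x d, which is consistent with the dimension Reviewer #3 expected for Sn in Section 3.4.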




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The responses of the authors during the rebuttal process are not convincing enough. We hope that the constructive remarks will help you to improve the work for any future submission.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed some of the reviewers’ comments. One question regarding the c-index still remains. Overall, this is a borderline paper with a good method but needs more clarification about the experimental setup.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a graph-based method to incorporate local and global features in WSI for prostate cancer prognosis. The paper received mixed reviews (two positive and one negative), and the negative review mainly focuses on the statistical tests and interpretation, which can be regarded as a limitation of this work. Despite this limitation, the paper and even the rebuttal provide valuable information for computational pathology, and I thus vote for accepting this paper. The authors are encouraged to improve the paper and discuss these limitations in the final version.


