
Authors

Zichen Wang, Mara Pleasure, Haoyue Zhang, Kimberly Flores, Anthony Sisk, William Speier, Corey W. Arnold

Abstract

The diagnosis of prostate cancer is driven by the histopathological appearance of epithelial cells and epithelial tissue architecture. Despite the fact that the appearance of the tumor-associated stroma contributes to diagnostic impressions, its assessment has not been standardized. Given the crucial role of the tumor microenvironment in tumor progression, it is hypothesized that the morphological analysis of stroma could have diagnostic and prognostic value. However, stromal alterations are often subtle and challenging to characterize through light microscopy alone. Emerging evidence suggests that computerized algorithms can be used to identify and characterize these changes. This paper presents a deep-learning approach to identify and characterize tumor-associated stroma in multi-modal prostate histopathology slides. The model achieved an average testing AUROC of 86.53% on a large curated dataset with over 1.1 million stroma patches. Our experimental results indicate that stromal alterations are detectable in the presence of prostate cancer and highlight the potential for tumor-associated stroma to serve as a diagnostic biomarker in prostate cancer. Furthermore, our research offers a promising computational framework for in-depth exploration of the field effect and tumor progression in prostate cancer.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_62

SharedIt: https://rdcu.be/dnwKi

Link to the code repository

https://github.com/zcwang0702/DeepFieldEffect_StromaNet

Link to the dataset(s)

Please refer to the cited papers [22] and [23] for detailed dataset information


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an approach to identify tumor-associated stroma patches using a convolutional neural network for feature extraction, followed by a graph attention network that takes patch spatial relationships into account to predict the correct label (tumor-associated stroma vs. normal stroma) for each patch. To account for label noise and domain shift caused by the heterogeneity of the data sources, the authors also propose a label-consistency regularizer among patches with similar features and an adversarial loss that reduces the feature extraction network's ability to discriminate between patches from different data sources. (A minimal illustrative sketch of this training setup appears at the end of this review.)

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and mostly clear. The method is scientifically sound, and the authors' choices are justified from an intuitive standpoint (e.g., the method is designed to address the challenges of this particular application) and a theoretical standpoint (e.g., the choices are supported by prior literature).

    • The authors performed an ablation study, conducting experiments that progressively show the contribution of each module.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is not immediately clear that the system is trained end-to-end, i.e., that the feature extraction network is trained at the same time as the GAT and the data-source discriminator. It is important to stress this.

    • Unless the authors are planning to publish the dataset they used for this study, the creation and curation of the dataset should not be listed among the contributions of this work.

    • The dataset used to train the stroma segmentation (513 manually annotated 20x patches) is quite small. The segmentation approach seems to perform well in 5-fold cross-validation, but does the performance vary between biopsies and resections? I would expect a large performance variation given the need, in the proposed classification algorithm, for an adversarial loss to mitigate the domain shift between biopsies and resections.

    • The spatial clustering technique used in the paper is not entirely clear. A better explanation or a drawing would probably help. (minor concern)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe the presentation of the paper is clear and will allow reproducibility. Publishing the data/code would also help. Some more information about the MLP used for the adversarial part of the method should also be included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Clarification of some aspects of the paper would help (see weaknesses). I do not have any specific suggestions. In the future, you could evaluate the true impact of this approach within a more pathology-oriented work (e.g., a journal submission) in which you explore the clinical significance of being able to identify tumor-associated stroma.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is sound, the results look strong and the ablation study is showing the advantages brought by each module in the system. The paper is well written and apart from some moderate weaknesses there are no major concerns from my side.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
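
    As a concrete illustration of the training setup described in the contribution summary above (a feature extractor trained jointly with a patch classifier and, through a gradient-reversal layer, against a data-source discriminator), the following is a minimal DANN-style PyTorch sketch. It is not the authors' code: the ResNet-18 backbone, layer sizes, and loss weight lam are assumptions, and the graph attention stage between the features and the classifier is omitted for brevity.

        import torch
        import torch.nn as nn
        from torchvision.models import resnet18

        class GradReverse(torch.autograd.Function):
            """Identity in the forward pass; flips the gradient sign in backward."""
            @staticmethod
            def forward(ctx, x, lam):
                ctx.lam = lam
                return x

            @staticmethod
            def backward(ctx, grad_out):
                return -ctx.lam * grad_out, None

        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                   # 512-d patch features
        label_head = nn.Linear(512, 2)                # tumor-associated vs. normal stroma
        domain_head = nn.Sequential(                  # data-source discriminator (MLP)
            nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))

        def training_loss(patches, y_label, y_domain, lam=0.1):
            feats = backbone(patches)
            cls_loss = nn.functional.cross_entropy(label_head(feats), y_label)
            # Reversed gradients push the backbone toward source-invariant features.
            dom_loss = nn.functional.cross_entropy(
                domain_head(GradReverse.apply(feats, lam)), y_domain)
            return cls_loss + dom_loss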



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel deep-learning approach to identify and characterize tumor-associated stroma in multi-modal prostate histopathology slides. The way the problem is formulated seems novel and interesting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel problem formulation. The construction and use of graphs are interesting. The training mechanism aids the capture of the intended information. The idea for handling noisy data is interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Due to the novel problem formulation, a comparison with the state of the art is lacking. The core deep learning modules that are used do not seem to contain novel blocks.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper gives all the details. However, the dataset may not be available for reproducing the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    For Sections 2.2, 2.3, and 2.4, clearly highlight the novelty. The motivation for using a graph-based approach is not well articulated. This has to be addressed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach has novelty and has some merit for acceptance.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I am not changing my decision; I will stick with weak accept.



Review #3

  • Please describe the contribution of the paper

    The paper investigates the characteristics of tumor-associated (reactive) stroma in prostate cancer. Since experts do not currently differentiate reactive from normal stroma, the authors used proximity to the tumor region for the assessment. Using these weak labels, the authors apply a classification approach to stroma patches to show that these patches can be discriminated using deep networks. Further, they employ graph-based networks to model the tumor microenvironment and report improved performance as a result. They additionally employ techniques such as domain-adversarial training and a regularizer to account for label noise, with success.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • I like the overall approach of using weak labels created by proximity. I think the general idea of trying to derive more patterns than those known to pathologists to date is very valuable.

    • I also think that a graph-based approach is sensible, since it allows the tumor microenvironment to be incorporated into the learning approach more directly.

    • While the domain-adversarial training is a particularity of the dataset setup used in this work, I think it is a smart addition to the pipeline.

    • The approach is sound, and an ablation study gives further insights into its subparts.

    • The introduction given by the authors is concise and motivates the paper well.

    • The authors run their pipeline multiple times and report mean/std.

    • The approach was carried out on a dataset of large size.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    About the data: The paper gives us almost no detailed insight into the data, in particular the patient cohort or the distribution of pathologies therein. For Dataset A, not even the number of cases is given. For Dataset B, the (not really important) size of the whole mounts is mentioned, but we get no idea of the tumor area within the dataset, which would be much more relevant and would typically be only a small fraction of the prostate tissue. The authors also state that "some" non-tumor regions were annotated, but give no further details. For Dataset C, the authors do not state any specifics besides the number of WSIs. All in all, I think that the dataset is described very sloppily, which strongly impacts reproducibility. The authors did not state where the samples were taken from or how they were processed. More severely, they did not report on IRB approval.

    About the method: The authors state that the model derived from Dataset A was evaluated to have a mean Dice score of 95 percent and was then used on Datasets B and C. While we do not learn from the paper where Datasets B and C come from, or what other specifics they might have that could lead to a domain shift, we can at least infer that Datasets A and B use different magnification levels. This typically breaks generalization, and I think that an evaluation on Dataset A will not give any trustworthy implications for Dataset B unless otherwise validated.

    I am also worried about potential data bleed in the dataset: since the whole approach deeply depends on superb segmentation performance, I think that a numeric assessment of the segmentation approach on the target domain (Datasets B and C) is a must. If the segmentation performance breaks down on Datasets B and C, there is a high likelihood of data bleed from non-stroma tissue into the subsequent classification task. The non-stroma tissue, in particular tumor tissue, would be easy to classify, especially against the non-tumor samples from Dataset C, and would thus simplify the detection task significantly. Hence, the whole evaluation would fall apart in this case.

    While the authors claim the curation of a novel dataset as the third point of their main contributions, the dataset does not seem to be publicly available.

    The proximity threshold for attributing stroma to tumor-associated stroma is somewhat arbitrary, and no rationale was given for how this value was determined.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is a multitude of issues with the reproducibility of this paper, in particular a strong disagreement between the self-reported reproducibility response and the paper.

    • While the authors state in the reproducibility response that they provided all relevant statistics, I strongly disagree. They provided neither the number of cases included in Dataset A, nor where the samples were taken from, nor any information about the study cohort.

    • While the authors state that they provide a complete description of the data collection process, this is entirely missing and thus simply a false statement.

    • While the authors state that IRB approval was required for the data acquisition, they do not give any details in the paper on whether it was obtained.

    • The authors claim that code and models are available, but this is mentioned nowhere in the paper, so I must assume that they do not aim to release them.

    • The authors claim that they release the dataset alongside the paper ("Dataset or link to the dataset needed to run the code. - Yes"), but there is no mention of that in the paper.

    Overall, I think the reproducibility of the paper is thus poor.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    While the paper (especially Figure 1) implies a three-dimensional fusion of data, this is not what is happening. I do not think that the three-dimensional structure of the biopsies or the whole mounts is considered at all in this approach; at least, the authors give no details on how registration was performed, etc.

    Could the authors please detail to what degree the performance differs between tiles from Datasets B and C? One would suspect a domain shift from various causes, in particular the different fixation and other processing of biopsies compared to whole mounts. If the performance on Dataset C is considerably better than on Dataset B tiles, we could suspect that the classifier learned a domain difference.

    The authors state that the rationale for using biopsies is that these provide "complementary multi-modal information". While I agree in general, I wonder if the main point of including these slides here is to have true negatives? Effectively, no total prostatectomy will be performed on negative cases, which is also why the authors chose to take only negative samples from the biopsy cohort (which I think is wise). Maybe the authors could highlight this very practical reason for integrating biopsied tissue alongside sections.

    While the authors used the PointRend approach to obtain masks for "more precise downstream tasks", it is unclear whether these masks introduce a bias into the labeling process that was not caught by the pathologists. It has been shown multiple times that computer-aided labeling workflows are subject to confirmation bias. This bias might be in the direction of more easily predictable masks, which are, however, not necessarily more correct masks.

    The authors give no insight into whether the adversarial training led to a domain-invariant representation. Domain-adversarial training, like any adversarial training, does not necessarily converge to the Nash equilibrium.

    Minor comments: I had a hard time understanding many parts of the paper, especially Figure 2. It is not clear from the figure how the spatial information is fed to the network. The ResNet itself does not provide any means of forwarding meta information, so there must be a different path, which is, however, not disclosed in the figure.

    The placement of the figures is a bit non-ideal. In my opinion, they appear a bit too early to be understood.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper, while presenting a sound idea, does not perform a sufficiently detailed evaluation of all components of the (rather complex) method. Further, I think that the reproducibility of the findings is low.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I agree with most points the authors make in their rebuttal. The non-availability of the datasets is still a big issue for reproducibility, but I still think it is valuable to see this paper published and presented at the conference.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work proposes an approach to identify tumor-associated stroma patches using a convolutional neural network for feature extraction, followed by a graph attention network that takes patch spatial relationships into account to predict the correct label (tumor-associated stroma vs. normal stroma) for each patch. The paper is, in general, well and clearly written. The method is scientifically sound, and the design choices are well justified. The authors propose a novel problem formulation and suggest an interesting combination of techniques to tackle the problem. However, there are a few critical aspects that the authors should improve for better readability, transparency, and clarity. While the problem formulation is new, the methodology itself lacks novelty; the authors may rephrase the main text to strengthen the claims about the main contributions. Moreover, there is a concern about the quality of the segmentation and its impact on the results; the authors may provide further explanation of, and insight into, this. The descriptions of the training procedure, the clustering procedure, and the model execution can be further improved for transparency and reproducibility. The description of the data, in particular, should be improved as well.




Author Feedback

We appreciate the reviewers' valuable comments. If the paper is accepted, we will address the revisions outlined below.

Reviewers #1 and #2

  1. Model training and spatial clustering: All modules were optimized simultaneously in an end-to-end fashion; we will emphasize this in Section 2.4. Our approach uses spatial patch graphs and graph networks, without relying on spatial clustering techniques. To prevent confusion, we will remove "cluster" from the term "spatial patch cluster graph".

  2. Data availability: Dataset A is publicly available, while Datasets B and C are currently private. We will provide public links for the code and Dataset A. We will revise the third contribution to: "We developed a comprehensive pipeline for constructing tumor-associated stroma datasets across multiple data sources". Please refer to point 1 under Reviewer #3 below for the data description.

  3. Motivation for the graph model: Given the spatial nature of the cancer field effect and the tumor microenvironment, it is valuable to analyze stroma regions using graph-based approaches; we will emphasize this in our contribution summary.

  4. Segmentation quality: Please see point 2 under Reviewer #3 below.

Reviewer #3

  1. Detailed data description: This work is IRB approved, which will be noted in the revision. Dataset A consists of 513 tiles from 40 patients' whole mount slides. It combines two sets of tiles: 224 images from 20 patients with stroma, normal glands, and low-grade and high-grade cancer, and 289 images from 20 patients with dense high-grade cancer (Gleason grades 4 and 5) and cribriform/non-cribriform glands. In Dataset B, the average tumor area is 9% of the prostate tissue, or 77 mm² in absolute terms. Dataset C comprises 6,134 negative H&E biopsy slides from 262 patients; all biopsy samples were negative. Please refer to point 2 under Reviewers #1 and #2 above for data availability.

  2. Segmentation quality and impact: We trained our stroma segmentation model with extensive data augmentation, including image scaling and staining perturbation. The model was trained on the whole mount slide Dataset A, which has the same modality, the same preprocessing steps, and a similar Gleason distribution as Dataset B. With a testing Dice score of 95% on Dataset A, our model should perform reliably on Dataset B. Furthermore, the model's robustness was validated by a 92.7% testing Dice score for stroma segmentation on a public prostate biopsy dataset [1], supporting accurate segmentation on Dataset C. To precisely isolate stroma tissue and avoid data bleed from epithelial tissue, we extracted only patches in which over 99.5% of the region was identified as stroma at 40X magnification (an illustrative sketch of this filter is given after this list). [1] Salvi et al., "A Hybrid Deep Learning Approach for Gland Segmentation in Prostate Histopathological Images."

  3. Rationale for the proximity threshold: It is possible that tissue outside a cancer focus, but directly next to it, could be tumor-associated. Our approach uses a 5 mm tumor-distance threshold to minimize the likelihood of labeling tumor-associated stroma as normal. The 5 mm threshold was chosen because prostate cancers provoke a minimal inflammatory response, especially at lower grades (Gleason grade group < 3, which we used), that would not extend beyond this distance. Thus, by sampling normal patches beyond 5 mm, which is a relatively large margin given the limited tumor foci area (see point 1 above), we can minimize potential label noise (an illustrative sketch of this rule is given after this list).

  4. Figures: We did not perform registration between biopsies and whole mounts. Figure 1 is meant to convey data integration, not three-dimensional fusion, as we used a separate patch graph for each slide. We will modify Figure 1 to avoid confusion. In Figure 2, the ResNet only extracts patch features, without using spatial information. Instead, we incorporate and propagate spatial information through patch graph construction and graph attention networks (an illustrative sketch is given after this list).

  5. Labeling process: The pathologist annotated tumor foci without seeing the stroma masks. Thus, no computer-aided labeling workflow or confirmation bias was involved.
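
Minimal sketch for point 2 above (the >= 99.5% stroma-purity patch filter). This is illustrative only, not the authors' code; the 256-pixel patch size and the binary mask encoding are assumptions.

    import numpy as np

    PATCH = 256            # assumed patch size in pixels at 40X
    MIN_STROMA = 0.995     # keep patches that are almost entirely stroma

    def stroma_patch_coords(stroma_mask: np.ndarray):
        """Yield top-left (row, col) of patches whose stroma fraction is >= 99.5%."""
        h, w = stroma_mask.shape
        for r in range(0, h - PATCH + 1, PATCH):
            for c in range(0, w - PATCH + 1, PATCH):
                tile = stroma_mask[r:r + PATCH, c:c + PATCH]
                if tile.mean() >= MIN_STROMA:   # binary mask: 1 = stroma
                    yield r, c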
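
Minimal sketch for point 3 above (the 5 mm proximity rule), using a Euclidean distance transform on the tumor annotation. Illustrative only; the microns-per-pixel value, the mask conventions, and the exact sampling rules are assumptions rather than the authors' implementation.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    MPP = 0.25            # assumed microns per pixel at 40X
    THRESH_UM = 5000.0    # 5 mm tumor-distance threshold

    def weak_stroma_labels(tumor_mask: np.ndarray, stroma_mask: np.ndarray):
        """1 = tumor-associated stroma (< 5 mm from tumor),
        0 = normal stroma (>= 5 mm), -1 = non-stroma (ignored)."""
        # Distance from every pixel to the nearest tumor pixel, in microns.
        dist_um = distance_transform_edt(tumor_mask == 0) * MPP
        labels = np.full(tumor_mask.shape, -1, dtype=np.int8)
        labels[(stroma_mask == 1) & (dist_um < THRESH_UM)] = 1
        labels[(stroma_mask == 1) & (dist_um >= THRESH_UM)] = 0
        return labels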
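
Minimal sketch for point 4 above: spatial information enters only through the graph edges, while the attention layers operate on the CNN features. It assumes PyTorch Geometric; the neighborhood radius, feature size, and head count are illustrative, not the paper's settings.

    import torch
    from torch_geometric.nn import GATConv, radius_graph

    feats = torch.randn(100, 512)            # per-patch CNN (ResNet) features
    coords = torch.rand(100, 2) * 10_000     # patch center coordinates (pixels)

    # Edges connect spatially nearby patches; this is where location is used.
    edge_index = radius_graph(coords, r=600.0)

    gat = GATConv(512, 128, heads=4)         # attention over graph neighbors
    node_embed = gat(feats, edge_index)      # -> [100, 4 * 128]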




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose to use a CNN and a graph attention network to analyze tumor-associated stroma patches, with a focus on patch spatial relationships for patch classification. The authors address the reviewers' major concerns, such as the description of the data, segmentation quality and impact, and the processing procedure. However, the paper can be further improved by clearly stating the motivation and contributions of the work.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has clarified most of the reviewers' comments, particularly about the dataset, the experimental results, and the design rationale. All reviewers lean toward acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The non-availability of the datasets is still a big issue for reproducibility, but the reviewers agree that this is a valuable paper to be presented at the conference.


