Authors
Wentai Hou, Yan He, Bingjian Yao, Lequan Yu, Rongshan Yu, Feng Gao, Liansheng Wang
Abstract
Cancer survival prediction requires considering not only the biological morphology but also the contextual interactions of tumor and surrounding tissues. The major limitation of previous learning frameworks for whole slide image (WSI) based survival prediction is that the contextual interactions of pathological components (e.g., tumor, stroma, lymphocyte, etc.) lack sufficient representation and quantification. In this paper, we proposed a multi-scope analysis driven Hierarchical Graph Transformer (HGT) to overcome this limitation. Specifically, we first utilize a multi-scope analysis strategy, which leverages an in-slide superpixel and a cross-slide clustering, to mine the spatial and semantic priors of WSIs. Furthermore, based on the extracted spatial prior, a hierarchical graph convolutional network is proposed to progressively learn the topological features of the variant microenvironments ranging from patch-level to tissue-level. In addition, guided by the identified semantic prior, tissue-level features are further aggregated to represent the meaningful pathological components, whose contextual interactions are established and quantified by the designed Transformer-based prediction head. We evaluated the proposed framework on our collected Colorectal Cancer (CRC) cohort and two public cancer cohorts from the TCGA project, i.e., Liver Hepatocellular Carcinoma (LIHC) and Kidney Clear Cell Carcinoma (KIRC). Experimental results demonstrate that our proposed method yields superior performance and richer interpretability compared to the state-of-the-art approaches.
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43987-2_72
SharedIt: https://rdcu.be/dnwKv
Link to the code repository
https://github.com/Baeksweety/superpixel_transformer
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes a system for survival prediction based on whole slide images. It does so by adding two levels of hierarchy, i.e., patch level and tissue level, in a graph convolutional setting. The tissue-level features are then assigned to a set of ‘pathological’ components that were previously learned by the K-means algorithm. Finally, this representation is used as input to a transformer architecture that estimates the risk for the whole slide.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The work claims better results than commonly used techniques for WSI analysis, such as MIL-based methods. The hierarchical architecture is interesting and allows the use of multiscale information, from patches to larger regions (tissues).
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Some relevant baselines are missing, such as HIPT (https://openaccess.thecvf.com/content/CVPR2022/html/Chen_Scaling_Vision_Transformers_to_Gigapixel_Images_via_Hierarchical_Self-Supervised_Learning_CVPR_2022_paper.html), which also adds multi-scale information in a hierarchical fashion (without using GCNs).
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors provide some description of the hyperparameters used and of the public datasets. However, some details of the specific GCN and transformer are missing. For example, what is the non-linearity used in the GCN layer? What is the number of heads in the transformer? The authors also provide a repo with code, which is interesting; however, on closer inspection it is currently unusable: there are no instructions on how to run the code, and some scripts seem to be missing, e.g., run.sh mentions the scripts interpretable_transformer.py and graph_transformer.py, but they are not available in the repo.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The paper is in general well written and easy to follow. I think some details need to be added, such as the details of the GCN and transformer. Some details are also missing on the interpretability. For example, does Fig. 3(d) (contextual interaction) correspond to the self-attention? If so, which head is being analyzed? What about the other heads? Finally, an interesting baseline to add is the HIPT work, which follows a similar idea, i.e., using transformers in a hierarchical fashion.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well written and presents an interesting methodology for using multiscale information. There are some concerns that need to be addressed (see comments above), and some concerns regarding the code repository. In its current state it is simply not usable, and that, along with some missing details, makes the work in its current shape not fully reproducible.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes a novel learning framework, i.e., multi-scope analysis driven HGT, to represent and capture the contextual interaction of pathological components. Extensive comparisons with prior art are performed and demonstrate that the proposed method outperforms the SOTA on cancer survival prediction. Ablation/tuning experiments are reported with limited conclusions. The code is to be released by the authors.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The result is state of the art compared to the competing methods.
- The code will be available, which will help greatly with reproducibility.
- Interpretability of the proposed framework is given.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Motivation: The authors mention that complex pathological patterns (e.g., tumor lymphocyte infiltration, immune cell composition, etc.) lead to difficulty in cancer survival prediction. However, further insights, with examples, are required to appreciate what spatial patterns are generally observed for cancer prognosis.
- Originality: The methodology is not novel. The authors claim that they leverage an in-slide superpixel and a cross-slide clustering to mine the spatial and semantic priors of WSIs. However, the method seems to simply use the original Graph Convolutional Network and Transformer architecture. There are many methods for processing multi-level biological entities, but this paper lacks a comprehensive comparison with them.
- Experiments: The compared methods are too old (most of them were published in or before 2021), and a comparison with multi-level processing methods is lacking. The analysis of the ablation/tuning experiments is weak. It is unclear how much of the gain actually comes from the novel designs rather than from an increased number of parameters.
- Clinical significance: No insight is given into how the method can provide a better understanding of the patch level, the tissue level, and their relationships.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors are providing code so this should be reproducible.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- The motivation needs to be clarified. Please introduce and compare with papers published in the last 2-3 years, especially multi-level or multi-scope processing methods.
- More experiments should be conducted to compare with latest multi-level processing methods and validate the effectiveness of designed modules.
- Since the platform for this study relies on representing and capturing the contextual interaction of pathological components, visual evidence needs to be provided. For example, what spatial relationships and patterns do pathologists observe, and what do they look like?
- The authors haven’t discussed limitations and failure modes of their study.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the result is state of the art, I find that this study lacks methodological novelty and leaves too many holes in understanding why the proposed design actually works, as opposed to the gains coming from some other variability in experimentation. Therefore, I recommend weak reject.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
This study developed a multi-scope analysis driven Hierarchical Graph Transformer (HGT) to capture not only biological morphology but also contextual interactions in Whole Slide Images (WSI) for prediction of cancer survival. The framework demonstrated superior performance compared with other state-of-the-art approaches on one private and two public datasets.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well written and presented. Previous studies and the associated limitations were comprehensively discussed; study objectives were clearly laid out; methods and the interpretation of the results were also clearly stated. The proposed framework is innovative. While the individual algorithms were previously used, the framework that combines the algorithms and applies them to WSIs to capture multi-scope features is novel. The proposed framework has high clinical impact given that the model is able to predict cancer survival in a pan-cancer fashion.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
While evaluated on multiple cancer types, the model was only validated in a cross-validation fashion, with no independent validation set. The rationale for selecting some critical algorithm hyperparameters is not provided, e.g., the number of superpixels, the patch size, and the number of pathological components. Details of the model training process, such as the learning rate and number of epochs, are also lacking.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The code was already made publicly available, indicating a potential high reproducibility.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Instead of using a constant superpixel number, would it not make more sense to set the superpixel number as a relative value to tissue area?
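The reviewer's suggestion can be illustrated with a short sketch. This is a hypothetical helper, not part of the paper or its repository: the function name `superpixel_count`, the parameter `area_per_superpixel_px`, and the optional bounds are assumptions chosen only to show how a superpixel budget could scale with tissue area instead of being fixed.

```python
import math


def superpixel_count(tissue_area_px, area_per_superpixel_px,
                     min_count=1, max_count=None):
    """Return a superpixel budget proportional to the tissue area.

    tissue_area_px: number of tissue pixels in the slide (assumed to come
        from a tissue mask computed elsewhere).
    area_per_superpixel_px: target area of one superpixel (hypothetical
        tuning parameter).
    """
    if tissue_area_px < 0 or area_per_superpixel_px <= 0:
        raise ValueError("areas must be non-negative / positive")
    # Round up so that small tissue regions still get a superpixel.
    count = max(min_count, math.ceil(tissue_area_px / area_per_superpixel_px))
    if max_count is not None:
        count = min(count, max_count)
    return count


# A slide with ten times more tissue gets ten times more superpixels.
small = superpixel_count(1_000_000, 10_000)
large = superpixel_count(10_000_000, 10_000)
```

The resulting count would play the role of the initial number of segments passed to whatever superpixel algorithm is used; the optional `max_count` bound is one possible way to keep memory use predictable on very large slides.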
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The novelty of the paper is relatively high, and the presentation is clear and comprehensible to readers.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
Summary of the Key Strengths and Weaknesses of the Paper:
Key Strengths:
- The paper is well-written and presented.
- The proposed hierarchical architecture allows for the use of multi-scale information, resulting in better results compared to commonly used techniques for the analysis of Whole Slide Images (WSI).
- The results achieved by the proposed method are state-of-the-art compared to competing methods.
- The availability of code enhances reproducibility.
- The interpretability of the proposed framework is addressed.
- Previous studies and their limitations are comprehensively discussed.
- The study objectives are clearly defined.
- The methods and interpretation of the results are clearly described.
- The proposed framework is innovative and has high clinical impact as it can predict cancer survival across different cancer types.
Areas of weakness in the paper that require further attention:
- Some relevant baselines, such as HIPT, which also incorporate multi-scale information hierarchically, are missing from the comparisons.
- Further insights are needed to understand the spatial patterns observed in cancer prognosis and their relationship to the proposed method.
- The paper claims to leverage in-slide super-pixel and cross-slide clustering for spatial and semantic priors but primarily uses the original Graph Convolutional Network (GCN) and Transformer Architecture without comprehensive comparisons to other multi-level processing methods.
- Recent papers, especially those related to multi-level or multi-scope processing methods, should be included and compared with the proposed approach.
- The analysis of ablation/tuning experiments is weak, making it unclear how much improvement is due to the novel designs or increased parameters.
- The paper lacks insights into how the method enhances the understanding of patch-level, tissue-level, and their relationships.
- While multiple cancer types are evaluated, the model is only validated using cross-validation, lacking an independent validation set.
- The rationale for selecting critical algorithm hyper-parameters, such as the number of super-pixels, patch size, and number of pathological components, is not provided.
- Details of the model training process, including learning rate and epoch number, are lacking.
To assist the authors in enhancing this study, we offer the following recommendations:
- Provide additional details about the GCN and Transformer Architecture, as well as further clarity on interpretability.
- Include a baseline comparison with the HIPT work, which follows a similar hierarchical transformer approach.
- Compare the proposed method with papers published in the last 2-3 years, particularly those focused on multi-level or multi-scope processing methods.
- Conduct more experiments to compare and validate the effectiveness of the designed modules against the latest multi-level processing methods.
- Provide visual evidence to support the claim of representing and capturing the contextual interaction of pathological components.
- Discuss the limitations and failure modes of the study.
- Address the concerns regarding the analysis of ablation/tuning experiments and provide a clearer understanding of the contributions of the novel designs and increased parameters.
- Explore and explain how the method contributes to a better understanding of the patch-level, tissue-level, and their relationships.
- Consider including an independent validation set in addition to cross-validation for evaluating the model.
- Provide a rationale for selecting critical algorithm hyper-parameters and share details of the model training process, including learning rate and epoch number.
Key points the authors should focus on in their rebuttal responses:
- Address the reviewers’ concern about the lack of methodology novelties and clarify the reasons behind the proposed design’s effectiveness.
- Discuss the limitations and failure modes of the study.
- Compare the proposed method with papers published in the last 2-3 years, particularly those focused on multi-level or multi-scope processing methods.
- Consider including the HIPT work as an interesting baseline due to its similarity in using hierarchical transformers.
- Address the concerns regarding the analysis of ablation/tuning experiments and provide a clearer understanding of the contributions of the novel designs and increased parameters.
- Explain how the method contributes to a better understanding of the patch-level, tissue-level, and their relationships.
- Discuss the choice of cross-validation for model validation and consider incorporating an independent validation set.
- Provide a rationale for selecting critical algorithm hyper-parameters and share details of the model training process, including learning rate and epoch number.
Author Feedback
We thank the AC and reviewers for their acknowledgement of and constructive comments on our work. We conducted more extensive experiments and address the comments as follows.
1. Motivation (R2): Clinical practice and research show that the morphology and interactions of tumor and surrounding tissues (i.e., pathological components) are important evidence for cancer survival prediction. However, existing methods lack adequate representation and quantification of the interactions of these pathological components, which limits their performance and interpretability.
2. Novelty (R2): We proposed a hierarchical GCN based spatial morphology extraction (SME) module and a Transformer based semantic context perception (SCP) module, which are organically combined with multi-scope analysis to represent and quantify the interactions of pathological components for the survival prediction task. To the best of our knowledge, our framework is the first attempt to explicitly capture both morphological and contextual representations of pathological components, which brings new insights and interpretability to WSI representation learning.
3. Limitations (R2): The analysis of pathological components is the core of the proposed framework. Therefore, to ensure performance, the training set must encompass all heterogeneous pathological components present in the cancer type, and the validation set must follow the i.i.d. assumption.
4. More comparisons (R1/R2): Compared with HIPT (morphological, CVPR22), NAGCN (semantic, CVPR22), and two high-level baselines (w/o SME, w/o SCP), our method shows an advantage in most comparisons.

| Method | CRC | LIHC | KIRC |
|---|---|---|---|
| HIPT* | 0.601±0.002 | 0.656±0.005 | 0.639±0.005 |
| NAGCN | 0.577±0.010 | 0.611±0.002 | 0.639±0.006 |
| w/o SME | 0.597±0.002 | 0.647±0.002 | 0.630±0.003 |
| w/o SCP | 0.594±0.001 | 0.645±0.001 | 0.621±0.007 |
| Ours | 0.607±0.004 | 0.657±0.003 | 0.646±0.003 |

*The ViT_256-16 encoder of HIPT is replaced with ResNet50 for fair comparison.
5. Effectiveness of model design (R2): 1) The design principles are in line with the clinical experience of pathologists and existing research. 2) Our framework fully utilizes and combines the topological representation capabilities of GCN and the contextual awareness capabilities of Transformer. 3) We guide the learning direction of the model by injecting spatial and semantic prior knowledge captured by multi-scope analysis, rather than simply fitting training data with parameters.
6. Hyperparameters and analysis (R1/R2/R3): 1) The non-linearity of the GCN is ReLU. The number of Transformer heads is 8, and the attention scores of all heads are averaged to produce the heatmap of contextual interactions. 2) Our model is trained with a mini-batch size of 16 and a learning rate of 0.00001 with the Adam optimizer for 30 epochs. 3) The initial number of superpixels is determined by the average size of the dataset. Note that the initial superpixels are further merged according to their similarity, so the actual number of superpixels used for model training is dynamic. 4) The patch size is determined by the average size of the dataset and the input size of the feature encoder. 5) The number of clusters is an empirical value that represents the number of heterogeneous pathological components of the cancer type. 6) We have updated the open-source code and added a README.md so that readers can conveniently reproduce our results.
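The head-averaging step described in item 1) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `average_heads`, the nested-list layout `attention[head][query][key]`, and the toy values are assumptions. Only the values stored in `REPORTED_SETTINGS` are taken from the rebuttal; the dictionary keys are hypothetical.

```python
# Settings as stated in the rebuttal; the key names are hypothetical.
REPORTED_SETTINGS = {
    "gcn_nonlinearity": "ReLU",
    "transformer_heads": 8,
    "batch_size": 16,
    "learning_rate": 1e-5,
    "optimizer": "Adam",
    "epochs": 30,
}


def average_heads(attention):
    """Average attention scores over heads.

    attention: nested list indexed as attention[head][query][key]
        (assumed layout). Returns a [query][key] matrix.
    """
    num_heads = len(attention)
    num_queries = len(attention[0])
    num_keys = len(attention[0][0])
    return [
        [
            sum(attention[h][i][j] for h in range(num_heads)) / num_heads
            for j in range(num_keys)
        ]
        for i in range(num_queries)
    ]


# Toy example: 8 heads over 2 pathological components.
head_a = [[1.0, 0.0], [0.5, 0.5]]
head_b = [[0.0, 1.0], [0.5, 0.5]]
toy_attention = [head_a] * 4 + [head_b] * 4
heatmap = average_heads(toy_attention)
```

Averaging yields a single component-by-component matrix that can be drawn as a heatmap, at the cost of hiding any differences between individual heads.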
7. Model complexity (R2): The parameter size of our model is on the same level (≈10^1 Mb) as that of most MIL models. Moreover, during forward propagation, both patch aggregation and tissue assignment continuously reduce the number of features, thus occupying less memory. Even on the CRC dataset (the largest WSI contains >100k patches), our method can still run training and validation on a 3090 GPU.
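The reduction in feature count can be illustrated with a toy two-stage pooling sketch. This is only an illustration under strong assumptions: plain mean pooling stands in for the learned GCN-based patch aggregation and the cluster-guided tissue assignment, and the names `pool_by_group`, `superpixel_ids`, and `cluster_of` are hypothetical.

```python
def pool_by_group(features, groups):
    """Mean-pool feature vectors that share the same group id.

    Mean pooling is an assumption used for illustration; the paper's
    aggregation is learned.
    """
    sums, counts = {}, {}
    for vec, g in zip(features, groups):
        if g not in sums:
            sums[g] = [0.0] * len(vec)
            counts[g] = 0
        sums[g] = [s + v for s, v in zip(sums[g], vec)]
        counts[g] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Stage 1: four patch features fall into two superpixels (tissue level).
patch_features = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
superpixel_ids = [0, 0, 1, 1]
tissue_features = pool_by_group(patch_features, superpixel_ids)

# Stage 2: both tissues are assigned to the same cross-slide cluster,
# i.e., one pathological component.
cluster_of = {0: 0, 1: 0}
tissue_ids = sorted(tissue_features)
component_features = pool_by_group(
    [tissue_features[t] for t in tissue_ids],
    [cluster_of[t] for t in tissue_ids],
)
```

In this toy case the number of feature vectors shrinks from four patches to two tissues to one component, which mirrors the shrinking feature count described in the rebuttal.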
8. Evaluation procedure (R3): The 5-fold evaluation procedure we used is not a simple 5-fold cross-validation. During training, 25% of the training samples are split off as an extra validation set for choosing checkpoints.
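One possible reading of this procedure is sketched below. The rebuttal does not say whether the splits are random or stratified, so the random shuffling, the seed, and the function name `five_fold_with_validation` are all assumptions.

```python
import random


def five_fold_with_validation(sample_ids, n_folds=5, val_fraction=0.25, seed=0):
    """Build k folds; in each, carve a validation set out of the training part.

    Returns a list of dicts with disjoint 'train', 'val', and 'test' lists.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        rest = [s for j, fold in enumerate(folds) if j != k for s in fold]
        # Hold out a fraction of the remaining samples for checkpoint selection.
        n_val = int(len(rest) * val_fraction)
        splits.append({"train": rest[n_val:], "val": rest[:n_val], "test": test})
    return splits


splits = five_fold_with_validation(range(100))
```

With 100 samples, each split holds 60 training, 20 validation, and 20 test samples; checkpoints would be chosen on the validation part and reported on the held-out fold.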
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
In its current state, the study is not usable, which makes the work not fully reproducible. The explanation is also rather weak, and we suggest strengthening it so that the methodology can be adopted.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This study presents the development of a Hierarchical Graph Transformer (HGT) driven by multi-scope analysis. The goal is to capture both biological morphology and contextual interactions in Whole Slide Images for predicting cancer survival.
During the first round of review, the reviewers and AC appreciate the paper’s high-quality writing, effective design, and interpretability. However, they raise concerns about the absence of a hierarchical multi-scale design, the rigor of the evaluation process, the level of innovation, and the choice of hyperparameters. The author provides a comprehensive rebuttal, summarizing and addressing these concerns. Consequently, the paper receives two positive reviews and one negative review.
I believe this paper addresses an important clinical problem using a novel approach combining Graph Convolutional Networks and Transformers. This problem has persisted in the field of digital pathology, and the paper brings a fresh perspective. While the proposed solution may have some drawbacks and may not be the optimal one, it has the potential to spark interesting discussions and inspire new ideas at the MICCAI conference.
Based on these reasons, my recommendation tends towards accepting the paper.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors address most of the major concerns raised by the reviewers. There is a novelty in the methodology and results show the superiority of the work. However, the authors may provide a clearer explanation and insights into the model design, which can be supported by the experimental results or related references.