Authors

Ziyu Guo, Weiqin Zhao, Shujun Wang, Lequan Yu

Abstract

In computational pathology, the pyramid structure of gigapixel Whole Slide Images (WSIs) has recently been studied for capturing various information from individual cell interactions to tissue microenvironments. This hierarchical structure is believed to be beneficial for cancer diagnosis and prognosis tasks. However, most previous hierarchical WSI analysis works (1) only characterize local or global correlations within the WSI pyramids and (2) use only unidirectional interaction between different resolutions, leading to an incomplete picture of WSI pyramids. To this end, this paper presents a novel Hierarchical Interaction Graph-Transformer (i.e., HIGT) for WSI analysis. With Graph Neural Network and Transformer as the building commons, HIGT can learn both short-range local information and long-range global representation of the WSI pyramids. Considering that the information from different resolutions is complementary and can benefit each other during the learning process, we further design a novel Bidirectional Interaction module to establish communication between different levels within the WSI pyramids. Finally, we aggregate both coarse-grained and fine-grained features learned from different levels together for slide-level prediction. We evaluated our methods on two public WSI datasets from TCGA projects, i.e., kidney carcinoma (KICA) and esophageal carcinoma (ESCA). Experimental results show that our HIGT outperforms both hierarchical and non-hierarchical state-of-the-art methods on both tumor subtyping and staging tasks.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_73

SharedIt: https://rdcu.be/dnwKw

Link to the code repository

https://github.com/HKU-MedAI/HIGT

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript introduces a Hierarchical Interaction Graph-Transformer (HIGT) framework for the diagnosis of pyramid-structured gigapixel whole slide images (WSIs). The authors propose a new module, RAConv+, to learn short-range relationships, and use IHPool to progressively aggregate the hierarchical graph. In addition, they design a Bidirectional Interaction module to establish communication between different levels of the WSI pyramid, and a Fusion block to aggregate features from different resolution levels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A new framework incorporating hierarchical information aggregation and interaction is proposed for WSI diagnosis.
    2. They redesign the HIViT model to capture the short-range and long-range information among varying WSI magnifications.
    3. Separable Self-Attention is applied to reduce the high computational cost of the Graph-Transformer model.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the authors claim to propose a new framework, they mainly update the designs of some intermediate modules.
    2. The diagnosis tasks (grading/subtyping) used to demonstrate the performance of the proposed framework are rather simple, being merely binary classification.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not applicable

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The two datasets used to evaluate the proposed framework are relatively small. The TCGA-Lung dataset is relatively larger (more than 1,000 WSIs). Both evaluated tasks (grading and subtyping) are just binary classification. Evaluation on more complex tasks and larger datasets is desired to demonstrate the framework’s superiority.
    2. The framework schematic figure (Figure 1) is not intuitive. The two arrows on the Region-level Block are confusing. The connections from the first row to the second row are also not straightforward to grasp.
    3. For the ESCA dataset, no information regarding tumor typing is provided. It is also unclear why staging is more complex than typing (the performance on staging is far lower than on typing).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed framework and redesigned modules is limited. The evaluation is not rigorous enough.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    – Designed a Bidirectional Interaction module to establish communication between different resolution levels of WSIs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – The motivation is clear. The paper aimed to establish bidirectional communication between different levels within the WSI pyramids.
    – The improvement is significant. The paper achieved satisfactory improvement on two public datasets.
    – It is well organized and easy to read.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The contribution may be over-claimed. The main structure of the hierarchical representation of WSI followed [2]. The contribution of resolving the multi-resolution problem of WSI may be over-claimed.
    – The proposed solution is simple and crude. The authors directly added up features from different resolutions to establish bidirectional communication.
    – The mathematical formulations of the methods are not clear enough. In Section 2.3 (Bidirectional Interaction Block), the superscripts and subscripts were confusing, especially for the interaction process from patch nodes to region nodes.
    – Some experimental results are inconsistent with previous works. In the comparison experiments on the KICA dataset, the paper showed that DS-MIL [1] performed better than H2-MIL [2]. However, in the H2-MIL [2] paper, H2-MIL achieved much higher metrics on the KICA dataset in the same tasks.

    [1] Li, B., Li, Y., Eliceiri, K.W.: Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14318–14328 (2021)
    [2] Hou, W., Yu, L., Lin, C., Huang, H., Yu, R., Qin, J., Wang, L.: H^2-MIL: Exploring hierarchical representation with heterogeneous multiple instance learning for whole slide image analysis. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 933–941 (2022)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experiments were conducted on two publicly available datasets. The authors claimed the code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    – The authors could introduce more details about the Bidirectional Interaction (BI) Block. One of the main contributions of the paper is to achieve bidirectional communication between different resolutions. Relative to H2-MIL, the BI block is the main innovative module in the paper, and it should be described more clearly and in more detail.
    – Add an introduction to the experimental setup of the compared methods to ensure fairness and credibility of the comparison. HIPT [1] is a self-supervised pretraining framework utilizing hierarchical features in WSIs, rather than a weakly-supervised method for WSI classification like the proposed framework. The authors need to describe how a pretraining method is used for weakly-supervised classification.
    – Please elaborate on why you chose KimiaNet as the feature extractor and where/how KimiaNet was pre-trained.
    – The authors forgot to introduce the subtyping information of the ESCA dataset.

    [1] Chen, R.J., Chen, C., Li, Y., Chen, T.Y., Trister, A.D., Krishnan, R.G., Mahmood, F.: Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16144–16155 (June 2022)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    – The motivation is clear.
    – The improvements of performance on two public datasets are satisfactory.
    – The description of the original part of the method is not detailed enough.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposed a Hierarchical Interaction Graph-Transformer (HIGT). It combines Graph Transformer and multi-scale pyramid structure information of whole slide image. The experiments were conducted on two public datasets of TCGA and the results demonstrated the competitive performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The methodology section is well-written, providing a comprehensive and clear description of the proposed method. The language used is simple and easy to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The motivation of the innovations is not strong enough, and the role of the Bidirectional Interaction Block has not been fully explained.
    – The mean operation mentioned in Section 2.3 is not reflected in Fig. 1, which makes Fig. 1 somewhat confusing.
    – The dataset information is incomplete (regarding the typing task in the ESCA dataset) and the size is relatively small.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experimental data are from the public dataset. The authors say the code will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    – Clarify the motivation and details of the method, especially the Bidirectional Interaction Block section.
    – Perhaps some interpretable explanations or visualizations can be provided to assist in demonstrating the novelty of the method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The Bidirectional Interaction Block is interesting, but the motivation is not strong enough and the explanation is insufficient. The effectiveness of the module was only verified from experimental results, yet no convincing explanation or theoretical basis for bidirectional interaction was provided.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed comments. Reviewers acknowledge the methodological design of the method and the improvements in performance with respect to the baselines. However, the reviewers raised concerns about the simple experimental setting [R1], over-claim of contributions [R2], inconsistent experiments [R2] and missing details [R3]. Thus, the authors are invited to submit a rebuttal addressing the reviewers’ concerns.




Author Feedback

We thank the AC and the reviewers for their positive comments. We now address specific concerns raised by individual reviewers:

#AC Q1: Simple experimental setting and performance gap between 2 tasks (R1)

  • We followed the same experimental setting as H2-MIL. Tumor staging is inherently more complex than typing [1], as staging is prognosis-related and requires evaluating both morphological and structural features such as atypical cellular structures, potential tumor metastasis, tumor invasion depth, etc. Therefore, the performance gap exists in both our paper and previous works [1]. Since staging requires comprehensive features, the large performance improvement for staging validates the benefit of our bidirectional interaction design.
  • We also conducted additional experiments on relatively larger datasets: TCGA-LUNG (999 cases with 2 tumor types) and TCGA-RCC (906 cases with 3 tumor types). The mean AUC of ours and HIPT (most powerful SOTA) are 96.68 vs 94.97 and 98.99 vs 97.29 respectively, further showing the effectiveness of our method.

Q2: Further clarify the motivation and contribution, especially BI module (R1, R2, R3)

  • Our new framework aims to solve significant limitations of previous multi-resolution works, rather than “just intermediate module design updates” or solely “resolving the multi-resolution problem”. The previous multi-scale works (e.g., HIPT) (1) learn only global relations, (2) utilize unidirectional interactions between complementary resolution levels, and (3) have high computational costs.
  • Therefore, we develop (1) a novel hierarchical GNN-ViT structure to learn both local and global relations of the WSI pyramids; (2) a novel Bidirectional Interaction (BI) module to establish bidirectional communication between different resolutions, which allows complementary integration of information from different levels and thus improves the hierarchical modeling capabilities; and (3) iterative pooling and separable self-attention to decrease computational complexity.
  • Our method achieves a large performance improvement (especially for the important yet complex staging task) and a large reduction in computational cost (from quadratic to linear) over other SOTA methods.
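
The quadratic-to-linear claim can be illustrated with a minimal sketch of separable self-attention. The rebuttal does not give the formulation, so this assumes the MobileViT-v2 style variant, in which a single learned projection scores each token and one shared context vector replaces the n × n attention matrix. All names and shapes below (`separable_self_attention`, `w_i`, `w_k`, `w_v`, `w_o`) are hypothetical and are not taken from the HIGT code.

```python
import numpy as np


def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Assumed MobileViT-v2 style separable self-attention.

    x: (n, d) token features; w_i: (d, 1); w_k, w_v, w_o: (d, d).
    Every step costs O(n * d) or O(n * d^2); no (n, n) matrix is built.
    """
    scores = softmax(x @ w_i, axis=0)            # (n, 1), one score per token
    context = (scores * (x @ w_k)).sum(axis=0)   # (d,), shared context vector
    gated = np.maximum(x @ w_v, 0.0) * context   # (n, d), ReLU values scaled by context
    return gated @ w_o                           # (n, d)


# Illustrative random inputs (not real WSI features).
rng = np.random.default_rng(0)
n, d = 1000, 16
x = rng.normal(size=(n, d))
w_i = rng.normal(size=(d, 1)) / np.sqrt(d)
w_k, w_v, w_o = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = separable_self_attention(x, w_i, w_k, w_v, w_o)
```

Because the only interaction between tokens goes through the single `context` vector, doubling the number of tokens roughly doubles the work, whereas standard self-attention would quadruple it.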

Q3: Inconsistent experimental results (R2)

  • We used the code released by H2-MIL and applied hyperparameter tuning to ensure fairness. The inconsistency might be due to unknown implementation details in H2-MIL.

Q4: Missing information about ESCA dataset (R1, R2 and R3)

  • The ESCA dataset has 67 squamous cell carcinoma cases and 94 adenocarcinoma cases.

#R1 (also refer to Q1, 2, 4)

Q5: Framework schematic figure

  • The blue and green arrows in Figure 1 represent the features of region/patch tokens. We will add the mean operation in the BI module to Figure 1.

#R2 (also refer to Q2, 3, 4)

Q6: The simple design of BI module

  • The motivation of the BI module is to learn complementary interaction between different resolution levels. While it is simple, we find it significantly improves the performance (see Table 3), especially for the more complex staging task, which requires comprehensive multi-scale features. We will explore more advanced designs (e.g., cross attention) in the future.
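
As a rough illustration of what the reviews and this answer describe (features "added up" across resolutions, plus a mean operation from patches to regions), the following sketch shows one possible bidirectional update. It is an assumption, not the authors' implementation: the function name `bidirectional_interaction`, the index array `region_of_patch`, and the order of the two steps are all hypothetical.

```python
import numpy as np


def bidirectional_interaction(patches, regions, region_of_patch):
    """Hypothetical additive bidirectional update between two pyramid levels.

    patches: (n_p, d) patch tokens; regions: (n_r, d) region tokens;
    region_of_patch: (n_p,) index of the parent region of each patch.
    """
    # Region -> patch: each patch token receives its parent region token.
    patches_out = patches + regions[region_of_patch]

    # Patch -> region: each region receives the mean of its updated child patches.
    sums = np.zeros_like(regions)
    np.add.at(sums, region_of_patch, patches_out)
    counts = np.bincount(region_of_patch, minlength=regions.shape[0])
    means = sums / np.maximum(counts, 1)[:, None]  # guard against empty regions
    regions_out = regions + means
    return patches_out, regions_out


# Toy example: three patches, the first two belong to region 0, the last to region 1.
patches = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
regions = np.array([[10.0, 10.0], [20.0, 20.0]])
region_of_patch = np.array([0, 0, 1])
p_out, r_out = bidirectional_interaction(patches, regions, region_of_patch)
```

A cross-attention variant, mentioned above as future work, would replace the plain addition and mean with learned, content-dependent weights.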

Q7: Other comments and mathematical formulations

  • KimiaNet is trained on 240,000 histopathology patches. We use it to maintain consistency with H2-MIL.
  • The symbols \bar{P}, \hat{P}, and \hat{R} are intermediate variables for patch/region tokens. The superscript l denotes the l-th ViT Block.
  • Although HIPT is a self-supervised pretraining framework, its paper also reports WSI classification experiments, and we follow the same setup.

#R3 (also refer to Q2, 4)

Q8: Interpretable explanation

  • We will visualize the learned structures and attention scores to help understand our method in the revised version.

[1] Zhao et al.: MulGT: Multi-task Graph-Transformer with Task-aware Knowledge Injection and Domain Knowledge-driven Pooling for Whole Slide Image Analysis. AAAI 2023.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a hierarchical interaction graph-transformer for analyzing WSIs. The method is based on graph neural networks and transformers, learning both short-range local information and long-range global representation of the WSI pyramids. The reviewers raised some concerns about the experimental setting, the contributions and inconsistent experiments. The authors submitted a rebuttal to address these points and provide clarifications and additional comparisons. The meta-reviewer thinks that the authors' answers clarified several points and that the paper could be interesting for MICCAI. The authors need to add these clarifications to the final version of the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a Hierarchical Interaction Graph-Transformer framework (HIGT) to simultaneously capture both local and global information from WSI pyramids via a novel Bidirectional Interaction module. Reviewers acknowledge the methodological design of the method and the improvements in performance with respect to the baselines but have concerns regarding experimental settings and missing details. In my opinion, the authors' rebuttal has addressed these issues well, and therefore I suggest accepting the paper, especially given its strong performance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In this paper, the authors propose a Hierarchical Graph-Transformer framework for weakly supervised learning on whole slide images. In the initial reviews, the following strengths have been identified: the approach sensibly integrates hierarchical information as well as short- and long-range information, and addresses the high computational costs of Graph-Transformers. Criticisms include the incremental nature of the proposed approach and a limited evaluation, as well as inconsistent results compared to published work.

    The authors discuss these aspects in their rebuttal, though certain questions stay open: if there is a performance difference of over 0.1 compared to published work [7] on the same dataset, this requires discussion, and a simple pointer to hyperparameter tuning is not sufficient. The results reported in the rebuttal for TCGA-Lung and TCGA-RCC also don’t seem to match published work [3], albeit with a smaller difference. While this may be an issue in [7] or in the current paper, it is unclear to what extent results may have been cherry-picked, and from my perspective these results need to be discussed. Lastly, I would like to mention that the review-rebuttal process at MICCAI is not intended for, and not fully suited to, assessing the role and validity of new results during the rebuttal phase. While the approach is interesting in general, the current inconsistencies cause me to rate this paper below the acceptance threshold for MICCAI.


