
Authors

Wentai Hou, Helong Huang, Qiong Peng, Rongshan Yu, Lequan Yu, Liansheng Wang

Abstract

Graph neural networks (GNNs) have achieved tremendous success in histological image classification, as they can explicitly model the notion and interaction of different biological entities (e.g., cells and tissues). However, the potential of GNNs has not been fully unleashed for histological image analysis due to (1) the fixed design mode of the graph structure and (2) the insufficient interactions between multi-level entities. In this paper, we propose a novel spatial-hierarchical GNN framework (SHGNN) equipped with a dynamic structure learning (DSL) module for effective histological image classification. Compared with traditional GNNs, the proposed framework has two compelling characteristics. First, the DSL module integrates the positional attributes and semantic representations of entities to learn their adjacency relationships during the training process. Second, the proposed SHGNN can extract rich and discriminative features by mining the spatial features of different entities via graph convolutions and aggregating the semantics of multi-level entities via a vision transformer (ViT) based interaction mechanism. We evaluate the proposed framework on our collected colorectal cancer staging (CRCS) dataset and the public breast carcinoma subtyping (BRACS) dataset. Experimental results demonstrate that our proposed method yields superior classification results compared to state-of-the-art methods.
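For illustration only, the dynamic structure learning idea described in the abstract (jointly embedding entity positions and semantic features with learnable layers, then deriving the adjacency from the learned embedding) can be prototyped in a few lines of PyTorch. This is a minimal sketch under assumed shapes (2-D entity centroids, a PyTorch-Geometric-style edge index); the class and variable names are illustrative and are not taken from the authors' code.

import torch
import torch.nn as nn

class DynamicStructureLearning(nn.Module):
    # Jointly embeds entity positions and semantic features with learnable
    # linear layers, then builds a KNN adjacency in the learned space.
    def __init__(self, feat_dim, embed_dim=64, k=8):
        super().__init__()
        self.pos_proj = nn.Linear(2, embed_dim)          # assumes 2-D entity centroids
        self.feat_proj = nn.Linear(feat_dim, embed_dim)  # semantic representation of each entity
        self.k = k

    def forward(self, pos, feat):
        # pos: (N, 2) coordinates, feat: (N, feat_dim) features of N entities
        joint = self.pos_proj(pos) + self.feat_proj(feat)
        dist = torch.cdist(joint, joint)                 # pairwise distances in the learned space
        mask = torch.eye(dist.size(0), dtype=torch.bool, device=dist.device)
        dist = dist.masked_fill(mask, float("inf"))      # exclude self-loops
        knn = dist.topk(self.k, largest=False).indices   # (N, k) nearest neighbours per entity
        src = torch.arange(pos.size(0), device=pos.device).repeat_interleave(self.k)
        return torch.stack([src, knn.reshape(-1)], dim=0)  # (2, N*k) edge index

# The adjacency is recomputed from the current embeddings at every forward
# pass, so the graph structure evolves as the embeddings are trained.
dsl = DynamicStructureLearning(feat_dim=512)
edge_index = dsl(torch.rand(100, 2), torch.rand(100, 512))   # -> shape (2, 800)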

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_18

SharedIt: https://rdcu.be/cVRrs

Link to the code repository

https://github.com/HeLongHuang/SHGNN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes a new graph neural network (GNN) method for the classification of histology images. The two contributions of the proposed method are, first, a dynamic graph structure that is learnt as part of the training stage and, second, a vision transformer mechanism that improves feature extraction by exploiting the multi-level structure of the graph.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed GNN method dynamically learns the graph structure, using feature representations and positional attributes to connect nodes, i.e. entities. Feature extraction and aggregation for classification are improved with a vision transformer, which boosts performance.
    • The evaluation has been done on two datasets and the proposed method has been compared to relevant competitors based on quantitative metrics.
    • Use of the proposed method can result in the generation of more explainable graphs that can better relate to the medical explanation of the problem and can therefore result in more reliable / accurate results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • When explaining the multi-level structure of the proposed framework, it would be informative to mention the number of levels used in the experiments and to explain its effect as a hyper-parameter.
    • The literature review could have been enriched on the use of vision transformers in previous work.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code has been provided to regenerate the results. Part of the experiments has been done on the BRACS dataset, which is publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Even though the flow of the text is good, there is a relatively large number of grammatical mistakes, mostly in the use of articles such as ‘a’, ‘an’, ‘the’.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • This paper clearly explains the problem it addresses, which is of clinical importance for disease diagnosis through histology image classification.
    • Most relevant prior works have been addressed and their shortcomings have been explained.
    • The proposed method has been designed to address the initially presented problems.
    • Claims have been addressed through the experiments and the story makes good sense.
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a spatial-hierarchical GNN framework including a dynamic structure learning module to explore the spatial topology and hierarchical dependency of the multi-level biological entities in order to improve histological image classification. The proposed framework was evaluated on two datasets and results demonstrate that it outperformed the state-of-the-art methods on both datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper has a clear structure, is well written, and is overall easy to follow.
    • The method is described in detail and enables interpretability of the prediction outcomes.
    • The code is open sourced and one of the datasets is publicly available.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Despite the detailed description of the graph-based metrics, the authors do not explain how the training strategy was designed. They mention that they used 5 repetitions of 3-fold cross-validation, but they do not explain whether training is done in an end-to-end fashion.
    • Several spatial graph convolution methods exist in the literature. Why was GraphSAGE chosen in particular?
    • The rationale for choosing a transformer is not clear. Why not use a GAT network, which is also based on an attention mechanism?
    • The limitations of this framework need to be discussed.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The checked points in the reproducibility checklist perfectly match the information provided in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • In the first paragraph, the citations start with [6, 28]; it would be better to see the references cited in order, e.g., starting with [1, 2].
    • Please see the weakness section.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and shows a good validation on multiple datasets; however, there are some concerns regarding the details of GraphSAGE and the transformer. For this reason I chose an “accept” rating.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a method for the classification of histopathological images. The method is based on a graph neural network (GNN) approach that introduces dynamic structure learning. Moreover, a vision-transformer-based approach is used for the final classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the concept of dynamic structure learning in GNNs is, to my knowledge, new
    • the use of a vision transformer within a GNN is a novel and particularly interesting approach
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • the results are not very convincing, especially on the BRACS dataset; further results should be added
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Beyond the presence or absence of the code, further reproducibility details could be given, for example the stopping criterion used to select the best model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is well written and well organized. The method is clear and the comparisons are sufficient. However, I have some suggestions:

    • Could the authors clarify how the structure of the graphs changes dynamically? What is meant by dynamic change? Does it change during each epoch?
    • In subsection 2.3 it is written “Specifically, each hierarchical sequence is tokenized and attached with positional embedding as the input of a Transformer encoder consisting of Multi-Headed Self-Attention [26], layer normalization (LN) [4] and MLP blocks.” How does the tokenization of the graph for the ViT happen? Please clarify.
    • Further information on reproducibility should be added, for example the early-stopping criterion used to select the best model.
    • With particular reference to the BRACS dataset, cited papers such as HACT report results in terms of the F1 measure; why did the authors report results only in terms of AUC? AUC, like accuracy, does not take into account the imbalance of the dataset, which is particularly evident in BRACS. The F1 measure, in contrast, is suited to imbalanced datasets, and I suggest adding this measure as well.
    • Were the comparisons performed on the same dataset for all methods? HACT used a previous version of the dataset, so I would make sure of this.
    • The reference to the CRCS dataset is missing. Moreover, the correct reference for BRACS is not [22], but the following: Brancati, N., Anniciello, A.M., Pati, P., Riccio, D., Scognamiglio, G., Jaume, G., et al.: BRACS: A Dataset for BReAst Carcinoma Subtyping in H&E Histology Images. arXiv preprint arXiv:2111.04740 (2021).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results are not very convincing. It would be appreciated if F1 measures were added for the experiments.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a histopathology image classification method based on a spatial-hierarchical graph neural network. The method includes a learned dynamic graph structure and the use of a vision transformer to improve feature extraction. Evaluation is conducted on a private dataset and BRACS, and the results show improved performance over recent graph-based methods. Overall the method design is interesting and the paper is well written. Reviewers have asked some good questions mainly regarding design choices, some method details, the use of datasets and the choice of evaluation metric. Why is only AUC reported? This is quite limited. Are all compared approaches using the same BRACS dataset? The paper should be improved to address these questions.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

Thank you for your affirmative comments and valuable suggestions. Reviewers have asked some good questions, mainly regarding design choices, some method details, the use of datasets and the choice of evaluation metric, which we believe can all be easily addressed in the final version. Detailed responses are as follows.

—Grammatical mistakes. We will carefully check the writing of the article and correct the grammatical mistakes in the final version.

—Training strategy. The dynamic structure learning module and the spatial-hierarchical graph neural network are trained in an end-to-end manner to obtain task-related information. We will emphasize this in the final version of the paper.

—Why GraphSAGE? Thanks for your constructive comments. In principle, any spatial graph convolution can be used to explore the spatial relationships between entities at the same level. We will further explore the impact of different convolution operations and modify the relevant expressions in the final version.

—Why transformer? The connection relationships between nodes at different levels are unknown, but their hierarchical attributes (i.e., positional encodings) are determined. Therefore, a transformer is more suitable than a GAT for exploring the hierarchical relationships between nodes at different levels.

—Our limitations. The limitation of our method lies in its larger model complexity, which mainly comes from the extraction of multi-level entities.

—How does the DSL module work? The DSL module jointly represents the positional attribute and feature attribute of each entity through learnable linear layers. Then, based on the learned joint representation, the graph structure is constructed by an online KNN algorithm. The DSL module is trained end-to-end with the SHGNN, so the graph structure changes at each iteration.

—Model choice. To reduce the bias caused by data splitting, we evaluate the performance of the model through 3-fold cross-validation. For the training dataset, we randomly split off 20% of the samples to tune model parameters.

—Tokenization in ViT. Following the standard ViT, the tokenization process for each hierarchical sequence contains two steps. The first is node embedding by a linear layer (nn.Linear); the second is positional encoding, which introduces the level information through a trainable parameter (nn.Parameter). (See the illustrative sketch after the references below.)

—Evaluation metric. We chose AUC because it is not affected by the decision threshold or dataset balance, and can therefore more robustly represent the performance of multi-class classification [1]. Thanks for the reviewer's suggestion. We compared the comprehensive indicators (AUC and F1) of all methods, as shown in the table below. It can be seen that our algorithm reaches the leading performance, except that the F1 on the BRACS dataset is lower than that of HACT. This may be because HACT uses a more complex graph convolution operation (PNAConv), while our algorithm only uses the simplest spatial convolution (GraphSAGE). If we adjust the convolution operator used in SHGNN, there is still much room for improvement in the performance of our model.

Method     AUC (CRCS)   AUC (BRACS)   F1 (CRCS)   F1 (BRACS)
GoogleNet  92.61        89.16         78.62       60.73
ADN        91.36        89.57         83.31       67.22
CGC-Net    92.05        93.15         78.18       67.78
HACT       94.61        93.61         84.35       74.01
Ours       96.11        95.01         85.55       72.06

—Dataset and citation. All comparisons were performed on the same dataset. CRCS is our collected colonoscopy pathological image dataset. The BRACS dataset is the previous version downloaded from [2]. Thanks for the reviewer's suggestion; we will cite the correct dataset reference in the final version.

References
[1] https://scikit-learn.org/stable/
[2] https://www.bracs.icar.cnr.it/
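As an editorial illustration of the tokenization described in the author feedback above (node embedding via nn.Linear plus a trainable positional/level encoding via nn.Parameter, fed to a Transformer encoder), a minimal PyTorch sketch follows. The class name, dimensions, number of levels and the mean pooling are assumptions made for the example, not the authors' implementation.

import torch
import torch.nn as nn

class HierarchicalTokenizer(nn.Module):
    # Embeds each node of a hierarchical sequence with a linear layer and adds a
    # learnable level (positional) encoding before a standard Transformer encoder.
    def __init__(self, node_dim, token_dim=128, num_levels=3, num_heads=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(node_dim, token_dim)                         # token embedding (nn.Linear)
        self.level_pos = nn.Parameter(torch.zeros(num_levels, token_dim))   # level encoding (nn.Parameter)
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, seq, level_ids):
        # seq: (B, L, node_dim) hierarchical sequence; level_ids: (L,) level index of each token
        tokens = self.embed(seq) + self.level_pos[level_ids]
        return self.encoder(tokens).mean(dim=1)            # pooled sequence representation

tok = HierarchicalTokenizer(node_dim=64)
out = tok(torch.rand(2, 6, 64), torch.tensor([0, 0, 1, 1, 2, 2]))   # -> shape (2, 128)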


