
Authors

Yushan Zheng, Jun Li, Jun Shi, Fengying Xie, Zhiguo Jiang

Abstract

Transformers have been widely used in histopathology whole slide image (WSI) classification for purposes such as tumor grading and prognosis analysis. However, the token-wise self-attention and positional embedding strategy of the common Transformer limit its effectiveness and efficiency when applied to gigapixel histopathology images. In this paper, we propose a kernel attention Transformer (KAT) for histopathology WSI classification. Information transmission among the tokens is achieved by cross-attention between the tokens and a set of kernels related to the spatial relationship of the tokens on the WSI. Compared to the common Transformer structure, the proposed KAT can describe the hierarchical context information of the local regions of the WSI and is thereby more effective for histopathology WSI analysis. Meanwhile, the kernel-based cross-attention paradigm sharply reduces the computational cost. The proposed method was evaluated on a gastric dataset with 2040 WSIs and an endometrial dataset with 2560 WSIs, and was compared with 5 state-of-the-art methods. The experimental results demonstrate that the proposed KAT is effective and efficient in histopathology WSI classification and is superior to the state-of-the-art methods.
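To make the efficiency claim concrete, the following minimal NumPy sketch contrasts standard token-wise self-attention, whose cost grows as O(N²·d) in the number of tokens N, with a kernel-style cross-attention in which N tokens exchange information through K ≪ N kernels at O(N·K·d) cost. This is an illustrative sketch, not the authors' implementation: the projection matrices are omitted, and all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_self_attention(tokens, d):
    # Standard attention: every token attends to every token -> an (N, N)
    # score matrix, O(N^2 * d). Query/key/value projections omitted.
    scores = tokens @ tokens.T / np.sqrt(d)   # (N, N)
    return softmax(scores) @ tokens           # (N, d)

def kernel_cross_attention(tokens, kernels, d):
    # Step 1: the K kernels gather information from all N tokens, O(N*K*d).
    gather = softmax(kernels @ tokens.T / np.sqrt(d)) @ tokens    # (K, d)
    # Step 2: the gathered summaries are broadcast back to the tokens,
    # again O(N*K*d). No (N, N) matrix is ever formed.
    return softmax(tokens @ gather.T / np.sqrt(d)) @ gather       # (N, d)

rng = np.random.default_rng(0)
N, K, d = 4096, 64, 32                 # thousands of patch tokens, few kernels
tokens = rng.standard_normal((N, d))
kernels = rng.standard_normal((K, d))
out = kernel_cross_attention(tokens, kernels, d)
print(out.shape)  # (4096, 32)
```

For N = 4096 and K = 64, the attention matrices shrink from 4096×4096 to 4096×64, which is the source of the claimed reduction in computation.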

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_28

SharedIt: https://rdcu.be/cVRrL

Link to the code repository

https://github.com/zhengyushan/kat

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a novel framework for WSI classification. Its main contributions are: 1) leveraging the spatial relationship between the patches and the kernels (a set of representative anchor points), and 2) introducing a novel attention module for ViT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths: 1) it constructs an abstract concept over the patches; 2) it proposes the kernel attention module, whose main novelties are: (a) reducing computational cost by replacing patch-to-patch attention with patch-to-anchor attention, and (b) introducing hierarchical context information by adjusting N, a hyperparameter that controls the scope over which self-attention is computed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) In Section 2.1, I’m confused as to why the foreground mask needs to be applied to both the tiled WSI and the feature matrix. In my opinion, if you have already divided the tissue region into patches based on the mask, only patches of the tissue region remain, so you need not filter the patches again. 2) In Fig. 2, you could explain what the T operation is. 3) It is unclear whether your results are statistically significant. 4) This paper is interesting, but why not perform experiments on another public dataset (e.g., Camelyon16) to prove its general effectiveness?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is some confusion in the pre-processing and data-preparation steps. It would be better to explain them in more detail or to give pseudocode in the supplementary material.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please see 5.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) It handles the spatial relationship between patch-level and WSI-level information very well. 2) The paper implements hierarchical context information of the local regions in an innovative way. I vote for weak accept because the authors did not evaluate their method on a mainstream dataset such as Camelyon16.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposed a kernel attention transformer (KAT) for WSI classification. Information transmission among the tokens is achieved by cross-attention between the tokens and a set of kernels related to the spatial relationship of the tokens on the WSI. KAT was evaluated on a gastric dataset with ~2K WSIs and an endometrial dataset with ~2.5K WSIs. The results show that the proposed KAT is effective and efficient in WSI classification and superior to state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel transformer named kernel attention transformer (KAT) is proposed. It can describe hierarchical context information of the local regions of the WSI and thereby is more effective.
    2. A very large (~ 5K WSIs) dataset is collected for experiments. Results have shown the effectiveness of the proposed KAT.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. One main issue is the lack of clear details presenting the motivation. The motivation for using kernel attention in the Transformer is not very clear.
    2. KAT does not outperform the baselines by large margins, which challenges the claims about its effectiveness and efficiency. Some representative WSI models are not compared.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is not clear whether the code will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The main contribution is the proposed kernel attention transformer. However, the motivation is not clearly presented in the paper. The authors claim that “It makes the KAT be able to learn hierarchical representations from the local to global scale of the WSI, and thereby delivers better WSI classification performance.”, but we could not understand this novelty from Fig. 1, i.e., why such kernel attention is useful for WSI classification.
    2. Also, the authors did not explain Fig. 1 clearly in the methodology section, for example, how to choose the anchors on the WSI and how to perform soft-masking and generate the anchor-related masks shown in Fig. 1(g).
    3. In Table 2, KAT does not outperform TransMIL on the Gastric-2K dataset in terms of accuracy, sensitivity, and specificity in several tasks. It is also unclear whether the results on the Endometrial-2K dataset are significantly better than the baselines. Moreover, the proposed KAT is not the fastest method, so it is not easy to understand in what sense KAT is efficient and effective.
    4. Standard deviations should be reported, since the authors said they used 5-fold cross-validation. ROC curves should also be presented, and statistical tests could be performed to compare the curves of each model.
    5. Attention-based MIL models should be discussed and compared, because they have been widely used in WSI classification and diagnosis. “Data-efficient and weakly supervised computational pathology on whole-slide images.”, Nature BME, 2021. “Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks.”, Medical Image Analysis, 2020.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although this paper proposes a novel kernel attention transformer, the motivation is not clearly presented. The authors should emphasize why their framework is particularly suitable for WSI classification; otherwise it merely applies a fancy technique. The experiments cannot support the claim that the proposed model is more effective and efficient than the baselines, since in some cases the proposed KAT does not outperform TransMIL or Nystromformer.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal, some of my concerns are addressed. However, my concern about the marginal improvements is not well resolved, and statistical tests should be used to assess the significance of the AUCs.



Review #3

  • Please describe the contribution of the paper

    The paper proposes the Kernel Attention Transformer for classification of WSIs, which builds on the promise of ViTs whilst addressing some of the issues they face when confronted with WSIs. It also provided details on the pipeline into which this network fits.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written and the methods clearly described with all details provided. The authors have included computational and memory requirements, which is a useful addition. There is a thorough and fair appraisal of the method compared to SOTA and, although no confidence intervals are provided, the method’s value is clearly shown. The method draws upon a number of already high-pedigree methods (ViT, EfficientNet, etc.) but adds credible novelty by combining them with other techniques, such as k-means to compute the anchor points for the weighting masks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Very few. If I had to find one, perhaps it would be the arguably incremental nature of the innovation. With some of the metrics showing quite marginal gains, it would be useful to see confidence intervals for these figures.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code availability is mentioned and the training dataset is not explicitly mentioned. The architecture is well-described but it would be difficult to reproduce the results from this paper alone.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    “is the most appreciate for the dataset.” - “appreciate” should be “optimal”? Very little worth adding - this is a great paper and I won’t waste our time trying to find any further niggles.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is really well written, the results are strong and the discussion about other methods demonstrates an awareness and respect for the latest advances in the field. Although there is little in the way of architectural or algorithmic details, this is clearly a strong piece of work that deserves a place at MICCAI.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a kernel attention transformer model for WSI classification. The reviewers raise several concerns about the unclear motivation, lack of baselines, lack of technical and experimental details, insufficient results analysis, etc. We invite the authors to carefully address these concerns in rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

The rebuttal addresses the following four concerns.

  1. Unclear Motivation This work was motivated by the problems of ViT regarding structure description and computational complexity in histopathology WSI analysis. The positional embedding in ViT was designed for natural-scene image datasets, e.g., ImageNet, where all images have the same size. The image patches are arranged in a consistent sequence for the ViT tokens, which ensures that the tokens receive consistent structural information throughout the training and inference stages. However, the size and shape of histopathology WSIs are not fixed, and the tissue region varies greatly across WSIs. As a result, the patches assigned to a given token from different WSIs are positionally inconsistent or even positionally conflicting, especially under the setting where only the foreground features are extracted to fill the ViT tokens. This makes it difficult for ViT to be aware of the structural differences among WSIs, and thereby limits its capacity on fine-grained WSI classification tasks that rely on tissue distribution, e.g., subtyping and grading. Moreover, the self-attention operation is computationally inefficient when facing thousands of tokens per WSI. TransMIL tries to solve the above problems, but its design still assumes that the tissue is a fixed-size square region, so the positional inconsistency and conflict problems are not properly tackled. We believe there are more effective and explicit approaches to describing the structural information of tissue for ViT while maintaining low computational complexity. To achieve this goal, we proposed the anchor-based WSI description approach and the corresponding KAT model. The anchors are adaptively allocated based on the size and shape of the tissue, and are then used to gather and broadcast information within multiple scopes of their nearby regions via the cross-attention operation between the kernels and tokens in KAT. In this way, the structural information of the WSI is explicitly and efficiently built into KAT, which is verified to be effective on the fine-grained WSI subtyping tasks.
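As a rough illustration of the anchor mechanism described above (a sketch under assumed details, not the authors' released implementation): anchors can be allocated by clustering the foreground patch coordinates with k-means, so they adapt to the size and shape of the tissue, and each anchor can then be given soft masks at several spatial scopes that weight tokens by their distance to the anchor. The Gaussian mask form, the scope values, and all names here are illustrative assumptions.

```python
import numpy as np

def kmeans(coords, k, iters=20, seed=0):
    # Plain k-means on patch (row, col) coordinates; the resulting centers
    # serve as anchors that adapt to the tissue's size and shape.
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(len(coords), k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(coords[:, None] - centers[None], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            pts = coords[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def soft_masks(coords, anchors, scopes=(1.0, 2.0, 4.0)):
    # One Gaussian soft mask per (scope, anchor) pair: tokens near an anchor
    # get weights near 1; distant tokens are softly masked out. Larger
    # scopes cover wider regions, giving the hierarchical (local-to-global)
    # context described in the rebuttal.
    dist = np.linalg.norm(coords[:, None] - anchors[None], axis=-1)  # (N, K)
    return np.stack([np.exp(-(dist / s) ** 2) for s in scopes])      # (S, N, K)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(500, 2))   # foreground patch positions
anchors = kmeans(coords, k=8)
masks = soft_masks(coords, anchors)
print(anchors.shape, masks.shape)  # (8, 2) (3, 500, 8)
```

The masks would then modulate the kernel-token cross-attention so that each kernel gathers information only from its neighborhood at a given scope.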

  2. Lack of Baselines The reason we did not take CLAM and DeepAttnMISL (Reviewer #2) as baselines was that their results showed a significant gap to the SOTA, as reported in the TransMIL and PatchGCN papers. We have now evaluated CLAM under our experimental settings and found that it achieved an AUC of 0.951/0.929 in the binary classification task on the Endometrial/Gastric dataset, which is 3.2%/3.8% inferior to our method, and a macro-AUC of 0.791/0.790 in the subtyping tasks, which is 4.4%/6.5% inferior to our method.
  3. Insufficient results analysis The area under the ROC curve (AUC) is the most commonly used measure of the overall performance of a classifier, obtained by statistically evaluating the ROC curve, so we took AUCs as the major metrics in the evaluation. Accuracy, sensitivity, and specificity were used to illustrate the category preference of the model under the default decision threshold (via the argmax function) and were secondary metrics. We observed that the AUC/macro-AUC of our method is 0.9%~2.9%/2.4%~4.1% superior to the second-best methods in the binary/subtyping classification tasks on the two datasets. We therefore concluded that our method is more effective than the others.
  4. Lack of technical and experimental details – The details of “how to choose the anchors on the WSI and how to perform soft-masking and generate anchor-related masks” are given in the second half of Section 2.1 with Eqs. 1~4. (Reviewer #2) – We directly divided each dataset into train/validation/test parts to conduct the experiments, because our datasets are relatively large. We did not conduct 5-fold cross-validation, nor did we claim to do so. (Reviewer #2) – We will elaborate the details as much as the writing space allows. – We will publish the code and an implementation on a public dataset to cover the technical and experimental details.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a kernel attention transformer model for WSI classification by exploring hierarchical structural information. The proposed method addresses the computational inefficiency problem of WSI learning, which benefits the community. The rebuttal has addressed most of the reviewers’ concerns. It is suggested that the authors consider the reviewers’ comments, and include more technical details and results discussions in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a kernel attention transformer to classify WSIs. In the first-round review, the reviewers’ concerns focused on the motivation, the rigor of the experiments, and the effectiveness of the methodological design. Two reviewers gave positive reviews, while the third gave a negative one. My major concern is the marginal improvement of the method, which was still not fundamentally addressed in the rebuttal; the other concerns have been addressed by the authors.
    For these reasons, my overall recommendation is toward acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a kernel attention transformer (KAT) for histopathology WSI classification. The reviewers initially had diverse views about the paper, yet some of the negative comments, e.g., regarding the motivation of the method, were addressed by the authors’ rebuttal. I tend to accept the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


