Authors

Daniel Reisenbüchler, Sophia J. Wagner, Melanie Boxberg, Tingying Peng

Abstract

Classical multiple instance learning (MIL) methods are often based on the independent and identically distributed (i.i.d.) assumption between instances, hence neglecting the potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper, we question: Is global relation modeling using self-attention necessary, or can we appropriately restrict self-attention calculations to local regimes in large-scale whole slide images (WSIs)? We propose a general-purpose local attention graph-based Transformer for MIL (LA-MIL), introducing an inductive bias by explicitly contextualizing instances in adaptive local regimes of arbitrary size. Additionally, an efficiently adapted loss function enables our approach to learn expressive WSI embeddings for the joint analysis of multiple biomarkers. We demonstrate that LA-MIL achieves state-of-the-art results in mutation prediction for gastrointestinal cancer, outperforming existing models on important biomarkers such as microsatellite instability for colorectal cancer. Our findings suggest that local self-attention sufficiently models dependencies on par with global modules. Our implementation will be published.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_37

SharedIt: https://rdcu.be/cVRrU

Link to the code repository

https://github.com/agentdr1/LA_MIL

Link to the dataset(s)

https://portal.gdc.cancer.gov

https://zenodo.org/record/3784345#.Yrq6TC0RphE

http://www.cbioportal.org

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a local attention graph-based Transformer for multiple instance learning with an efficiently adapted loss function to learn expressive WSI embeddings for the joint analysis of microsatellite instability and mutation prediction.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

A framework integrates a novel local attention graph-based Transformer that restricts self-attention calculations in Transformers by using kNN graphs to model regional regimes within a tile.

Validations were done on two TCGA datasets for gastrointestinal cancer with reasonable baselines.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

There are no explicit major concerns; however, the following relevant work should be discussed:

Shao, Zhuchen, et al. “TransMIL: Transformer based correlated multiple instance learning for whole slide image classification.” Advances in Neural Information Processing Systems 34 (2021).

Please check my minor comments at 7.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors claimed that the implementation would be published. The paper should be reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

I assume all the results are validated with a statistical test; please clarify.

Sensitivity analysis or relevant discussion should be added for the trade-off of using LA-MIL and T-MIL.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The first attempt to predict microsatellite instability or tumor mutational burden with genetic alterations.

The first transformer-based approach for mutation prediction.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors proposed LA-MIL, a local attention-graph based transformer for multiple instance learning in whole slide images. The authors demonstrated LA-MIL could effectively predict microsatellite instability and tumor mutational burden jointly with genetic alterations in gastrointestinal cancer.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method addresses important issues in current MIL methods for WSIs. For example, passing bag labels to instances may introduce noise, as instance labels might be inconsistent with bag labels. Also, many existing methods do not leverage the spatial correlation between instances, which contains important information in WSIs.

The method is clearly described and the visualization helps to understand the proposed method more intuitively.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

While the paper is well-written, the experiments part of the paper could be further strengthened to better demonstrate the effectiveness of the proposed method.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is good. The authors provided necessary details from the reproducibility checklist.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. Multi-target prediction is very useful, as it allows predicting different phenotypes through one single model. However, it would be interesting to see whether these tasks help each other for more effective prediction. The authors could perform additional experiments to train and predict each task individually using the proposed method and compare with the multi-target prediction results.
2. For the results in Table 2, T-MIL and LA-MIL are reported with standard deviations, whereas the existing methods are not. Were these results generated through predictions on the same set of cross-validation partitions? If not, please ensure that the compared methods are trained and tested on the same training and testing data for a fair comparison.
3. For the visualization in Fig.3, could the author also include visualization for T-MIL as it seems to have similar performance as LA-MIL?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper addresses important issues in applying multiple instance learning to prediction tasks in whole slide images.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

A local-attention graph-based transformer model is proposed to restrict self-attention calculations, which improves the performance of mutation prediction compared with the state-of-the-art methods in many cases.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed approach is well motivated, with a good analysis of the problem at hand, a recent deep learning based approach to solving it, careful experiments, a clear analysis of the results, and a comparison with state-of-the-art methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Nothing
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Datasets and experimental setup are clearly mentioned.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Nothing
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Main strengths of the paper.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

In this paper, the author proposed a local attention graph-based Transformer for MIL problems (LA-MIL) of WSI data. The model is tested on mutation prediction for gastrointestinal cancer and some improvement is shown.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

I can see some new components in the proposed method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Part of the paper is confusing to me. It also lacks an ablation study on the impact of parameters. The gain of the proposed method is rather marginal and not consistent.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors claim to release the model code only. The selection of some parameters, such as k for kNN, is not clear. I can see some difficulty in reproducing the results with this information alone.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

In this work, the authors focused only on the manually annotated tumor regions, which reduces the difficulty of the problem. This also makes the model’s performance dependent on the quality of the manual annotation. Why not use the whole foreground tissue?

In Method 2.1, the authors mention using a symmetric adjacency matrix A to represent spatial relations. However, the kNN relation between foreground patches is not symmetric; please clarify how A is made symmetric. The description of A is also inconsistent with Fig. 1, where kNN is represented by an n x k x l matrix. It is also not clear what l is in Fig. 1. It seems to correspond to the l blocks of the local attention layer. Does this imply that the kNN of each transformer block is different? Based on Method 2.1, kNN is based on coordinates, which should be fixed.

Usually, a transformer relies on positional embeddings to learn the spatial relationship between patches, but I did not see a definition of the positional embedding. Without positional embedding, it is just local multi-head attention to me, and calling it a transformer is confusing. It would also make more sense to compare with other attention-based MIL methods such as ABMIL or DSMIL with additional local constraints, to understand where the major gain comes from: multi-head attention or local constraints.

In Fig. 2, the authors show kNN with different sizes (k values), and in the implementation section they mention different selections of k. However, the results do not show the impact of selecting different k values. It is not clear which k they selected or how they picked the value.

An obvious advantage of using multi-head attention is the size of the network. It would be desirable to show the network size for the different methods in Table 2 as well, for a fair comparison.
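The symmetry question raised in this review can be illustrated with a small sketch. It is not taken from the paper or its repository: the function names `knn_indices` and `adjacency_from_knn`, the toy coordinates, and the symmetrization rule (logical OR of A with its transpose) are assumptions chosen for illustration only. The example builds the n x k neighbor index structure from tile coordinates, expands it into a dense binary n x n matrix, and shows that the result need not be symmetric unless it is symmetrized explicitly.

```python
import numpy as np

def knn_indices(coords: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array; row i lists the k nearest tiles to tile i (self excluded)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances, (n, n)
    np.fill_diagonal(dist, np.inf)             # a tile is not its own neighbor here
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def adjacency_from_knn(idx: np.ndarray) -> np.ndarray:
    """Expand the (n, k) index structure into a dense binary (n, n) matrix."""
    n, k = idx.shape
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), idx.ravel()] = True
    return A

# Three hypothetical tiles on a line at x = 0, 1, 3.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
idx = knn_indices(coords, k=1)      # tile 2 picks tile 1, but tile 1 picks tile 0
A = adjacency_from_knn(idx)
A_sym = A | A.T                     # one possible symmetrization (an assumption)

print("symmetric before:", bool((A == A.T).all()))
print("symmetric after: ", bool((A_sym == A_sym.T).all()))
```

The dense matrix is used only to keep the example short; for slides with many tiles, a sparse representation or the n x k index structure itself would be the more practical choice.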
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Transformers for images have been a hot topic over the past year. However, this work lacks a careful discussion and experimental design to show the method's strengths and limitations.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes LA-MIL, a local attention graph-based transformer for multiple instance learning in whole slide images, to effectively predict microsatellite instability and tumor mutational burden jointly with genetic alterations in gastrointestinal cancer. The proposed method addresses important issues in current MIL methods for WSIs, such as inconsistent instance labels and the integration of spatial information. Nice visualization is provided to illustrate the results. A sensitivity analysis or relevant discussion should be added on the trade-off between using LA-MIL and T-MIL.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Author Feedback

We would like to thank all the reviewers for their valuable comments, which are very helpful in directing our future work. We address the issues below.

R1/MR) “Sensitivity analysis”. We did not discuss these analyses in the manuscript due to space limitations. However, we will not only add sensitivity analyses in an extended paper but also perform several external validations.

R3) “In multi-target prediction it would be interesting to see if these tasks help each other for effective prediction.” We conducted an additional line of experiments on possible improvements from correlations with other biomarkers. The results for individual binary predictions (MSI, TMB, BRAF) stayed approximately the same.

R3) “T-MIL and LA-MIL contain standard deviation, the existing methods don’t. Are these results generated through predictions on the same set of cross validation partitions?” We compared our method with four SOTA works, each of which performed its own splits for each individual target (=15 possible splits for TCGA CRC). Using exactly the same splits is not possible for our multi-target setup. We will add further statistics from other works, such as standard deviation and min/max bounds (supplementary material).

R4) “The work only focused on the manually annotated tumor regions which reduce the difficulty of the problem. Why not use the whole foreground tissues?” As is generally done in genetic alteration prediction, and to facilitate the task, we used tumor-occupied tissue regions, as only these regions harbor genetic alterations. Extended work will no longer rely on tumor RoI annotations.

R4) “The gain of the proposed method is kind of marginal and not consistent.” In this work, we aimed to show the sufficiency of local self-attention. Moreover, our model is the first in which the receptive field of self-attention can be arbitrarily restricted from 1 to n. Another goal is to further optimize CUDA kernel functions to additionally tackle the quadratic computational complexity (using locality) for a huge number of tiles.

R4) “Symmetric adjacency matrix A.” With A, we denoted the sparse n x n binary matrix that indicates which tiles are neighbors. This matrix is symmetric as we used the Euclidean distance for building the graph. In Fig. 1, we showed the n x k data structure that was processed. We will clarify this.

R4) “The description of A above is not consistent with Fig.1, it seems it corresponds to l blocks of the local attention layer. Does it imply the kNN of each transformer block is different?” We will clarify that different graphs (l graphs) as well as the same graph can be used for each local attention block.

R4) “On selection of k.” As detailed in the Implementation part, we used k=16 and k=64 for the first and second layer, respectively, with the aim of constraining self-attention to small local regimes. We will add an ablation study for different choices of k in future work.
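To make the n x k data structure and the per-layer choice of k more concrete, the following is a minimal NumPy sketch of single-head attention restricted to kNN neighborhoods. It is an illustration under stated assumptions, not the released LA-MIL implementation: the function names, the random weights, the number of tiles, the feature dimension, and the decision to include each tile in its own neighborhood are all hypothetical. Only the values k=16 and k=64 for the first and second layer come from the text above.

```python
import numpy as np

def knn_indices(coords: np.ndarray, k: int) -> np.ndarray:
    """(n, k) neighbor indices by Euclidean distance; each tile includes itself (assumption)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def local_attention(x, idx, Wq, Wk, Wv):
    """Single-head attention in which tile i attends only to the tiles listed in idx[i]."""
    q = x @ Wq                                   # queries, (n, d)
    keys = (x @ Wk)[idx]                         # gathered neighbor keys, (n, k, d)
    vals = (x @ Wv)[idx]                         # gathered neighbor values, (n, k, d)
    scores = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)         # attention weights over k neighbors
    return np.einsum("nk,nkd->nd", w, vals), w

rng = np.random.default_rng(0)
n, d = 200, 8                                    # hypothetical tile count and feature size
coords = rng.uniform(0, 100, size=(n, 2))        # hypothetical tile coordinates
h = rng.normal(size=(n, d))                      # hypothetical tile embeddings

for k in (16, 64):                               # k per layer, as stated in the rebuttal
    idx = knn_indices(coords, k)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    h, w = local_attention(h, idx, Wq, Wk, Wv)
    print(f"k={k}: {n * k} attention scores instead of {n * n} for global attention")
```

Each layer computes n·k attention scores rather than n², which is the locality argument made above; the neighbor search in this sketch is still quadratic and is kept dense only for brevity.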


Local Attention Graph-based Transformer for Multi-target Genetic Alteration Prediction