
Authors

Philip Chikontwe, Soo Jeong Nam, Heounjeong Go, Meejeong Kim, Hyun Jung Sung, Sang Hyun Park

Abstract

Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases, but the curation of accurate labels is time-consuming and limits the application of fully supervised methods. To address this, multiple instance learning (MIL) is a popular approach that poses classification as a weakly supervised learning task with slide-level labels only. While current MIL methods apply variants of the attention mechanism to re-weight instance features with stronger models, scant attention is paid to the properties of the data distribution. In this work, we propose to re-calibrate the distribution of a WSI bag (instances) using the statistics of the max-instance (critical) feature. We assume that in binary MIL, positive bags have larger feature magnitudes than negatives; thus we can enforce the model to maximize the discrepancy between bags with a metric feature loss that models positive bags as out-of-distribution. To achieve this, unlike existing MIL methods that use single-batch training modes, we propose balanced-batch sampling, i.e., drawing positive and negative bags simultaneously, to effectively use the feature loss. Further, we employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PMSA) with a Transformer encoder. Experimental results on existing benchmark datasets show our approach is effective and improves over state-of-the-art MIL methods.
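
A minimal sketch of the re-calibration step described above, assuming it amounts to shifting each instance feature by the max-scoring (critical) instance feature before pooling (PyTorch-style; function and variable names are hypothetical, not the authors' implementation):

    import torch

    def recalibrate_bag(feats: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # feats:  (N, D) instance embeddings for one WSI bag
        # scores: (N,)   per-instance scores, e.g. from a first-pass classifier
        crit = feats[scores.argmax()]      # critical (max) instance feature, (D,)
        return feats - crit.unsqueeze(0)   # re-centre the bag on the critical feature

    # toy usage: a bag of 500 patches with 512-d features
    shifted = recalibrate_bag(torch.randn(500, 512), torch.randn(500))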

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_41

SharedIt: https://rdcu.be/cVRr9

Link to the code repository

https://github.com/PhilipChicco/FRMIL

Link to the dataset(s)

https://camelyon16.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a method for WSI classification using multiple instance learning. They exploit the assumption that features from positive instances have larger magnitudes and, by re-calibrating with the predicted highest-magnitude (critical) instance, produce more separable groups.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and clearly states the objectives of the work. The idea is simple but effective.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation uses only two datasets; one of them is somewhat outdated and the other is in-house.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be reproducible for the CAMELYON experiments, but it is not clear whether the in-house dataset will ever be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I would have liked to see a more in-depth analysis on further datasets, but I understand the space limitations.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is valuable, well written, and addresses an interesting problem.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a method for WSI classification built from a position encoding module (PEM) [26] followed by a single pooling-by-multi-head self-attention (PMSA) block [19].

    The main contribution is exploring the effectiveness of the feature re-calibration idea (which is used in few-shot learning) to produce balanced bags of +/- instances and, as a result, improve WSI classification.

    Two ideas are used for feature re-calibration: (1) the max-critical instance embedding, and (2) a feature magnitude loss.
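
    For context, PMSA pools a variable-size bag into a fixed-size representation by letting a learned seed query attend over the instances, in the spirit of the Set Transformer's pooling-by-attention block. A minimal PyTorch sketch (module and parameter names are illustrative, not the authors' implementation):

        import torch
        import torch.nn as nn

        class PMSAPool(nn.Module):
            # One multi-head attention block in which a learned seed query
            # attends over the bag's instance features to produce a pooled vector.
            def __init__(self, dim: int = 512, heads: int = 8):
                super().__init__()
                self.seed = nn.Parameter(torch.randn(1, 1, dim))
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, feats: torch.Tensor) -> torch.Tensor:
                # feats: (B, N, D) -> pooled bag embedding (B, D)
                q = self.seed.expand(feats.size(0), -1, -1)
                pooled, _ = self.attn(q, feats, feats)
                return pooled.squeeze(1)

        # toy usage: 2 bags of 100 instances each -> (2, 512)
        out = PMSAPool()(torch.randn(2, 100, 512))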

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem, idea and proposed solution are clear. Specifically, the application of feature re-calibration for producing balanced bags is interesting.

    The organization of the paper is good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical novelty is limited. The whole architecture is built on top of PEM and PMSA modules.

    The main contribution is exploring feature re-calibration to increase the separation between +/- samples, which has limited applicability, as it only works for binary classification.

    I think a simple contrastive loss (or a loss with weighted samples) could probably achieve this improvement while not being limited to binary classification.

    The experiments are also insufficient. Note that CAMELYON16 is a fairly old dataset. Further, there is no ablation study on the γ_i hyperparameters.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation settings are available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As mentioned above, I suspect a contrastive loss (or a loss with weighted samples) would provide the same improvement while not being limited to binary classification. I suggest the authors explore whether this is the case.

    The PEM and PMSA abbreviations are introduced multiple times.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see weaknesses section.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors worked on multiple instance learning for whole slide image classification. They re-calibrated the distribution of a WSI bag (instances) using the statistics of the max-instance (critical) feature. They also proposed balanced-batch sampling to make effective use of the feature loss, and a position encoding module to model spatial/morphological information, with pooling performed by multi-head self-attention in a Transformer encoder.
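
    To illustrate the balanced-batch sampling idea (each training step sees one positive and one negative bag, so a between-bag feature loss always has both classes available), here is a minimal Python sketch with illustrative names, not the authors' implementation:

        import random
        from typing import Iterator, List, Tuple

        def balanced_bag_pairs(pos_ids: List[int], neg_ids: List[int],
                               seed: int = 0) -> Iterator[Tuple[int, int]]:
            # Yield (positive, negative) bag-index pairs so that every
            # batch contains exactly one bag of each class.
            rng = random.Random(seed)
            pos, neg = pos_ids[:], neg_ids[:]
            rng.shuffle(pos)
            rng.shuffle(neg)
            yield from zip(pos, neg)

        # toy usage: indices of positive and negative slides
        for p, n in balanced_bag_pairs([0, 2, 5], [1, 3, 4]):
            print(p, n)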

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The study objective is interesting. The authors used multiple instance learning for image classification and to address ground-truth labeling issues, and they worked on benchmark datasets. They reported that their approach outperforms existing methods on the CM16 and COLON-MSI datasets in terms of accuracy and AUC. The density plots for normal vs. tumor and MSS vs. MSI appear significant. The authors also checked the performance of their algorithm with varying loss functions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors did not include qualitative results, so it is hard to assess the classification performance. Multiple instance learning is not a new approach; hence, the technical novelty is limited. The authors did not share their source code.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors did not share their source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be better if the authors included qualitative results. The authors should also revise the result-validation section, as much information is missing. The results should be validated by pathologists as well. The authors should include step-wise results for Figure 2. The quantitative analysis also needs improvement; include ROC curves instead of the tabular format.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Details are available in the main strengths of the paper.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors present a new multiple instance learning method for WSI classification. They assume that in binary MIL, positive bags have larger feature magnitudes than negatives, and thus calibrate features using the statistics of the max-instance (critical) feature. All reviewers agree the paper is well written and the method is well motivated and effective. Yet there are a few minor weaknesses, such as insufficient in-depth analysis (R1), missing qualitative results (R3), and, more importantly, the lack of source code (R2&R3). The authors should address the reviewer comments in their final submission and consider releasing their code to increase reproducibility and help other researchers use their algorithm.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

Summary: Scores: R1 (SA), R2 (WR), R3 (WA). We thank the reviewers for their insightful feedback! We are pleased they find this work interesting (R1,R2,R3), simple yet novel and effective (R1), and well motivated with clear objectives (R1,R3). The major concerns are the evaluation on only two datasets (R1), the method possibly being limited to binary classification (R2), and the lack of qualitative results (R3). We answer specific questions below and will incorporate the feedback in the final version.

(R1,R2) “Evaluation on only two datasets, one in-house and a benchmark.”: We understand the reviewers’ concern. While CAMELYON16 may be considered a fairly old benchmark, existing and recent MIL approaches still employ this dataset for evaluation, and there is still ample room to improve performance over state-of-the-art approaches. As for the in-house dataset, our goal was to highlight the utility of our approach in a more challenging setting; we plan to release this set to further research in microsatellite instability (MSI) classification. An extended version of this work will include more datasets, such as TCGA LUAD/LUSC and multi-class datasets, with more in-depth analysis.

(R2,R3) “Lack of technical novelty; the approach is built on top of PEM and PMSA”: We disagree with the opinion that the novelty is limited. Though PEM and PMSA are existing techniques, note that recent works have equally employed variants of Transformer-inspired architectures for WSI analysis. Our core contribution is rather to show that re-calibration is important for WSI, especially since our motivation is supported by the performance of the simple baseline (Sec. 2). Conceptually, our approach can be considered model-agnostic, i.e., re-calibration could be used in any MIL model. Secondly, we highlight how training with balanced bags can further enforce the presented ideas, i.e., using the feature loss $L_{fm}$. A future iteration/extension will ablate the contribution of PEM and PMSA, as well as the utility of re-calibration in the compared methods.

(R2) “Feature re-calibration is limited to the binary scenario; a simple contrastive loss could also work”: We agree with this observation. Since we employ balanced +/- bags, our design is inherently limited to the binary setting. However, $L_{fm}$ can still facilitate better representation learning in the multi-class scenario when assumptions regarding the bag label are ignored. Conceptually, $L_{fm}$ contrasts representations as an explicit contrastive loss would; however, prior works note that a standard contrastive loss [A] often requires large batches to be effective. Thus, it is not entirely obvious that the same results could be achieved.

[A] Chen et al. “A Simple Framework for Contrastive Learning of Visual Representations” (SimCLR). ICML 2020.
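
To make the comparison concrete, the feature magnitude idea can be written as a hinge on bag-feature norms, pushing positive-bag magnitudes above a margin and negative-bag magnitudes below it. The following PyTorch-style sketch is an assumed form for illustration only; the exact definition of $L_{fm}$ is given in the paper:

    import torch
    import torch.nn.functional as F

    def feature_magnitude_loss(pos_feat: torch.Tensor,
                               neg_feat: torch.Tensor,
                               margin: float = 1.0) -> torch.Tensor:
        # Hinge on feature norms: positives pushed above `margin`,
        # negatives pushed below it (assumed form, for illustration).
        pos_mag = pos_feat.norm(dim=-1)   # (B,)
        neg_mag = neg_feat.norm(dim=-1)   # (B,)
        return (F.relu(margin - pos_mag) + F.relu(neg_mag - margin)).mean()

    # toy usage with one balanced (+/-) pair of pooled bag features
    loss = feature_magnitude_loss(torch.randn(1, 512), torch.randn(1, 512))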

(R3) “Lack of qualitative results / no source code”: Due to limited space, we could not include plots such as t-SNE or figures of patch-based segmentation (the latter is slightly out of the scope of the paper). We agree that including a figure showing patch-based classification on a slide would better support the claims; we will include this in the final version. Also, we will release the source code.


