
Authors

Fei Li, Mingyu Wang, Bin Huang, Xiaoyu Duan, Zhuya Zhang, Ziyin Ye, Bingsheng Huang

Abstract

In current pathology image classification, methods mostly rely on patch-based multi-instance learning (MIL), which only considers the relationship between patches and slides. However, in clinical medicine, doctors use slide-level labels to summarize patient-level labels as a diagnostic result, indicating the involvement of three levels of patch, slide, and patient in actual pathology image analysis, which we refer to as the multi-level multi-instance learning (ML-MIL) problem. To address this issue, we propose a novel and general framework called Patients and Slides are Equal (P&SrE), inspired by the doctor’s diagnostic process of repeatedly confirming labels at the patient and slide level. In this framework, we treat patients and slides as instances at the same level and use transformers and attention mechanisms to build connections between them. This allows for interaction between patient-level and slide-level information and the correction of their respective features to achieve better classification performance. We evaluate our method on two datasets using two state-of-the-art MIL methods as baselines. The results show that our method improves the performance of the baselines on both slide and patient levels. Our method provides a simple and effective solution to the common problem of ML-MIL in medical clinical scenarios and has broad potential applications.
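
The core idea in the abstract, placing a patient-level representation alongside the slide-level representations and letting them exchange information through self-attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooled initial patient token, the single attention head, the absence of learned projections, and all function names (`patient_slide_interaction`, `self_attention`, `mean_vector`) are assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(dim)]
        )
    return outputs, weights


def patient_slide_interaction(slide_features):
    """Treat the patient token and the slide tokens as same-level instances.

    Assumption: the initial patient token is the mean of the slide features.
    Returns the updated patient feature, the updated slide features, and the
    attention weights (row 0 belongs to the patient token).
    """
    patient_token = mean_vector(slide_features)
    tokens = [patient_token] + slide_features
    outputs, weights = self_attention(tokens)
    return outputs[0], outputs[1:], weights


# Toy example: one patient with three slides, each a 2-dimensional feature.
slides = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patient_out, slides_out, attn = patient_slide_interaction(slides)
```

In a real system the slide features would come from a slide-level MIL aggregator, and the updated patient and slide features would each feed their own classification head.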

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_7

SharedIt: https://rdcu.be/dnwGK

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multi-level multi-instance learning (ML-MIL) framework for patients’ diagnosis. Additionally, it treats slides as instances with each patient as a bag and regards patches as instances when using each slide as a bag. Moreover, it allows for interaction between patient-level and slide-level information and the correction of their respective features. Experiments demonstrate its effectiveness on two pathological image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is well written.
    2. The idea is very clear.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty is limited, because it is very intuitive to form two MIL problems for patients’ diagnosis. Additionally, self-attention is already widely used in MIL problems, and it is not clear how this method differs from previous ones. For example: Shao Z, Bian H, Chen Y, et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Advances in Neural Information Processing Systems, 2021, 34: 2136-2147.

    2. Experiments only adopt two comparison methods. It would be better to include more comparison methods to make the proposed framework more convincing, because there are many MIL-based methods for WSI classification.

    3. “Doctors usually select certain key slides for careful observation and information aggregation during diagnosis, similar to the self-attention mechanism.” I think this motivation applies to any attention mechanism. Thus, it would be better to explain why self-attention is used rather than other forms of attention.

    4. There are too many grammar errors and typos. For example: 1) The equations should be followed by a comma or period. 2) There is no space before “where”. 3) publicly dataset -> public dataset

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think this paper is easily reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper proposes a multi-level multi-instance learning (ML-MIL) framework for patients’ diagnosis, forming two MIL problems while allowing for interaction between patient-level and slide-level information and the correction of their respective features. Experiments demonstrate its effectiveness on two pathological image datasets.

    This paper is well written, but I still have several major concerns, as follows:

    1. The novelty is limited, because it is very intuitive to form two MIL problems for patients’ diagnosis. Additionally, self-attention is already widely used in MIL problems, and it is not clear how this method differs from previous ones.

    2. Experiments only adopt two comparison methods. It would be better to include more comparison methods to make the proposed framework more convincing, because there are many MIL-based methods for WSI classification.

    3. “Doctors usually select certain key slides for careful observation and information aggregation during diagnosis, similar to the self-attention mechanism.” I think this motivation applies to any attention mechanism. Thus, it would be better to explain why self-attention is used rather than other forms of attention.

    4. There are too many grammar errors and typos. For example: 1) The equations should be followed by a comma or period. 2) There is no space before “where”. 3) publicly dataset -> public dataset

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the idea is limited. Additionally, too few comparison methods are used in this paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal, I still consider the novelty of this paper insufficient. Thus, I maintain my decision.



Review #2

  • Please describe the contribution of the paper

    The authors focus on the multi-level multi-instance learning problem for pathological images. They propose a method that uses both slide-level and patient-level labels simultaneously. The experimental results are sufficient to demonstrate their claim.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Patch-slide-patient multi-level multiple instance learning is a new task setting. The proposed method can use slide-level and patient-level features at the same time. This idea is unique.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The discussion of related work is not sufficient. I know the task is new, but MIL is a very popular setting in pathological image diagnosis. The importance of patient-level prediction is not argued persuasively. How many slides does each patient usually have?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors cannot publish the code of the proposed method. I recommend that the authors add implementation details to the supplementary material, for example, the number of network layers and the equation of the loss function.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    If the experiments included a comparison between two-stage MIL and the proposed method, it would strengthen the authors’ claims. I have some questions about the proposed method. The patient-level feature vector is calculated before the transformer input. Is the vector necessary as the transformer input? If the vector were calculated from the output of the transformer, would the result be worse? In the experiment, the patient-level label has two classes but the slide-level label has three classes. Why was the normal label removed at the patient level? If the patient-level label had three classes, would the proposed method need to change?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please answer my questions above.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal almost satisfied me. Therefore, I keep the rating.



Review #3

  • Please describe the contribution of the paper

    This paper presents a multi-level multi-instance learning (ML-MIL) problem that aims to consider the relationships between patches, slides, and patients. The concept presented is novel.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel general framework for the unique “patch-slide-patient” ML-MIL problem in the medical field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In terms of presentation, the article is detailed in terms of methodological description, but the Introduction is unclear and lacks a description of current research advances (relevant methods) for the slide-patient problem. Beyond the presentation, some methodological aspects need clarification while the evaluation needs improvement.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is mostly reproducible but still needs to clarify certain implementation details, such as the number of transformer blocks used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1. In terms of presentation, the article is detailed in terms of methodological description, but the Introduction is unclear and lacks a description of current research advances (relevant methods) for the slide-patient problem.
    2. The introduction indicates that the literature [9] is a solution to the ML-MIL problem, but I did not find any part of that work that explores the relationship between patients and slides. Please rephrase or replace it.
    3. None of the figures included in the manuscript are cited or interpreted in the text.
    4. It is recommended that the AUC be added to the evaluation metrics.
    5. In terms of the concept of MIL, the paper treats the patients and the slides as instances of the same level, so what is the bag in this case?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a multi-level multi-instance learning (ML-MIL) problem that aims to consider the relationships between patches, slides, and patients. The concept presented is novel. However, a fairly generic approach based on attention mechanisms and transformers is used to explore the relationships between the levels.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper introduces a multi-level multi-instance learning (ML-MIL) framework for patients’ diagnosis, treating slides as instances and patients as bags. It allows for interaction between patient-level and slide-level information and corrects their respective features. The experimental results on two pathological image datasets demonstrate the effectiveness of the proposed method. Reviewers acknowledge the focus on the multi-level multi-instance learning problem of pathological images and find the proposed approach novel and sufficient in supporting the claims made by the authors.

    As there were conflicting opinions among the reviewers of this paper, it is important to verify whether the concerns raised by Reviewer 1 were addressed in the rebuttal. The reviewer raises concerns about the novelty of the proposed method, suggesting that forming two MIL problems for patients’ diagnosis is intuitive and that self-attention has been widely used before. Two reviewers also suggest including more comparison methods, including two-stage MIL, in the experiments to make the proposed framework more convincing.




Author Feedback

We sincerely appreciate the reviewers’ positive evaluation and recognition of the novel task setting and concept of ML-MIL in our paper:

  • Novelty and contribution (R2-“a new setting”, “The proposed method …. This idea is unique”; R3-“concept presented is novel”, “a novel general framework for the unique … ML-MIL problem”).
  • Results (R2-“results are sufficient to demonstrate their claim”; R1-“Experiments demonstrate its effectiveness on two pathological image datasets”).
  1. (R1) Reiterating novelty: To begin with, the idea of addressing the ML-MIL problem across “patch-slide-patient” levels has not been formally acknowledged or sufficiently tackled. Forming two MIL problems for patient-level diagnosis may seem intuitive, but it has not been properly addressed, because applying MIL at the patient-slide level often results in subpar performance due to the limited number of “patient” bags. We are the first to formally introduce the ML-MIL concept, highlighting its clinical importance and necessity, as real-life diagnosis and treatment must occur at the patient level, while most prior research concentrates on the slide level. Furthermore, our innovation is not the use of self-attention or transformers as such, but the proposed approach “P&SrE”, which uses transformers to treat slides and patients as same-level instances. This method overcomes the issue of having fewer higher-level bags, leading to more precise predictions at both slide and patient levels.

  2. (R1&R2) For “more comparison methods”: We conducted experiments using an additional method, CLAM (clustering-constrained-attention multiple-instance learning), on the CD-ITB dataset, but PLEASE NOTE that these experimental results do not affect the validity of the proposed method. At the slide level, the ACCs are CLAM (0.58) vs. CLAM+P&SrE (0.59). At the patient level, the ACCs are CLAM+MaxS (0.64) vs. CLAM+MaxMinS (0.63) vs. CLAM+P&SrE w/o PSFI (0.64) vs. CLAM+P&SrE (0.69). All in all, our approach is a flexible framework, not a specific MIL method, and can be combined with MIL methods such as ABMIL and DSMIL. Comparisons with two-stage MIL (MaxS and MaxMinS) were conducted (Table 2), but current research on two-stage MIL methods is still limited.

  3. (R2&R3) Introduction and discussion about the related work of MIL, especially patient-slide level MIL: We acknowledge the limited coverage of patient-level MIL in our paper. Existing research predominantly focuses on the slide-patch level, neglecting the patient-slide level. This dearth of attention to slide-patient correlation motivated us to introduce the concept of ML-MIL for the first time in our article.

  4. (R1&R2) Concerns on the diagnostic process and slide quantity:
    • Regarding the choice of attention mechanism: The diagnostic process for doctors requires aggregating multiple slides for patient diagnosis and involves reviewing typical slides to double-check the diagnostic results. Information exchange and integration between the slide level and the patient level is needed. Self-attention is better suited to this purpose than other kinds of attention (such as cross-attention or doctors’ attention).
    • Regarding patient-level prediction and slide quantity: Aggregating slide results into patient-level outcomes is crucial for clinical guidance. In our study, we utilized the CD-ITB dataset (average 5 slides/patient) and the Camelyon17 dataset (average 5 slides/patient).
    • “Patient” refers to a person pre-diagnosed with ITB or CD. CD-ITB involves binary patient-level classification without normal labels, although some patients may have normal slides.
  5. (R2) Necessity of patient-level feature vector: Patient-level features are crucial. Simply averaging slide interactions may underutilize transformer attention. Incorporating global information as a guide before inputting it into the transformer (e.g., class token in ViT) is vital for reasonable and understandable information interaction.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    R1 raised concerns about the novelty of the proposed technique, and I agree that it utilizes existing methods without significant modifications. However, I believe that the problem targeted in this study is an essential application for analyzing pathological images. As there are not many techniques available yet to address this specific problem, I consider it meaningful. The authors conducted thorough experiments comparing with other state-of-the-art techniques. It would be beneficial to incorporate the results presented in the rebuttal into the final paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the reviews, R1 had major concerns about the method’s novelty as well as the comparison. From the rebuttal, I feel that the authors have sufficiently addressed the concerns around novelty as well as the comparison, using CLAM. I feel the paper has some merits and warrants an accept.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Paper presents multi-level multi-instance learning to integrate slide- and patient-level information for diagnosis. Strengths include interesting problem formulation and reasonable validation across 2 cohorts. Major critique involves insufficient discussion of previous work and background, to explain the uniqueness in applying MIL in this context. The rebuttal attempts to explain this, though in limited fashion - it mostly reiterates the intuitive nature of the proposed framework (mentioning a couple of specific papers would have helped). Comments on attention and methods are well addressed. Additional comparison experiments were conducted, though the results don’t necessarily indicate the improvement via the proposed method.


