
Authors

Hongyi Wang, Luyang Luo, Fang Wang, Ruofeng Tong, Yen-Wei Chen, Hongjie Hu, Lanfen Lin, Hao Chen

Abstract

Whole Slide Image (WSI) classification remains challenging due to the images' extremely high resolution and the absence of fine-grained labels. Presently, WSI classification is usually regarded as a Multiple Instance Learning (MIL) problem when only slide-level labels are available. MIL methods involve a patch embedding module and a bag-level classification module, but they are prohibitively expensive to train in an end-to-end manner. Therefore, existing methods usually train them separately, or directly skip the training of the embedder. Such schemes hinder the patch embedder’s access to slide-level semantic labels, resulting in inconsistency within the entire MIL pipeline. To overcome this issue, we propose a novel framework called Iteratively Coupled MIL (ICMIL), which bridges the loss back-propagation process from the bag-level classifier to the patch embedder. In ICMIL, we use the category information in the bag-level classifier to guide the patch-level fine-tuning of the patch feature extractor. The refined embedder then generates better instance representations, yielding a more accurate bag-level classifier. By coupling the patch embedder and the bag classifier at a low cost, our proposed framework enables information exchange between the two modules, benefiting the entire MIL classification model. We tested our framework on two datasets using three different backbones, and our experimental results demonstrate consistent performance improvements over state-of-the-art MIL methods.
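
A minimal sketch of this coupling loop, to make the iteration concrete. This is not the authors' released code: the tiny stand-in embedder, the mean-pool aggregator, and the KL-based pseudo-labeling step are illustrative assumptions based on the abstract.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: the paper uses a ResNet-50 embedder and attention-based MIL heads;
# a tiny conv embedder and mean pooling keep this sketch short.
embedder = nn.Sequential(nn.Conv2d(3, 8, 16, stride=16),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1024))
bag_classifier = nn.Linear(1024, 2)

def icmil_round(bags, bag_labels):
    """One coupling round: (1) train the bag classifier on frozen patch
    features; (2) use it as a teacher to fine-tune the embedder."""
    # Phase 1: embedder frozen, bag classifier trained with slide-level labels.
    opt_c = torch.optim.Adam(bag_classifier.parameters(), lr=1e-4)
    for patches, label in zip(bags, bag_labels):
        with torch.no_grad():
            feats = embedder(patches)                         # (N, 1024)
        logits = bag_classifier(feats.mean(0, keepdim=True))  # naive aggregator
        loss = F.cross_entropy(logits, label.view(1))
        opt_c.zero_grad(); loss.backward(); opt_c.step()

    # Phase 2: a frozen snapshot serves as the teacher; its patch-level
    # predictions become pseudo-labels for fine-tuning the student embedder.
    teacher = copy.deepcopy(embedder).eval()
    opt_e = torch.optim.Adam(embedder.parameters(), lr=1e-5)
    for patches, _ in zip(bags, bag_labels):
        with torch.no_grad():
            target = bag_classifier(teacher(patches)).softmax(-1)
        student = bag_classifier(embedder(patches))
        loss = F.kl_div(student.log_softmax(-1), target, reduction="batchmean")
        opt_e.zero_grad(); loss.backward(); opt_e.step()

# Usage: two tiny synthetic bags of 4 patches each.
bags = [torch.randn(4, 3, 256, 256) for _ in range(2)]
labels = [torch.tensor(0), torch.tensor(1)]
icmil_round(bags, labels)
```

Each call to icmil_round alternates the two phases once; per the abstract, the refined embedder then produces better features for the next round, so the two modules improve each other iteratively.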

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_45

SharedIt: https://rdcu.be/dnwJ0

Link to the code repository

https://github.com/Dootmaan/ICMIL

Link to the dataset(s)

https://camelyon17.grand-challenge.org/Data/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a modified MIL algorithm that couples information at both the instance level and the bag level to improve the model training process. It is based on the simple assertion that the decision boundaries at the bag level and at the instance level should be similar, provided that the dimensionality of the feature space is the same. An elegant iterative method of alternately updating the weights of the instance encoder and the bag classifier is described, and its efficacy is demonstrated on one public dataset and one new, private dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Training an end-to-end model with slide-level labels is very difficult in digital pathology because the datasets are usually very imbalanced and the positive signal is drowned out by the negative instances. The method suggested here, where the bag-level model is used as a teacher to label instances, is simple and elegant and appears to be very generalizable.
    2. The proposed method is tested against a large number of alternative approaches, including some of the most popular methods at present and some very new approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some of the figures (2 and 3 in particular) are not particularly useful, and it took me a while to understand how the approach overcomes the problem of patch imbalance - it may be possible to make the key innovation of this paper clearer.
    2. I would like to have seen a better description of the training data - specifically the #patches from tumour and non-tumour regions in the Camelyon dataset and, if possible, the number of tumour vs stromal/background patches in the HCC dataset. In the Camelyon dataset, were the patches randomly sampled, or were the annotations used to get a more balanced ratio of positive and negative instances?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    For the experiments on the Camelyon data there appears to be sufficient information and code available to repeat the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The core idea behind this method is very simple, and the proposed method of coupling the instance and bag information is also elegant but simple - this means that the work should be relatively easy to transfer to a wide variety of computational pathology tasks. The explanation could be clearer and a couple of the figures do not seem to add information, but otherwise the paper is well written. The main issue I have is in understanding Figure 5. For the Camelyon dataset there is a huge imbalance in the data - perhaps >80% of the tissue patches in positive bags correspond to negative instances. In Figure 5, are the positive instances selected from regions of the slide corresponding to the tumours, or are they randomly sampled? If they are random, then why are almost all of the instances from positive bags clustering around the positive bags in t-SNE space - at least 70% of these instances should cluster near the negative bags? This figure is why I would like more information about how the patches were sampled for training. In Table 2 the results for DTFD-MIL [22] are lower than those reported in the original publication, and it is not stated which variant of the DTFD-MIL method was used. Were the results on the Camelyon data in Table 2 taken from the literature, or did you rerun all of the experiments?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The core idea behind this paper seems obvious in retrospect, but as far as I am aware no one has made use of this approach before. The mechanism of coupling the bag- and instance-level information is elegant. I would, however, like some clarification on the patches used for the t-SNE plot and for training - the authors are clear that the patch-level annotations were not used in training the network, but I would like to be sure that the annotations were also not used when sampling the patches.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    My minor concerns about the sampling were fully answered - the authors should incorporate their responses in the final manuscript. I agree with the authors that the gigapixel size of the images does impede our ability to train end-to-end using existing methods - it may be possible to squeeze images into GPU memory using various schemes, but the models will still struggle to learn if the fraction of tissue on the slide related to the signal is very small compared to the un-informative regions. The authors did well to choose Camelyon as one of their tasks, as it illustrates exactly that problem. I think this paper offers a simple solution to a difficult problem, and I certainly intend to try out this approach myself.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel framework called ICMIL for WSI classification which iteratively couples the patch feature embedding process with the bag-level classification process to enhance the effectiveness of Multiple Instance Learning (MIL) training. ICMIL enables the fine-tuning of the patch feature extractor, thus boosting the performance of MIL-based WSI classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation is straightforward and the writing is easy to understand.

    2. Experiments are comprehensive. Multiple existing methods are included for comparison.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. EM-based methods are widely used in prior work and improve performance to some extent. It is not surprising to see boosted results with EM-based iterative training.

    2. The improvements of the proposed EM-based iterative training over existing MIL-based methods are marginal given the intensive computation it introduces.

    3. From Table 1(a) we can observe that, as the iterations proceed, the performance eventually drops. It is difficult to determine when to stop the iterative training.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors indicate that code will be made available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It would help readers to evaluate the proposed method if an analysis of model and time complexity is provided.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The writing is good and results are somewhat convincing.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    My rating remains unchanged.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an Iteratively Coupled Multiple Instance Learning framework for whole slide image classification.

    The technical novelty lies in the iteratively coupled scheme that models the relation between instances and bags.

    Experimentally, it is validated on two benchmarks with state-of-the-art performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed iteratively coupled scheme is interesting and moderately novel. It enables multiple rounds of feature interaction between the instance and bag representations.
    • State-of-the-art performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The technical presentation of this paper is very vague and imprecise. (1) There is no specific dataflow and network-structure figure that exactly reflects each step of the dataflow. All the figures take a high-level conceptual perspective, which helps neither implementation nor understanding of the details. (2) In the methodology section, how do Equations 2 and 3 correspond to Figs. 3 and 4 and to the iteratively coupled scheme? (3) Can the dataflow guarantee the permutation-invariance property from the instance representation to the bag representation?

    - Lack of comparison with and discussion of recent MIL-based related work. For example: [1] MIL-VT: Multiple instance learning enhanced vision transformer for fundus image classification. MICCAI 2021. [2] Deep multi-instance learning for survival prediction from whole slide images. MICCAI 2019. [3] Su, Ziyu, et al. “Attention2majority: Weak multiple instance learning for regenerative kidney grading on whole slide images.” Medical Image Analysis 79 (2022): 102462.

    - Some implementation details and basic claims are very inaccurate and misleading. (1) The proposed framework uses a ResNet-50 backbone; what about the other methods compared in Table 2? If some of the prior works use ResNet-18 or ResNet-34, then the comparison can be rather unfair and misleading. (2) Regarding the compared methods in Table 2: which of them are self-implemented and which are cited? Please make this clear. (3) Introduction, paragraph two: the claims of ‘limited GPU memory’ and ‘high computational cost’ making ‘end-to-end training of the feature extractor and bag classifier prohibitive’ do not stand, which in turn undermines the claimed motivation of this work. From the implementation details, we can see that the proposed framework also splits the whole slide images into 256×256 patches, which does not demand a very burdensome computational cost.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors promise to provide the source code, but the technical framework, as presented in its current form, is vague and impossible to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weakness part for improvement.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weaknesses outweigh the strengths. In particular, the manuscript fails to detail the technical framework; it is presented in a very vague way and can be somewhat unconvincing. Other critical details, such as the backbone comparison, related work, etc., also affect the claimed novelty. Therefore, the reviewer has to reject it.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for providing a detailed rebuttal. After reading it, I think most of my concerns have been addressed. Besides, to reach consensus with the other reviewers, I am willing to raise my rating to weak accept.

    However, this means the authors are strongly encouraged to improve their writing and presentation, especially Fig. 4, to clearly present the dataflow and the detailed framework.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a modified MIL algorithm that couples information at both the instance and bag levels to improve model training. An iterative method is used to update the weights of the instance encoder and the bag classifier, which is demonstrated to be effective on both a public and a private dataset. The proposed framework, called ICMIL, enables fine-tuning of the patch feature extractor, which boosts the performance of MIL-based WSI classification. The novelty of the technique lies in the iteratively coupled scheme that models the relation between instances and bags. The experiments validate the proposed method with state-of-the-art performance on two benchmarks.

    As there were conflicting opinions among the reviewers of this paper, it is important to verify whether the concerns raised by Reviewer 3 are addressed in the rebuttal. The reviewers pointed out that some implementation details and claims were inaccurate and misleading, and that the claim about limited GPU memory and high computational cost was not justified. Additionally, a reviewer requested more information on the training data, specifically the number of patches from tumor and non-tumor regions in the datasets. Finally, it was noted that it is difficult to determine when to stop the iterative training.




Author Feedback

We thank all reviewers for their constructive comments. Our itemized responses are as follows.

  1. More information on the training data [R#1] a) #Patches from tumor/non-tumor regions: There are 3.7 million patches in Camelyon16 and 17.4 million in the HCC dataset. Camelyon WSIs usually contain ~10% tumor patches each, while HCC WSIs usually have ~60% tumor area. b) Were the annotations used during sampling? Annotations are NOT used during patch sampling (tiling) for training.

  2. Comments regarding the relation between EM and ours [R#2] Apart from the EM-inspired part, we also propose a teacher-student scheme for fine-tuning the embedder, which leads to a further performance boost, as shown in Table 1b. As also pointed out by R#1, ICMIL is a novel, elegant, and generalizable approach.

  3. When to stop the iterative training [R#2] We decide the number of training iterations according to the validation results. In Table 1a, we present a study on the stopping point of the iteration for Camelyon. It shows that 1 iteration brings an obvious performance boost without consuming too much time, achieving a good cost-performance balance. Empirically, we recommend using 1 ICMIL iteration, as it brings satisfactory performance with little extra training time.
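
A sketch of the stopping rule described in this response; run_icmil_round and validate are hypothetical placeholders, not functions from the released code:

```python
import random

def run_icmil_round():      # placeholder: one embedder/classifier coupling round
    pass

def validate() -> float:    # placeholder: would return the validation AUC
    return random.random()

best_auc, max_rounds = 0.0, 4
for _ in range(max_rounds):
    run_icmil_round()
    auc = validate()
    if auc <= best_auc:     # validation score stopped improving: cease iterating
        break
    best_auc = auc
```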

  4. Justification of the claims on ‘limited GPU memory’ and ‘high computational cost’ [R#3] WSIs are gigapixel images usually labeled at the slide level. It is highly infeasible to conduct end-to-end training with that many pixels as input at the same time. MIL methods split a WSI into N non-overlapping 256x256 patches and separately convert them into 1024-dim features through the embedder. Eventually, these features still need to be recombined into an N*1024 matrix as the bag classifier’s input. MIL cannot accept a single patch as input, since its patch-level label is unknown. Thus, the entire pipeline operates on gigapixel WSIs with only bag-level labels, making end-to-end training very costly.
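
A back-of-envelope check of this memory argument; the patch count is an illustrative assumption, not a figure from the paper:

```python
# Illustrative: a gigapixel WSI can tile into tens of thousands of 256x256 patches.
patches_per_wsi = 40_000
bytes_per_patch = 3 * 256 * 256 * 4                      # float32 RGB input tensor
print(f"raw inputs: {patches_per_wsi * bytes_per_patch / 1e9:.1f} GB/slide")   # ~31.5 GB
print(f"N x 1024 features: {patches_per_wsi * 1024 * 4 / 1e9:.2f} GB/slide")   # ~0.16 GB
# End-to-end training would also have to retain the embedder's activations for
# every patch for back-propagation, multiplying the first figure many times,
# whereas the bag classifier alone only ever sees the small feature matrix.
```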

  5. Implementation details [R#3, R#1] a) Embedders of the different MIL methods: ResNet-50 is used as the embedder for all MIL methods in our experiments, following the settings of the previous SOTA method DTFD-MIL. b) Which methods are cited/self-implemented? We self-implemented all the methods on the HCC dataset. For Camelyon, we self-implemented Mean/Max Pooling and DTFD-MIL, while citing the other results from the DTFD-MIL paper [22]. We will clarify this in the final version. c) The sampling method for Fig. 5: Fig. 5 does not use random sampling. For better readability, we obtain one positive instance from each positive bag and one negative instance from each negative bag with max pooling.
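
A sketch of how point (c) could be implemented, selecting one representative instance per bag by max pooling over patch-level scores (the function and variable names are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

def representative_instance(feats: torch.Tensor, classifier: nn.Module,
                            target_class: int) -> torch.Tensor:
    """Return the patch feature in a bag whose score for target_class is
    highest, i.e. max pooling over instance scores."""
    with torch.no_grad():
        scores = classifier(feats)[:, target_class]      # (N,)
    return feats[scores.argmax()]

# Usage: one positive instance from a positive bag of 500 patch features.
clf = nn.Linear(1024, 2)
bag = torch.randn(500, 1024)
pos_instance = representative_instance(bag, clf, target_class=1)
```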

  6. How Eqs. 2 & 3 correspond to Fig. 4 [R#3] L_c in Eq. 2 is the Consistency Loss in Fig. 4, and L_w in Eq. 3 is the Weight Similarity Loss in Fig. 4. We will revise Fig. 4 for better readability in the final version.
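
Since Eqs. 2 and 3 are not reproduced in this response, the following is only a guess at their shape from the loss names in Fig. 4: a distillation-style consistency term and a weight-similarity term (the KL and cosine forms are assumptions):

```python
import torch.nn.functional as F
from torch import Tensor

def consistency_loss(student_logits: Tensor, teacher_logits: Tensor) -> Tensor:
    """L_c: keep the student's patch predictions consistent with the teacher's."""
    return F.kl_div(student_logits.log_softmax(-1),
                    teacher_logits.softmax(-1), reduction="batchmean")

def weight_similarity_loss(w_instance: Tensor, w_bag: Tensor) -> Tensor:
    """L_w: pull the instance-level classifier weights toward the bag-level
    classifier weights."""
    return 1.0 - F.cosine_similarity(w_instance.flatten(), w_bag.flatten(), dim=0)
```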

  7. Comments on a figure for the data flow [R#3] Thanks for the suggestion. We will present a clearer figure in the final version, using the classic ABMIL backbone as an example of the network structure. The data flow will be highlighted to aid understanding.

  8. Comparison with recent MIL works [R#3] As agreed by R#1 and R#2, we have conducted comprehensive experiments to compare our method against many alternatives, including some of the most influential methods [8,10,14] and some very new approaches [18,22]. We note that the newest work mentioned in R#3’s comment, Attention2majority (MIA 2022), achieved 89.1% AUC / 82.0% F1 / 86.5% Acc on Camelyon16, and our ICMIL based on DTFD-MIL consistently outperforms it on all metrics.

  9. Guarantee of the permutation-invariance property [R#3] The permutation-invariance property is ensured by the aggregator in MIL pipelines. Our method enables coupled training of the bag classifier and the instance embedder; it can be applied to any existing MIL method without modifying its aggregator, and thus does not affect this property of the underlying MIL backbone.
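
To illustrate why the aggregator alone settles this, here is a minimal ABMIL-style attention aggregator (a sketch, not the paper's code): the bag embedding is a weighted sum over instances, so shuffling the instances cannot change it.

```python
import torch
import torch.nn as nn

class AttnAggregator(nn.Module):
    """Attention pooling: a weighted sum over instances, hence permutation invariant."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.attn = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:   # feats: (N, dim)
        weights = self.attn(feats).softmax(dim=0)             # per-instance weights
        return (weights * feats).sum(dim=0)                   # order-independent sum

agg = AttnAggregator()
bag = torch.randn(8, 1024)
assert torch.allclose(agg(bag), agg(bag[torch.randperm(8)]), atol=1e-5)
```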




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The issues pointed out by the reviewers have been mostly addressed in the rebuttal, and all reviewers have expressed their acceptance of the paper. I believe that if the points presented in the rebuttal are incorporated into the manuscript, it would be appropriate to accept the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the rebuttal, reviewers have all agreed to accept the paper, mainly based on the technical contributions and simplicity of the methods. The authors should revise the method description and address the reviewers’ comments in the final version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a modified MIL algorithm that couples information at both the instance level and the bag level to improve the model training process. The reviewers appreciate the innovation, high writing quality, and SOTA performance. The paper eventually received one strong accept and two weak accepts. I recommend accepting this paper.


