
Authors

Jingwei Zhang, Xin Zhang, Ke Ma, Rajarsi Gupta, Joel Saltz, Maria Vakalopoulou, Dimitris Samaras

Abstract

Histopathology whole slide images (WSIs) play a very important role in clinical studies and serve as the gold standard for many cancer diagnoses. However, generating automatic tools for processing WSIs is challenging due to their enormous sizes. Currently, to deal with this issue, conventional methods rely on a multiple instance learning (MIL) strategy to process a WSI at patch level. Although effective, such methods are computationally expensive, because tiling a WSI into patches takes time and does not explore the spatial relations between these tiles. To tackle these limitations, we propose a locally supervised learning framework which processes the entire slide by exploring the entire local and global information that it contains. This framework divides a pre-trained network into several modules and optimizes each module locally using an auxiliary model. We also introduce a random feature reconstruction unit (RFR) to preserve distinguishing features during training and improve the performance of our method by 1% to 3%. Extensive experiments on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC and LKS, highlight the superiority of our method on different classification tasks. Our method outperforms the state-of-the-art MIL methods by 2% to 5% in accuracy, while being 7 to 10 times faster. Additionally, when dividing it into eight modules, our method requires as little as 20% of the total GPU memory required by end-to-end training. Our code is available at https://github.com/cvlab-stonybrook/local_learning_wsi.
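
To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of the general locally supervised scheme. It is only an illustration under assumed choices, not the authors' implementation (see the linked repository for that): the ResNet34 split into K=4 modules, the plain pooling heads (the paper reportedly uses gated-attention MIL heads), and the omission of the RFR unit are all simplifications.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

# Illustrative split of a pretrained backbone into K = 4 gradient-isolated
# modules (the paper divides the network into up to 8; this split is an assumption).
net = resnet34(weights="IMAGENET1K_V1")
stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
modules = nn.ModuleList([
    nn.Sequential(stem, net.layer1),  # 64 output channels
    net.layer2,                       # 128
    net.layer3,                       # 256
    net.layer4,                       # 512
])

# One auxiliary classifier per module; a pooling + linear head is shown for
# brevity, whereas the paper uses gated-attention MIL heads.
num_classes = 2
heads = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
    for c in (64, 128, 256, 512)
])
opts = [torch.optim.Adam(list(m.parameters()) + list(h.parameters()), lr=1e-4)
        for m, h in zip(modules, heads)]
criterion = nn.CrossEntropyLoss()

def local_train_step(x, y):
    """Train each module with its own auxiliary loss; detach() stops gradients
    at module boundaries, so only one module's activations need to be kept
    for a backward pass at any time."""
    for module, head, opt in zip(modules, heads, opts):
        opt.zero_grad()
        feats = module(x)
        loss = criterion(head(feats), y)
        loss.backward()
        opt.step()
        x = feats.detach()  # gradient isolation between modules
    return loss.item()
```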

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_19

SharedIt: https://rdcu.be/cVRrA

Link to the code repository

https://github.com/cvlab-stonybrook/local_learning_wsi

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this work, the idea of training a network block by block (locally supervised networks) has been adopted to feed whole slide images to the network. Random feature reconstruction has been proposed to improve the last-layer feature quality rather than reconstructing the whole slide. Results are discussed on three microscopic WSI datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea is novel in pathology and seems applicable; however, the comparisons are not comprehensive. The main strengths of the paper:
    1. Feeding the WSI to the network is beneficial as it preserves spatial information. However, 5X is not what pathologists look at in many cases.
    2. Reconstructing a part of the WSI is a good idea to overcome the size difficulties; however, ten 128x128 patches seems intuitive, without further investigation.
    3. Applying experiments on three public datasets is a good strategy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Experiments are not comprehensive at all.
    1. As you were able to feed bigger images at K=4, why are other experiments, such as 10x and 20x, not reported? It would be really interesting to see results, even low numbers, at higher magnifications on some datasets.
    2. Selecting a batch size of one does not make sense to me: one image with class one, then the next image with class two. Even if you pad or crop the images, feeding at least 4 would be much more meaningful.
    3. Why compare only with MIL methods? To me, this is not even a MIL strategy. MIL networks use a part of the patch information, so comparing only with MIL methods is not a fair comparison.
    4. I believe the SOTA is ignored. You should at least report the original paper's number (SOS for LKS) and then discuss the advantage of your method. Readers also want to know the comparison with many other state-of-the-art WSI classification methods. It would be good to compare with the results of already published papers that use the same experimental setup (at least for lung, I know 4-5 papers with more than 95% ACC).
    5. A big question mark here: “Our method was able to fine-tune the ImageNet pre-trained weights to adapt to the medical image domain, while other methods directly used the ImageNet pre-trained features.” What does that mean? Were the other models not able to fine-tune?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems OK.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    You must edit the paper, at least with Grammarly. Examples:
    • “the golden standard for many”: should be “gold standard”.
    • “WSIs is challenging due to their enormous image sizes”: the I in WSI stands for Image, so “image” is repetitive; also, it should be “WSIs are”.
    • “which processes the entire slide exploring the entire local and global information that contains”: consider rewriting this sentence, especially “contains”.
    • “Duan et al. [9, 8] proposed to train each module by minimizing intra-class and maximizing inter-class distances to improve the data separability.”: [8] is a survey, and the main contribution of [9] is much wider than the Fisher idea.
    • “L_K = L_cls(H(F_K(x_{K-1})), y)”: consider adding a comma after equations that continue with “where”.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good innovation with questionable experiments. I accepted to give the paper a chance to survive.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper proposes to employ the locally supervised learning scheme [24] for bypassing the memory bottleneck that exists in end-to-end WSI representation learning. To further deal with the memory issue, the authors replace the reconstruction loss in the original locally supervised learning scheme [24] with a random feature reconstruction unit.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Given the memory bottleneck for WSI representation learning, the idea of using the locally supervised learning scheme is interesting and reasonable.

    The motivations and solutions are clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical novelty is limited. The main difference between this work and [24] is the random sampling step added to the reconstruction unit.

    Some results have not been reported. For the LKS dataset, please report the result of the original paper, which is the SOS method [18].

    There is no explanation why 10 locations are used for the RFR model.

    There is no hyperparameter to control the trade-off between the classification and reconstruction losses. An ablation study is needed.

    The batch size is 1. There is no discussion around this. Doesn’t this affect the stability of training?

    The equations have not been referenced.

    It is not clear what TCGA-NSCLC and TCGA-RCC stand for.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation details are available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Include the results of the SOS method for the LKS dataset.
    • Add discussion of the batch size and the number of locations (10) used for the RFR model
    • Reference your equations
    • Introduce the abbreviations
    • Improve the writing
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using locally supervised learning for WSI representation learning is interesting. However, the paper may not be ready for publication in its current form. Please refer to the weaknesses section for more detailed comments.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    For the LKS dataset, I requested that the authors report the results of the original paper, the SOS method [18]. In response, the authors argued that

    “Direct comparison with SOS was not possible since the authors neither discuss their validation set nor provide the source code.”

    However, both the dataset and the source code are publicly available in this repo:

    https://github.com/cradleai/LKS-Dataset

    Further, it is not clear to me how, if the authors do not have access to the splits, they came up with an ACC of 90.73%.

    As a result, I change my rating to reject.



Review #2

  • Please describe the contribution of the paper

    To overcome the loss of spatial relations in conventional multiple instance learning for WSI classification, this paper proposes locally supervised learning by splitting a deep network into multiple gradient-isolated modules.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Entire whole slide images can be trained on a GPU. They evaluated their method on three public datasets and achieved satisfactory results. This work is interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In 2.1, what is the necessity of using a feature reconstruction unit?
    2. In RFR, please give the detailed parameters.
    3. At the same position, how does random sampling ensure that most positions are covered? The corresponding proof was not found in the following text. Also, is memory the reason for the chosen crop size?
    4. In the LKS dataset, why was the magnification 4x, which differs from the other two datasets? Meanwhile, please explain the reasons for choosing 4-5x magnification.
    5. How are the number and locations of instances determined?
    6. The differences between different numbers of module blocks are minor; however, no specific analysis or verification is given.
    7. For the results, it is suggested to add statistical significance analysis or cross-validation.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that all code related to this work will be released if the work is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. In 2.1, what is the necessity of using a feature reconstruction unit?
    2. In RFR, please give the detailed parameters.
    3. At the same position, how does random sampling ensure that most positions are covered? The corresponding proof was not found in the following text. Also, is memory the reason for the chosen crop size?
    4. In the LKS dataset, why was the magnification 4x, which differs from the other two datasets? Meanwhile, please explain the reasons for choosing 4-5x magnification.
    5. How are the number and locations of instances determined?
    6. The differences between different numbers of module blocks are minor; however, no specific analysis or verification is given.
    7. For the results, it is suggested to add statistical significance analysis or cross-validation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Innovation of method and strong evaluation

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    Although patch-wise multiple instance learning is the mainstream, this paper introduces an alternative that enables processing an entire pathology image directly. The modules of the whole model are locally supervised so that computation memory is saved for larger input images, and a random feature reconstruction unit is proposed to preserve the discriminative features under local supervision for large pathology images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Introduces a novel method that directly processes the entire pathology image instead of patches. This should be interesting for the community.
    2. The reconstruction part of the previous locally supervised method is too costly for pathology images; the authors propose a novel random feature reconstruction unit to adapt it effectively.
    3. The proposed method outperforms multiple multi-instance-based benchmarks on classification tasks using three different datasets.
    4. The proposed method is memory- and computation-efficient compared with patch-wise methods. These are attractive attributes for pathology image classification in practice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not clarified whether other related works have tried processing whole slide images directly, instead of patch-wise methods.
    2. In the comparison, the other patch-wise multiple instance learning benchmarks used ImageNet-pretrained features directly, while the proposed method fine-tuned the weights. However, some methods such as contrastive learning can be used to transfer the ImageNet-pretrained weights to pathology images.
    3. Why was gated-attention multiple instance learning, instead of simple global average pooling and a multi-layer perceptron, used in the (auxiliary) classifiers?
    4. It is expected that the paper illustrate how the model (ResNet34) is divided into K parts.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the authors, the code will be made public. Also, the datasets are public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be interesting if the prediction accuracy of each auxiliary classifier could be displayed. Increasing accuracy would support the claim that the modules extract more discriminative information from the input. It seems possible that the modules did not learn anything new but just kept the information of the input.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and presents interesting ideas. However, some pretraining methods for patch-wise multiple instance learning should be considered for a fair comparison, and some other key information should be added.

  • Number of papers in your stack

    1

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors replied to most of the questions from the reviewers. They claimed that their method performs comparably (ACC in the same range) to the MIL SOTA with obvious memory efficiency. Although complete answers to some concerns required more experiments and were limited by computation memory (e.g., fine-tuned MIL methods, 20x images, cross-validation with statistical significance analysis, etc.), I still think it is a good attempt (novel and memory-efficient) at processing WSIs directly.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposed a block-by-block design to process the entire WSI directly with optimized GPU memory consumption. The reviewers unanimously agree that the approach is innovative, or at least interesting. However, there are concerns regarding the clarity of the methodology, the rigor of the evaluation, and the rationale/validity of some design choices, which need to be carefully addressed before the paper can be considered further. Please see the reviewer comments for details. Here are the important points to address in the rebuttal:

    Clarity of the experimental setting and results

    The reviewers' concerns about methodological details

    Language issues

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

We thank the reviewers and the AC for their constructive criticism and positive evaluation. Here we address the major concerns; minor concerns, including language, abbreviations, and clarity issues, will be corrected in the camera-ready version.

Experimental setting and results:

  • R1&R3 worry that a batch size of 1 may cause instability. Most histopathology MIL approaches also use a batch size of 1 [12,15,21]. To mitigate such potential instability, we used the common optimization practice of accumulating gradients over 8 batches before updating the parameters (see the sketch after this list).

  • R1&R2 wonder about the choice of resolution. In fact, the applicability of our method is independent of the selected magnification. Given the application and the available hardware, higher magnifications could also be selected (up to 10X given our current memory limitations). We chose a low magnification for simplicity, faster computation, and comparisons on three datasets where we report performance close to the SOTA. Anecdotally, 5X appears to perform well on various classification tasks. For the LKS dataset, during submission we inadvertently ran experiments at 4X resolution. We will report the 5X results of 89.76% accuracy (ACC) and 0.9562 AUC, which are comparable and confirm our claims. Additionally, the 10X results are 89.86% ACC and 0.9428 AUC.

  • RE: hyperparameter choice (R2&R3): We empirically chose the number and size of the patches sampled for the RFR on the validation set, following standard ML practice. Through an ablation we found that the two losses in Sec. 2.2, L_cls and L_rec, contribute equally. We will add this to the final version.

  • R1 asks about fine-tuning: We clarify that weakly supervised MIL methods for histopathology classification (e.g., [17]) often use pretrained weights because fine-tuning is memory-expensive. However, using local learning alleviates the memory need, generating better representations for each specific task.
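
As referenced in the first point above, the gradient-accumulation practice can be sketched as follows. This is a generic illustration, not the authors' code; `model`, `loader`, `criterion`, and `optimizer` are placeholder names.

```python
def train_with_accumulation(model, loader, criterion, optimizer, accum_steps=8):
    """Batch size 1 with gradient accumulation: gradients from accum_steps
    slides are summed before each parameter update, emulating a larger batch."""
    optimizer.zero_grad()
    for step, (slide, label) in enumerate(loader):           # one WSI per batch
        loss = criterion(model(slide), label) / accum_steps  # scale to average
        loss.backward()                                      # accumulates into .grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()                                 # update every 8 slides
            optimizer.zero_grad()
```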

Novelty:

  • R3 claims limited novelty. We believe that the use of local learning (e.g., [24]) in MICCAI-related applications is novel. We introduce local learning to WSI classification, which is a nontrivial task due to the large image sizes: directly applying [24] is not possible, as it still requires a significant amount of memory, as discussed in Sec. 2.1. To solve this problem, we introduce a sampling-based approach, RFR, and show it is effective in Tab. 4, proving its necessity (R2). An illustrative sketch of the sampling idea follows below.
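
The sampling idea can be pictured as follows: rather than decoding the entire feature map of a WSI-sized input, reconstruct only a few randomly chosen feature crops. This is an assumption-laden sketch, not the paper's RFR: the crop count and size, the MSE loss, the matching spatial resolutions of `feat_in` and `feat_out`, and the small `decoder` network are all placeholders.

```python
import torch
import torch.nn.functional as F

def rfr_loss(feat_in, feat_out, decoder, n_crops=10, crop=16):
    """Random-feature-reconstruction-style loss: sample a few random spatial
    crops of the module's output features and reconstruct the corresponding
    crops of its input features, instead of decoding the whole map."""
    _, _, h, w = feat_out.shape  # assumes h, w >= crop and feat_in has the same h, w
    loss = feat_out.new_zeros(())
    for _ in range(n_crops):
        top = torch.randint(0, h - crop + 1, (1,)).item()
        left = torch.randint(0, w - crop + 1, (1,)).item()
        rec = decoder(feat_out[:, :, top:top + crop, left:left + crop])
        tgt = feat_in[:, :, top:top + crop, left:left + crop].detach()
        loss = loss + F.mse_loss(rec, tgt)
    return loss / n_crops
```

The local loss for a module would then combine the two terms, e.g. L = L_cls + L_rec, which per the rebuttal contribute roughly equally.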

Comparisons:

  • Concerning the lack of comparisons with the SOTA (R1&R3), and in particular SOS [18] for the LKS dataset: a direct comparison with SOS was not possible since the authors neither discuss their validation set nor provide the source code. We obtain an ACC of 90.73% on LKS, the same as [18] (we will add this comparison to the paper). We provide additional experiments on two other datasets, proving generalizability.

  • R1 claims that we ignore the SOTA for the TCGA-NSCLC dataset, with (unfortunately NOT specified) papers reporting >95% ACC. After searching, we have not identified papers with such high ACC; thus we wonder whether the reviewer was referring to AUC. Indeed, our method reports 93.8% AUC on this dataset. To our knowledge, the SOTA for this clinical task is 96.3% AUC (95% CI: 93.7–99.0) in [17]. For experimental uniformity, we used the exact same splits in all comparisons and reported ACC and AUC. Thus we had to re-run the two variants of [17] on our splits, scoring 91.9% AUC. We believe our AUC is in the same range as the SOTA, while being 7-10X faster.

  • R1&R4 mention missing comparisons with non-MIL approaches. We discussed some non-MIL methods on page 2 and showed that our method outperforms [20], a non-MIL method, in supp. Tab. 2. Overall, we chose MIL approaches as the main comparison since MIL methods mostly achieve SOTA in histopathology image classification. The long-term goal of computational pathology is to process WSIs directly, and we strongly believe that our work is a step towards this goal.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents an innovative block-by-block design with optimal GPU memory consumption. In the first round of review, most of the reviewers appreciated the innovation of this method. The major concerns focused on the details of the experimental setting and the rigor of the experiments. I think the rebuttal has addressed most of these concerns except for the SOS comparison, but this concern has been alleviated by the comparison with the CLAM method. In my opinion, this approach is innovative and might impact broader applications in medical image analysis. For these reasons, the recommendation is toward acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a WSI classification method. The reviewers were mostly positive about the method, but there was one main concern about the comparison with the results of [18]. The authors claimed that a direct comparison with the results reported in [18] was not possible due to missing information about the data split, which might indeed be the case. The final version should be updated to clarify all the main comments from the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes locally supervised learning by splitting a deep network into multiple gradient-isolated modules, to process the entire WSI directly with optimized GPU memory consumption. The idea is novel and well-motivated. Most reviewers were positive about the paper in the initial phase, and the remaining minor problem was also addressed in the rebuttal. I recommend accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3


