Authors
Jiawei Yang, Hanbo Chen, Yu Zhao, Fan Yang, Yao Zhang, Lei He, Jianhua Yao
Abstract
Whole slide image (WSI) classification often relies on deep weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels. Yet the descent performance of deep learning comes from harnessing massive datasets and diverse samples, urging the need of efficient training pipelines for scaling to large datasets and data augmentation techniques for diversifying samples. However, current MIL-based WSI classification pipelines are memory-expensive and computation-inefficient since they usually assemble tens of thousands of patches as bags for computation. On the other hand, data augmentations, despite their popularity in other tasks, are much unexplored for WSI MIL frameworks. To address them, we propose ReMix, a general and efficient framework for MIL based WSI classification. It comprises two steps: reduce and mix. First, it reduces the number of instances in WSI bags by substituting instances with instance prototypes, i.e., patch cluster centroids. Then, we propose “Mix-the-bag” augmentation that contains four online, stochastic and flexible latent space augmentations. It brings diverse and reliable class-identity-preserving semantic changes in the latent space while enforcing semantic-perturbation invariance. We evaluate ReMix on two public datasets with two state-of-the-art MIL methods. In our experiments, consistent improvements in precision, accuracy, and recall have been achieved but with orders of magnitude reduced training time and memory consumption, demonstrating ReMix’s effectiveness and efficiency. Code is available.
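[Editor's note] The "reduce" step described in the abstract (substituting a bag's patch features with K-means cluster centroids) could be sketched roughly as follows. This is a minimal illustration, not the authors' released code; the function name, feature dimensions, and choice of K are hypothetical.

```python
# Hypothetical sketch of ReMix's "reduce" step: replace each WSI bag's
# patch features (N x D) with K cluster centroids (K x D).
import numpy as np
from sklearn.cluster import KMeans

def reduce_bag(patch_features: np.ndarray, k: int = 8) -> np.ndarray:
    """Substitute a bag of N patch features with K prototypes via K-means."""
    k = min(k, len(patch_features))  # guard against bags smaller than K
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(patch_features)
    return km.cluster_centers_

rng = np.random.default_rng(0)
bag = rng.normal(size=(2000, 64))   # a bag of 2000 patch embeddings
reduced = reduce_bag(bag, k=8)
print(reduced.shape)                # (8, 64)
```

Because each bag shrinks from thousands of instances to K prototypes, bags become fixed-size and small, which is what enables the batched, memory-light MIL training the paper reports.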
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_4
SharedIt: https://rdcu.be/cVRq1
Link to the code repository
https://github.com/TencentAILabHealthcare/ReMix
Link to the dataset(s)
https://ieee-dataport.org/open-access/unitopatho#files
https://camelyon16.grand-challenge.org/Data/
Reviews
Review #1
- Please describe the contribution of the paper
This work introduces a novel framework coined ‘ReMix’ for whole slide image (WSI) classification that leverages latent space augmentation (LA) on WSI instance cluster prototypes under the multiple instance learning (MIL) paradigm. WSI bags are reduced by replacing instances with cluster prototypes, enabling MIL parallelization, with several LA augmentation strategies applied to the prototypes facilitating generalization. The work is well motivated and provides extensive experiments on two public datasets showing very competitive results. Also, the introduced framework is agnostic to existing state-of-the-art MIL models: it is highly scalable and has plug-and-play functionality.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is easy to read and well-motivated. The use of latent space augmentation for Histopathology is novel and addresses a relevant problem for WSI analysis in healthcare.
- The authors report competitive results on public datasets with significant gains over recent methods.
- Augmentation strategies for WSI classification are often under-explored; thus, the use of LA augmentation and its variants in this work is very interesting. Especially in the multi-class setting when considering different pathologies, LA facilitates better model generalization.
- I appreciate the extensive experiments to validate LA, including ablations on computational efficiency and several hyper-parameters.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- While the authors ablate each of the proposed augmentations, it is unclear which augmentation is more useful in a general sense i.e., the considered baselines (ABMIL,DSMIL) show gains with different augmentations.
- ‘Mix-the-bag’ is a key contribution of this work. However, results only show isolated evaluations on different components of the strategy. It would be beneficial to have included ‘Mix’ i.e., all combinations of the augmentations ( append + replace + inter. + covary ) in the evaluation to better assess the generality of the idea.
- It is unclear how such augmentations would work with recent works that have spatial WSI reasoning such as TransMIL [1]. This work should have been included in the evaluation to better highlight potential failure cases.
[1] Shao et al. Transmil: Transformer based correlated multiple instance learning for whole slide image classification. NeurIPS (2021)
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Authors checked “Yes” for most questions on the reproducibility of the paper
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
I am a bit concerned that the choice of baselines (ABMIL, DSMIL) is not very indicative of the generality of this method, though I do understand the limited space in the manuscript can be a factor. To expand on this, from a technical standpoint, DSMIL implicitly computes prototypes, and thus LA could be directly applied without resorting to K-means cluster learning as a prior step. Is there a reason only these baselines were employed?
It would be interesting to see whether ReMix can work with recent methods such as TransMIL [1], especially given that ReMix-DSMIL and its variants had marginal improvements on CAMELYON-16 over the baseline.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, this work provides strong empirical evidence and has no major weaknesses. It would be interesting to see more comparisons with other recent MIL methods, however the work in its current form is sufficient to be considered at MICCAI. I believe the research community working in WSI can benefit from this simple and effective technique.
- Number of papers in your stack
4
- What is the ranking of this paper in your review stack?
1
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes ReMix, a general and efficient framework for WSI classification, which has two steps: reduce and mix. ReMix is evaluated on two public datasets, and experimental results have shown its effectiveness and efficiency.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A general, simple yet effective method to improve the training efficiency of the MIL framework for WSI classification.
- An efficient latent augmentation for MIL-based WSI classification.
- Improved performance can be observed on two public datasets.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The novelty is limited for both Reduce and Mix. They are somewhat straightforward processing strategies which have been used in WSI-related applications before. See my detailed comments below.
- No comprehensive comparisons with other MIL baselines.
- Not easy to use in practice.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors claimed the code will be made available.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- The proposed model contains two parts, reduce and mix. Though it looks somewhat interesting, the novelty is still limited. The reduce part uses K-means on the patches’ representations to obtain K clusters as prototypes to represent the bag. A similar idea has been proposed in previous WSI work, like WSISA. In addition, the mix part includes latent augmentation, and the technique behind this part does not sound novel.
WSISA: Making Survival Prediction from Whole Slide Histopathological Images, CVPR 2017
- Though two baselines are compared, it is not clear how they are implemented. Do the authors implement the baseline models following the original settings? Also, several new WSI-related MIL methods are not compared, e.g., CLAM: A Deep-Learning-based Pipeline for Data Efficient and Weakly Supervised Whole-Slide-level Analysis, Nature Biomedical Engineering, 2021.
- It is not easy to use the proposed model in practice. From Table 1, we can see that different augmentation strategies achieve different results. How to decide which one to use is neither clear nor easy.
- Another concern is the lack of interpretability of the proposed model. Many MIL-based models show attention maps on patches/instances; it seems the proposed model cannot provide such visualizations.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors proposed ReMix, an efficient framework for WSI classification. Though promising results have been shown, the novelty is limited, and it looks like an add-on model based on existing MIL methods. Also, the lack of interpretability and the need to select an augmentation technique will make it more challenging to use in practice.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
4
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
To address the WSI classification with high resource consumption, authors propose a simple yet effective MIL framework with two steps. In the first reduce step, the centroids at the feature space serve as instances for MIL, instead of original patches. Then, the mix step augments the reduced instances to regularize the training of the whole MIL framework. With the help of advanced pretraining for high-quality instances, the proposed ReMix framework can complete the WSI classification efficiently.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This work aims to address the resource consumption of MIL framework, which is a bottleneck in WSI classification. From the experiments, the proposed ReMix can improve the performance and reduce the resource cost at the same time.
- The proposed reduce and mix steps are simple yet effective, and can be easily extended to existing MIL frameworks.
- Significant improvements for two baselines on two datasets.
- The work is well written.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The training budgets comparison in Table 2 may be over-claimed, ignoring the cost of necessary pretraining.
- The reduce step may lose the spatial information, which is important for specific WSI tasks.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Hyper-parameters are complete. Authors are going to release the code.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- The training budget comparison may be over-claimed. The success of reduce and mix depends on high-quality pretraining, which demands a lot of resources and time. However, the comparison in Table 2 excludes the time and memory demands of pretraining, which is not fair to the SOTA baselines. Moreover, compared with the training budget, the resource cost of inference is more significant for practical applications.
- The summary of existing WSI works that reduce resource consumption could be improved. For example, [1] randomly samples a specific number of patches as an augmentation of bags, and [2] utilizes the attentive regions with a sparse tree. The authors are encouraged to discuss the differences or advantages over these works. [1] https://dblp.org/rec/conf/cvpr/HashimotoFKTKKN20 [2] https://dblp.org/rec/conf/aaai/0013ZCHHY21
- For some WSI tasks related to regional proportion (e.g., HER2 scoring), the reduce step may lose necessary spatial information from the massive number of patches, which would restrict the performance. The authors are encouraged to discuss potential limitations of the method.
- In the abstract, “the descent performance of deep learning comes from harnessing massive datasets” is a little confusing. Please check the description.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This work addresses reducing the high resource consumption of WSI classification. The novelty of this work is enough for the conference, and the performance improvements are significant.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
1
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper describes a simple but effective way of augmenting data for MIL tasks. The reduce step is not novel: it appears to be identical to the K-means clustering approach to negative mining described in https://doi.org/10.1038/s41598-021-88494-z and CLAM (https://doi.org/10.1038/s41551-020-00682-w) also leverages clustering. Similarly, the idea of augmentation in latent space is not new, however the mechanism of applying it to bags in the MIL framework is new as far as I am aware. A particular strength of this work is that the methodology can be applied to any MIL framework. The paper is well written and the rationale for the methodology is very clear. I would also like to have seen the result of combining the augmentation methods into a single experiment – would that be better than selecting just one method?
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
1
Author Feedback
We sincerely thank all the reviewers and meta-reviewers for their hard work and devoted time.
Our general responses are listed below:
- Combining all the latent space augmentations (augs). After submission, we continued to improve existing results by combining all the augs. Some of the obtained results could be better than the current presented best results, but not all. It is anticipated and in line with experience from other tasks and different fields that too strong augmentation (too hard samples) is not beneficial for training, e.g. Table 2 from [1].
We will include the results using combined “Mix” augmentation in the camera-ready paper.
[1] Appalaraju, et al. (2020). Towards good practices in self-supervised representation learning. arXiv:2012.00868.
- Could ReMix be applied to spatial-aware MIL methods?
In the paper, we claimed ReMix is general to most spatial-agnostic MIL methods. Here we elaborate on whether it could be applied to some spatial-aware MIL methods.
The most effortless extension is to apply the “covary” or “interp.” augs to each patch. In that case, the position information of patches is unaltered, but their representations are augmented by adding reliable semantic translations, following our initial motivation. The prototype querying and augmenting processes can be implemented efficiently with matrix operations. Such an extension can be flexibly applied to different methods that separate the patch encoder from the MIL classifier, e.g., TransMIL.
The “reduce” step for spatial-aware methods would be tricky depending on the use cases, e.g., the patch size, and the information bottleneck in tasks of interest. We would be glad to continue to explore this direction.
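[Editor's note] The per-patch extension described above (query each patch's nearest prototype with matrix operations, then apply "interp." and "covary" in latent space while leaving patch positions untouched) could be sketched as follows. This is a hypothetical illustration under assumed forms of the two augmentations, not the authors' implementation; function names, the interpolation factor `lam`, and the noise scale `alpha` are made up for the example, and the paper's "covary" may use richer intra-cluster covariance than the diagonal variances assumed here.

```python
# Hypothetical sketch: per-patch "interp." and "covary" latent augmentations,
# keeping spatial positions intact, using only matrix operations.
import numpy as np

def augment_patches(feats, prototypes, covs, lam=0.5, alpha=0.5, rng=None):
    """feats: (N, D) patch features; prototypes: (K, D) centroids;
    covs: (K, D) per-cluster diagonal feature variances (an assumption)."""
    if rng is None:
        rng = np.random.default_rng()
    # Query the nearest prototype for every patch.
    d2 = ((feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d2.argmin(1)                                                # (N,)
    # "interp.": move each feature toward its prototype by a factor lam.
    interp = feats + lam * (prototypes[idx] - feats)
    # "covary": add Gaussian noise scaled by the cluster's feature variance.
    noise = rng.normal(size=feats.shape) * np.sqrt(covs[idx])
    return interp + alpha * noise

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))       # 100 patch features, D = 16
protos = rng.normal(size=(4, 16))        # 4 prototypes
covs = np.full((4, 16), 0.01)            # assumed per-cluster variances
aug = augment_patches(feats, protos, covs, rng=rng)
print(aug.shape)  # (100, 16)
```

Because the output keeps one augmented feature per input patch, positional encodings or spatial reasoning in a downstream model such as TransMIL would remain valid.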
@R3: We will add the potential limitations of ReMix to spatial-aware MIL methods in the final version.
LA in DSMIL without K-Means (R1). The implicit prototype computing process in DSMIL is dynamic as training goes (prototypes are obtained after several layers). However, the cluster prototypes we use for aug are built from the input features. The misalignment between these two feature spaces would make the augs less meaningful. An alternative is to maintain an online prototype bank storing DSMIL-computed prototypes, or to iteratively compute new prototypes after each training epoch. Both modifications are feasible, but they would break the simplicity of the ReMix framework.
Implementations (R1, R2) and interpretability (R2). @R2: Implementations: In the submission, we claimed “We use DSMIL’s codebase for implementation and training.” Therefore, both baselines are implemented from DSMIL’s official codebase, following original settings.
@R1: We chose these baselines since they are from the same unified codebase, ensuring fair comparisons. Besides, DSMIL is also a recent SOTA method published in CVPR2021 for WSI classification. We believe improving over a competitive SOTA can demonstrate the effectiveness of our method.
@R2: Interpretability: We did not change any part of the model. Thus, we did not see a barrier in visualizing attention maps. We are willing to compare the attention map differences before and after applying the “Reduce” step in the future.
- Training budget and inference speed (R3). In Table 2, the SOTA baselines also use features from a pretrained model. Therefore, the budgets compared there are exclusively for MIL classifier training. The training time for the full-bag setting can be accelerated by pre-loading all bag features into memory, but at the cost of hundreds of times more memory, which could cause “out-of-memory” errors for large datasets.
Inference speed: one of our post-submission experiments showed that inferring over the reduced bags gives comparable, if not better, results on the UniToPatho dataset and slightly decreased, but still comparable, performance on the Camelyon16 dataset, showing the potential of using reduced bags at inference as well. We plan to release these results in the extended work.