
Authors

Zhimiao Yu, Tiancheng Lin, Yi Xu

Abstract

Improving feature representation ability is the foundation of many whole slide pathological image (WSI) tasks. Recent works have achieved great success in pathology-specific self-supervised learning (SSL). However, most of them focus only on learning patch-level representations, so there is still a gap between the pretext task and slide-level downstream tasks, e.g., subtyping, grading and staging. Aiming at slide-level representations, we propose Slide-Level Prototypical Distillation (SLPD) to explore intra- and inter-slide semantic structures for context modeling on WSIs. Specifically, we iteratively perform intra-slide clustering of the regions (4096×4096 patches) within each WSI to yield prototypes and encourage the region representations to move closer to their assigned prototypes. By representing each slide with its prototypes, we further select similar slides by the set distance between prototypes and assign regions to cross-slide prototypes for distillation. SLPD achieves state-of-the-art results on multiple slide-level benchmarks and demonstrates that learning representations aligned with the semantic structures of slides provides a closer proxy to WSI analysis. Code will be available.
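To make the prototype mechanism in the abstract concrete, the following is a minimal sketch of how slide-level prototypes and a set-to-set distance between slides might be computed. The function names, the choice of k-means, and the symmetric Chamfer-style set distance are illustrative assumptions of this write-up, not the authors' exact implementation.

```python
# Sketch only: slide-level prototypes via k-means and a set distance between
# two slides' prototype sets (assumed form, not the paper's exact definition).
import numpy as np
from sklearn.cluster import KMeans


def slide_prototypes(region_embeddings: np.ndarray, k: int) -> np.ndarray:
    """Cluster the region embeddings of one WSI into k prototypes."""
    km = KMeans(n_clusters=k, n_init=10).fit(region_embeddings)
    return km.cluster_centers_  # shape: (k, dim)


def set_distance(protos_a: np.ndarray, protos_b: np.ndarray) -> float:
    """Symmetric set-to-set distance between two slides' prototype sets."""
    d = np.linalg.norm(protos_a[:, None, :] - protos_b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


# Example: pick the most similar slide to slide 0 by prototype set distance.
rng = np.random.default_rng(0)
slides = [rng.normal(size=(64, 192)) for _ in range(5)]  # 5 slides, 64 regions each
protos = [slide_prototypes(s, k=4) for s in slides]
dists = [set_distance(protos[0], p) for p in protos[1:]]
print(f"nearest neighbour of slide 0: slide {int(np.argmin(dists)) + 1}")
```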

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_25

SharedIt: https://rdcu.be/dnwcC

Link to the code repository

https://github.com/Carboxy/SLPD

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a novel method for building region-level (4096×4096) representations of whole-slide images (WSIs) using self-supervised learning and prototype-based learning. The work builds on HIPT (CVPR’22) and DINO. Specifically, the authors propose three losses for learning region embeddings: the regular DINO loss (L_self), an intra-slide loss that pushes regions from the same WSI closer to pre-computed slide-level prototypes (L_intra), and an inter-slide loss that pulls together regions from different WSIs with similar prototypes (L_inter).
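For readers less familiar with DINO-style training, the following PyTorch-flavoured sketch illustrates how the three losses described above could be combined. The cosine form of the prototype terms and the loss weights are assumptions made here for illustration; the paper's exact formulation may differ.

```python
# Schematic composition of the three losses (assumed forms, not the paper's
# exact equations).
import torch
import torch.nn.functional as F


def dino_loss(student_logits, teacher_logits, t_s=0.1, t_t=0.04):
    """Regular DINO cross-entropy between teacher and student distributions."""
    teacher = F.softmax(teacher_logits / t_t, dim=-1).detach()
    return -(teacher * F.log_softmax(student_logits / t_s, dim=-1)).sum(-1).mean()


def prototype_loss(region_emb, assigned_proto):
    """Pull each region embedding toward its assigned prototype (cosine form)."""
    return (1 - F.cosine_similarity(region_emb, assigned_proto, dim=-1)).mean()


def total_loss(s_logits, t_logits, region_emb, intra_proto, inter_proto,
               w_intra=1.0, w_inter=1.0):
    l_self = dino_loss(s_logits, t_logits)             # L_self: DINO term
    l_intra = prototype_loss(region_emb, intra_proto)  # L_intra: own-slide prototypes
    l_inter = prototype_loss(region_emb, inter_proto)  # L_inter: cross-slide prototypes
    return l_self + w_intra * l_intra + w_inter * l_inter
```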

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • As correctly pointed out by the authors, most works in computational pathology focus on patch-level SSL; moving to regions and WSIs is the next step.
    • Incorporating an inter and intra-level loss based on slide prototypes to complement the traditional DINO loss is promising and, to my knowledge, novel.
    • The experiments and ablations are convincing, despite using “easy” datasets (see weaknesses)
    • The references are up-to-date and acknowledge recent SSL in CPath papers
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Unless I missed it, the authors didn’t test region-level feature extraction with DINO followed by MIL or DS-MIL. To me, this is the most important baseline to understand whether the proposed SLPD leads to better weakly supervised performance than a DINO region-level embedding extractor.
    • The method builds region-level embeddings but cannot readily build slide embeddings. The authors propose taking the mean of the region embeddings to go to slide-level. However, this sounds like a sub-optimal design choice that should be discussed.
    • The method is tested on relatively simple datasets (lung and breast subtyping from TCGA). It would make the story more convincing if the method were tested on more challenging and diverse tasks, e.g., subtyping, grading, and survival. For instance, the performance on BRCA subtyping is saturating, and it is becoming very hard to identify the best methods.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be reproducible if the code is made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • See the weaknesses above, plus:
    • The authors should consider renaming “image” (16×16 patches) to “token”, consistent with Transformer terminology.
    • What is MIL in Table 1? Is it CLAM, ABMIL? The authors should specify.
    • Some of the word choices are surprising, and the text could be slightly toned down, e.g., the authors mention “Our community” (instead: the field?), “delicate patch- and WSI-level augmentations” (what is a delicate augmentation?), “For instance, HIPT [8], a milestone work,” and “HIPT, a cutting-edge method” (HIPT is a significant contribution to the field, but I wouldn’t call it a milestone).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method addresses a very relevant problem in computational pathology. The proposed method is sound and promising, and the paper is easy to follow. However, some baselines are missing (see weaknesses), and the datasets are relatively simple.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a self-supervised slide-level representation learning approach to leverage intra- and inter-slide semantic consistency for context modelling in whole-slide histopathology images. Specifically, the authors perform the slide-level clustering within each WSI to bring similar regions (i.e., morphologically similar phenotypes) closer together while simultaneously encoding semantically similar slide representations by building correspondences between region representations and cross-slide prototypes.

    The proposed method has been validated on two publicly available large-scale datasets: TCGA-NSCLC and TCGA-BRCA. Further, the paper presents several ablation experiments to show the efficacy of the proposed technique, while obtaining significant improvements over other weakly supervised benchmark methods.

    Overall, the idea is impressive and novel in the context of weakly supervised learning for histopathology image analysis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    — The intra- and inter-slide distillation strategy presented in this paper is interesting and is shown to achieve high semantic correspondence both within and across slides, where each cluster corresponds to a specific morphological phenotype. This strategy is very impressive, in my opinion, and could be extended to other weakly supervised applications, such as survival prediction or prediction of treatment outcomes in computational pathology.

    — A strong set of ablation and baseline experiments have been conducted to demonstrate the usefulness of the proposed approach on challenging weakly-supervised classification tasks.

    — Overall, the paper is well written and the core idea is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed approach seems to be a combination of previously existing weakly supervised techniques in histopathology: 1) Hierarchical Image Pyramid Transformer (HIPT) [1]; 2) CLAM [2] with a similar intra- and inter-slide clustering strategy, which has been employed in [2].

    [1] Chen, Richard J., et al. “Scaling vision transformers to gigapixel images via hierarchical self-supervised learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. [2] Lu, Ming Y., et al. “Data-efficient and weakly supervised computational pathology on whole-slide images.” Nature biomedical engineering 5.6 (2021): 555-570.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method has been built on the publicly available well-known method “Hierarchical Image Pyramid Transformer (HIPT)”. Therefore, one can reproduce the proposed approach with default parameters, as mentioned in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    — The proposed approach seems to be a combination of previously existing weakly supervised techniques in histopathology: 1) Hierarchical Image Pyramid Transformer (HIPT) [1]; 2) CLAM [2], with an intra- and inter-slide clustering strategy similar to the one employed in [2]. Please explain the difference between the present approach and [2], and benchmark against [2], which is missing in the ablation studies.

    — In a more recent paper by Xie et al., 2020 [3], the authors propose end-to-end learning for joint feature extraction and global-level clustering to obtain highly discriminative features for WSI classification. How is the method proposed in this paper different from [3], apart from being applied in the context of vision transformers?

    — How should one select K, and how does model performance vary with K (for slide-level clustering)? Is there a way to know a priori what the optimal value of K would be for different datasets? In Table 2, even with different values of K, the performance is almost identical for both datasets. I am curious to know the reason for this.

    [1] Chen, Richard J., et al. “Scaling vision transformers to gigapixel images via hierarchical self-supervised learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. [2] Lu, Ming Y., et al. “Data-efficient and weakly supervised computational pathology on whole-slide images.” Nature biomedical engineering 5.6 (2021): 555-570. [3] Xie, Chensu, et al. “Beyond classification: Whole slide tissue histopathology analysis by end-to-end part learning.” Medical Imaging with Deep Learning. PMLR, 2020.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The proposed intra- and inter-slide distillation strategy presented in this paper is relatively novel in the context of weakly supervised learning and can be extended to other weakly supervised problems in pathology.

    • A strong set of baselines and extensive ablation experiments showcase the effectiveness of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors mainly extend HIPT with two clustering-based losses to help the encoder learn better slide-level representations. Experiments on two public datasets demonstrate the effectiveness of the proposed losses compared to HIPT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors enhance HIPT with two well-motivated losses: intra-slide and inter-slide prototypes help supervise the encoder to learn better correspondences.
    2. The experiments are done with ten-fold cross-validation on two public datasets, providing a strong evaluation of the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the proposed two losses are well motivated, the technical contributions of this work are somewhat limited compared to HIPT. The idea of using clustering prototypes to help histopathology analysis has been explored in a number of prior works, including: Yan J, et al. Deep contrastive learning based tissue clustering for annotation-free histopathology image analysis. Computerized Medical Imaging and Graphics, 2022; Yang J, et al. ReMix: A General and Efficient Framework for Multiple Instance Learning Based Whole Slide Image Classification. MICCAI 2022; Pan W, et al. Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation. IPMI 2023.
    2. To bring more insights to the community, the authors could explore more clustering methods to identify prototypes, such as Spectral Clustering, DBSCAN, etc. (a minimal sketch of swapping clustering backends follows this list). It would also be interesting to visualize the finally identified prototypes: are they highly related to specific tissue types?
    3. According to Table 2, the authors chose k=4 for NSCLC and k=2 for BRCA for the best performance. Is it necessary to search for the optimal k for another new cancer type? The authors could also give some insight into why the performance on BRCA gets worse with a larger k.
    4. Some typos exist. On page 8, “Number of slide neighbors. As demonstrated in Tab. 2(#5∼7), the performance of SLPD is robust to the number of slide neighbors.” should be “Tab. 2(#8∼10)”.
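As a concrete illustration of the suggestion in point 2 above, here is a minimal sketch of how alternative clustering backends could be swapped in to produce slide-level prototypes. The helper names and hyperparameters are hypothetical; prototypes are simply taken as per-cluster mean embeddings, and this is not the authors' implementation.

```python
# Sketch: interchangeable clustering backends for slide-level prototypes.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering, DBSCAN


def prototypes_from_labels(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean embedding per cluster; DBSCAN's noise label (-1) is ignored."""
    ids = [c for c in np.unique(labels) if c != -1]
    return np.stack([embeddings[labels == c].mean(axis=0) for c in ids])


def cluster_prototypes(embeddings: np.ndarray, method: str = "kmeans", k: int = 4) -> np.ndarray:
    if method == "kmeans":
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings)
    elif method == "spectral":
        labels = SpectralClustering(n_clusters=k, affinity="nearest_neighbors").fit_predict(embeddings)
    elif method == "dbscan":
        labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embeddings)
    else:
        raise ValueError(f"unknown clustering method: {method}")
    return prototypes_from_labels(embeddings, labels)
```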
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good; code will be made available upon publication, and details are provided in the paper to help readers reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses where suggestions are also given.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical contributions are somewhat limited (enhancing HIPT with clustering-based losses), and some highly related prior works are not discussed.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have partially addressed my concerns. The discussion of other clustering-related WSI analysis methods and the new experiments can be added during the revision. I am happy to raise my rating from weak reject to weak accept.



Review #4

  • Please describe the contribution of the paper

    This paper revisits a cutting-edge representation learning method for WSIs and proposes to improve it by adding intra- and inter-slide semantic structures that model the mutual region/slide relations of WSIs. Experiments were conducted on two public datasets, and the ablation studies demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Proposes intra- and inter-slide semantic structures to model mutual region/slide relations for better WSI representation learning.
    2. Extensive experiments on public datasets show the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details of the method are not clear.
    2. The algorithm is difficult to read due to the excessive use of notation without clear figures to illustrate its meaning. (Adding some of the notation to the figures may help.)
    3. Figure 1 is confusing, especially Fig. 1(c).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be released and the datasets are public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Explain the definition of self-distillation for readers who are not very familiar with this concept.
    2. Explain why representations can end up encoding visual bias related to staining or scanning procedures rather than medically relevant features, and how the proposed method can overcome this.
    3. Explain the projection heads g_t/g_s. What is the difference between them?
    4. The paper does not introduce how the prediction is obtained from the representation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach outperformed state-of-the-art methods on two public datasets. However, improvements could be made to enhance the clarity of the algorithm description and figures, which would make the paper more reader-friendly.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed comments. Reviewers acknowledge the interesting idea of the study and the convincing experiments and ablations. However, the reviewers raised concerns about limited novelty [R2, R3], testing on simple datasets [R1], and questions about some methodological and experimental design choices [R1, R2, R3, R4]. Thus, the authors are invited to submit a rebuttal addressing the reviewers’ concerns.




Author Feedback

We thank the reviewers for their valuable time and insightful feedback. All discussions and details in this rebuttal will be included in the manuscript and supplementary material.

1 Novelty. The proposed method differs from previous works in the definition and motivation of clustering. CLAM leverages clustering to obtain instance-level supervision. ReMix and EPL use clustering to achieve efficient MIL training. R3’s reference [1] adopts a contrastive constraint on the columns of the feature matrix (called a cluster) for enhancing the intra-class variance of instances. Proto2Seg uses clustering as an efficient way to obtain human expert annotation. In contrast, we leverage slide-level prototypes to model the semantic structure of each slide, and further explore intra-/inter-slide correspondences with the clustering assignments and a carefully designed set-to-set similarity metric. Our main contribution is a novel prototype-based learning objective for WSI SSL that carefully considers the mutual relations of slides, achieving SOTA results on multiple slide-level benchmarks.

2 Challenging Task Testing. We test our method and DINO on the survival prediction task for the IDC cancer type, using the 5-fold cross-validated concordance index (c-index) as the metric: DINO 0.587±0.070; Ours 0.618±0.048. Our method brings an improvement of 3.1% in c-index.

3 Designs.
3.1 Important Baseline (R1): We evaluate the region-level DINO baseline with DS-MIL. For NSCLC subtyping: AUC 0.917±0.035, Acc. 0.841±0.036; for BRCA subtyping: AUC 0.848±0.075, Acc. 0.854±0.032. In comparison, our method SLPD improves the AUC from 0.917/0.848 to 0.938/0.876.
3.2 Slide Embedding Generation (R1): We also tried Max-Pooling to obtain slide embeddings and performed KNN evaluation. BRCA subtyping AUC: DINO 0.723±0.081; Ours 0.749±0.061. Our method beats DINO under Max-Pooling and shows performance comparable to that under Mean-Pooling. We argue that Mean-Pooling is a straightforward yet representative way to evaluate representation ability, and our method is effective at generating slide embeddings with various pooling approaches.
3.3 CLAM (R2): We evaluate CLAM, a powerful aggregator for MIL tasks. BRCA subtyping AUC: DINO 0.875±0.057; Ours 0.890±0.052. Our method brings consistent improvements.
3.4 Optimal K (R2, R3): Most human tumors are composed of genetically and phenotypically heterogeneous cancer cell populations [1]. We suggest that the optimal number of prototypes should be informed by clinical practice, considering tissue types, cell morphology, gene expression and other factors. The heterogeneity within a slide of NSCLC and BRCA is usually not very high, so we empirically tried k=2,3,4 and set k=2/4 on NSCLC/BRCA.
3.5 Clustering Methods (R3): We alternatively use Spectral Clustering to generate prototypes. Due to the limited rebuttal time, we use a subset of the BRCA dataset for pre-training and evaluate BRCA subtyping AUC: DINO 0.819±0.0521; DINO+global cluster 0.815±0.068; DINO+slide-level cluster 0.836±0.073. We observe that our method brings consistent improvements across different clustering methods.
3.6 Visual Bias (R4): The digital scanner configuration, tissue stain variation, artifacts and source-site patient demographics can all introduce (visual) bias across the whole dataset [1]. For example, we observe that some slides are stained pink and some purple, which would result in color-related prototypes under global clustering. In contrast, the visual characteristics are usually consistent within each slide, so slide-level clustering significantly alleviates such bias and yields meaningful prototypes. [1] Dehkharghanian, Taher, et al. “Biased data, biased AI: deep networks predict the acquisition site of TCGA images.” Diagnostic Pathology 18.1 (2023): 1-12.

4 Others. We will thoroughly proofread the paper to fix typos and polish the expressions.
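As an illustration of the pooling and KNN evaluation discussed in point 3.2, the following sketch pools region embeddings into a slide embedding (mean or max) and scores a k-nearest-neighbour classifier by AUC. Function names and the choice of k are assumptions of this write-up, not the authors' evaluation code.

```python
# Sketch: slide embedding by pooling region embeddings, then KNN + AUC.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score


def slide_embedding(region_embeddings: np.ndarray, pooling: str = "mean") -> np.ndarray:
    """Pool (num_regions, dim) region embeddings into one (dim,) slide embedding."""
    return region_embeddings.mean(axis=0) if pooling == "mean" else region_embeddings.max(axis=0)


def knn_auc(train_slides, train_labels, test_slides, test_labels,
            pooling: str = "mean", k: int = 20) -> float:
    """Fit a KNN classifier on pooled slide embeddings and report binary AUC."""
    x_tr = np.stack([slide_embedding(s, pooling) for s in train_slides])
    x_te = np.stack([slide_embedding(s, pooling) for s in test_slides])
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_tr, train_labels)
    scores = knn.predict_proba(x_te)[:, 1]  # probability of the positive class
    return roc_auc_score(test_labels, scores)
```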




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents Slide-Level Prototypical Distillation to explore intra- and inter-slide semantic structures for context modeling on WSIs. All reviewers appreciated the presented method and the convincing experiments and ablations; however, they raised concerns about the novelty and the experimental and methodological designs. The authors provided a rebuttal addressing these points with additional experiments and discussion. The meta-reviewer finds that these answers properly address the raised concerns and thinks that the paper can be an interesting contribution to the conference. The authors are encouraged to include these answers in their camera-ready version.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have partially addressed the reviewers’ concerns. The comparison to other clustering-related WSI analysis methods and the new experiments can be added in the final version. As agreed by all reviewers, the paper is interesting and the experimental results are convincing.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with the experiment using different clustering methods to confirm the model’s performance. It would be better if more ablation studies on this question had been done.


