Paper Info Reviews Meta-review Author Feedback Post-Rebuttal Meta-reviews

Authors

Hao Bian, Zhuchen Shao, Yang Chen, Yifeng Wang, Haoqian Wang, Jian Zhang, Yongbing Zhang

Abstract

With the development of computational pathology, deep learning methods for Gleason grading through whole slide images (WSIs) have excellent prospects. Since WSIs are extremely large, their annotation usually consists of only a slide-level label or limited pixel-level labels. The current mainstream approach adopts multiple instance learning to predict Gleason grades. However, methods that consider only the slide-level label ignore the limited pixel-level labels, which contain rich local information, while methods that additionally consider pixel-level labels ignore their inaccuracy. To address these problems, we propose a mixed supervision Transformer based on the multiple instance learning framework. The model utilizes both the slide-level label and instance-level labels to achieve more accurate Gleason grading at the slide level. The impact of inaccurate instance-level labels is further reduced by an efficient random masking strategy introduced into the mixed supervision training process. We achieve state-of-the-art performance on the SICAPv2 dataset, and the visual analysis shows accurate instance-level predictions. The source code is available at https://github.com/bianhao123/Mixed_supervision.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_20

SharedIt: https://rdcu.be/cVRY2

Link to the code repository

https://github.com/bianhao123/mixed_supervision

Link to the dataset(s)

https://data.mendeley.com/datasets/9xxm58dvs3/1


Reviews

Review #1

  • Please describe the contribution of the paper

A mixed supervision Transformer is proposed based on the MIL framework for WSI classification/grading. The proposed framework suggests a novel way to take advantage of both the slide-level label and limited pixel-level labels, and a random masking strategy is proposed to avoid the performance loss caused by inaccurate instance-level labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel strategy to incorporate both slide-level and limited pixel-level labels.
    2. A random masking strategy to mitigate the effects of inaccurate instance-level labels.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited pixel-level labels are required for this setting, but such labels might not be available in practice.
    2. What would the time cost be compared with using slide-level supervision alone?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The combination of limited pixel-level labels with superpixels might make reproduction challenging. I suggest that the code be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. In Table 1, there is an extra horizontal bar between ATMIL and TransMIL. The results of ATMIL are very impressive, considering that it uses only slide-level supervision. Could the proposed framework also take advantage of the strategies used in both ATMIL and TransMIL?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a novel way to make use of the limited pixel-level annotation to boost WSI grading, as well as a random masking strategy to train a mixed supervision Transformer, and obtain promising performance.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    Propose a mixed supervision scheme for improved Gleason grading in pathology images. Propose a mixed supervision Transformer that utilizes recent advances in NLP. Obtain superior results compared with other competing models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed to use separate instance-level and slide-level labeling, which leads to a mixed supervision problem. They proposed a so-called mixed supervision Transformer to combine the two labels and improve the overall performance of the model. The experimental results support the introduction of mixed supervision and the Transformer.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Using the mixed supervision Transformer, the authors achieved the best performance on the SICAPv2 dataset, but the improvement is only marginal. The improvement appears to come from the masking method rather than from the new design, which calls the novelty of the method into question. Hence, the effect of mixed supervision is not as clear as the effect of masking.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Likely to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors used both instance-level and slide-level labels for Gleason grading, which is interesting. To generate labels, the authors adopted superpixels, which is a reasonable approach. However, the improvement made by the proposed work is only marginal, and the effect of masking seems larger than that of mixed supervision. The authors may provide an extended discussion on this matter.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Propose an interesting way of mixing two levels of information for improved cancer grading. Still, some questions about the experimental results remain.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The authors addressed all the concerns, provided extra experimental results, and clarified the description of their methods. The contribution of the mixed supervision and masking strategy become much clearer with the experimental results that they provide. It seems to be robust as well.



Review #3

  • Please describe the contribution of the paper

    Aiming at Gleason grading of WSIs, the authors propose a Transformer framework that utilizes slide-level labels and limited pixel-level labels under the MIL scheme. In particular, the authors employ superpixel techniques to convert the pixel-level labels into instance-level labels of improved quality. A random masking strategy and sinusoidal position encoding are also adopted to improve the Gleason classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The mixed supervision of slide-level and instance-level labels is an interesting idea in the field of digital pathology and a reasonable way to alleviate the lack of supervision in WSI classification.
    2. The work is built upon a state-of-the-art Transformer backbone.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Apart from the instance label generation (Section 2.2), the novelty is very limited; the random masking in Section 2.3 is quite similar to MAE [4].
    2. The effectiveness of mixed supervision is not solidly demonstrated in the experiments.
    3. The experiments are performed on a small dataset, which is not fair to the slide-level baselines. The algorithm should also be evaluated comprehensively on a larger dataset.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The training hyper-parameters are complete, and the settings of the backbone and position encoding are provided. The details of the masking are missing, as are the details of the limited pixel-level labels.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Although the idea of mixed supervision is interesting, its effectiveness is not impressive. In the ablation study, the performance improvement comes mainly from the masking and position encoding rather than from the mixed supervision. Without the masking, this work with pixel-level labels (92.67%) performs worse than the SOTA work [9], which uses merely slide-level labels (93.73%).

    2. Apart from the mixed supervision, the novelty of the entire work is not adequate for the conference. In fact, the simple masking strategy [4] and sinusoidal position encoding are widely used in computer vision.

    3. When datasets are small, classification methods may gain significant supervision benefits from additional pixel/instance-level labels, as with the SICAPv2 dataset of only 155 slides used in this paper. However, the recent Gleason scoring dataset PANDA [1] has more than ten thousand WSIs, where slide-level labels are enough to train large models. The authors should discuss the potential limitations of the algorithm and analyze whether significant supervision advantages can still be obtained on such datasets.

    [1] Myronenko, Andriy, et al. “Accounting for Dependencies in Deep Learning Based Multiple Instance Learning for Whole Slide Imaging.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.

    1. The meaning of "limited" in the so-called limited pixel-level labels is unclear. Does it mean that only a small number of samples have pixel-level labels, while most samples have only slide-level labels? The authors neither indicate how to extend the algorithm to such limited cases nor reflect this setting in the experiments.

    2. The produced instance-level labels (not the predictions) should also be added to Fig. 2. This is critical for illustrating the advantage of the instance-level labels over the inaccurate pixel-level labels, which is an important basis of the mixed supervision.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the idea of mixed supervision is interesting, the novelty of the entire work is not adequate for the conference: the simple masking strategy and sinusoidal position encoding are widely used in computer vision. In addition, the effectiveness of mixed supervision is not convincingly demonstrated in the experiments.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision
    1. The authors addressed the concerns about the ablation performance and the details of the limited pixel-level labels.
    2. The authors do not indicate which metric the reported PANDA values use. If it is the AUC used in the manuscript, an AUC of about 0.90 is very low for the PANDA dataset, which suggests there may be problems with the experimental results. In fact, the official PANDA evaluation uses the harder kappa metric, and the values on the leaderboard are above 0.90.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work proposes a mixed supervision Transformer based on the multiple instance learning framework, utilizing both slide-level labels and instance-level labels for Gleason grading at the slide level. The reviewers have raised some concerns regarding the novelty of the approach and the limited sample size. In the rebuttal, please also provide some details on the comparison with the most relevant published works to justify the novelty.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

Thanks for all the valuable comments. In response to the questions (misinterpretations of the masking strategy and mixed supervision, the limited dataset size, and others), we provide explanations and conducted experiments on the larger PANDA dataset to further demonstrate the effectiveness and novelty of our method. (Note: the mask ratio is set to 0.5 unless otherwise specified, and 4-fold cross-validation is used. The metric is AUC, and * marks the best performance.)

Common Questions

1. Evaluation on PANDA (5,160 slides from the data provider Radboud): ABMIL (0.7882), CLAM (0.6809), DSMIL (0.8277), LossAttn (0.8132), ATMIL (0.8628), TransMIL (0.8895), SegGini (0.8009), Ours (0.9379, *). Our method is 4.84% higher than the best slide-level method (TransMIL) and 13.7% higher than the other mixed-supervision method, SegGini. Our method obtains a more significant and accurate supervision advantage on this large dataset.

2. Effect of the masking strategy and mixed supervision: (1) The masking strategy is proposed to assist the training of the mixed-supervision network; it is part of our framework and cannot be considered in isolation. For instance, performance is poor when the masking strategy is used without mixed supervision. The results for masking ratios 0.3/0.5/0.8 are PANDA: -0.1% / -1.99% / -1.76% and SICAP: -0.64% / -2.76% / -2.18%, compared with the slide-level-label-only results (PANDA: 0.8729, SICAP: 0.9190). (2) We conducted an ablation study on PANDA (w/o masking: 0.9242; w/o masking and mixed supervision: 0.8729). This further shows that our mixed supervision and masking strategy are effective, and that the improvement from mixed supervision is more significant on the large dataset (masking: +1.37%, mixed: +5.13%).

3. Novelty of the masking strategy: Our strategy differs from MAE's and is better suited to the Gleason grading task. Since pixel-level labels in Gleason datasets may be inaccurate, our motivation is to sparsify the redundant and inaccurate instance labels so that only part of the instance tokens and labels participate in training. Although MAE's strategy masks some tokens, it computes the loss over all tokens. In contrast, our strategy randomly deletes a certain proportion of tokens, which then take no part in the subsequent forward propagation or instance loss computation. Therefore, our masking strategy can assist mixed supervision in achieving better results.
4. Explanation of limited pixel-level labels: "Limited pixel-level labels" means that most samples have only slide-level labels while only a few samples have pixel-level labels. Both the slide-level and instance-level losses are computed for slides with pixel-level labels, and only the slide-level loss is computed for slides without them. The results for limited pixel-level ratios 1/0.8/0.5/0.3 are PANDA: +6.5% / +4.11% / +2.6% / +1.96% and SICAP: +2.39% / +1.48% / +0.59% / +0.77%, compared with the no-pixel-level-label results (PANDA: 0.8729, SICAP: 0.9190).
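The loss arrangement described in point 4 can be sketched as follows. This is a hypothetical PyTorch-style illustration, not the authors' released implementation; the function name `mixed_loss` and its arguments are our own, and cross-entropy is assumed for both terms:

```python
import torch
import torch.nn.functional as F

def mixed_loss(slide_logits, slide_label, inst_logits=None, inst_labels=None):
    """Mixed-supervision loss sketch: a slide-level term for every slide,
    plus an instance-level term only when the slide carries pixel-level
    (instance) labels."""
    loss = F.cross_entropy(slide_logits, slide_label)
    if inst_logits is not None and inst_labels is not None:
        # This slide has instance labels: add the instance-level term.
        loss = loss + F.cross_entropy(inst_logits, inst_labels)
    return loss
```

Slides without pixel-level labels simply omit the last two arguments, so their gradient comes from the slide-level term alone.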

Meta-Reviewer: Please refer to Common Questions 1~4.

Reviewer 1:
1. Time cost comparison: Our method's time cost is the same as or lower than that of Transformer-based methods. Both run at 240 FPS, and the speed increases to 293 FPS when the masking strategy is used (FPS: the number of images inferred per second).
2. Framework extensibility: Our mixed supervision framework can be extended to any token-to-token structure similar to the Transformer. TransMIL is therefore applicable, but ATMIL is not (it adopts an attention-based aggregator and outputs only a slide-level prediction). The results of extending TransMIL to our framework are PANDA: 0.9301 and SICAP: 0.9380.

Reviewer 2: Please refer to Common Questions 1~3.

Reviewer 3:
1. Details of masking: First, we count the number of tokens N in the input features and use the numpy.random.choice function to sample a subset of token indices. Then we mask the corresponding tokens and instance-level labels according to these indices.
2. Description of Fig. 2: We will add the generated instance-level labels to Fig. 2.
3. Potential limitation: An attention-based masking strategy could be explored for better interpretability.
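The masking step in point 1 can be sketched as follows. This is a hypothetical NumPy illustration: the function name `random_mask` and the seeded generator are our additions (the rebuttal mentions only `numpy.random.choice`). Masked tokens and their instance labels are dropped outright, so they take no part in the forward pass or the instance loss:

```python
import numpy as np

def random_mask(tokens, inst_labels, mask_ratio=0.5, seed=None):
    """Drop a fraction `mask_ratio` of instance tokens and their labels."""
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]                                  # number of tokens N
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    # Sample the indices of the tokens to KEEP, without replacement.
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    # Masked tokens/labels are removed entirely, not merely zeroed out.
    return tokens[keep], inst_labels[keep]
```

Because the masked tokens are deleted rather than replaced, the subsequent network and instance-level loss only ever see the kept subset, matching the distinction from MAE drawn in the rebuttal.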




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All three reviewers have unanimously recommended acceptance based on the authors' responses to their critiques. I concur and recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After rebuttal, there is consensus among reviewers that this paper should be accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a mixed supervision Transformer for WSI classification/grading, integrating both the slide-level label and limited pixel-level labels. The method is novel and has achieved superior performance on a public dataset. The authors addressed most concerns in the rebuttal, provided extra experimental results, and clarified the description of their methods. After the rebuttal, two reviewers raised their scores and all three reviews are now positive; therefore, I recommend accepting this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


