Authors

Marvin Teichmann, Andre Aichert, Hanibal Bohnenberger, Philipp Ströbel, Tobias Heimann

Abstract

Current approaches for classification of whole slide images (WSI) in digital pathology predominantly utilize a two-stage learning pipeline. The first stage identifies areas of interest (e.g. tumor tissue), while the second stage processes cropped tiles from these areas in a supervised fashion. During inference, a large number of tiles are combined into a unified prediction for the entire slide. A major drawback of such approaches is the requirement for task-specific auxiliary labels which are not acquired in clinical routine.

We propose a novel learning pipeline for WSI classification that is trainable end-to-end and does not require any auxiliary annotations. We apply our approach to predict molecular alterations for a number of different use-cases, including detection of microsatellite instability in colorectal tumors and prediction of specific mutations for colon, lung, and breast cancer cases from The Cancer Genome Atlas. Results reach AUC scores of up to 94% and are shown to be competitive with state-of-the-art two-stage pipelines. We believe our approach can facilitate future research in digital pathology and contribute to solving a large range of problems around the prediction of cancer phenotypes, hopefully enabling personalized therapies for more patients in the future.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_9

SharedIt: https://rdcu.be/cVRq7

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper describes a method that uses k-siamese networks with an EfficientNet backbone to solve digital pathology tasks using one model instead of the typical two-stage approach (stage 1 being some method, either manual or based on deep learning, to identify which regions are important, and stage 2 being the actual classification task). The k-siamese network samples k tiles randomly from the original image, encodes them, and then combines them to make a prediction.
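The sampling, encoding, and combining steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a random linear projection with ReLU stands in for the EfficientNet backbone, mean pooling is assumed as the way the k embeddings are combined, and the names `sample_tiles`, `k_siamese_forward`, `w_enc`, and `w_head` are hypothetical.

```python
import numpy as np

def sample_tiles(slide, k, tile, rng):
    """Draw k random square tiles of side `tile` from an (H, W, C) array."""
    h, w = slide.shape[:2]
    ys = rng.integers(0, h - tile + 1, size=k)
    xs = rng.integers(0, w - tile + 1, size=k)
    return np.stack([slide[y:y + tile, x:x + tile] for y, x in zip(ys, xs)])

def k_siamese_forward(tiles, w_enc, w_head):
    """Encode every tile with the SAME weights, pool, then classify."""
    flat = tiles.reshape(tiles.shape[0], -1)   # (k, tile * tile * C)
    feats = np.maximum(flat @ w_enc, 0.0)      # shared encoder stand-in, (k, d)
    pooled = feats.mean(axis=0)                # assumed combination: mean pooling
    logit = pooled @ w_head                    # single slide-level score
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid -> probability

rng = np.random.default_rng(0)
slide = rng.random((512, 512, 3))              # toy stand-in for a whole slide image
k, tile, d = 24, 32, 16                        # k = 24 tiles per bag, as in the rebuttal
w_enc = rng.normal(scale=0.01, size=(tile * tile * 3, d))
w_head = rng.normal(size=d)

tiles = sample_tiles(slide, k, tile, rng)
prob = k_siamese_forward(tiles, w_enc, w_head)
print(tiles.shape, float(prob))
```

Because the encoder weights are shared across all k tiles and only one prediction is produced per bag, the loss is applied to the slide-level label rather than to each tile individually.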
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors have indeed identified an important challenge for digital pathology tasks: identifying which areas in an image are important for classification. In addition, they use EfficientNet as the backbone model, which has significantly fewer parameters than other comparable models.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

What if the task is related to only a very small area of the original image? Would this approach still work compared to e.g. a manual approach that ROIs the area that is relevant? E.g. can k be learned based on the complexity of the task?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The methods are detailed, so someone should be able to reproduce the work. The first dataset seems to be proprietary, though, and it is not clear whether the authors are able to share it; if the IRB does not allow sharing, this could be stated in the methods.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Table 2: It is not clear what the percentage in the first column represents for each result in this table; it is not defined.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a reasonable approach for an important problem. The paper is well written and results are compelling.
Number of papers in your stack

6
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

My assessment remains the same.

Review #2

Please describe the contribution of the paper

The paper introduces an end-to-end trainable k-siamese network with random tile selection for predicting molecular alterations. The method is shown to be better than the two stage pipeline which requires auxiliary annotations for region-of-interest in the first stage and dense tessellation and aggregation in the second to make the slide-level prediction.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Interesting results using k-siamese networks
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No comparisons with MIL and other end-to-end pipelines (cited in the intro). Even though these are technically complex, comparisons with these methods are needed to justify the proposed pipeline. Do these methods give comparable or better accuracy? Recent papers such as “Benchmarking artificial intelligence methods for end-to-end computational pathology” (https://www.biorxiv.org/content/10.1101/2021.08.09.455633v1.full.pdf) provide deep insights into the problem, with extensive comparisons against end-to-end deep learning models (including MIL and Vision Transformers, along with the classic paper [12] that was compared in this manuscript) for tumor subtyping as well as for predicting molecular alterations on publicly available datasets (which have been preprocessed in a consistent way). This paper comes with a well put together GitHub repository that makes it easy to run these end-to-end pipelines: https://github.com/KatherLab/HIA.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The explanations seem good enough to reproduce the paper, but I cannot be sure, since it seems the code itself will not be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

See above. Comparisons with other end-to-end pipelines are important to justify the proposed approach. Specifically, the following GitHub repository makes it easy to run these end-to-end pipelines: https://github.com/KatherLab/HIA.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

No comparisons with end-to-end pipelines, especially given that these are widely available via GitHub and easy to run.
Number of papers in your stack

3
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

My concerns have been addressed.

Review #3

Please describe the contribution of the paper

A CNN is proposed that combines k well-known Siamese CNNs to predict molecular alterations for a number of different use-cases, such as microsatellite instability in colorectal tumors and specific mutations for colon, lung, and breast cancer.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Proposing a variant of Siamese CNN to make a decision based on k samples instead of two samples which is generally done by a normal Siamese network.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Although the authors show that making a single decision for k tiles with a k-Siamese CNN is better than combining k decisions from a regular CNN, they failed to show that an end-to-end approach (where segmentation is not done) is better than a two-stage approach (where segmentation is done before the final classification). The Seg-Siam approach is a two-stage approach, and its AUC is slightly higher than that of the k-Siam approach, which is end-to-end. It is possible that using a k-Siamese CNN instead of a regular CNN is the reason the k-Siam approach achieves a higher AUC than the Two-Stage approach.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Datasets and experimental setup are clearly mentioned.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The authors could place more emphasis on the strength of the k-Siamese CNN rather than on the advantage of an end-to-end approach over a two-stage approach, since the experimental results did not support the latter claim (k-Siam vs Seg-Siam).
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Authors failed to show that an end-to-end approach is better than a two stage approach.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper introduces an end-to-end trainable k-siamese network with random tile selection for predicting molecular alterations. The reviewers agreed on the novelty of the proposed method, especially how the k-siamese network is employed to solve digital pathology problems. The framework can be applied to handle different problems. The paper is well organized and the method is clearly described. The reviewers also raised a few concerns, such as 1) the adaptability of the method to different problems; 2) comparison with other MIL methods; 3) the justification of k-Siamese CNN over end-to-end methods.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

9

Author Feedback

We would like to thank all reviewers for their constructive comments and the general appreciation of our approach.

The strongest concern was raised by Reviewer 2, who criticized that we did not compare our approach to state-of-the-art methods as presented in a recent paper [1], which compares weakly supervised and end-to-end training methods for WSI classification.

We have to admit that we were not aware of this work, as it is very recent and so far only available as a preprint. Using the associated repository [2], we have in the meantime been able to compare our results to the methods discussed in [1]. On our colon MSI detection task, we achieved the following results:

Method	AP	AUC
Seg-Siam	83.07	94.42
K-Siam	82.91	94.28
Two Stage	77.03	90.54
CLAM	72.94	89.58
ViT	76.75	88.64
MIL	68.56	87.62
No Seg Ensemble	68.67	86.60

These results confirm that the known end-to-end methods are unable to outperform the widely used two-stage prediction pipeline, i.e. they effectively trade annotation effort for prediction performance. These results are somewhat expected given the performances shown in [1] compared to the ResNet baseline. In contrast, our approach can achieve state-of-the-art performance without requiring auxiliary annotation, which is one of the main advantages we also point out in our paper. We believe that these new experiments reinforce our message, and we will include the above table and discussion in the experiments section of our paper in case of acceptance. We will make space for this by moving the current Table 1 to the supplementary material.

Another concern was raised by Reviewer 3, who observed that our Seg-Siam pipeline outperforms K-Siam, so our claim that K-Siam is better does not hold.

We would like to clarify that we did not claim K-Siam is better in terms of sheer performance. Our argument is that the performance difference between K-Siam and Seg-Siam is so minor that, for most purposes, it will not be worth the effort to collect the required auxiliary annotations. The resources are much better spent on collecting more weakly annotated clinical data and using it with K-Siam. Overall, we show that K-Siam provides an efficient way of dealing with the label noise issue inherent to tile-based processing. This makes a segmentation filter obsolete for most practical purposes.
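To make the size of these differences concrete, the short sketch below recomputes the gaps from the AP and AUC values reported in the rebuttal table. The numbers are copied from that table; the dictionary layout and the helper `gap` are illustrative choices only.

```python
# (AP, AUC) per method, copied from the rebuttal's colon MSI results table.
results = {
    "Seg-Siam": (83.07, 94.42),
    "K-Siam": (82.91, 94.28),
    "Two Stage": (77.03, 90.54),
    "CLAM": (72.94, 89.58),
    "ViT": (76.75, 88.64),
    "MIL": (68.56, 87.62),
    "No Seg Ensemble": (68.67, 86.60),
}

def gap(a, b):
    """Return (AP difference, AUC difference) of method a over method b."""
    return tuple(round(x - y, 2) for x, y in zip(results[a], results[b]))

seg_vs_k = gap("Seg-Siam", "K-Siam")    # cost of dropping the segmentation stage
k_vs_two = gap("K-Siam", "Two Stage")   # gain over the two-stage baseline
ranking = sorted(results, key=lambda m: results[m][1], reverse=True)

print("Seg-Siam over K-Siam (AP, AUC):", seg_vs_k)    # (0.16, 0.14)
print("K-Siam over Two Stage (AP, AUC):", k_vs_two)   # (5.88, 3.74)
print("Ranking by AUC:", ranking)
```

On these figures, removing the segmentation stage costs K-Siam 0.14 AUC points, while K-Siam is 3.74 AUC points ahead of the two-stage baseline.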

Reviewer 1 asks some interesting questions about the limitations of our approach: Would it still work if the task is related to a very small area of the original image?

Our approach is very well suited for a large variety of tasks. To avoid label noise, there should be at least one informative tile in each bag of tiles. By default, we use 24 tiles per bag, since this allows us to train the model on a single GPU with 11 GB of memory. Even if the ROI is only 5% of the total tissue area, we have at least 1 informative tile per bag in >70% of the cases. Tumor area on diagnostic slides is usually between 30% and 80% of the total tissue area, and typically most of the tumor tissue contains relevant information. However, when dealing with very small ROIs, we would indeed recommend using Seg-Siam to increase the information density.
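The ">70%" figure can be checked with a short calculation. The sketch below assumes that the tiles in a bag are drawn independently and uniformly over the tissue area, so that each tile is informative with probability equal to the ROI fraction; the rebuttal does not state this model explicitly, and the function names are hypothetical.

```python
import math

def p_at_least_one_informative(roi_fraction, tiles_per_bag):
    """P(at least one of the tiles falls in the ROI) under independent sampling."""
    return 1.0 - (1.0 - roi_fraction) ** tiles_per_bag

def min_tiles_for(roi_fraction, target):
    """Smallest bag size whose hit probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - roi_fraction))

p = p_at_least_one_informative(0.05, 24)
print(f"ROI 5%, 24 tiles: {p:.3f}")                      # 0.708
print("Tiles needed for 90%:", min_tiles_for(0.05, 0.9))  # 45
```

Under this model, 1 - 0.95^24 is roughly 0.708, which matches the figure quoted above. The second function illustrates how quickly the required bag size grows for small ROIs, which is consistent with the recommendation to use Seg-Siam in that regime.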

Re: Table 2: it’s not clear what the percentage is in the first column for each result in this table, it is not defined.

It is the respective prevalence (“prev.”); we will clarify this in the camera-ready version.

We thank all reviewers for the detailed feedback. We believe that the feedback helped us to improve the paper even further and that the additional experiments we provided helped us to show the merits of our approach. We hope to get the chance to present the paper in full at the MICCAI 2022 conference.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal has addressed the critical comments from the reviewers. Now all reviewers are positive about the paper. This paper proposes a novel method for an interesting problem. The results look promising.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper proposes an end-to-end method using k-siamese networks for molecular alterations prediction. The proposed method is practical and does not involve auxiliary annotation. The rebuttal has provided more detailed discussions, and addressed most of the reviewers’ concerns. It is suggested that the authors revise the final version accordingly.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

4

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper introduces an end-to-end trainable k-siamese network with random tile selection for predicting molecular alterations. The method is simple and elegant, and it leads to good results as well. The authors provide an additional comparison in the rebuttal, which also convinced the initially negative reviewer. I also suggest accepting the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

4

End-to-end Learning for Image-based Detection of Molecular Alterations in Digital Pathology