
Authors

Jingna Qiu, Frauke Wilm, Mathias Öttl, Maja Schlereth, Chang Liu, Tobias Heimann, Marc Aubreville, Katharina Breininger

Abstract

The process of annotating histological gigapixel-sized whole slide images (WSIs) at the pixel level for the purpose of training a supervised segmentation model is time-consuming. Region-based active learning (AL) involves training the model on a limited number of annotated image regions instead of requesting annotations of the entire images. These annotation regions are iteratively selected, with the goal of optimizing model performance while minimizing the annotated area. The standard method for region selection evaluates the informativeness of all square regions of a specified size and then selects a specific quantity of the most informative regions. We find that the efficiency of this method highly depends on the choice of AL step size (i.e., the combination of region size and the number of selected regions per WSI), and a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter. Specifically, we dynamically determine each region by first identifying an informative area and then detecting its optimal bounding box, as opposed to selecting regions of a uniform predefined shape and size as in the standard method. We evaluate our method using the task of breast cancer metastases segmentation on the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6% of tissue area annotated, we achieve full annotation performance and thereby substantially reduce the costs of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.
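
The following minimal sketch (in Python) illustrates the selection principle described in the abstract; the function name, the thresholding rule, and the use of a pixel-level informativeness map are illustrative assumptions, not the authors' implementation:

    import numpy as np
    from scipy import ndimage

    def select_adaptive_region(informativeness, threshold=0.5):
        """Tight bounding box (row0, row1, col0, col1) of the informative
        area around the most informative pixel of a (downsampled) WSI map."""
        # Seed at the most informative pixel; assumes its value exceeds the threshold.
        seed = np.unravel_index(np.argmax(informativeness), informativeness.shape)
        # Informative area: the thresholded connected component containing the seed.
        labels, _ = ndimage.label(informativeness >= threshold)
        component = labels == labels[seed]
        # The bounding box of that area becomes the annotation request, so the
        # region's shape and size adapt to the data instead of being fixed.
        rows, cols = np.nonzero(component)
        return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1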

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_9

SharedIt: https://rdcu.be/dnwxR

Link to the code repository

https://github.com/DeepMicroscopy/AdaptiveRegionSelection

Link to the dataset(s)

http://gigadb.org/dataset/100439


Reviews

Review #3

  • Please describe the contribution of the paper

    The topic of this paper is semantic segmentation of WSIs based on active learning, and the innovation lies in the method for selecting annotation regions, which had not received enough attention in previous studies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The topic of this paper is semantic segmentation of WSIs based on active learning; the innovation lies in the method for selecting annotation regions, which had not received enough attention in previous studies. The adaptive region-size design allows the model to achieve the same performance as a model trained on fully annotated WSIs while using the smallest annotated area (compared with the other four region selection methods).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. During the iteration, the first batch of regions is completely randomly selected. Will different initial regions have a significant impact on the results? If the first batch of regions contains too little information, may more iteration steps be required to achieve the performance of the fully annotated model?
    2. In terms of clinical value, some additional content could be added. Although active learning can reduce the cost of annotation, multiple rounds of manual annotation are required during the model training iterations. Especially in pathology diagnosis, this workflow is inconsistent with clinical logic. Could the authors provide more convincing evidence of the advantages of active learning to justify the necessity of adopting such methods for clinical problems?
    3. Please add a figure of the segmentation results. In segmentation problems, model performance should generally be evaluated comprehensively based on both the visual quality of the segmentations and the mIoU.
    4. Please add the panel labels of each image in Fig.3.
    5. Please provide a detailed description of Fig. S2. At which iteration step were the regions selected during model training? An analysis of the computational cost of achieving this effect could be added.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will release all code related to this work if it is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. During the iteration, the first batch of regions is completely randomly selected. Will different initial regions have a significant impact on the results? If the first batch of regions contains too little information, may more iteration steps be required to achieve the performance of the fully annotated model?
    2. In terms of clinical value, some additional content could be added. Although active learning can reduce the cost of annotation, multiple rounds of manual annotation are required during the model training iterations. Especially in pathology diagnosis, this workflow is inconsistent with clinical logic. Could the authors provide more convincing evidence of the advantages of active learning to justify the necessity of adopting such methods for clinical problems?
    3. Please add a figure of the segmentation results. In segmentation problems, model performance should generally be evaluated comprehensively based on both the visual quality of the segmentations and the mIoU.
    4. Please add the panel labels of each image in Fig.3.
    5. Please provide a detailed description of Fig. S2. At which iteration step were the regions selected during model training? An analysis of the computational cost of achieving this effect could be added.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The design of the idea and the technical contribution.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This work proposes an active learning method for tumor localization on whole-slide histology images, at whose core is a new adaptive patch/region selection method that picks unlabeled histology regions to annotate for the most efficient accuracy improvement in the next training round.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed region selection method allows picking regions with variable aspect ratios for annotation, which addresses a limitation of previous AL methods for WSI segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the proposed method reaches performance similar to full annotation with only 2.6% of the annotation, this is not surprising considering that current SoTA semi-supervised or self-supervised learning methods on WSIs can potentially achieve better performance without iterating annotation and training multiple times as in active learning. It would be better to compare with SSL methods in the experiments and to show that the reported 2.6% is close to the optimal subset for annotation.

    2. The patch-level classification model used, MobileNet, is weak by today's standards. Although it is not the main point of the paper, the backbone and training recipe are still outdated. It would be better to use more advanced and efficient backbones such as ConvNeXt, ResNeXt, or EfficientNetV2.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is limited because the code has not been released and, judging from the experimental results, the performance of the method can vary considerably under different random conditions (which is one drawback of this kind of active learning method).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Mentioned in weakness list.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work proposes a new region selection method for AL on WSIs, which is good in terms of originality. But the overall AL framework is not new, and the experiments are not extensive enough considering that only CAMELYON16 is tested, so the conclusions may be overfit to this dataset. Last, whether this kind of active learning method is a better choice than SSL-based methods (which are also a mainstream approach to the limited-annotation problem in WSIs) is still a question.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper
    • Adaptive region selection for active learning was proposed to carry out the segmentation of whole-slide images.
    • Each region was determined dynamically by first identifying an informative area and then detecting its optimal bounding box.
    • Experiments on the public CAMELYON16 dataset demonstrated the effectiveness of the proposed method.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Active learning was explored in this article, which decreases the burden of manual annotation.
    • Instead of square bounding boxes, the article proposed an adaptive selection method, creating diversity in bounding boxes.
    • The common strategies of active learning were summarized and compared. The experiments on a public dataset demonstrated the effectiveness of the proposed adaptive region selection method.
    • Compared with full annotation, annotating 2.6% of the area was enough to achieve similar performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although common strategies for active learning were summarized, many modified and optimized methods exist. The article only compared the common ones instead of the SOTA methods.
    • Referring to previous studies, the CAMELYON16 dataset is non-exhaustively annotated. The article did not provide a specific explanation of how such positive samples are handled when selected in the training phase.
    • The saturation accuracy of the article’s benchmark is FROC = 0.779. The first-ranked entry in 2016 achieved FROC = 0.8074, and Google reported FROC = 0.873 in 2017 (Detecting Cancer Metastases on Gigapixel Pathology Images, 2017). Thus, the FROC achieved in this article is not state of the art.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In the Reproducibility Response, the authors promised to release the code, and the data they used come from a public dataset. Therefore, it should be easy to reproduce the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • A comparison with SOTA active learning methods is expected, for example, “Deep active learning for suggestive segmentation of biomedical image stacks via optimisation of Dice scores and traced boundary length”, published in MIA 2022. That is only a modified method derived from the common active learning strategies, and many similar articles have been published before.
    • The training set contains non-exhaustive annotations. Please explain how such regions are handled if the adaptive region selection method selects them.
    • The benchmark is not advanced enough. As mentioned in the weaknesses part, the FROC reached 0.8074 in 2016 and 0.873 in 2017. A benchmark with SOTA performance is expected.
    • If possible, experiments on more than one dataset are encouraged to prove the generalization of the proposed method.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Dynamically selecting the size of bounding boxes is interesting, as it not only reduces annotation pressure but also allows for flexible selection of annotation areas. I think this strategy can be transferred to many active learning scenarios.

    However, the benchmark accuracy of this article is not high enough, which makes it impossible for me to evaluate the upper limit of the algorithm’s ability. Similarly, the generalization ability of the method is a concern for me.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers have rated the paper positively. While I find obvious merits in the paper, there are also points that need further clarification, such as a comparison with SOTA SSL methods and generalizability to more than one dataset.




Author Feedback

We appreciate the constructive comments from all reviewers. We have grouped similar comments and respond in a point-by-point manner:

  • Comparison with semi-/self-supervised learning (R1,MR): These are indeed relevant approaches which do not replace but may be combined with active learning (AL). As the focus in this paper is on improving the selected ROIs, a combination will be subject of future work.

  • Comparison against other AL methods (R2): As mentioned in the introduction, most relevant AL studies focused on developing pixel informativeness measures (IM) for accurate localization of useful annotation regions. We instead focus on how we can make the most of the measured informativeness. Therefore, we compare to the standard region selection method while keeping the IM constant for fair comparison. In the experiments, we employed the widely used model uncertainty as IM for this proof-of-concept. However, our adaptive region selection stands alone and could be flexibly combined with any IM, similar to the standard method.

  • Performance of the fully supervised benchmark (R2) and backbone selection (R1): We opted for moderate performance trade-offs of the benchmark method to limit computational costs. Indeed, the challenge winner in 2016 achieved an FROC of 0.8074 [1] but adopted two-phase training with hard negative mining and used an inference patch sampling stride of 4 pixels, whereas we enabled end-to-end training and used a stride of 128 pixels, meaning that our method required 1024 times less computation during inference (see the stride arithmetic noted after this list). The performance of 0.873 in [2] was achieved with test time augmentation and may not be fully reproducible according to [3]. We demonstrated that reducing the stride from 256 to 128 pixels led to an improvement in FROC, but we did not continue reducing to 4 pixels because it is not feasible during AL method development. We tried EfficientNet, but no clear performance gain over MobileNet was seen. We chose MobileNet because its few parameters further limit the computation costs incurred.

  • Concerns with restriction to CAMELYON16 (R1-3, MR): We acknowledge the drawback of testing on a single dataset, and extension to further datasets is subject of future work. Still, we believe that CAMELYON16 provides broad insights into the proposed method given that 1) it is a widely used, public dataset on a clinically relevant task, 2) it contains multi-center data and heterogeneous target structures, e.g., micro and macro tumors, and 3) it has a size relevant for AL. Non-exhaustive annotations in CAMELYON16 concern some slides with two consecutive tissue slices on the same WSI where only one is annotated. We excluded these unannotated regions from selection. The results of this preprocessing step will be made available for community use.

  • Reproducibility (R1) and impact of initial selection (R3): To account for differences due to the initial selection and random effects during training, we carried out 10-15 select-train-inference iterations for all AL experiments and repeated all experiments 3x (see variations in Fig. 4). Code and settings like the training/validation split will be made available.

  • Clinical utility (R3): We see the utility of AL in the development of a computer-aided diagnostic system, where our method requires less annotated area and fewer annotation cycles than the standard method. During clinical use of such a system, only a single inference step is needed, posing no difference to other models.
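
The factor of 1024 in the benchmark point above follows directly from the two inference strides, assuming patches are sampled on a regular two-dimensional grid: the number of evaluated patches scales inversely with the stride along each spatial dimension, so moving from a stride of 4 pixels to a stride of 128 pixels reduces the patch count by

    (128 / 4)^2 = 32^2 = 1024.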

In the camera-ready version, additional adjustments of figures, captions, references, etc., following the reviewer comments will be made.

[1] Wang et al. “Deep learning for identifying metastatic breast cancer.” arXiv:1606.05718 (2016).
[2] Liu et al. “Detecting cancer metastases on gigapixel pathology images.” arXiv:1703.02442 (2017).
[3] Guo et al. “A fast and refined cancer regions segmentation framework in whole-slide breast pathological images.” Sci Rep 9.1 (2019): 882.


