Authors
Yi Li, Yiduo Yu, Yiwen Zou, Tianqi Xiang, Xiaomeng Li
Abstract
Developing an AI-assisted gland segmentation method from histology images is critical for automatic cancer diagnosis and prognosis; however, the high cost of pixel-level annotations hinders its application to broader diseases. Existing weakly-supervised semantic segmentation methods in computer vision achieve degraded results for gland segmentation, since the characteristics and problems of glandular datasets differ from those of general object datasets. We observe that, unlike natural images, the key problem with histology images is the confusion of classes owing to morphological homogeneity and low color contrast among different tissues. To this end, we propose a novel method, \emph{Online Easy Example Mining} (OEEM), that encourages the network to focus on credible supervision signals rather than noisy signals, thereby mitigating the influence of inevitable false predictions in pseudo-masks. According to the characteristics of glandular datasets, we design a strong framework for gland segmentation. Our results exceed many fully-supervised and weakly-supervised gland segmentation methods by over 4.6% and 6.04% mIoU, respectively.
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_55
SharedIt: https://rdcu.be/cVRwG
Link to the code repository
https://github.com/xmed-lab/OEEM
Link to the dataset(s)
https://pan.baidu.com/share/init?surl=htY5nZacceXj_m2FlY8uXw
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes an online easy example mining (OEEM) method for weakly-supervised gland segmentation, which can distinguish easy and confusing pixels in the pseudo-labels. During training, the proposed metrics are used to weight the loss function, where an easy pixel receives a higher weight and a confusing pixel a lower weight. In the experiments, the proposed method outperformed the compared method.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
* The approach that weights the loss based on the confidence of the segmentation network is reasonable.
* In the experiments, the proposed method was better than the compared method.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
*The related works are not sufficient. The most related work was not cited, and there is no discussion of the contribution relative to the following paper. That paper also focuses on the problem that pseudo-labels generated using CAMs are noisy, and introduces weights to the loss function, where the weight is obtained on the basis of uncertainty. The method is not exactly the same, but the idea and approach are very similar to the proposed method (the reviewer feels the technical contribution over this related work is minor). Please check it. In addition, adaptive training strategies are a common approach to learning with noisy labels. These are not cited or discussed; they are cited in the related work of the following paper.
Li et al., Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation, AAAI 2022.
*The evaluation is not sufficient. The paper lists many supervised approaches but only one weakly-supervised method. In particular, the reviewer recommends that the authors compare with a typical pseudo-labeling approach that samples pixels by confidence or uncertainty and trains the network with a masked loss that ignores the non-selected pixels. This sampling strategy can also avoid confusing pixels. To show the effectiveness of the weighting in OEEM, this ablation study is required. Then, add a discussion of why weighting is a better strategy than threshold-based pixel selection.
*It is unclear whether the training and test data were separated by patient (i.e., whether they contain images from the same patient or WSI). Because images from the same WSI share similar features, the network may overfit if the training and test sets contain images captured from the same patient or WSI.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Reproducibility is fine.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
*In the fully supervised setting, PSPNet was described as 'ours w/o OEEM'. However, PSPNet is not a contribution of this paper; describing it as 'ours' is misleading.
*In the ablation study, an mIoU is reported for the CAM. A CAM is not a mask; each pixel holds a continuous value. How is the mIoU computed?
*What does 'SEAM CAM' indicate? The reviewer could not follow the setup of the ablation study. Please clarify it.
*Several metrics are proposed in the paper, but only an empirical conclusion is given. Please add a discussion of why l_normal performed best.
*Citations [15] and [16] are the same paper (duplicated).
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
3
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
As described in the weaknesses, a very similar idea has already been published, and there is no discussion of the contribution relative to that related paper. In addition, the evaluation is insufficient: comparison with weakly-supervised methods is important, yet the authors list many supervised methods and only one weakly-supervised method. Therefore, my rating tends toward 'reject'.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
3
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
4
- [Post rebuttal] Please justify your decision
Although the rebuttal addressed some of my concerns, the main ones remain. In the rebuttal, the authors state that the differences between the proposed method and URN (Li et al., AAAI 2022) are that mining is performed online rather than offline and that CRF is not used. However, the difference in the main idea was not clearly stated, and there is no evaluation against URN even though the authors performed several additional experiments and the URN code is available. Thus, my rating is 'weak reject'.
Review #2
- Please describe the contribution of the paper
The paper proposes a method for weakly-supervised gland segmentation from histology images. The method mines credible supervision signals in the pseudo-mask and mitigates the damage from noisy regions.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) This paper analyzes the differences between histology images and natural images in the segmentation task, as well as the difficulties of gland segmentation. (2) The paper proposes a method that encourages the network to focus on credible supervision signals rather than noisy ones.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
(1) The method description is not clear enough; for example, it is unclear how patch-level labels are composed and how they differ from image-level labels. (2) Why Equation 6 performs best requires careful analysis. (3) As a weakly supervised method, the paper compares with only one weakly supervised algorithm, SEAM, so the experiments are inadequate.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Reproduction is difficult because the method description is not clear enough.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
(1) It is recommended that the authors describe the method more specifically and clearly, presenting not only the common weakly supervised components but also the features specific to medical images. (2) It is suggested to add more experiments to validate the method.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Some details in the paper need to be improved.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
2
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Review #3
- Please describe the contribution of the paper
This paper introduces a technique for weakly supervised semantic segmentation (WSSS) of glandular structures in histopathology images with image-level labels. To address the challenge of segmenting glands with confusing morphological homogeneity and low contrast, Online Easy Example Mining (OEEM) is proposed to mine confident regions in the pseudo-masks and reduce noise by suppressing ambiguous regions via a novel normalized loss. Extensive experiments on a public glandular dataset validate the effectiveness of the method over prior state-of-the-art, including several ablations on variants of the OEEM losses.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The paper is well motivated, clear, and easy to understand. Weakly supervised glandular segmentation is non-trivial due to the underlying homogeneity of tissue morphology and low contrast, especially since existing WSSS methods for natural images may underperform.
• The proposed OEEM and normalized losses are novel, with several ablations highlighting their effectiveness.
• The use of a publicly available dataset is a plus, with a strong supervised baseline.
• I appreciate the intuition employed for the different variants of the losses, especially considering the several assumptions drawn regarding normal and diseased images.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The difference in performance between SEAM and PSPNet+ResNet38 (w/o OEEM) is questionable; it suggests the improvements come from the architectural choices.
• For a fair comparison, OEEM should have been applied to the prior/compared methods to validate that the performance gains are not due to the network choice, e.g., UNet + OEEM, MedT + OEEM.
• It is unclear whether the results of the fully-supervised methods on the GlaS dataset are re-implemented or taken directly from the prior papers.
• It is unclear how the loss map $L$ is obtained from $\hat{X}$; the description of this point is limited.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Descriptions of the method are sound. Authors will provide code upon acceptance.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
I commend the authors for an interesting take on glandular segmentation with image-level labels via OEEM and its variants. My main concern is whether the evaluation of SEAM, including its training procedure, is sound. Based on Table 2, SEAM and the proposed fully supervised baseline (PSP+Res38) show a significant difference in performance. Both employ the same backbone to produce pseudo-masks for segmentation training, yet the reported scores vary; I hope the authors can clarify. By the authors' own admission, MIL methods are commonly employed for this task, yet no recent MIL methods were included in the evaluation; adding them could better support the argument. As they stand, the results do not demonstrate that the technique is model/backbone agnostic (see prior comments regarding UNet + OEEM, etc.).
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method improves weakly supervised semantic segmentation performance over prior art, including an improved supervised baseline. However, the comparison is somewhat limited: only a single WSSS model, i.e., SEAM, was included in the evaluation, so performance improvements may mainly stem from other factors such as multi-scale testing and architectural choices. If the authors can adequately clarify/address these concerns, I am willing to raise my score.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
2
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
Having read the author responses and the other reviewers' comments, I feel my initial concerns have been addressed and recommend 'acceptance'. I hope the authors can include the necessary changes in the final version of the manuscript.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper proposes to mine easy samples to train histology image segmentation algorithms. The reviewers gave mixed reviews. The major questions include:
(1) a comprehensive literature review is needed to compare with related works;
(2) the method description is not clear; in particular, how the patch-level labels are composed and how they differ from the image-level labels (Fig. 2) is unclear;
(3) in the evaluation, comparison with more recent weakly-supervised methods is needed;
(4) ablation studies are needed to show that the proposed sample mining method outperforms other sample query metrics such as uncertainty and confidence, and some in-depth analysis of why some metrics (Eq. 6) are better than others is needed;
(5) a clear explanation of the dataset split is needed; and
(6) fair comparisons are expected, with the same backbone and optimal parameters.
The reviewers provide more detailed suggestions, questions, and comments in their individual reviews. In addition to the reviewers' questions, the meta-reviewer has one general conceptual question: usually, we mine hard samples, which lie near the decision boundary, to train and improve classification/segmentation algorithms; why are easy samples more important than hard/confusing samples for training your algorithm?
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
5
Author Feedback
We thank the reviewers for their valuable feedback. Overall, the reviewers (R1, R2, R3) consider the paper reasonable and well motivated in mining credible supervision, and appreciate its effectiveness over other methods. R2 and R3 agree with our analysis of natural versus diseased images in WSSS, and R3 finds the method novel, clear, and easy to understand. R1's major concerns are the related work and the evaluation; the other comments concern analysis and details. Below, we clarify the important points summarized by meta-reviewer #1 and resolve possible misunderstandings.

(1) Comparison with related works: 1. Compared with offline methods such as the uncertainty estimation in URN (Li et al., AAAI 2022), OEEM is an online mining method: plug-and-play, without complex operations such as CRF or multiple rounds of training and inference. 2. OEEM requires no hyperparameters such as the loss clipping in PMM (Li et al., ICCV 2021), making it easier to apply. 3. Multiple instance learning is a weakly supervised method for histology images that requires classifiable image-level labels. But glands exist in every image, so patch-level labels are necessary; since the supervision type changes, MIL is not applicable. 4. Learning with noisy labels is mostly designed for classification rather than WSSS, while OEEM is a novel, problem-specific method for segmentation based on the special features of medical images.

(2) Description of patch-level labels: All code and datasets will be released for easy reproduction upon acceptance. Specifically, the GlaS dataset contains 165 WSIs (85 for training) with mean width 517 and height 761 (20X, 0.62 micrometer/pixel). There are no image-level labels, since glands exist in every image, so we crop patches of side 112 at stride 56 to derive balanced patch-level labels from the masks.

(3) Evaluation of additional WSSS methods: We enrich the evaluation as follows (mIoU, %):
| SEAM (Wang et al., CVPR 2020) | SC-CAM (Chang et al., CVPR 2020) | Adv-CAM (Lee et al., CVPR 2021) | OEEM |
| 66.11 | 71.52 | 68.54 | 77.56 |
The results show that OEEM is much more effective than the other WSSS methods.

(4.1) Ablation study on another sample metric: For R1's confidence-with-threshold sampling metric, the mIoUs range from 75.29 to 76.41 across 4 thresholds, all lower than OEEM's 77.56. Moreover, it converges only with a second hyperparameter, the warm-up iteration count. So OEEM is more practical and effective, without any hyperparameter.

(4.2) Why Eq. 6 is better than the other losses: Unlike the other three losses, which are based on confidence alone, Eq. 6 also uses the pseudo ground truths, which introduces more information. Moreover, normalization via softmax amplifies the loss gaps and further emphasizes the clean samples.

(5) Dataset split: Training and test data are separated by patient, following the original split, without patch shuffling.

(6) Fair comparisons & UNet: The listed fully-supervised results from other papers use varied frameworks and settings. So, for a fair comparison, we propose a fully supervised baseline whose framework and settings are exactly the same as for all weakly-supervised results. We also changed the segmentation model to UNet: the mIoU is 71.29 with our pseudo-masks and rises to 73.32 after deploying OEEM, which shows the effectiveness comes from OEEM rather than from a strong baseline.

(7) Why easy samples matter more than hard ones: Unlike hard example mining in fully supervised learning, pseudo-masks from weak supervision contain massive noise, so hard samples and false samples are intertwined and indistinguishable. We therefore mine the easy samples to make the supervision credible and to mitigate the influence of noise from hard samples.

(8) Other clarifications: 1. 'Ours w/o OEEM' indicates that the pseudo-masks come from our classification stage. 2. In Eq. 3, the loss L is obtained by weighting the cross-entropy with the mining weight W. 3. The CAM is converted to a mask by argmax before computing mIoU.
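To make points (4.2) and (8) concrete, below is a minimal PyTorch sketch of an OEEM-style online weighting, reconstructed from the description above. The function name `oeem_loss`, its arguments, and the exact normalization details are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def oeem_loss(logits, pseudo_mask, ignore_index=255, scale=1.0):
    """Illustrative OEEM-style loss (a sketch, not the official code).

    logits:      (N, C, H, W) raw segmentation outputs
    pseudo_mask: (N, H, W) noisy pseudo ground-truth labels from CAMs
    """
    # Per-pixel cross-entropy against the (noisy) pseudo-mask (cf. Eq. 3).
    ce = F.cross_entropy(logits, pseudo_mask,
                         ignore_index=ignore_index, reduction='none')

    valid = pseudo_mask != ignore_index
    safe_mask = pseudo_mask.clone()
    safe_mask[~valid] = 0  # placeholder class index for ignored pixels

    # Confidence of the pseudo ground-truth class at each pixel: this is
    # where the pseudo labels, not just a confidence map, enter (cf. Eq. 6).
    probs = logits.softmax(dim=1)
    conf = probs.gather(1, safe_mask.unsqueeze(1)).squeeze(1)

    # Softmax-normalize confidences over all valid pixels so that easy
    # (high-confidence) pixels are amplified and confusing ones suppressed;
    # rescale so the mean mining weight is 1.
    weights = torch.zeros_like(conf)
    weights[valid] = F.softmax(conf[valid] * scale, dim=0) * valid.sum()

    # Mining weights are computed online from the current predictions and
    # treated as constants (no gradient flows through them).
    return (ce * weights.detach())[valid].mean()
```

Because the weights are recomputed from the network's own predictions at every step, no offline uncertainty estimation, CRF post-processing, or extra training round is needed, which matches the online, plug-and-play claim in point (1) above.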
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The paper proposed to mine easy samples for gland segmentation in histology images. The rebuttal addressed most of the concerns raised by the reviewers. If the paper is accepted, the authors are expected to refine the draft by considering the review comments.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
8
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This is a potentially useful approach to dealing with noisy labels, and some improvement over the state of the art on a public dataset is claimed. Some ablation studies are lacking in the submission, and these cannot be inserted at this point, but the reviewers felt that some of their comments had been addressed.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
6
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The paper presents a new approach to mine easy labels for WSI segmentation. Reviewers had mixed responses to the approach, which were moderately addressed in the rebuttal. I have the same question as AC 1: wouldn't the model benefit more from difficult labels (closer to the boundary) than from the easy labels?
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
9