
Authors

Manuel Tran, Sophia J. Wagner, Melanie Boxberg, Tingying Peng

Abstract

In computational pathology, we often face a scarcity of annotations and a large amount of unlabeled data. One method for dealing with this is semi-supervised learning, which is commonly split into a self-supervised pretext task and a subsequent model fine-tuning. Here, we compress this two-stage training into one by introducing S5CL, a unified framework for fully-supervised, self-supervised, and semi-supervised learning. With three contrastive losses defined for labeled, unlabeled, and pseudo-labeled images, S5CL can learn feature representations that reflect the hierarchy of distance relationships: similar images and augmentations are embedded the closest, followed by different looking images of the same class, while images from separate classes have the largest distance. Moreover, S5CL allows us to flexibly combine these losses to adapt to different scenarios. Evaluations of our framework on two public histopathological datasets show strong improvements in the case of sparse labels: for an H&E-stained colorectal cancer dataset, the accuracy increases by up to 9% compared to supervised cross-entropy loss; for a highly imbalanced dataset of single white blood cells from leukemia patient blood smears, the F1-score increases by up to 6%.
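Read as a single objective, the training scheme sketched above amounts to a weighted sum of three contrastive terms (the notation below is ours for illustration, not the paper's own):

\mathcal{L}_{\text{S5CL}} = \lambda_L \, \mathcal{L}_{\text{labeled}}(X_L, Y_L) + \lambda_U \, \mathcal{L}_{\text{unlabeled}}(X_U) + \lambda_P \, \mathcal{L}_{\text{pseudo}}(X_U, \hat{Y}_U)

where X_L, Y_L denote labeled images and their labels, X_U the unlabeled images, and \hat{Y}_U the pseudo-labels; zeroing individual weights recovers the fully-supervised, self-supervised, or semi-supervised special cases mentioned in the abstract.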

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_10

SharedIt: https://rdcu.be/cVRq8

Link to the code repository

https://github.com/manuel-tran/s5cl

Link to the dataset(s)

https://zenodo.org/record/1214456#.YhZI1ZYo-Uk

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=61080958


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents S5CL, a new semi-supervised learning framework built on three contrastive losses defined for labeled, unlabeled, and pseudo-labeled images. Specifically, it applies a supervised contrastive loss (SupConLoss) to unlabeled data and integrates it into training with labeled data. Experiments demonstrate the effectiveness of the proposed method, and the ablation study shows the effectiveness of each component.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Well organized and written. This work analyzes the problem of self-supervised learning in the medical imaging field and introduces the method logically.

    2. The motivation is clear and the method is novel. It extends SupConLoss to devise losses at the supervised, semi-supervised, and unsupervised levels.

    3. The improvement seems okay: compared with the three most relevant baselines, the improvement is about 1-2% with very limited training samples, and the ablation study shows the effectiveness of the different loss terms.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Novelty: The authors should more clearly differentiate the proposed method from the most relevant methods, such as BYOL and SupCon.

    2. Experiments and results:
    (1) In the experiments, the authors choose three methods (i.e., CE; CE+SupCon; semi-supervised MPL). However, pure self-supervised baselines are not included. I strongly suggest the authors include SOTA self-supervised methods as well.
    (2) The results on the NCT-CRC-HE-100K dataset are very high, while the results on the Munich AML Morphology dataset are very low. Is it because the second task is more challenging? Why? How is the performance influenced by the difficulty of the dataset?
    (3) On the Munich AML Morphology dataset, the results of MPL are strange (i.e., the F1-score does not increase with more labeled data).
    (4) The authors say that the features learned by self-supervised methods and supervised methods are different. But in Fig. 3, we can see that the different clusters are clearly separated, even though “weakly augmented images and similar images are embedded the closest to their origins, then comes strong augmentations as well as different looking images from the same class”. How does this contribute to the final classification performance? Why is the embedding meaningful? The authors also claim that “it also makes the feature embedding space more compact and explicable”, but I don’t see why it is more explicable. For example, given an unknown image (suppose we do not know the label in advance), the trained model may categorize it into a certain cluster, but how do we know the relationship between this image and all other points in the feature space? And what can we learn from it?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors provide enough information on method details and experimental settings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    First, address my major concerns above. Second, please proofread the paper and correct typos and minor issues.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the main contribution is extending SupConLoss to multiple levels (i.e., supervised, semi-supervised, and unsupervised). Although the novelty seems somewhat incremental, extensive experiments demonstrate the effectiveness of the proposed method, and it is beneficial to the community.

    The major concerns are on experiments.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    To relieve the pixel-wise annotation workload of histopathological data, this paper proposes a novel framework, called S5CL, that unifies fully-supervised, self-supervised, and semi-supervised learning through hierarchical contrastive learning. With three contrastive losses defined for labeled, unlabeled, and pseudo-labeled images, S5CL can learn feature representations that reflect the hierarchy of distance relationships between images with respect to their class labels and their consistency under different degrees of augmentation. Also, the resulting framework is easy to use and highly flexible.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper proposes a novel framework, called S5CL, that unifies fully-supervised, self-supervised, and semi-supervised learning through hierarchical contrastive learning. Experiments show the effectiveness of this method: for an H&E-stained colorectal cancer dataset, the accuracy increases by up to 9% compared to supervised cross-entropy loss; for a highly imbalanced dataset of single white blood cells from leukemia patient blood smears, the F1-score increases by up to 6%.

    (2) The resulting framework is easy to use and highly flexible: one can omit unlabeled images and train fully supervised; set the weights of the supervised and semi-supervised losses to zero and train self-supervised; or train with both labeled and unlabeled images in a semi-supervised way.
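
    A minimal PyTorch sketch of this flexibility follows; the function names, arguments, and weights are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn.functional as F

    def supcon(z, labels, temperature=0.1):
        # A generic supervised contrastive loss over one embedding batch.
        z = F.normalize(z, dim=1)
        n = z.size(0)
        eye = torch.eye(n, dtype=torch.bool, device=z.device)
        sim = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))
        pos = (labels[:, None] == labels[None, :]) & ~eye  # same-label pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
        return loss.mean()

    def s5cl_style_loss(z_l, y_l, z_u1, z_u2, y_pseudo=None,
                        w_l=1.0, w_u=1.0, w_p=1.0):
        # w_u = w_p = 0 -> fully supervised; w_l = w_p = 0 -> self-supervised;
        # all weights nonzero -> semi-supervised with pseudo-labels.
        inst = torch.arange(z_u1.size(0), device=z_u1.device).repeat(2)
        z_u = torch.cat([z_u1, z_u2])  # two augmented views per unlabeled image
        loss = w_l * supcon(z_l, y_l) + w_u * supcon(z_u, inst)
        if y_pseudo is not None:
            loss = loss + w_p * supcon(z_u, y_pseudo.repeat(2))
        return loss

    For example, s5cl_style_loss(z_l, y_l, z_u1, z_u2, w_u=0.0, w_p=0.0) reduces to supervised contrastive training on the labeled batch alone.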

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) This work is related to fully-supervised, self-supervised, and semi-supervised learning, but the relationships and differences among the three are not clearly stated in the paper, which may confuse readers. It is suggested to address this in the Introduction.

    (2) In the experiments, all models use the same encoder, ResNet18. But there are many other advanced CNNs; how do other backbone networks perform as encoders?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good. All methods used for the proposed framework are depicted clearly and noted with appropriate references.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) After a fixed number t of epochs, the classifier can be applied to Z1U, yielding pseudo-labels. This may not be the best approach; it is suggested to use the predicted confidence to decide whether a pseudo-label is acceptable or not, as in FixMatch.
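
    A minimal sketch of this suggested confidence gating, in the style of FixMatch, might look as follows; the threshold value, the classifier call, and all names are illustrative assumptions rather than the paper's implementation:

    import torch.nn.functional as F

    def confident_pseudo_labels(logits, threshold=0.95):
        # Keep only pseudo-labels whose softmax confidence clears the
        # threshold; low-confidence samples are masked out of the loss.
        probs = F.softmax(logits.detach(), dim=1)
        confidence, pseudo = probs.max(dim=1)
        return pseudo, confidence >= threshold

    # e.g., pseudo, mask = confident_pseudo_labels(classifier(z1_u)), after
    # which the pseudo-labeled loss would use only z1_u[mask] and pseudo[mask].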

    (2) Generally speaking, the weighting hyperparameters in a combined loss function are very important and have a significant impact on the model. It is suggested that the chosen parameter values be more fully explained or that ablation experiments be provided.

    (3) Figure 4d shows the ablation study of pseudo-labels. It can be seen that the effect of pseudo-labels is not always positive; in particular, pseudo-labels in Lc and LL perform even worse than no pseudo-labels. A more detailed analysis is therefore recommended.

    (4) In the experiment on Munich AML Morphology, there are only comparison results for fully-supervised and semi-supervised methods, but no self-supervised algorithm. It is suggested to provide the results of a self-supervised algorithm for comparison on this dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I make the decision mainly according to the novelty of the proposed framework and the organization of this paper.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a deep learning framework that is trained using a combination of supervised, semi-supervised, and self-supervised losses. The proposed methodology borrows ideas from different SOTA approaches and combines them in a comprehensive way. Specifically, it utilizes two paths that correspond to labeled and unlabeled examples; for each, appropriate losses are utilized (i.e., cross-entropy and supervised contrastive loss in different configurations). The authors show results on two medical tasks (multi-class H&E tile classification and multi-class single-cell blood cytology image classification).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very well written and organized; the figures are clear, and one can easily follow it.
    • The combination of supervised together with semi/self-supervised losses is well motivated in the field of computational pathology.
    • The experimental configuration is convincing and well presented.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed work is quite similar to some of the referenced studies (e.g., [25]).
    • Some details are missing whose addition could further enhance the quality of the paper (refer to the detailed comments).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Code release promised after acceptance.
    • Public datasets utilized
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • How is the proposed method different from the related work (e.g., [25])?
    • Why were there different batch sizes for the labeled vs. the unlabeled paths? How was the batch size tuned?
    • It is not clear how the error bars in Fig. 2 were calculated; details are missing from the text.
    • More information on the classification tasks (e.g., the 9-way and 11-way classification) would help the reader; there are few details in the text.
    • The panel labels a), b), c) are missing from Fig. 2.
    • Some references are missing (HED augmentation, Macenko’s method).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am recommending a weak accept, mainly due to some missing details in the text and the fact that the proposed method heavily borrows from already published literature. However, I believe it still has merit for the community.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a novel perspective of unifying fully-supervised, self-supervised, and semi-supervised learning. All reviewers appreciated the high quality of the paper's organization and writing and found it easy to follow. However, we still encourage the authors to address the reviewers' detailed concerns, such as the clarity of the methodology and experimental design and the missing details.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

Dear Area Chair and Reviewers,

We greatly appreciate your helpful and constructive feedback. It is gratifying to hear that all reviewers unanimously agree that the paper is well written and well organized. In addition, we are pleased that the Area Chair has provisionally accepted our manuscript and that all reviewers have voted in its favor. They acknowledge that the “extensive experiments demonstrate the effectiveness of the proposed method” and that “it is beneficial to the community” [R1]. In particular, our framework can be used “to relieve the pixel-wise annotation workload of histopathological data” while being “easy to use and highly flexible” [R2]. This is achieved by combining different contrastive losses, which “is well motivated in the field of computational pathology” [R3] and offers a “novel perspective of unifying fully-supervised, self-supervised, and semi-supervised learning” [AC]. Below, we would like to clarify the reviewers’ concerns.

Novelty: There have been questions from R1 and R3 about the novelty of our method compared to other frameworks. We are glad to have the opportunity to clarify the main differences: methods like SimCLR [3], Barlow Twins [4], and BYOL [2] pre-train models in a self-supervised fashion on unlabeled data. This, however, requires large batch sizes, long training times, and high-capacity networks [5]. In addition, the extracted features can differ significantly from the ones learned through supervision [6]. S5CL avoids this by training with both labeled and unlabeled images simultaneously. If we set the relevant weights to zero, self-supervised training is even a special case of our method. S4L [1] and FixMatch [7] also learn from labeled and unlabeled images at the same time, but neither applies a contrastive loss to labeled images directly: S4L treats labeled images as unlabeled instances in its triplet loss, and FixMatch applies a consistency loss. The results section shows that a high-quality embedding space leads to better classification performance. As suggested by R2, we will adapt the introduction in the final version to clearly state these differences and our contributions.

Experiments: As R1 points out, the results for MPL [8] on the Munich AML Morphology dataset [9] are low compared to NCT-CRC-HE-100K [10]. This is mainly because AML is a highly imbalanced dataset: increasing the size of the labeled dataset means adding even more samples from the majority class, and as a consequence, we also increase the class imbalance. We assume that this leads to a higher bias towards the majority class, which greatly affects the results. Similar behavior can be seen with FixMatch in [11]. S5CL overcomes this by avoiding pseudo-labels in the cross-entropy loss. Furthermore, we thank R3 for the remarks about missing details. We have now included the additional references, and we explain how we tuned the batch size and how we report the mean ± std over 5 runs.

Improvements: The suggestions and ideas from R2 are very welcome. Based on the comments, we plan to improve S5CL by including a confidence threshold for pseudo-labels and a loss-weighting strategy; a detailed ablation study of the pseudo-labels will help us here. In addition, we plan to create a more detailed benchmark that includes several self-supervised baselines. We omitted them here because MPL outperforms previous semi-supervised as well as self-supervised methods on ImageNet and other datasets.

Again, we are thankful for the opportunity to clarify the main concerns of the reviewers. We look forward to seeing how the MICCAI community will apply our proposed method.

References: [1] Zhai et al., ICCV'19; [2] Grill et al., NeurIPS'20; [3] Chen et al., ICML'20; [4] Zbontar et al., ICML'21; [5] Chen et al., NeurIPS'20; [6] Purushwalkam et al., NeurIPS'20; [7] Sohn et al., NeurIPS'20; [8] Pham et al., CVPR'21; [9] Matek et al., Nat. Mach. Intell.'19; [10] Kather et al., PLOS Med.'19; [11] Wei et al., CVPR'21


