
Authors

Yiwen Ye, Jianpeng Zhang, Ziyang Chen, Yong Xia

Abstract

Self-supervised learning (SSL), which enables advanced performance with few annotations, has proven successful in medical image segmentation. SSL usually relies on measuring the similarity of features obtained at the deepest layer to attract the features of positive pairs or repulse the features of negative pairs, and thus may suffer from weak supervision at shallow layers. To address this issue, we reformulate SSL in a Deep Self-Distillation (DeSD) manner to improve the representation quality of both shallow and deep layers. Specifically, the DeSD model is composed of an online student network and a momentum teacher network, both stacked from multiple sub-encoders. The features produced by each sub-encoder in the student network are trained to match the features produced by the teacher network. Such deep self-distillation supervision improves the representation quality of all sub-encoders, both shallow and deep. We pre-train the DeSD model on a large-scale unlabeled dataset and evaluate it on seven downstream segmentation tasks. Our results indicate that the proposed DeSD model achieves superior pre-training performance over existing SSL methods, setting a new state of the art. The code is available at https://github.com/yeerwen/DeSD
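The objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the deep self-distillation idea, not the authors' implementation: each sub-encoder's projected output is matched against the momentum teacher's final output with a softened cross-entropy, so shallow layers receive direct supervision. The temperatures `t_s`, `t_t` and the averaging over levels are assumptions borrowed from DINO-style training, not taken from the paper.

```python
import numpy as np

def softmax(x, temp):
    """Temperature-scaled softmax for a 1-D logit vector."""
    z = x / temp
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def desd_loss(student_logits_per_level, teacher_logits, t_s=0.1, t_t=0.04):
    """Sketch of the DeSD objective: every sub-encoder's projected output
    is trained to match the (sharpened) momentum-teacher distribution."""
    p_t = softmax(teacher_logits, t_t)            # teacher target
    loss = 0.0
    for s in student_logits_per_level:            # one entry per sub-encoder
        log_p_s = np.log(softmax(s, t_s) + 1e-12)
        loss += -(p_t * log_p_s).sum()            # cross-entropy per level
    return loss / len(student_logits_per_level)
```

Because each level contributes its own cross-entropy term, the gradient reaches shallow sub-encoders directly instead of only through the deepest layer.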

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_52

SharedIt: https://rdcu.be/cVRwD

Link to the code repository

https://github.com/yeerwen/DeSD

Link to the dataset(s)

https://nihcc.app.box.com/v/DeepLesion

https://competitions.codalab.org/competitions/17094

https://kits19.grand-challenge.org/data/

http://medicaldecathlon.com


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a non-contrastive self-supervised learning method and validates it on different 3D datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of cross-scale comparison sounds interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are several weaknesses:

    1. Since self-supervised pre-training has been widely adopted for 3D medical images [1-3], it is somewhat strange that the authors directly omitted comparisons with these methods in the experiment section. Meanwhile, the authors provide no literature review of the development of self-supervised learning in medical imaging. From my perspective, it seems that the authors simply borrowed existing self-supervised methods that were originally designed for natural images.

    2. The authors fail to clarify whether they used a 3D backbone and how they build a 3D segmentation network in detail, both of which matter a lot in medical image segmentation. In fact, it is confusing that the authors used a 2D ResNet-50 (they cited Kaiming He's paper) and applied a 1x1x1 convolution on top of it. So, do you employ a 2D or 3D backbone in practice?

    3. Many implementation details of the baselines are missing. Moreover, no visualization results are provided for most baselines in the supplementary material. As a result, I am not sure whether the experimental comparisons are fair.

    [1] Taleb et al. 3D Self-Supervised Methods for Medical Imaging. NeurIPS 2020.

    [2] Zhou et al. Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts. ICCV 2021.

    [3] Zhou et al. Models Genesis. Medical Image Analysis 2020.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some important details are missing, such as how to build a 3D segmentation network based on ResNet-50.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please address my concerns in the weakness part.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors omitted previous studies on applying self-supervised learning to 3D medical images, and I believe such activity should not be encouraged. Also, important implementation details are missing, which makes the reproducibility questionable.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    6

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new design for self-supervised learning (SSL) and shows its effectiveness in medical image segmentation. The proposed SSL method is based on DINO (a well-established SSL approach), where the loss is only computed at the last layer. This paper shows that it is beneficial to have supervisions (self-supervised) at intermediate layers.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Simple idea. The core idea is to introduce deep supervision to the DINO approach. This idea is simple and easy to implement. This makes the message from this paper clear and facilitates the adaptation of this idea.

    • Good experiments. The paper conducted a series of experiments to demonstrate its effectiveness. I particularly like the comparison with two variants of DeSD (FC-DeSD and para-DeSD) which shows that the authors have some critical thinking in introducing deep supervision to DINO.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited novelty. The major issue with this paper is its novelty. The deep supervision idea is incremental to DINO, and the other parts are almost the same as the original DINO paper. Meanwhile, deep supervision itself is not a novel design. It has long been introduced to train deep neural networks [1].

    • Unclear practical use of the proposed method. Although this paper demonstrates the effectiveness of introducing deep supervision to SSL, it is unclear how this idea may be used to address medical image segmentation in practice. In particular, the current mainstream segmentation approaches are U-Net or transformer-based, while the segmentation model used in this paper is simply a ResNet followed by several decoder layers. The paper does not show 1) how the proposed SSL method may help the mainstream approaches and/or 2) how the segmentation model in this paper may compete with the mainstream approaches.

    [1] Lee, C. Y., Xie, S., Gallagher, P., Zhang, Z., & Tu, Z. (2015, February). Deeply-supervised nets. In Artificial intelligence and statistics (pp. 562-570). PMLR.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets are publicly available. The SSL part is easy to reproduce since it is based on DINO. The paper does not give details on how the four sub-encoders are constructed (though this may be inferred, since the encoder is ResNet-50). The downstream segmentation model cannot be easily reproduced: there are some descriptions of how to construct the decoder, but more details are needed to fully reproduce it.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Can you show the effectiveness of DeSD on other tasks such as classification? Can you include comparison to other segmentation approaches?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I find it hard to position this paper and hence to acknowledge its contribution. This paper can be regarded as both an SSL paper and a segmentation paper. If we consider it as an SSL paper, then the current experiments are not enough to show true SSL effectiveness. The mainstream SSL papers are all evaluated on image classification with standard benchmark datasets such as ImageNet. This paper does not conduct experiments in these settings, and hence it is unclear whether it is really better than the mainstream approaches.

    If we consider it as a segmentation paper, then the current experiment does not show the true segmentation effectiveness. The segmentation model used in this paper is a primitive one. It is unclear if the benefits demonstrated on that model can be readily transferred to more advanced segmentation models.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The extra experimental results showed the effectiveness of the method. So, I raised my rating. I would suggest the authors include those results in the paper (or at least supplementary material) with more details about the new experiments.



Review #3

  • Please describe the contribution of the paper

    This manuscript introduces a new self-supervised learning method, referred to as DeSD, which introduces deep supervision into self-distillation. DeSD was pre-trained on the DeepLesion dataset and evaluated on 7 segmentation tasks from 3 datasets. It showed higher or comparable performance compared with 3 SSL methods: SimSiam, BYOL, and DINO.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This manuscript is well-written and easy to follow.
    • DeSD showed promising results compared with 3 SSL methods.
    • The ablation study of different deep self-distillation variants is interesting and insightful.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novelty of the proposed method is limited. The main framework is based on DINO [1], and the idea of deep self-distillation is essentially identical to that of Zhang et al. [2].

    [1] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P. and Joulin, A., 2021. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9650-9660).

    [2] Zhang, L., Song, J., Gao, A., Chen, J., Bao, C. and Ma, K., 2019. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3713-3722).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper has moderate reproducibility. The code is not released, but most training details (e.g., architecture, optimizer, learning rate, batch size, and augmentations) are provided in the manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Please refer to my comments in Section 5.
    2. The set of baseline methods compared in this manuscript is somewhat limited. All three are SSL methods developed for natural images. I would suggest the authors also include methods developed for medical images, particularly: a. a SOTA segmentation method for medical image analysis (e.g., nnU-Net [3]); b. SOTA SSL methods for medical image analysis (e.g., Models Genesis [4,5], Rubik's Cube+ [6], or other SSL methods evaluated in [7]).
    3. I would suggest the authors release the code and pretrained models to increase reproducibility.

    [3] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J. and Maier-Hein, K.H., 2021. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), pp.203-211.

    [4] Zhou, Z., Sodha, V., Rahman Siddiquee, M.M., Feng, R., Tajbakhsh, N., Gotway, M.B. and Liang, J., 2019, October. Models genesis: Generic autodidactic models for 3d medical image analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 384-393). Springer, Cham.

    [5] Zhou, Z., Sodha, V., Pang, J., Gotway, M.B. and Liang, J., 2021. Models genesis. Medical Image Analysis, 67, p.101840.

    [6] Zhu, J., Li, Y., Hu, Y., Ma, K., Zhou, S.K. and Zheng, Y., 2020. Rubik's cube+: A self-supervised feature learning framework for 3d medical image analysis. Medical Image Analysis, 64, p.101746.

    [7] Taleb, A., Loetzsch, W., Danz, N., Severin, J., Gaertner, T., Bergner, B. and Lippert, C., 2020. 3d self-supervised methods for medical imaging. Advances in Neural Information Processing Systems, 33, pp.18158-18172.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The proposed method showed promising results.
    • However, the novelty of the proposed method is kind of limited.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Reviewer 1 pointed out the omission of key related works and comparisons with them in your experiments, as well as missing implementation details; Reviewer 2 questioned the novelty and practical use. Reviewer 3 gave a higher score, but his/her comments are actually critical on both novelty and related works (concerns shared by Reviewers 1 and 2). I feel that it may be challenging to overcome these criticisms via rebuttal, but I still want to give you an opportunity to do so.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10




Author Feedback

The code and pre-trained model will be available on GitHub.

  1. Novelty (R1, R2) (1) We reveal the issue that shallow encoder layers may lack sufficient supervision, since the similarity measurement is exerted only on the features produced by the deepest layers. (2) To address this issue, we first divide the encoder in the student network into four sub-encoders and then encourage the features produced by each sub-encoder to match the features produced by the teacher network. Extensive experiments confirm our motivation that such fine-grained optimization contributes to strong representations across all layers and also prove the effectiveness of DeSD.

  2. Differences from existing methods (R3) We used deep self-distillation to improve DINO. Compared to [Zhang et al., ICCV 2019], our DeSD differs in (1) Motivation: the former aims to simplify the self-distillation process, while DeSD aims to address the issue of weak supervision at shallow layers; (2) Application: the former is applied to supervised learning, while DeSD is suitable for unsupervised learning; (3) Supervision signals: the supervision of the former comes from the deepest part of the same network, while our supervision comes from a momentum network to avoid collapse in the absence of annotations.
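The momentum teacher mentioned above is maintained as an exponential moving average (EMA) of the student, as in DINO. A minimal sketch of that update, assuming a fixed momentum coefficient (the paper's exact momentum schedule is not given here):

```python
def update_teacher(teacher_params, student_params, m=0.996):
    """EMA update used by momentum-teacher methods such as DINO/DeSD:
    the teacher slowly tracks the student, which stabilizes the targets
    and helps avoid representational collapse without labels."""
    return [m * t + (1.0 - m) * s
            for t, s in zip(teacher_params, student_params)]
```

In practice this runs once per training step with no gradient flowing through the teacher.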

  3. Details of segmentation network (R1, R2) We used a 3D ResNet-50, in which all 2D convolution kernels are replaced with their 3D counterparts, as the encoder, and adopted four decoder blocks to gradually restore the spatial resolution. In each decoder block, the input feature is first up-sampled by a 3D transpose convolution layer, then added to the feature maps obtained by passing the output of the corresponding encoder block through a 3D convolution block, and finally processed by a 3D residual convolution block. An ASPP module is inserted between the encoder and decoder, and a 1×1×1 convolution layer is placed behind the decoder as the segmentation head for label prediction.
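The decoder-block dataflow described above can be traced at the shape level. This is an illustrative sketch only, assuming the transpose convolution uses stride 2 (doubling each spatial dimension) and that its output channels match the skip connection; the authors' exact channel widths are not stated:

```python
def decoder_block_shape(in_shape, skip_shape):
    """Shape-level sketch of one DeSD-style decoder block.
    in_shape/skip_shape are (channels, depth, height, width)."""
    c_in, d, h, w = in_shape
    c_skip, ds, hs, ws = skip_shape
    up = (c_skip, d * 2, h * 2, w * 2)        # 3D transpose conv, stride 2
    assert up[1:] == (ds, hs, ws), "upsampled feature must align with skip"
    # element-wise addition with the projected encoder feature and the
    # subsequent 3D residual conv block both preserve this shape
    return up
```

For example, a (512, 4, 4, 4) bottleneck feature combined with a (256, 8, 8, 8) encoder feature yields a (256, 8, 8, 8) output, matching the skip resolution.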

  4. Details of baselines (R1) For all baselines, we used the open-source code and followed the original settings, except for the encoder, datasets, batch size, and training epochs, which were kept the same for a fair comparison. For the baselines designed for natural images, we replaced their data augmentation with that used in DeSD.

  5. Segmentation performance (R2, R3) We used a 3D ResUNet under the nnUNet framework as the backbone for the segmentation tasks. Below are the Dice scores (%) obtained by nnFormer (trained for 1000 epochs) [arXiv:2109.03201], DoDNet [CVPR 2021], and our DeSD on the 7 segmentation tasks. They reveal that the DeSD-pre-trained model is superior to the other two models, which are based on a Transformer and a CNN, respectively. Moreover, DeSD improves the average Dice by 0.9% over DoDNet, which uses more images and additional organ/tumor annotations for each segmentation task.

     Method   | Liver | Kidney | HepaV | Pancreas | Colon | Lung | Spleen | Average
     nnFormer | 81.0  | 87.8   | 68.3  | 69.6     | 30.3  | 69.7 | 93.3   | 71.4
     DoDNet   | 81.2  | 87.1   | 67.9  | 71.5     | 51.6  | 71.3 | 93.9   | 74.9
     DeSD     | 81.9  | 89.2   | 68.2  | 70.6     | 51.9  | 72.7 | 96.0   | 75.8

  6. Comparison to more SSL methods (R1, R2, R3) We further compared DeSD to MG [MedIA 2021] and PCRL [ICCV 2021] on the 7 downstream datasets. DeSD achieves the highest Dice on all of them.

     Method | Liver | Kidney | HepaV | Pancreas | Colon | Lung | Spleen | Average
     MG     | 77.8  | 86.8   | 63.4  | 69.6     | 36.6  | 60.0 | 95.3   | 69.9
     PCRL   | 80.4  | 87.0   | 66.9  | 70.5     | 40.6  | 63.8 | 95.8   | 72.1
     DeSD   | 81.9  | 89.2   | 68.2  | 70.6     | 51.9  | 72.7 | 96.0   | 75.8

  7. Classification task (R2) We used the LIDC-IDRI dataset, which contains 1018 CT scans with 2568 lung nodules, for benign-malignant nodule classification. Each nodule was annotated by up to 4 experts. We treated nodules with a median malignancy < 3 as benign, > 3 as malignant, and = 3 as unlabeled (excluded). Below are the results of 4-fold cross-validation (%), showing that DeSD achieves substantially improved performance.

     Method | AUC  | ACC  | SP   | SE   | F1
     TFS    | 77.2 | 81.8 | 89.3 | 65.2 | 69.2
     SSD    | 78.6 | 82.4 | 88.6 | 68.6 | 71.6
     DeSD   | 79.7 | 82.8 | 87.9 | 71.5 | 72.6
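The labeling rule stated in point 7 is simple enough to express directly. A small sketch (the function name and the 0/1/None encoding are illustrative choices, not from the rebuttal):

```python
import statistics

def malignancy_label(ratings):
    """Rebuttal's rule for LIDC-IDRI nodules: median malignancy < 3 is
    benign (0), > 3 is malignant (1), and == 3 is excluded (None)."""
    med = statistics.median(ratings)
    if med < 3:
        return 0
    if med > 3:
        return 1
    return None
```

Note that with an even number of ratings, `statistics.median` averages the two middle values, so a nodule rated [2, 4] has median 3.0 and is excluded under this rule.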




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors made a good effort in the rebuttal, but the paper needs a serious revision before it can be published at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper deals with an interesting topic, and the authors addressed most of the reviewers’ points. I lean towards acceptance, and I strongly recommend that the authors include the new results in the main paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a new self-supervised learning method by introducing deep supervision into self-distillation. The insights are well demonstrated and the empirical results look strong. In the rebuttal, the authors provide ample additional comparisons to SSL approaches and ablations, which seem to address the reviewers’ concerns well and support the effectiveness of the method. Based on the scores, reviews, and the rebuttal, I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


