
Authors

Xiaoyu Liu, Bo Hu, Wei Huang, Yueyi Zhang, Zhiwei Xiong

Abstract

Biomedical instance segmentation is vulnerable to complicated instance morphology, resulting in over-merge and over-segmentation. Recent advanced methods apply convolutional neural networks to predict pixel embeddings to overcome this problem. However, these methods suffer from heavy computational burdens and massive storage requirements. In this paper, we present the first knowledge distillation method tailored for biomedical instance segmentation, transferring the knowledge from a cumbersome teacher network to a lightweight student one. Different from existing distillation methods on other tasks, we consider three kinds of essential knowledge of the instance segmentation task, i.e., instance-level features, instance relationships in the feature space, and pixel-level instance boundaries. Specifically, we devise two distillation schemes: (i) instance graph distillation, which transfers the knowledge of instance-level features and instance relationships via instance graphs built from the embeddings of the teacher and student, respectively, and (ii) pixel affinity distillation, which converts pixel embeddings into pixel affinities and explicitly transfers the structured knowledge of instance boundaries encoded in affinities. Experimental results on a 3D electron microscopy dataset (CREMI) and a 2D plant phenotype dataset (CVPPP) demonstrate that the student models trained through our distillation method use fewer than 1% of the parameters and less than 10% of the inference time while achieving promising performance compared with the corresponding teacher models.
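For readers skimming this page, the two distillation schemes named in the abstract can be sketched roughly as follows. This is a minimal illustrative reading, not the paper's exact formulation: the function names, the choice of cosine similarity, and the mean-squared matching terms are assumptions made for the sketch.

```python
import numpy as np


def instance_graph(emb, labels):
    # Nodes: mean embedding per instance id; edges: cosine similarity
    # between every pair of nodes.
    ids = np.unique(labels)
    nodes = np.stack([emb[labels == i].mean(axis=0) for i in ids])
    unit = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    edges = unit @ unit.T
    return nodes, edges


def pixel_affinities(emb, offsets=((0, 1), (1, 0))):
    # Affinity of each pixel with its offset neighbour = cosine
    # similarity of their embeddings; one (H, W) map per offset.
    h, w, _ = emb.shape
    unit = emb / (np.linalg.norm(emb, axis=-1, keepdims=True) + 1e-8)
    affs = []
    for dy, dx in offsets:
        a = np.zeros((h, w))
        a[: h - dy, : w - dx] = np.sum(
            unit[: h - dy, : w - dx] * unit[dy:, dx:], axis=-1
        )
        affs.append(a)
    return np.stack(affs)


def distillation_loss(teacher_emb, student_emb, labels):
    # Instance-graph term (node features plus pairwise relations) and a
    # pixel-affinity term, each matched with mean-squared error.
    nt, et = instance_graph(teacher_emb, labels)
    ns, es = instance_graph(student_emb, labels)
    graph_term = np.mean((nt - ns) ** 2) + np.mean((et - es) ** 2)
    aff_term = np.mean(
        (pixel_affinities(teacher_emb) - pixel_affinities(student_emb)) ** 2
    )
    return float(graph_term + aff_term)
```

In practice the teacher and student embedding dimensions would typically differ and require a learned projection before matching; the sketch assumes equal dimensions for simplicity.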

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_2

SharedIt: https://rdcu.be/cVRvm

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a knowledge distillation method for biomedical instance segmentation. Specifically, the authors propose two schemes, instance graph distillation and pixel affinity distillation, to transfer the knowledge.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Clear explanations. The authors present their ideas clearly, making the paper easy to follow.
    2. Thorough experiments. The authors compare the proposed method with several typical segmentation models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Contribution is somewhat limited. Applying knowledge distillation on the affinity maps is not very novel. Please refer to ‘Adaptive Affinity Fields for Semantic Segmentation’.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Easy to reproduce

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be more convincing to validate the proposed method on additional datasets.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The writing structure of this article is clear, and the approach is reasonable.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    Submission 368 proposes a novel knowledge distillation approach which is suitable to distill networks trained to do very difficult tasks, i.e. networks where one would expect a large number of parameters is necessary. Through the use of both instance- and pixel-level consistency between teacher and student networks, the authors achieve excellent distillation results, as evaluated on 3D EM segmentation data (CREMI challenge) and 2D natural image data (CVPPP challenge).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel idea to enforce both instance-graph consistency and pixel affinity consistency
    • evaluation on difficult datasets clearly shows the method superiority
    • the paper is very well written and illustrated
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • nothing significant
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data is public; the code will hopefully be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • I’d be curious to see if the student network can be trained without the affinity loss L_aff. This would enable distillation of a large network pre-trained on private data without access to ground truth.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel ideas for an important problem, detailed evaluation shows superiority compared to state-of-the-art

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a knowledge distillation method targeting on medical image segmentation task. This method can transfer the knowledge learned in a large teacher network to a lightweight student network using two distillation schemes, i.e. instance graph distillation and pixel affinity distillation. The experiments show the potential of improvement on student models trained with the proposed distillation method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The ideas of instance graph and pixel affinity distillation are novel, since they explore a different aspect of knowledge distillation, where high-level representations of feature maps are used to transfer knowledge from the teacher model to the student model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The complexity of the distillation method is not discussed, especially for instance graph distillation. Since graph construction can be costly and time-consuming, it would be better to include such a discussion in the paper. Some details are missing from the paper; for example, L_{PAD} in Eq. 4 is not defined anywhere, nor are other reproducibility details mentioned in Section 7.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some details are missing. The authors seem to have simply checked ‘yes’ for everything in the checklist.

    1. Missing one of the loss function definitions. Missing the software framework and version.
    2. No new dataset proposed.
    3. Missing training time for the proposed distillation method.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should emphasize how significant the performance gap between the student and teacher networks is. The authors include a table in the supplementary materials; however, no further discussion is provided.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and easy to follow. The proposed distillation method is novel and the experimental results demonstrate its advantage. Although there are some missing details, they are not the major issues. Therefore, my initial rating is accept.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work introduces the concept of knowledge distillation into biomedical instance segmentation and derives a unique combination of losses to perform this transfer. It evaluates the concept on two datasets with somewhat limited generality; however, the novel idea outweighs this limitation. To the knowledge of the reviewers and this meta-reviewer, this contribution is original and of interest for medical image analysis. A main suggestion is to validate the approach on a medical dataset where the training, validation, and testing sets are less correlated than in the setup used here involving the CREMI dataset, but this can be seen as outside the scope of the paper. Favorable reviews from the three reviewers lead to provisional acceptance of the paper. It is highly recommended to closely study the reviewer comments for a minor revision in case of final acceptance.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the reviewers for appreciating the contribution of our paper and for constructive comments on potential improvements. We will revise some details to fully address concerns raised by the reviewers in the camera-ready version.


