
Authors

Yushan Xie, Yuejia Yin, Qingli Li, Yan Wang

Abstract

In this paper, we focus on semi-supervised medical image segmentation. Consistency regularization methods, such as initialization perturbation on two networks combined with entropy minimization, are widely used to address this task. However, entropy minimization-based methods force networks to agree on all parts of the training data. For extremely ambiguous regions, which are common in medical images, such agreement may be meaningless and unreliable. To this end, we present a conceptually simple yet effective method, termed Deep Mutual Distillation (DMD), a high-entropy online mutual distillation process, which is more informative than a low-entropy sharpened process, leading to more accurate segmentation results on ambiguous regions, especially the outer branches. Furthermore, to handle the class imbalance and background noise problem, and to learn a more reliable consistency between the two networks, we exploit the Dice loss to supervise the mutual distillation. Extensive comparisons with state-of-the-art methods on the LA and ACDC datasets show the superiority of our proposed DMD, reporting a significant improvement of up to 1.15% in terms of Dice score when only 10% of the training data are labeled on LA. We compare DMD with other consistency-based methods with different entropy guidance to support our assumption. Extensive ablation studies on the chosen temperature and loss function further verify the effectiveness of our design. The code is publicly available at https://github.com/SilenceMonk/Dual-Mutual-Distillation
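
For a rough illustration of the idea described above, the following sketch (hypothetical names and a simplified binary form, not the paper's exact equations) shows how two networks can distill from each other's temperature-softened outputs under a Dice-based consistency:

    import torch

    def soften(logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
        # Temperature T > 1 yields a softer, higher-entropy output.
        return torch.sigmoid(logits / T)

    def soft_dice(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        inter = (p * q).sum()
        return 1 - (2 * inter + eps) / (p.sum() + q.sum() + eps)

    def mutual_distillation_loss(logits_a: torch.Tensor, logits_b: torch.Tensor,
                                 T: float = 2.0) -> torch.Tensor:
        # Each network mimics the other's softened output; detach()
        # keeps gradients from flowing into the peer network.
        p_a, p_b = soften(logits_a, T), soften(logits_b, T)
        return soft_dice(p_a, p_b.detach()) + soft_dice(p_b, p_a.detach())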

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_52

SharedIt: https://rdcu.be/dnwBM

Link to the code repository

https://github.com/SilenceMonk/Dual-Mutual-Distillation

Link to the dataset(s)

https://www.cardiacatlas.org/atriaseg2018-challenge/

https://www.creatis.insa-lyon.fr/Challenge/acdc/#phase/5966175c6a3c770dff4cc4fb


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a new method called Deep Mutual Distillation (DMD) for semi-supervised medical image segmentation. The method is designed to handle the class imbalance and background noise problem and learn a more reliable consistency between the two networks. DMD is a high-entropy online mutual distillation process that is more informative than a low-entropy sharpened process, leading to more accurate segmentation results on ambiguous regions, especially the outer branches. The paper reports a significant improvement of up to 1.15% in terms of Dice score when only 10% of training data are labeled in LA.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper proposes a novel method called Deep Mutual Distillation (DMD) for semi-supervised medical image segmentation. The method builds on consistency regularization methods such as initialization perturbation on two networks combined with entropy minimization. However, entropy minimization-based methods force networks to agree on all parts of the training data. For extremely ambiguous regions, which are common in medical images, such agreement may be meaningless and unreliable. To this end, DMD presents a conceptually simple yet effective method, a high-entropy online mutual distillation process, which is more informative than a low-entropy sharpened process, leading to more accurate segmentation results on ambiguous regions, especially the outer branches. Furthermore, to handle the class imbalance and background noise problem, and to learn a more reliable consistency between the two networks, DMD exploits the Dice loss to supervise the mutual distillation. The authors compare DMD with other consistency-based methods with different entropy guidance to support their assumption. Extensive ablation studies on the chosen temperature and loss function further verify the effectiveness of their design. The authors report that DMD works favorably especially when annotated data is very scarce. Without bells and whistles, DMD achieves 89.70% in terms of Dice score on LA when only 10% of the training data are labeled, a significant 1.15% improvement over the state of the art. In summary, DMD addresses the limitations of entropy minimization-based methods in semi-supervised medical image segmentation by introducing a high-entropy online mutual distillation process that leads to more accurate segmentation results on ambiguous regions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Figure 1(b), the author initially refers to theta as sigmoid, but in Section 2.1, they mention that a two-way KL mimicry loss is applied directly to the probability distribution learned by the softmax layer in DML [26]. This creates confusion regarding whether sigmoid or softmax is being referred to. Additionally, Figure 1(b) cites reference [25] for Deep Mutual Learning (DML), whereas Sections 1 and 2.2 cite reference [26]. It would be helpful to resolve these inconsistencies.

    2. Section 2.2 and Tables 2 and 3 use different spellings of “labelled” and “unlabelled.” To ensure consistency, please use one spelling throughout the text.

    3. The meaning of the x/y axes in Figure 3(b) is unclear and should be noted within the figure itself. Furthermore, what does “0~10” on the left side signify?

    4. The term “high-entropy” requires clarification, as its meaning is not clear from context alone.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors mention releasing the code, which is beneficial for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    Please refer to the four weaknesses listed above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    DMD is a novel method that addresses the limitations of entropy minimization-based methods in semi-supervised medical image segmentation by introducing a high-entropy online mutual distillation process that leads to more accurate segmentation results on ambiguous regions. The authors also show that DMD works favorably especially when annotated data is very scarce and achieves a significant improvement over the state of the art.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The contribution of the paper is the use of a temperature term in the cost function of semi-supervised multiple-network deep learning segmentation methods. This temperature term smooths (i.e., increases the entropy of) the probability distribution over the foreground and background classes under the semi-supervised strategy. The authors show that the use of this temperature term improves semi-supervised segmentation of the left atrium in cardiac images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is conveying the idea that the use of high-entropy probability distributions in multiple-network semi-supervised learning approaches helps improve segmentation results. In other words, making the networks agree on diffuse segmentations works better than making them agree on hard decisions. The authors reason that the temperature parameter prevents over-focusing on ambiguous regions. The authors also grade different approaches according to the entropy of their probability output, from lowest entropy (segmentation) to highest entropy (the use of the proposed temperature term). The experimental section and the comparison to other methods are convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One wonders what the limit of the proposed method is. In Figure 4, the authors study the influence of the temperature parameter on segmentation results, looking, among other metrics, at the Dice coefficient. With high temperature values, one would expect a decrease in performance, as in the limit the scaled probabilities tend to the maximum-entropy distribution. For the largest temperature value tested, 2, the Dice coefficient continues increasing. What would happen with higher values?

    Another weakness of the paper is its presentation. It is clear that this paper is a continuation of previous work, and it therefore assumes that the reader is familiar with the subject. That may not be the case. For instance, LA is not defined until very late in the text. The ACDC dataset is not defined anywhere. When the authors talk in the abstract about “outer branches”, the reader has no clue what the context of those words is. A rewrite for clarity is strongly recommended. Also for clarity and self-containment, while the authors provide a reference for the Dice loss function, it would be helpful to include the equation in the write-up. Images of the input data and output segmentations may also be helpful, beyond the masks and gradients of Figures 2 and 3. Also in the abstract, the authors talk about “online” mutual distillation. The context of the word “online” is very unclear, as there is no online training setup.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets are publicly available and the code is promised to be, so reproducibility seems assured.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Besides the comments made in the weaknesses section, this reviewer wonders whether the method is truly generalizable to other contexts/datasets. Would the proposed method work for the segmentation of small elongated structures, such as blood vessels? In that case, the ratio of contour to volume of the target structure is higher than for the atrial segmentation task analysed, and the proposed temperature parameter may not be as useful as it appears on this dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting and is, in my opinion, worth of acceptance at MICCAI. Presentation needs to be improved.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors improve upon an existing semi-supervised segmentation method named Deep Mutual Learning [Zhang et al. 2018] by adding a temperature term to the softmax (sigmoid) calculation of the last layer and by replacing the unsupervised two-way KL mimicry loss with a Dice-based overlap measure. The temperature term is a generalization of the DML approach, and the Dice loss should help reduce the class imbalance problem inherent in medical image segmentation. The additions help to outperform current semi-supervised segmentation methods on the atrial segmentation challenge dataset and the ACDC (Adverse Conditions Dataset?) dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The work is scientifically sound, well evaluated and demonstrates good performance compared to existing approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • As a technician and not a clinical expert, it was very hard to understand, throughout the paper, which datasets were used for the evaluation of the method. Although the authors state in the abstract what datasets were used, they only use acronyms, which are never explained.
    • The authors state that their work is a significant improvement over the methods they compare to; however, no significance tests were presented. In some cases the improvements over the competing methods are quite small. The addition of significance tests would add value to the evaluation.
    • The evaluation on the additional ACDC dataset is not very clear: isn’t ACDC a 2D dataset? How did the architecture change? Why is the supervised reference generated by a U-Net and not a V-Net as in the left atrial segmentation? Did the backbone network change as well?
    • In supervised medical image segmentation, the combination of a Dice and a cross-entropy based loss is state-of-the-art (compare Ma et al., Loss Odyssey in Medical Image Segmentation, 2021); however, the authors use a Dice-based loss only.
    • Some language issues could be identified.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The changes to the original method, the used architecture, training setting and datasets are well described. The work should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The work is scientifically sound and the results are good. Although the contribution is incremental, the evaluation clearly demonstrates the value of the work. The minor weaknesses could be addressed to improve the work; they are mainly structural, and fixing them would help readers understand the work better.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evaluation demonstrates clear improvements over current semi-supervised methods.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a semi-supervised segmentation framework based on DML. Compared with previous works, the spotlight of this work is the introduction of measurements with KL divergence. As pointed out by the reviewers, the current version lacks explanations, and the resulting confusion may mislead readers. Please carefully prepare the rebuttal to improve the manuscript.




Author Feedback

We thank all reviewers and the AC for the constructive feedback. We adopt two MRI datasets: the 2018 Atrial Segmentation Challenge (LA) and the Automated Cardiac Diagnosis Challenge (ACDC). Experimental settings follow SOTA methods such as MCNet (Wu et al., MICCAI’22). Detailed descriptions of the datasets will be added (@R3&R4). All language issues will be corrected (@R4). The code will be released after acceptance (@R1&R3). In the following, we address the major concerns one by one.

@R1: 1) Symbol confusion: We use sigma to represent the sigmoid function in Fig. 1, and theta represents the network parameters in Section 2.2. We further explain the choice of the sigmoid over the softmax layer in Section 2.3, Eq. 5. For the binary segmentation task, we replace the original softmax layers in the CPS and DML cases with sigmoid for unification.

2) Citation error on DML: Thanks for pointing out the error; we will correct it in the write-up.

3) Inconsistent spelling of “labelled” and “unlabelled”: We will align the spelling.

4) x/y axes unclear in Fig. 3(b): On the y-axis, “0~10” refers to the voxel-based measures (95HD, ASD), and “72~92” refers to the percentage-based measures (Dice, Jaccard). On the x-axis, x denotes the lambda value. We will add detailed axis descriptions in the write-up.

5) Clarification of “high-entropy”: In Eq. 1, when we set T > 1, as with the soft targets defined in the original knowledge distillation paper (Hinton et al.), we obtain a softer, “high-entropy” output.
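
To illustrate (a minimal sketch with made-up logits, not the paper's Eq. 1), raising T flattens the sigmoid output and increases its entropy:

    import torch

    def binary_entropy(p: torch.Tensor) -> torch.Tensor:
        # Shannon entropy of a Bernoulli distribution with parameter p.
        return -(p * p.log() + (1 - p) * (1 - p).log())

    logits = torch.tensor([2.0, -1.5, 0.8])
    for T in (1.0, 2.0, 4.0):
        p = torch.sigmoid(logits / T)
        print(f"T={T}: p={p}, mean entropy={binary_entropy(p).mean().item():.3f}")
    # The mean entropy grows with T: the targets become softer.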

@R3: 1) How high can the temperature be? We observe peak performance at T = 2, and performance is not sensitive to T in the range [1.4, 3]. We will update Fig. 4 to include higher temperature values.

2) Clarification of the “outer branches”: Sorry for the confusion. Examples of “outer branches” are highlighted in red boxes in Fig. 3(a). We will also provide the equation of the Dice loss function.

3) Visualization of the input data and output segmentations: Thanks for pointing this out; we will add them to the paper.

4) Include the Dice equation: We will include the detailed Dice function in the supplementary material.
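
For reference, one standard formulation of the soft Dice loss (the paper may use a variant) is

    \mathcal{L}_{\text{Dice}}(p, g) = 1 - \frac{2\sum_i p_i g_i + \epsilon}{\sum_i p_i + \sum_i g_i + \epsilon}

where p_i are the predicted probabilities, g_i the targets (here, the peer network's soft output), and epsilon a small smoothing constant.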

5) Clarification of “online” in “online mutual distillation”: “Online” means the two networks are updated simultaneously during training, the same setting as in online knowledge distillation.

6) Generalization to datasets with small elongated target structures: Due to the time limit, we are not able to perform experiments on such datasets. However, as analyzed in the paper, our method works on datasets that contain ambiguous regions, which should also apply to datasets with small elongated structures. Therefore, even though the ratio of contour to volume of the target structure is higher than in datasets like LA and ACDC, this should not affect the performance.

@R4: 1) T-test on the test set: Under the setting of the LA dataset with 8 available labels, we performed a t-test between the test-set Dice scores of a SOTA method, MCNet (Wu et al., MICCAI’22), and ours. We observe p = 0.017 < 0.05, which indicates a statistically significant difference.
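
A minimal sketch of such a test (hypothetical per-case Dice scores, not the actual values, assuming paired scores on the same test cases):

    from scipy import stats

    # Hypothetical per-case Dice scores on the same test cases.
    dice_mcnet = [0.87, 0.89, 0.85, 0.90, 0.88, 0.91]
    dice_dmd   = [0.89, 0.91, 0.86, 0.92, 0.90, 0.93]

    t_stat, p_value = stats.ttest_rel(dice_dmd, dice_mcnet)  # paired t-test
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # p < 0.05 -> significant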

2) Backbone choice on the LA and ACDC datasets: We use a 3D V-Net for LA and a 2D U-Net for ACDC, following MCNet (Wu et al.). We will clear up this confusion in the implementation details of the write-up.

3) Why not use a compound loss? The objective function we use is Eq. 3. For the mutual distillation loss, we have already discussed the empirical evidence for choosing Dice, illustrated in Fig. 4. For the supervised loss, using the compound loss discussed in Ma et al. might improve performance, but it is not our main focus.
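
For reference, a minimal sketch of the Dice + cross-entropy compound loss the reviewer refers to (binary case; an illustration, not our objective function):

    import torch
    import torch.nn.functional as F

    def compound_loss(logits: torch.Tensor, target: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
        # Soft Dice term plus binary cross-entropy, as discussed in
        # Ma et al., "Loss Odyssey in Medical Image Segmentation" (2021).
        target = target.float()
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
        bce = F.binary_cross_entropy_with_logits(logits, target)
        return dice + bce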




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a semi-supervised framework based on the KL divergence. The reviewers pointed out a lack of explanations and details. In the rebuttal, the authors have carefully answered the issues raised by the reviewers. Therefore, I recommend acceptance of this manuscript.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I have read the comments and rebuttal. This paper is about semi-supervised multiple-network medical image segmentation. The new method, namely Deep Mutual Distillation, handles the problems of class imbalance and background noise by using a high-entropy online mutual distillation process. The rebuttal addresses most of the concerns raised by the reviewers. The authors are encouraged to consider the reviewers’ comments and incorporate the rebuttal’s replies if the paper is accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a novel method called Deep Mutual Distillation (DMD) for semi-supervised medical image segmentation. The method builds on consistency regularization methods such as initialization perturbation on two networks combined with entropy minimization. The authors compare DMD with other consistency-based methods with different entropy guidance to support their assumption. Extensive ablation studies on the chosen temperature and loss function further verify the effectiveness of their design. However, more description and explanation are needed to make the paper easier for the reader. In addition, the authors state that their work is a significant improvement over the methods they compare to; more evidence is necessary. Combining the comments of the reviewers and my own, this is an interesting paper whose merits very slightly outweigh its weaknesses.


