
Authors

Hu Wang, Congbo Ma, Jianpeng Zhang, Yuan Zhang, Jodie Avery, Louise Hull, Gustavo Carneiro

Abstract

The problem of missing modalities is both critical and non-trivial to handle in multi-modal models. In multi-modal tasks, it is common for certain modalities to contribute more than others, and if those important modalities are missing, model performance drops significantly. This fact remains unexplored by current multi-modal approaches, which recover the representation of missing modalities by feature reconstruction or blind feature aggregation from other modalities, instead of extracting useful information from the best performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model that adaptively identifies important modalities and distils knowledge from them to help other modalities from the cross-modal perspective, thereby addressing the missing modality issue. Our approach introduces a teacher election procedure to select the most “qualified” teachers based on their single-modality performance on certain tasks. Then, cross-modal knowledge distillation is performed between teacher and student modalities for each task to push the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the knowledge learned from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) show that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art performance by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour in terms of segmentation Dice score.
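
The two mechanisms the abstract describes can be roughly illustrated with a short sketch (an editorial illustration, not the authors' released code; the function names `elect_teachers` and `cross_modal_kd_loss`, the L1 distance, and all tensor shapes are assumptions): teacher election picks, per task, the modality with the best single-modality validation performance, and the distillation term then pulls each student modality's features toward the elected teacher's features.

```python
# Hypothetical sketch of teacher election + cross-modal feature distillation.
# Not the authors' implementation; names, shapes, and the L1 distance are assumptions.
import torch
import torch.nn.functional as F

def elect_teachers(val_scores):
    """val_scores[task][modality] = single-modality validation Dice.
    Returns, for each task, the modality with the highest score."""
    return {task: max(scores, key=scores.get) for task, scores in val_scores.items()}

def cross_modal_kd_loss(features, teachers):
    """features[modality] = feature tensor of shape (B, C, D, H, W).
    For every task, pull each student modality's features toward the
    elected teacher modality's (detached) features."""
    loss = 0.0
    for task, teacher in teachers.items():
        t_feat = features[teacher].detach()  # teacher is not updated by this term
        for modality, s_feat in features.items():
            if modality != teacher:
                loss = loss + F.l1_loss(s_feat, t_feat)
    return loss

# Toy example: Flair elected for whole tumour, T1c for enhancing tumour.
val_scores = {
    "whole_tumour": {"Flair": 0.88, "T1": 0.74, "T1c": 0.76, "T2": 0.83},
    "enhancing_tumour": {"Flair": 0.41, "T1": 0.45, "T1c": 0.77, "T2": 0.47},
}
teachers = elect_teachers(val_scores)
features = {m: torch.randn(2, 8, 16, 16, 16) for m in ["Flair", "T1", "T1c", "T2"]}
print(teachers, cross_modal_kd_loss(features, teachers).item())
```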

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_21

SharedIt: https://rdcu.be/dnwC5

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a Learnable Cross-modal Knowledge Distillation (LCKD) model to adaptively identify important modalities and distil knowledge from them to help other modalities. The authors provide extensive experiments to support their claimed contributions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed Learnable Cross-modal Knowledge Distillation (LCKD) model is novel. It is used to address the missing modality problem in multi-modal learning. The LCKD method is designed to automatically identify the important modalities per task. It can also handle missing modalities during both the training and testing phases. The extensive experiments show that the LCKD model achieves state-of-the-art performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The segmentation visual results are limited (e.g., Figure 2); it would be better to include more visual results in the main paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It should be easy to reproduce the results based on the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I weigh its strengths more.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes the Learnable Cross-modal Knowledge Distillation (LCKD) method for multimodal brain tumor segmentation. LCKD can handle missing modality during training and testing by distilling knowledge from selected important modalities for all training tasks to train other modalities, achieving state-of-the-art performance in missing modality segmentation with brain tumor imaging data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The novel LCKD approach helps to improve multi-modal segmentation by handling missing modalities during both training and testing.
    • The approach can potentially distill knowledge from important modalities to train other modalities, and it achieves state-of-the-art performance in missing modality segmentation problems on the BraTS2018 dataset.
    • The paper also presents thorough experimental results and analysis, including comparisons with strong baseline models, analysis of the teacher election procedure, and analysis of the effect of hyperparameters on performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The evaluation is limited to brain tumor segmentation, but it is not clear if this translates to other domains with different pathology and variation across imaging modalities.
    • The teacher election procedure relies on the assumption that one modality is always more useful than others for a certain task, which may not hold true in all cases.
    • The paper is lacking in qualitative results, with only one figure showing label maps.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors indicate their resources will be shared.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • It would be helpful to dig into failure modes and identify specific factors that might be limiting performance.
    • How does performance change as more data is excluded? Can you test this by gradually dropping scans?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting and achieves state-of-the-art performance on multi-modal brain tumor segmentation, and it is potentially valuable to share with the community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors propose a Learnable Cross-modal Knowledge Distillation (LCKD) model to address the missing modality problem in multi-modal learning. It consists of a Teacher Election Procedure to identify the important modality for each target, and a cross-modal knowledge distillation to transfer knowledge from the “qualified” modalities to the others.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation is clear. Substantial empirical results. Statistical analysis is provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The discussion of related works is limited, which could undermine the novelty. There are no results or analysis for the Teacher Election Procedure. The formulas seem to be imprecise.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors intend to release the code and the implementation details are provided. The results should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should re-organize the related works and provide a thorough discussion of the related works [1][2][3] to show the novelty and superiority of the proposed method. Please provide the results and analysis for the Teacher Election Procedure, as it is a key component of the proposed method. In Eq. (4) and (5), it seems that only one missing modality is allowed, i.e., modality n. Besides, in Eq. (4), could j also be in T? The caption of Fig. 1 should be self-explanatory.

    [1] Ding, Yuhang, et al. “RFNet: Region-aware fusion network for incomplete multi-modal brain tumor segmentation.” Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
    [2] Zhang, Yao, et al. “Modality-aware mutual learning for multi-modal medical image segmentation.” Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Part I. Springer, 2021.
    [3] Hu, Minhao, et al. “Knowledge distillation from multi-modal to mono-modal segmentation networks.” Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. Springer, 2020, pp. 772–781.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well organized and motivated, and shows promising performance. However, the ablation study of a key component and some discussion of related works are missing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors address most of my concerns and thus I would like to raise the score.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a Learnable Cross-modal Knowledge Distillation (LCKD) method for multimodal brain tumor segmentation. The reviewers noted the clear motivation, the substantial empirical results, the fact that statistical analysis is provided, and the intention of the authors to release their code if the paper is accepted. While two of the reviewers noted the novelty of the proposed LCKD method, one of the reviewers was not convinced and asked for a detailed related work discussion. The contributions with respect to published works (and the references provided by the reviewer, in particular) should be discussed in the rebuttal. The authors should also refer in the rebuttal to the teacher election procedure in the context of the questions raised, the ability of the method to generalize to other domains and pathologies, and clarify Eqs. 4-5.




Author Feedback

Reviewer #1: Q1: Visual results. Due to space limitations, additional visualizations have been put in the supp. material.

Reviewer #3: Q1: Model translates to other domains? We focused on BraTS because it is recognised as an important benchmark for the missing modality problem, being used by many SOTA methods (e.g., HeMIS, HVED, Robust-MSeg, mmFormer, etc.). Also, space limitations didn’t allow us to try our method in other domains. Nevertheless, in this short rebuttal period, we quickly adapted our method and obtained results for multi-modal analysis with missing modalities on the Heart Segmentation (MMWHS) problem by training with randomly dropped modalities for 4000 epochs. Evaluated on the CT modality only, compared with a baseline model (a multi-modal model that replaces missing modality inputs with 0s), our LCKD model improves Dice from 90.7 to 92.2 on the left ventricle (LV) and from 86.1 to 88.0 on the myocardium (Myo). Using the MR modality only, the improvements are more obvious: from 78.6 to 84.7 on LV, and from 62.8 to 68.4 on Myo. We will include these results in the supp. material.

Q2: One teacher is more useful than others. To be more precise, we should say that we assume that there will be performance differences between modalities, which is an assumption met in all our experiments. Our approach is designed to select one or more modalities as teachers, so we can have more than one teacher (Tab. 2, LCKD-m). This updated assumption can only be violated if all modalities have exactly the same performance. However, even if this happens, our algorithm still works by randomly selecting one or more modalities as teachers, which shouldn’t have any impact on performance given that they all have the same performance.

Q3: Qualitative results. Qualitative results are in the supp. material due to the space limitation in the paper.

Q4: Suggestions. When the number of excluded modalities increases, segmentation accuracy generally decreases (Tab. 1). However, if we have results from the top-performing modality, then the results are good even if all other modalities are excluded. Tab. 1 also shows that model performance drops when the top-performing modality is absent. Our paper mitigates this issue: compared with mmFormer and other SOTAs, this gap has been shrunk by our model, but there is still room for improvement.

Reviewer #4: Q1: Re-organize related works. We appreciate this comment. [1] proposed an RFM module to fuse modal features based on the sensitivity of each modality to different tumor regions, and a segmentation-based regularizer to address the imbalanced training problem. [2] proposed an MA module to ensure that modality-specific models are interconnected and calibrated with attention weights for adaptive information exchange. [3] proposed a model to distill multi-modal knowledge into a single-modal model. We will re-organise the related work.

Q2: Results of the Teacher Election Procedure (TEP). We have results w.r.t. the TEP in the supp. material due to space limitations. The TEP is performed every 5000 iterations, and Flair and T1c are elected as teachers more often than the other modalities, which resonates with our expectations.

Q3: Imprecise formulas. We thank the reviewer for pointing this out. In Eq. (4), the cross-modal KD happens for all teacher-student pairs. Missing modality features are generated with Eq. (5). These small issues will be fixed.

Q4: Caption of Fig. 1. Here’s the new caption of Fig. 1. LCKD framework: the N modalities {x^{(n)}}_{n=1}^{N} are processed by the encoder to produce the features {f^{(n)}}_{n=1}^{N}, which are concatenated and used by the decoder to produce the segmentation. The teachers are elected using a validation process that selects the top-performing modalities as teachers. Cross-modal distillation is performed by approximating the students’ features to the teachers’ features. Features from missing modalities are generated by averaging the other modalities’ features.
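
A minimal sketch of the missing-modality handling described in this caption is shown below (an editorial illustration, not the authors' code; the function name `build_decoder_input`, the channel size, and the tensor shapes are assumptions): features of a missing modality are replaced by the mean of the available modalities' features before concatenation and decoding.

```python
# Editorial sketch: impute a missing modality's features with the mean of the
# available modalities' features, then concatenate for the decoder.
# Names, shapes, and channel sizes are assumptions, not the authors' code.
import torch

def build_decoder_input(features, modalities):
    """features: dict modality -> (B, C, D, H, W) encoder features for the
    modalities that are actually available; modalities: full ordered list."""
    available = [features[m] for m in modalities if m in features]
    mean_feat = torch.stack(available, dim=0).mean(dim=0)  # average of available features
    full = [features.get(m, mean_feat) for m in modalities]  # fill in missing modalities
    return torch.cat(full, dim=1)  # concatenate along channels for the decoder

modalities = ["Flair", "T1", "T1c", "T2"]
feats = {m: torch.randn(1, 8, 16, 16, 16) for m in ["Flair", "T2"]}  # T1 and T1c missing
decoder_input = build_decoder_input(feats, modalities)  # shape: (1, 32, 16, 16, 16)
```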




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed most of the reviewers’ concerns. In the camera-ready version, please include the additional clarification, information, and discussion provided in the rebuttal. In particular, refer to your contribution with respect to published work.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses my concerns. The authors explain in detail the visual results and the model’s ability to translate to other domains. They also explained the role of the teacher module and its effect. Reviewer #4 also approved the rebuttal.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After reading the reviews and the authors’ rebuttal, I feel that the responses provided addressed most of the raised concerns. In particular, I appreciate that the authors have provided additional results to show the superiority of their approach on other datasets, and clarifications on unclear details. Having said this, I found that the positioning of the proposed approach with respect to the existing literature (suggested by R3) is rather weak. The authors merely describe what the suggested methods do without really highlighting the differences with this work. Thus, this important concern remains unanswered. Taking into account all these points, I recommend the acceptance of this work, even though I strongly suggest the authors correct this important point in their camera-ready version (i.e., include the references brought by R3 and position the proposed approach with respect to them).


